Intl - EU - Senior Site Reliability Engineer
- Insight Global (Novato, CA)
Job Description
Systems Design, Scaling & Resilience
Build and operate distributed Unix-based systems (Ubuntu, Debian, Red Hat, CentOS).
Implement auto-scaling and self-healing infrastructure.
Tune kernel, filesystem, and networking parameters.
Ensure timely security patching and compliance.
Integrate Linux systems with enterprise auth services (AD, LDAP, Kerberos).
Automation & Infrastructure as Code
Design and maintain automation tools (Terraform, Ansible, Pulumi).
Automate configuration, service rollout, and patching.
Develop backend automation in Python, Go, or Ruby.
Extend platform automation APIs and workflows.
Observability, Monitoring & Incident Response
Develop observability pipelines (Datadog, Grafana, open-source tools).
Create service-level dashboards and alerts.
Participate in 24/7 on-call rotation and incident management.
Conduct post-mortems and root cause analysis.
Multi-Cloud Platform Engineering
Manage systems across AWS, GCP, and on-prem platforms.
Architect high-availability systems with multi-region failover.
Implement backup, recovery, and DR workflows.
Support hybrid environments (VMware/vSphere, container-based platforms).
Collaboration, Standards & Enablement
Work closely with backend and DevOps teams.
Contribute to system reliability standards and documentation.
Mentor engineers on Unix system performance and debugging.
We are a company committed to creating inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity employer that believes everyone matters. Qualified candidates will receive consideration for employment opportunities without regard to race, religion, sex, age, marital status, national origin, sexual orientation, citizenship status, disability, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to [email protected]. The EEOC "Know Your Rights" Poster is available here: https://www.eeoc.gov/sites/default/files/2023-06/22-088_EEOC_KnowYourRights6.12ScreenRdr.pdf.
To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Skills and Requirements
Required Skills & Experience
7+ years in SRE, Infrastructure, or Systems Engineering roles.
Extensive experience with Unix/Linux systems.
Strong debugging and optimization skills.
Experience with AWS and/or GCP.
Strong programming skills in Python and shell scripting.
Deep understanding of CI/CD workflows and GitOps practices.
Expertise with Terraform, Ansible, or similar IaC tools.
Experience with hybrid infrastructure (cloud/on-prem).
Hands-on experience with observability tools.
Ability to troubleshoot complex reliability issues.
Nice to Have
Experience with live game infrastructure.
Contributions to open-source tooling.
Familiarity with telemetry systems (ETL, Flink/ZooKeeper, Kinesis).
Familiarity with service mesh (Linkerd, Istio) and Kubernetes-native architecture.
Experience using Datadog for monitoring and visualization.
Experience with MySQL/Postgres in RDS and bare-metal installations.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal employment opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment without regard to race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request to [email protected].