Sr. Site Reliability Engineer
Insight Global (Seattle, WA)
Job Description
An employer in the Pacific Northwest is seeking a highly skilled Senior Site Reliability Engineer (SRE). This role is critical in ensuring the reliability, scalability, and performance of our infrastructure and services. You will work on automation, infrastructure-as-code, and observability solutions while collaborating with cross-functional teams to deliver secure and efficient systems. In this role, you will:
• Design, implement, and maintain Infrastructure as Code (IaC) solutions using Ansible and Terraform for consistent and scalable deployments.
• Develop automation scripts and tools in Python to streamline operational workflows, including system upgrades and configuration management.
• Manage and optimize containerized environments using Kubernetes, ensuring high availability and resilience.
• Drive automation for system upgrades and patching processes to reduce downtime and improve operational efficiency.
• Collaborate on networking-focused projects, leveraging tools like NetBox and Infoblox for IP address management and network automation.
• Support and enhance virtualization and cloud environments, including OpenStack and VMware, for hybrid infrastructure solutions.
• Implement and maintain observability frameworks using Grafana and Prometheus to monitor system health and performance.
• Partner with development and operations teams to define and track SLIs, SLOs, and KPIs, ensuring alignment with reliability goals.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to [email protected]. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Skills and Requirements
• 8+ years of experience with networking concepts and network automation tools.
• 6+ years of hands-on experience with NetBox or Infoblox for network resource management.
• Proficiency in Python for automation and scripting tasks.
• 6+ years of expertise in Kubernetes, OpenStack, and VMware environments.
• Familiarity with observability tools such as Grafana and Prometheus.
• Solid understanding of IaC principles and experience with Ansible and Terraform.
• Ability to work in a fast-paced environment and collaborate effectively across teams.
• Experience mentoring less experienced SREs.