Site Reliability Engineer
- Insight Global (Alpharetta, GA)
Job Description
Insight Global is looking for a Site Reliability Engineer who’s excited to help us build, maintain, and scale our cloud-native infrastructure. In this role, you’ll work closely with our development and operations teams to keep systems reliable, efficient, and ready to grow. You’ll design and manage Azure cloud environments using Terraform and Terragrunt, optimize Kubernetes clusters on AKS, and streamline deployments through GitHub Actions and ArgoCD. A big part of the job is improving reliability with monitoring and observability tools like Grafana, automating repetitive tasks, and jumping in for on-call support and incident response when needed. You’ll also collaborate with developers to boost application performance, advocate for SRE best practices like SLIs and SLOs, and continuously look for ways to make our systems faster, more secure, and cost-effective. If you’re passionate about automation, scalability, and working in a fast-paced, collaborative environment, this is the role for you.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to [email protected]. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Skills and Requirements
3+ years of experience in an SRE, DevOps, or cloud infrastructure role.
Strong experience with Azure cloud services and infrastructure.
Hands-on experience with Java, and with Terraform and Terragrunt for infrastructure-as-code.
Proficiency with Kubernetes (preferably AKS), Databricks, and container orchestration.
Experience with CI/CD tools, especially GitHub Workflows/Actions and ArgoCD.
Solid understanding of observability tools like Grafana (experience with Prometheus, Loki, and Tempo is a plus).
Experience with Databricks and data lakes.
Master's degree in Computer Science.