Site Reliability Engineer
- Insight Global (Plymouth, PA)
Job Description
We’re looking for a Site Reliability Engineer to help keep our applications running smoothly, reliably, and efficiently. You’ll work behind the scenes to monitor performance, automate operations, and support large-scale systems across cloud platforms like AWS and GCP. If you enjoy solving problems, improving systems, and working with modern tools like Terraform, Kubernetes, and Dynatrace, this role is for you.
This is a hybrid role with 4 days onsite. Location options are Boca Raton, FL; Blue Bell, PA; or Irving, TX.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to [email protected]. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Skills and Requirements
5+ years of experience in Site Reliability Engineering or similar roles.
Experience supporting production applications.
Strong background in monitoring, incident response, troubleshooting, and system performance tuning, including leading post-mortem analysis to prevent future problems.
Ability to automate infrastructure and operations using Terraform, Ansible, and programming or scripting languages like Python or Java.
Experience managing and optimizing Kubernetes clusters for scalable deployments.
Experience supporting cloud infrastructure across AWS and GCP, keeping systems secure, stable, and cost-efficient.
Experience managing distributed systems and dynamic cloud infrastructure.
Comfortable working in complex environments and solving ambiguous problems.
Exposure to post-mortem analysis and reliability-focused engineering practices.
Strong communication skills and ability to work cross-functionally.
Experience with Dynatrace, Prometheus, or similar observability tools.
Familiarity with CI/CD pipelines and automation best practices.