REMOTE Site Reliability Engineer (SRE)
Insight Global (Brookfield, WI)
Job Description
We’re looking for a REMOTE SRE who has a software engineering background—someone who can drop into ongoing projects, quickly mesh with cross-functional teams, and drive reliability outcomes with strong procedural and systems thinking. This is a backfill aimed at stabilizing and improving production systems and delivery practices. You’ll focus on SaaS services, reliability engineering, observability, and pragmatic automation. The right person writes clean, tested code, reasons about distributed systems, and applies software engineering discipline to operational problems.
Key Responsibilities:
• Embed with product and platform teams to own reliability for key services; come in and “run with” active projects.
• Define and drive SLOs/SLAs/SLIs; implement actionable alerting and dashboards (primary: Datadog).
• Automate reliability work (deployment, scaling, failover, incident workflows) using code-first approaches.
• Author infrastructure as code (primarily Terraform) and collaborate on Docker/Kubernetes workflows.
• Instrument services (.NET primary stack; Python/Rust for tooling; Java is a plus) for observability and performance.
• Own incidents end-to-end: triage, root cause, postmortems, and preventative engineering.
• Apply systems thinking to reduce complexity, improve resilience, and increase change velocity safely.
• Partner with security and cloud teams on guardrails, least-privilege, and cross-cloud considerations.
• Write stories and technical docs that clarify problems, solutions, and acceptance criteria.
• Continuously improve reliability patterns, runbooks, and automation pipelines.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to [email protected]. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Skills and Requirements
• Proven SRE experience (3+ years minimum at mid-staff level) owning reliability for production systems.
• Software engineering background with strong procedural thinking; you’ve shipped production code.
• Proficient in scripting languages such as Python, Bash, or similar.
• .NET expertise as the primary skillset (services, APIs, performance, instrumentation).
• Datadog hands-on experience (dashboards, monitors, logs, APM, alerting).
• AWS foundational knowledge (you don’t need a pro cert; you can reason about core services and IAM).
• Infrastructure as Code with Terraform (modules, state, environments).
• Practical knowledge of Docker and Kubernetes (how they work, how to debug and operate them).
• Familiarity with SQL/Postgres (querying, performance basics).
• Continued education and/or advanced degree(s) in Computer Science, Information Technology, or a related field
• AWS certifications (such as AWS Certified Solutions Architect, AWS Certified Database - Specialty, or AWS Certified Security - Specialty)
• Ability to understand and refactor complex legacy software
• Experience in environments subject to HIPAA and/or PCI regulations
• Professional experience with project lifecycle planning such as Agile/Scrum
• Comfortable with the Atlassian software suite (Jira, Confluence, and OpsGenie)
• Experience with Rust
• AWS Glue
• AWS Neptune or other AWS purpose-built databases