Site Reliability Engineer
iCIMS (Denver, CO)
Job Overview
We are seeking a skilled Site Reliability Engineer (SRE) to contribute to the reliability, scalability, and performance of our multi-cloud SaaS platform, which serves thousands of customers worldwide. This role involves hands-on technical work in incident response, system monitoring, automation, and continuous improvement of platform reliability. The successful candidate will work within a global SRE team to ensure optimal system performance and customer satisfaction.
About Us
When you join iCIMS, you join the team helping global companies transform business and the world through the power of talent. Our customers do amazing things: design rocket ships, create vaccines, deliver consumer goods globally, overnight, with a smile. As the Talent Cloud company, we empower these organizations to attract, engage, hire, and advance the right talent. We’re passionate about helping companies build a diverse, winning workforce and about building our home team. We're dedicated to fostering an inclusive, purpose-driven, and innovative work environment where everyone belongs.
Responsibilities
+ **System Monitoring & Reliability:**
+ Monitor multi-cloud infrastructure (AWS, Azure, GCP) using New Relic, Grafana, and Sumo Logic
+ Maintain reliability of AWS resources, Auth0/Okta authentication, databases, and legacy applications
+ Implement monitoring, alerting, and dashboards for assigned systems
+ **Incident Management & Response:**
+ Respond to alerts and incidents within SLA timeframes
+ Perform root cause analysis and document findings
+ Create and maintain runbooks and troubleshooting procedures
+ Participate in 24/7 on-call rotation
+ **Automation & Improvement:**
+ Develop scripts to reduce manual operational overhead
+ Build monitoring and alerting solutions
+ Support infrastructure-as-code initiatives
+ Implement automated remediation where possible
+ **Success Metrics:**
+ **Customer Impact:** Reduced MTTR and improved customer satisfaction scores
+ **Reliability:** Achievement of 99.9%+ uptime SLAs across all products and regions
+ **Proactive Prevention:** Reduction in incident frequency through automated detection and prevention
+ **Cross-functional Collaboration:** Improved partnership metrics with Product, Engineering, and Customer Success teams
+ **Automation Delivery:** Complete assigned automation projects to reduce manual tasks
+ **Knowledge Sharing:** Contribute to team knowledge base and mentor junior engineers
Qualifications
+ 4+ years of experience in SRE, DevOps, or Infrastructure Engineering
+ Hands-on experience with AWS (required) and Azure (preferred)
+ Strong Linux system administration skills
+ Experience with monitoring tools (New Relic, Grafana, Prometheus)
+ Scripting skills in Python, Bash, or similar
+ Knowledge of databases (SQL Server, PostgreSQL, MongoDB)
Preferred
Technical Experience:
+ SaaS experience in a global environment
+ Authentication and identity management systems knowledge
+ Cloud certifications (AWS, Azure, or Google Cloud)
+ Infrastructure-as-code tools (Terraform, CloudFormation)
Education/Certifications/Licenses:
+ Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related technical field
+ Equivalent combination of education and experience will be considered
Working Conditions:
+ Global role requiring flexibility for incident response and team coordination across time zones
+ Occasional client-facing responsibilities during critical incidents
+ Travel may be required for team building
+ Hybrid work environment with team members distributed globally
EEO Statement
iCIMS is a place where everyone belongs. We celebrate diversity and are committed to creating an inclusive environment for all employees. Our approach helps us to build a winning team that represents a variety of backgrounds, perspectives, and abilities. So, regardless of how your diversity expresses itself, you can find a home here at iCIMS.
We are proud to be an equal opportunity and affirmative action employer. We prohibit discrimination and harassment of any kind based on race, color, religion, national origin, sex (including pregnancy), sexual orientation, gender identity, gender expression, age, veteran status, genetic information, disability, or other applicable legally protected characteristics. If you would like to request an accommodation due to a disability, please contact us at [email protected].
Compensation and Benefits
We accept applications for this position on an ongoing basis until the position is filled. Applications will be reviewed as they are received, and qualified candidates may be contacted throughout the posting period.
The anticipated base pay range for this position is $100,000-$140,000 annually. Final compensation will be based on factors such as relevant experience, skills, education, internal equity, and market data. This range aligns with our commitment to equitable and transparent compensation practices, as required by applicable law.
Competitive health and wellness benefits include medical, dental, vision, 401(k), dependent care, short-term and long-term disability, life and AD&D insurance, bonding and parental leave, mindfulness resources, an open vacation policy, sick days, paid holidays, quiet hours each workday, and tuition reimbursement. Benefits and eligibility may vary by location, role, and tenure. Learn more here: https://careers.icims.com/benefits