- Walmart (Sunnyvale, CA)
- …software engineering, or related area and 3 years' experience in site reliability engineering, site and system administration, infrastructure management, or related ... area. Option 2: 5 years' experience in site reliability engineering, site and system administration, infrastructure management, or related area. **Preferred… more
- Palo Alto Networks (Santa Clara, CA)
- …and actionable insights into our systems' performance and health. **Your Impact** As a Senior SRE with the Cortex Cloud Security Posture Management team, you will: + ... influence the operability of the product and ensure the reliability and availability of our services **Your Experience** +...Clear understanding of incident and alerts management in Site Reliability Engineering + DevOps/SRE Expertise - 4+ years of… more
- NVIDIA (Santa Clara, CA)
- Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale production systems with high efficiency and ... internal and external facing GPU cloud services run maximum reliability and uptime as promised to the users and...be doing: + Design, implement and support operational and reliability aspects of large scale Kubernetes clusters with focus… more
- Coinbase (Sacramento, CA)
- …Q3 2023. *What you'll be doing (i.e., job duties):* * Improve observability, reliability and availability by defining and measuring key metrics * Build automation and ... and automate incident response * Proactively find and analyze reliability problems across our business units and stack, then...and hold accountable the engineering team to improve the reliability of our systems and make reliability … more
- ServiceNow, Inc. (San Diego, CA)
- …technical engineers who are tasked with maintaining and developing the reliability, scalability and performance of the ServiceNow cloud infrastructure. Our SREs ... repeatable issues. + Drive initiatives with partner teams to improve the reliability and performance of the infrastructure through improved system design. + Drive… more
- NVIDIA (Santa Clara, CA)
- …large-scale systems supporting critical use cases for AI Infrastructure, driving reliability, operability, and scalability across global public and private clouds. + ... + Build tools and frameworks to improve observability, define actionable reliability metrics, and enable fast issue resolution, driving continuous improvement in… more
- Broadridge Financial Solutions (El Dorado Hills, CA)
- …come join the Broadridge team. Broadridge is growing! We are seeking a Site Reliability Engineer to join our team. We are looking for someone responsible for design, ... the infrastructure, automation and the overall productivity of the SRE (Service Reliability Engineering) team. + Tracks Service Level Indicators (SLI) to ensure the… more
- Rubrik (Sacramento, CA)
- …and services with the objective of achieving and exceeding availability and reliability goals * Manage and streamline monitoring systems to enhance observability and ... visibility * Perform Production Readiness Assessments of new services to identify reliability needs and surface potential gaps * Develop and maintain documentation… more
- Coinbase (Sacramento, CA)
- …fully supported. Coinbase is hiring! We are looking for an experienced Site Reliability Engineer (SRE) to join the IT Operations Corporate Engineering team to build ... and scale our identity and access management tooling. A successful candidate will have demonstrated previous success in similar role(s) in rapidly growing, security-first environments. The right person is passionate about infrastructure as code, open source… more
- LiveRamp (San Francisco, CA)
- …issues with Engineering teams** + **Set up and maintain Infrastructure & Product Reliability monitoring and alerting** + **Maintain and enhance CI/CD Tooling and ... Terraform scripts in support of the mission in close collaboration with DevOps team** + **Maintain and enhance Engineering Operational Documentation for supported products.** + **Provide expertise to build and maintain products operational documentation and… more