- Cornerstone OnDemand (Dublin, CA)
- We are seeking a highly skilled Site Reliability Engineer with 3 years of experience to join our dynamic team. The ideal candidate will have a strong background ... designing, implementing, and managing cloud-based solutions. As a Site Reliability Engineer, you will play a key ... + Maintain operational runbook procedures for all production systems and document the knowledge base. + Administer incident…
- NVIDIA (Santa Clara, CA)
- …health. + Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity. + ... Site Reliability Engineering (SRE) at NVIDIA is an engineering ... discipline to design, build, and maintain large-scale production systems with high efficiency and availability using the combination…
- NVIDIA (Santa Clara, CA)
- …health. + Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity. + ... Site Reliability Engineering (SRE) at NVIDIA is an engineering ... discipline to design, build, and maintain large-scale production systems with high efficiency and availability using the combination…
- NVIDIA (Santa Clara, CA)
- …on the world. NVIDIA is looking to hire a deeply technical and creative Site Reliability Engineer to build, support, and maintain the next-generation AI-powered ... challenges, automate processes, and iterate for efficiency. + Tackle systemic reliability issues with multi-functional teams. + Monitor, optimize, and manage system…
- PennyMac (Westlake Village, CA)
- …quickly and accurately, is critical to the success of anyone in this role. The Engineer III, Site Reliability Operations will: + Monitoring: Oversee 24/7 health ... A Typical Day: As a member of the Site Reliability Operations (SRO) team, you will help provide 24/7 ... timely and accurate resolution of service disruptions. + Advanced Systems Administration: Perform and troubleshoot a wide range…
- Palo Alto Networks (Santa Clara, CA)
- …including the design, implementation, and continuous enhancement of our comprehensive observability systems. To meet the opportunities that such a role provides, you ... to develop innovative solutions that provide clear and actionable insights into our systems' performance and health. **Your Impact** As a Principal SRE with the…
- Palo Alto Networks (Santa Clara, CA)
- …a large hybrid infrastructure and is one of the largest GCP customers. As a Site Reliability Engineer, you will be part of a team supporting the services running ... This includes automation, architecture, performance, metrics, troubleshooting, security, and reliability. Our stack includes Kubernetes, Docker, GCP, AWS, Ansible,…
- DoorDash (San Francisco, CA)
- …next 10B even better. About the Role: We are seeking a highly motivated Senior/Staff Reliability Engineer to join our team. This individual will play a key role ... engineering discipline. + Experience facilitating DFMEAs for complex components and/or systems. + Experience developing risk assessments based on Weibull, fault…
- LinkedIn (Mountain View, CA)
- …and troubleshooting production systems at scale. Suggested Skills: Distributed Systems, Technical Leadership, Infrastructure Reliability, Systems ... passion for distributed technologies and algorithms, API design and systems design, and your passion for writing code that ... impact within our company. As a Sr. Staff Software Engineer, you will be a key technical leader and…
- NVIDIA (Santa Clara, CA)
- …Will Be Doing: + Architect, lead, and scale globally distributed production systems supporting AI/ML, HPC, and critical engineering platforms across hybrid and ... change safety, and release velocity. + Define and evolve platform-wide reliability metrics, capacity forecasting strategies, and uncertainty testing approaches for…