- Charles Schwab (San Francisco, CA)
- …and explore next-generation GenAI efforts that will redefine how we serve our clients. As a Senior Engineer on AI.x, you will play a key role in bringing these ... areas of technology today. As the Director of AI SRE & DevOps, you will lead infrastructure and reliability ... build towards Schwab's enterprise strategy. You will focus on production availability and also lead the strategy for building…
- ServiceNow, Inc. (San Diego, CA)
- …**_The Federal SRE Team has 3 shifts that provide 24x7 production support for our Government Community Cloud infrastructure._** _Below are some highlights._ + ... if you want to understand more about ServiceNow as a company and the SRE role. **As an Engineer on the SRE team you will:** + Provide relief and sustainable…
- Google (Mountain View, CA)
- Senior Software Engineer, Site Reliability Engineering. Google. Sunnyvale, CA, USA; Mountain View, CA, USA. Experience level: **Mid**. Experience driving ... + Read a career profile (https://careers.google.com/stories/site-reliability-engineering-profile-google/) about why a software engineer chose to join SRE. Behind everything…
- NVIDIA (Santa Clara, CA)
- …and Terraform, with a proven track record of building and managing production infrastructure. + SRE-oriented mindset with extensive experience in diagnosing ... vehicles. We are now looking for an ML Platform Engineer to help accelerate the next era of machine ... GPU systems. Join our top team and apply your SRE and software engineering skills to craft robust, user-friendly…
- NVIDIA (Santa Clara, CA)
- …automated, and secure production environments. We are seeking a deeply skilled Senior Staff Site Reliability Engineer (SRE) to advance our enterprise ... experience (or equivalent experience). + 10+ years of software engineering/DevOps/SRE experience, with a significant focus on operational security, automation,…
- NVIDIA (Santa Clara, CA)
- Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build, and maintain large-scale production systems with high efficiency and ... and deployment and open source cloud enabling technologies like Kubernetes and OpenStack. SRE at NVIDIA ensures that our internal and external facing GPU cloud…
- NVIDIA (Santa Clara, CA)
- …experience. + 10+ years operating large-scale production systems in roles such as SRE, Production Engineer, or Platform Engineer and 5+ years ... to model training clusters to real-time decision making. This isn't a typical SRE role: you'll help design and run NVIDIA's global telemetry backbone, the platform…
- pony.ai (Fremont, CA)
- …globally. Pony.ai went public on NASDAQ in November 2024. Responsibilities: As a (Senior) Kubernetes Engineer, you will: + Design, operate, and optimize ... security policies, and operational guidelines. + Contribute to observability and SRE practices to ensure reliability at scale (SLOs, incident reviews, metrics-driven…
- NVIDIA (Santa Clara, CA)
- We are seeking an AI Infrastructure Engineer to integrate third-party infrastructure partners into NVIDIA's operational excellence programs. This cross-functional ... alignment. The ideal candidate should possess experience in delivering production infrastructure across various cloud providers, including hands-on experience in…
- Walmart (Sunnyvale, CA)
- …high-performance checkout services running in Edge and Cloud. As a Site Reliability Engineer in the CPC Team, you will work with L2, other dependent applications, ... you'll do: + Incident Triage, Escalation, and Resolution: Triage site-impacting production issues by quantifying impact, severity, and urgency, analyzing systems for…