- Charles Schwab (San Francisco, CA)
- …and explore next-generation GenAI efforts that will redefine how we serve our clients. As a Senior Engineer on AI.x, you will play a key role in bringing these ... areas of technology today. As the Director of AI SRE & DevOps you will lead infrastructure and reliability...build towards Schwab's enterprise strategy. You will focus on production availability but also lead the strategy for building…
- ServiceNow, Inc. (San Diego, CA)
- …**_The Federal SRE Team has 3 shifts that provide 24x7 production support for our Government Community Cloud infrastructure._** _Below are some highlights._ + ... if you want to understand more about ServiceNow as a company and the SRE role. **As an Engineer on the SRE team you will:** + Provide relief and sustainable…
- NVIDIA (Santa Clara, CA)
- …and Terraform, with a proven track record of building and managing production infrastructure. + SRE-oriented mindset with extensive experience in diagnosing ... vehicles. We are now looking for an ML Platform Engineer to help accelerate the next era of machine...GPU systems. Join our top team and apply your SRE and software engineering skills to craft robust, user-friendly…
- NVIDIA (Santa Clara, CA)
- Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale production systems with high efficiency and ... and deployment and open source cloud enabling technologies like Kubernetes and OpenStack. SRE at NVIDIA ensures that our internal and external facing GPU cloud…
- NVIDIA (Santa Clara, CA)
- …automated, and secure production environments. We are seeking a deeply skilled Senior Staff Site Reliability Engineer (SRE) to advance our enterprise ... experience (or equivalent experience). + 10+ years of software engineering/DevOps/SRE experience, with a significant focus on operational security, automation,…
- NVIDIA (Santa Clara, CA)
- We are seeking an AI Infrastructure Engineer to integrate third-party infrastructure partners into NVIDIA's operational excellence programs. This cross-functional ... alignment. The ideal candidate should possess experience in delivering production infrastructure across various cloud providers, including hands-on experience in…
- NVIDIA (Santa Clara, CA)
- …experience. + 10+ years operating large-scale production systems in roles such as SRE, Production Engineer, or Platform Engineer and 5+ years ... to model training clusters to real-time decision making. This isn't a typical SRE role, you'll help design and run NVIDIA's global telemetry backbone, the platform…
- pony.ai (Fremont, CA)
- …globally. Pony.ai went public on NASDAQ in November 2024. Responsibilities As a (Senior) Kubernetes Engineer, you will: + Design, operate, and optimize ... security policies, and operational guidelines. + Contribute to observability and SRE practices to ensure reliability at scale (SLOs, incident reviews, metrics-driven…
- Walmart (Sunnyvale, CA)
- …high-performance checkout services running in Edge and Cloud. As a Site Reliability Engineer in the CPC Team, you will work with L2, Other dependent Applications, ... you'll do + Incident triage, Escalation and Resolution: Triage site-impacting production issues by quantifying impact, severity and urgency, analyzing systems for…
- Oracle (Sacramento, CA)
- …it a world class engineering center with the focus on excellence. As a Senior Principal Site Reliability DevOps Engineer, you will be responsible for defining ... person who loves a challenge? Solve the complex puzzles you've been dreaming of as our Engineer. If you have a passion for innovation in tech, we want you on our…