• Director of AI SRE & DevOps, AI.x

    Charles Schwab (San Francisco, CA)
    …and explore next-generation GenAI efforts that will redefine how we serve our clients. As a Senior Engineer on AI.x, you will play a key role in bringing these ... areas of technology today. As the Director of AI SRE & DevOps, you will lead infrastructure and reliability ... build towards Schwab's enterprise strategy. You will focus on production availability but also lead the strategy for building…
    Charles Schwab (12/06/25)
  • Senior Site Reliability Engineer

    ServiceNow, Inc. (San Diego, CA)
    …The Federal SRE Team has 3 shifts that provide 24x7 production support for our Government Community Cloud infrastructure. Below are some highlights. + ... if you want to understand more about ServiceNow as a company and the SRE role. As an Engineer on the SRE team you will: + Provide relief and sustainable…
    ServiceNow, Inc. (11/18/25)
  • Senior Software Engineer, Site…

    Google (Mountain View, CA)
    Senior Software Engineer, Site Reliability Engineering. Google. Sunnyvale, CA, USA; Mountain View, CA, USA. Mid. Experience driving ... + Read a career profile (https://careers.google.com/stories/site-reliability-engineering-profile-google/) about why a software engineer chose to join SRE. Behind everything…
    Google (11/29/25)
  • Senior ML Platform Engineer - Lepton

    NVIDIA (Santa Clara, CA)
    …and Terraform, with a proven track record of building and managing production infrastructure. + SRE-oriented mindset with extensive experience in diagnosing ... vehicles. We are now looking for an ML Platform Engineer to help accelerate the next era of machine ... GPU systems. Join our top team and apply your SRE and software engineering skills to craft robust, user-friendly…
    NVIDIA (11/04/25)
  • Senior Staff Site Reliability…

    NVIDIA (Santa Clara, CA)
    …automated, and secure production environments. We are seeking a deeply skilled Senior Staff Site Reliability Engineer (SRE) to advance our enterprise ... experience (or equivalent experience). + 10+ years of software engineering/DevOps/SRE experience, with a significant focus on operational security, automation,…
    NVIDIA (09/30/25)
  • Senior Site Reliability Engineer

    NVIDIA (Santa Clara, CA)
    Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale production systems with high efficiency and ... and deployment and open source cloud enabling technologies like Kubernetes and OpenStack. SRE at NVIDIA ensures that our internal and external facing GPU cloud…
    NVIDIA (11/05/25)
  • Senior Site Reliability Engineer

    NVIDIA (Santa Clara, CA)
    …experience. + 10+ years operating large-scale production systems in roles such as SRE, Production Engineer, or Platform Engineer and 5+ years ... to model training clusters to real-time decision making. This isn't a typical SRE role; you'll help design and run NVIDIA's global telemetry backbone, the platform…
    NVIDIA (12/06/25)
  • (Senior) Software Engineer

    pony.ai (Fremont, CA)
    …globally. Pony.ai went public on NASDAQ in November 2024. Responsibilities: As a (Senior) Kubernetes Engineer, you will: + Design, operate, and optimize ... security policies, and operational guidelines. + Contribute to observability and SRE practices to ensure reliability at scale (SLOs, incident reviews, metrics-driven…
    pony.ai (09/16/25)
  • Senior AI Infrastructure Engineer

    NVIDIA (Santa Clara, CA)
    We are seeking an AI Infrastructure Engineer to integrate third-party infrastructure partners into NVIDIA's operational excellence programs. This cross-functional ... alignment. The ideal candidate should possess experience in delivering production infrastructure across various cloud providers, including hands-on experience in…
    NVIDIA (10/24/25)
  • Senior, Software Engineer

    Walmart (Sunnyvale, CA)
    …high-performance checkout services running in Edge and Cloud. As a Site Reliability Engineer in the CPC Team, you will work with L2, other dependent applications, ... you'll do: + Incident triage, escalation and resolution: Triage site-impacting production issues by quantifying impact, severity and urgency, analyzing systems for…
    Walmart (11/14/25)