  • AI and ML HPC Cluster Engineer

    NVIDIA (Santa Clara, CA)
    …resource utilization. + Directly administer internal research clusters, conduct upgrades, incident response, and reliability improvements. + Develop and improve ... most advanced computing workloads. NVIDIA is looking for an AI/ML HPC Cluster Engineer to join our MARS team. You will provide technical engagement and problem…
    NVIDIA (01/03/26)
  • Senior CloudOps Engineer

    Insight Global (South San Francisco, CA)
    …Prometheus, and Grafana. * Background in incident response and post-mortem strategy. * Prior involvement in cloud governance and policy implementation. ... Job Description We are seeking a hands-on Senior CloudOps Engineer to lead the execution of our AWS cloud operations strategy. This role is critical to driving…
    Insight Global (12/19/25)
  • Staff Systems Software Engineer

    General Motors (Mountain View, CA)
    …you run it" culture from initial design through deployment, monitoring, and production incident response. **What Will Give You a Competitive Edge (Preferred ... Role** The Infrastructure Engineering organisation at GM is building a cloud-native platform that transforms how developers interact with automotive test hardware.…
    General Motors (12/03/25)
  • Software Engineer III

    Robert Half-Robert Half Corporate (San Ramon, CA)
    …and resolution of moderate to complex issues in production platforms, defining incident response approaches and resolution playbooks. + Provides Level III ... **Who We Are** Robert Half is seeking a Senior Software Engineer III - ATI to join our team supporting the underlying infrastructure, platforms, and services that…
    Robert Half-Robert Half Corporate (12/02/25)
  • Sr Principal Hardware Security Engineer

    Oracle (Santa Clara, CA)
    …supply chain + Establish and/or participate (as needed) in PSIRT (Product Security Incident Response Team) relationships with key Oracle hardware suppliers and ... the next generation of Oracle hardware that underlies all of Oracle's Cloud and Enterprise platform offerings. These systems utilize leading edge technology to…
    Oracle (11/25/25)
  • Senior AI Site Reliability Engineer

    Charles Schwab (San Francisco, CA)
    …with infrastructure as code. + Experience implementing monitoring, alerting, and incident response for large-scale distributed systems. + Proven track record ... serve our clients. As a Senior AI Site Reliability Engineer on AI.x, you will play a key role...datasets. + 3+ years of experience with containers and cloud-native applications, and the ability to operationalize them in…
    Charles Schwab (12/25/25)
  • Senior DevOps Software Engineer

    KBR (El Segundo, CA)
    …(AWS EKS, Azure AKS). + Help with vulnerability assessments, security monitoring, and incident response automation. + Work with developers to implement secure ... Title: Senior DevOps Software Engineer Belong. Connect. Grow. with KBR! KBR's National...the instantiation of DevOps pipelines in both on-prem and cloud environments. Work Environment: + Location: On-site + Travel…
    KBR (11/25/25)
  • Software Engineer, Reliability Platform

    DoorDash (San Francisco, CA)
    …them away for other engineers. You understand concepts like SLOs, error budgets, and incident response though this is a platform development team, not an ... Control - building the systems engineers use to provision services, request cloud resources, and safely make config changes across traffic, compute, and secrets…
    DoorDash (01/02/26)
  • AI Security Engineer, Manager

    Deloitte (Costa Mesa, CA)
    …artifacts + Manage audit trails and automated compliance checks + Implement AI-specific incident response and develop regulatory disclosure playbooks + Manage AI ... Work you'll do As a Deloitte Manager, AI Security Engineer, you will be crucial in safeguarding our advanced...week. + 5+ years of experience in cybersecurity (application, cloud and data security) with strong proficiency in security…
    Deloitte (10/22/25)
  • Senior Full Stack Engineer

    Planetart (Calabasas, CA)
    …issues in production and development environments, with a focus on urgent incident response. + AWS Infrastructure Management: Deploy, configure, and manage ... well as in Europe. Job Overview PlanetArt is seeking a Senior Full-Stack Engineer to support the company's technology initiatives and lead its engineering efforts.…
    Planetart (12/11/25)