  • Senior Site Reliability Engineer

    NVIDIA (Santa Clara, CA)
    Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale production systems with high efficiency and ... internal and external facing GPU cloud services run maximum reliability and uptime as promised to the users and...be doing: + Design, implement and support operational and reliability aspects of large scale Kubernetes clusters with focus…
    NVIDIA (08/01/25)
  • Lead Speed and Reliability Engineer

    NVIDIA (Santa Clara, CA)
    …SOL quality and efficiency. The DFP team is looking for a Speed and Reliability Lead. You will be leading and crafting testability features related to Speed, Timing ... and Reliability from ground up as you help turbocharge NVIDIA's...bringup and tuning a plus, related to timing, speed, reliability and power. + Familiarity with STA timing closure,…
    NVIDIA (05/29/25)
  • Principal Staff Site Reliability

    NVIDIA (Santa Clara, CA)
    …NTP/PTP, DHCP, and LDAP. This includes building for performance and reliability at global scale, covering automation, monitoring, high availability, capacity ... architectures and identify opportunities for containerization to improve scalability, reliability, and efficiency. + Strong analytical skills with the ability…
    NVIDIA (08/21/25)
  • Senior Site Reliability Engineer

    Rubrik (Sacramento, CA)
    …and services with the objective of achieving and exceeding availability and reliability goals * Manage and streamline monitoring systems to enhance observability and ... visibility * Perform Production Readiness Assessments of new services to identify reliability needs and surface potential gaps * Develop and maintain documentation…
    Rubrik (08/20/25)
  • Reliability Engineer - COBRA 1

    Huntington Ingalls Industries (San Diego, CA)
    …Action System (FRACAS) to streamline corrective action processes using Reliability, Availability, Maintainability - Cost (RAM-C). Tracks parts consumption, maintains ... documentation, and deploys technical support to troubleshoot maintenance issues. Provides reports on repair action costs, particularly for high-cost scenarios, justifying the economic feasibility of corrective actions. Contributes to technology refreshment…
    Huntington Ingalls Industries (08/15/25)
  • Senior Site Reliability Engineer

    LiveRamp (San Francisco, CA)
    …issues with Engineering teams + Set up and maintain Infrastructure & Product Reliability monitoring and alerting + Maintain and enhance CI/CD Tooling and ... Terraform scripts in support of the mission in close collaboration with DevOps team + Maintain and enhance Engineering Operational Documentation for supported products. + Provide expertise to build and maintain products operational documentation and…
    LiveRamp (08/07/25)
  • Principal Site Reliability Engineer

    Palo Alto Networks (Santa Clara, CA)
    …automation, architecture, performance, observability, troubleshooting, security, and reliability. Our Infrastructure Platform stack includes Terraform, Kubernetes, ... GitLab CI/CD, GitOps, Prometheus, Grafana, Loki, Docker, GCP, Backstage, MySQL, PagerDuty, FireHydrant, Python, Bash, Java, NodeJS and Go. Your Impact + Design, build, and operate reliable, secure Cloud infrastructure across multi-cloud environments +…
    Palo Alto Networks (07/31/25)
  • Engineer II, RMS (Reliability

    Safran (Carson, CA)
    Engineer II, RMS (Reliability, Maintainability, Safety) Company: Safran Cabin Job field: Architecture and systems engineering Location: Carson, California, ... and Failure Modes and Effects Summary per MIL-STD-1629A and D6-56674. - Prepare Engineering Reliability Parts Prediction Count Reports (ERPPC) RMS Engineer II is…
    Safran (06/19/25)
  • Staff Site Reliability Engineer

    MongoDB (San Francisco, CA)
    …to build next-generation, AI-powered applications. We are looking for an experienced Staff Engineer for our SRE, InfraSec team, to guide the security of our ... cloud-based infrastructure. As a Staff SRE, you will be very hands-on technically while also mentoring a small team of SREs. The InfraSec team collaborates closely with other engineering teams to ensure that our infrastructure adheres to the highest security…
    MongoDB (08/08/25)
  • Site Reliability Engineer

    Insight Global (Santa Clara, CA)
    …Planning and Processes organization where you will be working as a Senior SRE Engineer. The position will be part of a fast-paced crew that develops and maintains ... sophisticated internal cloud provisioning products. The team works with various other business units such as Graphics Processors, Mobile Processors, Deep Learning, Artificial Intelligence and Driverless Cars to cater to their infrastructure & systems needs. As…
    Insight Global (08/01/25)