  • Network Site Reliability

    NVIDIA (Santa Clara, CA)
    …experience. + Minimum of 8 years of industry experience in network site reliability engineering, network automation, network operations, or related areas. ... for our network infrastructure. We are looking for an engineer who is passionate about the network and making...of the network infrastructure, ensuring its high availability and reliability. + Partnering with architecture and deployment teams to… more
    NVIDIA (07/26/25)
  • Senior Site Reliability

    NVIDIA (Santa Clara, CA)
    Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale production systems with high efficiency and ... internal and external facing GPU cloud services run maximum reliability and uptime as promised to the users and...be doing: + Design, implement and support operational and reliability aspects of large scale Observability & Telemetry collection… more
    NVIDIA (08/02/25)
  • Senior Site Reliability

    NVIDIA (Santa Clara, CA)
    Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale production systems with high efficiency and ... internal and external facing GPU cloud services run maximum reliability and uptime as promised to the users and...be doing: + Design, implement and support operational and reliability aspects of large scale Kubernetes clusters with focus… more
    NVIDIA (08/01/25)
  • Principal Site Reliability

    NVIDIA (Santa Clara, CA)
    …etc.) and Infrastructure as Code (Terraform, CDK, Pulumi). + Proficiency in Site Reliability Engineering concepts like error budgets, SLOs, distributed tracing, ... change safety, and release velocity. + Define and evolve platform-wide reliability metrics, capacity forecasting strategies, and uncertainty testing approaches for… more
    NVIDIA (07/31/25)
  • Site Reliability Engineer

    Celonis (Redwood City, CA)
    …and resilience of our platform. The team applies advanced software engineering and Site Reliability Engineering (SRE) principles to drive system reliability, ... for that, we need you to join us. **The Team** As a member of our Reliability Engineering team, you will play a critical role in ensuring the health, performance,… more
    Celonis (07/18/25)
  • Staff Site Reliability

    ServiceNow, Inc. (San Diego, CA)
    It all started in sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how we work. Fast forward to today - ... engineers who are tasked with maintaining and developing the reliability, scalability and performance of the ServiceNow cloud infrastructure....as a company and the SRE role. As an Engineer on the SRE team you will: + Provide… more
    ServiceNow, Inc. (07/15/25)
  • Senior Site Reliability

    ServiceNow, Inc. (San Diego, CA)
    It all started in sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how we work. Fast forward to today - ... engineers who are tasked with maintaining and developing the reliability, scalability and performance of the ServiceNow cloud infrastructure....as a company and the SRE role. **As an Engineer on the SRE team you will:** + Provide… more
    ServiceNow, Inc. (07/09/25)
  • Senior Staff Site Reliability

    Palo Alto Networks (Santa Clara, CA)
    …team to influence the operability of the product and ensure the reliability and availability of our services **Your Experience** + DevOps/SRE Expertise: 5+ ... years of experience as a DevOps/SRE engineer with a passion for technology and a strong...passion for technology and a strong motivation for high reliability at the service level + Observability Tools: High… more
    Palo Alto Networks (07/15/25)
  • Senior Site Reliability

    Rubrik (Palo Alto, CA)
    …and services with the objective of achieving and exceeding availability and reliability goals * Manage and streamline monitoring systems to enhance observability and ... visibility * Perform Production Readiness Assessments of new services to identify reliability needs and surface potential gaps * Develop and maintain documentation… more
    Rubrik (08/07/25)
  • Senior Site Reliability

    LiveRamp (San Francisco, CA)
    …issues with Engineering teams** + **Set up and maintain Infrastructure & Product Reliability monitoring and alerting** + **Maintain and enhance CI/CD Tooling and ... Terraform scripts in support of the mission in close collaboration with DevOps team** + **Maintain and enhance Engineering Operational Documentation for supported products.** + **Provide expertise to build and maintain products operational documentation and… more
    LiveRamp (08/07/25)