  • Senior Site Reliability Engineer

    NVIDIA (Santa Clara, CA)
    …next wave of artificial intelligence. Join our team at NVIDIA as a Senior Site Reliability Engineer focused on HPC storage and play a crucial role in designing, ... deploying distributed storage solutions, building automation tools, and ensuring the efficient operations of our growing IT ecosystem. You will collaborate closely with…
    NVIDIA (08/21/25)
  • Sr. Service Reliability Engineer

    Proofpoint (Pittsburgh, PA)
    …cybersecurity. Protection Starts with People. Proofpoint. **The Role** As a Service Reliability Engineer at Proofpoint you will be responsible for provisioning, ... AWS. You will contribute to the architecture to improve scalability, service reliability, capacity, and performance. You will write automation code for provisioning…
    Proofpoint (10/02/25)
  • Principal Engineer Software - Site…

    Northrop Grumman (San Diego, CA)
    …making history. Northrop Grumman Aeronautics Systems has an opening for a **Principal Engineer Software - Site Reliability** to join our team of qualified, ... The selected candidate will be working with modern technology to improve system reliability, using automation and monitoring tools to manage IT tasks, minimizing…
    Northrop Grumman (09/19/25)
  • Senior Site Reliability Engineer

    Toyota (Plano, TX)
    …**Who we're looking for** Toyota Financial Services is hiring a Senior Site Reliability Engineer to support and scale our enterprise integration platforms. ... platform-focused environments and is passionate about reducing operational toil, improving reliability, and enabling velocity for development teams. **What you'll be…
    Toyota (07/30/25)
  • Entry level 2026 - Site Reliability

    IBM (Tucson, AZ)
    …responsibilities** Your Role and Responsibilities As an Automation focused intern for Site Reliability Engineer, you will perform the following tasks: * Develop, ... Automation code for various procedures/runbooks defined for several Data Center Builds, Operations and Support related tasks. * Develop, Test, Deploy and Maintain…
    IBM (10/08/25)
  • Software Development Engineer III, Region…

    Amazon (Arlington, VA)
    Description Join AWS Region Reliability and help revolutionize how AWS operates at scale! We're building innovative solutions that redefine and optimize AWS ... operations across the company. Our team develops tools that...software development - Establish and improve operational practices including monitoring, telemetry, and incident management - Identify and resolve…
    Amazon (08/14/25)
  • Principal Site Reliability Engineer

    Palo Alto Networks (Santa Clara, CA)
    …the market leader in this space. We are seeking development heavy Site Reliability Engineers to design, build, maintain, and scale production services and server ... work follow using Python or Go code + Build BGP and networking monitoring / remediation tools + Engage with customers on escalations to provide remediation +…
    Palo Alto Networks (09/25/25)
  • Linux Site Reliability Engineer

    Nutanix (Albany, NY)
    …Are you a detail-oriented problem solver with a passion for optimizing cloud operations and a knack for writing efficient scripts? If so, you'll thrive in ... the Private Cloud team within the broader Global Cloud Operations (GCO) team, a dynamic assembly of over 70...events. **Your Role** + Ensure the 24/7 availability and reliability of Nutanix's cloud services and infrastructure. + Respond…
    Nutanix (09/24/25)
  • Principal Site Reliability Engineer

    Palo Alto Networks (Santa Clara, CA)
    …automation, architecture, performance, observability, troubleshooting, security, and reliability. Our Infrastructure Platform stack includes Terraform, Kubernetes, ... and automation frameworks**, championing **Infrastructure as Code (IaC)** and **Monitoring as Code (MaC)** principles. + **Automate robust deployments** and…
    Palo Alto Networks (10/07/25)
  • Principal Site Reliability Engineer

    Palo Alto Networks (Santa Clara, CA)
    …automation, architecture, performance, observability, troubleshooting, security, and reliability. Our Infrastructure Platform stack includes Terraform, Kubernetes, ... tools and automation frameworks, championing Infrastructure as Code (IaC) and Monitoring as Code (MaC) principles + Automate robust deployments and orchestrate…
    Palo Alto Networks (09/06/25)