  • Senior ML Storage Engineer - GPU…

    NVIDIA (Santa Clara, CA)
    …on the world. We are seeking a highly skilled and experienced Site Reliability Engineer to design, deploy, and manage high speed storage offering in our large-scale ... and cost-effectiveness. + Continuously improve storage infrastructure provisioning, management, observability and day to day operation through automation. + Ensure… more
    NVIDIA (07/31/25)
  • Senior Software Engineer

    Broadcom (Palo Alto, CA)
    …Account, please Sign-In before you apply.** **Job Description:** **Core Kubernetes Software Engineer - VMware Cloud Foundation** VMware by Broadcom, a leader in ... infrastructure, data center networking, and security, is seeking a Core Kubernetes Software Engineer to join our Common Platform Group in the VMware Cloud Foundation… more
    Broadcom (06/30/25)
  • Senior Technical Marketing Engineer

    Palo Alto Networks (Santa Clara, CA)
    …behind innovative cybersecurity products, particularly at the intersection of digital experience monitoring and SASE. They aim to own the end-to-end development of ... features + Technical Ownership: Own the technical product area of Real User Monitoring (RUM) and provide analytics to monitor digital experience in Prisma Access +… more
    Palo Alto Networks (07/06/25)
  • Senior Data Processing Platform…

    NVIDIA (Santa Clara, CA)
    …data systems like Ray, Spark Rapids + Familiarity with metrics collection, health monitoring, and observability tools + Building, operating and maintaining full ... ML platform for data scientists to use. As a data processing platform engineer, you will design, implement and operate Kubernetes based GPU accelerated data… more
    NVIDIA (08/09/25)
  • Senior Site Reliability Engineer

    Coinbase (Sacramento, CA)
    …root cause analysis, and blameless retrospectives * Define metrics and bolster monitoring / observability across corporate IAM systems * Participate in regular ... supported. Coinbase is hiring! We are looking for an experienced Site Reliability Engineer (SRE) to join the IT Operations Corporate Engineering team to build and… more
    Coinbase (08/09/25)
  • Senior Software Engineer, DGX Cloud…

    NVIDIA (Santa Clara, CA)
    …the buildout and integration of NCPs and CSPs into this marketplace. As a software engineer on the DGX Cloud Lepton Marketplace team, you'll play a key role in ... experience with Kubernetes including cluster operations, operator development, node health monitoring and working with GPU resource scheduling. What you will be… more
    NVIDIA (07/24/25)
  • Senior Site Reliability Engineer

    LiveRamp (San Francisco, CA)
    …with Engineering teams** + **Set up and maintain Infrastructure & Product Reliability monitoring and alerting** + **Maintain and enhance CI/CD Tooling and Terraform ... clouds (GCP or AWS)** + **Experience with deployment and monitoring of highly scalable products.** + **Hands on experience... + **Experience with SRE best practices, working knowledge of observability principles is a big plus** + **Ability to… more
    LiveRamp (08/07/25)
  • Senior Site Reliability Engineer

    Rubrik (Palo Alto, CA)
    …and exceeding availability and reliability goals * Manage and streamline monitoring systems to enhance observability and enable proactive identification ... of issues. * Coordinate and manage incidents, upgrades and changes for InfoSec's applications and services * Drive post-incident analysis with partner teams and/or vendors to identify root cause and ensure preventative measures are implemented promptly *… more
    Rubrik (08/07/25)
  • Distinguished Software Engineer

    LinkedIn (Mountain View, CA)
    senior technical leader driving the long-term reliability and observability strategy across LinkedIn's infrastructure + Re-architect LinkedIn's backend systems to ... for operational excellence and incident response + Define and build frameworks to improve monitoring, alerting, and observability across hundreds of services and… more
    LinkedIn (06/04/25)
  • Principal Site Reliability Engineer

    Lumen (Sacramento, CA)
    …connect the world and shape the future. **The Role** We are looking for a Senior Site Reliability Engineer (SRE)/ Platform Engineer / DevOps Engineer ... GitHub Actions for GitOps workflows and CI/CD pipelines. + Monitoring & Observability - Proficiency in Prometheus, Grafana, and incident management workflows.… more
    Lumen (08/08/25)