• Senior Storage Production Engineer - DGX…

    NVIDIA (Santa Clara, CA)
    …spectrum of challenges. Practices such as proactive storage performance monitoring, automated fault detection and remediation, scalable data replication strategies, ... high availability, and data integrity. + Develop and maintain storage monitoring, logging, and alerting systems to ensure proactive detection and resolution…
    NVIDIA (08/13/25)
  • Engineer II

    IBM (San Jose, CA)
    …and decision-making in a collaborative environment * Familiarity with cloud monitoring tools to implement robust observability practices that prioritize ... service networking. We deliver the Infrastructure Cloud through an enterprise-grade unified SaaS platform, HCP, as well as to...have at least 3+ years of experience as an engineer * You have professional experience developing with modern…
    IBM (09/30/25)
  • Senior Storage and Networking Product…

    NVIDIA (Santa Clara, CA)
    …within AI, ML, and HPC. Joining our team as a Storage & Networking Product Engineer involves being part of a group that fosters the development of highly available, ... end-to-end performance across the full stack. + Develop automated systems for monitoring, recording, and notifying in storage and networking. + Build and maintain…
    NVIDIA (09/25/25)
  • Senior Software Development Engineer

    Zoom (San Jose, CA)
    …APIs with Spring MVC and related technologies. + Guiding performance tuning, monitoring, and observability of Spring-based services. + Collaborating with product ... expect You will work as a hands-on backend software engineer with system-level thinking, using Java to extend functionality,...out to build the best collaboration platform for the enterprise, and today help people communicate better with products…
    Zoom (09/11/25)
  • Sr. Engineer

    IBM (San Jose, CA)
    …of DevOps principles in a cloud environment. * Familiarity with cloud monitoring tools to implement robust observability practices that prioritize metrics, ... service networking. We deliver the Infrastructure Cloud through an enterprise-grade unified SaaS platform, HCP, as well as...have at least 6+ years of experience as an engineer. * Expertise designing and building authorization systems (RBAC…
    IBM (09/03/25)
  • Principal Software Engineer

    NVIDIA (Santa Clara, CA)
    …make a lasting impact on the world. We are looking for a Principal Software Engineer to join our Software Infrastructure team in Santa Clara, CA. This role blends ... large-scale training and inference pipelines. + Build developer-focused tooling for monitoring, profiling, and debugging database performance in real time. +…
    NVIDIA (08/15/25)
  • Senior Data Processing Platform Engineer

    NVIDIA (Santa Clara, CA)
    …data systems like Ray, Spark Rapids + Familiarity with metrics collection, health monitoring, and observability tools + Building, operating and maintaining full ... ML platform for data scientists to use. As a data processing platform engineer, you will design, implement and operate Kubernetes-based GPU-accelerated data…
    NVIDIA (08/09/25)
  • Principal Staff Site Reliability Engineer

    NVIDIA (Santa Clara, CA)
    …building for performance and reliability at global scale, covering automation, monitoring, high availability, capacity planning, and lifecycle management. + Define ... optimizations (SR-IOV/DPU) + Experience with technologies like eBPF and XDP for observability and DDoS mitigation + Collect and review system data for capacity and…
    NVIDIA (08/21/25)
  • Principal Hardware Engineer - Hardware…

    Cadence Design Systems, Inc. (San Jose, CA)
    …platform and processes to improve operations. Key Responsibilities: + Implement monitoring framework to improve infrastructure reliability, observability, and ... and EDA. + Demonstrated ability to operate and manage large-scale, enterprise-grade environments + Excellent communication skills, both written and verbal. +…
    Cadence Design Systems, Inc. (07/10/25)
  • Senior Site Reliability Engineer - FedRAMP

    Rubrik (Sacramento, CA)
    …and exceeding availability and reliability goals * Manage and streamline monitoring systems to enhance observability and enable proactive identification ... company, operates at the intersection of data protection, cyber resilience and enterprise AI acceleration. The Rubrik Security Cloud platform is designed to deliver…
    Rubrik (08/20/25)