  • Senior Manager, Network Site Reliability…

    NVIDIA (Santa Clara, CA)
    …a secure operational environment. + Lead initiatives to improve network observability by integrating advanced monitoring and alerting systems, collaborating ... GeForce Now is looking for a Manager, Network Site Reliability Engineer (SRE) to enhance our network infrastructure and operations. We are looking for a leader who…
    NVIDIA (08/08/25)
  • Senior Manager, Developer Relations…

    NVIDIA (Santa Clara, CA)
    …Kubernetes, Docker, Kubeflow), along with agentic architectures (e.g., MCP, LangGraph) and observability tools for monitoring autonomous systems. + Track record of ... NVIDIA is seeking a highly technical Senior Director to lead Developer Relations Managers to ... assurance, and post-release support. + Experience (as a software engineer or technical product manager) in one or more…
    NVIDIA (09/17/25)
  • Senior Developer Relations Manager…

    NVIDIA (Santa Clara, CA)
    …Kubernetes, Docker, Kubeflow), along with agentic architectures (e.g., MCP, LangGraph) and observability tools for monitoring complex, autonomous systems. + Track ... NVIDIA is seeking a highly technical Senior Developer Relations Manager to drive adoption of ... assurance, and post-release support. + Experience (as a software engineer or technical product manager) in one or more…
    NVIDIA (08/20/25)
  • AVP, Technology Operations

    PennyMac (Westlake Village, CA)
    …maintaining service level agreements (SLAs) that meet or exceed business requirements. + Monitoring & Observability - Lead the development and implementation of ... Site Reliability Operations Engineers across all levels (1, 2, 3, & Senior). Foster a culture of excellence, collaboration, and continuous ... comprehensive monitoring and observability practices using New Relic…
    PennyMac (08/07/25)
  • Cloud & Platform Engineering Architect

    Rubrik (Palo Alto, CA)
    …with **cloud cost management tools and strategies**. + Familiarity with monitoring, logging, and observability tools (e.g., Sentinel, Prometheus, Grafana, ELK ... a culture of excellence. + **Incident Response & Troubleshooting:** Provide senior-level support for cloud-related incidents, performing root cause analysis and…
    Rubrik (08/14/25)