- NVIDIA (Santa Clara, CA)
- …a secure operational environment. + Lead initiatives to improve network observability by integrating advanced monitoring and alerting systems, collaborating ... GeForce Now is looking for a Manager, Network Site Reliability Engineer (SRE) to enhance our network infrastructure and operations. We are looking for a leader who…
- NVIDIA (Santa Clara, CA)
- …Kubernetes, Docker, Kubeflow), along with agentic architectures (e.g., MCP, LangGraph) and observability tools for monitoring autonomous systems. + Track record of ... NVIDIA is seeking a highly technical Senior Director to lead Developer Relations Managers to...assurance, and post-release support. + Experience (as a software engineer or technical product manager) in one or more…
- NVIDIA (Santa Clara, CA)
- …Kubernetes, Docker, Kubeflow), along with agentic architectures (e.g., MCP, LangGraph) and observability tools for monitoring complex, autonomous systems. + Track ... NVIDIA is seeking a highly technical Senior Developer Relations Manager to drive adoption of...assurance, and post-release support. + Experience (as a software engineer or technical product manager) in one or more…
- PennyMac (Westlake Village, CA)
- …maintaining service level agreements (SLAs) that meet or exceed business requirements. + Monitoring & Observability - Lead the development and implementation of ... Site Reliability Operations Engineers across all levels (1, 2, 3, & Senior). Foster a culture of excellence, collaboration, and continuous...comprehensive monitoring and observability practices using New Relic…
- Rubrik (Palo Alto, CA)
- …with **cloud cost management tools and strategies**. + Familiarity with monitoring, logging, and observability tools (e.g., Sentinel, Prometheus, Grafana, ELK ... a culture of excellence. + **Incident Response & Troubleshooting:** Provide senior-level support for cloud-related incidents, performing root cause analysis and…