- NVIDIA (Santa Clara, CA)
- …+ Design, implement and support operational and reliability aspects of large scale Observability & Telemetry collection platform with a focus on performance at ... Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to...Production + 8+ years experience delivering foundational infrastructure and observability platforms. + Experience in one or more of…
- NVIDIA (Santa Clara, CA)
- … observability (DCGM, NVML, etc.) and integration into large‑scale telemetry systems. + Deep knowledge of AI/ML infrastructure, high‑performance computing (HPC), ... right in the center of this revolution. Resiliency and Observability are key to delivering customer value and exhilarating...GPU hardware, network, and software stack, along with the telemetry signals that reveal them, and how they correlate…
- Amazon (San Francisco, CA)
- …to build innovative solutions for their most complex challenges. Today, AWS's observability services are critical for customers running modern applications at scale. ... The insights provided by AWS' full stack observability solutions help detect, investigate, and remediate problems faster, and coupled with AI and ML, proactively…
- PagerDuty (San Francisco, CA)
- …experience. + Demonstrated fluency with data analysis or analytics products (telemetry, observability, post-incident review). + Proven success shipping features ... in a flexible, award-winning workplace. PagerDuty is seeking a Senior Product Manager, Incident Analysis to join our talented,...products trusted by some of the world's top DevOps, SRE, and digital operations teams. The ideal candidate thrives…
- NVIDIA (Santa Clara, CA)
- …infrastructure for bare metal provisioning, testing and bringup. + Knowledge of SRE principles (observability, SLOs, logging, etc.). + Strong experience in ... We are looking for an outstanding architect for a Senior System Engineer role for system bringup and datacenter...such as TPM, TXT, and SecureBoot. + Exposure to telemetry catalog and observability stack. + Exposure…
- Confluent (Sacramento, CA)
- …One Team. One Data Streaming Platform. **About the Role:** We are seeking a Senior Software Engineer II to architect, build, and operate services that are core to ... services (authentication, authorization, identity, secrets management, policy enforcement, security telemetry pipelines, etc.), while also ensuring these systems are…
- NVIDIA (Santa Clara, CA)
- …external partners to facilitate product adoption + Track metrics and make telemetry based informed decisions with stakeholder alignment + Expand the visibility of ... communicate and collaborate with cross-functional teams such as Product, Research, SRE, security, sales, marketing, PLC, security teams. + Strong understanding of…
- Nutanix (San Jose, CA)
- …systems, high availability, and multi-site replication design. + Experience with **observability, telemetry, and AIOps** for large-scale platforms. Additional: + ... Proven ability to work across cross-functional engineering, product, and SRE teams. + Excellent system design documentation and architecture diagramming skills. +…
- Oracle (Sacramento, CA)
- …OCI's edge. - Contribute to scalable data and control planes (policy, signaling, telemetry, orchestration) with a focus on resiliency and fault isolation. - Help ... policy) with OCI networking, DNS, and edge services under guidance from senior engineers. - Participate in operational readiness: support SLO/SLA tracking, on-call…
- Humana (Sacramento, CA)
- …healthcare systems and compliance frameworks (HIPAA, HITRUST) + Proficiency with observability and telemetry platforms (e.g., Splunk, DynaTrace, SolarWinds) and ... functions around Cloud compliance, metrics/reporting and cost optimization + Provide senior level expertise on decisions and priorities regarding the enterprises…