- NVIDIA (Santa Clara, CA)
- …on the world. We are seeking a highly skilled and experienced Site Reliability Engineer to design, deploy, and manage high-speed storage offerings in our large-scale ... and cost-effectiveness. + Continuously improve storage infrastructure provisioning, management, observability, and day-to-day operation through automation. + Ensure… more
- Broadcom (Palo Alto, CA)
- **Job Description:** **Core Kubernetes Software Engineer - VMware Cloud Foundation** VMware by Broadcom, a leader in ... infrastructure, data center networking, and security, is seeking a Core Kubernetes Software Engineer to join our Common Platform Group in the VMware Cloud Foundation… more
- Palo Alto Networks (Santa Clara, CA)
- …behind innovative cybersecurity products, particularly at the intersection of digital experience monitoring and SASE. They aim to own the end-to-end development of ... features + Technical Ownership: Own the technical product area of Real User Monitoring (RUM) and provide analytics to monitor digital experience in Prisma Access +… more
- NVIDIA (Santa Clara, CA)
- …data systems like Ray, Spark Rapids + Familiarity with metrics collection, health monitoring, and observability tools + Building, operating and maintaining full ... ML platform for data scientists to use. As a data processing platform engineer, you will design, implement and operate Kubernetes based GPU accelerated data… more
- Coinbase (Sacramento, CA)
- …root cause analysis, and blameless retrospectives * Define metrics and bolster monitoring / observability across corporate IAM systems * Participate in regular ... supported. Coinbase is hiring! We are looking for an experienced Site Reliability Engineer (SRE) to join the IT Operations Corporate Engineering team to build and… more
- NVIDIA (Santa Clara, CA)
- …the buildout and integration of NCPs and CSPs into this marketplace. As a software engineer on the DGX Cloud Lepton Marketplace team, you'll play a key role in ... experience with Kubernetes, including cluster operations, operator development, node health monitoring, and working with GPU resource scheduling. What you will be… more
- LiveRamp (San Francisco, CA)
- …with Engineering teams** + **Set up and maintain Infrastructure & Product Reliability monitoring and alerting** + **Maintain and enhance CI/CD Tooling and Terraform ... clouds (GCP or AWS)** + **Experience with deployment and monitoring of highly scalable products.** + **Hands-on experience...** + **Experience with SRE best practices, working knowledge of observability principles is a big plus** + **Ability to… more
- Rubrik (Palo Alto, CA)
- …and exceeding availability and reliability goals * Manage and streamline monitoring systems to enhance observability and enable proactive identification ... of issues. * Coordinate and manage incidents, upgrades and changes for InfoSec's applications and services * Drive post-incident analysis with partner teams and/or vendors to identify root cause and ensure preventative measures are implemented promptly *… more
- LinkedIn (Mountain View, CA)
- …senior technical leader driving the long-term reliability and observability strategy across LinkedIn's infrastructure + Re-architect LinkedIn's backend systems to ... for operational excellence and incident response + Define and build frameworks to improve monitoring, alerting, and observability across hundreds of services and… more
- Lumen (Sacramento, CA)
- …connect the world and shape the future. **The Role** We are looking for a Senior Site Reliability Engineer (SRE) / Platform Engineer / DevOps Engineer ... GitHub Actions for GitOps workflows and CI/CD pipelines. + Monitoring & Observability - Proficiency in Prometheus, Grafana, and incident management workflows.… more