- Coinbase (Sacramento, CA)
- …Kubernetes at Coinbase, working closely with the Routing, Security, Reliability, and Observability teams (among many others). *What you'll be doing (i.e., job ... duties):* * Build tooling and automation to make management of our Kubernetes clusters easy and reliable. * Build tooling and automation to improve the developer and operational experience of…
- LinkedIn (Mountain View, CA)
- …to enhance traffic distribution, load balancing, and fault tolerance. + Drive automation, observability, and fault tolerance initiatives, reducing downtime and ... improving MTTR (Mean Time to Recovery). + Analyze network traffic telemetry to optimize load balancing, manage traffic spikes, and plan for future capacity needs. + Establish and monitor key performance metrics, SLAs, and SLOs for DNS and traffic routing…
- Amazon (Santa Clara, CA)
- …we strive to predict and resolve customer issues through self-service and automation solutions. The CET team leads AI and Large Language Models (LLM)-driven ... knowledge sources and actuation capabilities. - Innovate and implement observability and logging mechanisms for proactive issue identification, troubleshooting, and…
- Rubrik (Palo Alto, CA)
- …or Pulumi. + Strong scripting skills (Python, Bash, or similar) for automation. + Experience with observability and telemetry tools (OpenTelemetry, FluentBit, ... traffic routing, and policy enforcement across clouds. + Implement network observability (end-to-end traffic visibility, flow tracing, correlation ID propagation) to…
- EPAM Systems (San Jose, CA)
- …next-generation managed services that integrate DevOps, SRE principles, and intelligent automation to transform how we deliver value. This pivotal leadership role ... future-ready service landscapes that leverage GenAI, AIOps, and advanced observability across cloud and on-premise environments while scaling our capabilities…
- PennyMac (Westlake Village, CA)
- …agreements (SLAs) that meet or exceed business requirements. + Monitoring & Observability - Lead the development and implementation of comprehensive monitoring and ... observability practices using New Relic and other tools to ... initiatives to optimize and standardize operational processes, focusing on automation, self-service capabilities, and elimination of manual work through…
- Vail Resorts (CA)
- …Point firewalls, Cisco Nexus, Extreme Fabric, Azure, disaster recovery, and network automation. The Principal Analyst will lead the transition of contact centers to ... (e.g., call recording consent models, toll-free usage, CLI rules). + Observability: global telemetry (NetFlow/IPFIX), synthetic testing, MOS/RTT analytics, SIP ladder…
- NVIDIA (Santa Clara, CA)
- …tackle challenges through active troubleshooting and a commitment to network automation, observability, documentation, and operational excellence. What you'll be ... GeForce Now is looking for a Manager, Network Site Reliability Engineer (SRE) to enhance our network infrastructure and operations. We are looking for a leader who…
- McAfee, Inc. (San Jose, CA)
- …+ Lead the design and implementation of AWS infrastructure and automation solutions. + Collaborate with cross-functional teams to define infrastructure requirements ... for AIOps by integrating AI/ML and NoOps principles to drive intelligent automation across cloud operations. + Leverage machine learning models to predict incidents,…
- NVIDIA (Santa Clara, CA)
- …testing, integration, quality assurance, and post-release support. + Experience (as a software engineer or technical product manager) in one or more of the following ... programming. + Significant technical depth in industrial software, robotics and/or automation technologies (e.g., CAE, CAD, PLM, ERP, MES, SCADA, EDA), with…