- NVIDIA (Santa Clara, CA)
- …NTP/PTP, DHCP, and LDAP. This includes building for performance and reliability at global scale, covering automation, monitoring, high availability, capacity ... like eBPF and XDP for Observability & DDoS mitigation + Collect and review system data for capacity and planning purposes, analyze capacity data and develop plans…
- PennyMac (Westlake Village, CA)
- …of homeownership through the complete mortgage journey. A Typical Day As the Site Reliability Operations, Engineer II (SRO), you will help the team provide 24/7 ... (NYSE: PFSI) is a specialty financial services firm with a comprehensive mortgage platform and integrated business focused on the production and servicing of US…
- S&P Global (Princeton, NJ)
- **About the Role:** **Grade Level (for internal use):** 09 **Site Reliability Engineer - Datadog Specialist** **The Team:** The IT Operations team at S&P Dow ... is tasked with owning and maintaining the Production IT systems that underpin S&P DJI's index platforms and applications,... + 4 years of experience in SRE, DevOps, or platform engineering roles. + Bachelor's degree in Computer Science…
- Zscaler (San Jose, CA)
- …agility with a cloud-first strategy. We're seeking a highly skilled and experienced SRE Platform Engineer to join our SRE Cloud Platform Engineering Team. ... resilient, and secure. Our cloud native Zero Trust Exchange platform protects thousands of customers from cyberattacks and data... + 5+ years of experience in Cloud-SRE, DevOps, or Systems Engineering with a focus on software development +…
- New York Times (New York, NY)
- …and inference platforms. + Reliability & Observability: Ensure end-to-end system reliability, monitoring, and cost transparency across data and ML ... at least one compiled language like Java or Go + Experience designing systems with scalability, reliability, and cost-efficiency as first-class concerns + Cloud…
- BOOZ, ALLEN & HAMILTON, INC. (Reston, VA)
- Case Management Platform Backend Engineer Key Role: Architect and implement the backend foundation of our next-generation case management platform. Perform ... design, data architecture, integration engineering, and workflow enablement, ensuring the platform serves as a scalable, extensible system-of-record for security…
- Red Hat (Raleigh, NC)
- …will ensure the reliability, performance, and scalability of our core platform systems while actively contributing to the acceleration of AI integration ... Excellence & Reliability: Drive initiatives to improve operational efficiency, system reliability, and performance by designing and implementing AI-powered…
- DoorDash (San Francisco, CA)
- …engineering bar - Establish metrics & processes that improve developer velocity, system reliability, and long-term maintainability. We're excited about you ... long-term platform thinking, making sound trade-offs. + Care deeply about reliability, performance, observability, and security in production systems. + Lead…
- MongoDB (Chicago, IL)
- We are looking for an experienced Senior or Staff Engineer for our SRE, InfraSec team, to guide the security of our cloud-based infrastructure. As a Staff SRE, you ... They build essential security infrastructure and implement controls that reinforce the platform's security posture. This is an SRE team, which means you can…
- MongoDB (Palo Alto, CA)
- …for a Senior Engineer to help build the next-generation inference platform that supports embedding models used for semantic search, retrieval, and AI-native ... Atlas and designed for developer-first experiences. As a Senior Engineer, you'll focus on building core systems ... ensure tight integration with Atlas, and contribute to a platform designed for reliability, performance, and ease…