- ServiceNow, Inc. (Santa Clara, CA)
- …the future. **As a Senior Staff Machine Learning Engineer - Site Reliability Engineer you will:** + Contribute to the design, development and ... implementation of infrastructure, platform, deployment and observability features that power AI workloads. + Collaborate with researchers, AI engineers, and infrastructure teams to ensure our GPU clusters perform efficiently, scale well, and remain reliable. …
- Palo Alto Networks (Santa Clara, CA)
- …and actionable insights into our systems' performance and health. **Your Impact** As a Senior Staff SRE with the Cortex Observability team, you will: + Cloud ... influence the operability of the product and ensure the reliability and availability of our services **Your Experience** + ... passion for technology and a strong motivation for high reliability at the service level + Observability Tools: High…
- Google (Sunnyvale, CA)
- …**Preferred qualifications:** + Master's degree in Computer Science or Engineering. Site Reliability Engineering (SRE) combines software and systems engineering ... learn and grow. **To learn more:** check out our books on Site Reliability Engineering (https://landing.google.com/sre/book.html) or read a career profile…
- NVIDIA (Santa Clara, CA)
- …is inspired to do their best work. We are seeking a highly skilled Principal Staff SRE to join our dynamic team. Our company is at the forefront of technological ... NTP/PTP, DHCP, and LDAP. This includes building for performance and reliability at global scale, covering automation, monitoring, high availability, capacity…
- Google (Sunnyvale, CA)
- …technologies, storage, or hardware architecture. + 5 years of experience with site reliability engineering focused on building and maintaining scalable, reliable ... systems. **Preferred qualifications:** + Master's degree or PhD in Engineering, Computer Science, or a related technical field. + 8 years of experience with data structures/algorithms. + 5 years of experience in a technical leadership role leading project…
- Two95 International Inc. (Sacramento, CA)
- Position - Site Reliability Engineering Manager Location - Sacramento, CA Type - Fulltime Salary - $Market ESSENTIAL JOB FUNCTIONS AND BASIC DUTIES + The SREM ... efforts, including audit support. Architecture/Engineering Support + Consult with other IT and reliability staff reports to ensure that reliability is…
- MongoDB (San Francisco, CA)
- …role, with a strong focus on security work, with ideally 2+ years in a senior or staff engineering role Security Mindset: + A comprehensive understanding of all ... next-generation, AI-powered applications. We are looking for an experienced Staff Engineer for our SRE, InfraSec team, to guide the security of our cloud-based infrastructure. As a Staff SRE, you will be very hands-on technically while…
- Amazon (Cupertino, CA)
- …cutting AI platforms for the world's largest Cloud Services provider. As a Senior Reliability Engineer you will engage with an experienced cross-disciplinary ... Description The Trainium Manufacturing, Quality and Reliability (MQR) Team is part of AWS Annapurna ... staff to conceive and design infrastructure technologies. You will…
- Amazon (Cupertino, CA)
- …across cross-geographical ODMs and CMs. As part of the Manufacturing, Quality and Reliability Team in AWS Annapurna Labs focused on Machine Learning products that ... performance at low cost. The Trainium Manufacturing, Quality and Reliability Team is part of AWS Annapurna Labs focused ... team and the ODM and CM partners. As a Senior Manufacturing Engineer you will engage with an experienced…
- ServiceNow, Inc. (Pleasanton, CA)
- …powering next-generation analytics to support ServiceNow's Cloud and AI growth. As our Senior Staff DevOps Engineer for Cloud Analytics & FinOps Engineering ... for infrastructure changes with drift detection and remediation. **Observability & Site Reliability Engineering** + Architect comprehensive monitoring using…