- NVIDIA (Santa Clara, CA)
- …that are groundbreaking in AI and computing. What you'll be doing: As a Reliability Methodology Engineer at NVIDIA, you will be responsible for ensuring our ... products and systems operate flawlessly. Your key duties will include: +...test engineering teams to apply DFT methodologies to improve reliability screening specific to HTOL (Component level High Temp… more
- Coinbase (Sacramento, CA)
- …impact. *Role* - We would like to add a Senior Software Engineer to help promote reliability culture across Coinbase. You would be helping company-wide ... fully supported. *What you'll be doing (i.e., job duties):* *Team* - Core Reliability team is a vital part of Infrastructure (Platform) org responsible for paving the… more
- ServiceNow, Inc. (San Diego, CA)
- …the problem; we **engineer it away** with software. You'll join Network Reliability & Resiliency (NR2): a diverse crew of network, software, hardware, and ... It all started in sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how we work. Fast forward to today -… more
- NVIDIA (Santa Clara, CA)
- …for a passionate member to join our DGX Cloud Engineering Team as a Sr. Site Reliability Engineer. In this role, you will play a significant part in helping to ... quality? Do you pride yourself in building cloud-scale software systems? If so, join our team at NVIDIA, where...reliability. + Design, build, and implement scalable cloud-based systems for PaaS/IaaS. + Work closely with other teams… more
- Amazon (Cupertino, CA)
- …designs cutting-edge AI platforms for the world's largest Cloud Services provider. As a Senior Reliability Engineer you will engage with an experienced ... * You will have a fundamental understanding of Reliability statistics/Reliability tests and/or solid understanding of computer systems to influence… more
- NVIDIA (Santa Clara, CA)
- …health + Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity + ... Site Reliability Engineering (SRE) at NVIDIA is an engineering...discipline to design, build and maintain large-scale production systems with high efficiency and availability using the combination… more
- NVIDIA (Santa Clara, CA)
- NVIDIA is looking for a Senior Site Reliability Engineer to work in IPP (Infrastructure, Planning and Process). IPP is a global organization within NVIDIA. ... hosts a heterogeneous mix of machines and devices with various operating systems (Windows/Linux/Android), a multitude of hardware platforms both NVIDIA GPUs and… more
- LinkedIn (Mountain View, CA)
- …and troubleshooting production systems at scale. Suggested Skills: + Distributed Systems + Technical Leadership + Infrastructure Reliability + Systems ... passion for distributed technologies and algorithms, API design and systems design, and your passion for writing code that...impact within our company. As a Sr. Staff Software Engineer, you will be a key technical leader and… more
- NVIDIA (Santa Clara, CA)
- …us accelerate the next wave of artificial intelligence. Join our team at NVIDIA as a Senior Site Reliability Engineer focused on HPC storage and play a ... such as high-performance NFS, S3-compatible object storage, and distributed storage systems + Develop tooling to automate deployment and management of large-scale… more
- NVIDIA (Santa Clara, CA)
- Join our team in Santa Clara, CA, USA as a Senior Site Reliability Engineer. At NVIDIA, you'll be part of the team shaping the future of computing and ... techniques and Infrastructure as Code (IaC). + Deep understanding of Linux operating systems and TCP/IP fundamentals. + Expertise with at least one major cloud… more