- NVIDIA (Santa Clara, CA)
- …at NVIDIA, you will lead the development of DGX Cloud strategy for GPU fleet lifecycle, health, observability and utilization monitoring, and remediation. You ... define and drive the technical implementation for DGX Cloud operations practice for GPU fleet lifecycle. + Collaborate on Cross Domain Disciplines: drive the…
- Google (New York, NY)
- … GPU Performance team is responsible for optimizing, modeling and evaluating GPU systems for comparative analysis and benchmarking for Google's internal ... GPU Performance Engineer ... We strive for extracting maximum efficiency in Google's growing GPU fleet. The team identifies performance opportunities…
- NVIDIA (Santa Clara, CA)
- …+ Understanding of performance, security and reliability in complex distributed systems. Familiarity with system level architecture, data synchronization, fault ... science of computer graphics. With the invention of the GPU - the engine of modern visual computing - ... Cluster Manager. + Hands-on experience developing and/or operating hardware fleet management systems. Proven operational excellence in…
- NVIDIA (Seattle, WA)
- … Engineer to join our DGX Cloud team and build the foundational systems that drive NVIDIA's high-performance GPU infrastructure. You will play a technical ... lead role in designing scalable cloud services that integrate with diverse systems including GPU telemetry in datacenters, and enabling operational automation…
- NVIDIA (Seattle, WA)
- … Engineer to join our DGX Cloud team and build the foundational systems that drive NVIDIA's high-performance GPU infrastructure. You will play a critical ... role in designing scalable cloud services that integrate with diverse systems including GPU telemetry in datacenters, and enabling operational automation across…
- Meta (Menlo Park, CA)
- …aims to enable Meta-wide ML products and innovations to leverage our large-scale GPU training and inference fleet through an observable, reliable and ... of the following machine learning/deep learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing,…
- LinkedIn (Mountain View, CA)
- …algorithms, AI frameworks, data infra, compute software, and hardware to harness the power of our GPU fleet with thousands of latest GPU cards. The team also ... billions of user queries. Model Training Infrastructure: As an engineer on the AI Training Infra team, you will ... compute efficient infra on top of native cloud, enable GPU based inference for a large variety of use…
- LinkedIn (Mountain View, CA)
- …algorithms, AI frameworks, data infra, compute software, and hardware to harness the power of our GPU fleet with thousands of latest GPU cards. The team also ... billions of user queries. Model Training Infrastructure: As an engineer on the AI Training Infra team, you will ... technical discipline + Experience building ML applications, LLM serving, GPU serving. + Experience with search systems…
- Microsoft Corporation (Mountain View, CA)
- …and operate at the intersection of AI algorithmic innovation, purpose-built AI hardware, systems, and software. We are a team of highly capable and motivated people ... Windows, Bing, SQL Server, and Dynamics. As a Principal Engineer on the team, you will have the opportunity ... all levels of abstraction including kernel, model, algorithm and system level, monitor performance and drive efficiencies that contribute…