- Red Hat (Boston, MA)
- …and scale LLM deployments. As a Principal Machine Learning Engineer focused on distributed vLLM (http://github.com/vllm-project/) infrastructure in the llm-d ... with our team to tackle the most pressing challenges in scalable inference systems and Kubernetes-native deployments. Your work with distributed systems and… more
- Celestica (San Jose, CA)
- …Americas Country: USA State/Province: California City: San Jose **Summary** The Staff Engineer, PCB Layout works with cross-functional teams with other PCB Layout ... to a sizable team of engineers. You will also analyze complex systems and recommend solutions. + Review and interpret customer requirements and specifications,… more
- NVIDIA (Santa Clara, CA)
- …NPI Operations Team is looking for a highly motivated System Product Development Engineer to lead the development and productization of DGX products through mass ... production ramp. This role focuses on DGX systems and L11 rack-scale AI supercomputer reference platforms, some of the most advanced computing systems in the… more
- General Motors (Sunnyvale, CA)
- …on performance, availability, concurrency, and scalability. We're committed to maximizing GPU utilization across platforms (B200, H100, A100, and more) while ... the Role:** We are seeking a Staff ML Infrastructure engineer to help build and scale robust Compute platforms...needs. The ideal candidate brings experience in designing distributed systems for ML, strong problem-solving skills, and a product… more
- NVIDIA (Santa Clara, CA)
- …the unlimited potential of AI to define the next era of computing. Our GPU technology powers computers, robots, and self-driving cars that can understand the world. ... inspired to do their best work. We are now looking for an Automation Engineer. NVIDIA's product ecosystem is rapidly expanding, increasing the need for intelligent … more
- NVIDIA (Santa Clara, CA)
- … systems with high efficiency and availability using the combination of software and systems engineering practices. This is a highly specialized discipline ... at NVIDIA ensures that our internal and external facing GPU cloud services run with maximum reliability and uptime as...a set of engineering approaches to running better production systems and optimizations. Much of our software … more
- NVIDIA (Santa Clara, CA)
- … with high efficiency and availability. It encompasses various areas, including software and systems engineering practices, storage, data management, and ... such as storage architecture, high-performance distributed storage, data management, systems, networking, coding, database management, capacity planning, continuous delivery… more
- NVIDIA (Santa Clara, CA)
- … systems with high efficiency and availability using the combination of software and systems engineering practices. This is a highly specialized discipline ... at NVIDIA ensures that our internal and external facing GPU cloud services run with maximum reliability and uptime as...a set of engineering approaches to running better production systems and optimizations. Much of our software … more
- Google (Sunnyvale, CA)
- Senior Product Engineer, Machine Learning Accelerators **Mid** Experience driving progress, solving problems, and ... contract manufacturers and component suppliers for data center server accelerator products (GPU, FPGA or ASIC). + Experience working with contract manufacturers and… more
- Oracle (Columbia, SC)
- …largest AI and HPC customers. These fabrics are the foundation underneath OCI's AI, GPU and HPC services, and support major tier-0 vendors in the generative AI ... the RDMA network underneath your workload. A Principal Network Engineer on our team supports the design, deployment, and...on operation and support of RDMA/RoCE network fabrics and systems, through a combination of a deep network understanding… more