- Meta (Bellevue, WA)
- …industry experience in developing compilers, ML systems, ML accelerators, GPU performance, and similar. **Preferred Qualifications:** ... open source, cutting-edge, and industry leading. **Required Skills:** Software Engineer, Systems ML - PyTorch Compiler, PyTorch Framework, PyTorch... 10. Expert knowledge of GPU or ML accelerator performance and developing…
- Meta (Redmond, WA)
- …research GPU super clusters. You are a hybrid software/systems/infrastructure engineer who ensures that Meta's Research Super Clusters run smoothly and have ... with working on the frontiers of research. In this software engineer role, you will serve as the point of... every day at Meta's large scale ML model training GPU clusters, and we are always learning. **Required Skills:**…
- Amazon (Seattle, WA)
- …- Develop robust monitoring and debugging tools to ensure the reliability and performance of training workflows on large GPU clusters. Design and maintain ... to process massive data, scale machine learning models while optimizing GPU utilization, memory management, and the training workflows (like kernel fusion,…
- NVIDIA (WA)
- We are now looking for a Senior ASIC Design Engineer. NVIDIA is seeking ASIC Design Engineers to implement the world's leading SoCs and GPUs. This position ... be doing: + As a key member of the GPU Design team, you will implement, document and deliver high performance, area and power efficient RTL to achieve design…
- Walmart (Bellevue, WA)
- **Position Summary** We are seeking a highly experienced Principal Software Engineer specializing in AI Systems to join our team and play a pivotal role in ... robust infrastructure for our Generative AI powered applications, deploying LLMs on GPU instances, supporting advanced AI research and development within our public…
- GE HealthCare (Bellevue, WA)
- **Job Description Summary** As a Deep Learning Engineer, you will play a crucial role in bridging the gap between AI science and production, helping to train AI ... and resolve challenges relating to large-scale model training involving multi-GPU and/or distributed training regimes. + Demonstrating algorithms to meet…
- Meta (Bellevue, WA)
- …the new product introduction (NPI) phase. **Required Skills:** Hardware Systems Engineer, AI NPI Responsibilities: 1. Drive and execute end-to-end system validation ... or more of the following modules/domains: PCIe, NVLink, Networking, Flash, Memory, CPU, GPU, TPU, DRAM (DDR4/5 or HBM), AI silicon/AI accelerators 15. 3+ years of…
- NVIDIA (Redmond, WA)
- …inference and training (e.g., FlashInfer, Flash Attention) + Strong experience in GPU performance optimizations + Strong experience in machine learning systems ... next era of computing. An era in which our GPU acts as the brains of computers, robots, and... are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want…
- Meta (Bellevue, WA)
- …GPUs/CPUs - Seamless eval reporting, analysis and debugging **Required Skills:** Software Engineer, Systems ML - GenAI Evals Platform Responsibilities: 1. Design ... facilitating development of novel evals and judges 3. Optimize GPU/CPU utilization to maximize efficiency and speed... CPU, or AI hardware accelerators 15. Experience in system performance optimizations such as runtime analysis of latency, memory…
- Amazon (Bellevue, WA)
- …for large scale deep learning model training (100+ billion parameter GPT, 1000s of GPU devices). You have a proven track record of bringing innovative research to ... prior experience in one of: resource orchestrators like Slurm/Kubernetes, high performance computing, building scalable systems, experience in large language model…