- NVIDIA (Santa Clara, CA)
- NVIDIA is hiring engineers to scale up its AI Infrastructure. We expect you to have a strong programming background, a deep understanding of distributed systems, ... capacity to build and deploy leading infrastructure solutions for a broad range of AI-based applications that affect core data science. What are you waiting for if…
- Meta (Menlo Park, CA)
- …10. Experience in leading teams working on high performance computing (HPC) and AI/ML systems, including: GPU/ASIC-based kernel development and ... ROCm), distributed systems for large scale training and serving, and systems architecture and performance 11. Accelerator (GPU/ASIC) kernel development and…
- Micron Technology, Inc. (San Jose, CA)
- …in growing the Artificial Intelligence (AI), Machine Learning (ML) and High-Performance Computing (HPC) business segments. You will be working on innovative ... of Work (SOWs), business term sheets, and other customer-facing documents for high-performance memory products. + Represent the Product Management team in Product…
- NVIDIA (Santa Clara, CA)
- …works on multimodal foundation models, large-scale robot learning, embodied AI, and physics simulation. Our past projects include Eureka ... What you will be doing: + Design and maintain large-scale distributed training systems to support multi-modal foundation models for robotics. + Optimize GPU and…
- NVIDIA (Santa Clara, CA)
- …GPUs and SOCs powering product lines for the growing field of artificial intelligence (AI) and high-performance computing (HPC). What you'll be doing: + ... features to improve system Reliability, Availability, Serviceability (RAS), and performance in the Datacenter. + Model and analyze RAS...parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing, with…
- Meta (Menlo Park, CA)
- …networks, powering our global data centers and supporting cutting-edge technologies like AI, Generative AI, Recommendation engines, and Metaverse. Our network ... to join our teams and help build scalable distributed systems, develop innovative solutions to our challenges, and ship...firmware, and software for network devices, transport stacks, and AI workloads 2. Debug complex system-level issues and lead…
- NVIDIA (CA)
- …developing the CUDA-Q platform for programming powerful hybrid quantum-classical multi-processor systems. We are looking for a dedicated engineer with expertise in ... real-time systems, GPU-programming, and is proficient in parallel and distributed...If you love the craft of software engineering and high-performance algorithm implementation, and relish the chance to impact…
- NVIDIA (Santa Clara, CA)
- …GH200 superchip provides performance and productivity required for strong scaling for HPC and generative AI workload. Scale out is inherent to the design ... the world. Today, we are increasingly known as "the AI computing company." We are looking to grow our...& analysis engines. Experience with Redfish. Experience with notification systems like PagerDuty. + Active Open Compute (OCP) and…
- NVIDIA (Santa Clara, CA)
- …12+ years of professional experience building and scaling high-performance distributed systems, ideally in ML, HPC, or large-scale data infrastructure. + ... is preferred), large scale training (DDP/FSDP, NCCL, tensor/pipeline parallelism), and performance profiling. + Strong systems background: datacenter networking…