- IBM (San Jose, CA)
- …technical areas in the context of hybrid cloud, AI systems, networking, security, high-speed networked-storage, accelerators, and HPC principles. The ... focuses on the next generation Hybrid Cloud infrastructure for AI, Storage, HPC and Quantum applications. The ... Experience with GPU Systems * Familiarity with HPC system performance evaluation. * Familiarity with…
- Oracle (Seattle, WA)
- …strives to be the go-to experts on RDMA cluster architecture and its relationship to AI/ML/HPC performance. We apply our deep understanding of these unique ... /AI workloads, or maintaining an HPC/AI system. + Experience troubleshooting or tuning performance on distributed systems. + Familiarity with…
- Genentech (South San Francisco, CA)
- …**Position** **2026 Summer Intern - Computational Sciences Center of Excellence - AI systems performance engineering** **Department Summary** A healthier ... and scientific computing workloads. + Support benchmarking and performance testing efforts for AI systems ... HPC or cloud environments. + Experience with distributed systems and parallel computing techniques, including data, model, and…
- Meta (Menlo Park, CA)
- …on existing accelerator systems and guiding the future of models and AI HW at Meta. This drives improved performance, new model architectures and ... the following areas: Accelerators/GPU architectures, High Performance Computing (HPC), Machine Learning Compilers, Training/Inference ML Systems, Model…
- Bloomberg (New York, NY)
- …maintaining system software that enables communication between GPUs, CPUs, and storage in scale-out AI and HPC systems. This role will also be responsible ... overseeing the ongoing monitoring, support, and maintenance of our HPC/AI clusters, ensuring peak performance…
- Amazon (Cupertino, CA)
- …and operating AWS cloud offerings that enable high performance and scalability in AI/ML and HPC workloads. You are intrigued by the continuous release of ... Want to do industry leading work delivering continuous price performance improvements in the cloud for AI ... have tremendous interest in cloud scale and curious how systems and software decisions impact the user. You insist…
- NVIDIA (Santa Clara, CA)
- …Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC workloads + Familiarity with deep learning frameworks like PyTorch and ... We are seeking a Senior AI/ML Performance and Efficiency Engineer, GPU ... to end + Debugging and optimization experience with Nsight Systems and Nsight Compute + Experience with debugging large-scale…
- Amazon (Seattle, WA)
- …design, deliver, and operate next-generation infrastructure that powers breakthrough innovation in AI/ML and HPC workloads. If you're passionate about pushing ... Do you want to shape the future of Generative AI at AWS? Join the team building the foundation ... the limits of performance, efficiency, and scalability in the cloud, this is…
- Oracle (Cheyenne, WY)
- …what's possible. Responsibilities + Lead architecture, system design, and implementation for high-performance RDMA solutions across OCI's AI/HPC platforms, ... If you thrive at the intersection of large-scale distributed systems, high-speed networking, and AI workloads, this ... performance tuning at scale. + Familiarity with AI/HPC stacks and workloads: NCCL/RCCL/MPI, Slurm or…
- Amazon (Cupertino, CA)
- …design, deliver, and operate next-generation infrastructure that powers breakthrough innovation in AI/ML and HPC workloads. If you're passionate about pushing ... Do you want to shape the future of Generative AI at AWS? Join the team building the foundation ... the limits of performance, efficiency, and scalability in the cloud, this is…