- NVIDIA (Santa Clara, CA)
- …make a lasting impact on the world. We are seeking a highly skilled and experienced HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters ... + Provide leadership and strategic mentorship on the management of large-scale HPC systems including the deployment of compute, networking, and storage. + Develop… more
- NVIDIA (Santa Clara, CA)
- …+ Provide leadership and strategic mentorship on the management of large-scale HPC systems including the deployment of compute, networking, and storage. + Develop ... and operating large scale compute infrastructure. + Experience with AI/ HPC job schedulers and orchestrators, such as Slurm, K8s...such as Slurm, K8s or LSF. Applied experience with AI/ HPC workflows that use MPI and NCCL. + Proficient… more
- NVIDIA (Santa Clara, CA)
- …and planning abilities. Experience working with High Performance Computing ( HPC ), GPUs, and high-performance networking (RDMA, Infiniband, RoCE) are strongly ... will be harnessing multiple data streams, ranging from GPU hardware diagnostics to cluster and network telemetry. + Work on software that manages NVLINK topography… more
- NVIDIA (Santa Clara, CA)
- …Make the choice to join us today! As a member of the GPU AI/ HPC Infrastructure team, you will provide leadership in the design and implementation of ground ... + Provide leadership and strategic guidance on the management of large-scale HPC systems including the deployment of compute, networking, and storage. + Develop… more
- NVIDIA (Santa Clara, CA)
- …artificial intelligence. Join our team at NVIDIA as a Senior Site reliability engineer focused on HPC storage and play a crucial role in designing, ... software + Experience with RDMA (InfiniBand or RoCE) fabrics + Background with HPC cluster management tools such as Slurm, PBS, LSF, etc. + Passionate and… more
- NVIDIA (Santa Clara, CA)
- NVIDIA is searching for a senior or principal engineer who specializes in building cutting-edge infrastructure for large-scale foundation model training in the ... to support multi-modal foundation models for robotics. + Optimize GPU and cluster utilization for efficient model training and fine-tuning on massive datasets. +… more
- NVIDIA (Santa Clara, CA)
- …The data center platforms like GB200 NVL72 by NVIDIA are redefining AI, HPC , and cloud computing. To accommodate leading workloads globally, our diagnostic systems ... hardware technologies. We're in search of a visionary technical leader to engineer and propel innovation in diagnostics for NVIDIA's partner ecosystem. This role… more
- NVIDIA (Santa Clara, CA)
- …seeking a distributed software engineer to join our team! As a Senior engineer , you'll be instrumental in developing and optimizing AI infrastructure ... engineers. What You Will Be Doing: As a software engineer specializing in backend development, you'll work in a...well as container technologies like Docker and Kubernetes, and HPC /AI platforms such as Slurm. Ways to stand out… more
- NVIDIA (Santa Clara, CA)
- NVIDIA is seeking a Senior Firmware Engineer to join our CSP Engagements team, focusing on system software for Datacenter products such as GB200. This role ... see: + Deep expertise in data center server architectures, HPC systems, and hardware-software co-design. + Deep expertise in...out from the crowd: + Knowledge of cloud and cluster level deployment and management systems. + Experience with… more
- NVIDIA (Santa Clara, CA)
- …GPU Computing. We are passionate about markets include gaming, automotive, vision, HPC , datacenters and networking in addition to our traditional OEM business. ... Linux experience, reliability testing with various telemetries, scale out cluster , test plan development, track record in developing AI...are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want… more
Recent Jobs
-
Payroll Clerk
- vaco (La Presa, CA)
-
USMA Regional Medical Scientific Director (Rmsd) GI Immunology - Northwest Territory (Remote)
- Merck & Co. (Sacramento, CA)
-
Recycling Coordinator
- Monster (Columbus, OH)
-
Associate Director, Global Oncology Medical Affairs, Medical Analytics
- Daiichi Sankyo, Inc. (Bernards, NJ)