- NVIDIA (Santa Clara, CA)
- …of experience crafting and operating large-scale compute infrastructure. + Experience with AI/HPC job schedulers and orchestrators, such as Slurm, K8s or LSF. ... Applied experience with AI/HPC workflows that use MPI and NCCL. + Proficient in using Linux including CentOS/RHEL and/or Ubuntu Linux distributions. A solid…
- NVIDIA (Santa Clara, CA)
- …intelligence. Make the choice to join us today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the design and implementation ... years of experience designing and operating large-scale compute infrastructure + Experience with AI/HPC advanced job schedulers, such as Slurm, K8s, PBS, RTDA or…
- NVIDIA (Santa Clara, CA)
- …a lasting impact on the world. We are seeking a highly skilled and experienced HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters for EDA ... experience crafting and operating large-scale compute infrastructure, including cluster configuration management tools such as BCM or Ansible. + Experience with AI/HPC job schedulers and orchestrators, such as…
- NVIDIA (Santa Clara, CA)
- NVIDIA is hiring engineers to scale up its AI Infrastructure. We expect you to have a strong programming background, knowledge of datacenter hardware, operations, ... and planning abilities. Experience working with High Performance Computing (HPC), GPUs, and high-performance networking (RDMA, InfiniBand, RoCE) is strongly…
- NVIDIA (Santa Clara, CA)
- Join the NVIDIA Deep Learning Frameworks Infrastructure team as a Senior Systems Engineer focusing on High-Performance AI & Networking Applications, committed to ... equivalent experience. + 8+ years of proven experience in AI/HPC Infrastructure. + Familiarity with AI...NCCL, NIXL, NVSHMEM, UCX. + Experience developing or maintaining cluster management and monitoring tools, e.g. Ansible for infrastructure…
- NVIDIA (Santa Clara, CA)
- We are now looking for a Senior Software Engineer for AI Resiliency. At NVIDIA, we are pushing the boundaries of what's possible in AI. We are currently ... Senior Software Engineer to lead the development of AI software resiliency for the most powerful AI...GPUs. Your expertise will be crucial in driving down cluster downtime towards zero, ensuring that our AI…
- NVIDIA (Santa Clara, CA)
- …and telemetry frameworks. + Familiarity with GPU computing (CUDA), large-scale AI/HPC workloads, NVLink, Grace, and cluster-level deployment/management. ... NVIDIA is seeking a Senior Manager to lead our System Software SWAT...with at least 5 years in data center or HPC software environments. + Bachelor's degree or equivalent experience.…
- NVIDIA (Santa Clara, CA)
- NVIDIA is searching for a senior or principal engineer who specializes in building cutting-edge infrastructure for large-scale foundation model training in the ... works on multimodal foundation models, large-scale robot learning, embodied AI, and physics simulation. Our past projects include Eureka…
- NVIDIA (Santa Clara, CA)
- …training deep learning models at scale, and a good mathematical foundation to analyze new AI algorithms. We focus on AI models for autonomous driving such as ... agent behavior models, end-to-end AV architectures, AI safety, closed-loop training approaches, and AV foundation models...running on thousands of GPUs; + Optimize GPU and cluster utilization for efficient model training and fine-tuning on…
- NVIDIA (Santa Clara, CA)
- …best work. The data center platforms like GB200 NVL72 by NVIDIA are redefining AI, HPC, and cloud computing. To accommodate leading workloads globally, our ... Passion for mentoring and building high-performing teams. NVIDIA is at the forefront of AI, HPC, and visualization. Our diagnostics are the nervous system of our…