- NVIDIA (Santa Clara, CA)
- …people to make them operational in production? We are seeking a dedicated Cluster Deployment Operations Engineer to support product deployments and issues by ... years of experience in at least two of the following: HPC/large-scale cluster administration, Linux systems engineering, infrastructure automation (e.g., Ansible,… more
- NVIDIA (Santa Clara, CA)
- …DevOps tools to automate software updates, perform maintenance tasks, and monitor cluster availability, ensuring seamless operations. + Take ownership of daily ... cluster failures and issues, troubleshooting them promptly to maintain ... in deploying and administrating clusters, servers, switches, and related infrastructure. + Automation expert with hands-on skills in… more
- NVIDIA (Santa Clara, CA)
- …the world's most advanced computing workloads. NVIDIA is looking for an AI/ML HPC Cluster Engineer to join our MARS team. You will provide technical engagement ... mission, our team, Managed AI Superclusters (MARS) builds and scales the infrastructure, platforms, and tools that enable researchers and engineers to develop the… more
- NVIDIA (Santa Clara, CA)
- …lasting impact on the world. We are seeking a highly skilled and experienced HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters for EDA and ... of 5 years of proven experience crafting and operating large scale compute infrastructure, including cluster configuration management tools such as BCM or… more
- Broadcom (Bellevue, WA)
- …you apply.** **Job Description:** **About Broadcom** Broadcom Inc. is a global infrastructure technology leader built on 50 years of innovation, collaboration, and ... We design, develop, and supply a broad range of semiconductor and infrastructure software solutions. Our category-leading product portfolios serve the world's most… more
- Oracle (Seattle, WA)
- **Job Description** OCI AI Infrastructure is at the forefront of building cutting-edge GPU supercomputers that scale to tens of thousands of GPUs without ... team strives to be the go-to experts on RDMA cluster architecture and its relationship to AI/ML/HPC performance. We ... + Troubleshoot performance problems on RDMA clusters and perform cluster performance validation, including on very novel and not… more
- Bloomberg (New York, NY)
- Senior Software Engineer - Market Data Platform, Cluster Management Location New York Business Area Engineering and CTO Ref # 10046371 **Description & ... in it for you:** As a Market Data Platform engineer, you will: + Get hands-on experience working on ... and diagnosing unexpected issues in production. The market data infrastructure you'll help build and improve is mission-critical for… more
- NVIDIA (Santa Clara, CA)
- …Make the choice to join us today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the design and implementation of groundbreaking ... + Minimum 5+ years of experience designing and operating large scale compute infrastructure + Experience with AI/HPC advanced job schedulers, such as Slurm, K8s,… more
- NVIDIA (Santa Clara, CA)
- …Minimum of 6 years of experience crafting and operating large scale compute infrastructure. + Experience with AI/HPC job schedulers and orchestrators, such as Slurm, ... staying ahead of new technologies and effective approaches in the HPC and AI/ML infrastructure fields. Ways to stand out from the crowd: + Experience with NVIDIA… more
- NVIDIA (Santa Clara, CA)
- NVIDIA is hiring engineers to scale up its AI Infrastructure. We expect you to have a strong programming background, knowledge of datacenter hardware, operations, ... help advance NVIDIA's capacity to build and deploy leading infrastructure solutions for a broad range of AI-based applications ... multiple data streams, ranging from GPU hardware diagnostics to cluster and network telemetry. + Work on software that… more