- Google (Sunnyvale, CA)
- Software Engineer, Cluster Networking (Google; Sunnyvale, CA, USA) **Mid** Experience driving progress, solving problems, and ... for software that manages changes to Google's cluster networks worldwide. Networking enables the growing...team builds systems to make this possible. As a Software Engineer, you will be working with… more
- Oracle (Seattle, WA)
- …our customers. Our team strives to be the go-to experts on RDMA cluster architecture and its relationship to AI/ML/HPC performance. We apply our deep understanding ... benchmarking. + Troubleshoot performance problems on RDMA clusters and perform cluster performance validation, including on very novel and not fully understood… more
- NVIDIA (Santa Clara, CA)
- …diverse environments. Ways to stand out from the crowd: + Proficiency with cluster networking including InfiniBand and Spectrum-X. + Experience with NVIDIA ... them operational in production? We are seeking a dedicated Cluster Deployment Operations Engineer to support product...with large language models (LLMs) as part of a software development or content creation workflow - we rely… more
- NVIDIA (Santa Clara, CA)
- …cluster availability and performance. + Manage the rollout and rollback of cluster software and firmware updates, ensuring smooth transitions and minimal ... via NVLink and InfiniBand + Implement modern DevOps tools to automate software updates, perform maintenance tasks, and monitor cluster availability, ensuring… more
- NVIDIA (Santa Clara, CA)
- …lasting impact on the world. We are seeking a highly skilled and experienced HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters for EDA and ... the management of large-scale HPC systems including the deployment of compute, networking, and storage. + Develop and improve our ecosystem around GPU-accelerated… more
- Broadcom (Bellevue, WA)
- …learning, and technical excellence. **Job Summary** We are seeking an experienced Senior Software Engineer with knowledge in both Kubernetes as well as Go ... and supply a broad range of semiconductor and infrastructure software solutions. Our category-leading product portfolios serve the world's...(Golang) to join our VCF Cluster Management team. In this role, you will be… more
- NVIDIA (Santa Clara, CA)
- …the management of large-scale HPC systems including the deployment of compute, networking, and storage. + Develop and improve our ecosystem around GPU-accelerated ... occur. + Build innovative tooling to accelerate researchers' velocity, troubleshooting, and software performance at scale. What we need to see: + Bachelor's degree… more
- Walmart (Sunnyvale, CA)
- **Position Summary** We are seeking a highly skilled Principal Engineer (Ceph/Scale-Out Storage) with 10+ years of deep technical experience in distributed storage ... environments. The ideal candidate will have strong expertise across Linux, networking, storage internals, and distributed systems, with the ability to diagnose… more
- Oracle (Seattle, WA)
- …in making critical technical decisions and setting engineering vision. + Knowledge of cluster networking and cloud networking. + Solid understanding of ... and more reliable. If you're passionate about building high-performance software, value clean design at scale, and enjoy working...Oracle Cloud, AWS, and Azure. + Hands-on experience in **cluster networking** and **cloud networking**… more
- NVIDIA (Santa Clara, CA)
- …involves developing tools for AI researchers and SW/HW teams running AI workloads in a GPU cluster. As a member of the software development team, we will work with ... of GPU cluster job scheduling (Slurm or Kubernetes), storage and networking + Experience with NVIDIA GPUs, CUDA Programming and NCCL + Motivated self-starter… more