- NVIDIA (Redmond, WA)
- …+ Familiarity with advanced networking architecture, particularly in large-scale HPC environments. + Significant software development and deployment experience + ... Detailed knowledge of data center architectures and the economics of large-scale deployments. + Comprehensive understanding of network management, collaborative problem solving, and telemetry. + Excellent interpersonal skills, including the ability to explain…
- Meta (Bellevue, WA)
- …Qualifications: 7. Experience leading teams working on high-performance computing (HPC) and AI/ML systems, including: 8. Communication libraries (e.g., NCCL, RCCL, ... UCC, MPI) 9. GPU/ASIC-based kernel development and optimization (e.g., CUDA, ROCm) 10. Distributed systems for large-scale training and serving 11. Systems Architecture + Performance 12. Large-scale distributed systems 13. Experience running a large-scale program…