- NVIDIA (Santa Clara, CA)
- …commitment to customer success + Knowledge of networking protocols, especially RoCE, InfiniBand, and the common L1 to L4 protocols + Background in high-performance ... networking and low-level programming, with strong proficiency in C + Strong analytical and problem-solving skills + Strong time-management and organization skills for coordinating multiple initiatives, priorities, and implementations of new technology and…
- Meta (Menlo Park, CA)
- …7. Experience with NCCL and distributed GPU performance analysis on RoCE/InfiniBand 8. PhD in Computer Science, Computer Engineering, or relevant technical ... field 9. Knowledge of GPU architectures and CUDA programming 10. Knowledge of ML, deep learning and LLM 11. Experience with both data parallel and model parallel training, such as Distributed Data Parallel, Fully Sharded Data Parallel (FSDP), Tensor Parallel,…
- Insight Global (Baltimore, MD)
- …skills in Bash and Python. * Experience with high-speed interconnects (e.g., InfiniBand) and advanced networking in HPC environments. * Problem-Solving: Ability to ... troubleshoot complex issues and work independently to resolve them.
- NVIDIA (Seattle, WA)
- …and reproduction for customers installing our products with a focus on InfiniBand, next-generation AI, and HPC server technologies. + Own and resolve customer ... issues during installation, operation, maintenance or product application or interoperability with other vendors + Work with the latest hardware (e.g., GPUs, AI accelerators, high-speed interconnects) and software technologies such as ML frameworks and tools like…
- IBM (Herndon, VA)
- …architecture design experience with HPC to include storage, file system, InfiniBand, security, authentication, and compute architecture with 5 years' experience. ... Knowledge of HPC hardware architectures, including processors, memory subsystems, network fabrics, and interconnects with 5 years' experience. Must Be Eligible For A Federal Clearance. Public Trust required but can obtain after being hired. **Preferred technical…
- Synergy ECP (Columbia, MD)
- …cluster/node monitoring tools such as NHC (Node Health Check). Familiar with InfiniBand network communications. Familiar with parallel file systems such as Lustre ... Experience with the Atlassian Suite of Tools (Jira, Confluence, Bitbucket).
- NVIDIA (Santa Clara, CA)
- …close to the hardware + Background with PCIe, NVLink or server IO technologies like InfiniBand, Ethernet is a plus + Previous experience of working on a large system ... software code base is preferable + Very strong problem solving and debugging skills + Ability to self-manage, show leadership, and have good interpersonal skills Your base salary will be determined based on your location, experience, and the pay of employees…
- NVIDIA (Santa Clara, CA)
- …with accelerated compute and communications technologies such as BlueField Networking, InfiniBand topologies, NVMesh, and/or the NVIDIA Collective Communication Library ... (NCCL). + Experience working with a centralized security organization to prioritize and mitigate security risks. Prior experience in an ML/AI focused role or on a team matching specific keywords is welcome but not required. NVIDIA is leading the way in…
- Celestica (Merrimack, NH)
- …and command-line tools. * Familiarity with networking concepts (Ethernet, TCP/IP, InfiniBand) and network testing methodologies. * Experience with test methodologies ... such as performance testing, reliability testing, stress testing, and fault injection. * Excellent problem-solving, analytical, and debugging skills. * Strong communication and interpersonal skills, with the ability to collaborate effectively across diverse…