- NVIDIA (Santa Clara, CA)
- …stacks that optimize them (e.g., NCCL, CUDA), as well as HPC technologies such as InfiniBand, MPI, NVLink and others. Your base salary will be determined based on your ... location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD. You will also be eligible for equity and benefits (https://www.nvidia.com/en-us/benefits/). Applications for this job will be accepted at…
- NVIDIA (Santa Clara, CA)
- …+ Background with parallel computing, PCIe, NVLink, or server product technologies like InfiniBand and Ethernet is a plus + Previous experience working on a large ... system software code base is preferable + Very strong problem-solving and debugging skills + Ability to self-manage, show leadership, and have good interpersonal skills With competitive salaries and a generous benefits package, NVIDIA is widely considered to…
- Cisco (San Jose, CA)
- …based digital interfaces such as PCIe, Ethernet, Fibre Channel, or InfiniBand. + Strong problem-solving skills and effective conflict resolution abilities. ... **PREFERRED QUALIFICATIONS:** + 8+ years of experience in high-speed digital system/board design OR Field Application Engineering for high-speed digital devices. + Experience with semiconductor device reliability testing and characterization methods, such as…
- NVIDIA (Santa Clara, CA)
- …ICMP, tunneling protocols (VXLAN, Geneve, FoU, GRE), etc. + Familiarity with InfiniBand networking. + Background with host management systems (DHCP, Redfish, UEFI) ... and host security services such as TPM, TXT, and SecureBoot. + Kubernetes and/or distributed task scheduling. NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our…
- NVIDIA (Santa Clara, CA)
- …to the hardware + Background with PCIe, NVLink, or server product technologies like InfiniBand and Ethernet is a plus + Previous experience working on a large system ... software code base is preferable + Very strong problem-solving and debugging skills + Ability to self-manage, show leadership, and have good interpersonal skills With competitive salaries and a generous benefits package, NVIDIA is widely considered to be one…
- NVIDIA (Santa Clara, CA)
- …scheduling & orchestration (e.g., Slurm, Kubernetes, LSF), high-speed networking (e.g., InfiniBand, RoCE, Amazon EFA), and container technologies (Docker, Enroot). + ... Expertise in running and optimizing large-scale distributed training workloads using PyTorch (DDP, FSDP), NeMo, or JAX. Also, a deep understanding of AI/ML workflows, encompassing data processing, model training, and inference pipelines. + Proficiency…
- NVIDIA (Santa Clara, CA)
- …to write Torch code and occasional custom GPU kernels. + Expertise in InfiniBand, NVLink, RoCE, RDMA, and collective communication libraries. Your base salary will be ... determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 120,000 USD - 189,750 USD for Level 2, and 148,000 USD - 235,750 USD for Level 3. You will also be eligible for equity and benefits…
- Meta (Menlo Park, CA)
- …7. Networking protocols including at least one of the following: RDMA, InfiniBand, DHCP, NTP, SCP, SFTP, or SNMP 8. Maintaining web-based applications using ... at least one of the following: Apache, Memcached, or Squid 9. Version control tools including Mercurial 10. Diagnosing and troubleshooting issues ranging from low-level hardware issues to large-scale failures within datacenter clusters **Public…
- Meta (Menlo Park, CA)
- …large scale. 27. Experience with testing open-source hardware. 28. Familiarity with InfiniBand (IB), Remote Direct Memory Access (RDMA), and RDMA over Converged ... Ethernet (RoCE) network topologies, as well as their respective physical infrastructure. **Public Compensation:** $177,000/year to $251,000/year + bonus + equity + benefits **Industry:** Internet **Equal Opportunity:** Meta is proud to be an Equal Employment…
- Meta (Menlo Park, CA)
- …7. Experience with NCCL and distributed GPU performance analysis on RoCE/InfiniBand 8. PhD in Computer Science, Computer Engineering, or relevant technical ... field 9. Knowledge of GPU architectures and CUDA programming 10. Knowledge of ML, deep learning, and LLMs 11. Experience with both data parallel and model parallel training, such as Distributed Data Parallel, Fully Sharded Data Parallel (FSDP), Tensor Parallel,…