  • Senior Software Engineer, Networking Software

    NVIDIA (Santa Clara, CA)
    …commitment to customer success + Knowledge of networking protocols, especially RoCE, Infiniband, and the common L1 to L4 protocols + Background in high-performance ... networking and low-level programming, with strong proficiency in C + Strong analytical and problem-solving skills + Strong time-management and organization skills for coordinating multiple initiatives, priorities, and implementations of new technology and…
    NVIDIA (10/16/25)
  • Software Engineer, SystemML - AI Networking

    Meta (Menlo Park, CA)
    …7. Experience with NCCL and distributed GPU performance analysis on RoCE/Infiniband 8. PhD in Computer Science, Computer Engineering, or relevant technical ... field 9. Knowledge of GPU architectures and CUDA programming 10. Knowledge of ML, deep learning and LLM 11. Experience with both data parallel and model parallel training, such as Distributed Data Parallel, Fully Sharded Data Parallel (FSDP), Tensor Parallel,…
    Meta (10/16/25)
  • HPC Systems Engineer

    Insight Global (Baltimore, MD)
    …skills in Bash and Python. * Experience with high-speed interconnects (e.g., Infiniband) and advanced networking in HPC environments. * Problem-Solving: Ability to ... troubleshoot complex issues and work independently to resolve them.
    Insight Global (10/16/25)
  • Technical Account Manager -MGX and HGX

    NVIDIA (Seattle, WA)
    …and reproduction for customers installing our products with a focus on Infiniband, next-generation AI, and HPC server technologies. + Own and resolve customer ... issues during installation, operation, maintenance or product application or interoperability with other vendors + Work with the latest hardware (e.g., GPUs, AI accelerators, high-speed interconnects) and software technologies such as ML frameworks and tools like…
    NVIDIA (10/15/25)
  • Lead High Performance Computing System Architect

    IBM (Herndon, VA)
    …architecture design experience with HPC to include storage, file system, InfiniBand, security, authentication, and compute architecture with 5 years' experience. ... Knowledge of HPC hardware architectures, including processors, memory subsystems, network fabrics, and interconnects with 5 years' experience Must Be Eligible For A Federal Clearance Public Trust required but can obtain after being hired. Preferred technical…
    IBM (10/15/25)
  • Systems Integration Engineer

    Synergy ECP (Columbia, MD)
    …cluster/node monitoring tools such as NHC (Node Health Check) Familiar with InfiniBand network communications Familiar with parallel file systems such as Lustre ... Experience with the Atlassian Suite of Tools (Jira, Confluence, Bitbucket)
    Synergy ECP (10/15/25)
  • Senior System Software Engineer, GPU Server

    NVIDIA (Santa Clara, CA)
    …close to the hardware + Background with PCIE, NVLink or server IO technologies like Infiniband, Ethernet is a plus + Previous experience of working on a large system ... software code base is preferable + Very strong problem solving and debugging skills + Ability to self-manage, show leadership, and have good interpersonal skills Your base salary will be determined based on your location, experience, and the pay of employees…
    NVIDIA (10/15/25)
  • Senior DGX Cloud Software Engineer…

    NVIDIA (Santa Clara, CA)
    …with accelerated compute and communications technologies such BlueField Networking, Infiniband topologies, NVMesh, and/or the NVIDIA Collective Communication Library ... (NCCL). + Experience working with a centralized security organization to prioritize and mitigate security risks. Prior experience in a ML/AI focused role or on a team matching specific keywords is welcome but not required. NVIDIA is leading the way in…
    NVIDIA (10/15/25)
  • Storage & Server Test Engineer (Austin)

    Celestica (Merrimack, NH)
    …and command-line tools. * Familiarity with networking concepts (Ethernet, TCP/IP, InfiniBand) and network testing methodologies. * Experience with test methodologies ... such as performance testing, reliability testing, stress testing, and fault injection. * Excellent problem-solving, analytical, and debugging skills. * Strong communication and interpersonal skills, with the ability to collaborate effectively across diverse…
    Celestica (10/14/25)
  • Senior Lead Storage & Server Test Engineer…

    Celestica (Merrimack, NH)
    …and command-line tools. * Familiarity with networking concepts (Ethernet, TCP/IP, InfiniBand) and network testing methodologies. * Experience with test methodologies ... such as performance testing, reliability testing, stress testing, and fault injection. * Excellent problem-solving, analytical, and debugging skills. * Strong communication and interpersonal skills, with the ability to collaborate effectively across diverse…
    Celestica (10/14/25)