• AI / HPC Systems

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...workloads that expects a lossless fabric interconnect. To improve performance of these systems we constantly look…
    Meta (08/01/25)
  • Senior AI - HPC Cluster Engineer…

    NVIDIA (Santa Clara, CA)
    …analyzing and tuning performance for a variety of AI / HPC workloads. Excellent problem-solving to analyze complex systems, identify bottlenecks, and ... and implement GPU compute clusters for deep learning and high-performance computing. What you'll be doing: + Provide leadership...storage systems like Lustre and GPFS for AI / HPC workload. Experience working with deep learning…
    NVIDIA (07/31/25)
  • Principal Systems Development Engineer…

    Dell Technologies (Round Rock, TX)
    **Principal Systems Development Engineer for AI and HPC solutions team** Our customers' system requirements are usually highly complex. Bringing together ... hardware and software systems design, Systems Development Engineering operates at...or 6+ years with a master's degree * High Performance Computer skill sets with experience working and managing…
    Dell Technologies (07/20/25)
  • Senior Observability Architect, AI

    NVIDIA (Santa Clara, CA)
    …looking for a technical leader to define a vision and roadmap for distributed observability systems for large-scale AI and HPC clusters and workloads and ... and visualization to spectacularly improve efficiency, performance, and productivity of AI and HPC workloads. You will lead technical teams to develop,…
    NVIDIA (05/15/25)
  • Senior HPC and AI Networking…

    NVIDIA (Santa Clara, CA)
    …fit for you, we'd love to hear from you! NVIDIA is seeking a Senior High Performance Computing (HPC) and AI Networking Performance Research and Analysis ... In this exciting role, you will profile and analyze AI workloads on large GPUs and CPUs scale clusters...and platforms, such as HCAs, Switches, CPUs, GPUs, and Systems. You will develop performance analysis tools…
    NVIDIA (07/11/25)
  • Software Systems Engineer for AI

    Dell Technologies (Round Rock, TX)
    …with a bachelor's degree or 6+ years with a master's degree * High Performance Computer systems, setup management and use * Advanced understanding of appropriate ... **Principal Systems Development Engineer** Our customers' system requirements are...across extended teams * Experience managing and using High Performance Clusters, including knowledge in Slurm, Linux and Kubernetes…
    Dell Technologies (07/20/25)
  • AI Infrastructure Engineer - HPC

    Cisco (San Jose, CA)
    AI Infrastructure Engineer - HPC Apply (https://jobs.cisco.com/jobs/Login?projectId=1443781) + Location: San Jose, California, US + Alternate Location: Anywhere is ... and managing the internal NVIDIA DGX and Cisco-UCS based AI platforms at Cisco. You will provide leadership in...SaltStack, Puppet and/or Chef + Deep understanding of operating systems, computer networks, and high-performance applications. +…
    Cisco (07/15/25)
  • Senior Software Architect, AI

    NVIDIA (Santa Clara, CA)
    …group at NVIDIA has openings for software architects in the field of AI and high-performance networking and system software. We research, develop, and ... and usable. + Creating proofs-of-concept to evaluate and motivate extensions in AI Frameworks (PyTorch/NEMO), HPC programming models (MPI, OpenSHMEM, PGAS), new…
    NVIDIA (07/31/25)
  • Senior Solution Architect, HPC

    NVIDIA (Santa Clara, CA)
    …Be Doing: + Primary responsibilities will include building and enabling robust AI / HPC infrastructure for customers + Support operational and reliability aspects ... of large-scale AI clusters, focusing on performance at scale,...in working with customers + Expertise with parallel file systems (e.g., Lustre, GPFS, BeeGFS, WekaIO) and high-speed interconnects…
    NVIDIA (06/18/25)
  • Senior Storage Engineer, HPC & GPU

    Samsung SDS America (Ridgefield Park, NJ)
    …highly skilled and experienced Data Center Storage Engineer with exposure to High Performance Computing (HPC) and GPU Infrastructure. The ideal candidate will ... for HPC and GPU-intensive workloads. + Evaluate and implement high-performance storage technologies, including NVMe, SSD, parallel file systems (e.g.,…
    Samsung SDS America (06/21/25)