• Senior Software Engineer, AI Resiliency

    NVIDIA (Santa Clara, CA)
    We are now looking for a Senior Software Engineer for AI Resiliency. At NVIDIA, we are pushing the boundaries of what's possible in AI. We are currently seeking a ... Senior Software Engineer to lead the development of AI software resiliency...and performance tuning large-scale AI workloads in cloud and HPC environments, ensuring seamless operation of AI training and…
    NVIDIA (10/15/25)
  • Principal Network Engineer - DC and AI…

    NVIDIA (Santa Clara, CA)
    We are seeking a highly skilled Principal Network Engineer to join our dynamic team to build the next generation of IT AI Clusters and help lead the team through a ... while building a solid foundation with automation. We are looking for a passionate engineer who will solve networking problems for scalable AI clusters. This is a…
    NVIDIA (10/02/25)
  • Senior Math Libraries Engineer, CPU…

    NVIDIA (Santa Clara, CA)
    NVIDIA is looking for an expert software engineer to help us deliver CUDA-X libraries across the NVIDIA CPU and GPU ecosystem. For over a decade, NVIDIA's ... accelerated computing platform has revolutionized HPC and AI with applications ranging from COVID-19 research...are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want…
    NVIDIA (09/26/25)
  • Sr Principal Software Engineer, Networking…

    Oracle (Cheyenne, WY)
    …Cloud) AI Infrastructure Innovation team is pioneering the creation of next-generation AI/HPC networking for GPU superclusters at massive scale. Our mission is to ... system design, and implementation for high-performance RDMA solutions across OCI's AI/HPC platforms, including frontend and backend fabrics. + Innovate on network…
    Oracle (12/20/25)
  • Network Development Engineer

    Oracle (Baton Rouge, LA)
    …cluster networking domain and enable seamless, accelerated High-Performance Compute (HPC), Artificial Intelligence and Machine Learning advancements. We envision a ... of state-of-the-art RDMA clusters tailored specifically for AI, ML, HPC workloads. We strive to be the go-to experts...deep understanding of the unique demands of AI/ML and HPC applications. By staying at the forefront of technological…
    Oracle (12/20/25)
  • Senior Network Development Engineer

    Oracle (Annapolis, MD)
    …cluster networking domain and enable seamless, accelerated High-Performance Compute (HPC), Artificial Intelligence and Machine Learning advancements. We envision a ... of state-of-the-art RDMA clusters tailored specifically for AI, ML, HPC workloads. We strive to be the go-to experts...deep understanding of the unique demands of AI/ML and HPC applications. By staying at the forefront of technological…
    Oracle (12/13/25)
  • Senior Math Libraries Engineer - Sparsity…

    NVIDIA (Santa Clara, CA)
    …to simplify and accelerate computing for unstructured sparsity in DL and HPC. Around the world, leading commercial and academic organizations are revolutionizing AI, ... and accelerate computing for unstructured sparsity in DL and HPC on NVIDIA GPUs + Enable the system in...of sparse computations, in particular sparsity in AI and HPC + Good understanding of LLMs, Deep Learning methods…
    NVIDIA (11/18/25)
  • Hits-U III Site Lead Engineer Research…

    General Dynamics Information Technology (Vicksburg, MS)
    …**Job Family:** IT Infrastructure and Operations **Skills:** High-Performance Computing (HPC) Systems, People Management, Team Management **Experience:** 8+ years of ... must be an effective leader with a broad technical background in HPC Data Center management/support, capable of interacting and communicating with all disciplines…
    General Dynamics Information Technology (10/22/25)
  • Software Engineer, SystemML - AI…

    Meta (Menlo Park, CA)
    …Communications Library), which enables multi-GPU and multi-node data communication through HPC-style collectives. NCCL has been integrated into PyTorch and is on ... of GenAI/LLM scaling reliability and performance. **Required Skills:** Software Engineer, SystemML - AI Networking Responsibilities: 1. Tech-leading the collective…
    Meta (12/20/25)
  • Software Engineer, SystemML - Scaling…

    Meta (Menlo Park, CA)
    …Communications Library), which enables multi-GPU and multi-node data communication through HPC-style collectives. NCCL has been integrated into PyTorch and is on ... of GenAI/LLM scaling reliability and performance. **Required Skills:** Software Engineer, SystemML - Scaling / Performance Responsibilities: 1. Enabling reliable…
    Meta (12/20/25)