• Sr Software Dev Engineer, Edge AI ML Platform…

    Amazon (Sunnyvale, CA)
    …Devices (Lab126) where you'll architect and implement distributed training systems that scale to hundreds of billions of parameters. Your work will enable novel ... maintain model quality while reducing size. Drive innovation in both large-scale training and edge-optimized model deployment. Key job responsibilities - Architect…
    Amazon (08/02/25)
  • Senior Software Engineer, AI Resiliency

    NVIDIA (Santa Clara, CA)
    …in defining and implementing critical resiliency features for AI supercomputers at a scale of 100,000+ GPUs. Your expertise will be crucial in driving down cluster ... software features that improve AI system reliability at a massive scale, such as fast checkpoint-recovery, error detection, error isolation, and straggler/hang…
    NVIDIA (07/22/25)
  • Distinguished Software Engineer - NVLink Fusion…

    NVIDIA (Santa Clara, CA)
    …AI and HPC software stack. NVIDIA NVLink Fusion will enable industry-leading AI scale-up and scale-out performance with NVIDIA technology plus semi-custom ASICs ... hyperscalers to build an ASIC hybrid AI infrastructure with NVIDIA NVLink, rack-scale architecture. We're searching for a highly motivated, technical architect to…
    NVIDIA (07/22/25)
  • Sr. Staff Software Engineer, AI Infra

    LinkedIn (Mountain View, CA)
    …engineering and serving with hundreds of billions of parameters models and large scale feature engineering infra for all AI use cases from recommendation models, ... high performance, enable on-device and online training. Challenges include scale (10s of thousands of QPS, multiple terabytes of...using thousands of features), and enabling GPU inference at scale. As a Sr. Staff Software Engineer, you will…
    LinkedIn (07/18/25)
  • Principal, Software Engineer

    Walmart (Sunnyvale, CA)
    …addressing risks, aligning with evolving regulations, and enabling the business to scale confidently and securely. The team is also actively exploring and applying ... technology decisions, mentor teams, and lead by example in building high-scale, intelligent systems that integrate cutting-edge AI/ML and agentic technologies. You…
    Walmart (07/12/25)
  • Senior HPC Architect, Networking

    NVIDIA (Santa Clara, CA)
    …architect/engineer for a Senior HPC architect role to support deployment and bringup of large-scale GPU compute clusters. Be a key player to enable the most exciting ... in artificial intelligence and GPU computing. Provide insights on and implement at-scale system administration and tuning mechanisms for large-scale compute…
    NVIDIA (07/09/25)
  • Principal Applied Scientist, FAR (Frontier AI…

    Amazon (San Francisco, CA)
    …bridge the gap between state-of-the-art research and real-world deployment at Amazon scale. In this role, you'll combine hands-on technical work with scientific ... robotic foundation models and efficient, promptable model architectures that can scale across diverse robotic applications. Key job responsibilities - Lead technical…
    Amazon (06/25/25)
  • Senior Research Engineer, Foundation Model…

    NVIDIA (Santa Clara, CA)
    …or principal engineer who specializes in building cutting-edge infrastructure for large-scale foundation model training in the Generalist Embodied Agent Research ... team that consistently produces influential works on multimodal foundation models, large-scale robot learning, embodied AI, and physics simulation. Our past projects…
    NVIDIA (06/07/25)
  • Senior Performance and Resilience Engineer - LLM…

    Red Hat (Sacramento, CA)
    **Job Summary:** The Red Hat Performance and Scale Engineering team (PSAP) is hiring a hands-on performance and resilience engineer to lead the "AI workloads fault ... injection and resilience at scale" efforts for vLLM and llm-d (distributed LLM inference...engineering team. The broader mission of the Performance and Scale team is to establish performance and scale…
    Red Hat (08/28/25)
  • Staff Software Engineer, Cloud TPU

    Google (Mountain View, CA)
    …with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring ... ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial…
    Google (08/28/25)