  • Senior Software Engineer, Inference Platform

    MongoDB (Palo Alto, CA)
    …Our industry-leading developer data platform, MongoDB Atlas, is the only globally distributed, multi-cloud database and is available in more than 115 regions across ... emphasis on performance and reliability + Experienced in cloud-native architectures, distributed systems, and multi-tenant service design + Familiar with concepts in… more
    MongoDB (06/26/25)
  • Senior AI Software Engineer, GenAI Framework

    NVIDIA (Santa Clara, CA)
    …and optimize models by designing and implementing the latest in distributed training algorithms, model parallel paradigms, model optimizations, defining robust APIs, ... the entire software stack. + Innovate and improve model architectures, distributed training algorithms, and model parallel paradigms. + Accelerate foundation model… more
    NVIDIA (06/25/25)
  • Sr. Machine Learning Engineer, Amazon General…

    Amazon (Sunnyvale, CA)
    …initiatives. We leverage advanced hardware, innovative software architectures, and distributed computing techniques to enable breakthrough research and product ... yourself by developing and institutionalizing best practices in AI/ML infrastructure and distributed computing across the organization. A day in the life 8+ years… more
    Amazon (06/23/25)
  • Software Development Engineer - AI/ML, AWS Neuron,…

    Amazon (Cupertino, CA)
    …compiler engineers and runtime engineers to create, build and tune distributed inference solutions with Trn1. Experience optimizing inference performance for both ... PyTorch or JAX is a must. DeepSpeed and other distributed inference libraries are central to this and extending...responsibilities This role will help lead the efforts building distributed inference support into PyTorch, TensorFlow using XLA and… more
    Amazon (06/21/25)
  • Principal Data Platform Architect

    NVIDIA (Santa Clara, CA)
    …are looking for a technical leader to define a vision and roadmap for distributed data platform and observability systems for large-scale AI and HPC clusters and ... We Need to See: + Experience designing and building large scale, distributed observability systems. + Ability to collaborate with data scientists, researchers, and… more
    NVIDIA (06/17/25)
  • Lead, Site Reliability Engineer, Fabric

    MongoDB (San Francisco, CA)
    …Our industry-leading developer data platform, MongoDB Atlas, is the only globally distributed, multi-cloud database and is available in more than 115 regions across ... SRE on the Fabric team, you will leverage your expertise in networking, distributed systems, and automation to ensure our systems are resilient, scalable, and… more
    MongoDB (06/17/25)
  • Senior Deep Learning Research Engineer, Advanced…

    NVIDIA (Santa Clara, CA)
    …in CPU and/or GPU architecture. Knowledge of high-performance computing and distributed programming. + Strong communication and interpersonal skills along with the ... ability to work in a dynamic and distributed team. + Doctoral degree in Computer Science, Computer...crowd: + Experience architecting or developing large-scale deep learning distributed systems + Knowledge of CPU and GPU architecture… more
    NVIDIA (06/15/25)
  • Senior Software Development Engineer, AI/ML, AWS…

    Amazon (Cupertino, CA)
    …compiler engineers and runtime engineers to create, build and tune distributed inference solutions with Trainium and Inferentia. Experience optimizing inference ... JAX is a must. Experience with DeepSpeed and other distributed inference libraries is a bonus, as extending these...responsibilities This role will help lead the efforts building distributed inference support for PyTorch in the Neuron SDK.… more
    Amazon (06/14/25)
  • Senior Software Engineer - Parallel Computing…

    NVIDIA (Santa Clara, CA)
    …for emerging AI workloads. From debugging performance bottlenecks in thousand-GPU distributed systems to influencing next-generation hardware design, we push the ... CUDA optimization, GPU programming, numerical libraries (cuBLAS, NCCL), or distributed computing. + Compiler engineering background: LLVM, GCC, domain-specific… more
    NVIDIA (06/07/25)
  • Applied Scientist, ML_AI

    Amazon (Santa Clara, CA)
    …and Education group working on foundation models, large-scale representation learning, and distributed learning methods and systems. At AWS AI/ML you will invent, ... efficient model architecture, training objective and curriculum design * Distributed training, accelerated optimization methods * Continual learning, multi-task/meta… more
    Amazon (06/03/25)