- MongoDB (Palo Alto, CA)
- …Our industry-leading developer data platform, MongoDB Atlas, is the only globally distributed, multi-cloud database and is available in more than 115 regions across ... emphasis on performance and reliability + Experienced in cloud-native architectures, distributed systems, and multi-tenant service design + Familiar with concepts in… more
- NVIDIA (Santa Clara, CA)
- …and optimize models by designing and implementing the latest in distributed training algorithms, model parallel paradigms, model optimizations, defining robust APIs, ... the entire software stack. + Innovate and improve model architectures, distributed training algorithms, and model parallel paradigms. + Accelerate foundation model… more
- Amazon (Sunnyvale, CA)
- …initiatives. We leverage advanced hardware, innovative software architectures, and distributed computing techniques to enable breakthrough research and product ... yourself by developing and institutionalizing best practices in AI/ML infrastructure and distributed computing across the organization. A day in the life 8+ years… more
- Amazon (Cupertino, CA)
- …compiler engineers and runtime engineers to create, build and tune distributed inference solutions with Trn1. Experience optimizing inference performance for both ... PyTorch or JAX is a must. DeepSpeed and other distributed inference libraries are central to this and extending...responsibilities This role will help lead the efforts building distributed inference support into PyTorch, TensorFlow using XLA and… more
- NVIDIA (Santa Clara, CA)
- …are looking for a technical leader to define a vision and roadmap for distributed data platform and observability systems for large-scale AI and HPC clusters and ... We Need to See: + Experience designing and building large scale, distributed observability systems. + Ability to collaborate with data scientists, researchers, and… more
- MongoDB (San Francisco, CA)
- …Our industry-leading developer data platform, MongoDB Atlas, is the only globally distributed, multi-cloud database and is available in more than 115 regions across ... SRE on the Fabric team, you will leverage your expertise in networking, distributed systems, and automation to ensure our systems are resilient, scalable, and… more
- NVIDIA (Santa Clara, CA)
- …in CPU and/or GPU architecture. Knowledge of high-performance computing and distributed programming. + Strong communication and interpersonal skills along with the ... ability to work in a dynamic and distributed team. + Doctoral degree in Computer Science, Computer...crowd: + Experience architecting or developing large-scale deep learning distributed systems + Knowledge of CPU and GPU architecture… more
- Amazon (Cupertino, CA)
- …compiler engineers and runtime engineers to create, build and tune distributed inference solutions with Trainium and Inferentia. Experience optimizing inference ... JAX is a must. Experience with DeepSpeed and other distributed inference libraries is a bonus, as extending these...responsibilities This role will help lead the efforts building distributed inference support for PyTorch in the Neuron SDK.… more
- NVIDIA (Santa Clara, CA)
- …for emerging AI workloads. From debugging performance bottlenecks in thousand-GPU distributed systems to influencing next-generation hardware design, we push the ... CUDA optimization, GPU programming, numerical libraries (cuBLAS, NCCL), or distributed computing. + Compiler engineering background: LLVM, GCC, domain-specific… more
- Amazon (Santa Clara, CA)
- …and Education group working on foundation models, large-scale representation learning, and distributed learning methods and systems. At AWS AI/ML you will invent, ... efficient model architecture, training objective and curriculum design * Distributed training, accelerated optimization methods * Continual learning, multi-task/meta… more