- NVIDIA (Santa Clara, CA)
- …you will work with internal teams and external partners to integrate distributed systems, manage large-scale data pipelines, and operationalize next-generation ... pipelines using Go, Python, Bash, and Bazel to ensure reproducibility, efficiency, and reliable distributed execution. + Integrate simulation and drive logs (eg…
- Rubrik (Palo Alto, CA)
- …/Kernel or Networking domain + Strong fundamentals in data structures, algorithms, and distributed systems design + Strong background in Systems Programming ... and CTO, our mission is to build a highly reliable, secure, and scalable software-defined platform. We are the ... Go, and either C++, Java, or Scala + Large distributed systems design and development experience is…
- NVIDIA (Austin, TX)
- …from the crowd: + Technical competency in managing and automating large-scale distributed systems independent of cloud providers. Advanced hands-on experience ... part of a DGX Cloud team responsible for production systems that enable large scalable GPU clusters to be ... Bright Cluster Manager) + Proven operational excellence in maintaining reliable and performant AI infrastructure. NVIDIA is…
- NVIDIA (Santa Clara, CA)
- …achieve this goal, we are looking for an engineer with a deep understanding of distributed systems, outstanding design skills, and a track record in building and ... the broader NVIDIA team to design and build a reliable, scalable, and efficient storage-as-a-service tailored to AI ... years of industry experience + Strong background in developing distributed systems involving Golang, Kubernetes, and Cloud…
- Microsoft Corporation (Redmond, WA)
- …healthcare, economics, and the environment. Are you passionate about building the future of reliable, large-scale cloud and AI systems? The Systems ... Interns to tackle cutting-edge challenges at the intersection of distributed systems, AI systems ... letter. Preferred Qualifications + Experience building scalable and reliable systems. + Demonstrated ability to develop…
- NVIDIA (Santa Clara, CA)
- …design, or enterprise platform engineering. + Deep expertise in architecting large-scale distributed systems with a focus on reliability, performance, and ... record of publishing technical papers, architecture patterns, or thought leadership in AI systems. + Knowledge of observability tools, telemetry dashboards, and…
- NVIDIA (Santa Clara, CA)
- …is ideal + Demonstrated ability in building scalable, agile, and robust distributed systems + Successful product rollouts and collaboration with early ... NVIDIA DGX Cloud is a fully managed, cloud-based AI supercomputing platform that provides organizations with direct ... Software Engineer with experience in building highly agile and reliable software to join us. We are building and…
- Microsoft Corporation (Redmond, WA)
- …This is an opportunity to deepen your expertise in distributed systems, programming models, and multi-modal AI integration (text, audio, video), while ... in solving complex technical challenges in one or more domains such as distributed systems, AI/ML infrastructure, developer platforms, or cloud services.…
- Meta (Menlo Park, CA)
- …leverage our large-scale GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. ... learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, performance optimizations, or…
- NVIDIA (Santa Clara, CA)
- …and inference more reliable, scalable, and efficient. If you're passionate about AI, distributed systems, and high-performance computing, we want to hear ... driving down cluster downtime towards zero, ensuring that our AI systems remain robust and reliable ... detection. + Hands-On Coding & Optimization: Contribute to large-scale distributed systems with high-quality, production-level C++ and…
Recent Jobs
- Operating Engineer Apprentice, ASM Global (Houston, TX)
- Manager Financial Planning and Analysis, PennyMac (Westlake Village, CA)
- Senior Process Engineer, Abbott (Sylmar, CA)
- Plant Utilities Engineer 1, SUNY Upstate Medical University (Syracuse, NY)