Reliable Distributed AI Systems Jobs

368 jobs (page 8)

Senior Systems Software Engineer, AV…

NVIDIA (Santa Clara, CA)

…you will work with internal teams and external partners to integrate distributed systems, manage large-scale data pipelines, and operationalize next-generation ... pipelines using Go, Python, Bash, and Bazel to ensure reproducibility, efficiency, and reliable distributed execution. + Integrate simulation and drive logs (eg…

NVIDIA (09/19/25)
Senior Technical Systems AI…

NVIDIA (Santa Clara, CA)

…design, or enterprise platform engineering. + Deep expertise in architecting large-scale distributed systems with a focus on reliability, performance, and ... record of publishing technical papers, architecture patterns, or thought leadership in AI systems. + Knowledge of observability tools, telemetry dashboards, and…

NVIDIA (10/16/25)
Software Engineer Data/AI/Intelligent…

Cisco (San Jose, CA)

…platforms, such as AWS, Azure, or Google Cloud. + Understanding of distributed systems concepts, including scalability, reliability, fault tolerance, and data ... Team: Our dedicated team members are building the future of Cisco's AI-driven platforms and data infrastructure, supporting innovation across the globe. You will…

Cisco (12/01/25)
Senior Software Engineer, Distributed…

NVIDIA (Austin, TX)

…from the crowd: + Technical competency in managing and automating large-scale distributed systems independent of cloud providers. Advanced hands-on experience ... part of a DGX Cloud team responsible for production systems that enable large scalable GPU clusters to be ... Bright Cluster Manager) + Proven operational excellence in maintaining reliable and performant AI infrastructure. NVIDIA is…

NVIDIA (10/04/25)
Solutions Architect - Rack Scale AI…

NVIDIA (Santa Clara, CA)

…hosts a heterogeneous mix of machines and devices with various operating systems (Windows/Linux/Android), a multitude of hardware platforms both NVIDIA GPUs and ... Tegra Processors. Are you passionate about distributed infrastructure and looking for sophisticated, critical issues, ready to build the next generation of cloud…

NVIDIA (12/10/25)
Software Development Engineer, Distributed…

Amazon (Seattle, WA)

…base. You'll bring a passion for innovation, data, search, analytics, and distributed systems. You'll also: - Solve challenging technical problems, often ... one of several AWS tools used for building Generative AI on AWS. The Neuron Compiler Engineering team is ... for identifying and designing solutions that enable efficient and reliable build, test, and release mechanisms for the Neuron…

Amazon (12/01/25)
Principal Software Developer - OCI AI…

Oracle (Nashville, TN)

…Work closely with a collaborative and experienced global team. - Expand your knowledge in AI, cloud computing, and distributed systems. - Contribute to one ... tools to operationalize Large Language Models (LLMs) and agentic AI systems. Our goal is to empower ... will contribute to the design and implementation of scalable, distributed systems that serve LLMs and support…

Oracle (11/25/25)
Principal Software Engineer- OCI AI…

Oracle (San Juan, PR)

…Work closely with a collaborative and experienced global team. - Expand your knowledge in AI, cloud computing, and distributed systems. - Contribute to one ... tools to operationalize Large Language Models (LLMs) and agentic AI systems. Our goal is to empower ... will contribute to the design and implementation of scalable, distributed systems that serve LLMs and support…

Oracle (11/25/25)
Senior Software Engineer, AI Resiliency

NVIDIA (Santa Clara, CA)

…and inference more reliable, scalable, and efficient. If you're passionate about AI, distributed systems, and high-performance computing, we want to hear ... driving down cluster downtime towards zero, ensuring that our AI systems remain robust and reliable ... detection. + Hands-On Coding & Optimization: Contribute to large-scale distributed systems with high-quality, production-level C++ and…

NVIDIA (10/15/25)
Software Developer 3 - OCI AI Platform

Oracle (Columbus, OH)

…. This is a highly technical, hands-on role where you'll build large-scale distributed systems, optimize AI/ML workflows, and collaborate with ... observability, CI/CD pipelines, and operational excellence. Troubleshoot complex issues in distributed systems and participate in on-call rotations as needed.…

Oracle (11/25/25)

Alerted.org
