Reliable Distributed AI Systems Jobs

379 jobs (page 22)

Staff Cloud Software Engineer, Backend

TP-Link North America, Inc. (Irvine, CA)

…Azure, OCI) and cloud-based databases (e.g., MongoDB, SQL databases). + Proficiency in distributed systems, including middleware such as message queues. + Advanced ... the United States, TP-Link Systems Inc. is a global provider of reliable networking devices and smart home products, consistently ranked as the world's top…

TP-Link North America, Inc. (10/10/25)

Senior, Software Engineer

Walmart (Bentonville, AR)

…technologies such as Kafka for building scalable, event-driven architectures and ensuring reliable data streaming between distributed systems. + Experience ... related field, with 5+ years of experience in large-scale distributed systems. + Strong communication skills and ... and analytics. Nice to have: + Familiarity with AI/GenAI, LLMs, chatbots, and using AI tools…

Walmart (11/22/25)

Sr. Research Engineer, Machine Learning, AGI…

Amazon (Lockbourne, OH)

…of the fundamentals of Computer Science, and practical experience building large-scale distributed systems. This person has thrived and succeeded in delivering ... to lead the development of industry-leading models with multimodal systems. As a Senior SDE with the AGI team, ... Large Language Models (LLMs) and Generative Artificial Intelligence (Gen AI). You will have significant influence on our overall…

Amazon (10/22/25)

Site Reliability Engineer (Senior or Staff), Atlas

MongoDB (Pittsburgh, PA)

…As a senior SRE, you will be expected to be able to design & build complex systems, operate with autonomy and act as owner for everything you do. The SRE Atlas team ... the various Atlas software engineering teams to provide expertise about running systems at scale, build new tooling and automation and perform essential maintenance…

MongoDB (10/10/25)

Manager, SRE FedRAMP

Cisco (Chicago, IL)

…Lead a team of super smart engineers who are passionate about large scale distributed systems for Splunk Cloud Observability in FedRAMP environments + Manage ... etc. + Excellent problem-solving, triaging, and debugging skills in large-scale distributed systems **Preferred Qualifications** + Familiarity working with…

Cisco (11/20/25)

Software Engineer - Commerce + Ecosystems

Microsoft Corporation (Redmond, WA)

…As a Software Engineer, the candidate will design, develop, and maintain robust, reliable, and highly distributed software systems using modern technologies. ... thrive at work and beyond. **Responsibilities** + Designs, develops, and maintains distributed software systems using modern technologies. + Collaborates with…

Microsoft Corporation (12/05/25)

Senior HPC Cluster Engineer - EDA

NVIDIA (Santa Clara, CA)

…HPC including InfiniBand, RDMA and RoCE. + Understanding of fast, distributed storage systems such as Lustre and GPFS for AI/HPC workload. + Familiarity with ... ). + Experience analyzing and tuning performance for a variety of AI/HPC workloads. Excellent problem-solving to analyze complex systems, identify bottlenecks,…

NVIDIA (12/10/25)

Manager, Production Engineering

Meta (Boston, MA)

…you will work closely with other engineers and researchers to ensure that our AI training infrastructure is reliable, efficient, and scalable. You will also have ... **Summary:** The AI Production Engineering team at Meta is responsible ... the advancement of the field. Production Engineering is a hybrid software/systems group that ensures Meta's services and products run…

Meta (11/01/25)

Senior Software Development Engineer in Test…

NVIDIA (Santa Clara, CA)

…in automation farm or in cloud. You will continuously innovate and develop scalable, reliable, high performance systems and tools to enable the next generation ... develop test content using C/C++? Do you excel using AI tools to aid in solving complex issues? We'd ... large scale, running hundreds of tests per day in distributed heterogeneous servers with NVIDIA's GPUs connect to verify…

NVIDIA (12/12/25)

Principal Software Engineer - DGX Cloud Kubernetes…

NVIDIA (WA)

…proficiency in Go and experience building scalable Go services that manage complex distributed systems + Hands-on experience with Helm, Kustomize, and managing ... to seamlessly install, upgrade, and manage cluster runtime packages powering NVIDIA's AI Accelerators. You'll work on innovative controller systems that manage…

NVIDIA (11/07/25)