- Meta (Menlo Park, CA)
- …domains: High speed networking (RDMA), Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, performance ... large-scale GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. Currently, one of the…
- Meta (Menlo Park, CA)
- …following machine learning/deep learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, ... large-scale GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. Currently, one of the…
- Oracle (Raleigh, NC)
- …conversational search, and summarization. + Work with Oracle Vector Database and other retrieval systems to optimize AI performance. + Build and optimize ... improve quality and close gaps. + Stay current with emerging trends in AI infrastructure, agent frameworks, HPC systems, and cloud-native technologies;…
- Oracle (Nashville, TN)
- …networking, HPC, or GPU infrastructure. + Expertise in designing data feedback systems that improve AI model performance through continuous learning. + ... and compute platforms. In this role, you will design and deliver AI-powered systems for predictive incident detection, automated remediation, and root-cause…
- Amazon (Seattle, WA)
- …directly with silicon architects and compiler teams to push the boundaries of AI acceleration * Drive performance benchmarking and tuning that directly impacts ... and ML framework internals * Strong understanding of distributed systems and ML optimization * Passion for performance ... AI accelerators - Experience in developing CUDA kernels, HPC and inference optimization, tensor operations Amazon is an…
- SHI (Lansing, MI)
- …architecture, presales engineering, or datacenter solution design, including 5+ years dedicated to AI infrastructure or HPC systems. + Strong understanding ... as the chief architect and trusted technical strategist for multimillion-dollar AI and datacenter modernization opportunities, ensuring performance, scalability,…
- Oracle (Santa Clara, CA)
- …or Scala + Proven experience designing, implementing, and managing infrastructure for AI/ML or HPC workloads. + Understanding machine learning frameworks and ... monitoring (e.g., Jenkins, GitLab CI/CD, Prometheus). + Strong experience with High-Performance Computing systems **Responsibilities** + Take ownership of problems…
- Cisco (Research Triangle Park, NC)
- …security, networking, observability, and more. We design, build, and maintain high-performance compute and AI platforms, including NVIDIA DGX and Cisco-UCS ... Remote within USA Meet the Team At Cisco, the AI Infrastructure Services team is at the forefront of ... years Experience deploying and administering NVIDIA (DGX) or equivalent high-performance-compute (HPC) clusters (e.g., Cray, HPE, IBM).…
- Oracle (Nashville, TN)
- …automation, and diagnostic services. These are essential for running distributed AI/ML/HPC workloads across thousands of GPUs, leveraging technologies like ... CPU, Network, Storage with the goal to optimize customer experience and customer workload performance on our AI infrastructure. + Develop "best-in-class" AI…
- NVIDIA (Santa Clara, CA)
- …that shape roadmap decisions, set performance goals, and define standards for real-time AI systems. + Mentor and elevate the team, guiding peers and junior ... Electrical Engineering, or related fields. + Deep passion for computer vision, real-time AI, and sensor-driven systems, with an ability for translating research…