- TE Connectivity (CA)
- …optical transceivers, near/co-package optical transceivers, optical interconnects for advanced AI/HPC environment, compute, storage, and networking hardware ... + Define novel, scalable CPO/NPO architectures to meet future performance, power, and density demands of AI,...AI, ML, and hyperscale workloads. + Architect CPO/NPO systems across optical, electrical, thermal, and mechanical domains. +…
- Meta (Menlo Park, CA)
- …These workloads expect a loss-less fabric interconnect with minimal latency. To improve performance of these systems we constantly look for opportunities across ... that the network is running smoothly and meets stringent performance and availability requirements of RDMA (Remote Direct Memory...responsible for design, model, develop, test, deploy and operate AI/HPC Networks at scale 2. Provide continual…
- NVIDIA (Santa Clara, CA)
- …detection and remediation of performance and reliability issues. + Optimize AI/ML and HPC workloads by crafting intelligent caching, low-latency storage ... Team's internal and partner-facing storage environments. We focus on delivering high-performance, highly available storage systems that scale while enabling…
- SLAC National Accelerator Laboratory (Menlo Park, CA)
- …labs. Our team also works on advanced computational workflows linking high performance computing (HPC) with experiments in real-time, in collaboration with ... fill open positions to work with our team on AI/ML solutions for challenging problems in modeling and control...unprecedented capabilities in modeling and control of complex, nonlinear systems in particle accelerators, which in turn enables new…
- NVIDIA (Santa Clara, CA)
- …GPUs and SOCs powering product lines for the growing field of artificial intelligence (AI) and high-performance computing (HPC). What you'll be doing: + ... Today, NVIDIA is tapping into the unlimited potential of AI to define the next era of computing. An...features to improve system Reliability, Availability, Serviceability (RAS), and performance in the Datacenter. + Model and analyze RAS…
- NVIDIA (Santa Clara, CA)
- NVIDIA is hiring engineers to scale up its AI Infrastructure. We expect you to have a strong programming background, a deep understanding of distributed systems, ... capacity to build and deploy leading infrastructure solutions for a broad range of AI-based applications that affect core data science. What are you waiting for if…
- NVIDIA (Santa Clara, CA)
- …NVLink, NVIDIA InfiniBand networking, NVIDIA Grace CPUs, and a fully optimized NVIDIA AI and HPC software stack. NVIDIA NVLink Fusion will enable ... center platform & node designs. From single node HGX/DGX systems all the way up to large multi-node NVLink...industry-leading AI scale-up and scale-out performance with NVIDIA…
- Meta (Menlo Park, CA)
- …10. Experience in leading teams working on high performance computing (HPC) and AI/ML systems, including: GPU/ASIC-based kernel development and ... ROCm), distributed systems for large scale training and serving, and systems architecture and performance 11. Accelerator (GPU/ASIC) kernel development and…
- NVIDIA (Santa Clara, CA)
- …works on multimodal foundation models, large-scale robot learning, embodied AI, and physics simulation. Our past projects include Eureka ... What you will be doing: + Design and maintain large-scale distributed training systems to support multi-modal foundation models for robotics. + Optimize GPU and…
- NVIDIA (Santa Clara, CA)
- …GPUs and SOCs powering product lines for the growing field of artificial intelligence (AI) and high-performance computing (HPC). What you'll be doing: + ... features to improve system Reliability, Availability, Serviceability (RAS), and performance in the Datacenter. + Model and analyze RAS...parallel computing. More recently, GPU deep learning ignited modern AI - the next era of computing - with…