- The Voleon Group (Berkeley, CA)
- …a multibillion‑dollar asset manager, and we have ambitious goals for the future. As a Senior Cluster Site Reliability Engineer (SRE), you will help ... scale our research compute cluster to meet our growing needs, and you will ... leverage engineering skills to ensure high degrees of uptime, reliability, and robustness. Our research clusters are at the… more
- The Voleon Group (Berkeley, CA)
- A leading technology firm in Berkeley is seeking a Senior Cluster Site Reliability Engineer to ensure high uptime and manage operational issues for their ... research compute cluster. Candidates should have extensive SRE experience, knowledge of HPC frameworks, and scripting skills. The role emphasizes collaboration with… more
- Lawrence Berkeley National Laboratory (Berkeley, CA)
- …Lab's (LBNL) Information Technology Division (IT) has an opening for a Senior HPC Cluster Systems Administrator to join their ScienceIT Team! In this ... by building, integrating, and maintaining Linux-based resources, high-performance computing cluster systems, and Kubernetes clusters. This role provides extensive… more
- NVIDIA Corporation (Santa Clara, CA)
- …take great pride in providing excellent, comprehensive support to our customers! Sr Site Reliability Engineer in this role will significantly impact and ... experience in Computer Science or related field. 8+ years of experience in site reliability engineering and/or software development roles. Fluency in Python… more
- Fluidstack (San Francisco, CA)
- …regions". Building internal tooling to decrease deployment time and increase cluster reliability, including automation where the customer benefits clearly ... join us in building what's next. About the Role Senior / Staff SREs at Fluidstack sit at the ... working across software, hardware, and operations to ensure the reliability and performance of our global GPU cloud. They… more
- Pantera Capital (Palo Alto, CA)
- …knowledge with their teammates. About the Role We are seeking a highly skilled Senior Site Reliability Storage Engineer to join our mission-driven team, ... with up to 25% travel required. Required Qualifications 5+ years of experience as a Site Reliability Engineer or similar role, with a focus on building and… more
- Boson AI (Palo Alto, CA)
- About The Role We're looking for a Senior Site Reliability Engineer to help us run one of the most exciting GPU clusters around: our Toronto datacenter packed ... as we continue to scale. Responsibilities Manage and optimize HPC cluster operations Deploy and maintain infrastructure‑as‑code solutions Support ML/research teams… more
- Google Inc. (Sunnyvale, CA)
- …the ability to build consensus across organizational boundaries. About the job Site Reliability Engineering (SRE) combines software and systems engineering to ... Senior Staff Software Engineer, SRE, ML Fleet Systems ... Understanding of resource management systems (e.g., Borg, Kubernetes, Flex), cluster management, and scheduling algorithms. Familiarity with Machine Learning… more
- Pantera Capital (Palo Alto, CA)
- A leading tech firm in Palo Alto is seeking a Senior Site Reliability Storage Engineer to design and optimize Kubernetes clusters. The ideal candidate will ... Kubernetes orchestration and distributed systems. Responsibilities include managing system reliability, developing software for cluster provisioning, and… more
- Cadence Design Systems, Inc. (San Jose, CA)
- …experienced AI Systems Engineer to join our team. This is a hands-on, senior individual contributor role that will be pivotal in leading the development, operations, ... solutions, and networking to ensure optimal performance, scalability, and reliability for all our AI workloads. + Cloud AI ... services on both GCP and Azure. + Hands-on GPU Cluster Management: Take a leadership role in the configuration,… more