• Senior Site Reliability Engineer, HPC

    NVIDIA (Santa Clara, CA)
    …As a Site Reliability Engineer, you are responsible for the big picture of how our systems relate to each other; we use a breadth of tools and approaches to tackle a ... + Manage and support workload and resource schedulers in a large-scale HPC environment. + Automate Everything: Develop automation scripts to automate deployment,…
    NVIDIA (04/04/25)
  • Senior System Software Engineer, NCCL…

    NVIDIA (Santa Clara, CA)
    …the crowd: + Experience conducting performance benchmarking and developing infrastructure on HPC clusters. Prior system administration experience, especially for ... runtimes like NCCL and NVSHMEM for Deep Learning and HPC applications. We are looking for a motivated Partner ... Develop tools and automation to isolate issues on new systems and platforms, including cloud platforms (Azure, AWS, GCP,…
    NVIDIA (04/22/25)
  • Senior High Performance Computing Engineer

    SLAC National Accelerator Laboratory (Menlo Park, CA)
    …these groundbreaking scientific projects. You will be responsible for the advanced administration of our Slurm batch system, alongside deploying, optimizing, and ... parallel applications (e.g., gdb, Valgrind, Nvidia Nsight). + In-depth knowledge of Linux operating systems and advanced shell scripting. + Proven expertise…
    SLAC National Accelerator Laboratory (04/26/25)
  • Analytics DevOps and Platform Engineer (Flex…

    UCLA Health (Los Angeles, CA)
    …IT professional with a strong foundation in cloud computing, Windows and Linux administration, Citrix virtualization, DevOps principles, and automation. The ... platforms (Azure, AWS) and on-premises data centers. + Possess expert knowledge of system administration and security protocols for both Windows and Linux…
    UCLA Health (02/20/25)
  • Sr. Storage Engineer

    The Walt Disney Company (Emeryville, CA)
    …file storage solutions incorporating NFS, SMB, and S3 protocols + Proficiency in Linux system administration, including automated OS installation, package ... the key technology pillars of storage, software tools, and Linux administration. As an essential team member, ... our studio. **RESPONSIBILITIES:** + Build and support our on-prem HPC storage systems + Develop software tools…
    The Walt Disney Company (03/26/25)
  • Systems Development Engineer - SRE, Kuiper

    Amazon (San Diego, CA)
    …no legacy constraints. The team works with customer requirements and wireless system teams to define modems, high-speed interfaces, embedded processors, and DSP ... quickly and confidently with robust verification frameworks that scale with our systems. About the team: The Kuiper Silicon teams deliver custom communication silicon…
    Amazon (05/01/25)
  • Director of Research Computing and Infrastructure

    UCLA Health (Los Angeles, CA)
    …+ Familiarity with PACS and related medical imaging and DICOM services + Windows, Linux, and FreeBSD system administration + Experience in services/programs ... overseeing the department's entire research computing infrastructure, including high-performance computing (HPC) and storage systems. Reporting to the Department…
    UCLA Health (05/04/25)