• Senior DGX Cloud Software Engineer

    NVIDIA (Santa Clara, CA)
    …operational capacity of our bare-metal, accelerated compute infrastructure and codify reliability best practices in the broader DGX Cloud platform ecosystem. What ... working with or developing multi-cloud infrastructure services. Experience teaching reliability engineering (e.g., SRE) and/or other scale-oriented cloud systems…
    NVIDIA (07/26/25)
  • Senior GPU and HPC Infrastructure…

    NVIDIA (Santa Clara, CA)
    …+ Implement monitoring and health management capabilities that enable industry-leading reliability, availability, and scalability of GPU assets. You will be ... systems (Kubernetes, SLURM) + Understanding of performance, security and reliability in complex distributed systems. Familiarity with system level architecture,…
    NVIDIA (07/10/25)
  • Senior, Data Engineer

    Carrington (Anaheim, CA)
    …designs and functional specifications to ensure performance, scalability, and reliability while contributing to future cloud migration efforts. + Collaborates ... designs, and functional specifications to ensure performance, scalability, and reliability. + Develops and maintains comprehensive documentation of data processes,…
    Carrington (07/08/25)
  • Senior Software Engineer, Bare…

    NVIDIA (Santa Clara, CA)
    …Implementing monitoring and health management capabilities that enable industry-leading reliability, availability, and scalability of GPU assets. You will be ... Working with teams across NVIDIA to ensure production AI clusters run reliably and consistently with maximum performance. Evaluating system failures and improving…
    NVIDIA (06/30/25)
  • Senior Silicon Circuits System Design…

    NVIDIA (Santa Clara, CA)
    …and board designers, software/firmware engineers, HW/SW applications engineering, process/reliability specialists, ATE engineers, product managers, sales, and ... path analysis, power analysis, process technologies, transistor/device physics, silicon reliability, and aging mechanisms. + Familiarity with Perl, C/C++, tool…
    NVIDIA (06/13/25)
  • Senior Software Engineer

    Microsoft Corporation (Mountain View, CA)
    …and code quality. + 1+ year(s) of experience applying site-reliability engineering (SRE) practices, including monitoring, incident response, and improving ... system resilience. Software Engineering IC4 - The typical base pay range for this role across the US is USD $119,800 - $234,700 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City…
    Microsoft Corporation (08/21/25)
  • Senior/Software Engineer III…

    Walmart (Sunnyvale, CA)
    …impacting the complete product for non-functional requirements like reliability, operability, performance efficiency and security. Troubleshoot performance and ... availability bottlenecks for the application. + Develop, maintain, and enhance automated test cases and deployment procedures. + Follow coding and design best practices developed by the teams and contribute towards their continuous improvement. What you'll…
    Walmart (08/21/25)
  • Senior Prediction and Planning…

    NVIDIA (Santa Clara, CA)
    …to deploy AI models in production environments, ensuring performance, safety, and reliability standards are met. + Integrate machine learning models directly with ... vehicle firmware to deliver production-quality, safety-critical software. What We Want to See: + Hands-on experience building LLMs, VLMs, or VLAs from scratch or a proven track record as a top-tier coder passionate about autonomous systems. + BS/MS in Computer…
    NVIDIA (08/19/25)
  • Senior Network Engineer

    Newegg Inc. (Diamond Bar, CA)
    …architectural changes and design enhancements to the infrastructure to improve reliability, redundancy, and performance, reduces costs and anticipates Company growth ... and acquisitions. * Ensure that all network-related procedures and policies are documented including diagrams, disaster recovery and all network configurations. * Monitor system performance and provide security measures, troubleshooting and maintenance as…
    Newegg Inc. (08/19/25)
  • Senior DevOps Engineer

    NVIDIA (Santa Clara, CA)
    …+ Monitor and optimize system health, performance, build/test throughput, and pipeline reliability. + Collaborate closely with Omniverse and Isaac developer teams to ... smooth release workflows, increase velocity, and enforce code + test quality. What we need to see: + BS (or equivalent experience) with 5+ years of professional experience in DevOps, SRE, or Build/Release engineering roles at similar scale. + Fluent in Python…
    NVIDIA (08/16/25)