• Senior DGX Cloud AI Infrastructure Software…

    NVIDIA (Santa Clara, CA)
    …APIs for integration with NVIDIA's resiliency stacks. + Define meaningful and actionable reliability metrics to track and improve system and service reliability. ... + Skilled in problem-solving, root cause analysis, and optimization. What we need to see: + Minimum of 8+ years of experience in developing software infrastructure for large scale AI systems. + Bachelor's degree or higher in Computer Science or a related…
    NVIDIA (08/02/25)
  • 2026 Facilities Engineering Chemical Engineer

    Chevron Corporation (El Segundo, CA)
    …may choose to focus on other areas of interest, including: * **Reliability & Integrity Engineering** - focused on asset integrity, reliability, maintainability, ... and operability of operating assets. * **Technical Safety Engineering** - identify hazards, assess risk, and develop mitigation strategies through application of engineering design principles to a managed level of risk for operating facilities and capital…
    Chevron Corporation (08/02/25)
  • Security Officer- Access Control

    Allied Universal (Santa Clara, CA)
    …your attention to detail and commitment to Allied Universal's values - agility, reliability, innovation, teamwork, and integrity - will be highly valued. Join our team ... procedures. **What We're Looking For:** + Availability across various days and shifts + Reliability and ability to adapt to different post assignments + A desire to…
    Allied Universal (08/02/25)
  • Executive Support Technician

    Meta (Menlo Park, CA)
    …and support the latest technology while maintaining a high level of reliability to ensure seamless operations for the executive team. 3. Maintain confidentiality ... in the pursuit of resolving complex issues while maintaining a high level of reliability. 6. Work across the industry to improve or resolve bugs with third-party…
    Meta (08/01/25)
  • Mechanical Engineer, Data Center Engineering

    Meta (Fremont, CA)
    …benchmark of innovation for the data center industry and improve reliability, safety, and efficiency through innovative solutions, enabling Infrastructure teams to ... and develop new data center cooling technologies/systems to reduce cost, improve reliability, efficiency, and speed to market. Contribute to development of data…
    Meta (08/01/25)
  • Network Engineer

    Meta (Menlo Park, CA)
    …an active participant in deep technical discussions on how to improve the reliability and efficiency of Meta's networks. 6. Collaborate with partner teams in ... 12. Work on performance measurement to ensure improved efficiency and reliability of the network. **Minimum Qualifications:** 13. Requires…
    Meta (08/01/25)
  • Package Design Engineer

    Meta (Menlo Park, CA)
    …products 18. Lead package development to establish package manufacturability and reliability 19. Collaborate with multi-functional teams within Meta and define ... products 24. Lead package development to establish package manufacturability and reliability 25. Collaborate with multi-functional teams within Meta and define…
    Meta (08/01/25)
  • Research Scientist, Robotics Systems

    Meta (Burlingame, CA)
    …problems, as well as complex robotics that balance ergonomics, performance, reliability, and usability while maintaining the highest safety and quality standards. ... communication hardware/firmware and systems integration while balancing ergonomics, performance, reliability 4. Research and design planning and control algorithms…
    Meta (08/01/25)
  • Application Manager, Identity and Access…

    Meta (Fremont, CA)
    …testing, deployment, and ongoing maintenance 2. System Performance & Reliability: Monitor application performance, identify bottlenecks, and implement solutions to ... ensure high availability, reliability, and scalability 3. Security & Compliance: Ensure IAM applications adhere to internal security policies, industry best…
    Meta (08/01/25)
  • Software Engineer, SystemML - AI Networking

    Meta (Menlo Park, CA)
    …SW stacks around NCCL and PyTorch to improve the full-stack distributed ML reliability and performance (e.g., Large-Scale GenAI/LLM training) from the trainer down to ... seeking for engineers to work on the space of GenAI/LLM scaling reliability and performance. **Required Skills:** Software Engineer, SystemML - AI Networking…
    Meta (08/01/25)