- Amazon (Cupertino, CA)
- …with trade-offs, SW development, testing and building diagnostic tools for continuous monitoring. A successful candidate in this role has prior understanding of ... software development experience - 6+ years of designing or architecting (design patterns, reliability and scaling) of new and existing systems experience - 6+ years…
- Hatch (Los Angeles, CA)
- …with minimal supervision. Communications systems and networks are used by operations departments to dispatch fleets, support critical safety systems, enable ... implementation, administration and optimization of enterprise network management systems and monitoring utilities such as HP OpenView, WhatsUp Gold, Nagios,…
- Amazon (San Diego, CA)
- …features such as elastic network interfaces, firewalls (security groups), routing, monitoring, and HW acceleration where we continuously push the bounds on ... performance. We are seeking a Software Engineer with experience in System Software / Embedded Software,...5+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience…
- Amazon (Sunnyvale, CA)
- …* You will develop code, build CI/CD pipelines, test automation, and dashboards for monitoring health of systems and data pipelines. * You will work with multiple ... mentor junior and new team members. * You will engineer and build a cloud service that is reliable...2+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience…
- Microsoft Corporation (Santa Clara, CA)
- …, efficiency, observability, and performance of products while also driving consistency in monitoring and operations at scale. + Coaching and mentorship of ... with our server and data center infrastructure, security and compliance, operations, globalization, and manageability solutions. Our focus is on smart growth,…
- Sandia National Laboratories (Livermore, CA)
- …to enhance our nation's security! We are seeking an experienced Software Engineer (job title: R&D S&E Cybersecurity) to design and implement cutting-edge solutions ... software system implementation, technical documentation, unit and integration testing, reliability and performance assessment, and/or systematic technology insertion. +…
- Walmart (Sunnyvale, CA)
- …inference pipelines. A strong focus on performance, security, and reliability is essential. The candidate should demonstrate organizational-level architectural ... Establish the gold standard for non-functional requirements, including system reliability, security, cost-efficiency, and the extreme low-latency performance essential…
- Amazon (East Palo Alto, CA)
- …for efficient model training and fine-tuning on massive datasets. - Develop robust monitoring and debugging tools to ensure the reliability and performance of ... that are used by millions of companies worldwide to manage day-to-day operations. We will accomplish this by accelerating our customers' businesses through delivery…
- NVIDIA (Santa Clara, CA)
- …monitoring and health management capabilities that enable industry-leading reliability, availability, and scalability of GPU assets. You will be harnessing ... to have a strong programming background, knowledge of datacenter hardware, operations, and networking, familiarity with software testing and deployment, familiarity…
- NVIDIA (Santa Clara, CA)
- …to have significant software engineering experience with Kubernetes including cluster operations, operator development, node health monitoring and working with ... software related to scheduling GPU resources on Kubernetes. + Implementing monitoring and health management capabilities that enable industry leading reliability…