Senior Software Technical Program Manager - GPU…
- NVIDIA (Santa Clara, CA)
We are looking for an experienced, highly motivated Senior Software Technical Program Manager to lead our efforts in developing pioneering compute software solutions for critically important environments. Our work has made a major impact in various fields and is used across leading academic institutions, start-ups, and industry! This is an outstanding opportunity to lead and manage our communication libraries, such as NCCL, NVSHMEM, and UCX, for Deep Learning and HPC. We need passionate, hard-working, and creative people to help us reach our engineering goals.
What you will be doing:
In this GPU Communication Libraries role, you will collaborate closely with SW Development Managers, Engineers, Product Marketing, Customer Program Management, Quality Assurance, and other logistics personnel to establish and implement streamlined processes for the development of advanced Compute Software solutions for cloud service providers and OEM customers. You will collect requirements, help define priorities, remove blockers, and drive planning and scheduling for all phases of the software development lifecycle. Additionally, you'll be responsible for continuously improving and maintaining all processes related to enterprise support, and for establishing processes for next-gen architecture and feature engagements so that opportunities to influence changes in HW architecture are not missed. You will have the opportunity to partner with diverse technical groups, spanning all organizational levels.
+ Lead status meetings, proactively address challenges and customer concerns, and serve as the primary POC for building and upholding prioritized release schedules and plans.
+ Strategically plan and partner across NVIDIA teams to drive software objectives while maintaining schedules and formulating risk management strategies for risks identified across multiple parallel work streams.
+ Lead existing product development enhancements and software release processes, while collaborating with engineering management to optimize the development workflow and efficiency.
+ Translate customer requirements into actionable internal milestones and tasks, ensuring customers are continually informed of issue status.
+ Drive virtual reviews and establish continuous feedback loops by communicating benchmarking results and customer insights to product and engineering leadership.
+ Track and report large-scale performance benchmarking across all clusters. Build performance dashboards and reporting processes to monitor KPIs and surface performance trends.
+ Collaborate with internal teams and third-party partners across time zones, as necessary, to resolve customer issues and oversee customer releases.
+ Partner with Customer Program Managers to address software issues, including technical feedback from OEMs, CSPs, and partners.
What we need to see:
+ 12+ years of overall experience in the software industry with specialization in HPC networking or system software.
+ 6+ years program management experience in a similar or related role.
+ BS, MS, or Ph.D. in CS, CE, EE (related technical field) or equivalent experience.
+ Hands-on experience with software development for hardware platforms, communication runtimes, or high-performance networking, with demonstrated success in delivering these complex products to customers.
+ Proficiency in Agile software development methodologies.
+ Proven ability to creatively resolve technical and resource issues, and to think strategically and tactically while building consensus to ensure program success.
+ Comprehensive understanding of software engineering principles, including experience with widely-adopted configuration management tools and productivity-enhancing tools and automation processes.
+ Exceptional attention to detail and a demonstrated capacity for multitasking in a dynamic environment with shifting priorities and changing requirements.
+ Strong communication and technical presentation skills and ability to work independently and actively with minimal guidance.
+ Previous experience coordinating activities between HW and SW organizations.
Ways to stand out from the crowd:
+ Solid understanding of the Deep Learning Framework ecosystem for Training and Inference.
+ Solid understanding of operating systems, datacenter servers, graphics principles and standards.
+ Background with parallel programming models (MPI, SHMEM) and at least one communication runtime (MPI, NCCL, NVSHMEM, OpenSHMEM, UCX, UCC).
+ Knowledge of a modern programming language is desired, as well as depth in HPC and ML/DL fundamentals.
+ Background with RDMA, high-performance networking technologies (InfiniBand, RoCE, Ethernet, EFA), network architecture and network topologies.
Our technology has no boundaries! NVIDIA is building the world’s most groundbreaking and innovative compute platforms for the world to use. At the center of NVIDIA's culture are our core values, such as innovation, excellence, determination, and team, which guide us to be the best we can be.
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 192,000 USD - 304,750 USD.
You will also be eligible for equity and benefits (https://www.nvidia.com/en-us/benefits/).
Applications for this job will be accepted at least until August 24, 2025.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.