Senior ML Software Engineer - Quantization…
- Microsoft Corporation (Redmond, WA)
Overview
Do you want to be at the forefront of innovating the latest hardware designs to propel Microsoft’s cloud growth? Are you seeking a unique career opportunity that combines technical capabilities and cross-team collaboration with business insight and strategy?
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to achieve our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.
Join the Strategic Planning and Architecture (SPARC) team within Microsoft’s Azure Hardware Systems and Infrastructure (AHSI) organization, the team behind Microsoft’s expanding cloud infrastructure and responsible for powering Microsoft’s “Intelligent Cloud” mission. Microsoft delivers more than 200 online services to more than one billion individuals worldwide. We deliver the core infrastructure and foundational technologies for Microsoft's cloud businesses, including Microsoft Azure, Bing, MSN, Office 365, OneDrive, Skype, Teams, and Xbox Live.
Responsibilities
+ Design and develop novel quantization and numerics kernels to enable efficient deployment of LLM inference and training in Microsoft’s Azure production environments.
+ Drive proof-of-concept efforts in software development and model optimization tooling to streamline deployment of quantized models.
+ Analyze performance bottlenecks in quantized state-of-the-art LLM architectures and drive performance improvements.
+ Prototype and evaluate emerging low-precision data formats through proof-of-concept implementations on a novel hardware accelerator SDK.
+ Co-design model architectures optimized for low-precision deployment in close collaboration with company-wide AI/ML teams.
+ Work cross-functionally with data scientists and ML researchers/engineers across organizations to align on model accuracy and performance goals.
+ Partner with hardware architecture and AI software framework teams to ensure end-to-end system efficiency.
Qualifications
Required/Minimum Qualifications
+ Bachelor's Degree in Computer Science, Electrical or Computer Engineering, or related field AND 4+ years of industry experience in high-performance ML systems, GPU kernel development, or ML runtime/infrastructure development
+ OR Master's Degree in Computer Science, Electrical or Computer Engineering, or related field AND 3+ years of industry experience in high-performance ML systems, GPU kernel development, or ML runtime/infrastructure development
+ OR Doctorate in Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) of industry experience in high-performance ML systems, GPU kernel development, or ML runtime/infrastructure development.
Other Requirements:
+ Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Preferred Qualifications
+ Demonstrated experience delivering production-grade software in areas such as model compression, low-precision numerics (FP8, INT8/4, NVFP4, MX formats, etc.), low-level kernel development, and performance optimization.
+ Proficiency with modern deep learning frameworks, including PyTorch, TensorFlow, TensorRT, and ONNX Runtime.
+ Expertise in GPU/NPU kernel development using CUDA, Triton, ROCm, or comparable frameworks, and in fast model bring-up on a new stack.
+ Strong understanding of Transformer and LLM architectures, with hands-on experience in optimization techniques such as quantization, pruning, tensor/parameter sharding, model parallelism, KV-cache optimization, and Flash Attention.
+ Practical experience with large-scale model evaluation, including benchmarking state-of-the-art LLMs and fine-tuning (SFT or RL) large models.
+ Solid programming skills in Python, C, and C++.
+ Excellent communication abilities and a proven capacity to collaborate effectively in hybrid, team-oriented environments.
+ Hands-on experience implementing and optimizing low-level linear algebra routines, including custom BLAS kernels, would be a plus.
+ Deep knowledge of mixed-precision arithmetic units, including numerical formats and microarchitecture, is highly desirable.
Applied Sciences IC4 - The typical base pay range for this role across the U.S. is USD $119,800 - $234,700 per year. A different range applies to specific work locations within the San Francisco Bay Area and New York City metropolitan area; the base pay range for this role in those locations is USD $158,400 - $258,000 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations. (https://careers.microsoft.com/v2/global/en/accessibility.html)