Principal Applied Scientist
Microsoft Corporation (Redmond, WA)
Overview
Microsoft is a company where passionate innovators come to collaborate, envision what can be, and take their careers further. This is a world of more possibilities, more innovation, more openness, and sky's-the-limit thinking in a cloud-enabled world.
Microsoft’s Azure Data engineering team is leading the transformation of analytics in the world of data with products like databases, data integration, big data analytics, messaging & real-time analytics, and business intelligence. The products in our portfolio include Microsoft Fabric, Azure SQL DB, Azure Cosmos DB, Azure PostgreSQL, Azure Data Factory, Azure Synapse Analytics, Azure Service Bus, Azure Event Grid, and Power BI. Our mission is to build the data platform for the age of AI, powering a new class of data-first applications and driving a data culture.
Within Azure Data, the messaging and real-time analytics team provides comprehensive solutions and a robust platform that enables users to ingest high-granularity signals (real-time & observability) and complex data, converting them into a competitive advantage in real time for both end users and modern applications.
Within the Microsoft Fabric product pillar, the Real-Time Intelligence (RTI) team is hiring a Principal Applied Scientist to lead the science of evaluation (evals) and improvement of LLM-powered agents operating on live operational data. This role focuses on building end-to-end evaluation systems for agentic workflows, covering planning, tool use, retrieval, safety, and end-user outcomes, and turning them into flywheels that continuously raise agent quality, reliability, and business impact.
What makes RTI unique is its deep integration across Fabric’s real-time surfaces, rich instrumentation on event-level data, and shared ML/LLM evaluation platforms that let us ship science rapidly across multiple experiences. In this role, you’ll partner closely with engineering and product to architect low-latency evaluation and monitoring pipelines, design offline and online experiments (including LLM-as-judge and human-in-the-loop workflows), and define the quality standards that govern agents from initial research through deployment and continuous improvement.
We do not just value differences or different perspectives. We seek them out and invite them in so we can tap into the collective power of everyone in the company. As a result, our customers are better served.
Responsibilities
- Lead end-to-end science for evaluating LLM-powered agents on real-time and batch workloads: designing evaluation frameworks, metrics, and pipelines that capture planning quality, tool use, retrieval, safety, and end-user outcomes, and partnering with engineering for robust, low-latency deployment.
- Advance evaluation methodologies for agents across RTI surfaces by driving test set design, auto-raters (including LLM-as-judge), human-in-the-loop feedback loops, and measurable lifts in key quality metrics such as task success rate, reliability, and safety.
- Establish rigorous evaluation and reliability practices for LLM/agent systems: from offline benchmarks and scenario-based evals to online experiments and production monitoring, defining guardrails and policies that balance quality, cost, and latency at scale.
- Collaborate with PM, Engineering, and UX to translate evaluation insights into customer-visible improvements, shaping product requirements, de-risking launches, and iterating quickly based on telemetry, user feedback, and real-world failure modes.
- Provide technical leadership and mentorship within the applied science and engineering community, fostering inclusive, responsible-AI practices in agent evaluation, and influencing roadmap, platform investments, and cross-team evaluation strategy across Fabric.
- Embody our culture (https://careers.microsoft.com/v2/global/en/culture) and values (https://www.microsoft.com/en-us/about/corporate-values)
Qualifications
Required/Minimum Qualifications
- Bachelor's Degree in Statistics, Computer Science, Electrical or Computer Engineering, or related field AND 8+ years related experience
OR Master's Degree in Statistics, Computer Science, Electrical or Computer Engineering, or related field AND 6+ years related experience
OR Doctorate in Statistics, Computer Science, Electrical or Computer Engineering, or related field AND 5+ years related experience
OR equivalent experience.
Other Requirements
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Preferred/Additional Qualifications
- 2+ years designing and running ML/LLM evaluation and experimentation (offline metrics + online A/B tests)
- Proven experience applying machine learning, statistics, and measurement science to LLM and agent evaluation, ideally in real-time or streaming scenarios.
- Proficiency in agentic AI concepts (e.g., multi-step agents, tool orchestration, retrieval/RAG, workflow automation) and familiarity with techniques for assessing safety, robustness, anomaly detection, and causal impact of agent behaviors.
- Strong programming and modeling skills in languages such as Python, and experience building evaluation services or pipelines on distributed systems (e.g., running large-scale offline evals, auto-raters, or LLM-as-judge workloads).
- Ability to design, implement, and interpret rigorous evaluations end-to-end: constructing eval sets and scenarios, combining offline metrics with human/LLM raters, running online experiments (A/B tests, holdouts), and instrumenting reliability monitoring at scale.
- Collaborative mindset with demonstrated success partnering across Engineering, PM, and UX to define quality bars, translate evaluation insights into roadmap decisions, and iterate quickly on customer-facing agent and LLM experiences.
#azdat
#azuredata
Applied Sciences IC6 - The typical base pay range for this role across the U.S. is USD $163,000 - $296,400 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $220,800 - $331,200 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations. (https://careers.microsoft.com/v2/global/en/accessibility.html)