Alerted.org


  • Senior Reliability Engineer

    Microsoft Corporation (Mountain View, CA)




    The Firmware Deployment team within Microsoft’s Silicon Cloud Hardware Infrastructure Engineering (SCHIE) organization is responsible for building and operating world-class software and data-driven services that support Azure’s hardware infrastructure development. Our mission is to enable safe, reliable, and intelligent deployment of firmware payloads across the Azure fleet, ensuring system health and operational quality at scale.

     

    We are seeking a **Site Reliability Engineer**. Within the Firmware Deployment team, you will be instrumental in shaping the future of the Azure fleet. Your primary responsibility will be developing and applying stable firmware releases across the GPU fleet, and potentially supporting other related environments. This work is essential to maintaining Microsoft’s security and performance standards while delivering an outstanding experience for our customers.

     

    Your efforts in deploying and managing firmware updates will ensure the reliability and efficiency of Azure’s hardware infrastructure. By focusing on stability and operational excellence, you will help safeguard system health and contribute to the ongoing success and growth of Azure’s global infrastructure.

    Responsibilities

    + Build and bring specialized knowledge across multiple production aspects (monitoring, release engineering, testing, live site excellence, buildout, performance optimization, capacity management).

    + Analyze large-scale telemetry and operational data to uncover insights and drive data-informed decisions.

    + Apply a proven set of principles and practices such as safe deployment, testing for reliability, elimination of single points of failure, disaster recovery, SLO-based monitoring, throttling, infrastructure management automation, post-mortem excellence, and adoption of common systems.

    + Respond to alerts and incidents.

    + Build and follow playbooks to drive root cause analysis and reviews.

    + Partner with hardware and firmware teams to understand system behavior and identify opportunities for predictive analytics.

    + Participate in an on-call rotation, including availability during non-standard business hours, and contribute to service reliability and incident resolution.

    Qualifications

    Required/minimum qualifications:

    + Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience.

    + 3+ years of experience in software engineering or operations for large-scale distributed systems.

    + Ability to support a 24x7 data center environment, including participation in an on-call rotation and availability during non-standard business hours (evenings, nights, weekends, or holidays) as operational needs require.

    + Proficiency in one or more programming languages (C#, Python, Go, or similar).

    + Understanding of cloud infrastructure (Azure preferred), networking, and system design.

    + Familiarity with monitoring tools, incident management frameworks, and DevOps practices.

    Other Requirements:

    Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:

    Microsoft Cloud Background Check:

    This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

     

    Site Reliability Engineering IC4 - The typical base pay range for this role across the U.S. is USD $119,800 - $234,700 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $158,400 - $258,000 per year.

     

    Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

     

    Microsoft will accept applications for the role until October 24th, 2025.

     

    #SCHIE #AZURE #Cloud **#MSCareerEvents25**

     

    Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations (https://careers.microsoft.com/v2/global/en/accessibility.html).

     





© 2025 Alerted.org