Alerted.org


  • Principal Site Reliability Engineer

    UKG (Ultimate Kronos Group) (Lowell, MA)




    Company Overview

     

    With 80,000 customers across 150 countries, UKG is the largest U.S.-based private software company in the world. And we’re only getting started. Ready to bring your bold ideas and collaborative mindset to an organization that still has so much more to build and achieve? Read on.

     

    At UKG, you get more than just a job. You get to work with purpose. Our team of U Krewers is on a mission to inspire every organization to become a great place to work through our award-winning HR technology built for all.

     

    Here, we know that you’re more than your work. That’s why our benefits help you thrive personally and professionally, from wellness programs and tuition reimbursement to U Choose, a customizable expense reimbursement program that can be used for more than 200 needs that best suit you and your family, from student loan repayment to childcare to pet insurance. Our inclusive culture, active and engaged employee resource groups, and caring leaders value every voice and support you in doing the best work of your career. If you’re passionate about our purpose (people), then we can’t wait to support whatever gives you purpose. We’re united by purpose, inspired by you.

    About the Team

    Site Reliability Engineers at UKG are critical team members who have a breadth of knowledge encompassing all aspects of service delivery. They develop software solutions to enhance, harden, and support our service delivery processes. This work can include building and managing CI/CD deployment pipelines, automated testing, capacity planning, performance analysis, monitoring, alerting, chaos engineering, and auto-remediation.

     

    Site Reliability Engineers must be passionate about learning and evolving with current technology trends. They strive to innovate and are relentless in pursuing a flawless customer experience. They have an “automate everything” mindset, helping us bring value to our customers by deploying services with incredible speed, consistency, and availability.

     

    About the Role

     

    Site Reliability Engineers (SREs) at UKG play a critical role in delivering scalable, reliable, and secure services to our customers. As Principal SRE, you will be a force multiplier—combining deep software engineering expertise with systems knowledge to build robust automation, drive operational excellence, and elevate the overall reliability of our services.

     

    This role is highly technical and hands-on. You will design and implement solutions that eliminate toil and optimize performance, including developing automated testing frameworks, intelligent alerting systems, and self-healing mechanisms.

    Responsibilities

    - Architect, develop, and maintain scalable automation, internal tools, health checks, monitoring, and auto-remediation to improve service availability, reliability, latency, scalability, and system resiliency, ensuring that services withstand failures and recover gracefully to maintain high availability.

     

    - Lead incident response efforts to minimize customer impact and reduce MTTx metrics (such as mean time to detect and mean time to resolve), including running post-incident reviews to identify root causes and implement long-term solutions.

     

    - Provide strategic guidance and design consultation throughout the full service lifecycle, from architecture and capacity planning to production readiness, while establishing and enforcing SRE standards for system architecture, observability, incident response, and reliability metrics.

     

    - Partner closely with product, infrastructure, and engineering teams to integrate reliability goals into the development process.

     

    - Mentor and guide engineers across the organization on reliability principles and best practices and serve as a reliability evangelist to drive cultural and operational changes that improve engineering velocity.

     

    - Leverage generative AI agents and automation tools to enhance operational efficiency; automate health checks, incident detection, and incident resolution; and drive innovative solutions in site reliability engineering.

     

    - Define, implement, and measure service level indicators (SLIs) and service level objectives (SLOs) to guide reliability-focused engineering decisions.

    Basic Qualifications

    - Minimum 8 years of engineering experience, including 5+ years in Site Reliability, DevOps, or Production Engineering roles.

     

    - Advanced proficiency in one or more programming languages (e.g., Python, Go, Java, or C++) with the ability to write production-grade software.

     

    - Strong Linux systems expertise, including scripting, performance tuning, and debugging.

     

    - Hands-on experience operating large-scale distributed systems in public cloud environments, preferably GCP.

     

    - Deep knowledge of Kubernetes and container orchestration patterns in production environments.

     

    - Experience with GitHub Actions and modern CI/CD practices.

     

    - Deep experience with SLI/SLO design, service health instrumentation, and production telemetry.

     

    - Proven ability to build dashboards and alerts using Splunk and Grafana.

     

    - Strong understanding of observability systems, including metrics pipelines, distributed tracing, log aggregation, alerting strategies, and incident triage.

     

    - Familiarity with infrastructure-as-code tools (e.g., Terraform, Ansible).

     

    - Experience building and supporting highly available, customer-facing systems.

     

    - Experience working with generative AI agents or AI-driven automation tools to support incident management, monitoring, or operational workflows.

     

    - Broad grounding in at least two of the following: cloud architecture, Nginx, security, or database technologies.

     

    - Strong troubleshooting skills for complex system issues, with proven experience leading incident response efforts.

     

    - Excellent communication and collaboration skills, with experience mentoring and guiding engineers.

    Preferred Qualifications

    - Experience implementing chaos engineering, load testing, and resilience modeling.

     

    - Google Cloud Professional Cloud Architect certification is a plus.

     

    - Understanding of OpenTelemetry (metrics, tracing, logs) and its integration into observability pipelines.

     

    Where we’re going

     

    UKG is on the cusp of something truly special. Worldwide, we already hold the #1 market share position for workforce management and the #2 position for human capital management. Tens of millions of frontline workers start and end their days with our software, with billions of shifts managed annually through UKG solutions today. Yet it’s our AI-powered product portfolio designed to support customers of all sizes, industries, and geographies that will propel us into an even brighter tomorrow!

     

    Equal Opportunity Employer

     

    UKG is proud to be an equal opportunity employer and is committed to maintaining a diverse and inclusive work environment. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, age, disability, marital status, familial status, sexual orientation, pregnancy, genetic information, gender identity, gender expression, national origin, ancestry, citizenship status, veteran status, and any other legally protected status under federal, state, or local anti-discrimination laws.

     

    View the EEO Know Your Rights poster (https://www.eeoc.gov/sites/default/files/2022-10/EEOC_KnowYourRights_screen_reader_10_20.pdf).

     

    UKG participates in E-Verify. View the E-Verify posters here (https://www.e-verify.gov/sites/default/files/everify/posters/EVerifyParticipationPoster.pdf).

     

    It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.

     

    The pay range for this position is $142,100 to $204,200; however, base pay offered may vary depending on skills, experience, job-related knowledge, and location. This position is also eligible for a short-term incentive and a long-term incentive as part of total compensation. Information about UKG’s comprehensive benefits can be reviewed on our careers site at https://www.ukg.com/careers.

     

    It is the policy of Ultimate Software to promote and assure equal employment opportunity for all current and prospective Peeps without regard to race, color, religion, sex, age, disability, marital status, familial status, sexual orientation, pregnancy, genetic information, gender identity, gender expression, national origin, ancestry, citizenship status, veteran status, and any other legally protected status entitled to protection under federal, state, or local anti-discrimination laws. This policy governs all matters related to recruitment, advertising, and initial selection of employment. It shall also apply to all other aspects of employment, including, but not limited to, compensation, promotion, demotion, transfer, lay-offs, terminations, leave of absence, and training opportunities.

     



© 2025 Alerted.org