Lead Infrastructure Engineer - Observability
- Truist (Atlanta, GA)
The position is described below. If you want to apply, click the Apply Now button at the top or bottom of this page. After you click Apply Now and complete your application, you'll be invited to create a profile, which will let you see your application status and any communications. If you already have a profile with us, you can log in to check status.
Need Help? (https://www.brainshark.com/bbandt/careers-site-faq)
_If you have a disability and need assistance with the application, you can request a reasonable accommodation. Send an email to Accessibility ([email protected]?subject=Accommodation%20request)_
_(accommodation requests only; other inquiries won't receive a response)._
Regular or Temporary:
Regular
**Language Fluency:** English (Required)
Work Shift:
1st shift (United States of America)
Please review the following job description:
We are seeking a highly skilled and forward-thinking lead observability engineer to architect, implement, and evolve enterprise-grade observability capabilities across the Truist technology landscape. This role will drive the design and adoption of a modern, scalable observability platform rooted in OpenTelemetry (OTel) and enriched by complementary technologies, including Prometheus, Grafana, Jaeger, and commercial APM solutions. You will lead the strategy for metrics, traces, and synthetic monitoring, enabling end-to-end visibility, accelerated incident response, and a frictionless developer experience.
In this role, you’ll champion a shift from reactive monitoring to proactive, intelligence-driven observability. You’ll lead efforts to standardize telemetry pipelines, embed observability into CI/CD workflows, and connect signal-based insights to reliability, performance, and business outcomes. Success in this position means reducing mean time to detect (MTTD), accelerating root cause analysis, and creating a resilient, insight-rich environment that empowers engineering teams to deliver with confidence.
ESSENTIAL DUTIES AND RESPONSIBILITIES
Following is a summary of the essential functions for this job. Other duties may be performed, both major and minor, which are not mentioned below. Specific activities may change from time to time.
1. Performs problem tracking, diagnosis and root-cause analysis, replication, troubleshooting, and resolution for complex issues. In this capacity, performs programming and debugging activities.
2. Responds to issues in a timely manner by receiving and investigating incidents or service tickets.
3. Analyzes trends in technical issues and develops recommendations for long-term improvements.
4. Documents all relevant end-user interactions and steps taken to resolve incidents.
5. Has occasional contact with end-users.
6. Communicates status of issue resolution to internal customers.
7. May engage and manage outside vendors.
8. Applies in-depth knowledge of application support and an understanding of best practices.
9. Typically leads moderately complex projects and participates in larger, more complex initiatives.
10. Solves complex technical and operational problems.
11. Acts as a resource for teammates with less experience.
12. May have people management responsibilities for a small team.
QUALIFICATIONS
Required Qualifications:
The requirements listed below are representative of the knowledge, skill and/or ability required. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.
1. Bachelor's degree and five years of experience in development or application support or an equivalent combination of education and work experience.
2. In-depth knowledge of information systems and ability to identify, apply, and implement best practices.
3. Understanding of key business processes and competitive strategies related to the IT function.
4. Ability to plan and manage projects.
5. Ability to solve complex problems by applying best practices.
6. Ability to provide direction and mentor less experienced teammates.
7. Ability to interpret and convey complex, difficult, or sensitive information.
Preferred Qualifications:
1. Bachelor's degree and six years of experience or an equivalent combination of education and work experience.
2. Expertise with OpenTelemetry (OTel), including custom instrumentation, collector configuration, and pipeline design for traces, metrics, and logs.
3. Hands-on experience with observability tooling, such as Prometheus, Grafana, Jaeger, Loki, Elastic, Splunk, and/or Dynatrace, in enterprise-grade environments.
4. Strong background in distributed systems, cloud-native architectures, and Kubernetes (K8s), with the ability to identify observability gaps across service meshes, APIs, and event-driven platforms.
5. Proficiency in scripting or development languages (e.g., Python, Go, Bash, or Java) to automate telemetry integration, create custom exporters, and contribute to platform tooling.
6. Proven track record of driving enterprise adoption of observability standards and practices, including influencing telemetry strategies across engineering, SRE, and platform teams.
**General Description of Available Benefits for Eligible Employees of Truist Financial Corporation:** All regular teammates (not temporary or contingent workers) working 20 hours or more per week are eligible for benefits, though eligibility for specific benefits may be determined by the division of Truist offering the position. Truist offers medical, dental, vision, life insurance, disability, accidental death and dismemberment, tax-preferred savings accounts, and a 401k plan to teammates. Teammates also receive no less than 10 days of vacation (prorated based on date of hire and by full-time or part-time status) during their first year of employment, along with 10 sick days (also prorated), and paid holidays. For more details on Truist’s generous benefit plans, please visit our Benefits site (https://benefits.truist.com/). Depending on the position and division, this job may also be eligible for Truist’s defined benefit pension plan, restricted stock units, and/or a deferred compensation plan. As you advance through the hiring process, you will also learn more about the specific benefits available for any non-temporary position for which you apply, based on full-time or part-time status, position, and division of work.
_Truist is an Equal Opportunity Employer that does not discriminate on the basis of race, gender, color, religion, citizenship or national origin, age, sexual orientation, gender identity, disability, veteran status, or other classification protected by law. Truist is a Drug Free Workplace._
EEO is the Law (https://www.eeoc.gov/sites/default/files/2022-10/EEOC\_KnowYourRights\_screen\_reader\_10\_20.pdf)
Pay Transparency Nondiscrimination Provision (https://www.dol.gov/sites/dolgov/files/OFCCP/pdf/pay-transp\_%20English\_formattedESQA508c.pdf)
E-Verify (https://e-verify.uscis.gov/web/media/resourcesContents/E-Verify\_Participation\_Poster\_ES.pdf)