Lead Application Reliability Engineer
Citigroup (Irving, TX)

Overview of the Company:

Citi, the leading global bank, has approximately 200 million customer accounts and does business in more than 160 countries and jurisdictions. Citi provides consumers, corporations, governments, and institutions with a broad range of financial products and services, including consumer banking and credit, corporate and investment banking, securities brokerage, transaction services, and wealth management. As a bank with a brain and a soul, Citi creates economic value that is systemically responsible and in our clients’ best interests.

As a financial institution that touches every region of the world and every sector that shapes your daily life, our Enterprise Operations & Technology teams are charged with a mission that rivals any large tech company. Our technology solutions are the foundations of everything we do, from keeping the bank safe, managing global resources, and providing the technical tools our workers need to be successful to designing our digital architecture and ensuring our platforms provide a first-class customer experience. We reimagine client and partner experiences to deliver excellence through secure, reliable, and efficient services.

Our commitment to diversity includes a workforce that represents the clients we serve from all walks of life, backgrounds, and origins. We foster an environment where the best people want to work. We value and demand respect for others, promote individuals based on merit, and ensure opportunities for personal development are widely available to all. Ideal candidates are innovators with well-rounded backgrounds who bring their authentic selves to work and complement our culture of delivering results with pride. If you are a problem solver who seeks passion in your work, come join us. We’ll enable growth and progress together.

Overview of the Role:

The selected candidate will become the key engineer in supporting and advancing the platform used for the threat-modeling process at Citi.
The responsibilities include, among others, maintaining and supporting the threat-modeling application and developing relevant tools used throughout the threat-modeling process. The application comprises web servers and backend databases; supporting it requires an understanding of middleware, databases, containers, and the AWS cloud environment, as well as change-control and compliance processes.

We are seeking a highly skilled and dedicated **Lead Application Reliability Engineer** to ensure the continuous availability, optimal performance, and security of a critical threat-modeling application. This role is central to our operational excellence, involving comprehensive support and maintenance of a robust technology stack including middleware, databases, Linux, and AWS EKS, all within a strictly regulated and change-controlled financial environment. The ideal candidate will leverage modern DevOps principles to drive stability and efficiency.

Responsibilities:

+ Ensure high availability and optimal performance of the threat-modeling application through proactive monitoring, incident management, and efficient troubleshooting.
+ Perform routine and emergency application and infrastructure maintenance, including patching, upgrades, and configuration management, adhering strictly to change control procedures.
+ Conduct root cause analysis (RCA) for production incidents and implement preventative measures to minimize future occurrences.
+ Develop and maintain automation scripts and tools (e.g., using Python, Bash) to streamline operational tasks, improve monitoring, and facilitate efficient deployments.
+ Proactively identify, recommend, and implement enhancements to existing application maintenance practices, operational workflows, and system reliability.
+ Serve as a technology subject matter expert for internal and external stakeholders, contributing to technology domain roadmaps and firm-mandated controls and compliance initiatives.
+ Appropriately assess and mitigate risk in all technical decisions, ensuring compliance with applicable laws, rules, regulations, and internal policies, while escalating and reporting control issues with transparency.
+ Present technical work to senior stakeholders, the team, and other technical teams.
+ Mentor and train junior team members, fostering a culture of knowledge sharing and continuous improvement.

Qualifications:

+ **6+ years** of relevant experience in an **Engineering role**, preferably in Financial Services or a large, complex, and/or global environment.
+ Experience managing and troubleshooting **Linux Operating Systems** **_(e.g., Red Hat Enterprise Linux (RHEL), CentOS, Ubuntu)_**, including **System Administration Tasks** like **_User Management, Service Restarts,_** and **_File System Checks_** – **_Must Have_**.
+ Proficiency in **Scripting for Automation** **_(e.g., Bash, Python)_** and with **Configuration Management Tools** **_(e.g., Ansible, Puppet, Chef)_** for system administration and infrastructure automation – **_Must Have_**.
+ Experience with container orchestration using Helm and Kubernetes on platforms like **_AWS EKS, GCP GKE,_** or **_OpenShift_** – **_Must Have_**.
+ Working knowledge of **Relational Databases** **_(e.g., PostgreSQL)_**, including basic querying – **_Must Have_**.
+ Proven track record of maintaining applications and their technology stacks compliant with security and configuration requirements, including successfully passing internal and external security audits by demonstrating secure configuration of applications and infrastructure **_(e.g., implementing least privilege access, hardening OS, managing firewall rules)_** and ensuring continuous compliance with regulatory standards **_(e.g., SOX, GDPR)_** through automated checks and reporting – **_Must Have_**.
+ Demonstrated adherence to strict change control procedures, executing all changes **_(e.g., code deployments, infrastructure updates)_** through a formalized change management process **_(e.g., ITSM, ServiceNow)_** with proper documentation and approvals – **_Must Have_**.
+ Experience with **Ticketing Systems** **_(e.g., Jira, ServiceNow)_** – **_Must Have_**.
+ Working understanding of **Middleware Components** (e.g., Nginx, Tomcat or equivalents).
+ Familiarity with **Development Concepts** **_(e.g., Git, CI/CD, Pipelines, SDLC)_**.
+ Strong communication skills, both written and verbal, for technical and non-technical audiences.
+ Demonstrated analytical and diagnostic skills, with an ability to identify process improvements and best practices.
+ Ability to work independently, manage multiple tasks, take ownership of initiatives, and operate effectively in a matrixed environment under pressure and tight deadlines.

**Associate Level Certification Required:** **_(At least one of the following is required)_**

+ Kubernetes and Cloud Native Associate (KCNA), Certified Kubernetes Application Developer (CKAD), Certified Kubernetes Administrator (CKA), Kubernetes and Cloud Native Security Associate (KCSA)
+ Red Hat Certified System Administrator or similar certification
+ AWS Certified Developer, AWS Certified SysOps Administrator
+ CompTIA Cloud+
+ Google Associate Cloud Engineer or other GCP certification
+ HashiCorp Certified: Terraform Associate

**Associate Cybersecurity Certification:** **_(Not required but any of the following would be a plus)_**

+ GIAC Security Essentials (GSEC)
+ ISC2 Systems Security Certified Practitioner (SSCP)
+ CompTIA CySA+
+ Microsoft Certified: Security Operations Analyst Associate; Information Protection Administrator Associate

Education: Bachelor’s degree/University degree or equivalent experience

Job Family Group: Technology

Job Family: Systems & Engineering

Time Type: Full time

Primary Location: Irving, Texas, United States

Primary Location Full Time Salary Range: $125,760.00 - $188,640.00

In addition to salary, Citi’s offerings may also include, for eligible employees, discretionary and formulaic incentive and retention awards. Citi offers competitive employee benefits, including: medical, dental & vision coverage; 401(k); life, accident, and disability insurance; and wellness programs. Citi also offers paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays. For additional information regarding Citi employee benefits, please visit citibenefits.com. Available offerings may vary by jurisdiction, job level, and date of hire.

Most Relevant Skills: Please see the requirements listed above.

Other Relevant Skills: For complementary skills, please see above and/or contact the recruiter.

Anticipated Posting Close Date: Nov 03, 2025

_Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law._

_If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi (https://www.citigroup.com/citi/accessibility/application-accessibility.htm)._

_View Citi’s EEO Policy Statement (https://www.citigroup.com/global/eeo-aa-policy) and the Know Your Rights (https://www.eeoc.gov/sites/default/files/2023-06/22-088\_EEOC\_KnowYourRights6.12ScreenRdr.pdf) poster._

Citi is an equal opportunity and affirmative action employer. Minority/Female/Veteran/Individuals with Disabilities/Sexual Orientation/Gender Identity.
 
 