Reliability Architect
TEKsystems (Jersey City, NJ)
Description
Principal Responsibilities and Outputs:
• Architectural Guidance: Publish technology strategies and supporting architectures that mature business and technology operations and enable AI/MLOps.
• Standards and Best Practices: Publish observability standards and best practices for adopting new and existing frameworks or technologies.
• Technical Solutions: Translate business goals into technical solution designs that build in descriptive and diagnostic capabilities at delivery and satisfy the non-functional requirements of business solutions.
• Delivery Enhancements: Create actionable Observability Driven Development procedures to ensure consistent adoption of open-standard industry frameworks (e.g., OTel, MELTS).
• AI-Augmented Testing: Deliver strategies that enable more AI-augmented testing capabilities and empower federated execution and central enterprise governance.
• Communication and Education: Develop and routinely publish communications, training, and education sessions to transfer knowledge and raise awareness of current and future enterprise direction.
• Reliability Design: Design and implement full-stack applications for reliability, along with integration patterns, to enable greater operational predictability and prescriptive disruption response.
• Monitoring and Alerting: Establish appropriate monitoring and alerting standards for performance, scalability, availability, and reliability.
Experience:
• Distributed Applications: Minimum of 10 years' experience in the design and implementation of distributed applications.
• Networking and Infrastructure: Minimum of 5 years' experience in networking, infrastructure, middleware, and database architecture.
• Highly Available Architecture: Minimum of 5 years' experience in highly available architecture and solution implementation.
• Disaster Recovery: Minimum of 5 years' experience with industry patterns, methodologies, and techniques across disaster recovery disciplines.
Knowledge and Skills:
• Problem-Solving: Ability to solve problems and engineer solutions that meet resiliency requirements.
• Independent Work: Ability to work independently with minimal supervision.
• Public Cloud Environment: Strong knowledge of AWS and Azure cloud environments is a plus.
• Performance Analysis: Experience with performance analysis, tuning, and engineering is a plus.
• Monitoring Tools: Knowledge of monitoring tools such as CloudWatch, CloudTrail, Splunk, and other application monitoring tools.
• Technical Expertise: In-depth, hands-on expertise in Java, SQL, and Linux.
• Collaboration: Comfortable working in an open, highly collaborative team.
• Troubleshooting: Strong troubleshooting skills.
• Automation Scripting: Ability to write scripts (Bash, PHP, Python) to automate validation and verification of solution resiliency.
• Communication: Excellent oral and written communication skills, along with the ability to communicate at all levels.
• Chaos Engineering: Experience in chaos engineering is a huge plus.
• Educational Background: Bachelor's degree in a technical discipline or equivalent work experience.
• DevOps and Agile: Support the adoption of DevOps methodology and Agile project management.
Potential deliverables:
• Historical Analytics Architecture (Requirements Documents, Logical, and Technical Designs)
• Data Fabric Architecture (Requirements Documents, Logical, and Technical Designs)
• Alerting Architecture (Requirements Documents, Logical, and Technical Designs)
• AIOps Strategy
• AI Observability Strategy
• OTel Standards and Strategies
• Logging Standards
• Various prototype work
• Observability API demonstrations
• AIOps (predictive and prescriptive activities) demonstrations
• Observability Maturity Models and Assessment Structure
• Training and education materials
Additional Skills & Qualifications
• Cloud-Based Solutions: Minimum of 3 years' experience in testing, architecting, and delivering cloud-based solutions.
• Chaos Engineering Tools: Experience with tools like Gremlin or Cavisson NetHavoc.
• Enterprise Java Technologies: Expertise in enterprise Java technologies, tools, and system architectures.
• Automation Frameworks: Experience with test automation frameworks and tools such as Selenium and TestNG, and API testing tools like Postman or RestAssured.
• Documentation: Experience writing test-related documentation such as test plans, strategies, or post-testing reports.
Experience Level
Expert Level
Pay and Benefits
The pay range for this position is $75.00 - $90.00/hr.
Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific elections, plan, or program terms. If eligible, the benefits available for this temporary role may include the following:
• Medical, dental & vision
• Critical Illness, Accident, and Hospital
• 401(k) Retirement Plan – Pre-tax and Roth post-tax contributions available
• Life Insurance (Voluntary Life & AD&D for the employee and dependents)
• Short and long-term disability
• Health Spending Account (HSA)
• Transportation benefits
• Employee Assistance Program
• Time Off/Leave (PTO, Vacation or Sick Leave)
Workplace Type
This is a hybrid position in Jersey City, NJ.
Application Deadline
This position is anticipated to close on May 26, 2025.
About TEKsystems and TEKsystems Global Services
We’re a leading provider of business and technology services. We accelerate business transformation for our customers. Our expertise in strategy, design, execution and operations unlocks business value through a range of solutions. We’re a team of 80,000 strong, working with over 6,000 customers, including 80% of the Fortune 500 across North America, Europe and Asia, who partner with us for our scale, full-stack capabilities and speed. We’re strategic thinkers, hands-on collaborators, helping customers capitalize on change and master the momentum of technology. We’re building tomorrow by delivering business outcomes and making positive impacts in our global communities. TEKsystems and TEKsystems Global Services are Allegis Group companies. Learn more at TEKsystems.com.
The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.