Cloud Platform Engineer
- HTC Global Services Inc (Dearborn, MI)
Job Description: Employees in this job function are focused on developing and maintaining reusable software components that serve the needs of product developers in the organization. They are responsible for designing, implementing, integrating and maintaining the underlying infrastructure and software applications that support developer productivity and self-service.
Key Responsibilities:
+ Collaborate with enterprise architects, software architects, software engineering teams, etc. to design the platform infrastructure and tools encompassing servers, networks, storage, databases, cloud services, etc.
+ Implement and manage the infrastructure that supports the platform tools, and ensure that upgrades, security patches and other performance improvements are regularly performed.
+ Evaluate cloud providers, containerization solutions and other complex technologies to deeply understand the configurations available and create abstractions of common configurations that can be utilized easily by application teams for their workloads
+ Write and execute automated Infrastructure as Code scripts and utilize processes like CI/CD to streamline and automate how the platform infrastructure is provisioned, configured and managed to improve consistency, traceability and repeatability
+ Integrate performance and monitoring best practices, including QoS and SLA metrics, to scale platform applications and managed services automatically in response to demand.
+ Incorporate security and disaster recovery best practices into the infrastructure applications by integrating access control, identity management, logging/monitoring, public/private network configurations, data encryption, storage backup and disaster recovery, etc.
+ Facilitate the integration of enterprise managed software configurations into deployment pipelines managed by application teams to ensure approved configurations and best practices around security, networking, logging & monitoring, performance & scale, etc. are applied
+ Exchange feedback with service providers and developers, and advocate for their needs, to ensure the platform continues to grow and evolve to meet them.
Skills Required:
+ Scripting, Automation, Root Cause Analysis, Troubleshooting (Problem Solving), Cloud Architecture, IT Solutions, GitHub, Cloud Infrastructure, Change Management, Technical Analysis, Developer, Tekton, Utilization Management, Kubernetes
Skills Preferred:
+ Ansible, GCP, Dynatrace, PowerShell, Access Controls, Python, Information Security, VMware
Experience Required:
+ Engineer 3 experience: practical in 2 coding languages or advanced practical in 1 language; 6+ years in IT; 4+ years in development
Education Required:
+ Associate Degree, College Senior
Education Preferred:
+ Certification Program, Bachelor's Degree
Additional Information:
+ Conduct capacity planning and forecasting for the OpenShift Virtualization (OSV) platform, including compute, memory, storage, and network resources, to ensure scalability and prevent resource exhaustion.
+ Analyze resource utilization trends and make recommendations for infrastructure scaling, consolidation, or optimization.
+ Collaborate with application teams and stakeholders to understand future demand and project capacity needs.
+ Develop and maintain capacity models and reports to support strategic planning.
+ Develop automation solutions (scripts, playbooks) for repetitive OSV tasks, including configuration changes, VM management, auditing, remediation and integration with ticketing systems
+ Leverage automation to deliver operator updates and changes efficiently at scale
+ Implement Site Reliability Engineering (SRE) principles and practices to improve overall platform stability, performance, and operational efficiency
+ Role-Based Access Control (RBAC) deployment and auditing
+ Namespace and Resource Quota management
+ Implement and maintain comprehensive end-to-end observability solutions (monitoring, logging, tracing) for the OSV environment, including integration with tools like Dynatrace and Prometheus/Grafana
+ Explore and implement Event-Driven Architecture (EDA) for enhanced real-time monitoring and response.
+ Develop capabilities to flag and report abnormalities and identify "blind spots" in observability
+ Perform deep-dive Root Cause Analysis (RCA), potentially utilizing available tooling, to quickly identify and resolve issues across the global compute environment
+ Pinpoint unhealthy components (the "needle in a haystack") across the global compute environment for faster time to resolution
+ Monitor VM health, resource usage, and performance metrics proactively
+ Monitor for unusual activity that might indicate a compromise or misconfiguration
+ Solution Design & Consulting - Knowledge Management
#LI-AA1 #LI-Hybrid
What Makes HTC A Great Place To Build Your Future
HTC Global Services wants you to join our team. Come build new things with us and advance your career. At HTC Global, you’ll collaborate with experts, work alongside clients, and be part of high-performing teams driving success together. You’ll have long-term opportunities to grow your career and develop skills in the latest emerging technologies.
At HTC Global Services, our employees have access to a comprehensive benefits package. Benefits can include Group Health (Medical, Dental, and Vision), Paid Time Off, Paid Holidays, 401(k) matching, Group Life and Disability insurance, Professional Development opportunities, Wellness programs, and a variety of other perks.
Our success as a company is built on inclusion and diversity. HTC Global Services is committed to providing a workplace free from discrimination and harassment, where every employee is treated with dignity and respect. We celebrate differences and believe that diverse cultures, perspectives, and skills drive innovation and success. HTC is an Equal Opportunity Employer and a proud National Minority Supplier. We seek to empower each individual, fostering an environment where everyone feels valued, included, and respected.