Lead DevOps Engineer - Hybrid - MSys Inc. (Richmond, VA)
Job summary:
Title: Lead DevOps Engineer - Hybrid
Location: Richmond, VA, USA
Length and terms: Long term - W2 only
Position created on 11/11/2025 05:52 pm
Job description:
** Webcam interview *** Long-term project *** Hybrid *** LinkedIn profile required *** For security reasons, only USC/GC candidates are eligible **
+ Design & Implement Solutions: Build and maintain comprehensive observability platforms that provide deep insights into complex systems, incorporating logs, metrics, and traces.
+ System Instrumentation: Instrument applications, infrastructure, and services to collect telemetry data using frameworks like OpenTelemetry.
+ Data Analysis & Visualization: Develop dashboards, reports, and alerts using tools like Prometheus, Grafana, and Splunk to visualize system performance and detect issues.
+ Collaboration: Work with development, SRE, and DevOps teams to integrate observability best practices and align monitoring with business and operational goals.
+ Automation: Develop scripts and use Infrastructure as Code (IaC) tools like Ansible and Terraform to automate monitoring configurations and telemetry collection.
+ Implement and manage full-stack observability using Datadog, ensuring seamless monitoring across infrastructure, applications, and services.
+ Instrument agents for on-premises, cloud, and hybrid environments to enable comprehensive monitoring.
+ Design and deploy key service monitoring, including dashboards, monitor creation, SLA/SLO definitions, and anomaly detection with alert notifications.
+ Configure and integrate Datadog with third-party services such as ServiceNow and other ITSM tools, and enable SSO.
Key Skills & Tools:
+ Observability Tools: Proficiency in monitoring, logging, and tracing tools, including Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, Datadog, New Relic, and cloud-native solutions like AWS CloudWatch.
+ Programming Languages: Expertise in languages such as Python and Go for scripting and automation.
+ Infrastructure & Cloud Platforms: Experience with cloud platforms (AWS, GCP, Azure) and container orchestration systems like Kubernetes.
+ Infrastructure as Code (IaC): Familiarity with Terraform and Ansible for managing infrastructure and configurations.
+ CI/CD & Automation: Experience with CI/CD pipelines and automation tools like Jenkins.
+ System & Software Engineering: A strong background in both system operations and software development.
+ Ability to optimize cloud agent instrumentation; cloud certifications are a plus.
+ Datadog Fundamentals, APM and Distributed Tracing Fundamentals, and Datadog Demo certifications (mandatory).
+ Strong understanding of observability concepts (logs, metrics, tracing).
+ Expertise in security and vulnerability management in observability.
+ 2 years of experience with cloud-based observability solutions, specializing in monitoring, logging, and tracing across AWS, Azure, and GCP environments.
Contact the recruiter working on this position:
The recruiter working on this position is Nadeem Ahmed Razvi (Shaji Team).
Contact number: +(1) (202) 7381674
Contact email: [email protected]
Our recruiters will be more than happy to help you get this contract.