Alerted.org

  • Infrastructure/GPU Engineer

    Cognizant (Denver, CO)




    Cognizant is seeking a highly skilled, hands-on Infrastructure Engineer with proven experience in the physical and technical deployment of AI-ready environments optimized for AI and machine learning workloads. This role focuses on NVIDIA DGX or similar systems, GPU-accelerated compute clusters, high-speed networking, and scalable storage solutions. The ideal candidate will have deep expertise in infrastructure design, deployment, workload orchestration, and performance optimization in enterprise environments.

     

    This is a remote role in the US. The salary range for this role is $99,000 to $116,000, depending on the candidate's skills and qualifications. Applications will be accepted until 10/21/2025.

    Key Responsibilities

    System Design & Deployment

    + Help right-size GPU investments.

    + Architect and deploy NVIDIA DGX systems and GPU-based compute clusters.

    + Design and implement scalable parallel filesystems (e.g., Lustre, BeeGFS, GPFS).

    + Integrate high-speed interconnects using InfiniBand, RoCE, and RDMA.

    + Collaborate on rack planning and airflow optimization.

    Cluster & Infrastructure Management

    + Configure and manage Slurm Workload Manager for job scheduling.

    + Deploy and maintain cluster orchestration tools.

    + Automate provisioning using PXE boot, Terraform, Redfish, and Kubernetes.

    + Perform firmware updates, BIOS/IPMI/BMC configuration, and OS provisioning.

    + Knowledge of Run.ai, ClearML, or a similar platform.

    Networking & Performance Optimization

    + Design and validate network topologies including IPMI, internal/external networks, and InfiniBand fabrics.

    + Optimize RDMA and RoCE configurations for low-latency, high-throughput data transfers.

    + Conduct performance benchmarking using GPU-Burn, NCCL, and NVSM.

    Monitoring & Troubleshooting

    + Implement system health checks and diagnostics across compute, storage, and network layers.

    + Troubleshoot hardware/software issues and ensure reliable infrastructure operation.

     

    Required Skills & Qualifications

    Technical Expertise

    + Deep understanding of NVIDIA DGX architecture, CUDA, and GPU compute.

    + Strong Linux system administration and shell scripting skills.

    + Experience with Slurm, parallel filesystems, and high-speed networking (InfiniBand/RDMA/RoCE).

    + Familiarity with containerization (Docker), orchestration (Kubernetes), and automation tools (Ansible, Redfish).

    Preferred Qualifications

    + Experience with BBCM and DGX BasePOD/SuperPOD configuration.

    + Certifications from NVIDIA or an equivalent OEM.

     

    Cognizant is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to sex, gender identity, sexual orientation, race, color, religion, national origin, disability, protected Veteran status, age, or any other characteristic protected by law.

     






© 2025 Alerted.org