Senior NVIDIA Server Administrator
Insight Global (Chicago, IL)
Job Description
JOB DUTIES: We are seeking a highly skilled Senior Server Administrator to join our AI Engineering team. This role is critical to the deployment, maintenance, and optimization of high-performance computing infrastructure, specifically leveraging NVIDIA’s advanced GPU technologies. You will work closely with AI researchers, data scientists, and software engineers to ensure our systems are robust, scalable, and tuned for cutting-edge machine learning workloads.
• Administer and maintain GPU-accelerated servers and clusters, including NVIDIA A100, H100, and other high-end GPU systems.
• Install, configure, manage, and support NVIDIA software suites such as Omniverse, NVIDIA AI Enterprise (NVAIE), Run.ai, and associated components.
• Manage and optimize NVIDIA software stack components such as CUDA, cuDNN, TensorRT, NCCL, and NGC containers.
• Monitor system performance, troubleshoot hardware/software issues, and ensure high availability of AI infrastructure.
• Collaborate with DevOps and AI teams to support containerized workflows (Docker, Kubernetes) and distributed training environments.
• Implement security best practices and ensure compliance with internal and external standards.
• Lead upgrades, patching, and lifecycle management of GPU servers and related infrastructure.
• Provide documentation, automation scripts, and training for internal teams.
• Perform other duties as assigned by management.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to [email protected]. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Skills and Requirements
• 5+ years of experience in server administration, with at least 3 years focused on NVIDIA GPU-based systems.
• Deep understanding of Linux system administration, especially in HPC or AI environments.
• Hands-on experience with NVIDIA GPU drivers, CUDA toolkit, and performance tuning.
• Familiarity with Slurm, Kubernetes, or other job scheduling and orchestration tools.
• Experience with monitoring tools (e.g., Prometheus, Grafana) and infrastructure automation (e.g., Ansible, Terraform).
• Strong scripting skills (Bash, Python, etc.).
• Excellent problem-solving and communication skills.
• NVIDIA Certified Professional or similar credentials.
• Experience with multi-GPU and multi-node training setups.
• Experience with third-party colocation providers hosting high-end GPU systems.
• Familiarity with AI/ML frameworks (e.g., PyTorch, TensorFlow) and their GPU dependencies.
• Exposure to cloud-based GPU infrastructure (AWS, Azure, GCP).