ML Systems Engineer
TEKsystems (Redmond, WA)
Are you interested in working with an organization that is helping to make the next generation of cutting-edge technology possible? Can you think "outside the box" to engineer technology solutions that solve complex issues? Is coordinating and collaborating with some of the sharpest researchers in the world an exciting challenge for you? If so, we could use your help!
The Research Technology Engineering team supports the technological needs of one of the most diverse corporate research labs in operation today. With over 1,000 researchers doing work in more than 55 areas across 6 labs worldwide, the technology exposure, engineering, and problem-solving opportunities are diverse and continually changing.
The ML Systems Engineer will help understand our users’ data processing needs, optimize and troubleshoot their workloads on our supercomputer systems, engineer solutions to recurring problems, and keep our technical documentation current and relevant. You will interact with researchers and engineers directly to understand their needs and issues, then consult with them to improve their outcomes. Foundational knowledge of current machine learning/deep learning techniques is critical. Passion for new technology and intellectual curiosity are also key to success in this role.
Daily responsibilities will include (but are not limited to) monitoring our cluster infrastructure and users’ jobs for failures and inefficiencies; consulting with researchers to troubleshoot job configuration and optimization; instructing team members and researchers on best practices for job configuration and performance; and analyzing job performance and recommending fixes based on our knowledge base and the experience you bring.
We are seeking a skilled GPU Efficiency Analyst to join our team and contribute to the optimization of machine learning (ML) and deep learning (DL) workloads on AI platforms. The ideal candidate will have strong experience running large ML workloads in distributed clusters, including GPU and DGX systems, and fine-tuning their performance.
Responsibilities:
+ Conduct GPU efficiency analysis for ML/DL jobs on AI platforms.
+ Optimize ML models by running large workloads in distributed clusters (GPU) and DGX systems.
+ Utilize PyTorch, TensorFlow, and Azure ML for model optimization and performance tuning.
+ Deploy and manage ML workloads using Docker containers and Kubernetes.
+ Collaborate with cross-functional teams to identify and implement performance improvements.
+ Monitor and analyze system performance metrics to ensure optimal GPU utilization.
+ Provide detailed reports and recommendations based on analysis findings.
Requirements:
+ Bachelor's degree in Computer Science, Engineering, or a related field.
+ Proven experience in GPU efficiency analysis and ML/DL workload optimization.
+ Proficiency in PyTorch, TensorFlow, and Azure ML.
+ Experience with Docker containers and Kubernetes for deployment and management.
+ Strong analytical and problem-solving skills.
+ Excellent communication and teamwork abilities.
Pay and Benefits
The pay range for this position is $55.00 - $72.00/hr.
Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific elections, plan, or program terms. If eligible, the benefits available for this temporary role may include the following:
+ Medical, dental & vision
+ Critical Illness, Accident, and Hospital
+ 401(k) Retirement Plan: pre-tax and Roth post-tax contributions available
+ Life Insurance (Voluntary Life & AD&D for the employee and dependents)
+ Short and long-term disability
+ Health Spending Account (HSA)
+ Transportation benefits
+ Employee Assistance Program
+ Time Off/Leave (PTO, Vacation or Sick Leave)
Workplace Type
This is a fully remote position.
Application Deadline
This position is anticipated to close on May 7, 2025.
About TEKsystems and TEKsystems Global Services
We’re a leading provider of business and technology services. We accelerate business transformation for our customers. Our expertise in strategy, design, execution and operations unlocks business value through a range of solutions. We’re a team of 80,000 strong, working with over 6,000 customers, including 80% of the Fortune 500 across North America, Europe and Asia, who partner with us for our scale, full-stack capabilities and speed. We’re strategic thinkers, hands-on collaborators, helping customers capitalize on change and master the momentum of technology. We’re building tomorrow by delivering business outcomes and making positive impacts in our global communities. TEKsystems and TEKsystems Global Services are Allegis Group companies. Learn more at TEKsystems.com.
The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.