"Alerted.org


  • AI Inference Engineer

    quadric.io, Inc (Burlingame, CA)



    Quadric has created an innovative general-purpose neural processing unit (GPNPU) architecture. Quadric's co-optimized software and hardware are targeted to run neural network (NN) inference workloads in a wide variety of edge and endpoint devices, ranging from battery-operated smart-sensor systems to high-performance automotive or autonomous vehicle systems. Unlike other NPUs or neural network accelerators in the industry today, which can accelerate only a portion of a machine learning graph, the Quadric GPNPU executes both NN graph code and conventional C++ DSP and control code.

    Role:

    The AI Inference Engineer at Quadric is the key bridge between the world of AI/LLM models and Quadric's unique platforms. The engineer will [1] port AI models to the Quadric platform; [2] optimize model deployment for efficient inference; and [3] profile and benchmark model performance. This senior technical role demands deep knowledge of AI model algorithms, system architecture, and AI toolchains/frameworks.

    Responsibilities:

    + Quantize, prune and convert models for deployment

    + Port models to the Quadric platform using the Quadric toolchain

    + Optimize inference deployment for latency and speed

    + Benchmark and profile model performance and accuracy

    + Develop tools to scale and speed up deployment

    + Make improvements to the SDK and runtime

    + Provide technical support and documentation to customers and the developer community

    Requirements:

    + Bachelor’s or Master’s degree in Computer Science and/or Electrical Engineering

    + 5+ years of experience in AI/LLM model inference and deployment frameworks/tools

    + Experience with model quantization (PTQ, QAT) and tools

    + Experience with model accuracy measures

    + Experience with model inference performance profiling

    + Experience with at least one of the following frameworks: onnxruntime, PyTorch, vLLM, Hugging Face Transformers, neural-compressor, llama.cpp

    + Proficiency in C/C++ and Python

    + Demonstrated strong problem-solving, debugging, and communication skills

     

    Benefits:

    + Health Care Plan (Medical, Dental & Vision)

    + Retirement Plan (401k, IRA)

    + Life Insurance (Basic, Voluntary & AD&D)

    + Paid Time Off (Vacation, Sick & Public Holidays)

    + Family Leave (Maternity, Paternity)

    + Short Term & Long Term Disability

    + Training & Development

    + Work From Home

    + Free Food & Snacks

    + Stock Option Plan

     


© 2025 Alerted.org