Lead Machine Learning Engineer - ESPN+…

The Walt Disney Company (San Francisco, CA)

Disney Entertainment and ESPN Product & Technology

Technology is at the heart of Disney’s past, present, and future. Disney Entertainment and ESPN Product & Technology is a global organization of engineers, product developers, designers, technologists, data scientists, and more – all working to build and advance the technological backbone for Disney’s media business globally.

The team marries technology with creativity to build world-class products, enhance storytelling, and drive velocity, innovation, and scalability for our businesses. We are Storytellers and Innovators. Creators and Builders. Entertainers and Engineers. We work with every part of The Walt Disney Company’s media portfolio to advance the technological foundation and consumer media touch points serving millions of people around the world.

Here are a few reasons why we think you’d love working here:

**1. Building the future of Disney’s media:** Our Technologists are designing and building the products and platforms that will power our media, advertising, and distribution businesses for years to come.

**2. Reach, Scale & Impact:** More than ever, Disney’s technology and products serve as a signature doorway for fans' connections with the company’s brands and stories. Disney+. Hulu. ESPN. ABC. ABC News…and many more. These products and brands – and the unmatched stories, storytellers, and events they carry – matter to millions of people globally.

**3. Innovation:** We develop and implement groundbreaking products and techniques that shape industry norms and solve complex and distinctive technical problems.

Job Summary:

ESPN is investing in real-time ML platforms that power next-generation personalization and on-platform decisioning for live and short-form sports experiences.
As Lead Machine Learning Engineer, you will own critical pieces of the ML infrastructure and operations stack, from low-latency online inference services and streaming feature platforms to model deployment, observability, and reliability practices. You’ll partner closely with the Principal MLE, platform/SRE, data, and product teams to design for scale, latency, cost, and resilience, ensuring our ML services meet production SLOs for millions of users. You will set technical standards, establish MLOps best practices, mentor engineers, and help evolve our real-time experimentation and safety/governance frameworks.

Responsibilities and Duties of the Role:

1) Real-Time ML Platform Architecture & Services

+ Architect, build, and operate low-latency online inference services (e.g., GPU/CPU serving, autoscaling, request routing, canary/shadow, blue-green).
+ Design multi-region, highly available ML services with graceful degradation, back-pressure, and circuit-breaking.
+ Lead capacity planning and cost management for high-QPS workloads.

2) MLOps, Reliability & Observability

+ Establish CI/CD for models (pipelines, approvals, rollbacks), model registry, and artifact/version lineage.
+ Define and uphold SLOs/SLAs, build end-to-end observability (metrics, logs, traces), and operate an on-call/incident response rhythm for ML services.
+ Implement drift detection, data quality/contract checks, and guardrails for safety/compliance.

3) Data & Feature Platform for Streaming ML

+ Lead feature store strategy (offline/online consistency, schema evolution, embedding/feature freshness).
+ Build streaming data pipelines (event ingestion, enrichment, aggregation) to feed real-time models.

4) Experimentation & Online Evaluation

+ Partner with experimentation teams to enable A/B, canary, interleaving, and online metrics with statistically sound guardrails.
+ Instrument post-deployment validation and continuous evaluation loops.

5) Modeling Enablement & Technical Leadership

+ Collaborate with applied ML to productionize retrieval, ranking, and re-ranking (e.g., ANN vector search, embeddings).
+ Mentor engineers; drive design reviews, RFCs, and cross-team architectural decisions.

Required Education, Experience/Skills/Training:

Basic Qualifications:

+ Demonstrated ownership of production ML services with successful launches and measurable reliability/latency outcomes.
+ Deep experience operating real-time ML systems: online inference, feature stores, streaming data pipelines, and online evaluation.
+ Strong software engineering and distributed systems skills (e.g., Kubernetes, containers, service mesh; AWS; IaC such as Terraform/Helm).
+ Proficiency with ML serving frameworks (e.g., ONNX, TorchServe, TF Serving, TensorRT) and modern ML frameworks (PyTorch/TensorFlow).
+ Hands-on experience with streaming stacks (Kafka/Kinesis/Pub/Sub; Flink/Spark Streaming jobs) and low-latency storage/caches (Redis, DynamoDB).
+ Competence in observability (Datadog, Grafana, OpenTelemetry), CI/CD (GitHub Actions/Jenkins/Argo CD), and testing (offline/online, canary/shadow).
+ Excellent cross-functional communication; experience mentoring engineers and driving engineering standards.
+ Programming expertise in Python and at least one of Go/Java/C++.

Preferred Qualifications:

+ Experience running large-scale personalization/recommendation or decisioning in production with p95/p99 latency and availability targets.
+ Expertise with vector search and embedding platforms (e.g., FAISS/ScaNN; OpenSearch vector/Pinecone) and ANN system design.
+ Experience with multi-tenant ML platforms, multi-region/active-active architectures, and disaster recovery.
+ Background with experimentation platforms (A/B, bandits), feedback loops & delayed labels, and exploration-exploitation strategies.
+ Demonstrated cost-to-serve optimization for ML workloads (GPU scheduling, right-sizing, autoscaling strategies).
+ Ability to navigate 0→1 platform builds and ambiguous product requirements with pragmatic tradeoffs.

Experience:

+ 7+ years building and operating ML systems in production, including real-time services at scale.

Required Education:

+ Bachelor’s degree in Computer Science, Machine Learning, Data Science, or a related field, or equivalent practical experience.

\#DISNEYTECH

The hiring range for this position in New York, NY & Seattle, WA is $172,300-$231,100 per year, in San Francisco, CA is $183,700.00-$246,400.00 per year, and in Los Angeles, CA is $164,500.00 to $220,600.00 per year. The base pay actually offered will take into account internal equity and also may vary depending on the candidate’s geographic region, job-related knowledge, skills, and experience among other factors. A bonus and/or long-term incentive units may be provided as part of the compensation package, in addition to the full range of medical, financial, and/or other benefits, dependent on the level and position offered.

**Job ID:** 10121018

**Location:** San Francisco, California

**Job Posting Company:** Disney Entertainment and ESPN Product & Technology

The Walt Disney Company and its Affiliated Companies are Equal Employment Opportunity employers and welcome all job seekers including individuals with disabilities and veterans with disabilities. If you have a disability and believe you need a reasonable accommodation in order to search for a job opening or apply for a position, email [email protected] with your request. This email address is not for general employment inquiries or correspondence. We will only respond to those requests that are related to the accessibility of the online application system due to a disability.
 
 