Software Engineer - Pretraining Data

Magic

About Magic

Magic’s mission is to build safe AGI that accelerates humanity’s progress on the world’s most important problems. We believe the most promising path to safe AGI lies in automating research and code generation to improve models and solve alignment more reliably than humans can alone. Our approach combines frontier-scale pre-training, domain-specific RL, ultra-long context, and test-time compute to achieve this goal.

About the Role

As a Software Engineer working on our pretraining data, you will write efficient and robust pipelines for giant, multimodal datasets. You will develop and optimize web scraping techniques to harvest and maintain data at internet scale.

Responsibilities

  • Design and implement multimodal (video, audio, text, etc.) web crawlers for scraping and indexing petabytes of data.
  • Create large-scale data processing pipelines using tools like Ray, Apache Spark, Apache Flink, Google BigQuery, etc.
  • Implement and scale deduplication techniques across modalities and apply heuristic and model-based techniques for parsing and filtering crawled data.
  • Identify new data sources for inclusion in pre-training and post-training datasets.

What We’re Looking For

  • Strong proficiency in distributed computing and parallel processing techniques.
  • Obsession with details, reliability, and good testing to ensure data quality and integrity.
  • Experience with designing and maintaining high-performance, scalable data architectures.
  • Ability to design, develop, and operate an LLM data pipeline from web scraping to data loading.

Our Culture

  • Integrity: Words and actions should be aligned.
  • Hands-on: At Magic, everyone is building.
  • Teamwork: We move as one team, not N individuals.
  • Focus: Safely deploy AGI. Everything else is noise.
  • Quality: Magic should feel like magic.

Compensation, Benefits, and Perks (US)

  • Annual salary range: $100K - $900K
  • Equity is a significant part of total compensation, in addition to salary.
  • 401(k) plan with 6% salary matching.
  • Generous health, dental, and vision insurance for you and your dependents.
  • Unlimited paid time off.
  • Option to work in-person in SF or remotely.
  • Visa sponsorship and relocation stipend to bring you to SF.
  • A small, fast-paced, highly focused team.
