Software Engineer - Pretraining Data

Magic

About Magic

Magic’s mission is to build safe AGI that accelerates humanity’s progress on the world’s most important problems. We believe the most promising path to safe AGI lies in automating research and code generation to improve models and solve alignment more reliably than humans can alone. Our approach combines frontier-scale pre-training, domain-specific RL, ultra-long context, and test-time compute to achieve this goal.

About the Role

As a Software Engineer working on our pretraining data, you will write efficient and robust pipelines for very large multimodal datasets. You will also develop and optimize web-scraping techniques to collect and maintain data at internet scale.

Responsibilities

  • Design and implement multimodal (video, audio, text, etc.) web crawlers for scraping and indexing petabytes of data.
  • Create large-scale data processing pipelines using tools like Ray, Apache Spark, Apache Flink, Google BigQuery, etc.
  • Implement and scale deduplication techniques across modalities and apply heuristic and model-based techniques for parsing and filtering crawled data.
  • Identify new data sources for inclusion in pre- and post-training datasets.

What We’re Looking For

  • Strong proficiency in distributed computing and parallel processing techniques.
  • Obsession with details, reliability, and good testing to ensure data quality and integrity.
  • Experience with designing and maintaining high-performance, scalable data architectures.
  • Ability to design, develop, and operate an LLM data pipeline from web scraping to data loading.

Our Culture

  • Integrity: Words and actions should be aligned.
  • Hands-on: At Magic, everyone is building.
  • Teamwork: We move as one team, not N individuals.
  • Focus: Safely deploy AGI. Everything else is noise.
  • Quality: Magic should feel like magic.

Compensation, Benefits, and Perks (US)

  • Annual salary range: $100K - $900K
  • Equity is a significant part of total compensation, in addition to salary.
  • 401(k) plan with 6% salary matching.
  • Generous health, dental, and vision insurance for you and your dependents.
  • Unlimited paid time off.
  • Option to work in-person in SF or remotely.
  • Visa sponsorship and relocation stipend to bring you to SF.
  • A small, fast-paced, highly focused team.
