Software Engineer - Pretraining Data

Magic

About Magic

Magic’s mission is to build safe AGI that accelerates humanity’s progress on the world’s most important problems. We believe the most promising path to safe AGI lies in automating research and code generation to improve models and solve alignment more reliably than humans can alone. Our approach combines frontier-scale pre-training, domain-specific RL, ultra-long context, and test-time compute to achieve this goal.

About the Role

As a Software Engineer working on our pretraining data, you will write efficient, robust pipelines for massive multimodal datasets. You will develop and optimize web scraping techniques to harvest and maintain data at internet scale.

Responsibilities

  • Design and implement multimodal (video, audio, text, etc.) web crawlers for scraping and indexing petabytes of data.
  • Create large-scale data processing pipelines using tools such as Ray, Apache Spark, Apache Flink, and Google BigQuery.
  • Implement and scale deduplication across modalities, and apply heuristic and model-based methods to parse and filter crawled data.
  • Identify new data sources for inclusion in pre- and post-training datasets.

What We’re Looking For

  • Strong proficiency in distributed computing and parallel processing techniques.
  • Obsession with detail, reliability, and thorough testing to ensure data quality and integrity.
  • Experience with designing and maintaining high-performance, scalable data architectures.
  • Ability to design, develop, and operate an LLM data pipeline from web scraping to data loading.

Our Culture

  • Integrity: Words and actions should be aligned.
  • Hands-on: At Magic, everyone is building.
  • Teamwork: We move as one team, not N individuals.
  • Focus: Safely deploy AGI. Everything else is noise.
  • Quality: Magic should feel like magic.

Compensation, Benefits, and Perks (US)

  • Annual salary range: $100K - $900K
  • Equity is a significant part of total compensation, in addition to salary.
  • 401(k) plan with 6% salary matching.
  • Generous health, dental, and vision insurance for you and your dependents.
  • Unlimited paid time off.
  • Option to work in person in SF or remotely.
  • Visa sponsorship and relocation stipend to bring you to SF.
  • A small, fast-paced, highly focused team.
