Software Engineer - Pretraining Data

Magic

About Magic

Magic’s mission is to build safe AGI that accelerates humanity’s progress on the world’s most important problems. We believe the most promising path to safe AGI lies in automating research and code generation to improve models and solve alignment more reliably than humans can alone. Our approach combines frontier-scale pre-training, domain-specific RL, ultra-long context, and test-time compute to achieve this goal.

About the Role

As a Software Engineer working on our pretraining data, you will write efficient and robust pipelines for giant, multimodal datasets. You will develop and optimize web scraping techniques to harvest and maintain data at internet scale.

Responsibilities

  • Design and implement multimodal (video, audio, text, etc.) web crawlers for scraping and indexing petabytes of data.
  • Create large-scale data processing pipelines using tools like Ray, Apache Spark, Apache Flink, Google BigQuery, etc.
  • Implement and scale deduplication techniques across modalities and apply heuristic and model-based techniques for parsing and filtering crawled data.
  • Identify new data sources for inclusion in pre/post-training datasets.

What We’re Looking For

  • Strong proficiency in distributed computing and parallel processing techniques.
  • Obsession with details, reliability, and good testing to ensure data quality and integrity.
  • Experience with designing and maintaining high-performance, scalable data architectures.
  • Ability to design, develop, and operate an LLM data pipeline from web scraping to data loading.

Our Culture

  • Integrity: Words and actions should be aligned.
  • Hands-on: At Magic, everyone is building.
  • Teamwork: We move as one team, not N individuals.
  • Focus: Safely deploy AGI. Everything else is noise.
  • Quality: Magic should feel like magic.

Compensation, Benefits, and Perks (US)

  • Annual salary range: $100K - $900K
  • Equity is a significant part of total compensation, in addition to salary.
  • 401(k) plan with 6% salary matching.
  • Generous health, dental, and vision insurance for you and your dependents.
  • Unlimited paid time off.
  • Option to work in-person in SF or remotely.
  • Visa sponsorship and relocation stipend to bring you to SF.
  • A small, fast-paced, highly focused team.
