Software Engineer - Pretraining Data
Magic
About Magic
Magic’s mission is to build safe AGI that accelerates humanity’s progress on the world’s most important problems. We believe the most promising path to safe AGI lies in automating research and code generation to improve models and solve alignment more reliably than humans can alone. Our approach combines frontier-scale pre-training, domain-specific RL, ultra-long context, and test-time compute to achieve this goal.
About the Role
As a Software Engineer working on our pretraining data, you will write efficient and robust pipelines for giant, multimodal datasets. You will develop and optimize web scraping techniques to harvest and maintain data at internet scale.
Responsibilities
- Design and implement multimodal (video, audio, text, etc.) web crawlers for scraping and indexing petabytes of data.
- Create large-scale data processing pipelines using tools like Ray, Apache Spark, Apache Flink, Google BigQuery, etc.
- Implement and scale deduplication techniques across modalities and apply heuristic and model-based techniques for parsing and filtering crawled data.
- Identify new data sources for inclusion in pre-training and post-training datasets.
What We’re Looking For
- Strong proficiency in distributed computing and parallel processing techniques.
- Obsession with details, reliability, and good testing to ensure data quality and integrity.
- Experience with designing and maintaining high-performance, scalable data architectures.
- Ability to design, develop, and operate an LLM data pipeline from web scraping to data loading.
Our Culture
- Integrity: Words and actions should be aligned.
- Hands-on: At Magic, everyone is building.
- Teamwork: We move as one team, not N individuals.
- Focus: Safely deploy AGI. Everything else is noise.
- Quality: Magic should feel like magic.
Compensation, Benefits, and Perks (US)
- Annual salary range: $100K - $900K
- Equity is a significant part of total compensation, in addition to salary.
- 401(k) plan with 6% salary matching.
- Generous health, dental, and vision insurance for you and your dependents.
- Unlimited paid time off.
- Option to work in-person in SF or remotely.
- Visa sponsorship and relocation stipend to bring you to SF.
- A small, fast-paced, highly focused team.