About Magic
Magic’s mission is to build safe AGI that accelerates humanity’s progress on the world’s most important problems. We believe the most promising path to safe AGI lies in automating research and code generation to improve models and solve alignment more reliably than humans can alone. Our approach combines frontier-scale pre-training, domain-specific RL, ultra-long context, and test-time compute to achieve this goal.
About the Role
As a Software Engineer working on our pre-training data, you will write efficient and robust pipelines for giant, multimodal datasets. You will develop and optimize web scraping techniques to harvest and maintain data at internet scale.
Responsibilities
- Design and implement multimodal (video, audio, text, etc.) web crawlers for scraping and indexing petabytes of data.
- Create large-scale data processing pipelines using tools like Ray, Apache Spark, Apache Flink, Google BigQuery, etc.
- Implement and scale deduplication techniques across modalities and apply heuristic and model-based techniques for parsing and filtering crawled data.
- Identify new data sources for inclusion in pre-training and post-training datasets.
What We’re Looking For
- Strong proficiency in distributed computing and parallel processing techniques.
- Obsession with details, reliability, and good testing to ensure data quality and integrity.
- Experience with designing and maintaining high-performance, scalable data architectures.
- Ability to design, develop, and operate an LLM data pipeline from web scraping to data loading.
Our Culture
- Integrity: Words and actions should be aligned.
- Hands-on: At Magic, everyone is building.
- Teamwork: We move as one team, not N individuals.
- Focus: Safely deploy AGI. Everything else is noise.
- Quality: Magic should feel like magic.
Compensation, Benefits, and Perks (US)
- Annual salary range: $100K - $900K
- Equity is a significant part of total compensation, in addition to salary.
- 401(k) plan with 6% salary matching.
- Generous health, dental, and vision insurance for you and your dependents.
- Unlimited paid time off.
- Option to work in-person in SF or remotely.
- Visa sponsorship and relocation stipend to bring you to SF.
- A small, fast-paced, highly focused team.