Research Engineer, Pre-training Data Processing
OpenAI
About The Team
At OpenAI, we strongly believe in the importance of data and have seen repeatedly how large an impact a focus on data quality can have across all of our projects. The Pre-training Data Processing team brings this focus to the pre-training of our flagship GPT models, owning the pipelines that turn raw data into the high-quality, diverse, multimodal datasets used to train our largest models. We work closely with teams focused on data acquisition, data quality, and multimodal data throughout Research. Most recently, in collaboration with these groups, we were responsible for building the dataset used to pre-train OpenAI’s newest multimodal model, GPT-4o.
In addition to building new pre-training datasets, we collaborate on data research and acquisition with teams in Pre-training and Multimodal to explore ways to get more out of data, including questions around efficiency, efficacy, and diversity. We also own and continuously improve the infrastructure used across several teams to prepare data for training models small and large.
About The Role
As a Research Engineer here, you will be responsible for building AI systems that can perform previously impossible tasks or achieve unprecedented levels of performance. We're looking for people with solid engineering skills who are comfortable working with large distributed systems and strive to write quality, well-tested code.
The most outstanding deep learning results are increasingly attained at massive scale, and these results require engineers who are comfortable working in large distributed systems. We expect engineering to play a key role in most major future advances in AI.
In This Role, You Will
- Build and own data pipelines operating on internet-scale data spanning the text, image, and audio modalities.
- Collaborate with many teams within Pre-training and across the company to incorporate our latest and greatest research into pre-training datasets.
- Research new methods for improving our datasets alongside researchers within Pre-training.
You Might Thrive In This Role If You
- Enjoy working at the cutting edge of large language model research.
- Have experience running complicated processing on very large datasets.
- Are comfortable working in a fast-paced, dynamic environment - research can evolve quite rapidly!
About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.
We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.