Research Engineer, Pre-training Data Processing

OpenAI

About The Team

At OpenAI, we strongly believe in the importance of data and have seen repeatedly how large an impact a focus on data quality can have across all of our projects. The Pre-training Data Processing team brings this focus to the pre-training of our flagship GPT models, owning the pipelines that turn raw data into the high-quality, diverse, and multimodal datasets used to train our largest models. We work closely with teams focused on data acquisition, data quality, and multimodal data throughout Research. Most recently, in collaboration with these groups, we were responsible for building the dataset used to pre-train OpenAI’s newest multimodal model, GPT-4o.

In addition to building new pre-training datasets, we collaborate on data research and acquisition with teams in Pre-training and Multimodal to explore ways to get more out of data, including questions around efficiency, efficacy, and diversity. We also own and continuously improve the infrastructure used across several teams to prepare data for training models small and large.

About The Role

As a Research Engineer here, you will be responsible for building AI systems that can perform previously impossible tasks or achieve unprecedented levels of performance. We're looking for people with solid engineering skills who are comfortable working with large distributed systems and strive to write quality, well-tested code.

The most outstanding deep learning results are increasingly attained at massive scale, and these results require engineers who are comfortable working in large distributed systems. We expect engineering to play a key role in most major future advances in AI.

In This Role, You Will

  • Build and own data pipelines operating on internet-scale data spanning the text, image, and audio modalities.
  • Collaborate with many teams within Pre-training and across the company to incorporate our latest and greatest research into pre-training datasets.
  • Research new methods for improving our datasets alongside researchers within Pre-training.

You Might Thrive In This Role If You

  • Enjoy working at the cutting edge of large language model research.
  • Have experience running complicated processing on very large datasets.
  • Are comfortable working in a fast-paced, dynamic environment: research can evolve quite rapidly!

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.

Benefits

  • Equal opportunity employer
  • Diversity and inclusion initiatives
  • Reasonable accommodations for disabilities
