Research Engineer, Pre-training Data Processing

OpenAI

About The Team

At OpenAI, we strongly believe in the importance of data and have repeatedly seen how large an impact a focus on data quality can have across all of our projects. The Pre-training Data Processing team brings this focus to the pre-training of our flagship GPT models, owning the pipelines that turn raw data into the high-quality, diverse, and multimodal datasets used to train our largest models. We work closely with teams focused on data acquisition, data quality, and multimodal data throughout Research. Most recently, in collaboration with these groups, we were responsible for building the dataset used to pre-train OpenAI’s newest multimodal model, GPT-4o.

In addition to building new pre-training datasets, we collaborate on data research and acquisition with teams in Pre-training and Multimodal to explore ways to get more out of data, including questions around efficiency, efficacy, and diversity. We also own and continuously improve the infrastructure used across several teams to prepare data for training models small and large.

About The Role

As a Research Engineer here, you will be responsible for building AI systems that can perform previously impossible tasks or achieve unprecedented levels of performance. We're looking for people with solid engineering skills who are comfortable working with large distributed systems and strive to write quality, well-tested code.

The most outstanding deep learning results are increasingly attained at massive scale, and these results require engineers who are comfortable working in large distributed systems. We expect engineering to play a key role in most major future advances in AI.

In This Role, You Will

  • Build and own data pipelines operating on internet-scale data spanning the text, image, and audio modalities.
  • Collaborate with many teams within Pre-training and across the company to incorporate our latest and greatest research into pre-training datasets.
  • Research new methods for improving our datasets alongside researchers within Pre-training.

You Might Thrive In This Role If You

  • Enjoy working at the cutting edge of large language model research.
  • Have experience running complicated processing on very large datasets.
  • Are comfortable working in a fast-paced, dynamic environment; research can evolve quite rapidly!

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.
