Research Engineer, Pre-training Data Processing

OpenAI

About The Team

At OpenAI, we strongly believe in the importance of data and have seen repeatedly how large an impact a focus on data quality can have across all of our projects. The Pre-training Data Processing team brings this focus to the pre-training of our flagship GPT models, owning the pipelines that turn raw data into the high-quality, diverse, and multimodal datasets used to train our largest models. We work closely with teams focused on data acquisition, data quality, and multimodal data throughout Research. Most recently, in collaboration with these groups, we were responsible for building the dataset used to pre-train OpenAI’s newest multimodal model, GPT-4o.

In addition to building new pre-training datasets, we collaborate on data research and acquisition with teams in Pre-training and Multimodal to explore ways to get more out of data, including questions around efficiency, efficacy, and diversity. We also own and continuously improve the infrastructure used across several teams to prepare data for training models small and large.

About The Role

As a Research Engineer here, you will be responsible for building AI systems that can perform previously impossible tasks or achieve unprecedented levels of performance. We're looking for people with solid engineering skills who are comfortable working with large distributed systems and strive to write quality, well-tested code.

The most outstanding deep learning results are increasingly attained at massive scale, and these results require engineers who are comfortable working in large distributed systems. We expect engineering to play a key role in most major future advances in AI.

In This Role, You Will

  • Build and own data pipelines operating on internet-scale data spanning the text, image, and audio modalities.
  • Collaborate with many teams within Pre-training and across the company to incorporate our latest and greatest research into pre-training datasets.
  • Research new methods for improving our datasets alongside researchers within Pre-training.

You Might Thrive In This Role If You

  • Enjoy working at the cutting edge of large language model research.
  • Have experience running complicated processing on very large datasets.
  • Are comfortable working in a fast-paced, dynamic environment - research can evolve quite rapidly!

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.

Benefits

  • Equal opportunity employer
  • Diversity and inclusion initiatives
  • Reasonable accommodations for disabilities
