Research Engineer, Pre-training Data Processing

OpenAI

About The Team

At OpenAI, we strongly believe in the importance of data and have seen repeatedly how large an impact a focus on data quality can have across all of our projects. The Pre-training Data Processing team brings this focus to the pre-training of our flagship GPT models, owning the pipelines that turn raw data into the high-quality, diverse, and multimodal datasets used to train our largest models. We work closely with teams focused on data acquisition, data quality, and multimodal data throughout Research. Most recently, in collaboration with these groups, we were responsible for building the dataset used to pre-train OpenAI’s newest multimodal model, GPT-4o.

In addition to building new pre-training datasets, we collaborate on data research and acquisition with teams in Pre-training and Multimodal to explore ways to get more out of data, including questions around efficiency, efficacy, and diversity. We also own and continuously improve the infrastructure used across several teams to prepare data for training models small and large.

About The Role

As a Research Engineer here, you will be responsible for building AI systems that can perform previously impossible tasks or achieve unprecedented levels of performance. We're looking for people with solid engineering skills who are comfortable working with large distributed systems and strive to write quality, well-tested code.

The most outstanding deep learning results are increasingly attained at massive scale, and these results require engineers who are comfortable working in large distributed systems. We expect engineering to play a key role in most major future advances in AI.

In This Role, You Will

  • Build and own data pipelines operating on internet-scale data spanning the text, image, and audio modalities.
  • Collaborate with many teams within Pre-training and across the company to incorporate our latest and greatest research into pre-training datasets.
  • Research new methods for improving our datasets alongside researchers within Pre-training.

You Might Thrive In This Role If You

  • Enjoy working at the cutting edge of large language model research.
  • Have experience running complicated processing on very large datasets.
  • Are comfortable working in a fast-paced, dynamic environment - research can evolve quite rapidly!

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.

Benefits

  • Equal opportunity employer
  • Diversity and inclusion initiatives
  • Reasonable accommodations for disabilities
