MLOps Engagement Engineer

Nebius AI

Join Our Team as an MLOps Engagement Engineer

Nebius AI is seeking an experienced MLOps Engagement Engineer to join our dynamic team. This role is pivotal in designing, implementing, and maintaining large-scale distributed machine learning (ML) training and inference workflows. As an MLOps Engagement Engineer, you will work closely with our Solutions Architect and support team, providing hands-on expertise to our largest customers and internal teams.

Key Responsibilities

  • Design and Implement Distributed ML Workflows: Develop and maintain scalable, efficient, and reliable ML training pipelines on Kubernetes (K8s) and Slurm, leveraging containerization (e.g., Docker) and orchestration.
  • Optimize ML Training Performance: Collaborate with data scientists and engineers to enhance ML model training and inference performance.
  • Develop Solutions Library: Design, deploy, and manage K8s and Slurm clusters for large-scale ML training, utilizing our ready-to-deploy solutions.
  • Integrate with ML Frameworks: Ensure seamless execution of distributed ML training workloads by integrating K8s and Slurm with popular ML frameworks like TensorFlow, PyTorch, or MXNet.
  • Monitor and Troubleshoot: Develop monitoring and logging tools to track distributed training performance, identify bottlenecks, and troubleshoot issues.
  • Develop Automation Tools: Create automation scripts and tools to streamline ML training workflows, leveraging technologies like Ansible, Terraform, or Python.
  • Stay Updated with Industry Trends: Participate in industry conferences, meetups, and online forums to stay abreast of the latest developments in MLOps, K8s, Slurm, and ML.

Required Qualifications

  • 3+ years of experience in MLOps, DevOps, or a related field.
  • Strong experience with Kubernetes and containerization (e.g., Docker).
  • Experience with Slurm or other distributed computing frameworks.
  • Proficiency in Python, with experience in ML frameworks like TensorFlow, PyTorch, or MXNet.
  • Strong understanding of distributed computing concepts, including parallel processing and job scheduling.
  • Experience with automation tools like Ansible or Terraform, or with automation scripting in Python.
  • Excellent problem-solving skills with the ability to troubleshoot complex issues.
  • Strong communication and collaboration skills, with experience working with cross-functional teams.

Preferred Qualifications

  • Experience with cloud providers like AWS, GCP, or Azure.
  • Knowledge of ML model serving and deployment.
  • Familiarity with CI/CD pipelines and tools like Jenkins, GitLab CI/CD, or CircleCI.
  • Experience with monitoring and logging tools like Prometheus, Grafana, or ELK Stack.

Why Join Us?

Nebius AI is a leading AI cloud platform with one of the largest GPU capacities in Europe. We offer a unique opportunity to work with cutting-edge technology and a team of highly skilled engineers. If you are passionate about AI and ML and eager to tackle new challenges, we invite you to join our team.

Work Environment

This position offers a hybrid work environment, allowing you to work both on-site in our Amsterdam office and remotely. We are committed to providing a flexible work environment that supports work-life balance and professional growth.

Apply today to become a part of our innovative team and contribute to the future of AI and ML at Nebius AI.

Benefits

  • Flexible work environment
  • Opportunities for professional development
  • Access to cutting-edge technology
