MLOps Engagement Engineer

Nebius AI

Join Our Team as an MLOps Engagement Engineer

Nebius AI is seeking an experienced MLOps Engagement Engineer to join our team. This role is pivotal in designing, implementing, and maintaining large-scale distributed machine learning (ML) training and inference workflows. As an MLOps Engagement Engineer, you will work closely with our Solutions Architect and support teams, providing hands-on expertise to our largest customers and internal teams.

Key Responsibilities

  • Design and Implement Distributed ML Workflows: Develop and maintain scalable, efficient, and reliable ML training pipelines on Kubernetes (K8s) and Slurm, leveraging containerization (e.g., Docker) and orchestration.
  • Optimize ML Training Performance: Collaborate with data scientists and engineers to enhance ML model training and inference performance.
  • Develop Solutions Library: Design, deploy, and manage K8s and Slurm clusters for large-scale ML training, utilizing our ready-to-deploy solutions.
  • Integrate with ML Frameworks: Ensure seamless execution of distributed ML training workloads by integrating K8s and Slurm with popular ML frameworks like TensorFlow, PyTorch, or MXNet.
  • Monitor and Troubleshoot: Develop monitoring and logging tools to track distributed training performance, identify bottlenecks, and troubleshoot issues.
  • Develop Automation Tools: Create automation scripts and tools to streamline ML training workflows, leveraging technologies like Ansible, Terraform, or Python.
  • Stay Updated with Industry Trends: Participate in industry conferences, meetups, and online forums to stay abreast of the latest developments in MLOps, K8s, Slurm, and ML.

Required Qualifications

  • 3+ years of experience in MLOps, DevOps, or a related field.
  • Strong experience with Kubernetes and containerization (e.g., Docker).
  • Experience with Slurm or other distributed computing frameworks.
  • Proficiency in Python, with experience in ML frameworks like TensorFlow, PyTorch, or MXNet.
  • Strong understanding of distributed computing concepts, including parallel processing and job scheduling.
  • Experience with automation tools like Ansible or Terraform, or with automation scripting in Python.
  • Excellent problem-solving skills with the ability to troubleshoot complex issues.
  • Strong communication and collaboration skills, with experience working with cross-functional teams.

Preferred Qualifications

  • Experience with cloud providers like AWS, GCP, or Azure.
  • Knowledge of ML model serving and deployment.
  • Familiarity with CI/CD pipelines and tools like Jenkins, GitLab CI/CD, or CircleCI.
  • Experience with monitoring and logging tools like Prometheus, Grafana, or ELK Stack.

Why Join Us?

Nebius AI is a leading AI cloud platform with one of the largest GPU capacities in Europe. We offer a unique opportunity to work with cutting-edge technology and a team of highly skilled engineers. If you are passionate about AI and ML and eager to tackle new challenges, we invite you to join our team.

Work Environment

This is a hybrid position: you can work both on-site in our Amsterdam office and remotely. We are committed to flexible working arrangements that support work-life balance and professional growth.

Apply today to become a part of our innovative team and contribute to the future of AI and ML at Nebius AI.

Benefits

  • Flexible work environment
  • Opportunities for professional development
  • Access to cutting-edge technology
