MLOps Engagement Engineer

Nebius AI

Join Our Team as an MLOps Engagement Engineer

Nebius AI is seeking an experienced MLOps Engagement Engineer to join our team. This role is pivotal in designing, implementing, and maintaining large-scale distributed machine learning (ML) training and inference workflows. As an MLOps Engagement Engineer, you will work closely with our Solutions Architect and support teams, providing hands-on expertise to our largest customers and internal teams.

Key Responsibilities

  • Design and Implement Distributed ML Workflows: Develop and maintain scalable, efficient, and reliable ML training pipelines on Kubernetes (K8s) and Slurm, leveraging containerization (e.g., Docker) and orchestration.
  • Optimize ML Training Performance: Collaborate with data scientists and engineers to enhance ML model training and inference performance.
  • Develop Solutions Library: Design, deploy, and manage K8s and Slurm clusters for large-scale ML training, utilizing our ready-to-deploy solutions.
  • Integrate with ML Frameworks: Ensure seamless execution of distributed ML training workloads by integrating K8s and Slurm with popular ML frameworks like TensorFlow, PyTorch, or MXNet.
  • Monitor and Troubleshoot: Develop monitoring and logging tools to track distributed training performance, identify bottlenecks, and troubleshoot issues.
  • Develop Automation Tools: Create automation scripts and tools to streamline ML training workflows, leveraging technologies like Ansible, Terraform, or Python.
  • Stay Updated with Industry Trends: Participate in industry conferences, meetups, and online forums to stay abreast of the latest developments in MLOps, K8s, Slurm, and ML.

Required Qualifications

  • 3+ years of experience in MLOps, DevOps, or a related field.
  • Strong experience with Kubernetes and containerization (e.g., Docker).
  • Experience with Slurm or other distributed computing frameworks.
  • Proficiency in Python, with experience in ML frameworks like TensorFlow, PyTorch, or MXNet.
  • Strong understanding of distributed computing concepts, including parallel processing and job scheduling.
  • Experience with automation tools like Ansible or Terraform, or with Python scripting.
  • Excellent problem-solving skills with the ability to troubleshoot complex issues.
  • Strong communication and collaboration skills, with experience working with cross-functional teams.

Preferred Qualifications

  • Experience with cloud providers like AWS, GCP, or Azure.
  • Knowledge of ML model serving and deployment.
  • Familiarity with CI/CD pipelines and tools like Jenkins, GitLab CI/CD, or CircleCI.
  • Experience with monitoring and logging tools like Prometheus, Grafana, or ELK Stack.

Why Join Us?

Nebius AI is a leading AI cloud platform with one of the largest GPU capacities in Europe. We offer a unique opportunity to work with cutting-edge technology and a team of highly skilled engineers. If you are passionate about AI and ML and eager to tackle new challenges, we invite you to join our team.

Work Environment

This position offers a hybrid work environment, allowing you to work both on-site in our Amsterdam office and remotely. We are committed to providing a flexible work environment that supports work-life balance and professional growth.

Apply today to become a part of our innovative team and contribute to the future of AI and ML at Nebius AI.

Benefits

  • Flexible work environment
  • Opportunities for professional development
  • Access to cutting-edge technology
