What You’ll Do
- Design and implement scalable, secure, and highly available Kubernetes clusters to support our growing application portfolio.
- Bootstrap new on-prem and managed Kubernetes environments from the ground up, including networking, storage, and security configurations.
- Extend our existing Kubernetes platforms with advanced features such as service mesh, serverless frameworks, and custom resource definitions (CRDs).
- Develop and maintain infrastructure-as-code (IaC) templates using Cluster API (CAPI) for automated cluster provisioning and configuration management.
- Implement robust monitoring, logging, and alerting solutions using OpenTelemetry to ensure platform health and performance.
- Optimize resource utilization and cost-effectiveness of Kubernetes deployments across multiple cloud providers.
- Collaborate with teams to design and implement CI/CD pipelines for containerized applications.
- Troubleshoot complex issues in production Kubernetes environments and lead incident response efforts.
You
- Have 5+ years bootstrapping, extending, and operating Kubernetes at scale (1,500+ nodes).
- Have 5+ years automating the provisioning, configuration management, and deployment of production systems.
- Have 5+ years building resilient, scalable systems with Python/Go.
- Have 5+ years managing and securing infrastructure at scale (2,000+ hosts).
- Possess sound experience with Infrastructure as Code (Terraform, Ansible, etc.).
- Possess sound knowledge of DevOps, Infrastructure, and Platform concepts.
- Possess strong development skills in Python or Golang.
- Possess strong proficiency with Linux command line and debugging tools.
Nice to Have
- Experience building complex hybrid environments (AWS and on-premises preferred).
- Experience with service mesh technologies (e.g., Istio, Linkerd) and serverless frameworks (e.g., Knative).
- Experience with multi-cluster or multi-cloud Kubernetes deployments.
- Experience in the machine learning or computer hardware industry.
- Certified Kubernetes Administrator (CKA) and/or Certified Kubernetes Application Developer (CKAD) certification.
- Contributions to open-source Kubernetes projects or tools.
- Familiarity with GitOps principles and tools like Argo CD or Flux.
About Lambda
- We offer generous cash & equity compensation.
- Investors include Gradient Ventures, Google’s AI-focused venture fund.
- We are experiencing extremely high demand for our systems, with quarter-over-quarter and year-over-year profitability.
- Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG.
- We have a wildly talented team of 300 that is growing fast.
- Health, dental, and vision coverage for you and your dependents.
- Commuter/Work from home stipends for select roles.
- 401(k) plan with 2% company match.
- Flexible Paid Time Off Plan that we all actually use.
A Final Note
- You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.
- Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.