
Site Reliability Engineer (SRE) - Stability AI

Stability AI

About Stability AI

Stability AI is a community- and mission-driven open artificial intelligence company that cares deeply about real-world implications and applications. Our most significant advances come from the diversity of our teams and our work across multiple disciplines. We are unafraid to go against established norms and explore creativity. We are motivated to generate breakthrough ideas and convert them into tangible solutions. Our vibrant communities consist of experts, leaders, and partners across the globe who are developing cutting-edge open AI models for Image, Language, Audio, Video, and 3D.

Job Description

Stability AI’s Security team is looking for a Site Reliability Engineer (SRE) to help shape our cloud infrastructure. The successful candidate will work closely with the IT, security, SRE, and engineering teams to improve reliability across our environment. Candidates should have the initiative to build and improve a maturing cloud landscape.

Responsibilities

  • Implementing and maintaining infrastructure as code using Terraform
  • Supporting container orchestration platforms such as Kubernetes or ECS
  • Participating in incident management and root cause analysis to improve system reliability
  • Contributing to cloud security practices and resource tagging strategies

Qualifications

  • Experience collaborating with development teams to enhance CI/CD pipelines
  • Cloud security experience
  • Experience training and working with generative models
  • Background in software development or automation scripting
  • Knowledge of Grafana, ELK stack, or similar tools
  • Involvement in the SRE or DevOps community

Equal Employment Opportunity

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or other legally protected statuses.

Benefits

  • Remote work flexibility

Similar jobs


Hasura

Site Reliability Engineer (SRE) - Hasura Cloud

Join Hasura as a Site Reliability Engineer to ensure smooth operation of Hasura Cloud systems, working remotely from India.

NICE

Senior Cloud Site Reliability Engineer

Senior Cloud Site Reliability Engineer role focusing on enhancing cloud service reliability and efficiency.

Hasura

Senior Site Reliability Engineer (SRE) - Hasura Cloud

Join Hasura as a Senior Site Reliability Engineer to maintain and scale Hasura Cloud. Remote role in the US with competitive salary and benefits.

Hasura

Senior Site Reliability Engineer (SRE) - Hasura Cloud

Join Hasura as a Senior Site Reliability Engineer to maintain and enhance Hasura Cloud's reliability and performance.

Stability AI

Remote Data Engineer - Research

Join Stability AI as a Remote Data Engineer to build scalable data infrastructure for AI models.

GitLab

Site Reliability Engineer - Delivery: Deployments, North America

Remote Site Reliability Engineer specializing in Delivery: Deployments at GitLab, focusing on improving delivery platforms and tooling.

Stability AI

Senior Backend Engineer (AI)

Join Stability AI as a Senior Backend Engineer to develop REST APIs and AI/ML services for Generative AI models.

Algolia

Senior Site Reliability Engineer

Join Algolia as a Senior Site Reliability Engineer to enhance search product reliability and scalability. Remote work available.

Stability AI

Senior Data Engineer

Join Stability AI as a Senior Data Engineer to build scalable data infrastructure for AI models. Remote work from Germany.

Stability AI

Senior Data Platform Engineer

Senior Data Platform Engineer specializing in AWS and GCP services, data pipelines, and cloud infrastructure.

IBM

SRE Lead at IBM

Lead SRE role at IBM, overseeing system reliability, implementing best practices, and mentoring in New York.

Happening

Site Reliability Engineer - Enablement

Join Happening as a Site Reliability Engineer to enhance gaming operations' performance and reliability using Kubernetes, Terraform, and more.

OpenAI

Senior Software Engineer, Observability

Join OpenAI as a Senior Software Engineer in Observability, ensuring system reliability and scalability in a fast-paced environment.

Lightspeed Commerce

Senior Site Reliability Expert

Join Lightspeed as a Senior Site Reliability Expert in Amsterdam. Work on cloud infrastructure, automation, and high availability systems.

MongoDB

Senior Site Reliability Engineer

Join MongoDB as a Senior Site Reliability Engineer in Berlin to design and build global cloud infrastructure, ensuring reliability and performance.

Microsoft

Senior Site Reliability Engineer

Join Microsoft as a Senior Site Reliability Engineer to design and deliver Office 365 government cloud services.

The Workshop

Site Reliability Engineering Manager

Lead a DevOps team in a dynamic IT environment, focusing on reliability engineering and cloud solutions.

Anduril Industries

Software Reliability Engineer

Join Anduril Industries as a Software Reliability Engineer in Seattle, WA. Develop cutting-edge software for electronic warfare systems.

CrowdStrike

Senior Software Engineer - Cloud Platform Reliability

Join CrowdStrike as a Senior Software Engineer focusing on cloud platform reliability and scalability in a remote-first role.

Google

Senior Software Engineer, Site Reliability Engineering

Senior Software Engineer role in Site Reliability at Google, Dublin. Focus on large-scale systems and automation.

Pure Storage

Site Reliability Engineer, FlashArray

Join Pure Storage as a Site Reliability Engineer in Prague, focusing on cloud infrastructure uptime and incident response.

IBM

Senior Site Reliability Engineer

Senior Site Reliability Engineer at IBM in Cracow, skilled in AWS, Kubernetes, Linux, and Terraform.

Tint

Senior Site Reliability Engineer (AWS, Node.js)

Join Tint as a Senior Site Reliability Engineer to enhance AWS infrastructure efficiency and reliability. Remote role in the US.

Paradigm

Principal Infrastructure Engineer - DevSecOps

Lead DevSecOps Engineer role focusing on site reliability, security, and cloud infrastructure management with competitive benefits.