Position Overview
In today's cloud-centric world, the reliability of cloud platforms underpins everything, and keeping these systems healthy is crucial. Working at the frontier of cloud technology, Site Reliability Engineering (SRE) strengthens the availability of our cloud infrastructure and services. As we pivot to a cloud-first strategy, Pure Storage is seeking Site Reliability Engineers who can play a leading role in our cloud-focused transformation across the broader engineering organization. This team will be passionate about ensuring impeccable uptime, seamless scalability, strong observability, and unmatched availability.
This position is based at our office in Prague, Czech Republic. It is a software development position on a team distributed across the US (California) and Europe (Prague).
Responsibilities
- Become part of our nascent SRE team spanning the US and Europe
- Be responsible for the uptime and reliability of our core services and infrastructure, including proactive monitoring, incident response, and incident resolution
- Maintain a 24x7 production environment with a high level of service availability
- Manage operational issues and drive root cause analysis and resolution of production issues
- Explore and implement new cloud and high availability (HA) technologies and tools
- Partner with development teams to define and implement improvements to the services architecture
- Automate and orchestrate the manual processes required to operate and deploy cloud services
- Set up and improve service health monitoring, observability, metrics collection and reporting, and alerting
- Work with engineering to establish a support structure, including runbooks, to ensure uptime and customer success
Qualifications
- 8+ years of experience as a Software Engineer, SRE, or DevOps engineer supporting globally distributed SaaS services
- Experience with one or more of the following: Java, Python, Go, Perl, or Ruby
- Proven ability to design, develop, and operate commercially successful cloud services with high availability and well-defined SLAs
- Experience with infrastructure as code (IaC), automation, and configuration management using tools such as Terraform, Ansible, Puppet, Chef, CloudFormation, or ARM templates
- Experience with virtualization, containers, and container management systems such as Kubernetes
- Experience setting up monitoring of production services using ELK or a similar stack
- Practical experience setting up support processes using tools such as PagerDuty
- Deep understanding of the software delivery process and what it takes to “go live”
- In-depth knowledge of a public cloud platform such as AWS, Azure or GCP is a must
- Experience with Unix/Linux operating system internals and administration, or with networking
- BS or higher in Computer Science, Computer Engineering, or a related field, and equivalent practical experience
Benefits
- Flexible time off
- Wellness resources
- Company-sponsored events
Similar jobs
Senior Site Reliability Engineer
Senior Site Reliability Engineer at IBM in Cracow, skilled in AWS, Kubernetes, Linux, and Terraform.
Site Reliability Engineer (SRE) - Stability AI
Join Stability AI as a Site Reliability Engineer (SRE) to enhance cloud infrastructure and system reliability. Remote work available.
Senior Production SRE Engineer - Storage
Join NVIDIA as a Senior Production SRE Engineer - Storage, ensuring reliability of GPU cloud services with cutting-edge technologies.
Cloud Ops Engineer
Join Wrike as a Cloud Ops Engineer in Prague. Manage cloud infrastructure, ensure uptime, and work with GCP, AWS, Kubernetes, and more.
Senior Cloud Site Reliability Engineer
Senior Cloud Site Reliability Engineer role focusing on enhancing cloud service reliability and efficiency.
Site Reliability Engineer - Enablement
Join Happening as a Site Reliability Engineer to enhance gaming operations' performance and reliability using Kubernetes, Terraform, and more.
Senior Site Reliability Expert
Join Lightspeed as a Senior Site Reliability Expert in Amsterdam. Work on cloud infrastructure, automation, and high availability systems.
SRE Lead at IBM
Lead SRE role at IBM in New York, overseeing system reliability, implementing best practices, and mentoring.
Site Reliability Engineering Manager
Lead a DevOps team in a dynamic IT environment, focusing on reliability engineering and cloud solutions.
Senior Site Reliability Engineer
Join Valtech as a Senior Site Reliability Engineer in Sofia, Bulgaria. Work with AWS, GCP, and Azure in a hybrid environment.
Senior Site Reliability Engineer - Platform
Join Monta as a Senior Site Reliability Engineer to manage AWS Kubernetes infrastructure and enhance EV charging solutions.
Lead Software Engineer – SRE (Relocation to Bangkok)
Lead SRE Software Engineer role in Brno, Czechia, with relocation to Bangkok, a focus on system reliability, and collaboration within a diverse team.
Senior Site Reliability Engineer
Join MongoDB as a Senior Site Reliability Engineer in Berlin to design and build global cloud infrastructure, ensuring reliability and performance.
Senior Staff Software Engineer – Backend – Singularity Data Lake
Senior Staff Software Engineer for backend development in Prague, focusing on high-scale data processing and distributed systems.
Senior Site Reliability Engineer
Senior Site Reliability Engineer role in Vilnius, focusing on AWS, Linux, and microservices architecture.
Senior Site Reliability Engineer - Production Platform
Join Adyen as a Senior Site Reliability Engineer in Amsterdam, focusing on automation, containerization, and distributed systems.
Site Reliability Engineer, CI/CD
Join Vinted as a Site Reliability Engineer, CI/CD in Kaunas, Lithuania. Help scale its CI/CD infrastructure and enhance the developer experience.
SW Engineering Manager
Lead a team of engineers in developing SaaS solutions, ensuring best practices and continuous improvement in Prague.
Site Reliability Engineer (SRE) - Hasura Cloud
Join Hasura as a Site Reliability Engineer to ensure smooth operation of Hasura Cloud systems, working remotely from India.
Site Reliability Engineer - IBM Power Systems
Join IBM as a Site Reliability Engineer specializing in IBM Power Systems in Poughkeepsie, NY. Engage in automation, scalability testing, and system performance.
Senior Site Reliability Engineer
Join Adyen as a Senior Site Reliability Engineer in Amsterdam to ensure platform stability and reliability through automation and troubleshooting.
Senior Site Reliability Engineer (SRE) - Hasura Cloud
Join Hasura as a Senior Site Reliability Engineer to maintain and enhance Hasura Cloud's reliability and performance.
Software Reliability Engineer
Join Anduril Industries as a Software Reliability Engineer in Seattle, WA. Develop cutting-edge software for electronic warfare systems.
Senior Site Reliability Engineer (SRE) - Hasura Cloud
Join Hasura as a Senior Site Reliability Engineer to maintain and scale Hasura Cloud. Remote role in the US with competitive salary and benefits.