Job Overview
Hasura is seeking a skilled Site Reliability Engineer (SRE) to join our team and ensure the smooth operation of Hasura Cloud systems. This role is crucial for maintaining system reliability and shipping updates without downtime. You will work remotely from India on a schedule aligned with US hours, with the option to work from our Bangalore office if you prefer.
Key Responsibilities
- Infrastructure Development: Build and maintain infrastructure using Terraform, Kubernetes, VMs, and bare metal instances.
- System Design: Design core infrastructure components that let Hasura Cloud scale to thousands of concurrent requests.
- Cloud Expansion: Expand Hasura Cloud's capabilities to support multiple cloud providers.
- Deployment Process Improvement: Enhance deployment processes to ensure reliability and minimize disruptions.
- Incident Response: Participate in a PagerDuty rotation to address availability incidents and support service engineers with customer issues.
- Proactive Issue Resolution: Use development time to address systemic issues and prevent future incidents.
- Monitoring and Alerts: Design intelligent monitoring systems that provide meaningful alerts based on symptoms rather than causes.
- Documentation and Automation: Document actions to create repeatable processes and automate tasks to improve efficiency.
- Production Debugging: Troubleshoot production issues across various services and stack levels.
- Infrastructure Growth Planning: Strategize the growth of Hasura Cloud's infrastructure.
Requirements
- Experience: 4+ years in a similar role, with a strong understanding of system behaviors, edge cases, and failure modes.
- Technical Skills: Proficiency in Linux, Unix Shell, Terraform, and programming languages such as Go and Python.
- Collaboration: Ability to work asynchronously with a globally distributed team and document processes thoroughly.
- Automation: Passion for building automation and tooling to streamline repetitive tasks.
- Cloud and Monitoring Tools: Experience with cloud providers (AWS, GCP, Azure) and monitoring tools (Honeycomb, Datadog, Prometheus, Grafana).
Nice to Have
- Familiarity with Hasura and its GraphQL APIs.
- Strong SQL skills, particularly with PostgreSQL.
- Experience in database management and scaling.
Working at Hasura
At Hasura, we empower developers to build modern applications quickly. Our team is dedicated to enhancing the developer experience and making our tools as user-friendly as possible. We offer a flexible work environment, allowing for remote or in-person collaboration at our offices in San Francisco and Bangalore.
Perks
- Remote & Hybrid Work Environment: Flexibility to work remotely or from our office spaces.
- Self-care Fridays: The second Friday of every month is a day off for personal rejuvenation.
- Equipment and Learning Allowance: Budgets for necessary tools and learning opportunities.
- Donation Matching: Annual fund to match donations to global organizations.
- Flexible Timings & PTO: Freedom to set work schedules and generous paid time off options.
Application Process
We encourage applications even if you don't meet all the requirements. We value diverse perspectives and are open to discussing any questions you may have about our culture and work processes during the interview.
Join us at Hasura and contribute to building a robust developer ecosystem with cutting-edge technology.