SRE Lead at IBM

IBM

Introduction

IBM Technology Zone is the one-stop shop for IBMers and business partners to build, show, and share solutions built on IBM technologies, helping to progress opportunities and drive customer adoption.

Role Overview

The SRE Lead at IBM will be responsible for overseeing the reliability and performance of our systems, working to ensure the highest levels of uptime and reliability. This role will involve leading SRE practices, mentoring junior engineers, and collaborating with various teams to implement best practices in site reliability.

Responsibilities

  • Lead the implementation of SRE best practices, including incident management, post-mortem analysis, and capacity planning.
  • Collaborate in building and maintaining TechZone automation that deploys and provisions environments at scale.
  • Ensure the reliability, availability, and performance of our applications and infrastructure through proactive monitoring, incident response, and optimization.
  • Work closely with software engineering, DevOps, and operations teams to design and implement reliable, scalable, and secure systems.
  • Help drive automation efforts to reduce manual intervention, enhance deployment processes, and improve overall system efficiency.
  • Mentor and guide junior team members, fostering a culture of continuous improvement and learning within the team.
  • Lead incident response efforts, conduct root cause analysis, and implement preventive measures to avoid future incidents.
  • Maintain detailed and up-to-date documentation of system architecture, operational procedures, and incident reports.

Required Technical and Professional Expertise

  • Proven experience in site reliability engineering, DevOps, or a related field, with a track record of managing and optimizing complex systems.
  • Strong proficiency in cloud platforms (e.g., AWS, Azure, GCP), containerization technologies (e.g., Docker, Kubernetes), and infrastructure as code tools (e.g., Terraform, Ansible).
  • Proficiency in one or more programming languages (e.g., Python, Go, Java) for scripting and automation.
  • Excellent analytical and problem-solving skills, with the ability to troubleshoot and resolve complex issues efficiently.
  • Strong communication and leadership skills, with the ability to collaborate effectively with cross-functional teams and mentor junior engineers.

Preferred Technical and Professional Expertise

  • Familiarity with Hybrid Cloud technologies and strategy, including IBM's OCP, Cloud Pak, and Services strategy, and how these elements bring value to clients and end users.
  • Experience using telemetry and monitoring software.

About Business Unit

IBM has a global presence, operating in more than 175 countries with a broad-based geographic distribution of revenue. The company’s Global Markets organization is a strategic sales business unit that manages IBM’s global footprint, working closely with dedicated country-based operating units to serve clients locally. These country teams have client relationship managers who lead integrated teams of consultants, solution specialists and delivery professionals to enable clients’ growth and innovation. By complementing local expertise with global experience and digital capabilities, IBM builds deep and broad-based client relationships. This local management focus fosters speed in supporting clients, addressing new markets and making investments in emerging opportunities. Additionally, the Global Markets organization serves clients with expertise in their industry as well as through the products and services that IBM and partners supply. IBM is also expanding its reach to new and existing clients through digital marketplaces.

Your Life @ IBM

In a world where technology never stands still, we understand that dedication to our clients' success, innovation that matters, and trust and personal responsibility in all our relationships live in what we do as IBMers as we strive to be the catalyst that makes the world work better.

Being an IBMer means you’ll be able to learn and develop yourself and your career. You’ll be encouraged to be courageous and experiment every day, all whilst having continuous trust and support in an environment where everyone can thrive, whatever their personal or professional background.

Our IBMers are growth-minded, always staying curious, open to feedback, and learning new information and skills to constantly transform themselves and our company. They are trusted to provide ongoing feedback to help other IBMers grow, and to collaborate with colleagues using a team-focused approach that includes different perspectives to drive exceptional outcomes for our customers. The courage our IBMers have to make critical decisions every day is essential to IBM becoming the catalyst for progress, always embracing challenges with the resources they have to hand, a can-do attitude, and an outcome-focused approach in everything that they do.

Are you ready to be an IBMer?

Benefits

  • Pension plan
