Introduction
IBM Technology Zone is the one-stop shop for IBMers and business partners to build, show, and share solutions built on IBM technologies to facilitate opportunity progression and customer adoption.
Role Overview
The SRE Lead at IBM will be responsible for overseeing the reliability and performance of our systems, working to ensure the highest levels of uptime and reliability. This role will involve leading SRE practices, mentoring junior engineers, and collaborating with various teams to implement best practices in site reliability.
Responsibilities
- Lead the implementation of SRE best practices, including incident management, post-mortem analysis, and capacity planning.
- Collaborate in building and maintaining TechZone automation that deploys and provisions environments at scale.
- Ensure the reliability, availability, and performance of our applications and infrastructure through proactive monitoring, incident response, and optimization.
- Work closely with software engineering, DevOps, and operations teams to design and implement reliable, scalable, and secure systems.
- Help drive automation efforts to reduce manual intervention, enhance deployment processes, and improve overall system efficiency.
- Mentor and guide junior team members, fostering a culture of continuous improvement and learning within the team.
- Lead incident response efforts, conduct root cause analysis, and implement preventive measures to avoid future incidents.
- Maintain detailed and up-to-date documentation of system architecture, operational procedures, and incident reports.
Required Technical and Professional Expertise
- Proven experience in site reliability engineering, DevOps, or a related field, with a track record of managing and optimizing complex systems.
- Strong proficiency in cloud platforms (e.g., AWS, Azure, GCP), containerization technologies (e.g., Docker, Kubernetes), and infrastructure as code tools (e.g., Terraform, Ansible).
- Proficiency in one or more programming languages (e.g., Python, Go, Java) for scripting and automation.
- Excellent analytical and problem-solving skills, with the ability to troubleshoot and resolve complex issues efficiently.
- Strong communication and leadership skills, with the ability to collaborate effectively with cross-functional teams and mentor junior engineers.
Preferred Technical and Professional Expertise
- Familiarity with Hybrid Cloud technologies and strategy, including IBM's OCP, Cloud Pak, and Services strategy, and how these elements bring value to clients and end users.
- Experience using telemetry and monitoring software.
About Business Unit
IBM has a global presence, operating in more than 175 countries with a broad-based geographic distribution of revenue. The company’s Global Markets organization is a strategic sales business unit that manages IBM’s global footprint, working closely with dedicated country-based operating units to serve clients locally. These country teams have client relationship managers who lead integrated teams of consultants, solution specialists and delivery professionals to enable clients’ growth and innovation. By complementing local expertise with global experience and digital capabilities, IBM builds deep and broad-based client relationships. This local management focus fosters speed in supporting clients, addressing new markets and making investments in emerging opportunities. Additionally, the Global Markets organization serves clients with expertise in their industry as well as through the products and services that IBM and partners supply. IBM is also expanding its reach to new and existing clients through digital marketplaces.
Your Life @ IBM
In a world where technology never stands still, we understand that dedication to our clients’ success, innovation that matters, and trust and personal responsibility in all our relationships live in what we do as IBMers as we strive to be the catalyst that makes the world work better.
Being an IBMer means you’ll be able to learn and develop yourself and your career. You’ll be encouraged to be courageous and experiment every day, all whilst having continuous trust and support in an environment where everyone can thrive, whatever their personal or professional background.
Our IBMers are growth-minded, always staying curious, open to feedback, and learning new information and skills to constantly transform themselves and our company. They are trusted to provide ongoing feedback to help other IBMers grow, and to collaborate with colleagues using a team-focused approach that includes different perspectives to drive exceptional outcomes for our customers. The courage our IBMers have to make critical decisions every day is essential to IBM becoming the catalyst for progress, always embracing challenges with the resources they have to hand, a can-do attitude, and an outcome-focused approach in everything they do.
Are you ready to be an IBMer?
Benefits
- Pension plan