Observability Lead

IBM

Introduction

IBM Technology Zone (TechZone) is the one-stop shop where IBMers and business partners build, show, and share solutions built on IBM technologies to facilitate opportunity progression and customer adoption.

The Observability Lead is a new role in TechZone that provides leadership in designing, implementing, and managing the organization's overall observability framework. This includes setting up monitoring systems, logging tools, and tracing solutions that give comprehensive visibility into system performance, health, and usage patterns. The Observability Lead also collaborates with other teams to integrate observability into their development workflows so that issues can be detected and resolved quickly, and contributes to the continuous improvement of the observability practice by staying current on the latest trends, technologies, and best practices.

Your Role and Responsibilities

  • Develop and implement comprehensive observability solutions, including monitoring, logging, tracing, and alerting systems.
  • Work closely with developers, DevOps, and SRE teams to understand their requirements and ensure the observability tools meet their needs.
  • Continuously improve the observability infrastructure to ensure system reliability, performance, and scalability.
  • Collaborate in building and maintaining TechZone automation that deploys and provisions environments at scale.
  • Participate in incident response efforts, providing valuable insights through observability data to diagnose and resolve issues quickly.
  • Analyze observability data to identify trends, bottlenecks, and potential issues. Generate reports and dashboards for stakeholders.
  • Evaluate and integrate new observability tools and technologies to enhance our monitoring capabilities.
  • Develop and promote best practices for observability within the organization, including documentation and training for engineering teams.

Required Technical and Professional Expertise

  • Experience: Proven experience in observability, monitoring, or related fields, with a strong understanding of modern observability practices.
  • Technical Skills: Proficiency with observability tools such as Prometheus, Grafana, ELK stack, Jaeger, or similar. Strong scripting and automation skills (e.g., Python, Bash).
  • System Knowledge: In-depth knowledge of distributed systems, microservices architecture, and cloud platforms (e.g., AWS, Azure, GCP).
  • Problem-Solving: Excellent analytical and problem-solving skills, with the ability to diagnose complex issues using observability data.
  • Communication: Strong communication skills, with the ability to collaborate effectively with cross-functional teams and present findings to stakeholders.
  • Education: Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.

Preferred Technical and Professional Expertise

  • Preferred consideration will be given to candidates with a development background and demonstrated programming experience.
  • Preference will also be given to candidates with cloud architecture experience.

Benefits

  • Pension plan
