Product Case Study

Reducing B2B network fault investigation time by 90%

I identified a high-friction operational workflow inside Airtel Business, scoped an internal analytics and automation product, and helped reduce network fault investigation time from 3–5 hours to 29 minutes.

Before: 3–5 hrs

Manual investigation + cross-team handoffs

After: 29 min

Automated impact mapping + shared KPI view

Impact: 90%+

Reduction in investigation cycle time

Teams: 4

Field Ops, NOC, Product, and Automation

Context

The workflow problem behind the technical problem

Airtel Business serves enterprise customers where network reliability and SLA adherence directly affect customer trust. When a network fault appeared, teams needed to quickly understand whether the issue was isolated or whether it impacted B2B services used by enterprise customers.

The data existed, but it was distributed across tools, teams, and operational definitions. Analysts had to manually connect fault information with service impact, then coordinate with NOC, Field Ops, Product, and Automation teams before the next action was clear.

Problem Statement

Manual correlation slowed decisions when speed mattered most

Operations teams needed a faster and more reliable way to map network faults to impacted B2B services. Manual correlation took 3–5 hours, created cross-team delays, and made SLA risks harder to detect proactively.

The product opportunity was to turn an expert-dependent manual process into a repeatable decision-support system.

Pain Points

Slow manual checks
Conflicting KPI definitions
Multi-vendor complexity
Reactive SLA visibility

Product Goals

Faster investigation
Shared source of truth
Analyst validation
Earlier risk signals

Discovery

Before building, I mapped the work

Mapped the existing network fault-to-service impact workflow step by step.

Identified where analysts spent the most time during each investigation.

Studied 3 months of historical incident data to estimate the opportunity.

Aligned stakeholders on what "impact," "SLA risk," and "priority" meant.

Separated repeatable automation work from judgment-heavy exceptions.

MVP Requirements

What the product needed to do

Correlate network fault data with impacted B2B services automatically.

Reduce investigation time below 60 minutes for the core workflow.

Support multi-vendor networks spanning 4+ router vendors.

Expose a reliable output that analysts could validate quickly.

Feed SLA dashboards with threshold logic for proactive regression detection.

Create shared OpEx metric definitions for cross-functional reviews.
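
The threshold logic mentioned in the dashboard requirement could look roughly like the sketch below. It is an illustration under stated assumptions, not the production rule set: the availability metric, the 99.5% target, the 0.2-point warning margin, and the name classify_sla_status are all hypothetical and do not come from the actual system.

```python
from dataclasses import dataclass

# Hypothetical values: the real SLA targets and margins are not given in this case study.
SLA_TARGET_PCT = 99.5      # assumed contractual availability target
WARNING_MARGIN_PCT = 0.2   # assumed buffer above the target that raises an early warning


@dataclass(frozen=True)
class ServiceKpi:
    service_id: str
    availability_pct: float  # measured availability over the review window


def classify_sla_status(kpi: ServiceKpi,
                        target: float = SLA_TARGET_PCT,
                        margin: float = WARNING_MARGIN_PCT) -> str:
    """Return 'breach', 'at_risk', or 'ok' for one service."""
    if kpi.availability_pct < target:
        return "breach"      # already below the SLA target
    if kpi.availability_pct < target + margin:
        return "at_risk"     # still compliant, but inside the warning band
    return "ok"


# Illustrative sample data only.
kpis = [
    ServiceKpi("svc-a", 99.95),
    ServiceKpi("svc-b", 99.60),
    ServiceKpi("svc-c", 99.10),
]
statuses = {k.service_id: classify_sla_status(k) for k in kpis}
print(statuses)
```

The "at_risk" band is what makes the dashboard proactive: a service is flagged while it is still compliant, which leaves time to act before a breach.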

Solution

An internal decision-support product for fault analysis

I designed and built a network impact and fault analysis automation system that connected fault data, service impact mapping, KPI definitions, and SLA dashboard logic.
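
The core of that connection, mapping a fault to the services it touches, can be sketched as a lookup on a shared key. The example below assumes faults and services can both be identified by a (device, interface) pair. That key, the field names, and the sample records are assumptions made for illustration, not the real data model.

```python
from collections import defaultdict

# Hypothetical sample data: real fault feeds and service inventories will differ.
faults = [
    {"fault_id": "F-1", "device": "rtr-01", "interface": "ge-0/0/1"},
    {"fault_id": "F-2", "device": "rtr-02", "interface": "ge-0/0/5"},
]
services = [
    {"service_id": "svc-a", "device": "rtr-01", "interface": "ge-0/0/1"},
    {"service_id": "svc-b", "device": "rtr-01", "interface": "ge-0/0/1"},
    {"service_id": "svc-c", "device": "rtr-03", "interface": "ge-0/0/2"},
]


def map_faults_to_services(faults, services):
    """Return {fault_id: [service_id, ...]} by matching on (device, interface)."""
    # Index services once so each fault lookup is a dictionary access.
    index = defaultdict(list)
    for svc in services:
        index[(svc["device"], svc["interface"])].append(svc["service_id"])

    # Faults with no matching service map to an empty list (no known B2B impact).
    return {
        f["fault_id"]: list(index.get((f["device"], f["interface"]), []))
        for f in faults
    }


impact = map_faults_to_services(faults, services)
print(impact)  # {'F-1': ['svc-a', 'svc-b'], 'F-2': []}
```

An empty list is itself a useful result: it tells the analyst the fault appears isolated from B2B services, which is the first question in the investigation.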

The system combined Python, Selenium, REST APIs, subprocess workflows that drove PuTTY for multi-vendor router queries, and Tableau dashboards. Together these made investigation faster and more repeatable while preserving analyst validation at every step.
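
One way to drive PuTTY from Python is through Plink, its command-line connection tool, invoked with subprocess. The sketch below is a guess at the general shape, not the actual implementation: the vendor list, the per-vendor commands, and the function names are hypothetical, and it assumes key-based authentication is already configured (for example through Pageant) so that no password is passed on the command line.

```python
import subprocess

# Hypothetical vendor-to-command table; the real vendors and commands
# are not named in this case study.
VENDOR_COMMANDS = {
    "cisco": "show interfaces description",
    "juniper": "show interfaces terse",
    "huawei": "display interface brief",
    "nokia": "show port",
}


def build_plink_command(host: str, user: str, vendor: str) -> list:
    """Build the argument list for a non-interactive Plink SSH query."""
    try:
        command = VENDOR_COMMANDS[vendor.lower()]
    except KeyError:
        raise ValueError(f"unsupported vendor: {vendor}") from None
    # -batch disables interactive prompts so an automated run cannot hang on input.
    return ["plink", "-ssh", "-batch", "-l", user, host, command]


def query_router(host: str, user: str, vendor: str, timeout_s: int = 30) -> str:
    """Run the vendor-specific command on the router and return its raw output."""
    result = subprocess.run(
        build_plink_command(host, user, vendor),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


print(build_plink_command("10.0.0.1", "noc", "Juniper"))
```

Keeping the command construction separate from execution lets the vendor mapping be tested without touching a live router.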

Tradeoffs

What I deliberately did not overbuild

Automate safely before broadly

Incorrect impact mapping could create operational confusion, so the MVP prioritized accuracy, validation, and analyst trust over full automation.

Workflow clarity before AI

AI augmentation was a clear opportunity, but clean workflow mapping and KPI alignment had to come before intelligent recommendations.

Decision output before technical completeness

The product had to help NOC and Field Ops make faster decisions, not simply expose more technical data in a new format.

Impact

What changed after launch

01

Investigation time dropped from 3–5 hours to 29 minutes (an 84–90% reduction in cycle time, depending on the baseline).

02

Teams gained a shared operational KPI view for recurring cross-functional reviews.

03

Two critical SLA breach patterns were identified before customer escalation.
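
As a quick check on the headline figure, the reduction can be computed at both ends of the 3–5 hour baseline. The helper name reduction_pct is introduced here only for illustration.

```python
def reduction_pct(before_min: float, after_min: float) -> float:
    """Percentage reduction in cycle time relative to the baseline."""
    return (before_min - after_min) / before_min * 100


AFTER_MIN = 29  # post-launch investigation time, in minutes

for before_hours in (3, 5):
    before_min = before_hours * 60
    pct = reduction_pct(before_min, AFTER_MIN)
    print(f"{before_hours} h baseline: {pct:.1f}% reduction")

# 3 h baseline: 83.9% reduction
# 5 h baseline: 90.3% reduction
```

The 90%+ figure corresponds to the 5-hour end of the baseline; against a 3-hour investigation the reduction is about 84%.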

Next Iteration

How I would evolve the product

Phase 1: Adoption Analytics

Track usage, manual overrides, and the points where analysts still need extra context.

Phase 2: Confidence Scoring

Show confidence levels for fault-to-service mapping so teams can separate clear cases from ambiguous ones.
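
A first version of confidence scoring might be a simple weighted checklist rather than a model. The sketch below assumes three hypothetical signals and arbitrary weights and cut-offs. None of them come from the existing system, and all would need tuning against analyst overrides.

```python
# Hypothetical signals and weights (points out of 100), chosen only for illustration.
SIGNAL_WEIGHTS = {
    "exact_interface_match": 50,   # fault and service share the same interface
    "inventory_record_fresh": 30,  # service inventory was updated recently
    "router_output_parsed": 20,    # vendor CLI output parsed without errors
}


def confidence_score(signals: dict) -> int:
    """Sum the weights of every signal that is present and true."""
    return sum(
        weight
        for name, weight in SIGNAL_WEIGHTS.items()
        if signals.get(name, False)
    )


def confidence_band(score: int, clear_at: int = 80, review_at: int = 50) -> str:
    """Bucket a score so teams can separate clear cases from ambiguous ones."""
    if score >= clear_at:
        return "clear"
    if score >= review_at:
        return "review"
    return "ambiguous"


example = {"exact_interface_match": True, "inventory_record_fresh": True}
score = confidence_score(example)
print(score, confidence_band(score))  # 80 clear
```

Integer weights keep the score easy to explain to analysts, which matters more at this stage than statistical precision.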

Phase 3: AI-Assisted Triage

Use historical incident patterns to recommend likely root causes, escalation paths, and next-best actions.

Phase 4: Workflow Integration

Move alerts and summaries into the tools NOC and Field Ops already use every day.

Interview Pitch

From messy workflow to measurable outcome

This case study shows the product work I enjoy most: understand the user, define the problem, align stakeholders, ship the smallest useful system, and measure whether decisions got faster.