Manual investigation + cross-team handoffs
Product Case Study
Reducing B2B network fault investigation time by 90%
I identified a high-friction operational workflow inside Airtel Business, scoped an internal analytics and automation product, and helped reduce network fault investigation time from 3–5 hours to 29 minutes.
Automated impact mapping + shared KPI view
Reduction in investigation cycle time
Field Ops, NOC, Product, and Automation
Context
The workflow problem behind the technical problem
Airtel Business serves enterprise customers where network reliability and SLA adherence directly affect customer trust. When a network fault appeared, teams needed to quickly understand whether the issue was isolated or whether it impacted B2B services used by enterprise customers.
The data existed, but it was distributed across tools, teams, and operational definitions. Analysts had to manually connect fault information with service impact, then coordinate with NOC, Field Ops, Product, and Automation teams before the next action was clear.
Discovery
Before building, I mapped the work
Mapped the existing network fault-to-service impact workflow step by step.
Identified where analysts spent the most time during each investigation.
Studied 3 months of historical incident data to estimate the opportunity.
Aligned stakeholders on what "impact," SLA risk, and priority meant.
Separated repeatable automation work from judgment-heavy exceptions.
MVP Requirements
What the product needed to do
Correlate network fault data with impacted B2B services automatically.
Reduce investigation time below 60 minutes for the core workflow.
Support multi-vendor network complexity across 4+ router vendors.
Expose a reliable output that analysts could validate quickly.
Feed SLA dashboards with threshold logic for proactive regression detection.
Create shared OpEx metric definitions for cross-functional reviews.
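To make the threshold requirement concrete, here is a minimal sketch of what SLA threshold logic with proactive regression detection might look like. It is an illustration under stated assumptions, not the production logic: the `SlaThreshold` fields, the status labels, and the window sizes are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SlaThreshold:
    # Hypothetical fields: the real SLA definitions are not given in this case study.
    target_pct: float      # contractual availability target, e.g. 99.5
    warning_margin: float  # buffer above the target that triggers an early warning


def classify(availability_pct: float, threshold: SlaThreshold) -> str:
    """Return 'breach', 'at_risk', or 'ok' for one availability reading."""
    if availability_pct < threshold.target_pct:
        return "breach"
    if availability_pct < threshold.target_pct + threshold.warning_margin:
        return "at_risk"
    return "ok"


def is_regressing(history: list[float], window: int = 3, min_drop: float = 0.1) -> bool:
    """Flag a downward trend before a breach happens.

    Compares the mean of the latest `window` readings with the mean of the
    `window` readings before them. Returns False when there is too little data.
    """
    if len(history) < 2 * window:
        return False
    recent = mean(history[-window:])
    previous = mean(history[-2 * window:-window])
    return previous - recent >= min_drop
```

The point of the second function is that a service can still be classified as "ok" while trending toward a breach, which is the case a dashboard needs to surface early.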
Solution
An internal decision-support product for fault analysis
I designed and built a network impact and fault analysis automation system that connected fault data, service impact mapping, KPI definitions, and SLA dashboard logic.
The system used Python, Selenium, REST APIs, subprocess-driven PuTTY sessions for multi-vendor router queries, and Tableau dashboards. Together these made investigation faster and more repeatable while preserving analyst validation at every step.
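The core idea of fault-to-service impact mapping can be sketched in a few lines. This is a simplified illustration, assuming faults and services can be joined on a shared device identifier; the record types, field names, and the `needs_review` flag are hypothetical and stand in for the real data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fault:
    fault_id: str
    device_id: str
    vendor: str


@dataclass(frozen=True)
class Service:
    service_id: str
    customer: str
    device_id: str


@dataclass
class ImpactRecord:
    fault: Fault
    services: list = field(default_factory=list)
    needs_review: bool = False  # True when no service matched, so an analyst checks it


def map_impact(faults, services):
    """Join each fault to the B2B services riding on the same device."""
    by_device = defaultdict(list)
    for service in services:
        by_device[service.device_id].append(service)

    records = []
    for fault in faults:
        matched = by_device.get(fault.device_id, [])
        # Unmatched faults are not dropped; they are routed to manual validation.
        records.append(ImpactRecord(fault, list(matched), needs_review=not matched))
    return records
```

Keeping unmatched faults in the output, rather than silently discarding them, is one way to preserve the analyst validation step described above.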
Tradeoffs
What I deliberately did not overbuild
Automate safely before broadly
Incorrect impact mapping could create operational confusion, so the MVP prioritized accuracy, validation, and analyst trust over full automation.
Workflow clarity before AI
AI augmentation was a clear opportunity, but clean workflow mapping and KPI alignment had to come before intelligent recommendations.
Decision output before technical completeness
The product had to help NOC and Field Ops make faster decisions, not simply expose more technical data in a new format.
Impact
What changed after launch
Investigation time dropped from 3–5 hours to 29 minutes, a reduction of roughly 84% to 90% depending on the baseline.
Teams gained a shared operational KPI view for recurring cross-functional reviews.
Two critical SLA breach patterns were identified before customer escalation.
Next Iteration
How I would evolve the product
Phase 1: Adoption Analytics
Track usage, manual overrides, and the points where analysts still need extra context.
Phase 2: Confidence Scoring
Show confidence levels for fault-to-service mapping so teams can separate clear cases from ambiguous ones.
Phase 3: AI-Assisted Triage
Use historical incident patterns to recommend likely root causes, escalation paths, and next-best actions.
Phase 4: Workflow Integration
Move alerts and summaries into the tools NOC and Field Ops already use every day.
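As a rough illustration of the confidence scoring idea in Phase 2, one simple option is a weighted share of the evidence signals that support a given mapping. This is a sketch of a possible starting point, not a committed design: the signal names, weights, and the 0.8 cutoff are all hypothetical.

```python
def mapping_confidence(signals: dict, weights: dict) -> float:
    """Return a 0..1 score: the weighted share of signals that are present.

    `weights` maps a signal name to its importance; `signals` maps a signal
    name to True/False. Signals missing from `signals` count as absent.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    supported = sum(w for name, w in weights.items() if signals.get(name, False))
    return supported / total


def triage_bucket(confidence: float, clear_at: float = 0.8) -> str:
    """Split mappings into clear cases and ones that need an analyst."""
    return "clear" if confidence >= clear_at else "ambiguous"


# Hypothetical signals for one fault-to-service mapping.
WEIGHTS = {"exact_device_match": 5, "inventory_fresh": 3, "corroborating_alarm": 2}

score = mapping_confidence({"exact_device_match": True}, WEIGHTS)
print(score, triage_bucket(score))
```

A transparent weighted score like this is easy for analysts to audit, which matters more at this stage than predictive sophistication.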
Interview Pitch
From messy workflow to measurable outcome
This case study shows the product work I enjoy most: understand the user, define the problem, align stakeholders, ship the smallest useful system, and measure whether decisions got faster.