

In the face of rising IT complexity, Cisco IT unified observability across its global environment. The results: 25% fewer major incidents, 45% faster resolution, and scalable automation. Here's how we did it, along with recommendations for your own digital resilience journey.

Our challenge: fragmented visibility

Many organizations struggle with fragmented monitoring and prolonged incident resolution. We faced the same challenges and needed to break down data silos to protect our fast-changing environment.

At Cisco IT, managing our global, distributed IT landscape is growing more complex. Fragmented data and visibility gaps were making it harder to maintain digital resilience amid fast-paced innovation and frequent changes to our environment. We needed a way to turn our data into actionable insights, but we lacked a unified observability platform to centralize it and make sense of it all.

When a major database outage in 2024 exposed how fragmented our data was (uncorrelated alerts across related devices delayed identification of the cause), we knew we had to rethink our observability approach.

“After the outage, we knew we had to rethink everything. The transformation wasn't just about technology, but about empowering our engineers to act faster and smarter.”
– Chuck Churchill, Snr. Director, IT Observability, Cisco IT

This inflection point sparked a broader shift in how Cisco IT approached observability. In the video below, Cisco IT leaders share what changed and how it led to a 25% reduction in major incidents.

Getting started

As with any IT challenge, we had to understand the root cause before remediation planning could begin. Building stronger digital resilience is no different.

Working with our teams, we pinpointed the following as the top challenge areas hindering our efforts:

  • Fragmented data and visibility gaps: With over 100,000 endpoints, we generated massive volumes of telemetry data. Siloed monitoring tools led to alert fatigue and visibility gaps, slowing response times and limiting our ability to predict and prevent issues.
  • Risks from frequent changes: Our culture of innovation and early adoption means constant changes to our IT environments. We saw a direct link between rapid change and increased incidents, so minimizing disruption without slowing innovation became essential.
  • Resource optimization: As our environment and data complexity grew, it became critical to use AI and data more effectively. We needed to turn our data into actionable insights that empower engineers rather than overwhelm them, so that productivity kept pace with growth.

“Bringing all our data together was the first step. Turning it into real, actionable insights is what truly enables us to stay resilient as our environment evolves.”
– Jon Heaton, Director, Network Engineering and Operations, Cisco IT

Choosing our three-pronged observability approach

Digital resilience requires more than simply deploying new tools. We needed a holistic approach that could span our entire IT landscape. We achieved this by structuring our IT observability practice across three interconnected pillars:

  1. The network: Secure, reliable network performance is essential for keeping the business up and running. This pillar focuses on comprehensive network visibility, including third-party provider networks, to ensure the network runs securely and optimally for a smooth user experience.
  2. Platforms and data: Here, we focus on observability across our data centers, cloud, and underlying infrastructure. Through our platform consolidation and data strategy, we centralize observability data and make it accessible to the entire organization, including our DevOps and SRE teams.
  3. Service Operations: Our Service Operations and Enterprise Operations Center engineers monitor, analyze, and resolve issues using rich data and insights delivered from our network, infrastructure, applications, and services. That same data feeds AI automation to improve efficiency.

Adding the critical technology

Each of these observability pillars is powered by a combination of data, technology, and processes that let us use the full potential of our data and drive greater efficiency through AI automation. These core elements are essential to our observability approach and its success:

  • Splunk: Splunk acts as the backbone of our observability strategy, centralizing data from across our network, infrastructure, and applications to deliver a single source of truth for our IT teams.
  • ThousandEyes: ThousandEyes delivers end-to-end network visibility and user experience monitoring across internal and external environments, enabling rapid identification and resolution of connectivity issues.
  • Configuration Management Database (CMDB): Our CMDB provides a single source of truth for all IT assets, enriching alerts and incidents with essential context and powering efficient, proactive operations.
  • AI Operations: Our AI systems use the centralized observability data in Splunk to automate event analysis, reduce alert fatigue, and accelerate incident response, freeing engineers to focus on higher-value work. For example, our Service Operations team uses various AI assistants for AI-driven incident management that predict incident assignments and suggest resolution steps.

By integrating Splunk with ThousandEyes, our CMDB, and other tools, we can ensure a seamless, scalable approach to observability that grows with our business.

Tangible results

 

This unified observability approach has helped us tackle our most pressing challenges and improve outcomes that collectively strengthen our digital resilience. In the past 18 months, we've seen:

  • Significant reduction in major incidents: We reduced major incidents by 25% year over year and had zero major network incidents, down from 3-4 per quarter previously.
  • Faster, more effective recovery: We lowered our Mean Time to Detect and Resolve by 45% year over year, enabling faster recovery and minimizing disruption.
  • Improved change management: We reduced incidents caused by change by 20%, enabled by unified data insights and end-to-end visibility that support smarter change management processes.
  • Strengthened visibility and data utilization: We now monitor 10x more network telemetry data, yielding deeper insights and 4x better visibility. This enables earlier detection and proactive resolution of potential issues before they escalate.
  • Scalable automation and efficiency: Centralizing data in Splunk has established a foundation for continued advances in AIOps, enabling us to develop AI automations that now handle 99.998% of ~4 million daily alerts, significantly improving operational efficiency.
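To put the automation figure in perspective, the short calculation below estimates how many alerts per day still reach a human, using only the approximate figures stated in the results above (about 4 million daily alerts, 99.998% handled automatically). The variable names are illustrative; `Fraction` keeps the percentage exact instead of introducing floating-point rounding.

```python
from fractions import Fraction

# Approximate figures from the results: ~4 million alerts per day, 99.998% automated.
daily_alerts = 4_000_000
automated_share = Fraction(99998, 100000)  # exactly 99.998%

# Alerts not handled by automation: 4,000,000 * 0.00002
residual = daily_alerts * (1 - automated_share)
print(f"Alerts left for engineers per day: ~{int(residual)}")
```

That works out to roughly 80 alerts a day for engineers to handle. Because the 4 million figure is itself approximate, the result is an order-of-magnitude estimate rather than an exact count.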

“This journey is as much about changing mindsets as deploying technology. We're empowering every engineer to act on insights, not just alerts.”
– Mark Hutchins, Director, IT Service Management, Cisco IT

Practical takeaways

Our digital resilience journey is ongoing, but every day we learn more, and we share those lessons with customers who are advancing their own journeys. We recommend:

  • Collect telemetry from everywhere: Centralize telemetry to make the most of the relevant metrics, events, logs, and traces from across your network, infrastructure, cloud, applications, and services.
  • Prioritize data, the bedrock of success: Focus on data quality and hygiene. Accurate, clean data is essential for generating trustworthy insights and enabling effective automation.
  • Use what's already smart: First, leverage the systems you have, such as products with AI and observability capabilities built in. Second, empower teams to experiment with custom AI solutions to maximize value where your existing systems have gaps.

Keep tuned for updates and learnings as we proceed to innovate and optimize.

“Digital resilience is a moving target. We're always learning, adapting, and refining our approach as our environment evolves.”
– Jon Heaton, Director, Network Engineering and Operations, Cisco IT

More resources:

Explore more case studies to see how Cisco strengthens digital resilience.
