Network and infrastructure management has always been a complicated business, and keeping everything running smoothly sometimes takes up far too much of everyone’s time. But in the last few years things have come to a head.
That’s because there are now multiple levels of complexity thanks to cloud, hybrid, and data center-bound systems, software-defined and legacy networking, virtualization, containerization, and much more. Digital transformation efforts are accelerating this growth in complexity.
The result is so much “stuff”, of such variety, that it is rapidly becoming untenable to expect the limited number of networking staff available to an organization to understand it, manage it, and keep it running within the required parameters. That’s why enterprises are increasingly looking to artificial intelligence for IT operations (AIOps) solutions for help: by 2023, 40% of DevOps teams will augment application and infrastructure monitoring tools with AIOps, Gartner predicts.
The current state of AIOps
What exactly constitutes AIOps? Gartner defines it as follows: “AIOps combines big data and machine learning to automate IT operations processes, including event correlation, anomaly detection and causality determination.” Today AIOps solutions promise to spot networking problems and the causes of those problems very quickly, and to help optimize IT operations for maximum efficiency.
Fundamentally, AIOps platforms work by leveraging better communication to solve problems. Without them, IT staff of various specializations have their own tools for the situations they might face. When an incident occurs, staff use their tools to examine what is going on in their own silos, and the challenge is to link a cause in one silo with an effect in another. That relies on everyone communicating with each other effectively to establish a causal link, and this can take a great deal of time.
AIOps gets around this problem by breaking down the communication barrier between silos. To get an AIOps platform to work effectively, it’s first necessary to gather as much relevant data as possible from all the different silos and put it into one large data lake. The AIOps solution can then analyze that data to identify a problem when it occurs (or perhaps when it is about to occur), what the cause is, and how to fix it.
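To make the idea concrete, here is a minimal sketch of cross-silo correlation. It assumes that events from every silo have already been collected into one list and share a common clock; the `Event` type, the silo labels, the sample messages, and the 60-second window are all illustrative assumptions rather than features of any particular AIOps product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds on a clock shared by all silos (assumption)
    silo: str         # hypothetical source label, e.g. "network" or "storage"
    message: str


def correlate(events, window_seconds=60.0):
    """Group events from all silos whose timestamps fall close together.

    A new group starts whenever the gap to the previous event exceeds
    window_seconds. Only groups that span more than one silo are returned,
    since those are the candidates for a cross-silo cause and effect.
    """
    groups, current = [], []
    for event in sorted(events, key=lambda e: e.timestamp):
        if current and event.timestamp - current[-1].timestamp > window_seconds:
            groups.append(current)
            current = []
        current.append(event)
    if current:
        groups.append(current)
    return [g for g in groups if len({e.silo for e in g}) > 1]


# Hypothetical events, each originally visible only in its own silo's tooling.
data_lake = [
    Event(1000.0, "network", "link flap on core switch"),
    Event(1012.0, "storage", "iSCSI latency spike"),
    Event(1030.0, "application", "checkout service timeouts"),
    Event(5000.0, "network", "routine config backup"),
]

incidents = correlate(data_lake)
for incident in incidents:
    print([f"{e.silo}: {e.message}" for e in incident])
```

Time proximity is only a crude stand-in for the machine-learning correlation that real platforms apply, but it shows why the data has to sit in one place first: the three related events above would never be seen together by tools that each look at a single silo.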
What challenges is the sector facing?
One of the biggest challenges that the sector faces stems from the fact that these systems use machine learning to spot patterns and correlations. Here the old adage about “rubbish in, rubbish out” holds true. Specifically, many organizations embarking on AIOps initiatives soon realize that they are limited, initially at least, by the poor quality of the data that they can make available to AIOps systems. In many cases data may be missing, incomplete, or simply inconsistent.
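A simple audit can show how much of the available data is actually usable before it is fed to an AIOps system. The sketch below is an illustration only: the four-field schema in `REQUIRED_FIELDS`, the sample records, and the rule that a value must be numeric are assumptions made for the example.

```python
# Hypothetical schema: the fields every monitoring record is expected to carry.
REQUIRED_FIELDS = ("timestamp", "host", "metric", "value")


def audit(records):
    """Count records that are missing fields, incomplete, or inconsistent."""
    report = {"missing": 0, "incomplete": 0, "inconsistent": 0, "usable": 0}
    for record in records:
        if any(field not in record for field in REQUIRED_FIELDS):
            report["missing"] += 1          # a required field is absent
        elif any(record[field] is None for field in REQUIRED_FIELDS):
            report["incomplete"] += 1       # field present but empty
        elif not isinstance(record["value"], (int, float)):
            report["inconsistent"] += 1     # e.g. "51%" instead of 0.51
        else:
            report["usable"] += 1
    return report


# Hypothetical sample records illustrating each kind of defect.
records = [
    {"timestamp": 1, "host": "sw1", "metric": "cpu", "value": 0.42},
    {"timestamp": 2, "host": "sw1", "metric": "cpu"},
    {"timestamp": 3, "host": None, "metric": "cpu", "value": 0.50},
    {"timestamp": 4, "host": "sw2", "metric": "cpu", "value": "51%"},
]

report = audit(records)
print(report)
```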
Another challenge is how AIOps can be used to optimize IT operations automatically. Being able to spot and establish the cause of problems comes down to analysis of good data, but optimizing operations requires more than good data. It also requires a high degree of “domain knowledge”, meaning all the things that a human knows about the enterprise’s business patterns and how those affect IT decisions. Sharing this type of domain knowledge can be hard, but until that improves, AIOps solutions can’t learn what they need to know to automate infrastructure optimization.
One final challenge worth mentioning is the problem that organizations often face when they first get AIOps solutions up and running: put simply, they are often not ready for the many incidents that these systems may quickly start to detect.
What’s going on here is that many organizations’ monitoring systems may be relatively old, perhaps five years old, while their infrastructure has evolved and modernized over that period. That means that AIOps solutions may very quickly detect minor problems that have built up over those five years, problems that enterprises were either never aware of or were unable to trace to a root cause.
What advances are being made?
One of the key changes taking place now stems from the realization that there has been, and still is, a “quick win problem” with AIOps. What this means is that many organizations have been looking to AIOps for quick ways to solve existing problems, such as reducing their fault-finding times.
But for AIOps to have a really significant impact on enterprise IT activities, it’s necessary to go beyond the tactical and start thinking about how it can be used more strategically. Rather than using it to carry out existing tasks more quickly and with fewer human resources, enterprises need to start looking at how AIOps can help them fundamentally reengineer their operations so that they can achieve far more with the limited resources, both human and machine, that they have.
What does this mean for the future of networking?
The likely path that AIOps vendors will take will lead to much higher levels of automation in network management through a closed-loop, self-driving approach. What this means is that an AIOps solution will build its knowledge of a system, watch for signs of problems, match a problem with a known solution or group of solutions, choose a solution based on its probability of success, and then initiate the chosen solution. Finally, it will monitor the results, evaluate the outcome, and use this information to make a more accurate decision next time.
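The loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a description of how any vendor implements it: the `ClosedLoop` class, the playbook of remediation names, the 50% starting estimate, and the simulated `apply_fix` callback are all hypothetical.

```python
class ClosedLoop:
    """Toy closed-loop remediation: choose, act, observe, learn."""

    def __init__(self, playbook):
        # playbook maps a problem name to its known candidate solutions.
        self.playbook = playbook
        # [successes, attempts] per (problem, fix); starting at 1 of 2 gives
        # every fix an assumed 50% success estimate before any evidence.
        self.stats = {
            (problem, fix): [1, 2]
            for problem, fixes in playbook.items()
            for fix in fixes
        }

    def success_rate(self, problem, fix):
        successes, attempts = self.stats[(problem, fix)]
        return successes / attempts

    def choose(self, problem):
        # Pick the candidate with the highest estimated chance of success.
        # On a tie, the fix listed first in the playbook wins.
        return max(
            self.playbook[problem],
            key=lambda fix: self.success_rate(problem, fix),
        )

    def handle(self, problem, apply_fix):
        fix = self.choose(problem)
        succeeded = apply_fix(fix)        # initiate and observe the outcome
        record = self.stats[(problem, fix)]
        record[0] += 1 if succeeded else 0
        record[1] += 1                    # learn from the result
        return fix, succeeded


# Hypothetical playbook and a simulated environment in which only
# rerouting traffic actually resolves the problem.
loop = ClosedLoop({"high_latency": ["restart_interface", "reroute_traffic"]})


def simulated_apply(fix):
    return fix == "reroute_traffic"


first = loop.handle("high_latency", simulated_apply)
second = loop.handle("high_latency", simulated_apply)
print(first, second)
```

In this simulation the first attempt fails, which lowers that fix’s estimated success rate, so the second attempt switches to the alternative. A production system would need far more careful statistics and safety checks before acting on live infrastructure.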
Where AIOps will be most impactful
In the same way that technologies such as virtualization were first tested in non-mission-critical applications, it’s likely that these closed-loop systems will first be trialled on relatively low-risk parts of the network infrastructure while the technology evolves. Although these areas are low risk, improvements there promise to free up significant amounts of networking staff time, enabling staff to work on less tedious and more productive tasks.
Ultimately, no one knows for sure how AIOps will be put to best effect. But what is fairly certain is that it won’t be to enable enterprises to do what they are doing now faster. Rather, it will be to enable them to do something completely different that, without AIOps, simply wouldn’t be possible at all.