Business driven alerts correlation in network management environments
- SALAH, SAEED
- Jesús Esteban Díaz Verdejo Co-advisor
- Gabriel Maciá Fernández Co-advisor
Defense university: Universidad de Granada
Defense date: 18 May 2015
- Juan Manuel López Soler Chair
- José Camacho Páez Secretary
- Pablo Pavón Mariño Committee member
- Ramón Agüero Calvo Committee member
- Óscar Esparza Martín Committee member
Type: Doctoral thesis
Abstract
As telecommunication networks evolve rapidly in scale, complexity and heterogeneity, the efficiency of incident management procedures and the accuracy with which anomalous behaviors are detected are becoming important factors that strongly influence decision making in large Information Technology (IT) service companies. For this reason, these companies are investing heavily in new technologies and projects aimed at finding efficient management solutions. One of the main challenges for network and system management teams is dealing with the huge number of alerts generated by the managed systems and networks. Currently, alert correlation is the primary technique used to handle this issue. Despite the considerable research effort devoted to alert correlation, it remains an active research area in network management. This is mainly because the efficiency and robustness of the proposed models and algorithms vary from system to system, and none of them has yet succeeded in providing an optimal solution to the problem, that is, in reducing and aggregating the alerts to a single alert per incident. On the other hand, Incident Ticketing Systems (ITSs) play an important role in maintaining modern telecommunication networks. They have been introduced mainly to speed up the incident recovery process and to add more advanced functions to incident solving. As the bulk of the tickets are normally created manually, they are ideal candidates for use in the alert correlation procedure, where they add semantic information and human knowledge coming from the management staff and also from the end users of the services (through Service Desks).
Although tickets reflect the business point of view of the incidents, to the best of our knowledge, and despite their potential usefulness, few efforts have been devoted to incorporating the information from an incident ticketing system into the alert correlation procedure itself. In this work, we propose a generic tickets-alerts correlation architecture composed of three main parts: alert correlation, incident ticket correlation and tickets-alerts correlation. In the alert correlation part, we first present a survey of the state of the art in alert correlation techniques. Unlike other authors, we consider correlation to be a problem common to many fields of industry, and not only to network or security management. Thus, we focus on showing the broad influence of this problem. Additionally, we suggest an alert correlation model capable of modeling current and prospective proposals. Finally, we also review some of the most important commercial products currently available. In the incident ticket correlation part, we first verify that, in many cases, the handling of tickets by a management team is not completely systematic and may be incoherent and inefficient. As a result, irrelevant or redundant tickets are likely to be issued for the same incident, creating redundancy in the system that leads to inefficiencies. To handle this issue, we suggest a model that correlates redundant tickets in order to reduce the information, ideally, to a single ticket per incident. Using this model as a basis, we also develop and evaluate a methodology that assesses the efficiency of a management team during ticket creation and management. In the last part, we propose and test a model for the joint correlation of tickets and alerts.
Finally, we validate the proposed correlation models by evaluating them on two datasets taken from a real incident ticketing system of an IT service company. To analyze their applicability and usefulness, we target three main applications: evaluating the ticket creation process, improving the alert correlation process, and evaluating the management team in terms of its speed, its accuracy and the influence of each management group on the overall incident resolution process. These analyses can be leveraged to improve both the functioning of the management groups and the policies for ticket creation and incident management. The results of this work show that incorporating the ticketing information into the alert correlation process makes it possible to obtain better correlation rates, i.e., a larger and more reliable reduction in the number of alerts. By using these models, decision makers would get more accurate information about the real incidents happening in the network and their descriptions, so that decisions and prioritization procedures would be more precise. At the same time, the proposed methods are based on simple elements and reasoning, which makes their application in a real network management system almost straightforward.