Novel approaches and improvements in traffic classification

Jawad, Khalife

Supervised by:

Amjad Hajjar Co-supervisor
Jesús Esteban Díaz Verdejo Co-supervisor

Defense university: Universidad de Granada

Defense date: 16 February 2016

Committee:

Juan Manuel López Soler Chair
Juan José Ramos Muñoz Secretary
Javier Aracil Rico Member
Eduardo Magaña Lizarrondo Member
David Larrabeiti López Member

Department:

ELECTRÓNICA Y TECNOLOGÍA DE COMPUTADORES

Type: Thesis

Teseo: 408077

Abstract

Identifying the applications behind Internet traffic is central to many network security and management tasks. With the steady emergence of new Internet applications and of encryption and obfuscation techniques, researchers face various challenges in identifying applications accurately. Despite the effort devoted by the research community over the last decade, an optimal traffic classification model has yet to be defined. This thesis provides an analytical review of existing traffic identification methods and proposes novel enhancements and approaches. First, given the rapidly growing number of papers on traffic classification, we survey the most recent works and propose a systematic multilevel taxonomy. Expressed in consistent terminology, the proposed taxonomy can help unify research efforts toward designing the best future traffic identification model, which we describe and characterize while underlining the main research requirements. Moreover, the different categories (e.g., payload-based, statistical, machine learning, graphical) are compared, as reported in the literature, in terms of performance, accuracy, ability to detect critical applications such as peer-to-peer, and other relevant criteria. The current research trend is also analyzed in light of the surveyed works. Second, given the lack of reference datasets for evaluating and comparing traffic classification methods, we collected a significant volume of real traffic and extracted all the parameters relevant to the classification process using customized tools. Third, we assess payload-based traffic identification methods and propose an optimization that balances classification accuracy against the protection of user privacy. For this purpose, we assess the performance of Deep Packet Inspection and present a customized sampling policy for traffic payloads.
On our testbed, promising results were obtained: classification time was reduced by inspecting less payload while classification accuracy was maintained. Finally, we assess blind identification models. First, we explore the discriminative power of traffic features at the application layer by proposing a flow-based classifier that relies on the analysis of application message lengths. This approach analyzes application-layer messages without breaching user privacy. For this model, we propose a novel flow-based classifier using multi-modal distributions. Evaluated on a real captured dataset, the results support the proposal and show that the sizes of the exchanged messages carry discriminative information for traffic classification. Then, we discuss an extended model for multi-label host classification, since most current host-based classification models do not reflect real usage scenarios, in which a host may be using more than one application. For this purpose, we choose a host classification method based on graphical techniques, which we enhance and extend to handle multi-label classification. Our proposal can be regarded as an attempt to radically change the conventional view of single-label host classification. Our results show an improvement in classification accuracy for most protocols, including peer-to-peer.
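
To make the idea of classifying flows from message lengths more concrete, the following is a minimal illustrative sketch and not the classifier developed in the thesis. It assumes that each flow is already available as a list of application message lengths in bytes, and it models each application with a smoothed histogram, which can represent a multi-modal length distribution. The names `train` and `classify`, the bin width, the maximum length, and the toy data are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

BIN_WIDTH = 64      # bytes per histogram bin (assumed value)
MAX_LENGTH = 1500   # lengths above this share the last bin (assumed value)
NUM_BINS = MAX_LENGTH // BIN_WIDTH + 1
SMOOTHING = 1.0     # Laplace smoothing so unseen bins keep non-zero probability


def to_bin(length):
    """Map a message length in bytes to a histogram bin index."""
    return min(length // BIN_WIDTH, NUM_BINS - 1)


def train(flows_by_app):
    """Build one smoothed length histogram per application.

    flows_by_app maps an application label to a list of flows,
    where each flow is a list of message lengths in bytes.
    """
    models = {}
    for app, flows in flows_by_app.items():
        counts = Counter(to_bin(n) for flow in flows for n in flow)
        total = sum(counts.values()) + SMOOTHING * NUM_BINS
        models[app] = [(counts.get(b, 0) + SMOOTHING) / total
                       for b in range(NUM_BINS)]
    return models


def classify(models, flow):
    """Return the application whose histogram gives the flow the highest
    log-likelihood, treating message lengths as independent."""
    def log_likelihood(app):
        return sum(math.log(models[app][to_bin(n)]) for n in flow)
    return max(models, key=log_likelihood)


# Toy training data (invented for illustration, not taken from the thesis).
training = {
    "web": [[300, 1400, 1400], [350, 1380, 1450]],
    "chat": [[40, 60, 45], [50, 55, 900]],
}
models = train(training)
print(classify(models, [48, 52, 61]))
print(classify(models, [320, 1410]))
```

Only message sizes are used here, never message contents, which is the property the abstract associates with preserving user privacy. A real classifier would also need a much larger training set and a way to handle applications that were not seen during training.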