Novel approaches and improvements in traffic classification

  1. Jawad, Khalife
Supervised by:
  1. Amjad Hajjar Co-director
  2. Jesús Esteban Díaz Verdejo Co-director

Defence university: Universidad de Granada

Defence date: 16 February 2016

Committee:
  1. Juan Manuel López Soler Chair
  2. Juan José Ramos Muñoz Secretary
  3. Javier Aracil Rico Committee member
  4. Eduardo Magaña Lizarrondo Committee member
  5. David Larrabeiti López Committee member
Department:
  1. ELECTRÓNICA Y TECNOLOGÍA DE COMPUTADORES

Type: Thesis

Abstract

Identifying Internet traffic applications is central to many network security and management tasks. With the steady emergence of new Internet applications and of encryption and obfuscation techniques, researchers face various challenges in accurately identifying different applications. An optimal traffic classification model has yet to be defined, despite the effort devoted by the research community over the last decade. This thesis aims to provide an analytical review of existing traffic identification methods, while suggesting novel enhancements and approaches.
First, given the remarkably increasing number of papers in the literature covering traffic classification, we survey the most recent works and propose a systematic multilevel taxonomy. Expressed in a consistent terminology, our proposed taxonomy can promote and unify research efforts in designing the best future traffic identification model, which we describe and characterize while underlining the main research requirements. Moreover, the different categories (e.g. payload-based, statistical, machine learning, graphical) are compared, as found in the literature, in terms of performance, accuracy, ability to detect critical applications such as peer-to-peer, and other relevant criteria. The current research trend is also analyzed in light of the surveyed works.
Second, given the lack of reference datasets for evaluating and comparing traffic classification methods, we collected a significant volume of real traffic and extracted all the parameters relevant to the classification process using customized tools.
Third, we assess payload-based traffic identification methods and propose an optimization that best achieves the trade-off between classification accuracy and the protection of user privacy. For this purpose, we assess the performance of Deep Packet Inspection and present a customized sampling policy for the traffic payloads. On our testbed, inspecting less payload yielded promising gains in classification time while maintaining classification accuracy.
Finally, we assess blind identification models. First, we explore the discriminative power of traffic features at the application layer by proposing a flow-based classifier relying on the analysis of application message lengths. Our approach analyzes application layer messages without breaching user privacy. For this model, we propose a novel flow-based classifier using multi-modal distributions. Evaluated on a real captured dataset, the results support the proposal and show that the sizes of the exchanged messages carry discriminative information for traffic classification. Then, we discuss an extended model for multi-label host classification, as most current host-based classification models do not reflect real usage scenarios, where a host might be using more than one application. For this purpose, we choose a host classification method based on graphical techniques, which we enhance and extend to handle multi-label classification. Our proposal can be regarded as an attempt to radically change the conventional view of mono-label host classification. Our results show an improvement in classification accuracy for most protocols, including peer-to-peer.