Learning rules in data stream mining: Algorithms and applications

  1. Ruiz Sánchez, Elena
Supervised by:
  1. Jorge Casillas Barranquero Director

Defence university: Universidad de Granada

Defence date: 07 May 2021

Committee:
  1. Francisco Herrera Triguero Chair
  2. Jesús Alcalá Fernández Secretary
  3. Alberto Cano Committee member
  4. Shuo Wang Committee member
  5. Pedro Gonzalez Garcia Committee member
Department:
  1. CIENCIAS DE LA COMPUTACIÓN E INTELIGENCIA ARTIFICIAL

Type: Thesis

Abstract

In this thesis, a fully online algorithm based on learning rules for classification in data streams, CLAST, is proposed. The algorithm dynamically learns a population of rules that together represent the solution to the problem. Rules are a legible form of knowledge representation that captures relationships between variables and, consequently, offers a considerable level of interpretability. Compared to other data stream classifiers, the proposal obtains very competitive results in the experiments carried out. In real-world problems with very high arrival rates and immense volumes of data, it is often difficult to find data that are completely labeled and structured. Therefore, we explore other learning paradigms, besides supervised learning, that allow us to avoid dependence on timely available labels. In this line, two algorithmic proposals are made. The first one is Fuzzy-CSar-AFP, an unsupervised learning proposal for direct extraction of association rules in data streams (association stream mining). It is an online proposal, which processes the data one by one at the time of arrival, and is able to directly build and maintain association rules, without the need for a previous stage of frequent itemset identification. The last of the proposals, PAST, consists of a semi-supervised method that extends the two previous approaches by combining the ability to extract knowledge from labeled data with the ability to learn from unlabeled data. In terms of predictive ability, the method performs well in the experiments conducted, improving on the results obtained using only labeled data. This means that the algorithm is able to extract knowledge from unlabeled data that allows it to improve its understanding of the problem. Moreover, the viability of association rule extraction in data streams is studied in two real applications.
The first application is based on smartphone usage data, while the second one exploits streams of tweets with political content. In both cases, the analysis of the generated association rules is very useful to understand what is happening over time, providing knowledge that would otherwise be very difficult to obtain.