Learning rules in data stream mining: Algorithms and applications

Ruiz Sánchez, Elena

Supervised by:

Jorge Casillas Barranquero Director

Defence university: Universidad de Granada

Defence date: 07 May 2021

Committee:

Francisco Herrera Triguero Chair
Jesús Alcalá Fernández Secretary
Alberto Cano Committee member
Shuo Wang Committee member
Pedro Gonzalez Garcia Committee member

Department:

Computer Science and Artificial Intelligence

Type: Thesis

Teseo: 662560

Abstract

In this thesis, a fully online algorithm based on learning rules for classification in data streams, CLAST, is proposed. The algorithm dynamically learns a population of rules that together represent the solution to the problem. Rules are a legible form of knowledge representation that captures relationships between variables and, consequently, can offer a considerable level of interpretability. Compared to other data stream classifiers, the proposal obtains very competitive results in the experiments carried out. In real-world problems with very high arrival rates and immense volumes of data, it is often difficult to find data that are completely labeled and structured. Therefore, we explore other learning paradigms, besides supervised learning, that allow us to avoid dependence on timely available labels. In this line, two algorithmic proposals are made. The first one is Fuzzy-CSar-AFP, an unsupervised learning proposal for direct extraction of association rules in data streams (association stream mining). It is an online proposal that processes the data one by one at the time of arrival and is able to build and maintain association rules directly, without the need for a previous stage of frequent itemset identification. The last of the proposals, PAST, is a semi-supervised method that extends the two previous approaches by combining the ability to extract knowledge from labeled data with the ability to learn from unlabeled data. In terms of predictive ability, the method performs well in the experiments conducted, improving on the results obtained using only labeled data. This means that the algorithm is able to extract knowledge from unlabeled data that allows it to improve its understanding of the problem. Moreover, the viability of association rule extraction in data streams is studied in two real applications.
The first application is based on smartphone usage data, while the second one exploits streams of tweets with political content. In both cases, the analysis of the generated association rules is very useful for understanding what is happening over time, providing knowledge that would otherwise be very difficult to obtain.