New applications of models based on imprecise probabilities within data mining
- Moral García, Serafín
- Joaquín Abellán Mulero (co-supervisor)
- Carlos Javier Mantas (co-supervisor)
Defending university: Universidad de Granada
Defense date: 21 December 2022
- José del Sagrado Martínez (chair)
- Silvia Acid Carrillo (secretary)
- Andrés Ramón Masegosa Arredondo (committee member)
Type: Thesis
Abstract
When an expert or a dataset provides information about a finite set of possible alternatives, a mathematical model is needed to represent that information. In some cases, a single probability distribution is not appropriate because the available information is insufficient. For this reason, several mathematical theories and models based on imprecise probabilities have been developed in the literature. In this thesis, we analyze the relations between some imprecise probability theories and study the properties of some models based on imprecise probabilities. Working with imprecise probability theories and models requires tools for quantifying the uncertainty-based information they represent, usually called uncertainty measures. In this thesis, we analyze the properties of some existing uncertainty measures in theories based on imprecise probabilities and propose new uncertainty measures for imprecise probability theories and models that present some advantages over the existing ones.
The need to represent the information that a dataset provides about a finite set of possible alternatives arises in classification, an essential task within Data Mining. This well-known task consists of predicting, for a given instance described via a set of attributes, the value of a variable under study, known as the class variable. Classification often requires quantifying the uncertainty-based information about the class variable. For this purpose, classical probability theory (PT) has been employed for many years. In recent years, classification algorithms that represent the information about the class variable via imprecise probability models have been developed. Experimental studies have shown that classification methods based on imprecise probabilities significantly outperform those based on PT when data contain errors.
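To make the notions of an imprecise probability model and an uncertainty measure more concrete, here is a minimal Python sketch. It assumes, purely for illustration, the Imprecise Dirichlet Model (IDM) with hyperparameter `s` as the model and the maximum entropy over the resulting set of distributions as the uncertainty measure. The abstract does not state which models or measures the thesis uses, so the function names and these choices are hypothetical.

```python
import math


def idm_intervals(counts, s=1.0):
    """Probability intervals under the IDM (illustrative assumption).

    For a class observed n times out of N, the interval is
    [n / (N + s), (n + s) / (N + s)].
    """
    total = sum(counts)
    return [(n / (total + s), (n + s) / (total + s)) for n in counts]


def max_entropy_idm(counts, s=1.0):
    """Maximum Shannon entropy (in bits) over the IDM set of distributions.

    The extra mass s is shared among the least frequent classes
    (a water-filling scheme), which yields the most uniform
    distribution compatible with the lower bounds.
    """
    eps = 1e-12
    levels = [float(n) for n in counts]
    remaining = float(s)
    while remaining > eps:
        lowest = min(levels)
        idx = [i for i, v in enumerate(levels) if v - lowest < eps]
        higher = [v for v in levels if v - lowest >= eps]
        # Raise the lowest classes until they reach the next level
        # or until the extra mass runs out, whichever comes first.
        gap = (min(higher) - lowest) if higher else float("inf")
        step = min(gap, remaining / len(idx))
        for i in idx:
            levels[i] += step
        remaining -= step * len(idx)
    total = sum(levels)
    probs = [v / total for v in levels]
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    class_counts = [3, 1, 0]  # hypothetical class frequencies
    print(idm_intervals(class_counts))
    print(max_entropy_idm(class_counts))
```

With few observations the intervals are wide and the maximum entropy is high; as the sample grows the intervals shrink toward the relative frequencies, which is the behavior a single probability distribution cannot express.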
When classifying an instance, classifiers usually predict a single value of the class variable. Nonetheless, in some cases, there is not enough information available for a classifier to point to a single class value. In these situations, it is more reasonable for the classifier to predict a set of class values instead. This is known as Imprecise Classification. Classification algorithms (including those for Imprecise Classification) often aim to minimize the number of misclassified instances. This would be optimal if all classification errors had the same importance. Nevertheless, in practical applications, different classification errors usually lead to different costs. For this reason, classifiers that take the misclassification costs into account, known as cost-sensitive classifiers, have been developed in the literature.
Traditional classification (including Imprecise Classification) assumes that each instance has a single value of a class variable. However, in some domains, this assumption does not hold because an instance may be associated with multiple labels simultaneously. In these domains, the Multi-Label Classification task (MLC) is more suitable than traditional classification. MLC aims to predict the set of labels associated with a given instance described via an attribute set. Most of the MLC methods proposed so far represent the information provided by an MLC dataset about the set of labels via classical PT. In this thesis, we develop new classification algorithms based on imprecise probability models, covering Imprecise Classification, cost-sensitive Imprecise Classification, and MLC, that present some advantages over state-of-the-art methods and obtain better experimental results.
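The idea of predicting a set of class values can be sketched as follows. This example assumes, for illustration only, that class probabilities are given as intervals from the Imprecise Dirichlet Model and that a class is discarded when another class's lower probability exceeds its upper probability (an interval dominance rule). The abstract does not specify the criterion used in the thesis, so `idm_intervals` and `non_dominated_classes` are hypothetical names.

```python
def idm_intervals(counts, s=1.0):
    """Map each class label to an IDM probability interval (assumed model)."""
    total = sum(counts.values())
    return {
        label: (n / (total + s), (n + s) / (total + s))
        for label, n in counts.items()
    }


def non_dominated_classes(intervals):
    """Return the classes that no other class dominates.

    Class c is dominated when some other class has a lower
    probability strictly greater than the upper probability of c.
    """
    kept = []
    for label, (_, upper) in intervals.items():
        dominated = any(
            other_lower > upper
            for other, (other_lower, _) in intervals.items()
            if other != label
        )
        if not dominated:
            kept.append(label)
    return kept


if __name__ == "__main__":
    # Hypothetical class counts, e.g. at a leaf of a decision tree.
    clear_case = {"a": 8, "b": 1, "c": 0}
    unclear_case = {"a": 2, "b": 2, "c": 0}
    print(non_dominated_classes(idm_intervals(clear_case)))
    print(non_dominated_classes(idm_intervals(unclear_case)))
```

When one class clearly prevails the prediction is a single value, as in traditional classification; when the evidence is split the prediction is a set of candidate values. A cost-sensitive variant would compare expected costs rather than probabilities, which this sketch does not cover.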