Algorithm for multilabel classification based on biclustering (original title: Algoritmo para classificação multirrótulo baseado em biclusterização)
| Field | Value |
|---|---|
| Main author | Schmitke, Luiz Rafael |
| Format | Thesis |
| Language | Portuguese |
| Published in | Pontifícia Universidade Católica do Paraná, 2022 |
| Online access | http://repositorio.utfpr.edu.br/jspui/handle/1/29739 |
Abstract:
Among the approaches used in machine learning, classification stands out, especially in its single-label form. Although single-label classification is common, in some domains multiple labels are an intrinsic characteristic of the data, so a multilabel classification approach is necessary. There are two strategies for multilabel classification: transforming the multilabel problem into one or more single-label problems, or adapting a single-label algorithm to handle multilabel data. Although problem transformation is effective, some of its algorithms have shortcomings, such as fixed parameters that set the number of single-label subproblems, and the preservation of pre-existing relationships among labels without using correlation or co-occurrence measures. Among the categories of problem-transformation algorithms, this work chose the one that transforms a multilabel problem into n binary problems. This category has a low runtime, which allows more complex single-label algorithms, such as neural networks or deep learning, to be used in the classification stage, but it also performs worse on multilabel metrics. Therefore, this work presents the BicbPT algorithm, which combines biclustering with multilabel-to-binary problem transformation to mitigate these problems and improve multilabel metrics without losing the low execution time characteristic of this category. To evaluate the proposed method, the algorithms BR, CC, ECC, RAkEL and LP were used with SVM, C4.5 and Naïve Bayes as base classifiers, on 12 datasets of distinct complexity from different domains. The experiments show that BicbPT performs better on multilabel metrics than the multilabel-binary algorithms and is comparable only to ECC, whose execution time is up to 10 times higher. BicbPT also keeps the low execution time characteristic of the multilabel-binary category.
Finally, a comparison of the two versions of BicbPT shows that taking into account how labels influence one another, and not only preserving relationships across the n transformed problems, improves multilabel classification.
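The transformation the abstract starts from, turning a multilabel problem into n binary problems as Binary Relevance (BR) does, and the label relationships that such a transformation ignores can be illustrated with a minimal Python sketch. It is not BicbPT and does not perform biclustering. The toy label matrix `Y` and the function names `binary_relevance_split` and `cooccurrence` are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence

# Hypothetical toy data: each row is one instance's label vector (1 = label present).
Y = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 1],
]


def binary_relevance_split(Y: Sequence[Sequence[int]]) -> List[List[int]]:
    """Turn an m x n label matrix into n binary target vectors, one per label.

    In a BR-style transformation, each vector would train its own independent
    single-label classifier (for example an SVM, a C4.5 tree or Naive Bayes).
    """
    n_labels = len(Y[0])
    return [[row[j] for row in Y] for j in range(n_labels)]


def cooccurrence(Y: Sequence[Sequence[int]]) -> List[List[int]]:
    """Count, for each label pair (i, j), the instances where both labels are present.

    The diagonal holds each label's own frequency. The independent binary
    problems produced above never see this pairwise information.
    """
    n_labels = len(Y[0])
    C = [[0] * n_labels for _ in range(n_labels)]
    for row in Y:
        for i in range(n_labels):
            if row[i]:
                for j in range(n_labels):
                    if row[j]:
                        C[i][j] += 1
    return C


if __name__ == "__main__":
    for j, target in enumerate(binary_relevance_split(Y)):
        print(f"label {j}: {target}")
    print(cooccurrence(Y))
```

On this toy matrix, labels 0 and 2 appear together in 2 of the 3 instances that carry label 0, while labels 1 and 2 never co-occur. A transformation that learns each label in isolation cannot exploit that kind of dependency, which is the gap that correlation or co-occurrence measures are meant to address.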