Redução de dimensionalidade: aplicação de algoritmos de seleção e extração de atributos

The diagnosis of genetic diseases such as cancer has advanced with the evolution of techniques for obtaining genetic data, and the number of mapped genes has increased significantly and consequently the complexity in the analysis of these data due to the small number of samples. Techniques such as S...

ver descrição completa

Autor principal: De Julio, João Pedro Evaristo
Formato: Trabalho de Conclusão de Curso (Graduação)
Idioma: Português
Publicado em: Universidade Tecnológica Federal do Paraná 2020
Assuntos:
Acesso em linha: http://repositorio.utfpr.edu.br/jspui/handle/1/15988
Tags: Adicionar Tag
Sem tags, seja o primeiro a adicionar uma tag!
Resumo: The diagnosis of genetic diseases such as cancer has advanced with the evolution of techniques for obtaining genetic data, and the number of mapped genes has increased significantly and consequently the complexity in the analysis of these data due to the small number of samples. Techniques such as Selection (with the Filter, Wrapper, and Embedded approaches) and Attribute Extraction make it possible to reduce dimensionality, which in addition to removing irrelevant or redundant attributes, makes it easier to understand the results. Attribute Selection aims to find relevant attributes to increase the predictive capacity of classifiers while Attribute Extraction performs transformation operations without losing data’s properties. Thus, this paper presents an application of Attribute Extraction techniques on selected subsets through Attribute Selection. The proposed combination uses sequential search to select attributes with two algorithms of the Filter approach and seven ways to reduce the Wrapper approach. In each subset, PCA was applied with 90, 95 and 99% of the attributes. For the experiments, five genetic databases with thousands of attributes per sample were used. When analyzing the classification rate with seven different classifiers, can be noted a significant increase in the data classification rate after applying the combination of techniques, resulting in an increase of up to 12% in the worst case.