Redução de dimensionalidade: aplicação de algoritmos de seleção e extração de atributos (Dimensionality reduction: application of attribute selection and extraction algorithms)
Main author: De Julio, João Pedro Evaristo
Format: Undergraduate thesis (Trabalho de Conclusão de Curso, Graduação)
Language: Portuguese
Published: Universidade Tecnológica Federal do Paraná, 2020
Online access: http://repositorio.utfpr.edu.br/jspui/handle/1/15988
Abstract:
The diagnosis of genetic diseases such as cancer has advanced with the evolution of techniques for obtaining genetic data. The number of mapped genes has increased significantly, and so has the complexity of analyzing these data, because the number of samples remains small. Techniques such as Attribute Selection (with the Filter, Wrapper, and Embedded approaches) and Attribute Extraction make it possible to reduce dimensionality: besides removing irrelevant or redundant attributes, they make the results easier to understand. Attribute Selection aims to find relevant attributes that increase the predictive capacity of classifiers, while Attribute Extraction transforms the data without losing their properties. This work therefore applies Attribute Extraction techniques to subsets chosen through Attribute Selection. The proposed combination uses sequential search to select attributes with two Filter-approach algorithms and seven ways of performing the reduction with the Wrapper approach. PCA was then applied to each subset with 90%, 95%, and 99% of the attributes. The experiments used five genetic databases with thousands of attributes per sample. Analyzing the classification rate of seven different classifiers shows a significant increase after the combined techniques are applied, with a gain of up to 12% in the worst case.
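The abstract describes a two-stage pipeline: select a subset of attributes, then apply PCA to that subset and measure classification rates. The sketch below illustrates that idea on synthetic data only and rests on several assumptions. The abstract does not name the two Filter algorithms, so a Fisher-style univariate score stands in for them. Selection here is a plain top-k ranking rather than the sequential search the work uses. The "90, 95 and 99%" values are interpreted as the fraction of explained variance that PCA retains. A nearest-centroid classifier replaces the seven classifiers. The function names `fisher_scores`, `filter_select`, `pca_fit`, and `nearest_centroid_accuracy` are hypothetical and do not come from the work itself.

```python
import numpy as np


def fisher_scores(X, y):
    """Assumed filter criterion (stand-in): per-attribute Fisher-style score
    for two classes, (mean_1 - mean_0)^2 / (var_1 + var_0)."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X1.mean(axis=0) - X0.mean(axis=0)) ** 2
    den = X1.var(axis=0) + X0.var(axis=0) + 1e-12  # avoid division by zero
    return num / den


def filter_select(X, y, k):
    """Filter approach, simplified: keep the k highest-scoring attributes.
    (Top-k ranking, not the sequential search used in the work.)"""
    order = np.argsort(fisher_scores(X, y))[::-1]
    return np.sort(order[:k])


def pca_fit(X, retained=0.95):
    """PCA via SVD of the centered data. Keeps the smallest number of
    components whose cumulative explained variance reaches `retained`
    (assumed meaning of the 90/95/99% settings)."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = s ** 2 / np.sum(s ** 2)
    n_comp = min(int(np.searchsorted(np.cumsum(ratio), retained)) + 1, len(s))
    return mean, Vt[:n_comp]  # rows of Vt are orthonormal principal directions


def nearest_centroid_accuracy(Z_train, y_train, Z_test, y_test):
    """Stand-in classifier: assign each test sample to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((Z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[np.argmin(dists, axis=1)]
    return float(np.mean(pred == y_test))


# Synthetic data mimicking "few samples, thousands of attributes".
rng = np.random.default_rng(0)
n, p, informative = 60, 2000, 20
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :informative] += 1.5  # only the first 20 attributes carry signal

train = np.arange(n) % 3 != 0  # hold out every third sample for testing
test = ~train

idx = filter_select(X[train], y[train], k=50)  # selection fitted on training data only
for retained in (0.90, 0.95, 0.99):
    mean, comps = pca_fit(X[train][:, idx], retained)
    Z_tr = (X[train][:, idx] - mean) @ comps.T
    Z_te = (X[test][:, idx] - mean) @ comps.T
    acc = nearest_centroid_accuracy(Z_tr, y[train], Z_te, y[test])
    print(f"retained={retained:.2f} components={comps.shape[0]} accuracy={acc:.2f}")
```

Both the selection and the PCA projection are fitted on the training split only and then applied to the test split, so the held-out samples do not influence which attributes or components are kept.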