Clustering and analysis of tweets focusing on posts related to Petrobrás stocks

Main author: Murato, Demetrius Milton
Format: Undergraduate thesis (Trabalho de Conclusão de Curso)
Language: Portuguese
Published: Universidade Tecnológica Federal do Paraná 2022
Online access: http://repositorio.utfpr.edu.br/jspui/handle/1/27571
Abstract: Brasil, Bolsa e Balcão (B3), which handled R$6.45 trillion in transactions in 2020, contributes directly and indirectly to the growing volume of information disseminated through social media, which in turn affects the stock market. The volume is too large for investors to analyze themselves, so a tool that groups news on the same subject can improve their performance. Given this scenario, the present work used unsupervised machine learning to cluster posts collected from Twitter related to Petrobras' stocks. Data were collected through the Twitter API and preprocessed with text mining techniques; Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) were then applied to identify the most recurrent terms and the weights for each post before clustering. For comparison, two clusterings were performed: one directly on the matrix obtained by TF-IDF, and another after reducing the dimensionality of the weight matrix with Principal Component Analysis (PCA). To compare the two and make the main differences easier to see, scatter plots and word clouds were created for each clustering. The results showed that clustering the matrix reduced by Principal Component Analysis separates related texts better and makes the clusters easier to interpret.
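
The TF-IDF weighting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common formulation tf × log(N/df); this record does not say which variant or library the thesis used, and the function names and example posts below are hypothetical, not data from the work.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace. Real preprocessing would also
    # remove URLs, mentions, punctuation and stop words (assumption).
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy posts, not taken from the thesis data set.
posts = [
    "petrobras announces dividends",
    "petrobras raises gasoline prices",
    "record dividends from petrobras",
]
weights = tfidf(posts)
```

In this toy set, a term that occurs in every post (here "petrobras") receives weight zero, while a term unique to one post outweighs a term shared by two. Each resulting dictionary corresponds to one row of the weight matrix that a clustering step would take as input.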