Source record: Reconhecimento de voz utilizando WAV2VEC 2.0 para o português brasileiro

Reconhecimento de voz utilizando WAV2VEC 2.0 para o português brasileiro

Deep learning techniques have proven effective in various tasks, especially in the development of speech recognition systems, that is, systems that aim to transcribe a spoken sentence into a sequence of words. Despite the progress in the area, its development can still be considered a diff...


Main author:	Gris, Lucas Rafael Stefanel
Format:	Undergraduate thesis (Trabalho de Conclusão de Curso)
Language:	Portuguese
Published:	Universidade Tecnológica Federal do Paraná 2022
Subjects:	Sistemas de reconhecimento de padrões; Redes neurais (Computação); Reconhecimento automático da voz; Pattern recognition systems; Neural networks (Computer science); Automatic speech recognition; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Online access:	http://repositorio.utfpr.edu.br/jspui/handle/1/29999

id	riut-1-29999
recordtype	dspace
spelling	riut-1-29999 2022-10-25T06:05:45Z Reconhecimento de voz utilizando WAV2VEC 2.0 para o português brasileiro Speech recognition using WAV2VEC 2.0 for Brazilian Portuguese Gris, Lucas Rafael Stefanel Candido Junior, Arnaldo Soares, Anderson da Silva Aikes Junior, Jorge Paula Filho, Pedro Luiz de Candido Junior, Arnaldo Sistemas de reconhecimento de padrões Redes neurais (Computação) Reconhecimento automático da voz Pattern recognition systems Neural networks (Computer science) Automatic speech recognition CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO Deep learning techniques have proven effective in various tasks, especially in the development of speech recognition systems, that is, systems that aim to transcribe a spoken sentence into a sequence of words. Despite the progress in the area, developing such systems can still be considered a difficult task, especially when little data is available, as is the case for Brazilian Portuguese. In this context, this work aims to validate the development of an Automatic Speech Recognition (ASR) system using only openly available audio data, by fine-tuning the Wav2Vec 2.0 XLSR-53 model, pre-trained on many languages, for Brazilian Portuguese. The final model achieves a WER of 11.95%, 13% less than the best open ASR model available for Brazilian Portuguese, which is a promising result in the area. Overall, this work validates the use of self-supervised learning techniques, in particular the Wav2vec 2.0 architecture, in the development of robust ASR systems, even when little data is available, and also points out possible enhancements that could further improve the obtained result. Deep learning techniques have proven very effective in a wide range of tasks, in particular in the development of speech recognition systems, that is, systems that seek to transcribe audio sentences into sequences of words or text. Despite the progress in the area, developing such systems can still be considered a difficult task, especially when few open datasets are available, as is the case for Brazilian Portuguese. In this scenario, this work aims to validate the development of a speech recognizer using only the available open datasets, by fine-tuning the Wav2Vec 2.0 XLSR-53 model, pre-trained on many languages, for Brazilian Portuguese. The final model achieves a WER of 11.95%, 13% less than the best open model available for Brazilian Portuguese, which is a promising result in the area. In summary, this work validates the use of self-supervised learning techniques, in particular the Wav2vec 2.0 architecture, in the development of robust ASR systems, even when little data is available, and also points out possible improvements that could further refine the obtained result. 2022-10-24T16:54:53Z 2022-10-24T16:54:53Z 2021-05-05 bachelorThesis GRIS, Lucas Rafael Stefanel. Reconhecimento de voz utilizando WAV2VEC 2.0 para o português brasileiro. Trabalho de Conclusão de Curso (Bacharelado em Ciência da Computação) - Universidade Tecnológica Federal do Paraná, Medianeira, 2021. http://repositorio.utfpr.edu.br/jspui/handle/1/29999 por openAccess application/pdf Universidade Tecnológica Federal do Paraná Medianeira Brasil Ciência da Computação UTFPR
institution	Universidade Tecnológica Federal do Paraná
collection	RIUT
language	Portuguese
topic	Sistemas de reconhecimento de padrões Redes neurais (Computação) Reconhecimento automático da voz Pattern recognition systems Neural networks (Computer science) Automatic speech recognition CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
spellingShingle	Sistemas de reconhecimento de padrões Redes neurais (Computação) Reconhecimento automático da voz Pattern recognition systems Neural networks (Computer science) Automatic speech recognition CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO Gris, Lucas Rafael Stefanel Reconhecimento de voz utilizando WAV2VEC 2.0 para o português brasileiro
description	Deep learning techniques have proven effective in various tasks, especially in the development of speech recognition systems, that is, systems that aim to transcribe a spoken sentence into a sequence of words. Despite the progress in the area, developing such systems can still be considered a difficult task, especially when little data is available, as is the case for Brazilian Portuguese. In this context, this work aims to validate the development of an Automatic Speech Recognition (ASR) system using only openly available audio data, by fine-tuning the Wav2Vec 2.0 XLSR-53 model, pre-trained on many languages, for Brazilian Portuguese. The final model achieves a WER of 11.95%, 13% less than the best open ASR model available for Brazilian Portuguese, which is a promising result in the area. Overall, this work validates the use of self-supervised learning techniques, in particular the Wav2vec 2.0 architecture, in the development of robust ASR systems, even when little data is available, and also points out possible enhancements that could further improve the obtained result.
format	Undergraduate thesis (Trabalho de Conclusão de Curso)
author	Gris, Lucas Rafael Stefanel
author_sort	Gris, Lucas Rafael Stefanel
title	Reconhecimento de voz utilizando WAV2VEC 2.0 para o português brasileiro
title_short	Reconhecimento de voz utilizando WAV2VEC 2.0 para o português brasileiro
title_full	Reconhecimento de voz utilizando WAV2VEC 2.0 para o português brasileiro
title_fullStr	Reconhecimento de voz utilizando WAV2VEC 2.0 para o português brasileiro
title_full_unstemmed	Reconhecimento de voz utilizando WAV2VEC 2.0 para o português brasileiro
title_sort	reconhecimento de voz utilizando wav2vec 2.0 para o português brasileiro
publisher	Universidade Tecnológica Federal do Paraná
publishDate	2022
citation	GRIS, Lucas Rafael Stefanel. Reconhecimento de voz utilizando WAV2VEC 2.0 para o português brasileiro. Trabalho de Conclusão de Curso (Bacharelado em Ciência da Computação) - Universidade Tecnológica Federal do Paraná, Medianeira, 2021.
url	http://repositorio.utfpr.edu.br/jspui/handle/1/29999
_version_	1805309312013172736
score	10.814766
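
The description above reports the model's quality as a word error rate (WER). As a minimal sketch of how that metric is conventionally computed (word-level edit distance divided by the number of reference words), the Python below may help; the function name `word_error_rate` and the sample sentences are illustrative assumptions and are not taken from the thesis, whose evaluation code is not part of this record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Illustrative helper (hypothetical name), using whitespace tokenization
    and a standard Levenshtein distance over words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences: one deleted word ("a") and one substitution
    # ("texto" -> "textos") against a 7-word reference.
    reference = "o modelo transcreve a fala em texto"
    hypothesis = "o modelo transcreve fala em textos"
    print(f"WER = {word_error_rate(reference, hypothesis):.4f}")
```

In this sketch a WER of 11.95% would correspond to roughly 12 word-level edits per 100 reference words; text normalization (casing, punctuation) before scoring is left out and would need to match whatever convention an evaluation actually uses.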
