Source record: Síntese de voz aplicada ao português brasileiro usando aprendizado profundo

Síntese de voz aplicada ao português brasileiro usando aprendizado profundo (Speech synthesis applied to Brazilian Portuguese using deep learning)

Deep artificial neural networks have been used to solve a wide range of problems. In particular, this methodology has substantially advanced the state of the art in speech synthesis. In this work we explored the state of the art of speech synthesis for Brazilian Portuguese, for th...

Main author:	Casanova, Edresson
Format:	Undergraduate thesis (Trabalho de Conclusão de Curso)
Language:	Portuguese
Published:	Universidade Tecnológica Federal do Paraná, 2020
Subjects:	Artificial intelligence; Neural networks (Computer science); Vocoder; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Online access:	http://repositorio.utfpr.edu.br/jspui/handle/1/12513

id	riut-1-12513
recordtype	dspace
spelling	riut-1-12513 | 2020-11-16T13:09:32Z | Síntese de voz aplicada ao português brasileiro usando aprendizado profundo | Speech synthesis applied to Brazilian Portuguese using deep learning | Casanova, Edresson | Candido Junior, Arnaldo | Candido Junior, Arnaldo | Paula Filho, Pedro Luiz de | Aikes Junior, Jorge | Artificial intelligence | Neural networks (Computer science) | Vocoder | CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO | Deep artificial neural networks have been used to solve a wide range of problems. In particular, this methodology has substantially advanced the state of the art in speech synthesis. In this work we explored the state of the art of speech synthesis for Brazilian Portuguese; to do so, it was necessary to create an audio dataset containing approximately 10 hours of speech from a single speaker in the language. Deep neural networks are formed by a number of nodes, or units, connected by links; these nodes represent artificial neurons and are arranged in layers connected by sets of weights. Several speech synthesis models were investigated, such as DCTTS, Tacotron, and Mozilla TTS. Experiments were proposed to explore the main speech synthesis models and vocoders in the literature. The results showed that the Mozilla TTS model sounds more natural and performs better than the other models explored; however, the quality of the audio synthesized by the DCTTS model is very close. In addition, transfer learning from English to Portuguese was explored and showed advantages. | 2020-11-16T13:09:31Z | 2020-11-16T13:09:31Z | 2019-07-01 | bachelorThesis | CASANOVA, Edresson. Síntese de voz aplicada ao português brasileiro usando aprendizado profundo. 2019. Trabalho de conclusão de Curso (Bacharelado em Ciências da Computação) - Universidade Tecnológica Federal do Paraná, Medianeira, 2019. | http://repositorio.utfpr.edu.br/jspui/handle/1/12513 | por | openAccess | application/pdf | Universidade Tecnológica Federal do Paraná | Medianeira | Brazil | Ciência da Computação | UTFPR
institution	Universidade Tecnológica Federal do Paraná
collection	RIUT
language	Portuguese
topic	Artificial intelligence; Neural networks (Computer science); Vocoder; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
spellingShingle	Artificial intelligence | Neural networks (Computer science) | Vocoder | CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO | Casanova, Edresson | Síntese de voz aplicada ao português brasileiro usando aprendizado profundo
description	Deep artificial neural networks have been used to solve a wide range of problems. In particular, this methodology has substantially advanced the state of the art in speech synthesis. In this work we explored the state of the art of speech synthesis for Brazilian Portuguese; to do so, it was necessary to create an audio dataset containing approximately 10 hours of speech from a single speaker in the language. Deep neural networks are formed by a number of nodes, or units, connected by links; these nodes represent artificial neurons and are arranged in layers connected by sets of weights. Several speech synthesis models were investigated, such as DCTTS, Tacotron, and Mozilla TTS. Experiments were proposed to explore the main speech synthesis models and vocoders in the literature. The results showed that the Mozilla TTS model sounds more natural and performs better than the other models explored; however, the quality of the audio synthesized by the DCTTS model is very close. In addition, transfer learning from English to Portuguese was explored and showed advantages.
format	Undergraduate thesis (Trabalho de Conclusão de Curso)
author	Casanova, Edresson
author_sort	Casanova, Edresson
title	Síntese de voz aplicada ao português brasileiro usando aprendizado profundo
title_short	Síntese de voz aplicada ao português brasileiro usando aprendizado profundo
title_full	Síntese de voz aplicada ao português brasileiro usando aprendizado profundo
title_fullStr	Síntese de voz aplicada ao português brasileiro usando aprendizado profundo
title_full_unstemmed	Síntese de voz aplicada ao português brasileiro usando aprendizado profundo
title_sort	síntese de voz aplicada ao português brasileiro usando aprendizado profundo
publisher	Universidade Tecnológica Federal do Paraná
publishDate	2020
citation	CASANOVA, Edresson. Síntese de voz aplicada ao português brasileiro usando aprendizado profundo. 2019. Trabalho de conclusão de Curso (Bacharelado em Ciências da Computação) - Universidade Tecnológica Federal do Paraná, Medianeira, 2019.
url	http://repositorio.utfpr.edu.br/jspui/handle/1/12513
_version_	1805297540746182656
score	10.814766