Description: Contributions to the video captioning in an open-world scenario using deep learning techniques



Main author:	Inácio, Andrei de Souza
Format:	Thesis
Language:	Portuguese
Published:	Universidade Tecnológica Federal do Paraná, 2023
Subjects:	Video description; Deep learning (Machine learning); Computer vision; Natural language processing (Computer science); Neural networks (Computer science); Pattern perception; Pattern recognition systems; Big data; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO; Electrical Engineering
Online access:	http://repositorio.utfpr.edu.br/jspui/handle/1/32638

Abstract:	Video captioning is a significant challenge in Computer Vision and Artificial Intelligence: it requires translating the visual content of videos into natural language descriptions. Despite significant advances achieved through deep learning techniques, existing approaches usually perform this task in a closed-world scenario, assuming that all actions, all concepts present in a scene, and the vocabulary are known in advance. In real-world applications, however, new actions and objects may emerge unexpectedly, and new vocabulary may be needed to describe them. An ideal video captioning approach for an open-world environment should therefore be able to describe known events, detect unknown ones, and adapt incrementally, learning to describe new events without forgetting what it has already learned. This thesis presents contributions to the video captioning problem in an open-world scenario. The first method, called OSVidCap, describes concurrent known events performed by humans in videos and can also deal with unknown ones. The second method is an incremental learning approach for video captioning, designed to adapt an existing model so that it learns new events incrementally. Two novel datasets and a protocol for evaluating video captioning approaches in an open-world scenario are also presented. Experiments conducted on these datasets demonstrate the effectiveness of the proposed methods.
