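The use of a convolutional neural network as a feature extractor applied to the writer identification problem (original title: O uso da rede neural convolucional como extrator de características aplicado ao problema de identificação de escritores)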

Main author: Righetto, Guilherme
Format: Undergraduate final project (Trabalho de Conclusão de Curso)
Language: Portuguese
Published by: Universidade Tecnológica Federal do Paraná, 2020
Online access: http://repositorio.utfpr.edu.br/jspui/handle/1/6031
Abstract: Context: In writer identification, researchers have proposed many methods for feature extraction, feature processing, and classification. These methods fall into two groups. The first extracts local features related to the writing, such as spacing, concavity, and angulation. The second represents the writing through texture descriptors, which extract global features. Databases of handwritten documents usually contain a single writing style. Currently, however, some databases contain samples in which the same writer produces documents in different languages, such as Portuguese, Arabic, English, and German. When more than one writing style is considered, writer identification becomes harder, because the identification system must be independent of the alphabet used. Several of the techniques mentioned above have been applied to identify writers who produce documents in different languages. One of them, also used in this work, is to use a convolutional neural network (CNN) as a feature extractor and classifier, together with the dissimilarity approach, which turns an n-class problem into a binary one. Objective: The main objective of this work is to evaluate the performance of the features extracted by a CNN in off-line writer identification, using the BFL, CVL and QUWI databases. Method: The proposed method comprised the following steps. First, the manuscript documents were preprocessed using a texture generation approach, and the resulting texture was divided into blocks of different sizes. Next, the CNN was used in two ways: as a traditional classifier of each input block, and as a feature extractor. Dissimilarity feature vectors were then computed from the feature vectors extracted from each block, and an SVM classifier was used to classify the texture blocks. Finally, the per-block predictions generated by the SVM and the CNN were combined to reach a final decision on who wrote a given document. Results: The main results were obtained using the dissimilarity approach on the feature vectors extracted by the CNN: 98.26% (BFL), 97.91% (CVL) and 86.96% (QUWI). Conclusions: The dissimilarity approach remains robust for writer identification in handwritten documents written in different languages. In addition, the features extracted by the CNN gave good results in cases where the written languages used similar alphabets.
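
The dissimilarity step described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation, and it rests on several assumptions. The dissimilarity vector is taken to be the element-wise absolute difference of two feature vectors, which is a common choice but is not specified in the abstract. The per-block predictions are combined by majority vote, which is also an assumption. The three-dimensional feature vectors are toy values standing in for CNN features, and the function names are hypothetical. The SVM that would be trained on the resulting binary samples is omitted.

```python
from collections import Counter
from itertools import combinations


def dissimilarity_vector(u, v):
    """Element-wise absolute difference between two feature vectors (assumed metric)."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return [abs(a - b) for a, b in zip(u, v)]


def build_dissimilarity_set(features, writers):
    """Turn an n-writer problem into a binary one.

    Every pair of block feature vectors yields one dissimilarity vector,
    labeled 1 if both blocks come from the same writer and 0 otherwise.
    """
    samples, labels = [], []
    for i, j in combinations(range(len(features)), 2):
        samples.append(dissimilarity_vector(features[i], features[j]))
        labels.append(1 if writers[i] == writers[j] else 0)
    return samples, labels


def majority_vote(block_predictions):
    """Combine per-block writer predictions into one document-level decision."""
    return Counter(block_predictions).most_common(1)[0][0]


# Toy features for four texture blocks written by two writers (illustrative values only).
features = [[0.1, 0.9, 0.3], [0.2, 0.8, 0.3], [0.9, 0.1, 0.7], [0.8, 0.2, 0.6]]
writers = ["A", "A", "B", "B"]

samples, labels = build_dissimilarity_set(features, writers)
decision = majority_vote(["A", "B", "A"])
```

Because each sample describes a pair of blocks rather than a single writer, the binary classifier does not depend on the number of writers, which is how the approach turns an n-class problem into a binary one.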