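Segmentation and classification of herbarium specimens: a case study with the family Piperaceae Giseke (Segmentação e classificação de espécimes de herbário: um estudo de caso com a família Piperaceae Giseke)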
Main author: | Kajihara, Alexandre Yuji |
---|---|
Format: | Dissertation |
Language: | Portuguese |
Published: | Universidade Tecnológica Federal do Paraná, 2023 |
Subjects: | |
Online access: | http://repositorio.utfpr.edu.br/jspui/handle/1/32342 |
Abstract:
Herbaria are collections of dried plants or fungi that document a region's biodiversity. More than 3,500 herbaria worldwide hold approximately 400 million specimens, thousands of which remain unidentified because naming specimens is a slow process and taxonomists are scarce. Automated identification of specimens is a promising solution to this problem. This study proposes a Machine Learning approach for identifying herbarium specimens at the species level. Piperaceae was chosen because identifying its specimens is particularly difficult: the family has a large number of species with strong morphological similarities. First, 10,514 specimens of 235 Piperaceae species collected in Brazil, all identified by experts in the family's taxonomy, were retrieved from speciesLink. These specimens formed the dataset named Brazil, which was then subdivided into subsets of specimens collected in the state of Paraná and in the North, Northeast, Southeast, Midwest and South regions. After the specimens were segmented with U-Net, the Paraná set was used to determine which image color modes (RGB and grayscale), image dimensions (256×256, 400×400 and 512×512 pixels), descriptors (LBP, SURF, MobileNetV2, ResNet50 and VGG16) and classifiers (DT, 𝑘-NN, MLP, RF and SVM) produced the best classification results. Based on this assessment, species in the regional and Brazil sets were classified by combining an MLP classifier with features extracted by VGG16 from 512×512-pixel RGB images. Among the regional subsets, the best average F1-scores, between 0.58 and 0.69, were obtained in the subsets with the most samples, although these contained few species: Northeast (≥ 10 images: 35 species; ≥ 20 images: 21 species) and Midwest (≥ 10 images: 29 species; ≥ 20 images: 17 species).
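The subsets above keep only species with at least 10 or at least 20 images, and performance is summarized as an average F1-score. The sketch below is a minimal, hypothetical illustration of those two steps, not the author's code: `filter_min_samples` and `macro_f1` are names introduced here, and macro averaging (the unweighted mean of per-species F1) is an assumption, since the abstract does not state which average was used.

```python
from collections import Counter


def filter_min_samples(labels, min_samples):
    """Return the indices of samples whose species has at least `min_samples` images.

    Hypothetical helper mirroring the "≥ 10 images" / "≥ 20 images" subsets.
    """
    counts = Counter(labels)
    return [i for i, species in enumerate(labels) if counts[species] >= min_samples]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-species F1 (macro averaging is an assumption here).

    Classes are taken from the union of true and predicted labels; a class
    with no true positives contributes an F1 of 0.
    """
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: species "a" has 12 images, "b" has 25, "c" has 3.
labels = ["a"] * 12 + ["b"] * 25 + ["c"] * 3
kept_10 = filter_min_samples(labels, 10)  # drops species "c"
kept_20 = filter_min_samples(labels, 20)  # keeps only species "b"
print(len(kept_10), len(kept_20))  # 37 25
print(round(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]), 4))  # 0.7333
```

Macro averaging weights every species equally, so a poorly classified rare species pulls the score down as much as a common one; on imbalanced herbarium data this can make the average F1 noticeably lower than plain accuracy.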
In the Brazil subsets with at least 10 or at least 20 samples per species, comprising between 105 and 160 species, the average F1-score ranged from 0.41 to 0.46. Classification results appear to have been affected by the following factors: the minimum number of samples per species in the subset; the total number of species in the subset; interclass similarity; intraspecific variability; and dataset imbalance. The Top-3 and Top-5 results were promising and may help researchers by providing ranked lists of the species most likely to match a specimen. In the regional subsets with at least 10 or at least 20 samples per species, the Top-3 and Top-5 results of MLP with VGG16 ranged from 66.45% to 95.00%; in the Brazil subsets, from 64.92% to 78.69%. In summary, the best performance was obtained by the MLP classifier with non-handcrafted features (VGG16) extracted from 512×512-pixel color images. Machine Learning techniques applied to herbarium specimen images may therefore provide a computational tool to help botanists classify specimens that still need identification.
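Top-k results count a prediction as correct when the true species appears among the k highest-scoring candidates, which is what makes a ranked shortlist useful to a botanist. A minimal sketch follows, assuming the classifier outputs one score per species for each specimen (for example, an MLP's predicted class probabilities); `top_k_accuracy` is a hypothetical helper, not taken from the dissertation.

```python
def top_k_accuracy(scores, y_true, k):
    """Fraction of specimens whose true species is among the k highest-scoring species.

    scores: one dict per specimen mapping species label -> score
            (e.g. hypothetical MLP class probabilities).
    Ties keep dict insertion order, since sorted() is stable even with reverse=True.
    """
    hits = 0
    for specimen_scores, true_species in zip(scores, y_true):
        ranked = sorted(specimen_scores, key=specimen_scores.get, reverse=True)
        if true_species in ranked[:k]:
            hits += 1
    return hits / len(y_true)


# Hypothetical scores for four specimens over three species.
scores = [
    {"a": 0.60, "b": 0.30, "c": 0.10},  # true "a": ranked 1st
    {"a": 0.50, "b": 0.20, "c": 0.30},  # true "c": ranked 2nd
    {"a": 0.10, "b": 0.20, "c": 0.70},  # true "b": ranked 2nd
    {"a": 0.40, "b": 0.35, "c": 0.25},  # true "c": ranked 3rd
]
y_true = ["a", "c", "b", "c"]
for k in (1, 2, 3):
    print(k, top_k_accuracy(scores, y_true, k))
# 1 0.25
# 2 0.75
# 3 1.0
```

Because Top-k accuracy can only stay the same or grow as k increases, Top-5 is always at least as high as Top-3 for the same model and data, which is why a shortlist can remain useful even when single-label predictions are unreliable.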