Conference Paper (published) | A comparison of approaches for automated text extraction from scholarly figures

Conference Paper (published)

A comparison of approaches for automated text extraction from scholarly figures

Details

Citation

Böschen F & Scherp A (2017) A comparison of approaches for automated text extraction from scholarly figures. In: Amsaleg L, Guðmundsson G, Gurrin C, Jónsson B & Satoh S (eds.) MultiMedia Modeling. MMM 2017. Lecture Notes in Computer Science, 10132. MMM2017: 23rd International Conference on Multimedia Modeling, Reykjavik, Iceland, 04.01.2017-06.01.2017. Cham, Switzerland: Springer, pp. 15-27. https://doi.org/10.1007/978-3-319-51811-4_2

Abstract
So far, there has not been a comparative evaluation of different approaches for text extraction from scholarly figures. In order to fill this gap, we have defined a generic pipeline for text extraction that abstracts from the existing approaches as documented in the literature. In this paper, we use this generic pipeline to systematically evaluate and compare 32 configurations for text extraction over four datasets of scholarly figures of different origin and characteristics. In total, our experiments have been run over more than 400 manually labeled figures. The experimental results show that the approach BS-4OS results in the best F-measure of 0.67 for the Text Location Detection and the best average Levenshtein Distance of 4.71 between the recognized text and the gold standard on all four datasets using the Ocropy OCR engine.
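The abstract reports two evaluation measures: an F-measure for Text Location Detection and an average Levenshtein Distance between recognized text and the gold standard. The sketch below is a minimal illustration of the standard definitions, not the authors' evaluation code. `levenshtein` computes the usual character-level edit distance, and `f_measure` is the harmonic mean of precision and recall. The record does not state how detected text regions are matched to gold-standard regions, or over which units the distance is averaged. The helper `mean_levenshtein` is therefore hypothetical and simply assumes a plain average over (recognized, gold) string pairs.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the prefix of a processed so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_levenshtein(pairs) -> float:
    """Hypothetical helper: average edit distance over (recognized, gold) pairs."""
    pairs = list(pairs)
    return sum(levenshtein(rec, gold) for rec, gold in pairs) / len(pairs)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(mean_levenshtein([("Tlme", "Time"), ("x-axis", "x-axis")]))  # 0.5
```

For example, an OCR output of "Tlme" for the axis label "Time" costs one substitution, while an exact match costs zero. A lower average distance therefore means recognized text closer to the gold standard.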

Keywords
Scholarly figures; Text extraction; Comparison

Journal
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Status	Published
Title of series	Lecture Notes in Computer Science
Number in series	10132
Publication date	31/12/2017
Publication date online	31/12/2016
Publisher	Springer
Place of publication	Cham, Switzerland
ISSN of series	0302-9743
ISBN	978-3-319-51810-7
Conference	MMM2017: 23rd International Conference on Multimedia Modeling
Conference location	Reykjavik, Iceland
Dates	04/01/2017–06/01/2017