Improving point of view scene recognition by considering textual data

Volkmar Frinken, Yutaro Iwakiri, Ryosuke Ishida, Kensho Fujisaki, Seiichi Uchida

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

At the current rate of technological advancement and social acceptance thereof, it will not be long before wearable devices will be common that constantly record the field of view of the user. We introduce a new database of image sequences, taken with a first person view camera, of realistic, everyday scenes. As a distinguishing feature, we manually transcribed the scene text of each image. This way, sophisticated OCR algorithms can be simulated that can help in the recognition of the location and the activity. To test this hypothesis, we performed a set of experiments using visual features, textual features, and a combination of both. We demonstrate that, although not very powerful when considered alone, the textual information improves the overall recognition rates.

Original language: English
Title of host publication: Proceedings - International Conference on Pattern Recognition
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2966-2971
Number of pages: 6
ISBN (Electronic): 9781479952083
DOIs: https://doi.org/10.1109/ICPR.2014.512
Publication status: Published - Dec 4 2014
Event: 22nd International Conference on Pattern Recognition, ICPR 2014 - Stockholm, Sweden
Duration: Aug 24 2014 - Aug 28 2014

Publication series

Name: Proceedings - International Conference on Pattern Recognition
ISSN (Print): 1051-4651

Other

Other: 22nd International Conference on Pattern Recognition, ICPR 2014
Country: Sweden
City: Stockholm
Period: 8/24/14 - 8/28/14

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition

Cite this

Frinken, V., Iwakiri, Y., Ishida, R., Fujisaki, K., & Uchida, S. (2014). Improving point of view scene recognition by considering textual data. In Proceedings - International Conference on Pattern Recognition (pp. 2966-2971). [6977224] (Proceedings - International Conference on Pattern Recognition). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ICPR.2014.512

@inproceedings{1bdf70948adf4066a9ad0a6981bad4d3,
title = "Improving point of view scene recognition by considering textual data",
abstract = "At the current rate of technological advancement and social acceptance thereof, it will not be long before wearable devices will be common that constantly record the field of view of the user. We introduce a new database of image sequences, taken with a first person view camera, of realistic, everyday scenes. As a distinguishing feature, we manually transcribed the scene text of each image. This way, sophisticated OCR algorithms can be simulated that can help in the recognition of the location and the activity. To test this hypothesis, we performed a set of experiments using visual features, textual features, and a combination of both. We demonstrate that, although not very powerful when considered alone, the textual information improves the overall recognition rates.",
author = "Volkmar Frinken and Yutaro Iwakiri and Ryosuke Ishida and Kensho Fujisaki and Seiichi Uchida",
year = "2014",
month = dec,
day = "4",
doi = "10.1109/ICPR.2014.512",
isbn = "9781479952083",
language = "English",
series = "Proceedings - International Conference on Pattern Recognition",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "2966--2971",
booktitle = "Proceedings - International Conference on Pattern Recognition",
address = "United States",
}

TY - GEN

T1 - Improving point of view scene recognition by considering textual data

AU - Frinken, Volkmar

AU - Iwakiri, Yutaro

AU - Ishida, Ryosuke

AU - Fujisaki, Kensho

AU - Uchida, Seiichi

PY - 2014/12/4

Y1 - 2014/12/4

N2 - At the current rate of technological advancement and social acceptance thereof, it will not be long before wearable devices will be common that constantly record the field of view of the user. We introduce a new database of image sequences, taken with a first person view camera, of realistic, everyday scenes. As a distinguishing feature, we manually transcribed the scene text of each image. This way, sophisticated OCR algorithms can be simulated that can help in the recognition of the location and the activity. To test this hypothesis, we performed a set of experiments using visual features, textual features, and a combination of both. We demonstrate that, although not very powerful when considered alone, the textual information improves the overall recognition rates.

AB - At the current rate of technological advancement and social acceptance thereof, it will not be long before wearable devices will be common that constantly record the field of view of the user. We introduce a new database of image sequences, taken with a first person view camera, of realistic, everyday scenes. As a distinguishing feature, we manually transcribed the scene text of each image. This way, sophisticated OCR algorithms can be simulated that can help in the recognition of the location and the activity. To test this hypothesis, we performed a set of experiments using visual features, textual features, and a combination of both. We demonstrate that, although not very powerful when considered alone, the textual information improves the overall recognition rates.

UR - http://www.scopus.com/inward/record.url?scp=84919935024&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84919935024&partnerID=8YFLogxK

U2 - 10.1109/ICPR.2014.512

DO - 10.1109/ICPR.2014.512

M3 - Conference contribution

AN - SCOPUS:84919935024

T3 - Proceedings - International Conference on Pattern Recognition

SP - 2966

EP - 2971

BT - Proceedings - International Conference on Pattern Recognition

PB - Institute of Electrical and Electronics Engineers Inc.

ER -