VisualMRC is a visual machine reading comprehension dataset that poses the following task: given a question and a document image, a model produces an abstractive answer.
More details, analyses, and baseline results are available in our paper, which you can cite as follows:
@inproceedings{VisualMRC2021,
  author    = {Ryota Tanaka and
               Kyosuke Nishida and
               Sen Yoshida},
  title     = {VisualMRC: Machine Reading Comprehension on Document Images},
  booktitle = {AAAI},
  year      = {2021}
}
- [2025.03.27] Our VisualMRC dataset is available on 🤗HuggingFace.
- 🤗VisualMRC
  - 10,197 images
  - 30,562 QA pairs
  - 10.53 average question tokens (tokenized with the NLTK tokenizer)
  - 9.53 average answer tokens (tokenized with the NLTK tokenizer)
  - 151.46 average OCR tokens (tokenized with the NLTK tokenizer)
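As a rough sketch of how token statistics like the averages above can be computed, the snippet below uses NLTK's Treebank word tokenizer; the exact tokenizer configuration used for the reported numbers may differ, and the example questions are illustrative, not taken from the dataset.

```python
from nltk.tokenize import TreebankWordTokenizer


def avg_token_count(texts):
    """Average number of NLTK Treebank tokens per text."""
    tokenizer = TreebankWordTokenizer()
    counts = [len(tokenizer.tokenize(t)) for t in texts]
    return sum(counts) / len(counts)


# Toy examples only (not drawn from VisualMRC).
questions = [
    "How many images does VisualMRC contain?",
    "What task does the dataset propose?",
]
print(avg_token_count(questions))  # 7.0
```

The same function applied to the dataset's question, answer, and OCR-token fields would reproduce per-field averages in the style reported above.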
