The Quest Dataset


The Learning Agency Lab’s data science competition, “The Quest for Quality Questions: Improving Reading Comprehension through Automated Question Generation,” was designed to build AI algorithms that can automatically generate questions that test young learners’ reading comprehension.
As many educators and researchers know, questions are key to teaching and evaluating narrative comprehension skills in young learners. However, generating high-quality reading comprehension questions is time-consuming, which limits the number of texts that young readers can engage with in this way. Datasets can help by informing the automated generation of quality questions.
The Quest challenge dataset can be accessed on this page. It builds on foundational data from the Lab’s FairytaleQA dataset of 10,580 questions. Those questions were created to address gaps in similar datasets, which often overlooked the fine-grained reading skills that demonstrate an understanding of varying narrative elements.

Potential Uses

Learning Exchange users can utilize The Quest Dataset to:
Recent developments in generative large language models (LLMs) enable computer algorithms to generate language not only at a human level of intelligence and fluency but also at even greater speeds and levels of automation.


Stories and Sections

The Quest dataset is a collection of fairytales by diverse authors from regions around the globe, from England’s Beatrix Potter to Japanese fairytales and Native American folklore. It includes a statistical summary and numerical breakdown of each story’s chapters and questions.
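A per-story breakdown of this kind can be approximated with a few lines of Python. This is a minimal sketch, not the Lab's own tooling: the field names `story_name` and `section` are assumptions about how an export of the dataset might be laid out, and the sample rows are invented for illustration.

```python
from collections import defaultdict


def summarize_stories(rows):
    """Return, per story, the number of distinct sections and of questions.

    Each row is assumed to describe one question. The keys "story_name"
    and "section" are hypothetical and may differ in the real files.
    """
    sections = defaultdict(set)
    questions = defaultdict(int)
    for row in rows:
        story = row["story_name"]
        sections[story].add(row["section"])
        questions[story] += 1
    return {
        story: {"sections": len(sections[story]), "questions": count}
        for story, count in questions.items()
    }


# Invented rows for illustration only; they are not taken from the dataset.
sample_rows = [
    {"story_name": "story-a", "section": 1},
    {"story_name": "story-a", "section": 1},
    {"story_name": "story-a", "section": 2},
    {"story_name": "story-b", "section": 1},
]

summary = summarize_stories(sample_rows)
for story, stats in summary.items():
    print(story, stats)
```

Rows read from a CSV export with the standard library's `csv.DictReader` should work as input, provided the keys are adjusted to match the actual column headers.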

Loc/Sum, Ex/Im1, Attr1


Distribution Charts

Within the dataset, some questions have multiple possible answers. The answer distribution chart helps users visualize how often these answer options occur. The dataset also breaks down the attributes and nature of the fairytale questions and answers to capture more nuanced insights and provide better information about reader comprehension. The included charts outline these breakdowns.
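The counts behind distribution charts like these can be tallied with the standard library. The sketch below is illustrative only: the column names `answer1`, `answer2`, `attribute`, and `ex_or_im` are assumptions rather than the confirmed schema, and the rows are invented.

```python
from collections import Counter

# Hypothetical names for the columns that hold alternative answers.
ANSWER_FIELDS = ("answer1", "answer2")


def answers_per_question(rows):
    """Map 'number of non-empty answers' to 'number of questions'."""
    counts = Counter()
    for row in rows:
        filled = sum(
            1 for field in ANSWER_FIELDS if (row.get(field) or "").strip()
        )
        counts[filled] += 1
    return counts


def label_frequencies(rows, field):
    """Count how often each label appears in the given column."""
    return Counter(row[field] for row in rows)


# Invented rows for illustration only; they are not taken from the dataset.
sample_rows = [
    {"answer1": "a fox", "answer2": "", "attribute": "character", "ex_or_im": "explicit"},
    {"answer1": "sad", "answer2": "upset", "attribute": "character", "ex_or_im": "implicit"},
    {"answer1": "a forest", "answer2": None, "attribute": "setting", "ex_or_im": "explicit"},
]

answer_counts = answers_per_question(sample_rows)
attribute_counts = label_frequencies(sample_rows, "attribute")
explicitness_counts = label_frequencies(sample_rows, "ex_or_im")
print(answer_counts, attribute_counts, explicitness_counts)
```

Each `Counter` can be passed to a plotting library of choice to draw the corresponding bar chart.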