The Quest Dataset
Overview
The Learning Agency Lab’s data science competition, “The Quest for Quality Questions: Improving Reading Comprehension through Automated Question Generation,” was designed to spur the development of AI algorithms that automatically generate questions testing young learners’ reading comprehension.
As many educators and researchers know, questions are key to teaching and evaluating narrative comprehension skills in young learners. However, writing high-quality reading comprehension questions is time-consuming, which limits the number of texts that young readers can engage with in this way. Datasets of well-crafted questions can help by providing examples against which automated question generation can be developed and evaluated.
The Quest challenge dataset can be accessed on this page. It builds on foundational data from the Lab’s FairytaleQA dataset of 10,580 questions. Those questions were created to address gaps in similar datasets, which often overlooked the fine-grained reading skills that demonstrate an understanding of varying narrative elements.
This dataset is licensed under CC BY. This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator.
Quest dataset © 2024 by The Learning Agency Lab is licensed under CC BY 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/
Potential Uses
Learning Exchange users can utilize The Quest Dataset to:
- Train artificial intelligence/natural language processing (AI/NLP) models for question generation or answering.
- Analyze the difficulty level of different types of questions in the context of children's storybooks.
- Examine trends in character development, plot, or other story elements in children's storybooks and their impact on reading comprehension.
Recent developments in generative large language models (LLMs) enable computer algorithms to generate language with human-like fluency, and to do so at far greater speed and with far more automation than human writers can.
Dashboard
Stories and Sections
The Quest dataset is a collection of fairytale stories by diverse authors from different regions around the globe, from England’s Beatrix Potter to Japanese fairytales and Native American folklore. The Quest dataset includes a statistical summary and numerical breakdown of each story’s chapters and questions.
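As a rough illustration of what such a per-story breakdown involves, the sketch below counts distinct sections and total questions for each story. The record layout (`story`, `section`, `question`) and the sample rows are assumptions made for this example, not the dataset's actual schema or contents.

```python
from collections import defaultdict

# Hypothetical rows: one per question. Field names and values are
# illustrative assumptions, not the real Quest dataset schema.
rows = [
    {"story": "story-a", "section": 1, "question": "Who lived in the cottage?"},
    {"story": "story-a", "section": 1, "question": "Why did she leave home?"},
    {"story": "story-a", "section": 2, "question": "What happened next?"},
    {"story": "story-b", "section": 1, "question": "Where was the fox hiding?"},
]


def summarize_stories(rows):
    """Return {story: {"sections": n, "questions": m}} for the given rows."""
    sections = defaultdict(set)        # distinct section ids seen per story
    question_totals = defaultdict(int)  # number of question rows per story
    for row in rows:
        sections[row["story"]].add(row["section"])
        question_totals[row["story"]] += 1
    return {
        story: {"sections": len(sections[story]), "questions": question_totals[story]}
        for story in sections
    }


summary = summarize_stories(rows)
for story, stats in summary.items():
    print(story, stats)
```

With real data, the same function would apply once the rows are loaded and the field names are adjusted to match the actual columns.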
Distribution Charts
Within the dataset, some questions have multiple possible answers. The answer distribution chart helps users visualize how often each number of answer options occurs. The dataset also breaks down the attributes and nature of the fairytale questions and answers to capture more nuanced insights and provide better information about reader comprehension. The included charts outline:
- Question Attributes – This chart provides a deeper understanding of the attributes associated with questions, including causal relationships, actions, emotions, outcomes, characters, settings, and predictions.
- Number of Attributes – Some Quest dataset questions have more than one associated attribute, which this chart displays.
- Local or Summary Answers – This chart visualizes how questions split between those whose answers are found within a specific section (local) and those that require summarizing information from the entire text (summary).
- Explicit or Implicit Questions – Explicit questions directly seek factual information from the text, while implicit queries require inference, a distinction visualized by this chart.
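The four distributions listed above could be tallied along the following lines. This is a minimal sketch under stated assumptions: the field names (`attributes`, `local_or_sum`, `ex_or_im`), the label values, and the sample records are all hypothetical and may not match the dataset's real columns.

```python
from collections import Counter

# Hypothetical question records. Field names and label values are
# assumptions for illustration, not the dataset's actual schema.
questions = [
    {"attributes": ["character"], "local_or_sum": "local", "ex_or_im": "explicit"},
    {"attributes": ["action", "causal relationship"], "local_or_sum": "summary", "ex_or_im": "implicit"},
    {"attributes": ["setting"], "local_or_sum": "local", "ex_or_im": "explicit"},
    {"attributes": ["character", "emotion"], "local_or_sum": "local", "ex_or_im": "implicit"},
]


def chart_counts(rows):
    """Tally the counts behind each of the four charts."""
    return {
        # How often each attribute appears across all questions.
        "attributes": Counter(a for row in rows for a in row["attributes"]),
        # How many questions carry one attribute, two attributes, and so on.
        "number_of_attributes": Counter(len(row["attributes"]) for row in rows),
        # Local versus summary answers.
        "local_or_summary": Counter(row["local_or_sum"] for row in rows),
        # Explicit versus implicit questions.
        "explicit_or_implicit": Counter(row["ex_or_im"] for row in rows),
    }


counts = chart_counts(questions)
for chart, tally in counts.items():
    print(chart, dict(tally))
```

Each `Counter` maps a label to its frequency, so the results can be passed to any plotting library to reproduce bar charts similar to those on the dashboard.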