The Quest Dataset

Overview

The Learning Agency Lab’s data science competition, “The Quest for Quality Questions: Improving Reading Comprehension through Automated Question Generation,” was designed to spur the development of AI algorithms that can automatically generate questions that test young learners’ reading comprehension.

As many educators and researchers know, questions are key to teaching and evaluating narrative comprehension skills in young learners. However, writing high-quality reading comprehension questions is time-consuming, which limits the number of texts that young readers can engage with in this way. Annotated question datasets can help by providing examples for building and evaluating automated question generation.

The Quest challenge dataset can be accessed on this page. It builds on foundational data from the Lab’s FairytaleQA dataset of 10,580 questions. Those questions were created to address gaps in similar datasets, which often overlooked the fine-grained reading skills that demonstrate an understanding of varying narrative elements.

This dataset is licensed under CC BY. This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator.

Potential Uses

Learning Exchange users can utilize The Quest Dataset to:

Train artificial intelligence/natural language processing (AI/NLP) models for question generation or answering.
Analyze the difficulty level of different types of questions in the context of children's storybooks.
Examine trends in character development, plot, or other story elements in children's storybooks and their impact on reading comprehension.

Recent developments in generative large language models (LLMs) enable algorithms to generate fluent, human-like language, and to do so at far greater speed and with more automation than manual writing allows.

Dashboard

Stories and Sections

The Quest dataset is a collection of fairytale stories by diverse authors from different regions around the globe, from England’s Beatrix Potter to Japanese fairytales and Native American folklore. The Quest dataset includes a statistical summary and numerical breakdown of each story’s chapters and questions.

Distribution Charts

Within the dataset, some questions have multiple possible answers. The answer distribution chart helps users visualize the frequency of these answer options. The attributes and nature of the fairytale questions and answers are also broken down to capture more nuanced insights and to provide better information about reader comprehension. Included charts outline:

Question Attributes – This chart provides a deeper understanding of the attributes associated with questions, including causal relationships, actions, emotions, outcomes, characters, settings, and predictions.
Number of Attributes – Some Quest dataset questions have more than one associated attribute, which this chart displays.
Local or Summary Answers – This chart visualizes how questions are split between those whose answers are found within a specific section (local) and those that require summarizing information from the entire text (summary).
Explicit or Implicit Questions – Explicit questions directly seek factual information from the text, while implicit queries require inference, a distinction visualized by this chart.
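
For readers who want to reproduce these breakdowns outside the dashboard, the four tallies described above can be sketched in a few lines of Python. This is a minimal illustration, not the Lab's own tooling: the sample records, the field names (attributes, scope, kind), and the summarize helper are all hypothetical and would need to be mapped onto the columns of the actual dataset files.

```python
from collections import Counter

# Hypothetical records for illustration only. The field names and values
# are assumptions, not the dataset's documented schema.
SAMPLE_ROWS = [
    {"question": "Why did the fox hide?",
     "attributes": ["causal relationship"], "scope": "local", "kind": "implicit"},
    {"question": "Who lived in the forest hut?",
     "attributes": ["character", "setting"], "scope": "local", "kind": "explicit"},
    {"question": "What did the sisters do during the story?",
     "attributes": ["action"], "scope": "summary", "kind": "explicit"},
    {"question": "Who found the ring?",
     "attributes": ["character"], "scope": "local", "kind": "explicit"},
]


def summarize(rows):
    """Tally the four distributions described in the chart list."""
    attribute_counts = Counter()         # how often each attribute occurs
    attributes_per_question = Counter()  # how many attributes a question has
    scope_counts = Counter()             # local vs. summary
    kind_counts = Counter()              # explicit vs. implicit
    for row in rows:
        attribute_counts.update(row["attributes"])
        attributes_per_question[len(row["attributes"])] += 1
        scope_counts[row["scope"]] += 1
        kind_counts[row["kind"]] += 1
    return {
        "question_attributes": attribute_counts,
        "number_of_attributes": attributes_per_question,
        "local_or_summary": scope_counts,
        "explicit_or_implicit": kind_counts,
    }


summary = summarize(SAMPLE_ROWS)
for name, counts in summary.items():
    print(name, dict(counts))
```

Each Counter corresponds to one chart in the list, so the same function could feed a plotting library once real rows are loaded in place of the sample data.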

The Learning Agency Lab is no longer active. This website is a living archive of the Lab’s work.

Check out similar work by The Learning Agency Lab's sister organization at the-learning-agency.com.