Datasets

Overview

By providing researchers and technologists with access to high-quality datasets, the Lab’s Learning Exchange fosters innovation and progress in education – helping to improve student outcomes and prepare learners for success in the 21st century.
Datasets in other fields have already yielded transformational results. Among them are the ImageNet dataset, which is credited with major advances in computer vision, and datasets like the Automated Student Assessment Prize (ASAP), which helped launch the practice of automated essay scoring.
The Lab’s Learning Exchange is a critical step toward creating high-quality educational datasets that can unleash the full potential of artificial intelligence and machine learning in education. Many of the datasets hosted in this clearinghouse are associated with open data science competitions to support the creation of new, innovative AI algorithms in education. These datasets can be used to:

How can visitors best use the Lab’s Learning Exchange?

Our goal is to enable educators and engineers to create useful teaching and learning tools that are supported by the use of large and relevant datasets. To get started:

Preview & Download Datasets.
Dataset descriptions and examples of their potential applications are listed on each dataset’s page. Once you have chosen a dataset, click the button to download the data in your preferred format (CSV or XLSX). [Note: All datasets provided on the Exchange are open source.]

Tableau integration.
Learning Exchange visitors can use the built-in data visualizations by Tableau, a visual analytics platform, to further analyze available data. Users can also create custom data visualizations with the interactive dashboards.
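
Once a dataset has been downloaded as a CSV, a quick first check is to confirm how many rows it contains and which columns it provides. The following is a minimal sketch using only Python's standard library. The function name `summarize_csv` and the file name in the usage note are illustrative assumptions, not part of the Learning Exchange, and the sketch assumes the file is UTF-8 encoded with a header row.

```python
import csv


def summarize_csv(path):
    """Return (row_count, column_names) for a CSV file with a header row.

    Hypothetical helper for illustration; assumes UTF-8 encoding.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Reading all rows also populates reader.fieldnames from the header.
        rows = list(reader)
        columns = list(reader.fieldnames or [])
    return len(rows), columns
```

For example, calling `summarize_csv("dataset.csv")` on a downloaded file (the file name here is a placeholder) returns the number of data rows and the list of column names, which can then be compared against the description on the dataset's page.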

Datasets

The Quest Dataset

The Learning Agency Lab’s data science competition, “The Quest for Quality Questions: Improving Reading Comprehension through Automated Question Generation,” was designed to build AI algorithms that can automatically generate questions for testing young learners’ reading comprehension.

Jo Wilder Dataset

Just as there are many ways to learn, there are many ways to assess learning. Game-based learning is different in that it allows students to engage with educational content in a dynamic way that traditional classroom experiences do not typically provide.

PERSUADE Dataset

Why do students write the way they do? And are they any good at it? Understanding the nuance of how students write remains a complex challenge – one that can be aided by deeper insight into how various writing components ultimately come together to form effective essays and other text.

KLICKE Dataset

What if educators could better understand the indicators that predict good writing – before a student even approaches a keyboard? Most writing assessments focus on only the final product, but data science may now be able to unlock key aspects of the writing process in order to bring new insights and efficiencies to light.

AIDE Dataset

Can AI be trained to detect plagiarism? The influx of artificial intelligence and large language models (LLMs) in the classroom is a source of both excitement and concern among educators. As LLMs like ChatGPT become increasingly sophisticated, they are capable of generating text that is difficult to distinguish from human-written text.

PIILO Dataset

Can AI be trained to detect and protect users’ personal information? As the use of artificial intelligence (AI) in education and classroom settings grows, a core challenge persists – protecting student and user privacy.

ASAP 2.0 Dataset

As many educators know, grading essays by hand is hard, time-consuming and expensive. Automating the essay-scoring process could mean untold efficiencies for teachers and faster feedback for students. Yet the first Automated Student Assessment Prize (ASAP) competition to tackle grading student-written essays was held twelve years ago.