Datasets
Overview
- Train machine learning algorithms to evaluate student essays for writing and language proficiency.
- Analyze a dataset of persuasive student essays to examine differences in writing and language use among student populations in the United States.
- Train generative AI algorithms to create reading comprehension questions for elementary and middle school students.
- Analyze students’ enjoyment, engagement, and learning progress on game-based learning platforms (e.g., Jo Wilder and the Capitol Case).
How can visitors best use the Lab’s Learning Exchange?
Preview & Download Datasets.
Each dataset’s page includes a description and examples of potential applications. Once you have chosen a dataset, click the button to download the data in your preferred format (CSV or XLSX). [Note: All datasets provided on the Exchange are open source.]
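Once a CSV file has been downloaded, a quick look at its columns and first few rows can confirm it is the dataset you expected. The snippet below is a minimal sketch using only Python's standard library; the file name `persuade_sample.csv` and the helper `preview_csv` are hypothetical and are not part of the Exchange, so substitute the name of the file you actually downloaded.

```python
import csv
from itertools import islice
from pathlib import Path


def preview_csv(path, n_rows=5):
    """Return (column_names, first n_rows rows as dicts) for a CSV file."""
    # "utf-8-sig" reads plain UTF-8 and also strips a byte-order mark if present.
    with Path(path).open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        rows = list(islice(reader, n_rows))
        columns = reader.fieldnames or []
    return columns, rows


if __name__ == "__main__":
    # Hypothetical file name: replace with the name of your downloaded file.
    csv_path = Path("persuade_sample.csv")
    if csv_path.exists():
        columns, rows = preview_csv(csv_path)
        print("Columns:", columns)
        for row in rows:
            print(row)
    else:
        print(f"{csv_path} not found; download a CSV from the dataset page first.")
```

Every value is read as a string here, so convert columns to numbers yourself where needed. For XLSX downloads, a spreadsheet application or a third-party library would be needed instead.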
Tableau integration.
Learning Exchange visitors can use the built-in data visualizations by Tableau, a visual analytics platform, to further analyze available data. Users can also create custom data visualizations with the interactive dashboards.
Datasets
The Quest Dataset
The Learning Agency Lab’s data science competition, “The Quest for Quality Questions: Improving Reading Comprehension through Automated Question Generation,” challenged participants to build AI algorithms that automatically generate questions for testing young learners’ reading comprehension.
Jo Wilder Dataset
Just as there are many ways to learn, there are many ways to assess learning. Game-based learning is different in that it allows students to engage with educational content in a dynamic way that traditional classroom experiences do not typically provide.
PERSUADE Dataset
Why do students write the way they do? And are they any good at it? Understanding the nuance of how students write remains a complex challenge – one that can be aided by deeper insight into how various writing components ultimately come together to form effective essays and other text.
KLICKE Dataset
What if educators could better understand the indicators that predict good writing – before a student even approaches a keyboard? Most writing assessments focus on only the final product, but data science may now be able to unlock key aspects of the writing process in order to bring new insights and efficiencies to light.
AIDE Dataset
Can AI be trained to detect plagiarism? The influx of artificial intelligence and large language models (LLMs) in the classroom is a source of both excitement and concern among educators. As LLMs like ChatGPT become increasingly sophisticated, they are capable of generating text that is difficult to distinguish from human-written text.
PIILO Dataset
Can AI be trained to detect and protect users’ personal information? As the use of artificial intelligence (AI) in education and classroom settings grows, a core challenge persists – protecting student and user privacy.
ASAP 2.0 Dataset
As many educators know, grading essays by hand is hard, time-consuming, and expensive. Automating the essay-scoring process could mean untold efficiencies for teachers and faster feedback for students. Yet the first Automated Student Assessment Prize (ASAP) competition to tackle grading student-written essays was held twelve years ago.