Why do students write the way they do? And are they any good at it?

Understanding the nuance of how students write remains a complex challenge – one that can be aided by deeper insight into how various writing components ultimately come together to form effective essays and other text.

When researchers can study the anatomy of student writing, their analysis offers valuable glimpses into how students engage with and comprehend text. These insights can generate better feedback on student writing as a whole, which is crucial to creating proficient writers, a goal that research shows is a growing challenge in education. According to the National Assessment of Educational Progress (NAEP), fewer than a third of high school seniors are proficient writers, with that share shrinking to only 15% for some marginalized communities.

Granular writing feedback can help create better writers, but teachers are often too overwhelmed to provide it as needed. So what can help? More knowledge about the different elements of student writing can support the development of customized AI and machine learning tools, as well as more effective, formative teacher feedback.

The ability to study specific components of student writing has recently improved thanks to the Learning Agency Lab’s PERSUADE dataset. This dataset opens a window into how students think, label, and organize their thoughts as they write. The resulting snapshots of information provide greater clarity and enhanced knowledge of specific writing elements.

Traditionally, these types of labeled datasets, which break down and focus on particular elements of discourse in an essay, have been hard to come by. The PERSUADE (Persuasive Essays for Rating, Selecting, and Understanding Argumentative and Discourse Elements) dataset, available here on the Lab’s Learning Exchange, is a rare and exciting nationally representative resource that lets learning engineers glean in-depth insights into student writing in the United States.

The PERSUADE dataset provides access to comprehensive data such as labels for more than 25,000 essays, including the various argumentative and rhetorical elements contained within each essay response. It also includes the effectiveness rating of these discourse elements, holistic quality scores for the essay responses, and student demographic information that includes grade level, race/ethnicity, economic background, and more.

The dataset was developed as part of the Feedback Prize project, an initiative by Georgia State University and The Learning Agency Lab. The goal of the prize is to spur the development of open-source algorithms for assisted writing feedback tools and to help struggling students dramatically improve their writing. Information in the PERSUADE dataset encompasses the actual questions and argumentative writing elements from students in grades 6-12.
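
To make the structure described above more concrete, here is a minimal Python sketch of how one might represent an essay with its labeled discourse elements and summarize it. The class names, field names, label values, and toy records are all hypothetical illustrations, not the dataset's actual schema or contents.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class DiscourseElement:
    kind: str           # hypothetical label, e.g. "Claim" or "Evidence"
    text: str           # the span of essay text carrying this label
    effectiveness: str  # hypothetical rating, e.g. "Effective"


@dataclass
class Essay:
    essay_id: str
    grade_level: int     # one example of the demographic information
    holistic_score: int  # overall quality score for the essay
    elements: list = field(default_factory=list)

    def word_count(self) -> int:
        # Counts only words inside labeled elements (a simplification).
        return sum(len(e.text.split()) for e in self.elements)


def mean_words_by_kind(essays):
    """Average word count of discourse elements, grouped by label."""
    lengths = defaultdict(list)
    for essay in essays:
        for element in essay.elements:
            lengths[element.kind].append(len(element.text.split()))
    return {kind: mean(values) for kind, values in lengths.items()}


def effectiveness_counts(essays):
    """How many elements received each effectiveness rating."""
    return Counter(e.effectiveness for essay in essays for e in essay.elements)


# Toy, made-up records; not drawn from the real dataset.
essays = [
    Essay("toy-1", 8, 4, [
        DiscourseElement("Position", "Schools should start later", "Adequate"),
        DiscourseElement("Evidence",
                         "Teens who sleep more tend to focus better in class",
                         "Effective"),
    ]),
    Essay("toy-2", 10, 3, [
        DiscourseElement("Position", "Phones belong in lockers", "Adequate"),
        DiscourseElement("Claim", "They distract students", "Ineffective"),
    ]),
]

print(mean_words_by_kind(essays))
print(effectiveness_counts(essays))
```

Grouping lengths and ratings by element label in this way is one simple route to the kind of per-element summaries that granular feedback tools would need.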

This dataset is licensed under CC BY. This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator.

Potential Uses

Those who access the PERSUADE dataset can conceivably:


The Tableau dashboards below provide data and analysis on essay length distribution, discourse length distribution, and more.
