KLICKE Dataset


What if educators could better understand the indicators that predict good writing – before a student even approaches a keyboard? 

Most writing assessments focus only on the final product, but data science may now be able to unlock key aspects of the writing process and bring new insights and efficiencies to light.

A unique new dataset of keystroke logs, the Learning Exchange’s KLICKE (Keystroke Logs in Compositions for Knowledge Evaluation) Corpus, captures key writing process features (pauses, deletions, bursts, process variance, etc.) and can now help learning engineers discover how the writing process predicts overall writing quality.
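
To make the process features named above more concrete, here is a minimal Python sketch of how pauses, deletions, and bursts might be counted from a keystroke log. It is an illustration under stated assumptions, not the KLICKE schema or the method used by the dataset's authors: the KeyEvent fields, the "Input" and "Remove" activity labels, the 2,000 ms pause threshold, and the definition of a burst as an uninterrupted run of input keystrokes are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class KeyEvent:
    """One logged keystroke (hypothetical schema, not the KLICKE format)."""
    down_time: int  # ms since session start when the key was pressed
    up_time: int    # ms since session start when the key was released
    activity: str   # assumed labels: "Input" or "Remove"


def process_features(events: List[KeyEvent],
                     pause_threshold_ms: int = 2000) -> Dict[str, float]:
    """Count pauses, deletions, and bursts in a time-ordered keystroke log.

    A pause is a gap of at least pause_threshold_ms between one key's
    release and the next key's press. A burst is a run of consecutive
    "Input" events that is not interrupted by a pause or a deletion.
    Both definitions are assumptions chosen for illustration.
    """
    pauses = 0
    deletions = 0
    bursts: List[int] = []  # length of each completed burst
    current = 0             # length of the burst in progress
    prev_up = None

    for ev in events:
        # A long gap counts as a pause and closes the burst in progress.
        if prev_up is not None and ev.down_time - prev_up >= pause_threshold_ms:
            pauses += 1
            if current:
                bursts.append(current)
            current = 0

        if ev.activity == "Remove":
            # A deletion also closes the burst in progress.
            deletions += 1
            if current:
                bursts.append(current)
            current = 0
        elif ev.activity == "Input":
            current += 1

        prev_up = ev.up_time

    if current:
        bursts.append(current)

    mean_burst = sum(bursts) / len(bursts) if bursts else 0.0
    return {
        "pauses": pauses,
        "deletions": deletions,
        "burst_count": len(bursts),
        "mean_burst_length": mean_burst,
    }


# Tiny made-up log: three quick keystrokes, a long pause, one more
# keystroke, a deletion, then a final keystroke.
sample_log = [
    KeyEvent(0, 100, "Input"),
    KeyEvent(200, 300, "Input"),
    KeyEvent(400, 500, "Input"),
    KeyEvent(3000, 3100, "Input"),
    KeyEvent(3200, 3300, "Remove"),
    KeyEvent(3400, 3500, "Input"),
]
features = process_features(sample_log)
print(features)
```

Per-essay summaries of this kind could then serve as inputs to a model that predicts writing quality. In practice, the pause threshold and the burst definition would need to be chosen to match the research question and the dataset's actual columns.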

Ultimately, work derived from the KLICKE dataset could provide valuable information that aids writing instruction and writing research, and that helps train artificial intelligence models for automated writing evaluation techniques, intelligent tutoring systems, and writing support tools.

Keystroke logging has been studied before, but most past studies included only a small number of writing process features and were limited by relatively small datasets. KLICKE was released in October 2023 via a Kaggle competition that concluded in early 2024. The competition drew 7,209 entrants and ultimately yielded 2,256 participants and 44,811 submissions.

In addition to training AI and supporting teachers, potential applications of the KLICKE writing dataset include directing learners’ attention to their own text production process, which can boost their autonomy, metacognitive awareness, and self-regulation in writing.

KLICKE dataset © 2024 by The Learning Agency Lab is licensed under CC BY 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/

Potential Uses