The Open Data for Assessment Fund

We’re excited to announce the launch of the first competition in the series – The Quest! Scroll down to learn more. If you’re interested in sharing potential ODAF datasets with the lab, please see the final section, Get Involved, for more information. Funding is available!

BACKGROUND

The Open Data for Assessment Fund (ODAF) was designed to respond to the current lack of high-quality, open-source assessment datasets in education. When datasets are open and available, innovators and researchers can develop new solutions (e.g., tools built on artificial intelligence and machine learning) that reduce the cost and time needed to develop and administer assessments.

For example, the Automated Student Assessment Prize (ASAP) dataset has become central to the field of writing assessment. ASAP, hosted in 2012, was the first study that publicly examined the ability of computers to score student essays. The dataset consists of 22,000 essays scored by human raters. It was constructed to address a key pain point identified by educators – the length of time it takes to manually grade essays. That grading burden leads test companies to produce assessments made up of faster-to-grade tasks such as multiple-choice questions.
 
The ASAP dataset made it possible to build, test, and validate automated essay scoring tools, which produce detailed information about student learning and student work in a fraction of the time while still supporting richer assessment tasks. While ASAP laid a solid foundation, the competition allowed participants to keep their intellectual property, meaning that the solutions produced were not required to be publicly accessible.
 
Despite the power of open datasets, very few assessment datasets have been released. This is primarily because:
  • Almost all large educational assessment datasets are proprietary (unlike ASAP), held by large testing companies for competitive advantage
  • Federal funding focuses more on education interventions and research than the development of open datasets
  • Few researchers create datasets, given the considerable logistical hurdles and the weak connection between dataset creation and funding or career advancement
This lack of assessment-focused datasets has become a major bottleneck to innovation, making advancement in the field difficult and expensive. While there have been promising accomplishments in the field, these have been isolated successes. 
 
The ODAF will address the challenges described above by collecting and releasing datasets; helping other experts collect and release datasets; and supporting the creation of data science competitions that draw attention to the assessment datasets. In sum, the ODAF will serve as a clearinghouse for open-source assessment datasets.

THE QUEST FOR QUALITY QUESTIONS: IMPROVING READING COMPREHENSION THROUGH AUTOMATED QUESTION GENERATION

In collaboration with Dr. Scott Crossley at Vanderbilt University, we are excited to announce our first competition in the series – a private data science challenge, called The Quest!

The Quest will focus on automatic question generation for testing reading comprehension among elementary and middle school students. More specifically, this is a natural language processing (NLP) competition using a dataset of approximately 200 children’s stories and approximately 8,000 question-answer (QA) pairs. The QA pairs are short: a question is one sentence and an answer is typically a few words. Submissions will be evaluated with an automatic natural language generation (NLG) metric, with the possibility of additional human evaluation.
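
For readers new to automatic evaluation, here is a minimal sketch of how a generated question might be compared with a human-written one. The announcement does not name the metric used in The Quest, so the unigram-overlap F1 score below is only an illustrative stand-in. The `QAPair` fields and the example text are hypothetical and are not drawn from the competition dataset.

```python
from collections import Counter
from dataclasses import dataclass

PUNCTUATION = ".,!?;:\"'()"


@dataclass
class QAPair:
    """One question-answer pair tied to a story (field names are hypothetical)."""
    story_id: str
    question: str
    answer: str


def tokenize(text: str) -> list[str]:
    """Lowercase the text, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [tok for tok in tokens if tok]


def unigram_f1(generated: str, reference: str) -> float:
    """Token-overlap F1 between a generated question and a reference question.

    This is an assumed, simplified metric for illustration only; it is not
    the metric used to score The Quest.
    """
    gen_counts = Counter(tokenize(generated))
    ref_counts = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((gen_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: a reference pair and a model-generated question.
reference = QAPair("story-001", "Who found the golden key?", "the youngest sister")
generated_question = "Who found the key?"
score = unigram_f1(generated_question, reference.question)
```

Word-overlap scores like this are cheap to compute over thousands of QA pairs, but they can miss questions that are worded differently yet equally valid, which is one reason a competition might supplement an automatic metric with human evaluation.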

Ten team leads were chosen to participate in this challenge. Together with their team members, more than forty individuals from around the world are taking part.

The challenge launched on Oct. 28, 2022, and will run through Feb. 10, 2023. We will take breaks during the weeks of Nov. 21-25 and Dec. 26-30. The models generated will be shared after the challenge closes.

To follow along with the competition, sign up here to receive The Quest newsletter.

GET INVOLVED

As the ODAF is an ongoing project, we are always open to reviewing new datasets! If you have a dataset, or an idea for a dataset, that might be a good fit for the ODAF series – focused on assessment and aligned with the selection criteria – we would love to know more! Funding is available!

Please feel free to email natalie@the-learning-agency.com.