Almost every occupation involves interpreting and analyzing data as well as making decisions based upon that data. The modern business world was built on the spreadsheet. Medical professionals make life-and-death decisions by comparing conflicting data. Natural scientists and social scientists wrangle data by the fistful. Politics, policy, sports, automotive technology, heck, even reading a newspaper requires data analysis skills these days.

Nonetheless, data sources are not always ethical and algorithms are not always as transparent as they initially seem. Making decisions driven by data can cause pre-existing biases to persist, which undermines objectivity and can result in poor decision-making.

But the prevailing math curriculum does little to support students in developing vital data science skills. Students may make graphs in physics or have to interpret a table on a standardized test. However, they rarely have to clean data sets, make tradeoffs when choosing a data visualization, or resolve conflicting interpretations of the data in order to ensure that the outcomes are fair, accountable, and transparent.

Instead, students learn things like L’Hopital’s Rule and Integration by Parts, while complaining that math has little relevance to their everyday lives.

Is it time for a change? The short answer is yes. Today’s math curriculum is not relevant for students and far more needs to be done to incorporate data science into the curriculum.

## The Problem Of Calculus

The problem is that calculus isn’t really about learning calculus anymore. It’s become more like a sign of college-readiness than a substantive learning experience. Enrollment in high school calculus classes has steadily increased over the past couple of decades as students strive to show colleges that they can do college-level math. But that hasn’t translated into better math skills.

In a widely cited data set from the Mathematical Association of America, fewer than 20% of students who take a high school calculus class go on to take calculus II as their first math class at college. Over 30% will re-take college-level calculus. And over 30% will take a pre-calculus or remedial math course.

Taking high school calculus doesn’t even seem to help students do well in college calculus. Students who took high school calculus performed only 5 points better, on average, in a college calculus course than those who didn’t. Earning an A in algebra, algebra II, and geometry was the best predictor of earning a B in a college calculus course, not taking the same course in high school.

The demand for high school calculus classes has also accelerated the overall K-12 math curriculum. Algebra I is taught in eighth grade instead of high school. Pre-calculus is now a standard high school course instead of a college-level course.

So high school calculus, at least as currently implemented, doesn’t seem to help students very much.

A growing group of educators and researchers, however, believe that a data science course could help in three ways: preparing students for careers in science, engineering, and policy; making students savvier consumers of the data they encounter every day; and changing how students perceive math, as a useful, vital tool in their everyday lives rather than a hoop to jump through.

## Good Data Science Curriculum

**What does a good data science class and curriculum look like?**

Of course, developing a more effective math curriculum involves more than trading out one class for another. Any effective new curriculum has to overcome a number of challenges.

**The material can’t be formulaic**

Data science has not been entirely neglected. High school statistics courses, such as AP Statistics, give students some experience in understanding and analyzing data.

The problem is that such courses are often formulaic. The student experience in AP Statistics is drastically different from the experience of actually using statistics to analyze data. Researchers spend a lot of time wrangling data. They have to make choices about how to analyze the data — not just apply a single algorithm and read out the results. They have to think about how the data was generated. They have to consider multiple plausible interpretations of the data.

A good data science course would emphasize these decision-making points, offer more opportunities for hands-on data manipulation, and engage students in argument about how to analyze the data and interpret the results.

This means less hypothesis testing, with less emphasis on statistical significance, and more modeling, with more emphasis on the value and meaning of the patterns in the data. Students should practice transforming raw data into more useful data sets and defining meaningful variables (rather than just accepting that existing variables measure what they purport to be measuring).
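As a rough illustration of what "transforming raw data" and "defining meaningful variables" can look like in a classroom, here is a minimal Python sketch. Everything in it is an assumption made up for the example: the data rows are invented, the helper names `height_to_cm` and `clean` are hypothetical, and the 8-hour cutoff for `well_rested` is exactly the kind of definition students should be asked to defend or challenge.

```python
import re

# Invented classroom data: (student, hours_slept, raw_height).
# Heights were typed in by hand, so units and formats are inconsistent.
RAW_ROWS = [
    ("A", "7.5", "165 cm"),
    ("B", "8", "5 ft 4 in"),
    ("C", "", "172cm"),
    ("D", "6", "n/a"),
]


def height_to_cm(text):
    """Return height in centimetres, or None if the entry cannot be parsed."""
    text = text.strip().lower()
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*cm", text)
    if match:
        return float(match.group(1))
    match = re.fullmatch(r"(\d+)\s*ft\s*(\d+)\s*in", text)
    if match:
        feet, inches = int(match.group(1)), int(match.group(2))
        return round((feet * 12 + inches) * 2.54, 1)
    return None  # Unparseable entries are kept as missing, not guessed.


def clean(rows):
    """Turn raw rows into dictionaries with consistent units and a derived variable."""
    cleaned = []
    for student, hours, height in rows:
        hours_slept = float(hours) if hours.strip() else None
        cleaned.append({
            "student": student,
            "hours_slept": hours_slept,
            "height_cm": height_to_cm(height),
            # Derived variable: the 8-hour threshold is an assumption to debate.
            "well_rested": None if hours_slept is None else hours_slept >= 8,
        })
    return cleaned


if __name__ == "__main__":
    for row in clean(RAW_ROWS):
        print(row)
```

Each decision here (how to treat blanks, which formats to accept, where to draw the "well rested" line) changes the resulting data set, which is the point of having students make those choices themselves.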

A data science course could also focus on understanding uncertainty and variation, which evidence suggests is a very challenging topic for college students and crosses discipline boundaries. Students should also learn to consider how algorithms and data-driven decisions will affect society, and to bring a diverse, inclusive mindset to their work. They should guard against bias and not let their personal opinions affect their approach.
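One way to make uncertainty and variation concrete, without formulas, is resampling. The sketch below is only an illustration under stated assumptions: the commute times are invented, the function name `bootstrap_mean_interval` is hypothetical, and the interval is a rough percentile bootstrap rather than a rigorous confidence interval.

```python
import random
import statistics

# Invented data: one class's commute times to school, in minutes.
COMMUTE_MINUTES = [12, 25, 8, 40, 15, 22, 30, 10, 18, 55, 14, 27]


def bootstrap_mean_interval(values, n_resamples=2000, level=0.90, seed=0):
    """Estimate a range of plausible means by resampling the data with replacement."""
    rng = random.Random(seed)  # Fixed seed so the class gets repeatable results.
    means = []
    for _ in range(n_resamples):
        resample = [rng.choice(values) for _ in values]
        means.append(statistics.mean(resample))
    means.sort()
    tail = (1 - level) / 2
    low = means[int(tail * n_resamples)]
    high = means[int((1 - tail) * n_resamples) - 1]
    return low, high


if __name__ == "__main__":
    low, high = bootstrap_mean_interval(COMMUTE_MINUTES)
    print("sample mean:", statistics.mean(COMMUTE_MINUTES))
    print("plausible range for the mean:", low, "to", high)
```

Students can change the sample size or the `level` parameter and watch the range widen or narrow, which opens a discussion about how much a single average can be trusted.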

**The material needs to be an open door; not a closed gate**

Another challenge comes from the dual purposes of a data science class. The underlying premise of teaching calculus is to prepare students for careers in engineering and science.

But, as noted earlier, skills at analyzing and interpreting data serve two purposes: preparing students for data-intensive careers in social science, business, journalism, and the natural sciences; and teaching students to become better data citizens — wiser news consumers, better critical thinkers.

This dual purpose means that teachers have to design “low floor, high ceiling” tasks that students with varying levels of experience can get value from. It also means pursuing complementary learning goals. Students should learn how to process their own data (the various choices they have to make, and how those choices can affect the outcome), and how to critique data claims made by others.

This is less about the content and more about the approach. Data talks, for instance, develop a critical stance toward data through argument and conversation. Data talks are designed to help students develop “data literacy,” which is the ability to understand, process, use, and communicate with data. Creative data projects might open the door as well.

Another issue is relevance. For instance, if a student is interested in baseball, then they can use data science to gain more insight into the sport. Being able to relate a passion or hobby to learning is great for information processing and aids in long-term retention.

**Data science material must build connections across the curriculum**

Interpreting and analyzing data involves more than just math. It’s also about the design of the study, the content of the data, the audience of the data, and the logical relationship between the claims being made and the evidence. If one of the goals is to make students savvy data consumers, it could also include understanding the media environment.

Although data science could be centered in a math class, an effective curriculum would include data science in natural science, social science, and civics courses. This would fold a data science reform into existing priorities for teaching science and media criticism. “Analyzing and interpreting data”, for instance, is one of the key practices that students should learn under the Next Generation Science Standards.

Data science could also be infused throughout the earlier math curriculum. Jo Boaler’s data science course, for instance, requires no higher-level math background and could fit into a middle school curriculum.

Another resource has been pulled together by economist Steve Levitt and his team. They’ve launched a project called “Data Science for Everyone” with fantastic resources.

One of the nice things is that there are many relevant data sets to draw on. There is data about everything: sports, politics, biology, physics, astronomy, business, news media, education, the environment. Savvy teachers can tailor lessons to student interest, further illustrating the relevance of math to their lives.

Last Updated: 01/13/2021

## Resources

*Classes*

- Bootstrap: Bootstrap provides high-quality math and science curricula for grades 6-12 that are both challenging and engaging. The coursework is drawn from top universities such as Northeastern and Brown.

- Data 8: A “Foundations of Data Science” course offered by UC Berkeley that combines inferential and computational thinking with real-world relevance. This course is better suited to college students.

- CourseKata: Here you can find interactive materials for online learning designed to improve education for young people. CourseKata collects data as students navigate the platform and uses it to continuously improve their learning resources.

- Introduction to Data Science (IDS): IDS offers educational materials for high-school students as well as tools for professional development. Their units are designed to increase student engagement and provide a high-quality learning experience.

*Datasets*

- National Health and Nutrition Examination Survey (NHANES) Data Portal: Here students can explore health-related datasets and practice comprehension. They will be able to explore and choose from data such as height, weight, and age. This information can be used to come to different conclusions about the overall health of U.S. citizens.

- California American Community Survey (ACS) Data Portal: Students will be able to access demographic information specifically related to California residents, such as marital or employment status, sex, or place of birth. There are various science challenges available to test data comprehension.

- MLB Historical Odds & Scores: This dataset includes MLB data from 2010 – 2020 including run lines, totals, and opening/closing moneylines. This data would be useful for students in a machine learning project.

- Common Core of Data: The U.S. Department of Education’s database of information on public elementary schools, secondary schools, and school districts includes data about dropout rates and other reference materials.

- Introduction to Data Science: In this program, students are able to collect data and use it to learn about analysis and text interpretation. The program also offers tools for professional development.

- EverFi: EverFi offers a “data literacy” program. The program includes datasets that are ready for use by students.

*Resources*

- Miner. Similar to Kaggle, Miner is an open source Python-based framework that lets instructors host their own data science hackathons and competitions in class. The focus is on learning-by-doing, and challenges posted over the years have included the development of drug predictors and user-movie recommender systems.