Gathering Data

Need to add worksheet in somewhere... --MF, 8/12/21
In this activity, you'll define a research question, collect data from your classmates, and clean the data so it can be entered into Snap!.
The white-background text below "answers" the pink box. We didn't allow that in the high school version. Wouldn't it be better for the white text to start with "Survey your class to collect some data. In a later activity..."? -bh 2/6/22

In a later activity, you'll input the data in Snap! and create your own visualization of the results, a pictograph like this one:
Pictograph with ice cream cones: the vertical axis is labeled 'Number of students', and the categories along the horizontal axis are 'MintChip' (three ice cream cones), 'Vanilla' (four ice cream cones), and 'Orange' (one ice cream cone).

Every day, thousands of professional data analysts go through the process of collecting and understanding data.
Define the question → Collect the data → Clean the data → Analyze the data → Visualize and share findings

Defining Your Research Question

I'm having trouble with the phrase "research question." You use it to mean literally the question students will ask other students. But usually "the research question" isn't "what's your favorite flavor of ice cream" but rather "what's the most popular flavor of ice cream among the students in this class?" I'm not sure how to fix this; maybe don't say "research question" at all, but instead call it a "survey question" or something. -bh 2/6/22
  • Agreed. ALSO, the questions you are asking are so familiar to middle-school students from early elementary school that they are likely to be insulting. (One current EDC project has several of these questions being handled, with graphs in pre-school!) And "what's your favorite" is often unanswerable. Since these are only suggestions that the kids don't need to adopt, make them suggestions that get them off the stereotyped data research topics, something that kids might get curious about but might not have thought of. Maybe something like "Of the following eight kinds of work, which ones (pick three at most) are closest to what you think you might eventually enjoy doing?" Then list like: Medical (emergency worker, doctor, nurse, medical research, vet); building trades (contractor, mason, electrical work, whatever); computer science (engineer, programmer, whatever); educator (teacher, whatever); food service (baker, short-order cook, chef, server); writer (reporter, novelist, whatever); music.... etc. (avoiding hierarchies within the list as I did) --P
    1. Decide on a research question:
      1. that has a relatively small number of possible answers (2-12), and
      2. for which each person will have only one response.
      Here are some ideas.
      • What's your favorite day of the week?
      • What's your favorite ice cream flavor?
      • What's your favorite fruit?
      • What's your favorite subject in school?
      • What's your favorite pizza topping?
      • How do you get to school?
      • What's your favorite sport?
      • What's your favorite musical genre?
    2. List some possible responses. There should be a relatively small number (2-12). (Note that a question like "What is your favorite book?" won't make a great choice for this project, since there are a large number of possible responses.)

    Collecting Your Data

    1. Ask each classmate your question, and paste a link to the results.
    2. I (honestly) have no idea what this means! Are they making a list in Snap!, or are they putting the results in a text file, or what? And, paste a link where?

    Cleaning Your Data

    Reference: Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says
    I still don't love having this link in the curriculum. It basically says, "this stuff isn't fun." --MF, 8/12/21 +1 -bh 2/6/22; me, too pg March 30. Also, though Sparks is not constrained to middle-school, we'd want it to be accessible and inviting there. Think 11-year-old reading this for school.
    Data cleaning is a crucial part of the data analysis process. In the survey referenced above, data scientists reported spending about 60% of their time cleaning and organizing data!

    So, what makes data messy? That depends on the kind of data collected, but in this case, there might be variation in the ways people respond to the question. You'll need to find and remove those differences so the data can be analyzed by a computer. Here are some examples:
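As one illustration of the kind of variation described above, here is a minimal Python sketch. This activity itself uses Snap!, so Python is used only to make the idea concrete. The responses, the `ALIASES` table, and the `clean` function are all made up for this example; your own data and your own cleaning decisions will differ.

```python
from collections import Counter

# Made-up raw answers to "What's your favorite ice cream flavor?"
# The same flavor shows up with different capitalization, spacing, and spelling.
raw_responses = [
    "Mint Chip", "mintchip", "Mint chip",
    " Vanilla", "vanilla ", "VANILLA", "vanila",
    "Orange",
]

# Hypothetical decisions about which misspellings count as which flavor.
ALIASES = {"vanila": "vanilla"}

def clean(response):
    """Return one standard spelling for a response."""
    # Trim outer spaces, lowercase everything, and remove inner spaces.
    key = response.strip().lower().replace(" ", "")
    # Replace known misspellings; leave everything else unchanged.
    return ALIASES.get(key, key)

cleaned = [clean(r) for r in raw_responses]
counts = Counter(cleaned)  # how many students gave each cleaned answer

for flavor, number in counts.items():
    print(flavor, number)
```

Before cleaning, a computer would treat "Mint Chip", "mintchip", and "Mint chip" as three different answers. After cleaning, they are counted as one flavor.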

    1. Go through your data, and see if any of your data needs to be cleaned up. You'll have to make some decisions about how you want to organize the data.
    2. Save your cleaned-up data so you can find it again later.
    In this activity, you defined a research question, collected responses to your question, and cleaned the data for entry into Snap!.