Project

Descriptive Statistics

In this component of your project, you will explore the variables you will use in your analysis.

How to complete and submit (due Thursday, June 8, 11:59pm)

First, locate and clone the component 2 repository (instructions). Go to our course github organization and find the repository called c2_descriptives_yourusername. Clone that repository. Open the document called c2_descriptives.qmd. This is a template document where you will put your work.

Commit your changes regularly. When you are done, render the document and submit by pushing your work to github (instructions).

Details

This should be completed in the template Quarto document (these instructions are duplicated there).

Part 1: Understanding your variables

  • Write down your research question (from component 1)
  • Write down your hypothesis (from component 1)
  • Write down the name of your data set (from component 1)
  • Write down the names of the variables you will need to do your analysis (i.e., what you need to type to access them in R)--this includes your explanatory and response variables and any other variables needed to filter your dataset or create new variables.
  • For each variable, write:
    • Whether it is an explanatory (independent) variable, a response (dependent) variable, or something else (explain what)
    • What kind of variable it is (categorical or numeric and which subtype within these)

Part 2: Univariate distributions

  • For each of your variables:
    • Look at its distribution using the summary() or table() function (as appropriate to its type).
    • Indicate how many missing values (NAs) there are.
    • In ~2-3 sentences, describe what you see in the results. Note anything that surprises you.

Part 3: Preparing your data set

  • Will you analyze all the observations in your data set? Explain why or why not. If not, which will you include and which will you exclude?
  • Do you need to create any new variables? Explain why or why not. If so, what are they?
  • Make any necessary changes to your data set. Check whether they worked as expected.

Part 4: Multivariate visualizations

  • Make a plot showing the relationship between your explanatory (independent) and response (dependent) variables. Depending on your variable types, this might be a scatter plot, box plot, bar chart, etc. Include meaningful axis labels.

  • Interpret your plot in a few sentences. Does it appear these variables are associated with one another? Does anything about the relationship surprise you? Is it consistent or inconsistent with your hypothesis?

Grading

Part 1 10%
Part 2 25%
Part 3 25%
Part 4 30%
Workflow and formatting 10%
Why?

Before doing any inferential statistics, you should always get a good understanding of what the distributions of your variables look like. This will help you head off problems later. For example, looking at the distributions of your variables will reveal if data is missing when you don’t expect it to be. It also might clue you in on interesting patterns you may want to investigate further. Looking at bivariate associations between variables gives you a sense of what the relationship between them looks like, which will help you interpret inferential statistics.