Project
Descriptive Statistics
In this component of your project, you will explore the variables you will use in your analysis.
How to complete and submit (due Thursday, June 8, 11:59pm)
First, locate and clone the Component 2 repository (see the instructions). Go to our course GitHub organization, find the repository called c2_descriptives_yourusername, and clone it. Then open the document called c2_descriptives.qmd. This is the template document where you will put your work.
Commit your changes regularly. When you are done, render the document and submit by pushing your work to GitHub (see the instructions).
Details
This should be completed in the template Quarto document (these instructions are duplicated there).
Part 1: Understanding your variables
- Write down your research question (from component 1)
- Write down your hypothesis (from component 1)
- Write down the name of your data set (from component 1)
- Write down the names of the variables you will need for your analysis (i.e., what you need to type to access them in R). This includes your explanatory and response variables and any other variables needed to filter your data set or create new variables.
- For each variable, write:
  - Whether it is an explanatory (independent) variable, a response (dependent) variable, or something else (explain what)
  - What kind of variable it is (categorical or numeric, and which subtype within these)
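One way to find the exact names R uses for your variables, and to see how R is storing each one, is to inspect the data frame directly. The sketch below uses a small made-up data frame called `survey` with hypothetical variables `age` and `smoker`; substitute your own data set and variable names.

```r
# Hypothetical example data; replace `survey` with your own data set
survey <- data.frame(
  age    = c(23, 35, NA, 41, 29, 17),
  smoker = c("yes", "no", "no", NA, "yes", "no")
)

# Exact variable names, i.e., what you type after `survey$`
names(survey)

# How R stores each variable (e.g., "num" for numbers, "chr" for text)
str(survey)

# Access a single variable by name
survey$age
```

R's storage type is only a starting point for deciding the kind of variable: a categorical variable recorded as numeric codes (for example, 1 = yes, 2 = no) is still categorical.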
Part 2: Univariate distributions
- For each of your variables:
  - Look at its distribution using the summary() or table() function (as appropriate to its type).
  - Indicate how many missing values (NAs) there are.
  - In ~2-3 sentences, describe what you see in the results. Note anything that surprises you.
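For illustration, here is one way to do this in R, again with a hypothetical `survey` data frame. Note that `table()` leaves out missing values unless you ask for them with `useNA = "ifany"`, so counting NAs explicitly with `is.na()` is a useful cross-check.

```r
# Hypothetical example data; replace with your own data set and variables
survey <- data.frame(
  age    = c(23, 35, NA, 41, 29, 17),
  smoker = c("yes", "no", "no", NA, "yes", "no")
)

# Numeric variable: summary() reports min, quartiles, mean, max, and the NA count
summary(survey$age)

# Categorical variable: table() counts each category;
# useNA = "ifany" adds a count for NA when any are present
tab <- table(survey$smoker, useNA = "ifany")
tab

# Number of missing values in each variable
sum(is.na(survey$age))
sum(is.na(survey$smoker))
```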
Part 3: Preparing your data set
- Will you analyze all the observations in your data set? Explain why or why not. If not, which will you include and which will you exclude?
- Do you need to create any new variables? Explain why or why not. If so, what are they?
- Make any necessary changes to your data set. Check whether they worked as expected.
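As a minimal sketch, suppose (hypothetically) your analysis should include only adults and you want a categorical age-group variable built from the numeric `age`. In base R that might look like the following; the cutoff of 18, the group boundaries, and the name `age_group` are illustrative assumptions, not requirements.

```r
# Hypothetical example data; replace with your own data set and variables
survey <- data.frame(
  age    = c(23, 35, NA, 41, 29, 17),
  smoker = c("yes", "no", "no", NA, "yes", "no")
)

# Keep only rows with a non-missing age of 18 or older
adults <- survey[!is.na(survey$age) & survey$age >= 18, ]

# New categorical variable; right = FALSE makes the intervals [18, 30) and [30, Inf)
adults$age_group <- cut(adults$age,
                        breaks = c(18, 30, Inf),
                        labels = c("18-29", "30+"),
                        right  = FALSE)

# Check that the changes worked as expected
nrow(survey)                          # rows before filtering
nrow(adults)                          # rows after filtering
summary(adults$age)                   # minimum should now be at least 18
table(adults$age, adults$age_group)   # each age should fall in the expected group
```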
Part 4: Multivariate visualizations
Make a plot showing the relationship between your explanatory (independent) and response (dependent) variables. Depending on your variable types, this might be a scatter plot, box plot, bar chart, etc. Include meaningful axis labels.
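For example, with a categorical explanatory variable and a numeric response, a box plot is a natural choice. The base R sketch below uses hypothetical variables `smoker` (explanatory) and `sleep_hours` (response); the data and axis labels are placeholders.

```r
# Hypothetical data: explanatory = smoker (categorical), response = sleep_hours (numeric)
survey <- data.frame(
  smoker      = c("yes", "no", "no", "yes", "no", "yes", "no"),
  sleep_hours = c(6.0, 7.5, 8.0, 5.5, 7.0, 6.5, 7.2)
)

# One box per level of the explanatory variable, with meaningful axis labels
b <- boxplot(sleep_hours ~ smoker, data = survey,
             xlab = "Smoker",
             ylab = "Hours of sleep per night")

# Group medians, handy when describing the plot in words
meds <- tapply(survey$sleep_hours, survey$smoker, median)
meds
```

If both variables are numeric, `plot(x, y, xlab = "...", ylab = "...")` gives a scatter plot instead. If you prefer ggplot2, `geom_boxplot()` together with `labs()` for the axis labels produces the equivalent figure.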
Interpret your plot in a few sentences. Does it appear these variables are associated with one another? Does anything about the relationship surprise you? Is it consistent or inconsistent with your hypothesis?
Grading
| Component | Weight |
|---|---|
| Part 1 | 10% |
| Part 2 | 25% |
| Part 3 | 25% |
| Part 4 | 30% |
| Workflow and formatting | 10% |