Lecture 14
Duke University
SOCIOL 333 - Summer Term 1 2023
2023-06-12
Project component 3, results: draft by Thursday (June 15), submit for grading next Tuesday (June 20)
Define your null hypothesis (boring result; nothing is happening; no group differences) and an alternative hypothesis (the other possibility)
In the world where your null hypothesis is true, would your sample be super weird? Run a test to find out!
If the sample is sufficiently different from what we expected, we reject the null hypothesis in favor of the alternative hypothesis
If not, we fail to reject the null hypothesis
Independence: The value of one observation in the sample does not depend on the values of any of the others.
Samples are large enough: Large samples are more likely to generate approximately normal sampling distributions (this is the Central Limit Theorem at work).
Null distributions take different shapes depending on what test statistic you calculate
But one very common one is the normal distribution
Z scores are a measure of how far an observation is from the center of a normal distribution, in units of standard deviation (or standard error; more on the difference in a moment).
Z scores are the test statistic for the normal distribution
Why are standard units important? For comparing across distributions.
Example: Scores on the (pre-2016) SAT were normally distributed with a mean of 1500 points and a standard deviation of 300 points.
Now imagine that we randomly sampled students and asked them to provide their SAT scores. We have many samples of 25 students each.
The mean score in each 25-student sample is our sample statistic; the distribution of these means across many samples is the sampling distribution.
The term for the spread of this sampling distribution is the standard error.
It is determined by formulas that vary depending on the specific situation. You generally won’t calculate it yourself; it will be provided in R output.
Standard error is always smaller than standard deviation: individual observations vary more than the means of samples drawn from the same population do.
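Although you will usually read the standard error off R output rather than compute it, for a sample mean the formula is simple enough to check by hand: it is the population standard deviation divided by the square root of the sample size. A quick base-R sketch using the SAT example (mean 1500, SD 300, samples of 25 students):

```r
# Population parameters for the pre-2016 SAT example
mu    <- 1500  # population mean
sigma <- 300   # population standard deviation
n     <- 25    # students per sample

# Standard error of the sample mean: sigma / sqrt(n)
se <- sigma / sqrt(n)
se  # 60: much smaller than the standard deviation of 300
```

So means of 25-student samples cluster within roughly 60 points of 1500, even though individual scores spread about 300 points around it.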
Percentile: What percent of the data is below a particular observation.
From 0% to 100%, but generally reported as a proportion instead (0-1).
Something that can be calculated with software (like R)
pnorm(): a function that takes a Z score and returns the corresponding percentile (e.g., pnorm(1) returns 0.8413447)
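A short sketch of pnorm() and its inverse, qnorm(), in base R (the 0.8413447 above is the percentile for Z = 1):

```r
# Percentile for a Z score of 1 on the standard normal
pnorm(1)  # 0.8413447: about 84% of the distribution lies below Z = 1

# pnorm() can also take a raw value plus the distribution's mean and sd,
# so you can skip computing the Z score yourself
pnorm(1800, mean = 1500, sd = 300)  # same answer: 1800 is one SD above 1500

# qnorm() goes the other way: percentile in, Z score out
qnorm(0.8413447)  # approximately 1
```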
What is the Z score of a man who is 70 inches tall?
What is the Z score of a man who is 65 inches tall?
If someone is in the 30th percentile in terms of height, is their height most likely above or below the mean?
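Using the height distribution given later in the lecture (mean 70 inches, standard deviation 3.3 inches), the three questions above can be checked in base R:

```r
mu    <- 70   # mean height of American men, inches
sigma <- 3.3  # standard deviation, inches

# Z score of a 70-inch man: exactly at the mean
(70 - mu) / sigma  # 0

# Z score of a 65-inch man: below the mean
(65 - mu) / sigma  # about -1.52

# Height at the 30th percentile: below the mean, since 30% < 50%
qnorm(0.30, mean = mu, sd = sigma)  # about 68.3 inches
```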
Percentiles/Z scores quantify how far our sample is from what we expected under the null hypothesis
P value: closely related to percentile. How much of the expected distribution under the null hypothesis has a value more extreme than what we find in our sample?
How we think about p values depends on whether our hypothesis test is one-tailed or two-tailed.
One-tailed: used when our alternative hypothesis specifies the direction of the difference; we reject the null hypothesis only if the sample differs in that direction.
Two-tailed: used when direction is not important; i.e., we reject the null hypothesis if the sample statistic is either significantly bigger or significantly smaller than we expect.
Most hypothesis tests involving normal distributions in sociology are two-tailed, and most software defaults to two-tailed tests.
When using some other distributions, like chi-square distributions, tests are always one-tailed; more on that another day.
The p value for a one-tailed test is the percent of the distribution that is either bigger or smaller than the observed value (depending on which direction the hypothesis takes).
Example: If you randomly select an American who identifies as a man, what is the probability his height will be 63.4 inches or less? (recall the height distribution: mean 70 inches, standard deviation 3.3 inches)
What is the probability of randomly selecting an American man who is 76.6 inches tall or taller? (use the same height distribution)
First: Draw the distribution and shade the area we are interested in.
Example: What is the probability of randomly selecting an American man whose height is two or more standard deviations from the mean?
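All three probability questions above reduce to tail areas of the same normal distribution; a base-R sketch:

```r
mu    <- 70   # mean height, inches
sigma <- 3.3  # standard deviation, inches

# P(height <= 63.4): a lower tail. 63.4 is exactly 2 SDs below the mean.
pnorm(63.4, mean = mu, sd = sigma)      # about 0.023

# P(height >= 76.6): an upper tail, so subtract from 1
1 - pnorm(76.6, mean = mu, sd = sigma)  # about 0.023

# P(2 or more SDs from the mean in either direction): both tails together
pnorm(-2) + (1 - pnorm(2))              # about 0.0455
```

Notice that the two-sided answer is just twice the one-sided answer, because the normal distribution is symmetric; this is exactly the one-tailed vs. two-tailed distinction from above.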
Back to our question from earlier: how do I know if my sample is weird enough to warrant rejecting my null hypothesis?
We define threshold p values before we run tests
If the p value we get is smaller than the threshold, we say the sample provides statistically significant evidence against the null hypothesis, and we reject it.
Most common cutoff, by (arbitrary but common) convention: p = 0.05
Now that we know a little bit about distributions, how do we run tests on data?
Step 1: Figure out what kind of test you need
This depends on your variable types. Are they…
It also depends on how many variables your question involves
Does this feel similar to plotting? Lots of things depend on what kind of variables you have!
For each of these research questions, identify the explanatory variable, the response variable, their types, and the correct statistical test. Think about what the data would look like for each individual in these cases
infer
The bad news: there are a lot of different tests.
The good news: regardless of what specific test you run, the logic is similar, and your code will look about the same!
We will be using the infer package to conduct hypothesis tests
infer is built to work on logic similar to the tidyverse — where filter(), mutate(), and ggplot() are from
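As a rough sketch of what that shared logic looks like, here is a permutation test for a difference in means, using the gss example dataset that ships with infer (this assumes the infer package is installed; the pipeline verbs are specify(), hypothesize(), generate(), and calculate()):

```r
library(infer)
library(dplyr)

# Build the null distribution: what differences in mean hours worked
# would we see between college-degree groups if there were no real
# relationship? Shuffle (permute) the labels 1000 times to find out.
null_dist <- gss %>%
  specify(response = hours, explanatory = college) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in means", order = c("degree", "no degree"))

# The observed difference in means in the actual sample
obs_stat <- gss %>%
  specify(response = hours, explanatory = college) %>%
  calculate(stat = "diff in means", order = c("degree", "no degree"))

# Two-tailed p value: how much of the null distribution is at least
# as extreme as what we observed?
null_dist %>%
  get_p_value(obs_stat = obs_stat, direction = "two-sided")
```

Swapping in a different test mostly means changing the `stat` argument; the overall shape of the pipeline stays the same.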