Lecture 13
Duke University
SOCIOL 333 - Summer Term 1 2023
2023-06-08
Project component 2, descriptive statistics, due tonight 11:59pm
Description: What is the difference between the number of coaches assigned to men’s teams and the number of coaches assigned to women’s teams in my sample?
Inference: What is the difference between the number of coaches assigned to men’s teams and the number of coaches assigned to women’s teams in my target population?
yrbss (13,583 high school students). We will assume this sample was drawn randomly.
If we were able to measure the population parameter (weekly strength training days for our entire target population of all US high school students), would you expect it to be:
Let’s say I think that 50% of Duke students prefer classes with final exams over classes with final papers.
To check this, I poll a random sample of 100 students. I find that 56 of them prefer final papers over final exams. Does that disprove my hypothesis?
What if I find that 90 of them prefer final papers over exams?
Start with a null hypothesis (H0): the boring result, i.e., what you’d find if nothing is happening or if everything is the same between groups
And a competing alternative hypothesis (HA): the other possibility, i.e., the effect or difference you suspect is there
Test whether the data in your sample provides strong enough evidence to overturn or reject the null hypothesis in favor of the alternative hypothesis.
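Applied to the Duke poll above, one common way to run this test is a one-proportion z test using the normal approximation. This technique is not named on the slide. Below is a minimal sketch that assumes H0: p = 0.5, a simple random sample, and that each polled student picks either papers or exams. `one_prop_z_test` is a hypothetical helper, not a library function.

```python
import math

def one_prop_z_test(successes, n, p0):
    """Two-sided z test for one proportion using the normal approximation.

    Returns (z, p_value). Assumes a random sample with n*p0 and n*(1 - p0) >= 10.
    """
    p_hat = successes / n
    # Standard error computed under the null hypothesis p = p0
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# H0: p = 0.5 (assumed: half prefer papers, half prefer exams)
results = {}
for count in (56, 90):
    z, p = one_prop_z_test(count, 100, 0.5)
    results[count] = (z, p)
    print(f"{count}/100 prefer papers: z = {z:.2f}, two-sided p-value = {p:.2g}")
```

With 56 of 100, z is about 1.2 and the p-value is about 0.23. A result like that is quite plausible if H0 is true, so it does not disprove the hypothesis. With 90 of 100, z is 8 and the p-value is tiny, far below a commonly used 0.05 significance level, so we would reject H0.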
New York is known as “the city that never sleeps.” A random sample of 25 New Yorkers was asked how much sleep they get per night. Do these data provide convincing evidence that New Yorkers on average sleep less than 8 hours a night?
Employers at a firm are worried about the effect of March Madness, a basketball championship held each spring in the US, on employee productivity. They estimate that on a regular business day employees spend on average 15 minutes of company time checking personal email, making personal phone calls, etc. They also collect data on how much company time employees spend on such non-business activities during March Madness. They want to determine if these data provide convincing evidence that employee productivity decreases during March Madness.
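In symbols, the two examples above could be set up like this. Here μ is the population mean: nightly hours of sleep for New Yorkers, and daily minutes of non-business company time during March Madness. H_0 is the null hypothesis and H_A is the alternative. Both alternatives are one-sided because each question asks about a specific direction ("less than 8 hours", "productivity decreases").

```latex
\begin{aligned}
&\text{New York sleep:} && H_0: \mu = 8 \text{ hours}    && H_A: \mu < 8 \text{ hours} \\
&\text{March Madness:}  && H_0: \mu = 15 \text{ minutes} && H_A: \mu > 15 \text{ minutes}
\end{aligned}
```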
When you randomly draw samples from populations, their sample statistics vary
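They vary in a predictable way: for large enough samples, they follow an approximately normal distribution centered at the population parameter (the Central Limit Theorem)
If your sample statistic falls far out in the tails of the distribution you would expect under the null hypothesis, your sample probably did not come from that population, so we can reject the null hypothesis!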
Sampling distributions are narrower when sample sizes are bigger, so it’s easier to identify samples from other populations and correctly reject false null hypotheses.
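The narrowing can be checked with a quick simulation. This is a minimal sketch that assumes a made-up, right-skewed population (exponential with mean 3), not real data. It draws many random samples of size 10 and of size 100, then compares the centers and spreads of the resulting sample means. `simulate_sample_means` is a hypothetical helper.

```python
import random
import statistics

def simulate_sample_means(population, n, reps, rng):
    """Return the means of `reps` random samples of size n drawn (with replacement) from population."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

rng = random.Random(333)  # fixed seed so the sketch is reproducible
# Hypothetical, right-skewed population (exponential with mean 3); not real data
population = [rng.expovariate(1 / 3) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

spread = {}
for n in (10, 100):
    means = simulate_sample_means(population, n, reps=2_000, rng=rng)
    spread[n] = statistics.stdev(means)  # SD of the sample means (the standard error)
    print(f"n = {n:>3}: mean of sample means = {statistics.fmean(means):.2f} "
          f"(population mean {mu:.2f}); SD of sample means = {spread[n]:.3f} "
          f"(sigma / sqrt(n) = {sigma / n ** 0.5:.3f})")
```

Both sets of sample means center near the population mean. Their spread shrinks roughly by a factor of √10 ≈ 3.2 when n goes from 10 to 100, which matches the σ/√n rule.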
Z scores tell you how far a value is from the center (mean) of its distribution in standardized units: the number of standard deviations above or below the mean
They make it possible to compare values on different scales
The SAT and ACT follow nearly normal distributions
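A z score is z = (x − μ) / σ, where x is the value, μ is the mean of its distribution, and σ is the standard deviation. The sketch below uses this formula to compare an SAT score with an ACT score. The means, standard deviations, and the two scores (1300 and 24) are assumed illustrative values, not official figures, and `z_score` is a hypothetical helper.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Illustrative (assumed) distribution parameters, not official figures
SAT_MEAN, SAT_SD = 1100, 200
ACT_MEAN, ACT_SD = 21, 6

sat_z = z_score(1300, SAT_MEAN, SAT_SD)  # (1300 - 1100) / 200
act_z = z_score(24, ACT_MEAN, ACT_SD)    # (24 - 21) / 6
print(f"SAT 1300 -> z = {sat_z:.2f}; ACT 24 -> z = {act_z:.2f}")
print("Higher relative score:", "SAT" if sat_z > act_z else "ACT")
```

Under these assumed parameters, the SAT score sits 1 standard deviation above its mean and the ACT score sits 0.5 above its mean. The SAT score is therefore the stronger result, even though the raw numbers are on completely different scales.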
Clarification on three variables:
You want to include three variables in one plot when those variables are all part of one question, i.e., how does the relationship between X and Y vary by Z?
Another way you may have written three-variable questions:
In this case, you’re asking multiple two-variable questions: showing them together in one plot wouldn’t make much sense
For this assignment: one plot is required, so pick your favorite sub-question
But making more plots is a snap once you have one: if you want to make plots for all sub-questions, go for it!
Other questions?