Lecture 12
Duke University
SOCIOL 333 - Summer Term 1 2023
2023-06-07
Rows: 13,583
Columns: 13
$ age <int> 14, 14, 15, 15, 15, 15, 15, 14, 15, 15, 15, 1…
$ gender <chr> "female", "female", "female", "female", "fema…
$ grade <chr> "9", "9", "9", "9", "9", "9", "9", "9", "9", …
$ hispanic <chr> "not", "not", "hispanic", "not", "not", "not"…
$ race <chr> "Black or African American", "Black or Africa…
$ height <dbl> NA, NA, 1.73, 1.60, 1.50, 1.57, 1.65, 1.88, 1…
$ weight <dbl> NA, NA, 84.37, 55.79, 46.72, 67.13, 131.54, 7…
$ helmet_12m <chr> "never", "never", "never", "never", "did not …
$ text_while_driving_30d <chr> "0", NA, "30", "0", "did not drive", "did not…
$ physically_active_7d <int> 4, 2, 7, 0, 2, 1, 4, 4, 5, 0, 0, 0, 4, 7, 7, …
$ hours_tv_per_school_day <chr> "5+", "5+", "5+", "2", "3", "5+", "5+", "5+",…
$ strength_training_7d <int> 0, 0, 0, 0, 1, 0, 2, 0, 3, 0, 3, 0, 0, 7, 7, …
$ school_night_hours_sleep <chr> "8", "6", "<5", "6", "9", "8", "9", "6", "<5"…
We can use mutate() with a new function, factor(), to put a categorical variable's response options in a meaningful order.
      always did not ride most of time        never       rarely    sometimes
         399         4549          293         6977          713          341
        <NA>
         311
yrbss_reordered <- mutate(yrbss,
  # we are turning our helmet variable into a factor
  # its name will remain the same (the new variable
  # we create will overwrite the old one)
  helmet_12m = factor(
    helmet_12m,
    # and we want its response options to be in this order,
    # from least to greatest frequency
    levels = c("did not ride", "never", "rarely",
               "sometimes", "most of time", "always")))
table(yrbss_reordered$helmet_12m)
did not ride        never       rarely    sometimes most of time       always
        4549         6977          713          341          293          399
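To see what the levels argument changes, here is a minimal sketch in base R. The helmet vector is a small made-up example (an assumption for illustration, not the yrbss data), and helmet_ordered is a hypothetical name.

```r
# Toy responses, made up for illustration (not the yrbss data)
helmet <- c("never", "always", "rarely", "never", "did not ride", "sometimes")

# Without levels, factor() puts the categories in sorted (alphabetical) order
levels(factor(helmet))

# With levels, the categories follow the order we give
helmet_ordered <- factor(
  helmet,
  levels = c("did not ride", "never", "rarely",
             "sometimes", "most of time", "always"))
levels(helmet_ordered)

# table() now reports counts in that order, including levels with zero responses
table(helmet_ordered)

# A value that is not listed in levels (for example, a typo) becomes NA
typo_example <- factor(c("never", "nevr"), levels = c("never", "rarely"))
is.na(typo_example)
```

Because unlisted values turn into NA, it is worth comparing the table before and after reordering to confirm that no responses were lost.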
One numeric variable: geom_histogram()
One categorical variable: geom_bar()
Two numeric variables: geom_point()
One numeric and one categorical variable: geom_boxplot()
Two categorical variables
Adding an additional categorical variable
Adding an additional numeric variable
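The four basic plot types listed above can be sketched as follows. This assumes the ggplot2 package is installed. The toy data frame is hypothetical (values made up for illustration), and the plot object names are placeholders.

```r
library(ggplot2)

# Hypothetical toy data, made up for illustration
toy <- data.frame(
  height = c(1.50, 1.57, 1.60, 1.65, 1.73, 1.88),
  weight = c(46.7, 67.1, 55.8, 61.0, 84.4, 79.2),
  gender = c("female", "female", "female", "male", "male", "male")
)

# One numeric variable
p_hist <- ggplot(toy, aes(x = height)) + geom_histogram(bins = 5)

# One categorical variable
p_bar <- ggplot(toy, aes(x = gender)) + geom_bar()

# Two numeric variables
p_point <- ggplot(toy, aes(x = height, y = weight)) + geom_point()

# One numeric and one categorical variable
p_box <- ggplot(toy, aes(x = gender, y = height)) + geom_boxplot()

# Print a plot object to draw it
p_point
```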
aes() (aesthetics) is where we tell R which variables we want represented by which plot elements
For example:
ggplot(acs12, aes(x = income))
ggplot(acs12, aes(x = income, y = married))
ggplot(acs12, aes(x = income, y = married, color = gender))
ggplot(acs12, aes(x = time_to_work, y = lang, size = income))
ggplot(acs12, aes(x = time_to_work, y = lang, color = income))
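The calls above only set up the mapping; nothing is drawn until a geom is added. A minimal sketch of that difference, assuming ggplot2 is installed and using a hypothetical toy data frame with acs12-style column names (the values are made up):

```r
library(ggplot2)

# Hypothetical toy data with acs12-style column names (values made up)
toy_acs <- data.frame(
  income = c(0, 12000, 45000, 60000, 85000, 30000),
  time_to_work = c(10, 25, 40, 15, 60, 30),
  gender = c("female", "male", "female", "male", "female", "male")
)

# aes() alone sets up the mapping: this gives axes but no data
p_empty <- ggplot(toy_acs, aes(x = time_to_work, y = income, color = gender))

# Adding a geom tells R how to draw the mapped variables
p_points <- p_empty + geom_point()

length(p_empty$layers)   # no layers yet
length(p_points$layers)  # one layer after adding geom_point()
```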
RQ: How do reported rates of texting while driving vary by gender? (in yrbss)
We can use color!
color (for lines and points) and fill (for filling in areas like bars) are the arguments we use inside aes() to add color.
fill = gender is the argument we want for bar color (an area rather than a point or line)
yrbss |>
  filter(!is.na(gender) & !is.na(text_while_driving_30d)) |> # remove missing data
  mutate(text_while_driving_30d = factor(text_while_driving_30d, # change the order of the bars
                                         levels = c("did not drive", "0", "1-2", "3-5",
                                                    "6-9", "10-19", "20-29", "30"))) |>
  # we add the fill argument here with the variable we want the fill color to represent.
  ggplot(aes(x = text_while_driving_30d, fill = gender)) +
  geom_bar()
To place the bars side by side instead of stacking them, add the position = "dodge" argument to geom_bar()
yrbss |>
  filter(!is.na(gender) & !is.na(text_while_driving_30d)) |>
  mutate(text_while_driving_30d = factor(text_while_driving_30d,
                                         levels = c("did not drive", "0", "1-2", "3-5",
                                                    "6-9", "10-19", "20-29", "30"))) |>
  ggplot(aes(x = text_while_driving_30d, fill = gender)) +
  geom_bar(position = "dodge") +
  labs(x = "Number of days last month respondent texted while driving",
       y = "",
       # I can change the label on other aes features just like on axes; this one goes above the legend
       fill = "Sex")
Make plots that are interpretable
Do they have to be pretty?
Check out the R Graph Gallery
If this interests you, I recommend:
Soc 232: Visualizing Social Data, taught by Dr. Kieran Healy
Stat 313: Advanced Data Visualization, taught by Dr. Mine Çetinkaya-Rundel
Soc 223: Data Analytics and Visualization for Business, taught by Dr. Stephen Vaisey
Step 1: Identify the variables you want to plot and figure out if they’re numeric or categorical
Step 2: Choose a plot type that fits your variables and question
Step 3: Look for resources. Find the relevant code in the slides and read the R Graph Gallery’s information about your plot type.
Step 4: Make the super simple version
Step 5: Refine it to make it more interpretable
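Steps 4 and 5 might look like the sketch below for one numeric and one categorical variable. It assumes ggplot2 is installed. The toy_yrbss data frame is hypothetical (yrbss-style column names, made-up values), and the label text is only one possible choice.

```r
library(ggplot2)

# Hypothetical toy data with yrbss-style column names (values made up)
toy_yrbss <- data.frame(
  physically_active_7d = c(4, 2, 7, 0, 2, 1, 4, 4, 5, 0, 7, 7),
  gender = rep(c("female", "male"), each = 6)
)

# Step 4: the super simple version (numeric + categorical, so a boxplot)
p_simple <- ggplot(toy_yrbss, aes(x = gender, y = physically_active_7d)) +
  geom_boxplot()

# Step 5: refine it with readable labels
p_refined <- p_simple +
  labs(x = "Sex",
       y = "Days physically active in the past 7 days",
       title = "Physical activity by sex (toy data)")

p_refined
```

Starting from the simple version makes it easier to spot whether a problem comes from the data and plot type or from the refinements added afterward.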