Lecture 12
Duke University
SOCIOL 333 - Summer Term 1 2023
2023-06-07

Rows: 13,583
Columns: 13
$ age <int> 14, 14, 15, 15, 15, 15, 15, 14, 15, 15, 15, 1…
$ gender <chr> "female", "female", "female", "female", "fema…
$ grade <chr> "9", "9", "9", "9", "9", "9", "9", "9", "9", …
$ hispanic <chr> "not", "not", "hispanic", "not", "not", "not"…
$ race <chr> "Black or African American", "Black or Africa…
$ height <dbl> NA, NA, 1.73, 1.60, 1.50, 1.57, 1.65, 1.88, 1…
$ weight <dbl> NA, NA, 84.37, 55.79, 46.72, 67.13, 131.54, 7…
$ helmet_12m <chr> "never", "never", "never", "never", "did not …
$ text_while_driving_30d <chr> "0", NA, "30", "0", "did not drive", "did not…
$ physically_active_7d <int> 4, 2, 7, 0, 2, 1, 4, 4, 5, 0, 0, 0, 4, 7, 7, …
$ hours_tv_per_school_day <chr> "5+", "5+", "5+", "2", "3", "5+", "5+", "5+",…
$ strength_training_7d <int> 0, 0, 0, 0, 1, 0, 2, 0, 3, 0, 3, 0, 0, 7, 7, …
$ school_night_hours_sleep <chr> "8", "6", "<5", "6", "9", "8", "9", "6", "<5"…
To put a categorical variable's response options in a meaningful order, we use mutate() with a new function, factor().
      always did not ride most of time        never       rarely    sometimes
         399         4549          293         6977          713          341
        <NA>
         311
yrbss_reordered <- mutate(yrbss,
  # we are turning our helmet variable into a factor
  # its name will remain the same (the new variable
  # we create will overwrite the old one)
  helmet_12m = factor(
    helmet_12m,
    # and we want its response options to be in this order:
    # from least to most frequent helmet use
    levels = c("did not ride", "never", "rarely",
               "sometimes", "most of time", "always")))
table(yrbss_reordered$helmet_12m)
did not ride        never       rarely    sometimes most of time       always
        4549         6977          713          341          293          399
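A small base-R sketch of the same idea, using a made-up vector rather than the real yrbss data (the values and the names helmet, helmet_levels, helmet_f, and helmet_typo are hypothetical). It shows three behaviors of factor(): table() on a plain character vector sorts categories alphabetically, listing levels imposes our order (including levels nobody chose), and any value that does not exactly match a listed level silently becomes NA.

```r
# Hypothetical toy responses, NOT the real yrbss data
helmet <- c("never", "rarely", "never", "always", "did not ride", "sometimes")

# A character vector is tabulated in alphabetical order
table(helmet)

# Listing the levels explicitly sets the order of the categories;
# "most of time" still appears (with a count of 0) even though no one chose it
helmet_levels <- c("did not ride", "never", "rarely",
                   "sometimes", "most of time", "always")
helmet_f <- factor(helmet, levels = helmet_levels)
table(helmet_f)

# A misspelled level ("nevr") does not raise an error:
# the two "never" responses quietly become NA
helmet_typo <- factor(helmet, levels = c("did not ride", "nevr", "rarely",
                                         "sometimes", "most of time", "always"))
sum(is.na(helmet_typo))
```

Comparing the number of NA values before and after the mutate() call is a quick way to catch a typo in levels.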
One numeric variable: geom_histogram()
One categorical variable: geom_bar()
Two numeric variables: geom_point()
One numeric and one categorical variable: geom_boxplot()
Two categorical variables
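A minimal sketch of the four pairings above. It swaps in the mpg dataset that ships with ggplot2 instead of a course dataset (an assumption for illustration): hwy and displ are numeric, class is categorical. The bins = 20 value is an arbitrary choice.

```r
library(ggplot2)  # the mpg dataset ships with ggplot2

# One numeric variable: histogram
p_hist <- ggplot(mpg, aes(x = hwy)) + geom_histogram(bins = 20)

# One categorical variable: bar chart of counts
p_bar <- ggplot(mpg, aes(x = class)) + geom_bar()

# Two numeric variables: scatterplot
p_point <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# One numeric and one categorical variable: one boxplot per category
p_box <- ggplot(mpg, aes(x = class, y = hwy)) + geom_boxplot()
```

Typing a plot's name (for example p_box) at the console draws it.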
Adding an additional categorical variable
Adding an additional numeric variable
aes() (short for aesthetics) is where we tell R which variables should be represented by which plot elements
For example:
ggplot(acs12, aes(x = income))
ggplot(acs12, aes(x = income, y = married))
ggplot(acs12, aes(x = income, y = married, color = gender))
ggplot(acs12, aes(x = time_to_work, y = lang, size = income))
ggplot(acs12, aes(x = time_to_work, y = lang, color = income))
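When the variable mapped to color or size is numeric (like income above), ggplot2 uses a continuous scale rather than a handful of discrete colors. A short sketch using ggplot2's built-in mpg data in place of acs12 (an assumption for illustration): cty is numeric and drv is categorical.

```r
library(ggplot2)

# Numeric variable mapped to color: a continuous color gradient
p_num <- ggplot(mpg, aes(x = displ, y = hwy, color = cty)) + geom_point()

# Categorical variable mapped to color: one color per category
p_cat <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) + geom_point()

# Numeric variable mapped to size: larger points for larger values
p_size <- ggplot(mpg, aes(x = displ, y = hwy, size = cty)) + geom_point()
```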
RQ: How do reported rates of texting while driving vary by gender? (in yrbss)
We can use color!
color (for lines and points) and fill (for filled areas like bars) are the arguments we use inside aes() to add color.
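To see why fill rather than color suits bars: mapping color in geom_bar() changes only the outline of each bar segment, while the interiors keep the default fill. A sketch with ggplot2's mpg data standing in for yrbss (an assumption; class and drv are both categorical):

```r
library(ggplot2)

# fill colors the inside of each bar segment by drv
p_fill <- ggplot(mpg, aes(x = class, fill = drv)) + geom_bar()

# color only colors the outlines; every bar keeps the same default fill
p_color <- ggplot(mpg, aes(x = class, color = drv)) + geom_bar()
```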
fill = gender is the argument we want for bar color (a bar is an area rather than a point or line)
yrbss |>
  filter(!is.na(gender) & !is.na(text_while_driving_30d)) |> # remove missing data
  mutate(text_while_driving_30d = factor(text_while_driving_30d, # change the order of the bars
    levels = c("did not drive", "0", "1-2", "3-5",
               "6-9", "10-19", "20-29", "30"))) |>
  # we add the fill argument here with the variable we want the fill color to represent
  ggplot(aes(x = text_while_driving_30d, fill = gender)) +
  geom_bar()
By default the bars are stacked. To put them side by side, add the position = "dodge" argument to geom_bar()
yrbss |>
  filter(!is.na(gender) & !is.na(text_while_driving_30d)) |>
  mutate(text_while_driving_30d = factor(text_while_driving_30d,
    levels = c("did not drive", "0", "1-2", "3-5",
               "6-9", "10-19", "20-29", "30"))) |>
  ggplot(aes(x = text_while_driving_30d, fill = gender)) +
  geom_bar(position = "dodge") +
  labs(x = "Number of days last month respondent texted while driving",
       y = "",
       # labels for other aes() features work just like axis labels; this one becomes the legend title
       fill = "Sex")
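The bars above show counts, so if one gender has more respondents its bars are taller everywhere; comparing rates means comparing shares within each gender. A minimal dplyr sketch of that calculation on a small hypothetical data frame (toy_texting and its values are made up, not yrbss):

```r
library(dplyr)

# Hypothetical toy data standing in for yrbss
toy_texting <- tibble(
  gender = c("female", "female", "female", "male", "male"),
  text_while_driving_30d = c("0", "0", "1-2", "0", "30")
)

texting_rates <- toy_texting |>
  count(gender, text_while_driving_30d) |>  # respondents in each gender x category cell
  group_by(gender) |>
  mutate(prop = n / sum(n)) |>              # share of each gender in each category
  ungroup()

texting_rates
```

Plotting prop with geom_col(position = "dodge") instead of counting rows with geom_bar() would then put both genders on the same 0 to 1 scale.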
Make plots that are interpretable
Do they have to be pretty?
Check out the R Graph Gallery
If this interests you, I recommend:
Soc 232: Visualizing Social Data, taught by Dr. Kieran Healy
Stat 313: Advanced Data Visualization, taught by Dr. Mine Çetinkaya-Rundel
Soc 223: Data Analytics and Visualization for Business, taught by Dr. Stephen Vaisey
Step 1: Identify the variables you want to plot and figure out if they’re numeric or categorical
Step 2: Choose a plot type that fits your variables and question
Step 3: Look for resources. Find the relevant code in the slides and read the R Graph Gallery’s information about your plot type.
Step 4: Make the super simple version
Step 5: Refine it to make it more interpretable
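Steps 4 and 5 might look like this, sketched with ggplot2's mpg data (an assumption, not a course dataset): first the bare-bones plot, then the same plot with ordered categories and readable labels. reorder() is base R's stats::reorder(); ordering the boxes by median is one reasonable refinement among many.

```r
library(ggplot2)

# Step 4: the super simple version
p_simple <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot()

# Step 5: refine it. Order the boxes by median highway mileage
# and replace raw variable names with readable labels.
p_refined <- ggplot(mpg, aes(x = reorder(class, hwy, FUN = median), y = hwy)) +
  geom_boxplot() +
  labs(x = "Vehicle class",
       y = "Highway miles per gallon")
```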