Describing data: part 4

Lecture 10

Aidan Combs

Duke University
SOCIOL 333 - Summer Term 1 2023

2023-06-05

Logistics

  • Project component 2: descriptive statistics

    • All materials are posted (instructions, example, github repos)
    • Due Wednesday June 7 11:59pm
    • I have returned feedback (GitHub issue on your repo) and grades (Sakai) for your proposals
    • We have covered everything for parts 1-3. Part 4 is today and tomorrow.
    • I’m around before/after class for questions (after generally better), or set up an appointment

Today

  • Answers to filtering
  • Pipes
  • Working on creating variables
  • Answers to creating variables
  • Plots

Filtering exercise (ex-5-31): answers!

  • Find this on your computer (no need to clone it again)

Stringing commands together with pipes (|>)

  • Often we need to change data frames in more than one way
  • Example from last week: How does employment status vary by age category?
  • We need to create an age category variable (as we talked about Thursday)
  • But we probably want to filter too–why analyze kids?

Stringing commands together: approach 1

We can do this with the tools we have.

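# first we make the new variable
# case_when() checks the conditions in order and uses the first one that is TRUE,
# so "teen" ends up meaning 14-17 and "adult" ends up meaning 18-66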
acs12_newagevar <- mutate(acs12, agecat = case_when(age < 14 ~ "child",
                                                    age < 18 ~ "teen",
                                                    age < 67 ~ "adult",
                                                    TRUE ~ "retired"))

# then we filter to remove children
acs12_nokids <- filter(acs12_newagevar, agecat != "child")

# did it work?
table(acs12_nokids$agecat, useNA = "always")

  adult retired    teen    <NA> 
   1264     297      89       0 

  • This works, but it’s kind of ugly: we saved a middle step (acs12_newagevar) that we never use again

Stringing commands together: approach 2, with |>

  • The pipe operator, |>, lets us pass the result of one function directly into the next one
  • The | symbol is on the key below “delete” on most US keyboards (the backslash key, typed with shift). It is not I, not l, not 1
  • The piped result fills in the first argument of mutate/filter (the dataset), so we leave that argument out
  • “Take the thing that came before this and give it to the function that comes after this”

acs12_nokids <- acs12 |> # start with acs12
  mutate(agecat = case_when(age < 14 ~ "child", # then add a new variable to it
                            age < 18 ~ "teen",
                            age < 67 ~ "adult",
                            TRUE ~ "retired")) |> 
  filter(agecat != "child") # then filter out the kids

table(acs12_nokids$agecat, useNA = "always")

  adult retired    teen    <NA> 
   1264     297      89       0 
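
To convince yourself that the two approaches really do the same thing, here is a small sketch you can run on its own. The data frame toy and its four ages are made up for illustration (they are not from acs12), and it assumes the dplyr package is installed.

```r
library(dplyr)

# hypothetical toy data: one person in each age category
toy <- data.frame(age = c(5, 15, 40, 70))

# approach 1: save the middle step, then filter
toy_newagevar <- mutate(toy, agecat = case_when(age < 14 ~ "child",
                                                age < 18 ~ "teen",
                                                age < 67 ~ "adult",
                                                TRUE ~ "retired"))
two_step <- filter(toy_newagevar, agecat != "child")

# approach 2: the same steps chained with |>
piped <- toy |>
  mutate(agecat = case_when(age < 14 ~ "child",
                            age < 18 ~ "teen",
                            age < 67 ~ "adult",
                            TRUE ~ "retired")) |>
  filter(agecat != "child")

# compare the two results; TRUE means the data frames match exactly
identical(two_step, piped)
```

The only difference between the two versions is whether the middle data frame gets a name. The piped version never creates toy_newagevar at all.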

Another |> example

  • This:
mutate(acs12, agecat = case_when(age < 14 ~ "child",
                                 age < 18 ~ "teen",
                                 age < 67 ~ "adult",
                                 TRUE ~ "retired"))
  • Is the same as this:
acs12 |> 
  mutate(agecat = case_when(age < 14 ~ "child",
                            age < 18 ~ "teen",
                            age < 67 ~ "adult",
                            TRUE ~ "retired"))
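
One way to see why these two are the same: R rewrites a |> expression into the ordinary function call before running it. The sketch below uses quote(), which captures code without running it, so it needs no packages or data. It assumes R 4.1 or newer, where |> exists, and the shortened mutate() call is only a stand-in for the longer one above.

```r
# quote() captures the code itself instead of running it,
# so acs12 and mutate() don't need to exist here
piped_code  <- quote(acs12 |> mutate(agecat = "adult"))
nested_code <- quote(mutate(acs12, agecat = "adult"))

# printing the piped version shows the call R will actually run
print(piped_code)

# check whether the two pieces of code are the same call
identical(piped_code, nested_code)
```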

More on |>

  • We’re just scratching the surface in this class

  • When you need to clean your data or when your analyses are more complex, |> makes your life a lot easier!

  • In internet resources written before |> was added to R (version 4.1, in 2021), you’ll see %>% instead. This older pipe comes from the magrittr package and works the same way for everything we do in this class

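  • Annoying to type out? In RStudio there’s a keyboard shortcut

    • on mac, command-shift-M
    • on pc, ctrl-shift-M
    • if the shortcut types %>% instead, turn on “Use native pipe operator, |>” under Tools > Global Options > Code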
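
If you run into %>% in older code, a quick side-by-side check can help. This is a minimal sketch with made-up numbers. It assumes the magrittr package is installed (%>% is also available once dplyr or the tidyverse is loaded).

```r
library(magrittr)  # provides the older %>% pipe

ages <- c(5, 15, 40, 70)  # made-up ages, not from acs12

# older magrittr pipe
old_pipe <- ages %>% mean() %>% round()

# native pipe (R 4.1 or newer)
new_pipe <- ages |> mean() |> round()

# compare the two results
identical(old_pipe, new_pipe)
```

For simple chains like this one you can swap one pipe for the other. The two differ in some more advanced features that we won’t use in this class.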

Continuing on creating new variables

  • Find your exercise from last Thursday (ex-6-01)

    • No need to clone it again; find where it’s saved on your computer
  • There are hints on the website–use them if you get stuck

  • 10-15 minutes to work

Creating new variables: answers!