Data and Variables

Lecture 2

Aidan Combs

Duke University
SOCIOL 333 - Summer Term 1 2023

2023-05-18

Your advice for each other

  • Keep on the ball…

    • Always finish | don’t get behind | always stay on your work
  • …but don’t sacrifice your mental health

    • Take breaks when needed, don’t overwork yourself | Stress is a fact of life, but should never be a way of life!

    • Have a positive outlook, don’t be so pessimistic that it brings the entire mood of the group down

    • stay happy stay positive stay sane

Wrapping up course logistics

Reminders

  • Drop/add ends tomorrow
  • Class does not meet on Fridays; the next meeting is Monday

Slides

  • Available on the website. Use your arrow keys to navigate.

  • I will post a version before class so you can follow along if you want to. If we don’t make it through everything I will cut the extra slides afterwards.

  • Editable versions will be posted on GitHub if useful to you–the file format is the same as what we will use for assignments (more on that next week!)

Cadence

  • Class: Monday-Thursday (except Memorial Day and Juneteenth)
  • In-class exercises: Most days; usually due before the next class period
  • Project: Completed in pieces throughout the semester. Bring drafts of components for workshopping on Thursdays. Components are due for grading the following Monday (or Tuesday, when Monday is a holiday).
  • Homework: Due on Mondays when there isn’t a project component due.

Grading

Category             Percentage
Homework             20% (10% x 2)
In-class exercises   30%
Project              50% (22.5% components; 27.5% final paper + presentation)

See course syllabus for how the final letter grade will be determined.

Support

  • Attend office hours! 30 min before/after class (in RC 329 usually; after class Wednesday in my office downstairs)
  • Ask questions in class
  • Email for individual matters: aidan.combs@duke.edu
  • Lots of resources available through the university–use them!

Announcements

  • Posted on Sakai (Announcements tool) and sent via email. Be sure to check both regularly.
  • I’ll assume that you’ve read an announcement by the next “business” day
  • I’ll (try my best to) send a weekly update announcement each Thursday or Friday, outlining the plan for the following week and reminding you what you need to do.

Diversity + inclusion

My goal is that everyone is well-served by this course. Diverse experiences and perspectives that you bring to this class are a resource, strength and benefit!

  • Please list your pronouns on your name tent if you feel comfortable doing so.
  • If you feel like your performance in the class is being impacted by your experiences outside of class, please don’t hesitate to come and talk with me. I want to be a resource for you.
    • If you prefer to speak with someone outside of the course, your advisors and deans are excellent resources.
  • I, along with many others, am still learning (and unlearning), and sometimes make mistakes. If something that happens here doesn’t sit right with you, please talk to me about it.

Accessibility

  • The Student Disability Access Office (SDAO) is available to ensure that students are able to engage with their courses and related assignments.

  • I am committed to making all course materials accessible and I’m always learning how to do this better. If any course component is not accessible to you in any way, please don’t hesitate to let me know.

Course policies

COVID policies

  • Masks are optional

  • If you’re sick, please stay home

  • Read and follow university guidance

Late work and regrades

  • We have policies!
  • Read about them on the course syllabus and refer back to them when you need to

Collaboration policy

  • Everything you submit for grading in this course should be completed and submitted individually. You can talk to each other, but please don’t copy/paste.

  • However, you will be working together frequently to provide feedback on each other’s final projects.

Sharing / reusing code policy

  • Programmers use online resources to help them accomplish tasks all the time–these sources can be extremely helpful for learning when used well.

    • Use them to learn. Don’t blindly copy/paste; it won’t work. We’ll talk more about this soon.
  • You may use online resources (e.g., RStudio Community, Stack Overflow, ChatGPT) but you must explicitly cite where you obtained any code you directly use or use as inspiration.

  • Un-cited recycled/copied code = plagiarism :(

Academic integrity

To uphold the Duke Community Standard:

  • I will not lie, cheat, or steal in my academic endeavors;

  • I will conduct myself honorably in all my endeavors; and

  • I will act if the Standard is compromised.

Most importantly!

Ask if you’re not sure if something violates a policy!

Data and Variables

Sociological data

Sociological data: Qualitative

  • Interpretations, stories, narratives, observations

  • Uses:

    • deep understanding of particular cases, theory building, where measurement doesn’t work
  • Methods:

    • interviews, ethnographies, content analysis, image analysis, etc.

Sociological data: Quantitative (this class)

  • Measured quantities. Counts, closed-ended surveys, proportions, outcomes.

  • Uses:

    • Describing measurable things about the world, theory testing
  • Methods/sources:

    • Surveys, administrative data, digital trace data, etc.

EADA data (from last time)

Step 1: Data -> data frame

Observations

  • Each instance that you observe data about (often 1 observation = 1 person, but not always)
  • Rows in your data frame

Variables

  • Aspects that can be different for different cases
  • Columns in your data frame
  • Values the variable can take are called response options (especially in the context of surveys)
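
As a minimal sketch of this layout (written in Python for illustration only; the column names and values are invented and are not taken from the EADA data), a data frame can be pictured as a list of rows that all share the same columns:

```python
# Hypothetical toy data: each dict is one observation (a row),
# and each key is one variable (a column). All values are invented.
rows = [
    {"school": "A", "sector": "public", "athletes": 412},
    {"school": "B", "sector": "private", "athletes": 388},
    {"school": "C", "sector": "public", "athletes": 530},
]

n_observations = len(rows)        # number of rows
variables = list(rows[0].keys())  # column names

# Reading down one column gives every observation's value for that variable.
sector_column = [row["sector"] for row in rows]

print(n_observations)  # 3
print(variables)       # ['school', 'sector', 'athletes']
print(sector_column)   # ['public', 'private', 'public']
```

Note that the observations here are schools rather than people, which is one case where 1 observation is not 1 person.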

Types of variables

Categorical

  • Response options have names, not numbers
  • Mathematical operations (addition, subtraction, division) do not make sense
  • The book treats the subtypes below (nominal, ordinal, binary) as the same. We will too for math’s sake, but sometimes the difference does play into interpretations
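
To illustrate why arithmetic does not apply to category names, the sketch below (Python, with invented responses for a hypothetical `sector` variable) tallies a categorical variable instead of averaging it. Counting how often each response option appears is meaningful; adding "public" to "private" is not.

```python
from collections import Counter

# Invented responses for a hypothetical categorical variable.
sector = ["public", "private", "public", "public", "private"]

# Tally how many observations fall into each response option.
counts = Counter(sector)

# Proportions are computed from the counts, not from the labels themselves.
proportions = {option: n / len(sector) for option, n in counts.items()}

print(counts["public"], counts["private"])  # 3 2
print(proportions["public"])                # 0.6
```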

Categorical: Nominal

  • Simple categories
  • Can’t say one category is “bigger” or “smaller” than another
  • Examples: gender identity, race, ethnicity, economic sector

Categorical: Ordinal

  • You can say that one is “bigger” or “more” than another
  • Examples: education (no hs diploma, hs diploma, some college, 2 year degree, 4 year degree, advanced degree), income (low, middle, high)
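
One way to picture the difference from nominal variables, sketched in Python under the assumption that the education levels above are listed from least to most: an ordinal variable supports "more than" comparisons through the position of each level, even though the gaps between levels are not meaningful numbers. The helper name `has_more_education` is hypothetical.

```python
# Education levels from the example above, assumed to be ordered least to most.
EDUCATION_LEVELS = [
    "no hs diploma",
    "hs diploma",
    "some college",
    "2 year degree",
    "4 year degree",
    "advanced degree",
]


def has_more_education(a: str, b: str) -> bool:
    """Return True if level `a` ranks above level `b` (hypothetical helper)."""
    return EDUCATION_LEVELS.index(a) > EDUCATION_LEVELS.index(b)


print(has_more_education("4 year degree", "hs diploma"))    # True
print(has_more_education("some college", "2 year degree"))  # False

# Sorting by rank works, because the levels have a defined order.
responses = ["some college", "advanced degree", "hs diploma"]
sorted_responses = sorted(responses, key=EDUCATION_LEVELS.index)
print(sorted_responses)  # ['hs diploma', 'some college', 'advanced degree']
```

A nominal variable such as economic sector has no equivalent of `EDUCATION_LEVELS`: any ordering of its categories would be arbitrary.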

Categorical: Binary

  • Has just two levels: yes/no; present/not present
  • Examples: has college degree/does not have college degree; never married/ever married

Numeric

  • Response options are numbers where the values are meaningful (i.e., not arbitrary)
  • Mathematical operations (addition, subtraction, etc.) make sense

Numeric: Continuous

  • Any value within a range is possible–decimals are fine
  • Examples: Temperature, height, elevation, stock price

Numeric: Discrete

  • Only certain values (usually round numbers) “make sense” and are used
  • Examples: Age, years of education
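
A small Python sketch of the two numeric subtypes, using invented values: continuous variables are naturally stored as decimals and discrete ones as whole numbers, and for both, arithmetic such as a mean is meaningful.

```python
from statistics import mean

# Invented values for illustration only.
heights_cm = [162.5, 170.2, 181.0]  # continuous: any value in a range is possible
years_of_education = [12, 16, 18]   # discrete: recorded as whole numbers

# Every recorded value of the discrete variable is a whole number.
all_whole = all(isinstance(y, int) for y in years_of_education)

# Arithmetic makes sense for both kinds of numeric variable.
mean_height = mean(heights_cm)
mean_years = mean(years_of_education)

print(all_whole)              # True
print(round(mean_height, 1))
print(round(mean_years, 1))
```

The mean of the whole-number values here is itself a decimal, so a summary of a discrete variable does not have to be one of the values that variable can take.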

Exercise: EADA data variable types

  • Determine the type of each variable in this data set.

For next time: