Data wrangling

Setting up

Today we are going to take a look at the starwars dataset, which comes with dplyr and is available once the tidyverse is loaded.

library(tidyverse) # load our beloved package

starwars

New data wrangling functions: count(), pivot_longer(), pivot_wider()

Before moving on to more advanced data wrangling functions, let’s review the commands from the past recitations.

starwarsEdit <- ____ %>% 
  ____(____) %>% # first, let's remove the columns films, vehicles, and starships; don't forget the pipe %>%!
  group_by(_______) %>%
  summarise(___________) # then, let's calculate mean, min and max values of height, mass, and birth_year by species and gender
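
As a reminder of how these pieces fit together, here is a minimal sketch on a made-up dataset. The tibble `plants` and its columns are hypothetical and are not part of starwars. Using `na.rm = TRUE` is an assumption about how you want to treat missing values; without it, any group that contains an NA returns NA.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
plants <- tibble(
  site   = c("A", "A", "B", "B"),
  height = c(10, NA, 30, 50),
  width  = c(1, 2, 3, 4)
)

plants_summary <- plants %>%
  select(-width) %>%                                  # drop a column we do not need
  group_by(site) %>%                                  # one group per site
  summarise(mean_height = mean(height, na.rm = TRUE), # na.rm = TRUE skips missing values
            min_height  = min(height, na.rm = TRUE),
            max_height  = max(height, na.rm = TRUE))

plants_summary
```

The same pattern should carry over to the exercise above, with more columns to summarise and two grouping variables.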

Now, what if I want to know how many observations I have per gender and species?

starwars %>% 
  ________(______________) %>% # by gender and species
  summarise(n=n()) # use the n() function inside summarise()

Just so you know, there is a faster way to do this using count():

starwars %>% 
  count(____________) # insert the two groups here
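
To see that the two approaches agree, here is a small sketch on a made-up dataset. The tibble `pets` and its columns are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
pets <- tibble(
  kind  = c("dog", "dog", "cat", "cat", "cat"),
  color = c("black", "black", "white", "black", "white")
)

# Long way: group first, then count the rows in each group
by_hand <- pets %>%
  group_by(kind, color) %>%
  summarise(n = n(), .groups = "drop") # .groups = "drop" returns an ungrouped result

# Short way: count() groups and counts in one step
with_count <- pets %>%
  count(kind, color)

by_hand
with_count
```

One difference to keep in mind: count() returns an ungrouped result, while group_by() followed by summarise() on two grouping variables keeps the result grouped by the first one unless you set `.groups`.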

Now we can play a bit with the original data and reshape it into the opposite format. We can do this with the functions pivot_longer() and pivot_wider(). Let’s have a look at the arguments these functions take.

Let’s first select some of the columns so it doesn’t get too cluttered. Let’s select: name, height, mass, birth_year.

starwarsSelect <- _______________ %>% ________(_____________________)

What format is the dataset in: wide or long?

It’s wide indeed, so let’s use pivot_longer() to make it longer.

_______________________ <- starwarsSelect %>% # identify the dataframe
  pivot_longer(cols = _______________, # indicate the columns to pivot into longer format: that is, all columns except name
               names_to = __________, # name of the column with the levels of the group; it should be in quotes
               values_to = __________) # name of the column with the actual values; it should be in quotes
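
If the three arguments feel abstract, this sketch shows them on a made-up wide dataset. The tibble `scores_wide` and the column names "test" and "score" are hypothetical, chosen only for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in wide format: one row per student, one column per test
scores_wide <- tibble(
  student = c("Ana", "Ben"),
  test1   = c(90, 75),
  test2   = c(85, 80)
)

scores_long <- scores_wide %>%
  pivot_longer(cols = -student,      # pivot every column except student
               names_to = "test",    # the old column names go into this column
               values_to = "score")  # the cell values go into this column

scores_long
```

Each student now takes up two rows, one per test, so the dataset has more rows and fewer columns than before.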

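If you wanted, you could also go back to the original format by using pivot_wider(). The code below assumes that you named the long dataset starwarsSelectTransformL and its two new columns "features" and "values":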

starwarsSelectTransformW <- starwarsSelectTransformL %>%
  pivot_wider(names_from = features, # name of the column to get the names from and transform into columns
              values_from = values) # name of the column to get the values from
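
The same idea on a made-up long dataset, as a minimal sketch. The tibble `scores_long` and its columns are hypothetical. Note that, unlike in pivot_longer(), the column names here already exist in the data, so they can be written without quotes.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in long format: one row per student and test
scores_long <- tibble(
  student = c("Ana", "Ana", "Ben", "Ben"),
  test    = c("test1", "test2", "test1", "test2"),
  score   = c(90, 85, 75, 80)
)

scores_wide <- scores_long %>%
  pivot_wider(names_from = test,    # each distinct value of test becomes a column
              values_from = score)  # the new columns are filled with the scores

scores_wide
```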

Nice, huh? :)

Some more practice

Now, let’s import the dataset from exam 1. As you may recall, it contains real judgment data for grammatical and ungrammatical sentences.

exam1.data <- read.csv("exam1.data.csv")

What is the format of the dataset?

Whatever that was, let’s change it!

___________ <- exam1.data %>%
  pivot_????(   )

Aaaaand let’s change it back again!

___________ <- ___________ %>% # start from the dataset you created in the previous step, not from exam1.data
  pivot_????(   )

If you compare the newly wrangled dataset with the original one, you should see that they contain exactly the same data!
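
One way to run that comparison, sketched on a made-up dataset. The tibble `original` and the column names "variable" and "value" are hypothetical. Converting both objects with as.data.frame() before comparing is a choice made here so that only the contents are compared, since read.csv() returns a plain data frame while the pivot functions return a tibble.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, invented for illustration only
original <- tibble(
  id = c(1, 2),
  a  = c(10, 20),
  b  = c(30, 40)
)

# Make it longer, then immediately make it wider again
round_trip <- original %>%
  pivot_longer(cols = -id, names_to = "variable", values_to = "value") %>%
  pivot_wider(names_from = variable, values_from = value)

# all.equal() returns TRUE when it finds no differences
same <- isTRUE(all.equal(as.data.frame(original), as.data.frame(round_trip)))
same
```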