               mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Isaac Quintanilla Salinas
UC Riverside
4/21/2022
Presentation:
www.inqs.info/files/hiss_3/hiss_3.html
RMD:
www.inqs.info/files/hiss_3/hiss_3.qmd
Website:
Email:
iquin002@ucr.edu
mutate() adds new variables
select() selects variables
filter() filters data
if_else() conditional function that returns 2 values
group_by() a dataset is grouped by factors
summarise() provides summaries of data
Used to create tidy data
pivot_longer() (formerly gather()) transforms the data from wide to long
pivot_wider() (formerly spread()) transforms the data from long to wide
separate() separates one variable into multiple variables
unite() merges multiple variables into one variable
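unite() is not demonstrated elsewhere in these slides, so here is a minimal sketch of it together with its inverse, separate(). The `dates` data frame and its column names are hypothetical, made up only for illustration:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not from the presentation)
dates <- tibble(year = c(2021, 2022), month = c(4, 12), day = c(21, 5))

# unite(): paste several columns into one, separated by "-"
united <- dates %>%
  unite(col = "date", year, month, day, sep = "-")

# separate(): split that single column back into three (as character columns)
split_again <- united %>%
  separate(col = date, into = c("year", "month", "day"), sep = "-")

united
split_again
```

Note that separate() returns character columns, so the round trip does not restore the original numeric types unless convert = TRUE is used.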
%>%
The pipe operator is the real power of tidyverse.
It takes the output of a function and uses it as input for another function.
Tidyverse works best when data frames (tibbles) are used as inputs.
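To make the pipe concrete, the sketch below writes the same operation twice, once as a nested call and once with %>%. It assumes the built-in mtcars data; the object names `nested` and `piped` are only illustrative:

```r
library(dplyr)

# Nested call: read from the inside out
nested <- head(select(mtcars, mpg, hp), n = 3)

# Piped call: read from top to bottom; each output feeds the next function
piped <- mtcars %>%
  select(mpg, hp) %>%
  head(n = 3)

# Both approaches give the same data frame
identical(nested, piped)
```

The piped version tends to be easier to read once more than two or three steps are chained.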
We will work on manipulating the mtcars data set
Below prints out the code:
mutate()
Adds a new variable to a data frame
Example:
mutate()
Each argument adds a new variable
Example:
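The code for this example did not survive on the slide. A call consistent with the output that follows, reconstructed from the pipelines used later in this deck (so treat it as an assumption rather than the original slide code), would be:

```r
library(dplyr)

# Each named argument to mutate() becomes a new column
mutated <- mtcars %>%
  mutate(log_mpg = log(mpg), log_hp = log(hp)) %>%
  head(n = 3)

mutated
```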
               mpg cyl disp  hp drat    wt  qsec vs am gear carb  log_mpg
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 3.044522
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 3.044522
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 3.126761
                log_hp
Mazda RX4     4.700480
Mazda RX4 Wag 4.700480
Datsun 710    4.532599
select()
Selects the variables to keep in the data frame
Example:
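A minimal sketch of select(), assuming the built-in mtcars data; the choice of mpg and hp is only illustrative:

```r
library(dplyr)

# Keep only the mpg and hp columns
selected <- mtcars %>%
  select(mpg, hp) %>%
  head(n = 3)

selected
```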
filter()
Selects observations that satisfy a condition
Example:
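A minimal sketch of filter(), again on mtcars; the cutoff of 25 miles per gallon is an arbitrary value chosen for illustration:

```r
library(dplyr)

# Keep only the rows (cars) whose mpg exceeds 25
efficient <- mtcars %>%
  filter(mpg > 25)

efficient
```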
if_else()
A vectorized conditional function: it returns one value where the condition is TRUE and another where it is FALSE (here, 1 and 0)
Example:
mtcars %>% 
  mutate(log_mpg=log(mpg),log_hp=log(hp)) %>%
  select(mpg,log_mpg,hp,log_hp) %>%
  filter(log_hp<5) %>%
  mutate(hilhp=if_else(log_hp>mean(log_hp),1,0)) %>%
  head(n=3)
               mpg  log_mpg  hp   log_hp hilhp
Mazda RX4     21.0 3.044522 110 4.700480     1
Mazda RX4 Wag 21.0 3.044522 110 4.700480     1
Datsun 710    22.8 3.126761  93 4.532599     1
group_by()
Groups the data frame by one or more variables
Example:
mtcars %>% 
  mutate(log_mpg=log(mpg),log_hp=log(hp)) %>%
  select(mpg,log_mpg,hp,log_hp) %>%
  filter(log_hp<5) %>%
  mutate(hilhp=if_else(log_hp>mean(log_hp),1,0)) %>%
  group_by(hilhp) %>% 
  head(n=3)
# A tibble: 3 × 5
# Groups:   hilhp [1]
    mpg log_mpg    hp log_hp hilhp
  <dbl>   <dbl> <dbl>  <dbl> <dbl>
1  21      3.04   110   4.70     1
2  21      3.04   110   4.70     1
3  22.8    3.13    93   4.53     1
summarise()
mtcars %>% 
  mutate(log_mpg=log(mpg),log_hp=log(hp)) %>%
  select(mpg,log_mpg,hp,log_hp) %>%
  filter(log_hp<5) %>%
  mutate(hilhp=if_else(log_hp>mean(log_hp),1,0)) %>%
  group_by(hilhp) %>%
  summarise(mean_mpg=mean(mpg),mean_lmpg=mean(log_mpg),
            sd_mpg=sd(mpg),sd_lmpg=sd(log_mpg)) %>%
  head(n=3)
# A tibble: 2 × 5
  hilhp mean_mpg mean_lmpg sd_mpg sd_lmpg
  <dbl>    <dbl>     <dbl>  <dbl>   <dbl>
1     0     29.7      3.38   3.85   0.133
2     1     22.0      3.08   3.46   0.148
We work on converting data from wide to long using the functions in the tidyr package. For many statistical analyses, long data is necessary.
Use read_csv() to read data_3_4.csv into an object called data1:
 [1] "ID1"       "v1/mean"   "v1/sd"     "v1/median" "v2/mean"   "v2/sd"    
 [7] "v2/median" "v3/mean"   "v3/sd"     "v3/median" "v4/mean"   "v4/sd"    
[13] "v4/median"
# A tibble: 6 × 13
  ID1   v1/me…¹ `v1/sd` v1/me…² v2/me…³ `v2/sd` v2/med…⁴ v3/me…⁵ `v3/sd` v3/me…⁶
  <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>    <dbl>   <dbl>   <dbl>   <dbl>
1 Ad91…   3.11    2.86     4.50   1.93    3.21   3.27       2.65  -0.383    3.23
2 A9c5…   2.03    2.90     2.08   0.709   2.27   4.13       1.45   2.01     2.84
3 A28a…  -0.415   2.42     2.47   2.38   -0.820  1.22       3.44   1.63     2.10
4 Aaf5…   1.25    2.24     3.71   4.00    0.456  4.32       1.54   0.789    4.08
5 A370…  -0.984   0.972    3.73   2.19   -0.184  2.14       4.32  -0.804    5.38
6 Aea9…   1.42    1.34     2.35   2.77    4.16  -0.00874   -3.02   4.25     6.36
# … with 3 more variables: `v4/mean` <dbl>, `v4/sd` <dbl>, `v4/median` <dbl>,
#   and abbreviated variable names ¹`v1/mean`, ²`v1/median`, ³`v2/mean`,
#   ⁴`v2/median`, ⁵`v3/mean`, ⁶`v3/median`
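The read step that produces output like the above can be sketched as follows. Because data_3_4.csv is not bundled with these notes, the sketch writes a tiny stand-in file first; the temporary file and its values are hypothetical:

```r
library(readr)

# With the real file in the working directory, the call would be:
# data1 <- read_csv("data_3_4.csv")

# Hypothetical stand-in with the same "time/stat" column naming
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID1,v1/mean,v1/sd", "A1,3.1,2.9", "A2,2.0,2.9"), tmp)

toy <- read_csv(tmp, show_col_types = FALSE)

names(toy)  # column names are kept as written, including the "/"
head(toy)
```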
# A tibble: 10 × 5
   ID1       time    mean     sd  median
   <chr>     <chr>  <dbl>  <dbl>   <dbl>
 1 Ad9131ee9 v1     3.11   2.86   4.50  
 2 Ad9131ee9 v2     1.93   3.21   3.27  
 3 Ad9131ee9 v3     2.65  -0.383  3.23  
 4 Ad9131ee9 v4     0.605  0.883  4.65  
 5 A9c5988ea v1     2.03   2.90   2.08  
 6 A9c5988ea v2     0.709  2.27   4.13  
 7 A9c5988ea v3     1.45   2.01   2.84  
 8 A9c5988ea v4     0.710  3.03  -0.0898
 9 A28a5479d v1    -0.415  2.42   2.47  
10 A28a5479d v2     2.38  -0.820  1.22  
pivot_longer()
The pivot_longer() function takes the variables that are repeated within an observation and places them in one variable:
data1 %>% 
  pivot_longer(cols=`v1/mean`:`v4/median`,names_to = "measurement",values_to = "value") %>% 
  head()
# A tibble: 6 × 3
  ID1       measurement value
  <chr>     <chr>       <dbl>
1 Ad9131ee9 v1/mean      3.11
2 Ad9131ee9 v1/sd        2.86
3 Ad9131ee9 v1/median    4.50
4 Ad9131ee9 v2/mean      1.93
5 Ad9131ee9 v2/sd        3.21
6 Ad9131ee9 v2/median    3.27
separate()
The separate() function separates one variable into multiple variables:
data1 %>% 
  pivot_longer(cols=`v1/mean`:`v4/median`,names_to = "measurement",values_to = "value") %>% 
  separate(col=measurement,into=c("time","stat"),sep="/") %>% 
  head()
# A tibble: 6 × 4
  ID1       time  stat   value
  <chr>     <chr> <chr>  <dbl>
1 Ad9131ee9 v1    mean    3.11
2 Ad9131ee9 v1    sd      2.86
3 Ad9131ee9 v1    median  4.50
4 Ad9131ee9 v2    mean    1.93
5 Ad9131ee9 v2    sd      3.21
6 Ad9131ee9 v2    median  3.27
pivot_wider()
The pivot_wider() function then converts long data to wide data:
data1 %>% 
  pivot_longer(cols=`v1/mean`:`v4/median`,names_to = "measurement",values_to = "value") %>% 
  separate(col=measurement,into=c("time","stat"),sep="/") %>% 
  pivot_wider(names_from = stat,values_from = value) %>% 
  head()
# A tibble: 6 × 5
  ID1       time   mean     sd median
  <chr>     <chr> <dbl>  <dbl>  <dbl>
1 Ad9131ee9 v1    3.11   2.86    4.50
2 Ad9131ee9 v2    1.93   3.21    3.27
3 Ad9131ee9 v3    2.65  -0.383   3.23
4 Ad9131ee9 v4    0.605  0.883   4.65
5 A9c5988ea v1    2.03   2.90    2.08
6 A9c5988ea v2    0.709  2.27    4.13
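Since data1 is read from a file that may not be at hand, the same three steps can be tried on a small made-up tibble. The `toy` data below is hypothetical; only the "time/stat" naming pattern mirrors data1. The second pipeline is an optional shortcut that, as far as I know, gives the same layout in one step through the names_sep argument and the special ".value" name in pivot_longer():

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for data1
toy <- tibble(
  ID1       = c("A", "B"),
  `v1/mean` = c(3.1, 2.0),
  `v1/sd`   = c(2.9, 2.9),
  `v2/mean` = c(1.9, 0.7),
  `v2/sd`   = c(3.2, 2.3)
)

# Three steps, as in the slides: long, then split the name, then widen by stat
tidy <- toy %>%
  pivot_longer(cols = -ID1, names_to = "measurement", values_to = "value") %>%
  separate(col = measurement, into = c("time", "stat"), sep = "/") %>%
  pivot_wider(names_from = stat, values_from = value)

# One step: ".value" sends the part after "/" to the output column names
one_step <- toy %>%
  pivot_longer(cols = -ID1, names_to = c("time", ".value"), names_sep = "/")

tidy
one_step
```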
ggplot2 creates a plot by layering graphical elements on top of a base plot
A base plot is created with the data
Additional layers are added to the base plot with the + sign
Create Base Plot
Add geometrical Elements
Customize Plot
A base plot is created using ggplot()
data: specifies data frame to construct the base plot
mapping: specifies the aesthetic mapping for the plot
aes(): creates the aesthetic mapping
geom_histogram()
geom_density()
geom_qq()
geom_qq_line()
geom_point()
geom_line()
geom_bin2d()
geom_density_2d()
geom_contour_filled()
geom_contour()
Regression Line
geom_smooth(method = "lm")
LOESS
geom_smooth()
Faceting: Faceting allows you to split the plot by a categorical variable
facet_grid()
facet_wrap()
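The layering idea described above can be sketched in a few lines, assuming the built-in mtcars data. The object names `base` and `p` are only illustrative:

```r
library(ggplot2)

# Base plot: data plus aesthetic mapping, no geometry drawn yet
base <- ggplot(data = mtcars, mapping = aes(x = mpg, y = hp))

# Layers are added with +: points, a regression line, then one panel per am level
p <- base +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(vars(am))

p
```

facet_wrap() lays panels out in a wrapped ribbon, while facet_grid() arranges them in rows and/or columns defined by the faceting variables.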
Grouping can be done within the mapping function: aes()
color
group
shape
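As a rough illustration of grouping inside aes(), the sketch below maps the same factor (vs from mtcars) to each of the three aesthetics listed above. The object names are hypothetical:

```r
library(ggplot2)

# color: groups get different colors and a legend
by_color <- ggplot(mtcars, aes(x = mpg, y = hp, color = factor(vs))) +
  geom_point()

# shape: groups get different point symbols and a legend
by_shape <- ggplot(mtcars, aes(x = mpg, y = hp, shape = factor(vs))) +
  geom_point()

# group: groups are fitted separately, but with no visual difference or legend
by_group <- ggplot(mtcars, aes(x = mpg, y = hp, group = factor(vs))) +
  geom_smooth(method = "lm")

by_color
```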
ggtitle()
xlab()
ylab()
The theme() function allows you to change any component in the plot
ggplot2 has several prebuilt themes:
theme_bw()
theme_void()
Legends can be adjusted using scale_XX_YY()
XX: the type of grouping factor (the aesthetic, such as color or shape)
YY: the type of variable (such as discrete or continuous)
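A minimal sketch of the scale_XX_YY() pattern, using scale_color_manual() (XX = color, YY = manual) to set both the colors and the legend text. The color choices and the legend title "Engine" are assumptions made for illustration:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = mpg, y = hp, color = factor(vs))) +
  geom_point() +
  # Named values tie each factor level to a color; labels follow level order (0, 1)
  scale_color_manual(
    values = c("0" = "steelblue", "1" = "darkorange"),
    labels = c("V-Shaped", "Straight"),
    name = "Engine"
  ) +
  # theme() adjusts non-data components, here the legend position
  theme(legend.position = "bottom")

p
```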

Base Plot
Scatter Plot
Add Regression Line
Split The Plot
Change the Labels
Adjust the Legend
Change the theme
ggplot(mtcars, 
       aes(mpg, hp, 
           color = factor(vs))) +
  geom_point()+
  geom_smooth(method = "lm") +
  facet_grid(cols = vars(am), 
    labeller = as_labeller(c(
      `1` = "Manual",
      `0` =  "Automatic"))) + 
  ggtitle("Mtcars Plot") + 
  xlab("Miles Per Gallon") + 
  ylab("Horse Power") +
  scale_color_discrete(
    labels = c("V-Shaped", "Straight"),
    name = "")
ggplot(mtcars, 
       aes(mpg, hp, 
           color = factor(vs))) +
  geom_point()+
  geom_smooth(method = "lm") +
  facet_grid(cols = vars(am), 
    labeller = as_labeller(c(
      `1` = "Manual",
      `0` =  "Automatic"))) + 
  ggtitle("Mtcars Plot") + 
  xlab("Miles Per Gallon") + 
  ylab("Horse Power") +
  scale_color_discrete(
    labels = c("V-Shaped", "Straight"),
    name = "") +
  theme_bw()
Google is your friend!
Practice!
Read the documentation!
Utilize Cheatsheets!