mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Isaac Quintanilla Salinas
UC Riverside
4/21/2022
Presentation:
www.inqs.info/files/hiss_3/hiss_3.html
RMD:
www.inqs.info/files/hiss_3/hiss_3.qmd
Website:
Email:
iquin002@ucr.edu
dplyr:
mutate(): adds new variables
select(): selects variables
filter(): filters data
if_else(): conditional function that returns one of 2 values
group_by(): groups a dataset by one or more variables
summarise(): provides summaries of data
tidyr: used to create tidy data
pivot_longer() (formerly gather()): transforms the data from wide to long
pivot_wider() (formerly spread()): transforms the data from long to wide
separate(): separates one variable into multiple variables
unite(): merges multiple variables into one variable
%>%
The pipe operator is the real power of the tidyverse.
It takes the output of one function and passes it as the first argument of the next function.
The tidyverse works best when data frames (tibbles) are used as inputs.
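To see what the pipe does, here is a minimal sketch comparing a nested call with its piped equivalent. It assumes dplyr is installed (dplyr re-exports %>% from magrittr); the 4-cylinder filter is only an illustration.

```r
library(dplyr)  # re-exports the magrittr pipe %>%

# Nested version: read from the inside out
nested <- head(filter(mtcars, cyl == 4), n = 3)

# Piped version: read left to right; each result becomes the first argument of the next call
piped <- mtcars %>%
  filter(cyl == 4) %>%
  head(n = 3)

identical(nested, piped)  # TRUE
```

Both forms run the same functions; the piped one simply lists the steps in the order they happen.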
We will work on manipulating the mtcars data set, which comes with base R.
The code is printed below:
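Printing the first rows is a quick way to check the data before changing it. A minimal sketch using base R's head(); the choice of 3 rows is arbitrary:

```r
# mtcars is part of base R's datasets package, so no library() call is needed
first_rows <- head(mtcars, n = 3)
first_rows
```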
mutate()
Adds a new variable to a data frame
Example:
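The example code is not reproduced in this extract. A minimal sketch of adding a single variable, assuming the log of mpg as the new variable (the name log_mpg is illustrative):

```r
library(dplyr)

with_log <- mtcars %>%
  mutate(log_mpg = log(mpg))   # new column is appended after the existing 11

head(with_log, n = 3)
```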
mutate()
Each argument adds a new variable
Example:
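The code for this example is not shown in the extract; a sketch that is consistent with the printed column names (so treat it as a reconstruction, not the original) passes two arguments to mutate() and therefore gets two new variables:

```r
library(dplyr)

two_new <- mtcars %>%
  mutate(log_mpg = log(mpg),   # first argument: first new variable
         log_hp  = log(hp))    # second argument: second new variable

head(two_new, n = 3)
```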
mpg cyl disp hp drat wt qsec vs am gear carb log_mpg log_hp
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 3.044522 4.700480
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 3.044522 4.700480
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 3.126761 4.532599
select()
Selects the variables to keep in the data frame
Example:
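A minimal sketch of both ways to use select(): naming the columns to keep, or dropping columns with a minus sign. The columns chosen here are illustrative.

```r
library(dplyr)

kept <- mtcars %>%
  select(mpg, hp)        # keep only these two columns, in this order

dropped <- mtcars %>%
  select(-vs, -am)       # keep everything except vs and am

head(kept, n = 3)
```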
filter()
Selects observations that satisfy a condition
Example:
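A minimal sketch of filter() with one and with two conditions (the conditions are illustrative). Separating conditions with commas keeps only rows where all of them are TRUE:

```r
library(dplyr)

four_cyl <- mtcars %>%
  filter(cyl == 4)              # rows where the condition is TRUE

efficient <- mtcars %>%
  filter(cyl == 4, mpg > 30)    # comma-separated conditions are combined with AND

nrow(four_cyl)   # 11
nrow(efficient)  # 4
```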
if_else()
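A vectorized conditional function: if_else(condition, true, false) returns the true value where the condition is met and the false value otherwise (in the example below, 1 and 0)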
Example:
mtcars %>%
mutate(log_mpg=log(mpg),log_hp=log(hp)) %>%
select(mpg,log_mpg,hp,log_hp) %>%
filter(log_hp<5) %>%
mutate(hilhp=if_else(log_hp>mean(log_hp),1,0)) %>%
head(n=3)
mpg log_mpg hp log_hp hilhp
Mazda RX4 21.0 3.044522 110 4.700480 1
Mazda RX4 Wag 21.0 3.044522 110 4.700480 1
Datsun 710 22.8 3.126761 93 4.532599 1
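Outside a data frame, if_else() works element by element on any vector. A minimal sketch with made-up values, showing that the results do not have to be 1 and 0; dplyr's if_else() does require the true and false values to have compatible types:

```r
library(dplyr)

x <- c(2, 5, 8)
labels <- if_else(x > 4, "high", "low")
labels  # "low" "high" "high"
```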
group_by()
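Groups the data frame by one or more variables, so that later operations (such as summarise()) are carried out within each group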
Example:
mtcars %>%
mutate(log_mpg=log(mpg),log_hp=log(hp)) %>%
select(mpg,log_mpg,hp,log_hp) %>%
filter(log_hp<5) %>%
mutate(hilhp=if_else(log_hp>mean(log_hp),1,0)) %>%
group_by(hilhp) %>%
head(n=3)
# A tibble: 3 × 5
# Groups: hilhp [1]
mpg log_mpg hp log_hp hilhp
<dbl> <dbl> <dbl> <dbl> <dbl>
1 21 3.04 110 4.70 1
2 21 3.04 110 4.70 1
3 22.8 3.13 93 4.53 1
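group_by() does not change the rows; it attaches grouping information that later verbs use. A minimal sketch, grouping by cyl as an illustration:

```r
library(dplyr)

by_cyl <- mtcars %>%
  group_by(cyl)

group_vars(by_cyl)   # "cyl"
n_groups(by_cyl)     # 3 (4, 6 and 8 cylinders)
nrow(by_cyl)         # 32: the data themselves are unchanged

ungrouped <- ungroup(by_cyl)  # remove the grouping when it is no longer needed
```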
summarise()
Provides summaries of the data, with one row per group when the data are grouped
Example:
mtcars %>%
mutate(log_mpg=log(mpg),log_hp=log(hp)) %>%
select(mpg,log_mpg,hp,log_hp) %>%
filter(log_hp<5) %>%
mutate(hilhp=if_else(log_hp>mean(log_hp),1,0)) %>%
group_by(hilhp) %>%
summarise(mean_mpg=mean(mpg),mean_lmpg=mean(log_mpg),
sd_mpg=sd(mpg),sd_lmpg=sd(log_mpg)) %>%
head(n=3)
# A tibble: 2 × 5
hilhp mean_mpg mean_lmpg sd_mpg sd_lmpg
<dbl> <dbl> <dbl> <dbl> <dbl>
1 0 29.7 3.38 3.85 0.133
2 1 22.0 3.08 3.46 0.148
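summarise() collapses each group to a single row. A minimal sketch grouping by cyl instead of hilhp (an illustrative choice); n() counts the rows in each group:

```r
library(dplyr)

cyl_summary <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg),
            n = n())   # number of cars in each group

cyl_summary  # one row per cylinder count
```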
We will work on converting data from wide to long using the functions in the tidyr package. For many statistical analyses, long data are necessary.
Use read_csv() (from the readr package) to read data_3_4.csv into an object called data1:
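A sketch of this step, assuming readr 2.0 or later (for show_col_types). The workshop file is read with the commented line; since data_3_4.csv is not included here, the runnable part writes a hypothetical two-row file with the same naming pattern (ID1, then "time/statistic" columns) to a temporary path. Its values are made up.

```r
library(readr)

# In the workshop (path assumed to be the working directory):
# data1 <- read_csv("data_3_4.csv")

# Self-contained stand-in: hypothetical file, made-up values
path <- tempfile(fileext = ".csv")
writeLines(c(
  "ID1,v1/mean,v1/sd,v2/mean,v2/sd",
  "A1,1.5,0.5,2.0,0.7",
  "A2,3.1,1.2,2.8,0.9"
), path)

data1 <- read_csv(path, show_col_types = FALSE)
names(data1)  # names containing "/" are kept as-is, so they need backticks in code
```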
[1] "ID1" "v1/mean" "v1/sd" "v1/median" "v2/mean" "v2/sd"
[7] "v2/median" "v3/mean" "v3/sd" "v3/median" "v4/mean" "v4/sd"
[13] "v4/median"
# A tibble: 6 × 13
ID1 v1/me…¹ `v1/sd` v1/me…² v2/me…³ `v2/sd` v2/med…⁴ v3/me…⁵ `v3/sd` v3/me…⁶
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Ad91… 3.11 2.86 4.50 1.93 3.21 3.27 2.65 -0.383 3.23
2 A9c5… 2.03 2.90 2.08 0.709 2.27 4.13 1.45 2.01 2.84
3 A28a… -0.415 2.42 2.47 2.38 -0.820 1.22 3.44 1.63 2.10
4 Aaf5… 1.25 2.24 3.71 4.00 0.456 4.32 1.54 0.789 4.08
5 A370… -0.984 0.972 3.73 2.19 -0.184 2.14 4.32 -0.804 5.38
6 Aea9… 1.42 1.34 2.35 2.77 4.16 -0.00874 -3.02 4.25 6.36
# … with 3 more variables: `v4/mean` <dbl>, `v4/sd` <dbl>, `v4/median` <dbl>,
# and abbreviated variable names ¹`v1/mean`, ²`v1/median`, ³`v2/mean`,
# ⁴`v2/median`, ⁵`v3/mean`, ⁶`v3/median`
# A tibble: 10 × 5
ID1 time mean sd median
<chr> <chr> <dbl> <dbl> <dbl>
1 Ad9131ee9 v1 3.11 2.86 4.50
2 Ad9131ee9 v2 1.93 3.21 3.27
3 Ad9131ee9 v3 2.65 -0.383 3.23
4 Ad9131ee9 v4 0.605 0.883 4.65
5 A9c5988ea v1 2.03 2.90 2.08
6 A9c5988ea v2 0.709 2.27 4.13
7 A9c5988ea v3 1.45 2.01 2.84
8 A9c5988ea v4 0.710 3.03 -0.0898
9 A28a5479d v1 -0.415 2.42 2.47
10 A28a5479d v2 2.38 -0.820 1.22
pivot_longer()
The pivot_longer() function takes the variables that repeat within an observation and places them in one variable:
data1 %>%
pivot_longer(cols=`v1/mean`:`v4/median`,names_to = "measurement",values_to = "value") %>%
head()
# A tibble: 6 × 3
ID1 measurement value
<chr> <chr> <dbl>
1 Ad9131ee9 v1/mean 3.11
2 Ad9131ee9 v1/sd 2.86
3 Ad9131ee9 v1/median 4.50
4 Ad9131ee9 v2/mean 1.93
5 Ad9131ee9 v2/sd 3.21
6 Ad9131ee9 v2/median 3.27
separate()
The separate() function splits one variable into multiple variables:
data1 %>%
pivot_longer(cols=`v1/mean`:`v4/median`,names_to = "measurement",values_to = "value") %>%
separate(col=measurement,into=c("time","stat"),sep="/") %>%
head()
# A tibble: 6 × 4
ID1 time stat value
<chr> <chr> <chr> <dbl>
1 Ad9131ee9 v1 mean 3.11
2 Ad9131ee9 v1 sd 2.86
3 Ad9131ee9 v1 median 4.50
4 Ad9131ee9 v2 mean 1.93
5 Ad9131ee9 v2 sd 3.21
6 Ad9131ee9 v2 median 3.27
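unite(), listed earlier but not demonstrated, does the reverse of separate(). A minimal sketch on a hypothetical two-row tibble shaped like the output above (made-up values):

```r
library(dplyr)
library(tidyr)

# Hypothetical long data with time and stat in separate columns
long <- tibble(
  ID1   = c("A1", "A1"),
  time  = c("v1", "v1"),
  stat  = c("mean", "sd"),
  value = c(1.5, 0.5)
)

# Paste time and stat back together with "/"; the original columns are dropped
rejoined <- long %>%
  unite(col = "measurement", time, stat, sep = "/")

rejoined$measurement  # "v1/mean" "v1/sd"
```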
pivot_wider()
The pivot_wider() function then converts long data back to wide data:
data1 %>%
pivot_longer(cols=`v1/mean`:`v4/median`,names_to = "measurement",values_to = "value") %>%
separate(col=measurement,into=c("time","stat"),sep="/") %>%
pivot_wider(names_from = stat,values_from = value) %>%
head()
# A tibble: 6 × 5
ID1 time mean sd median
<chr> <chr> <dbl> <dbl> <dbl>
1 Ad9131ee9 v1 3.11 2.86 4.50
2 Ad9131ee9 v2 1.93 3.21 3.27
3 Ad9131ee9 v3 2.65 -0.383 3.23
4 Ad9131ee9 v4 0.605 0.883 4.65
5 A9c5988ea v1 2.03 2.90 2.08
6 A9c5988ea v2 0.709 2.27 4.13
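As an alternative (not the workshop's approach), pivot_longer() can split the column names itself and reach the same shape in one step. A sketch on a hypothetical tibble using data1's naming pattern (made-up values): names_sep splits each name at "/", and the special ".value" entry tells pivot_longer() that the second part (mean, sd) names the output columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data with "time/statistic" column names
wide <- tibble(
  ID1       = c("A1", "A2"),
  `v1/mean` = c(1.5, 3.1),
  `v1/sd`   = c(0.5, 1.2),
  `v2/mean` = c(2.0, 2.8),
  `v2/sd`   = c(0.7, 0.9)
)

long <- wide %>%
  pivot_longer(cols = -ID1, names_to = c("time", ".value"), names_sep = "/")

long  # columns: ID1, time, mean, sd
```

This replaces the pivot_longer() + separate() + pivot_wider() chain when the goal is one column per statistic.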
ggplot2 creates a plot by layering graphical elements on top of a base plot.
A base plot is created with the data.
Additional layers are added to the base plot with the + sign.
Create the base plot
Add geometric elements
Customize the plot
A base plot is created using ggplot():
data: specifies the data frame used to construct the base plot
mapping: specifies the aesthetic mapping for the plot
aes(): creates the mapping
Common geometric elements:
geom_histogram()
geom_density()
geom_qq()
geom_qq_line()
geom_point()
geom_line()
geom_bin2d()
geom_density_2d()
geom_contour_filled()
geom_contour()
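The first two steps can be sketched as follows, assuming ggplot2 is installed; the histogram of mpg and binwidth = 2 are illustrative choices. Creating the objects does not draw anything; printing hist_plot would.

```r
library(ggplot2)

# Step 1: base plot; sets the data and the aesthetic mapping but has no layers yet
base <- ggplot(data = mtcars, mapping = aes(x = mpg))

# Step 2: add a geometric element with +; binwidth is in miles per gallon
hist_plot <- base + geom_histogram(binwidth = 2)

length(base$layers)       # 0
length(hist_plot$layers)  # 1
```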
Regression line: geom_smooth(method = "lm")
LOESS: geom_smooth() (the default smoother when there are fewer than 1,000 observations)
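A sketch overlaying both smoothers on a scatter plot (wt against mpg, the colors, and se = FALSE are illustrative). Passing formula = y ~ x explicitly avoids the message geom_smooth() otherwise prints about the formula it chose.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  # straight least-squares regression line
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
  # LOESS curve
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, color = "blue")

length(p$layers)  # 3: points, regression line, LOESS curve
```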
Faceting splits the data into subsets by a categorical variable and draws one panel per subset:
facet_grid()
facet_wrap()
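A minimal sketch of facet_wrap(), with one panel per number of cylinders (cyl and nrow = 1 are illustrative choices):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(mpg, hp)) +
  geom_point() +
  facet_wrap(vars(cyl), nrow = 1)   # panels for 4, 6 and 8 cylinders, side by side

class(p$facet)[1]  # "FacetWrap"
```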
Grouping can be done within the mapping function aes(), using arguments such as:
color
group
shape
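A sketch mapping two grouping variables at once inside aes(), assuming ggplot2; wrapping the numeric codes in factor() makes them discrete groups, so each level gets its own color or shape. The variables chosen are illustrative.

```r
library(ggplot2)

# color by number of cylinders, shape by transmission (0 = automatic, 1 = manual)
m <- aes(x = mpg, y = hp, color = factor(cyl), shape = factor(am))

p <- ggplot(mtcars, m) +
  geom_point()

names(m)  # aes() standardizes "color" to "colour"
```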
Titles and axis labels:
ggtitle()
xlab()
ylab()
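The same three labels can also be set in a single labs() call. A minimal sketch (the plot and label text are illustrative):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(mpg, hp)) +
  geom_point() +
  # equivalent to ggtitle("Mtcars Plot") + xlab("Miles Per Gallon") + ylab("Horse Power")
  labs(title = "Mtcars Plot", x = "Miles Per Gallon", y = "Horse Power")

p$labels$title  # "Mtcars Plot"
```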
The theme() function allows you to change the non-data components of the plot, such as text, backgrounds, grid lines, and the legend position.
ggplot2 has several prebuilt themes:
theme_bw()
theme_void()
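A sketch combining a prebuilt theme with theme() adjustments (the legend position and bold title are illustrative). Settings in a theme() call added after theme_bw() override the prebuilt values.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(mpg, hp, color = factor(vs))) +
  geom_point() +
  ggtitle("Mtcars Plot") +
  theme_bw() +                                 # prebuilt theme first
  theme(legend.position = "bottom",            # then override individual components
        plot.title = element_text(face = "bold"))

p$theme$legend.position  # "bottom"
```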
Legends can be adjusted using the scale_XX_YY() functions:
XX: the aesthetic used for grouping (e.g., color, shape)
YY: the type of variable (e.g., discrete, continuous)
Base Plot
Scatter Plot
Add Regression Line
Split the Plot
Change the Labels
Adjust the Legend
Change the Theme
ggplot(mtcars,
aes(mpg, hp,
color = factor(vs))) +
geom_point()+
geom_smooth(method = "lm") +
facet_grid(cols = vars(am),
labeller = as_labeller(c(
`1` = "Manual",
`0` = "Automatic"))) +
ggtitle("Mtcars Plot") +
xlab("Miles Per Gallon") +
ylab("Horse Power") +
scale_color_discrete(
labels = c("V-Shaped", "Straight"),
name = "")
ggplot(mtcars,
aes(mpg, hp,
color = factor(vs))) +
geom_point()+
geom_smooth(method = "lm") +
facet_grid(cols = vars(am),
labeller = as_labeller(c(
`1` = "Manual",
`0` = "Automatic"))) +
ggtitle("Mtcars Plot") +
xlab("Miles Per Gallon") +
ylab("Horse Power") +
scale_color_discrete(
labels = c("V-Shaped", "Straight"),
name = "") +
theme_bw()
Google is your friend!
Practice!
Read the documentation!
Utilize Cheatsheets!