The goal of this lab is to give you more practice with scripting, R packages, and basic statistics functions.
You will need to submit your answers from Section 5.2 to Canvas.
For this lab, we recommend using an R script to analyze the data. As a reminder, the code you write here can be reused for HW 1. You can use the R script provided on Canvas or create your own.¹
Also remember to follow good scripting practices: consistent object naming, loading R packages at the top of the script, and clear commenting.
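One possible layout for such a script is sketched below. It is an assumption about structure, not a required template: the section labels are placeholders, `library(stats)` stands in for the package you would actually load (here, `palmerpenguins`), and R's built-in `airquality` data set is used only so the sketch runs on its own.

```r
# ==== Lab 1B script (hypothetical skeleton) ====

# ---- Packages: load every package once, at the top ----
library(stats)  # placeholder; in this lab you would load palmerpenguins here

# ---- Data ----
# Built-in data set, used here only so the sketch is self-contained
air_data <- na.omit(airquality)

# ---- Problem 1 (example label): mean ozone level ----
# snake_case names describe what each object holds
ozone_mean <- mean(air_data$Ozone)
ozone_mean
```

Grouping the code under comment headers that match the problem numbers makes it easy to find each answer later.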
R packages are used to extend the functionality of R. Many R package developers also release data to the public through their packages; the data is saved as an RData file that can be easily accessed. For this section, you will install the palmerpenguins package from CRAN.
There are two ways to install an R package from CRAN: via the console or via RStudio (recommended). You can use either method. If you decide to install the palmerpenguins package via the console, use the following code:
install.packages("palmerpenguins")
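A package only needs to be installed once, so rerunning `install.packages()` every time the script runs is unnecessary. A minimal sketch, assuming you want the script to install a package only when it is missing, uses `requireNamespace()`, which returns `FALSE` instead of raising an error when a package is not installed. The helper name `ensure_package` is hypothetical.

```r
# Hypothetical helper: install a CRAN package only when it is missing
ensure_package <- function(pkg) {
  # requireNamespace() returns FALSE (rather than an error) if pkg is absent
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package is now available
  invisible(requireNamespace(pkg, quietly = TRUE))
}
```

You would then call `ensure_package("palmerpenguins")` before `library(palmerpenguins)`.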
Before you can access the data, you need to load the package. Use the following code to load it:
library(palmerpenguins)
The data set is named penguins. Use the head function, head(penguins), to view the first few rows of the data set. The output is provided below:
## # A tibble: 6 x 8
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…             NA            NA               NA          NA <NA>
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## 6 Adelie  Torge…           39.3          20.6              190        3650 male
## # … with 1 more variable: year <int>
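`head()` returns the first six rows by default; its `n` argument changes that, and `tail()` returns the last rows instead. The sketch below uses R's built-in `iris` data frame as a stand-in so it runs without palmerpenguins; with the lab data you would pass `penguins` instead.

```r
# head() and tail() on a built-in data frame (stand-in for penguins)
first_rows  <- head(iris)          # first 6 rows (the default)
first_three <- head(iris, n = 3)   # first 3 rows
last_two    <- tail(iris, n = 2)   # last 2 rows

# dim() gives the number of rows and columns; names() lists the variables
dim(iris)
names(iris)
```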
If you look deeper into the penguins data set, you will notice that a few observations have missing values. Therefore, we will eliminate these observations by creating a new data set called new_penguin. Use the na.omit function to remove observations with missing values:
new_penguin <- na.omit(penguins)
new_penguin
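To check what `na.omit()` removes, you can count the missing values in each column and compare row counts before and after. The sketch uses R's built-in `airquality` data set, which also contains missing values, as a stand-in for `penguins`; `complete.cases()` flags exactly the rows that `na.omit()` keeps.

```r
# Count missing values in each column
colSums(is.na(airquality))

# Drop every row that has at least one NA
air_complete <- na.omit(airquality)

# Rows before and after; complete.cases() is TRUE for rows with no NA
nrow(airquality)
nrow(air_complete)
sum(complete.cases(airquality))
```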
Using the new_penguin data set, extract each of the following variables into a separate vector:
species
island
sex
bill_length_mm
bill_depth_mm
flipper_length_mm
body_mass_g
As an example, the code below extracts the variable year and stores it in a new vector called penguins_year:
penguins_year <- new_penguin$year
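A column can also be extracted with `[[`, which takes the column name as a string. The short sketch below uses the built-in `iris` data frame as a stand-in (the vector names are illustrative) and shows that a factor column stays a factor while a measurement column is numeric, which matters when choosing between the numeric functions and `table()`.

```r
# Extract columns as vectors, two equivalent ways
iris_species      <- iris$Species           # factor vector
iris_petal_length <- iris[["Petal.Length"]] # numeric vector

# Check what kind of vector each one is
class(iris_species)
class(iris_petal_length)
length(iris_petal_length)
```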
R has built-in functions for calculating basic statistics for vectors. The tables below list a few of them.
Function | Description |
---|---|
max | Maximum |
min | Minimum |
range | Range (min and max) |
mean | Mean |
median | Median |
sd | Standard Deviation |
sum | Sum |
Function | Description |
---|---|
table | Obtain Frequencies |
prop.table | Obtain Relative Frequencies |
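A quick check of the numeric functions on a small made-up vector, with values chosen only so the results are easy to verify by hand. Note that `range()` returns the minimum and maximum rather than their difference, and that these functions return `NA` when the vector contains missing values unless you set `na.rm = TRUE`, which is one reason to work from `new_penguin`.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # made-up example values

max(x)     # 9
min(x)     # 2
range(x)   # 2 9 (minimum and maximum, not max - min)
mean(x)    # 5
median(x)  # 4.5
sd(x)      # sample standard deviation, sqrt(32 / 7), about 2.138
sum(x)     # 40

# Missing values propagate unless na.rm = TRUE
y <- c(x, NA)
mean(y)                # NA
mean(y, na.rm = TRUE)  # 5
diff(range(x))         # 7, the range as a single number
```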
Use these functions to obtain the required statistics, and record your answers in the Lab 1B quiz on Canvas.
For the numeric vectors, obtain the following statistics: mean, median, standard deviation, and sum. For example:
mean(penguins_year)
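Because the same four statistics are needed for several numeric vectors, you might compute them together. A minimal sketch with `sapply()`, using two numeric columns of the built-in `iris` data as stand-ins for the penguin measurement vectors; the helper name `describe_numeric` is hypothetical, and separate calls such as the one above work just as well.

```r
# Hypothetical helper: the four statistics the lab asks for
describe_numeric <- function(v) {
  c(mean = mean(v), median = median(v), sd = sd(v), sum = sum(v))
}

# Stand-in numeric vectors (in the lab: bill_length_mm, body_mass_g, ...)
numeric_vectors <- list(
  sepal_length = iris$Sepal.Length,
  petal_length = iris$Petal.Length
)

# Result: one column per vector, one row per statistic
stats_table <- sapply(numeric_vectors, describe_numeric)
round(stats_table, 2)
```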
For the categorical vectors (species, island, and sex, which are stored as factors), obtain the frequencies with the table function. For example:
table(penguins_year)
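`table()` gives counts, and `prop.table()` turns a table of counts into proportions; it expects a table, so pass it the result of `table()` rather than the raw vector. The sketch uses the built-in `iris` factor `Species` (50 flowers per species) as a stand-in for penguin factors such as `species`, `island`, and `sex`.

```r
# Frequencies of each level of a factor
species_counts <- table(iris$Species)
species_counts

# Relative frequencies: each count divided by the total
species_props <- prop.table(species_counts)
round(species_props, 3)
```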
¹ Remember to comment your script so it is clear which problem each piece of code answers.