August 16, 2022
Below is a basic installation guide. Depending on your operating system, you may need to install other R packages¹ to have R running smoothly.
Go to the R website and find the “Getting Started” section. This is usually the first thing you will see on the website.
Click on the ‘download R’ link in this section. It will take you to a page called ‘CRAN Mirrors’.
Select the first mirror: https://cloud.r-project.org/
Go to the ‘Download and Install R’ section and choose your operating system.
R is a language that tells a computer what to do. It is not exactly an application that you can click on and start conducting statistical analysis. For example, I can write up an R script in a text application, and then run it in a terminal to get the results. Therefore, R can be used in multiple ways on your computer.
When you install R, it comes with an R GUI (graphical user interface) where you can write R code and analyze data. The R GUI is perfectly capable of doing everything you need to do; you don’t have to run anything in a terminal. However, it is limited in the features it provides and looks archaic. RStudio is an IDE (integrated development environment) that enhances your R experience. It provides a console (terminal), a script editor (where you write your R code), an environment window (which tells you what you have created), and much more. While you can do everything in the R GUI, I highly recommend installing RStudio! To install RStudio, go to its website: www.rstudio.com.
Installing RStudio is much simpler than installing R. However, you must install R before you can install RStudio. Here are some of the basic steps to install RStudio.
Go to the website: https://www.rstudio.com/products/rstudio/download/#download
Scroll down to ‘All Installers’.
Click on your operating system, download and install.
While most of your statistical analysis will be done with R functions, it is important to at least have an idea of what is going on. Additionally, we will cover other topics that you may or may not need to know. The topics we will cover are:
Basic calculations in R
Types of Data
R Objects
This section focuses on the basic calculations that you can do in R. Essentially, we look at how R can be used as a calculator. This is done by using different operators in R. An operator is a symbol that tells R to do something. Some common operators are +, -, and *, which correspond to addition, subtraction, and multiplication.
To add numbers in R, all you need is the + operator. For example, 2+2=4. When you type 2+2 in R, the console prints [1] 4.
When you ask R to perform a task, it prints out the result of the task. As we can see above, R prints out the number 4.
To add more than two numbers, simply chain them together with the + operator.
This provides the number 6.
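The addition examples can be typed into the console as a quick check. This is a minimal sketch: the input 1+2+3 is an assumed example of adding more than two numbers (the original input is not shown here), chosen because it also gives 6.

```r
# Adding two numbers
2 + 2        # prints [1] 4

# Adding more than two numbers (assumed example)
1 + 2 + 3    # prints [1] 6
```

The [1] at the start of each result only marks the position of the first value that R printed.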
To subtract numbers, use the - operator. Try 4-2:
Try 4-6-4:
Notice that you get a negative number.
Now try 4+4-2+8:
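The subtraction exercises above can be run as written. The sketch below simply collects them, with the value each one is expected to print.

```r
4 - 2            # prints [1] 2
4 - 6 - 4        # prints [1] -6, a negative number
4 + 4 - 2 + 8    # prints [1] 14; + and - are evaluated left to right
```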
To multiply numbers, use the * operator. Try 4*4:
To divide numbers, use the / operator. Try 9/3:
To raise a number to the power of another number, use the ^ operator. Try 2^5:
If you want to take e to the power 2, use the exp() function. Try exp(2):
To take the n-th root of a value, use the ^ operator together with the / operator. For example, to take the 5th root of 32, type 32^(1/5):
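The multiplication, division, exponent, and root exercises can be sketched together as follows. The comments give the values R should print; the result of exp(2) is shown rounded.

```r
4 * 4       # multiplication: 16
9 / 3       # division: 3
2^5         # 2 to the power of 5: 32
exp(2)      # e squared, approximately 7.389
32^(1/5)    # 5th root of 32: 2
```

The parentheses in 32^(1/5) matter: without them, R would compute 32^1 first and then divide by 5.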
To take the natural logarithm of a value, use the log() function. Try log(5):
If you want to take the logarithm with a different base, use the log() function with the base argument. We will discuss this more in section 7 of this chapter.
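As a brief preview of the base argument, here is a minimal sketch. The inputs 8 and 100 are assumed examples, picked only because their logarithms are whole numbers.

```r
log(5)                 # natural logarithm, approximately 1.609
log(8, base = 2)       # base-2 logarithm of 8: 3
log(100, base = 10)    # base-10 logarithm of 100: 2
```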
Another important part of R is comparing numbers. When you compare two numbers, R will tell you if that is true or false. We will talk about some of the basic comparisons and their operators.
To check if one number is greater than or less than another number, use the > or < operators. Try 5>4:
Notice that R states it’s true. It evaluates the expression and tells you if it’s true or not. Try 5<4:
Notice that R tells you it is false.
To check if one number is greater than or equal to, or less than or equal to, another number, use the >= or <= operators. Try 5>=5:
Try 5>=4:
Try 5<=4:
To check if two numbers are equal to each other, use the == operator. Try 3==3:
Try 4==3:
To check if two numbers are not equal to each other, use the != operator. Try 3!=4:
Try 3!=3:
You may be asking why use != instead of ==, since they provide similar results. The reason is that you may need the ‘TRUE’ output for analysis. One is only true when the numbers are equal, while the other is only true when they are not equal.
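The comparison exercises above are collected in the sketch below, with the logical value each one should return.

```r
5 > 4     # TRUE
5 < 4     # FALSE
5 >= 5    # TRUE
5 >= 4    # TRUE
5 <= 4    # FALSE
3 == 3    # TRUE
4 == 3    # FALSE
3 != 4    # TRUE
3 != 3    # FALSE
```

Note that == (two equal signs) tests equality; a single = is used for assignment and for naming arguments.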
The last operator we will discuss is the help operator ?. If you want to know more about anything we talked about, you can type ? in front of a function name and a help page will pop up in your browser or in RStudio’s ‘Help’ tab. For example, you can type ?Arithmetic or ?Comparison to review what we talked about. For other operators we didn’t talk about, use ?assignOps and ?Logic.
In R, the type of data, also known as its class, dictates how the programming works. For the most part, users will use the ‘numeric’, ‘logical’, ‘POSIX’, and ‘character’ data types. Other types of data you may encounter are ‘integer’, ‘complex’, and ‘raw’. These types of data are rarely used. To obtain more information on them, use the ? operator.
The numeric class is for data that are numbers. Almost every analysis that you run will be based on the numeric class. To check if you have a numeric class, use the is.numeric() function. For example, try is.numeric(5):
Notice that when you input a number into R, it automatically treats it as the numeric class. R assigns data to the class that they most likely need to be. This is great because you do not need to do anything on your end. However, if you need a different class, you will need to change it yourself.
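A minimal sketch of checking and changing a class is shown below. The as.numeric() conversion is a base R function that this section has not introduced; it is included here only as one assumed illustration of changing a class yourself.

```r
is.numeric(5)      # TRUE
class(5)           # "numeric"

# A number wrapped in quotes is character data, not numeric
is.numeric("5")    # FALSE

# as.numeric() converts it to the numeric class (illustrative assumption)
is.numeric(as.numeric("5"))    # TRUE
```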
The logical class contains data whose only values are ‘TRUE’ or ‘FALSE’. Sometimes the data are coded as 1 for ‘TRUE’ and 0 for ‘FALSE’. The data may also be coded as ‘T’ or ‘F’. To check if data belong to the logical class, use the is.logical() function. Try is.logical(3<4):
Remember the comparisons we ran in the previous section. The output of 3<4 is ‘TRUE’. Now R is checking whether that output is of the logical class. Since it is, R returns ‘TRUE’. Now try is.logical(3>4):
The output is ‘TRUE’ as well, even though the condition 3>4 is ‘FALSE’. Since the output of the comparison is of the logical data type, is.logical() returns ‘TRUE’.
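The point that is.logical() checks the class of a value, not whether the value is true, can be sketched like this. The last line is an added contrast, not part of the exercises above.

```r
3 < 4                # TRUE
is.logical(3 < 4)    # TRUE: the result of a comparison is logical

3 > 4                # FALSE
is.logical(3 > 4)    # TRUE: FALSE is still a logical value

is.logical(1)        # FALSE: a bare 1 is numeric, even if it encodes 'TRUE'
```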
The POSIX class is date-time data, where the data value has a time component. The POSIX class can be very complex in how it is formatted. If you would like to learn more, try ?POSIXct or ?POSIXlt. First, let’s run Sys.time() to check today’s date and time:
Now let’s check if it’s of the POSIX class. You can use the class() function to figure out which class it is. Try class(Sys.time()):
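A short sketch of the two steps is given below. The printed time depends on when and where the code is run, so no output is shown for it. The inherits() call is a base R function not covered in this section; it is added as one assumed way to test for a single class.

```r
Sys.time()           # prints the current date and time

# A date-time value carries two class labels
class(Sys.time())    # "POSIXct" "POSIXt"

# inherits() checks for one class at a time (illustrative assumption)
inherits(Sys.time(), "POSIXct")    # TRUE
```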
A character value is where the data values follow a string format. Examples of character values are letters, words, and even numbers. A character value is any value surrounded by quotation marks. For example, the phrase “Hello World!” is considered one character value. Another example is data coded with the actual words “yes” or “no”. To check if you have character data, use the is.character() function. Try is.character("Hello World!"):
Notice that the output says ‘TRUE’. Character values can also be created with single quotation marks. Try is.character('Hello World!'):
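The sketch below runs both exercises and adds two assumed contrasts showing that the quotation marks, not the content, decide whether a value is character data.

```r
is.character("Hello World!")    # TRUE
is.character('Hello World!')    # TRUE: single quotes work the same way

is.character("5")    # TRUE: a number in quotes is a character value
is.character(5)      # FALSE: without quotes it is numeric
```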
Integers are just whole numbers for the most part. To create an integer, type the letter ‘L’ after a number. To check if you are using integer data, use the is.integer() function. Try is.integer(5L):
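A minimal sketch of the difference the ‘L’ suffix makes is shown below; the comparison with a plain 5 is an added illustration.

```r
is.integer(5L)    # TRUE
class(5L)         # "integer"

is.integer(5)     # FALSE: without the L, R stores 5 as numeric
is.numeric(5L)    # TRUE: integers still count as numbers
```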
Complex numbers are data values with a real component and an imaginary component. The imaginary component is a number multiplied by \(i=\sqrt{-1}\). To create a complex number, use the complex() function. To check if a number is complex, use the is.complex() function. Try the following to create a complex number, complex(1,4,5):
Now try is.complex(complex(1,4,5)):
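In complex(1,4,5), the three positional arguments are the number of values to create, the real part, and the imaginary part. The sketch below shows the same call with the arguments named, which may be easier to read.

```r
complex(1, 4, 5)    # 4+5i

# The same number with the arguments spelled out
complex(length.out = 1, real = 4, imaginary = 5)    # 4+5i

is.complex(complex(1, 4, 5))    # TRUE
```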
You will probably never use raw data. I have never used raw data in R. To create a raw value, use the raw() or charToRaw() functions. Try charToRaw('Hello World!'):
To check if you have raw data, use the is.raw() function. Try is.raw(charToRaw('Hello World!')):
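For completeness, here is a small sketch of raw data. Each character is shown as its byte value in hexadecimal. The rawToChar() call is a base R function not mentioned above; it is added as an assumed illustration of converting back.

```r
charToRaw('Hello World!')
# [1] 48 65 6c 6c 6f 20 57 6f 72 6c 64 21

is.raw(charToRaw('Hello World!'))    # TRUE

# rawToChar() reverses the conversion (illustrative assumption)
rawToChar(charToRaw('Hello World!'))    # "Hello World!"
```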
The last data class in R is missing data, denoted as NA. Whenever you see NA in any analysis, it means that the data are missing. To check if you have missing data, use the is.na() function. Try is.na(NA):
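A minimal sketch of checking for missing data follows. The last two lines are an added illustration of why missing values matter: a calculation that involves NA also returns NA.

```r
is.na(NA)    # TRUE
is.na(5)     # FALSE

NA + 1           # NA: the result of a calculation with missing data is missing
is.na(NA + 1)    # TRUE
```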
R objects are what most statistical analysis is conducted on. An R object can be thought of as a container of data. For the most part, you will only use a data frame (or tibble) for your data analysis. However, it is always a good idea to have some basic understanding of the other R objects.
To create an R object, all we need to do is assign data to a variable. The variable is the name of the R object. It can be called almost anything, but you can only use alphanumeric characters, underscores, and periods, and the name must start with a letter (or a period not followed by a number). To assign a value to a variable, use the <- operator. This is known as left assignment; it looks like an arrow pointing left. Try assigning 9 to ‘x’ (x<-9):
To see if x contains 9, type x in the console:
Now x can be treated as data and we can perform data analysis on it. For example, try squaring it:
You can use any mathematical operation from the previous sections. Try some other operations and see what happens.
The output R prints out can be stored in a variable using the assignment operator, <-. Try storing x^3 in a variable called x_cubed:
To see what is stored in x_cubed, you can either type x_cubed in the console or use the print() function with ‘x_cubed’ inside the parentheses.
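The assignment steps described above can be sketched as one short script. The names x and x_cubed come from the text.

```r
# Assign 9 to x (left assignment)
x <- 9
x      # prints [1] 9

# x can now be used in calculations
x^2    # prints [1] 81

# Store the result of a calculation in a new variable
x_cubed <- x^3

# Both lines print [1] 729
x_cubed
print(x_cubed)
```

Note that an assignment on its own prints nothing; you have to type the variable name (or use print()) to see the value.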
A vector is a set of data values of a certain length. The R object x is considered a numeric vector (because it contains a number) of length 1. To check, try is.numeric(x) and is.vector(x):
Now let’s create a logical vector that contains 4 elements (have it follow this sequence: T,F,T,F) and assign it to y. To create a vector, use the c() function and type all the values, separating them with commas. Type y<-c(T,F,T,F):
Now, let’s see what y looks like. Type y:
Now let’s see if it’s a logical vector:
Fortunately, this vector is small enough to count how many elements it has, but what if the vector is really large? To find out how many elements a vector has, use the length() function. Try length(y):
The c() function allows you to combine any data type and as many values as you wish. The only condition of a vector is that all of its values must be the same data type.
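The vector exercises are sketched below. The final example, a vector called mixed, is a hypothetical addition: it shows what R does when the values passed to c() are not all the same type, which is to convert them to a common type.

```r
x <- 9
is.numeric(x)    # TRUE
is.vector(x)     # TRUE: a single number is a vector of length 1

y <- c(T, F, T, F)
y                # [1]  TRUE FALSE  TRUE FALSE
is.logical(y)    # TRUE
length(y)        # 4

# Hypothetical example: mixing types forces a common type
mixed <- c(1, "a", TRUE)
class(mixed)     # "character"
```

Writing TRUE and FALSE in full is generally safer than T and F, because T and F are ordinary variables that can be overwritten.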
A matrix can be thought of as a square or rectangular grid of data values. This grid can be constructed in any shape. Similar to vectors, a matrix must contain a single data type. The size of a matrix is usually denoted as \(n\times k\), where \(n\) represents the number of rows and \(k\) represents the number of columns. To get a rough idea of what a matrix may look like, type matrix(rep(1,12),nrow=4,ncol=3)³:
Notice that this is a \(4\times 3\) matrix. Each element in the matrix has the value 1. Now try matrix(rbinom(12,1,0.5),nrow=4,ncol=3)⁴:
Your matrix may look different, but that is to be expected. Notice that some elements in the matrix are 0’s and some are 1’s. Each element in a matrix can hold any value.
Constructing a matrix can be a bit difficult because the data values may need to be arranged in a certain way. Notice that I used the matrix() function to create the matrix. The examples above contain other components in the function that we will discuss later.
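The two matrices can be sketched as follows. The object names ones and coin_flips are hypothetical, and the call to set.seed() is an assumption added so that the random matrix is the same on every run; without it, your values will differ each time.

```r
# A 4 x 3 matrix where every element is 1
ones <- matrix(rep(1, 12), nrow = 4, ncol = 3)
ones
dim(ones)    # [1] 4 3 (rows, then columns)

# A 4 x 3 matrix of random 0's and 1's
set.seed(1)  # assumed seed, only for reproducibility
coin_flips <- matrix(rbinom(12, 1, 0.5), nrow = 4, ncol = 3)
coin_flips
```

By default, matrix() fills the grid column by column, which is why the arrangement of the input values matters.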
Matrices can be considered 2-dimensional blocks of numbers. An array is an n-dimensional block of numbers. While you may never need to use an array for data analysis, it may come in handy when programming by hand. To create an array, use the array() function. Below is an example of a \(3 \times 3 \times 3\) array, with the numbers 1, 2, and 3 representing the 3rd dimension, stored in an R object called first_array⁵.
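The original code for first_array is not shown here, so the following is one assumed way to build an array matching that description: nine 1's, nine 2's, and nine 3's arranged into three 3-by-3 layers.

```r
# Wrapping the assignment in parentheses stores the array and prints it
(first_array <- array(rep(1:3, each = 9), dim = c(3, 3, 3)))

dim(first_array)      # [1] 3 3 3
first_array[, , 2]    # the second layer: a 3 x 3 matrix of 2's
```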
Data frames can be thought of as the data sets that we normally see in other software. You can think of a data frame as an Excel spreadsheet. However, you cannot easily change the values other than by coding the changes. In a more general sense, a data frame is just a collection of labeled vectors. To get an idea of what a data frame looks like, try head(iris):
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
The head() function tells R to print only the first 6 rows of the data frame.
Now try tail(iris):
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
145 6.7 3.3 5.7 2.5 virginica
146 6.7 3.0 5.2 2.3 virginica
147 6.3 2.5 5.0 1.9 virginica
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica
The tail() function provides the last 6 rows of the data frame.
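To illustrate the idea that a data frame is a collection of labeled vectors, here is a minimal sketch. The data frame students and its values are entirely hypothetical; iris is the built-in data set used above.

```r
# Hypothetical data: each column is a labeled vector of the same length
students <- data.frame(
  name   = c("Ana", "Ben", "Cy"),
  score  = c(90, 85, 77),
  passed = c(TRUE, TRUE, FALSE)
)
students

# head() and tail() accept the number of rows to show
head(iris, 3)    # first 3 rows instead of the default 6
nrow(iris)       # 150 rows in total
```

Unlike a vector or matrix, a data frame can hold a different data type in each column.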
To me, a list is just a container in which you can store practically anything. It is composed of elements, where each element contains an R object. For example, the first element of a list may contain a data frame, the second element may contain a vector, and the third element may contain another list. It is just a way to store things.
To create a list, use the list() function. Create a list whose first element is the mtcars data set, whose second element is a vector of zeros of length 4, and whose third element is a \(3 \times 3\) identity matrix⁶. Store the list in an object called list_one:
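One possible way to write this is sketched below. Using rep() for the zeros and diag() for the identity matrix is an assumption; the original code is not shown, and other constructions would give the same list.

```r
list_one <- list(
  mtcars,       # first element: a data frame
  rep(0, 4),    # second element: a vector of four zeros
  diag(3)       # third element: a 3 x 3 identity matrix
)

length(list_one)    # 3
list_one[[2]]       # [1] 0 0 0 0
```

Double square brackets, as in list_one[[2]], pull out a single element of a list by its position.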
Type list_one to see what pops out:
[[1]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
[[2]]
[1] 0 0 0 0
[[3]]
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
Each element in the list is labeled with a number. It is more useful to have the elements named. An element is named by typing the name (quotes are optional), followed by the = symbol, before your object in the list() function (for example, mtcars=mtcars).
Here I am creating an object called list_one, where the first element is mtcars labeled mtcars, the second element is a vector of zeros labeled vector, and the last element is the identity matrix labeled identity.
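A sketch of the named version is given below, using the labels from the text. As before, rep() and diag() are assumed ways of building the vector and the identity matrix.

```r
list_one <- list(
  mtcars   = mtcars,
  vector   = rep(0, 4),
  identity = diag(3)
)

names(list_one)    # "mtcars" "vector" "identity"

# Named elements can be pulled out with the $ operator
list_one$vector    # [1] 0 0 0 0
```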
Now create a new list called list_two and store list_one labeled as list_one and first_array labeled as array.
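A self-contained sketch of this step is shown below. It rebuilds list_one and first_array first, using the same assumed constructions as earlier, so that the block can be run on its own.

```r
# Rebuild the two objects (assumed constructions)
list_one <- list(mtcars = mtcars, vector = rep(0, 4), identity = diag(3))
first_array <- array(rep(1:3, each = 9), dim = c(3, 3, 3))

# A list can hold another list as one of its elements
list_two <- list(list_one = list_one, array = first_array)

names(list_two)             # "list_one" "array"
list_two$list_one$vector    # [1] 0 0 0 0
```

Printing list_two shows every element with its full path, such as $list_one$mtcars, which is how nested elements are labeled in the output.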
$list_one
$list_one$mtcars
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
$list_one$vector
[1] 0 0 0 0
$list_one$identity
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
$array
, , 1
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 1 1
[3,] 1 1 1
, , 2
[,1] [,2] [,3]
[1,] 2 2 2
[2,] 2 2 2
[3,] 2 2 2
, , 3
[,1] [,2] [,3]
[1,] 3 3 3
[2,] 3 3 3
[3,] 3 3 3
1. R packages can be thought of as additional software that increases the capabilities of R.
2. There may be challenges using R and RStudio on a Mac. Follow an online tutorial to properly install these packages. In the past, many of my students had trouble installing R packages.
3. The function rep() creates a vector by repeating a value for a certain length. rep(1,12) creates a vector of length 12 with each element being 1.
4. The rbinom() function generates binomial random variables and stores them in a vector. rbinom(12,1,0.5) creates 12 random binomial numbers with parameters \(n=1\) and \(p=0.5\).
5. Notice the code is surrounded by parentheses. This tells R to store the array and print out the results. You can surround code with parentheses every time you create an object to also print what is stored.
6. An identity matrix is a matrix where the diagonal elements are 1 and the non-diagonal elements are 0.