The goal of this assignment is to continue our exploration of plots, hypothesis testing, and linear regression. Please use the RMD file FIRST_LAST_HW_04.Rmd to answer the questions.
Delete code or answer chunks as necessary. You will submit both the RMD file and the corresponding HTML file from the knitted document.
Note: you may need to search the internet to find answers for some of the questions.
You will be graded as follows:
(20 pts) Have you completed the assignment in its entirety?
(20 pts) Are your responses correct for the subset of randomly graded questions?
(5 pts) CODE DOCUMENTATION. Add comments to nearly every line explaining what the line of code is doing.
(5 pts) Submitted Knitted Document
To be completed in Lab 4A. Below is a description of what is necessary from the lab.
Load the data set df3.csv and merge it with the previous data set.
Using the newly merged data set, create a scatter plot and fit a trend line to the data. Is the association in the data positive or negative?
Using the newly merged data set, create a scatter plot and fit a trend line for each group. What is the association for each group?
For each group, fit a linear regression model and interpret the slope. The units for x and y are liters.
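The Lab 4A steps above (merge, scatter plot with a trend line, one regression per group) can be sketched as follows. This is only an illustration under loud assumptions: the data frames `previous` and `df3`, the key column `id`, and the columns `x`, `y`, and `group` are hypothetical stand-ins, since the real contents of df3.csv are not described here.

```r
# Hypothetical stand-in data; the real lab files will have different values
# and possibly different column names.
set.seed(1)
previous <- data.frame(id = 1:20, x = seq(1, 10, length.out = 20))
df3 <- data.frame(id = 1:20, group = rep(c("A", "B"), times = 10))
df3$y <- ifelse(df3$group == "A", 2, -1) * previous$x + rnorm(20, sd = 0.5)

# Merge the two data sets on their shared key column
merged <- merge(previous, df3, by = "id")

# Scatter plot of all points with a single overall trend line
plot(y ~ x, data = merged, col = ifelse(merged$group == "A", "blue", "red"))
abline(lm(y ~ x, data = merged))

# Fit one linear model per group and collect the slope of each
fits <- lapply(split(merged, merged$group), function(d) lm(y ~ x, data = d))
slopes <- sapply(fits, function(f) unname(coef(f)["x"]))
slopes
```

Each slope is read as the expected change in y (liters) for a one-liter increase in x within that group. With ggplot2, the same plots are commonly built from `geom_point()` and `geom_smooth(method = "lm")`.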
To be completed in Lab 4B. Below is a description of what is necessary from the lab.
The 2-sample independent t-test results.
The dreams data.
Conduct a paired t-test on the dreams data using the variables extra and group. Is there a significant difference between the groups? You can assume the vectors are ordered correctly.
Compare the results from the paired t-test and the 2-sample independent t-test. Are the results the same?
The dreams data was constructed from the sleep data in R. Read the help documentation for the sleep data. Determine whether the data are independent or dependent.
Look at the confidence intervals for both tests. Which is wider? Postulate why one is wider than the other.
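A minimal sketch of running both tests side by side is given below. It assumes R's built-in sleep data as a stand-in for dreams (the text above says dreams was constructed from it) and assumes the rows are ordered so that the i-th value in each group belongs to the same subject. The two-vector form of `t.test()` is used because recent R versions no longer accept `paired = TRUE` in the formula interface.

```r
# Split the response by group; sleep$group is a factor with levels "1" and "2"
g1 <- sleep$extra[sleep$group == "1"]
g2 <- sleep$extra[sleep$group == "2"]

# Paired t-test: treats the i-th elements of g1 and g2 as one matched pair
paired_res <- t.test(g1, g2, paired = TRUE)

# Independent 2-sample t-test (Welch by default): ignores the pairing
indep_res <- t.test(g1, g2)

# Print both results, then put the two confidence intervals side by side
paired_res
indep_res
rbind(paired = paired_res$conf.int, independent = indep_res$conf.int)
```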
The faithful data set in R provides information on waiting time (waiting, independent) and duration of eruption (eruptions, dependent) for the Old Faithful geyser in Yellowstone National Park. Fit a linear regression model and interpret the slope term.
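As a rough sketch, a simple linear regression on the built-in faithful data can be fit with `lm()`, with the dependent variable on the left of the formula. The object names `fit_faithful` and `slope` are illustrative only.

```r
# Regress eruption duration (dependent) on waiting time (independent)
fit_faithful <- lm(eruptions ~ waiting, data = faithful)

# Extract the slope: the estimated change in eruption duration
# for each additional unit of waiting time
slope <- coef(fit_faithful)[["waiting"]]
slope

# Full model summary, including standard errors and p-values
summary(fit_faithful)
```

Check `?faithful` for the units of each variable before writing the interpretation.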
The beaver1 data set in R provides temperature data for a beaver at different time points. Create a time-series plot (line plot) showing the temperature trend of the beaver.
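One possible base-R sketch of such a line plot uses the built-in beaver1 data frame. It assumes, per the data set's help page, that `time` is stored as a clock reading in hhmm form and that `day` is the day of the year, so the two are combined into elapsed minutes before plotting. The variable name `elapsed_min` is illustrative.

```r
# Convert day + hhmm clock time into minutes since the first recorded day began
elapsed_min <- with(beaver1,
                    (day - min(day)) * 1440 +  # whole days, in minutes
                    (time %/% 100) * 60 +      # hours part of hhmm
                    time %% 100)               # minutes part of hhmm

# Line plot of body temperature against elapsed time in hours
plot(elapsed_min / 60, beaver1$temp, type = "l",
     xlab = "Hours since start of first day",
     ylab = "Body temperature",
     main = "beaver1 temperature over time")
```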
The trees data set in R provides measurements of the diameter (Girth, independent) and height (Height, dependent) of 31 felled black cherry trees. Fit a linear regression model between the two variables and interpret the slope term.
The rock data set in R provides measurements of different rock samples from a petroleum reservoir. Run a correlation test between the variables area and shape.
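A correlation test on two numeric columns can be sketched with `cor.test()`, which defaults to Pearson's correlation. Whether Pearson is the intended method here is an assumption; the `method` argument also accepts "kendall" and "spearman".

```r
# Pearson correlation test between pore area and shape in the rock data
cor_res <- cor.test(rock$area, rock$shape)

# The printed result includes the estimate, its confidence interval,
# and the p-value for the null hypothesis of zero correlation
cor_res
```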
The gehan data set in the R package MASS provides clinical trial data on remission times for leukemia patients. Conduct a hypothesis test comparing the treatment and control groups (treat) on remission times (time). Interpret the results.
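One way to set up this comparison is a 2-sample independent t-test through the formula interface, sketched below. This is an assumption about the intended test, not the only reasonable choice: the data set also contains a censoring indicator (`cens`), which this sketch ignores.

```r
# MASS ships with standard R installations and provides the gehan data
library(MASS)

# Welch 2-sample t-test of remission time by treatment group
gehan_res <- t.test(time ~ treat, data = gehan)

# Inspect the group means, the confidence interval, and the p-value
gehan_res
```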
Simple linear regression is used when the outcome is normally distributed. What type of regression is used when the outcome follows a Poisson distribution?
Using ggplot2, what is the function you need to plot a map?