Instructions

The goal of this assignment is to continue our exploration of plots, hypothesis testing, and linear regression. Please use the RMD file FIRST_LAST_HW_04.Rmd to answer the questions. Delete code or answer chunks as necessary. You will submit both the RMD file and the corresponding HTML file produced by knitting the document.

Note: you may need to search the internet to find answers for some of the questions.

You will be graded as follows:

  • Do your R chunks run (some errors are acceptable in this assignment)?
  • Have you completed the assignment in its entirety?
  • Have you followed the instructions carefully?
  • Have you responded to the questions correctly?

Grading

  • (20 pts) Have you completed the assignment in its entirety?

  • (20 pts) Are your responses correct for the subset of randomly graded questions?

  • (5 pts) CODE DOCUMENTATION. Add comments to nearly every line explaining what the line of code is doing.
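As an illustration of the level of commenting expected, here is a minimal sketch. It uses the built-in mtcars data set, which is not part of this assignment and was chosen only as an example; the variable names four_cyl and mean_mpg are likewise hypothetical.

```r
# Load the built-in mtcars data set (ships with base R)
data(mtcars)

# Keep only the rows for cars with 4 cylinders
four_cyl <- mtcars[mtcars$cyl == 4, ]

# Compute the mean miles per gallon for those cars
mean_mpg <- mean(four_cyl$mpg)

# Print the mean, rounded to one decimal place
print(round(mean_mpg, 1))
```

Each comment states what the line does in plain language rather than repeating the code.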

  • (5 pts) Submitted Knitted Document


Questions

Part 1

Q1

Part A

This part is to be completed in Lab 4A. Below is a description of what you need from the lab.

  • The merged data set.

Part B

1.

Load the data set df3.csv and merge it with the previous data set.

2.

Using the newly merged data set, create a scatter plot and fit a trend line to the data. Is the association in the data positive or negative?

3.

Using the newly merged data set, create a scatter plot and fit a trend line for each group. What is the association for each group?

4.

For each group, fit a linear regression model and interpret its slope. The units for x and y are liters.
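The workflow in the steps above (merge, plot with a trend line, fit one model per group) can be sketched in base R as follows. This is only a sketch under stated assumptions: the data frames df_a and df_b, the key column id, and the columns x, y, and group are hypothetical stand-ins built from simulated values, and the real lab data may use different names.

```r
# Hypothetical stand-ins for the lab data sets (simulated, not the real data)
df_a <- data.frame(id = 1:20, x = seq(0.5, 10, by = 0.5))
df_b <- data.frame(id = 1:20, group = rep(c("A", "B"), each = 10))

# Simulate y so that group A trends upward and group B trends downward
set.seed(42)
df_b$y <- ifelse(df_b$group == "A", 2, -1) * df_a$x + rnorm(20)

# Merge the two data frames on the shared key column
merged <- merge(df_a, df_b, by = "id")

# Scatter plot of y against x, colored by group
plot(y ~ x, data = merged,
     col = ifelse(merged$group == "A", "blue", "red"))

# Add one trend line fitted to all of the data
abline(lm(y ~ x, data = merged))

# Fit a separate linear regression for each group
fits <- lapply(split(merged, merged$group),
               function(d) lm(y ~ x, data = d))

# Extract each slope: the change in y (liters) per one-liter increase in x
slopes <- sapply(fits, function(f) coef(f)[["x"]])
print(slopes)
```

Comparing the overall trend line with the per-group slopes shows why it can matter to look at groups separately.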

Part 2

Q2

Part A

This part is to be completed in Lab 4B. Below is a description of what you need from the lab.

  • The 2-sample independent t-test results.

  • The dreams data.

Part B

1.

Conduct a paired t-test on the dreams data using the variables extra and group. Is there a significant difference between the groups? You can assume the vectors are ordered correctly.

2.

Compare the results from the paired t-test and the 2-sample independent t-test. Are the results the same?

3.

The dreams data was constructed from the sleep data in R. Read the help documentation for the sleep data. Determine whether the data are independent or dependent.

4.

Look at the confidence intervals for both tests. Which is wider? Postulate why one is wider than the other.
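The mechanics of running both tests and comparing their confidence intervals can be sketched with t.test(). The vectors before and after below are made-up measurements on the same eight hypothetical subjects, not the dreams data, and ci_width is a helper name introduced here for illustration.

```r
# Made-up measurements on the same 8 subjects (not the dreams data)
before <- c(5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 6.2, 5.0)
after  <- c(5.6, 5.1, 6.4, 6.1, 5.0, 6.3, 6.5, 5.6)

# Two-sample independent t-test (Welch's test is the default in t.test)
indep <- t.test(after, before)

# Paired t-test: treats the i-th values of each vector as one pair
paired <- t.test(after, before, paired = TRUE)

# Helper: width of a test's confidence interval (upper minus lower bound)
ci_width <- function(tt) diff(tt$conf.int)

# Print the two interval widths side by side
print(c(independent = ci_width(indep), paired = ci_width(paired)))

# Print the two p-values side by side
print(c(independent = indep$p.value, paired = paired$p.value))
```

The paired test works on the within-subject differences, so its result depends on how consistent those differences are rather than on the spread of each vector.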

Part 3

Q3

The faithful data set in R provides information on waiting time (waiting, independent) and duration of eruption (eruptions, dependent) for the Old Faithful geyser in Yellowstone National Park. Fit a linear regression model and interpret the slope term.

Q4

The beaver1 data set in R provides temperature data for a beaver at different time points. Create a time-series plot (line plot) showing the temperature trend of the beaver.

Q5

The trees data set in R provides measurements of the diameter (Girth, independent) and height (Height, dependent) of 31 felled black cherry trees. Fit a linear regression model between the two variables and interpret the slope term.

Q6

The rock data set in R provides measurements of rock samples from a petroleum reservoir. Run a correlation test between the variables area and shape.

Q7

The gehan data set in the R package MASS provides data from a clinical trial on remission times for leukemia patients. Conduct a hypothesis test comparing the treatment and control groups (treat) on remission time (time). Interpret the results.

Part 4

Q8

Watch the following video: https://www.youtube.com/watch?v=Q2dewZweAtU.

What is optimization?

Q9

Simple linear regression is used when the outcome is normally distributed. What type of regression is used when the outcome follows a Poisson distribution?

Q10

In ggplot2, which function do you need to plot a map?