Now return to the scatterplot that you created earlier. Notice that there is an outlier in both longevity (40 years) and gestation (645 days). Note: This outlier corresponds to the longevity and gestation period of the elephant.
What do you think will happen to the correlation if we remove this outlier? To do this in R, copy the following command:
cor(a$longevity[a$animal!="elephant"], a$gestation[a$animal!="elephant"])
Notice that the correlation between gestation and longevity has changed. Report the new value for the correlation between gestation and longevity and compare it to the value you found earlier when the outlier was included. What is it about this outlier that causes the correlation to increase when it is included in the data? (Hint: look at the scatterplot.)
Solution
The question asks you to analyze the impact of an outlier on the correlation between two variables: longevity and gestation. Here are the steps to answer it:
- First, identify the outlier in the scatterplot. In this case, the outlier is the elephant, with a longevity of 40 years and a gestation period of 645 days.
- Next, calculate the correlation between longevity and gestation with the outlier included. This can be done using the "cor" function in R. The command would look something like this: cor(a$longevity, a$gestation).
- Then, calculate the correlation between longevity and gestation without the outlier. This can be done by excluding the elephant from the data set. The command would look something like this: cor(a$longevity[a$animal!="elephant"], a$gestation[a$animal!="elephant"]).
- Compare the two correlation values. If the correlation is higher when the outlier is included, the outlier is strengthening the apparent linear relationship between the two variables.
- To understand why the outlier is having such a
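The comparison described in these steps can be sketched in R as follows. This is only an illustration under stated assumptions: the course data frame a is not reproduced here, so the sketch builds a small stand-in with the same column names (animal, gestation, longevity). Only the elephant's values (40 years, 645 days) come from the question; every other row is made up, so the printed correlations will not match the values from the real data set.

```r
# Hypothetical stand-in for the course data frame `a`.
# Only the elephant row (40 years, 645 days) comes from the question;
# all other rows are invented for illustration.
a <- data.frame(
  animal    = c("mouse", "rabbit", "cat", "dog", "sheep", "horse", "elephant"),
  longevity = c(3, 5, 12, 12, 12, 20, 40),
  gestation = c(21, 31, 63, 61, 154, 330, 645)
)

# Correlation with every species included
r_all <- cor(a$longevity, a$gestation)

# Logical mask that drops the elephant, then the correlation without it
keep <- a$animal != "elephant"
r_no_elephant <- cor(a$longevity[keep], a$gestation[keep])

# Show both values side by side for comparison
print(c(with_elephant = r_all, without_elephant = r_no_elephant))
```

With the real data, the same two calls to cor give the values the question asks you to report. Storing the mask in keep avoids repeating the a$animal != "elephant" condition for each variable.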
Similar Questions
The data in the scatterplot below are an individual's age (in years) and the expected life span (in years). The circles correspond to females and the x's to males. Which of the following conclusions is most accurate?
- There is a positive correlation between gender and life expectancy.
- There is a negative correlation between gender and life expectancy.
- There is a positive correlation between age and life expectancy for both males and females.
- There is a negative correlation between age and life expectancy for both males and females.
To open R with the dataset preloaded, right-click here and choose "Save Target As" to download the file to your computer. Then find the downloaded file and double-click it to open it in R. The data have been loaded into the data frame a. Enter the command a to see the data. The variables in a are animal, gestation, and longevity.
- animal: the name of the animal species
- gestation: the average gestation period of the species, in days
- longevity: the average longevity of the species, in years
Remember that the correlation is only an appropriate measure of the linear relationship between two quantitative variables. First produce a scatterplot to verify that gestation and longevity are nearly linear in their relationship. To do this in R, copy the entire command below:
plot(a$longevity, a$gestation, xlab="Average Longevity of Species (years)", ylab="Average Gestation Period of Species (days)")
Observe that the relationship between gestation period and longevity is linear and positive. Now we will compute the correlation between gestation period and longevity. To do that in R, copy the command:
cor(a$longevity, a$gestation)
Now return to the scatterplot that you created earlier. Notice that there is an outlier in both longevity (40 years) and gestation (645 days). Note: This outlier corresponds to the longevity and gestation period of the elephant.
Report the correlation between gestation and longevity and comment on the strength and direction of the relationship. Interpret your findings in context.
Choose the most likely correlation value for this scatterplot:
- r = 0.436
- r = 0.100
- r = −0.897
- r = 0.995
- r = −0.575
A local ice cream shop kept track of the number of cans of cold soda it sold each day, and the temperature that day, for two months during the summer. The data are displayed in the scatterplot below. The one outlier corresponds to a day on which the refrigerator for the soda was broken. Which of the following is true?
- A reasonable value of the correlation coefficient r for these data is 1.2.
- If the temperature were measured in degrees Celsius (C = 5/9*(F-32)), the value of r would change accordingly.
- If the outlier were removed, r would increase.
- If the outlier were removed, r would decrease.
A correlation of r = 0.85 is found between weekly sales of firewood and cough drops over a 1-year period. Which of the following is a correct interpretation of this correlation value?
- There is a pretty strong positive linear relationship between sales of firewood and cough drops.
- Temperature is a possible lurking variable that is behind this relationship.
- Fire must be the cause of coughing.