#extracts the Number.of.Bales column from data and stores it in a new vector
Bales <- data$Number.of.Bales
#extracts the Total.Weight column from data and stores it in a new vector
Weight <- data$Total.Weight
#plot() creates a scatter plot of Total Weight (y-axis) against Number of Bales (x-axis)
plot(Bales, Weight)
The lines of code above create two vectors, each holding one column of the data. Extracting each column with $ and assigning it to its own variable lets us pass the two vectors directly to plot(). The first argument, “Bales”, is drawn on the x-axis and the second, “Weight”, on the y-axis; the variable names are also used as the default axis labels. Running the last line of code draws the scatter plot.
Looking at the scatter plot, we can see that as “Bales” increases, so does “Weight”. Intuitively this makes sense, since more bales should weigh more. This pattern suggests a positive correlation; to confirm and quantify the relationship we used the cor() function in the lines of code below.
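As a minimal sketch of the same steps on a self-contained example, the code below builds a small data frame and plots it with explicit axis labels. The data frame name `example_data` and all of its values are invented for illustration and are not the dataset used in this post; only the column names mirror the ones above.

```r
# Hypothetical data, invented for illustration only (not the dataset from this post)
example_data <- data.frame(
  Number.of.Bales = c(2, 4, 5, 7, 9, 12),
  Total.Weight    = c(95, 210, 240, 330, 460, 590)
)

# Extract each column into its own vector with the $ operator
Bales  <- example_data$Number.of.Bales
Weight <- example_data$Total.Weight

# First argument goes on the x-axis, second on the y-axis;
# xlab, ylab and main override the default labels and add a title
plot(Bales, Weight,
     xlab = "Number of Bales",
     ylab = "Total Weight",
     main = "Total Weight vs. Number of Bales")
```

Setting xlab and ylab is optional; without them, plot() falls back to the variable names.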
#cor() calculates the strength and direction of the linear relationship between the two vectors passed as arguments
cor(Bales, Weight)
The cor() function calculates the strength and direction of the linear relationship between the two vectors passed as its arguments, in this case “Bales” and “Weight”. By default it returns the Pearson correlation coefficient, which ranges from -1 to 1. In this example it returned 0.936676248944921, which indicates a strong positive relationship between “Bales” and “Weight”.
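To make the number less of a black box, here is a small sketch of what the default (Pearson) correlation computes: the covariance of the two vectors divided by the product of their standard deviations. The vectors `bales_example` and `weight_example` are hypothetical values made up for this illustration, not the data from this post.

```r
# Hypothetical values, invented for illustration only
bales_example  <- c(2, 4, 5, 7, 9, 12)
weight_example <- c(95, 210, 240, 330, 460, 590)

# Built-in Pearson correlation (the default method of cor())
r_builtin <- cor(bales_example, weight_example)

# The same quantity computed by hand: covariance scaled by both standard deviations
r_manual <- cov(bales_example, weight_example) /
  (sd(bales_example) * sd(weight_example))

# TRUE when the two results agree up to floating-point tolerance
all.equal(r_builtin, r_manual)
```

Because the covariance is scaled by both standard deviations, the result does not depend on the units of either variable.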
#squares the correlation coefficient calculated above
(cor(Bales,Weight))^2
Squaring the correlation gives 0.877362395337528. This value is the coefficient of determination (R squared): the proportion of the variance in one variable that is accounted for by its linear relationship with the other. Here, about 88% of the variation in “Weight” is associated with variation in “Bales”. Because the value is close to 1, most of the variation in one variable is closely tied to variation in the other. Taking all of this into account, we can conclude that there is a strong positive relationship between “Bales” and “Weight”, as shown by the scatter plot, the correlation from cor(), and the squared correlation.
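One way to cross-check the squared correlation is to fit a simple linear regression and compare it with the R-squared that R reports. The sketch below does this with lm(); the vectors are hypothetical values invented for illustration, and using lm() here is an addition for comparison, not a step from the analysis above.

```r
# Hypothetical values, invented for illustration only
bales_example  <- c(2, 4, 5, 7, 9, 12)
weight_example <- c(95, 210, 240, 330, 460, 590)

# Squared Pearson correlation
r_squared_from_cor <- cor(bales_example, weight_example)^2

# Simple linear regression of weight on number of bales (with an intercept)
fit <- lm(weight_example ~ bales_example)
r_squared_from_lm <- summary(fit)$r.squared

# For a one-predictor regression with an intercept these two values match
all.equal(r_squared_from_cor, r_squared_from_lm)
```

The match holds only for a regression with a single predictor and an intercept; with more predictors, R-squared is no longer the square of one pairwise correlation.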