My partner and I chose to compare donations from people named ‘john’ with donations from everyone else. To do this we created two subsets of the donor data set. The first vector holds the donations from people named ‘john’, so we named it ‘only_john_donation’. The second vector holds the donations from everyone not named ‘john’, so we named it ‘never_john_donation’. We chose this comparison because there happened to be a large number of people named ‘john’ in the donor data set. Next we ran a t-test between ‘only_john_donation’ and ‘never_john_donation’. The results are attached below:
When looking at the results of the t-test it is important to look at the p-value. The p-value generated from this t-test is 0.03204. What does this mean? A p-value is used to decide whether there is enough evidence to reject the null hypothesis, which in this case is that the two groups have the same mean donation. When a p-value is less than 0.05, or 5%, it is commonly accepted that the null hypothesis should be rejected. Here the p-value is less than 0.05, so for this comparison we reject the null hypothesis. It is also important to be clear about what 0.03204 measures: if the two groups really had the same mean donation, there would be about a 3.204% chance of seeing a difference at least as large as the one we observed. It is not the probability that the results are random. Because that chance is low, the difference is statistically significant at the 5% level. Additionally, the means of the two vectors are listed as 17.6 for ‘only_john_donation’ and 24.82551 for ‘never_john_donation’. From this we can conclude that people named ‘john’ gave smaller donations on average than the rest of the people in the data set.
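To make the procedure concrete, here is a minimal Python sketch of the same two steps: splitting the donations into two vectors and computing a two-sample t statistic. The donor records are made-up toy values, not our real data, and the sketch assumes the Welch (unequal-variance) form of the t-test, which may not match the exact test behind the numbers above. It stops at the t statistic and degrees of freedom; turning those into a p-value needs the t distribution from a statistics library.

```python
import math
from statistics import mean, variance

# Hypothetical toy records (first name, donation amount); NOT the real donor data.
donors = [
    ("john", 10.0), ("john", 20.0), ("john", 15.0),
    ("mary", 30.0), ("alex", 25.0), ("sara", 40.0), ("liam", 20.0),
]

# Split the donations into the two vectors described in the post.
only_john_donation = [amt for name, amt in donors if name.lower() == "john"]
never_john_donation = [amt for name, amt in donors if name.lower() != "john"]


def welch_t(x, y):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    # Sample variance of each group divided by its size.
    vx = variance(x) / len(x)
    vy = variance(y) / len(y)
    # Difference in means scaled by its estimated standard error.
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


t_stat, df = welch_t(only_john_donation, never_john_donation)
print("mean of only_john_donation:", mean(only_john_donation))
print("mean of never_john_donation:", mean(never_john_donation))
print("t statistic:", t_stat, "degrees of freedom:", df)
```

A negative t statistic here means the first group (‘only_john_donation’) has the lower mean, which is the same direction as the difference we found in the real data.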