This project uses R to analyze and visualize data on Uber rides taken in New York, New York in the summer of 2014. The data was sourced from Data Flair, and this project uses the tutorial there as a jumping-off point.
As I was completing this tutorial, I began to brainstorm other factors besides the time of day that could influence the number of Uber rides taken. I decided to examine some weather data in addition to analyzing the Uber ride data by itself. The weather data I used was sourced from the National Weather Service Forecast Office.
You can find the full R code and the data sets used here.
To obtain the weather data, I began by looking at government-run websites that list historical data. I found historical data for New York, New York and used observations taken at the Central Park observation station.
To get the weather table given on the website into a usable format, I copied it into Excel and then read the data into R. The Uber ride data from April to September of 2014 was provided by the tutorial in .csv format, which I then read into R.
I began by compiling the data from each individual month into one data frame and then reformatting the dates so that they were easier to work with. Then, I created a data frame that shows the number of rides in each hour of the day, totaled over every day from April to September of 2014. Using this data frame, I created the following graph.
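As a rough illustration of these steps, the sketch below stacks two small monthly data frames, parses the timestamps, and counts rides per hour. It is not the project's actual code: the column name `Date.Time`, the timestamp format, and the sample rows are all assumptions made for the example, and with the real data each monthly data frame would come from `read.csv()`.

```r
# Hypothetical stand-ins for two monthly files. With the real data, each of
# these would come from read.csv() on that month's file instead.
apr <- data.frame(
  Date.Time = c("4/1/2014 0:11:00", "4/1/2014 17:05:00", "4/2/2014 17:45:00"),
  stringsAsFactors = FALSE
)
may <- data.frame(
  Date.Time = c("5/3/2014 8:30:00", "5/3/2014 17:10:00"),
  stringsAsFactors = FALSE
)

# Stack the monthly data frames into a single data frame.
rides <- rbind(apr, may)

# Parse the timestamp strings (assumed format: month/day/year hour:min:sec).
rides$Date.Time <- as.POSIXct(rides$Date.Time,
                              format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Extract the hour of day (0 to 23) for each ride.
rides$hour <- as.integer(format(rides$Date.Time, "%H"))

# Count rides per hour across all days.
hourly <- as.data.frame(table(hour = rides$hour), stringsAsFactors = FALSE)
names(hourly)[names(hourly) == "Freq"] <- "rides"
print(hourly)
```

A data frame shaped like `hourly` (one row per hour, one column of counts) is the kind of input a bar chart of rides by hour needs.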
This graph shows that Uber rides peaked at 5:00 P.M. and were at their lowest at 2:00 A.M. Additionally, rides spiked at 7:00 A.M. and 8:00 A.M. These peaks coincide with the times when people are commuting to and from work, with the number of rides typically highest in the evening.
Next, I wondered whether these peaks would differ between weekends and weekdays. It seems likely that the morning peak in Uber rides would be stronger on weekdays than on weekends. The following graph shows the Uber rides taken each hour on weekends.
This graph shows that on weekends Uber rides peaked at 4:00 P.M. and remained high until after midnight. Additionally, rides increased steadily throughout the morning, with no peak around 8:00 A.M. We can also look at weekday data only. The following graph displays Uber rides taken on weekdays.
This graph shows that on weekdays Uber rides peaked at 5:00 P.M. but then promptly began to decline, staying low overnight. There is also a more defined spike in rides at 7:00 A.M. and 8:00 A.M. than in the graph of all the ride data.
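One way to separate weekend rides from weekday rides is sketched below. It is only an illustration under stated assumptions: the four timestamps are invented, and the variable names (`timestamps`, `day_type`, `counts`) are hypothetical rather than taken from the project's code. It uses the `wday` field of `POSIXlt`, where 0 is Sunday and 6 is Saturday.

```r
# Invented sample timestamps: a Friday, a Saturday, a Sunday, and a Monday.
timestamps <- as.POSIXct(
  c("2014-06-06 08:00:00", "2014-06-07 16:00:00",
    "2014-06-08 23:00:00", "2014-06-09 17:00:00"),
  tz = "UTC"
)

# Day of week as a number: 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
wday <- as.POSIXlt(timestamps)$wday

# Label each ride as falling on a weekend or a weekday.
day_type <- ifelse(wday %in% c(0, 6), "weekend", "weekday")

# Hour of day (0 to 23) for each ride.
hour <- as.integer(format(timestamps, "%H"))

# Cross-tabulate ride counts by day type and hour.
counts <- table(day_type = day_type, hour = hour)
print(counts)
```

Each row of `counts` can then be plotted on its own to compare the weekend and weekday hourly profiles.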
The weather data that I obtained contained several weather metrics, including the highest, lowest, and average temperature each day, the amount of precipitation each day, and the amount and depth of snowfall each day. Since the Uber data that I am analyzing covers the warmer months, there was no snowfall, so I did not examine the snowfall data. The weather metrics that I did examine were the average temperature each day and the amount of precipitation each day, and how these correlated with the average number of rides taken on those days.
First, I examined the average temperature each day and explored how this related to the average number of rides taken each day. Using this data, I examined how the average number of rides varied with the average daily temperature across all days of the week, on weekdays only, and on weekends only. The following graphs illustrate this.
When examining all days, rides are slightly higher on average when the average temperature for that day is above 60 degrees Fahrenheit. Looking at weekdays and weekends separately, we see that the increase in Uber rides on warmer days is greater on weekends than on weekdays.
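The comparison above can be sketched as a join between daily ride counts and daily weather, followed by a grouped average. The numbers below are made-up placeholders, not values from the actual data, and the names `daily_rides`, `weather`, `avg_temp`, and `temp_band` are hypothetical.

```r
# Made-up daily ride counts (placeholders, not real figures).
daily_rides <- data.frame(
  date  = as.Date(c("2014-04-01", "2014-04-02", "2014-07-01", "2014-07-02")),
  rides = c(100, 110, 130, 140)
)

# Made-up average daily temperatures in degrees Fahrenheit.
weather <- data.frame(
  date     = as.Date(c("2014-04-01", "2014-04-02", "2014-07-01", "2014-07-02")),
  avg_temp = c(48, 55, 78, 81)
)

# Join the two tables on the date column.
daily <- merge(daily_rides, weather, by = "date")

# Split days into two bands around the 60 degree threshold.
daily$temp_band <- ifelse(daily$avg_temp > 60, "above 60F", "60F or below")

# Average number of rides per day within each temperature band.
avg_by_band <- aggregate(rides ~ temp_band, data = daily, FUN = mean)
print(avg_by_band)
```

Running the same steps on only the weekday rows, and again on only the weekend rows, would give the split described above.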
We can also examine the precipitation data and see whether it appears to have any effect on the number of rides taken. The following graph displays the amount of precipitation in inches on a given day and the average number of rides at that precipitation level.
From this graph, we can see that as the amount of precipitation increases, so too does the number of rides taken. In fact, the average number of rides taken on days with 4 to 5 inches of rain is approximately 40% higher than the average number of rides taken on days with less than 1 inch of rain.
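A percentage comparison like this one can be computed by binning days into one-inch precipitation ranges and averaging the rides in each bin. The sketch below uses invented toy values, so its result is not the figure reported above, and the names `precip`, `rides`, and `bin` are hypothetical.

```r
# Toy values for illustration only (not the real data).
precip <- c(0, 0.2, 1.5, 4.6)      # daily precipitation in inches
rides  <- c(100, 120, 130, 165)    # rides on those same days

# Bin days into one-inch ranges: [0,1), [1,2), [2,3), [3,4), [4,5].
bin <- cut(precip, breaks = 0:5,
           labels = c("0-1", "1-2", "2-3", "3-4", "4-5"),
           right = FALSE, include.lowest = TRUE)

# Average rides per bin (bins with no days come out as NA).
avg_rides <- tapply(rides, bin, mean)
print(avg_rides)

# Percent increase from the driest bin to the wettest bin.
pct_increase <- (avg_rides["4-5"] - avg_rides["0-1"]) / avg_rides["0-1"] * 100
print(unname(pct_increase))
```

With only a handful of very wet days in a six-month window, the average for the highest bin rests on few observations, which is worth keeping in mind when reading the comparison.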
This analysis has shown that the number of Uber rides varies with the time of day, the day of the week, the average temperature on a given day, and the amount of precipitation.
To add to this analysis, we can examine further factors that may influence the number of Uber rides taken. Additionally, if year-round data were obtained, we could identify and attempt to explain seasonal changes in the number of Uber rides taken.