Week 35: Baltimore Bridges

TidyTuesday
2018
Published

November 28, 2018

# loading the packages
library(readr)
library(tidyverse)
library(stringr)
library(ggthemr)
library(gganimate)
library(formattable)
# load the theme flat dark
ggthemr("flat dark")
# reading the data
bridges <- read_csv("baltimore_bridges.csv")
#View(bridges)
# naming the columns
names(bridges)<-c("lat","long","County","Carries","Year Built",
                  "Condition","Average Daily Traffic","Total Improvement",
                  "Month","Year","Owner","Responsibility","Vehicles")
attach(bridges)

Bridge Data and Baltimore

Data for the analysis and description about the Baltimore bridges are hyper-linked.

{{% tweet "1067650979129118720" %}}

GitHub Code

Data on bridges is of week 35 from TidyTuesday. Trying to explain the data using maps is obvious, yet I will use animated jitter plots. There are 13 variables and 2079 observations. Brave choice of limiting my self to less than 10 variables, where latitude, longitude and Vehicles will not be taken into account.

So with the help of packages tidyverse, ggthemr, gganimate,formattable and readr I will complete this analysis. Most of the bridges are owned by several agencies, but I will only focus on the top three ownership holders.

Counties which have bridges owned by State Highway Agency

Close to 1000 bridges are owned by State Highway Agency, where most of them are in Baltimore County. High amount of bridges are in good condition, further more bridges are in Fair condition and only around 10 bridges in Poor condition.

Considering the Average Daily Traffic only one bridge in Poor condition has the amount of close to 110,000, while all the other poor condition bridges have Average Daily Traffic less than 30,000. Counties Anne Arundel and Hartford have no Poor condition bridges at all.

Most of the bridges are from Baltimore County and around 20 bridges have count of more than 150,000 Average Daily Traffic for both Fair and Good conditions. Hartford and Carroll Counties have their Average Daily Traffic which does not exceed 80,000 at any condition of the bridge.

# jitter plot to State Highway Agency
ggplot(subset(bridges,Owner=="State Highway Agency")
       ,aes(color=Condition,y=`Average Daily Traffic`,x=str_wrap(County,7)))+
  xlab("County")+
  ggtitle("Condition of Bridges owned by State Highway Agency \nand their Average Daily Traffic")+
  scale_y_continuous(labels =seq(0,230000,10000) ,breaks = seq(0,230000,10000))+
  transition_states(Condition,transition_length = 2,state_length = 3)+
  enter_fade()+exit_shrink()+ease_aes("back-in")+geom_jitter()

Counties which have bridges owned by County Highway Agency

County Highway Agency owns the second most amount of bridges in this data-set. Therefore using jitter plots we are going to check how the condition of the bridges and counties are explained in the simplest manner.

Less amount of poor condition bridges in all counties except Anne Roundel County. All bridges owned by County Highway Agency have a limited Average Daily Traffic less than 50,000. Clearly we have more Fair bridges than Good ones. In the Poor condition category only two have Average Daily Traffic more than 20,000, while other two have more than 10 bridges.

Most of these bridges are in Baltimore County even it is in any one of three conditions. There are few bridges which have values more than 40,000 Average Daily Traffic and they are also in Baltimore County.

There are bridges which have Zero Average Daily Traffic. In all three Conditions only Hartford County has bridges which has Average Daily Traffic less than 10,000.

# jitter plot to County Highway Agency
ggplot(subset(bridges,Owner=="County Highway Agency")
       ,aes(color=Condition,y=`Average Daily Traffic`,x=str_wrap(County,7)))+
  xlab("County")+ geom_jitter()+
  ggtitle("Condition of Bridges owned by County Highway Agency \nand their Average Daily Traffic")+
  scale_y_continuous(labels =seq(0,40000,5000) ,breaks = seq(0,40000,5000))+
  transition_states(Condition,transition_length = 2,state_length = 3)+
  enter_fade()+exit_shrink()+ease_aes("back-in")

Counties which have bridges owned by State Toll Authority

There is only one bridge which is in Poor condition and it is in Baltimore County, while counties Howard and Anne Arundel have no Good condition bridges. Further there is only 3 Fair condition bridges in Howard County while they have values for Average Daily Traffic less than 10,000.

The highest Average Daily Traffic is close to 170,000 which are only 4 and are in Good and Fair conditions. Further, Anne Arundel County has only one Good bridge and in Hartford it is six bridges. Only few of the bridges have Average Daily Traffic close to zero.

# jitter plot to State tolll authority
ggplot(subset(bridges,Owner=="State Toll Authority")
       ,aes(color=Condition,y=`Average Daily Traffic`,x=str_wrap(County,7)))+
  xlab("County")+geom_jitter()+
  ggtitle("Condition of Bridges owned by State Toll Authority \nand their Average Daily Traffic")+
  scale_y_continuous(labels =seq(0,170000,10000) ,breaks = seq(0,170000,10000))+
  transition_states(Condition,transition_length = 2,state_length = 3)+
  enter_fade()+exit_shrink()+ease_aes("back-in")

Most amount of bridges Built based on Year

Years 1957, 1970, 1975, 1991, 1963 and 1961 have the top 6 spots for building more than 50 bridges in those years. If we consider the conditions of Fair and Good only the year 1991 is suitable to be mentioned, while all other years has at-least one Poor condition bridge. Further There are more Poor condition bridges in 1961 than in 1957. While all Poor condition bridges has Average Daily Traffic less than 50,000.

Finally, there are only few bridges which have Average Daily Traffic above 100,000 and only 3 are in Good condition. There are Bridges which can have Average Daily Traffic close to zero in all 6 years and all conditions.

# jitter plot to years based on Most amount of bridges built
ggplot(subset(bridges,`Year Built`=="1957" | `Year Built`=="1970" | 
                `Year Built`=="1975" | `Year Built`=="1991" |
                `Year Built`=="1963" | `Year Built`=="1961")
       ,aes(color=Condition,y=`Average Daily Traffic`,x=factor(`Year Built`)))+
  xlab("Year Built")+ylab("Average Daily Traffic")+
  ggtitle("Most amount of Bridges built based on Years \nand their Conditions")+
  geom_jitter()+legend_bottom()+
  transition_states(Condition,transition_length = 2,state_length = 3)+
  enter_fade()+exit_shrink()+ease_aes("back-in")

Average Traffic Less than or equal to 100,000 for Counties with Bridge Condition

While obtaining summary for county variable there is one issue because there are two observations which say “Baltimore city” than “Baltimore City” and I don’t want to change them.

If we focus on Average Daily Traffic less than or equal to 100,000 based on County and Condition. It is clear that Poor condition bridges are part of this criteria and mostly Average Daily Traffic is less than 5000 for Counties Howard, Hartford and Carroll. While Baltimore County has highest amount up-to 75,000, but Baltimore County has highest amount close to 40,000 for Average Daily Traffic. Finally Anne Arundel County has only one Poor condition bridge which has Average Daily Traffic close to zero.

We can see that there are more Fair Condition bridges than Good ones. In Baltimore County most of the Fair condition bridges have Average Daily Traffic less than 15000. Similarly Carroll county and Hartford county also behave under such criteria. But for Good condition bridges this is not the case where there is no certain strong dense region as similar to Fair condition bridges.

Previously when we looked into county we did not see Baltimore City often as a factor, but here that is not the case.

# jitter plot to average daily Traffic less than or equal 1000000
ggplot(subset(bridges,`Average Daily Traffic`<=100000 & County!="Baltimore city"),
       aes(x=County,y=`Average Daily Traffic`,color=Condition))+
  xlab("County")+ylab("Averag Daily Traffic")+
  ggtitle("Average Daily Traffic Less than 100,000 \nFor Counties")+
  scale_y_continuous(labels = seq(0,100000,5000),breaks = seq(0,100000,5000))+
  theme(axis.text.x = element_text(angle = -90))+coord_flip()+ geom_jitter()+
  transition_states(Condition,transition_length = 2,state_length = 3)+
  enter_fade()+exit_shrink()+ease_aes("back-in")

Average Traffic More than 100,000 for Counties with Bridge Condition

This Jitter plot is completely different than previous one, because there are no clear dense regions for any counties and conditions of the bridge. There is only one Poor condition bridge in Baltimore County where the Average Daily Traffic is close to 115,000. In Fair condition bridges also Baltimore County holds the most, while they are slightly dense in the region of 175,000 to 190,000. while for Howard County similar density occurs between 190,000 to 205,000. Bridges in Good condition have more higher values in Baltimore County than Anne Arundel County.

# jitter plot to average daily Traffic more than  1000000
ggplot(subset(bridges, `Average Daily Traffic` > 100000 & County != "Baltimore city"),
       aes(x=County,y=`Average Daily Traffic`,color=Condition))+
  xlab("County")+ylab("Average Daily Traffic")+
  ggtitle("Average Daily Traffic More than 100,000 \nFor Counties")+
  scale_y_continuous(labels=seq(100000,230000,5000),breaks=seq(100000,230000,5000))+
  coord_flip()+ theme(axis.text.x = element_text(angle = -90))+
  geom_jitter()+transition_states(Condition,transition_length = 2,state_length = 3)+
  enter_fade()+exit_shrink()+ease_aes("back-in")

Improvement and Bridge Conditions with Counties

In the variable of Total Improvement there are 1438 missing values, 42 values are zero and the rest are actual values. I am going to look at Total Improvement in two tables. First table will include where Bridges have Total Improvement higher than 9,999,000 USD and less than 30,000,000 USD. Second table is for Bridges which have Total Improvement higher than or equal to 30,000,000 USD.

Further to make these tables interesting I will be using the package formattable, and colors and tiles for numerical values. In the first table there are 7 bridges while only Anne Arundel County holds 3 and Baltimore City holds 4. One bridge is from 1953, and others are from the period of 1977 to 1983. Conditions of these bridges are mostly Fair and two bridges are in Good condition. Lowest Average Daily Traffic is 11760, while highest is 124193, where both bridges are in Fair Condition, and the amount spent on them for Total Improvement are respectively 18,163,000 USD and 16,264,000 USD. The bridge with Highest amount of Average Daily traffic is built in 1953.

# removing unnecessary columns and setting restriction to 
# Total Improvement
Top10<-subset(bridges[,c(-1,-2,-9,-10,-11,-12,-13)], 
              `Total Improvement` > 9999 & `Total Improvement` < 30000)

# setting colours
customRed0 = "#FF8080"
customRed = "#7F0000"

customyellow0 = "#FFFF80"
customyellow = "#BFBF00"

customblue0 = "#6060BF"
customblue =  "#00007F"

# creating the table for above data set
formattable(Top10,align=c("l","l","c","c","c","c"),
            list(
              County =formatter("span",style= ~style(color="grey")),
            `Total Improvement`=color_tile(customblue0,customblue),
            `Average Daily Traffic`=color_tile(customyellow0,customyellow),
            `Year Built`=color_tile(customRed0,customRed)
            ))
County Carries Year Built Condition Average Daily Traffic Total Improvement
Anne Arundel County MD 2 1983 Fair 53221 13504
Anne Arundel County US 50 1953 Fair 124193 16264
Anne Arundel County UPPER LEVEL ROADWA 1977 Fair 11760 18163
Baltimore City IS 95 SB 1977 Fair 94765 15785
Baltimore City IS 95 VIADUCT SB 1980 Good 63650 16051
Baltimore City IS 95 VIADUCT NB 1980 Fair 52850 20484
Baltimore City IS 95 SB 1980 Good 55621 20484

When I did try to plot the top ten bridges with most Total improvement values there was one issue, which is the distance between first two values and the next 8 values. Therefore I divided the table into two.

In this second table We can see there are two bridges which are from Baltimore City and are built in 1980 and 1971, but the amount spent on Total Improvement is 300,000,000 USD each. But their Average Daily Traffic is respectively 56280 and 30600.

While we have another bridge from Baltimore City and built in 1907, but Total Improvement amount is 35,026,000 USD. Here, the Average Daily Traffic is 3900.

# removing unnecessary columns and setting restriction to 
# Total Improvement
Top3<-subset(bridges[,c(-1,-2,-9,-10,-11,-12,-13)], `Total Improvement` >= 30000)

# setting colours
customRed0 = "#FF8080"
customRed = "#7F0000"

customyellow0 = "#FFFF80"
customyellow = "#BFBF00"

customblue0 = "#6060BF"
customblue =  "#00007F"

# creating the table for above data set
formattable(Top3,align=c("l","l","c","c","c","c"),
            list(
              County =formatter("span",style= ~style(color="black")),
            `Total Improvement`=color_tile(customblue0,customblue),
            `Average Daily Traffic`=color_tile(customyellow0,customyellow),
            `Year Built`=color_tile(customRed0,customRed)
            ))
County Carries Year Built Condition Average Daily Traffic Total Improvement
Baltimore City US 40 EDMONDSON AV 1907 Poor 3900 35026
Baltimore City IS 95 VIADUCT SB 1980 Fair 56280 300000
Baltimore City EASTERN AVENUE 1971 Fair 30600 300000

Conclusion

My conclusion of the above plots and tables in point form

  • Jitter plots and animation are useful in explaining one continuous variable with multiple categorical variables.

  • Sub-setting the data set and applying formattable package is useful to explain different continuous values with in a table.

Further Analysis

  • Similarly we can use mapping to point out the locations of the bridges and use animation to make it more clear.

Please see that
This is my fifth post on the internet so please be kind to tolerate my mistakes in grammar and spellings. I intend to post more statistics related materials in the future and learn accordingly. Thank you for reading.

THANK YOU