Research question: Is there a relationship between the 2012 U.S. presidential candidates and the neighborhoods of their financial contributors?
Candidates in a presidential election need donations from contributors to run a campaign. You have seen the voting results for New York and its cities, but state and city are too coarse to draw a conclusion. This research goes down to the neighborhood level and analyzes whether there is an association between neighborhood and candidate. If there is one, it could offer useful information for future U.S. presidential candidates.
This data was collected from the Federal Election Commission, http://www.fec.gov/ . This is an official U.S. government site that publishes detailed records on the 2012 U.S. presidential election. The data contains contributor information such as ID, name, occupation, city, status, employer, and contribution amount. The data can be downloaded at this link. It comes from a mandatory survey that every contributor who wants to donate to a candidate's campaign must fill out.
A contributor can donate more than once. Since I am only interested in the contributors themselves, I drop the duplicates, which leaves 378562 unique contributors as cases. The variables of interest are the contributor's ZIP code (categorical, many levels) and the candidate name (categorical, several levels).
This is an observational study. The data comes from a mandatory survey filled in by the contributors themselves rather than generated by a computer, as shown by cities that are the same place but recorded in two different ways (“North Hills,” vs “North Hills”). The dataset contains 2207 distinct city values, as opposed to the 62 actual New York cities.
The results can be generalized to all New York financial contributions in 2012, but only to New Yorkers who contributed to the presidential election. They cannot be generalized to people who did not contribute, for example because they lacked the money or for other reasons. Such factors act as extraneous variables that prevent generalizing to the whole New York population. The data also covers only 2012 New York contributions; the 2016 presidential election will have different candidates, so the results may differ greatly. Because this is an observational study, we cannot infer causality.
Exploratory data analysis:
The libraries I will be using are:
library(ggplot2)
library(ggmap)
library(zipcode)
library(dplyr)
data(zipcode)
Here I load the data into a data frame and filter for just two candidates, Obama and Romney.
df = read.csv("fc2012ny.csv")
#Row wise, drop duplicated zip codes, and subset for the two main candidates, Obama and Romney
#Column wise, select only the two variables required, 3=cand_nm, 7=contbr_zip
#Note: duplicated('contbr_zip') tests the literal string, not the column, so no rows are
#actually dropped here; duplicated(df$contbr_zip) would drop repeated zip codes.
rb_zip = subset(df[!duplicated('contbr_zip'), c(3,7)],
                cand_nm == 'Romney, Mitt' | cand_nm == 'Obama, Barack')
#Redefine the factor levels to only 2
rb_zip = droplevels(rb_zip)
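To illustrate the row-wise de-duplication step, here is a minimal sketch on made-up data. The rows and the `contbr_nm` values are invented, and identifying a contributor by name plus ZIP code is my assumption for the example, not necessarily the rule used for the real dataset. It also shows why `duplicated()` must be given the columns themselves rather than a column name in quotes.

```r
# Toy data, invented for illustration only
toy <- data.frame(
  cand_nm    = c("Obama, Barack", "Obama, Barack", "Romney, Mitt", "Romney, Mitt"),
  contbr_nm  = c("DOE, JANE", "DOE, JANE", "ROE, RICHARD", "POE, ANN"),
  contbr_zip = c("10001", "10001", "10001", "14850"),
  stringsAsFactors = FALSE
)

# duplicated() on a quoted name checks a length-one character vector,
# so nothing is ever flagged and every row is kept
keep_literal <- !duplicated("contbr_zip")

# Passing the columns flags repeated (name, zip) pairs
keep_columns <- !duplicated(toy[, c("contbr_nm", "contbr_zip")])

n_literal <- nrow(toy[keep_literal, ])   # all 4 rows survive
n_columns <- nrow(toy[keep_columns, ])   # the repeated DOE, JANE row is dropped
print(c(n_literal, n_columns))
```

Which columns define a "unique contributor" is a design choice: ZIP code alone would collapse everyone in a neighborhood into one row, while name plus ZIP code keeps one row per person per neighborhood.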
This plot shows the number of contributors for Obama and Romney.
#Plotting by ggplot
p = ggplot(rb_zip, aes(x=cand_nm)) +
  geom_bar() +
  xlab('Candidate names') +
  ylab('Number of contributors') +
  ggtitle('Contributors for candidates presidential election 2012')
p
#Save the plot; ggsave() is called on its own rather than added to the plot with `+`
ggsave('plot.jpg', plot = p, limitsize = TRUE)
The bar chart alone shows that Obama has more than eight times as many contributors as Romney. With this many contributors, Obama likely had more freedom to run his campaign than Romney. Below is the frequency table of financial contributors for Obama and Romney:
##
## Obama, Barack  Romney, Mitt
##        338450         40112
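The same kind of count can be broken down by ZIP code, which is the level the research question is about. A minimal sketch on made-up data follows; the rows below are invented for illustration and are not taken from the FEC file. It builds a ZIP-by-candidate contingency table and the share of each candidate within each ZIP code.

```r
# Toy data, invented for illustration only
toy <- data.frame(
  cand_nm    = c("Obama, Barack", "Obama, Barack", "Romney, Mitt",
                 "Obama, Barack", "Romney, Mitt", "Romney, Mitt"),
  contbr_zip = c("10001", "10001", "10001", "11201", "14850", "14850"),
  stringsAsFactors = FALSE
)

# Rows are ZIP codes, columns are candidates
counts <- table(toy$contbr_zip, toy$cand_nm)

# Row-wise proportions: each ZIP code's contributors split by candidate
shares <- prop.table(counts, margin = 1)

print(counts)
print(round(shares, 2))
```

With real data, many ZIP codes will have very few contributors, so the shares for those rows should be read with caution.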
To help with the analysis, I use the zipcode package by Jeffrey Breen. From it I extract the latitude, longitude and city (I did not use the city column in the df dataset, because it needs some wrangling). Since some contributors entered only the 5-digit ZIP code, I convert all ZIP codes to their 5-digit form. Then I use color to differentiate the two candidates. For the map I use another plotting package, ggmap by David Kahle and Hadley Wickham.
#Clean and join the zipcode data, by Jeffrey Breen, author of R zipcode package.
rb_zip$contbr_zip = clean.zipcodes(rb_zip$contbr_zip)
zip_map = merge(rb_zip, zipcode, by.x = 'contbr_zip', by.y = 'zip')
#Draw original map of new york
map = get_map(location="ny", zoom=6, source="stamen")
#Plot the point
ggmap(map) +
  geom_point(data=zip_map, aes(x=longitude, y=latitude, colour=cand_nm))
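The idea of reducing every ZIP code to its 5-digit form can also be sketched in base R. The helper `to_zip5` below is hypothetical and written only for this example; it is not part of the zipcode package and may not match what `clean.zipcodes` does in every case. It assumes ZIP codes are stored as strings, so leading zeros are still present.

```r
# Hypothetical helper (not from the zipcode package): reduce ZIP or ZIP+4
# strings to the 5-digit ZIP code, returning NA for malformed values
to_zip5 <- function(zip) {
  z <- gsub("[^0-9]", "", as.character(zip))   # keep digits only
  z <- substr(z, 1, 5)                          # first five digits
  z[is.na(z) | nchar(z) != 5] <- NA_character_  # too short or missing
  z
}

# Made-up inputs: plain ZIP, ZIP+4 without and with a hyphen, and a bad value
zips <- to_zip5(c("10001", "100011234", "10001-1234", "1000"))
print(zips)
```

Values that cannot be reduced to five digits become `NA`, so they drop out of a later `merge()` on ZIP code instead of matching the wrong location.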