{"id":781,"date":"2014-05-09T09:38:02","date_gmt":"2014-05-09T09:38:02","guid":{"rendered":"http:\/\/opisthokonta.net\/?p=781"},"modified":"2014-05-09T09:45:32","modified_gmt":"2014-05-09T09:45:32","slug":"the-r-code-for-the-home-field-advantage-and-traveling-distance-analysis","status":"publish","type":"post","link":"https:\/\/opisthokonta.net\/?p=781","title":{"rendered":"The R code for the home field advantage and traveling distance analysis."},"content":{"rendered":"<p>I was asked in the comments on my post <a href=\"https:\/\/opisthokonta.net\/?p=621\">Does traveling distance influence home field advantage?<\/a> to provide the R code I used, because Klemens of the <a href=\"http:\/\/www.rationalsoccer.com\/\">rationalsoccer<\/a> blog wanted to do the analysis on some of his own data. I have refactored it a bit to make it easier to use.<\/p>\n<p>First, load the stadium data with the coordinates I <a href=\"https:\/\/opisthokonta.net\/?p=619\" title=\"Dataset: Football stadiums with geographic coordinates\">posted last year<\/a>. <\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\ndta.stadiums &lt;- read.csv('stadiums.csv')\r\n<\/pre>\n<p>I also assume you have data formatted like the data from <a href=\"http:\/\/www.football-data.co.uk\/\">football-data.co.uk<\/a> in a data frame called <em>dta.matches<\/em>. <\/p>\n<p>We then need a way to calculate the distance (in kilometers) between two coordinates. The following function does that. 
<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\ncoordinate.distance &lt;- function(lat1, long1, lat2, long2, radius=6371){\r\n  #Calculates the great-circle distance between two WGS84 coordinates\r\n  #(given in degrees) using the haversine formula. The default radius\r\n  #is the mean radius of the Earth in kilometers.\r\n  #\r\n  #http:\/\/en.wikipedia.org\/wiki\/Haversine_formula\r\n  #http:\/\/www.movable-type.co.uk\/scripts\/gis-faq-5.1.html\r\n  dlat &lt;- (lat2 * (pi\/180)) - (lat1 * (pi\/180))\r\n  dlong &lt;- (long2 * (pi\/180)) - (long1 * (pi\/180))\r\n  h &lt;- (sin((dlat)\/2))^2 + cos((lat1 * (pi\/180)))*cos((lat2 * (pi\/180))) * ((sin((dlong)\/2))^2)\r\n  #Cap sqrt(h) at 1 so that rounding errors cannot push it outside the domain of asin.\r\n  c &lt;- 2 * asin(pmin(1, sqrt(h)))\r\n  d &lt;- radius * c\r\n  return(d)\r\n}\r\n<\/pre>\n<p>Next, we need to find the coordinates where each match is played, and the coordinates for where the visiting team comes from. Then the traveling distance for each match is calculated and put into the <em>Distance<\/em> column of <em>dta.matches<\/em>. <\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\ncoord.home &lt;- dta.stadiums[match(dta.matches$HomeTeam, dta.stadiums$FDCOUK),\r\n                           c('Latitude', 'Longitude')]\r\ncoord.away &lt;- dta.stadiums[match(dta.matches$AwayTeam, dta.stadiums$FDCOUK),\r\n                           c('Latitude', 'Longitude')]\r\n\r\ndta.matches$Distance &lt;- coordinate.distance(coord.home$Latitude, coord.home$Longitude,\r\n                                            coord.away$Latitude, coord.away$Longitude)\r\n<\/pre>\n<p>Here are two functions that are needed to calculate the home field advantage per match. The <em>avgerage.gd<\/em> function takes a data frame as an argument and computes the average goal difference for each team. The result should be passed to the <em>matchwise.hfa<\/em> function to calculate the home field advantage per match. 
<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\navgerage.gd &lt;- function(dta){\r\n  #Calculates the average goal difference for each team.\r\n  \r\n  #as.character makes this work whether the team columns are factors or strings.\r\n  all.teams &lt;- unique(c(as.character(dta$HomeTeam), as.character(dta$AwayTeam)))\r\n  average.goal.diff &lt;- numeric(length(all.teams))\r\n  names(average.goal.diff) &lt;- all.teams\r\n  for (t in all.teams){\r\n    idxh &lt;- which(dta$HomeTeam == t)\r\n    goals.for.home &lt;- dta[idxh, 'FTHG']\r\n    goals.against.home &lt;- dta[idxh, 'FTAG']\r\n    \r\n    idxa &lt;- which(dta$AwayTeam == t)\r\n    goals.for.away &lt;- dta[idxa, 'FTAG']\r\n    goals.against.away &lt;- dta[idxa, 'FTHG']\r\n    \r\n    n.matches &lt;- length(idxh) + length(idxa)\r\n    total.goal.difference &lt;- sum(goals.for.home) + sum(goals.for.away) - sum(goals.against.home) - sum(goals.against.away)\r\n    \r\n    average.goal.diff[t] &lt;- total.goal.difference \/ n.matches\r\n  }\r\n  return(average.goal.diff)\r\n}\r\n\r\n\r\nmatchwise.hfa &lt;- function(dta, avg.goaldiff){\r\n  #Calculates the matchwise home field advantage based on the average goal\r\n  #difference for each team.\r\n  \r\n  n.matches &lt;- dim(dta)[1]\r\n  hfa &lt;- numeric(n.matches)\r\n  for (idx in seq_len(n.matches)){\r\n    #Look up the averages by team name. Indexing with a factor would use\r\n    #its integer codes instead of the names.\r\n    hometeam.avg &lt;- avg.goaldiff[as.character(dta[idx,'HomeTeam'])]\r\n    awayteam.avg &lt;- avg.goaldiff[as.character(dta[idx,'AwayTeam'])]\r\n    expected.goal.diff &lt;- hometeam.avg - awayteam.avg\r\n    observed.goal.diff &lt;- dta[idx,'FTHG'] - dta[idx,'FTAG']\r\n    hfa[idx] &lt;- observed.goal.diff - expected.goal.diff\r\n  }\r\n  return(hfa)\r\n}\r\n<\/pre>\n<p>In my analysis I used data from several seasons, and the average goal difference for each team was calculated per season. Assuming you have added a <em>Season<\/em> column to <em>dta.matches<\/em> that is a factor indicating which season the match is from, this piece of code calculates the home field advantage per match based on the seasonwise average goal differences for each team (phew!). 
The home field advantage is put into the new column <em>HFA<\/em>. <\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\ndta.matches$HFA &lt;- numeric(dim(dta.matches)[1])\r\nseasons &lt;- levels(dta.matches$Season)\r\n\r\nfor (i in seq_along(seasons)){\r\n  season.l &lt;- dta.matches$Season == seasons[i]\r\n  h &lt;- matchwise.hfa(dta.matches[season.l,], avgerage.gd(dta.matches[season.l,]))\r\n  dta.matches$HFA[season.l] &lt;- h\r\n}\r\n<\/pre>\n<p>Finally, we can do the linear regression and make a nice little plot. <\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\nm &lt;- lm(HFA ~ Distance, data=dta.matches)\r\nsummary(m)\r\n\r\nplot(dta.matches$Distance, dta.matches$HFA, xlab='Distance (km)', ylab='Difference from expected goals', main='Home field advantage vs traveling distance')\r\nabline(m, col='red')\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>I was asked in the comments on my post Does traveling distance influence home field advantage? to provide the R code I used, because Klemens of the rationalsoccer blog wanted to do the analysis on some of his own data. 
I &hellip; <a href=\"https:\/\/opisthokonta.net\/?p=781\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5,6],"tags":[39,40,50],"class_list":["post-781","post","type-post","status-publish","format-standard","hentry","category-r","category-soccer","tag-hfa","tag-home-field-advantage","tag-r"],"_links":{"self":[{"href":"https:\/\/opisthokonta.net\/index.php?rest_route=\/wp\/v2\/posts\/781","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/opisthokonta.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/opisthokonta.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/opisthokonta.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/opisthokonta.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=781"}],"version-history":[{"count":10,"href":"https:\/\/opisthokonta.net\/index.php?rest_route=\/wp\/v2\/posts\/781\/revisions"}],"predecessor-version":[{"id":791,"href":"https:\/\/opisthokonta.net\/index.php?rest_route=\/wp\/v2\/posts\/781\/revisions\/791"}],"wp:attachment":[{"href":"https:\/\/opisthokonta.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=781"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/opisthokonta.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=781"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/opisthokonta.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=781"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}