{"id":939,"date":"2014-12-13T11:43:50","date_gmt":"2014-12-13T11:43:50","guid":{"rendered":"http:\/\/opisthokonta.net\/?p=939"},"modified":"2015-08-22T18:17:23","modified_gmt":"2015-08-22T18:17:23","slug":"the-dixon-coles-model-part-4-some-tricks-to-speed-up-estimation","status":"publish","type":"post","link":"https:\/\/opisthokonta.net\/?p=939","title":{"rendered":"The Dixon-Coles model, part 4: A trick to speed up estimation"},"content":{"rendered":"<p>In the previous installments in this series on implementing the Dixon-Coles model I complained a bit about the time it took to estimate the parameters. In the original implementation in <a href=\"https:\/\/opisthokonta.net\/?p=890\">part 1<\/a> it took about 40 seconds. Now 40 seconds is not much to complain about, there are a whole lot of other models and algorithms that takes much much longer time to fit (for my master&#8217;s I had some computations that took several months). Still, I wanted to make a few improvements. <\/p>\n<p>The approach I described in <a href=\"https:\/\/opisthokonta.net\/?p=927\">part 3<\/a> is quite acceptable, I think, especially since it takes less than a second to fit the model. But still, I wanted to make some improvements to my original implementation.<\/p>\n<p>There are several reasons for the estimation procedure being slow. I used a general purpose optimizer instead of a tailor-made algorithm, and I didn&#8217;t provide the optimizer with a function of the derivative of the model likelihood function, nor the function defining the constraint. This means that the optimizer have to estimate the derivatives by doing a lot of evaluations of the two functions with slight changes in the parameters. The most important speed bump, however, is probably due to how I implemented the constraint that all the average of the attack parameters should equal 1. 
<\/p>\n<p>The <a href=\"http:\/\/cran.r-project.org\/web\/packages\/alabama\/index.html\">alabama<\/a> package I used relied on a technique called <a href=\"http:\/\/en.wikipedia.org\/wiki\/Lagrange_multiplier\">Lagrange multipliers<\/a>, which is a very general method for constrained optimization. Instead of relying on general constrained optimization procedures, there is a trick commonly used in linear models with sum-to-zero constrained categorical parameters that we also can use here.<\/p>\n<p>There has been some discussion and confusion in the comments about how categorical variables are coded and how R presents the results of the <em>glm<\/em> function. A thorough discussion of this is best left for another time, but let me explain how the sum-to-zero constraint is implemented in linear models. We will fit the model with this constraint and then make some adjustments later on to get the correct average-is-one constraint.<\/p>\n<p>The sum-to-zero constraint basically says that the sum of all the parameters for a categorical variable must equal to zero: <\/p>\n<p><center> \\( \\sum_{i=1} \\theta_i = 0 \\) <\/center><\/p>\n<p>If we for example have three levels, we can write out the equation like this:<\/p>\n<p><center> \\( \\theta_1 + \\theta_2 + \\theta_3 = 0 \\) <\/center><\/p>\n<p>If we subtract \\( \\theta_3\\) and multiply both sides of the equation by minus 1 we get <\/p>\n<p><center> \\( &#8211; \\theta_1 &#8211; \\theta_2 = \\theta_3  \\) <\/center><\/p>\n<p>Notice how we can write one of the parameters as a simple linear function of the other parameters. We can use this result to construct the design matrix for the categorical variable, incorporating the sum-to-zero constraint (exactly which parameter or level we chose to be a function of the others doesn&#8217;t matter, the end results does not differ). 
Suppose we have the following observations of a three-level categorical variable:<\/p>\n<p><center> \\( \\begin{bmatrix} A &#038; A &#038; B &#038; B &#038; C &#038; C \\end{bmatrix}^T \\) <\/center><\/p>\n<p>We can then construct the following design matrix:<\/p>\n<p><center> \\( \\begin{bmatrix} 1 &#038; 0 \\\\ 1 &#038; 0 \\\\ 0 &#038; 1 \\\\ 0 &#038; 1 \\\\ -1 &#038; -1 \\\\ -1 &#038; -1 \\end{bmatrix} \\) <\/center><\/p>\n<p>Notice that we only need two columns (<em>i.e.<\/em> two variables) to encode the three levels. Since the last parameter is a function of the two others, it is redundant. Also notice how the observations in the last two rows, corresponding to the \\(C\\) observations, will influence the estimation of all the other parameters for this variable. When the two parameters are estimated, the last parameter can be obtained using the result from above relating it to the others.<\/p>\n<p>In the Dixon-Coles paper they used the constraint that the average of the attack parameters should be 1. This is not quite the same as the sum-to-zero constraint, but for prediction it does not matter exactly which constraint we use. Anyway, I will explain later how we can fix this. <\/p>\n<p>To use this trick in the Dixon-Coles implementation we need to make the following changes to our code from part 1. Obviously, the first thing we need to change is how the design matrices in the <em>DCmodelData<\/em> function are computed. We need four matrices now, since the number of parameters estimated directly is different for the attack and defence parameters. Notice that I chose the team that appears last in the team.names vector. The teams are sorted alphabetically, so for the 2011-12 Premier League data this is <em>Wolves<\/em>. 
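<\/p>\n<p>Incidentally, R can generate this kind of coding automatically via the built-in <em>contr.sum<\/em> contrasts. Here is a small sketch, using a toy factor rather than the actual football data, that reproduces the design matrix shown above:<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n# toy three-level factor, matching the example above\r\nf &lt;- factor(c('A', 'A', 'B', 'B', 'C', 'C'))\r\n\r\n# contr.sum gives the sum-to-zero coding, where the last level\r\n# is written as minus the sum of the others\r\nmm &lt;- model.matrix(~ f, contrasts.arg=list(f='contr.sum'))\r\n\r\n# drop the intercept column; the rows for the C observations are -1 -1\r\nmm[, -1]\r\n<\/pre>\n<p>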
<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\nDCmodelData &lt;- function(df){\r\n  \r\n  team.names &lt;- unique(c(levels(df$HomeTeam), levels(df$AwayTeam)))\r\n  \r\n  # attack, with sum-to-zero constraint\r\n  ## home\r\n  hm.a &lt;- model.matrix(~ HomeTeam - 1, data=df)\r\n  hm.a[df$HomeTeam == team.names[length(team.names)], ] &lt;- -1\r\n  hm.a &lt;- hm.a[,1:(length(team.names)-1)]\r\n  \r\n  # away\r\n  am.a &lt;- model.matrix(~ AwayTeam -1, data=df)\r\n  am.a[df$AwayTeam == team.names[length(team.names)], ] &lt;- -1\r\n  am.a &lt;- am.a[,1:(length(team.names)-1)]\r\n  \r\n  # defence, same as before \r\n  hm.d &lt;- model.matrix(~ HomeTeam - 1, data=df)\r\n  am.d &lt;- model.matrix(~ AwayTeam -1, data=df)\r\n  \r\n  return(list(homeTeamDMa=hm.a, homeTeamDMd=hm.d,\r\n    awayTeamDMa=am.a, awayTeamDMd=am.d,\r\n    homeGoals=df$FTHG, awayGoals=df$FTAG,\r\n    teams=team.names)) \r\n}\r\n<\/pre>\n<p>Some changes to the <em>DCoptimFn<\/em> function is also needed, so it properly handles the changes we made to the design matrices. <\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n# I don't bother showing the rest of the function  \r\nnteams &lt;- length(DCm$teams)\r\nattack.p &lt;- matrix(params[3:(nteams+1)], ncol=1) #one column less\r\ndefence.p &lt;- matrix(params[(nteams+2):length(params)], ncol=1) \r\n \r\n# need to multiply with the correct matrices\r\nlambda &lt;- exp(DCm$homeTeamDMa %*% attack.p + DCm$awayTeamDMd %*% defence.p + home.p)\r\nmu &lt;- exp(DCm$awayTeamDMa %*% attack.p + DCm$homeTeamDMd %*% defence.p)\r\n<\/pre>\n<p>We also need to make a the appropriate adjustments to the vectors with the initial parameter values, so that they have the correct lengths.  
<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\ndcm &lt;- DCmodelData(data)\r\nnteams &lt;- length(dcm$teams)\r\n\r\n#initial parameter estimates\r\nattack.params &lt;- rep(.1, times=nteams-1) # one less parameter\r\ndefence.params &lt;- rep(-0.8, times=nteams)\r\nhome.param &lt;- 0.06\r\nrho.init &lt;- 0.03\r\npar.inits &lt;- c(home.param, rho.init, attack.params, defence.params)\r\n\r\n#informative names\r\n#skip the last team\r\nnames(par.inits) &lt;- c('HOME', 'RHO', \r\n                      paste('Attack', dcm$teams[1:(nteams-1)], sep='.'),\r\n                      paste('Defence', dcm$teams, sep='.'))\r\n<\/pre>\n<p>With these changes we can  simply use the built-in <em>optim<\/em> function in R. There is no need for the <em>DCattackConstr<\/em> function anymore, or a third party package, since we built the constraint right into the design matrices.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\nres &lt;- optim(par=par.inits, fn=DCoptimFn, DCm=dcm, method='BFGS')\r\n<\/pre>\n<p>This takes about 6-7 seconds on my laptop, a decent improvement to the 40 seconds it took before.  If you take a look at the resulting parameter estimates in <em>res$par<\/em> you will see that the attack parameter for Wolves is missing. As I explained earlier, this parameter is easy to find. It is also easy to correct all the parameter estimates so that they become the same as if we had a mean-is-one constraint on the attack parameters. This is done by increasing the attack parameters by one, and decreasing the defense parameters by one. The reason it is that simple is that the sum-to-zero constraint is equivalent with a mean-is-zero constraint.  
<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\nparameters &lt;- res$par\r\n\r\n#compute Wolves attack parameter\r\nmissing.attack &lt;- sum(parameters[3:(nteams+1)]) * -1\r\n\r\n#put it in the parameters vector\r\nparameters &lt;- c(parameters[1:(nteams+1)], missing.attack, parameters[(nteams+2):length(parameters)])\r\nnames(parameters)[nteams+2] &lt;- paste('Attack.', dcm$teams[nteams], sep='')\r\n\r\n#increase attack by one\r\nparameters[3:(nteams+2)] &lt;- parameters[3:(nteams+2)] + 1  \r\n\r\n#decrease defence by one\r\nparameters[(nteams+3):length(parameters)] &lt;- parameters[(nteams+3):length(parameters)] - 1 \r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>In the previous installments in this series on implementing the Dixon-Coles model I complained a bit about the time it took to estimate the parameters. In the original implementation in part 1 it took about 40 seconds. Now 40 seconds &hellip; <a href=\"https:\/\/opisthokonta.net\/?p=939\">Continue reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[48,5,6],"tags":[16,50],"class_list":["post-939","post","type-post","status-publish","format-standard","hentry","category-dixon-coles-model","category-r","category-soccer","tag-football","tag-r"],"_links":{"self":[{"href":"https:\/\/opisthokonta.net\/index.php?rest_route=\/wp\/v2\/posts\/939","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/opisthokonta.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/opisthokonta.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/opisthokonta.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/opisthokonta.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=939"}],"version-history":[{"count":38,"href":"https:\/\/opisthokonta.net\/index.php?rest_route=\/wp\/v2\/posts\/939\/revisions"}],"predecessor-version":[{"id":1000,"href":"https:\/\/opisthokonta.net\/index.php?rest_route=\/wp\/v2\/posts\/939\/revisions\/1000"}],"wp:attachment":[{"href":"https:\/\/opisthokonta.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=939"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/opisthokonta.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=939"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/opisthokonta.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=939"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}