{"id":230,"date":"2012-12-13T17:54:00","date_gmt":"2012-12-13T17:54:00","guid":{"rendered":"http:\/\/opisthokonta.net\/?p=230"},"modified":"2012-12-13T17:55:19","modified_gmt":"2012-12-13T17:55:19","slug":"synonymous-factor-levels-in-r","status":"publish","type":"post","link":"https:\/\/opisthokonta.net\/?p=230","title":{"rendered":"&#8216;Synonymous&#8217; factor levels in R"},"content":{"rendered":"<p>When I work with data from different sources, they are often inconsistent in the way they specify categorical variables. One example is country names. There are many ways the name of a country can be specified, and even though there are <a href=\"http:\/\/en.wikipedia.org\/wiki\/ISO_3166\">international standards<\/a>, different organizations like to do it their own way. North Korea, for example, may sometimes be written simply as &#8216;North Korea&#8217;, while other sources call it &#8216;Korea DPR&#8217;.<\/p>\n<p>This, of course, leads to complications when we want to combine data from different sources. What could be a trivial lookup in two different dataframes in R becomes a real hassle. One solution I have come up with is to make a .csv file listing the names each source uses, load it into R, and use it to &#8216;translate&#8217; the factor levels from the way one source represents them to the way the other does. Based on a <a href=\"http:\/\/wiki.stdout.org\/rcookbook\/Manipulating%20data\/Renaming%20levels%20of%20a%20factor\/\">method for renaming levels<\/a> with regular expressions from Winston Chang&#8217;s <a href=\"http:\/\/wiki.stdout.org\/rcookbook\/\">Cookbook for R<\/a>, I made a function for renaming several levels in one or more columns of a dataframe at once. The .csv file is not the important part here; it is just a convenient way of storing the information needed.<\/p>\n<p>The function takes four arguments. <code>dat<\/code> is a dataframe that contains the factors to be renamed. 
<code>vars<\/code> is a character vector with the names of the columns to rename. <code>from<\/code> and <code>to<\/code> are vectors of equal length giving the existing level names and the names to replace them with. Since each entry in <code>from<\/code> is used as a regular expression, names containing characters that have a special meaning in regular expressions, such as parentheses, may need to be escaped. The function returns the modified dataframe.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\nrenameLevels &lt;- function(dat, vars, from, to){\r\n  # each old name in from needs a matching new name in to\r\n  stopifnot(length(from) == length(to))\r\n  # anchor the patterns so that only whole level names are matched\r\n  ptrns &lt;- paste(&quot;^&quot;, from, &quot;$&quot;, sep=&quot;&quot;)\r\n  for (v in vars){\r\n    # seq_along(from) skips the loop entirely when from is empty\r\n    for (lvl in seq_along(from)){\r\n      levels(dat[[v]]) &lt;- sub(ptrns[lvl], to[lvl], levels(dat[[v]]))\r\n    }\r\n  }\r\n  return(dat)\r\n}\r\n<\/pre>\n<p>A small example:<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n#data to be translated\r\nvar &lt;- factor(c(&quot;b&quot;, &quot;a&quot;, &quot;c&quot;, &quot;a&quot;, &quot;d&quot;, &quot;a&quot;, &quot;e&quot;, &quot;b&quot;))\r\nvar2 &lt;- factor(c(&quot;b&quot;, &quot;b&quot;, &quot;b&quot;, &quot;b&quot;, &quot;b&quot;, &quot;a&quot;, &quot;e&quot;, &quot;b&quot;))\r\ndata &lt;- data.frame(var, var2)\r\n#&gt; data\r\n#  var var2\r\n#1   b    b\r\n#2   a    b\r\n#3   c    b\r\n#4   a    b\r\n#5   d    b\r\n#6   a    a\r\n#7   e    e\r\n#8   b    b\r\n\r\n#translate from roman to greek letters\r\nroman &lt;- c(&quot;a&quot;, &quot;b&quot;, &quot;c&quot;, &quot;d&quot;, &quot;e&quot;)\r\ngreek &lt;- c(&quot;alpha&quot;, &quot;beta&quot;, &quot;gamma&quot;, &quot;delta&quot;, &quot;epsilon&quot;)\r\n\r\ndata2 &lt;- renameLevels(data, c(&quot;var&quot;, &quot;var2&quot;), roman, greek)\r\n#&gt; data2\r\n#      var    var2\r\n#1    beta    beta\r\n#2   alpha    beta\r\n#3   gamma    beta\r\n#4   alpha    beta\r\n#5   delta    beta\r\n#6   alpha   alpha\r\n#7 epsilon epsilon\r\n#8    beta    beta\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>When I work with data from different sources, they are often inconsistent in the way they specify categorical variables. One example is country names. 
There are many ways the name of a country can be specified, and even if there are &hellip; <a href=\"https:\/\/opisthokonta.net\/?p=230\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[19,22,21,50,20,53],"class_list":["post-230","post","type-post","status-publish","format-standard","hentry","category-r","tag-factor","tag-function","tag-levels","tag-r","tag-rename","tag-tips"],"_links":{"self":[{"href":"https:\/\/opisthokonta.net\/index.php?rest_route=\/wp\/v2\/posts\/230","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/opisthokonta.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/opisthokonta.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/opisthokonta.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/opisthokonta.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=230"}],"version-history":[{"count":20,"href":"https:\/\/opisthokonta.net\/index.php?rest_route=\/wp\/v2\/posts\/230\/revisions"}],"predecessor-version":[{"id":251,"href":"https:\/\/opisthokonta.net\/index.php?rest_route=\/wp\/v2\/posts\/230\/revisions\/251"}],"wp:attachment":[{"href":"https:\/\/opisthokonta.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=230"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/opisthokonta.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=230"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/opisthokonta.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=230"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}