Looking at monthly distribution of births in Norway

A news story earlier this week reported an increased number of births during the summer months in Norway. According to the story, the peak in births used to be in the spring months, nine months after summer vacation, but is now during the summer. The midwives think this change is because of the rules for granting a place in preschool day care: children born before September 1st are legally entitled to a place in day care.

Anyway, I decided to try to visualize this. I found some data at the Statistics Norway website, loaded it into R, cleaned it, restructured it etc. and made this animation with ggplot2 showing the monthly distribution of births from 2000 to 2011. I decided to include data for the years before 2005, since that is when the current left-wing coalition took office with a program for universal access to day care, so the earlier years work as a comparison. It is hard to spot a definite trend, but the graph for 2011 shows a clear peak in the summer months. It will be interesting to see if this becomes clearer over the next couple of years. Also, if this becomes a continuing trend, it would be interesting to look at surveys on family planning and see whether planning has become more common in the last couple of years.

The birthIndex on the y-axis is not the precise number of births for a given month, but is corrected for the number of days in the month. This makes the different months comparable.
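The post does not give the exact formula behind birthIndex, so the following is only a sketch of one way such a correction could be done, written in Python 3 rather than the R I actually used. It assumes the index is the average number of births per day in a month, scaled so that a month at the yearly average gets the value 100. The function names and the scaling to 100 are my assumptions for illustration.

```python
import calendar


def births_per_day(year, month, births):
    """Average number of births per day in the given month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return births / days_in_month


def birth_index(year, monthly_births):
    """Day-corrected index for 12 monthly counts (January first).

    Assumption: 100 means "same daily rate as the year as a whole".
    """
    days_in_year = 366 if calendar.isleap(year) else 365
    yearly_rate = sum(monthly_births) / days_in_year
    return [
        100 * births_per_day(year, month, births) / yearly_rate
        for month, births in enumerate(monthly_births, start=1)
    ]
```

With made-up numbers, 2800 births in February 2011 and 3100 births in March 2011 both come out as 100 births per day, so the two months get the same index even though the raw counts differ.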

Unicode csv files in Python 2.x

In some recent web scraping projects I extracted some data from an HTML document and saved it in a .csv file, using the csv module in Python. I used the BeautifulSoup module to parse and navigate the HTML, and since BeautifulSoup always returns text as unicode strings, there was some real hassle when I tried to write special (non-ASCII) characters to the csv file, since the csv module in Python 2.x does not support unicode.

The documentation for the csv module provides some solutions to the problem, but I found that the easiest solution was to just install jdunck’s unicodecsv module. It has the same interface as the regular csv module, which is great. This means that if you already have a script that uses the regular module, you can just write import unicodecsv as csv (or whatever you imported csv as) and it should work.

As far as I know, Python 3.x does not have this problem, since all strings are unicode strings by default and its csv module works with them directly.
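For comparison, here is a minimal sketch of writing non-ASCII text with the standard csv module in Python 3. The rows and the file name are made up for illustration; the only things that matter are passing an explicit encoding to open() and using newline="", as the csv documentation recommends.

```python
import csv
import os
import tempfile

# Made-up sample rows containing Norwegian characters.
rows = [["name", "city"], ["Bjørn", "Tromsø"], ["Åse", "Ålesund"]]

# Write to a temporary directory so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "names.csv")

# The file is opened in text mode; the encoding is handled by open(),
# not by the csv module.
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# Read the file back to check that the characters survived the round trip.
with open(path, newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f))
```

Here read_back should contain the same rows that were written, with the special characters intact.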