In some recent web scraping projects I extracted some data from a HTML document and saved it in a .csv file, using the csv module in Python. I used the BeautifulSoup module to parse and navigate the HTML, and since BS always encodes text in unicode, there was some real hassle when I tried to write special (non-ASCII) characters to the csv file since the csv module does not support unicode.
The documentation to the csv module provides some solutions to the problem, but I found that the easiest solution was to just install jdunck’s unicodecsv module. It has the same interface as the regular csv module, which is great. This means that if you already have a script that uses the regular module you can just write
import unicodecsv as csv (or whatever you imported csv as) and it should work.
I guess Python 3.x does not have this problem since all strings by default are unicode strings.