Most mornings, I do the puzzles in the Boston Globe’s g section. These happen to share a page with the horoscopes, which I like to read aloud to anyone within earshot. However, over time, I grew suspicious. The same predictions, even the same turns of phrase, seemed to pop up week after week. Someone always needed to keep a careful eye on their assets. Love was always “on the rise” for one person or another.
I wanted to find out whether I was simply being sensitive, or whether there really was meaningful repetition in the predictions.
THE DATA: http://www.uclick.com/client/bos/el/
The first drawback to starting with a question rather than a data set: I assumed that the online archive of horoscopes would be more extensive. Unfortunately, I discovered that only the last two weeks (i.e., March 7 – 19) are accessible. I decided to go forward with my smaller set anyway, because I was still curious and because two weeks seemed sufficient to at least explore my repetition hunch.
I wrote a small scraper to pull down all of the existing ‘scopes. (Shout-out to Harvard’s CS171 Visualization course and to the pattern.web Python module.) The data was then split two ways: by text alone, and by Zodiac sign.
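For anyone curious what a scraper like this involves, here is a minimal sketch. It is not my actual script: it uses only the Python standard library rather than pattern.web, and the names `TextExtractor`, `html_to_text`, and `fetch_text` are made up for illustration. It also assumes nothing about how the archive pages are laid out, so it simply grabs all visible text from a page; picking out individual signs would need page-specific logic.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a script or style element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch_text(url):
    """Download one page and return its visible text (makes a network request)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return html_to_text(response.read().decode(charset, errors="replace"))
```

Calling `fetch_text` once per day's page and saving the results would give the raw text to split up afterwards.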
If I wanted to do this more rigorously, I would need a good algorithm to suss out all possible repeating phrases. As it stands, I wrote a quick-and-dirty program to sort the text by individual words, sentences, and phrase pairs.
The most common phrases were…
love is on the rise 7
love is highlighted 5
deception is apparent 4
romance is in the stars 3
love and romance are on the rise 2
love and romance are highlighted 2
love is in the stars 2
love and romance are in the stars 2
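A tally like the one above can be approximated by counting every run of n consecutive words within each sentence. The sketch below is one hedged way to do that, not the program I actually ran: the function names and the `n` and `min_count` parameters are my own illustrative choices, and the sentence splitter is deliberately naive.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence):
    """Lowercase a sentence and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z'’]+", sentence.lower())


def repeated_phrases(horoscopes, n=4, min_count=2):
    """Count every n-word phrase (never crossing a sentence boundary).

    Returns (phrase, count) pairs seen at least min_count times,
    most frequent first.
    """
    counts = Counter()
    for text in horoscopes:
        for sentence in split_sentences(text):
            tokens = words(sentence)
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return [(p, c) for p, c in counts.most_common() if c >= min_count]
```

Running `repeated_phrases` over the list of scraped horoscope texts for a few values of `n` (say 3 through 7) would surface repeated phrases of different lengths without having to guess them in advance.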
Then, using Many Eyes, I ran my text through a few different word visualizations. Many Eyes is an IBM site that allows novice data journalists to play with information in a fairly easy way.
Overall, it helped a lot that I knew exactly what I was looking for when I started this assignment. (Does a data story start with the data? Or with what you want out of the data?) However, I feel like I excavated some fun facts rather than an actual “story.”
Regardless, I really want to scrape a whole year of horoscopes now.