Sruthi’s Media Diary

The big picture

By early Tuesday morning (21st of Feb), I had tracked about 5.5 days of media usage totalling about 46.6 hours. I used 4 distinct sources of technology – MacBook Air, iPhone, Echo and paper. I used RescueTime to track usage on my laptop, the Moment app to track usage on my phone, and my good ol’ brain for the rest.

Taking a top-down approach, here is my overall media usage broken down by category:

 Source: RescueTime, Moment and personal data collected; chart built using Plot.ly

My media usage amounts to about 35% of my day (46.6 hours out of the 5.5 days tracked). I spend the rest of my day commuting (without using media), in class and meetings, running errands, socializing, working out and sleeping. Given that sleep takes up nearly a third of my day (7 hours per day), my media consumption, though significant, is not a terrible statistic.
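A quick back-of-the-envelope check of the 35% figure, using only the numbers quoted above (46.6 hours of media over 5.5 tracked days, 7 hours of sleep per night). The last line is an extra view of the same data: media time as a share of waking hours only.

```python
# Figures taken from the diary above.
TRACKED_DAYS = 5.5
MEDIA_HOURS = 46.6
SLEEP_HOURS_PER_DAY = 7

media_share_of_day = MEDIA_HOURS / (TRACKED_DAYS * 24)  # share of all hours
media_hours_per_day = MEDIA_HOURS / TRACKED_DAYS  # average hours per day
sleep_share_of_day = SLEEP_HOURS_PER_DAY / 24
# Same media hours, but divided by waking hours only (24 minus sleep).
media_share_of_waking = MEDIA_HOURS / (TRACKED_DAYS * (24 - SLEEP_HOURS_PER_DAY))

print(f"media: {media_share_of_day:.1%} of the day, {media_hours_per_day:.1f} h/day")
print(f"sleep: {sleep_share_of_day:.1%} of the day")
print(f"media: {media_share_of_waking:.1%} of waking hours")
```

With these inputs, media comes to 35.3% of the day (about 8.5 hours per day) and sleep to 29.2%. Measured against waking hours alone, media accounts for roughly half (49.8%).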

Takeaway 1: My 35% daily media usage is spread across multiple sources and a multitude of tasks

From the smallest to the largest source of media consumption…

Echo (daily average ~ 10 minutes)

The Echo has been my primary news source in the last week. I listen to headlines and short articles from the NY Times, WSJ, BBC and The Economist as I get ready for the day.

I usually try to scroll through my NY Times, WSJ and BBC phone apps, but that usage has been minimal in the last week. Going through all my news apps takes 15-20 minutes in the morning, and I haven’t set aside that time since getting back to school. I also usually listen to news podcasts (The Economist and WSJ) on my walk to school, but with the snow and the weekend, my podcast listening has been non-existent.

Takeaway 2: Consume news (mostly headlines) during commute / multi-tasking

Print Media (daily average ~ 1-2 hours)

My print media usage is usually restricted to class readings – articles and cases. Given that I am taking 5 courses this semester, all of them qualitative, it makes sense that I read 1-2 hours daily to prepare for my average of 2 classes per day.

Takeaway 3: Print media restricted to coursework ~ associating print with serious media consumption

iPhone (daily average ~ 1.6 hours)

While my iPhone usage averages around 1.6 hours per day, following is a snapshot of my phone usage for a single day that reflects my day-to-day consumption. I learnt a lot about my phone habits, and they were pretty consistent with my love of productivity and my addictive Instagram use.

Using Moment app on my iPhone, I was able to track app usage by minute, location and time of day. Following is a snapshot for last Monday (20th Feb):

1. Throughout the day, I checked my phone 60 times, or on average once every 17 minutes (excluding 7 hours of uninterrupted sleep) … clearly a sign of addiction. Each check lasted anywhere from 2 to 44 minutes, with a median of 3 minutes, which reflects my fairly short attention span.

2. Home screen – I spend the majority of my time on the home and lock screens, which is where I receive alerts from my various news apps. This points to my sad habit of consuming news headlines as alerts (I mostly get updates from news apps and Outlook, and I check my phone periodically since it is always on night mode).

3. Productivity, productivity, productivity apps – Sweat, Outlook, Weather, Notes, App Store – my focus has been on working out, emailing / checking my calendar, taking quick notes, checking the weather and getting more apps to improve my productivity. I am not surprised by these usage numbers, since I seem to spend minimal time per app.

4. Social networking – Facebook, Instagram, WhatsApp – my Instagram usage is alarming. I prefer visual media, especially given my interest in following influencers in the food, travel, health and fitness space, and I feel Instagram is best suited to connecting with the influencers and brands I like.
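Here is the arithmetic behind item 1, written out: 60 checks spread over 17 waking hours works out to one check every 17 minutes. The helper function summarizes a day's session lengths the same way (shortest, longest, median). The session list is hypothetical and made up for illustration. It is not the Moment export.

```python
from statistics import median

WAKING_MINUTES = (24 - 7) * 60  # exclude 7 hours of sleep
CHECKS_PER_DAY = 60

minutes_between_checks = WAKING_MINUTES / CHECKS_PER_DAY
print(f"one check every {minutes_between_checks:.0f} minutes")


def summarize_sessions(durations):
    """Shortest, longest and median session length (minutes) for one day."""
    return min(durations), max(durations), median(durations)


# Hypothetical session lengths for illustration only.
print(summarize_sessions([2, 3, 3, 5, 44]))
```

The median is the better "typical check" measure here. A single 44-minute session would drag the mean upward but leaves the median untouched.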

Takeaway 4: Spend time reading in-depth investigative news articles rather than consuming news updates

Macbook (daily average ~ 3.8 hours)

I love using my laptop the most because of the screen space, and I find it the most convenient device for both work and entertainment.

  1. Too much entertainment – According to my RescueTime dashboard, I spend 40% of my time on entertainment and the rest on more productive applications like Outlook and Excel. Following is a screenshot of my overall usage last week by top applications used (source: RescueTime dashboard).
  2. Timeline analysis – I created the following heatmap for my three main categories (entertainment, communication and design) to understand my hourly usage patterns across the last few days. The richer the color, the more time spent in that category.

Source: RescueTime data; heatmap built with Excel
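Behind a heatmap like this sits a simple table: minutes per category per hour of the day, summed over the tracked days. Below is a minimal sketch of building that 3 × 24 table. It assumes a hypothetical export made of (hour of day, category, seconds) rows, which may not match RescueTime's actual export format. The rows themselves are made up for illustration.

```python
# Hypothetical rows: (hour_of_day, category, seconds). This layout is an
# assumption for illustration, not RescueTime's actual export format.
rows = [
    (9, "communication", 1200),
    (9, "design", 1800),
    (14, "design", 2400),
    (21, "entertainment", 3000),
    (21, "entertainment", 600),
]

CATEGORIES = ["entertainment", "communication", "design"]


def hourly_grid(rows, categories=CATEGORIES):
    """Return {category: [minutes for hours 0..23]}, the table behind a heatmap."""
    grid = {c: [0.0] * 24 for c in categories}
    for hour, category, seconds in rows:
        if category in grid:  # ignore categories outside the three tracked ones
            grid[category][hour] += seconds / 60
    return grid


grid = hourly_grid(rows)
for category, minutes in grid.items():
    # One text row per category, one column per hour.
    print(f"{category:>13}", " ".join(f"{m:3.0f}" for m in minutes))
```

One way to recreate the Excel version is to paste these rows into a sheet and apply a color-scale conditional format. Richer colors then mark the hours with more minutes.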

My main takeaway is that my productive work hours run from 9 am until 8 pm, and for the rest of the time I waste time watching Netflix.

Takeaway 5: Give up Netflix!!!!

Overall, I notice my media consumption is very self-centered, serving my own interests. I would be curious to learn how a non-participatory citizen such as myself could be influenced by subjects outside my interest areas, and how these topics could enrich my life.

What does Hillary Clinton’s Inbox look like?

I, too, am tired of hearing about Hillary’s use of a private email server. On the other hand, it led to a pretty neat data set to unpack: a dump of emails she’s sent and received.

I played around with this data set a bit and was particularly interested in how different groups of people interacted with Hillary. Did men use shorter sentences than women, for example? Did her staffers send one-liners versus ambassadors who sent lengthy emails? Did she have interesting relationships with people we might not be familiar with?

I didn’t get a chance to answer all of these questions, but I ended up being interested in the way words in her email were clustered, and decided to come up with a visualization based on that.

For a simple representation to start, I created a scatter plot visualization using mpld3, which creates interactive matplotlib graphs for the browser. It’s clunky to navigate (you need to switch to zoom mode, drag a rectangle over the portion of the graph you want to zoom in on, then switch back to cursor mode to hover over words), but as a first step it’s interesting to see which words appear together.

Isn’t it interesting that “bipartisan” appears well outside the main cluster of words?

Lesson learned along the way: visualizing text is hard. I found that the standard text visualizations out there, such as word clouds or circle packing, were too reductive for some of my data, like topic models or k-means clusters.

While I didn’t create data visualizations for some of the questions I posed earlier, I do have some statistics:

For males:

6187 sentences
83764 word tokens
10762 word types
7.78 average tokens per type
13.54 average sentence length
5.01 average word token length
7.34 average word type length
Hapax legomena (words that appear only once – an indicator of vocabulary usage) comprise 49.60% of the types

For females:

22845 sentences
369517 word tokens
30386 word types
12.16 average tokens per type
16.17 average sentence length
4.94 average word token length
7.84 average word type length
Hapax legomena comprise 49.76% of the types
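The ratios above hold together. Average sentence length is word tokens divided by sentences (83764 / 6187 ≈ 13.54). Average tokens per type is tokens divided by types (83764 / 10762 ≈ 7.78). Below is a minimal sketch of computing the same statistics for one group's emails. It relies on a naive regex tokenizer, which is an assumption on my part: the original figures were probably produced with a different tokenizer, so exact counts would differ.

```python
import re
from collections import Counter


def corpus_stats(text):
    """Sentence, token, type and hapax statistics for a block of text.

    Assumptions: sentences end at '.', '!' or '?', and a token is a run of
    lowercase letters, digits or apostrophes after lowercasing.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(tokens)
    types = list(counts)
    hapaxes = [w for w, c in counts.items() if c == 1]  # words seen once
    return {
        "sentences": len(sentences),
        "tokens": len(tokens),
        "types": len(types),
        "tokens_per_type": len(tokens) / len(types),
        "avg_sentence_length": len(tokens) / len(sentences),
        "avg_token_length": sum(map(len, tokens)) / len(tokens),
        "avg_type_length": sum(map(len, types)) / len(types),
        "hapax_share": len(hapaxes) / len(types),
    }


stats = corpus_stats("The memo is ready. Please print the memo!")
print(stats)
```

Note that average word type length comes out longer than average token length. Short function words ("the", "is") repeat often, so each one counts many times as a token but only once as a type.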

I analyzed all of Taylor Swift’s lyrics so you don’t have to.

At the 58th Grammy Awards earlier this year, Taylor Swift became the first woman to win Album of the Year twice for a solo album.

By the numbers, this shouldn’t come as a shock. Swift, an objectively gifted singer, songwriter, and performer, has had a wildly successful career by any metric. That said, if I had to list the top 10 female performers of my lifetime, I’m not sure Swift would make the cut. Like culture critic Camille Paglia, who put it so delicately in The Hollywood Reporter, I find her music to be “mainly complaints about boyfriends, faceless louts who blur in her mind as well as ours.”

While the internet is rife with Taylor Swift listicles analyzing the lyrics of her songs, data-driven analysis is scarce (or, more likely, just private). So, in the spirit of collect and verify, I decided to do a textual analysis of TSwift’s work using Word Counter to see just how boy-centric her lyrics actually were.

True to Sands’ prediction from last class, 80% of my time was spent on data collection, 15% on sifting through said data, and I’m wrapping up the remaining 5% now. Using the lyrics database AZLyrics, I combed through the many, many songs of Taylor Swift. To date, she has released five studio albums, two live albums, two video albums, two extended plays (EPs), 37 singles, three featured singles, and eleven promotional singles. To keep things simple, I decided to stick with her five studio albums: Taylor Swift (2006), Fearless (2008), Speak Now (2010), Red (2012), and 1989 (2014).

Word Counter is a pretty straightforward tool: it counts the words, bigrams, and trigrams in a plain text document, which you can either paste directly into the browser or upload to the site. From there, you can download the single-word counts, bigrams (2 contiguous words), and trigrams (3 contiguous words) in .csv format. Between the five albums, I copied in text from 69 songs and then downloaded the data.
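Here is a minimal sketch of the counting Word Counter performs, for anyone who wants to redo it offline. I don't know how Word Counter tokenizes text internally, so this is only equivalent in spirit: lowercase the text, split it into words, then count every run of n contiguous words.

```python
import re
from collections import Counter


def ngram_counts(text, n):
    """Count contiguous n-word sequences (n=1 words, 2 bigrams, 3 trigrams)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # zip(words[0:], words[1:], ..., words[n-1:]) yields each n-word window.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


trigrams = ngram_counts("shake it off shake it off", 3)
print(trigrams.most_common(1))
```

To compare two albums, run `ngram_counts` on each album's lyrics and compare their `most_common(10)` lists. If you run each song separately and add the Counters together, no n-gram will span the end of one song and the start of the next. Pasting a whole album as a single block would produce such cross-song n-grams.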

Then the process became a bit less straightforward. Comparing single-word counts of individual songs and albums side by side didn’t really give me much useful insight, not to mention it’s a fairly boring way to see the data. I decided to compare Swift’s two “Albums of the Year,” 1989 (in blue) and Fearless (in magenta), by plugging the songs’ text into Tagul, a very user-friendly word cloud art generator.

[Word clouds: 1989 (blue) and Fearless (magenta)]

Other than showing that Ms. Swift is a thematically consistent songwriter, this didn’t give me much to go on. What if I compared the two albums’ most frequently used trigrams?

[Chart: most frequent trigrams, 1989 vs. Fearless]

Aha, now we were getting somewhere. Where Fearless (right) reinforces my earlier criticism, the trigrams from 1989 (namely, from the song “Shake It Off”) focus more intensely on Swift herself. As she explained to Rolling Stone in 2014: “When you live your life under that kind of scrutiny, you can either let it break you, or you can get really good at dodging punches. And when one lands, you know how to deal with it. And I guess the way that I deal with it is to shake it off.”

Ultimately, my textual analysis should have been part of a broader investigation that also compared songs Swift wrote with those she co-produced and weighted each song by its popularity. From the data I did collect, it seems Camille Paglia and I should maybe give Swift another chance: the pop star is shifting tone, however incrementally, away from the lovestruck ballads of albums past.

What Poverty Steals

I’m fascinated by the tangle of life expectancy, wealth and poverty, income inequality and social mobility.

My data viz was prompted by new research detailing poor people’s shorter life expectancies and by an article in The Atlantic about the 47 percent of cash-strapped Americans who said they couldn’t come up with $400 for an unexpected expense.

Economic stability is about income, but it’s also about assets and wealth. It’s about having a cushion to shield you from the inevitable unexpected expense of car repairs or a medical emergency.

Poor people don’t have that margin of error, which is one of the reasons economic mobility is so low. Kids who are born poor in Shelby County (home to Memphis) die poor. Only 2.6 percent of children raised in the bottom quintile of household income in the Memphis area rise to the top quintile by adulthood. According to a New York Times interactive, “Shelby County is very bad for income mobility for children in poor families. It is better than only about 9 percent of counties.”

Since Shelby County is majority black and a disproportionate share of its poor people are black, I wanted to focus specifically on black people. (Hispanics also have an insanely high poverty rate, but there are relatively few of them in Shelby County/Memphis, and most are recent immigrants.)

Here’s what I wanted to determine for people in the county where I live, Shelby County:

If poor people had the same life expectancy as rich people, how much more could they expect to earn over those additional working years? If you add up all those dollars, how many millions are poor black people in Shelby County forfeiting simply because they’re poor, black and live where they do?

If I could answer this question, I wanted to show the data the way Periscopic animated the years lost to gun deaths: http://guns.periscopic.com/?year=2013

[Image: Periscopic’s gun deaths visualization]

Spoiler alert: I don’t have the data to answer the question I was trying to answer, especially not on the dollar side.

Nevertheless, I made a short video (1:06), and I figured out how to add music.

I was going to build a little bar chart showing the difference in life expectancy between poor and rich people in Shelby County, or one comparing the income gap in life expectancy across the biggest counties in Tennessee, but there wasn’t a whole lot of difference. Besides, I already know how to make a bar chart, so I was trying to do something I didn’t know how to do.

*** I’m pretty sure my math is all wrong, because there are far more than 26,000 black adults in Shelby County who are poor (defined as living in the bottom quartile of household income). I would love to think through how to answer my question with someone who knows.
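One possible way to structure the calculation, offered as a hedged sketch rather than an answer. Total forgone earnings ≈ number of people × extra working years × annual earnings. Only the part of the life-expectancy gap that falls before retirement counts as working years. Everything below is an assumption: the retirement age of 65, the undiscounted totals, and the example inputs, none of which are Shelby County data. One consequence to check: if the life-expectancy figures are for people who have already reached middle age, most or all of the gap may fall after retirement. In that case the dollar total would shrink sharply.

```python
def extra_working_years(poor_life_exp, rich_life_exp, retirement_age=65):
    """Working years gained if poor people lived as long as rich people.

    Only the part of the life-expectancy gap that falls before retirement
    age adds earning years; years gained after retirement add none.
    """
    poor_end = min(poor_life_exp, retirement_age)
    rich_end = min(rich_life_exp, retirement_age)
    return max(0, rich_end - poor_end)


def forgone_earnings(people, poor_life_exp, rich_life_exp, annual_earnings,
                     retirement_age=65):
    """Undiscounted total: people x extra working years x annual earnings."""
    years = extra_working_years(poor_life_exp, rich_life_exp, retirement_age)
    return people * years * annual_earnings


# Hypothetical inputs only: 1,000 people, life expectancy 58 vs. 70,
# $20,000 a year in earnings.
print(forgone_earnings(1_000, 58, 70, 20_000))
```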

Profile (belated) and data story: Sands Fish

I interviewed Sands Fish for our class profiles assignment months ago and decided to try to profile him through the medium in which he is an expert: data visualization. However, I ran into a roadblock that I wasn’t able to resolve until our data visualization class. So I’m combining two assignments in one and finally presenting my results.

After Sands and I talked, I transcribed 25 minutes of our interview, including even the “um”s and “yeah”s. Then I analyzed the text from several different perspectives, trying to echo Sands’ work with MediaCloud, which crunches massive amounts of data to discover the relationships between words and the people who use them. In our case, I wanted to get a visual representation of the themes and rhythm of our interview.

First, I analyzed the language we each used. Here are the words I used most often:

[Chart: my most frequently used words]

And the ones Sands used most often:

[Chart: Sands’ most frequently used words]

There wasn’t a lot of overlap.

Then I counted the number of words in each uninterrupted chunk of speech and made a spreadsheet recording each of those chunks under our respective names, with the minute timestamp interspersed. For example, here are the first five minutes:

[Spreadsheet: word counts per uninterrupted chunk of speech, first five minutes]

Here is a streamgraph that shows our individual share of the conversation, and the overall give and take. I used total words per person per minute to produce this graph on raw.densitydesign.org:

[Streamgraph: words per person per minute]
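Below is a minimal sketch of the aggregation step. It assumes the transcript is stored as hypothetical (minute, speaker, text) chunks, and the chunk text is placeholder, not taken from the interview. It counts the words in each uninterrupted chunk and sums them per speaker per minute. It also writes a zero wherever a speaker says nothing, on the assumption that a streamgraph tool such as RAW works best with one row for every minute/speaker pair.

```python
import csv
import io
from collections import defaultdict


def words_per_minute(chunks):
    """Sum the words in each uninterrupted chunk per (minute, speaker)."""
    totals = defaultdict(int)
    for minute, speaker, text in chunks:
        totals[(minute, speaker)] += len(text.split())
    return totals


def to_long_csv(totals):
    """Long-format CSV (minute, speaker, words), zero-filling silent minutes."""
    minutes = sorted({m for m, _ in totals})
    speakers = sorted({s for _, s in totals})
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["minute", "speaker", "words"])
    for m in minutes:
        for s in speakers:
            writer.writerow([m, s, totals.get((m, s), 0)])
    return buf.getvalue()


# Placeholder chunks for illustration only; not the real transcript.
chunks = [
    (0, "me", "placeholder question text"),
    (0, "Sands", "a longer placeholder answer with more words"),
    (1, "me", "yeah"),
    (1, "Sands", "another placeholder answer"),
]
print(to_long_csv(words_per_minute(chunks)))
```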

Then I took a more granular look at the first 10 minutes of conversation, using cumulative word count instead of minutes as the x-axis value. That gave me a better sense of the frequency of volleys between us, and the duration of each uninterrupted chunk of speech:

[Chart: first 10 minutes of conversation by cumulative word count]
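The cumulative word-count axis works like this: each chunk begins where the previous one ended. The chart's two quantities reduce to two small functions, sketched below. The first gives the span of each chunk on that axis, and the second counts how many times the turn changes hands. The input is a hypothetical list of (speaker, word count) pairs in conversation order, and the example values are made up.

```python
from itertools import accumulate


def cumulative_spans(chunks):
    """Place each chunk on a cumulative word-count axis.

    chunks: list of (speaker, word_count) in conversation order.
    Returns (speaker, start_word, end_word) for each chunk.
    """
    ends = list(accumulate(count for _, count in chunks))
    starts = [0] + ends[:-1]
    return [(speaker, start, end)
            for (speaker, _), start, end in zip(chunks, starts, ends)]


def count_volleys(chunks):
    """Number of times the turn passes from one speaker to the other."""
    return sum(1 for (a, _), (b, _) in zip(chunks, chunks[1:]) if a != b)


# Made-up example, not the interview data.
example = [("me", 12), ("Sands", 85), ("me", 3), ("Sands", 40)]
print(cumulative_spans(example))
print(count_volleys(example))
```

Short spans belonging to the interviewer, wedged between long ones, are the "yeah / OK / mhmm" interruptions that this view makes visible.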

Here are a few takeaways I gleaned about my interview style by representing the interview visually:

  • I affirm understanding in lazy ways (yeah, OK, mhmm), and I interrupt a lot.
  • It would be better to remain silent until the end of my interviewee’s explanation, and then affirm my understanding with a summary that uses key words and phrases he or she has shared.
  • Overall, the share of conversation is roughly appropriate for interviewer and interviewee, though the spike at minute 22 represents a story I shared that probably didn’t add much to the interview.

The American Community Survey – 3 in 1: explainer, engagement, data story

I have thought about creating a census fan page many times. Looking at data all day makes one appreciate the history, scale, and effort of this massive public endeavor. Not only does the census provide official guidance for public funding and policy, it has also, over the years, ritualistically structured our understanding of our environment. Since 1790, the census has evolved not just to keep up with the massive increase in population (from under 4 million to 318 million today) and migration (from 5.1% urban to 81% in 2000), but its format has also changed to reflect our attitudes. In this three-part (hopefully) set of assignments and makeup assignments, I focused on explaining and visualizing the American Community Survey (ACS), a newer data offering of the census: a yearly long-form survey of a 1% sample of the population.

Last summer, while interning at a newsroom, I built a Twitter bot based on the ACS, inspired by how nuanced and evocative the dataset is in its original, as-collected format. Each tweet is one person’s data reconstituted into a mini bio. In the year since, people have retweeted entries that are absurd or sad, but most often entries that reminded them of themselves or someone they know. It quickly became clear that narratives are more digestible than data plotted on a map. However, I was at a loss for how to extend this line of inquiry to fit more data into bigger narratives.
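Here is a minimal sketch of the "record to mini bio" idea, to show the general shape of such a bot. This is not the bot's actual code. The field names (age, sex, state, occupation, commute_minutes) are hypothetical stand-ins rather than real ACS variable codes. The 140-character cap matches Twitter's limit in 2016.

```python
def mini_bio(record, limit=140):
    """Turn one survey record into a short, tweet-sized sentence.

    The field names are hypothetical stand-ins, not actual ACS variable codes.
    """
    bio = f"{record['age']}-year-old {record['sex']} in {record['state']}"
    details = []
    if record.get("occupation"):
        details.append(f"works as {record['occupation']}")
    if record.get("commute_minutes"):
        details.append(f"commutes {record['commute_minutes']} minutes to work")
    if details:
        bio += "; " + "; ".join(details)
    bio += "."
    # Trim to the tweet length limit, marking the cut with "...".
    if len(bio) > limit:
        bio = bio[: limit - 3] + "..."
    return bio


print(mini_bio({"age": 34, "sex": "woman", "state": "Ohio",
                "occupation": "a nurse", "commute_minutes": 25}))
```

A template like this skips missing fields instead of printing blanks. That choice keeps each bio short, but it also means two very different people can end up with similar-sounding tweets.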

Part of my research is to experiment with ways of making public data accessible so that individuals can make small incremental changes to improve their own environment. Many of these small daily decisions are driven by public data, but making the underlying data public is not always enough. While still plotting data on maps regularly, I started to think about narratives. Can algorithmically constructed narratives and narrative visualizations stand alone as long-form creative nonfiction?

There are so many wonderful public data projects out there that go the extra step. Social Explorer does a great job of aggregating the data, and so, actually, does ancestry.com. Projects from timeLab show many examples of how census data has been used for a variety of purposes, even entertainment. And just last week, the Macro Connections group unveiled a beautiful and massive effort to expose public datasets with datausa.io, which takes data all the way into a story presentation.

Constraints are blessings…

It’s fortunate that I work in such a time and environment but also very intimidating. What can I contribute to an already rich body of work where each endeavor normally requires many hours and even months of teamwork, not to mention the variety of skills involved? More selfishly, what can visual artists add to the conversation that is beyond simply dressing up the results? This series of 3 assignments is a start.

1. Explainer – the evolution of the census

Instead of focusing on how the population has changed, here is a visualization of how census questions have changed to reflect the attitudes and needs of the times. Unfortunately this was unfinished and only goes from 1790 to 1840 right now.

[Visualization: census questions, 1790 to 1840] View close-ups here: 1790_1840

2. Engagement – how special are you?

I have been procrastinating by spending a lot of time playing Guess the Correlation. I think BuzzFeed-style quizzes are one of the best data collection tools. Of course, there is also this incredible NYT series. People who commented on the census bot often directly addressed tweets that described themselves. This is an experiment to get people to learn something about the data by letting them place themselves in it.

[Screenshots: census quiz] This is also still very much in progress: http://jjjiia.github.io/censusquiz/
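The core calculation behind a "how special are you?" answer could be as small as the sketch below. That is my assumption; it is not the quiz's actual code. One ACS record stands for many people, so the share of people "like you" should add up survey weights rather than count raw records. In the ACS public-use microdata, the person weight column is PWGTP. Here it is replaced by a hypothetical "weight" field, and the records are made up.

```python
def share_like_you(records, answers, weight_key="weight"):
    """Weighted share of the population whose record matches every answer.

    Each survey record represents many people, so matches are summed by
    weight rather than counted. 'weight' is a hypothetical field name.
    """
    total = sum(r[weight_key] for r in records)
    matched = sum(
        r[weight_key]
        for r in records
        if all(r.get(k) == v for k, v in answers.items())
    )
    return matched / total


# Hypothetical records for illustration only.
sample = [
    {"state": "MA", "commute": "walk", "weight": 50},
    {"state": "MA", "commute": "car", "weight": 30},
    {"state": "TN", "commute": "walk", "weight": 20},
]
answers = {"state": "MA", "commute": "walk"}
share = share_like_you(sample, answers)
print(f"{share:.0%} of people are like you")
```

Each extra question shrinks the matching share, so after a few answers the quiz can tell almost anyone that they are rare.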

3. Data Story

To be continued …

Parity Pool

Storytelling with data requires patience, reliable sources, and creativity. I was excited to browse the aggregated data sets on the newly launched DataUSA.io website, and I soon found myself lost in statistics about occupations, income distribution, and wage gaps in the United States. Ultimately, I decided to explore the educational data provided on Computer Science degree programs. I wasn’t exactly sure what I would find, but I knew I wanted to look at issues of diversity within the technology sector. Visit http://partnews16-722286.silk.co/ to see what I discovered.

[Screenshot: Parity Pool]

Visualizing Comments by Gender at the NYT

I recently read Emma Pierson’s study about commenters and gender at the NYT. I thought it was a great piece with compelling data, some of which I tried to pull out in the following infographic.

A few challenges: the program I used didn’t allow me a lot of flexibility in terms of editing the charts, so I had to be creative about which points I chose to pull out of her findings. This visualization also uses word clouds, which some folks find terribly unsophisticated, but I really liked the visual comparison of the types of words that men and women use in comments on the same articles side by side.

Without further ado, here’s the visualization…(unfortunately, I had to paste in a screenshot because the original png file wouldn’t copy into this, so the quality on this version is a little lower than I would have hoped)

[Infographic: NYT comments by gender]

It’s low-cost energy, stupid.

Recently, the Department of Energy announced it will participate in the development of the Plains & Eastern Clean Line Project (Clean Line), a major clean energy infrastructure project that will bring low-cost renewable power to my home state of Arkansas, as well as Tennessee and other markets in the Mid-South and Southeast. The approximately 700-mile, high-voltage direct current transmission line and its associated facilities have the capacity to deliver 4,000 megawatts (MW) of wind power from the Oklahoma Panhandle region.

The all-Republican Arkansas congressional delegation has already issued a statement against the decision, citing executive overreach. Yet in a state whose per capita GDP of $40,924 falls well below the US average of $54,307, and where access to inexpensive energy is hard to come by, I thought the case for the project deserved to be made.

For this assignment, I designed the following graphic for the Arkansas Times, the state’s go-to alternative news source.

[Graphic: Clean Line Energy]
