Request and investigate public records with MuckRock

By Aaron and Drew

Online tools can lower the effort that journalists need to put into researching their stories. Lowering this activation energy can spur new types of journalists, reimagined forms of engagement, and entire communities centered around this new media.

MuckRock is a perfect example of such an ecosystem forming around a tool that made a previously burdensome task easy. Requesting information from the government can be daunting, but MuckRock guides you through everything and even digitizes what is otherwise often a snail mail process.

However, MuckRock is not just about requesting public records. It’s also about everything that comes after. People can track each other’s requests, write articles based on the public records, and even crowdsource donations to support more investigative research. (from: https://www.prx.org/group_accounts/190663-muckrock)

What is the Freedom of Information Act?

President Lyndon B. Johnson signed the Freedom of Information Act (FOIA) into law in 1966. The law mandates the disclosure of government records to anyone who requests them, citizen or not. There are a few exemptions, such as records involving national security or the locations of wells. In 2015, the government received over 700,000 such requests, of which approximately 25% were granted in full and 45% in part.

The law mandates a response from agencies within 20 business days. Agencies are allowed to charge requesters for time and materials.
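To get a feel for what the 20-business-day window means on a calendar, here is a minimal sketch. It assumes that only Saturdays and Sundays are skipped (federal holidays are ignored) and that counting starts the day after the request is received. The function name `foia_due_date` is made up for illustration and is not part of MuckRock.

```python
from datetime import date, timedelta


def foia_due_date(received: date, business_days: int = 20) -> date:
    """Return the date `business_days` working days after `received`.

    Assumption: only weekends are skipped; federal holidays are ignored.
    """
    current = received
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current


# A request received on Wednesday, February 1, 2017
print(foia_due_date(date(2017, 2, 1)))
```

Under these assumptions, 20 business days works out to exactly four calendar weeks whenever the request arrives on a weekday.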

The FOIA is a federal law that only applies to federal agencies in the executive branch. All 50 states and the District of Columbia have passed their own Freedom of Information laws that are generally very similar to the federal version.

What exactly is MuckRock?

MuckRock is a service that helps journalists file and manage FOI requests to a variety of federal, state, and local agencies. Since 2010, it has released more than one million government documents: http://www.bostonmagazine.com/news/blog/2016/07/03/muckrock-foia-turns-50/

How do you request a public record?

It’s great you asked! We wondered the exact same thing, so we went ahead and requested our own with MuckRock. The process is simple. All you need to do is sign up for an account, pay a nominal fee ($20 for 4 requests) and then make your request.

With our request, we have asked the FBI to release all records pertaining to foreign cyber attacks against American universities.

MuckRock tracks the average response times of various agencies. Here are some examples:

Agency | Average Response Time | Required Response Time | Success Rate | Average Cost
Federal Bureau of Investigation | 130 days | 20 days | 21% | $2661.30
Central Intelligence Agency | 156 days | 20 days | 9.5% | $28.30
Department of Justice | 211 days | 20 days | 8.2% | Not Available
Massachusetts Bay Transportation Authority | 84 days | 10 days | 38% | $2082.84
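To put those delays in perspective, the short sketch below divides each agency's average response time by its required response time, using the figures from the table above. It treats both columns as the same kind of day, which is an assumption (the legal deadline is counted in business days). The abbreviated agency keys and the `overrun_factor` helper are ours, for illustration only.

```python
# (average response days, required response days), copied from the table above
agencies = {
    "FBI": (130, 20),
    "CIA": (156, 20),
    "DOJ": (211, 20),
    "MBTA": (84, 10),
}


def overrun_factor(average_days, required_days):
    """How many times longer than the deadline the average response takes."""
    return average_days / required_days


overruns = {name: overrun_factor(avg, req) for name, (avg, req) in agencies.items()}

# Print agencies from slowest to fastest relative to their deadline
for name, factor in sorted(overruns.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {factor:.1f}x the required response time")
```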

How do public records turn into journalism?

Once a requester hears back from the government, she can use the released information in her reporting. Additionally, MuckRock writes its own articles using the public records surfaced by users.

This reporting has the ability to close the loop of the FOIA process and hold parties accountable for actions that might otherwise go unnoticed. Articles on MuckRock are often very timely. Some recent examples include “Boston Police underestimated size of Women’s March protests by nearly 150 thousand” and “EPA Transition docs detail many of the regulations Trump could roll back”.

In a departure from what you normally see in journalism, the articles are often centered around a piece of evidence, such as a police report, FBI file, or government document. Not only is the evidence there for you to see and inspect, but you can also look up the history of how and where it was obtained.

This has the ability to change how readers interact with the news they’re consuming. They can inspect the evidence themselves, forming their own judgement, and even develop ideas on how they may further the reporting in the future — transforming them from consumers into producers. It also introduces transparency that can help instill confidence in the media.

Who funds all this?

Individuals can request their own public records (like we did!), thus supporting the MuckRock community through these one-off requests. However, they can also help fund larger projects that are centered around a particular topic and require substantial funding. In this sense, MuckRock serves as a crowdfunding website.

Example: https://www.muckrock.com/project/the-private-prison-project-8/

Quick data visualizations

The need for data visualization

With buzzwords like big data and data science on the rise, readers increasingly expect data as proof. Additionally, with the growing popularity of blogs like FiveThirtyEight, reporting is slowly becoming more data-oriented. The onus therefore lies on media content producers to use advanced data analysis to make their points. However, data is complicated to analyze and even harder to communicate; data visuals can help do both effectively.

While researching easy data visualization, I noticed that the majority of the recommendations revolved around using programming languages like R and Python. Learning how to code is a mammoth task for writers whose main focus is on researching and delivering the story. Writers need a tool that helps them analyze data and build visuals with a few clicks. A tool like Plot.ly.

What is Plot.ly?

Plot.ly addresses the challenge of creating data visualizations without extensive knowledge of programming and data visualization techniques. Plot.ly’s website and blog showcase a number of samples of how leading news sites have used Plot.ly visualizations in their articles. For example, below is a sample visual showing statistical analysis in a NYTimes article:

Source: NYTimes 2014 Article – How birth year influences political views

Some of my favorite tools on the platform (image below) are:

  • The ability to input data in an Excel-like layout and pick from over 20 different chart types
  • Charts that enable reader interaction
  • Statistical analysis tools like ANOVA on a web-based platform
  • Reverse coding, which lets users get the code behind the visualizations in case they want to create the same visuals using other programming languages

What does data visualization mean for advancing journalism?

I think Plot.ly helps advance journalism and storytelling by:

  • Saving writers time that can go toward constructing and presenting stories instead of being spent on visual designers, which also lets journalists publish stories faster.
  • Increasing interaction with readers. Declining attention spans have made it harder and harder to engage readers through various types of media, so interactive visuals could help engage readers and increase participation.
  • Integrating diverse communities – Using technology platforms like Plot.ly could help increase interactions between diverse groups like technologists and journalists, helping advance each other’s causes.

Plot.ly resources

Multiple tutorials are available on the website, from creating charts to analyzing data using sample data sets.

Genevieve’s Bio

Hi All!

My name is Genevieve and I am studying Risk and Resilience at the Harvard Graduate School of Design. My background is very interdisciplinary, but I am broadly interested in power dynamics and how they play out at the intersections of various fields and technologies. A few examples of related projects that I have been working on recently examine:
  • The role of bias, transparency and accountability in AI through The Future Institute at HKS
  • The relationship between neuroscience and the risks associated with astronauts’ spacesuits through SEAS
  • Applications of soft robotics at the Wyss Institute
  • Data extraction and the creation of a data economy in the Arctic through the Harvard Urban Theory Lab
Prior to coming to Cambridge, I was teaching design ethics and intercultural communications in social innovation and technology at the University of British Columbia, Kaospilot, RISD and the Pratt Institute. I have also founded a jewellery company that focuses on international mining policy. We partner with organizations such as the UN, OECD, USAID, etc. on issues relating to property rights and conflict in mineral extraction.
In this course, I am interested in exploring the implications of power structures and technologies across media.
I am from Vancouver, Canada. I really love to surf and snowboard and am solidly mediocre at both.

Tools: Twitter is an oldie but it still jams

I already know. Twitter is not the newness. But that’s why I’m taking this class – to discover new tools and think differently. In the meantime, when I’m on deadline writing three columns a week, sometimes I feel like this:

But Twitter’s Advanced Search often comes through in a pinch. When there’s a breaking news story in the #blacklivesmatter movement or something trending in underserved communities, Twitter often has the news first, and Advanced Search allows you to zero in on specific dates, people and even geographic locations. You can search by specific tweeters, hashtags or general phrases, making it easier to source, fact check and connect. That makes me as happy as Solange when you don’t touch her hair. Don’t touch mine either.

Google Translate: A keystone for global communication

Google Translate is a tool that most of us already know and use. As one of the more popular Google products, it currently serves 500 million monthly users. While Google Translate historically may have been helpful for casual browsers of the internet, it wasn’t reliable enough to depend on completely for everyday conversation, nor for a comprehensive understanding of a foreign website.

Google’s recent update of Google Translate, however, has changed that. As of December last year, Google introduced AI into Google Translate, making the product astoundingly better. NYTimes shares the below example:

“Uno no es lo que es por lo que escribe, sino por lo que ha leído.”
With the original Google Translate: “One is not what is for what he writes, but for what he has read.”
With the new A.I.-rendered version: “You are not what you write, but what you have read.”

The difference is stark. Not only has the improvement enabled more coherent and seamless translations, but the Google Neural Machine Translation tool is now able to translate between two languages that haven’t previously been linked. That is, Google Translate (figuratively speaking) has its own language that it translates all languages into, thus enabling it to translate between pairs of languages that it hasn’t explicitly linked. This improvement opens the door to more language pairings without much of the previous heavy lifting of explicitly linking one language and translating it to another.

This change has interesting implications for the future of news. It makes international news articles accessible to everyone. It allows journalists much easier and faster (and more reliable) access to sources–whether they be other people or documentation and data. More data will simply be more accessible.

It may also have implications for the labor force in the news industry–local speakers may eventually not be needed for reporting. How might this change the type of coverage we get? In a time when some news articles are already written by bots, will Google Translate improve our coverage because we can “understand” more? Or will this make news stories even more impersonal and spotty as we miss cultural nuances and context that only a local expert can provide? The potential implications seem both exciting and daunting.

Sources and more information:

Google’s AI translation tool seems to have invented its own secret internal language

Hi! I’m Aileen.

Hi! I’m Aileen, a second-year Sloan MBA who is coffee & pastry obsessed. In a world where I have oodles of money, I would own a high-end bakery, and smell the smell of baking croissants all day. I hum when I feel awkward.

But perhaps more relevantly–

  • My Background: 
    • Education: Majored in Political Science, Minored in Economics. Originally I wanted to be a journalist to pay the bills as I worked my way through the next great American novel. Was fascinated most in my classes by the role of media in political society.
    • Work Experience (journalism ended up not working out): 
      • Advertising: I used analytics and statistics to optimize media placement, brand messaging, and media mix for clients like JetBlue and Match.com.
      • Google: Decided I wanted to understand how businesses worked. I helped launch and grow a new product, and also did operations strategy.
      • Entrepreneurship: Creating your own product felt compelling, and still does. I am a co-founder of Armoire, a startup that was in MIT’s summer accelerator this past summer and is still going strong.
  • My Personal interests:
    • Better media for the average person: After studying mass media in American democracy during my undergrad, I struggled with some of the shortcomings in today’s media: the sensational headlines, dizzyingly short news cycles, parachute journalism, and inaccessibility to the average American. I’m passionate about finding a media structure that is engaging and educational for everyone, not just people who read The Economist.
    • Food science: Because science makes everything tasty!
    • Other things I do in my free time: Learning how to photograph & edit, blogging & writing, learning French, baking, and learning how to gracefully lose at chess.

 

Mic Check: Jeneé O.

Peace! I’m a Nieman Fellow (Nieman Foundation for Journalism at Harvard). I’m also a lifestyle columnist and culture critic at The Kansas City Star where I write about race, gender and civil rights issues through the lens of pop culture.

Journalism is rapidly changing and we can’t just change with it, we have to innovate, too. And it’s important to me that we think about how to do that inclusively. Diversity and accessibility in digital storytelling are a must.

When I’m not learning as much as possible and representing for my Hogwarts family, I’m walking my two boxers or listening to trap music and doing yoga. You can find me on Twitter @jeneeinkc.

 

Overview: Find stories faster in massive document dumps

If you were tasked with reviewing and making sense of a huge stack of documents you’ve never seen before, you would probably go about it in a pretty standard way. Skim the first page and make a quick decision about whether it’s relevant or about a specific topic, then move to page two and make that decision again. After a few pages, you might have a few separate piles describing what you’ve seen so far.

As you continue reading, the piles might get more sophisticated. In one pile, you might place emails containing specific complaints to the school board. In another, policy proposals from a public official’s top adviser. On and on you go until you get through enough of the pile to have a fairly good idea of what’s inside.

For investigative journalists reviewing massive document dumps — responses to public records requests, for example — this may be one of the very first steps in the reporting process. The faster reporters understand what they have, the faster they can decide whether there’s a story worth digging into.

Overview, a project to help journalists sift through massive document dumps

Making sense of documents as efficiently as possible is the primary purpose of Overview, an open-source tool originally developed by The Associated Press and funded by a collection of grants from the Knight Foundation and Google, among others.

Upload your documents into Overview and it will automatically process them, first using optical character recognition. It then uses a word-weighting scheme called term frequency-inverse document frequency (TF-IDF) to compare documents and cluster each one into a series of piles. It’s somewhat similar to the way a human reporter would sort documents if she were reading the pages one by one.

TF-IDF is built on a really basic assumption. It counts the number of times each word is used in each document — say a single email in a batch of thousands. It then compares those counts to the number of times the same words are used in the larger collection of documents. If a few of the emails have words in common that are relatively uncommon in the whole collection of emails, the assumption is that those documents are related in some way.
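The counting described above can be sketched in a few lines of Python. This is one common textbook formulation of TF-IDF, not Overview's actual code: it assumes term frequency is a word's count divided by the document's length, and inverse document frequency is the logarithm of the number of documents divided by the number of documents containing the word. The `tf_idf` function and the sample emails are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: score} dict per document.

    Assumed formulation: tf = count / document length,
    idf = log(number of documents / documents containing the word).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Count how many documents each word appears in
    doc_freq = Counter()
    for words in tokenized:
        doc_freq.update(set(words))

    scores = []
    for words in tokenized:
        counts = Counter(words)
        scores.append({
            word: (count / len(words)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Hypothetical sample emails
emails = [
    "complaint about the school board budget",
    "another complaint about the school board meeting",
    "policy proposal about the transit budget",
]
scores = tf_idf(emails)

# Show the highest-scoring words for each email
for email, word_scores in zip(emails, scores):
    top_words = sorted(word_scores, key=word_scores.get, reverse=True)[:3]
    print(email, "->", top_words)
```

Words that appear in every email, like "about" and "the", score zero, while words shared by only a few emails, like "complaint", stand out. Those distinctive shared words are what suggest that two documents belong in the same pile.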

Overview doesn’t actually derive any meaning from the words it’s counting, so the assumption the algorithm makes about documents being related might be wrong or totally unhelpful. But Overview also allows users to tag individual documents (or whole piles) with custom labels. It might, for example, help a reporter more quickly identify those complaints to the school board or the policy proposals to the public official because they’re all grouped together by the algorithm.

Overview has a few other helpful features, like fast searching and the ability to rerun the clustering algorithm with different parameters — specific terms of interest or stop words, for example. It’s also seamlessly integrated with another tool called DocumentCloud, a popular platform journalists use to annotate and publish documents online.

Visual Explanatory Illustrations: “Back of a Napkin” methodology

[[* I reviewed the lists of tools, but understood that the selected tool does not need to be among the ones listed *]]

As a reaction to the access to huge amounts of information, we’ve seen a surge of explanatory media. Vox.com is known for its tagline “Explain the news”, theSkimm has a set of guides to hot news topics, and the tool FOLD lets writers link media cards along with their writing to provide more context.

News and storytelling already rely on images, audio, maps, cards, data diagrams, and more to support their arguments and provide context. There is, however, an underuse of illustrations that help explain how systems work. We are visual thinkers and most of us learn better with pictures. While glorified illustrations of data and aesthetically pleasing designs are appealing, I am talking here about pictures that enable understanding by, for example, showing how things are connected. Future news sources that leverage explanatory illustrations and successfully satisfy readers’ demand for understanding the news will be at an advantage.

Figure 1: Example of an explanatory illustration

A specific tool that teaches anyone to problem-solve and communicate with pictures is Dan Roam’s book The Back of a Napkin: Solving Problems and Selling Ideas with Pictures. Dan Roam provides a methodology for discovering, developing, and selling ideas through pictures. He shows how to decompose a problem and come up with both simple pictures, as illustrated in Figure 1, and more complex pictures.

Dan Roam describes the process of visual thinking as four steps, with separate chapters describing how to do each step:
1) looking, i.e. collecting and screening
2) seeing, i.e. selecting and clumping
3) imagining, i.e. seeing what is not there
4) showing, i.e. making it all clear

The book also includes concrete methodology charts, as shown in Figure 2, that can be useful starting points when determining how best to illustrate a topic or your ideas with pictures.

Figure 2: A chart to help determine how best to visualize a problem. The rows specify what type of problem it is (who/what, where, etc.) and the columns specify what should be highlighted (quality vs. quantity, vision vs. execution, etc.).

Anushka’s bio

My name is Anushka Shah, and I work as a researcher at Ethan Zuckerman’s Center for Civic Media here at the MIT Media Lab. My work focuses on using text analytics to analyze news language and on producing research with a new analytics tool called Media Cloud.

Home is Mumbai (really, Bombay) for me. It’s where I grew up, where I went to school, and where my family lives. I studied Government and Economics in the U.K. for my undergraduate education, with the hope of returning to India to participate in the political sector. When I did return home, I slowly came to realize there were two Indias: a socially and economically comfortable one that I grew up in, and a difficult, dark, disadvantaged one that I only saw at a distance.

I spent the next three years working with non-profit organizations and grass-roots political parties trying to understand various aspects of this other India. It was an important experience for me, not because I learned much about how certain issues could be positively affected, or which policies worked on the ground and which didn’t, but because I understood how deeply complex rural India is.

Among other things, the simplistic narratives about rural India that I and many others grew up with kept the two Indias apart. I got interested in media as a way to affect opinion, knowledge, and eventually civic engagement in India. I studied applied quantitative research with a focus on news analytics, and now work in Ethan’s lab using Media Cloud to research Indian media.

Going forward, I want to use my quantitative media skills and field experience in India to design effective media messaging back home.