In Defense of The Web Inspector

Here’s a funny thing about the Web: sometimes, secrets hide in plain sight. When you browse a web page, you receive many elements from its server: HTML and CSS markup, JavaScript code, and a variety of other files such as fonts or datasets. The browser then combines and interprets all of that data to form the page you are looking at. What you see on the webpage is only the tip of the iceberg; there is usually much more to it, sitting idle in your computer’s memory.

Often the rest of the iceberg is essentially worthless for journalistic purposes. Sometimes, however, accessing it can be crucial. For instance, you could be looking at a visualization and long for the dataset underlying what you are seeing. Or you might want to remove that annoying overlay sitting between you and paywalled content. As it happens, you can circumvent it more often than you might think. (We’ll see how later.)

So today I want to talk about a tool that lets you do just that; better yet, if you are reading this on a desktop browser, it is probably just a shortcut away:

  • If you are on Chrome or Safari on Mac, just trigger the shortcut Cmd+Alt+i.
  • If you are on Chrome on Windows/Linux, just press F12.
  • If you are on Edge or Internet Explorer, just press F12.
  • If you are on Firefox on Windows/Linux, just trigger Ctrl+Shift+c.
  • If you are on Firefox on Mac, just trigger Cmd+Alt+c.

What you are seeing is the Web Inspector. Some of you have probably heard of it or used it; many journalists, maybe even those who work with data, are not aware of its existence. A web inspector lets you understand what is going on with the web page you are visiting. It is generally organized around the same categories:

  • a console, which broadly serves to detect and report errors;
  • a storage panel, which displays cookies and other data the website stores on your computer’s hard drive;
  • a debugger, which is mostly useful for developers who need to debug their JavaScript code;
  • a timeline, which shows how the page loads (at what speed? which components take the most time, space, or computing power?);
  • a network panel, which shows how each of these elements was requested and loaded over the network;
  • a resources panel, which lists all the elements used to build the page;
  • and an elements (or DOM explorer) panel, which shows how these elements fit together in the HTML.

Let’s go back to the two scenarios I laid out earlier and use them as examples of how to harness a web inspector for journalistic purposes.

Let’s take, for instance, this applet. Made by a French public broadcaster, it tracks the attendance of local politicians across France. You can search by name or region but, sadly, you can’t directly download all the data. This is all the more disappointing because the website indicates that the dataset was compiled by hand, so you probably can’t find it elsewhere.

Well, with a web inspector, you can. If you open it, click on the network panel, and reload the page, you can see that a datas.json file is being downloaded. (See the red rectangle.) Click on it, and you can browse the entire dataset.
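Once you have spotted that file, you can also copy its address from the network panel and pull it down with a few lines of code. Here is a minimal sketch in Python; the URL below is a placeholder for whatever address the network panel actually shows you.

    import requests  # third-party library: pip install requests

    # Placeholder URL: copy the real address of datas.json from the network panel.
    url = "https://example.org/path/to/datas.json"

    response = requests.get(url)
    response.raise_for_status()       # stop here if the download failed

    data = response.json()            # parse the JSON payload into Python objects
    print(type(data))                 # quick look at the structure before digging in

From there, the dataset is yours to filter, sort, or export as you see fit.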

Now let’s take a second example. You want to read an article on a paywalled website, say, ForeignPolicy.com. You will probably end up with something like this:

Now, there is a way to actually read the article in a few clicks. First, open the inspector by right-clicking on the dark part of the page and selecting “Inspect element”.

You should see a panel with the overlay’s HTML element already selected. You can remove it by pressing the delete key.

The problem, now, is that scrolling has been deactivated on this website, so you can’t descend much further into the article. However, if you inspect one of the article’s paragraphs, the panel will display the part of the HTML file that corresponds to the article’s content. You can then expand every <p> (which is HTML-speak for a paragraph), or right-click on the line above the first paragraph and choose “Expand all”:

And here you have it:


It’s not the most practical way of reading an article, but it’s probably better than no article at all. (And to be clear, I’m all for paying for your content!)
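Incidentally, when the article text is shipped in the HTML the server sends (as it was here), you can also script the extraction instead of clicking around in the inspector. Here is a minimal, hedged sketch in Python using the requests and BeautifulSoup libraries; the URL is a placeholder, and sites that load their content dynamically or gate it on the server will defeat this approach.

    import requests
    from bs4 import BeautifulSoup     # third-party library: pip install beautifulsoup4

    # Placeholder URL: the article you are trying to read.
    url = "https://example.com/some-paywalled-article"

    html = requests.get(url).text
    soup = BeautifulSoup(html, "html.parser")

    for p in soup.find_all("p"):      # every <p> tag, i.e. every paragraph
        text = p.get_text(strip=True)
        if text:
            print(text)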

The broader point is this: if you feel stuck on a webpage, if it somehow seems to be blocking you from reaching a deeper level of content, the web inspector may be able to help. It is not bullet-proof, but, as we’ve seen here, it can sometimes save your research process.

In short, the web inspector is an underrated tool for journalistic research: it is already installed in every desktop browser, it is a de facto Swiss Army knife for web tinkering, and it is not that well known. To me, it may well become one of the common tools of journalism in the future.

Request and investigate public records with MuckRock

By Aaron and Drew

Online tools have the ability to lower the effort that journalists need to put into researching their stories. Lowering the activation energy can spur new types of journalists, reimagined forms of engagement, and entire communities centered around this new media.

MuckRock is a perfect example of such an ecosystem forming around a tool that made a previously burdensome task easy. Requesting information from the government can be daunting, but MuckRock guides you through everything and even digitizes what is otherwise often a snail mail process.

However, MuckRock is not just about requesting public records. It’s also about everything that comes after. People can track each other’s requests, write articles based on the public records, and even crowdsource donations to support more investigative research. (from: https://www.prx.org/group_accounts/190663-muckrock)

What is the Freedom of Information Act?

Lyndon B. Johnson signed the Freedom of Information Act (FOIA) into law in 1966. The law mandates the disclosure of government records to anyone who requests them, citizen or not. There are a few exceptions to these releases, such as national security or the locations of wells. In 2015, the government received over 700,000 such requests for information, of which approximately 25% were released in full and 45% were partly released.

The law mandates a response from agencies within 20 business days. Agencies are allowed to charge citizens for the time and materials.

The FOIA is a federal law that only applies to federal agencies in the executive branch. All 50 states and the District of Columbia have passed their own Freedom of Information laws that are generally very similar to the federal version.

What exactly is MuckRock?

MuckRock is a service that helps journalists file and manage FOI requests with a variety of federal, state, and local agencies. Since 2010, it has helped release more than one million government documents: http://www.bostonmagazine.com/news/blog/2016/07/03/muckrock-foia-turns-50/

How do you request a public record?

We’re glad you asked! We wondered the exact same thing, so we went ahead and filed our own request with MuckRock. The process is simple: sign up for an account, pay a nominal fee ($20 for four requests), and make your request.

With our request, we have asked the FBI to release all records pertaining to foreign cyber attacks against American universities.
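If you would rather work programmatically, MuckRock also exposes a REST API on its website. The sketch below is only an assumption-laden illustration: it supposes the api_v1/foia endpoint lists requests and that each record carries a title and status field, so check MuckRock’s API documentation before relying on it.

    import requests

    # Assumption: MuckRock's public REST API exposes a "foia" endpoint that lists
    # requests; the parameter and field names here may differ from the real API.
    resp = requests.get(
        "https://www.muckrock.com/api_v1/foia/",
        params={"page_size": 5},      # hypothetical paging parameter
    )
    resp.raise_for_status()

    for record in resp.json().get("results", []):
        print(record.get("title"), "-", record.get("status"))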

MuckRock tracks the average response times of various agencies. Here are some examples:

Agency | Average Response Time | Required Response Time | Success Rate | Average Cost
Federal Bureau of Investigation | 130 days | 20 days | 21% | $2,661.30
Central Intelligence Agency | 156 days | 20 days | 9.5% | $28.30
Department of Justice | 211 days | 20 days | 8.2% | Not available
Massachusetts Bay Transportation Authority | 84 days | 10 days | 38% | $2,082.84

How do public records turn into journalism?

Once a requester hears back from the government with the information she asked for, she can use it in her reporting. Additionally, MuckRock writes its own articles using the public records surfaced by users.

This reporting has the ability to close the loop of the FOIA process and hold parties accountable for actions that might otherwise go unnoticed. Articles on MuckRock are often very timely. Some recent examples include “Boston Police underestimated size of Women’s March protests by nearly 150 thousand” and “EPA Transition docs detail many of the regulations Trump could roll back”.

In a departure from what you normally see in journalism, the articles are often centered around a piece of evidence, such as a police report, an FBI file, or a government document. Not only is the evidence there for you to see and inspect, but you can also look up the history of how and where it was obtained.

This has the ability to change how readers interact with the news they’re consuming. They can inspect the evidence themselves, form their own judgments, and even develop ideas about how they might further the reporting in the future, transforming them from consumers into producers. It also introduces transparency that can help instill confidence in the media.

Who funds all this?

Individuals can request their own public records (like we did!), thus supporting the MuckRock community through these one-off requests. However, they can also help fund larger projects that are centered around a particular topic and require substantial funding. In this sense, MuckRock serves as a crowd-funding website.

Example: https://www.muckrock.com/project/the-private-prison-project-8/

Quick data visualizations

The need for data visualization

With the rise of buzzwords like big data and data science, expecting data as proof is becoming the norm among readers. And with the growing popularity of sites like FiveThirtyEight, reporting is slowly becoming more data oriented. The onus therefore lies on media content producers to use sound data analysis to make their points. Analyzing data is complicated, however, and communicating it is even harder, but data visuals can do the job effectively.

While researching easy ways to build data visualizations, I noticed that the majority of recommendations revolved around programming languages like R and Python. Learning how to code is a mammoth task for writers whose main focus is researching and delivering the story, not learning how to code. Writers need a tool that helps them analyze data and build visuals in a few clicks. A tool like Plot.ly.

What is plot.ly?

Plot.ly addresses the challenge of creating data visualizations without deep knowledge of programming or data visualization techniques. Plot.ly’s website and blog showcase a number of examples of how leading news sites have used Plot.ly visualizations in their articles. For example, below is a sample visual showing statistical analysis in a NYTimes article:

Source: NYTimes 2014 Article – How birth year influences political views

Some of my favorite tools on the platform (image below) are:

  • The ability to enter data in an Excel-like layout and pick from over 20 different chart types
  • Charts that enable reader interaction
  • Statistical analysis tools like ANOVA, available on a web-based platform
  • Reverse coding, which lets users get the code behind a visualization in case they want to recreate the same visual in another programming language (see the sketch below)
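For writers who do eventually want to dip into code, the same kinds of charts can be built with Plotly’s open-source Python library. The following is a minimal sketch using the plotly.express module and a made-up dataset; the resulting HTML file is an interactive chart that can be embedded in a story.

    import plotly.express as px      # third-party library: pip install plotly

    # Hypothetical dataset: monthly public-records requests filed by a newsroom.
    months = ["Jan", "Feb", "Mar", "Apr", "May"]
    filed = [12, 18, 9, 22, 15]

    fig = px.bar(
        x=months,
        y=filed,
        labels={"x": "Month", "y": "Requests filed"},
        title="Public-records requests filed per month",
    )

    # Writes a self-contained, interactive chart readers can hover over.
    fig.write_html("requests_per_month.html")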

What does data visualization mean for advancing journalism?

In my opinion, Plot.ly helps advance journalism and storytelling by:

  • Saving writers time: they can focus on constructing and presenting stories instead of spending time and resources on visual designers, and they can publish stories faster.
  • Increasing interaction with readers. It has become harder and harder to engage readers through various types of media because of declining attention spans, so interactive visuals could help increase engagement and participation.
  • Integrating diverse communities. Technology platforms like Plot.ly could increase interaction between groups such as technologists and journalists, helping each advance the other’s cause.

Plot.ly resources

Multiple tutorials are available on the website, from creating charts to analyzing sample data sets.

Tools: Twitter is an oldie but it still jams

I already know. Twitter is not the newness. But that’s why I’m taking this class – to discover new tools and think differently. In the meantime, when I’m on deadline writing three columns a week, sometimes I feel like this:

But Twitter’s Advanced Search often comes through in a pinch. When there’s a breaking news story in the #blacklivesmatter movement or something trending in underserved communities, Twitter often has the news first, and Advanced Search allows you to zero in on specific dates, people, and even geographic locations. You can search by specific tweeters, hashtags, or general phrases, making it easier to source, fact-check, and connect. That makes me as happy as Solange when you don’t touch her hair. Don’t touch mine either.

Overview: Find stories faster in massive document dumps

If you were tasked with reviewing and making sense of a huge stack of documents you’ve never seen before, you would probably go about it in a pretty standard way. Skim the first page and make a quick decision about whether it’s relevant or about a specific topic, then move to page two and make that decision again. After a few pages, you might have a few separate piles describing what you’ve seen so far.

As you continue reading, the piles might get more sophisticated. In one pile, you might place emails containing specific complaints to the school board. In another, policy proposals from a public official’s top adviser. On and on you go until you get through enough of the pile to have a fairly good idea of what’s inside.

For investigative journalists reviewing massive document dumps — responses to public records requests, for example — this may be one of the very first steps in the reporting process. The faster reporters understand what they have, the faster they can decide whether there’s a story worth digging into.

Overview, a project to help journalists sift through massive document dumps

Making sense of documents as efficiently as possible is the primary purpose of Overview, an open-source tool originally developed by The Associated Press and funded by a collection of grants from the Knight Foundation and Google, among others.

Upload your documents into Overview and it will first process them automatically using optical character recognition. It then weights each document’s words with a technique called term frequency-inverse document frequency (TF-IDF) and clusters the documents, trying to sort each one into a series of piles. It’s somewhat similar to the way a human reporter would sort documents if she were reading the pages one by one.

TF-IDF is built on a really basic assumption. It counts the number of times each word is used in each document — say a single email in a batch of thousands. It then compares those counts to the number of times the same words are used in the larger collection of documents. If a few of the emails have words in common that are relatively uncommon in the whole collection of emails, the assumption is that those documents are related in some way.
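To make that concrete, here is a minimal sketch of the same idea in Python using off-the-shelf scikit-learn tools rather than Overview’s own code: weight each document’s words with TF-IDF, then cluster documents whose weighted vocabularies look similar. The four toy documents are invented for illustration.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    # Toy "document dump": in practice these would be emails or report pages.
    docs = [
        "complaint to the school board about bus routes",
        "school board complaint about overcrowded classrooms",
        "policy proposal from the adviser on zoning reform",
        "adviser memo with a policy proposal on zoning",
    ]

    # Words frequent in one document but rare across the dump get high weights.
    vectors = TfidfVectorizer(stop_words="english").fit_transform(docs)

    # Group documents whose weighted word profiles look alike.
    labels = KMeans(n_clusters=2, random_state=0).fit_predict(vectors)
    for label, doc in zip(labels, docs):
        print(label, doc)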

Overview doesn’t actually derive any meaning from the words it’s counting, so the assumption the algorithm makes about documents being related might be wrong or totally unhelpful. But Overview also allows users to tag individual documents (or whole piles) with custom labels. It might, for example, help a reporter more quickly identify those complaints to the school board or the policy proposals to the public official because they’re all grouped together by the algorithm.

Overview has a few other helpful features, like fast searching and the ability to rerun the clustering algorithm with different parameters — specific terms of interest or stop words, for example. It’s also seamlessly integrated with another tool called DocumentCloud, a popular platform journalists use to annotate and publish documents online.

Tools to transcribe audio and video content

I’m pretty new at making podcasts. It’s not always easy when English is not your first language. Especially the transcription! If I had to do it by hand, it would take ages before I could start editing. But with help from some tools, I can edit and produce podcasts without pain. I’ve only used the first one, but I saw a demo of the second one at ONA last year, and it was impressive.

  • Pop Up Archive is good for transcribing audio material. The accuracy is pretty good and I love the timestamping features.
  • Trint is a tool for transcribing audio and video material. It also has timestamping features, with a way to adjust them, and the transcribed text can be edited as well. You can also highlight the segment you want to use and it automatically tells you the duration of the selected part.

FYI, for audio/video production, I always listen to or watch the entire raw interview. Even if you have everything transcribed, the transcript is just a guide for editing. Find the best part of the interview using your own eyes and ears!

PGP: An Old Technology for a New Media Environment

Data privacy is, and should be, top of mind for journalists. As the Trump Administration takes an antagonistic approach to the media, it’s not unrealistic to imagine the President signing an executive order any day now forcing news organizations to release emails to the government, or to pay significant fines or even face jail time if they do not reveal the sources of leaks.

Just this week, President Trump tweeted about the “illegal leaks coming out of Washington” following the resignation of Michael Flynn as National Security Advisor. Flynn’s resignation was due in large part to reporters from The New York Times, the Washington Post, and other outlets publishing stories based on leaked information from government officials about Flynn’s conversations with Russia.

For journalists to keep informing the public of the stories that the Administration is trying to hide or ignore, they must continue using anonymous sources from within the government. These leaks cannot stop, regardless of whatever measures the Administration tries to put in place to stop government employees from speaking out and contacting the press.

The Need for Encryption

But for many of these employees, there are major ramifications to divulging top secret or sensitive information. Before any government employee considers leaking information to the press, they need to be sure that the communication is delivered securely and their identity is not divulged. Outside of secret, in-person, Deep Throat-style meetups, this means the journalist will need to use encryption to keep the information secure. Keeping that information secure also keeps sources private, which lets the journalist continue reporting the stories that need to be told.

PGP: A Gold Standard

Pretty Good Privacy (PGP) is a free encryption and decryption program created by Phil Zimmermann in 1991 and typically used for email. The name, a tribute to A Prairie Home Companion, is misleading, as the tool is known to be more than just “pretty good” when it comes to maintaining a user’s privacy. In a post titled “Why do you need PGP?,” Zimmermann explains the need for the encryption tool:

Intelligence agencies have access to good cryptographic technology. So do the big arms and drug traffickers. So do defense contractors, oil companies, and other corporate giants. But ordinary people and grassroots political organizations mostly have not had access to affordable military grade public-key cryptographic technology. Until now. PGP empowers people to take their privacy into their own hands. There’s a growing social need for it.

Encryption itself is a very old technology that remains just as relevant and powerful as when it was first invented. With encryption, the message you send is scrambled into a meaningless string of letters and numbers so that anyone snooping through your email cannot decipher it. Only those with the correct key can unlock the meaning:

(via Lifehacker)

To start using PGP, you need to download GNU Privacy Guard (GnuPG), either through GPGTools (OS X) or Gpg4win (Windows). Once you have your own PGP key, you can communicate with anyone else through encryption, so long as the recipient also has a PGP key. There are several browser extensions you can download to make the process of sending an encrypted email quicker, including PGP Anywhere and Mailvelope. PGP also works with mail clients such as Mozilla Thunderbird for email encryption.
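For the programmatically inclined, the same GnuPG installation can also be driven from a script. This is a minimal sketch assuming the third-party python-gnupg wrapper and a recipient key that has already been imported; the email address is a placeholder.

    import gnupg                      # third-party wrapper: pip install python-gnupg

    gpg = gnupg.GPG()                 # uses your existing GnuPG keyring

    # Assumption: the source's public key was imported beforehand, e.g.
    # gpg.import_keys(open("source_pubkey.asc").read())
    message = "Meet Tuesday, usual place."
    encrypted = gpg.encrypt(message, recipients=["source@example.org"])

    if encrypted.ok:
        print(str(encrypted))         # ASCII-armored ciphertext, safe to paste into an email
    else:
        print("Encryption failed:", encrypted.status)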

The biggest hurdle for anyone new to PGP is finding others who have their own PGP keys as well. Without the two-way system, you cannot send encrypted messages. This may be a deterrent for some reporters who cannot convince sources to use a PGP key because of the time it takes to set up. But for journalists who want to protect information and confidentiality, the upfront costs are worth the privacy gained through encryption.

To avoid this issue, there are other encryption tools journalists can use, such as Virtru. This tool is used in conjunction with other platforms such as Gmail and Salesforce to keep information secure through data encryption. However, unlike PGP, Virtru and other similar products are not free for users.

PGP is only the first step

Though email encryption is only one step journalists can take to keep their messages secure and the privacy of their sources intact, it’s one of the most important and the first they should consider. PGP is not a perfect solution, as several government agencies are believed to have the ability to unlock keys and decipher messages. But using PGP can be seen as a gateway for journalists to better maintain confidentiality and keep information secure. Creating a key and locking down their emails is the first step journalists can take on the road to better privacy habits.

A few thoughts on media and storytelling tools

There are several tools that I had never heard about before reading the articles assigned for this week’s class, and that I believe can have important implications for the future of news and storytelling.

I believe that news organizations need tools that enable collaboration with social media: on the one hand, social media can benefit from the higher-quality content that news outlets provide; on the other hand, news can benefit from the bottom-up information sharing that thrives on social media. In particular, when it comes to sharing stories, tools such as Shorthand Social, StoryMap.js, or Storyful multisearch could be very interesting and fruitful.

I also believe that data visualization has an important role to play: we live in a world with a huge amount of data, and many people are not aware of the figures or do not know how to read them. Data provide a lot of information, but that information has to be processed. That’s why I believe that tools such as Silk.co and DataPortals.org are worth exploring.

Finally, I believe that tools that build on existing platforms and help analyze them, such as Twitter’s advanced search and TweetDeck, might be particularly interesting in the months and years to come.

Politwoops: tracking politicians’ social media stumbles

Deleting tweets is something we’ve probably all done from time to time – whether it’s just to fix a typo or to tone down our reaction to the latest aggravating news story. As private citizens, erasing an earlier post is a reasonable expectation. Yet it might be argued that for politicians in public office, what is said (and read) should stay said, much as a hot-mic gaffe, for example, can’t be taken back.

Twitter has become an important medium for politicians, whether campaigning for office or serving constituents. But sometimes, politicians (and their staffers) can get a bit carried away – and become just as susceptible as the rest of us to some post-tweet regret. Fortunately, the website Politwoops, now hosted for U.S. politicians by ProPublica, preserves these deleted tweets. Its archive offers an interesting insight into the tweets that politicians wish they could take back (and perhaps believe they have). Given the Tweeter-in-Chief’s no-holds-barred nocturnal musings, for example, it’s a tool that may well prove useful for journalists in the coming years.

Several journalists have already noted, for example, the chronological coincidence that President-elect Trump praised Russia’s nonchalant response to new U.S. sanctions at exactly the time his recently fired National Security Adviser Mike Flynn was holding sensitive discussions with the Russian ambassador. That wasn’t a tweet Trump ever deleted – but it’s certainly reassuring to know that if he had deleted it, it would still be on record.

Location-based social media monitoring

Beacon Hill, Sunday night, 10:50 p.m.: Sitting at my kitchen table, I heard a series of pops followed immediately by the sound of sirens.  “Were those gunshots or fireworks?  Should I be worried?  And are the sirens related to the pops I heard?”  

My first reaction was to search for possibly related posts on Twitter while looking for a live audio feed of Boston police scanners.  Instead, I remembered reading about the location-based social media search services that aggregated posts from across several platforms, and I tried the first one I could quickly get a free trial for: Echosec.

Instead of searching across several social media platforms separately, Echosec allowed me to search for all geotagged posts in an area of my choosing and within specified date ranges.  My story has a simple ending — I found on Echosec that neighbors on reddit posted that it was definitely fireworks, which was later confirmed by the police through the live feed.  

Nothing became of this tiny story, but imagine the uses for location-based social media monitoring services in situations with more impact and higher stakes.  

Using Echosec (and other similar services) for discovery and identification

Google “location-based social media monitoring,” and you’ll find pages of lists suggesting various services, most of which appear to be enterprise services.  While many of these services appear to primarily serve police departments, security companies, and marketing departments of large businesses, over the last couple years, journalists have also used these tools to assist their reporting.  For example:

  • At NBC 5 in Chicago, a producer used Geofeedia to quickly find photos of people who were hiding inside a building after an employee shot his boss.  Based on these photos, the station was able to identify potential sources.
  • A social media editor for The Associated Press used SAM to identify students at a South Dakota high school where a shooting was foiled, which led to a reporter being able to conduct an interview to confirm details seen on social media.

In more general cases, these tools can also be used to get a sense of people’s reactions to news and events across the board, not only to identify sources and images.  Broadly, using geolocated social media search tools has several benefits over simply searching on Twitter.

  • Aggregating data from many social media platforms saves time in pressing situations.
  • Aggregation also provides more comprehensive coverage, especially as different social media platforms are prominent in different areas of the world.
  • Searches can be more location-specific and time-specific than most apps allow within their own search function. (A minimal sketch of such a geofenced search follows below.)
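To give a flavor of what a geofenced query looks like under the hood, here is a minimal sketch against Twitter’s v1.1 standard search API. It assumes you have an API bearer token (the placeholder below), uses approximate coordinates for Beacon Hill, and of course only covers Twitter, whereas services like Echosec aggregate several platforms at once.

    import requests

    BEARER_TOKEN = "..."              # placeholder: your Twitter API bearer token

    resp = requests.get(
        "https://api.twitter.com/1.1/search/tweets.json",
        headers={"Authorization": "Bearer " + BEARER_TOKEN},
        params={
            "q": "fireworks OR gunshots",
            "geocode": "42.359,-71.068,1km",   # approximate lat,long,radius for Beacon Hill
            "result_type": "recent",
        },
    )
    resp.raise_for_status()

    for tweet in resp.json().get("statuses", []):
        print(tweet["created_at"], tweet["text"])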

Drawbacks to these services

However, there are two major hurdles these services have to overcome to gain more mainstream traction:

  1. They’re relatively costly.  At the lower end, Echosec costs $129 per user per month, and as of 2012, the much more powerful Geofeedia’s preliminary pricing was $1,450 per month for five users.  (And as I searched through lists of services that were only a couple years old, I found that free versions don’t seem to last long in the marketplace, or if they still exist, are not well supported.)  Either the prices have to come down, or the services have to become much, much better than they currently are in order to make the price tag worth it for newsrooms that are satisfied searching on their own.
  2. The vast majority of social media posts are not geolocated. While the percentage varies by platform (Instagram, for example, tends to have “a lot more [geolocated posts] than Facebook, Youtube, or other platforms”), a Knight Lab sample of 200,000 tweets run in 2015 found less than 0.4% were geocoded.  This means that while you can get a sample of tweets that are geolocated, you do have to make sure not to rely on these tools too much — you could miss an important non-geocoded post that does not turn up in your searches.

That said, for many reporting purposes, simply knowing how to strategically search on popular social media sites is enough.  For journalists without access to these fancier aggregated geolocation search tools, old-fashioned hashtag-hunting and keyword-monitoring may be sufficient.

The potential

A common accusation recently is that the “mainstream media” has lost touch with the average American.  One way to gain easy access to some representation of those viewpoints (although we do then get into the issue of comment rage and trolls — which we’ll sidestep for now) is to see what everyone is saying across various social media channels and be able to check for location-based trends.  After all, the Internet is supposed to be the great equalizer — according to a 2016 Pew Research study, 87% of Americans use the internet.  That percentage will only grow.

Going forward, I do think location-based social media monitoring tools have the potential to become even more powerful as a way to explore the public conversation and identify trends, or simply to get the “pulse” of the public.