Refugee Resettlement in the United States

An explainer on the process refugees go through to relocate to the U.S. — a collaboration between Brittany and me.

From Brussels to Paris, the growing number of terror attacks in the West has bred both fear and misinformation about the number of Syrian refugees resettled in the United States. The Republican presidential frontrunner has even gone so far as to pledge that he will send resettled refugees back to Syria if elected. Yet, for all of the hand-wringing about the influx of potential jihadists, official government data tells another story.

Since the Syrian civil war broke out in March of 2011, just under 2,200 refugees have been admitted into the United States. According to the Pew Research Center, of the 70,000 refugees the United States was able to legally accept in the 2015 fiscal year, roughly 25% were from Burma, 20% from Iraq, and 13% from Somalia. While the Obama Administration will raise the refugee cap to 85,000 to accommodate 10,000 Syrian refugees in 2016, Syrians will still make up less than 12% of the total admitted refugee population. And while the average processing time for refugees is 18 to 24 months, Syrian applications can take significantly longer because of security concerns and difficulties in verifying applicants' information. Aid organizations currently put the actual processing time at 33 months.

Rather than just throwing more numbers at the reader, we decided to let them engage with the Syrian asylum application process directly via Typeform. A survey tool with style that is easy on the eyes, Typeform lets the designer simulate a conversation through “logic jumps,” which adapt the survey based on a respondent’s answers. Try your hand at the journey here.

My [future] tool: Uliza

I heard the phrase “digital divide” for the first time about six months ago. As someone just sticking a toe into the larger debate around ICTs, net neutrality, and zero-rating products, it’s been a slightly overwhelming dive down the rabbit hole, to say the least. It has also led me to the tool I’ll be introducing today: Uliza.

What is it?

Uliza, which means “ask” in Swahili, is a telephone service that leverages existing technologies in voice recognition, cloud-computing, and translation to provide access to information for the 4.5 billion people who are off-net or illiterate in a major internet language. It is currently being developed for market in East Africa by a team of graduate students at The Fletcher School, MIT, and UC Berkeley.

How does it work?

Anyone with a phone can call a toll-free number, ask a question in their own language, and receive an answer through an automated service, at no cost.

Caller experience: [Uliza caller process diagram]

Back-end experience: [Uliza back-end process diagram]

Why does it matter?

With only 5% of the world’s languages available on the internet, representing linguistic diversity online continues to be a major challenge. Uliza is one product in a growing suite of tools that seek to bridge the information divide between networked and un-networked communities. The original three-person team behind Uliza — who collectively have more than a decade’s worth of experience working in East Africa — chose to roll out Uliza in Kenya because of the country’s high adoption of mobile technology, even among low-income populations, its growing telecom industry, and the need to scale Swahili-language resources.



(How) Can Algorithms be Racist?

Technology can be the ultimate equalizer: once access is provided, it can erase divides of borders, education, race, and class. But a new study suggests that the same tools said to provide a level playing field might also have blind spots. Are the algorithms used to drive images and ads perpetuating human prejudices? One study says yes. But how can algorithms (which seem to be based on reason) discriminate?

Flash preview: (How) can algorithms be racist? An illustrated story #doodles #datamining #race #partnews

A video posted by Sophie C (@petit.chou) on Instagram.

For this assignment, Alicia and I wanted to tackle the issue of bias and discrimination in algorithms in a creative way. Our response is to this short article from the Guardian, “Can Googling Be Racist?”. The Instagram video is a preview of the resulting story, which I plan to scan into a static web-readable series.

To explain, we supplemented Latanya Sweeney’s research paper with my own knowledge of data mining and algorithms, in an easily digestible format. One of my biggest gripes as a computer scientist/machine-learner is the assumption that algorithms are either value-free or a mysterious black box. As Mark Twain (might have) said,

“There are three kinds of lies: lies, damned lies, and statistics.”


Tracing the links of the Germanwings disaster

A week ago a German jet crashed into the Alps, killing all 150 people on board. For the first several hours after the tragedy it was considered an accident, but it is now apparent that the plane’s co-pilot, Andreas Lubitz, was responsible, and details continue to emerge about his past. As more facts surface, news outlets covering the tragedy have released them in incremental updates. These updates have touched on a wide variety of questions: Why was no one aware of or worried about his mental health issues? Should he have been flying a plane in the first place? Have suicide plane crashes happened before? How has small-town Germany — such as the town of the 16 high school students on board or the pilot’s hometown — reacted to the horrific event?

When publishing these updates, publishers are often linking back to previous stories as a proxy for background information. The “original” story breaking the incident tends to be low on hyperlinks (such as the first link above, which only links to a Germany topic page) while later updates start to link back to archival stories for context. I was curious whether these internal, archival hyperlinks could be followed in order to automatically create a community of stories, one that touches on a variety of aspects of the incident. Links are rarely added to stories retroactively, so in general, following the links means traveling back in time. Could a crawler organize all the links for me, and present historical content (whether over the past 3 days or 10 years) for the Germanwings disaster?

I built a crawler that follows the inline, internal links in an article, and subsequently builds a graph spidering out from the source, storing metadata like link location and anchor text along the way. It doesn’t include navigational links, only links inside the article text; and it won’t follow links to YouTube or Wikipedia, just, for instance, the Times. This quickly builds up a dialogue of stories within a publisher’s archive, around one story; from here, it is easy to experiment with simple ranking algorithms like the most-cited, the oldest, or the longest article.
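The crawler's core logic can be sketched roughly as follows. This is a minimal, illustrative reconstruction, not the actual code: function names like `internal_links` and `most_cited` are my own, it uses only the standard library, and a real version would also isolate the article body, record link position and anchor text, and fetch pages over the network.

```python
# Sketch: extract inline links that stay on the publisher's own domain,
# then rank crawled pages by in-degree (the "most-cited" heuristic).
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class InlineLinkParser(HTMLParser):
    """Collects hrefs from <a> tags; a real crawler would first isolate
    the article-body element so navigational links are excluded."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def internal_links(html, page_url):
    """Return absolute links from `html` that stay on the page's own
    domain -- skipping YouTube, Wikipedia, and other external sites."""
    parser = InlineLinkParser()
    parser.feed(html)
    domain = urlparse(page_url).netloc
    links = []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)
        if urlparse(absolute).netloc == domain:
            links.append(absolute)
    return links

def most_cited(graph):
    """Rank pages by how many crawled pages link to them.
    `graph` maps each crawled URL to its list of outlinks."""
    counts = Counter(target for targets in graph.values() for target in targets)
    return [url for url, _ in counts.most_common()]
```

The "oldest" and "longest article" rankings mentioned above would work the same way, just sorting the stored metadata by publication date or body length instead of in-degree.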

I chose three incremental update articles from March 30, one each from the Times, the Post, and the Guardian, all reporting that Lubitz was treated for suicidal tendencies:

For each of these three, I spidered out as far as the crawl could go (though in the case of the Times the crawl never terminated, so I had to stop it somewhere).

New York Times

My first strategy was to simply look at the links that the article already contained. While the system can track links pointing in as well as out, this article only had outlinks; presumably this is because (a) it was a very recent article at the time of the query, and (b) we cannot be sure that we have all of the related stories from the given spider.

Clicking on a card will reveal the card’s links in turn: both inlinks and outlinks.

The “germanwings-crash.html” article had several links that formed a clear community, including archival stories about plane crashes from 1999 and 2005. The 1999 story was about an EgyptAir crash that was also deemed a pilot suicide. This suggests that old related articles could surface from following hyperlinks, even if they were never tagged or indexed as related. The 2005 crash is linked in the context of early speculation about the cause of the crash (cabin depressurization was initially considered). It is a less useful signal, but it could be useful in the right context.

This community of links is generally relevant, but it does veer into other territories sometimes. The Times’ large topic pages about France, Spain, and Germany all led the crawler towards stories about the Eurozone economy and the Charlie Hebdo shooting.

Washington Post

The WaPo article yielded just 32 links, forming a small community. When I limited the spidering to just three levels out, it returned 12 Germanwings stories covering various aspects of the incident, as well as two older ones, one of which is titled “Ten major international airlines disasters in the past 50 years.”
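Limiting the spider to a fixed number of levels out is essentially a depth-bounded breadth-first crawl. A minimal sketch, assuming a hypothetical `fetch_links(url)` callable that stands in for fetching a page and extracting its inline, internal links:

```python
from collections import deque

def spider(start, fetch_links, max_depth=3):
    """Breadth-first crawl from `start`, expanding links at most
    `max_depth` hops out. Returns a graph mapping each visited URL
    to the outlinks recorded for it."""
    graph = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        # Pages at the depth limit are recorded but not expanded further.
        links = fetch_links(url) if depth < max_depth else []
        graph[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return graph
```

The `seen` set is what keeps a crawl like the Times one from looping forever on mutually linked stories, though as noted above, a hard depth cap was still needed in practice.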

Click on the image to see the graph in Fusion Tables.

The Washington Post articles dipped the farthest back in the past, with tangential but still related events like the missing Malaysia Airlines flight and the debate over airline cell phone regulations.

The Guardian

The Guardian crawler pulled 59 links, including the widest variety of topic and entity pages. It also picked up some noise, such as article author homepages. Even so, 32 of these links ended up being relevant Germanwings articles, well more than I expected to see; I wouldn’t have guessed the Guardian had published so many stories about it so quickly. These ranged from the forthcoming Lufthansa lawsuit to the safety of the Airbus.

Click on the image to see the graph in Fusion Tables

The Guardian seems to have amassed the biggest network, and tellingly, it already has a dedicated topic page to show for it, even if it’s just a simple timeline format. The graph appears more clustered than WaPo’s, which was more sequential. But it doesn’t dip as far back into the past, and at one point the crawler found itself off-topic on a classical-music tangent (the culprit was a story about an opera performance that honored the Germanwings victims).


In the end, the crawler worked well on a limited scope, but I found two problems for link-oriented recommendation and context provision:

  1. The links were often relevant, but it wasn’t always clear why. More context around each link is crucial. This could be served by previewing the paragraph on the page where the link occurs, so a reader could dive into the story itself. In short, a simple list of links wouldn’t carry as much detail as a complete graph or more advanced views.
  2. The topic pages were important hubs, but also noisy and impermanent. Most NYT topic pages feature the most recent stories that have been tagged as such; this works better for a page like “Airbus SAS” than it does for “France.” An algorithm like this one therefore needs to treat topic pages with more nuance. Considering topic pages as “explainer” pages in their own right, one wonders how they could be improved or customized for a given event or incident.

Another wrinkle: I returned to the NYT article the next day after a few improvements to the crawler, and found that they had removed a crucial link from the article, one that connected it to the rest of the nodes. So already my data is outdated! This shows the fragility of a link-oriented recommendation scheme as it stands now.

Demystifying the Internet in Cuba

A group of early adopters at CENIAI, Havana, 1996. Photo courtesy of Larry Press.


When it comes to the Internet, Cuba is routinely compared to countries like China, Iran, and Vietnam, where broad-reaching Internet censorship regimes exist. The Cuban government does control Internet use to a great degree. But unlike these and many other countries, there is no evidence that the Cuban government conducts systematic censorship of online content.

Similarly, there is no reliable data on how many people in Cuba actually use the Internet — regularly cited statistics range from 2.9% to 25%. And one could spend years reading western media coverage of Cuba’s Internet and its embattled blogging community (as both of these authors have) and never figure out precisely how the Internet works there, how many people use it, and what kinds of restrictions they face in doing so. Like many other aspects of public life and experience on the island, Cuba’s digital culture is poorly understood by outsiders…

Read the whole explainer by me and Elaine on Medium.

Explainer: Nigerian elections

Since Saturday’s presidential election in Nigeria, the world has been watching. First, Nigerians and observers feared that cycles of electoral violence and rigged results might repeat themselves. Second, a win for presidential challenger Muhammadu Buhari – which looked likely early on Tuesday – would make President Goodluck Jonathan the first incumbent not to win re-election in the country’s history. Before results and Buhari’s historic victory were confirmed later that day, I created an infographic to give a brief background explainer on the Nigerian elections. If I had had more time, I would have liked to include more info on the social media and tech innovations used during this election.