It’s the French presidential election this weekend. Expect the unexpected.

A lot of people went to bed on the night of November 8, 2016, confident that they would wake up to news of President-elect Hillary Clinton’s election victory. This is not, of course, what happened. Donald Trump’s stunning win confounded pollsters and pundits alike, much as had the UK’s decision to Brexit months earlier.

Political unpredictability has continued apace in 2017, and France’s presidential election represents an early test of whether the nationalistic tremors of 2016 will continue to haunt liberal democracy in its heartlands. One thing’s for sure: no one can confidently predict the results of this contest. But to borrow a phrase, there are some known unknowns to brush up on ahead of time.

How does France elect a president?

Presidential elections in France are a two-step process, with the top two candidates from this Sunday’s first round progressing to a head-to-head run-off a fortnight later – which means that, whatever happens, we won’t know on Sunday who’ll become France’s next president, but we will know who won’t. The two-stage system aside, the process is quite straightforward – the two candidates with the largest vote totals progress.
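For readers who like to see a rule spelled out, here is a minimal sketch of the first-round logic in Python. It assumes the usual two-round rule that a candidate with an outright majority wins without a run-off; the function name and the vote counts are made up for illustration.

```python
def first_round(votes):
    """Return the candidates who survive the first round.

    votes maps a candidate name to a vote count (hypothetical data).
    A candidate with more than half of all votes wins outright;
    otherwise the top two go through to the run-off.
    """
    total = sum(votes.values())
    ranked = sorted(votes, key=votes.get, reverse=True)
    if votes[ranked[0]] * 2 > total:
        return ranked[:1]
    return ranked[:2]


# Hypothetical tallies, not real results.
tight_race = {"A": 23, "B": 22, "C": 19, "D": 18, "E": 18}
landslide = {"A": 60, "B": 40}
print(first_round(tight_race))
print(first_round(landslide))
```

In a fragmented field like this year's, nobody comes close to a majority, so the second branch is the one that matters.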

So who’s in the running?

In recent years, two main parties have dominated French politics: the left-wing Socialists, and the center-right Republicans. The current president is Socialist François Hollande, who announced last year that he would not seek re-election, in large part due to staggeringly low approval ratings, which hit an eye-popping low of 4% (not a typo!) late last year.

As in any election without the incumbent running, the field is wide open this year – albeit to an extent unprecedented in French politics, for several reasons. First, with or without Hollande, France’s Socialist party is in disarray: like the Democrats in the US and the UK’s Labour Party, it is riven by in-fighting between left-wing forces and more centrist impulses.

This has served to fragment the electorate. The official Socialist candidate, Benoît Hamon, is struggling under the weight of his predecessor’s unpopularity. Meanwhile, two former Socialist ministers, Jean-Luc Mélenchon and Emmanuel Macron, have each seized chunks of the party’s traditional vote from the left and the right with their own new movements, Unsubmissive France and En Marche!, respectively.

The right wing has also splintered. The National Front, led by Marine Le Pen – the daughter of Jean-Marie Le Pen, who scored a place in the run-off election in 2002 – has seized on antipathy towards Muslim immigrants to lead many first-round polls this year. Meanwhile François Fillon, the Republican candidate, is dogged by allegations of financial impropriety relating to salaries given to members of his family, which have threatened to upend his candidacy.

OK, so it’s a broad field. But who’s going to win?

It’s very hard to predict who will make it through to the run-off – let alone win – but right now there are four viable candidates who, polls suggest, are clustered at between 18% and 23% of the vote. Mélenchon, with a radically leftist agenda, has been gaining recently at the expense of Hamon, and is running at around 18%, for what would be a close-fought fourth. Fillon has resisted calls – even from among his own party – to drop out, and his support seems to have stabilized, as he is currently running in a close third. Macron and Le Pen, meanwhile, have been trading the lead for the last few weeks, with each averaging around 23% of the vote.

The state of the French presidential polls at the time of writing. (Wikipedia)

The results of the run-off will depend, of course, on who makes it through. As things stand, one candidate – Macron – would win his head-to-head with each of the other three viable candidates, while another – Le Pen – would lose all of hers. (Mélenchon beats Fillon in the least-likely match-up.) But polling the run-off accurately is difficult while other candidates remain in the race, particularly when three out of the four leading candidates represent parties who have never won the presidency. In particular, if Fillon continues his comeback and makes it through to the run-off against Le Pen, the achingly familiar prospect of an experienced but scandal-plagued establishment candidate losing to a xenophobic outsider seems plausible.

How important is this election?

In a word, very. In constitutional terms, the French presidency represents something of a middle way between a mostly-symbolic head of state like the German presidency, and the powerful executive in the American system. Compared with America, periods of “cohabitation” – where one party controls parliament while another occupies the presidency – have been relatively rare, and in these instances, the president tends to take a back seat.

But one area in which French presidents have the most control is in foreign affairs, and France’s election this year represents, in a certain sense, another referendum on the European Union. Only the far-right Le Pen has vowed to leave the Euro currency, but both she and Mélenchon have adopted the fateful promise made by Britain’s David Cameron to renegotiate France’s relationship with the EU and put the resulting settlement to a formal referendum vote.

Meanwhile, Le Pen and Russian President Vladimir Putin have been open about their mutual respect, and Fillon is also notably more comfortable with Russia than other Republican figures – while Macron, the only avowedly pro-European candidate, has been hit with a barrage of cyberattacks and fake news. If all this sounds familiar, perhaps that’s because it is.

Both politically and geographically, France is much more central to the European project than Britain ever was, so a rebuke by voters would represent a much more existential threat to the Union – creating precisely the kind of instability that Russia’s Putin is said to want.

What else is there to know?

The polls are changing daily, and the election is now too close to call, according to multiple outlets. While this uncertainty creates volatility, in everything from markets to geopolitics, at least pundits and the public alike are more prepared for multiple outcomes than they were on the mornings of June 24 and November 9, 2016, when British and American voters created political earthquakes. It’s always useful to know what you don’t know.

Waymo, Otto, Uber, Google: A Lawsuit in Need of an Explanation

Amongst the many stories of Uber’s recent controversies, you may have heard of a lawsuit between Waymo and Uber. The case centers on intellectual property infringement, which is already a complicated, technical issue on its own when you break down how a judge or jury determines if the technology is infringing on a patent.

Let’s break down just what’s going on in this situation:

What is Waymo?

To understand what Waymo is, we need to go back to Google Co-Founder Larry Page’s open letter in which he announced that Google would become a subsidiary of Alphabet Inc., a holding company that will be the parent company for several of the company’s endeavors. This includes X, the company’s “moonshot factory” or investment lab, GV and CapitalG, the company’s two investment arms, and Waymo, the company’s self-driving car project that spun out of X.

Waymo began as a self-driving car research project at X (then called Google X) in 2009. The name Waymo, which Alphabet unveiled in 2016, is short for “a new way forward in mobility.”

Seems like a weird phrase to shorthand.

Yes, yes it does.

Let’s go back to this Alphabet company. Why was it created and did it replace Google?

Google still exists and includes everything you would associate with the brand: search, ads, YouTube, Maps, Chrome, and Android. But now it’s one of several entities under the Alphabet umbrella.

When it comes to the why, that’s a bit trickier to explain. There’s plenty of speculation that the creation of Alphabet is all about improving visibility into the company’s operations and revenue breakdown for investors. Each of these companies now operates independently, with separate budgets and revenue that are reported in the quarterly earnings the company is required to file with the US Securities and Exchange Commission, the governing body of the US financial markets.

So, back to Waymo. Why was it spun out of X?

Companies are spun out of X when they’ve moved past the research stage and are ready for commercialization. If Alphabet is confident the company has a sound business model and product that’s ready for the market, it’s moved out of X to become a stand-alone company. According to Waymo CEO John Krafcik, “what you’re feeling from the Waymo team is confidence that we can bring this [technology] to [people].”

How exactly does Uber fit into all of this?

Uber, like many other technology and automotive companies, is developing self-driving car technology of its own as part of its Advanced Technologies Group. CEO Travis Kalanick first began recruiting engineers for the project in Pittsburgh in late 2014.

Pittsburgh seems pretty random.

It’s not. It’s home to Carnegie Mellon University (CMU), which has a well-respected robotics department where many of the top experts in the field spent time conducting research. Originally, Uber partnered with CMU’s National Robotics Engineering Center to develop the technology. Then, Uber poached about 50 people from CMU, which was about one third of the Center’s researchers.

Wait, Uber stole all of the workers away from its partner?

That’s a story on its own. Let’s just say it was not a popular move in the technology community.

Did Uber’s technology from the CMU partnership infringe on Waymo’s patents?

No. The patent infringement issue starts with Otto, a startup that was developing self-driving technology for trucks. The company was founded by former Google employees Anthony Levandowski, Lior Ron, Don Burnette, and Claire Delaunay. Levandowski formerly led Google’s self-driving car project and incorporated Otto two weeks after leaving the search giant. The team first announced the existence of the company in May 2016. Only three months later, Uber acquired Otto for $680 million.

Why did Uber acquire Otto? And when are we getting to the patent infringement?

We’re finally getting there. There are a number of reasons Uber bought Otto, including its relationship to car manufacturers, its talent, and its technology. That technology includes LiDAR, or light detection and ranging. LiDAR works by using lasers to detect objects, space, and anything else in an environment by tracking how long it takes for the laser to hit the object and bounce back to create a 3D map. It’s a mechanical form of echolocation. The technology is used for autonomous guided vehicles (aka self-driving cars) to detect everything on and around the road, from other vehicles to obstructions.
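To make the "how long it takes to bounce back" idea concrete, here is a minimal Python sketch of the time-of-flight arithmetic. It assumes a single laser pulse travelling at the speed of light and ignores everything a real sensor has to deal with (scanning, noise, multiple returns); the function name and the 200-nanosecond example are illustrative only.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # metres per second, in a vacuum


def distance_from_round_trip(round_trip_seconds):
    """Distance to an object, given the pulse's out-and-back travel time.

    The pulse covers the distance twice (there and back), hence the
    division by two.
    """
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


# Hypothetical measurement: the echo returns after 200 nanoseconds.
distance_m = distance_from_round_trip(200e-9)
print(f"{distance_m:.2f} m")
```

Repeat that measurement many times per second in many directions and you get the cloud of points that makes up the 3D map.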

This is the technology that Waymo is suing Uber over.

So why does Waymo think Uber is infringing on the LiDAR technology? And when did they file the lawsuit?

The lawsuit began in February. According to Waymo, it all started with an email: “One of our suppliers specializing in LiDAR components sent us an attachment (apparently inadvertently) of machine drawings of what was purported to be Uber’s LiDAR circuit board — except its design bore a striking resemblance to Waymo’s unique LiDAR design,” the company announced in a Medium post.

That email sparked an investigation by Waymo, which eventually led the company to discover that a month and a half before he resigned, Levandowski downloaded more than 14,000 proprietary files, including the designs of the company’s LiDAR technology and circuit board. Waymo also claims that other former Google employees downloaded confidential information about suppliers and manufacturing. The full filing goes into details about just what these employees supposedly stole.

In the filing, Waymo asked the court for an injunction against Uber’s self-driving car program. Translation: Waymo wants to stop Uber from continuing to work on the technology.

How did Uber respond?

Uber released a statement to Business Insider on February 24th denying all allegations, claiming the lawsuit is “a baseless attempt to slow down a competitor.”

Levandowski also released a response of sorts. The Otto founder exercised his Fifth Amendment right to avoid self-incrimination on March 30th. He also hired his own criminal counsel for the suit, though he is not formally named in Waymo’s filings.

Uber filed its formal response to the lawsuit on April 7th, which included details about the differences between the two companies’ LiDAR technologies and the lack of evidence around the 14,000 files.

Did Waymo respond?

On April 3rd, Waymo claimed that Uber had failed to disclose the proper documents related to the lawsuit. Waymo asked Judge William Alsup to compel Uber to produce all of the documents or assume the company is hiding documents.

Waymo also reiterated its claims in its response, saying “Uber’s assertion that they’ve never touched the 14,000 stolen files is disingenuous at best, given their refusal to look in the most obvious place: the computers and devices owned by the head of their self-driving program.”

So where does the lawsuit stand now?

The last update with the lawsuit has to do with Levandowski’s Fifth Amendment claim. William Alsup, the judge presiding over the case, rejected Levandowski’s request and ordered Uber to disclose documents created by a third party when it conducted due diligence for the acquisition of Otto. The due diligence report must be included without any redactions related to Levandowski in a “privilege log,” which is a document a party in a lawsuit produces that they do not think should be opened in court because of the proprietary nature of the material.

The most recent filing in the case came from Waymo, which opposed Uber’s motion to keep the dispute private by going into arbitration. Uber originally filed for arbitration because Alphabet’s employee agreements state that any disputes with the company should be settled in arbitration. But Waymo believes this is not a valid claim, as Levandowski is not the defendant in the case; Uber is.

So what now?

Prepare for a long lawsuit filled with more filings and claims. Unless some settlement is reached, the case could be dragged out for months.

French presidential polls seem too consistent to be true

The French presidential election will be held 10 days from now, and it seems like a nail-biter. The first round, which determines which two candidates will qualify for the final runoff (1), seems to have become essentially a four-way race. And we are not talking about four separate but similar politicians. We are talking about a pro-European centrist candidate (Emmanuel Macron), a neo-conservative who, after being charged with embezzlement of public funds, is now France’s most unpopular politician (François Fillon), a brash but eloquent candidate endorsed by the French Communist Party (Jean-Luc Mélenchon), and, of course, Marine Le Pen. According to the latest polls, all of them are, essentially, within 5 points of each other.

The French media is, understandably, covering the election relatively nervously. Even though every poll for the past few weeks has shown Macron and Le Pen with a somewhat comfortable lead over Fillon and Mélenchon, every one of the 6 potential match-ups is brought up by pundits. (This has a lot to do with a massive 15-year-old election upset; I’ll come back to that a little later.) Most bonkers, however, is that this concern is actually probably underplayed. Just by taking a look at the general shape of election polls, it is pretty clear that something weird is going on, and that we may be underestimating how truly unpredictable this first round is shaping up to be.

The big red flag of these election polls is that they really, really don’t deviate from each other. It is a pretty easy thing to measure. Polls, because they take measurements on a sample, are inherently flawed; fortunately, that flaw is simple to estimate. For this post, I am assuming that the sampling error for one poll, for the first round, is around 2.7 points (2). This is the sampling error that one would expect when trying to evaluate percentages around 20 points, and with samples of 1000 people. This is what we are talking about here: the four candidates’ numbers are currently in the range of 17 to 25 points, and almost every poll has a 1000-ish sample.
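For reference, the textbook version of that estimate can be sketched in a few lines of Python. This assumes simple random sampling, which is an idealization (French pollsters mostly use quotas), and the function name is my own. With a 20% share and 1000 respondents it gives a standard error of roughly 1.26 points and a 95% margin of roughly 2.5 points, a little below the 2.7 figure used in this post.

```python
import math


def sampling_standard_error(share, sample_size):
    """Standard error, in percentage points, of a share measured on a
    simple random sample: sqrt(p * (1 - p) / n)."""
    return 100.0 * math.sqrt(share * (1.0 - share) / sample_size)


# A candidate polling at 20% in a survey of 1000 people.
standard_error = sampling_standard_error(0.20, 1000)
margin_95 = 1.96 * standard_error  # conventional 95% margin of error

print(f"standard error: {standard_error:.2f} points")
print(f"95% margin:     {margin_95:.2f} points")
```

The error shrinks with the square root of the sample size, so quadrupling the sample only halves it.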

If that standard error due to sampling is 2.7 points, that means that we should find results within that interval something like 68% of the time (3). This is what would happen if all the surveys were done with random sampling, and no tinkering on the back end. What happens if you line up the polls together and compare them?

The black dots here represent the different polls. The black line is the moving average (computed with a local regression). The blue interval is this average +/- 1.35 points, which represents the sampling error. We should normally expect to see roughly a third of polls outside that interval. That is obviously not the case. Even more striking is the fact that polls get significantly closer 60 days before the election, after February 25th, at a moment when you can see a lot of movement in the numbers. For the past three months, basically, there has been virtually no outlier poll for any of the four major candidates. This should not happen in an ideal polling environment, and is quite concerning.
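The "roughly a third" expectation can be checked with a short Python sketch. It assumes poll errors are normally distributed and that polls are independent of each other, both idealizations, and the count of 30 polls is a made-up figure for illustration, not the number of polls in the chart.

```python
import math


def share_outside_band(k):
    """Share of a normal distribution lying more than k standard errors
    away from its mean, on either side."""
    return 1.0 - math.erf(k / math.sqrt(2.0))


def prob_no_outliers(n_polls, k=1.0):
    """Chance that none of n_polls independent polls falls outside the
    +/- k standard error band."""
    return (1.0 - share_outside_band(k)) ** n_polls


outside = share_outside_band(1.0)   # close to one third
none_of_30 = prob_no_outliers(30)   # hypothetical count of 30 polls

print(f"share outside one standard error: {outside:.3f}")
print(f"chance of zero outliers in 30 polls: {none_of_30:.2e}")
```

Under those assumptions, a long run of polls with no outlier at all is very unlikely to arise from sampling noise alone.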

To get more dramatic, I used a chi-square test, which is used to determine if a die is weighted. Here is what it yields:

  • The odds that Macron’s scores were this consistent by coincidence: 0.001%.
  • The odds that Fillon’s scores were this consistent by coincidence: 0.0003%.
  • The odds that Mélenchon’s scores were this consistent by coincidence: 0.0006%.
  • The odds that Le Pen’s scores were this consistent by coincidence: 0.00000000002%.
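For the curious, here is one way a test like this could be set up in Python; it is a sketch under stated assumptions, not necessarily the exact computation behind the numbers above. It compares the observed spread of a candidate's polls with the spread that sampling error alone would produce, and reports the lower-tail chi-square probability. The poll values are invented, the 1.35-point standard error is borrowed from the band in the chart, and both function names are my own.

```python
import math


def chi2_cdf(x, dof):
    """Lower-tail chi-square probability, using the power series for the
    regularized incomplete gamma function (fine for small x)."""
    a, z = dof / 2.0, x / 2.0
    if z <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def too_consistent_probability(polls, sigma):
    """Chance of seeing polls at least this tightly clustered if each one
    carried independent normal sampling error with standard error sigma."""
    mean = sum(polls) / len(polls)
    statistic = sum((p - mean) ** 2 for p in polls) / sigma ** 2
    return chi2_cdf(statistic, len(polls) - 1)


# Invented scores for one candidate across ten polls, in points.
hypothetical_polls = [23.0, 23.5, 24.0, 23.0, 23.5, 24.0, 23.5, 23.0, 24.0, 23.5]
p_value = too_consistent_probability(hypothetical_polls, sigma=1.35)
print(f"probability of this much agreement by chance: {p_value:.5f}")
```

A tiny probability here means the polls agree with each other more than random sampling would normally allow, which is the pattern described above.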

There are two potential explanations for this. Firstly, French pollsters almost uniformly use a method called quota sampling. In other words, they balance their samples to achieve certain ratios that would be, in their mind, representative of the electorate. The consistency of the quotas used by pollsters could be the cause of the consistency of the polls; and, of course, this is a controversial polling method. It was discredited in the US in 1948, after it failed to predict Truman’s re-election. And this is probably a very bad election cycle to promote quota sampling. The 2017 election cycle has been defined by its volatility, as well as an unusually high number of undecided voters and a big question mark with regard to turnout. (That, in short, never really happened before (4).) In this disjunctive election cycle, it seems a little bit crazy to pretend that the quality of polls relies on a deep understanding of the electorate and its dynamics.

The second explanation, much less kind to pollsters, is what Americans would call herding: in other words, manipulating poll data, or hiding some results, to prevent outliers (5). I don’t know to what extent this is the case, because it would require a little more analysis, but it definitely does seem likely. That polls seem more and more consistent at the end of the campaign, during periods that displayed a lot of poll movement, looks very suspicious to me. I just don’t buy that pollsters have a better grasp of the electorate now than they had four months ago, especially given that the campaign has been essentially upended a few times since then.

I am all the more suspicious of French pollsters because they actually screwed up big time before. In 2002, all of them showed Prime Minister Jospin and President Chirac as comfortable front-runners in the first round; they were virtually assured of qualifying for the second round. What happened, in fact, is that the National Front’s Jean-Marie Le Pen came in second, ousting Jospin. (And with an almost 1-point difference!) This was, and still is, a national trauma, as the public did not see it coming. And guess what Le Pen’s pre-election poll numbers looked like? (The red dot is his actual, eventual score.)

It seems like history is repeating itself 15 years later. What does that mean for Election Day? Essentially, fasten your seat belts. The uncertainty around this election is much, much higher than polls might lead us to think. A lot of second-round possibilities, even between the far-right and the far-left, are to be considered. And, for once, the political TV circus is actually justifiably hysterical.

(1) French presidential elections use the runoff voting system. A first-round candidate needs more than 50% of the votes to win outright, or else the top two candidates face off in the second round to get that 50%. And for those who think that this system is a French peculiarity, you might want to think again: it’s actually the voting system used in next Tuesday’s special election in Georgia.

(2) This is also somewhat less than the average error that French polls have historically displayed 30 days before the election. If I were to use that metric, though, it would only strengthen my claims that French polls may be pretty low-quality, and too close to each other.

(3) The standard error is basically half the margin of error. If a pollster says “20%” to you, it really means that it is 95% confident that the true figure will fall between 17.3 and 22.7 points (+/- 2.7 points), and 68% confident that it will fall between 18.65 and 21.35 points (a range 2.7 points wide).

(4) Turnout for French presidential elections tends to be in the 80% range, which is obviously much higher than in U.S. elections. This tends to reduce uncertainty with regard to the effect of turnout on elections.

(5) A much more thorough explanation of herding is given in this article, which served as a partial inspiration for this post.


Understanding the International Criminal Court


Like many intergovernmental agencies or entities, the International Criminal Court (ICC) in The Hague is, generally speaking, little understood and even less valued in the United States. In the international community, the court is often either hailed as a remarkable development in international justice or starkly criticized as remote and ineffectual. More recent and more damning criticisms dismiss the whole enterprise as neocolonialist. But what can the court actually do, and what has it done? This is a story that could be told in many ways involving deep historical and legal analysis. But it has also been a numbers game where location is deeply relevant, and so I attempted to tell a very simple version of that story using maps made with Datawrapper. I am indebted to former ICC prosecutor Luis Moreno Ocampo and Harvard Kennedy School professor Kathryn Sikkink, whose January term HKS class, “Preventing Mass Atrocities: The Security Council and the International Criminal Court,” provided much of the background information.


In the more than half century since the Nuremberg Trials, there have been a number of one-off experiments with international and local transitional justice (the international tribunals for the former Yugoslavia and Rwanda, Timor-Leste, Sierra Leone, Cambodia, Argentina, Guatemala etc.). But proponents of international justice dreamed of establishing a single court that would have jurisdiction to try grave cases of human rights abuses around the world and whose moral and legal authority would hopefully prevent such crimes from occurring in the future. After years of wrangling, the ICC was established by the Rome Statute, which was adopted at an international diplomatic conference in 1998 and came into force in 2002.

There was of course a catch (several in fact), with limits on the court’s authority and powers. The most significant is that the court only has jurisdiction over nationals of States that are parties to the statute (or over crimes committed in the territories of States that are parties). The exception to this rule is cases that have been referred to the court by the United Nations Security Council.

The above map and the number of countries that have signed look pretty impressive, until you realize who is missing – namely the United States, Russia, China and most of the countries of the Middle East. That’s more than half the UN Security Council and the countries where many of the worst conflicts of the 21st century are occurring. To some degree, the court merely holds a mirror to existing international power dynamics that govern our world. The ICC’s defenders would say it is unfair to expect the court to surpass these realities. But for an entity whose stated mission includes preventing future atrocities from happening, the fact that no one responsible for the horrific crimes occurring in Syria is likely to set foot in its chambers–unless there is a dramatic geopolitical shift–is a brutal blow.

Beyond jurisdiction, the court’s mandate only allows it to try cases that meet the threshold of genocide, war crimes, crimes against humanity or the less-tested crime of aggression.  A case can only be prosecuted if it has been established that the appropriate State is unwilling or unable to genuinely do so itself.


Preliminary Investigations

Some of the cases that have undergone preliminary investigation test the third-party territorial jurisdiction clause (i.e., the United Kingdom for crimes committed in Iraq, and the registered vessels of Comoros, Greece and Cambodia for the flotilla incident with Israel).

These are cases that are deemed not to meet the statutory requirements of the court.

These are cases that are ruled to meet the requirements for further investigation.  This is where the geographic concentration of cases begins to become apparent.





The collapse of the court’s case in Kenya has been a source of much concern and seen as a bad omen for the court’s future.


This includes noteworthy cases such as the Gaddafis in Libya (the case against Colonel Muammar Gaddafi for crimes committed during the Libyan revolution was dropped with his death; there is an arrest warrant out for his son Saif al-Islam Gaddafi, but he is being held by a militia in Libya), Sudanese President Omar al-Bashir (his arrest warrant, the first against an active head of State, has been routinely flouted by African and Middle Eastern countries to which he has travelled freely), and Joseph Kony in Uganda (the online video “Kony2012” may have been a viral sensation but has had no visible impact on securing his capture).



These last maps should make apparent the main criticisms lodged against the court – that convictions have been few, that without a tool to enforce arrest warrants it will remain impotent and, most recently, that the pronounced focus on African countries is a sign of its colonialist intentions (and has resulted in several African nations threatening to withdraw from the Rome Statute).

Proponents of the ICC mostly acknowledge that the court has flaws that could be improved, but defend the cost and pace of the convictions by pointing to the complexity, scope and ambition of what it is trying to achieve. They look to recent developments in Latin America as a sign that the court’s preventative potential may be working. The enforcement dilemma, they say, is ultimately one of political will that must be worked out through advocacy and diplomatic channels. And the charges of neocolonialism have been vehemently denied by the current Gambian and former Argentine prosecutors of the court, who argue that those criticisms betray a lack of understanding of the court’s statutory limitations and are an excuse for brutal dictators to evade justice.

Ultimately, the ICC may be the best tool we have in an imperfect world. After all, without it, former ICC prosecutor Luis Moreno Ocampo has asked, “who else will fight for the victims?”


Busting HBCU myths with data

By Jeneé Osterheldt and Tyler Dukes

There’s a long-standing myth that Historically Black Colleges and Universities, or HBCUs, do a poor job graduating their black students.

According to U.S. Department of Education data, only 4 out of 10 black students at HBCUs graduate “on-time,” that is, within six years of starting their freshman year.

Weighted average of graduation rates for black students at 84 HBCUs reporting to the U.S. Department of Education as of 2014 within six years of their start date, or 150 percent time. SOURCE: Integrated Postsecondary Education Data System, PartNews analysis

At colleges and universities overall, by comparison, the number of black students who graduate on time is closer to 5 in 10.

Weighted average of graduation rates for black students at 1,671 colleges and universities, including HBCUs, reporting to the U.S. Department of Education as of 2014 within six years of their start date, or 150 percent time. SOURCE: Integrated Postsecondary Education Data System, PartNews analysis
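A quick note on what "weighted average" means in those captions, sketched in Python. The captions don't say what the weights are; this sketch assumes each school's rate is weighted by the size of its entering cohort, and the three schools below are invented for illustration.

```python
def weighted_graduation_rate(schools):
    """Graduation rate across schools, weighting each school's rate by
    its cohort size. schools is a list of (cohort_size, rate) pairs."""
    total_students = sum(cohort for cohort, _ in schools)
    total_graduates = sum(cohort * rate for cohort, rate in schools)
    return total_graduates / total_students


# Invented schools: (entering cohort, six-year graduation rate).
hypothetical_schools = [(2000, 0.45), (500, 0.30), (300, 0.25)]

weighted = weighted_graduation_rate(hypothetical_schools)
simple = sum(rate for _, rate in hypothetical_schools) / len(hypothetical_schools)

print(f"weighted average: {weighted:.1%}")
print(f"simple average:   {simple:.1%}")
```

The weighted figure answers "what share of students graduated?" while the simple average answers "what does the typical school's rate look like?", and the two can differ quite a bit when school sizes vary.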

So what’s the deal?

Jay Z says numbers don’t lie, but they don’t exactly paint the whole Picasso either. It might seem like HBCUs have a low grad rate — but it’s just not that simple.

If you plot graduation rates for black students against the percentage of first-generation students at a college or university, it looks a little something like this.

Approximate plot of percentage of first-generation students (horizontal axis) vs. graduation rates for black students (vertical axis) in 2015 for about 1,600 colleges and universities reporting to the U.S. Department of Education. SOURCE: US DOE College Scorecard, PartNews analysis

The general trend is that the higher the percentage of first-generation students, the lower the graduation rate.
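One simple way to put a number on a trend like that is to fit a straight line through the points. The Python sketch below does an ordinary least-squares fit on five invented schools; the data and function name are illustrative and have nothing to do with the College Scorecard figures behind the plot.

```python
def least_squares_slope(xs, ys):
    """Slope of the best-fit straight line through the points (xs, ys)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance_x


# Invented schools: percent first-generation students, percent graduating.
first_gen_pct = [20, 30, 40, 50, 60]
grad_rate_pct = [70, 60, 52, 41, 35]

slope = least_squares_slope(first_gen_pct, grad_rate_pct)
print(f"change in graduation rate per extra point of first-gen share: {slope:.2f}")
```

A negative slope is the numeric version of a cloud of dots that drifts downward from left to right.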

And that’s an important relationship, because when we look at where HBCUs fall on this plot, they tend to be scattered around here, toward the lower end of the graduation rates and the higher percentage of first-generation students.

Approximate locations of 85 HBCUs in a plot of percentage of first-generation students (horizontal axis) vs. graduation rates for black students (vertical axis) in 2015 for about 1,600 colleges and universities reporting to the U.S. Department of Education. SOURCE: US DOE College Scorecard, PartNews analysis

On average, about 43 percent of students enrolled in HBCUs are first-generation. Compare that to about 36 percent for colleges overall.

Another factor: Money. According to a Pell Institute study, students from families in the top income quartile (over $108,650) are eight times more likely to hold a college degree than students from the bottom quartile (under $34,160). About half of the nation’s HBCUs have a freshman class where three-quarters of the students are from low-income backgrounds.

About 50 percent of the nation’s HBCUs have a freshman class where 75 percent are from low-income backgrounds.  SOURCE: Pell Institute

But just 1 percent of the 676 non-HBCUs serve as high a percentage of low-income students.

That bag makes a difference. Not to mention, the schools themselves see fewer resources.
According to the Thurgood Marshall College Fund, HBCU endowments are, on average, one-eighth the size of those at historically white colleges and universities.

And consider the open-admission policy. HBCUs are more likely than other institutions to accept students with lower grades and SAT scores. The Postsecondary National Policy Institute found that over 25 percent of HBCUs are open-admission institutions, compared with 14 percent of other colleges and universities.

Despite the odds, HBCUs still make a major difference to their student bodies. These schools, which on the surface may seem to do a poor job at graduating black students, helped create the black middle class. At least that’s what a U.S. Commission on Civil Rights report says.

Historically Black Colleges and Universities have produced 40 percent of African-American members of Congress, 40 percent of engineers, 50 percent of professors at PWIs, 50 percent of lawyers and 80 percent of judges.

And to think, HBCUs only represent 3 percent of post-secondary institutions. Just saying: imagine what these schools could do with more funding and support.

Long live black excellence.

Movie Success: Is it in the Data?

By Dijana, Maddie, and Sruthi 

Despite all of the talk every year about how out of touch the entire award ceremony and its results are, everyone in the film industry wants to win an Oscar. It’s the most prestigious award in the industry, signifying that the recipient is at the top of the field. Regardless of the flaws of the voting process, which is completely subjective given the voting constituency, the number of Oscars a film wins is more often than not the main measure of a film’s success.

But is there some other factor that contributes to that success? Do the film critics sway the voters? Does public sentiment push movies into Oscar contention? Is there some correlation between the revenue of a movie and its Oscar potential? Does spending more on the film lead to more wins?

Using data sets that included quantitative data from IMDB, the American Film Institute, and Box Office Mojo, it’s clear that some factors have a stronger relationship with Oscar wins than others.

(Budget) Size Doesn’t Matter

After charting the relationship between Oscar wins and the adjusted budget of each film released from 1928 to 2010 (with some omissions due to the incomplete data set), it’s clear that the cost of a movie has no bearing on its overall success. Big-budget movies are very rarely major winners at the Academy Awards. In fact, among the major winners, only Titanic, a film that cost approximately $200M to make, had a significantly large budget in film terms.

There may be many inferences to make from the data, such as the fact that the most expensive movies tend to be summer releases geared for the general population as opposed to the serious film crowd. Budgets may be increasing for films over the last several decades (see graph below), but the number of films winning a significant number of awards has not increased. However, given the lack of available data, these points remain speculation.

Mo’ Money, Mo’ Oscars?

The size of a budget may not signal a greater probability for a film to win more Oscars, but what about overall revenue? Does box office success matter to the Academy voters?

Looking at the data from Box Office Mojo of the 25 movies released before 2011 with the highest domestic grosses, adjusted for inflation, it’s clear that there is no significant relationship between revenue and Oscar wins.

Academy members aren’t swayed by box office smashes when it comes to choosing Best Picture. Though there are a few outliers, such as Gone with the Wind and Titanic, that have earned a significant amount of revenue at the box office as well as Academy Awards, for the most part more revenue does not signify more Oscars. In fact, six of the 25 movies did not win a single Oscar, and three won only one Academy Award.

The People vs. Oscar Wins

In the film industry, an Oscar is an incredible achievement, signifying the quality of the end product and the work that went into creating it. But does the public see these films in the same way? Just how popular are the most successful movies?

Using data from the IMDB database, the rating of each film (which any visitor can vote on) was compared to the total Oscar wins:

Surprisingly, the most popular film on IMDb according to the public rating, at 9.2 out of 10, is The Shawshank Redemption, a film that, while nominated for seven Academy Awards, came away empty-handed. On the other hand, two of the three films with the most Oscar wins (11), Titanic and Ben-Hur, failed to make the top 250 ratings on the website, with ratings of 7.7 and 5.7, respectively. The Lord of the Rings: The Return of the King is one of the few films that bucks this trend, with a rating of 8.9 and 11 Oscar wins.

But What Do the Experts Think?

When it comes to measuring the success of a film, one major group has been ignored thus far: film experts, including historians and critics. To get a better sense of how much expert opinion matches up with the Academy’s, the American Film Institute’s list of top 25 movies of all time (up to 2010) was used as the primary source for analysis:

Much like the previous analyses, the number of wins does not match up well with the ranking. While several movies, such as Gone with the Wind, Lawrence of Arabia, and On the Waterfront, each won several awards and were ranked in the top 10, the top film of all time, Citizen Kane, won only one Oscar. Even more surprising, several of the top 25 films, including Singin’ in the Rain, Psycho, and It’s a Wonderful Life, received zero Academy Awards.

Of course, several caveats must be made about the data. The number of total categories, and therefore possible wins, has increased substantially since the first Academy Awards in 1928. Similarly, there is no way to determine the competitiveness of the field in a given year. There’s no way of knowing if a film that is highly regarded by critics and the public would have won more awards had it been released in another year. Additionally, not all Oscars are created equal, and more weight may need to be applied to categories like Best Picture and Best Director over others.

Despite the issues with the data, one thing remains clear: Oscar wins may be a measure of success for the industry, but there is very little, if any, evidence that any of these criteria matter when it comes to predicting that success. So instead of trying to use an IMDB rating to predict the next Oscar winner, it may be better to just guess blindly.

Anti-Semitic Incidents in MA: A Tale of Two Data Sources

By AAAD (Arthur, Anne, Anne, Drew)

“Data-driven” is the theme of the modern age. From business decision-making to policy changes, from news stories to social impact evaluations, data is a foundational building block for many organizations and their causes. However, while we would like to think that all data represent absolute truths, the real world presents many challenges to accurate reporting.

Our team was motivated by the question: How has the prevalence of anti-Semitic incidents in Massachusetts changed over the past several years? In our exploration of this question, we learned an old but important truth: when you see data, dive deep and make sure you understand how the data collection methods could affect the resulting data.

To begin our exploration of anti-Semitic incidents in MA, we looked into two sources: the Anti-Defamation League (ADL) and the Massachusetts Executive Office of Public Safety and Security (EOPSS). Right away, we noticed obvious discrepancies in the annual totals of anti-Semitic incidents reported by the two sources:

Anti-Semitic incidents in Massachusetts

Year  ADL  EOPSS
2015  50   40
2014  47   36
2013  46   83
2012  38   90
2011  72   92
2010  64   48

Source: ADL press releases and EOPSS data from a FOI request
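
To make the size of these discrepancies concrete, here is a minimal Python sketch that computes the year-by-year gap between the two sources. The column order (ADL first, EOPSS second) is inferred from the totals cited later in this post, and the names `totals` and `gaps` are our own illustration, not part of either dataset.

```python
# Annual totals copied from the table above: year -> (ADL, EOPSS).
# The column order is an assumption inferred from figures quoted in the text.
totals = {
    2015: (50, 40),
    2014: (47, 36),
    2013: (46, 83),
    2012: (38, 90),
    2011: (72, 92),
    2010: (64, 48),
}


def gaps(totals):
    """Return {year: EOPSS total minus ADL total}."""
    return {year: eopss - adl for year, (adl, eopss) in totals.items()}


# Print the gap for each year, oldest first.
for year, gap in sorted(gaps(totals).items()):
    print(year, gap)
```

A positive gap means EOPSS recorded more incidents than the ADL that year; the sign flips between years, which is part of what makes the two series hard to reconcile.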

After seeing these discrepancies, we decided to dig deeper and try to understand what might account for the differences.  We began by investigating how the data is collected and then comparing differing statistics between the two sources.

EOPSS’ approach and its implications

Massachusetts passed its first hate crime legislation in 1991, but not every agency has adhered to it. According to reports from the Massachusetts Executive Office of Public Safety and Security (EOPSS), the state did not begin tracking non-reporting agencies until 2005.

The Massachusetts “Hate Crimes Reporting Act requires that the number of hate crimes that occur in the Commonwealth be submitted to the Secretary of Public Safety annually, even if that number is zero. (M.G.L. c. 22C, § 32).” Nonetheless, as late as 2014, some districts were not reporting this statistic. The FBI also compiles hate crime data, though submitting this information is voluntary. Some Massachusetts agencies that have failed to report hate crime data to the FBI have stated they did not realize the FBI had even requested the information.

The accuracy of hate crime reporting data can be influenced by a number of factors, including record keeping procedures within a given agency and whether or not officers are trained to inquire about factors that qualify crimes as hate crimes.

When agencies do not report data to the state, any hate crimes recorded in those districts are not represented in the official state statistics. Agencies that have zero hate crimes should report zero hate crimes to the state (these are designated as “zero-reporting” agencies in official reports). A further complication in determining trends arises when formerly non-reporting agencies begin to report non-zero numbers of hate crimes: the statewide total can rise even if the underlying number of incidents has not.

Data collected by Massachusetts indicates the population covered by agencies that did not report hate crime statistics grew from roughly 66,000 in 2011 to over 300,000 in 2014.

Massachusetts has recently taken steps to increase the public’s ability to report hate crimes, setting up a hotline in November of 2016. Some police districts also have a designated Civil Rights Officer to handle hate crimes.  

The issues raised by non-reporting are far from academic. When national tragedies occur, one reaction may be an increase in hate crimes against particular populations. In these cases, hate crime statistics can provide insight about the implications for local communities.

In the wake of the 2016 presidential election, Bristol County Sheriff Thomas Hodgson called for arrest warrants to be issued for elected officials of “sanctuary cities.” This prompted Somerville Mayor Joe Curtatone to defend the legality of sanctuary cities and refer to Sheriff Hodgson as a “jack-booted thug.” He further taunted Hodgson to “come and get me.” These flare-ups between public officials indicate the tension that has formed in the public sphere around the issue of immigration.

Hate crime reporting statistics can provide a tool to measure claims of anti-immigrant-related incidents and provide the public with a sense of whether these incidents are on the rise. Massachusetts has responded to concerns about an increase in hate crimes by setting up a hate crime reporting hotline.

Official statistics from police departments and college campuses can bring clarity to the issue, but Massachusetts must both require and enforce reporting mandates as well as provide training to local agencies to improve and standardize the reporting of these statistics.

ADL’s selected approach and its implications

Another source of data on Massachusetts anti-Semitic crimes comes from the Jewish NGO, the Anti-Defamation League (ADL). The ADL was founded in the United States in 1913 and aims to “stop anti-Semitism and defend the Jewish people,” according to their website.

Since 1979, the ADL has conducted an annual “Audit of Anti-Semitic Incidents.” The ADL’s data partially overlaps with official data — they use data from law enforcement — but they also collect outside information from victims and community leaders.

The limitations of the ADL’s audits are like those of any audit trying to cover anti-Semitic crimes. The way the ADL handles them should be noted carefully, however, as it greatly affects the resulting numbers.

First of all, unlike the official data, the ADL also includes non-criminal acts of “harassment and intimidation” in its numbers, which encompass hate propaganda, threats, and slurs.

Another key difference from the official data is that ADL staff attempt to verify all anti-Semitic acts included in the audit. While crimes against institutions are easier to confirm, incidents of harassment against individuals that are reported anonymously pose unique challenges for verification.

Additionally, some crimes are not easily identifiable as anti-Semitic even though they may be. In their annual audit, the ADL considers all types of vandalism at Jewish institutions to be anti-Semitic, even without explicit evidence of anti-Semitic intent. This includes stones thrown at synagogue windows, for example.

On the other hand, the ADL does not consider all swastikas to be anti-Semitic. Since 2009, it has not counted swastikas that don’t target Jews, as the swastika has in certain cases become a universal symbol of hate.

The ADL also does not count related incidents separately. Instead, crimes or harassment that occurred in multiple nearby places at similar times are counted as one event.

All of these choices made by the ADL greatly affect the numbers that they produce each year.

Comparing and contrasting the results of the two methodologies

Numbers can tell different stories depending on the choices and circumstances surrounding the ADL and the EOPSS’ hate crime data collection processes.  To demonstrate this, we compare some of the conclusions between the two datasets for anti-Semitic hate crimes in Massachusetts.

Starting small: One location claim

One of the ADL’s figures for 2013 indicated that 28% (or 13) of the 46 total anti-Semitic incidents that year took place on a school or college campus.  If we look for the same percentage in the EOPSS data, we find a similar 29% of reported 2013 incidents occurring on a school or college campus.  

This single summary seems to bode well for comparisons between the two datasets; however, things get a little hazier when you look at the absolute numbers. Instead of 13 out of 46 total incidents, the EOPSS data reported 24 out of 83 incidents on a school or college campus, and it’s unclear what accounts for the difference in scale.
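
The contrast between similar shares and very different counts can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `share` is our own, and the inputs are the campus figures quoted above.

```python
def share(part, total):
    """Percentage of `total` made up by `part`, rounded to a whole percent."""
    return round(100 * part / total)


# Campus incidents out of all reported incidents in 2013, per source.
adl_campus_share = share(13, 46)
eopss_campus_share = share(24, 83)

# How many times larger the EOPSS campus count is than the ADL count.
count_ratio = 24 / 13

print(adl_campus_share, eopss_campus_share, round(count_ratio, 2))
```

The shares differ by a single percentage point, while the underlying EOPSS count is nearly double the ADL count, so a matching percentage alone says little about whether the two sources are recording the same incidents.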

Time trends in reports

If we look at time trends, 25% of the anti-Semitic incidents in Massachusetts reported by the ADL in 2014 occurred in July and August, while that figure was 8% for the same time period in 2013.  

The ADL’s New England Regional Director attributed that “marked increase” in anti-Semitic incidents to the 50-day Israel-Gaza conflict that took place from July 8 to August 26, 2014, saying, “This year’s audit provides an alarming snapshot into anti-Semitic acts connected to Operation Protective Edge over the summer as well as acts directed at Jewish institutions. The conflict last summer demonstrates how anti-Israel sentiment rapidly translates into attacks on Jewish institutions.”

If we look at EOPSS data for 2013 and 2014, however, there appears to be no sign of a marked increase in anti-Semitic incidents recorded in the summer months — in fact, in absolute numbers, both incidents in July/August and incidents in the entire year decreased from 2013 to 2014 in the EOPSS data.  

Because the ADL does not provide their underlying data to the public, we can’t dig into the stories of the specific incidents in July/August 2014 and see if they could indeed be a result of the Israel-Gaza conflict.  Additionally, with not-particularly-scientific or consistent reporting methodologies, it’s hard to make concrete conclusions from either of these datasets.

Incident types: Differences might be explained by differing reporting policies

Thus far, we’ve identified contradictions between the two datasets, but have not been able to discern how the two data collection methods may have specifically contributed to those contradictions.  

One topic where we can attempt to do so is the matter of vandalism:

According to the annual ADL audit, 16 of the 46 anti-Semitic incidents in Massachusetts in 2013 (35%) involved vandalism.  The same figure from the ADL for 2014 was 23 vandalism incidents out of 47 total anti-Semitic incidents in Massachusetts (49%).  In EOPSS’ numbers, however, vandalism looks like an even larger portion of anti-Semitic incidents in Massachusetts.

As discussed previously, the ADL reports all vandalism of Jewish institutions as anti-Semitic incidents, but does not count all vandalism involving swastikas as anti-Semitic incidents in its data. Although not directly specified, the EOPSS datasets likely do categorize all reports of swastikas as anti-Semitic vandalism, which would be a possible explanation for the large discrepancy in percentages (on top of the simple explanation that with numbers of this magnitude and lack of precision, variations are inevitable).

Do Data Due-Diligence!

Investigating the discrepancies and the data collection methodologies was not merely an academic exercise: it demonstrates that this is a necessary step to understanding what kinds of conclusions you can reasonably draw from your data and what kinds of caveats you should include when reporting or making decisions based on that data.  

Using only one dataset without exploring how the data was collected and digging into the details of the data could yield very different headlines:

Blindly using ADL data might yield: “Anti-Semitic hate crimes in Massachusetts increase 2% from 2013 to 2014.” (This was just 46 to 47 — is it really reflective of the situation to call that a 2% increase? Does this reflect the reality?)

Blindly using EOPSS data might yield: “Massachusetts became safer for Jewish people in 2014: anti-Semitic hate crimes dropped 57% from 2013.” (Is this message true, or is this “trend” due to data collection issues? Why does it paint such a different picture from the ADL data?)
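
As a rough check on headline arithmetic like this, the sketch below computes the year-over-year percent change from the 2013 and 2014 totals in the table near the top of this post. The function name `pct_change` is our own, and the column assignment (ADL first, EOPSS second) is an assumption inferred from the figures quoted in the text.

```python
def pct_change(old, new):
    """Percent change from `old` to `new`, rounded to one decimal place."""
    return round(100 * (new - old) / old, 1)


# 2013 -> 2014 totals as listed in the table of annual incidents.
adl_change = pct_change(46, 47)
eopss_change = pct_change(83, 36)

print(adl_change, eopss_change)
```

With totals this small, a change of one incident moves the ADL figure by about two percentage points, which is one reason to report the raw counts alongside any percentage.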

Do your data due-diligence.

Where are Pulitzers Won?

Yesterday saw the announcement of the 2017 Pulitzer Prizes. Awarded in some form or another for one hundred years, the Pulitzers represent the peak of journalistic recognition as well as literary and musical accomplishment.

Though the categories celebrating journalism have shifted somewhat over the years, the Pulitzers have long recognized quality reporting at all levels, from the local to the international. So what can analysis of who won the awards tell us about the geographic spread of successful journalism?

For this assignment I analyzed where four different categories of Pulitzers were awarded over the course of the last century. First, I looked at the Pulitzer for Local Investigative Specialized Reporting, a category awarded since 1964. Scraping the data from a list on Wikipedia, I calculated the number of awards given to titles in each U.S. state, and used the visualization tool Datawrapper to display the results:

23 out of 50 states have seen a title win a Pulitzer for local reporting – a decent geographic spread. Next I looked at the prizes for National and International Reporting respectively:

As these charts show, larger states have tended to dominate the National and International categories, which makes sense given the consolidation of resources in large bureaus, particularly in New York and Washington. For international reporting especially, New York dwarfs all other states, accounting for well more than half of all International Pulitzers.

Yet the Public Interest category, displayed below, shows much more geographic diversity. Though New York and California, as large states, still lead the way with 10 prizes each, Pulitzers for work in the public interest have been awarded to fully 31 states plus DC, and states like North Carolina (6 awards) and Missouri (4) have been frequently recognized.

This analysis suggests that while major titles like the New York Times and Washington Post have long led the way with their hard-hitting reporting at the national and international levels, for a century now, newspapers at every level and in a majority of states have performed award-winning journalism in the public interest. These local titles, exposing municipal corruption and state-level scandal, are the backbone of American journalism and – facing the most danger from the loss of advertising revenue and corporate consolidation – are most in need of ongoing financial support.

Climate change & terrorism: The data

Last November, Presidential candidate Bernie Sanders raised some eyebrows when he said, “…climate change is directly related to the growth of terrorism. If we do not get our act together and listen to what the scientists say, you’re gonna see countries all over the world — this is what the CIA says — they’re going to be struggling over limited amounts of water, limited amounts of land to grow their crops, and you’re going to see all kinds of international conflict.”

Since then, a number of media outlets have fact-checked this statement, and PolitiFact has rated this comment as being Mostly False. You can read about PolitiFact’s full analysis here.

While Sanders’s comments were perhaps too direct in establishing a causal relationship between climate change and terrorism, he’s not alone in describing climate change as a destabilizing force that terrorist organizations can take advantage of. The Defense Department calls climate change a “threat multiplier” in a 2014 report, and Al Gore has been quoted numerous times saying that the Syrian Civil War was caused by extreme drought conditions, which were in turn caused by climate change.

While these arguments make intuitive sense, beyond anecdotal one-off instances (e.g., drought in Syria led to the Syrian Civil War, drought in Nigeria led to Boko Haram), what has been lacking is a comprehensive review of extreme weather conditions globally in recent years, and of whether the geographies facing the worst impact of climate change have seen an increase in terrorist activity. Based on Sanders’s statements, this seems like a reasonable assumption to make.

The first place to look was where climate change has been hitting the hardest in recent history. Below is a heatmap of the impact of extreme weather events on the population. The higher the number, the greater the percentage of the population that has been impacted by extreme weather such as droughts and floods.

Data source: IMF. Extreme weather impact on percentage of population, 1990-2009

An interactive version of the map is here:!/publish-confirm

Swaziland, Malawi, China, Niger, and Eritrea are the countries whose populations have been most impacted by severe weather conditions. If Sanders’s comments hold true, we should also see the highest number of terrorist activities in those countries in recent history. Mapped below is the number of casualties from terrorist incidents since 1980. Casualties were plotted here instead of number of incidents to show the severity of terrorist activity.

It is immediately apparent that those 5 countries do not have anywhere near the highest number of terrorist casualties in the past two to three decades.

Also included in the interactive map, for context, is the year-over-year percentage change in GDP, to show the potential amplifying effect of climate change and poor economic conditions on terrorist activity. However, based on the data presented, no direct relationship can easily be seen between either climate change or economic health and terrorist activity. Sanders’s comments don’t hold up against the data. Instead, as Time and PolitiFact have indicated, there seem to be many other factors that contribute to terrorist incidents.
