How can data science help in elections?

PAKISTAN
By: Zeeshan-ul-hassan Usmani

On July 25 (tomorrow), Pakistan will hold its 13th general election (the earlier ones were held in 1954, 1962, 1970, 1977, 1985, 1988, 1990, 1993, 1997, 2002, 2008 and 2013). This time, the election falls in the hot and humid month of July; the temperature forecast for the day is 27-33 degrees Celsius, with almost no chance of rain anywhere in the country.

Based on our assessment, we predict that the turnout tomorrow will be historic, in the range of 57-61 percent. Since 1977, the average turnout has remained at around 45 percent (the lowest was 35 percent in 1997 and the highest 55 percent in 1977; in 2013, the turnout was 53 percent). Pakistan is ranked 164th out of 169 democracies in terms of voter turnout; Australia is first on the list with an average turnout of 94.5 percent.

Turnout varies significantly across the country: Musakhel and Kohlu districts have the lowest average turnout, at 25 percent, while Layyah and Khanewal districts average around 60 percent. Among the provinces, Punjab has the highest voter turnout and Balochistan the lowest.

As many as 3,675 candidates will contest the election on July 25, an average of around 13 candidates per seat.

The PTI has fielded the highest number of candidates, 244. Islamabad has the highest number of candidates of any district, with 76 contesting its three seats.

Under the first-past-the-post (FPTP) system of voting followed in the country, a party or coalition of parties will need 172 seats to form the next government.

There are quite a few interesting facts about this election. For example, this contest will feature the highest number of turncoats (candidates who often change their party affiliations) ever.

Candidates with a military background have only a meager chance of winning seats. In the five general elections since 1993, 138 candidates with an ex-military profile contested, but only 16 managed to win. Not a single independent candidate with a military background has won a seat since 1993.

The ability of independent candidates to remain independent after winning is highly questionable. Almost 80 percent (77 out of 96) have ended up joining a political party after winning an election.

There will be 86,436 polling stations (Punjab: 48,667; Sindh: 18,647; FATA and KPK: 14,655; Balochistan: 4,467).

Estimates suggest that voters from Karachi, Lahore, and Islamabad (KLI) make up 85 percent of those active on social media platforms, though they represent no more than nine percent of the national vote bank.

According to one analysis, the percentage of PMLN’s safe seats is around 30 percent. Almost 20 percent of the seats fall in the so-called ‘hat trick’ category, i.e. seats won by the same party in the last three elections.

There are a few initiatives to safeguard this election against malpractice. For example, NADRA has developed a system to electronically transmit the results from polling stations, along with the picture of each voter, for accountability.

While women account for 45 percent of the vote bank, their participation in general elections has remained relatively low. Women voters have been missing from many KP districts in the last few elections, and there are five constituencies in Punjab where women’s participation has been less than five percent. NA-152 drew only 1.9 percent women voters in earlier elections.

Thirteen transgender candidates are contesting the elections.

The history of elections goes hand in hand with charges of corruption, voter fraud, ghost votes, interference by the deep state, and violence. There is (almost) no country in the world without the fear or accusation of such incidents in its elections. We have the examples of Russia’s apparent meddling in US elections and the alleged role of Cambridge Analytica in swaying voters one way or the other.

A deadly cycle of violence has been witnessed in the run-up to this election as well. Haroon Bilour of the Awami National Party (ANP) was killed along with 20 others in a suicide attack in Peshawar. As many as 149 people were killed and 186 injured in a suicide bombing targeting BAP leader Siraj Raisani. Four people died and 10 were injured in an explosion near a rally of JUI-F’s Akram Durrani in Bannu. Most recently, PTI’s Illyas Gandapur was killed in an attack.

The total tally comes to 174 dead and 261 injured so far, making it one of the deadliest elections ever in Pakistan.

Data science can help us answer quite a few questions and predict election results. The complete dataset of past election results is now available online. The margin of victory between winners and runners-up can be examined on a page maintained on Kaggle, a website dedicated to the promotion of data science projects. Another page on the website shows a heat map of the total number of votes secured by each party in each constituency. There is also a complete map of hat-trick seats; we may consider these seats quick wins for the respective political party, but a few constituencies, such as NA-247, defy that logic. A Kaggle kernel of exploratory data analysis of the elections has also mapped the strength and numbers of each political party across all constituencies in previous elections. There are Kaggle pages on voter turnout and the number of votes in each constituency as well.
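To make the two measures above concrete, here is a minimal sketch that computes the margin of victory and flags hat-trick seats from a tiny results table. Everything in it is an illustrative assumption: the seat labels, party names and vote counts are made up, the row layout is not the schema of the Kaggle dataset, and the margin is expressed as a share of all votes cast in the contest.

```python
from collections import defaultdict

# Hypothetical rows: (year, constituency, party, votes). Not real results.
RESULTS = [
    (2002, "NA-X", "Party A", 40000), (2002, "NA-X", "Party B", 30000),
    (2008, "NA-X", "Party A", 45000), (2008, "NA-X", "Party B", 35000),
    (2013, "NA-X", "Party A", 60000), (2013, "NA-X", "Party B", 40000),
    (2002, "NA-Y", "Party B", 25000), (2002, "NA-Y", "Party A", 20000),
    (2008, "NA-Y", "Party A", 30000), (2008, "NA-Y", "Party B", 28000),
    (2013, "NA-Y", "Party A", 33000), (2013, "NA-Y", "Party B", 27000),
]

def winners_and_margins(rows):
    """Return {(year, seat): (winning_party, margin_pct)}.

    margin_pct is the winner's lead over the runner-up as a percentage
    of all votes cast in that contest.
    """
    by_contest = defaultdict(list)
    for year, seat, party, votes in rows:
        by_contest[(year, seat)].append((votes, party))
    out = {}
    for key, tallies in by_contest.items():
        tallies.sort(reverse=True)  # highest vote count first
        total = sum(votes for votes, _ in tallies)
        top_votes, top_party = tallies[0]
        runner_up = tallies[1][0] if len(tallies) > 1 else 0
        out[key] = (top_party, 100.0 * (top_votes - runner_up) / total)
    return out

def hat_trick_seats(rows, years=(2002, 2008, 2013)):
    """Return {seat: party} for seats won by one party in all given years."""
    won = winners_and_margins(rows)
    result = {}
    for seat in {seat for _, seat in won}:
        if not all((year, seat) in won for year in years):
            continue  # seat missing from at least one election
        parties = {won[(year, seat)][0] for year in years}
        if len(parties) == 1:
            result[seat] = parties.pop()
    return result
```

In this toy table, NA-X is a hat-trick seat for Party A, while NA-Y changed hands once and is not. A real analysis would load the published results in place of the RESULTS list.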

Here is a basic formula for predicting the results of Pakistan’s General Election 2018. What we need to do is calculate the combined probability of a party or candidate winning a particular seat. I would use the following parameters and approximate weights to calculate the winner for each seat:

1. Winning party from the last election: We can give close to 40 percent weight to the party that won the seat in the last election. If you look deeper, a few parties have their confirmed seats (as explained in the hat-trick kernels).

2. Winning candidate from the last election(s): Say another 20 percent goes to the winning candidate. If he/she is still with the same party, it increases that party’s chances of winning; if the candidate has changed loyalties, the weight should go to the new party.

3. Vote margin and voter turnout: Another 5-10 percent of the weight should go here. If you won with a margin of 30 percent or more, you are quite safe to lead this time too, but if you won by a thin margin, the seat can swing. It also depends on voter turnout: if the margin was only 5 percent and 20 percent more voters come out to vote this time, your lead may increase or disappear depending on the choices of the new voters. You can also assume that new voters will vote in the same proportions as before (or otherwise).

4. Polls: I would give only 3-5 percent of the weight to poll results such as Gallup’s.

5. Geo-Political Indicators (GPIs): This is the most important set of variables for your analysis. It contains several factors that are decisive for swing seats and for the overall election results. It can include the sentiment of the constituency (code a Python script to fetch the top 20 Google search results for the respective constituency and automatically classify each one as positive or negative using NLP). More positive results will give you a good score, while negative ones would give you a zero or even a negative score. That would be an indicator of the incumbent’s performance in the last tenure. Another variable is the candidate’s family, education and political background: whether he/she faces any corruption cases, or was named in any significant scandal (Panama leaks, etc.).

6. Rigging: This would be at the heart of your analysis. You should consider all three forms of rigging (pre-poll, polling-day and post-poll) and estimate the chances of each happening in the respective seat. Skimming through media headlines and talking to local people would give you a good idea to start with.
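The six parameters above can be combined in many ways; the sketch below shows one simple possibility, a weighted sum per party that is then normalised across the parties contesting a seat. The weights for the first four factors take the upper ends of the ranges given above. No figures are stated for GPIs or rigging, so the 15/10 split of the remaining weight is purely an assumption, as are the function names, the factor scores and the two-party simplification in the turnout helper. The output is a relative score, not a calibrated probability.

```python
# Weights for factors 1-4 follow the rough figures in the list above.
# The values for "gpi" and "rigging" are assumptions made for this sketch.
WEIGHTS = {
    "last_party": 0.40,
    "last_candidate": 0.20,
    "margin_turnout": 0.10,
    "polls": 0.05,
    "gpi": 0.15,       # assumed
    "rigging": 0.10,   # assumed
}

def seat_score(factors):
    """Weighted sum of factor scores; a missing factor counts as 0."""
    return sum(WEIGHTS[name] * factors.get(name, 0.0) for name in WEIGHTS)

def seat_forecast(candidates):
    """Normalise non-negative seat scores into shares that sum to 1."""
    scores = {name: max(seat_score(f), 0.0) for name, f in candidates.items()}
    total = sum(scores.values())
    if total == 0:
        return {name: 1.0 / len(scores) for name in scores}
    return {name: score / total for name, score in scores.items()}

def new_margin_pct(old_margin_pct, extra_turnout_pct, leader_share_of_new):
    """Previous winner's margin after extra voters turn out.

    Two-party simplification: old_margin_pct and extra_turnout_pct are
    percentages of the previous election's total votes, and
    leader_share_of_new is the fraction of new voters backing the
    previous winner.
    """
    net_gain = extra_turnout_pct * (2.0 * leader_share_of_new - 1.0)
    return 100.0 * (old_margin_pct + net_gain) / (100.0 + extra_turnout_pct)

# Hypothetical seat: Party A won last time, but its winning candidate has
# since moved to Party B. Scores lie in [0, 1]; the GPI score may be negative.
example = {
    "Party A": {"last_party": 1.0, "last_candidate": 0.0, "margin_turnout": 0.5,
                "polls": 0.4, "gpi": -0.2, "rigging": 0.5},
    "Party B": {"last_party": 0.0, "last_candidate": 1.0, "margin_turnout": 0.5,
                "polls": 0.6, "gpi": 0.6, "rigging": 0.5},
}
forecast = seat_forecast(example)
likely_winner = max(forecast, key=forecast.get)
```

With these made-up inputs, Party A scores 0.49 against 0.42 for Party B, so the shares come out at roughly 54 to 46 in Party A’s favour despite the turncoat candidate. The turnout helper reproduces the case in point 3: with a 5 percent margin and 20 percent more voters, an even split of the new voters shrinks the lead to about 4.2 percent, and the lead disappears if only 37.5 percent of them back the previous winner.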

 

Courtesy: Daily Times