Friday, April 24, 2015

How do political campaigns use data analysis?

Looking through SSRN this morning, I came across a paper by David Nickerson (Notre Dame) and Todd Rogers (Harvard), "Political Campaigns and Big Data" (February 2014). It's a nice follow-up to yesterday's post about the software supporting new approaches to data analysis in Washington, DC.

In the paper, Nickerson and Rogers get into the math behind the statistical methods and supervised machine learning employed by political campaign analysts. They discuss the various types of predictive scores assigned to voters—responsiveness, behavior, and support—and the variety of data that analysts pull together to model and then target supporters and potential voters.

In the following excerpt, the authors explain how predictive scores are applied to maximize the value and efficiency of phone bank fundraising calls:

Campaigns use predictive scores to increase the efficiency of efforts to communicate with citizens. For example, professional fundraising phone banks typically charge $4 per completed call (often defined as reaching someone and getting through the entire script), regardless of how much is donated in the end. Suppose a campaign does not use predictive scores and finds that upon completion of the call 60 percent give nothing, 20 percent give $10, 10 percent give $20, and 10 percent give $60. This works out to an average of $10 per completed call. Now assume the campaign sampled a diverse pool of citizens for a wave of initial calls. It can then look through the voter database that includes all citizens it solicited for donations and all the donations it actually generated, along with other variables in the database such as past donation behavior, past volunteer activity, candidate support score, predicted household wealth, and Census-based neighborhood characteristics (Tam Cho and Gimpel 2007). It can then develop a fundraising behavior score that predicts the expected return for a call to a particular citizen. These scores are probabilistic, and of course it would be impossible to only call citizens who would donate $60, but large gains can quickly be realized. For instance, if a fundraising score eliminated half of the calls to citizens who would donate nothing, the resulting distribution would be 30 percent donate $0, 35 percent donate $10, 17.5 percent donate $20, and 17.5 percent donate $60. The expected revenue from each call would increase from $10 to $17.50. Fundraising scores that increase the proportion of big donor prospects relative to small donor prospects would further improve on these efficiency gains.
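The arithmetic in the excerpt is just an expected-value calculation over the donation distribution. A minimal sketch, using the exact figures Nickerson and Rogers give:

```python
# Expected revenue per completed fundraising call, reproducing the
# arithmetic from Nickerson and Rogers' example.

def expected_revenue(distribution):
    """distribution: list of (probability, donation amount) pairs."""
    return sum(p * amount for p, amount in distribution)

# Without predictive scores: 60% give $0, 20% give $10,
# 10% give $20, and 10% give $60.
before = [(0.60, 0), (0.20, 10), (0.10, 20), (0.10, 60)]

# With a fundraising score that screens out half of the $0 calls,
# yielding the paper's resulting distribution.
after = [(0.30, 0), (0.35, 10), (0.175, 20), (0.175, 60)]

print(expected_revenue(before))  # about $10 per call
print(expected_revenue(after))   # about $17.50 per call
```

Against a flat $4 cost per completed call, that shift in the distribution nearly doubles the net return of every hour the phone bank runs.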

If you've ever wanted to know more about how campaigns use data analysis tools and techniques, this paper is a great primer.

Thursday, April 23, 2015

Quorum: Is software the new Congressional intern?

Last month, a number of news outlets wrote about a startup called Quorum. Winner of the 2014 Harvard Innovation Challenge's McKinley Family Grant for Innovation and Entrepreneurial Leadership in Social Enterprise, Quorum has amazing potential to create new ways for legislators to easily use data to understand their constituencies and track legislation—literally data for policymaking. Quorum even pulls data from the American Community Survey, which James Treat of the Census Bureau wrote about for this blog a few years back.

TechCrunch touts Quorum as a replacement for the hordes of summer Hill interns, while the Washington Post likens it to Moneyball for K Street.

Danny Crichton at TechCrunch writes:

The challenges are numerous in this space. "Figuring out who you should talk to is a really tough process," Jonathan Marks, one co-founder of Quorum, explained. "This is a problem that a lot of our clients have, [since] there are tens of thousands of relationships in DC." The challenge is magnified since those relationships change so often.

Another challenge is simply following legislation. Marks gave the example of a non-profit firm that wanted to develop a scorecard with grades for each congressman on several key votes (a common strategy these days in Washington advocacy). One firm had "three people spending 1.5 weeks to tabulate all the data." An opposition research firm went through "6,000 votes on abortion" to tabulate every single congressman's legislative history. This was all done manually (i.e. with an army of interns).

But Quorum is not the first product of its kind. Bloomberg and CQ have long dominated with products targeted at this audience. But this is becoming a competitive space for entrepreneurs. Catherine Ho at the Washington Post explains:

Since 2010, at least four companies, ranging from start-ups to billion-dollar public corporations, have introduced new ways to sell data-based political and competitive intelligence that offers insight into the policymaking process.


Other companies are emerging in the space with some success. For others, it’s too soon to tell.

Popvox, founded in 2010, is an online platform that collects correspondence between constituents and their representatives on bills, organizes the data by state, and packages the information in charts and maps so lawmakers can easily spot where voters stand on a proposed bill. An early win was when nearly 12,000 people nationwide used the platform to oppose a proposal to allow robo-calls to cellphones — the bill was withdrawn by its sponsors.

Popvox does not disclose its revenue, but co-founder Marci Harris said the platform has more than 400,000 users across every congressional district and has delivered more than 4 million constituent positions to Congress.

FiscalNote, which uses data-mining software and artificial intelligence to predict the outcome of legislation and regulations, has pulled in $19.4 million in capital since its 2013 start from big-name investors including Dallas Mavericks owner Mark Cuban, Yahoo co-founder Jerry Yang and the Winklevoss twins. The company says it achieves 94 percent accuracy. And Ipsos, the publicly traded market research and polling company, is amping up efforts to sell polling data to lobby firms.

For an academic's take on the trend toward data in politics and campaigning, UNC assistant professor Daniel Kreiss published a great piece for the Stanford Law Review in 2012 titled "Yes We Can (Profile You)," which lays out the ways in which political campaigns employ sophisticated data analysis techniques to measure and target voters.

Friday, April 17, 2015

What exactly is Section 215?

On June 1, Section 215 of the USA PATRIOT Act is set to expire. This is a critical moment in an effort to reform and modernize government surveillance frameworks in the United States. But it's difficult to explain how we got here and why this is important in a few sentences.

The Electronic Frontier Foundation has put together a great background video on Section 215 that explains what it is, what it does, and what's at stake. And they include some data as well.

Watch the video below and learn more about EFF's efforts here.

Wednesday, April 15, 2015

Disability Confident: How can we measure if government policy is working?

Andy White is Employment and Working Age Manager in the Evidence & Service Impact Section at RNIB.

Current UK government policy to improve employment opportunities for disabled people is based on the government’s Disability Confident campaign. Charities such as RNIB are keeping a close watch on this by measuring its impact on the employment rates of disabled people.

Blind and partially sighted people are significantly less likely to be in paid employment than the general population or other disabled people. For every three registered blind and partially sighted people of working age, only one is in paid employment. Worse, blind and partially sighted people are nearly five times more likely than the general population to have had no paid work for five years.

Measuring the employment rates of people registered as blind (serious sight impaired) or partially sighted (sight impaired) gives us the clearest indication of the employment status of people living with sight loss. But even among those not registered, the Labour Force Survey indicates that just over 44% of people who are described as "long term disabled with a seeing difficulty" are employed, compared with 74% of the general population.

One way to increase the numbers of blind and partially sighted people in employment is to focus on increasing the supply of blind and partially sighted people to the labour market by building their attributes and capabilities, and increasing the demand for meaningful work by supporting creative employment opportunities.

Another approach is to support people with sight loss to keep working—27% of non-working registered blind and partially sighted people said that the main reason for leaving their last job was the onset of sight loss or deterioration of their sight. However, 30% who were not working but who had worked in the past said that they maybe or definitely could have continued in their job given the right support.

We can address this by providing blind and partially sighted people with appropriate vocational rehabilitation support, and helping employers understand the business case for job retention. This is a challenge, given that the majority of employers have a negative attitude toward employing a blind or partially sighted person.

Blind and partially sighted people looking for work need specialist support on their journey towards employment. In addition to barriers common to anyone out of work for a long period, blind and partially sighted jobseekers have specific needs related to their sight loss.

Research indicates that those furthest from the labour market require a more resource-intensive model of support than those who are actively seeking work. Many blind and partially sighted jobseekers fall into this category.

The increased pressure on out-of-work blind and partially sighted people to join employment programmes means greater engagement in welfare to work programmes, and an increasing responsibility for the welfare to work industry to meet the specific needs of blind and partially sighted jobseekers.

Government policies such as the Disability Confident campaign will only be effective when there is a sea change in the proportion of blind and partially sighted people of working age achieving greater independence through paid employment.

Research about the employment status of blind and partially sighted people can be found on the Knowledge Hub section of RNIB's website. We also publish a series of evidence-based reviews, including one for people of working age, upon which this blog is based.

Friday, April 10, 2015

Data shows what millions knew: the Internet was really slow!

Meredith Whittaker is Open Source Research Lead at Google.

For much of 2013 and 2014, accessing major content and services was nearly impossible for millions of US Internet users. That sounds like a big deal, right? It is. But it's also hard to document. Users complained, and the press reported disputes between Netflix and Comcast, but the scope and extent of the problem weren't understood until late 2014.

This is thanks in large part to M-Lab, a broad collaboration of academic and industry researchers committed to openly and empirically measuring global Internet performance. Using a massive archive of open data, M-Lab researchers uncovered interconnection problems between Internet service providers (ISPs) that resulted in nationwide performance slowdowns. Their published report, ISP Interconnection and its Impact on Consumer Internet Performance, lays out the data.

To back up a moment—interconnection sounds complicated. It's not. Interconnection is the means by which different networks connect to each other. This connection allows you to access online content and services hosted anywhere, not just content and services hosted by a single access provider (think AOL in the 1990s vs. today's Internet). By definition, the Inter-net wouldn't exist without interconnection.

Interconnection points are the places where Internet traffic crosses from one network to another. Uncongested interconnection points are critical to a healthy, open Internet. Put another way, it doesn't matter how wide the road is on either side—if the bridge is too narrow, traffic will be slow.

M-Lab data and research exposed just such slowdowns. Let’s take a look…

The chart above shows download throughput data, collected by M-Lab in NYC between Feb 2013 and Sept 2014. It reflects traffic between customers of Time Warner Cable, Verizon, and Comcast—major ISPs—and an M-Lab server hosted on Cogent's network. Cogent is a major transit ISP, and much content and many services are hosted on Cogent's network and on similar transit networks. Traffic between people and the content they want to access has to move through an interconnection point between their ISP (TWC, Comcast, and Verizon, in this case) and Cogent. What we see here, then, is severe degradation of download throughput between these ISPs and Cogent that lasted for about a year. During this time, customers of these three ISPs attempting to access anything hosted on Cogent in NYC were subjected to severely slowed Internet performance.

But maybe things are just slow, no?

Here you see download throughput in NYC during the same time period, for the same three ISPs (plus Cablevision). The difference: here they are accessing an M-Lab server hosted on Internap's network (another transit ISP). In this case, in the same region, for the same general population of users, during the same time, download throughput was stable. Content and services accessed on Internap's network performed just fine.

Couldn't this just be Cogent's problem? Another good question…

Here we return to Cogent. This graph spans the same time period, in NYC, looking again at download throughput across a Cogent interconnection point. The difference? We’re looking at traffic to customers of the ISP Cablevision.

Comparing these three graphs, we see M-Lab data exposing problems that aren't specific to any one ISP, but instead stem from the relationship between pairs of ISPs—in this example, Cogent when paired with Time Warner, Comcast, or Verizon. This relationship manifests, technically, as interconnection.

These graphs focus on NYC, but M-Lab saw similar patterns across the US as researchers examined performance trends across pairs of ISPs nationwide—e.g., whenever Comcast interconnected with Cogent. The research shows that the scope and scale of interconnection-related performance issues were nationwide and continued for over a year. It also shows that these issues were not strictly technical in nature. In many cases, the same patterns of performance degradation existed across the US wherever a given pair of ISPs interconnected. This rules out a regional technical problem and instead points to business disputes as the cause of congestion.
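The core analytical move here is grouping throughput measurements by the pair of networks involved, so that degradation tied to a particular interconnection stands out from problems in either network alone. A minimal sketch of that grouping, with invented illustrative numbers (the real M-Lab dataset has far more fields and volume):

```python
from collections import defaultdict
from statistics import median

# Hypothetical M-Lab-style records for illustration:
# (access ISP, transit network hosting the test server, month, Mbps).
measurements = [
    ("Comcast",     "Cogent",   "2014-05",  1.2),
    ("Comcast",     "Cogent",   "2014-06",  0.9),
    ("Comcast",     "Internap", "2014-05", 18.4),
    ("Comcast",     "Internap", "2014-06", 17.9),
    ("Cablevision", "Cogent",   "2014-05", 19.1),
    ("Cablevision", "Cogent",   "2014-06", 18.7),
]

# Group throughput by (access ISP, transit ISP) pair. Low medians that
# appear only for a particular pair point at the interconnection between
# them, not at either network on its own.
by_pair = defaultdict(list)
for access_isp, transit, month, mbps in measurements:
    by_pair[(access_isp, transit)].append(mbps)

for pair, values in sorted(by_pair.items()):
    print(pair, round(median(values), 1))
```

In this toy data, only the Comcast–Cogent pair shows depressed throughput, mirroring the report's logic: the same access ISP performs well via a different transit network, and the same transit network performs well for a different access ISP.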

M-Lab research shows that when interconnection goes bad, it’s not theoretical: it interferes with real people trying to do critical things. Good data and careful research helped to quantify the real, human impact of what had been relegated to technical discussion lists and sidebars in long policy documents. More focus on open data projects like M-Lab could help quantify the human impact across myriad issues, moving us from a hypothetical to a real and actionable understanding of how to draft better policies.

Monday, April 6, 2015

The Rural Broadband Digital Divide

Michael Curri is president and founder of Strategic Networks Group.

There is a high degree of awareness of how differences in Internet connectivity contribute to the "digital divide" experienced by many, if not most, rural areas. Less well understood is a very real divide that stems from (a lack of) utilization. That's right, just as important as "speed" is how much businesses and non-commercial organizations utilize the Internet.

Using the data SNG has collected in numerous states between 2012 and February 2015, we can actually quantify this digital divide. Just as significantly, we can identify the types of organizations (industry, size, rural/urban, etc.) that are experiencing the greatest gap in utilization. To quantify utilization, SNG has developed the Digital Economy index (DEi), which reflects how many Internet processes or applications an organization uses. We measure use of 17 applications on a ten-point scale (ten being best) to develop the DEi (e.g. an organization using 8 of 17 applications would have a DEi score of 4.7).
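As described, the DEi is a simple proportion scaled to ten points. A minimal sketch of the calculation (the rounding to one decimal place is an assumption based on the 4.7 example; SNG's actual methodology may weight applications differently):

```python
def dei_score(apps_used, total_apps=17):
    """Digital Economy index: share of the tracked Internet
    applications an organization uses, scaled to a ten-point
    scale (ten being best)."""
    return round(10 * apps_used / total_apps, 1)

print(dei_score(8))   # the 8-of-17 example from the text
print(dei_score(17))  # an organization using every tracked application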

Collecting data in numerous states, each with rural and urban components, SNG has uncovered the digital divide that exists based largely on the size of the community in which businesses are located. The table on the right shows that the more urban a community, the higher the DEi score. Regardless of speed available, rural communities are utilizing the Internet and its applications at a lower rate, largely because in rural areas there is less knowledge transfer amongst peers and less of a market for specialized technical services.

Beyond the notable gap in Internet utilization between rural and urban areas, SNG’s research also reveals sectors and types of organizations that suffer most from this digital divide. This is consistent with our findings that rural communities have far less local resources to support businesses looking to better utilize broadband applications.

For small towns & isolated small towns (in essence, the census terms for "rural"), local governments have the largest utilization gap compared to their metropolitan peers, with a DEi of 5.24 compared to 7.17. Libraries also show a notable utilization gap (metro = 7.23; rural = 6.12). In contrast, K-12 schools of comparable size have very similar DEi scores regardless of how urban or rural they are.

When examining industry type, it is illuminating to see just how much variance there can be depending on industry. Ironically, one of the biggest utilization gaps is in what might be considered the most advanced sector (Professional and Technical Services) which is large, growing, and well paying but slow to adopt key Internet applications.

Larger businesses in rural areas (100 or more employees) still experience a utilization gap relative to their urban counterparts. Rural businesses with fewer than 100 employees experience a much larger utilization gap.

So while fiber, net neutrality, and FCC decisions dominate the news, the success of broadband in driving impacts is dependent on utilization.

This means that providing our rural businesses with the knowledge and support to leverage the Internet is key to maintaining competitiveness. Furthermore, in today’s landscape it is easier to live rural and work globally, as long as rural businesses have access to networks and support systems that help them thrive in the digital economy. Developing local networks and supports is a direct and significant opportunity (as well as challenge) for local business retention and growth. There are ways to achieve this, including SNG’s Small Business Growth Program. We’d love to share with you how this program can drive economic growth in your region.

See more here.

Friday, April 3, 2015

Mapping the sneakernet

In March, internet researcher and designer An Xiao Mina published a fascinating piece on The New Inquiry about "the sneakernet," a concept that addresses the nuances of connectivity and the myriad social methods through which people exchange culture, access, and information. In the article, she shares an anecdote from a research trip to Northern Uganda, a region where residents had no access to the electric grid or running water and access to 3G internet was limited by both availability and affordability. She writes:

At night, residents turn on their radios, and those who can afford Chinese feature phones play mp3s. One day, I heard familiar lyrics:

Hey, I just met you
And this is crazy
But here’s my number
So call me maybe

I turned my head. A number of young people gathered around a woman rocking out to Carly Rae Jepsen’s "Call Me Maybe," a song that owes so much of its success to the viral power of YouTube and Justin Bieber. The phone's owner wasn't accessing it via the Internet. Rather, she had an mp3 acquired through a Bluetooth transfer with a friend.

Indeed, the song was just one of many media files I saw on people's phones: There were Chinese kung fu movies, Nigerian comedies, and Ugandan pop music. They were physically transferred, phone to phone, Bluetooth to Bluetooth, USB stick to USB stick, over hundreds of miles by an informal sneakernet of entertainment media downloaded from the Internet or burned from DVDs, bringing media that's popular in video halls—basically, small theaters for watching DVDs—to their own villages and huts.

In geographic distribution charts of Carly Rae Jepsen's virality, you'd be hard pressed to find impressions from this part of the world. Nor is this sneakernet practice unique to the region. On the other end of the continent, in Mali, music researcher Christopher Kirkley has documented a music trade using Bluetooth transfers that is similar to what I saw in northern Uganda. These forms of data transfer and access, though quite common, are invisible to traditional measures of connectivity and Big Data research methods. Like millions around the world with direct internet connections, young people in "unconnected" regions are participating in the great viral products of the Internet, consuming mass media files and generating and transferring their own media.

What does this have to do with public policy? At the end of the piece, An explains how understanding connectivity as a spectrum, rather than a binary, can inform policies and strategies for outreach and access. To illustrate this, she uses a vivid water analogy:

Like water, the Internet is vast, familiar and seemingly ubiquitous but with extremes of unequal access. Some people have clean, unfettered and flowing data from invisible but reliable sources. Many more experience polluted and flaky sources, and they have to combine patience and filters to get the right set of data they need. Others must hike dozens of miles of paved and dirt roads to access the Internet like water from a well, ferrying it back in fits and spurts when the opportunity arises. And yet more get trickles of data here and there from friends and family, in the form of printouts, a song played on a phone’s speaker, an interesting status update from Facebook relayed orally, a radio station that features stories from the Internet.

Like water from a river, data from the Internet can be scooped up and irrigated and splashed around in novel ways. Whether it’s north of the Nile in Uganda or south of Market St. in the Bay Area, policies and strategies for connecting the "unconnected" should take into account the vast spectrum of ways that people find and access data. Packets of information can be distributed via SMS and mobile 3G but also pieces of paper, USB sticks and Bluetooth. Solar-powered computer kiosks in rural areas can have simple capabilities for connecting to mobile phones’ SD cards for upload and download. Technology training courses can start with a more nuanced base level of understanding, rather than assuming zero knowledge of the basics of computing and network transfer. These are broad strokes, of course; the specifics of motivation and methods are complex and need to be studied carefully in any given instance. But the very channels that ferry entertainment media can also ferry health care information, educational material and anything else in compact enough form.

An Xiao Mina is a product owner at Meedan and an internet researcher with The Civic Beat.