Where Information Lives

EMC Journal



EMC Journal Authors: William Schmarzo, Greg Schulz, Mat Mathews, Jeffrey Abbott, Mehdi Daoudi


Election Data Science and the Death of Truth

Many candidates treated opinions as ‘truth’ and a large portion of the American public grabbed ahold of these ‘truths’ as gospel

The U.S. presidential election is finally over. The protests are winding down, they’ve stopped burning cars in Oakland (for now), and the talk of California secession is waning. But I am struggling to return to “normal” because in this election, truth got hammered.

Many candidates treated opinions as “truth,” and a large portion of the American public grabbed hold of these “truths” as gospel. It may have been a good time to be in the “fact checking” business, but I’m not sure how effective even the fact checkers could be, given the spontaneous way “opinions as facts” were being thrown around, not to mention the people who create fake news intentionally.

So let’s play a game! Let’s call this game “Separate the Truth from the Myths.” Let’s see how you do.

  1. Bat Boy Sighted in NYC Subway (probably too expensive to get a condo in Manhattan)
  2. Obama Appoints Martian Ambassador (but the Senate will request Matt Damon since he’s already lived and farmed on Mars)
  3. Skynet is a Reality (Hey, even Iron Man showed up at the Senate to tell them so!)
  4. Ted Cruz Shot JFK (okay, so it actually was his dad, but accusing Ted Cruz is funnier)

All but one of these stories appeared in the highly credible “National Enquirer” or “Weekly World News.” That’s like buying a copy of “Mad Magazine” (for you old timers) or reading “The Onion” (for you young whippersnappers) and expecting the “truth” from these satirical publications (see Figure 1).

Figure 1: Real Headlines from “Less Than Credible” Sources

However, the stories in Figure 2 were plastered across social media sites as if they were the truth, and as you can see from the engagement numbers, lots of people took the time to read these “truths.”

Figure 2: Social Media Fake News and Number of Views

Data Science And Common Sense
As data scientists, we need to know not to accept the “truth” without applying some common sense. For all the fancy training in neural networks, artificial intelligence and machine learning, it’s hard to replace “common sense” as a necessary data scientist characteristic. Let’s walk through an example of how a data scientist might approach one of the sensational stories that recently popped up on social media (see Figure 3).

Figure 3: The Guardian, September 26, 2016

OMG, murders are up 10.8% in the biggest percentage increase since 1971, according to a highly credible source like the FBI. It’s become the “Walking Dead” out there!

Sensational headlines grab attention and incite fear and dread. “Dirty Laundry” sells. But the problem with data at the aggregate level is that it:

  • Distorts the real truth (or root cause) of the problem, and
  • Is not actionable

The above headline could lead to the conclusion that the current criminal and rehabilitation policies have failed and everything should be thrown out. But there are no details as to what aspects of these programs are broken and no triage of the root causes in order to explore what might be done to fix the problem. As data scientists, we must demand the granular details so that we can turn the data into insights in order to make the information actionable, such as:

This is a good starting point. If we want to address the increase in murders, we need to drill into each individual murder (and attempted murder) in those 10 cities. We need to keep drilling into the granular details in order to identify those variables and metrics that might be predictors of murders and attempted murders.

For example, we could identify the specific blocks of these cities where the murders are occurring, or the time of day and day of week, or the time of the year, or any special events that occurred right before the murders, etc. We could explore other variables that might be indicative of an increase in murder (e.g., % of broken homes, % of children born out of wedlock, % of high school dropouts, % of drug addicts, unemployment rate among male adults, increase in graffiti).
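The drill-down described above can be sketched in a few lines of Python. This is only an illustration: the incident records, city and block names, and the `drill_down` and `time_band` helpers are all hypothetical, and every value is invented rather than taken from the FBI data.

```python
from collections import Counter

# Hypothetical incident records; every value here is invented for illustration.
incidents = [
    {"city": "City A", "block": "A-12", "weekday": "Sat", "hour": 23},
    {"city": "City A", "block": "A-12", "weekday": "Sat", "hour": 1},
    {"city": "City A", "block": "A-07", "weekday": "Fri", "hour": 22},
    {"city": "City B", "block": "B-03", "weekday": "Sat", "hour": 2},
    {"city": "City B", "block": "B-03", "weekday": "Tue", "hour": 14},
    {"city": "City A", "block": "A-12", "weekday": "Fri", "hour": 23},
]


def drill_down(records, *keys):
    """Count records for each combination of the given keys, most frequent first."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return counts.most_common()


def time_band(hour):
    """Bucket an hour (0-23) into a coarse time-of-day band."""
    return "night" if hour >= 20 or hour < 6 else "day"


# The aggregate view is a single number; the drill-downs show where and when.
total = len(incidents)
by_block = drill_down(incidents, "city", "block")
by_weekday = drill_down(incidents, "weekday")
by_band = Counter(time_band(r["hour"]) for r in incidents)

print("total incidents:", total)
print("by city and block:", by_block)
print("by weekday:", by_weekday)
print("by time of day:", dict(by_band))
```

The aggregate count says nothing about where to act, while the per-block and time-of-day counts point at specific places and hours worth investigating. The same pattern extends to any other dimension in the records.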

Once we know those variables that are predictive of murders, then we have a focus as to where we can start fixing the problem, taking corrective actions such as adding more police or community outreach, reducing high school dropouts, increasing drug arrests, testing different programs and approaches, measuring program effectiveness, learning and improving. Now that’s thinking like a data scientist.

Data Scientist Lessons Learned
What are the lessons that we can take away from this “opinions as facts” syndrome?

  • Common sense is critical. Don’t accept “truths” at face value. Demand more details in order to identify and quantify those variables and metrics that might be predictive or indicative of the researched problem.
  • You can’t fix the business – or the country – without drilling into the details and the potential causal factors. We need insights that are drawn from facts that are supported by granular data so that we know what actions to take. With these detailed insights in hand, we now know where to invest our scarce financial and human resources.
  • Details matter. At the aggregate level, the headlines may be sensational, but they are not insightful or actionable until you get into the details. Remember Simpson’s Paradox.
  • Data quality, accuracy and reasonableness are important, especially if you are trying to make business-impactful decisions based upon that data. Business users, if they are expected to use the data to support decisions, must have confidence in the data. “Facts as Facts” are critical if we want to overcome decisions being made on a traditional basis such as gut, hearsay and history.
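For readers who have not met Simpson’s Paradox, here is a minimal sketch of it in Python. The two programs, the area types, and all of the counts are hypothetical and invented purely to show the effect: one program does better inside every subgroup yet looks worse once the subgroups are lumped together.

```python
# Hypothetical outcome counts, invented for illustration:
# (successes, participants) per program and area type.
outcomes = {
    "Program X": {"low-risk": (90, 100), "high-risk": (300, 500)},
    "Program Y": {"low-risk": (425, 500), "high-risk": (50, 100)},
}


def aggregate_rate(strata):
    """Success rate after pooling all strata together."""
    successes = sum(s for s, _ in strata.values())
    participants = sum(n for _, n in strata.values())
    return successes / participants


# Success rate within each area type, per program.
per_stratum = {
    program: {area: s / n for area, (s, n) in strata.items()}
    for program, strata in outcomes.items()
}
# Success rate at the aggregate level, per program.
aggregate = {program: aggregate_rate(strata) for program, strata in outcomes.items()}

x_wins_every_stratum = all(
    per_stratum["Program X"][area] > per_stratum["Program Y"][area]
    for area in per_stratum["Program X"]
)
y_wins_aggregate = aggregate["Program Y"] > aggregate["Program X"]

print("per-stratum rates:", per_stratum)
print("aggregate rates:", aggregate)
print("X better in every stratum:", x_wins_every_stratum)
print("Y better in aggregate:", y_wins_aggregate)
```

The reversal comes from the mix: in this made-up data Program X is applied mostly in high-risk areas, so its pooled rate is dragged down even though it outperforms Program Y in both kinds of area. A headline built on the pooled rate would pick the wrong program.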

A good data scientist learns not to trust anything at first blush. Opinions might suggest variables and metrics that could be better predictors of performance, but in the end the data scientist needs to validate each of them to quantify whether it really is a better predictor of performance.
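One simple way to start that validation is to measure how strongly each candidate variable moves with the outcome. The sketch below ranks two candidates by Pearson correlation. The neighborhood-level numbers, the variable names, and the choice of correlation as the yardstick are all assumptions made for illustration. Correlation alone does not establish cause, and a real study would also test the candidates on held-out data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-neighborhood measurements; all values are invented.
incident_rate = [2, 4, 5, 8, 10]
candidates = {
    "dropout_rate": [5, 10, 15, 20, 25],
    "graffiti_reports": [3, 1, 4, 1, 5],
}

# Score each candidate, then rank by strength of association (absolute value).
scores = {name: pearson(values, incident_rate) for name, values in candidates.items()}
ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

for name, r in ranked:
    print(f"{name}: r = {r:.2f}")
```

In this invented data the dropout rate tracks the incident rate closely while graffiti reports barely do, so the first would earn a closer look and the second would not, whatever opinion originally suggested.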

In the movie “Star Wars: A New Hope,” the weak-minded stormtroopers were easily dissuaded from pursuing the truth about the droids by Obi-Wan Kenobi’s use of the Jedi Mind Trick to plant the “truth” in their weak minds.

Don’t be weak-minded about seeking the truth. Use your common sense to challenge the “truth,” and get into the granular details so that you can identify and quantify those variables and metrics that are better predictors or indicators of the problems.

And beware the “These aren’t the Droids you’re looking for” syndrome. That’s for the weak-minded.

The post Election Data Science and the Death of Truth appeared first on InFocus Blog | Dell EMC Services.

More Stories By William Schmarzo

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business”, is responsible for setting the strategy and defining the Big Data service line offerings and capabilities for the EMC Global Services organization. As part of Bill’s CTO charter, he is responsible for working with organizations to help them identify where and how to start their big data journeys. He has written several white papers, is an avid blogger, and is a frequent speaker on the use of Big Data and advanced analytics to power organizations’ key business initiatives. He also teaches the “Big Data MBA” at the University of San Francisco School of Management.

Bill has nearly three decades of experience in data warehousing, BI and analytics. Bill authored EMC’s Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehousing Institute’s faculty as the head of the analytic applications curriculum.

Previously, Bill was the Vice President of Advertiser Analytics at Yahoo and the Vice President of Analytic Applications at Business Objects.