Where Information Lives

EMC Journal


EMC Journal Authors: William Schmarzo, Jason Bloomberg, Jordan Knight, Mat Mathews, Bruce Popky

Related Topics: EMC Journal, Data Mining, Big Data on Ulitzer


Thinking Like a Data Scientist

Identify, brainstorm and/or uncover new variables that are better predictors of business performance

Thinking Like a Data Scientist: Part I

One question I frequently get is: "How do I become a data scientist?"  Wow, tough question.  There are several new books that outline the different skills, capabilities and technologies that a data scientist is going to need to learn and eventually master.  I've read several of these books and am impressed with the depth of the content.

Unfortunately, these books spend the vast majority of their time reviewing or teaching data science processes (such as CRISP-DM, the Cross-Industry Standard Process for Data Mining), basic and advanced statistics, and data mining and data visualization techniques and tools.

Yes, these are very important data science skills, but they are not nearly sufficient to make our data science teams effective.  The data science teams still need help from the business users - or subject matter experts (SMEs) - to understand the decisions the business is trying to make, the hypotheses that it wants to test and the predictions that it needs to produce in support of those decisions and hypotheses.  In essence, to improve the overall effectiveness of our data science teams, we need to teach the business users to think like a data scientist.

So the objective of this blog (which, if successful, will make its way into my Big Data MBA curriculum for the University of San Francisco School of Management fall semester) is to define a process that helps business users to "think like a data scientist."

I am also going to test this concept and methodology at my session at EMC World, where I am presenting "Expert Guidance To Achieve Big Data Maturity" on Monday, May 4th at 4:30.  So sharpen your pencils and let's begin the exercise!

Thinking Like a Data Scientist Process
The goal of the "thinking like a data scientist" process is to identify, brainstorm and/or uncover new variables that are better predictors of business performance.  But "business performance" of what?  Our key business initiative, of course.

Step 1:  Identify Key Business Initiative.  Would you expect anything different from me than starting with what's important to the business?  So, how can you spot a key business initiative?

A key business initiative is characterized as:

  • Critical to the immediate-term performance of the organization
  • Documented (communicated either internally or publicly)
  • Cross-functional (involves more than one business function)
  • Owned/championed by a senior business executive
  • Has a measurable financial goal
  • Has a well-defined delivery timeframe (9 to 12 months)
  • Undertaken to deliver significant, compelling and/or distinguishable financial or competitive advantage

I am a big stickler about targeting business initiatives that are focused on the next 9 to 12 months.  Anything longer than 12 months can quickly devolve into a "Battlestar Galactica" or "cure world hunger" project that may have incredible business value, but little chance of success.
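As a rough illustration, the checklist above could be captured in a few lines of Python so that a workshop team can see which criteria a candidate initiative still misses. This is only a sketch: the `Initiative` fields and the `unmet_criteria` helper are hypothetical names invented for this example, the sample values are placeholders rather than facts about any company, and the 12-month cutoff simply encodes the upper end of the 9-to-12-month window.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Initiative:
    """Hypothetical record describing a candidate business initiative."""
    name: str
    critical_near_term: bool = False
    documented: bool = False
    business_functions: List[str] = field(default_factory=list)
    executive_owner: Optional[str] = None
    financial_goal: Optional[str] = None
    timeframe_months: Optional[int] = None
    delivers_advantage: bool = False


def unmet_criteria(init: Initiative) -> List[str]:
    """Return the checklist items the initiative does not yet satisfy."""
    checks = [
        ("critical to immediate-term performance", init.critical_near_term),
        ("documented", init.documented),
        # Cross-functional means more than one distinct business function.
        ("cross-functional", len(set(init.business_functions)) > 1),
        ("owned by a senior business executive", bool(init.executive_owner)),
        ("measurable financial goal", bool(init.financial_goal)),
        # Assumption: anything beyond 12 months fails the timeframe test.
        ("delivery timeframe of 12 months or less",
         init.timeframe_months is not None and init.timeframe_months <= 12),
        ("significant financial or competitive advantage",
         init.delivers_advantage),
    ]
    return [label for label, ok in checks if not ok]


# Placeholder values for illustration only.
candidate = Initiative(
    name="Example initiative",
    critical_near_term=True,
    documented=True,
    business_functions=["Merchandising", "Marketing"],
    executive_owner="(senior executive)",
    financial_goal="(measurable target)",
    timeframe_months=24,  # deliberately too long
    delivers_advantage=True,
)
print(unmet_criteria(candidate))
# ['delivery timeframe of 12 months or less']
```

Returning the list of unmet items, rather than a single yes/no, keeps the conversation with the business users focused on what would have to change for the initiative to qualify.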

For a refresher on how to identify an organization's key business initiatives, read my blog "Big Data MBA: Reading the Annual Report for Big Data Opportunities."  That blog outlines how to leverage publicly available information (e.g., annual reports, analyst calls, executive speeches, company blogs, SeekingAlpha.com) to uncover an organization's key business initiatives.

For purposes of this exercise, I'm going to pretend that our client is Foot Locker, and that our target business initiative is "Improve Merchandising Effectiveness" as highlighted in their annual report (see Figure 1).


Figure 1: Identifying and Understanding Organization's Key Business Initiatives

Step 2:  Identify Strategic Nouns. Strategic nouns are the key business entities that either impact or are impacted by the organization's key business initiative.  These strategic nouns are critical to our data scientist thinking process because these are the entities for which we want to uncover or gain new, actionable insights, and around which we will ultimately build our analytic profiles.  Examples of strategic nouns include customers, patients, students, employees, stores, products, medication, trucks, wind turbines, etc.

For the Foot Locker "Improve Merchandising Effectiveness" business initiative, the strategic nouns upon which we will focus are:

  • Customers
  • Products
  • Campaigns
  • Stores

Step 3:  Brainstorm Strategic Noun Questions. Probably the hardest part of this step - and maybe the hardest part of the entire "thinking like a data scientist" process - is to brainstorm the different questions that you want to ask in support of the targeted business initiative.  For this part of the exercise, we want the business users to brainstorm the business questions for each of the strategic nouns from the perspectives of:

  • Descriptive Analytics:  Understanding what happened
  • Predictive Analytics:  Predicting what is likely to happen
  • Prescriptive Analytics:  Recommending what to do next

See Figure 2 for an example of the evolution from Descriptive to Predictive to Prescriptive.

Figure 2: Evolution of The Analytic Questions

In our Foot Locker "Improve Merchandising Effectiveness" example, we want to brainstorm the "Customer" strategic noun questions as such:

Descriptive Analytics (Understanding what happened)

  • Which customers are most receptive to which types of merchandising campaigns?
  • What are the characteristics of customers (e.g., age, gender, customer tenure, life stage, favorite sports) who are most responsive to merchandising offers?
  • Are there certain times of year when certain customers are more responsive?

Predictive Analytics (Predicting what is likely to happen)

  • Which customers are most likely to respond to a Back to School event?
  • Which customers are most likely to respond to a buy-one-get-one-free (BOGOF) offer?
  • Which customers are most likely to respond to a 50% off in-store markdown?

Prescriptive Analytics (Recommending what to do next)

  • What personalized offers (recommendations) should I deliver to Anne Smith to get her to come into the store?
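One way to keep the brainstorm organized is to treat it as a grid of strategic nouns by analytic perspectives and track which cells still have no questions. The sketch below is only an illustration under that assumption: the `NOUNS`, `PERSPECTIVES` and `open_cells` names are hypothetical, and the grid holds just one sample "Customers" question per perspective from the lists above.

```python
from typing import Dict, List, Tuple

# Strategic nouns and analytic perspectives taken from the steps above.
NOUNS = ["Customers", "Products", "Campaigns", "Stores"]
PERSPECTIVES = ["Descriptive", "Predictive", "Prescriptive"]

# Hypothetical worksheet: (noun, perspective) -> brainstormed questions.
questions: Dict[Tuple[str, str], List[str]] = {
    ("Customers", "Descriptive"): [
        "Are there certain times of year when certain customers "
        "are more responsive?",
    ],
    ("Customers", "Predictive"): [
        "Which customers are most likely to respond to a "
        "Back to School event?",
    ],
    ("Customers", "Prescriptive"): [
        "What personalized offers should I deliver to Anne Smith "
        "to get her to come into the store?",
    ],
}


def open_cells(grid: Dict[Tuple[str, str], List[str]]) -> List[Tuple[str, str]]:
    """List the (noun, perspective) pairs that have no questions yet."""
    return [
        (noun, perspective)
        for noun in NOUNS
        for perspective in PERSPECTIVES
        if not grid.get((noun, perspective))
    ]


gaps = open_cells(questions)
print(len(gaps))  # 9
for noun, perspective in gaps:
    print(f"Still to brainstorm: {perspective} questions for {noun}")
```

With only the "Customers" row filled in, the remaining nine cells (Products, Campaigns and Stores across all three perspectives) show how much of the brainstorm is still ahead.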

Part II of the "Thinking Like a Data Scientist" blog series will conclude this "thinking like a data scientist" process and hopefully help us uncover new data sources and metrics that may be better predictors of business performance.

Bill Schmarzo

More Stories By William Schmarzo

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business” and “Big Data MBA: Driving Business Strategies with Data Science”, is responsible for setting strategy and defining the Big Data service offerings for Hitachi Vantara as CTO, IoT and Analytics.

Previously, as a CTO within Dell EMC’s 2,000+ person consulting organization, he worked with organizations to identify where and how to start their big data journeys. He has written white papers, is an avid blogger and is a frequent speaker on the use of Big Data and data science to power an organization’s key business initiatives. He is a University of San Francisco School of Management (SOM) Executive Fellow where he teaches the “Big Data MBA” course. Bill also just completed a research paper on “Determining The Economic Value of Data”. Onalytica recently ranked Bill as #4 Big Data Influencer worldwide.

Bill has over three decades of experience in data warehousing, BI and analytics. Bill authored the Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements. Bill serves on the City of San Jose’s Technology Innovation Board, and on the faculties of The Data Warehouse Institute and Strata.

Previously, Bill was vice president of Analytics at Yahoo where he was responsible for the development of Yahoo’s Advertiser and Website analytics products, including the delivery of “actionable insights” through a holistic user experience. Before that, Bill oversaw the Analytic Applications business unit at Business Objects, including the development, marketing and sales of their industry-defining analytic applications.

Bill holds a Master of Business Administration from the University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science and Business Administration from Coe College.