Where Information Lives

EMC Journal




Favorite 2015 @Schmarzo Big Data Blogs | @BigDataExpo #IoT #BigData


It’s that time of year when I review everything that I’ve written over the past year and share my favorite blogs. As many of you know, I travel frequently, and because I’ve already seen every airline movie, I have plenty of time to write. And according to the comments, every now and then I write a good one. So here are my Top 10 Blogs from 2015!

#10 – EMC World Day 1: Big Data Business Model Maturity Index and Analytic Profiles. I believe that this is the first blog where I introduced two very important data science concepts: Analytic Profiles and Scores. I talk a lot about the power of Scores in delivering metrics that are potentially better predictors of performance. Scores are important in supporting the decisions you are trying to make and the actions or outcomes you are trying to predict. And Analytic Profiles are how we bring together all the analytic insights to support the organization’s key business initiatives.

For example, let’s look at what Bill Schmarzo’s Analytic Profile might be from the perspective of Starbucks.

  • Demographic Information.  This is the basic information about me such as name, home address, work address, age, gender, marital status, income level, value of home, length of time in current home, education level, number of dependents, etc.
  • Behavioral Information. Now it gets interesting, as we want to uncover behavioral insights that are relevant for the business initiatives that Starbucks is trying to support. Depending upon the targeted business initiative (e.g., customer acquisition, customer retention, advocacy development, new product introductions), here is some behavioral information that Starbucks might want to capture about me: favorite drinks in rank order, favorite stores in rank order, most frequent time-of-day to visit a store, most frequent day-of-week to visit a store, recency of store visit, frequency of store visits in the past week / month / quarter, monetary value of store visits in the past week / month / quarter, how long I stay at which stores, etc.
  • Classifications. Now we want to try to create some “classifications” about Bill Schmarzo’s life that might have an impact on Starbucks’ key business initiatives, such as: Lifestage classifications, Lifestyle classifications, Product preference classifications, Store visit classifications, etc.
  • Rules.  We might also want to capture some rules or propensities about Bill’s usage patterns that we can use to support Starbucks’ key business initiatives, including: propensity to buy oatmeal when he buys coffee when traveling in the morning, propensity to buy a cookie/pastry when traveling in the afternoon, propensity to buy product in the channel, propensity to order online, etc.
  • Scores.  We also may want to create scores to support decision-making and process optimization.  Scores that we might want to create could include Customer Lifetime Value Score, Advocacy Score, Loyalty Score, Product Usage Score, Store Visitation Score, etc.
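To make the five parts of the profile concrete, here is a minimal sketch in Python of how such a profile might be held and partly populated. Everything in it is an assumption for illustration: the `Visit` record layout, the `AnalyticProfile` field names, the 30-day window and the sample visits are hypothetical, and the propensity is a simple share of visits rather than a fitted model. The scores are left empty because they would come from models that this sketch does not attempt.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Visit:
    """One store visit (hypothetical record layout)."""
    day: date
    store: str
    drink: str
    amount: float
    food: Optional[str] = None  # e.g. "oatmeal", or None if no food was bought


@dataclass
class AnalyticProfile:
    """The five parts of the profile described above, held as plain dicts."""
    name: str
    demographics: dict = field(default_factory=dict)
    behaviors: dict = field(default_factory=dict)
    classifications: dict = field(default_factory=dict)
    rules: dict = field(default_factory=dict)   # propensities between 0.0 and 1.0
    scores: dict = field(default_factory=dict)  # left empty in this sketch


def rank(values):
    """Return the distinct values ordered from most to least frequent."""
    return [value for value, _ in Counter(values).most_common()]


def summarize_behavior(visits, as_of, window_days=30):
    """Derive rank-ordered favorites plus recency, frequency and monetary value."""
    recent = [v for v in visits if 0 <= (as_of - v.day).days < window_days]
    return {
        "favorite_drinks": rank(v.drink for v in visits),
        "favorite_stores": rank(v.store for v in visits),
        "recency_days": min((as_of - v.day).days for v in visits) if visits else None,
        "frequency": len(recent),                              # visits inside the window
        "monetary": round(sum(v.amount for v in recent), 2),   # spend inside the window
    }


def food_propensity(visits):
    """Share of visits that included a food item (a crude propensity estimate)."""
    return sum(1 for v in visits if v.food) / len(visits) if visits else 0.0


# Made-up sample data, for illustration only.
visits = [
    Visit(date(2015, 12, 1), "Airport", "latte", 4.50, "oatmeal"),
    Visit(date(2015, 12, 8), "Downtown", "latte", 4.50),
    Visit(date(2015, 12, 15), "Airport", "mocha", 5.00, "cookie"),
    Visit(date(2015, 10, 1), "Airport", "latte", 4.50),
]

profile = AnalyticProfile(name="Bill Schmarzo")
profile.behaviors = summarize_behavior(visits, as_of=date(2015, 12, 20))
profile.rules["buys_food_with_coffee"] = food_propensity(visits)

print(profile.behaviors)
print(profile.rules)
```

Keeping each part of the profile as its own dictionary is just one possible design; it lets new behaviors, rules or scores be added for a given business initiative without changing the record layout.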

#9 – Weaving Data Hay Into Business Gold. Some organizations think that they will just dump data into the data lake and then deploy data scientists to “find needles in the data lake haystack.” But “finding a needle in the haystack” is the wrong analogy for the data lake. It is a data warehouse / Business Intelligence way of thinking about analysis: slicing and dicing the data haystack in the hope of finding needles.

However, data science with a data lake is more like trying to “weave data hay into business gold.”  So instead of thinking about the data lake as this haystack from which you are trying to find needles, think instead about the data lake as the loom for your data where you weave data hay into business gold.

The primary goal of the data lake, from a business perspective, is to think differently; to enable your data science team to not search for random needles in haystacks, but instead think about how they can leverage the data lake to “weave the data hay into business gold.”

#8 – An Executive Mandate: Think Open Business. This blog discusses the power of thinking about an “open business” model, where we define “open” as creating a platform or ecosystem that allows third parties (developers, partners, resellers) to provide value (and make money) on that platform or ecosystem. Attacking the market with a closed business model introduces two significant liabilities:

  1. You limit innovation to only the innovation that your company itself can deliver.
  2. You limit your customers’ choices to only the products that the original manufacturer can develop and deliver.

This blog discusses how Apple and Google addressed the innovation challenge with an open business model that encourages app developers to develop new, innovative products on top of their platform; the model allows third-party app developers to make money on top of the Apple and Google smartphone platforms.  This adds considerable value to their respective platforms and in the process, Apple and Google are transforming their business models from a product-centric business model to a market-enabling business model; one of the key transformations as an organization seeks to “metamorphosize” their business models.

The blog concludes with an exercise on how my favorite kitchen appliance, the Vitamix, could create a more open and creative marketplace for its products.

#7 – The Mid-market Big Data Call to Action. Small organizations seem to have an inferiority complex when it comes to big data. It would seem that the deck is stacked against the small organizations that lack the technology resources to invest or the data experience to leverage in order to compete with the large companies in the area of big data. However, I think the opposite is true: small organizations have a HUGE advantage over many of their larger counterparts with respect to integrating data and analytics into their business models, including:

  • Smaller organizations have fewer data silos, so they have a much clearer view of their customers, products, operations and markets.
  • Smaller organizations have fewer HIPPOs (the Highest Paid Person’s Opinion) to deal with.
  • Smaller organizations can unlearn faster.
  • Smaller organizations are less fixated on technology.
  • But the most important reason is that it is easier for small organizations to institute the organizational and cultural change necessary to actually act on the analytic insights.

#6 – Why Do I Need A Data Lake? The data lake is a powerful big data architecture that leverages the economics of big data to enable the storage, management and analysis of data more economically than traditional data warehouse technologies. The key to maximizing the value of your big data initiatives is the analytics hub and spoke service architecture.

The hub of the architecture is the data lake:

  • Centralized, singular, schema-less data store with raw data
  • Mechanism for rapid ingestion of data with appropriate latency
  • Ability to map data across sources and provide visibility and security to users
  • Catalog to find and retrieve data
  • Costing model of centralized service
  • Ability to manage security, permissions and data masking
  • Supports self-provisioning

The spokes of the architecture are the analytic use cases:

  • Ability to perform analytics (data scientist)
      • Analytics sandbox (HDFS, Hadoop, Spark, Hive, HBase)
      • Data engineering tools (Elasticsearch, MapReduce, YARN, HAWQ, SQL)
      • Analytical tools (SAS, R, Mahout, MADlib, H2O)
      • Visualization tools (Tableau, DataRPM, ggplot2)
  • Ability to exploit analytics (application development)
      • 3rd platform application (mobile app development, web site app development)
      • Analytics exposed as services to applications (APIs)
      • Integrate in-memory and/or in-database scoring and recommendations into business process and operational systems

#5 – In Big Data, Are You Using Refrigerators or Stoves? This blog really challenges how organizations are positioning and selling big data. Too many “experts” are over-emphasizing the big data technology aspects and ignoring the really hard work – understanding what business opportunities exist and how the organization is trying to address them with data and analytics.

My University of San Francisco MBA class finished their Big Data MBA course. We used our trusty “thinking like a data scientist” process to teach our students how to identify a business opportunity and then drive cross-organizational collaboration to come up with ideas that they can turn into actions using data and analytics.

My co-teacher, the ever talented and energetic Professor Mouwafac Sidaoui, and I asked our students: “What employer wouldn’t want an employee who can excel at doing that?”

#4 – Big Data Fails: How to Avoid Them. This isn’t exactly a blog. This is an interview that I had with Jessica Davis (InformationWeek) that nicely summarizes many of the keys to big data success. The article actually makes me sound smart (and that’s no small task!). To quote the article:

The companies that run into the most trouble [with big data] are those in which data is in silos, and the thinking about that data is also in silos. For instance, in a banking company there may be a checking account silo and a mortgage silo, and the owners of each group aren’t accustomed to thinking about the whole customer who consumes both services.

Companies that can get past that limitation in their thinking are more likely to be successful with their big data initiatives.

And that example also shows an important factor in successful big data initiatives – collaboration among groups who may not normally collaborate with each other. It relies on team members with different areas of expertise working well together.

“The places where we are seeing success is where the business people and the IT people like each other.”

#3 – Creativity Is a Team Activity in Big Data. This could have very easily been my favorite blog. It certainly turned out to be one of my most popular blogs.

The potential of big data is only limited by the creative thinking of your business stakeholders. Maybe the biggest inhibitor to creative thinking is the baggage about data and analytics that we have picked up over the years. Organizations need to embrace the power of “thinking differently,” especially with respect to:

  • Data as a strategic asset to be gathered, enriched and shared, versus data as a cost to be minimized
  • The potential of predictive (what is likely to happen) and prescriptive (what should I do) questions, versus just mechanically answering descriptive (what happened) questions
  • The power of data science to quantify those variables and metrics that are better predictors of performance, versus business intelligence that just reports on what happened while monitoring current business performance
  • Building analytic profiles at the individual (human, machine) level to uncover individual behaviors, tendencies, propensities, interests, passions, associations and affiliations that can lead to specific actionable insights, versus relying on aggregated data to uncover general market trends

#2 – Thinking Like A Data Scientist series. This is probably unfair because this was a four-blog series, but this is my favorite blog(s) from 2015. The series included:

The 8-step “Thinking Like A Data Scientist” process is an enabler for organizations that want to get the most out of both their data…and their people. It drives organizational alignment around an organization’s key business initiatives and uncovers where and how big data and data science can optimize key business processes, uncover new monetization opportunities and deliver a more compelling customer experience.

#1 – Big Data MBA Textbook: Driving Business Strategies with Data Science. Clearly #1 for me was the release of my second Big Data book. This book was written as a textbook to use as part of the class I teach at the University of San Francisco School of Management, but I hope that others can use this textbook to advance big data and data science as business disciplines for tomorrow’s business leaders.

I hope that 2016 is as productive, and given the number of Big Data Vision Workshops that I have to facilitate, I bet it will be!

Bill Schmarzo

More Stories By William Schmarzo

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business”, is responsible for setting the strategy and defining the Big Data service line offerings and capabilities for the EMC Global Services organization. As part of Bill’s CTO charter, he is responsible for working with organizations to help them identify where and how to start their big data journeys. He has written several white papers, is an avid blogger and is a frequent speaker on the use of Big Data and advanced analytics to power an organization’s key business initiatives. He also teaches the “Big Data MBA” at the University of San Francisco School of Management.

Bill has nearly three decades of experience in data warehousing, BI and analytics. Bill authored EMC’s Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehousing Institute’s faculty as the head of the analytic applications curriculum.

Previously, Bill was the Vice President of Advertiser Analytics at Yahoo and the Vice President of Analytic Applications at Business Objects.