In part one of this series, I covered the general analytic cycle and its value in driving a self-service culture of analytics. I also covered the very important first step in that cycle: asking the right question. In this second installment of the two-part series, I pick up at the second step of the analytic cycle and continue through to the end of the cycle.
Understand the Analytics Biases at Play
The next stage of the analytic cycle is understanding the analytic biases at play. You have to have a solid understanding of analytic bias on a personal, organizational and industry level. What are analytic biases? The succinct answer is they are learned behaviors that result in you seeing what you expected to see. Let’s explore each of these three levels of analytic bias further.
Personal analytic bias often comes into play due to experience. The more experience we have, the more we tend to fall prey to our personal analytic biases. For instance, if you have been a category manager for office furniture for twenty years, you know your most profitable and bestselling product is an executive leather chair.
But because you’ve been the office furniture category manager for twenty years, you take it for granted that this continues to be the case. And your monthly dashboard shows a steady state, so you don’t dive deeper. As a result, you don’t realize that the northeast region drives that product and you’ve been losing to your competitors in the west because they have a completely different style preference that’s changed over the last twenty years.
Organizational analytic biases tend to fall into the same general camp. This problem is typically more acute in larger and long-established businesses. The reason is, again, that there is institutional wisdom and experience. This shows up in policies, procedure manuals, and training. “We’ve always measured it this way” is a good indication of organizational analytic bias.
One of the major problems with this bias is that Six Sigma programs deliver diminishing ROI. Changing defects from one in 100 to one in 10,000 can have a huge effect. Changing defects from one in 1,000,000 to one in 1,000,010? Not so much. Your higher ROI projects tend to come from finding outliers or, as the popular book called them, black swans.
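To make the diminishing return concrete, here is a minimal arithmetic sketch. The production volume of one million units and the helper name `defects_avoided` are assumptions chosen for illustration; only the defect rates come from the text above.

```python
def defects_avoided(units, before_one_in, after_one_in):
    """Expected defects avoided when the rate improves from
    1-in-before_one_in to 1-in-after_one_in."""
    return units / before_one_in - units / after_one_in


units = 1_000_000  # hypothetical production volume

# Early improvement: one in 100 down to one in 10,000
big_win = defects_avoided(units, 100, 10_000)

# Late improvement: one in 1,000,000 down to one in 1,000,010
tiny_win = defects_avoided(units, 1_000_000, 1_000_010)

print(f"1 in 100 -> 1 in 10,000: {big_win:,.0f} fewer defects")
print(f"1 in 1,000,000 -> 1 in 1,000,010: {tiny_win:.6f} fewer defects")
```

Under these assumptions the first project removes thousands of defects per million units, while the second removes a tiny fraction of a single defect, which is why the next high-ROI project usually lies elsewhere.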
Finally, there are industry analytic biases. Like organizational biases, they tend to be more prevalent and dominate in long-established industries or sectors. These biases are often organized into business-ratios books to provide guidelines for comparison between similar companies, typically at the SIC-detailed level. If everyone in the airline industry is trying to increase their load factor, how much innovation is there really?
To be clear, I’m not saying that all policies or biases are bad and need blowing up, or that industry ratios are bad or wrong. However, we need to be aware of the historical context that gave rise to these accepted ratios or measurements. Being aware of all three levels of analytic bias also helps us refine the right question to ask. Often it’s not what is being measured that is interesting, but what implicitly isn’t being measured that the analyst can exploit for rapid and high ROI.
Sourcing the Data
This is probably the most straightforward of the steps. Once we understand the question we are trying to answer and have become aware of our analytic biases, we have to source data to answer the question.
At Tableau, we recommend sourcing data at the most disaggregated level possible, as this gives you the greatest flexibility for deeper data exploration. But sometimes the data you need is in a spreadsheet and you need a quick answer to your question. In that case, answer the question in the most expedient manner possible. That being said, if there is an ongoing question or measurement, you might consider following aggregated data sets back to the source system.
Find out where that spreadsheet data came from, and build some standardized data sources you can publish to Tableau Server.
Data exploration is a crucial stage of the analytic cycle. There are two major steps in the data exploration stage: learning the data shape, and beginning to make associations between dimensions and measures.
As I mentioned, data exploration is really only possible with disaggregated data, at least to some extent. Data that is in the form of a spreadsheet crosstab has already been transformed. It can potentially hold key insights that can no longer be seen because it has been aggregated along some dimension or by a mathematical aggregation. This can make it difficult or impossible to drill deeper into the data. It can also produce meaningless results because a measure has been aggregated along a different dimension. Of course, if this is a standardized data set that you use often, this stage may be truncated in certain situations. But keep in mind those analytic biases: Are you breezing through this stage over and over because you know the data, or are you keeping an open mind about the data?
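As a small illustration of how pre-aggregated data can mislead, the sketch below compares an average of regional averages with the true average computed from row-level data. The regions and order values are made up for the example, and this is plain Python rather than anything Tableau-specific.

```python
from statistics import mean

# Hypothetical order-level (disaggregated) rows: (region, order_value)
orders = [
    ("Northeast", 100.0),
    ("Northeast", 120.0),
    ("Northeast", 110.0),
    ("Northeast", 130.0),
    ("West", 400.0),
]

# Group the order values by region
by_region = {}
for region, value in orders:
    by_region.setdefault(region, []).append(value)

# What a crosstab that only stores regional averages would contain
regional_avg = {region: mean(values) for region, values in by_region.items()}

# Averaging the already-aggregated numbers ignores how many orders each region had
avg_of_avgs = mean(list(regional_avg.values()))

# The true average order value needs the row-level data
true_avg = mean([value for _, value in orders])

print(regional_avg)
print(f"average of regional averages: {avg_of_avgs}")
print(f"true average order value: {true_avg}")
```

The two figures differ because the single West order carries as much weight as four Northeast orders once the data has been rolled up. With only the crosstab in hand, there is no way to recover the correct number.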
Remember, data can change over time. And if you don’t explore your data set continuously, you may be missing important insights.
I mentioned that learning the data shape is one of the two core areas of this stage. By data shape I mean how is it arranged. Do columns need to be pivoted into row data? Do I have a large number of dimensions and few measurements or vice versa? Is the data useful in its current form? Can I answer the questions I have using this data source, or do I have to source more data and figure out how to bring the sets together?
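Pivoting columns into rows can be sketched in a few lines. The example below assumes a hypothetical crosstab with one column per month; the function name `unpivot` and the column names are illustrative, not part of any Tableau feature.

```python
# Hypothetical crosstab: one row per product, one column per month
wide_rows = [
    {"product": "Executive chair", "Jan": 120, "Feb": 135},
    {"product": "Standing desk", "Jan": 80, "Feb": 95},
]


def unpivot(rows, id_column, key_name="month", value_name="sales"):
    """Turn every non-id column into its own row (wide -> long)."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # keep the identifier, do not pivot it
            long_rows.append(
                {id_column: row[id_column], key_name: column, value_name: value}
            )
    return long_rows


long_rows = unpivot(wide_rows, "product")
for row in long_rows:
    print(row)
```

In the long form, month becomes a dimension you can filter, group, and trend on, instead of being locked into column headers.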
Data shape also relates to the relative size of the data set—how many rows of data? What analysis periods can I cover? Is the data high volume telemetry data or am I looking at 300 new sales records per day? Does the data cover the analysis period I need?
The second part of this stage entails making associations within the data. A descriptive statistics approach works well here. Create histograms for all measures and look at the distributions: Are they normally distributed or are they skewed to one side or the other? Create scatter plots to find correlated measures or identify outliers that either point to a data issue or an opportunity. Build basic charts of the different dimensions and see how they interact or influence each other.
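For readers who want to see the numbers behind those charts, here is a rough sketch of the same checks using only Python's standard library. The sample values, the mean-versus-median skew heuristic, and the two-standard-deviation outlier cutoff are all assumptions for illustration; a histogram or scatter plot remains the better first look.

```python
from statistics import mean, median, pstdev


def skew_direction(values):
    """Rough heuristic: a mean above the median suggests a right skew."""
    m, med = mean(values), median(values)
    if m > med:
        return "right"
    if m < med:
        return "left"
    return "symmetric"


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length measures."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def outliers(values, z=2.0):
    """Values more than z population standard deviations from the mean."""
    m, s = mean(values), pstdev(values)
    return [v for v in values if abs(v - m) > z * s]


# Hypothetical measures
order_values = [10, 12, 11, 13, 12, 11, 95]
units = [1, 2, 3, 4]
revenue = [2, 4, 6, 8]

print(skew_direction(order_values))
print(outliers(order_values))
print(pearson(units, revenue))
```

A flagged value such as the 95 here is only a prompt to look closer: it may be a data-entry error or exactly the kind of opportunity worth a later analytic cycle.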
At this point, I am not doing goal-oriented analysis, nor am I paying much attention to answering my question. I’m merely sifting to try to find interesting anomalies that I can come back to later for deeper analysis.
Equipped with a strong understanding of my data’s shape and the characteristics inherent in its dimensions and measures, I’m ready for the analysis stage.
This is where all of your hard work in the previous stages comes together. Armed with your research question or hypothesis, it is now time to roll up your sleeves and find the answer. Build your analysis using our visual analysis best practices to maximize impact and ROI. Make sure that you stay on task and keep your question at the forefront of the analysis.
If you find something interesting but unrelated, put that item on your whiteboard and use it to drive more analytic cycles. The goal of the analysis stage is to answer the question you asked in stage one and prepare an actionable, visual story to socialize and gain buy-in for data-driven decision-making.
At Tableau, we developed Tableau Server to share our visualizations. In the early days of Tableau Desktop 3.0, I didn’t have the luxury of publishing to a server and allowing other people to interact with my analysis. Instead, I had to email packaged workbooks around to people with Tableau Reader or sneaker-net them over on a thumb drive. It got the job done, but using Tableau Server lets far more people see and interact with your visualizations. Of course, Tableau Online is also a great way to socialize your visualizations in the cloud.
However you choose to socialize your analysis, the key is to share your insights and take action on them. Whether it’s spotting and capitalizing on a hot trend or reducing defects, your analyses only generate ROI if they are shared and acted upon.
An analytic cycle framework gives you a repeatable, agile process for driving actionable intelligence while building a culture of analytics. By having a shared framework, you enable everyone to confidently see and understand data.