This is part of a series of reflections inspired by my courses at HBX, an online business school cohort powered by Harvard Business School. With Business Analytics, Economics for Managers, and Financial Accounting, I'm learning the fundamentals of business. Find the whole series here.
The most important rule of finding a solution: we have to know the problem.
Usually, the numbers tell us the problem. We're not hitting our revenue. Bounce rates are too high. ROI is too low. Often, we take these numbers at face value. We trust numbers to tell us what to do and how to solve the problem. Because numbers can't lie, right?
Wrong. Kind of.
The data sets we create depend on the problems we want to solve. We run completely different analyses to find bright spots than we do to find growth opportunities. It's not as if we try to spin the data in our favor; rather, with a particular problem in mind, we think about the data differently. The problem informs whether we adjust the data, for instance, or how we evaluate outliers.
A concept called hidden variables illustrates this idea. When you're looking at a data set, hidden variables are the factors behind the data you're comparing that might not be immediately apparent.
HBX provided a great example of this. Think about the relationship between ice cream and snow shovel sales. We see there's an inverse relationship between them: when one goes up, the other goes down. But if we want to see why this happens, we have to dig deeper. The real reason is beyond the data: the season. We don't tend to buy ice cream in the winter or snow shovels in the summer.
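To make the ice cream and snow shovel idea concrete, here is a minimal Python sketch. Everything in it is invented for illustration: the temperatures, the sales formulas, and the helper names `pearson` and `residuals` are my own assumptions, not anything from the HBX course. It simulates sales that depend only on temperature, then compares the correlation before and after accounting for that hidden variable.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return [y - (mean_y + slope * (x - mean_x)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the made-up data is repeatable

# Hidden variable: average monthly temperature over 20 years (invented numbers).
temperature = [15 + 15 * math.sin(2 * math.pi * (m % 12) / 12) for m in range(240)]

# Assumption: each product depends only on temperature plus independent noise.
ice_cream = [20 + 3.0 * t + rng.gauss(0, 10) for t in temperature]
snow_shovels = [120 - 2.5 * t + rng.gauss(0, 10) for t in temperature]

raw = pearson(ice_cream, snow_shovels)
controlled = pearson(
    residuals(ice_cream, temperature), residuals(snow_shovels, temperature)
)

print(f"raw correlation: {raw:.2f}")
print(f"after accounting for temperature: {controlled:.2f}")
```

Under these assumptions, the raw correlation should come out strongly negative, while the second number should sit close to zero. The two products never influenced each other in this toy data; the season did all the work.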
What about the real world? Hidden variables crop up all the time. Whether that's market forces, political events, or the importance of culture when dealing with specific issues, it's important to always go back to the problem you're trying to solve.
One great example from our peer reflection involved correlating traffic accidents and time of day. Typically, we see more accidents between the hours of 7AM-9AM and 3PM-6PM. An official could see this and declare, "Well, it's clear that time of day is too dangerous to drive." The hidden variable? Number of cars on the road. People are shuttling back and forth to work then, and when more people are on the road, there's a higher likelihood of an accident. I love this example because, without carefully peeling back the data, it's so easy to draw the wrong conclusion.
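One way to peel back the data in the traffic example is to look at accidents per car on the road instead of raw accident counts. The sketch below uses hypothetical numbers I made up for a single weekday (the `traffic` table and the function name `accidents_per_1000_cars` are assumptions for illustration only), so treat it as a way to see the arithmetic, not as real traffic data.

```python
# Hypothetical counts for one city on one weekday: (cars on the road, accidents).
traffic = {
    "03:00": (2_000, 2),
    "08:00": (20_000, 20),
    "12:00": (8_000, 8),
    "17:00": (25_000, 26),
    "22:00": (5_000, 5),
}


def accidents_per_1000_cars(cars, accidents):
    """Normalize the accident count by how many cars were on the road."""
    return 1000 * accidents / cars


rates = {
    hour: accidents_per_1000_cars(cars, accidents)
    for hour, (cars, accidents) in traffic.items()
}

# The hour that looks "most dangerous" if we only count accidents.
worst_by_count = max(traffic, key=lambda hour: traffic[hour][1])

for hour, (cars, accidents) in traffic.items():
    print(f"{hour}  accidents={accidents:>2}  per 1,000 cars={rates[hour]:.2f}")
print(f"most accidents in absolute terms: {worst_by_count}")
```

In this invented data set, rush hour has by far the most accidents, yet the rate per 1,000 cars barely moves across the day. Once the number of cars is in the picture, the time of day stops looking like the culprit.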
Correlation doesn't equal causation. All it shows is a relationship between two things. Just because the numbers are there doesn't mean they're telling you the whole story. We always need to think past their face value to the story they're really telling us. It's the only way we can find our "truth."