This is part of a series of reflections inspired by my courses at HBX, an online business school cohort powered by Harvard Business School. With Business Analytics, Economics for Managers, and Financial Accounting, I'm learning the fundamentals of business. Find the whole series here.
Making decisions isn't always as easy as "because of this, we should do that." The world isn't so linear. With such high stakes, we need to separate what's important from what's not, and what's related from what's not.
Introducing Multiple Regression
When you're dealing with lots of variables at play, you can take regression one step further: multiple regression. This is essentially the same function as single linear regression, with the added complexity of additional variables: y = a + b₁x₁ + b₂x₂ + … + bₖxₖ. Basically, the difference is that single linear regression looks at one independent variable's effect on one dependent variable to see the gross relationship between them, while multiple regression looks at several independent variables' effects on one dependent variable to find the net effect of each, holding the others constant.
For HBX, we used the example of home-buying. If I want to buy a home and I'm only looking at the effect of square footage on price, it's only giving me some of the picture. That's single linear regression.
What we really do as potential homebuyers is more like multiple regression: we want to know square footage, but also the neighborhood, school system, distance from our job/major city, crime rate, amount of land included, etc, etc, etc. Multiple regression better simulates the real world because the real world isn't so neat as one thing affecting another.
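To make the gross-versus-net distinction concrete, here is a minimal sketch in Python using NumPy. Everything in it is made up for illustration: the data is synthetic, the variable names (sqft, lot, price) and the "true" coefficients are my own assumptions, and the small fit helper is hypothetical rather than anything from the HBX course.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic, hypothetical data: bigger homes tend to sit on bigger lots.
sqft = rng.uniform(800, 3500, n)
lot = 2.0 * sqft + rng.normal(0, 1000, n)  # lot size, correlated with sqft
# Assumed "true" relationship: $100 per sq ft of house, $10 per sq ft of lot.
price = 50_000 + 100 * sqft + 10 * lot + rng.normal(0, 20_000, n)

def fit(columns, y):
    """Least-squares fit of y = a + b1*x1 + ... + bk*xk; returns [a, b1, ..., bk]."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

gross = fit([sqft], price)     # single regression: price on square footage only
net = fit([sqft, lot], price)  # multiple regression: square footage and lot size

print(f"gross effect of sqft: {gross[1]:.1f} dollars per sq ft")
print(f"net effect of sqft:   {net[1]:.1f} dollars per sq ft")
```

In the single regression, square footage also gets credit for the lot size that tends to come along with it, so its gross coefficient comes out higher than the net coefficient from the multiple regression, which holds lot size constant.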
Visualizing Multiple Regression
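Multiple regression is more difficult to visualize because we're looking at so many different factors. Instead of thinking of a line on a 2-D chart, imagine a plane fitted through a 3-D cloud of data points. That's the picture with two independent variables; with more, the same idea extends into dimensions we can't draw.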
When There's a Lot of Data...There's a Lot of Chaos
It can get even messier than that, though, if the variables are difficult to separate. In analytics we call this multicollinearity. It means that two or more of the independent variables you're studying are so highly correlated with each other that the regression can't reliably separate their individual effects. Translation: need more data!
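One common way to spot multicollinearity is the variance inflation factor (VIF): regress each independent variable on all the others and compute 1 / (1 - R squared). The sketch below does this with NumPy on synthetic data. The variables (sqft, rooms, age) and the helper functions are hypothetical, and VIF is my choice of diagnostic here, not one the course necessarily used.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Synthetic, hypothetical predictors: room count tracks square footage closely,
# while the age of the house is unrelated to both.
sqft = rng.uniform(800, 3500, n)
rooms = sqft / 400 + rng.normal(0, 0.3, n)
age = rng.uniform(0, 60, n)
X = np.column_stack([sqft, rooms, age])

def r_squared(predictors, y):
    """R squared from a least-squares fit of y on the predictors plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def vif(X):
    """Variance inflation factor for each column of X."""
    return [
        1 / (1 - r_squared(np.delete(X, j, axis=1), X[:, j]))
        for j in range(X.shape[1])
    ]

vifs = vif(X)
for name, v in zip(["sqft", "rooms", "age"], vifs):
    print(f"{name:>6}: VIF = {v:.1f}")
```

A VIF near 1 means a variable is mostly independent of the others. A frequently quoted rule of thumb treats values above roughly 5 or 10 as a warning sign, though that cutoff is a convention rather than a hard rule.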
Most importantly, with more variables, judgement is more important than ever. The data itself can help (p-values indicate whether a variable's effect is statistically significant, and adjusted R squared tells you how much of the variation in the outcome your model explains after accounting for how many variables you've included) but more often than not, you'll need to say to yourself: does this variable make sense? Do I care about its impact? Does it play a significant role in the decision I'm about to make?
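Here is a rough sketch of how those two numbers can be computed by hand with NumPy, again on synthetic data. The variables are hypothetical, and house_number is deliberately irrelevant to price. One assumption to flag loudly: the p-values use a normal approximation to the t-distribution, which is reasonable for a sample this large but is not what a statistics package would report exactly.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Synthetic, hypothetical data: price depends on square footage only.
sqft = rng.uniform(800, 3500, n)
house_number = rng.uniform(1, 999, n)  # irrelevant by construction
price = 50_000 + 100 * sqft + rng.normal(0, 20_000, n)

X = np.column_stack([np.ones(n), sqft, house_number])
k = X.shape[1] - 1  # number of independent variables (intercept excluded)

coef, *_ = np.linalg.lstsq(X, price, rcond=None)
resid = price - X @ coef
ss_res = float(resid @ resid)
ss_tot = float(((price - price.mean()) ** 2).sum())

r2 = 1 - ss_res / ss_tot
# Adjusted R squared penalizes each additional variable.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Standard errors, t-statistics, and approximate two-sided p-values.
sigma2 = ss_res / (n - k - 1)
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = coef / std_err
# Assumption: normal approximation to the t-distribution (fine for large n).
p_values = [math.erfc(abs(t) / math.sqrt(2)) for t in t_stats]

for name, b, p in zip(["intercept", "sqft", "house_number"], coef, p_values):
    print(f"{name:>12}: coef = {b:10.2f}, approx p-value = {p:.3f}")
print(f"R squared = {r2:.3f}, adjusted R squared = {adj_r2:.3f}")
```

A variable that truly drives the outcome, like sqft here, should show a p-value near zero, while an irrelevant one like house_number will usually (though not always) show a large one. Even then, the numbers only flag candidates; deciding whether a variable belongs in the model is still the judgement call.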
Sometimes, the data provides answers. But often, it sparks questions of our own.