
The 7 Deadly Sins of Quantitative Data Analysts

Quantitative analysis: it's such an appealing phrase. It provides a sense of order and rationality in an often unordered and irrational world. When done right, quantitative analysis can give you a sense of satisfaction and accomplishment. You've quantified, you've analyzed, and as a result you now have a nicely working model with which to grow your business.

Right?

Unfortunately, it doesn't always work out that way. There are actually quite a few ways to screw up quantitative analysis. Sometimes we don't have any control: maybe the right data isn't available, or maybe it doesn't quite translate into the format we need. But other times we make mistakes that could have been prevented, if only we'd paid a little more attention.

At Quandl, we provide datasets on everything from education and demographics to investments and financial instruments. Two of our most commonly used collections of data fall into the latter categories: futures prices and stock data. It's our mission to make all the numerical data in the world available in one place. Given the large amounts of data we deal with, from various sources and in various formats, we've seen pretty much every kind of sin that can be committed with respect to big data.

After doing a bit of analysis ourselves, we've boiled those transgressions down into the 7 Deadly Sins of Quantitative Analysis. These are mistakes that, despite potentially leading to big problems, we see people making over and over again. Our goal here is not only to help drive better data, but also to drive better uses of that data.

Greed

Do not overfit your model. Stay lean and parsimonious.

It can be tempting to throw everything up to and including the kitchen sink into your model. However, sometimes you just need to step back and ask yourself: Does a kitchen sink really belong here?

When it comes to big data, it can be hard to rein in greed. Our brains (and often our bosses) yell, "More, more, more!" as if simply collecting the biggest base of data is an end unto itself. But maybe thinking of the "big" in "big data" as referring to the potential results and revelations the data can offer, rather than the size of the database itself, can lead to more productive uses.

While it's true that useful insights can sometimes emerge from data collected without any specific purpose in mind, there's a bigger virtue in keeping things slim and focused.
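To make overfitting concrete, here is a minimal sketch in plain Python. The noisy linear data and both toy models are invented for the example: a simple least-squares line generalizes to held-out points, while a degree-7 polynomial that fits every training point exactly (the statistical kitchen sink) does not.

```python
import random

random.seed(0)

# Hypothetical data: a noisy linear trend, y = 2x + noise.
train = [(x, 2 * x + random.gauss(0, 1.0)) for x in range(8)]
test = [(x + 0.5, 2 * (x + 0.5) + random.gauss(0, 1.0)) for x in range(8)]

# Parsimonious model: an ordinary least-squares line through the training points.
n = len(train)
sx = sum(x for x, _ in train)
sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train)
sxy = sum(x * y for x, y in train)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

def linear(x):
    return slope * x + intercept

# "Kitchen sink" model: the degree-7 polynomial that passes through every
# training point exactly (Lagrange interpolation), giving zero training error.
def interpolate(x):
    total = 0.0
    for i, (xi, yi) in enumerate(train):
        term = yi
        for j, (xj, _) in enumerate(train):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# The interpolant "wins" in sample and loses out of sample.
print("train MSE  line: %.2f   interpolant: %.2f" % (mse(linear, train), mse(interpolate, train)))
print("test  MSE  line: %.2f   interpolant: %.2f" % (mse(linear, test), mse(interpolate, test)))
```

The interpolant's perfect training score is exactly what should make you suspicious: all of that extra flexibility is spent memorizing noise, and the held-out points expose it.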

Gluttony

Do not extend your model beyond its natural scope. Stick to what you know.

Speaking of slim and focused, gluttony and greed are sometimes confused. But whereas greed is all about obtaining more stuff (data, in this case) without any regard for its use or necessity, gluttony is typically seen as an overextension of a natural and necessary activity.

It's entirely natural to create models that fit within the boundaries of your goals and area(s) of expertise. However, when you start creeping beyond those boundaries, it can be difficult to align the work you're putting into your analysis with your business needs. Furthermore, that extraneous work can delay projects that depend on your results, or even produce faulty analysis due to overcomplexity.

The only way to prevent such distension is to be constantly mindful of your model's scope. Things will certainly need to be adjusted from time to time, but such adjustments should be the result of careful and deliberate consideration, not a side effect of scope creep.

Lust

You will be tempted to iterate on out-of-sample data. Don't do it.

Testing out of sample is a critical component of developing any type of forecasting model. It helps you discover potential biases in your initial dataset, and doing it properly will either confirm your model or at least provide useful information to help you tweak it.

However, there can be too much of a good thing. Learning the limits of out-of-sample data is just as important as learning the limits of in-sample data. To build on your model, it's not enough to just keep running recursively through datasets. As your dataset matures, you will need not only to readjust your model, but also to re-determine the sample data you use both to create the model and to test it.
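One disciplined way to re-determine your samples as the dataset matures is a walk-forward split: each model version is fit on one window and scored exactly once on the window that follows, then everything rolls forward. A minimal sketch in plain Python, where the price series, window sizes, and toy drift model are all invented for the example:

```python
# Hypothetical daily price series; in practice this comes from your data source.
prices = [100 + 0.3 * t + (3 if t % 7 == 0 else -1) for t in range(120)]

train_size, test_size = 30, 10

def fit_mean_change(window):
    """Toy 'model': the average one-step change over the window."""
    changes = [b - a for a, b in zip(window, window[1:])]
    return sum(changes) / len(changes)

def score(window, drift):
    """Mean absolute error of predicting each step as previous value + drift."""
    errors = [abs((a + drift) - b) for a, b in zip(window, window[1:])]
    return sum(errors) / len(errors)

results = []
start = 0
while start + train_size + test_size <= len(prices):
    fit_window = prices[start : start + train_size]
    test_window = prices[start + train_size : start + train_size + test_size]
    drift = fit_mean_change(fit_window)      # fit only on the training window
    results.append(score(test_window, drift))  # evaluate once on the next window
    start += test_size                       # roll forward; never revisit test data

print("out-of-sample MAE per window:", ["%.2f" % r for r in results])
```

The key property is that each test window is touched exactly once. The moment you go back and tune against a window you already scored, it has silently become training data.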

Sloth

If you are lazy or sloppy with your data, you will fail.

This sin seems pretty straightforward, but being straightforward isn't the same as being easy to avoid. The temptation to let things slide can be strong, and it can be hard to preserve the integrity of your data.



Being lazy or sloppy can take a number of forms. It can start in the design phase by failing to consider all of the various data points that you want to encompass in a record, or by not fully understanding the relationships between different types of records. It can continue into the data collection phase by allowing the collection of incomplete or unqualified records. And of course it can appear in the analysis itself by cutting corners when creating the model or failing to appropriately account for exceptions and outliers.

Everyone knows the saying, "A stitch in time saves nine." Well, when you scale that one stitch across millions of records (or more), that's a lot of unnecessary stitches saved. A little work on the front end will always pay off in the end.
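As a sketch of what that front-end stitch might look like, here is a minimal record validator in plain Python that rejects incomplete or obviously bad records at collection time. The field names and rules are hypothetical for illustration, not an actual schema:

```python
# Each record is assumed to be a dict with hypothetical fields
# "date", "ticker", and "close".
REQUIRED = ("date", "ticker", "close")

def validate(record):
    """Return a list of problems; an empty list means the record is clean."""
    problems = [f"missing field: {f}" for f in REQUIRED if record.get(f) in (None, "")]
    close = record.get("close")
    if isinstance(close, (int, float)) and close <= 0:
        problems.append("non-positive close price")
    return problems

records = [
    {"date": "2015-06-01", "ticker": "ABC", "close": 101.25},
    {"date": "2015-06-02", "ticker": "ABC", "close": None},  # incomplete
    {"date": "2015-06-03", "ticker": "ABC", "close": -3.0},  # obviously bad
]

clean = [r for r in records if not validate(r)]
rejected = [(r, validate(r)) for r in records if validate(r)]
print(f"kept {len(clean)} of {len(records)} records")
for r, probs in rejected:
    print("rejected", r["date"], "->", "; ".join(probs))
```

Rejecting (or flagging) a record the moment it arrives is one stitch; chasing a bad number through a finished model months later is the other nine.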

Anger

Never get emotional. All models eventually fail. Recognize failure, and react rationally.

Let's face it: your data doesn't care about your ego. If you want to rant and rage at your failures, the only one you're hurting is you. At the same time, you're hurting those depending on you (clients, colleagues, business partners) to come up with a working model.

When it comes to failure, the best thing to do is recognize that it happens, and then figure out how to get it right the next time. Focus on identifying and fixing the problems. Just remember that you will probably have more problems to identify and fix after that. Wash, rinse, repeat.

Also, it may help to remember that there will always be more data to collect and analyze, which means your model will need to be incrementally improved as time goes on. It will never be perfect, and that's a good thing. It means you will always have a new puzzle to decipher! If you could figure everything out all at once, life would get really boring really quickly.

Envy

To be like the best, emulate their rigor and discipline, not their models. Copycats fail.

Coming up with a new model, a new way of looking at the world, is rarely easy. It requires shifting your own point of view and seeing data in a way that you, and perhaps anyone else, have never seen it before. Given that reality, it can be tempting to rely on someone else's models, since they've already done the work.

The problem is that there's a real advantage to being first to market, and that advantage already lies with someone else. If their model is really that good, chances are extremely low that you'll be able to implement it as well as they already do. And by the time you get to where they are now, they'll be that much further ahead. It's a chase that you're unlikely ever to win.

That doesn't mean all hope is lost, though. There is actually a science to innovation, and it requires a meticulous and steadfast approach. Look for connections that nobody else has explored, and then test them to see how they work. Explore both the unexpected successes and the unexpected failures until you understand them, then assemble a new model using the best bits from each. Done right, you'll wind up with a new model of your own, and then others will be chasing you!

Pride

The market knows more than you and always will. Forget this truth at your peril.

Pride comes before the fall, whether we're talking about a metaphorical fall from grace or a fall in profits. It's always at the moment that you start thinking you're unstoppable and your analysis can't be beaten that some new variable or circumstance comes along to undercut your understanding.

Analyzing large sets of data requires a certain level of humility. You are at the mercy of the data, and if you go in with the expectation of controlling the outcomes, there's a good chance you'll get pummeled by the results. Analysis is all about seeing where the data leads; the moment you try to lead it, it will pull you the other way.

The fact is, no one person can ever know or take into account every bit of data that's out there. There's always some bit of local knowledge that will take time to filter into your dataset, and knowing the limitations of your data is more important than assuming you can know everything.

Sooner or later every quantitative analyst is tempted by one or more of these 7 Deadly Sins. These forbidden fruits can creep into even the most sophisticated analysis. However, by knowing the signs, you may be able to recognize them when they come up, and perhaps even avoid them altogether.

Cover image: Cameron Stewart
