I have been spending a lot of time lately analyzing regression and related factors. As my stock market model creeps toward the realm of artificial intelligence, making decisions based on related data sets, the biggest question I face as a programmer / data scientist is: how do I know whether a correlation in the data is predictive or just a coincidence?
So if you are reading this post to try to find the answer to that question, you can stop reading now, because I have nothing for you. Right now I am just using my gut. (Feeding a computer datasets is probably a lot like parenting – only tell your children the things they need to know, and don’t confuse them with too much data as they build their moral compass.)
Anyway, as I think about all these issues, every time I think I have found a meaningful correlation, I think back to the Super Bowl Indicator. Are you looking for a sure-fire predictor of stock market performance as measured by the S&P 500, with a track record of being correct 80% of the time in the last 50 years?
Then maybe your strategy should be: if an NFC team wins the Super Bowl, go all in. If the AFC wins, take all your savings out of the market. Of course I am not serious, but it is curious that this is one of the best predictors out there. Since the year 2000, its accuracy has been 66%.
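One rough way to put numbers on the coincidence question is to ask how often a meaningless indicator would look this good by luck alone. The sketch below is only an illustration, not how my model works. It assumes 40 correct calls in 50 years (the 80% figure above), treats each year as an independent guess, and uses two made-up baselines: a coin flip (0.5), and 0.7 as a stand-in for "the market goes up most years anyway." The function names and the 100,000 candidate count are hypothetical.

```python
from math import comb


def prob_at_least(hits: int, trials: int, p: float = 0.5) -> float:
    """Chance that a meaningless indicator is right at least `hits` times
    in `trials` independent years, if each call is right with probability p."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(hits, trials + 1)
    )


def prob_any_of(n_indicators: int, hits: int, trials: int, p: float = 0.5) -> float:
    """Chance that at least one of `n_indicators` independent, meaningless
    indicators reaches `hits` correct calls purely by luck."""
    single = prob_at_least(hits, trials, p)
    return 1 - (1 - single) ** n_indicators


if __name__ == "__main__":
    # Assumed record: 40 of 50 years correct (80%).
    print("one indicator, coin-flip baseline:", prob_at_least(40, 50, 0.5))
    # Assumed baseline of 0.7: "always predict up" is right most years.
    print("one indicator, 0.7 baseline:", prob_at_least(40, 50, 0.7))
    # Hypothetical: 100,000 unrelated data series screened for a match.
    print("best of 100,000 indicators:", prob_any_of(100_000, 40, 50, 0.5))
```

The point of the second function is the one that worries me: a single coincidence like this is rare, but if enough unrelated data series get tested against the market, some of them will match by chance, and those are the ones people remember.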
I bring this up because this is a great example of the problem faced by software developers/data scientists such as myself. As artificial intelligence drives more and more software in our world, we will expose new flaws in the software development process. The programmer’s dilemma will be to decide what data to expose to the computer to help it make decisions. Intelligence (either artificial or human) is built up from experience and/or the data available. If I had built my stock model on the Super Bowl Indicator alone, rather than looking at other technical or fundamental factors, I would have easily outperformed myself and just about every other financial advisor out there. Maybe I should spend more time trying to justify that it is a meaningful correlation, rather than just a coincidence. As of this writing, I still can’t do it, and I am ignoring it.
So don’t consider this post investment advice, just a thought about how machines will be making decisions in the future, and about the flaws that will be programmed into future software by the programmer’s decisions about what data is meaningful. There will be some mistakes made.
And if you see an investment adviser who touts an 80% success rate over the last 40 years, ask him or her if they know who won the Super Bowl last year.