
Building a Prediction Model Pattern

  • Investing
  • Technology

I have been building prediction modeling applications for years as an investor, as a way to identify when various asset classes or particular stocks may be over- or underpriced. My current model is over 15 years old and, as you might guess, has become a huge mess of spaghetti code that is difficult to modify.

Recently, I stumbled across a full suite of college football data, and started to wonder if one could build a model to predict college football games. Rather than try to copy my existing investment model, I decided to mentally start from scratch and figure out the best way to design predictive models for maintainability. I now have a college football game prediction model up and running, using the new pattern I designed during this process.

Now this might be Data Science 101 to a data scientist, but this is not my area of expertise. My software suite is a SQL Server database and C#, tools I am very comfortable with. Rather than learn new tools and software specially built for data modeling, I thought it would be more interesting to design my own custom approach. I am a software developer, so my thinking on how to build this process was inspired by Model/View/Controller (MVC), a software design pattern that focuses on separation of logic for interconnected systems. Taking this foundation, I have broken the process of setting up and managing the model into four main components.

  1. Create Program to Load Data. Before I build a model, I have to make sure I have access to the data necessary to power it. There are plenty of great APIs for gathering investment data, and if necessary data can be gathered via data scraping. I have a good library of tools to call APIs and a nice suite of data scraping tools, so while building the data-gathering logic usually takes some time, it can be nicely compartmentalized for easy maintenance.
  2. Create Program to Regression Test Various Assumptions. Before building the program, you have to define a rough set of assumptions about the causes and effects of various factors. The set of assumptions you create is limited only by the data you have available. For example, for my college football prediction model, one assumption I tested was that a team is more valuable after a big home loss: the team might be more motivated to do well following a bad home loss, and potential bettors are soured on the team. So you look at the data you have, then create various assumptions you can test against that data. Once you have a set of assumptions, you create a program that fires them at your prediction engine, varying the weight of each assumption on each run (see the sketch after this list). Doing this, you hopefully identify assumptions that have no correlation to future performance, and ones that have a strong correlation or inverse correlation to future performance. I have expanded below on how the prediction engine is built, as it is a core piece of the program.
  3. Create Program to Calculate the ‘Best’ Predictions. Once you have tested various factors against your historical data, choose the factors and weightings that performed best of all the combinations you fired at the prediction engine. This is what generates the predictions: it looks at the current price (or the current betting line, in the case of my college football model) and determines the ‘best’ value prediction. Note that I plan to rerun my regression tests on this model quarterly, so I can see how well the assumption weightings are holding up. If some start to deteriorate, I may adjust factors and weightings as appropriate.
  4. Create Program to Track Predictions and Update Results. I think this is perhaps the most important piece. The prediction engine bases its predictions on past data, so it is important to see whether past data accurately predicts future results. For example, for the college football predictions, every Monday I run a job that updates the weekend scores, then compares the results to my predictions for the week. Each week I will look closer at the losses to see what I missed, which may give me ideas for additional factors to add. Of course, new factors may mean collecting more data, which further adds to the effort of building and maintaining the model. It is a very iterative process, as optimizations can always be made.
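
To make step 2 concrete, here is a minimal C# sketch of the weight-sweeping loop. Everything here is hypothetical and invented for illustration (the real model has many factors), but a single factor is enough to show the shape of the test:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One historical observation per game: the raw score this factor produced,
// and the actual margin the home team won (or lost) by.
public record Observation(double FactorScore, double ActualMargin);

public static class AssumptionTester
{
    // Sweep a range of weights for a single factor and report the mean
    // absolute error of (weight * score) against the actual results.
    public static void SweepWeights(IReadOnlyList<Observation> history)
    {
        for (double weight = -3.0; weight <= 3.0; weight += 0.5)
        {
            double mae = history.Average(o =>
                Math.Abs(weight * o.FactorScore - o.ActualMargin));
            Console.WriteLine($"weight {weight,5:F1} -> mean abs error {mae:F2}");
        }
    }
}
```

A factor whose best weight never beats a weight of zero has no predictive value; a factor whose best weight is strongly negative is showing an inverse correlation.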

The Prediction Engine

Building the prediction engine is an iterative process in itself. The plan is to start small, then slowly add additional calculations over time. As long as additions are managed in an organized manner, the code base should be maintainable even after adding a large number of factors. The prediction engine (the big square in the diagram above) consists of three major parts.

a. Build Objects. The first thing the prediction engine does is pull the data stored in the database into a view model that exposes it in an easily accessible way. These are typically complex objects representing the entity you are making a prediction on (e.g., a football game, a stock market security, an asset class). For instance, a college football model would pull in a game object with two teams attached, each carrying all the statistics and history needed. A ‘bad previous week home team loss’ factor, for example, requires looking at past game performance to see whether a team had a bad loss in the previous week; as long as the data is there, that is a fairly simple subroutine to write.
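
As a sketch of what those objects might look like for the football model (the shapes are hypothetical, and my real objects carry far more statistics):

```csharp
using System;
using System.Collections.Generic;

// A past result from one team's perspective:
// positive margin = a win, negative = a loss.
public record PastGame(DateTime Date, bool WasHome, int PointMargin);

public record Team(string Name, IReadOnlyList<PastGame> RecentGames)
{
    // Helper for the 'bad previous week home loss' factor: most recent
    // game first, lost at home by badLossMargin points or more.
    public bool HadBadHomeLossLastWeek(int badLossMargin = 14) =>
        RecentGames.Count > 0
        && RecentGames[0].WasHome
        && RecentGames[0].PointMargin <= -badLossMargin;
}

// The entity a prediction is made on: the game itself, with the current
// line expressed as points the home team is favored by.
public record Game(Team Home, Team Visitor, double BettingLine);
```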

b. Generate Predicted Value. Now that you have your data accessible, fire your list of assumption factors and weightings at it to calculate a value. To simplify the architecture, I have a separate subroutine for each factor calculation to avoid logic bloat. This allows me to isolate factors, and to add new ones or delete invalid ones as necessary.
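
Here is roughly how that looks, reusing the hypothetical Game object from the sketch above. The engine itself just sums the weighted factor results, so adding or removing a factor is a one-line change to the list:

```csharp
using System.Collections.Generic;
using System.Linq;

// Each factor is a small function from a game to a point contribution.
public delegate double Factor(Game game);

public static class PredictionEngine
{
    // Illustrative factor subroutine: nudge the predicted margin toward
    // the home team if they are coming off a bad home loss.
    public static double BadHomeLossFactor(Game game) =>
        game.Home.HadBadHomeLossLastWeek() ? 1.0 : 0.0;

    // The engine: sum weight * factor over the whole list to get a
    // predicted home-team margin of victory.
    public static double PredictHomeMargin(
        Game game, IEnumerable<(Factor Calc, double Weight)> factors) =>
        factors.Sum(f => f.Weight * f.Calc(game));
}
```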

c. Generate Recommended Action. Once you have calculated the value of all your assumptions against an object, you have a score for that object. That score can then be compared to the price of the object to see whether there is any action to be taken. For example, take a college football game where, given your assumptions and the data available, step (b) calculated that the home team should win by 3 points. If the betting line has the home team favored by 14, and your threshold for action is a 7-point differential, then the recommended action would be to place a bet on the visiting team. The same works for a stock market security: if step (b) calculates a value of $15 and the stock is priced at $10, the recommended action might be to buy the stock.
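
And a sketch of that recommendation step, using the numbers from the football example above (a predicted home margin of 3, a line of 14, and a 7-point threshold); the names are again my own invention:

```csharp
public enum Recommendation { NoAction, BetHome, BetVisitor }

public static class Recommender
{
    // Compare the model's predicted home margin to the betting line and
    // act only when the difference clears the action threshold.
    public static Recommendation Recommend(
        double predictedHomeMargin, double bettingLine, double threshold)
    {
        double edge = predictedHomeMargin - bettingLine;
        if (edge >= threshold) return Recommendation.BetHome;
        if (edge <= -threshold) return Recommendation.BetVisitor;
        return Recommendation.NoAction;
    }
}

// Recommender.Recommend(3, 14, 7) returns BetVisitor: the model thinks
// the line overvalues the home team by 11 points.
```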

Note that it is also valuable to track the variability of the model, in the form of a standard deviation or a correlation coefficient (R). Some models may show a correlation but have a wide deviation. These deviations help you set your ‘time to take action’ price: typically, the wider the deviation, the higher I set my action threshold.
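
A rough sketch of that idea, assuming you keep a list of past prediction errors (predicted margin minus actual margin); the half-a-deviation multiplier and 3-point floor are arbitrary numbers for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Variability
{
    // Population standard deviation of the model's historical errors.
    public static double ErrorStdDev(IReadOnlyList<double> errors)
    {
        double mean = errors.Average();
        return Math.Sqrt(errors.Sum(e => (e - mean) * (e - mean)) / errors.Count);
    }

    // Wider historical deviation -> demand a bigger edge before acting.
    public static double ActionThreshold(IReadOnlyList<double> errors) =>
        Math.Max(3.0, 0.5 * ErrorStdDev(errors));
}
```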

Breaking the prediction engine into segmented parts should really help with managing the logic. In addition, I have a pretty good library of reusable components that I should be able to apply across multiple predictive models. My goal is to slowly increase the size and scope of the calculations while keeping the overall system pretty simple.

Now that I have my college football predictive model working, I will keep adding assumptions to see if I can increase the accuracy of my predictions. Then I will start tearing out components of my existing investment prediction engine and rebuild it using this new pattern.

When will I be done with this project? Hopefully never. If all goes well, these models will continually evolve and grow as more data is collected, and hopefully become more accurate.

November 13, 2020 Dan
