
Data as of 10 Jan 25
Introduction
For short-term predictions, it is possible to use price data alone, i.e. Open, High and Low as the inputs and Close as the output. This is what traditional Technical Analysis practitioners do. The difference is that our models do not assume that the relationship between the input variables is linear or that the probability distribution is Gaussian (Normal). This makes a big difference.
A time series of fewer than 300 data points is adequate, because we are more concerned with recent patterns than with those from long ago. In most cases, Volume’s contribution to model quality and accuracy is insignificant. We can check this with Principal Component Analysis (PCA). For example, below we have calculated the PCA of the Dow Jones Industrial Average (DJIA) Index as standardized correlation coefficients. You can see that Open (O), High (H), Low (L), Close (C) and Volume (V) together explain 82% of the variance (i.e. the part of the DJIA that is not noise). But while O, H and L each contribute 0.49, V contributes only 0.18.
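As a rough illustration of the PCA check described above, here is a minimal sketch in Python with NumPy. It is not the tool used to produce the figures in this article. The data are synthetic stand-ins for about 250 daily bars, and the function name pca_on_correlation is hypothetical, so the printed numbers will not match the 82%, 0.49 and 0.18 quoted above.

```python
import numpy as np

def pca_on_correlation(data):
    """PCA of the columns of `data` via their correlation matrix.

    Returns (explained_ratio, loadings), sorted by descending variance.
    loadings[:, k] holds the weights of principal component k.
    """
    # Standardize each column to zero mean and unit variance (Z-Scores).
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # The covariance of standardized columns is the correlation matrix.
    corr = np.cov(z, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs

# Synthetic data (an assumption for illustration, not real DJIA prices).
rng = np.random.default_rng(0)
close = 40000 + np.cumsum(rng.normal(0, 200, 250))
open_ = close + rng.normal(0, 50, 250)
high = np.maximum(open_, close) + np.abs(rng.normal(0, 80, 250))
low = np.minimum(open_, close) - np.abs(rng.normal(0, 80, 250))
volume = rng.lognormal(mean=19, sigma=0.2, size=250)
ohlcv = np.column_stack([open_, high, low, close, volume])

explained, loadings = pca_on_correlation(ohlcv)
for name, weight in zip(["O", "H", "L", "C", "V"], loadings[:, 0]):
    print(f"{name}: {abs(weight):.2f}")
print(f"First component explains {explained[0]:.0%} of the variance")
```

If Volume's weight on the first component is much smaller than the weights of the price columns, that supports leaving Volume out of the inputs.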

Which regression algorithm to use for your prediction model?
There are many types of regression algorithm, ranging from simple linear least-squares regression through Decision Trees and K-Nearest Neighbor (KNN) to Neural Networks (NN). Leaving out simple linear regression and KNN (which is more suited to clustering and classification objectives), we focus on Decision Trees and Neural Networks.
Before we make a comparison, note that there are other procedures to carry out before we can run the models, although we will not cover them in this article.
(1) The data must be partitioned into Training (50%), Validation (30%) and Testing (20%) sets. The Testing set is a hold-out portion of the data that the model does not see until it is put to the test after Training and Validation.
(2) The data must be standardized by converting them into Z-Scores, (X - Mean of X) / Standard Deviation of X, or in some cases by Unit Normalization to between 0 and 1.
(3) In our models, after the Decision Tree or Neural Network, we still need to fit the data to an ARIMA (Auto-Regressive Integrated Moving Average) model and obtain its prediction for 1-20 steps ahead. Then we run a Monte Carlo Simulation of this prediction to generate a Probability Distribution Function (PDF). The generated PDF is then fitted to an unbounded Metalog PDF.
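Steps (1) and (2) above can be sketched in a few lines of Python with NumPy. Two things here are assumptions rather than something the article states: the split is taken in time order (no shuffling), and the mean and standard deviation are computed from the Training set only and then reused for the other two sets. The function names and the synthetic prices are hypothetical.

```python
import numpy as np

def chronological_split(data, train=0.5, valid=0.3):
    """Split rows, in time order, into Training, Validation and Testing sets."""
    n = len(data)
    i = round(n * train)            # end of the Training set
    j = round(n * (train + valid))  # end of the Validation set
    return data[:i], data[i:j], data[j:]

def zscore_fit(train_data):
    """Column means and standard deviations, taken from the Training set only."""
    return train_data.mean(axis=0), train_data.std(axis=0, ddof=1)

def zscore_apply(data, mean, std):
    """Z-Score: (X - Mean of X) / Standard Deviation of X, column by column."""
    return (data - mean) / std

# Synthetic stand-in for 300 rows of O, H, L, C (not real market data).
rng = np.random.default_rng(1)
prices = 100 + np.cumsum(rng.normal(size=(300, 4)), axis=0)

train, valid, test = chronological_split(prices)
mean, std = zscore_fit(train)
train_z = zscore_apply(train, mean, std)
valid_z = zscore_apply(valid, mean, std)
test_z = zscore_apply(test, mean, std)
print(len(train), len(valid), len(test))
```

Reusing the Training-set statistics keeps information from the hold-out rows out of the inputs the model is trained on.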
The modeling algorithms we will compare
1. Single Decision Tree
2. Random Trees (also called Random Forest)
3. Ensemble of Boosted Decision Trees
4. Ensemble of Bagged Decision Trees
5. Back Propagation Neural Net (NN) with no Hidden Layer and a Linear Output (instead of Sigmoid or Hyperbolic Tangent (TanH)).
* From experience, I have found that for simple short-term modeling, sophisticated NNs with Hidden Layers and activation functions such as Sigmoid, TanH or ReLU will overfit the data.
# All Decision Trees will have 10 levels, 20 nodes and 50 splits, and fully-grown trees will be pruned.
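To make item 5 in the list concrete, here is a minimal sketch of a network with no Hidden Layer and a Linear Output, trained by gradient descent. With a mean-squared-error loss (an assumption, since the article does not name the loss), such a network is a linear model y_hat = x·w + b, and back propagation reduces to a single gradient step on w and b. The function name, learning rate, epoch count and synthetic data are all hypothetical, and this is not the author's implementation.

```python
import numpy as np

def train_linear_net(x, y, lr=0.05, epochs=2000):
    """Train y_hat = x @ w + b by gradient descent on the mean squared error."""
    n, d = x.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        err = x @ w + b - y                # forward pass, then prediction error
        w -= lr * (2.0 / n) * (x.T @ err)  # gradient of the MSE with respect to w
        b -= lr * (2.0 / n) * err.sum()    # gradient of the MSE with respect to b
    return w, b

# Synthetic, already-standardized inputs (illustration only).
rng = np.random.default_rng(2)
x = rng.normal(size=(200, 3))
true_w = np.array([0.5, -0.2, 0.8])
y = x @ true_w + 0.1 + rng.normal(0, 0.01, 200)

w, b = train_linear_net(x, y)
print(np.round(w, 2), round(b, 2))
```

With so few parameters, the model has little room to overfit, which matches the motivation given in the note marked * above. Real O, H and L inputs are strongly correlated with each other, so convergence on real data would be slower than in this example.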
Model performance metrics
We will use Mean Absolute Deviation (MAD) to measure model performance; the smaller the MAD, the better. Let’s continue and try to model the DJIA. The results are:

Here you can see that the Ensemble of Boosted Decision Trees performs best, and the Neural Network has by far the worst performance. In Part 2 we will explain why NNs perform badly. At this point, though, I would like to point out that predicting the DJIA is not advisable. It is too event-driven and, as you know, the US markets react to anything and everything. Here we can see that the MAD is >100. This is definitely too high; in good-quality models it should be <10 and in many cases even <1. There is a test that shows how unpredictable the DJIA is: the Durbin-Watson Test for Independence (DWT). In the table below (see “Indep. Test” in the last row), the DWT is 2.26. A value above 2 means that the DJIA has negative autocorrelation characteristics, i.e. positive moves are likely to be followed by negative moves. This means trends do not last long (at least at this juncture, just before the 20 January arrival of MAGA Man) and reversals can be swift and very significant.
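For readers who want to compute the two diagnostics themselves, here is a minimal sketch in plain Python. It assumes that MAD means the average absolute difference between actual and predicted values, and it uses the standard Durbin-Watson statistic: the sum of squared successive differences of the residuals divided by the sum of squared residuals. The function names and the toy numbers are hypothetical.

```python
def mean_absolute_deviation(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def durbin_watson(residuals):
    """Durbin-Watson statistic: about 2 for independent residuals,
    below 2 for positive autocorrelation, above 2 for negative."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Toy values for illustration only.
print(mean_absolute_deviation([10, 12, 14], [11, 12, 12]))
print(durbin_watson([1, -1, 1, -1]))  # alternating signs: negative autocorrelation
print(durbin_watson([1, 1, 1, 1]))    # persistent sign: positive autocorrelation
```

The statistic is approximately 2 × (1 - r), where r is the lag-1 autocorrelation of the residuals, so a reading of 2.26 corresponds to r of roughly -0.13.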

Proceed to Part 2 for a brief description of the different forms of Decision Trees we have implemented.