
Schematic of a single Tree as generated by our modelling
A Decision Tree is a machine-learning model trained by Supervised Learning. In Supervised Learning a model is trained on correctly labelled data: the algorithm compares its output with the known correct output and adjusts accordingly, progressively reducing its errors. Once it has learned from the data, the tree is given new, unseen data to solve a problem.
Decision Trees can solve both classification and regression problems. On this website we use them for regression, i.e. to predict future values of a time series.
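To make the regression setting concrete, here is a minimal sketch, assuming scikit-learn and a synthetic placeholder series rather than our actual data: the series is turned into lagged features and a single decision tree is fitted to predict the next value.

```python
# A minimal sketch (not the site's actual pipeline): fitting a decision tree
# regressor on lagged values of a univariate time series with scikit-learn.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=300))   # placeholder series standing in for real price data

n_lags = 5
X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
y = series[n_lags:]                        # target: the next value after each lag window

tree = DecisionTreeRegressor(max_depth=4, random_state=0)
tree.fit(X[:-50], y[:-50])                 # train on all but the last 50 points

preds = tree.predict(X[-50:])              # one-step-ahead predictions on the held-out tail
print(preds[:5])
```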
Decision Trees are non-parametric, meaning there is no need to make assumptions about the data or the shape of its distribution. Handling missing data is also not a problem for a Decision Tree.
Like a real tree, it has a root node, branch (internal) nodes, and leaf (terminal) nodes. Like a gardener, you can prune your Decision Tree. Pruning helps eliminate unnecessary splits that the model might otherwise learn.
A small tree with few branches and leaf nodes can underfit the data and miss the relevant patterns. A large tree, on the other hand, can overfit. Most software lets the user strike a balance between underfitting and overfitting by controlling the size of the tree and by choosing the criterion used to judge the fit, as sketched below.
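As a rough illustration of that balance, the following sketch (the parameter values are illustrative, not the ones used in our models) shows the two usual scikit-learn levers: pre-pruning limits such as max_depth and min_samples_leaf, and cost-complexity post-pruning via ccp_alpha.

```python
# A minimal sketch of balancing tree size in scikit-learn.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)   # toy non-linear target

# Pre-pruning: cap depth and require a minimum number of samples per leaf.
small_tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=20).fit(X, y)

# Cost-complexity (post-)pruning: a larger ccp_alpha removes more branches.
pruned_tree = DecisionTreeRegressor(ccp_alpha=0.01).fit(X, y)

print(small_tree.get_depth(), pruned_tree.get_depth())
```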
Thanks to their structure and learning algorithm, Decision Trees can easily handle non-linear problems, which is a major advantage when modelling financial markets data.
Unlike Neural Networks, a Decision Tree is not a black box: its output can be displayed like a flowchart and interpreted, as sketched below.
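For example, scikit-learn can render a fitted tree either as text rules or as a graphical flowchart; the feature names below are hypothetical.

```python
# A minimal sketch of inspecting a fitted tree as text rules or a flowchart.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor, export_text, plot_tree

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X[:, 0] + 0.5 * X[:, 1] ** 2

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)

# Text rules: each line is a split condition, leaves show the predicted value.
print(export_text(tree, feature_names=["lag_1", "lag_2", "lag_3"]))

# Graphical flowchart of the same tree.
plot_tree(tree, feature_names=["lag_1", "lag_2", "lag_3"], filled=True)
plt.show()
```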
Variations of Decision Tree Modelling
Diagram of a single Tree and a Random Forest (source: www.towardsdatascience.com)

Model performance of variations of Decision Tree Modelling

Single Decision Tree: a basic tree structure with labelled nodes, branches, and leaf outputs. It gives a single output, and the chances of overfitting are higher. The table above ranks model performance, with the best performance at the top. Our chosen criterion for measuring model performance is MAD (Median Absolute Deviation). You can see that a single decision tree has the worst model performance, and in this instance a Boosted Ensemble of Trees performs best.
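As a sketch of how such a ranking could be produced, assuming MAD is computed as the median of the absolute forecast errors on held-out data (an assumption about the table's metric, with hypothetical numbers):

```python
# A minimal sketch of ranking models by MAD, read here as the median of the
# absolute forecast errors; not a description of the site's exact calculation.
import numpy as np

def mad(y_true, y_pred):
    """Median absolute deviation of the forecast errors."""
    return float(np.median(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Hypothetical held-out forecasts from two models.
actual = [101.2, 102.5, 101.9, 103.1]
single_tree = [100.0, 103.9, 101.0, 104.5]
boosted = [101.0, 102.8, 101.7, 103.4]

scores = {"Single Decision Tree": mad(actual, single_tree),
          "Boosted Ensemble": mad(actual, boosted)}
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: MAD = {score:.3f}")   # lower MAD ranks higher
```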
Random Forest: Random Forests are collections of randomised decision trees. They reduce the chances of overfitting by randomly selecting variables from the dataset to build a forest of many trees with different combinations of the variables. For a time series, each tree provides a prediction and, because the target is numeric, the final forecast is typically the average of all the trees (a majority vote is used for classification). The Random Forest diagram shows a forest of many species of tree, each delineated by a different colour.
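A minimal sketch of this with scikit-learn's RandomForestRegressor, again on a synthetic placeholder series: max_features controls the random subset of variables considered at each split, and the forest's prediction is the average over its trees.

```python
# A minimal sketch of a random forest forecast with scikit-learn.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
series = np.cumsum(rng.normal(size=300))
n_lags = 5
X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
y = series[n_lags:]

forest = RandomForestRegressor(
    n_estimators=200,      # number of trees in the forest
    max_features="sqrt",   # each split considers a random subset of features
    random_state=0,
).fit(X[:-50], y[:-50])

print(forest.predict(X[-50:])[:5])   # averaged one-step-ahead predictions
```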
Random Trees: Random Trees involve creating multiple decision trees from subsets of the training data, each tree contributing to the final prediction. Each tree uses a random selection of features, and the trees operate independently of each other. Unlike Random Forests, which systematically combine predictions, Random Trees may not involve a structured aggregation process. If a single output is required, they can use an averaging or majority-vote scheme like a Random Forest, but because the trees were constructed differently, the results will differ.
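Since this is not a single standard library estimator, the sketch below is our own hand-rolled illustration of the idea: several independent trees, each trained on a random subset of the features, with a simple average applied only at the end if one output is needed.

```python
# A hand-rolled illustration of independently built trees on random feature
# subsets; not a standard library estimator called "Random Trees".
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 8))
y = X[:, 0] - X[:, 3] ** 2 + rng.normal(scale=0.1, size=200)

trees, feature_sets = [], []
for seed in range(10):
    cols = rng.choice(X.shape[1], size=4, replace=False)   # random feature subset
    trees.append(DecisionTreeRegressor(random_state=seed).fit(X[:, cols], y))
    feature_sets.append(cols)

# Each tree predicts independently; averaging is optional.
per_tree = np.array([t.predict(X[:5, cols]) for t, cols in zip(trees, feature_sets)])
print(per_tree.mean(axis=0))
```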
Boosted Ensemble of Trees: Boosting is a powerful ensemble technique in statistical analysis and machine learning designed to improve the accuracy of predictive models. The primary goal of boosting is to convert a collection of weak learners (models that perform only slightly better than random guessing) into a strong learner (a model with high predictive accuracy). This is achieved by sequentially training models, where each new model focuses on correcting the errors of its predecessors.
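A minimal sketch using scikit-learn's gradient boosting (one common boosting implementation; the hyperparameters are illustrative): each new shallow tree is fitted to correct the residual errors of the ensemble built so far.

```python
# A minimal sketch of boosting with scikit-learn's gradient boosting.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(5)
series = np.cumsum(rng.normal(size=300))
n_lags = 5
X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
y = series[n_lags:]

boosted = GradientBoostingRegressor(
    n_estimators=300,    # weak learners added one after another
    learning_rate=0.05,  # how strongly each new tree corrects its predecessors
    max_depth=2,         # each learner is deliberately shallow ("weak")
).fit(X[:-50], y[:-50])

print(boosted.predict(X[-50:])[:5])
```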
Bagging Ensemble of Trees: Bagging (bootstrap aggregating) involves creating multiple subsets of the original training dataset by randomly sampling with replacement. This means some data points may appear more than once in a subset, while others may not appear at all. Each bootstrap subset is used to train a separate model. Once all models are trained, bagging combines their predictions, and an average is taken to form the final output.
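A minimal sketch with scikit-learn's BaggingRegressor, whose default base learner is a decision tree: each tree sees a bootstrap sample of the training rows and the final output is the average of the trees' predictions.

```python
# A minimal sketch of bagging with scikit-learn.
import numpy as np
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(6)
series = np.cumsum(rng.normal(size=300))
n_lags = 5
X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
y = series[n_lags:]

bagged = BaggingRegressor(
    n_estimators=100,    # number of bootstrap samples / trees (default base learner is a decision tree)
    bootstrap=True,      # sample training rows with replacement
    random_state=0,
).fit(X[:-50], y[:-50])

print(bagged.predict(X[-50:])[:5])   # average of the per-tree predictions
```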