
The modeling of financial instruments must take into account the unique statistical characteristics of financial markets data.
1. Financial markets data (“FMD”) are inherently ‘noisy’, and it is difficult to separate the noise from the true signal. We must smooth the data before using it as input to a model. The chart below shows how the ups and downs of the price of the iShares Silver Trust ETF (SLV) are smoothed by a technique called Holt-Winters No-Trend Smoothing. The red line is the actual data; the blue line is the smoothed series. There are many more sophisticated ways of ‘de-noising’, and the models on this website do apply them.
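The no-trend variant of Holt-Winters reduces to simple exponential smoothing: each smoothed value is a weighted blend of today's observation and yesterday's smoothed value. A minimal sketch in pure NumPy, on a synthetic noisy price path (the real SLV series is not reproduced here):

```python
import numpy as np

def exp_smooth(x, alpha=0.2):
    """Simple exponential smoothing (Holt-Winters with no trend or seasonal term)."""
    s = np.empty(len(x), dtype=float)
    s[0] = x[0]
    for t in range(1, len(x)):
        # New smoothed value: alpha weight on today's price, the rest on yesterday's estimate.
        s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    return s

rng = np.random.default_rng(0)
prices = 20 + np.cumsum(rng.normal(0, 0.3, 250))  # synthetic noisy "price" path
smoothed = exp_smooth(prices)

# The day-to-day changes of the smoothed series are much smaller than the raw ones.
print(np.std(np.diff(prices)), np.std(np.diff(smoothed)))
```

A smaller `alpha` smooths more aggressively but lags the true signal more; the 0.2 used here is an illustrative choice, not a recommended setting.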

2. FMD almost never follow a Normal Distribution. A Normal Distribution is the familiar bell-shaped curve commonly used in traditional statistical analysis to determine the probability that a variable takes a specific value. FMD distributions have irregular shapes: they may have more than one peak, be skewed to the right or left, and have long fat tails.
The graphic below illustrates a Probability Density Function (PDF) for a financial instrument that displays long fat tails, skewness, and multiple peaks.
Long Fat Tails: Indicating higher probabilities of extreme values compared to a normal distribution.
Skewness: Here, the distribution is shifted towards the right side, suggesting asymmetry.
Multiple Peaks: Reflecting the presence of more than one dominant behavior or market regime.
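These departures from normality can be measured directly. A short sketch using SciPy on synthetic fat-tailed returns (drawn from a Student-t distribution as a stand-in for real ETF returns, which are not supplied here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Student-t returns with 3 degrees of freedom: heavy-tailed by construction.
returns = stats.t.rvs(df=3, size=2000, random_state=rng) * 0.01

print("skewness:", stats.skew(returns))
print("excess kurtosis:", stats.kurtosis(returns))      # 0 for a normal distribution
print("normality p-value:", stats.normaltest(returns).pvalue)
```

A large positive excess kurtosis signals fat tails, non-zero skewness signals asymmetry, and a tiny p-value from the normality test rejects the bell-curve assumption outright.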

Because of these characteristics, our models must ‘fit’ the data to a class of PDF that best approximates it, without underfitting or overfitting. There are statistical methods for fitting the right PDF. The chart below shows how the iShares Silver Trust price data is fitted to a new type of PDF called a MetaLog PDF (orange), which approximates it better than a classical Beta General PDF (dark brown).
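The fit-and-score workflow can be sketched with SciPy. SciPy has no built-in MetaLog distribution, so this sketch only shows the generic step: fit a candidate PDF by maximum likelihood, then score how well it matches the data. Here a normal distribution is fitted to synthetic bimodal data, and the Kolmogorov-Smirnov statistic quantifies how poorly a single-peaked PDF captures a two-peaked sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Synthetic bimodal "price" sample standing in for the SLV data.
data = np.concatenate([rng.normal(18, 1.0, 500), rng.normal(24, 1.5, 500)])

# Fit a candidate PDF by maximum likelihood...
params = stats.norm.fit(data)
# ...then score the fit with the Kolmogorov-Smirnov statistic (smaller = better).
ks = stats.kstest(data, "norm", args=params).statistic
print("KS statistic for fitted normal:", ks)
```

A flexible family such as the MetaLog (available in specialized packages) would be scored the same way and, on multi-peaked data like this, would achieve a markedly lower KS statistic.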

3. The inter-relationships between the variables in a financial instrument are complex, non-linear, and dynamic (continuously evolving). For example, given the Open (O), High (H), Low (L), and Close (C) prices of an ETF, a change in O does not lead to a proportionate change in H, L, or C. This can be seen in the paired scatterplot below. If the inter-relationships between O, H, L, and C were linear, every scatter plot would be a 45-degree line rising from the bottom-left corner to the top-right corner. As you can see, each pair of variables has a differently shaped scatter plot.
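A paired scatterplot like the one described can be produced in a few lines with pandas. The OHLC data here is synthetic (generated to mimic the high-but-imperfect correlation structure of real prices), since the SLV series itself is not reproduced:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
close = 20 + np.cumsum(rng.normal(0, 0.3, 250))
open_ = np.roll(close, 1) + rng.normal(0, 0.1, 250)   # today's open near yesterday's close
open_[0] = close[0]
high = np.maximum(open_, close) + np.abs(rng.normal(0, 0.15, 250))
low = np.minimum(open_, close) - np.abs(rng.normal(0, 0.15, 250))
df = pd.DataFrame({"Open": open_, "High": high, "Low": low, "Close": close})

# Highly correlated, but no pair sits exactly on a 45-degree line.
corr = df.corr()
print(corr.round(3))
# pd.plotting.scatter_matrix(df)  # draws the paired scatterplot described above
```

The off-diagonal correlations are high yet strictly below 1, which is exactly why each panel of the scatter matrix has its own shape rather than a perfect diagonal line.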

4. A related issue to noise in data is the need to extract the main features of the data. Let’s first explain what the ‘dimensions’ of a time-series data set are. If you have 1 column, the data is 1-dimensional; 2 columns make it 2-dimensional; 3 columns make it 3-dimensional, and so on. We can only visualize 3 dimensions, but in mathematics the number of possible dimensions is unlimited. With many dimensions, the noise in the data increases and the model cannot tell which features are important. We can reduce the dimension and extract the important features with many techniques, but here we show the simplest: Principal Components Analysis (PCA). Using the same SLV ETF Open, High, Low, and Close data as above (which is 4 ‘dimensions’), we apply PCA. The table below shows that you actually need only one principal component for your modeling: Component 1 (PC1) accounts for 99.7% of the variance in the data. That is, of course, because O, H, L, and C are very highly correlated. With dimension reduction the model becomes more efficient.
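PCA on four highly correlated columns can be sketched with a singular value decomposition in plain NumPy. The OHLC matrix below is synthetic, built so the columns are nearly collinear, as in the SLV example; the exact 99.7% figure belongs to the real data and will differ here:

```python
import numpy as np

rng = np.random.default_rng(4)
close = 20 + np.cumsum(rng.normal(0, 0.3, 250))
# Four nearly collinear columns standing in for Open, High, Low, Close.
X = np.column_stack([
    close + rng.normal(0, 0.1, 250),            # Open
    close + np.abs(rng.normal(0, 0.15, 250)),   # High
    close - np.abs(rng.normal(0, 0.15, 250)),   # Low
    close,                                      # Close
])

# PCA via SVD: center the data, then the squared singular values give
# each principal component's share of the total variance.
Xc = X - X.mean(axis=0)
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
var_ratio = s**2 / (s**2).sum()
print("explained variance ratios:", var_ratio.round(4))
```

When the columns move together this closely, the first ratio dominates, so a model can keep PC1 alone and discard the remaining three dimensions with almost no loss of information.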

Conclusion
The unique statistical characteristics of FMD mean that we must use tools that go beyond classical statistical theory to build our models. Fortunately, with the advent of new algorithms in Artificial Intelligence (AI), new tools in Econometrics and non-linear Statistics, and the increased computing power of ordinary computers, we are able to do that. Thus, the models on this website combine AI, Econometrics, and Statistical techniques.