Stationary Timeseries

In much of the analysis, we transform the data to make it stationary. The transformations include:

Box-Cox (or Yeo-Johnson) transformation
Difference

Stationary time series

Stationarity means that a dataset's statistical properties, such as its mean and variance, remain consistent over time (Reference). By converting time series data to a stationary form, we make the data more predictable and easier to analyze, as it highlights only the meaningful changes we want to focus on.

Wide-sense stationarity refers to a process whose mean is constant, whose variance is constant, and whose covariance depends only on the time lag between observations. For example, $ y = 2t $ is not stationary because its value, and therefore its mean, increases over time.
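The three wide-sense conditions can be written compactly as follows. The symbols $\mu$, $\sigma^2$, and $\gamma(h)$ are notation introduced here for the constant mean, the constant variance, and the covariance at lag $h$; they do not appear elsewhere in this section.

$$
\mathbb{E}[y_t] = \mu, \qquad \operatorname{Var}(y_t) = \sigma^2, \qquad \operatorname{Cov}(y_t, y_{t+h}) = \gamma(h) \quad \text{for all } t \text{ and } h.
$$

The key point is that none of the right-hand sides depends on $t$.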

$ y = t\sin(t) $ isn't stationary because the size of its oscillations, and hence its variance, increases over time.
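As a rough numerical check of these two examples, the sketch below compares the first and second half of each series. The sampling grid (step 0.1, 2000 points) and the split into halves are arbitrary choices made for illustration, not part of the original analysis.

```python
import math
import statistics

# Sample both series on the same time grid t = 0.0, 0.1, ..., 199.9.
t = [0.1 * k for k in range(2000)]
linear = [2 * x for x in t]               # y = 2t
growing = [x * math.sin(x) for x in t]    # y = t sin(t)

def halves(values):
    """Split a series into its first and second half."""
    mid = len(values) // 2
    return values[:mid], values[mid:]

# A stationary series would have a similar mean in both halves.
early, late = halves(linear)
mean_early, mean_late = statistics.fmean(early), statistics.fmean(late)

# A stationary series would have a similar variance in both halves.
early, late = halves(growing)
var_early, var_late = statistics.pvariance(early), statistics.pvariance(late)

print(f"y = 2t        mean:     {mean_early:.1f} -> {mean_late:.1f}")
print(f"y = t sin(t)  variance: {var_early:.0f} -> {var_late:.0f}")
```

Both statistics come out clearly larger in the second half, which is the behavior a stationary series should not show.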

Covariance measures the relationship between the values at two different time points. If two values at different times increase together, they have a large positive covariance. If one value increases while the other decreases, the covariance is negative. For example, $y=\sin(t)$ has covariance that depends only on the time lag: the covariance between $ t = 2 $ and $ t = 8 $ equals the covariance between $ t = 3 $ and $ t = 9 $ because the time lag is the same, namely 6.

However, the covariance of $y=t\sin(t)$ depends on the specific points in time being compared, not just on the lag between them, because the amplitude keeps growing over time.
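The lag argument can be checked numerically. A covariance needs some source of randomness, so this sketch assumes the sine wave has a random phase offset `phi` (an assumption added here, not stated in the text) and averages over evenly spaced phases. The function names are hypothetical.

```python
import math

# Assumption: phases are spread evenly over one full cycle, standing in for
# a uniformly random phase so that the averages below are well defined.
PHASES = [2 * math.pi * k / 360 for k in range(360)]

def covariance(f, s, u):
    """Covariance between f(s, phi) and f(u, phi), averaged over the phases."""
    a = [f(s, phi) for phi in PHASES]
    b = [f(u, phi) for phi in PHASES]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

def sine(t, phi):
    """y = sin(t + phi): constant amplitude."""
    return math.sin(t + phi)

def growing_sine(t, phi):
    """y = t sin(t + phi): amplitude grows with t."""
    return t * math.sin(t + phi)

# Same lag (6) at two different starting points.
print("sin(t):   ", covariance(sine, 2, 8), covariance(sine, 3, 9))
print("t sin(t): ", covariance(growing_sine, 2, 8), covariance(growing_sine, 3, 9))
```

Under this assumption the two covariances for the plain sine agree, while those for the growing sine differ even though the lag is the same.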

Box-Cox transformation

The Box-Cox transformation is a technique used to stabilize the variance of a dataset. It generalizes the log transformation.
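For reference, the standard one-parameter form of the transformation is shown below, where $\lambda$ is the tuning parameter and the data $y$ must be strictly positive. Setting $\lambda = 0$ recovers the plain log transformation, which is the sense in which Box-Cox generalizes it.

$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\log y, & \lambda = 0
\end{cases}
\qquad (y > 0)
$$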

Stabilizing the variance means that the spread of the data becomes more consistent across different levels of the data. In particular, Box-Cox tends to make the data more symmetric around the mean and closer to normally distributed.

Consider the non-stationary data $y=t\sin(t)+3t$. The Box-Cox transformation produces output with attenuated variance (that is, smaller up-and-down swings).
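A minimal sketch of this effect, using only the standard library. The choice $\lambda = 0$ (the log case), the sampling grid, and the `swing` helper are illustrative assumptions. In practice $\lambda$ is usually estimated from the data, for example by `scipy.stats.boxcox`.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of one strictly positive value for a fixed lambda."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

def swing(values):
    """Peak-to-trough range of a window of values."""
    return max(values) - min(values)

# y = t sin(t) + 3t sampled at t = 10.0, 10.1, ..., 209.9 (positive throughout).
t = [10 + 0.1 * k for k in range(2000)]
y = [x * math.sin(x) + 3 * x for x in t]
z = [box_cox(v, 0.0) for v in y]   # lambda = 0 is an assumed, illustrative choice

# Compare one cycle (about 2*pi, i.e. 63 samples) near t = 10 and near t = 200.
early, late = slice(0, 63), slice(1900, 1963)
raw_ratio = swing(y[late]) / swing(y[early])
bc_ratio = swing(z[late]) / swing(z[early])

print(f"raw swing grows by a factor of {raw_ratio:.1f}")
print(f"Box-Cox swing changes by a factor of {bc_ratio:.2f}")
```

The raw swing per cycle grows several-fold between the two windows, while the transformed swing stays at a similar size.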

Yeo-Johnson transformation

The Yeo-Johnson transformation is yet another generalization of the Box-Cox transformation. Unlike Box-Cox, which requires strictly positive data, Yeo-Johnson can transform both positive and negative values. This makes it preferable when the data are not all positive, e.g., the daily change in the bitcoin price can be -1.5% relative to the previous day. Since it is a generalization, it essentially does the same thing as Box-Cox, just more flexibly.
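The sketch below implements the standard Yeo-Johnson formula for a fixed $\lambda$ to show how negative values are handled. The value $\lambda = 0.5$ and the sample percentage changes are made up for illustration. A library routine such as `scipy.stats.yeojohnson` would normally estimate $\lambda$ from the data.

```python
import math

def yeo_johnson(y, lam):
    """Yeo-Johnson transform of one value (any sign) for a fixed lambda."""
    if y >= 0:
        # Non-negative branch: Box-Cox applied to y + 1.
        if lam == 0:
            return math.log(y + 1)
        return ((y + 1) ** lam - 1) / lam
    # Negative branch: mirrored Box-Cox with parameter 2 - lambda.
    if lam == 2:
        return -math.log(-y + 1)
    return -((-y + 1) ** (2 - lam) - 1) / (2 - lam)

# Hypothetical daily percentage changes, including a negative one.
changes = [2.0, -1.5, 0.0, 0.7]
print([round(yeo_johnson(c, 0.5), 3) for c in changes])
```

Negative inputs stay negative and zero maps to zero, so the ordering and sign of the data are preserved.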

Difference

Differencing is a technique to make data stationary by subtracting the previous value from the current value: $y_{t} - y_{t-1}$. This removes the trend from the data, so the differenced series fluctuates around a constant mean value.
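As a small illustration, differencing a purely linear trend leaves a constant series. The helper name `difference` and the example trend $3t + 5$ are chosen here for illustration.

```python
def difference(values, lag=1):
    """Return y_t - y_{t-lag} for every t that has a value lag steps earlier."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]

trend = [3 * t + 5 for t in range(6)]   # 5, 8, 11, 14, 17, 20
print(difference(trend))                 # [3, 3, 3, 3, 3]
```

Note that the differenced series is one element shorter than the input, since the first value has no predecessor.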

By the definition of stationarity, seasonal data is non-stationary. For example, GDP, sales of air conditioners, and temperature are all seasonal because their values depend on the time of year. A common technique to resolve this is seasonal differencing, which for quarterly data is calculated simply as $y_{t} - y_{t-4}$, i.e., subtracting the previous year's summer value from this year's summer value.
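A sketch of seasonal differencing on made-up quarterly data. The seasonal pattern, the growth rate, and the function name are all hypothetical. The data combine a repeating four-quarter pattern with steady growth.

```python
def seasonal_difference(values, period=4):
    """Return y_t - y_{t-period}, comparing each value with the same season one cycle earlier."""
    return [values[i] - values[i - period] for i in range(period, len(values))]

# Hypothetical quarterly sales: a repeating seasonal pattern plus 2 units of growth per quarter.
pattern = [10, 30, 50, 20]
sales = [2 * t + pattern[t % 4] for t in range(12)]

print(seasonal_difference(sales, period=4))   # [8, 8, 8, 8, 8, 8, 8, 8]
print(seasonal_difference(sales, period=1))   # ordinary differencing keeps the seasonal swings
```

With a lag of 4 the seasonal pattern cancels and only the yearly growth remains, whereas a lag of 1 still shows the within-year ups and downs.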

Another cause of non-stationarity, especially in data like bitcoin prices, is the presence of trends. For example, the bitcoin price may increase over time, making the mean non-constant. To make it stationary, we take the difference of the data, i.e., $y_{t} - y_{t-1}$. Doing so lets us concentrate on changes in the data rather than the absolute values.

Take $y=t\sin(t)+3t$ as an example. Due to the $3t$ term, it has an increasing trend. By applying differencing after a Box-Cox transformation, the data can be made stationary. In the figure below, we zoom into the range [0, 5] to show the small values after differencing.
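The two-step pipeline can be sketched as follows. This is not the code behind the figure: it assumes $\lambda = 0$ (a plain log) for the Box-Cox step and an arbitrary sampling grid, and it compares summary statistics of an early and a late window rather than plotting.

```python
import math
import statistics

# y = t sin(t) + 3t sampled at t = 10.0, 10.1, ..., 209.9 (positive throughout).
t = [10 + 0.1 * k for k in range(2000)]
y = [x * math.sin(x) + 3 * x for x in t]

def diff(values):
    """First difference: y_t - y_{t-1}."""
    return [b - a for a, b in zip(values, values[1:])]

def summarize(values, window=500):
    """Mean and standard deviation of the first and last `window` points."""
    first, last = values[:window], values[-window:]
    return (statistics.fmean(first), statistics.pstdev(first),
            statistics.fmean(last), statistics.pstdev(last))

raw_diff = diff(y)                          # differencing only
bc_diff = diff([math.log(v) for v in y])    # Box-Cox with lambda = 0 (assumed), then differencing

for name, series in [("diff only", raw_diff), ("Box-Cox + diff", bc_diff)]:
    m1, s1, m2, s2 = summarize(series)
    print(f"{name:15s} early mean={m1:.4f} std={s1:.4f} | late mean={m2:.4f} std={s2:.4f}")
```

With differencing alone the spread still grows over time. After the log step and differencing, both windows have a mean near zero and a similar spread, which is consistent with the stationarity conditions described earlier.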

Stationary bitcoin data

See the bitcoin data section for an example conversion.