## Data Preprocessing

Throughout the analysis, we apply the following time series transformations unless otherwise stated:

- Box-Cox transformation
- First-order differencing

### Why stationary?

Many time series methods assume that the data is *stationary*. Strict stationarity can be stated mathematically as follows:
$$ F_X(x_{t_1+\tau}, \cdots, x_{t_n+\tau}) = F_X(x_{t_1}, \cdots, x_{t_n}) $$ for all $\tau$, $t_1$, $t_2$, $\cdots$, $t_n$, and $n$.

Put another way, "A stationary time series is one whose properties do not depend on the time at which the series is observed." (source) For example, white noise is stationary.
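As a rough illustration of the white noise example, the sketch below simulates Gaussian white noise and compares summary statistics over an early and a late window. The seed, sample size, and window split are arbitrary choices made for this illustration, and agreement of two windows is only a sanity check, not a proof of stationarity.

```python
import numpy as np

# Simulate Gaussian white noise: independent draws with mean 0 and variance 1.
rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for reproducibility
noise = rng.normal(loc=0.0, scale=1.0, size=10_000)

# Split the series into an early and a late window.
early, late = noise[:5_000], noise[5_000:]

# For a stationary series, both windows should give similar statistics,
# up to sampling error.
print(f"mean: early={early.mean():.3f}  late={late.mean():.3f}")
print(f"var : early={early.var():.3f}  late={late.var():.3f}")
```

If the series were not stationary (for example, with a trend added), the two windows would report clearly different means.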

Wide-sense (weak) stationarity requires a constant mean, a constant variance, and an autocovariance that depends only on the time difference. For example, $ y = 2t $ is not stationary because its mean keeps increasing with time.

$ y = t\sin(t) $ isn't stationary either, as its variance (the amount it swings up and down) grows as time goes on.
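The two counterexamples can be checked numerically. The sketch below splits each series into consecutive windows and computes the mean and variance per window; the time grid, the number of windows, and the helper name `window_stats` are assumptions made for this illustration.

```python
import numpy as np

t = np.arange(0.0, 100.0, 0.01)
trend = 2.0 * t          # y = 2t
growing = t * np.sin(t)  # y = t sin(t)

def window_stats(y, n_windows=4):
    """Return (means, variances) over consecutive, equal-length windows."""
    windows = np.array_split(y, n_windows)
    means = np.array([w.mean() for w in windows])
    variances = np.array([w.var() for w in windows])
    return means, variances

trend_means, _ = window_stats(trend)
_, growing_vars = window_stats(growing)

# The window means of 2t keep rising, and so do the window variances of t sin(t).
print("window means of 2t:", np.round(trend_means, 1))
print("window variances of t sin(t):", np.round(growing_vars, 1))
```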

Autocovariance measures the linear relationship between the values of the series at two time points. Roughly, it captures how much the two values move together in magnitude and direction. For example, $y=\sin(t)$ is cyclic, so its covariance depends only on the time lag between the two points. However, for $ y = t\sin(t) $ shown above, it also depends on time itself, as the values get larger as time goes on.
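To make the lag-versus-time distinction concrete, the sketch below estimates the covariance at a fixed lag in an early and a late window. The helper `lagged_cov`, the lag of one time unit, and the window positions are assumptions made for this illustration. The series are deterministic, so these are window averages rather than true expectations.

```python
import numpy as np

def lagged_cov(y, start, length, lag):
    """Sample covariance between y[i] and y[i + lag] for i in one window."""
    a = y[start:start + length]
    b = y[start + lag:start + lag + length]
    return np.mean((a - a.mean()) * (b - b.mean()))

step = 0.01
t = np.arange(0.0, 100.0, step)
lag = int(round(1.0 / step))         # lag of one time unit
length = int(round(40.0 / step))     # each window spans 40 time units
early_start = 0
late_start = int(round(50.0 / step))

sine = np.sin(t)
growing = t * np.sin(t)

sin_early = lagged_cov(sine, early_start, length, lag)
sin_late = lagged_cov(sine, late_start, length, lag)
ts_early = lagged_cov(growing, early_start, length, lag)
ts_late = lagged_cov(growing, late_start, length, lag)

# sin(t): nearly the same value in both windows (depends on the lag only).
print(f"sin(t)    early={sin_early:.3f}  late={sin_late:.3f}")
# t sin(t): the late window gives a much larger value (depends on time too).
print(f"t sin(t)  early={ts_early:.1f}  late={ts_late:.1f}")
```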

### What is the Box-Cox transformation?

The Box-Cox transformation is a power transform that stabilizes the variance of the data and makes its distribution closer to normal (i.e. bell-shaped). Its parameter $\lambda$ is typically chosen automatically from the data. And why normal? Because that's what many statistical tests assume. See below for the normal distribution.

Consider the non-stationary data $y=t\sin(t)+t$. The Box-Cox transformation produces the following.
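For reference, the Box-Cox transform of a positive value $y$ is $(y^{\lambda} - 1)/\lambda$ for $\lambda \neq 0$ and $\log y$ for $\lambda = 0$. The sketch below is a minimal NumPy version, not necessarily the implementation used for this analysis. It picks $\lambda$ by a coarse grid search over the profile log-likelihood. The function names, the grid, and the `+ 1` shift (needed because $t\sin(t)+t$ touches zero and Box-Cox requires strictly positive input) are assumptions made here. SciPy's `scipy.stats.boxcox` offers a ready-made alternative.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform of strictly positive data for a given lambda."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lam) < 1e-8:
        return np.log(y)  # limiting case lambda -> 0
    return (y ** lam - 1.0) / lam

def box_cox_loglik(y, lam):
    """Profile log-likelihood of lambda under a normal model (up to a constant)."""
    y = np.asarray(y, dtype=float)
    z = box_cox(y, lam)
    return -0.5 * len(y) * np.log(z.var()) + (lam - 1.0) * np.log(y).sum()

def fit_box_cox(y, grid=None):
    """Pick lambda by a coarse grid search; return (transformed, lambda)."""
    if grid is None:
        grid = np.linspace(-2.0, 2.0, 81)  # hypothetical search range
    best = max(grid, key=lambda lam: box_cox_loglik(y, lam))
    return box_cox(y, best), float(best)

t = np.arange(0.0, 100.0, 0.1)
y = t * np.sin(t) + t + 1.0  # +1 shift (an assumption) keeps every value positive
z, lam = fit_box_cox(y)
print("chosen lambda:", lam)
```

A grid search keeps the sketch dependency-free. A numerical optimizer would give a finer estimate of $\lambda$.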

## Difference

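By the definition of stationarity, data with a trend or seasonality is non-stationary. For example, GDP, sales of air conditioners, and temperature all show such patterns. A common technique for removing this time dependence is differencing. The first-order difference is calculated simply as $y_{t} - y_{t-1}$; it mainly removes trend, while seasonality is usually removed by differencing at a lag equal to the seasonal period.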
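The differencing step can be sketched with NumPy as follows. The toy series (a linear trend plus a pattern repeating every 4 steps) is hypothetical, and the seasonal difference at lag 4 is shown as an assumed extension alongside the first-order difference $y_{t} - y_{t-1}$.

```python
import numpy as np

t = np.arange(48)                             # 48 observations (hypothetical)
trend = 0.5 * t                               # linear trend
season = np.tile([0.0, 3.0, 1.0, -4.0], 12)   # pattern repeating every 4 steps
y = trend + season

# First-order difference: y[t] - y[t-1]. It turns the linear trend into a
# constant, but the repeating pattern is still visible in the result.
first_diff = np.diff(y)

# Seasonal difference at lag 4: y[t] - y[t-4]. The repeating pattern cancels
# and only the trend's change over 4 steps (0.5 * 4 = 2.0) remains.
seasonal_diff = y[4:] - y[:-4]

print("first difference   :", first_diff[:8])
print("seasonal difference:", seasonal_diff[:8])
```

Note that each differencing pass shortens the series: `first_diff` has one fewer element than `y`, and `seasonal_diff` has four fewer.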

Take $y=t\sin(t)+t$ as an example. If we take the difference after the Box-Cox transformation, we get data that looks much closer to having a constant mean, constant variance, and constant covariance (that is, more like white noise).

## Stationary bitcoin data

Therefore, we apply the Box-Cox transformation and first-order differencing to the bitcoin data to make it stationary. Below is the result.
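The two steps can be combined into a single helper, sketched below under stated assumptions: the function names are hypothetical, $\lambda$ is passed in by hand instead of being estimated, and the price series is a synthetic stand-in (a geometric random walk), not the actual bitcoin data. With $\lambda = 0$ the Box-Cox step reduces to a log, so the output is the series of log returns.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform of strictly positive data for a given lambda."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lam) < 1e-8:
        return np.log(y)  # limiting case lambda -> 0
    return (y ** lam - 1.0) / lam

def to_stationary(prices, lam):
    """Box-Cox transform followed by a first-order difference."""
    return np.diff(box_cox(prices, lam))

# Synthetic stand-in for a price series (NOT real bitcoin data):
# a geometric random walk built from i.i.d. normal log returns.
rng = np.random.default_rng(seed=0)
log_returns = rng.normal(loc=0.0, scale=0.02, size=2_000)
prices = 100.0 * np.exp(np.cumsum(log_returns))

stationary = to_stationary(prices, lam=0.0)

# Compare the spread of the first and second half as a rough sanity check.
half = len(stationary) // 2
print(f"std: first half={stationary[:half].std():.4f}  "
      f"second half={stationary[half:].std():.4f}")
```

On real data, $\lambda$ would normally be estimated from the series, and a formal stationarity test would be a more reliable check than comparing two halves.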