Data scaling is the process of transforming features to a common scale so that a model can learn patterns more effectively. Models that rely on distances (or similarities) between points, such as k-NN (k-Nearest Neighbors) and SVM (Support Vector Machine), as well as neural network models such as MLP and LSTM, require data scaling.
Let's consider k-NN, which finds the k nearest points to a given query point. If the data is not scaled, the distance is dominated by the feature with the larger scale. For example, if one feature is in the range [0, 1] and another is in the range [0, 1000], the distance is almost entirely determined by the second feature. This is why we need to scale the data to bring all features onto the same scale.
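Here is a minimal sketch of this effect with plain NumPy; the point values are made up for illustration:

```python
import numpy as np

# Two points that differ a lot on the small-scale feature (range [0, 1])
# and only slightly, relatively speaking, on the large-scale feature (range [0, 1000]).
a = np.array([0.0, 500.0])
b = np.array([1.0, 510.0])

# Unscaled Euclidean distance: dominated by the second feature.
print(np.linalg.norm(a - b))  # ~10.05; the 0-vs-1 difference barely registers

# After min-max scaling each feature to [0, 1], both features contribute fairly.
a_scaled = np.array([0.0, 0.50])
b_scaled = np.array([1.0, 0.51])
print(np.linalg.norm(a_scaled - b_scaled))  # ~1.00; now the first feature matters
```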
There are a couple of reasons why neural networks require scaling:

- Gradient descent converges faster when features are on similar scales; very different scales make the loss surface ill-conditioned, so the optimizer zig-zags.
- Large raw input values can saturate activation functions such as sigmoid or tanh, where the gradient is nearly zero, which slows or stalls learning (see the sketch after this list).
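A quick NumPy illustration of the saturation point; the input values are arbitrary:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# An unscaled feature value pushes the activation into its flat region.
print(sigmoid(500.0), sigmoid_grad(500.0))  # 1.0, ~0.0 -> vanishing gradient
# A scaled value stays in the responsive region.
print(sigmoid(0.5), sigmoid_grad(0.5))      # ~0.62, ~0.24
```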
There are a couple of scaling methods:

- Min-max scaling (normalization): x' = (x - min) / (max - min), which maps each feature to [0, 1].
- Standardization (z-score): x' = (x - mean) / std, which gives each feature zero mean and unit variance.

Both require statistics (min/max, or mean/std) computed over the data.
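As a sketch, both methods are available in scikit-learn; the toy array below is made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[0.0,  100.0],
              [0.5,  500.0],
              [1.0, 1000.0]])

# Min-max scaling: each column mapped to [0, 1].
print(MinMaxScaler().fit_transform(X))

# Standardization: each column transformed to zero mean and unit variance.
print(StandardScaler().fit_transform(X))
```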
As explained, these methods require statistics computed over the data. Therefore, to avoid data leakage, we perform scaling AFTER the data split: fit the scaler on the training set only, then apply it to the validation/test sets, so that we do not accidentally leak information from held-out data into training. See also feature generation and stationary time series.
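A minimal sketch of that pattern with scikit-learn; the synthetic data and the 80/20 split are assumptions for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data: two features on very different scales.
X = np.random.rand(100, 2) * [1.0, 1000.0]
y = np.random.randint(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit statistics on training data only
X_test_scaled = scaler.transform(X_test)        # reuse those statistics; never fit on test data
```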