Feature Scaling

When training machine learning algorithms, one technique that can speed up training is scaling your features.

In this post, we'll explore 3 feature-scaling methods that can be implemented in scikit-learn:

  1. StandardScaler
  2. MinMaxScaler
  3. RobustScaler

Import Dependencies (As Always)

import pandas as pd
import numpy as np
from sklearn import preprocessing
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
matplotlib.style.use('ggplot')

1. Standard Scaler

The StandardScaler works best when the data within each feature is roughly normally distributed. It scales each feature so that its distribution is:

  • centered at 0 (a mean of 0)
  • with a standard deviation of 1.

The mean and standard deviation are calculated separately for each feature, and each value of that feature is then scaled as:

$$\frac{x_i - \text{mean}(\boldsymbol{x})}{\text{stdev}(\boldsymbol{x})}$$
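To make the formula concrete, here is a small sketch that standardizes one feature by hand and compares the result with StandardScaler. The four values are made up purely for illustration. One detail worth knowing: StandardScaler uses the population standard deviation (ddof=0), which is also NumPy's default for np.std.

```python
import numpy as np
from sklearn import preprocessing

# Illustrative single-feature data (one column, four samples)
x = np.array([[2.0], [4.0], [6.0], [8.0]])

# Manual standardization: subtract the mean, divide by the (population) std
manual = (x - x.mean(axis=0)) / x.std(axis=0)

# The same thing with scikit-learn
scaler = preprocessing.StandardScaler()
sk = scaler.fit_transform(x)

# The learned statistics: mean_ is [5.] and scale_ is [sqrt(5)]
print(scaler.mean_, scaler.scale_)
print(np.allclose(manual, sk))  # True
```

The fitted mean_ and scale_ attributes are exactly the quantities in the formula, which is what lets the same scaler be reused on new data later.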

Let's start coding.

# Create data samples x1, x2, x3
np.random.seed(1)
df = pd.DataFrame({
    'x1': np.random.normal(0, 2, 10000),
    'x2': np.random.normal(5, 3, 10000),
    'x3': np.random.normal(-5, 5, 10000)
})
# Use StandardScaler
scaler = preprocessing.StandardScaler()
scaled_df = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_df, columns=['x1', 'x2', 'x3'])
# Plot and visualize
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(6, 5))

ax1.set_title('Before Scaling')
sns.kdeplot(df['x1'], ax=ax1)
sns.kdeplot(df['x2'], ax=ax1)
sns.kdeplot(df['x3'], ax=ax1)

ax2.set_title('After Standard Scaler')
sns.kdeplot(scaled_df['x1'], ax=ax2)
sns.kdeplot(scaled_df['x2'], ax=ax2)
sns.kdeplot(scaled_df['x3'], ax=ax2)

plt.show()

[Figure: density plots of x1, x2 and x3 before scaling (left) and after StandardScaler (right)]

As you can see, all three features are now centered at 0 with a standard deviation of 1, so they share the same scale.

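Tip: If you use this to scale your training data, make sure to reuse the mean and standard deviation learned from the training data when you scale your test set. In scikit-learn, that means calling fit_transform() on the training set and only transform() on the test set.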
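A minimal sketch of that workflow, assuming a random 80/20 split of synthetic data (the data and the split are illustrative, not from the examples above). The scaler is fitted on the training portion only, and the test portion is transformed with the training statistics.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

# Illustrative data: 1000 samples, 2 features
rng = np.random.default_rng(0)
X = rng.normal(loc=5, scale=3, size=(1000, 2))

X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

scaler = preprocessing.StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from the training set only
X_test_scaled = scaler.transform(X_test)        # reuse the training mean/std

# Equivalent manual computation: test data scaled with *training* statistics
expected_test = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
```

Calling fit_transform() on the test set instead would compute a new mean and standard deviation from the test data, so the two sets would no longer be scaled consistently.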


2. Min-Max Scaler

The MinMaxScaler is probably the best-known scaling method. It applies the following formula to each feature:

$$\frac{x_i - \text{min}(\boldsymbol{x})}{\text{max}(\boldsymbol{x}) - \text{min}(\boldsymbol{x})}$$

Basically, it rescales each feature to the range 0 to 1. This holds even when the feature contains negative values; to map to a different range such as -1 to 1, pass feature_range=(-1, 1) to MinMaxScaler.
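Applying the formula to a feature that contains negative values shows where the output actually lands. The values below are illustrative; the sketch compares the manual formula with MinMaxScaler's default and with feature_range=(-1, 1).

```python
import numpy as np
from sklearn import preprocessing

# Illustrative feature that contains negative values
x = np.array([[-10.0], [-5.0], [0.0], [10.0]])

# Manual min-max scaling: (x - min) / (max - min)
manual = (x - x.min()) / (x.max() - x.min())

default = preprocessing.MinMaxScaler().fit_transform(x)  # default feature_range=(0, 1)
symmetric = preprocessing.MinMaxScaler(feature_range=(-1, 1)).fit_transform(x)

print(default.ravel().tolist())    # [0.0, 0.25, 0.5, 1.0]
print(symmetric.ravel().tolist())  # [-1.0, -0.5, 0.0, 1.0]
```

With the default settings, negative inputs do not produce negative outputs: the minimum always maps to 0 and the maximum to 1.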

The MinMaxScaler can work well when the distribution is not Gaussian or when the standard deviation is very small. However, it is sensitive to outliers, because a single extreme value sets the minimum or maximum for the whole feature. RobustScaler, which we will see soon, addresses this.
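To see the outlier sensitivity in numbers, here is a sketch with four ordinary values and one large outlier (the numbers are illustrative). The outlier becomes the maximum, so everything else is squeezed into a tiny slice of the 0 to 1 range.

```python
import numpy as np
from sklearn import preprocessing

# Illustrative feature: four ordinary values plus one outlier
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scaled = preprocessing.MinMaxScaler().fit_transform(x).ravel()

# The outlier maps to 1.0; the ordinary values land in [0, 3/99],
# i.e. roughly [0, 0.03]
print(scaled.round(3))
```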

# Create data samples x1, x2, x3
df = pd.DataFrame({
    # positive skew
    'x1': np.random.chisquare(8, 1000),
    # negative skew 
    'x2': np.random.beta(8, 2, 1000) * 40,
    # no skew
    'x3': np.random.normal(50, 3, 1000)
})
# Use MinMaxScaler
scaler = preprocessing.MinMaxScaler()
scaled_df = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_df, columns=['x1', 'x2', 'x3'])
# Plot and visualize
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(6, 5))
ax1.set_title('Before Scaling')
sns.kdeplot(df['x1'], ax=ax1)
sns.kdeplot(df['x2'], ax=ax1)
sns.kdeplot(df['x3'], ax=ax1)
ax2.set_title('After Min-Max Scaling')
sns.kdeplot(scaled_df['x1'], ax=ax2)
sns.kdeplot(scaled_df['x2'], ax=ax2)
sns.kdeplot(scaled_df['x3'], ax=ax2)
plt.show()

[Figure: density plots of x1, x2 and x3 before scaling (left) and after Min-Max scaling (right)]

While the skewness of each distribution is preserved, the three distributions are brought onto the same 0 to 1 scale so that they overlap.
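Min-max scaling applies a positive linear (affine) transformation to each column, so it shifts and stretches a distribution without changing its shape. A sketch that regenerates data like the example above and compares pandas skewness before and after; the np.random.seed(1) call is an assumption added so the result is reproducible.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

np.random.seed(1)  # assumption: seeded for reproducibility
df = pd.DataFrame({
    'x1': np.random.chisquare(8, 1000),     # positive skew
    'x2': np.random.beta(8, 2, 1000) * 40,  # negative skew
    'x3': np.random.normal(50, 3, 1000),    # no skew
})

scaled_df = pd.DataFrame(
    preprocessing.MinMaxScaler().fit_transform(df), columns=df.columns
)

# Skewness per column; the two Series should match up to rounding error
skew_before = df.skew()
skew_after = scaled_df.skew()
print(pd.DataFrame({'before': skew_before, 'after': skew_after}))
```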


3. Robust Scaler

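The RobustScaler is similar in spirit to the Min-Max scaler. However, it divides by the interquartile range (IQR, the range between the 25th and 75th percentiles) instead of the range between the minimum and maximum, which makes it robust to outliers. It also centers each feature on its median rather than shifting by the minimum. For each feature, it applies:

$$\frac{x_i - \text{median}(\boldsymbol{x})}{Q_3(\boldsymbol{x}) - Q_1(\boldsymbol{x})}$$

where $Q_1$ and $Q_3$ are the first and third quartiles of the feature.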
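As a sketch, the following compares RobustScaler with a manual median/IQR computation on a small illustrative feature that contains one outlier. It assumes scikit-learn's default settings (with_centering=True, with_scaling=True, quantile_range=(25.0, 75.0)), under which each feature is centered on its median.

```python
import numpy as np
from sklearn import preprocessing

# Illustrative feature with one outlier
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [100.0]])

# Manual robust scaling: subtract the median, divide by the IQR
median = np.median(x, axis=0)
q1, q3 = np.percentile(x, [25, 75], axis=0)
manual = (x - median) / (q3 - q1)

scaler = preprocessing.RobustScaler()  # default settings
sk = scaler.fit_transform(x)

# center_ is the median (3.5) and scale_ is the IQR (4.75 - 2.25 = 2.5)
print(scaler.center_, scaler.scale_)
```

Because the median and quartiles barely move when the outlier changes, the scaling of the five ordinary values is largely unaffected by it.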

As usual, let's look at a few visualizations to get a better understanding.

# Create data samples x1, x2
x = pd.DataFrame({
    # Distribution with lower outliers
    'x1': np.concatenate([np.random.normal(20, 1, 1000), np.random.normal(1, 1, 25)]),
    # Distribution with higher outliers
    'x2': np.concatenate([np.random.normal(30, 1, 1000), np.random.normal(50, 1, 25)]),
})
# Use RobustScaler
scaler = preprocessing.RobustScaler()
robust_scaled_df = scaler.fit_transform(x)
robust_scaled_df = pd.DataFrame(robust_scaled_df, columns=['x1', 'x2'])

# Use MinMaxScaler
scaler = preprocessing.MinMaxScaler()
minmax_scaled_df = scaler.fit_transform(x)
minmax_scaled_df = pd.DataFrame(minmax_scaled_df, columns=['x1', 'x2'])
# Plot and visualize
fig, (ax1, ax2, ax3) = plt.subplots(ncols=3, figsize=(9, 5))
ax1.set_title('Before Scaling')
sns.kdeplot(x['x1'], ax=ax1)
sns.kdeplot(x['x2'], ax=ax1)
ax2.set_title('After Robust Scaling')
sns.kdeplot(robust_scaled_df['x1'], ax=ax2)
sns.kdeplot(robust_scaled_df['x2'], ax=ax2)
ax3.set_title('After Min-Max Scaling')
sns.kdeplot(minmax_scaled_df['x1'], ax=ax3)
sns.kdeplot(minmax_scaled_df['x2'], ax=ax3)
plt.show()

[Figure: density plots of x1 and x2 before scaling, after Robust scaling, and after Min-Max scaling]

Note that after applying RobustScaler, the two distributions are brought onto the same scale and actually overlap, while the outliers remain outside the bulk of the new distributions. With MinMaxScaler, by contrast, the outliers set the minimum and maximum, so the bulk of each distribution is squeezed into a small part of the 0 to 1 range and the two normal distributions stay separate.
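The same effect can be checked numerically rather than visually. This sketch regenerates data like the x1/x2 example above (np.random.seed(1) is an assumption added for reproducibility) and compares the median of each scaled column.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

np.random.seed(1)  # assumption: seeded for reproducibility
x = pd.DataFrame({
    # Distribution with lower outliers
    'x1': np.concatenate([np.random.normal(20, 1, 1000), np.random.normal(1, 1, 25)]),
    # Distribution with higher outliers
    'x2': np.concatenate([np.random.normal(30, 1, 1000), np.random.normal(50, 1, 25)]),
})

robust = preprocessing.RobustScaler().fit_transform(x)
minmax = preprocessing.MinMaxScaler().fit_transform(x)

# RobustScaler subtracts each column's median, so both medians become 0
robust_medians = np.median(robust, axis=0)
# MinMaxScaler lets the outliers set the range, so the medians end up far apart
minmax_medians = np.median(minmax, axis=0)
print(robust_medians, minmax_medians)
```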


Bonus: Normalizer

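Unlike the three scalers above, which work on each feature (column), the Normalizer works on each sample (row): it divides every value in a sample by the magnitude of that sample's vector in n-dimensional space, where n is the number of features.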

For example, if your features were x, y, and z, your scaled value for x would be:

$$\frac{x_i}{\sqrt{x_i^2 + y_i^2 + z_i^2}}$$
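A sketch verifying this formula on two illustrative samples with three features each (x, y, z): dividing each row by its Euclidean length gives the same result as Normalizer with its default l2 norm, and every resulting row has length 1.

```python
import numpy as np
from sklearn import preprocessing

# Illustrative data: two samples (rows), three features (x, y, z)
X = np.array([[3.0, 4.0, 0.0],
              [1.0, 2.0, 2.0]])

# Divide each row by its Euclidean (l2) length: 5 for the first row, 3 for the second
manual = X / np.sqrt((X ** 2).sum(axis=1, keepdims=True))

sk = preprocessing.Normalizer().fit_transform(X)  # norm='l2' is the default

print(sk[0])                          # first row becomes [0.6, 0.8, 0.0]
print(np.linalg.norm(sk, axis=1))     # every row now has length 1
```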

In other words, it normalizes samples individually to unit norm. Each sample (i.e. each row of the data matrix) with at least one non-zero component is rescaled independently of the other samples so that its norm equals one. The norm can be l2 (the default, as in the formula above), l1, or max.
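A second sketch, again with illustrative values, shows the l1 option (the absolute values of each row sum to 1) and what happens to a row with no non-zero component: it is left as all zeros rather than divided by zero.

```python
import numpy as np
from sklearn import preprocessing

X = np.array([[3.0, -4.0, 1.0],
              [0.0,  0.0, 0.0]])  # second row is all zeros

l1 = preprocessing.Normalizer(norm='l1').fit_transform(X)

# First row: |3| + |-4| + |1| = 8, so it becomes [0.375, -0.5, 0.125]
# Second row: unchanged, still all zeros
print(l1)
```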

# Registers the '3d' projection (only required on older matplotlib versions)
from mpl_toolkits.mplot3d import Axes3D
# Create data samples with three features of different ranges
df = pd.DataFrame({
    'x1': np.random.randint(-100, 100, 1000).astype(float),
    'y1': np.random.randint(-80, 80, 1000).astype(float),
    'z1': np.random.randint(-150, 150, 1000).astype(float),
})
# Use Normalizer (each row is rescaled to unit l2 norm)
scaler = preprocessing.Normalizer()
scaled_df = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_df, columns=df.columns)
# Plot and visualize
fig = plt.figure(figsize=(9, 5))
ax1 = fig.add_subplot(121, projection='3d')
ax2 = fig.add_subplot(122, projection='3d')
ax1.set_title('Before Scaling')
ax1.scatter(df['x1'], df['y1'], df['z1'])
ax2.set_title('After Normalizer')
ax2.scatter(scaled_df['x1'], scaled_df['y1'], scaled_df['z1'])
plt.show()

[Figure: 3D scatter plots of the raw samples (left) and the normalized samples (right)]


In a Nutshell

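You probably won't go wrong if you use StandardScaler as your default way to scale features. If your data contains significant outliers, RobustScaler is usually the safer choice.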

