What is Ridge Regression?

Contributed by: Prashanth Ashok

Understanding Ridge Regression

Ridge regression is a model-tuning method used to analyze data suffering from multicollinearity. It performs L2 regularization: when multicollinearity is present, least-squares estimates remain unbiased, but their variances are large, so predicted values can deviate far from actual values; the ridge penalty trades a small amount of bias for a substantial reduction in variance.

Cost function for ridge regression:

Min(||Y − Xθ||² + λ||θ||²)

Lambda (λ) is the penalty weight; in the ridge function of libraries such as scikit-learn it is exposed as the alpha parameter. Adjusting alpha controls the strength of the penalty: higher alpha values impose a larger penalty and shrink the coefficient magnitudes further, as the sketch after the list below illustrates.

  • Shrinks parameters to counter the effects of multicollinearity
  • Reduces model complexity through coefficient shrinkage
  • Useful for regression analysis
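
As a rough illustration of how alpha controls shrinkage, the short sketch below fits scikit-learn's Ridge estimator with increasing alpha values on a small synthetic dataset (the data and the alpha grid are arbitrary choices, not part of the restaurant example later in this article) and prints the shrinking coefficient norms.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Small synthetic dataset, purely for illustration
X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)

# Larger alpha -> stronger penalty -> smaller coefficient magnitudes
for alpha in [0.01, 1, 10, 100]:
    model = Ridge(alpha=alpha).fit(X, y)
    print("alpha =", alpha, "-> coefficient norm:", round(np.linalg.norm(model.coef_), 2))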

Types of Ridge Regression Models

Ridge regression modifies the standard regression equation to account for variance that the general model does not explain. After identifying data suitable for L2 regularization, the following preparation steps can be taken.

Standardization Process

Standardization is critical in ridge regression. Variables, both dependent and independent, are standardized by subtracting means and dividing by standard deviations. All calculations in ridge regression are based on standardized variables, and coefficients are adjusted back to their original scale.

Bias and Variance Trade-Off

The bias-variance trade-off in building ridge regression models can be complex. Generally, bias increases as lambda increases, while variance decreases with increasing lambda.
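
One way to see this trade-off empirically is sketched below on an assumed synthetic dataset: very small alpha values fit the training data closely but tend to generalize poorly (low bias, high variance), while very large values underfit (high bias, low variance).

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic data with more features than are truly informative, to encourage overfitting
X, y = make_regression(n_samples=80, n_features=30, noise=15.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

for alpha in [0.001, 0.1, 1, 10, 100, 1000]:
    model = Ridge(alpha=alpha).fit(X_tr, y_tr)
    print("alpha =", alpha,
          "train MSE =", round(mean_squared_error(y_tr, model.predict(X_tr)), 1),
          "test MSE =", round(mean_squared_error(y_te, model.predict(X_te)), 1))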

Assumptions of Ridge Regression

Ridge regression shares assumptions with linear regression, including linearity, constant variance, and independence. Unlike linear regression, ridge regression does not require the assumption of normal error distribution.

For a practical example, consider a linear regression problem based on food-restaurant data, with the aim of improving sales in a specific region.

Required Libraries for Analysis

import numpy as np
import pandas as pd
import os

import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.style
plt.style.use('classic')

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split  # used for the split further below

import warnings
warnings.filterwarnings("ignore")

# Load the restaurant dataset
df = pd.read_excel("food.xlsx")

After performing exploratory data analysis and handling missing values, dummy variables are created for categorical variables.

# 'cat' is the list of categorical column names; here they are picked by dtype as an illustration
cat = df.select_dtypes(include='object').columns.tolist()
df = pd.get_dummies(df, columns=cat, drop_first=True)

Standardizing the continuous variables is an important step before fitting a ridge regression model, because the penalty treats all coefficients on the same scale.

Scaling Continuous Variables

from sklearn.preprocessing import StandardScaler
std_scale = StandardScaler()

# Standardize the continuous columns to zero mean and unit variance
df[['week', 'final_price', 'area_range']] = std_scale.fit_transform(
    df[['week', 'final_price', 'area_range']])

Train-Test Split

X = df.drop('orders', axis=1)
y = np.log(df[['orders']])  # the target variable is log-transformed

# Hold out 25% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

Linear Regression Model

# Fit an ordinary least-squares model as a baseline
regression_model = LinearRegression()
regression_model.fit(X_train, y_train)

# coef_ has shape (1, n_features) because y is a single-column DataFrame
for idx, col_name in enumerate(X_train.columns):
    print("The coefficient for {} is {}".format(col_name, regression_model.coef_[0][idx]))

Examining these coefficients shows how strongly each variable influences the predicted orders, which supports business decision-making.
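
For comparison with the ordinary least-squares model above, a ridge model could be fitted on the same training split. The sketch below assumes the X_train and y_train objects from the earlier split, and the alpha value of 1.0 is only a placeholder to be tuned later.

from sklearn.linear_model import Ridge

# Same training data as above, with an arbitrary starting value for alpha
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train, y_train)

for idx, col_name in enumerate(X_train.columns):
    print("The ridge coefficient for {} is {}".format(col_name, ridge_model.coef_[0][idx]))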

Difference Between Ridge and Lasso Regression

  • Regularization approach: Ridge adds a penalty proportional to the square of the coefficients; Lasso adds a penalty proportional to their absolute values.
  • Coefficient shrinkage: Ridge shrinks coefficients towards zero but never exactly to zero; Lasso can reduce some coefficients exactly to zero.
  • Effect on model complexity: Ridge reduces model complexity and the impact of multicollinearity; Lasso results in simpler, more interpretable models.
  • Handling correlated inputs: Ridge handles correlated inputs effectively; Lasso can be inconsistent with highly correlated features.
  • Feature selection capability: Ridge offers only limited feature selection; Lasso performs feature selection by reducing some coefficients to zero.
  • Preferred usage scenarios: Ridge suits cases where all features are assumed relevant or the data is multicollinear; Lasso is advantageous when a parsimonious model is wanted, especially with high-dimensional data.
  • Decision factors: for Ridge, the nature of the data, the desired model complexity, and multicollinearity; for Lasso, the nature of the data, the need for feature selection, and possible inconsistency with correlated features.
  • Selection process: for both, the regularization strength is typically chosen through cross-validation; for Lasso this is often combined with comparative assessment of model performance.
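
To make the feature-selection difference concrete, the sketch below fits both estimators on an arbitrary synthetic dataset (not the restaurant data) and counts how many coefficients come out exactly zero.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

# Synthetic data where only 5 of the 20 features actually matter
X, y = make_regression(n_samples=100, n_features=20, n_informative=5, noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

# Lasso drives some coefficients to exactly zero; ridge only shrinks them
print("Ridge coefficients equal to zero:", int(np.sum(ridge.coef_ == 0)))
print("Lasso coefficients equal to zero:", int(np.sum(lasso.coef_ == 0)))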

Ridge Regression in Machine Learning

Ridge regression is a crucial technique in machine learning, essential for building robust models in scenarios prone to overfitting and multicollinearity. It modifies standard linear regression by adding a penalty term proportional to the square of the coefficients, which keeps correlated independent variables manageable. Its benefits include reducing overfitting, managing multicollinearity, and improving model generalization.

For practical implementation, selecting the right regularization parameter, lambda, is vital for balancing bias and variance during model training. Ridge regression is widely supported in machine learning libraries, with Python’s scikit-learn being a prominent example.

Regularization Process

  1. The alpha hyperparameter is not learned during training; it must be set by the user, typically by searching over candidate values with a technique such as GridSearchCV.
  2. GridSearchCV evaluates each candidate alpha with cross-validation and reports the one that performs best.
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

ridge = Ridge()
# Candidate alpha values spanning several orders of magnitude
parameters = {'alpha': [1e-15, 1e-10, 1e-8, 1e-3, 1e-2, 1, 5, 10, 20, 30, 35, 40, 45, 50, 55, 100]}
ridge_regressor = GridSearchCV(ridge, parameters, scoring='neg_mean_squared_error', cv=5)
ridge_regressor.fit(X, y)

print(ridge_regressor.best_params_)
print(ridge_regressor.best_score_)

Cross-validation in this way identifies the alpha value that gives the best out-of-sample performance.

From this analysis, the final model can be defined with the selected alpha, and its coefficients examined to see which variables most influence the business problem, as sketched below.
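
A minimal sketch of that final step, assuming the X, y, and ridge_regressor objects from the grid search above, is:

# Refit a ridge model using the alpha found by the grid search
best_alpha = ridge_regressor.best_params_['alpha']
final_ridge = Ridge(alpha=best_alpha)
final_ridge.fit(X, y)

# Coefficients close to zero contribute little; the larger ones drive the predictions
for col_name, coef in zip(X.columns, final_ridge.coef_[0]):
    print("The final ridge coefficient for {} is {}".format(col_name, coef))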

Ridge Regression FAQs

What is Ridge Regression?

Ridge regression is a linear regression method that adds a penalty term, deliberately introducing a small amount of bias, to reduce overfitting and improve prediction accuracy.

How Does Ridge Regression Differ from Ordinary Least Squares?

Unlike ordinary least squares, ridge regression includes a penalty on the magnitude of coefficients to reduce model complexity.

When Should You Use Ridge Regression?

Use ridge regression when dealing with multicollinearity or when there are more predictors than observations.

What is the Role of the Regularization Parameter in Ridge Regression?

The regularization parameter controls the extent of coefficient shrinkage, influencing model simplicity.

Can Ridge Regression Handle Non-Linear Relationships?

While primarily for linear relationships, ridge regression can include polynomial terms for non-linearities.
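
For instance, a polynomial feature expansion can be combined with ridge in a scikit-learn pipeline; the sketch below uses synthetic data, and the degree and alpha values are illustrative assumptions.

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Illustrative data; degree and alpha are arbitrary choices
X, y = make_regression(n_samples=100, n_features=3, noise=10.0, random_state=0)

model = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1.0))
model.fit(X, y)
print("R^2 on the training data:", round(model.score(X, y), 3))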

How is Ridge Regression Implemented in Software?

Most statistical software offers built-in functions for ridge regression; you specify the model variables and the value of the regularization parameter.

How to Choose the Best Regularization Parameter?

The best parameter is often found through cross-validation, using techniques like grid or random search.
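
As a sketch of the cross-validation approach, scikit-learn's RidgeCV evaluates a list of candidate alphas internally; the alpha grid and synthetic data below are assumptions for illustration.

from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)

# RidgeCV tries each candidate alpha with cross-validation and keeps the best one
model = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0], cv=5)
model.fit(X, y)
print("Selected alpha:", model.alpha_)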

What are the Limitations of Ridge Regression?

It includes all predictors, which can complicate interpretation, and choosing the optimal parameter can be challenging.