Ridge Regression
$$ SSR_{L2}(w) = \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \sum_{j=1}^{d} w_j^2 $$ p116 Regularization adds a penalty term to the cost function (here, the sum of the squared weight coefficients).
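Because the penalty is quadratic, minimizing $SSR_{L2}$ has a closed-form solution, $w = (X^\top X + \lambda I)^{-1} X^\top y$. A minimal NumPy sketch of that formula (illustrative only; unlike scikit-learn's Ridge, it also penalizes the bias column):
import numpy as np

def ridge_fit(Xb, y, lam):
    # Closed-form ridge solution: w = (Xb^T Xb + lam*I)^(-1) Xb^T y
    # (simplified sketch: the bias column is penalized here as well)
    d = Xb.shape[1]
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(d), Xb.T @ y)

# Tiny illustration: bias column + the area feature from the dataset below
Xb = np.array([[1., 30], [1., 46], [1., 60], [1., 65], [1., 77], [1., 95]])
y = np.array([31, 30, 80, 49, 70, 118])
print(ridge_fit(Xb, y, lam=0.8))  # lam=0 would give ordinary least squares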
""" Ridge (L2 Regularization)
Basis expansion implies a more complex model.
One way to decrese this complexity is by regularization.
Regularization puts constrains on the sum of weights
in order to keep the weights small.
It adds a penalty term to the cost function.
Ridge regularization uses the sum of square weights,
which penalizes large weight vectors, and is probably
the most popular regularized regression method.
Reshape the train data to prevent numerical errors (to large or to small)
By reshaping the data can be transform so that it has a mean of 0
and a standard deviation of 1
When making predictions data must be transformed using the same
PolynomialFeatures transformation that was used to preprocess the training data.
"""
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge
import matplotlib.pyplot as plt
import numpy as np
# ---------------------------------------------------
# Dataset
X = [30, 46, 60, 65, 77, 95] # area (m^2)
y = [31, 30, 80, 49, 70, 118] # price (10,000$)
X2 = [17, 40, 55, 57, 70, 85] # test data
y2 = [19, 50, 60, 32, 90, 110]
#----------------------------------------------------
# Ridge Regression
degree_ = 4
lambda_ = 0.8
# Reshape train data to the 2-D shape scikit-learn expects
X = np.array(X).reshape(-1, 1) # any number of rows, one column
polyX = PolynomialFeatures(degree=degree_).fit_transform(X)
model1 = LinearRegression().fit(polyX, y)
model2 = Ridge(alpha=lambda_, solver='svd').fit(polyX, y)
print('Linear coefficients (sum):', sum(model1.coef_)) # -64.66185222664129
print('Ridge coefficients (sum):', sum(model2.coef_)) # -7.221838297484756
t_ = np.linspace(0, 100, 100).reshape(-1, 1) # evaluation grid for the prediction curves
t = PolynomialFeatures(degree=degree_).fit_transform(t_)
# Predictions
x_unknown = 40
xa = np.array([x_unknown]).reshape(-1, 1)
poly_xa = PolynomialFeatures(degree=degree_).fit_transform(xa)
ya = round(model1.predict(poly_xa)[0], 2) # Linear regression prediction
yb = round(model2.predict(poly_xa)[0], 2) # Ridge regression prediction
# ------------------------------------------------------------------
# Plotting
# Plot train data, test data, and the prediction curves
plt.figure(figsize=(6,4))
plt.scatter(X, y, color='blue', label='Training set')
plt.scatter(X2, y2, color='red', label='Test set')
plt.title(f'{degree_}-degree polynomial / Ridge Regression')
plt.plot(t_, model1.predict(t), '--', color='gray', label='Linear regression')
plt.plot(t_, model2.predict(t), '-', color='orange', label='Ridge regression')
plt.scatter(xa, ya, color='gray', marker='x')
plt.scatter(xa, yb, color='red', marker='x')
plt.annotate(f'({x_unknown}) Linear, price = {ya}', (x_unknown + 0.1, ya - 5))
plt.annotate(f'({x_unknown}) Ridge, price = {yb}', (x_unknown + 0.1, yb + 5))
plt.xlabel("area (m^2)")
plt.ylabel("price (10,000$)")
plt.xlim((0, 100))
plt.ylim((0, 130))
plt.legend(loc='upper left')
plt.show()
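The docstring above mentions standardizing the expanded features to mean 0 and standard deviation 1, but the script only reshapes X. A hedged sketch of how that scaling step could be added with a scikit-learn Pipeline (the pipeline is not part of the original example):
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
import numpy as np

X = np.array([30, 46, 60, 65, 77, 95]).reshape(-1, 1)
y = [31, 30, 80, 49, 70, 118]

# Expand to degree-4 polynomial features, standardize them, then fit Ridge.
# The same pipeline applies the identical transformations at prediction time.
ridge_pipe = make_pipeline(PolynomialFeatures(degree=4),
                           StandardScaler(),
                           Ridge(alpha=0.8))
ridge_pipe.fit(X, y)
print(ridge_pipe.predict(np.array([[40]])))  # price estimate for a 40 m^2 area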

Lasso Regression
$$ R_1(w) = \sum_{j=1}^{d} |w_j| \enspace / \enspace R_2(w) = \sum_{j=1}^{d} w_j^2 $$ p117 Lasso regression puts a constraint on the sum of the absolute values of the weights.
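A small numeric sketch of the two penalties on an arbitrary weight vector (the values are illustrative only):
import numpy as np

w = np.array([4.0, -0.5, 0.1, 0.0])     # illustrative weight vector
print('L1 penalty:', np.sum(np.abs(w)))  # 4.6
print('L2 penalty:', np.sum(w ** 2))     # 16.26
# L2 grows quadratically, so Ridge focuses on shrinking the largest weights;
# L1 grows linearly, so Lasso keeps pushing small weights all the way to 0.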
""" Lasso (L1 Regularization)
It puts a constrain on the sum of absolute weights values, it is
different from Ridge regression (L2) who uses the sum of square weights.
L1 and L2 behave the same at the extremes.
L1 shrikns many coefficients to be exactly 0, producing a sparse model,
which can be attractive in problems that benefit from features elimination.
"""
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge, Lasso
import matplotlib.pyplot as plt
import numpy as np
# ---------------------------------------------------
# Dataset
X = [30, 46, 60, 65, 77, 95] # area (m^2)
y = [31, 30, 80, 49, 70, 118] # price (10,000$)
X2 = [17, 40, 55, 57, 70, 85] # test data
y2 = [19, 50, 60, 32, 90, 110]
#----------------------------------------------------
# Lasso Regression
degree_ = 4
lambda_ = 0.8
# Reshape train data to the 2-D shape scikit-learn expects
X = np.array(X).reshape(-1, 1)
polyX = PolynomialFeatures(degree=degree_).fit_transform(X)
model1 = Ridge(alpha=lambda_, solver='svd').fit(polyX, y)
model2 = Lasso(alpha=lambda_, max_iter=1300000).fit(polyX, y) # Lasso needs many iterations to converge on these unscaled polynomial features
t_ = np.linspace(0, 100, 100).reshape(-1, 1) # evaluation grid for the prediction curves
t = PolynomialFeatures(degree=degree_).fit_transform(t_)
# Predictions
x_unknown = 18
xa = np.array([x_unknown]).reshape(-1, 1)
poly_xa = PolynomialFeatures(degree=degree_).fit_transform(xa)
ya = round(model1.predict(poly_xa)[0], 2) # Ridge regression prediction
yb = round(model2.predict(poly_xa)[0], 2) # Lasso regression prediction
# ------------------------------------------------------------------
# Plotting
# Plot train data, test data, and the prediction curves
plt.figure(figsize=(6,4))
plt.scatter(X, y, color='blue', label='Training set')
plt.scatter(X2, y2, color='red', label='Test set')
plt.title(f'{degree_}-degree polynomial / Lasso Regression')
plt.plot(t_, model1.predict(t), '--', color='gray', label='Ridge regression')
plt.plot(t_, model2.predict(t), '-', color='orange', label='Lasso regression')
plt.scatter(xa, ya, color='gray', marker='x')
plt.scatter(xa, yb, color='red', marker='x')
plt.annotate(f'({x_unknown}) Ridge, price = {ya}', (x_unknown + 1.5, ya - 5))
plt.annotate(f'({x_unknown}) Lasso, price = {yb}', (x_unknown + 1.5, yb - 5))
plt.xlim((0, 100))
plt.ylim((0, 130))
plt.legend(loc='upper left')
plt.show()
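To check the sparsity claim from the docstring, the fitted weight vectors of the script above can be inspected directly (a small optional addition, reusing model1 and model2):
# Inspect the fitted weights: Ridge only shrinks them, while the L1 penalty
# can set some polynomial coefficients to exactly 0.
print('Ridge coefficients:', np.round(model1.coef_, 4))
print('Lasso coefficients:', np.round(model2.coef_, 4))
print('Weights set to 0 by Lasso:', np.sum(model2.coef_ == 0))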
