"how numbers are stored and used in computers"
Regularization improves generalization by constraining the hypothesis space. It's especially critical in overparameterized settings, where many different parameter configurations can perfectly fit the training data, but only a subset of them will generalize well.
Regularization refers to a set of techniques used to prevent overfitting, most commonly by adding a penalty term to the loss function. The penalty discourages the model from fitting the training data too closely by controlling the model's complexity, particularly the size of its parameters.
Given a dataset of $n$ training examples $\{(x_i, y_i)\}_{i=1}^{n}$ and a model $f_\theta$ with parameters $\theta$, the standard objective is to minimize the empirical loss:

$$\mathcal{L}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_\theta(x_i),\, y_i\big)$$

With regularization, we modify the loss to include a penalty term $R(\theta)$:

$$\mathcal{L}_{\text{reg}}(\theta) = \mathcal{L}(\theta) + \lambda\, R(\theta)$$

where $\lambda \geq 0$ is a hyperparameter that controls the strength of the penalty: larger values push the optimizer toward simpler parameter settings, while $\lambda = 0$ recovers the unregularized objective.
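As a concrete illustration, here is a minimal NumPy sketch of this objective, assuming (purely for the example) a linear model with squared-error loss and the L2 penalty $R(w) = \|w\|_2^2$; the function and variable names are illustrative, not taken from the text.

```python
import numpy as np

def regularized_loss(w, X, y, lam):
    """Empirical loss plus lam * R(w) for a linear model (illustrative sketch)."""
    preds = X @ w                           # f_w(x_i) for every example
    data_loss = np.mean((preds - y) ** 2)   # (1/n) * sum of per-example losses
    penalty = lam * np.sum(w ** 2)          # R(w) = ||w||^2, weighted by lambda
    return data_loss + penalty

# Toy data: 100 examples with 5 features each
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
w = rng.normal(size=5)

print(regularized_loss(w, X, y, lam=0.0))   # unregularized objective
print(regularized_loss(w, X, y, lam=0.1))   # same fit, plus the penalty term
```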
Common forms of regularization include the following (each is sketched in code after this list):
L2 regularization (weight decay): penalizes large weights and encourages smaller, more evenly distributed values.
L1 regularization: promotes sparsity by encouraging some weights to become exactly zero.
Dropout: promotes redundancy and robustness by randomly deactivating units (setting their activations to zero) during training.
Early stopping: limits how long the model can optimize itself on the training data, typically by halting when performance on a validation set stops improving.
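The rough sketch below (Python/NumPy; the function names, dropout rate, and patience threshold are hypothetical choices, not from the text) shows how each of these four ideas typically appears in code.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_penalty(w, lam):
    # L2: lam * sum of squared weights; pushes every weight toward small values
    return lam * np.sum(w ** 2)

def l1_penalty(w, lam):
    # L1: lam * sum of absolute weights; tends to drive some weights exactly to zero
    return lam * np.sum(np.abs(w))

def dropout(activations, p=0.5):
    # Dropout: zero each unit with probability p during training, scaling the
    # survivors by 1/(1-p) so the expected activation is unchanged
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

def should_stop(val_losses, patience=5):
    # Early stopping: stop once the validation loss has not improved
    # for `patience` consecutive epochs
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before
```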
The addition of a regularization term modifies the gradient during optimization. For example, in gradient descent with the L2 penalty $R(w) = \|w\|_2^2$, the weight update becomes:

$$w \leftarrow w - \eta \left( \nabla_w \mathcal{L}(w) + 2\lambda w \right)$$

Here, the extra term $2\lambda w$ shrinks every weight toward zero on each step, which is why L2 regularization is often described as weight decay.
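A minimal sketch of this update rule, again assuming a linear least-squares model purely for illustration (the gradient expression and all names are assumptions, not from the text):

```python
import numpy as np

def gd_step_l2(w, X, y, lr=0.01, lam=0.1):
    """One gradient-descent step on the L2-regularized objective (illustrative)."""
    preds = X @ w
    grad_data = 2.0 / len(y) * X.T @ (preds - y)   # gradient of the mean squared error
    grad_penalty = 2.0 * lam * w                   # gradient of lam * ||w||^2
    return w - lr * (grad_data + grad_penalty)     # the -lr * 2*lam*w part decays w

# Running a few steps: the penalty keeps the weight norm small while the data loss drops
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
w = rng.normal(size=5)
for _ in range(100):
    w = gd_step_l2(w, X, y)
print(np.linalg.norm(w))
```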