Number Representations & States

"how numbers are stored and used in computers"

Backpropagation Gradient

  • $L$: The loss function (e.g., cross-entropy or mean squared error).
  • $W^{(l)}$: The weight matrix of the $l$-th layer.
  • $a^{(l-1)}$: The activations from the previous layer ($l-1$).
  • $\delta^{(l)}$: The error term (also called "delta") for the current layer $l$, which reflects how much the output of that layer affects the loss.
  • $(a^{(l-1)})^T$: The transpose of the previous layer’s activations, used to align dimensions for matrix multiplication.
  • The expression $\frac{\partial L}{\partial W^{(l)}} = \delta^{(l)} \, (a^{(l-1)})^T$ represents how the loss changes with respect to the weights in a given layer, which is essential for updating weights during training using gradient descent (see the sketch after this list).
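To make the shape alignment concrete, here is a minimal NumPy sketch (not from the original notes; the layer sizes, variable names, and learning rate are illustrative assumptions) that computes $\frac{\partial L}{\partial W^{(l)}}$ as $\delta^{(l)} (a^{(l-1)})^T$ for one dense layer and applies a single gradient-descent step.

```python
# Minimal sketch: layer gradient dL/dW = delta @ a_prev.T for one dense layer.
# Sizes and values are hypothetical, chosen only to show how the shapes line up.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out, batch = 4, 3, 5                 # 4 inputs -> 3 outputs, batch of 5

a_prev = rng.standard_normal((n_in, batch))  # activations a^(l-1), shape (n_in, batch)
W = rng.standard_normal((n_out, n_in))       # weights W^(l), shape (n_out, n_in)
delta = rng.standard_normal((n_out, batch))  # error term delta^(l), shape (n_out, batch)

# delta @ a_prev.T has shape (n_out, n_in), matching W, so it can be used
# directly in the weight update. Averaging over the batch keeps the scale
# independent of batch size.
grad_W = (delta @ a_prev.T) / batch

lr = 0.1                                     # assumed learning rate
W -= lr * grad_W                             # one gradient-descent step

print(grad_W.shape, W.shape)                 # (3, 4) (3, 4)
```

The transpose on $a^{(l-1)}$ is what makes the matrix product come out with the same shape as $W^{(l)}$, so the gradient can be subtracted from the weights element-wise.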