Number Representations & States

"how numbers are stored and used in computers"

Backpropagation Gradient

  • $L$: The loss function (e.g., cross-entropy or mean squared error).
  • $W^{(l)}$: The weight matrix of the $l$-th layer.
  • $a^{(l-1)}$: The activations from the previous layer ($l-1$).
  • $\delta^{(l)}$: The error term (also called "delta") for the current layer $l$, which reflects how much the output of that layer affects the loss.
  • $(a^{(l-1)})^T$: The transpose of the previous layer’s activations, used to align dimensions for matrix multiplication.
  • The expression $\frac{\partial L}{\partial W^{(l)}} = \delta^{(l)} \, (a^{(l-1)})^T$ represents how the loss changes with respect to the weights in a given layer, which is essential for updating weights during training using gradient descent (see the sketch after this list).
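To make the shape alignment concrete, here is a minimal NumPy sketch (not from the original notes; the layer sizes, variable names, and learning rate are illustrative assumptions) that computes $\frac{\partial L}{\partial W^{(l)}}$ as $\delta^{(l)} (a^{(l-1)})^T$ for one dense layer and applies a single gradient-descent step.

```python
# Minimal sketch: layer gradient dL/dW = delta @ a_prev.T for one dense layer.
# Sizes and values are hypothetical, chosen only to show how the shapes line up.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out, batch = 4, 3, 5                 # 4 inputs -> 3 outputs, batch of 5

a_prev = rng.standard_normal((n_in, batch))  # activations a^(l-1), shape (n_in, batch)
W = rng.standard_normal((n_out, n_in))       # weights W^(l), shape (n_out, n_in)
delta = rng.standard_normal((n_out, batch))  # error term delta^(l), shape (n_out, batch)

# delta @ a_prev.T has shape (n_out, n_in), matching W, so it can be used
# directly in the weight update. Averaging over the batch keeps the scale
# independent of batch size.
grad_W = (delta @ a_prev.T) / batch

lr = 0.1                                     # assumed learning rate
W -= lr * grad_W                             # one gradient-descent step

print(grad_W.shape, W.shape)                 # (3, 4) (3, 4)
```

The transpose on $a^{(l-1)}$ is what makes the matrix product come out with the same shape as $W^{(l)}$, so the gradient can be subtracted from the weights element-wise.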