Number Representations & States

"how numbers are stored and used in computers"

Model parameters

When a model's size is expressed in number of parameters, it refers to the total number of learnable values in the model, the values that are adjusted during training to minimize error and improve performance. In neural networks, the parameters are the weights between neurons and the biases for each neuron. Each parameter is a floating-point number, and together they define the behavior of the model. The more parameters a model has, the more complex the patterns it can potentially learn.
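Since each parameter is a floating-point number, the parameter count translates directly into memory. Below is a minimal sketch in plain Python (the parameter counts and precisions are hypothetical examples) of how much raw storage a model's parameters need:

```python
def param_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Raw memory needed just to store the parameters, in GiB."""
    return num_params * bytes_per_param / 1024**3

# Hypothetical model sizes at two common floating-point precisions.
for params in (125_000_000, 7_000_000_000):
    for precision, size in (("float32", 4), ("float16", 2)):
        print(f"{params:>13,} params @ {precision}: "
              f"{param_memory_gib(params, size):6.1f} GiB")
```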

Simple linear regression model

A simple linear regression model with one input and one output has two parameters (a weight and a bias).

A simple linear regression model tries to model the relationship between a single input variable and an output variable by fitting a straight line $\hat{y} = wx + b$, where $\hat{y}$ is the predicted output, $x$ is the input, $w$ is the weight (i.e. the slope of the line), and $b$ is the bias (i.e. the y-intercept).

The model is trained by adjusting $w$ and $b$ to minimize the error between the predicted values and the actual data. The most common error metric is Mean Squared Error (MSE):

$$\text{MSE} = \frac{1}{m} \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2$$

where $m$ is the number of data points, and $(x_i, y_i)$ is the $i$-th training example. In this model, the two parameters $w$ and $b$ are the values the model "learns" during training.
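To make this concrete, here is a minimal sketch (plain NumPy, with made-up toy data) that learns the two parameters $w$ and $b$ by gradient descent on the MSE:

```python
import numpy as np

# Toy data roughly following y = 2x + 1, plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=100)

w, b = 0.0, 0.0   # the model's two learnable parameters
lr = 0.1          # learning rate

for _ in range(500):
    y_hat = w * x + b
    error = y_hat - y
    # Gradients of MSE with respect to w and b.
    w -= lr * 2.0 * np.mean(error * x)
    b -= lr * 2.0 * np.mean(error)

print(f"w = {w:.3f}, b = {b:.3f}")  # should approach 2 and 1
```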

Multiple linear regression

A multiple linear regression model attempts to capture the relationship between an input vector (specified as $\mathbf{x} = (x_1, x_2, \ldots, x_n)$) and an output variable $y$, by adjusting a weight vector $\mathbf{w} = (w_1, w_2, \ldots, w_n)$ and a bias $b$.

Using vector notation, this might be expressed as:

$$\hat{y} = \mathbf{w} \cdot \mathbf{x} + b = \sum_{j=1}^{n} w_j x_j + b$$

Just like simple regression, we want to minimize the Mean Squared Error (MSE) across $m$ training examples, where $\mathbf{x}^{(i)}$ is the $i$-th training input, and $y^{(i)}$ is the true output for the $i$-th training example:

$$\text{MSE} = \frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - \hat{y}^{(i)} \right)^2$$

There are $n$ weights in this model and 1 bias term, so the total number of parameters is $n + 1$. This number grows linearly with the number of features.
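A quick way to see all $n + 1$ parameters at once is to fold the bias into the weight vector. The sketch below (plain NumPy, hypothetical toy data) appends a constant-1 feature and solves the least-squares problem in closed form instead of by gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 3                                   # m examples, n features
X = rng.normal(size=(m, n))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.7 + rng.normal(scale=0.1, size=m)

# Append a column of ones so the bias is learned as an (n+1)-th weight.
X1 = np.hstack([X, np.ones((m, 1))])

# Ordinary least squares minimizes the MSE directly.
params, *_ = np.linalg.lstsq(X1, y, rcond=None)
w, b = params[:n], params[n]
print("w =", w.round(2), " b =", round(b, 2))   # n + 1 = 4 parameters total
```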

Neural networks

A fully-connected layer of a neural network with $n_{\text{in}}$ input neurons and $n_{\text{out}}$ output neurons has $(n_{\text{in}} + 1) \times n_{\text{out}}$ parameters, accounting for both weights and biases ($n_{\text{in}} \times n_{\text{out}}$ weights plus $n_{\text{out}}$ biases).
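As a sanity check, here is a small sketch that counts the parameters of a hypothetical fully-connected network layer by layer using this formula:

```python
def dense_layer_params(n_in: int, n_out: int) -> int:
    """(n_in * n_out) weights plus one bias per output neuron."""
    return (n_in + 1) * n_out

# Hypothetical 3-layer fully-connected network: 784 -> 128 -> 64 -> 10.
layers = [(784, 128), (128, 64), (64, 10)]
total = sum(dense_layer_params(n_in, n_out) for n_in, n_out in layers)
print(total)  # 109,386 parameters
```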

Trade-offs

More parameters typically allow a model to represent more complex functions or patterns in the data, but they also bring higher computational cost for training and inference, higher memory usage, and a higher risk of overfitting if the model is not trained carefully.