Where is the data stored in a trained ML model?

A trained model generally doesn’t keep a copy of its training data. Instead, training distills patterns, relationships, and representations from the data into the model’s parameters. How much of the training data ends up “stored” depends on the type of model and how it’s designed, but here’s a general breakdown:
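
As a minimal illustration (the synthetic data and the `fit_line` helper are assumptions for this sketch, not part of any library), fitting a straight line by ordinary least squares to 10,000 points leaves a model that consists of just two numbers. The points themselves can be thrown away afterwards.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical synthetic data: y = 3x + 2 plus Gaussian noise.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(10_000)]
ys = [3 * x + 2 + random.gauss(0, 1) for x in xs]

slope, intercept = fit_line(xs, ys)
print(f"{len(xs)} training points -> 2 parameters: "
      f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The learned slope and intercept land close to the 3 and 2 used to generate the data, which is the “pattern” the model keeps.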

Classical Machine Learning Models (e.g., Decision Trees, SVMs):
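These models don’t keep the data itself; they encode what they learn as parameters, such as the split thresholds and leaf predictions of a decision tree or the decision boundary of an SVM. They capture general trends rather than specific data points, unless the model is overfitting. Two caveats: a kernel SVM does retain part of the training set, because its decision function is built from the support vectors, which are actual training examples; and a fully grown decision tree can end up with leaves that each correspond to a single training example.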
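
To make the decision-tree case concrete, here is a simplified sketch of a one-split tree (a “stump”) on 1-D data. The helper names and data are hypothetical, and this is only the core idea, not how any particular library builds trees: after training, the model is just a threshold and two leaf labels.

```python
def fit_stump(xs, ys):
    """Fit a one-split decision tree (a "stump") to 1-D data with 0/1 labels.

    Candidate thresholds sit midway between adjacent distinct sorted values;
    the split with the fewest misclassifications wins.
    Returns (threshold, left_label, right_label).
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        # Each leaf predicts the majority label of the points that reach it
        # (Python's round() sends an exact 0.5 tie to 0).
        left_label = round(sum(left) / len(left))
        right_label = round(sum(right) / len(right))
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


xs = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

stump = fit_stump(xs, ys)
print(stump)                      # (5.0, 0, 1)
print(predict_stump(stump, 4.5))  # 0
print(predict_stump(stump, 8.5))  # 1
```

The threshold 5.0 is derived from the data (midway between 4.0 and 6.0) but is not itself a training point, which is the sense in which a tree encodes rather than stores its data.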
Neural Networks:
Neural networks don’t store raw training data either, at least not in any explicit, retrievable form. What they learn is held in the network’s weights and biases, which represent the relationships and features extracted from the data. The larger and more complex the network, the more it can “remember” about the data, and very large networks can memorize individual training examples well enough to reproduce them, even though no copy of the data is kept.
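
A single neuron is the smallest case of this. The sketch below is a hypothetical toy example in plain Python, not any framework’s API: it trains one sigmoid unit (equivalent to logistic regression) with stochastic gradient descent, then discards the training set. The two weights and the bias are all that’s needed to make predictions.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_neuron(data, epochs=200, lr=0.1):
    """Train one sigmoid neuron with stochastic gradient descent on log loss.

    data: list of ((x1, x2), label) with label in {0, 1}.
    Returns only the learned weights and bias.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), label in data:
            p = sigmoid(w[0] * x1 + w[1] * x2 + b)
            err = p - label  # gradient of the log loss w.r.t. the pre-activation
            w[0] -= lr * err * x1
            w[1] -= lr * err * x2
            b -= lr * err
    return w, b

def predict(w, b, x):
    return int(sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5)

# Hypothetical training set: the label is 1 when x1 + x2 > 1.
random.seed(1)
data = []
for _ in range(200):
    x = (random.random(), random.random())
    data.append((x, int(x[0] + x[1] > 1)))

w, b = train_neuron(data)
del data  # the training data is no longer needed; only w and b remain
print("weights:", w, "bias:", b)
print(predict(w, b, (0.9, 0.8)), predict(w, b, (0.1, 0.2)))
```

A real network stacks many such units, so its weights can encode far more detail, but the principle is the same: what persists after training is parameters, not records.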
K-Nearest Neighbors (KNN):
This is an exception: KNN has essentially no training step. It stores the entire training set and makes predictions by comparing each new input to the stored data points and using the labels of the closest ones.
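
A minimal KNN classifier makes the contrast concrete. The class and method names here are illustrative, loosely following the common fit/predict convention rather than any specific library’s implementation: `fit` does nothing but keep the data, and every prediction searches through it.

```python
import math
from collections import Counter

class KNNClassifier:
    """Minimal k-nearest-neighbors classifier: "training" just stores the data."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # No parameters are learned: the model is a copy of the training set.
        self.X = list(X)
        self.y = list(y)
        return self

    def predict(self, x):
        # Rank stored points by Euclidean distance to x, then take a majority vote.
        nearest = sorted(
            zip(self.X, self.y),
            key=lambda pair: math.dist(pair[0], x),
        )[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 5.0)]
y = ["a", "a", "a", "b", "b", "b"]

model = KNNClassifier(k=3).fit(X, y)
print(model.predict((1.2, 1.4)))  # a
print(model.predict((6.8, 6.1)))  # b
print(model.X == X)               # True: the training points are stored verbatim
```

Because the model is the data, its size grows with the training set, and deleting the stored points would leave nothing to predict with.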
Overfitting:
An overfitted model may “memorize” the training data by encoding too many of its details, including noise, which leads to poor generalization on new data. Even then, the memorized details are encoded in the weights or parameters rather than kept as a copy of the training data, though in some cases individual examples can be recovered from the model.
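
Overfitting can be shown without any library. The toy example below uses hypothetical data values and helper names: it fits a polynomial with as many coefficients as there are training points, so it reproduces every training target, noise included, yet predicts badly just outside the data. The memorized information lives entirely in the coefficients.

```python
def solve(A, b):
    """Solve the square system A @ c = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (M[r][n] - s) / M[r][r]
    return coeffs

def evaluate(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2 + ... using Horner's method."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

# Hypothetical noisy samples of the trend y = x.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.3, 1.8, 3.4, 3.7, 5.5, 5.9]

# Degree len(xs) - 1: one coefficient per training point, so the polynomial
# can pass through every point exactly (noise included).
vandermonde = [[float(x) ** p for p in range(len(xs))] for x in xs]
coeffs = solve(vandermonde, ys)

train_errors = [abs(evaluate(coeffs, x) - y) for x, y in zip(xs, ys)]
print("max training error:", max(train_errors))            # effectively 0
print("coefficients:", [round(c, 3) for c in coeffs])      # derived from, not a copy of, the data
print("prediction at x=7:", round(evaluate(coeffs, 7), 2))  # -14.6, though the trend suggests about 7
```

Zero training error combined with a wildly wrong prediction one step outside the data is the signature of memorization rather than generalization.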

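In summary, most models do not store the training data itself but learn parameters and representations from it. The main exceptions are instance-based models like KNN, which keep the entire training set, and kernel SVMs, which keep the training points that serve as support vectors.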