OVERFITTING AND UNDERFITTING
Overfitting
- Overfitting is a modeling error which occurs when a model learns its training data too closely, including the noise in it. An overfitted model performs well on its training data but poorly on new, unseen data.
Handling Overfitting:
- Cross-validation
- This is done by splitting your dataset into ‘train’ data and ‘test’ data. Build the model using the ‘train’ set; the held-out ‘test’ set is then used for validation on data the model has never seen. Because you know the expected outputs for the test set, you can easily judge how well your model generalizes. Full k-fold cross-validation repeats this split so that every point is used for validation once (see the sketch after this list).
- Regularization
- This is a form of regression that regularizes, or shrinks, the coefficient estimates towards zero. This technique discourages learning a more complex model; common variants are L1 (lasso) and L2 (ridge) regularization (see the sketch after this list).
- Early stopping
- When training a learner with an iterative method, you stop the training process before the model has fully converged on the training set, typically as soon as the error on a held-out validation set stops improving. This prevents the model from memorizing the training data (see the sketch after this list).
- Pruning
- This technique applies to decision trees.
- Pre-pruning: stop ‘growing’ the tree early, before it perfectly classifies the training set.
- Post-pruning: allow the tree to ‘grow’ and perfectly classify the training set, then prune it back (see the sketch after this list).
- Dropout
- This is a technique for neural networks where randomly selected neurons are ignored during training, so the network cannot come to rely too heavily on any single neuron (see the sketch after this list).
- Regularize the weights (a direct application of the regularization idea above).
- Remove irrelevant input features.
- Remove outliers or anomalies (see the sketch after this list).
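Below are short, self-contained sketches of each technique, assuming scikit-learn (and, for dropout, Keras) is available; the datasets and model choices are stand-ins for illustration, not part of any particular recipe. First, hold-out validation and k-fold cross-validation:

    # Hold-out split plus 5-fold cross-validation with scikit-learn.
    # The Iris dataset and logistic regression are placeholder choices.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score, train_test_split

    X, y = load_iris(return_X_y=True)

    # Hold-out: train on 80% of the data, judge accuracy on the unseen 20%.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("hold-out accuracy:", model.score(X_test, y_test))

    # 5-fold cross-validation: every point is used for validation exactly once.
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
    print("cross-validation accuracy:", scores.mean())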
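Next, regularization, shown here as L2 (ridge) regression; the synthetic data is a made-up example where only the first feature actually matters:

    # L2 (ridge) regularization shrinks coefficients towards zero.
    # alpha controls the strength; a larger alpha means a simpler model.
    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))
    y = X[:, 0] + 0.1 * rng.normal(size=50)  # only feature 0 is relevant

    plain = LinearRegression().fit(X, y)
    ridge = Ridge(alpha=10.0).fit(X, y)

    # The ridge coefficients are pulled towards zero relative to plain OLS.
    print("OLS coefficients:  ", np.abs(plain.coef_).round(3))
    print("Ridge coefficients:", np.abs(ridge.coef_).round(3))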
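Early stopping, sketched with scikit-learn's SGDClassifier, which can hold out part of the training data internally and stop once the validation score stagnates:

    # Early stopping: training halts once the score on an internal
    # validation split stops improving for n_iter_no_change epochs.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import SGDClassifier

    X, y = load_iris(return_X_y=True)
    model = SGDClassifier(
        early_stopping=True,      # hold out part of the training data
        validation_fraction=0.2,  # size of that internal validation set
        n_iter_no_change=5,       # patience before stopping
        max_iter=1000,
        random_state=0,
    ).fit(X, y)
    print("stopped after", model.n_iter_, "of 1000 possible iterations")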
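Pre- and post-pruning of a decision tree; in scikit-learn, pre-pruning maps to growth limits such as max_depth, and post-pruning to minimal cost-complexity pruning via ccp_alpha:

    # Pruning a decision tree with scikit-learn.
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Pre-pruning: cap the depth so the tree stops growing early.
    pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X, y)

    # Post-pruning: grow the tree fully, then prune it back with
    # cost-complexity pruning (a larger ccp_alpha prunes more).
    post = DecisionTreeClassifier(ccp_alpha=0.02).fit(X, y)

    full = DecisionTreeClassifier().fit(X, y)
    print("unpruned leaves:   ", full.get_n_leaves())
    print("pre-pruned leaves: ", pre.get_n_leaves())
    print("post-pruned leaves:", post.get_n_leaves())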
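Dropout, sketched with Keras (assuming TensorFlow is installed); the layer sizes and the 20-feature input are arbitrary illustration choices:

    # Each Dropout layer randomly zeroes 50% of the previous layer's
    # activations during training only; at inference all neurons are used.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(20,)),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),  # ignore half the neurons each training step
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.summary()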
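Finally, a simple z-score rule for removing outliers; the cut-off of 2 standard deviations is an arbitrary choice for this toy example:

    # Drop values lying more than 2 standard deviations from the mean.
    import numpy as np

    X = np.array([1.0, 1.2, 0.9, 1.1, 15.0, 1.05, 0.95])  # 15.0 is an outlier
    z = np.abs((X - X.mean()) / X.std())
    print("kept:", X[z < 2])  # 15.0 is removed, the rest survive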
Underfitting
- Underfitting is a modeling error which occurs when a function does not fit the data points well enough. It is the result of a model that is too simple, or of an insufficient number of training points. A model that is underfit is inaccurate because the trend it captures does not reflect the reality of the data.
Handling Underfitting:
- Get more training data.
- Increase the size or number of parameters in the model.
- Increase the complexity of the model, for example by adding features or using a more flexible model class (see the sketch below).
- Increase the training time, until the cost function is minimized.
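As a minimal sketch of the ‘increase complexity’ fix, assuming scikit-learn: a straight line underfits quadratic data, while adding polynomial features gives the model enough capacity to fit it:

    # A linear model underfits quadratic data; polynomial features fix it.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(100, 1))
    y = X[:, 0] ** 2 + 0.1 * rng.normal(size=100)  # quadratic relationship

    simple = LinearRegression().fit(X, y)
    richer = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

    print("linear model R^2:   ", round(simple.score(X, y), 3))  # underfits
    print("quadratic model R^2:", round(richer.score(X, y), 3))  # fits well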
With these techniques, you should be able to improve your models and correct any overfitting or underfitting issues.