
How to prevent overfitting and underfitting

       OVERFITTING AND UNDERFITTING

Overfitting

  • An overfitted model performs well on its training data but poorly on new, unseen data.
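
As a rough illustration of this behaviour, the sketch below uses a deliberately memorizing model (a 1-nearest-neighbour lookup) on synthetic data. The trend y = 2x + 1, the noise level, and every function name are assumptions made up for this example; they do not come from any particular library.

```python
import random

def make_data(n, rng):
    """Noisy samples of the assumed trend y = 2x + 1."""
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        data.append((x, 2 * x + 1 + rng.gauss(0, 2)))
    return data

def nearest_neighbor_predict(train, x):
    """Memorizing model: return the label of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def mse(predict, data):
    """Mean squared error of a prediction function on (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_data(30, rng)
test = make_data(30, rng)

def predict(x):
    return nearest_neighbor_predict(train, x)

train_mse = mse(predict, train)  # every training point is its own neighbour
test_mse = mse(predict, test)    # unseen points inherit the memorized noise
print(f"train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

The model reproduces its training labels exactly, so the training error says nothing about how it behaves on new data. Only the test error reveals the problem.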

    Handling Overfitting:

    • Cross-validation
      • Split your dataset into ‘train’ data and ‘test’ data. Build the model using the ‘train’ set and evaluate it on the held-out ‘test’ set. In k-fold cross-validation, this split is repeated so that every part of the data serves as the ‘test’ set once. Because you know the expected output for the ‘test’ data, you can judge how accurately the model handles data it was not trained on.
    • Regularization
      • This technique adds a penalty on large coefficients, which regularizes or shrinks the coefficient estimates towards zero. It discourages learning an overly complex model.
    • Early stopping 
      • When training a learner with an iterative method, you stop the training process before the final iteration, typically once performance on a validation set stops improving. This prevents the model from memorizing the training data.
    • Pruning 
      • This technique applies to decision trees. 
      • Pre-pruning: Stop ‘growing’ the tree early, before it perfectly classifies the training set.
      • Post-pruning: Allow the tree to ‘grow’ until it perfectly classifies the training set, and then prune it back.
    • Dropout 
      • This is a technique for neural networks in which randomly selected neurons are ignored during training.
    • Regularize the weights. 
    • Remove irrelevant input features.
    • Remove outliers or anomalies.
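
Two of the techniques above, cross-validation and regularization, can be sketched together in a few lines of plain Python. This is a minimal sketch, not a reference implementation: the one-feature ridge model without an intercept, the synthetic data (true slope 3), and the helper names `k_fold_indices`, `fit_ridge` and `cv_mse` are all assumptions chosen for illustration.

```python
import random

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n))
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def fit_ridge(xs, ys, lam):
    """One-feature ridge regression without intercept (closed form).

    A larger penalty `lam` shrinks the coefficient towards zero.
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_mse(xs, ys, lam, k=5):
    """Average held-out MSE over k folds for penalty strength `lam`."""
    fold_errors = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        w = fit_ridge([xs[i] for i in train_idx], [ys[i] for i in train_idx], lam)
        errors = [(w * xs[i] - ys[i]) ** 2 for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k

# Synthetic data (assumption): y = 3x plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(40)]
ys = [3 * x + rng.gauss(0, 0.5) for x in xs]

for lam in (0.0, 1.0, 10.0, 100.0):
    w = fit_ridge(xs, ys, lam)
    print(f"lambda={lam:>5}: w={w:.3f}, cv_mse={cv_mse(xs, ys, lam):.3f}")
```

The samples here are already in random order, so the folds are taken as contiguous slices. With real data you would normally shuffle first. Comparing the cross-validated error across penalty values is one common way to choose how strongly to regularize.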

Underfitting

  • Underfitting is a modeling error that occurs when a function does not fit the data points well enough. It typically results from a model that is too simple or that has been trained on too few data points. An underfitted model is inaccurate because the trend it learns does not reflect the reality of the data, so it performs poorly even on its training data.
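
To make this concrete, here is a small sketch that compares a too-simple model (always predict the mean) with a straight-line fit on synthetic data. The trend y = 2x + 1, the noise level, and the helper names are assumptions invented for this example.

```python
import random

def make_data(n, rng):
    """Noisy samples of the assumed trend y = 2x + 1."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
    return xs, ys

def fit_constant(xs, ys):
    """Too-simple model: ignore x and always predict the mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Least-squares straight line, which matches the shape of the trend."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(50, rng)
test_x, test_y = make_data(50, rng)

constant = fit_constant(train_x, train_y)
line = fit_line(train_x, train_y)
for name, model in (("constant", constant), ("line", line)):
    print(f"{name:>8}: train MSE={mse(model, train_x, train_y):.2f}, "
          f"test MSE={mse(model, test_x, test_y):.2f}")
```

The constant model has a large error on both the training and the test data, which is the usual sign of underfitting. An overfitted model, by contrast, looks good on the training data only.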

    Handling Underfitting:

    • Get more training data.
    • Increase the size or number of parameters in the model.
    • Increase the complexity of the model.
    • Increase the training time until the cost function is minimized.

With these techniques, you should be able to improve your models and correct any overfitting or underfitting issues.
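
Two of the remedies for underfitting, increasing model complexity and training for longer, can be sketched with plain gradient descent. This is only an illustration under stated assumptions: the target y = x² on a fixed grid, the learning rate of 0.1, and the helper names `features`, `cost` and `train` are all hypothetical choices for this example.

```python
def features(x, degree):
    """Polynomial features [1, x, ..., x**degree]; a higher degree adds capacity."""
    return [x ** d for d in range(degree + 1)]

def cost(weights, xs, ys, degree):
    """Mean squared error of the polynomial model on (xs, ys)."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(w * f for w, f in zip(weights, features(x, degree)))
        total += (pred - y) ** 2
    return total / len(xs)

def train(xs, ys, degree, steps, lr=0.1):
    """Batch gradient descent on the MSE cost, starting from zero weights."""
    weights = [0.0] * (degree + 1)
    n = len(xs)
    for _ in range(steps):
        grads = [0.0] * (degree + 1)
        for x, y in zip(xs, ys):
            feats = features(x, degree)
            error = sum(w * f for w, f in zip(weights, feats)) - y
            for j, f in enumerate(feats):
                grads[j] += 2 * error * f / n
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights

# Assumed target: a noise-free parabola sampled on a grid from -1 to 1.
xs = [i / 10 for i in range(-10, 11)]
ys = [x ** 2 for x in xs]

for degree, steps in ((1, 1000), (2, 5), (2, 1000)):
    w = train(xs, ys, degree, steps)
    print(f"degree={degree}, steps={steps:>4}: cost={cost(w, xs, ys, degree):.5f}")
```

The straight line (degree 1) stays at a high cost no matter how long it trains, because it cannot represent the curve. The degree-2 model can, but only reaches a low cost once it has been trained for enough steps.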





