
Statistics Introduction

 

  •  Statistics is the study of the collection, analysis, interpretation, presentation, and organisation of data.
  •  It is a way to understand data and find the patterns in it.
 

 Terminologies in Statistics:


  •  A population is the complete set containing every event (observation) in an experiment.
  •  A parameter is a characteristic of the population, such as the population mean or median.
  •  A sample is a subset of the population.
  •  A statistic is a characteristic of a sample, such as the sample mean or median.
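
The four terms above can be illustrated with a minimal sketch. The population here is made up (random ages), and the sample size of 50 is an arbitrary choice for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 1,000 people (made-up values).
population = [random.randint(18, 65) for _ in range(1000)]

# Parameter: a characteristic computed from the whole population.
population_mean = statistics.mean(population)

# Sample: a subset of the population, drawn without replacement.
sample = random.sample(population, 50)

# Statistic: the same characteristic computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
```

The two means are usually close but not identical, which is why a statistic is treated as an estimate of the parameter.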
 

Types of Data

 

 
 

 Numerical or Quantitative

  • Quantitative variables are expressed in numerical terms.
  • Example: price, income, etc.
  • There are two types of numerical data.

        Continuous Data Type:

  • A continuous data set is a quantitative data set representing a scale of measurement that can consist of numbers other than whole numbers, like decimals and fractions. 
  • Example: Height, weight, length, temperature.

        Discrete Data Type:

  • Discrete data is based on counts, so only distinct, separate values are possible.
  • Consecutive values are separated by a constant interval.
  • Example: the number of children, where the interval is 1 because a value such as 1.5 is not possible.
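
As a rough sketch of the difference, the check below tests whether a list holds only non-negative whole numbers, which is what count data looks like. The values and the helper name `is_count_data` are hypothetical, chosen only for illustration.

```python
# Hypothetical measurements, for illustration only.
heights_cm = [162.5, 170.25, 158.0]   # continuous: any value on the scale is possible
num_children = [0, 2, 1, 3]           # discrete: counts that move in steps of 1


def is_count_data(values):
    """Return True if every value is a non-negative whole number."""
    return all(float(v).is_integer() and v >= 0 for v in values)


print(is_count_data(num_children))  # True
print(is_count_data(heights_cm))    # False, because 162.5 is not a whole number
```

This only shows how the values were recorded. A continuous quantity that has been rounded to whole numbers would also pass the check, so the final decision still depends on what the variable measures.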
 

Categorical or Qualitative

  • Qualitative variables represent characteristics that can't be expressed in numerical terms.
  • Example: marital status, etc.
  • There are three types of categorical data.

        Nominal Data Type:

  • Nominal data has values with no specific order.
  • Example: names, or items such as TV and fan.

        Ordinal Data Type:

  • Ordinal data is categorical data in which the categories are ordered.
  • Example: education scoring class (Fail, Pass, Second Class, First Class, Distinction), age group (Young, Middle Age, Old Age), etc.

        Binary Data Type

  • Binary data is an important special case of categorical data that takes only one of two values.
  • Example: 0/1, yes/no, accept/reject.
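
One way to record the difference between nominal and ordinal data in code is pandas' `Categorical` type, sketched below. The category values are hypothetical; using pandas here is an assumption based on the `data.info()` call later in this post.

```python
import pandas as pd

# Ordinal: categories with a meaningful order (hypothetical age groups).
age_group = pd.Categorical(
    ["Young", "Old", "Middle", "Young"],
    categories=["Young", "Middle", "Old"],  # listed from lowest to highest
    ordered=True,
)

# Nominal: categories with no order (hypothetical product names).
product = pd.Categorical(["TV", "Fan", "TV", "Radio"])

# Order-based operations are only meaningful for ordinal data.
print(age_group.min(), age_group.max())

# No order is defined for the nominal column.
print(product.ordered)  # False
```

Binary data can be stored the same way with exactly two categories, for example "Yes" and "No".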




Here is a small example.

Consider a data frame with three records and the columns Name, Degree, Gender, Performance, Experience and Promotion.

The Name column is an example of nominal data because there is no specific order among names.
The Degree column is an example of ordinal data because each degree requires the previous qualification to be completed, which gives the values a natural order.
The Gender column is an example of binary data because here it takes one of two values, male or female.
The Performance column is an example of ordinal data.
The Experience column is an example of discrete data because experience is recorded here as integers, with no floating-point values.
The Promotion column is an example of binary data.
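
A data frame of this shape can be sketched with pandas as follows. The column names come from the description above, but every value in the three records is hypothetical because the original records are not reproduced in the text.

```python
import pandas as pd

# Hypothetical records: only the column names come from the description above.
data = pd.DataFrame(
    {
        "Name": ["Asha", "Ravi", "Meena"],               # nominal
        "Degree": ["Bachelor", "Master", "PhD"],         # ordinal
        "Gender": ["Female", "Male", "Female"],          # binary
        "Performance": ["Good", "Average", "Excellent"], # ordinal
        "Experience": [2, 5, 8],                         # discrete (whole years)
        "Promotion": ["No", "Yes", "Yes"],               # binary
    }
)

print(data)
```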




Here, "data" is the variable in which the data frame is stored.
data.info()

It prints a summary of the data frame. The Dtype column shows the data type of each column: int64 and float64 indicate a numerical column, while the object Dtype usually indicates a categorical (text) column.
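
The dtype check can also be done programmatically. The sketch below uses a small hypothetical frame and pandas' `select_dtypes` to split columns into numerical and non-numerical groups.

```python
import pandas as pd

# Hypothetical frame with one numerical and two text columns.
data = pd.DataFrame(
    {
        "Name": ["Asha", "Ravi", "Meena"],
        "Experience": [2, 5, 8],
        "Promotion": ["No", "Yes", "Yes"],
    }
)

data.info()  # prints column names, non-null counts and dtypes

# Columns with a numeric dtype (int64, float64, ...).
numeric_cols = data.select_dtypes(include="number").columns.tolist()

# Everything else is treated here as categorical.
categorical_cols = data.select_dtypes(exclude="number").columns.tolist()

print("Numerical:  ", numeric_cols)
print("Categorical:", categorical_cols)
```

The dtype is only a hint. A binary column coded as 0/1 would be reported as int64 even though it is categorical, so the result should be checked against what each column means.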

Why are data types important?

Data types are an important concept because statistical analysis treats continuous data differently from categorical data, and applying the wrong method results in a wrong analysis. Knowing the types of data you are dealing with therefore enables you to choose the correct method of analysis.
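
As a small illustration of choosing the method by data type, the sketch below summarises a numerical variable with its mean and a categorical variable with counts and its mode. All values are hypothetical.

```python
import statistics
from collections import Counter

# Hypothetical values, for illustration only.
experience_years = [2, 5, 8, 5]                                         # numerical
marital_status = ["Single", "Married", "Married", "Single", "Married"]  # categorical

# Numerical data: an average is meaningful.
mean_experience = statistics.mean(experience_years)

# Categorical data: an average is not defined, so count the categories
# and report the most frequent one instead.
status_counts = Counter(marital_status)
most_common_status = statistics.mode(marital_status)

print("Mean experience:", mean_experience)
print("Status counts:", dict(status_counts))
print("Most common status:", most_common_status)
```

Calling `statistics.mean` on the marital status list would raise an error, which is the kind of mismatch that knowing the data type helps to avoid.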


Two types of statistics:

Descriptive Statistics 

  • In descriptive statistics you describe, present, summarize and organize your data.
  • It gives basic information about the data, which helps you proceed with further analysis.

Inferential Statistics

  •  It is about using data from a sample to make inferences about the larger population from which the sample is drawn.
  •  The goal of inferential statistics is to draw conclusions from a sample and generalize them to the population.
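
The contrast between the two can be sketched on one made-up sample. The descriptive part summarises the sample itself, while the inferential part uses it to estimate a range for the population mean. The confidence interval below uses a normal approximation, which is an assumption made for simplicity; for small samples a t-distribution is usually preferred.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical sample: 40 heights in cm drawn from a made-up population.
sample = [random.gauss(170, 10) for _ in range(40)]

# Descriptive statistics: summarise the sample itself.
n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)
print(f"n={n}, mean={sample_mean:.2f}, sd={sample_sd:.2f}")

# Inferential statistics: approximate 95% confidence interval
# for the population mean (normal approximation).
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
margin = z * sample_sd / math.sqrt(n)
ci_low, ci_high = sample_mean - margin, sample_mean + margin
print(f"Approx. 95% CI for the population mean: ({ci_low:.2f}, {ci_high:.2f})")
```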
    
Descriptive Statistics is covered in Part 2.



