How to get descriptive statistics of a Pandas DataFrame?

This recipe helps you get descriptive statistics of a Pandas DataFrame

Recipe Objective

Before building a model, we need to analyse the data, and for that we need to calculate different statistics of its features.

This data science Python source code does the following:
1. Creates a data dictionary and converts it into a pandas DataFrame
2. Uses the describe function on a DataFrame column
3. Performs statistical analysis on the dataset

So this is the recipe on how we can get descriptive statistics of a Pandas DataFrame.


Step 1 - Import the library

import pandas as pd

We have imported pandas, which will be needed to build the dataset.

Step 2 - Setting up the Data

We have created a dictionary of data and passed it to pd.DataFrame to make a DataFrame with the columns 'first_name', 'last_name', 'age', 'Comedy_Score' and 'Rating_Score'.

raw_data = {'first_name': ['Sheldon', 'Raj', 'Leonard', 'Howard', 'Amy'],
            'last_name': ['Copper', 'Koothrappali', 'Hofstadter', 'Wolowitz', 'Fowler'],
            'age': [42, 38, 36, 41, 35],
            'Comedy_Score': [9, 7, 8, 8, 5],
            'Rating_Score': [25, 25, 49, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'Comedy_Score', 'Rating_Score'])
print(df)
# df.info() prints its summary itself and returns None, so print() also shows "None"
print(df.info())

Step 3 - Finding different statistics

Now we will compute different statistics of the features.

    • First, sum of all the ages

print(df['age'].sum())

    • Mean of Rating_Score

print(df['Rating_Score'].mean())

    • Cumulative sum of Rating_Score

print(df['Rating_Score'].cumsum())

    • Summary statistics on Rating_Score

print(df['Rating_Score'].describe())

    • Counting the number of non-NA values

print(df['Rating_Score'].count())

    • Minimum value of Rating_Score

print(df['Rating_Score'].min())

    • Maximum value of Rating_Score

print(df['Rating_Score'].max())

    • Median value of Rating_Score

print(df['Rating_Score'].median())

    • Sample variance of Rating_Score values

print(df['Rating_Score'].var())

    • Sample standard deviation of Rating_Score values

print(df['Rating_Score'].std())

    • Skewness of Rating_Score values

print(df['Rating_Score'].skew())

    • Kurtosis of Rating_Score values

print(df['Rating_Score'].kurt())

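    • Correlation matrix of the numeric columns (the text columns are dropped first, because pandas 2.0 and later raise an error when corr or cov is called on non-numeric data)

print(df.select_dtypes(include='number').corr())

    • Finally, covariance matrix of the numeric columns

print(df.select_dtypes(include='number').cov())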
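The variance and standard deviation in the list are described as "sample" statistics. As a minimal sketch of what that means, assuming the pandas default of ddof=1 (division by n - 1 rather than n), the values can be reproduced by hand. The variable names here are illustrative and not part of the original recipe.

```python
import pandas as pd

# Same Rating_Score values as in the recipe's DataFrame.
scores = pd.Series([25, 25, 49, 62, 70], name='Rating_Score')

n = scores.count()
mean = scores.mean()

# Sum of squared deviations from the mean.
squared_dev = ((scores - mean) ** 2).sum()

sample_var = squared_dev / (n - 1)   # matches scores.var(), which uses ddof=1
population_var = squared_dev / n     # matches scores.var(ddof=0)

print(sample_var, scores.var())
print(population_var, scores.var(ddof=0))
print(sample_var ** 0.5, scores.std())
```

Passing ddof=0 to var or std gives the population version, which is the default in NumPy, so the two libraries can disagree on the same data unless ddof is set explicitly.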

The print statements in Steps 2 and 3 produce the following output, in order:

 first_name     last_name  age  Comedy_Score  Rating_Score
0    Sheldon        Copper   42             9            25
1        Raj  Koothrappali   38             7            25
2    Leonard    Hofstadter   36             8            49
3     Howard      Wolowitz   41             8            62
4        Amy        Fowler   35             5            70

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
first_name      5 non-null object
last_name       5 non-null object
age             5 non-null int64
Comedy_Score    5 non-null int64
Rating_Score    5 non-null int64
dtypes: int64(3), object(2)
memory usage: 280.0+ bytes
None

192

46.2

0     25
1     50
2     99
3    161
4    231
Name: Rating_Score, dtype: int64

count     5.000000
mean     46.200000
std      20.753313
min      25.000000
25%      25.000000
50%      49.000000
75%      62.000000
max      70.000000
Name: Rating_Score, dtype: float64

5

25

70

49.0

430.7

20.7533129885327

-0.07499061439128718

-2.6952969741807777

                   age  Comedy_Score  Rating_Score
age           1.000000      0.767579     -0.451895
Comedy_Score  0.767579      1.000000     -0.567136
Rating_Score -0.451895     -0.567136      1.000000

                age  Comedy_Score  Rating_Score
age            9.30          3.55        -28.60
Comedy_Score   3.55          2.30        -17.85
Rating_Score -28.60        -17.85        430.70
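The recipe calls describe on a single column. As a hedged sketch, the same method can also be called on the whole DataFrame, and agg can collect several of the individual statistics in one call. The choice of statistics passed to agg below is only an illustration.

```python
import pandas as pd

raw_data = {'first_name': ['Sheldon', 'Raj', 'Leonard', 'Howard', 'Amy'],
            'last_name': ['Copper', 'Koothrappali', 'Hofstadter', 'Wolowitz', 'Fowler'],
            'age': [42, 38, 36, 41, 35],
            'Comedy_Score': [9, 7, 8, 8, 5],
            'Rating_Score': [25, 25, 49, 62, 70]}
df = pd.DataFrame(raw_data)

# By default, describe() summarises only the numeric columns.
summary = df.describe()
print(summary)

# include='all' also reports count, unique, top and freq for the text columns.
print(df.describe(include='all'))

# agg() computes a chosen set of statistics for several columns at once.
numeric_cols = ['age', 'Comedy_Score', 'Rating_Score']
stats = df[numeric_cols].agg(['sum', 'mean', 'median', 'min', 'max'])
print(stats)
```

Each result is itself a DataFrame, so a single value can be read with, for example, summary.loc['mean', 'Rating_Score'].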
