EDA on Google Playstore apps

Sanket Chavan
4 min read · Feb 12, 2021

To gain insights from the variety of apps available on Google’s Playstore

Photo by Luke Chesser on Unsplash

Exploratory Data Analysis refers to the critical process of performing initial investigations on data so as to discover patterns, to spot anomalies, to test hypothesis and to check assumptions with the help of summary statistics and graphical representations. — Prasad Patil

The dataset we worked on is available here.

You can find the Google Colab link here.

I used Google Colaboratory (Colab) because it is simple and provides a variety of features. The code runs in the cloud, so it saves memory on your own machine. Writing Markdown text is very easy. If you run into an error while executing the code, Colab offers a button that searches Stack Overflow for it, which helps you fix the problem. You can share the project link with anyone and set their permissions accordingly. The essential libraries for data analysis and model building come preinstalled.

The Process:

  • In this analysis, we will explore the apps available on the Google Playstore.
  • We will use the pandas, numpy, matplotlib and seaborn libraries.
  • We will clean and prepare the data. This includes identifying the variables and converting values to the right types.
  • At the end, we will ask and answer a few questions to gain more insights.

Importing the required libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Before we start exploring the data, we will run a few functions to get to know it better.
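The article does not list the inspection calls it used, so here is a minimal sketch of a typical first look. The small DataFrame and its values are made up for illustration, and the CSV file name in the comment is an assumption; only the column names come from the dataset described in this post.

```python
import pandas as pd

# Hypothetical stand-in for the real data. In the notebook you would load the
# CSV instead, e.g. df = pd.read_csv("googleplaystore.csv") (file name assumed).
df = pd.DataFrame({
    "App": ["Photo Editor", "Sketch Pad", "Sketch Pad", "Word Game"],
    "Category": ["ART_AND_DESIGN", "ART_AND_DESIGN", "ART_AND_DESIGN", "GAME"],
    "Rating": [4.1, 4.5, 4.5, None],
    "Reviews": ["159", "215644", "215644", "967"],
    "Installs": ["10,000+", "50,000,000+", "50,000,000+", "100,000+"],
})

print(df.head())      # first few rows
print(df.shape)       # (number of rows, number of columns)
df.info()             # column dtypes and non-null counts
print(df.describe())  # summary statistics for the numeric columns
```

Notice that columns such as Reviews and Installs show up as text here, which is why they need converting before any numeric analysis.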

The dataset has some null and duplicate values. We will get rid of them using the following pandas code:

df.dropna(inplace=True)
df.drop_duplicates(subset=['App'], inplace=True)
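Before dropping anything, it can help to count how many nulls and duplicates there actually are. The sketch below does that on a tiny made-up DataFrame (the app names and ratings are hypothetical) and then applies the same two cleaning steps without inplace, so the original frame stays available for comparison.

```python
import pandas as pd

# Made-up sample with one missing rating and one repeated app name
df = pd.DataFrame({
    "App": ["Photo Editor", "Sketch Pad", "Sketch Pad", "Word Game"],
    "Rating": [4.1, 4.5, 4.5, None],
})

null_counts = df.isnull().sum()                        # missing values per column
duplicate_apps = df.duplicated(subset=["App"]).sum()   # rows repeating an app name

# Same cleaning as in the article, returning a new DataFrame instead
clean = df.dropna().drop_duplicates(subset=["App"])

print(null_counts)
print(duplicate_apps, len(clean))
```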

Variable identification:

  • Categorical variables
  1. Category
  2. Type
  3. Content Rating
  4. Genres
  5. Last Updated
  • Numerical variables
  1. Rating
  2. Reviews
  3. Installs
  4. Price

Data cleaning and preparations

To clean and prepare the data, we will use the following important functions:

  • apply(): to apply a function to every value in a column
  • df[column].value_counts(): to count how many rows fall into each category
  • pd.to_numeric(): to convert objects or strings to integers or floats
  • max(), min(), mean() etc.: to summarise the numerical columns
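To make these functions concrete, here is a rough sketch of how they might fit together. It assumes the raw Installs values look like "10,000+" and the raw Price values look like "$4.99"; the sample rows and the helper name strip_symbols are hypothetical, not taken from the notebook.

```python
import pandas as pd

# Made-up rows in the assumed raw format
df = pd.DataFrame({
    "App": ["A", "B", "C"],
    "Type": ["Free", "Paid", "Free"],
    "Reviews": ["159", "967", "87510"],
    "Installs": ["10,000+", "500+", "1,000,000+"],
    "Price": ["0", "$4.99", "0"],
})

def strip_symbols(value):
    """Remove '+', ',' and '$' so the text can be parsed as a number."""
    return value.replace("+", "").replace(",", "").replace("$", "")

# apply() cleans each string, pd.to_numeric() converts the result
for column in ["Installs", "Price"]:
    df[column] = pd.to_numeric(df[column].apply(strip_symbols))
df["Reviews"] = pd.to_numeric(df["Reviews"])

print(df["Type"].value_counts())                  # rows per category of Type
print(df["Installs"].max(), df["Price"].mean())   # simple summaries
```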

Exploring the data

We will explore each variable separately and plot graphs wherever they help us gain insights.
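As one illustration of exploring a single variable, the sketch below counts the apps per category and draws a bar chart. The category values are made up, and plain matplotlib is used here to keep the example small; the notebook itself may well have used seaborn for its plots.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample of the Category column
df = pd.DataFrame({
    "Category": ["FAMILY", "GAME", "FAMILY", "TOOLS", "FAMILY", "GAME"],
})

counts = df["Category"].value_counts()   # sorted from most to least common

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(counts.index, counts.values)
ax.set_xlabel("Category")
ax.set_ylabel("Number of apps")
ax.set_title("Apps per category (sample data)")
fig.tight_layout()
# plt.show()  # uncomment when running outside a notebook
```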

Asking and answering questions:

We will ask some questions and try to answer them by using various pandas functions, plotting graphs and interpreting them.

Which are the most reviewed categories among free and paid apps?
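One possible way to answer this, assuming Reviews has already been converted to numbers, is to total the reviews per Type and Category and keep the top category for each Type. The rows below are invented for illustration.

```python
import pandas as pd

# Made-up rows; Reviews is assumed to be numeric already
df = pd.DataFrame({
    "Category": ["SOCIAL", "GAME", "GAME", "FAMILY", "GAME"],
    "Type": ["Free", "Free", "Paid", "Paid", "Paid"],
    "Reviews": [78000, 45000, 1200, 3000, 900],
})

# Total reviews for every (Type, Category) pair
totals = df.groupby(["Type", "Category"], as_index=False)["Reviews"].sum()

# Sort by review total, then keep the first row within each Type
top = totals.sort_values("Reviews", ascending=False).groupby("Type").head(1)
print(top)
```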

Out of the paid apps, how many are priced at the lowest price?
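A minimal sketch of this count, assuming Price is already numeric: filter the paid apps, find the lowest price among them, and count the apps at that price. The sample values are hypothetical.

```python
import pandas as pd

# Made-up rows; Price is assumed to be numeric already
df = pd.DataFrame({
    "App": ["A", "B", "C", "D", "E"],
    "Type": ["Free", "Paid", "Paid", "Paid", "Free"],
    "Price": [0.0, 0.99, 0.99, 4.99, 0.0],
})

paid = df[df["Type"] == "Paid"]
lowest_price = paid["Price"].min()
cheapest = paid[paid["Price"] == lowest_price]   # every app at the lowest price

print(lowest_price, len(cheapest))
```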

Which category has the highest number of paid apps?

The category ‘Family’ has the highest number of paid apps.
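A count like this can be obtained with value_counts() on the paid subset. The sketch below uses a few invented rows, so its output only demonstrates the technique and is not the real result.

```python
import pandas as pd

# Made-up rows for illustration
df = pd.DataFrame({
    "Category": ["FAMILY", "FAMILY", "GAME", "MEDICAL", "FAMILY"],
    "Type": ["Paid", "Paid", "Paid", "Free", "Free"],
})

# Number of paid apps in each category, most common first
paid_per_category = df[df["Type"] == "Paid"]["Category"].value_counts()

print(paid_per_category)
print(paid_per_category.idxmax())   # category with the most paid apps
```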

Which paid app has the highest number of installs?
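Assuming Installs has been converted to a number, idxmax() on the paid subset picks out the row with the most installs. The app names and figures here are hypothetical.

```python
import pandas as pd

# Made-up rows; Installs is assumed to be numeric already
df = pd.DataFrame({
    "App": ["Block Builder", "Weather Pro", "Free Chat"],
    "Type": ["Paid", "Paid", "Free"],
    "Installs": [10000000, 1000000, 500000000],
})

paid = df[df["Type"] == "Paid"]
top_paid = paid.loc[paid["Installs"].idxmax()]   # row of the most installed paid app

print(top_paid["App"], top_paid["Installs"])
```

If several paid apps tie for the highest install count, idxmax() returns only the first one, so a filter on the maximum value would be needed to list them all.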

Which apps, and which categories, have device-dependent sizes?
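This sketch assumes that device-dependent sizes are stored in the Size column as the text "Varies with device"; if the data encodes them differently, the filter value needs to change. The sample rows are made up.

```python
import pandas as pd

# Made-up rows; the "Varies with device" marker is an assumption about the data
df = pd.DataFrame({
    "App": ["Maps Helper", "Chat Lite", "Puzzle Box"],
    "Category": ["TRAVEL_AND_LOCAL", "COMMUNICATION", "GAME"],
    "Size": ["Varies with device", "Varies with device", "19M"],
})

varies = df[df["Size"] == "Varies with device"]

print(varies[["App", "Category"]])          # the apps and their categories
print(varies["Category"].value_counts())    # how many such apps per category
```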

Which genres have the highest-rated apps?
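One reading of this question is "which genres have the highest average rating". Under that assumption, a groupby on Genres followed by mean() gives a ranking. The ratings below are invented.

```python
import pandas as pd

# Made-up rows for illustration
df = pd.DataFrame({
    "Genres": ["Tools", "Education", "Tools", "Education", "Puzzle"],
    "Rating": [4.0, 4.6, 4.2, 4.8, 4.4],
})

# Average rating per genre, best first
genre_rating = df.groupby("Genres")["Rating"].mean().sort_values(ascending=False)

print(genre_rating.head(3))
```

Genres with very few apps can top such a ranking by chance, so it may be worth checking the app count per genre alongside the mean.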

Which apps have the highest number of ratings by Teens?
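The question can be read in more than one way. The sketch below assumes it means "among apps whose Content Rating is Teen, which have the most reviews", and it uses made-up app names and review counts.

```python
import pandas as pd

# Made-up rows; Reviews is assumed to be numeric already
df = pd.DataFrame({
    "App": ["Chat Lite", "Racing Fun", "Story Time", "Music Mix"],
    "Content Rating": ["Teen", "Everyone", "Teen", "Teen"],
    "Reviews": [50000, 900000, 120000, 3000],
})

teen = df[df["Content Rating"] == "Teen"]
top_teen = teen.nlargest(2, "Reviews")[["App", "Reviews"]]   # top 2 by reviews

print(top_teen)
```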

Conclusions:

After this EDA, we have learned the following:

  • Most of the apps fall into these three categories:
  1. Family
  2. Game
  3. Tools
  • Free apps outnumber paid apps
  • The dominant genres are:
  1. Tools
  2. Entertainment
  3. Education
  • The most common rating is 4.4
  • Facebook has the highest number of reviews, while 67 apps have only one review
  • 58 apps share the highest install count, i.e. 1,000,000,000+
  • The app “I’m Rich — Trump Edition” from the category ‘Lifestyle’ is the most costly app priced at $400
  • The following categories have the highest average number of installs:
  1. Communication
  2. Entertainment
  3. Game
  4. News and magazines
  5. Productivity
  6. Tools
  7. Video players
  8. Travel and local

Finally,

I hope you liked this. If you have any views or opinions, feel free to mention them. This is my first story about EDA, and I plan to learn more and improve. Please share this with your friends, acquaintances and anyone else who might find it useful.

You can reach me here:

Thank you and have a great day!!!!
