Unveiling the Android App Market: A Data-Driven Approach


Mobile apps are everywhere. They are easy to create and can be lucrative, and because of these two factors, more and more apps are being developed. In this project, I carried out a comprehensive analysis of the Android app market by comparing over ten thousand Google Play apps across different categories, looking for insights in the data that could inform strategies to drive growth and retention. The data for this project was scraped from the Google Play website. While there are many popular datasets for the Apple App Store, there are few for Google Play apps, partly because Google Play is harder to scrape. The data files were as follows:

  • apps.csv: contains the details of each app on Google Play, i.e. the features that describe it.
  • user_reviews.csv: contains 100 reviews for each app, most helpful first. The text in each review has been pre-processed, passed through a sentiment analyzer engine and tagged with its sentiment score.

In the ever-evolving world of mobile applications, the Google Play Store stands as a colossal hub of activity with over a billion active users. Here’s a task-by-task breakdown of what my analysis uncovered:

1. Data Preparation and Cleaning

My journey began with importing the data and meticulously cleaning it. I removed duplicate entries and prepared the dataset for a thorough analysis, a crucial step for ensuring the accuracy of my findings.

Clean data leads to trustworthy analyses, essential in a data-rich environment like the Google Play Store.
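A minimal sketch of the duplicate-removal step, assuming the data is loaded into a pandas DataFrame. The rows below are hypothetical toy data standing in for apps.csv; with the real file you would start from pd.read_csv("apps.csv") instead.

```python
import pandas as pd

# Hypothetical toy rows standing in for apps.csv
apps_with_duplicates = pd.DataFrame({
    "App": ["Photo Editor", "Photo Editor", "Weather Now", "Chess Pro"],
    "Category": ["PHOTOGRAPHY", "PHOTOGRAPHY", "WEATHER", "GAME"],
    "Rating": [4.1, 4.1, 4.5, 4.7],
})

# Drop rows that are identical across every column
apps = apps_with_duplicates.drop_duplicates()

print("Total number of apps before dropping duplicates:", len(apps_with_duplicates))
print("Total number of apps after dropping duplicates:", len(apps))
```

By default drop_duplicates() only removes rows that match in every column; passing subset=["App"] would instead keep one row per app name, which is a stricter choice.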

2. Refining the Data

I observed that entries in the ‘Installs’ and ‘Price’ columns contained special characters (‘+’ and ‘,’ in install counts such as ‘10,000+’, and ‘$’ in prices), which prevented them from being treated as numbers. Stripping these characters left only digits and decimal points, paving the way for numerical analysis.

Precision in data formatting directly influences the feasibility of quantitative analysis.
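The character-stripping step can be sketched as follows, assuming the values look like the examples above. The rows are hypothetical; the loop simply removes each unwanted character from each column.

```python
import pandas as pd

# Hypothetical raw values in the format described above
apps = pd.DataFrame({
    "Installs": ["10,000+", "1,000,000+", "500+"],
    "Price": ["0", "$4.99", "$79.99"],
})

chars_to_remove = ["+", ",", "$"]
cols_to_clean = ["Installs", "Price"]

for col in cols_to_clean:
    for char in chars_to_remove:
        # regex=False matches "+" and "$" literally instead of as regex metacharacters
        apps[col] = apps[col].str.replace(char, "", regex=False)

print(apps)
```

Note that the columns still hold strings at this point; only the formatting characters are gone.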

3. Converting Data Types

Even after this cleanup, ‘Installs’ and ‘Price’ remained non-numeric: they were still stored as strings. Converting both columns to floats with Pandas’ type conversion was necessary before any numerical calculations could be done on them.

Accurate data typing is pivotal in enabling complex, numerical computations.
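Here is a minimal sketch of that conversion using pandas’ astype, which is one standard way to do it; the values are hypothetical and already stripped of special characters.

```python
import pandas as pd

# Cleaned but still string-typed columns (hypothetical values)
apps = pd.DataFrame({
    "Installs": ["10000", "1000000", "500"],
    "Price": ["0", "4.99", "79.99"],
})
print(apps.dtypes)  # both columns hold strings at this point

# Convert both columns to floating-point numbers
apps["Installs"] = apps["Installs"].astype(float)
apps["Price"] = apps["Price"].astype(float)
print(apps.dtypes)  # both columns are now float64
```

If some entries might still contain stray text, pd.to_numeric(apps["Price"], errors="coerce") is a more forgiving alternative: it turns unparseable values into NaN instead of raising an error.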

4. Exploring App Categories

Across 33 unique categories, Family and Game apps dominate in sheer number. It was intriguing to find that categories such as Tools, Business and Medical also have a significant presence.

The app market is diverse, with Family and Game apps leading in volume, a spread that likely reflects the varied interests of the global user base.
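Counting apps per category takes only a couple of pandas calls. This sketch uses a hypothetical handful of rows rather than the full dataset.

```python
import pandas as pd

# Hypothetical toy data; the real apps.csv has 33 distinct categories
apps = pd.DataFrame({
    "Category": ["FAMILY", "GAME", "FAMILY", "TOOLS", "GAME", "FAMILY", "MEDICAL"],
})

num_categories = apps["Category"].nunique()

# Count apps per category; value_counts() already sorts from most to least frequent
num_apps_in_category = apps["Category"].value_counts()

print("Number of categories =", num_categories)
print(num_apps_in_category)
```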

5. Distribution of App Ratings

My analysis revealed an average app rating of a commendable 4.17, with a left-skewed distribution: most apps cluster at high ratings, with a tail stretching toward lower ones. In other words, most apps are well received by users.

The prevalence of high ratings suggests a market standard where quality is typically well-maintained.
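The two statistics behind this finding, the mean and the direction of skew, can be computed as below. The ratings are hypothetical, with None standing in for an app that has not been rated yet.

```python
import pandas as pd

# Hypothetical ratings; None marks an app without a rating yet
apps = pd.DataFrame({"Rating": [4.5, 4.3, 4.7, 4.4, 2.1, 4.6, None]})

# mean() and skew() ignore missing values by default
avg_app_rating = apps["Rating"].mean()
# Negative skewness means a left-skewed distribution: most values are high,
# with a longer tail stretching toward low ratings
rating_skew = apps["Rating"].skew()

print("Average app rating =", round(avg_app_rating, 2))
print("Skewness =", round(rating_skew, 2))
```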

6. The Size and Price of Apps

I examined how app size and price relate to app ratings. My findings highlighted that top-rated apps usually ranged from 2 MB to 20 MB and were priced under $10, suggesting that users favor affordable, lightweight apps.

Efficiency in app size and pricing is key to app popularity.
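Instead of plots, this sketch summarizes the same question with two shares. Two assumptions are mine: "top-rated" means a rating of at least 4.0 (a hypothetical cut-off), and Size is stored in megabytes. The rows are toy data.

```python
import pandas as pd

# Hypothetical apps; Size is assumed to be in megabytes
apps = pd.DataFrame({
    "Rating": [4.5, 4.2, 3.1, 4.8, 4.0, 3.9, None],
    "Size":   [15.0, 3.5, 80.0, 8.0, 25.0, 1.0, 12.0],
    "Price":  [0.0, 2.99, 0.0, 0.0, 9.99, 14.99, 0.0],
})

# Keep only apps where both rating and size are known
known = apps[apps["Rating"].notna() & apps["Size"].notna()]

# Assumed cut-off for "top-rated"
TOP_RATED_THRESHOLD = 4.0
top_rated = known[known["Rating"] >= TOP_RATED_THRESHOLD]

# Share of top-rated apps between 2 MB and 20 MB (inclusive) and priced under $10
share_2_to_20_mb = top_rated["Size"].between(2, 20).mean()
share_under_10_usd = (top_rated["Price"] < 10).mean()

print(f"{share_2_to_20_mb:.0%} of top-rated apps are 2-20 MB")
print(f"{share_under_10_usd:.0%} of top-rated apps cost under $10")
```

Taking the mean of a boolean Series gives the fraction of True values, which keeps the summary to one line per question.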

7. The Pricey Side of Apps

Not all apps follow conventional pricing strategies. A strip plot of price by category revealed that Medical and Family apps tend to be more expensive, with some stretching up to $80, which likely reflects the niche but essential nature of these applications.

App pricing strategies vary greatly across categories, reflecting the specific utility and target audience of different apps.
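A strip plot needs a plotting library (seaborn’s stripplot is a common choice), so this sketch only computes the numbers such a plot would show: the highest and median price per category. The prices are hypothetical.

```python
import pandas as pd

# Hypothetical prices per category
apps = pd.DataFrame({
    "Category": ["MEDICAL", "MEDICAL", "FAMILY", "FAMILY", "GAME", "TOOLS"],
    "Price": [79.99, 4.99, 39.99, 0.0, 1.99, 0.0],
})

# Highest and median price in each category, most expensive first
price_by_category = (
    apps.groupby("Category")["Price"]
    .agg(["max", "median"])
    .sort_values("max", ascending=False)
)
print(price_by_category)
```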

8. Filtering Out 'Junk' Apps
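One plausible way to filter out ‘junk’ apps, sketched under an assumption of my own: that ‘junk’ means novelty apps with implausibly high prices, screened out with a hypothetical cut-off of $100. Both the threshold and the app names below are illustrative only.

```python
import pandas as pd

# Hypothetical apps; names and prices are illustrative only
apps = pd.DataFrame({
    "App": ["Novelty App A", "Dosage Calculator", "Kids Puzzles", "Novelty App B"],
    "Category": ["LIFESTYLE", "MEDICAL", "FAMILY", "FINANCE"],
    "Price": [399.99, 79.99, 2.99, 299.99],
})

# Hypothetical cut-off: treat anything priced at $100 or more as a likely "junk" app
JUNK_PRICE_THRESHOLD = 100.0

junk_apps = apps[apps["Price"] >= JUNK_PRICE_THRESHOLD]
apps_under_threshold = apps[apps["Price"] < JUNK_PRICE_THRESHOLD]

print("Flagged as junk:", junk_apps["App"].tolist())
print("Kept:", apps_under_threshold["App"].tolist())
```

The flagged rows are worth inspecting by hand before dropping them, since a price filter alone cannot tell a joke app from a legitimately expensive professional tool.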


9. Paid vs. Free Apps: The Download Dilemma

My study showed that free apps have higher download counts than paid apps. However, the difference wasn’t as stark as anticipated, suggesting that users are sometimes willing to pay when an app offers enough value.

Both free and paid apps have their own market, with free apps leading in volume but paid apps holding their ground in niche segments.
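A minimal sketch of the free-versus-paid comparison, assuming a ‘Type’ column holding "Free" or "Paid" and install counts already converted to floats. The rows are hypothetical.

```python
import pandas as pd

# Hypothetical apps with cleaned numeric install counts; "Type" is assumed to hold "Free" or "Paid"
apps = pd.DataFrame({
    "Type": ["Free", "Free", "Free", "Paid", "Paid"],
    "Installs": [1_000_000.0, 50_000.0, 10_000_000.0, 10_000.0, 100_000.0],
})

# The median is more robust than the mean here, because install counts span several orders of magnitude
median_installs = apps.groupby("Type")["Installs"].median()
print(median_installs)
```

On the full dataset, a box plot of Installs by Type on a logarithmic axis would show the same comparison visually.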

10. Sentiment Analysis of User Reviews

Finally, I analyzed user reviews through sentiment analysis. Interestingly, free apps often received harsh feedback, while reviews of paid apps were rarely extremely negative. This may hint at a relationship between app quality and pricing.

User sentiment tends to be more favorable towards paid apps, suggesting a perceived higher quality or value.
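Comparing sentiment by app type means joining the two files on the app name. This sketch assumes a ‘Sentiment_Polarity’ column in user_reviews.csv and a ‘Type’ column in apps.csv; both tables are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-ins for apps.csv and user_reviews.csv
apps = pd.DataFrame({
    "App": ["Chess Pro", "Photo Editor", "Weather Now"],
    "Type": ["Paid", "Free", "Free"],
})
reviews = pd.DataFrame({
    "App": ["Chess Pro", "Chess Pro", "Photo Editor", "Photo Editor", "Weather Now", "Weather Now"],
    "Sentiment_Polarity": [0.4, 0.1, -0.9, 0.5, 0.2, None],
})

# Attach each review to its app's Type; an inner join keeps only apps present in both tables
merged = apps.merge(reviews, on="App", how="inner")

# Drop reviews that have no polarity score
merged = merged.dropna(subset=["Sentiment_Polarity"])

# Lowest and median polarity per Type
# (polarity assumed to run from -1, most negative, to +1, most positive)
polarity_summary = merged.groupby("Type")["Sentiment_Polarity"].agg(["min", "median"])
print(polarity_summary)
```

Looking at the minimum alongside the median separates two different claims: how negative the worst reviews get, and how favorable a typical review is.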


My journey through the Google Play Store’s data has been a revealing venture into the dynamics of the app market. From app categories and user ratings to pricing strategies and user sentiment, the insights gleaned are valuable for developers and businesses aiming to thrive in this bustling digital ecosystem.

The Android app market is a landscape of diverse strategies, user preferences and business models. My analysis shows that free apps dominate in volume, while paid apps tend to draw more favorable user sentiment. Understanding these nuances is key for any aspiring developer or company looking to make a mark in this competitive arena. As the digital world continues to evolve, so will the trends and patterns in this vibrant marketplace.
