Python Developer - pandas, NumPy, Matplotlib

Python is a high-level, interpreted, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.
Python is dynamically-typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a “batteries included” language due to its comprehensive standard library.

As a Data Analyst, Data Engineer or Data Scientist I can:

import, clean, manipulate, and visualize data with some of the most popular Python libraries, including pandas, NumPy, Matplotlib, Seaborn and many more,
build an effective data architecture, streamline data processing, and maintain large-scale data systems,
create data engineering pipelines, automate common file system tasks, and build a high-performance database.

My Python Learning Path

With DataCamp, Kaggle, and other platforms, I build my skills and experience and validate my knowledge:

Python Fundamentals

Python Fundamentals (DataCamp) 15 hours (skill track ⇒ certificate)

In this track, I learned the Python basics I needed to start my programming journey, including how to clean real-world data for analysis, use data visualization libraries, and write my own Python functions. I also learned how to store, manipulate, and explore data using NumPy, visualize data using Matplotlib, work with dictionaries and pandas DataFrames, and write my own functions and list comprehensions.

Introduction to Python (course ⇒ certificate)

Python is a general-purpose programming language that is becoming ever more popular for data science. Companies worldwide are using Python to harvest insights from their data and gain a competitive edge. This course focused on Python specifically for data science. I learned about powerful ways to store and manipulate data, and helpful data science tools to begin conducting my own analyses.

Intermediate Python (course ⇒ certificate)

In this course I discovered how dictionaries offer an alternative to Python lists, and why the pandas DataFrame is the most popular way of working with tabular data. In the second chapter, I learned how to create and manipulate datasets and how to access the information I need from these structures.
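The difference between the two structures can be sketched as follows. This is a minimal illustration, assuming pandas is installed; the variable names and the small country/capital table are my own and not taken from the course.

```python
import pandas as pd

# A dictionary maps keys to values; lookups use the key, not a position.
capitals = {"Norway": "Oslo", "Italy": "Rome"}
capitals["Spain"] = "Madrid"          # add a new key/value pair
has_italy = "Italy" in capitals       # membership test on the keys

# A DataFrame can be built from a dictionary of columns.
df = pd.DataFrame({
    "country": list(capitals.keys()),
    "capital": list(capitals.values()),
}).set_index("country")

# .loc selects by label, .iloc selects by integer position.
by_label = df.loc["Italy", "capital"]
by_position = df.iloc[0, 0]
```

Here `by_label` is looked up through the row label, while `by_position` takes the first row and first column regardless of labels.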

Python Data Science Toolbox (Part 1) (course ⇒ certificate)

In this first Python Data Science Toolbox course, I learned the art of writing functions. I came out of it able to write my own custom functions, complete with multiple parameters and multiple return values, along with default arguments and variable-length arguments. I also gained insight into scoping in Python, wrote lambda functions, and learned to handle errors in my functions.
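A minimal sketch of these function-writing ideas, using only the standard library. The function name `summarize` and its parameters are hypothetical, chosen for illustration rather than taken from the course.

```python
def summarize(values, *extra, precision=2):
    """Return (minimum, maximum, mean) of values plus any extra numbers.

    `*extra` collects a variable number of additional arguments and
    `precision` is a keyword argument with a default value.
    """
    numbers = list(values) + list(extra)
    if not numbers:
        raise ValueError("summarize() needs at least one number")
    mean = round(sum(numbers) / len(numbers), precision)
    return min(numbers), max(numbers), mean


# Multiple return values are unpacked from the returned tuple.
low, high, mean = summarize([3, 1, 4], 1, 5, precision=1)

# A lambda is a small anonymous function, used here as a sort key.
words = sorted(["pandas", "NumPy", "Matplotlib"], key=lambda w: len(w))

# Errors raised inside the function can be handled by the caller.
try:
    summarize([])
except ValueError as err:
    message = str(err)
```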

Python Data Science Toolbox (Part 2) (course ⇒ certificate)

In this second Python Data Science Toolbox course, I continued to build my Python data science skills. I learned about iterators, objects I had already encountered in the context of for loops. Then I learned about list comprehensions, which are extremely handy tools for all data scientists working in Python.
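Iterators and list comprehensions can be illustrated with a few lines of standard-library Python. The list contents and variable names below are my own examples.

```python
languages = ["python", "sql", "r"]

# iter() turns an iterable into an iterator; next() pulls one item at a time.
it = iter(languages)
first = next(it)
second = next(it)

# Iterating over the same iterator continues where next() stopped.
remaining = [item for item in it]

# List comprehension with a condition: squares of the even numbers below 10.
even_squares = [n ** 2 for n in range(10) if n % 2 == 0]

# enumerate() also returns an iterator, here yielding (position, item) pairs.
indexed = list(enumerate(languages, start=1))
```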

Data Manipulation with Python

Data Manipulation with Python (DataCamp) 16 hours (skill track ⇒ certificate)

Real-world data is messy. That’s why libraries like pandas are so valuable. Using pandas, I can take the pain out of data manipulation by extracting, filtering, and transforming data in DataFrames, clearing a path for quick and reliable data analysis. In this track I learned how to prepare real-world data for analysis and grew my expertise as I worked with multiple DataFrames using pandas. I gained hands-on experience combining and merging data and creating visualizations. I also learned about NumPy arrays and used New York City’s tree census data to create, sort, filter, and update arrays.

Data Manipulation with pandas (course ⇒ certificate)

pandas is one of the most popular Python libraries, used for everything from data manipulation to data analysis. In this course, I learned how to manipulate DataFrames by extracting, filtering, and transforming real-world datasets for analysis. Using pandas I explored the core data science concepts. I also learned how to import and clean data, calculate statistics, and create visualizations, using pandas to add to the power of Python.

Reshaping Data with pandas (course ⇒ certificate)

Often data is in a human-readable format, but it’s not suitable for data analysis. This is where pandas can help—it’s a powerful tool for reshaping DataFrames into different formats. In this course, I grew my data scientist and analyst skills as I learned how to wrangle string columns and nested data contained in a DataFrame. I also learned how to reshape a DataFrame from wide to long format, stack and unstack rows and columns, and get descriptive statistics of a multi-index DataFrame.
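A small sketch of the wide-to-long reshaping described above, assuming pandas is installed. The table is a made-up toy example with invented values, and the column names are hypothetical.

```python
import pandas as pd

# Toy wide-format table: one column per year (values are invented).
wide = pd.DataFrame({
    "city": ["Oslo", "Rome"],
    "temp_2020": [6.1, 16.2],
    "temp_2021": [5.8, 15.9],
})

# Wide to long: one row per (city, year) pair.
long = wide.melt(id_vars="city", var_name="year", value_name="temp")

# Wrangle the string column: strip the prefix and convert to integers.
long["year"] = long["year"].str.replace("temp_", "", regex=False).astype(int)

# unstack() moves an index level into the columns, giving a wide table again.
indexed = long.set_index(["city", "year"])["temp"]
back_to_wide = indexed.unstack("year")

# Descriptive statistics per city on the multi-index Series.
mean_by_city = indexed.groupby(level="city").mean()
```

Calling `back_to_wide.stack()` would reverse the last step and return the long, multi-indexed form.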

Joining Data with pandas (course ⇒ certificate)

Being able to combine and work with multiple datasets is an essential skill for any aspiring Data Scientist. pandas is a crucial cornerstone of the Python data science ecosystem. In this course I learned how to handle multiple DataFrames by combining, organizing, joining, and reshaping them using pandas.
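The kinds of joins covered here can be sketched like this, assuming pandas is installed. The two tables and their column names are invented for illustration.

```python
import pandas as pd

# Two toy tables that share a key column (all values are invented).
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ann", "Ben", "Cara"],
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer_id": [1, 1, 4],
    "amount": [20.0, 35.5, 12.0],
})

# Inner join: only customers that have at least one matching order.
inner = customers.merge(orders, on="customer_id", how="inner")

# Left join: keep every customer; indicator=True records where each row came from.
left = customers.merge(orders, on="customer_id", how="left", indicator=True)
no_orders = left.loc[left["_merge"] == "left_only", "name"].tolist()

# Vertical combination: stack tables with the same columns on top of each other.
stacked = pd.concat([orders, orders], ignore_index=True)
```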

Introduction to NumPy (course ⇒ certificate)

NumPy is an essential Python library. TensorFlow and scikit-learn use NumPy arrays as inputs, and pandas and Matplotlib are built on top of NumPy. In this Introduction to NumPy course, I became a master wrangler of NumPy’s core object: arrays. I created, sorted, filtered, and updated arrays. I discovered why NumPy is so efficient and used broadcasting and vectorization to make my NumPy code even faster.
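Broadcasting and vectorization can be shown with a tiny array. This is a sketch assuming NumPy is installed; the array values and names are arbitrary.

```python
import numpy as np

values = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])          # shape (2, 3)

# Vectorized reduction: column means without an explicit Python loop.
col_means = values.mean(axis=0)               # shape (3,)

# Broadcasting: the (3,) array is stretched across both rows.
centered = values - col_means

# Broadcasting a (2, 1) column against a (2, 3) array scales each row.
row_scale = np.array([[10.0], [100.0]])
scaled = values * row_scale

# Filtering with a boolean mask, sorting, and updating in place.
large = values[values > 3.5]
descending = np.sort(values, axis=None)[::-1]
values[values < 2.0] = 0.0
```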

Data Analyst with Python

Data Analyst with Python (DataCamp) 36 hours (skill track ⇒ certificate)

In this track I began my data analyst training with interactive exercises and got hands-on with some of the most popular Python libraries, including pandas, NumPy, Seaborn, and many more. I learned why Python is so popular for data analysis and worked with real-world datasets to grow my data manipulation and exploratory data analysis skills. I also learned key statistics skills, like hypothesis testing.

Introduction to Python (course ⇒ certificate)

Intermediate Python (course ⇒ certificate)

Data Manipulation with pandas (course ⇒ certificate)

pandas is one of the most popular Python libraries, used for everything from data manipulation to data analysis. In this course, I learned how to manipulate DataFrames by extracting, filtering, and transforming real-world datasets for analysis. Using real-world data, including Walmart sales figures and global temperature time series, I learned how to import and clean data, calculate statistics, and create visualizations, using pandas to add to the power of Python.

Joining Data with pandas (course ⇒ certificate)

pandas is a crucial cornerstone of the Python data science ecosystem, with Stack Overflow recording 5 million views for pandas questions. In this course I learned how to handle multiple DataFrames by combining, organizing, joining, and reshaping them using pandas. I worked with datasets from the World Bank and the City of Chicago. I finished the course with a solid skill set for joining data in pandas.

Introduction to Statistics in Python (course ⇒ certificate)

Statistics is the study of how to collect, analyze, and draw conclusions from data. It’s a hugely valuable tool that I can use to bring the future into focus and infer the answer to tons of questions. In this course, I grew my statistical skills and learned how to calculate averages, use scatterplots to show the relationship between numeric values, and calculate correlation. I also learned how to tackle probability, the backbone of statistical reasoning, and how to use Python to conduct a well-designed study and draw my own conclusions from data.

Introduction to Data Visualization with Seaborn (course ⇒ certificate)

Seaborn is a powerful Python library that makes it easy to create informative and attractive data visualizations. In this course I learned how to explore this library and create Seaborn plots based on a variety of real-world datasets, including exploring how air pollution in a city changes through the day and looking at what young people like to do in their free time. This data gave me the opportunity to find out about Seaborn’s advantages first hand, including how easily I can create subplots in a single figure and automatically calculate confidence intervals.

Exploratory Data Analysis in Python (course ⇒ certificate)

Exploratory data analysis is a process for exploring datasets, answering questions, and visualizing results. This course presented the tools I need to clean and validate data, to visualize distributions and relationships between variables, and to use regression models to predict and explain. I explored data related to demographics and health, including the National Survey of Family Growth and the General Social Survey, but the methods I learned apply to all areas of science, engineering, and business. I used pandas, a powerful library for working with data, and other core Python libraries including NumPy and SciPy, StatsModels for regression, and Matplotlib for visualization.

Sampling in Python (course ⇒ certificate)

Sampling is the cornerstone of inferential statistics and hypothesis testing. It’s a powerful skill used in survey analysis and experimental design to draw conclusions without surveying an entire population. In this Sampling in Python course, I discovered when to use sampling and how to perform common types of sampling, from simple random sampling to more complex methods like stratified and cluster sampling. I also learned how to estimate population statistics and quantify uncertainty in my estimates by generating sampling distributions and bootstrap distributions.
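The sampling methods named above can be sketched with pandas and NumPy, assuming both are installed. The population is synthetic, and the column names, sample sizes, and seeds are arbitrary choices of mine.

```python
import numpy as np
import pandas as pd

# Synthetic population of 100 rows in two groups of unequal size.
population = pd.DataFrame({
    "region": ["north"] * 60 + ["south"] * 40,
    "value": np.arange(100),
})

# Simple random sampling: every row has the same chance of selection.
simple = population.sample(n=10, random_state=0)

# Stratified sampling: take the same fraction from each region.
stratified = population.groupby("region").sample(frac=0.1, random_state=0)

# Bootstrap distribution: resample the sample with replacement many times
# and record the mean each time to quantify uncertainty in the estimate.
rng = np.random.default_rng(0)
sample_values = simple["value"].to_numpy()
boot_means = [
    rng.choice(sample_values, size=len(sample_values), replace=True).mean()
    for _ in range(1000)
]
standard_error = np.std(boot_means, ddof=1)
```

The standard deviation of the bootstrap means serves as an estimate of the standard error of the sample mean.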

Hypothesis Testing in Python (course ⇒ certificate)

Hypothesis testing lets me answer questions about my datasets in a statistically rigorous way. In this course, I learned how and when to use common tests like t-tests, proportion tests, and chi-square tests. Working with real-world data, including Stack Overflow user feedback and supply-chain data for medical supply shipments, I gained a deep understanding of how these tests work and the key assumptions that underpin them. I also discovered how non-parametric tests can be used to go beyond the limitations of traditional hypothesis tests.
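To illustrate the logic of a hypothesis test without extra dependencies, here is a permutation test written with NumPy only. It is a non-parametric stand-in of my own choosing, not one of the specific tests named above, and the two groups of measurements are invented.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two small invented samples; the null hypothesis is "no difference in means".
group_a = np.array([12.1, 11.8, 12.6, 12.9, 12.4, 12.2])
group_b = np.array([11.2, 11.5, 11.1, 11.9, 11.4, 11.6])

observed = group_a.mean() - group_b.mean()
pooled = np.concatenate([group_a, group_b])

# Shuffle the group labels many times and count how often a difference
# at least as extreme as the observed one arises by chance alone.
n_permutations = 5000
extreme = 0
for _ in range(n_permutations):
    shuffled = rng.permutation(pooled)
    diff = shuffled[:len(group_a)].mean() - shuffled[len(group_a):].mean()
    if abs(diff) >= abs(observed):
        extreme += 1

# Two-sided p-value with the usual +1 correction so it is never exactly zero.
p_value = (extreme + 1) / (n_permutations + 1)
reject_null = p_value < 0.05
```

A small p-value means a gap this large would rarely appear if both groups came from the same distribution.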

Data Scientist with Python

Data Scientist with Python (DataCamp) 90 hours (skill track ⇒ certificate)

In this track, I learned how the Python language allows me to import, clean, manipulate, and visualize data, all integral skills for any aspiring data professional or researcher. Starting with the Python essentials for data science, I worked through interactive exercises that tested my abilities. I got hands-on with some of the most popular Python libraries for data science, including pandas, Seaborn, Matplotlib, scikit-learn, and many more.

Introduction to Python (course ⇒ certificate)

Intermediate Python (course ⇒ certificate)

Data Manipulation with pandas (course ⇒ certificate)

Joining Data with pandas (course ⇒ certificate)

Introduction to Statistics in Python (course ⇒ certificate)

Introduction to Data Visualization with Matplotlib (course ⇒ certificate)

Visualizing data in plots and figures exposes the underlying patterns in the data and provides insights. Good visualizations also help me communicate my data to others, and are useful to data analysts and other consumers of the data. In this course, I learned how to use Matplotlib, a powerful Python data visualization library. Matplotlib provides the building blocks to create rich visualizations of many different kinds of datasets. I learned how to create visualizations for different kinds of data and how to customize, automate, and share these visualizations.

Introduction to Data Visualization with Seaborn (course ⇒ certificate)

Python Data Science Toolbox (Part 1) (course ⇒ certificate)

Python Data Science Toolbox (Part 2) (course ⇒ certificate)

Intermediate Data Visualization with Seaborn (course ⇒ certificate)

Seaborn is a visualization library that is an essential part of the Python data science toolkit. In this course, I learned how to use Seaborn’s sophisticated visualization tools to analyze multiple real-world datasets, including the American Housing Survey, college tuition data, and guests from the popular television series The Daily Show. I was also able to use Seaborn functions to visualize my data in several different formats and customize Seaborn plots for my unique needs.

Exploratory Data Analysis in Python (course ⇒ certificate)

Working with Categorical Data in Python (course ⇒ certificate)

In this course, I learned how to manipulate and visualize categorical data using pandas and Seaborn. Through hands-on exercises, I got to grips with pandas’ categorical data type, including how to create, delete, and update categorical columns. I also worked with a wide range of datasets, including the characteristics of adoptable dogs, Las Vegas trip reviews, and census data, to develop my skills at working with categorical data.
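Creating, updating, and deleting categories can be sketched as follows, assuming pandas is installed. The tiny table and its category labels are invented for illustration.

```python
import pandas as pd

# Toy column of text labels (invented values).
dogs = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# Create: convert to an ordered categorical with an explicit category order.
dogs["size"] = pd.Categorical(
    dogs["size"], categories=["small", "medium", "large"], ordered=True
)

# Update: rename one category and add a new, currently unused one.
dogs["size"] = dogs["size"].cat.rename_categories({"medium": "mid"})
dogs["size"] = dogs["size"].cat.add_categories("giant")

# Delete: drop categories that no row uses.
dogs["size"] = dogs["size"].cat.remove_unused_categories()

# Ordered categories support comparisons such as max().
biggest = dogs["size"].max()
counts = dogs["size"].value_counts(sort=False).to_dict()
```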

Others:

Intro to Machine Learning (Kaggle) 20.04.2022
Python (Kaggle) 17.03.2022
Dictionaries, Frequency Tables, and Functions in Python (Dataquest) 15.03.2022
For Loops and Conditional Statements in Python (Dataquest) 14.03.2022
Variables, Data Types, and Lists in Python (Dataquest) 14.03.2022
Python 2020 (LinkedIn Learning) 06.01.2022

Some articles about Programming

Unveiling the Android App Market: A Data-Driven Approach

Solving the Cosmic Mystery of Spaceship Titanic with Data Science

Navigating History and Data Science: The Titanic Kaggle Challenge

Making Predictions with Linear Regression