Data Engineer

A Data Engineer is responsible for laying the foundations for the acquisition, storage, transformation, and management of data in an organization. Data Engineers develop and maintain the database architecture and data processing systems. This infrastructure is key to ensuring that high-level data applications, such as data visualization and the deployment of machine learning models, are developed in a seamless, secure, and effective way.

I am a certified DataCamp Data Analyst Associate. As a Data Analyst Associate I can demonstrate that I have the knowledge, skills, and abilities to succeed at the entry level in this role. The competency domains assessed included, but were not limited to:

  • Data Management
  • Exploratory Analysis
  • Statistical Experimentation
  • Communication and Visualization


As a Data Engineer I can:

  • work on data mapping, data integration and ingestion, data processing, and data automation,
  • design and build pipelines that run asynchronous data processing jobs triggered by a user interface request,
  • work with multiple GCP, AWS, or Azure services,
  • build, improve, and test code that moves and manipulates data coming from disparate sources, including massive log and event streams, SQL databases, and online, API-based services,
  • design and implement data storage,
  • design and develop data processing,
  • monitor and optimize data storage and data processing.

My Data Engineering Learning Path

With DataCamp, Microsoft Learn, and Coursera I build my skills and experience and validate my knowledge:

Data Engineer (DataCamp) 57 hours (course) (award)

In this course I grew my language skills as I worked with Shell, SQL, and Scala to create data engineering pipelines, automate common file system tasks, and build a high-performance database. Through hands-on exercises, I added cloud and big data tools, such as AWS Boto, PySpark, Spark SQL, and MongoDB, to my data engineering toolkit to help me create and query databases, wrangle data, and configure schedules to run my pipelines. By the end of this track, I had mastered the critical database, scripting, and process skills I need to progress my career, and I had a firm grasp of Python for data engineering.

  • Understanding Data Engineering (award)

In this course, I learned about a data engineer’s core responsibilities, how data engineers differ from data scientists, and how they facilitate the flow of data through an organization. Through hands-on exercises I followed Spotflix, a fictional music streaming company, to understand how their data engineers collect, clean, and catalog their data. By the end of the course, I understood what a company’s data engineers do, was ready to have a conversation with a data engineer, and had a solid foundation to start my own data engineering journey.

  • Introduction to Data Engineering (award)

In this course I started exploring the world of data engineering. It’s perfect for people who work at a company with several data sources and don’t have a clear idea of how to use all those data sources in a scalable way.

  • Data Manipulation with pandas (award)

pandas is one of the most popular Python libraries, used for everything from data manipulation to data analysis. In this course, I learned how to manipulate DataFrames as I extracted, filtered, and transformed real-world datasets for analysis. Using pandas I explored core data science concepts. Using real-world data, including Walmart sales figures and global temperature time series, I learned how to import and clean data, calculate statistics, and create visualizations, using pandas to add to the power of Python.

  • Joining Data with pandas (award)

pandas is a cornerstone of the Python data science ecosystem, with Stack Overflow recording 5 million views for pandas questions. In this course I learned how to handle multiple DataFrames by combining, organizing, joining, and reshaping them using pandas. I worked with datasets from the World Bank and the City of Chicago. I finished the course with a solid skillset for joining data in pandas.
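As a small illustration of the joining techniques mentioned above, here is a minimal sketch of inner, left, and outer joins with `DataFrame.merge`. The two tables and their column names (`code`, `name`, `population_m`) are hypothetical toy data with rounded values, not the World Bank or City of Chicago datasets from the course.

```python
import pandas as pd

# Hypothetical toy tables, made up for illustration only.
countries = pd.DataFrame(
    {"code": ["PL", "DE", "FR"], "name": ["Poland", "Germany", "France"]}
)
populations = pd.DataFrame(
    {"code": ["PL", "DE", "IT"], "population_m": [38, 83, 59]}
)

# Inner join: keeps only the codes present in both tables.
inner = countries.merge(populations, on="code", how="inner")

# Left join: keeps every country; a missing population becomes NaN.
left = countries.merge(populations, on="code", how="left")

# Outer join with indicator=True: the extra _merge column records
# whether each row came from the left table, the right table, or both.
outer = countries.merge(populations, on="code", how="outer", indicator=True)

print(inner)
print(left)
print(outer)
```

The `indicator` column is handy for checking a join before relying on it, since it makes unmatched keys on either side visible.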

  • Introduction to Statistics in Python (award)

Statistics is the study of how to collect, analyze, and draw conclusions from data. It’s a hugely valuable tool that I can use to bring the future into focus and infer the answers to many questions. In this course, I grew my statistical skills and learned how to calculate averages, use scatterplots to show the relationship between numeric values, and calculate correlation. I also learned how to tackle probability, the backbone of statistical reasoning, and how to use Python to conduct a well-designed study and draw my own conclusions from data.
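The averages and correlation mentioned above can be sketched with the Python standard library alone. The numbers and the helper `pearson_r` are hypothetical and written for this illustration; the course itself may have used other libraries.

```python
import math
import statistics

# Hypothetical measurements, made up for illustration only.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 70, 72]

mean_score = statistics.mean(exam_score)
median_score = statistics.median(exam_score)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Sum of products of deviations from the means (numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations (denominator).
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


r = pearson_r(hours_studied, exam_score)
print(f"mean={mean_score}, median={median_score}, r={r:.3f}")
```

A value of `r` close to 1 indicates a strong positive linear relationship, which is the same pattern a scatterplot of the two lists would show.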

  • Introduction to Data Visualization with Seaborn (award)

Seaborn is a powerful Python library that makes it easy to create informative and attractive data visualizations. In this course I learned how to explore this library and create Seaborn plots based on a variety of real-world data sets, including exploring how air pollution in a city changes through the day and looking at what young people like to do in their free time. This data gave me the opportunity to find out about Seaborn’s advantages first hand, including how I can easily create subplots in a single figure and how to automatically calculate confidence intervals.

  • Exploratory Data Analysis in Python (award)

Exploratory data analysis is a process for exploring datasets, answering questions, and visualizing results. This course presented the tools I need to clean and validate data, to visualize distributions and relationships between variables, and to use regression models to predict and explain. I explored data related to demographics and health, including the National Survey of Family Growth and the General Social Survey, but the methods I learned apply to all areas of science, engineering, and business. I used pandas, a powerful library for working with data, and other core Python libraries including NumPy and SciPy, StatsModels for regression, and Matplotlib for visualization.

  • Sampling in Python (award)

Sampling is a cornerstone of inferential statistics and hypothesis testing. It’s a powerful skill used in survey analysis and experimental design to draw conclusions without surveying an entire population. In this course, I discovered when to use sampling and how to perform common types of sampling, from simple random sampling to more complex methods like stratified and cluster sampling. I also learned how to estimate population statistics and quantify uncertainty in my estimates by generating sampling distributions and bootstrap distributions.
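To make the sampling ideas above concrete, here is a minimal standard-library sketch of simple random sampling, stratified sampling, and a bootstrap distribution of the mean. The population, the `region` strata, and all sizes are assumptions invented for the example.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 1,000 records, each tagged with a region (stratum).
population = [
    {"region": region, "value": rng.gauss(mu, 5)}
    for region, mu, size in [("north", 50, 700), ("south", 70, 300)]
    for _ in range(size)
]

# Simple random sampling: every record is equally likely to be picked.
simple_sample = rng.sample(population, 100)

# Stratified sampling: draw 10% within each region so the sample
# keeps the same regional proportions as the population.
stratified_sample = []
for region in ("north", "south"):
    stratum = [row for row in population if row["region"] == region]
    stratified_sample += rng.sample(stratum, len(stratum) // 10)

# Bootstrap distribution of the mean: resample the sample with
# replacement many times and record the mean of each resample.
values = [row["value"] for row in simple_sample]
boot_means = [
    statistics.mean(rng.choices(values, k=len(values))) for _ in range(1000)
]

# The spread of the bootstrap means estimates the standard error.
standard_error = statistics.stdev(boot_means)
print(f"sample mean={statistics.mean(values):.2f}, SE={standard_error:.2f}")
```

Stratifying guarantees the 70/30 regional split in the sample, while a simple random sample only matches it on average.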

  • Hypothesis Testing in Python (award)

Hypothesis testing lets me answer questions about my datasets in a statistically rigorous way. In this course, I learned how and when to use common tests like t-tests, proportion tests, and chi-square tests. Working with real-world data, including Stack Overflow user feedback and supply-chain data for medical supply shipments, I gained a deep understanding of how these tests work and the key assumptions that underpin them. I also discovered how non-parametric tests can be used to go beyond the limitations of traditional hypothesis tests.
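As one illustration of a non-parametric approach, the sketch below runs a two-sided permutation test on the difference between two group means using only the standard library. This is a swapped-in example technique and not necessarily one of the tests covered in the course; the shipment weights are hypothetical.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical shipment weights (kg) for two groups, made up for illustration.
group_a = [12.1, 11.8, 12.6, 13.0, 12.4, 12.9, 13.3, 12.7]
group_b = [11.2, 11.9, 11.5, 12.0, 11.1, 11.7, 11.4, 11.8]

observed = statistics.mean(group_a) - statistics.mean(group_b)

# Under the null hypothesis the group labels are exchangeable, so shuffle
# the pooled data and count how often a difference this large appears.
pooled = group_a + group_b
n_a = len(group_a)
n_permutations = 10_000
extreme = 0
for _ in range(n_permutations):
    rng.shuffle(pooled)
    diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
    if abs(diff) >= abs(observed):
        extreme += 1

# Two-sided p-value; the +1 terms keep it from being reported as exactly zero.
p_value = (extreme + 1) / (n_permutations + 1)
print(f"observed difference={observed:.3f}, p-value={p_value:.4f}")
```

Because the test relies only on shuffling labels, it makes no normality assumption, which is the kind of limitation of traditional tests that non-parametric methods are meant to sidestep.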

Microsoft Azure Data Engineering Associate (Microsoft, Coursera) 113 hours (course) (certificate)

This Professional Certificate is intended for data engineers and developers who want to demonstrate their expertise in designing and implementing data solutions that use Microsoft Azure data services.

Through it, I learned how to integrate, transform, and consolidate data from various structured and unstructured data systems into structures that are suitable for building analytics solutions that use Microsoft Azure data services.

⇒ Verify at: Coursera

My badges:

Some articles about Data Engineering: