Data Science Workshops

Together we'll help you do more with data through my workshop Data Science with Python and Spark. Do you want to know more about this workshop? Curious how I can adapt it to your needs? Something else? Don’t hesitate to contact me.

Data Science with Python and Spark

Apache Spark is an open-source distributed engine for querying and processing data. In this three-day hands-on workshop, you will learn how to leverage Spark from Python to process large amounts of data.

After a presentation of the Spark architecture, we’ll begin by manipulating Resilient Distributed Datasets (RDDs) and work our way up to Spark DataFrames. We discuss the concept of lazy execution in detail and demonstrate various transformations and actions specific to RDDs and DataFrames. You’ll also learn how to manipulate DataFrames using SQL queries.

We’ll show you how to apply supervised machine learning models such as linear regression, logistic regression, decision trees, and random forests. You’ll also see unsupervised techniques such as principal component analysis (PCA) and K-means clustering.

By the end of this workshop, you will have a solid understanding of how to process data using PySpark and how to use Spark’s machine learning library (MLlib) to build and train various machine learning models.

What you’ll learn

This workshop is for you because


Day 1:

Day 2:

Day 3:


Participants are expected to be familiar with the following Python syntax and concepts:

Some experience with Pandas and SQL is useful, but not required.

Recommended preparation

Participants are kindly requested to have the following items installed prior to the start of the workshop:

More detailed installation instructions will be provided by email after signup.

I’ve previously delivered this workshop at

KPN ICT Consulting

Do you want to know more about this workshop? Curious how I can adapt it to your needs? Something else? Fill in this form or send an email to jeroen@datascienceworkshops.com, and I’ll get back to you within 24 hours.