Data Science in practice
Introduction
In this course block and the chapters that follow, we will demonstrate how to build a machine learning model in practice. By the end, you will have a "ready-to-go" model that can predict whether a bank customer will subscribe to a term deposit.
Along the way, we will explore useful scikit-learn functionality, discuss common pitfalls in data science projects, and show how to properly save a model. We will conclude with a bonus section on pipelines that automate the entire modelling process.
Let's get started!
Remember the bank marketing data set that we explored in the Data Preparation & Preprocessing portion and then completely abandoned in the last couple of chapters? Well, it's time to bring it back!
Info
The bank marketing data was adapted from:
S. Moro, P. Cortez and P. Rita (2014). A Data-Driven Approach to Predict the Success of Bank Telemarketing [1]
The publicly available data set comes from a Portuguese retail bank and contains information on direct marketing campaigns (phone calls). Bank customers were contacted and asked to subscribe to a term deposit.
Prerequisites
0. What's our goal?
First, let's define the end goal:
Build a machine learning model that can predict whether a bank customer will subscribe to a term deposit.
Tip
Put simply, a term deposit is a type of bank account where you agree to lock away your money for a fixed period of time (the "term") in exchange for a guaranteed interest rate that's typically higher than a regular savings account.
Using information such as clients' demographic details, economic indicators, and marketing campaign data, we aim to solve this binary classification task: each customer either subscribes to a term deposit or does not.
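To make the binary target concrete, here is a minimal sketch with pandas on a few hypothetical toy rows (not the real data). It assumes, as in the original UCI release of the data, that the outcome sits in a column named y with the values "yes" and "no"; the course version may use different column names. The target is split off from the features and encoded as 1 (subscribed) and 0 (did not subscribe), a common encoding for binary classification.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real data set.
# Assumption: the outcome column is called "y" and holds "yes"/"no",
# as in the original UCI release; the course version may name it differently.
df = pd.DataFrame(
    {
        "age": [30, 45, 52, 28],
        "job": ["admin.", "technician", "retired", "student"],
        "y": ["no", "yes", "no", "yes"],
    }
)

# Split the features from the target
X = df.drop(columns="y")
# Encode the binary target: 1 = subscribed, 0 = did not subscribe
y = (df["y"] == "yes").astype(int)

print(X.shape)     # (4, 2)
print(y.tolist())  # [0, 1, 0, 1]
```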
Before we dive in, you need to set up the project that will be used throughout the rest of this course.
1. Project structure
Start by creating the following project structure:
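The only folder this section names explicitly is data/, so the sketch below creates a project root with that sub-folder using pathlib. It is a minimal illustration under that assumption, not the course's own setup script: the root name bank-project is hypothetical, and any further folders your layout needs can be passed through the folders argument. In practice you would call create_project(Path("bank-project")); the demo runs in a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence


def create_project(root: Path, folders: Sequence[str] = ("data",)) -> List[Path]:
    """Create root and each sub-folder; safe to run repeatedly."""
    created = []
    for name in folders:
        folder = root / name
        # parents=True also creates root; exist_ok=True makes reruns harmless
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Demo in a throwaway directory ("bank-project" is a hypothetical name)
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp) / "bank-project"
    demo_folders = create_project(demo_root)
    create_project(demo_root)  # a second call changes nothing
    demo_ok = all(folder.is_dir() for folder in demo_folders)
    demo_names = sorted(p.name for p in demo_root.iterdir())

print(demo_names)  # ['data']
```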
2. Download data
Danger
Since we want to make sure that everyone uses the same initial data set, we urge you to re-download it and place it in your data/ folder.
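Once the file is in data/, a small loader can fail early with a clear message if it is missing. This is a sketch under two assumptions: the file name is left as a parameter because it is not given here, and since the delimiter of the adapted file is not stated either (the original UCI release uses semicolons), the loader passes sep=None with engine="python" so that pandas sniffs the delimiter itself. The demo uses a tiny stand-in file, not the real data set.

```python
import tempfile
from pathlib import Path

import pandas as pd

DATA_DIR = Path("data")


def load_bank_data(filename: str, data_dir: Path = DATA_DIR) -> pd.DataFrame:
    """Read a CSV file from data_dir, failing early with a clear message if it is missing."""
    path = data_dir / filename
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; download the data set into {data_dir}/ first")
    # sep=None makes pandas' Python engine sniff the delimiter (e.g. ',' or ';')
    return pd.read_csv(path, sep=None, engine="python")


# Demo with a tiny semicolon-separated stand-in file (not the real data set)
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "sample.csv").write_text("age;job;y\n30;admin.;no\n45;technician;yes\n")
    demo_df = load_bank_data("sample.csv", data_dir=demo_dir)
    try:
        load_bank_data("missing.csv", data_dir=demo_dir)
        demo_missing_raised = False
    except FileNotFoundError:
        demo_missing_raised = True

print(demo_df.shape)  # (2, 3)
```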
3. Virtual environment
Create a virtual environment. You should now have the following structure:
Be sure to activate the environment!
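The usual route is the command line, for example python -m venv .venv, followed by activation with source .venv/bin/activate on macOS/Linux or .venv\Scripts\activate on Windows (the folder name .venv is an assumption). The same can be done from Python with the standard-library venv module. Comparing sys.prefix with sys.base_prefix tells you whether the running interpreter belongs to a venv, which is a quick way to confirm activation for environments created with venv; conda environments are not detected this way. A minimal sketch, where create_env and in_virtualenv are hypothetical helper names:

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment at env_dir and return the path of its pyvenv.cfg."""
    venv.create(env_dir, with_pip=with_pip)
    return env_dir / "pyvenv.cfg"


def in_virtualenv() -> bool:
    """True if the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points into the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


# Demo: build a throwaway environment (without pip, so it is quick) in a temp folder
with tempfile.TemporaryDirectory() as tmp:
    demo_cfg = create_env(Path(tmp) / ".venv", with_pip=False)
    demo_cfg_exists = demo_cfg.is_file()

print("pyvenv.cfg created:", demo_cfg_exists)
print("running inside a virtual environment:", in_virtualenv())
```

For the real project you would call create_env(Path(".venv")) once, activate it in your shell, and then expect in_virtualenv() to return True.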
4. Install packages
Install the necessary packages: pandas and scikit-learn.
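Install them inside the activated environment, for example with python -m pip install pandas scikit-learn. To confirm they landed in the right environment, a short check with the standard-library importlib.metadata can report the installed versions. Note that the distribution name is scikit-learn even though the import name is sklearn. The helper name installed_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Sequence

# Distribution names as pip knows them (scikit-learn is imported as sklearn)
REQUIRED_PACKAGES = ("pandas", "scikit-learn")


def installed_versions(packages: Sequence[str] = REQUIRED_PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions()
for name, ver in report.items():
    print(f"{name}: {ver if ver else 'not installed'}")

missing = [name for name, ver in report.items() if ver is None]
if missing:
    print("Install with: python -m pip install " + " ".join(missing))
```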
[1] Decision Support Systems, Volume 62, June 2014, Pages 22-31: https://doi.org/10.1016/j.dss.2014.03.001