Email Spam Detection Using Python & Machine Learning

Email spam, also called junk email, refers to unsolicited messages sent in bulk by email (spamming). The name comes from Spam luncheon meat by way of a Monty Python sketch in which Spam is ubiquitous, unavoidable, and repetitive.
In this article I will show you how to create your very own program to detect email spam using natural language processing (NLP), machine learning, and the Python programming language!
If you would rather watch than read, the YouTube video below covers everything in this article in a little more detail. It will help you get started on your own email spam detection program even if you don’t have Python installed on your computer. You can also use the video and the article together as complementary learning materials!
Programming
The first thing I like to do before writing a single line of code is to add a comment describing what the code does. This way I can look back at my code later and know exactly what it does.
#Description: This program detects if an email is spam (1) or not (0)
Import the libraries.
#Import libraries
import numpy as np
import pandas as pd
import nltk
from nltk.corpus import stopwords
import string
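The `stopwords` and `string` imports hint at the kind of text cleaning a spam detector usually needs: stripping punctuation and dropping very common words. As a rough sketch of that idea only, the snippet below uses a tiny hand-written stop-word set (`SAMPLE_STOPWORDS`, a hypothetical stand-in) instead of the NLTK corpus, which has to be fetched with `nltk.download('stopwords')` before `stopwords.words('english')` will work. The helper name `clean_text` is also just an illustration and is not taken from the article.

```python
import string

# Hypothetical stand-in for stopwords.words('english'); the real NLTK list is much longer.
SAMPLE_STOPWORDS = {"a", "an", "the", "is", "to", "you", "and"}

def clean_text(text):
    """Strip punctuation, lowercase the text, and drop stop words."""
    # Keep every character that is not in string.punctuation
    no_punct = "".join(ch for ch in text if ch not in string.punctuation)
    # Split on whitespace and filter out the stop words
    return [word for word in no_punct.lower().split() if word not in SAMPLE_STOPWORDS]

print(clean_text("Congratulations! You won a FREE prize, click to claim."))
```

With the real NLTK list, more words would be filtered out than in this toy version.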
Load the data and print the first 5 rows.
#Load the data
#from google.colab import files # Use to load data on Google Colab
#uploaded = files.upload() # Use to load data on Google Colab
df = pd.read_csv('emails.csv')
df.head(5)
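If you don’t have emails.csv at hand yet, the loading step can be tried on a small in-memory stand-in. The sketch below assumes two columns named `text` and `spam`; those names and the three rows are made up for illustration, so check the header of your own file before relying on them.

```python
import io
import pandas as pd

# Made-up rows standing in for emails.csv; the column names are assumptions.
sample_csv = io.StringIO(
    "text,spam\n"
    "Subject: win a free prize now,1\n"
    "Subject: meeting moved to 3pm,0\n"
    "Subject: cheap loans approved today,1\n"
)

# read_csv accepts a file path or any file-like object
sample_df = pd.read_csv(sample_csv)

# head(5) returns at most 5 rows; this toy data set only has 3
print(sample_df.head(5))
```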

Let’s explore the data and get the number of rows & columns.
#Print the shape (Get the number of rows and cols)
df.shape
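For reference, `shape` is an attribute (not a method) that holds a `(rows, columns)` tuple, so it can be unpacked into two variables. Here is a minimal sketch on a toy DataFrame; the data and the `text` and `spam` column names are hypothetical.

```python
import pandas as pd

# Toy data for illustration only; the real data set is loaded from emails.csv.
toy_df = pd.DataFrame({
    "text": ["win a free prize", "meeting at 3pm", "cheap loans today"],
    "spam": [1, 0, 1],
})

# shape is a (number_of_rows, number_of_columns) tuple
n_rows, n_cols = toy_df.shape
print(f"{n_rows} rows, {n_cols} columns")
```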