Heart Disease Detection Using Machine Learning & Python

The term “heart disease” is often used interchangeably with the term “cardiovascular disease.” Cardiovascular disease generally refers to conditions that involve narrowed or blocked blood vessels that can lead to a heart attack, chest pain (angina) or stroke. Other heart conditions, such as those that affect your heart’s muscle, valves or rhythm, also are considered forms of heart disease.

Diseases under the heart disease umbrella include blood vessel diseases, such as coronary artery disease; heart rhythm problems (arrhythmias); and heart defects you’re born with (congenital heart defects), among others. Many forms of heart disease can be prevented or treated with healthy lifestyle choices. — Mayo Clinic

In this article I will show you how to create a program in Python to detect if a person has a cardiovascular disease or not. Information about the data set that I will be using throughout this article and program can be found below.

Data Set Features:

If you prefer video to text, you can check out the YouTube video. It goes through everything in this article in a little more detail, and it will help you start programming your own machine learning model even if you don’t have Python installed on your computer. You can also use the article and the video together as supplementary materials for learning about machine learning!

Programming:

The first thing that I like to do before writing a single line of code is to put in a description in comments of what the code does. This way I can look back on my code and know exactly what it does.

#Description: This program detects whether or not a person has a cardiovascular disease

Import the libraries.

#Import Libraries
import numpy as np
import pandas as pd
import seaborn as sns

Load the data set.

#Load the data
from google.colab import files # Use to load data on Google Colab
uploaded = files.upload() # Use to load data on Google Colab

Store the data into a variable and print the data.

#Store the data into the df variable
df = pd.read_csv('cardio.csv')
df.head(7) #Print the first 7 rows
The first 7 rows of data

Get the number of rows and columns.

#Get the shape of the data (the number of rows & columns)
df.shape
The data set has 70,000 rows and 13 columns.

Count the number of empty values in each column.

#Count the empty (NaN, NAN, na) values in each column
df.isna().sum()

Here is another way to check whether your data set contains any null values.

#Another check for any null / missing values
df.isnull().values.any()
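
To make the difference between the two checks concrete, here is a small sketch on a made-up DataFrame (the column names and values are hypothetical, not taken from cardio.csv). The first check gives a count per column, while the second collapses everything into a single True or False.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing weight value
toy = pd.DataFrame({
    'height': [168, 156, 165],
    'weight': [62.0, np.nan, 64.0],
})

missing_per_column = toy.isna().sum()         # number of missing values in each column
has_any_missing = toy.isnull().values.any()   # True if at least one value is missing

print(missing_per_column)
print(has_any_missing)
```

The per-column count is the more useful of the two when something is missing, because it tells you which column needs cleaning.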

Get some statistics on the data.

#View some basic statistical details like percentile, mean, standard deviation etc.
df.describe()

Get a count of the number of individuals with a cardiovascular disease and the number of individuals without a cardiovascular disease.

#Get a count of the number of patients with (1) and without (0) a cardiovascular disease
df['cardio'].value_counts()

Visualize the number of individuals with a cardiovascular disease and the number of individuals without a cardiovascular disease.

#Visualize this count
sns.countplot(x='cardio', data=df)

Next, let’s see whether the number of people with a cardiovascular disease exceeds the number of people without one.

# Let's see whether the number of people with a cardiovascular disease exceeds
# the number of people without a cardiovascular disease
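
One way to make this comparison is to count, for each age, how many people have the disease and how many do not. The sketch below assumes the data set stores age in days in a column named `age` (an assumption, since the columns are not listed in this article) and uses a tiny made-up DataFrame in place of cardio.csv. It derives a `years` column and lists the ages at which people with the disease outnumber people without it.

```python
import pandas as pd

# Hypothetical stand-in for df; 'age' is assumed to be measured in days
toy = pd.DataFrame({
    'age':    [18250, 18300, 18200, 21900, 21950, 21850, 21920],
    'cardio': [0,     0,     1,     1,     1,     0,     1],
})

# Convert age in days to whole years
toy['years'] = (toy['age'] / 365).round().astype(int)

# Rows are ages in years, columns are cardio = 0 / 1, cells are head counts
counts = pd.crosstab(toy['years'], toy['cardio'])

# Ages at which people with the disease outnumber people without it
ages_where_cardio_exceeds = counts[counts[1] > counts[0]].index.tolist()

print(counts)
print(ages_where_cardio_exceeds)
```

On the full data, the same comparison could be drawn as a grouped bar chart with `sns.countplot(x='years', hue='cardio', data=df)`.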

Get the correlation of the columns.

#Get the correlation of the columns
df.corr()

Visualize the correlation.

#Visualize the correlation
import matplotlib.pyplot as plt
plt.figure(figsize=(7,7)) #7in by 7in
sns.heatmap(df.corr(), annot=True, fmt='.0%')
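
The heat map shows every pairwise correlation, but for this task the column of most interest is the one for the target. As a rough sketch, the features can be ranked by the absolute value of their correlation with `cardio`. The feature names and values below are made up for illustration; only the `cardio` column name comes from this article.

```python
import pandas as pd

# Hypothetical toy data; feature_a and feature_b are placeholder names
toy = pd.DataFrame({
    'feature_a': [1, 2, 3, 4, 5, 6],
    'feature_b': [3, 1, 2, 1, 3, 2],
    'cardio':    [0, 0, 0, 1, 1, 1],
})

# Correlation of every feature with the target, strongest first
target_corr = toy.corr()['cardio'].drop('cardio')
ranked = target_corr.abs().sort_values(ascending=False)

print(ranked)
```

A ranking like this is only a first look: correlation captures linear relationships, and a random forest can also pick up on patterns that correlation misses.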

Remove or drop the years column and the id column.

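# Remove or drop the years column and the id column
# errors='ignore' skips any column that is not present in df
df = df.drop(columns=['years', 'id'], errors='ignore')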

Split the data into feature data and target data.

#Split the data into feature data and target data
X = df.iloc[:, :-1].values
Y = df.iloc[:, -1].values

Split the data again, into 75% training data set and 25% testing data set.

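#Split the data again, into 75% training data set and 25% testing data set
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=1) #random_state=1 is an assumed seed for reproducibility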

Standardize the feature values so that each column has a mean of 0 and a standard deviation of 1.

#Feature Scaling
#Standardize each feature to zero mean and unit variance
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
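
Note that StandardScaler does not squeeze values into a fixed range; it centers each column at 0 and rescales it to unit variance, so the results can be negative or greater than 1. If a 0 to 1 range is what you want, scikit-learn's MinMaxScaler does that instead. A minimal sketch on made-up numbers shows the difference:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical feature matrix: two columns on different scales
X_toy = np.array([[150.0, 50.0],
                  [160.0, 60.0],
                  [170.0, 70.0],
                  [180.0, 100.0]])

X_std = StandardScaler().fit_transform(X_toy)    # zero mean, unit variance per column
X_minmax = MinMaxScaler().fit_transform(X_toy)   # each column mapped onto [0, 1]

print(X_std.mean(axis=0), X_std.std(axis=0))
print(X_minmax.min(axis=0), X_minmax.max(axis=0))
```

Tree-based models such as random forests are largely insensitive to this kind of rescaling, so either choice should matter little for the classifier used here.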

Create the machine learning model called a Random Forest Classifier.

# Use Random Forest Classifier
from sklearn.ensemble import RandomForestClassifier
forest = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 1)
forest.fit(X_train, Y_train)

Test the model’s accuracy on the training data.

#Test the model's accuracy on the training data set
model = forest
model.score(X_train, Y_train)

The model was about 97.99% accurate on the training data.

Test the model’s accuracy on the test data set by creating a confusion matrix and then using the confusion matrix to compute the accuracy score. Print the confusion matrix and the accuracy to the screen.

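#Test the model's accuracy on the test data set
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(Y_test, model.predict(X_test))
print(cm)
accuracy = cm.trace() / cm.sum() #correct predictions (the diagonal) / all predictions
print('Model Test Accuracy = {}'.format(accuracy))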
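
To see how accuracy falls out of a confusion matrix, here is a small worked example with made-up labels. In scikit-learn's layout, rows are the true classes and columns are the predicted classes, so the diagonal holds the correct predictions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels for 8 people
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 0, 0])

cm_toy = confusion_matrix(y_true, y_pred)
TN, FP = cm_toy[0]   # true class 0: predicted 0, predicted 1
FN, TP = cm_toy[1]   # true class 1: predicted 0, predicted 1

# Accuracy = correct predictions / all predictions
accuracy_toy = (TP + TN) / (TP + TN + FN + FP)

print(cm_toy)
print(accuracy_toy)
```

The hand-computed value matches what `accuracy_score(y_true, y_pred)` returns. The confusion matrix is still worth printing, because it also shows how many sick people were missed (FN), which a single accuracy number hides.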

The model was 70.2% accurate on the test data. This is okay, but when it comes to individuals and their health, you would want a much higher accuracy score than that. The gap between the training accuracy (about 97.99%) and the test accuracy also suggests that the model is overfitting the training data.

With some more tweaking of this program, it may be possible to get a higher accuracy score!
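
One place to start tweaking is the forest itself. Ten fully grown trees can come close to memorizing the training data, which fits the difference between the 97.99% training accuracy and the 70.2% test accuracy. The sketch below runs on synthetic data from `make_classification` rather than on cardio.csv, and compares the settings used in this article with a larger, depth-limited forest. The values `n_estimators=100` and `max_depth=8` are illustrative assumptions, not tuned results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (not cardio.csv)
X_demo, Y_demo = make_classification(n_samples=600, n_features=11,
                                     flip_y=0.2, random_state=1)
X_tr, X_te, Y_tr, Y_te = train_test_split(X_demo, Y_demo,
                                          test_size=0.25, random_state=1)

candidates = {
    # Settings used in this article
    'article settings': RandomForestClassifier(
        n_estimators=10, criterion='entropy', random_state=1),
    # Assumed alternative: more trees, each limited in depth
    'more trees, limited depth': RandomForestClassifier(
        n_estimators=100, max_depth=8, criterion='entropy', random_state=1),
}

scores = {}
for name, clf in candidates.items():
    clf.fit(X_tr, Y_tr)
    # Store (training accuracy, test accuracy) for each candidate
    scores[name] = (clf.score(X_tr, Y_tr), clf.score(X_te, Y_te))
    print(name, scores[name])
```

Comparing the two numbers for each candidate shows how much of the training accuracy carries over to unseen data. Whether settings like these help on the real data set would have to be checked by running them on it.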

If you are interested in reading more on machine learning and want to get started right away with problems and examples, then I strongly recommend you check out Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. It is a great book for helping beginners learn how to write machine learning programs and understand machine learning concepts.

Thanks for reading this article; I hope it’s helpful to you all! If you enjoyed this article and found it helpful, please leave some claps to show your appreciation. Keep up the learning, and if you like machine learning, mathematics, computer science, programming or algorithm analysis, please visit and subscribe to my YouTube channels (randerson112358 & compsci112358).
