Heart Disease Detection Using Machine Learning & Python

The term “heart disease” is often used interchangeably with the term “cardiovascular disease.” Cardiovascular disease generally refers to conditions that involve narrowed or blocked blood vessels that can lead to a heart attack, chest pain (angina) or stroke. Other heart conditions, such as those that affect your heart’s muscle, valves or rhythm, also are considered forms of heart disease.

Diseases under the heart disease umbrella include blood vessel diseases, such as coronary artery disease; heart rhythm problems (arrhythmias); and heart defects you’re born with (congenital heart defects), among others. Many forms of heart disease can be prevented or treated with healthy lifestyle choices. — Mayo Clinic

In this article I will show you how to create a program in Python to detect if a person has a cardiovascular disease or not. Information about the data set that I will be using throughout this article and program can be found below.

Feature | Column name | Data type
Age | age | int (days)
Height | height | int (cm)
Weight | weight | float (kg)
Gender | gender | categorical code
Systolic blood pressure | ap_hi | int
Diastolic blood pressure | ap_lo | int
Cholesterol | cholesterol | 1: normal, 2: above normal, 3: well above normal
Glucose | gluc | 1: normal, 2: above normal, 3: well above normal
Smoking | smoke | binary
Alcohol intake | alco | binary
Physical activity | active | binary
Presence or absence of cardiovascular disease | cardio | binary

If you prefer not to read this article and would like a video representation of it, you can check out the YouTube video. It goes through everything in this article in a little more detail, and will help you start programming your own machine learning model even if you don’t have Python installed on your computer. Or you can use both as supplementary materials for learning about machine learning!

Programming:

The first thing that I like to do before writing a single line of code is to put in a description in comments of what the code does. This way I can look back on my code and know exactly what it does.

#This program classifies a person as having a cardiovascular disease (1) or not (0)
#So the target class "cardio" equals 1, when the patient has cardiovascular disease, and it's 0, when the patient is healthy.

Import the libraries.

Load the data set.

Store the data into a variable and print the data.
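The import, load and print steps above can be sketched roughly as follows. This is a minimal sketch, not necessarily the exact code behind the screenshots: the `load_data` helper and the file name `cardio.csv` are hypothetical, and the three rows are made-up values in the column layout described earlier, so the example runs without the real file. If your copy of the file is semicolon-delimited, pass `sep=";"`.

```python
import io

import pandas as pd
# The plotting steps later in the article also need: import seaborn as sns


def load_data(source, sep=","):
    """Read the data set into a DataFrame; `source` is a path or a file-like object."""
    return pd.read_csv(source, sep=sep)


# Made-up rows standing in for the real file, so the sketch runs anywhere.
sample_csv = io.StringIO(
    "id,age,height,weight,gender,ap_hi,ap_lo,cholesterol,gluc,smoke,alco,active,cardio\n"
    "0,18250,170,70.0,2,120,80,1,1,0,0,1,0\n"
    "1,20075,160,85.5,1,140,90,3,1,0,0,1,1\n"
    "2,21900,165,64.0,1,130,70,2,2,1,0,0,1\n"
)

# With the real file this would be: df = load_data("cardio.csv")  (hypothetical name)
df = load_data(sample_csv)

# head(7) prints up to the first 7 rows; the sample only has 3.
print(df.head(7))
```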

The First 7 Rows Of Data

Get the number of rows and columns.
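A DataFrame exposes its dimensions through the `shape` attribute, a `(rows, columns)` tuple. The small frame below is a made-up stand-in with only three of the columns, so the numbers it prints are not the ones for the real data set.

```python
import pandas as pd

# Made-up stand-in for the real data set: 5 rows, 3 columns.
df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "age": [18250, 20075, 21900, 17520, 22630],
    "cardio": [0, 1, 1, 0, 1],
})

# shape is a (number_of_rows, number_of_columns) tuple.
n_rows, n_cols = df.shape
print(n_rows, "rows and", n_cols, "columns")
```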

70000 rows and 13 columns

Count the number of empty values in each column.
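One common way to do this in pandas is `isna().sum()`, which returns one count per column. The frame below is made up and deliberately contains a few missing values so that the counts are not all zero.

```python
import numpy as np
import pandas as pd

# Made-up frame with some deliberately missing values (NaN).
df = pd.DataFrame({
    "age": [18250, 20075, np.nan],
    "weight": [62.0, np.nan, np.nan],
    "cardio": [0, 1, 0],
})

# isna() marks each missing cell as True; sum() adds them up per column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```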

Here is another way to check if your data set contains any null values.
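A sketch of one such check, assuming a single yes/no answer for the whole data set is enough: `isnull().values.any()` collapses the frame to one boolean. Both frames below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Two made-up frames: one complete, one with a missing age.
clean = pd.DataFrame({"age": [18250, 20075], "cardio": [0, 1]})
dirty = pd.DataFrame({"age": [18250, np.nan], "cardio": [0, 1]})

# .values gives the underlying array of True/False flags;
# .any() is True as soon as a single cell is missing.
has_nulls_clean = bool(clean.isnull().values.any())
has_nulls_dirty = bool(dirty.isnull().values.any())

print(has_nulls_clean, has_nulls_dirty)
```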

Get some statistics on the data.

df.describe()

Get a count of the number of individuals with a cardiovascular disease and the number of individuals without a cardiovascular disease.

df['cardio'].value_counts()

Visualize the number of individuals with a cardiovascular disease and the number of individuals without a cardiovascular disease.
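A count plot is one way to draw this, with one bar per class. The sketch below assumes seaborn is installed and uses a made-up `cardio` column, so the bar heights are not those of the real data set.

```python
import pandas as pd
import seaborn as sns

# Made-up target column: six people with the disease (1), four without (0).
df = pd.DataFrame({"cardio": [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]})

# countplot draws one bar per distinct value of the x column.
ax = sns.countplot(x="cardio", data=df)
```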

Let’s look at how the number of people with a cardiovascular disease compares with the number of people without one at each age, and at which ages the former exceeds the latter.

#Create a years column
df['years'] = (df['age'] / 365).round(0) #Get the years by dividing the age in days by 365
df['years'] = pd.to_numeric(df['years'], downcast='integer') #Convert years to an integer
#Visualize the data
#colorblind palette for colorblindness
sns.countplot(x='years', hue='cardio', data=df, palette='colorblind', edgecolor=sns.color_palette('dark', n_colors=1));

Get the correlation of the columns.
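In pandas, `df.corr()` returns the pairwise (Pearson, by default) correlation between all numeric columns. The data below is randomly generated for illustration, with `cardio` tied to `age` on purpose, so the values mean nothing for real patients.

```python
import numpy as np
import pandas as pd

# Randomly generated stand-in data; cardio is derived from age on purpose.
rng = np.random.default_rng(0)
n = 100
age = rng.integers(14_000, 24_000, n)
df = pd.DataFrame({
    "age": age,
    "weight": rng.normal(75, 12, n).round(1),
    "cardio": (age > 19_000).astype(int),
})

# Square table: one row and one column per numeric column of df.
corr = df.corr()
print(corr)
```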

Visualize the correlation.
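A heat map is a common way to show a correlation table. The sketch below assumes seaborn is installed; the percentage format for the annotations is my choice for illustration, and the data is randomly generated rather than taken from the real data set.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Randomly generated stand-in data; cardio is derived from age on purpose.
rng = np.random.default_rng(0)
n = 100
age = rng.integers(14_000, 24_000, n)
df = pd.DataFrame({
    "age": age,
    "weight": rng.normal(75, 12, n).round(1),
    "cardio": (age > 19_000).astype(int),
})

corr = df.corr()

# annot=True writes each value into its cell; fmt=".0%" shows it as a whole percent.
ax = sns.heatmap(corr, annot=True, fmt=".0%")
```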

Remove or drop the years column and the id column.

#Remove or drop the years column
df = df.drop('years', axis=1)
#Remove or drop the id column
df = df.drop('id', axis=1)

Split the data into feature data and target data.
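A minimal sketch of this split, assuming the target column `cardio` is the last column of the DataFrame (as in the column list earlier). The names `X` and `Y` match the ones used in the `train_test_split` call; the rows are made up.

```python
import pandas as pd

# Made-up rows; cardio is the last column.
df = pd.DataFrame({
    "age": [18250, 20075, 21900, 17520],
    "ap_hi": [120, 140, 130, 110],
    "cholesterol": [1, 3, 2, 1],
    "cardio": [0, 1, 1, 0],
})

X = df.iloc[:, :-1].values  # feature data: every column except the last
Y = df.iloc[:, -1].values   # target data: the last column (cardio)

print(X.shape, Y.shape)
```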

Split the data again, into 75% training data set and 25% testing data set.

from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size= 0.25, random_state = 1)

Scale the values in the data to be values between 0 and 1 inclusive.
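One scaler that produces values between 0 and 1 is scikit-learn's `MinMaxScaler`; I am assuming it here because it matches the range described above, and a different scaler would give a different range. The scaler is fitted on the training data only and then applied to the test data, so a test value outside the training range can land slightly below 0 or above 1. The arrays are made up.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up feature arrays: columns are age in days and systolic blood pressure.
X_train = np.array([[14000, 100.0], [18000, 120.0], [22000, 160.0]])
X_test = np.array([[16000, 130.0]])

scaler = MinMaxScaler()

# Learn each column's min and max from the training data, then rescale it to [0, 1].
X_train_scaled = scaler.fit_transform(X_train)

# Reuse the training min and max for the test data (no refitting).
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled)
print(X_test_scaled)
```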

Create the machine learning model called a Random Forest Classifier.

Test the model's accuracy on the training data.
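The last two steps can be sketched as below. The variable name `forest` and the hyperparameters (`n_estimators=10`, `criterion="entropy"`) are assumptions for illustration, and the data is randomly generated with noisy labels, so the score it prints will not match the figure reported for the real data set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Randomly generated stand-in data: 400 samples, 5 features, noisy binary labels.
rng = np.random.default_rng(1)
X = rng.random((400, 5))
Y = (X[:, 0] + rng.normal(0, 0.3, 400) > 0.5).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=1)

# Hyperparameters are illustrative assumptions, not tuned values.
forest = RandomForestClassifier(n_estimators=10, criterion="entropy", random_state=1)
forest.fit(X_train, Y_train)

# score() returns the fraction of samples classified correctly.
train_accuracy = forest.score(X_train, Y_train)
print(train_accuracy)
```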

The model was about 97.99% accurate on the training data.

Test the model's accuracy on the test data set by creating a confusion matrix and then using the confusion matrix to compute the accuracy score.
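A sketch of how the confusion matrix `cm` can be built with scikit-learn's `confusion_matrix`. For two classes the result is a 2x2 array whose rows are the actual classes and whose columns are the predicted classes, so `cm[0][0]` holds the true negatives and `cm[1][1]` the true positives. The model name `forest`, its hyperparameters and the randomly generated data are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Randomly generated stand-in data: 400 samples, 5 features, noisy binary labels.
rng = np.random.default_rng(1)
X = rng.random((400, 5))
Y = (X[:, 0] + rng.normal(0, 0.3, 400) > 0.5).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=1)

# Illustrative model; hyperparameters are assumptions.
forest = RandomForestClassifier(n_estimators=10, criterion="entropy", random_state=1)
forest.fit(X_train, Y_train)

# Rows: actual class (0, 1). Columns: predicted class (0, 1).
cm = confusion_matrix(Y_test, forest.predict(X_test))
print(cm)
```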

Print the confusion matrix and the accuracy to the screen.

TN = cm[0][0]
TP = cm[1][1]
FN = cm[1][0]
FP = cm[0][1]
#Print the confusion matrix
print(cm)
#Print the model's accuracy on the test data
print('Model Test Accuracy = {}'.format((TP + TN) / (TP + TN + FN + FP)))

The model was 70.2% accurate on the test data. This is okay, but when it comes to individuals and their health, you would want a much higher accuracy score than that. The gap between about 97.99% on the training data and 70.2% on the test data also suggests that the model is overfitting the training data.

With some more tweaking of this program, it may be possible to get a higher accuracy score!

If you are interested in reading more on machine learning and want to get started right away with problems and examples, then I strongly recommend you check out Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. It is a great book for helping beginners learn how to write machine learning programs and understand machine learning concepts.

Thanks for reading this article, I hope it's helpful to you all! If you enjoyed this article and found it helpful, please leave some claps to show your appreciation. Keep up the learning, and if you like machine learning, mathematics, computer science, programming or algorithm analysis, please visit and subscribe to my YouTube channels (randerson112358 & compsci112358).
