Detect If An Individual Has Parkinson's Disease


Parkinson’s disease is a progressive nervous system disorder that affects movement. Symptoms start gradually, sometimes starting with a barely noticeable tremor in just one hand. Tremors are common, but the disorder also commonly causes stiffness or slowing of movement. — mayoclinic.org

In this article I will show you how to use a machine learning algorithm called XGBoost to detect if an individual has Parkinson’s disease. XGBoost stands for Extreme Gradient Boosting and builds an ensemble of decision trees, where each new tree is trained to correct the errors of the trees before it. It is a relatively recent machine learning algorithm.
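
To give a feel for what "gradient boosting on decision trees" means, here is a minimal sketch of the core idea for squared-error regression: each small tree is fit to the errors left by the trees before it. This is an illustration only, not XGBoost's actual implementation (which adds regularization and other refinements); the toy data, `learning_rate`, and the number of trees are all assumptions made for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X_toy = rng.uniform(-3, 3, size=(200, 1))
y_toy = np.sin(X_toy[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.3
# Start from a constant prediction: the mean of the targets
prediction = np.full_like(y_toy, y_toy.mean())
initial_mse = np.mean((y_toy - prediction) ** 2)

trees = []
for _ in range(20):
    # Residuals are what the current ensemble still gets wrong
    residuals = y_toy - prediction
    tree = DecisionTreeRegressor(max_depth=2).fit(X_toy, residuals)
    # Add a damped version of the new tree's correction to the ensemble
    prediction += learning_rate * tree.predict(X_toy)
    trees.append(tree)

final_mse = np.mean((y_toy - prediction) ** 2)
print('Training MSE before boosting:', initial_mse)
print('Training MSE after boosting: ', final_mse)
```

The training error shrinks as trees are added, which is the behavior that boosting libraries such as XGBoost build on.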

Below is some information about the data set that I will be using throughout this article.

Data Set Features:

If you want even more articles on machine learning and artificial intelligence in health, be sure to check out my article, Heart Disease Detection Using Machine Learning & Python.


If you would rather watch than read, the accompanying YouTube video goes through everything in this article in a little more detail, and shows how to start programming your own machine learning model even if you don’t have Python installed on your computer. You can also use both as supplementary materials for learning about machine learning!

Programming:

The first thing that I like to do before writing a single line of code is to put in a description in comments of what the code does. This way I can look back on my code and know exactly what it does.

#This program detects if an individual has Parkinson’s disease.

Import the dependencies.

#Get the dependencies
import pandas as pd
import numpy as np
from xgboost import XGBClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
import seaborn as sns

Load the data.

df=pd.read_csv('parkinsons.data')
df.head()

Check for any missing values in the data set.

#Check for any missing data
df.isnull().values.any()

The result returned False, so there are no missing values in the data set.
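
The check above returns a single True/False for the whole data set. If it ever returns True, a per-column count shows where the gaps are. The sketch below uses a tiny made-up DataFrame (the column names `jitter` and `shimmer` are hypothetical placeholders, not columns taken from this article's file) to show both checks side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the column names and values are made up
toy = pd.DataFrame({
    'jitter': [0.01, np.nan, 0.03],
    'shimmer': [0.20, 0.25, 0.22],
})

# Single True/False answer for the whole frame
has_missing = toy.isnull().values.any()
# Number of missing values in each column
missing_per_column = toy.isnull().sum()

print(has_missing)
print(missing_per_column)
```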

Get the count of the number of rows and columns in the data set.

df.shape

Looks like there are 195 rows and 24 columns in the data set; this article treats each row as one individual.

Get the count of the number of individuals with and without Parkinson’s disease.

df['status'].value_counts()

So it looks like 147 of the individuals in the data set have Parkinson’s disease and 48 individuals do not.

I want to know how often I would be correct if I simply guessed that every individual has Parkinson’s disease, or that every individual does not. This gives a baseline to compare the model against.

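print('If I guess the individual has Parkinson’s disease, I would be correct', round(147/(147+48)*100, 2), '% of the time.')
print('If I guess the individual did not have Parkinson’s disease, I would be correct', round(48/(147+48)*100, 2), '% of the time.')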

If I guess for every individual in the data set that they have the disease, then I would be correct about 75.38% of the time. If I guess for every individual in the data set that they don’t have the disease, then I would be correct about 24.62% of the time.
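
The same baseline can be computed from the labels instead of typing the counts by hand. This is a small sketch that rebuilds the class counts reported above as a stand-in for `df['status']`; it assumes a status of 1 marks an individual with the disease and 0 marks one without.

```python
import pandas as pd

# Stand-in for df['status']: 147 with the disease (1), 48 without (0).
# The 1/0 encoding is an assumption made for this sketch.
status = pd.Series([1] * 147 + [0] * 48, name='status')

# Share of each class, as a percentage of all rows
class_share = status.value_counts(normalize=True) * 100
# Accuracy of always guessing the most common class
majority_baseline = class_share.max()

print(class_share.round(2))
print(round(majority_baseline, 2))
```

Any model worth keeping should beat the majority-class baseline on data it has not seen.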

Visualize this count.

sns.countplot(x='status', data=df)

Get the data types in the data set.

df.dtypes

Looks like the only column that is not already a number is the name column. I will get rid of this column, as it is only an identifier and should not help determine whether an individual has Parkinson’s disease.

Create the feature data set and the target data set.

#Create the feature data set (every column except 'name' and 'status')
X = df.drop(['name', 'status'], axis=1).values
#Create the target data set
y = df['status'].values

Split the data into 80% training and 20% testing.

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
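
Because the classes are imbalanced, a plain random split can leave the test set with a different class ratio than the training set, and the results change on every run. A possible variation, sketched here on stand-in data (random features and hypothetical variable names, with only the class balance copied from the counts above), is to pass `stratify` and `random_state` to `train_test_split`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the same class balance as the data set above;
# the features are random and only there to make the example run
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(195, 5))
y_demo = np.array([1] * 147 + [0] * 48)

x_tr, x_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=0.2,
    stratify=y_demo,    # keep the class ratio similar in both splits
    random_state=42,    # fix the shuffle so the split is repeatable
)

print('Training rows:', len(y_tr), 'Testing rows:', len(y_te))
print('Share with disease in train:', y_tr.mean())
print('Share with disease in test: ', y_te.mean())
```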

Transform the feature data by scaling the values.

sc = MinMaxScaler(feature_range=(0,1))
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
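
To see what the scaler is doing, here is a small sketch on made-up numbers. `MinMaxScaler` learns the minimum and maximum from the training data and applies (x - min) / (max - min); the testing data reuses the training minimum and maximum, so test values can land outside the 0 to 1 range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up single-feature data for illustration
train = np.array([[1.0], [3.0], [5.0]])
test = np.array([[2.0], [7.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
# fit_transform learns min=1 and max=5 from the training data
train_scaled = scaler.fit_transform(train)
# transform reuses the training min and max; it does not refit
test_scaled = scaler.transform(test)

# The same computation by hand: (x - min) / (max - min)
train_min = train.min(axis=0)
train_max = train.max(axis=0)
manual = (test - train_min) / (train_max - train_min)

print(train_scaled.ravel())
print(test_scaled.ravel())
print(manual.ravel())
```

Fitting the scaler on the training data only, as the code above does, keeps information about the testing data from leaking into the model.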

Create the XGBClassifier model.

model = XGBClassifier().fit(x_train, y_train)

Get and print the model’s predictions on the testing data set.

predictions = model.predict(x_test)
predictions

Get the model’s accuracy, precision, recall, and f1-score.

print( classification_report(y_test, predictions) )

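It looks like the model was about 90% accurate on the testing data, well above the 75.38% baseline of always guessing that an individual has the disease. Because the split into training and testing data is random, the exact numbers will vary from run to run.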
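
For readers wondering where the numbers in the classification report come from, the sketch below derives accuracy, precision, recall, and f1-score from a confusion matrix. The labels and predictions are hypothetical values made up for this example; they are not the results of the model above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical labels and predictions, NOT the article's actual results
y_true = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 1, 1, 0, 0, 0, 1, 0])

# For binary labels 0/1, ravel() yields the counts in this order:
# true negatives, false positives, false negatives, true positives
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all guesses that were right
precision = tp / (tp + fp)                   # of predicted positives, how many were right
recall = tp / (tp + fn)                      # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print('accuracy:', accuracy)
print('precision:', precision)
print('recall:', recall)
print('f1-score:', f1)
```

On imbalanced data like this, precision and recall for each class are worth reading alongside accuracy, since accuracy alone can look good just by favoring the larger class.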

If you are interested in reading more on machine learning and want to get started right away with problems and examples, then I strongly recommend you check out Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. It is a great book for helping beginners learn how to write machine learning programs and understand machine learning concepts.


Thanks for reading this article; I hope it’s helpful to you all! If you enjoyed this article and found it helpful, please leave some claps to show your appreciation. Keep up the learning, and if you like machine learning, mathematics, computer science, programming, or algorithm analysis, please visit and subscribe to my YouTube channels (randerson112358 & computer science).
