Detect If An Individual Has Parkinson's Disease
Parkinson’s disease is a progressive nervous system disorder that affects movement. Symptoms start gradually, sometimes starting with a barely noticeable tremor in just one hand. Tremors are common, but the disorder also commonly causes stiffness or slowing of movement. — mayoclinic.org
In this article I will show you how to use a machine learning algorithm called XGBoost to detect if an individual has Parkinson’s disease. XGBoost stands for Extreme Gradient Boosting and is based on decision trees. It is a “newish” machine learning algorithm.
Below is some information about the data set that I will be using throughout this article.
Data Set Features:
name - ASCII subject name and recording number
MDVP:Fo(Hz) - Average vocal fundamental frequency
MDVP:Fhi(Hz) - Maximum vocal fundamental frequency
MDVP:Flo(Hz) - Minimum vocal fundamental frequency
MDVP:Jitter(%), MDVP:Jitter(Abs), MDVP:RAP, MDVP:PPQ, Jitter:DDP - Several measures of variation in fundamental frequency
MDVP:Shimmer, MDVP:Shimmer(dB), Shimmer:APQ3, Shimmer:APQ5, MDVP:APQ, Shimmer:DDA - Several measures of variation in amplitude
NHR, HNR - Two measures of ratio of noise to tonal components in the voice
status - Health status of the subject (one) - Parkinson's, (zero) - healthy
RPDE, D2 - Two nonlinear dynamical complexity measures
DFA - Signal fractal scaling exponent
spread1, spread2, PPE - Three nonlinear measures of fundamental frequency variation
If you want even more articles on machine learning and artificial intelligence in health, be sure to check out my article, Heart Disease Detection Using Machine Learning & Python.
If you prefer not to read this article and would like a video version of it, you can check out the YouTube video below. It goes through everything in this article in a little more detail, and will help make it easy for you to start programming your own machine learning model even if you don’t have the programming language Python installed on your computer. Or you can use both as supplementary materials for learning about machine learning!
The first thing that I like to do before writing a single line of code is to put in a description in comments of what the code does. This way I can look back on my code and know exactly what it does.
#This program detects if an individual has Parkinson’s disease.
Import the dependencies.
#Get the dependencies
import pandas as pd
import numpy as np
from xgboost import XGBClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
import seaborn as sns
Load the data.
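A minimal sketch of this step, assuming the data is stored as a CSV file. The helper name load_data and the file name parkinsons.data are assumptions for illustration, and the tiny table below uses made-up values so the snippet runs on its own.

```python
import io
import pandas as pd

def load_data(source):
    """Read the data set from a CSV file path or a file-like object."""
    return pd.read_csv(source)

# With the real file this would be something like:
#   df = load_data('parkinsons.data')   # file name is an assumption
# A tiny made-up sample stands in here so the snippet is self-contained.
sample = io.StringIO(
    "name,MDVP:Fo(Hz),status\n"
    "subject_a,120.0,1\n"
    "subject_b,200.0,0\n"
)
df = load_data(sample)

# Show the first rows to confirm the data loaded as expected
print(df.head())
```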
Check for any missing values in the data set.
#Check for any missing data
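One way to write this check with pandas, shown as a sketch on a made-up stand-in data frame (the values are not from the real data set):

```python
import pandas as pd

# Made-up stand-in for the real data frame
df = pd.DataFrame({'MDVP:Fo(Hz)': [120.0, 200.0], 'status': [1, 0]})

# isnull() marks every missing cell; any() collapses that to one answer:
# True if at least one value in the whole table is missing
has_missing = df.isnull().values.any()
print(has_missing)
```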
The result returned False, so there are no missing values in the data set.
Get the count of the number of rows and columns in the data set.
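In pandas the shape attribute gives this as a (rows, columns) pair. A small sketch on a made-up stand-in data frame:

```python
import pandas as pd

# Made-up stand-in for the real data frame: 3 rows, 2 columns
df = pd.DataFrame({'MDVP:Fo(Hz)': [120.0, 200.0, 150.0], 'status': [1, 0, 1]})

# shape is a tuple of (number of rows, number of columns)
n_rows, n_cols = df.shape
print(n_rows, n_cols)
```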
Looks like there are 195 individuals (rows) in the data set and 24 columns.
Get the count of the number of individuals with and without Parkinson’s disease.
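A sketch of this count using value_counts on the status column, where 1 means Parkinson’s and 0 means healthy. The data frame below is a made-up stand-in, not the real data.

```python
import pandas as pd

# Made-up stand-in: three individuals with status 1 and one with status 0
df = pd.DataFrame({'status': [1, 1, 0, 1]})

# Count how many rows fall under each status value
counts = df['status'].value_counts()
print(counts)
```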
So it looks like 147 of the individuals in the data set have Parkinson’s disease and 48 individuals do not.
I want to know how often I would be correct if I simply guessed that every individual has Parkinson’s disease, or that every individual does not.
print('If I guess the individual did not have Parkinson’s disease, I would be correct',48/(147+48)*100,'% of the time.')
print('If I guess the individual had Parkinson’s disease, I would be correct',147/(147+48) *100, '% of the time.')
If I guess for every individual in the data set that they have the disease, then I would be correct about 75.38% of the time. If I guess for every individual in the data set that they don’t have the disease, then I would be correct about 24.62% of the time.
Visualize this count.
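Since seaborn is imported as sns, a count plot is one plausible way to draw this. The sketch below assumes the column is named status and uses a made-up stand-in data frame.

```python
import pandas as pd
import seaborn as sns

# Made-up stand-in for the real data frame
df = pd.DataFrame({'status': [1, 1, 1, 0]})

# Draw one bar per status value, with bar height equal to the row count
ax = sns.countplot(x='status', data=df)
```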
Get the data types in the data set.
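A sketch of this step on a made-up stand-in data frame. The dtypes attribute lists the type of every column, and the helper list non_numeric (a name introduced here for illustration) picks out the columns that are not numbers.

```python
import pandas as pd

# Made-up stand-in for the real data frame
df = pd.DataFrame({
    'name': ['subject_a', 'subject_b'],
    'MDVP:Fo(Hz)': [120.0, 200.0],
    'status': [1, 0],
})

# Print the data type of each column
print(df.dtypes)

# Collect the names of the columns that do not hold numeric values
non_numeric = [col for col in df.columns
               if not pd.api.types.is_numeric_dtype(df[col])]
print(non_numeric)
```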
Looks like the only column that is not already a number is the name column. I will get rid of this column as it doesn’t seem to be too important to determine if an individual has Parkinson’s disease or not.
Create the feature data set and the target data set.
#Create the feature data set
X = df.drop(['name'], axis=1)
X = np.array(X.drop(['status'], axis=1))
#Create the target data set
y = np.array(df['status'])
Split the data into 80% training and 20% testing.
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
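Because the two classes are unbalanced, an optional variation is to stratify the split so the training and testing sets keep roughly the same class ratio, and to fix random_state so the split is repeatable. This is a suggestion rather than what the code above does; the arrays and the seed value 42 below are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up stand-in: 20 samples, 15 with status 1 and 5 with status 0
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.array([1] * 15 + [0] * 5)

# stratify=y keeps the class ratio the same in both splits;
# random_state makes the shuffle reproducible (42 is an arbitrary choice)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

print(len(x_train), len(x_test))
```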
Transform the feature data by scaling the values.
sc = MinMaxScaler(feature_range=(0,1))
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
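To see what the scaler is doing, here is the same min-max calculation written by hand with NumPy on made-up numbers. The minimum and maximum come from the training data only, which is why the code above calls fit_transform on x_train but only transform on x_test. A test value outside the training range can therefore land outside 0 to 1.

```python
import numpy as np

# Made-up values with a single feature column
x_train = np.array([[1.0], [3.0], [5.0]])
x_test = np.array([[2.0], [7.0]])

# Learn the minimum and maximum of each column from the training data only
col_min = x_train.min(axis=0)
col_max = x_train.max(axis=0)

# Apply the same min-max formula to both sets
x_train_scaled = (x_train - col_min) / (col_max - col_min)
x_test_scaled = (x_test - col_min) / (col_max - col_min)

print(x_train_scaled.ravel())
print(x_test_scaled.ravel())
```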
Create the XGBClassifier model.
model = XGBClassifier().fit(x_train, y_train)
Get the model’s predictions on the testing data set.
predictions = model.predict(x_test)
Get the model’s accuracy, precision, recall, and f1-score.
print( classification_report(y_test, predictions) )
It looks like the model was about 90% accurate.
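If you only want the accuracy as a single number, for example to compare it against the 75.38% you would get by always guessing that the individual has the disease, scikit-learn’s accuracy_score can compute it. The labels below are made up for illustration and are not the model’s actual output.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up true labels and predictions for ten individuals
y_test = np.array([1, 1, 1, 0, 1, 0, 1, 1, 0, 1])
predictions = np.array([1, 1, 1, 0, 1, 1, 1, 1, 0, 1])

# Fraction of predictions that match the true labels
accuracy = accuracy_score(y_test, predictions)
print('Accuracy:', accuracy * 100, '%')
```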
If you are interested in reading more on machine learning and want to get started right away with problems and examples, then I strongly recommend you check out Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. It is a great book for helping beginners learn how to write machine learning programs and understand machine learning concepts.
Thanks for reading this article. I hope it’s helpful to you all! If you enjoyed this article and found it helpful, please leave some claps to show your appreciation. Keep up the learning, and if you like machine learning, mathematics, computer science, programming or algorithm analysis, please visit and subscribe to my YouTube channels (randerson112358 & computer science).