Chronic Kidney Disease Prediction Using Python & Machine Learning

A Python Program to Detect and Classify Chronic Kidney Disease

Data Set Column Information:
age - age
bp - blood pressure
sg - specific gravity
al - albumin
su - sugar
rbc - red blood cells
pc - pus cell
pcc - pus cell clumps
ba - bacteria
bgr - blood glucose random
bu - blood urea
sc - serum creatinine
sod - sodium
pot - potassium
hemo - hemoglobin
pcv - packed cell volume
wc - white blood cell count
rc - red blood cell count
htn - hypertension
dm - diabetes mellitus
cad - coronary artery disease
appet - appetite
pe - pedal edema
ane - anemia
class - classification

Programming:

#Description: Classify patients as having chronic kidney disease 
# or not using Artificial Neural Networks
#Import Libraries
import glob
from keras.models import Sequential, load_model
import numpy as np
import pandas as pd
import keras as k
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
import matplotlib.pyplot as plt
#Load the data
from google.colab import files #Only use for Google Colab
uploaded = files.upload() #Only use for Google Colab
df = pd.read_csv("kidney_disease.csv")

#Print the first 5 rows
df.head()
Fig 1: A sample of the data set
#Get the shape of the data (the number of rows & columns)
df.shape

Data Manipulation: Clean The Data

#Create a list of columns to retain
#Note: the column list above names the blood cell counts wc and rc; any name
#in this list that is not a column in the CSV is silently skipped by the drop below
columns_to_retain = ["sg", "al", "sc", "hemo",
                     "pcv", "wbcc", "rbcc", "htn", "classification"]

#Drop the columns that are not in columns_to_retain
df = df.drop([col for col in df.columns if col not in columns_to_retain], axis=1)

#Drop the rows with na or missing values
df = df.dropna(axis=0)

#Transform non-numeric columns into numerical columns
for column in df.columns:
    #Numeric columns are left unchanged
    if pd.api.types.is_numeric_dtype(df[column]):
        continue
    df[column] = LabelEncoder().fit_transform(df[column])

#Print the first 5 rows of the cleaned data
df.head()
Fig 2: A sample of the first 5 rows of the new data set
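
Because LabelEncoder numbers the labels in sorted order, it helps to check which integer each class received before reading the model's output. The sketch below reproduces that ordering with NumPy alone. It assumes the cleaned classification column contains exactly the strings "ckd" and "notckd", which the article does not show, and the sample labels are made up for illustration.

```python
import numpy as np

# Made-up sample of class labels (assumption: the real column holds these two strings)
labels = np.array(["ckd", "notckd", "ckd", "ckd", "notckd"])

# np.unique returns the sorted unique labels plus, for each row, the index
# of its label in that sorted list. This is the same ordering LabelEncoder uses.
classes, codes = np.unique(labels, return_inverse=True)

# Human-readable mapping from label to integer code
mapping = {str(label): int(code) for code, label in enumerate(classes)}

print("Mapping:", mapping)
print("Encoded:", codes.tolist())
```

Under that assumption "ckd" becomes 0 and "notckd" becomes 1, so a prediction of 1 later on would mean "no chronic kidney disease". If the raw labels contain stray whitespace, each variant would get its own code, so it is worth printing the mapping on the real data.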

Data Manipulation: Split & Scale The Data

#Split the data
X = df.drop(["classification"], axis=1)
y = df["classification"]
#Feature Scaling
x_scaler = MinMaxScaler()
x_scaler.fit(X)
column_names = X.columns
X[column_names] = x_scaler.transform(X)
#Split the data into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True)
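
To make the scaling step concrete, here is a minimal NumPy sketch of the per-column formula that min-max scaling applies: (x - min) / (max - min). The helper names fit_min_max and apply_min_max and the sample values are hypothetical. The sketch also takes the minimum and range from the training rows only, which is a common variation and not what the code above does (it fits the scaler on every row before splitting).

```python
import numpy as np

def fit_min_max(train):
    """Return the per-column minimum and range of the training rows."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    # Avoid dividing by zero when a column is constant
    col_range[col_range == 0] = 1.0
    return col_min, col_range

def apply_min_max(data, col_min, col_range):
    """Scale data with statistics that were computed elsewhere."""
    return (data - col_min) / col_range

# Made-up values standing in for two feature columns
train = np.array([[1.005, 0.0],
                  [1.015, 2.0],
                  [1.025, 4.0]])
test = np.array([[1.020, 5.0]])

col_min, col_range = fit_min_max(train)
train_scaled = apply_min_max(train, col_min, col_range)
test_scaled = apply_min_max(test, col_min, col_range)

print(train_scaled)
print(test_scaled)
```

With this approach the training columns land exactly in the range 0 to 1, while a test value outside the training range (the 5.0 here) scales to a number above 1. That is expected and shows that the test rows did not influence the scaling.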

Build The Model (Artificial Neural Network):

#Build the model
model = Sequential()
model.add(Dense(256, input_dim=len(X.columns),
                kernel_initializer=k.initializers.random_normal(seed=13),
                activation="relu"))
model.add(Dense(1, activation="hard_sigmoid"))
#Compile the model
model.compile(loss='binary_crossentropy',
              optimizer='adam', metrics=['accuracy'])
#Train the model
history = model.fit(X_train, y_train,
                    epochs=2000,
                    batch_size=X_train.shape[0])
Fig 3: A sample of the training output, with the model's accuracy = 99.56% and loss = 0.0087
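
For readers wondering what the output layer and the loss actually compute, here is a rough NumPy sketch of a hard sigmoid and of binary cross-entropy. It is an illustration and not Keras' own code: the slope of 0.2 follows the older Keras definition (newer releases document a slope of 1/6), the epsilon of 1e-7 is an assumed clipping value, and the input numbers are made up.

```python
import numpy as np

def hard_sigmoid(x, slope=0.2):
    """Piecewise-linear approximation of the sigmoid, clipped to [0, 1]."""
    return np.clip(slope * x + 0.5, 0.0, 1.0)

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean log loss; probabilities are clipped so log(0) never occurs."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

# Made-up pre-activation values for four patients
pre_activations = np.array([-4.0, 0.0, 1.0, 4.0])
probs = hard_sigmoid(pre_activations)

# Made-up true labels for the same four patients
y_true = np.array([0.0, 1.0, 1.0, 1.0])
loss = binary_crossentropy(y_true, probs)

print("Probabilities:", probs)
print("Loss:", loss)
```

Unlike the ordinary sigmoid, the hard sigmoid can return exactly 0 or 1, which is why the clipping inside the loss matters. Without it a confident wrong prediction would produce an infinite loss.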
#Save the model
model.save("ckd.model")
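#Visualize the model's accuracy and loss
#Recent Keras releases store the metric under "accuracy"; older ones used "acc"
acc_key = "accuracy" if "accuracy" in history.history else "acc"
plt.plot(history.history[acc_key])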
plt.plot(history.history["loss"])
plt.title("model accuracy & loss")
plt.ylabel("accuracy and loss")
plt.xlabel("epoch")
plt.legend(['acc', 'loss'], loc='lower right')
plt.show()
Fig 4: The model's loss (orange) & accuracy (blue)
print("---------------------------------------------------------")
print("Shape of training data: ", X_train.shape)
print("Shape of test data : ", X_test.shape )
print("---------------------------------------------------------")
Fig 5: Shape of training and testing data

#Load each saved model and evaluate it on the test data
for model_file in glob.glob("*.model"):
    print("Model file: ", model_file)
    model = load_model(model_file)
    pred = model.predict(X_test)
    #Threshold: turn each probability into 1 if it is at least 0.5, otherwise 0
    pred = [1 if p[0] >= 0.5 else 0 for p in pred]
    scores = model.evaluate(X_test, y_test)
    print()
    print("Original : {0}".format(", ".join([str(x) for x in y_test])))
    print()
    print("Predicted : {0}".format(", ".join([str(x) for x in pred])))
    print()
    print("Scores : loss = ", scores[0], " acc = ", scores[1])
    print("---------------------------------------------------------")
    print()
Printing the model(s) output.
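
Accuracy alone does not say which kind of mistake the model makes, so a natural follow-up is to count the four outcomes of a binary prediction. The sketch below is one possible way to do that with NumPy. The helper names threshold and confusion_counts, the probabilities, and the labels are all made up for illustration, and label 1 is treated as the positive class purely by assumption.

```python
import numpy as np

def threshold(probs, cutoff=0.5):
    """Flatten an (n, 1) array of probabilities and convert it to 0/1 labels."""
    return (np.asarray(probs).ravel() >= cutoff).astype(int)

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, treating 1 as positive."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return {
        "tp": int(np.sum((y_true == 1) & (y_pred == 1))),
        "tn": int(np.sum((y_true == 0) & (y_pred == 0))),
        "fp": int(np.sum((y_true == 0) & (y_pred == 1))),
        "fn": int(np.sum((y_true == 1) & (y_pred == 0))),
    }

# Made-up model outputs, shaped like the (n, 1) array model.predict returns
probs = np.array([[0.92], [0.08], [0.51], [0.49], [0.70]])
# Made-up true labels for the same five rows
y_true = np.array([1, 0, 0, 1, 1])

y_pred = threshold(probs)
counts = confusion_counts(y_true, y_pred)
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)

print("Predicted:", y_pred.tolist())
print("Counts   :", counts)
print("Accuracy :", accuracy)
```

In a medical setting the false negatives and false positives usually carry very different costs, so it is worth looking at these counts next to the single accuracy number, and checking first which integer the label encoding assigned to each class.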

Conclusion and Resources


Other resources

Poly-cystic kidney disease
