Classify Iris Species Using Python & Logistic Regression

Logistic Regression Python Program

In this article I will show you how to write a simple program that classifies an iris flower as one of three species (virginica, setosa, or versicolor) based on its petal length, petal width, sepal length, and sepal width, using a machine learning algorithm called logistic regression.

Logistic regression is a model that uses a logistic function to model a categorical dependent variable. Like all regression analyses, logistic regression is a predictive analysis. It is used to describe data and to explain the relationship between one categorical dependent variable and one or more nominal, ordinal, interval, or ratio-level independent variables.

A Logistic Function Graph
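
To make the S-shaped curve concrete, here is a minimal sketch of the logistic function itself, written with only the Python standard library. The function name logistic and the sample inputs are my own choices for illustration; this is not code taken from sklearn.

```python
import math


def logistic(z: float) -> float:
    """Squash any real number z into the range 0 to 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an equivalent form that avoids overflow in exp()
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of 0 lands exactly in the middle.
for z in (-6, -2, 0, 2, 6):
    print(f"logistic({z:>2}) = {logistic(z):.4f}")
```

The output of this function can be read as a probability, which is what makes it useful for classification.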

If you would rather watch than read, the accompanying YouTube video goes through everything in this article in a little more detail, and will help make it easy for you to start programming your own machine learning model in Python. You can also use both the article and the video as complementary materials for learning about machine learning!

Start Programming:

I will start by stating what I want this program to do: predict (classify) the iris species as virginica, setosa, or versicolor based on the petal length, petal width, sepal length, and sepal width.

First I will import the dependencies that will make this program a little easier to write: the machine learning library sklearn, plus seaborn and matplotlib.

# Import the dependencies
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

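Next I will load the data set through the seaborn library (load_dataset fetches it from seaborn's online data repository, so an internet connection is needed the first time), store it in a variable called data, and display the first 5 rows of data.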

# Load the data set
data = sns.load_dataset("iris")
data.head()
The First 5 Rows Of The Iris Data Set

Start preparing the data by storing all of the independent variables (the feature columns) in a variable called ‘X’, and the dependent variable (the target, species) in a variable called ‘y’.

# Separate the features (X) from the target (y)

# X = feature values, all the columns except the last column
X = data.iloc[:, :-1]

# y = target values, last column of the data frame
y = data.iloc[:, -1]

Plot the relation of each feature / column with each species. I will use a scatter plot to show this relation. The sepal length will be blue, sepal width will be green, petal length will be red and petal width will be black.

# Plot the relation of each feature with each species

# Color used for each feature
colors = {
    'sepal_length': 'blue',
    'sepal_width': 'green',
    'petal_length': 'red',
    'petal_width': 'black',
}

plt.xlabel('Features')
plt.ylabel('Species')

# One scatter series per feature: measurement on the x-axis, species on the y-axis
for feature, color in colors.items():
    plt.scatter(data[feature], data['species'], color=color, label=feature)

plt.legend(loc=4, prop={'size': 8})
plt.show()
Graph Of Each Feature Relation With Each Species

Split the data into 80% training and 20% testing data by using the train_test_split() function from the sklearn.model_selection module, and store the results in x_train, x_test, y_train, and y_test. Setting random_state=42 makes the split reproducible.

# Split the data into 80% training and 20% testing
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
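
To show roughly what an 80/20 split does, here is a small standard-library sketch that shuffles row indices and cuts them into two lists. The helper simple_split is hypothetical and is not the algorithm train_test_split() uses internally, so the rows it selects will not match the ones sklearn picks. It only illustrates the idea of a seeded shuffle followed by a cut.

```python
import random


def simple_split(rows, test_size=0.2, seed=42):
    """Shuffle a copy of rows and cut it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the result is repeatable
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# 150 stand-in row indices, matching the size of the iris data set
rows = list(range(150))
train_rows, test_rows = simple_split(rows)

print(len(train_rows), len(test_rows))  # 120 30
```

Because the shuffle is seeded, running the sketch again gives the same two lists, which is the same reason the article's code passes random_state=42.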

Create and train the logistic regression model!

# Create the model and train it on the training data
model = LogisticRegression()
model.fit(x_train, y_train)
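
With three species instead of two, the model needs one probability per class. In recent versions of sklearn this is typically done with a softmax (multinomial) formulation, although the exact behaviour depends on the version and settings. Here is a minimal sketch of that last step, assuming the model has already produced one raw score per species. The scores below are hypothetical numbers picked for illustration, not values from the trained model.

```python
import math


def softmax(scores):
    """Convert a list of raw class scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


species = ["setosa", "versicolor", "virginica"]
scores = [0.5, 2.1, 1.2]  # hypothetical scores, not taken from the trained model

probabilities = softmax(scores)
# The predicted species is the one with the highest probability
predicted = species[probabilities.index(max(probabilities))]

for name, p in zip(species, probabilities):
    print(f"{name:<10} {p:.3f}")
print("predicted:", predicted)
```

The class with the largest score always ends up with the largest probability, so the prediction is simply the species with the highest value.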

Now that the model is trained, I will print the predictions and get a few metrics from the model based on the testing data set. Based on the metrics, it looks like the model correctly classified every flower in the testing data set.

# Test the model
predictions = model.predict(x_test)
print(predictions)  # Print the predicted species

print()  # Print a blank line

# Check precision, recall, f1-score
print(classification_report(y_test, predictions))

# Check the overall accuracy
print(accuracy_score(y_test, predictions))
Highlighted In Yellow Are The Model's Predictions. Below Are The Metrics.
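
To see what the numbers in the report mean, here is a small sketch that computes accuracy, precision, recall, and f1-score by hand using only plain Python. The helper per_class_metrics and the six labels are hypothetical, with one deliberate mistake so that the metrics are not all perfect. They are not the predictions from the model above.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for six flowers, with one deliberate mistake
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

# Accuracy is the fraction of flowers that were labeled correctly
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy: {accuracy:.2f}")

for label in sorted(set(y_true)):
    precision, recall, f1 = per_class_metrics(y_true, y_pred, label)
    print(f"{label:<10} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision answers "of the flowers predicted as this species, how many really were?", while recall answers "of the flowers that really are this species, how many did the model find?". The f1-score combines the two.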

That is it, you are done creating your logistic regression program to classify iris species! Again, if you want, you can watch and listen to me explain all of the code in my YouTube video.

If you are interested in reading more on machine learning and want to get started with problems and examples right away, then I strongly recommend you check out Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. It is a great book for helping beginners learn how to write machine learning programs and understand machine learning concepts.

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

Thanks for reading this article, I hope it's helpful to you all! If you enjoyed this article and found it helpful, please leave some claps to show your appreciation. Keep up the learning, and if you like machine learning, mathematics, computer science, programming, or algorithm analysis, please visit and subscribe to my YouTube channels (randerson112358 & compsci112358).
