Member-only story
NBA Data Analysis Using Python & Machine Learning
Explore NBA Basketball Data Using KMeans Clustering

In this article I will show you how to explore data and use the unsupervised machine learning algorithm called KMeans to cluster / group NBA players. The code will explore the NBA players from 2013–2014 basketball season and use KMeans to group them in clusters to show which players are most similar.
K-Means is one of the most popular “clustering” algorithms. K-means stores ‘k’ centroids that it uses to define clusters. A point is considered to be in a particular cluster if it is closer to that cluster’s centroid than any other centroid. K-Means finds the best centroids by alternating between (1) assigning data points to clusters based on the current centroids (2) chosing centroids (points which are the center of a cluster) based on the current assignment of data points to clusters. — Chris Piech[1]

The KMeans algorithm will categorize the items into k groups of similarity. To calculate that similarity, we will use the euclidean distance as measurement.
The K-Means algorithm works as follows:
- First we initialize k points, called means, randomly.
- We categorize each item to its closest mean and we update the mean’s coordinates, which are the averages of the items categorized in that mean so far.
- We repeat the process for a given number of iterations and at the end, we have our clusters.
The “points” mentioned above are called means, because they hold the mean values of the items categorized in it. -geeksforgeeks[2]
If you prefer not to read this article and would like a video representation of it, you can check out the YouTube video below. It goes through everything in this article with a little more detail, and will help make it easy for you to start programming your own Machine Learning model in Python. Or you can use both (this article and video) as supplementary materials for learning about Machine Learning !