Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Education and Culture Executive Agency (EACEA). Neither the European Union nor EACEA can be held responsible for them.

Applets

k-means Method

K-means is a widely used clustering method that partitions a data set into distinct groups called clusters. The algorithm assigns each element of the data set to a cluster so that elements within the same cluster are more similar to each other than to elements in other clusters.

The k-means process can be summarized as follows:
  1. Initialization: The algorithm begins by randomly selecting K centroids, where K, the number of clusters to be formed, is supplied by the user.
  2. Point Assignment: Each data element is assigned to the cluster whose centroid is closest in terms of Euclidean distance.
  3. Centroid Update: The cluster centroids are recalculated based on the elements assigned to them.
  4. Repetition: Steps 2 and 3 are repeated iteratively until convergence is achieved, that is, until there are no significant changes in the centroids or assigned elements.
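
The four steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the applet's actual implementation. The function name kmeans, the parameters max_iter, tol and seed, and the sample points are all assumptions made for the example. Initial centroids are drawn from the data points themselves, which is one common way to do the random initialization.

```python
import math
import random


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Plain k-means on a list of equal-length numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)

    for _ in range(max_iter):
        # Step 2: assign each point to the nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]

        # Step 3: recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])

        # Step 4: stop once no centroid moves by more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break

    return centroids, labels


# Hypothetical sample: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

Because the starting centroids are random, the cluster numbering can differ between runs with different seeds, even when the grouping itself is the same. Requesting more clusters than there are points makes random.sample raise a ValueError.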

In the IPB application, users can supply their own data by loading a data set in .xlsx or .csv format. Once the data is loaded, 2 or 3 features can be selected to generate the clusters.

Note that the applet accepts both numerical and categorical features, but the k-means algorithm supports only numerical variables for creating clusters. If the user has no data set available, an option called "example" generates a default data set that can be used to explore the applet's functionality.
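
To illustrate why only numerical features are usable, the sketch below reads a small .csv sample with Python's standard csv module and keeps only the columns whose values all parse as numbers. The helper numeric_columns, the column names, and the sample values are hypothetical. How the applet itself filters features is not described here, so this is only one plausible approach.

```python
import csv
import io


def numeric_columns(rows):
    """Return the column names whose values all parse as floats."""
    names = list(rows[0].keys()) if rows else []
    numeric = []
    for name in names:
        try:
            for row in rows:
                float(row[name])
        except (TypeError, ValueError):
            # A missing or non-numeric value disqualifies the column.
            continue
        numeric.append(name)
    return numeric


# Hypothetical sample data standing in for a user-supplied .csv file.
sample_csv = io.StringIO(
    "height,weight,species\n"
    "1.2,30.5,cat\n"
    "0.9,22.0,cat\n"
    "2.1,80.3,dog\n"
)
rows = list(csv.DictReader(sample_csv))

usable = numeric_columns(rows)   # the categorical "species" column is excluded
selected = usable[:2]            # choose 2 (or 3) features for clustering
data = [tuple(float(row[name]) for name in selected) for row in rows]
print(usable)
print(data)
```

A categorical column such as species would have to be encoded numerically before it could take part in a Euclidean distance calculation, which is why it is simply left out here.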

After the data set is read, the user must set the input parameter K and the desired number of iterations of the k-means algorithm. When the run finishes, the applet displays a plot in two or three dimensions, depending on the number of features selected, showing the generated clusters and the centroids together with their coordinates.
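
As a text-only stand-in for the plot, the following sketch shows how the centroid coordinates and cluster sizes could be derived from a finished run, for either 2 or 3 selected features. The function cluster_report and the sample points and labels are assumptions made for illustration. They are not taken from the applet.

```python
from collections import defaultdict


def cluster_report(points, labels):
    """Map each cluster label to its size and centroid coordinates."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)

    report = {}
    for label, members in sorted(groups.items()):
        dims = len(members[0])
        if dims not in (2, 3):
            raise ValueError("expected 2 or 3 selected features")
        # The centroid is the coordinate-wise mean of the cluster's points.
        centroid = tuple(sum(axis) / len(members) for axis in zip(*members))
        report[label] = {"size": len(members), "centroid": centroid}
    return report


# Hypothetical 3-feature points and labels, as a finished run might produce.
points = [(1.0, 2.0, 3.0), (3.0, 4.0, 5.0), (10.0, 10.0, 10.0)]
labels = [0, 0, 1]

report = cluster_report(points, labels)
for label, info in report.items():
    coords = ", ".join(f"{c:.2f}" for c in info["centroid"])
    print(f"cluster {label}: {info['size']} points, centroid ({coords})")
```

The same function works unchanged for two features, since the centroid is computed coordinate by coordinate.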

Scientific Area:
Learning

Language/Environments:
Python

Target Group:
Basic

Keywords:
Unsupervised, Clustering, Association, Dataset, k-means
