This example demonstrates how to use the KMeans Clustering Agent to group similar events into clusters.
This example uses the Iris dataset, which contains four features (the length and width of the sepals and petals) for 50 samples from each of three Iris species (Iris setosa, Iris virginica and Iris versicolor), 150 samples in total. The objective is to segment the unlabeled Iris data into three clusters (1, 2 and 3) using k-means clustering, and then predict the cluster value for incoming test data.
Drag the KMeans Clustering Agent onto the canvas. Link the input endpoint to the test data and the output endpoint to the printer. Save the Data Stream.
Select the Agent and click Configure. In this case, keep the default Collection.
Drag the training data file to the training property. The features grid is auto-populated from the training file data.
Tick the exclude checkbox on the Label feature, as it is not needed for training, and set the data type of all other features to float.
Set the number of clusters to three because the training data contains three species of flowers.
Keep the default deterministic seed to produce repeatable results. Apply the changes and save the Data Stream.
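To make these settings concrete, here is a minimal sketch of k-means in plain Python. It is not the Agent's implementation, only an illustration of what excluding the label, using float features, choosing three clusters and fixing a seed amount to. The function names, the seed value 42 and the nine-row sample (the first three rows of each species in the classic Iris dataset) are assumptions made for this sketch.

```python
import random

# First three rows of each species from the classic Iris dataset:
# (sepal length, sepal width, petal length, petal width, label).
ROWS = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.9, 3.1, 4.9, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
    (7.1, 3.0, 5.9, 2.1, "virginica"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, seed, max_iter=100):
    """Lloyd's algorithm; the seed fixes the initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(max_iter):
        assignments = [nearest(p, centroids) for p in points]
        new_centroids = []
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                # New centroid is the per-feature mean of the members.
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, [nearest(p, centroids) for p in points]


# Exclude the label column; the remaining four features are floats.
features = [row[:4] for row in ROWS]
centroids, clusters = kmeans(features, k=3, seed=42)
print(clusters)
```

Running the sketch twice with the same seed gives the same clusters, which is the point of keeping a deterministic seed. A different seed may produce different cluster numbers, or a different grouping altogether.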
Select the arrow entering the input endpoint and click Configure.
Map each test data attribute to the corresponding training data attribute.
Apply the changes, save the Data Stream and publish it.
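Conceptually, the attribute mapping configured on the input arrow renames each incoming test attribute to the training feature it corresponds to. The sketch below illustrates that idea only. The attribute names on both sides and the helper map_event are hypothetical, since the actual names depend on your test and training files.

```python
# Hypothetical mapping: incoming test attribute -> training feature name.
ATTRIBUTE_MAP = {
    "sl": "SepalLength",
    "sw": "SepalWidth",
    "pl": "PetalLength",
    "pw": "PetalWidth",
}


def map_event(event, attribute_map):
    """Rename test attributes to training feature names and cast to float."""
    missing = [src for src in attribute_map if src not in event]
    if missing:
        raise KeyError(f"event is missing attributes: {missing}")
    return {target: float(event[src]) for src, target in attribute_map.items()}


# An incoming event with string values, as might be read from a file.
mapped = map_event({"sl": "5.1", "sw": "3.5", "pl": "1.4", "pw": "0.2"},
                   ATTRIBUTE_MAP)
print(mapped)
```

Every training feature that is not excluded needs a mapped test attribute. The sketch raises an error when one is missing rather than guessing a value.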
Let's look at the Live Data View. Observe the Predicted Label column of the printed events. Each incoming test event has a cluster value appended, based on its four features (length and width of sepals and petals).
The assigned cluster can also be checked by comparing the Label and Predicted Label columns: events that share a Label should mostly share a Predicted Label. Cluster numbers themselves are arbitrary, so they need not equal the label values.
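One way to make this comparison systematic is to count how often each Label occurs with each Predicted Label. The sketch below assumes the printed events have been exported as dictionaries with the Label and Predicted Label columns. The six events and the helper names are made up for illustration and are not output from the Agent.

```python
from collections import Counter

# Hypothetical printed events; the values are illustrative only.
events = [
    {"Label": "Iris-setosa", "Predicted Label": 2},
    {"Label": "Iris-setosa", "Predicted Label": 2},
    {"Label": "Iris-versicolor", "Predicted Label": 1},
    {"Label": "Iris-versicolor", "Predicted Label": 3},
    {"Label": "Iris-virginica", "Predicted Label": 3},
    {"Label": "Iris-virginica", "Predicted Label": 3},
]


def cross_tabulate(events):
    """Count occurrences of each (Label, Predicted Label) pair."""
    return Counter((e["Label"], e["Predicted Label"]) for e in events)


def dominant_species(table):
    """For each cluster, return the label that occurs most often in it."""
    per_cluster = {}
    for (label, cluster), count in table.items():
        best = per_cluster.get(cluster)
        if best is None or count > best[1]:
            per_cluster[cluster] = (label, count)
    return {cluster: label for cluster, (label, _) in per_cluster.items()}


table = cross_tabulate(events)
mapping = dominant_species(table)
print(mapping)
```

In this made-up sample one versicolor event lands in the cluster dominated by virginica. That kind of overlap is plausible in practice, because versicolor and virginica measurements are closer to each other than either is to setosa.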