# K-means clustering

ARTH — Task 42

Task Description 📄

📌 Create a blog/article/video explaining k-means clustering and its real use case in the security domain

Let’s start with the task:

**First, what is meant by k-means clustering?**

K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms. The algorithm **identifies k centroids** and then allocates every data point to the cluster with the nearest centroid, while keeping each cluster as compact as possible, that is, keeping the distances between the points and their own centroid small.
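Compactness is usually measured as the within-cluster sum of squared distances (WCSS): for every point, take the squared distance to the centroid of its cluster and add them all up. The sketch below is only an illustration on made-up 2-D points; the helper names `centroid` and `wcss` are hypothetical and not taken from any library.

```python
def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def wcss(clusters):
    """Sum of squared distances from each point to its own cluster centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)
    return total

# Toy data (made up): two visually obvious groups.
a, b = (1.0, 1.0), (1.0, 2.0)
c, d = (8.0, 8.0), (9.0, 8.0)

good_split = [[a, b], [c, d]]  # keeps nearby points together
bad_split = [[a, c], [b, d]]   # mixes the two groups

print(wcss(good_split))  # small value: compact clusters
print(wcss(bad_split))   # much larger value: spread-out clusters
```

K-means searches for the assignment that makes this number small, which is why the first grouping is the one it would prefer.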

# How does the K-Means Algorithm Work?

The K-Means algorithm works in the following steps:

Step-1: Choose the number K of clusters.

Step-2: Select K random points as the initial centroids. (They do not have to be points from the input dataset.)

Step-3: Assign each data point to its closest centroid, which forms the K clusters.

Step-4: Recompute the centroid of each cluster as the mean of its points (the position that minimizes the variance within the cluster).

Step-5: Repeat the third step, that is, reassign each data point to the new closest centroid.

Step-6: If any reassignment occurred, go back to Step-4; otherwise, go to FINISH.

Step-7: The model is ready.
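The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: it assumes the points are numeric tuples, picks the initial centroids from the dataset itself, and uses squared Euclidean distance. The names `k_means`, `squared_distance` and `mean_point` are hypothetical, and the sample points are made up.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step-2: pick K distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step-3 / Step-5: assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step-6: stop as soon as no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step-4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = mean_point(members)
    return centroids, labels

# Made-up sample: three points near (1, 1.5) and three near (8.5, 8.5).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

Which cluster gets index 0 depends on the random start, so only the grouping itself is meaningful, not the label numbers.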

# K-means Clustering Use Case in Security

Cyber profiling is the process of collecting data from individuals and groups to identify significant correlations. The idea of cyber profiling is derived from criminal profiles, which give the investigation division information for classifying the types of criminals who were at the crime scene. An interesting white paper describes how to cyber-profile users in an academic environment based on user data preferences.

With the K-means clustering algorithm, the data can be grouped by the number of websites visited. This grouping aims to show which websites users access most frequently.

Cluster analysis is one of the most useful methods for acquiring knowledge from data. It is used to find clusters, which are a fundamental and important pattern in the distribution of the data itself.
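As a rough illustration of this kind of grouping, the sketch below clusters users by a single number: how many websites each one visited. Everything in it is assumed for the example. The user IDs and counts are invented, the helper `cluster_1d` is hypothetical, and it needs `k >= 2` because of the way it spreads the starting centroids over the sorted values.

```python
# Hypothetical data: number of distinct websites each user visited in a week.
visits = {
    "u01": 3, "u02": 5, "u03": 4,
    "u04": 48, "u05": 52, "u06": 50,
    "u07": 210, "u08": 190,
}

def cluster_1d(values, k, max_iter=100):
    """K-means on plain numbers; assumes k >= 2."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids over the sorted values.
    centroids = [float(ordered[i * (len(ordered) - 1) // (k - 1)]) for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assign each value to the nearest centroid.
        new_labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        if new_labels == labels:
            break  # no reassignment, so the clustering is stable
        labels = new_labels
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return centroids, labels

centroids, labels = cluster_1d(list(visits.values()), k=3)

# Collect the user IDs that fall into each cluster.
profiles = {}
for user, lab in zip(visits, labels):
    profiles.setdefault(lab, []).append(user)

for lab, users in sorted(profiles.items()):
    print(f"cluster {lab} (about {centroids[lab]:.0f} sites): {users}")
```

On this toy data the users separate into light, moderate and heavy browsing groups; a real profile would use richer features, such as visits per website category.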

# Advantages of K-means

- It is very simple to implement.
- It scales to huge datasets and runs quickly on them.
- It adapts easily to new examples.
- It can be generalized to clusters of different shapes and sizes, such as elliptical clusters, although plain K-means works best on roughly round clusters of similar size.

# Disadvantages of K-means

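- It is sensitive to outliers, because a single extreme point can pull a centroid away from the rest of its cluster.
- The value of k has to be chosen manually, which is a tough job.
- As the number of dimensions increases, it works less well, because distances between points become less informative.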
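The sensitivity to outliers comes directly from the centroid being a mean. The small sketch below, using invented numbers, compares the mean and the median of a one-dimensional cluster before and after a single extreme value is added.

```python
from statistics import mean, median

# Made-up one-dimensional cluster, then the same cluster plus one outlier.
cluster = [10, 11, 12, 13, 14]
with_outlier = cluster + [100]

print(mean(cluster), median(cluster))            # both sit in the middle of the data
print(mean(with_outlier), median(with_outlier))  # the mean jumps, the median barely moves
```

This is why median-based variants such as k-medians are sometimes preferred when the data is known to contain extreme values.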