Unsupervised Learning

Sri Dedeepya Nandamuri
3 min read · Nov 29, 2019

Let’s look at two techniques frequently used for unsupervised learning: K-Means clustering and dimensionality reduction.

K-Means Clustering:

In this algorithm, the input data points are grouped into clusters. It is an iterative process that is repeated until every point is correctly mapped to its cluster.

K-Means Clustering

In the diagram above, we start by randomly picking 5 points as the initial centroids, one per cluster. We then check the distance from every data point to each centroid, and each point is assigned to the group of its nearest centroid.

If the groups are not yet correct, we calculate a new middle point (or centroid) for each group as the mean of its assigned points, then check the distances again and reassign every point to its nearest centroid.

If the grouping still does not fit well, we continue this process until the assignments stop changing and the clusters are correctly mapped.
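The loop described above can be sketched in a few lines of Python. This is an illustrative pure-Python implementation, not the author’s code; the sample points and the choice of k = 2 are made up for the demo.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-Means sketch (illustrative, not optimized): assign every
    point to its nearest centroid, move each centroid to the mean of its
    group, and repeat until nothing changes."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)                  # random initial centroids
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # index of the nearest centroid (squared Euclidean distance)
            nearest = min(range(k),
                          key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[j])))
            groups[nearest].append(p)
        # recompute each centroid as the mean (middle point) of its group
        new = [tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centroids[j]
               for j, g in enumerate(groups)]
        if new == centroids:                           # assignments are stable
            break
        centroids = new
    return centroids, groups

# Two well-separated blobs of 2-D points (hypothetical data):
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, groups = kmeans(points, k=2)
```

For this toy data the loop separates the low and high blobs into two groups regardless of which points are sampled as the initial centroids.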

Dimensionality Reduction:

In this technique, we remove the unnecessary columns (features) from the data.

There are two types of Dimensionality Reduction: Feature Selection and Feature Extraction.

Let’s discuss Feature Selection first:

Different techniques are taken into consideration for Feature Selection:

1.) Missing Value Ratio:

  • For every column, we calculate the percentage of missing values.
  • We keep the threshold limit as “10” (percent).
  • If a column’s missing-value ratio is greater than the threshold, we remove the column.
  • If it is lesser than the threshold, we keep the column.
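As a sketch, the missing value ratio filter might look like this in plain Python; the dataset and column names here are hypothetical.

```python
# Hypothetical toy dataset: each column is a list that may contain None.
data = {
    "age":    [25, None, 31, 40, None, None, None, None, 22, None],  # 60% missing
    "salary": [50, 60, None, 80, 75, 90, 65, 70, 85, 95],            # 10% missing
}

threshold = 10  # percent of missing values a column may have, as in the text

kept = {}
for name, col in data.items():
    missing_pct = 100 * sum(v is None for v in col) / len(col)
    if missing_pct <= threshold:   # low missing ratio: keep the column
        kept[name] = col
# "age" is removed, "salary" survives
```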

2.)Low Variance Filter:

  • The variance of a column should always be greater than a chosen threshold.
  • If a column’s variance is less than the threshold (say 10), we remove that column, since a nearly constant column carries little information.
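A minimal sketch of the low variance filter using the standard-library `statistics` module; the columns and the threshold of 10 are made up for the demo.

```python
from statistics import pvariance  # population variance

# Hypothetical dataset: one near-constant column, one with real spread.
data = {
    "constant_ish": [5.0, 5.1, 5.0, 4.9, 5.0],
    "spread":       [1.0, 20.0, 3.0, 15.0, 9.0],
}
threshold = 10  # variance threshold from the text

# Keep only the columns whose variance exceeds the threshold.
kept = {name: col for name, col in data.items() if pvariance(col) > threshold}
# "constant_ish" is removed, "spread" survives
```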

3.)High correlation filter:

  • We calculate the correlation between each pair of input columns.
  • Pairs of columns with a correlation coefficient higher than a threshold are reduced to only one, since two highly correlated columns carry nearly the same information.
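A sketch of the high correlation filter in plain Python, with Pearson correlation computed by hand; the columns are hypothetical ("height_cm" and "height_in" are deliberately the same measurement in two units).

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

data = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59, 63, 67, 71, 75],   # same information as height_cm
    "weight":    [55, 80, 62, 90, 70],
}
threshold = 0.9

cols = list(data)
drop = set()
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        if b not in drop and abs(pearson(data[a], data[b])) > threshold:
            drop.add(b)          # keep the first column of the pair, drop the other
kept = [c for c in cols if c not in drop]
# "height_in" is dropped; "height_cm" and "weight" survive
```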

4.)Random Forest:

  • We calculate the importance of each column, for example its information gain with respect to the output.
  • If a column’s importance is greater than a chosen threshold, we keep the column.
  • If it is lesser than the threshold, we remove the column.
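Information gain, the quantity tree-based methods like random forests split on, can be computed directly. A small sketch with a made-up toy dataset: one feature predicts the label perfectly, the other not at all.

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after
    splitting on each distinct feature value."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(feature):
        subset = [lab for f, lab in zip(feature, labels) if f == value]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Hypothetical toy data: "weather" predicts the label, "parity" does not.
labels  = ["yes", "yes", "no", "no"]
weather = ["sun", "sun", "rain", "rain"]   # perfectly predictive
parity  = ["odd", "even", "odd", "even"]   # uninformative

ig_weather = information_gain(weather, labels)  # 1.0 bit
ig_parity  = information_gain(parity, labels)   # 0.0 bits
```

Note that information gain is bounded above by the entropy of the labels, so in practice the threshold for keeping a column is a fraction of that entropy rather than a fixed number.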

5.) Backward Feature Elimination:

  • In this, we start with all the columns and remove them one at a time, checking the accuracy of the model after each removal; as soon as the accuracy starts decreasing, we stop.
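A minimal sketch of backward elimination as a greedy loop. The `score` function here is a made-up stand-in for model accuracy (it rewards the hypothetical columns "age" and "income" and slightly penalizes every extra column); in practice you would retrain a model at each step.

```python
def backward_elimination(columns, score):
    """Greedy backward elimination: start with all columns, repeatedly drop
    the column whose removal improves the score most, and stop as soon as
    the score (model accuracy) would decrease."""
    selected = list(columns)
    best = score(selected)
    while len(selected) > 1:
        s, col = max((score([x for x in selected if x != c]), c) for c in selected)
        if s < best:
            break                      # accuracy is decreasing, so we stop
        selected.remove(col)
        best = s
    return selected

# Hypothetical stand-in for model accuracy: only "age" and "income" help,
# and every extra column costs a little.
useful = {"age", "income"}
score = lambda cols: len(useful & set(cols)) - 0.1 * len(cols)

backward_elimination(["age", "zip", "income", "id"], score)  # ['age', 'income']
```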

6.) Forward Feature Selection:

  • In this, we start with a small set and add columns from the existing data one at a time, checking the accuracy after each addition; as long as the accuracy keeps increasing, we do not stop.
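The forward direction is the mirror image of the backward one. Again, `score` is a hypothetical stand-in for model accuracy, and the column names are invented for the demo.

```python
def forward_selection(columns, score):
    """Greedy forward selection: start with no columns, repeatedly add the
    column that improves the score most, and keep going as long as the
    score (model accuracy) is still increasing."""
    selected, best = [], float("-inf")
    while True:
        candidates = [c for c in columns if c not in selected]
        if not candidates:
            break
        s, col = max((score(selected + [c]), c) for c in candidates)
        if s <= best:
            break                      # accuracy stopped increasing
        selected.append(col)
        best = s
    return selected

# Hypothetical stand-in for model accuracy, as before: only "age" and
# "income" actually help.
useful = {"age", "income"}
score = lambda cols: len(useful & set(cols)) - 0.1 * len(cols)

forward_selection(["age", "zip", "income", "id"], score)  # ['income', 'age']
```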

For the Feature Extraction side of dimensionality reduction,

we use Principal Component Analysis (PCA) as the method.

Let’s see what it does:

Principal Component Analysis:

Principal component analysis (PCA) is an important technique to understand in the fields of statistics and data.

PCA finds a new set of dimensions (or a set of basis vectors) such that all the dimensions are orthogonal (and hence linearly independent) and ranked according to the variance of the data along them. The direction with the highest variance is the first principal component, i.e., the most important one.
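A small numerical sketch of PCA via the eigendecomposition of the covariance matrix, assuming NumPy is available; the 2-D sample data is made up for the demo.

```python
import numpy as np

# Hypothetical 2-D data that varies mostly along one diagonal direction.
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

Xc = X - X.mean(axis=0)                  # center the data
cov = np.cov(Xc, rowvar=False)           # covariance matrix of the columns
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # rank directions by variance
components = eigvecs[:, order]           # orthogonal principal directions
projected = Xc @ components[:, :1]       # keep only the top component
```

Projecting onto the first column of `components` keeps the direction of maximum variance, reducing the data from two dimensions to one while losing as little information as possible.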
