Changelog: 12 Dec 2016
  
                                                                    	
                                                                      	
               Advantages & Disadvantages of k-Means and Hierarchical Clustering
               (Unsupervised Learning)

               Machine Learning for Language Technology
               ML4LT (2016)

               Marina Santini
               Department of Linguistics and Philology
               Uppsala University
                                            	
  
2016    Advantages & Disadvantages of k-Means and Hierarchical Clustering
  
Outline
•  k-Means: Advantages and Disadvantages
•  Hierarchical Clustering: Advantages and Disadvantages
  
  
k-Means: Advantages and Disadvantages
Advantages	
  
•  Easy to implement
  
•  With a large number of variables, k-Means may be computationally faster than hierarchical clustering (if K is small), since each iteration only compares every instance with the K centroids.
  
•  k-Means may produce tighter clusters than hierarchical clustering.
  
•  An instance can change cluster (move to another cluster) when the centroids are recomputed.
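A minimal sketch of the k-Means loop makes this last advantage concrete. It assumes Lloyd-style iterations: after each centroid update, every instance is reassigned to its nearest centroid, so it may switch cluster. The function names (`assign`, `update`, `kmeans`), the one-dimensional toy data and the seed choice are illustrative assumptions in plain Python 3.8+ (for `math.dist`), not a library API.

```python
import math

def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def update(points, labels, centroids):
    """Update step: move each centroid to the mean of the points assigned to it."""
    new = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        # An empty cluster keeps its old centroid (one simple convention among several).
        new.append(tuple(sum(xs) / len(members) for xs in zip(*members)) if members else old)
    return new

def kmeans(points, seeds, max_iter=100):
    """Lloyd-style k-Means; returns final labels, centroids and the label history."""
    centroids = list(seeds)
    labels = assign(points, centroids)
    history = [labels]
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:  # no instance changed cluster: converged
            break
        labels = new_labels
        history.append(labels)
    return labels, centroids, history

# Hypothetical toy data: six one-dimensional instances, seeds at 0 and 1.
points = [(0,), (1,), (2,), (10,), (11,), (12,)]
labels, centroids, history = kmeans(points, seeds=[(0,), (1,)])
print(history)    # [[0, 1, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1]]
print(centroids)  # [(1.0,), (11.0,)]
```

With these seeds, instances 1 and 2 start in the second cluster and move to the first once the centroids have been recomputed.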
  	
  
Disadvantages
  
•  Difficult to predict the number of clusters (the K value), which must be fixed in advance.
  
•  Initial seeds have a strong impact on the final results.
  
•  The order of the data has an impact on the final results.
  
•  Sensitive to scale: rescaling your datasets (normalization or standardization) can completely change the results. This is not bad in itself, but failing to realize that you need to pay extra attention to scaling your data can be.
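The sketch below, again a minimal pure-Python k-Means under illustrative assumptions (hypothetical `kmeans` and `sse` helpers, a made-up four-point dataset), shows two of these disadvantages on the same data. Different initial seeds converge to different partitions with very different within-cluster error. Dividing one feature by 10, as a change of units would, changes the partition found from the same seeds.

```python
import math

def kmeans(points, seeds, max_iter=100):
    """Lloyd-style k-Means started from the given seed centroids."""
    centroids = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids

def sse(points, labels, centroids):
    """Within-cluster sum of squared distances (lower is better)."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))

points = [(0, 0), (0, 1), (3, 0), (3, 1)]

# Same data, two different seed choices.
labels_a, cents_a = kmeans(points, seeds=[points[0], points[3]])
labels_b, cents_b = kmeans(points, seeds=[points[0], points[1]])
print(labels_a, round(sse(points, labels_a, cents_a), 6))  # [0, 0, 1, 1] 1.0
print(labels_b, round(sse(points, labels_b, cents_b), 6))  # [0, 1, 0, 1] 9.0

# Same seeds as run A, but the first feature is divided by 10 (a change of units).
scaled = [(x / 10, y) for x, y in points]
labels_s, _ = kmeans(scaled, seeds=[scaled[0], scaled[3]])
print(labels_s)  # [0, 1, 0, 1]
```

Run A splits the points along the first feature, while run B stays in a worse local optimum that splits along the second. After rescaling, the second feature dominates the distances and the same seeds give the other partition.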
  	
  
  
Hierarchical Clustering: Advantages and Disadvantages
Advantages
  
•  Hierarchical clustering outputs a hierarchy, i.e. a structure that is more informative than the unstructured set of flat clusters returned by k-Means. Therefore, it is easier to decide on the number of clusters by looking at the dendrogram (see the suggestion on how to cut a dendrogram in lab8).
  
•  Easy to implement
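To illustrate how the hierarchy helps choose the number of clusters, here is a naive agglomerative (bottom-up) sketch with single linkage that records the height of every merge, which is the information a dendrogram draws. Cutting halfway across the largest jump between successive merge heights is only one common heuristic and is assumed here; it is not necessarily the suggestion given in lab8. All function names and the toy points are hypothetical.

```python
import math
from itertools import combinations

def single_link(a, b):
    """Single linkage: distance between the closest members of two clusters."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerate(points, linkage=single_link):
    """Naive bottom-up clustering; returns the merges as (height, cluster_a, cluster_b)."""
    def distance(a, b):
        return linkage([points[i] for i in a], [points[i] for i in b])

    clusters = [(i,) for i in range(len(points))]  # each cluster is a tuple of point indices
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: distance(*pair))
        merges.append((distance(a, b), a, b))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return merges

def cut(merges, n_points, threshold):
    """Flat clusters obtained by replaying only the merges at or below `threshold`."""
    clusters = {(i,) for i in range(n_points)}
    for height, a, b in merges:
        if height > threshold:
            break  # single-linkage heights never decrease, so later merges are higher still
        clusters -= {a, b}
        clusters.add(a + b)
    return sorted(sorted(c) for c in clusters)

def largest_gap_threshold(merges):
    """Heuristic: cut halfway across the largest jump between successive merge heights."""
    heights = [h for h, _, _ in merges]
    _, i = max((heights[k + 1] - heights[k], k) for k in range(len(heights) - 1))
    return (heights[i] + heights[i + 1]) / 2

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
merges = agglomerate(points)
print([round(h, 2) for h, _, _ in merges])  # [1.0, 1.0, 1.0, 1.0, 13.45]
print(cut(merges, len(points), largest_gap_threshold(merges)))  # [[0, 1, 2], [3, 4, 5]]
```

The four merges at height 1.0 build the two groups, and the jump to about 13.45 before the final merge suggests two clusters.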
  
Disadvantages
  
•  It is not possible to undo the previous step: once instances have been assigned to a cluster, they can no longer be moved around.
  
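•  Time complexity: standard agglomerative clustering needs at least quadratic time in the number of instances (all pairwise distances are involved), so it is not suitable for large datasets.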
  
•  Initial seeds have a strong impact on the final results.
  
•  The order of the data has an impact on the final results.
  
•  Very sensitive to outliers.
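A second naive sketch (complete linkage this time, an assumed choice; the helper `agglomerate_to_k` and the data are hypothetical) illustrates some of these points. Merges are greedy and final, and a single far-away point takes one of the k clusters for itself, which forces the two real groups to merge. It also shows where the cost comes from: each of the n - 1 merges rescans every pair of current clusters, so even counting only those pair scans this naive version is cubic in n.

```python
import math
from itertools import combinations

def complete_link(a, b):
    """Complete linkage: distance between the farthest members of two clusters."""
    return max(math.dist(p, q) for p in a for q in b)

def agglomerate_to_k(points, k, linkage=complete_link):
    """Merge the closest pair of clusters until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Every pair of current clusters is rescanned at every step.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        # Greedy and final: a merged cluster is never split again.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because combinations() yields i < j
    return clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(sorted(c) for c in agglomerate_to_k(points, 2)))
# [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)]]
print(sorted(sorted(c) for c in agglomerate_to_k(points + [(100, 100)], 2)))
# [[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], [(100, 100)]]
```

Adding one outlier at (100, 100) is enough to change the two-cluster answer completely.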
  
  
The end