How exactly does k-means++ work?

Question

1 Answer

Anurag · Answer 1 · 2019-07-09T14:32:56+0000

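In k-means++, the initial cluster centers are chosen at random from the set of input data, where the probability of choosing a vector x as the next center is proportional to D(x)^2, the squared distance from x to the nearest center already chosen. Points that lie far from the existing centers are therefore more likely to be picked, which spreads the initial centers out.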
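As a small illustration of this distance-based weighting, here is a minimal sketch on made-up data. The three 1-D points and the choice of the first point as the existing center are assumptions for the example only.

```python
import numpy as np

# Hypothetical toy data: three 1-D points, with the first already chosen as a center.
X = np.array([[0.0], [1.0], [3.0]])
centers = [X[0]]

# Squared distance from each point to its nearest chosen center.
D2 = np.array([min(np.inner(c - x, c - x) for c in centers) for x in X])

# Normalize so the weights sum to 1 and can be used as probabilities.
probs = D2 / D2.sum()

print(D2)     # squared distances: 0, 1, 9
print(probs)  # selection probabilities: 0, 0.1, 0.9
```

The point at distance 3 is nine times as likely to be picked as the point at distance 1, and the point that is already a center cannot be picked again.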

Code for the k-means++ initialization:

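import numpy as np

def initialize(X, K):
    # X: array of shape (n_points, n_features); K: number of centers.
    # Standard k-means++ draws the first center uniformly at random;
    # this version simply takes the first point.
    C = [X[0]]
    for k in range(1, K):
        # Squared distance from each point to its nearest chosen center.
        D2 = np.array([min(np.inner(c - x, c - x) for c in C) for x in X])
        probs = D2 / D2.sum()
        cumprobs = probs.cumsum()
        r = np.random.rand()
        # Fallback in case rounding leaves cumprobs[-1] slightly below r.
        i = len(cumprobs) - 1
        # Find the first interval boundary that exceeds r.
        for j, p in enumerate(cumprobs):
            if r < p:
                i = j
                break
        C.append(X[i])
    return C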

The output of the cumsum function gives the boundaries that partition the interval [0, 1] into one sub-interval per data point. Each sub-interval has a length equal to the probability of the corresponding point being chosen as a center. Since r is drawn uniformly from [0, 1), it falls into exactly one of these intervals. The for loop finds which one by stopping at the first index j for which r < cumprobs[j].
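The interval lookup can also be done without an explicit loop. The sketch below compares a loop-based lookup with NumPy's `np.searchsorted` using `side="right"`, which returns the index of the first boundary strictly greater than r. The helper name `pick_index_loop` and the test values of r are made up for illustration.

```python
import numpy as np

def pick_index_loop(cumprobs, r):
    # Scan for the first j with r < cumprobs[j].
    for j, p in enumerate(cumprobs):
        if r < p:
            return j
    # Fallback if rounding leaves the last boundary at or below r.
    return len(cumprobs) - 1

cumprobs = np.array([0.1, 0.2, 0.3, 0.4]).cumsum()

for r in [0.05, 0.25, 0.45, 0.99]:
    j_loop = pick_index_loop(cumprobs, r)
    # Binary search over the sorted boundaries instead of a linear scan.
    j_fast = int(np.searchsorted(cumprobs, r, side="right"))
    print(r, j_loop, j_fast)
```

Both lookups agree on these values. The binary search is mainly useful when the number of data points is large.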

For example:

probs = [0.1, 0.2, 0.3, 0.4]
cumprobs = [0.1, 0.3, 0.6, 1.0]
if r < cumprobs[0]:
    # this event has probability 0.1
    i = 0
elif r < cumprobs[1]:
    # this event has probability 0.2
    i = 1
elif r < cumprobs[2]:
    # this event has probability 0.3
    i = 2
elif r < cumprobs[3]:
    # this event has probability 0.4
    i = 3
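One way to check that this scheme really selects index i with probability probs[i] is to repeat the draw many times and compare the observed frequencies with probs. The sketch below does that, and also shows the same weighted draw made in a single call with NumPy's `Generator.choice` and its `p` argument. The seed and the number of draws are arbitrary choices for the example.

```python
import numpy as np

probs = np.array([0.1, 0.2, 0.3, 0.4])
cumprobs = probs.cumsum()

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
n_draws = 100_000

# Manual method: draw uniform r values, then count how many boundaries
# are at or below each r. That count is the first index j with r < cumprobs[j].
r = rng.random(n_draws)
idx_manual = (r[:, None] >= cumprobs).sum(axis=1)
idx_manual = np.minimum(idx_manual, len(probs) - 1)  # guard against rounding in cumsum

# The same weighted draw in a single library call.
idx_choice = rng.choice(len(probs), size=n_draws, p=probs)

freq_manual = np.bincount(idx_manual, minlength=len(probs)) / n_draws
freq_choice = np.bincount(idx_choice, minlength=len(probs)) / n_draws
print(freq_manual)
print(freq_choice)
```

Both frequency vectors should come out close to [0.1, 0.2, 0.3, 0.4], up to sampling noise.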

Hope this answer helps.
