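**In unsupervised learning**, k-means++ chooses the initial cluster centers at random from the set of input data. The probability of choosing a vector x is proportional to its squared distance from the nearest center already chosen, so points far from the existing centers are the most likely to be picked.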

**Code for k-means++:**

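    import numpy as np

    def initialize(X, K):
        # Start from the first point (k-means++ proper picks this one uniformly at random).
        C = [X[0]]
        for k in range(1, K):
            # Squared distance from each point to its nearest already-chosen center.
            D2 = np.array([min([np.inner(c - x, c - x) for c in C]) for x in X])
            # Normalize so the squared distances form a probability distribution.
            probs = D2 / D2.sum()
            cumprobs = probs.cumsum()
            r = np.random.rand()
            i = len(X) - 1  # fallback in case round-off leaves cumprobs[-1] below r
            # Find the first interval whose upper boundary exceeds r.
            for j, p in enumerate(cumprobs):
                if r < p:
                    i = j
                    break
            C.append(X[i])
        return C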

The output of the cumsum function gives the boundaries that partition the interval [0, 1] into consecutive subintervals. Each subinterval has a length equal to the probability of the corresponding point being chosen as a center. Since r is drawn uniformly from [0, 1), it falls into exactly one of these subintervals. The for loop checks which subinterval r is in by stopping at the first boundary that is greater than r.
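The same lookup can also be done with a binary search instead of a linear scan. Below is a minimal sketch using only the Python standard library; `pick_index` is a hypothetical helper name, not part of the code above, and the clamp on the last line is an added safeguard against floating-point round-off.

```python
from bisect import bisect_right
from itertools import accumulate

def pick_index(probs, r):
    """Return the index of the subinterval of [0, 1) that contains r."""
    cumprobs = list(accumulate(probs))
    # bisect_right returns the first j with r < cumprobs[j],
    # the same condition the for loop tests with `if r < p`.
    j = bisect_right(cumprobs, r)
    # Round-off can leave cumprobs[-1] slightly below 1, so clamp the result.
    return min(j, len(cumprobs) - 1)

print([pick_index([0.1, 0.2, 0.3, 0.4], r) for r in (0.05, 0.25, 0.5, 0.95)])
# prints [0, 1, 2, 3]
```

For a large number of points this needs O(log n) comparisons per draw rather than O(n), although computing the distances still dominates the cost of each initialization step.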

**For example:**

    probs = [0.1, 0.2, 0.3, 0.4]
    cumprobs = [0.1, 0.3, 0.6, 1.0]

    if r < cumprobs[0]:
        # this event has probability 0.1
        i = 0
    elif r < cumprobs[1]:
        # this event has probability 0.2
        i = 1
    elif r < cumprobs[2]:
        # this event has probability 0.3
        i = 2
    elif r < cumprobs[3]:
        # this event has probability 0.4
        i = 3
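One way to convince yourself that the chain above selects each index with the stated probability is to draw many values of r and count the outcomes. This is only a rough sketch: `draw_index` and `empirical_frequencies` are hypothetical helper names, and the seed and number of draws are arbitrary choices made for repeatability.

```python
import random
from collections import Counter

def draw_index(cumprobs, r):
    # Same logic as the if/elif chain: stop at the first boundary that exceeds r.
    for j, p in enumerate(cumprobs):
        if r < p:
            return j
    return len(cumprobs) - 1  # fallback if round-off keeps r >= cumprobs[-1]

def empirical_frequencies(cumprobs, n_draws=100_000, seed=0):
    rng = random.Random(seed)  # fixed seed, chosen arbitrarily
    counts = Counter(draw_index(cumprobs, rng.random()) for _ in range(n_draws))
    return [counts[j] / n_draws for j in range(len(cumprobs))]

freqs = empirical_frequencies([0.1, 0.3, 0.6, 1.0])
print(freqs)  # each entry should land close to 0.1, 0.2, 0.3, 0.4
```

The observed frequencies will not match the probabilities exactly, but with this many draws they should agree to about two decimal places.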

Hope this answer helps.