I am having trouble fully understanding the k-means++ algorithm. I am interested in exactly how the initial k centroids are picked (the rest works as in the original k-means).

Is the probability function used based on distance, or is it Gaussian?

At the same time, as I understand it, the point farthest from the centroids chosen so far is picked as the new centroid.

I would appreciate a step-by-step explanation and an example. The one on Wikipedia is not clear enough. Well-commented source code would also help. If you use 6 arrays, please say what each one is for.
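
To make the question concrete, here is a minimal Python sketch of the seeding step as it is usually described: the first centroid is drawn uniformly at random, and each later centroid is sampled with probability proportional to its squared distance from the nearest centroid chosen so far (so far-away points are likely, but the farthest one is not guaranteed to be picked). The names `kmeanspp_seeds` and `sq_dist` and the single helper list `d2` are my own choices for illustration, not taken from any library, so please treat this as a sketch and correct it if the logic is wrong.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points (sequences of numbers)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeds(points, k, rng=None):
    """Pick k initial centroids from `points` (hypothetical helper, illustration only)."""
    rng = rng or random.Random()

    # Step 1: the first centroid is chosen uniformly at random.
    centroids = [rng.choice(points)]

    # d2[i] = squared distance from points[i] to its nearest chosen centroid.
    d2 = [sq_dist(p, centroids[0]) for p in points]

    while len(centroids) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen centroid; fall back to uniform.
            nxt = rng.choice(points)
        else:
            # Step 2: sample index i with probability d2[i] / sum(d2).
            # A point that is already a centroid has weight 0 and is never drawn.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
            nxt = points[idx]
        centroids.append(nxt)

        # Step 3: refresh the nearest-centroid distances with the new centroid.
        d2 = [min(old, sq_dist(p, nxt)) for old, p in zip(d2, points)]

    return centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeanspp_seeds(data, 2, random.Random(0)))
```

Only one working array (`d2`) is needed besides the input points and the output list. After seeding, the ordinary k-means iterations would run starting from the returned centroids.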