I am reading about the difference between k-means clustering and k-medoids clustering.

Supposedly there is an advantage to the objective used in the k-medoids algorithm, which sums the unsquared pairwise dissimilarities between each point and its cluster medoid, rather than the more familiar sum of squared Euclidean distances (essentially the within-cluster variance) minimized by k-means. And apparently this choice of objective somehow makes the clustering less sensitive to noise and outliers.
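
To make the question precise, this is my understanding of the two objectives (the notation here is mine: $C_k$ is the $k$-th cluster, $\mu_k$ its mean, $m_k$ its medoid, which must be one of the actual data points, and $d$ an arbitrary dissimilarity, often unsquared Euclidean or Manhattan distance):

$$\text{k-means:}\qquad \min \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^{2}$$

$$\text{k-medoids:}\qquad \min \sum_{k=1}^{K} \sum_{x_i \in C_k} d(x_i, m_k)$$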

I have seen this claim made, but I have yet to see any good reasoning about the mathematics behind it.

What makes the pairwise dissimilarity measure commonly used in k-medoids better? More precisely, how does the absence of the squared term give k-medoids the desirable properties associated with taking a median?
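
To illustrate the kind of behaviour I have in mind, here is a toy one-dimensional sketch I put together (plain NumPy, nothing specific to any clustering library): a single outlier drags the mean, which minimizes the squared loss, a long way, while the median, which minimizes the absolute loss, barely moves.

```python
import numpy as np

# Toy 1-D sample with one extreme outlier.
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# The mean minimizes the sum of squared deviations,
# the median minimizes the sum of absolute deviations.
print("mean   :", x.mean())      # 22.0 -- dragged towards the outlier
print("median :", np.median(x))  # 3.0  -- essentially unaffected

# Check by brute force: evaluate both losses on a grid of candidate centres.
grid = np.linspace(0.0, 110.0, 100001)
sq_loss  = ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)
abs_loss = np.abs(x[:, None] - grid[None, :]).sum(axis=0)
print("argmin squared loss :", grid[sq_loss.argmin()])   # ~ the mean
print("argmin absolute loss:", grid[abs_loss.argmin()])  # ~ the median
```

My intuition is that the unsquared k-medoids objective inherits this behaviour from the median, but I would like to see the actual argument for why that is.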