In reinforcement learning, what is the difference between policy iteration and value iteration?

As much as I understand, in value iteration, you use the Bellman equation to solve for the optimal policy, whereas, in policy iteration, you randomly select a policy π, and find the reward of that policy.

My doubt is that if you are selecting a random policy π in PI, how is it guaranteed to be the optimal policy, even if we are choosing several random policies.