I have read a fair amount about Haar training, but I'm not clear on how many images one should use for the positive and negative sample sets. I see it recommended to use many images; some people recommend thousands. I'm also unclear about whether the number of positive and negative sample images should be the same.

Here is the best tutorial on Haartraining:

It says they used 5000 positive images and 3000 negative images.

This link says 3000 positive and 5000 negative images, so the two counts evidently don't have to be equal. In any case, using more images generally improves accuracy, but it also increases training time.
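If you want to check how many samples you actually have before training, here is a minimal Python sketch. It is not taken from any of the linked tutorials; it assumes your positive and negative images sit in two separate folders and that images can be recognised by file extension. The function names and the extension list are my own choices for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever your dataset uses.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".pgm"}


def count_images(folder):
    """Count image files (matched by extension) in folder and its subfolders."""
    folder = Path(folder)
    if not folder.is_dir():
        return 0
    return sum(
        1
        for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )


def sample_summary(pos_dir, neg_dir):
    """Return the positive/negative counts and their ratio (None if no negatives)."""
    pos = count_images(pos_dir)
    neg = count_images(neg_dir)
    ratio = pos / neg if neg else None
    return {"positives": pos, "negatives": neg, "pos_to_neg_ratio": ratio}
```

For example, `sample_summary("positives", "negatives")` (hypothetical folder names) returns a dict with both counts and the ratio, which you can compare against the 5000/3000 and 3000/5000 figures quoted above.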

You can also refer to this for training your own OpenCV Haar classifier:

