I am currently reading the Machine Learning book by Tom Mitchell. When talking about neural networks, Mitchell states:

"Although the perceptron rule finds a successful weight vector when the training examples are linearly separable, it can fail to converge if the examples are not linearly separable. "

I am having problems understanding what he means with "linearly separable"? Wikipedia tells me that "two sets of points in a two-dimensional space are linearly separable if they can be completely separated by a single line."

But how does this apply to the training set for neural networks? How can inputs (or action units) be linearly separable or not?

I'm not the best at geometry and maths - could anybody explain it to me as though I were 5? ;) Thanks!