I am reading the first chapter of Tom Mitchell's Machine Learning book.
I want to write a program that plays checkers against itself and eventually learns to win. My question is about credit assignment for the non-terminal board positions it encounters. Suppose the value of a board is a linear combination of its features, with randomly initialized weights. How do I update those weights with the LMS rule, given that the only positions with known training values are the final (end-of-game) states?
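To make the setup concrete, here is a minimal sketch of what I have in mind. The function names (`evaluate`, `lms_update`), the learning rate, the feature values, and the score of +100 for a won game are my own assumptions for illustration. It covers only the case I understand, a final board whose training value is known; what `v_train` should be for a non-terminal board is exactly the part I am asking about.

```python
import random


def evaluate(weights, features):
    """Linear value estimate: w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def lms_update(weights, features, v_train, lr=0.01):
    """One LMS step: move each weight in proportion to the error and its feature."""
    error = v_train - evaluate(weights, features)
    # The bias weight w0 behaves as if its feature were always 1.
    updated = [weights[0] + lr * error]
    updated += [w + lr * error * x for w, x in zip(weights[1:], features)]
    return updated


random.seed(0)
n_features = 6  # assumed number of board features
weights = [random.uniform(-1.0, 1.0) for _ in range(n_features + 1)]

# Hypothetical feature vector of a final, won board (values are made up).
final_features = [3, 0, 1, 0, 0, 0]
v_train_final = 100.0  # assumed training value for a win

error_before = abs(v_train_final - evaluate(weights, final_features))
weights = lms_update(weights, final_features, v_train_final)
error_after = abs(v_train_final - evaluate(weights, final_features))
print(error_before, error_after)  # the error on this board shrinks after one step
```

With this, I can train on the last position of each game, but I do not see what to pass as `v_train` for the positions that came before it.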
I am not sure whether I have stated my question clearly, although I tried to.