
I am reading Tom Mitchell's Machine Learning book, the first chapter.

What I want to do is write a program that plays checkers against itself and eventually learns to win. My question is about credit assignment for the non-terminal board positions it encounters. Suppose we estimate the value of a position as a linear combination of its features with randomly initialized weights. How do we then update the weights with the LMS rule, given that the only training examples with known values are the end states?

I am not sure whether I have stated my question clearly, although I tried to.

by (108k points)

Let us assume that White wins. Then every position White passed through should receive positive credit, while every position Black passed through should receive negative credit. Generalizing this reasoning: for every completed game, add some amount of score to all states the winner visited and subtract some amount of score from all states the loser visited. You can do this over a large number of computer vs. computer games.
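As a rough sketch of that bookkeeping, assuming each game is recorded as a list of (position, player) pairs plus the winner: the function name `assign_credit`, the string position labels, and the fixed `reward` of 1.0 are all made up for illustration, and skipping draws is my own assumption.

```python
from collections import defaultdict

def assign_credit(games, reward=1.0):
    """Accumulate a score per position over many finished games.

    games: iterable of (moves, winner), where moves is a list of
    (position, player) pairs and winner is a player label or None for a draw.
    """
    scores = defaultdict(float)
    for moves, winner in games:
        if winner is None:
            continue  # assumption: drawn games assign no credit
        for position, player in moves:
            # Positions reached by the winner gain credit, the loser's lose it.
            scores[position] += reward if player == winner else -reward
    return dict(scores)

# Two toy games; real positions would be hashable board encodings.
games = [
    ([("p1", "white"), ("p2", "black"), ("p3", "white")], "white"),
    ([("p1", "white"), ("p4", "black")], "black"),
]
scores = assign_credit(games)
```

Here "p1" was visited by White in one game White won and one game White lost, so its credits cancel out.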

Now you have a data set made up of checkers positions and their respective scores. You can compute the features for those positions and fit your favorite regressor, for example a linear model trained with the LMS rule.
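A minimal sketch of that last step, assuming a linear evaluation function and the LMS update w_i ← w_i + η (target − prediction) x_i. The two features (own piece count, opponent piece count), the target scores, and the learning rate and epoch count are hypothetical values chosen only to show the mechanics.

```python
import random

def predict(weights, features):
    """Linear value estimate: the dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))

def lms_train(samples, n_features, eta=0.01, epochs=500, seed=0):
    """Fit weights with the LMS rule, one training example at a time."""
    rng = random.Random(seed)
    # Start from small random weights, as in the question.
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
    for _ in range(epochs):
        for features, target in samples:
            error = target - predict(weights, features)
            # Move each weight in proportion to the error and its feature value.
            weights = [w + eta * error * x for w, x in zip(weights, features)]
    return weights

# Hypothetical data: (own pieces, opponent pieces) -> score for the position.
samples = [
    ([3, 1], 1.0),
    ([1, 3], -1.0),
    ([2, 1], 0.5),
    ([1, 2], -0.5),
]
weights = lms_train(samples, n_features=2)
```

On this toy data the weights settle near 0.5 for own pieces and -0.5 for opponent pieces, since the targets were constructed from exactly that linear relationship. Real position scores will be noisy, so the fit will only be approximate.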
