This section illustrates how utilities are assigned to local states, using a classic problem called Tile World.
It is a two-dimensional grid world, in which we have an agent, tiles, obstacles, and holes.
The agent can move in four directions (up, down, left, right), and if it is next to a tile, it can push that tile in the direction of its move. The agent's goal is to fill every hole with a tile.
The state of the environment can be described using the following variables:
The agent's current position (a_x, a_y)
The current positions of the four tiles (t1_x, t1_y), (t2_x, t2_y), (t3_x, t3_y), (t4_x, t4_y)
Suppose that, in the current state, the agent pushes the tile beneath it down. The system then moves to the next state, in which every variable stays the same except the agent's position and the position of the tile being pushed.
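The transition described above can be sketched in Python. This is a minimal illustration, not a full Tile World simulator: the names State, MOVES, and step are hypothetical, the coordinate convention (y grows downward, so "down" is (0, +1)) is an assumption, and obstacles, grid boundaries, and blocked pushes are deliberately ignored.

```python
from dataclasses import dataclass
from typing import Tuple

Pos = Tuple[int, int]


@dataclass(frozen=True)
class State:
    """Environment state: the agent position plus the tile positions."""
    agent: Pos
    tiles: Tuple[Pos, ...]


# Assumed convention: y grows downward, so "down" adds 1 to y.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def step(state: State, action: str) -> State:
    """Move the agent one cell; push a tile if one occupies the target cell.

    Simplification: obstacles, grid bounds, and blocked pushes are not checked.
    """
    dx, dy = MOVES[action]
    ax, ay = state.agent
    target = (ax + dx, ay + dy)
    tiles = list(state.tiles)
    if target in tiles:
        # The pushed tile moves one cell further in the same direction.
        i = tiles.index(target)
        tiles[i] = (target[0] + dx, target[1] + dy)
    return State(agent=target, tiles=tuple(tiles))


# The agent at (2, 1) pushes the tile directly beneath it, at (2, 2), down.
s = State(agent=(2, 1), tiles=((2, 2), (0, 0), (4, 4), (5, 1)))
s_next = step(s, "down")
```

In s_next only the agent position and the first tile differ from s; the other three tile positions are unchanged, which matches the description of the transition.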
Our utility function can be defined as the fraction of holes that have been filled, i.e.,
u = (number of holes filled) / (total number of holes)
It's apparent that:
If the agent fills all holes, utility = 1
If the agent fills zero holes, utility = 0
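As a rough sketch, the state utility can be computed as follows. The function name utility and the rule that a hole counts as filled when a tile occupies its cell are assumptions made for illustration; the coordinates are made up.

```python
from typing import Iterable, Tuple

Pos = Tuple[int, int]


def utility(tiles: Iterable[Pos], holes: Iterable[Pos]) -> float:
    """Fraction of holes filled, in the range [0, 1].

    Assumption: a hole is filled when some tile sits on the same cell.
    """
    tile_cells = set(tiles)
    hole_cells = set(holes)
    if not hole_cells:
        raise ValueError("the world must contain at least one hole")
    filled = sum(1 for h in hole_cells if h in tile_cells)
    return filled / len(hole_cells)


holes = [(1, 1), (3, 3), (5, 5)]

u_none = utility([(0, 0), (2, 2), (4, 0)], holes)   # no hole filled
u_one = utility([(1, 1), (2, 2), (4, 0)], holes)    # one of three filled
u_all = utility([(1, 1), (3, 3), (5, 5)], holes)    # every hole filled
```

Here u_none is 0, u_one is 1/3, and u_all is 1, matching the two boundary cases listed above.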
Associating the utility function
Now, look at the two states below.
It's easy to see that:
Both states have the same utility value, 1/3 (because 1 out of 3 holes is filled)
The left state (s1) is a dead position, from which it is impossible to move all the tiles into holes
The right state (s2) is a good position, from which the remaining two tiles can still be moved into holes
So the conclusions are:
If you associate the utility function only with a local state, e.g., u(s1) or u(s2), you cannot tell the two states apart by their utilities: u(s1) = u(s2) = 1/3.
You need a global, long-term view of the states. This can be represented by a run, which is a sequence of interleaved environment states and the actions the agent takes.
You can assign a utility not to individual states but to runs. Such an approach takes an inherently long-term view.
u: run -> real value
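One way to make the mapping from runs to real values concrete is sketched below. Everything here is an assumption for illustration: a run is modeled as a list that alternates states and actions, each state is reduced to a hypothetical (filled, total) pair, and the run utility is taken to be the state utility of the final state. Other definitions of run utility are equally possible.

```python
from typing import Callable, List, Sequence, Tuple, Union

# Hypothetical simplified state: (holes filled, total holes).
SimpleState = Tuple[int, int]
Run = Sequence[Union[SimpleState, str]]


def states_of(run: Run) -> List[SimpleState]:
    """Extract the states from a run laid out as [s0, a0, s1, a1, ..., sn]."""
    return list(run[0::2])


def state_utility(state: SimpleState) -> float:
    """Fraction of holes filled in a single state."""
    filled, total = state
    return filled / total


def run_utility(run: Run, u: Callable[[SimpleState], float]) -> float:
    """Assumed definition: the utility of a run is that of its final state."""
    return u(states_of(run)[-1])


# Both runs start from a state with local utility 1/3.
run_dead = [(1, 3), "left", (1, 3), "up", (1, 3)]     # stuck: no more holes filled
run_good = [(1, 3), "left", (2, 3), "down", (3, 3)]   # remaining holes get filled

u_dead = run_utility(run_dead, state_utility)
u_good = run_utility(run_good, state_utility)
```

The two starting states are indistinguishable by local utility, yet u_dead is 1/3 while u_good is 1, so a utility defined over runs separates the dead position from the good one.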