0 votes
2 views
in AI and Deep Learning by (50.2k points)

I am working on an Agent class in Python 2.7.11 that uses a Markov Decision Process (MDP) to search for an optimal policy π in a GridWorld. I am implementing basic value iteration, sweeping over all GridWorld states for 100 iterations, using the following Bellman equation:

V_{k+1}(s) = max_a Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V_k(s') ]

  • T(s, a, s') is the probability of successfully transitioning to successor state s' from the current state s by taking action a.

  • R(s, a, s') is the reward for transitioning from s to s' via action a.

  • γ (gamma) is the discount factor, where 0 ≤ γ ≤ 1.

  • V_k(s') is the value of the successor state s' computed in the previous iteration k.

  • V_{k+1}(s) is the updated value of state s; after enough iterations k, the values converge, so V_{k+1} becomes (approximately) equal to V_k.

This equation is derived from taking the maximum of a Q value function, which is what I am using within my program:

Q_{k+1}(s, a) = Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V_k(s') ],   with   V_{k+1}(s) = max_a Q_{k+1}(s, a)

When my Agent is constructed, it is passed an MDP, which is an abstract class containing the following methods:

# Returns all states in the GridWorld
def getStates()

# Returns all legal actions the agent can take given the current state
def getPossibleActions(state)

# Returns all possible successor states reachable from the current state with the
# given action, together with the probability of reaching each one
def getTransitionStatesAndProbs(state, action)

# Returns the reward of going from the current state to the successor state
def getReward(state, action, nextState)
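
For reference, a minimal stand-in that satisfies this interface might look like the sketch below. This is a hypothetical two-state MDP (the class name TwoStateMDP, its states, and its rewards are made up for illustration); in the actual project the GridWorld MDP supplies these methods.

class TwoStateMDP:
    """A hypothetical two-state MDP, used only to illustrate the interface above."""

    def getStates(self):
        return ['A', 'B']

    def getPossibleActions(self, state):
        # From 'A' the only action is 'go'; 'B' is terminal, so no actions
        return ['go'] if state == 'A' else []

    def getTransitionStatesAndProbs(self, state, action):
        # Taking 'go' from 'A' reaches 'B' with probability 0.8, stays in 'A' with 0.2
        return [('B', 0.8), ('A', 0.2)]

    def getReward(self, state, action, nextState):
        # Reward 1 for entering the terminal state 'B', 0 otherwise
        return 1.0 if nextState == 'B' else 0.0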

My Agent is also passed a discount factor and a number of iterations. I use a dictionary to keep track of my values. Here is my code:

class IterationAgent:

    def __init__(self, mdp, discount=0.9, iterations=100):
        self.mdp = mdp
        self.discount = discount
        self.iterations = iterations
        self.values = util.Counter()  # A Counter is a dictionary with default value 0

        for iteration in range(self.iterations):
            states = self.mdp.getStates()
            valuesCopy = self.values.copy()
            for state in states:
                legalMoves = self.mdp.getPossibleActions(state)
                convergedValue = 0
                for move in legalMoves:
                    value = self.computeQValueFromValues(state, move)
                    # Keep the largest Q value found so far for this state
                    if convergedValue <= value or convergedValue == 0:
                        convergedValue = value
                # Write into the copy so that this sweep keeps reading the
                # previous iteration's values through self.values
                valuesCopy.update({state: convergedValue})
            self.values = valuesCopy

    def computeQValueFromValues(self, state, action):
        successors = self.mdp.getTransitionStatesAndProbs(state, action)
        qValue = 0
        for successor, probability in successors:
            # Q(s, a) = sum over s' of T(s, a, s') * [ R(s, a, s') + gamma * V(s') ]
            reward = self.mdp.getReward(state, action, successor)
            qValue += probability * (reward + (self.discount * self.values[successor]))
        return qValue
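
With an MDP that implements the interface above (for example the hypothetical TwoStateMDP sketched earlier, and assuming the project's util module is importable), the agent would be constructed and queried roughly like this:

agent = IterationAgent(TwoStateMDP(), discount=0.9, iterations=100)
print(agent.values['A'])  # converged value estimate for state 'A'
print(agent.values['B'])  # terminal state keeps its default value of 0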

This implementation works, but I am unsure why I need valuesCopy to update my self.values dictionary correctly. I have tried the following, omitting the copy, but it does not work: it returns slightly incorrect values:

for i in range(self.iterations):
    states = self.mdp.getStates()
    for state in states:
        legalMoves = self.mdp.getPossibleActions(state)
        convergedValue = 0
        for move in legalMoves:
            value = self.computeQValueFromValues(state, move)
            if convergedValue <= value or convergedValue == 0:
                convergedValue = value
        # Updating self.values in place: later states in this same sweep
        # already see the values written for earlier states
        self.values.update({state: convergedValue})

My question is: why is keeping a copy of self.values necessary to update the values correctly, given that valuesCopy = self.values.copy() copies the dictionary every iteration anyway? Shouldn't updating the values in the original dictionary produce the same result?

1 Answer

0 votes
by (107k points)

A shallow copy means constructing a new collection object and then populating it with references to the child objects found in the original. A shallow copy is only one level deep: the copying process is not recursive, so it does not create copies of the child objects themselves. As a consequence, a shallow copy does not clone child objects, and the copy is not fully independent of the original.

In your code, however, the copy matters for a more specific reason: value iteration is a synchronous update, so every V_{k+1}(s) must be computed from the previous iteration's values V_k. Copying self.values into valuesCopy freezes V_k for the whole sweep; you write the new values into valuesCopy and only replace self.values once all states have been processed. If you update self.values in place instead, states processed later in the same sweep already see the freshly written values of earlier states, so you mix V_k and V_{k+1} within one sweep (an in-place, Gauss-Seidel-style update). That variant still converges, but after a fixed number of iterations it produces slightly different values, which is exactly what you observed.
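
Here is a minimal sketch that shows the two update schemes diverging within a single sweep. It uses a hypothetical three-state chain (not your GridWorld, and no util module): each state s > 0 has one action that moves to s - 1, entering state 0 pays reward 1, and state 0 is terminal.

gamma = 0.9
states = [0, 1, 2]

def q(state, values):
    # Q value of the single 'move left' action, evaluated against the given value table
    if state == 0:
        return 0.0  # terminal state: no actions
    nextState = state - 1
    reward = 1.0 if nextState == 0 else 0.0
    return 1.0 * (reward + gamma * values[nextState])

# Synchronous sweep (what valuesCopy gives you): every update reads the old table
V = {s: 0.0 for s in states}
newV = V.copy()
for s in states:
    newV[s] = q(s, V)
print("synchronous after one sweep: %s" % newV)  # {0: 0.0, 1: 1.0, 2: 0.0}

# In-place sweep (updating the table directly): state 2 reads the value of
# state 1 that was just written earlier in the same sweep
V = {s: 0.0 for s in states}
for s in states:
    V[s] = q(s, V)
print("in-place after one sweep:    %s" % V)  # {0: 0.0, 1: 1.0, 2: 0.9}

Both schemes eventually converge to the same values, but only the synchronous version matches the textbook V_{k+1} computed purely from V_k, which is why the copied dictionary gives the expected numbers.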

If you want to learn more about Artificial Intelligence, you can visit the Artificial Intelligence (AI) Tutorial. Also, if you are preparing for AI Engineer or AI Expert roles, you can get ready for interviews with these Artificial Intelligence Interview Questions.
