
I understand that decision trees try to place attributes with high entropy high up in the tree. However, how does information gain play into this?

Information gain is defined as:

InformationGain = EntropyBefore - EntropyAfter

where EntropyBefore is the entropy of the node before the split and EntropyAfter is the weighted average of the entropies of the child nodes the split produces.

Does a decision tree try to place attributes with low information gain at the top of the tree, so that entropy is always maximized and information gain is always minimized?

Sorry, I am just a bit confused. Thank you!


No, you always place the attribute with the highest information gain at the top of the tree. But remember, this is a recursive algorithm.

If you have a table with (say) five attributes, you first calculate the information gain of each of those five attributes and choose the one with the highest gain. At this point, think of your developing decision tree as having a root that splits on that attribute, with one child per value of the attribute, each holding the sub-table of rows with that value. For example, if it is a binary attribute, you will have two children, each with the four remaining attributes: one child holds all the rows where the attribute is true, the other all the rows where it is false.
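Concretely, the per-attribute computation might look like this minimal Python sketch (the function names and toy table are my own, not from any particular library):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy before splitting minus the weighted entropy of the
    children produced by splitting on column index `attr`."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    after = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - after

# Toy table: attribute 0 separates the classes perfectly, attribute 1 not at all.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "hot"), ("rainy", "mild")]
labels = ["no", "no", "yes", "yes"]
best = max(range(2), key=lambda a: information_gain(rows, labels, a))
print(best)  # attribute 0, since its gain is 1.0 versus 0.0 for attribute 1
```

The root splits on whichever column wins this comparison, and the same comparison is then repeated inside each child's sub-table.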

Now, for each of those child nodes, you again select the attribute with the highest information gain, recursively, until you cannot continue (for instance, because a node is pure or no attributes remain).

In this way, the tree always tells you to make a decision based on the attribute expected to give you the most information, taking into account the decisions you have already made higher up.
