Decision Tree Algorithm in Machine Learning Using Sklearn

Creating-and-Visualizing-Decision-Tree-Algorithm_BIG.jpg

What is Decision Tree? Decision Tree in Python and Scikit-Learn

Decision Tree algorithm is one of the simplest yet most powerful Supervised Machine Learning algorithms. Decision Tree algorithm can be used to solve both regression and classification problems in Machine Learning. That is why it is also known as CART or Classification and Regression Trees. As the name suggests, in Decision Tree, we form a tree-like model of decisions and their possible consequences.

Before we dive right into understanding this interesting algorithm, let us take a look at the concepts this blog has to offer.

Check this video on Decision Tree Machine Learning:

Video Thumbnail

Without much delay, let’s get started!

Decision Tree Algorithm Example

Monica’s cousin Marry is visiting Central Park this weekend. Now, Monica needs to make some plans for the weekend, whether to go out for shopping, go for a movie, spend time in the Central Park coffee shop, or just stay in and play a board game. Well, she decides to create a Decision Tree to make things easy. Let us see what she has created.

DecisionTree

Here, the interior nodes represent different tests on an attribute (for example, whether to go out or stay in), branches hold the outcomes of those tests, and leaf nodes represent a class label or some decision taken after measuring all attributes. Each path from the root node to the leaf nodes represents a decision tree classification rule.

Rule 1: If it’s not raining and not too sunny, then go out for shopping.

Rule 2: If it’s not raining but too sunny outside, then go for a movie.

Rule 3: If it’s raining outside and the cable has signal, then watch a TV show.

Rule 4: If it’s raining and the cable signal fails, then spend time in the coffee shop downstairs

That’s how a decision tree helps Monica to make the perfect weekend plan with her cousin.

Certification in Bigdata Analytics

Types of Decision Tree Algorithms

There are two types of decision trees. They are categorized based on the type of the target variable they have. If the decision tree has a categorical target variable, then it is called a ‘categorical variable decision tree’. Similarly, if it has a continuous target variable, it is called a ‘continuous variable decision tree’.

  • Root Node: This node gets divided into different homogeneous nodes. It represents the entire sample.
  • Splitting: It is the process of splitting or dividing a node into two or more sub-nodes.
  • Interior Nodes: They represent different tests on an attribute.
  • Branches: They hold the outcomes of those tests.
  • Leaf Nodes: When the nodes can’t be split further, they are called leaf nodes.
  • Parent and Child Nodes: The node from which sub-nodes are created is called a parent node. And, the sub-nodes are called the child nodes.

Get 100% Hike!

Master Most in Demand Skills Now!

Advantages of Decision Tree Algorithms

  • Easy to understand
  • Requires minimum data cleaning
  • No constraint on the data type

Disadvantages of Decision Tree Algorithms

  • Possibility of overfitting

How Does a Decision Tree in Machine Learning Work?

The process of training and predicting the target features using a decision tree in Machine Learning is given below:

  • Feed a dataset, containing a number of training instances, with a set of features and a target
  • Train the decision tree classification or regression models with the help of DecisionTreeClassifier () or DecisionTreeRegressor () methods, and add the required criterion while building the decision tree model
  • Use Graphviz to visualize the decision tree model

That’s it! Your decision tree model is ready.

Decision Tree in Machine Learning – DecisionTreeClassifier () and DecisionTreeRegressor ()

DecisionTreeClassifier (): It is nothing but the decision tree classifier function to build a decision tree model in Machine Learning using Python. The DecisionTreeClassifier() function looks like this:

DecisionTreeClassifier (criterion = ‘gini’, random_state = None, max_depth = None, min_samples_leaf =1)

Here are a few important parameters:

  • criterion: It is used to measure the quality of a split in the decision tree classification. By default, it is ‘gini’; it also supports ‘entropy’.
  • max_depth: This is used to add maximum depth to the decision tree after the tree is expanded.
  • min_samples_leaf: This parameter is used to add the minimum number of samples required to be present at a leaf node.

DecisionTreeRegressio (): It is the decision tree regressor function used to build a decision tree model in Machine Learning using Python. The DecisionTreeRegressor () function looks like this:

DecisionTreeRegressor (criterion = ‘mse’, random_state =None , max_depth=None, min_samples_leaf=1,)

  • criterion: This function is used to measure the quality of a split in the decision tree regression. By default, it is ‘mse’ (the mean squared error), and it also supports ‘mae’ (the mean absolute error).
  • max_depth: This is used to add maximum depth to the decision tree after the tree is expanded.
  • min_samples_leaf: This function is used to add the minimum number of samples required to be present at a leaf node.

Become an Artificial Intelligence Engineer

Creating and Visualizing a Decision Tree Regression Model in Machine Learning Using Python

Problem Statement: Use Machine Learning to predict the selling prices of houses based on some economic factors. Build a model using decision tree in Python.

Dataset: Boston Housing Dataset

Let us have a quick look at the dataset:

DecisionTree1

Model Building

Let us build the regression model of decision tree in Python.

Step 1: Load required packages

DecisionTree2

Step 2: Load the Boston dataset

DecisionTree3

Take a look at the first five entries of the dataset

DecisionTree4
DecisionTree5

Step 3: Visualize the dataset using a scatter plot

DecisionTree6
DecisionTree7

Step 4: Define the features and the target

DecisionTree8

Step 5: Split the dataset into train and test sets

DecisionTree9

Here, ‘test_size = 0.2’ means that the test set will be 20 percent of the whole dataset and the training set’s size will be 80 percent of the entire dataset.

Step 6: Build the model with the decision tree regressor function

DT1
DT2

Step 7: Visualize the tree using Graphviz

DT3

After executing this step, the ‘reg_tree.dot’ file will be saved in your system. Now to visualize the tree, open this file with the ‘.dot’ extension.

DT4

Now, copy the Graphviz data, go to the site ‘http://www.webgraphviz.com/’ and paste the Graphviz data there, as shown below:

DT5

Having done all this, let us perform the decision tree analysis now.

Step 7: Predict the values

DT6

Step 8: Compare y_test and y_pred

DT8
DT9

Step 9: Finding the RMSE value

DT10
DT11

Creating and Visualizing a Decision Tree Classification Model in Machine Learning Using Python

Problem Statement: Use Machine Learning to predict breast cancer cases using patient treatment history and health data. Build a model using decision tree in Python.

Dataset: Breast Cancer Wisconsin (Diagnostic) Dataset

Let us have a quick look at the dataset:

DT12

Model Building

Let us build the classification model of decision tree in Python.

Step 1: Load required packages and the dataset using Pandas

DT13
DT14

Step 2: Take a look at the shape of the dataset

DT15
DT16

Step 3: Define the features and the target

DT17

Step 4: Split the dataset into train and test sets using sklearn

DT18

Step 5: Build the model with the help of the decision tree classifier function

DT19
DT20

Step 6: Visualize the decision tree

DT21

After executing this step, the ‘clf_tree.dot’ file will be saved in your system. Now to visualize the tree, open this file with the ‘.dot’ extension and copy the graphviz data. Then, go to the site ‘http://www.webgraphviz.com/’ and paste the graphviz data there, as shown below:

DecisionTree11

Click on ‘Generate Graph’

DecisionTree12

After this step, let us perform the decision tree analysis now.

Step 7: Predict the values

DecisionTree13

Step 8: Compare y_test and y_pred

DecisionTree14
DecisionTree15

Step 9: Find the confusion matrix and other metric parameters for this decision tree classification model

DecisionTree16
DecisionTree17

What Did We Learn So Far?

In this decision tree tutorial blog, we answered the question, ‘what is decision tree algorithm?’ We also learned how to build decision tree classification models with the help of decision tree classifier and decision tree regressor, decision tree analysis, and also decision tree algorithm visualization in Machine Learning using Python, Scikit-Learn, and Graphviz tool. Strengthen your foundation in statistics, linear algebra, and optimization through our beginner-friendly online Machine Learning course.

About the Author

Technical Content Writer

Garima Hansa is an emerging Data Analyst and Machine Learning enthusiast with hands-on experience through academic and independent projects. She specializes in Python, SQL, data visualization, statistical analysis, and machine learning techniques. Known for building efficient, well-documented solutions and translating complex data insights into actionable recommendations, Garima contributes meaningful value to research, analytics, and developer communities.