What? When? How?: ExtraTrees Classifier
by Karun Thankachan | Aug, 2022



What is ExtraTrees Classifier? When to use it? How to implement it?


Tree-based models have increased in popularity over the last decade, primarily due to their robustness. They can be used on any type of data (categorical or continuous), do not require the data to be normally distributed, and need little if any data transformation (they handle missing values, scale differences, etc.).

While Decision Trees and Random Forests are often the go-to tree-based models, a lesser-known one is ExtraTrees. (If you are new to tree-based models, do check out the following post.)

Similar to Random Forests, ExtraTrees is an ensemble ML approach that trains numerous decision trees and aggregates their results to output a prediction. However, there are a few differences between Extra Trees and Random Forest.

Random Forest uses bagging to create different variations of the training data so that the decision trees are sufficiently different. Extra Trees, by contrast, trains each decision tree on the entire dataset. To ensure sufficient differences between individual trees, it randomly selects the values at which to split a feature and create child nodes, whereas Random Forest uses a greedy search to find the best value at which to split a feature. Apart from these two differences, Random Forest and Extra Trees are largely the same. So what effect do these changes have?

  • Using the entire dataset (the default setting, which can be changed) allows ExtraTrees to reduce the bias of the model. However, randomizing the feature value at which to split increases the bias and variance. The paper that introduced the Extra Trees model conducts a bias-variance analysis of different tree-based models. On most of the classification and regression tasks analyzed (six in total), ExtraTrees showed higher bias and lower variance than Random Forest. However, the paper goes on to say this is because the randomization in Extra Trees causes irrelevant features to be included in the model. When irrelevant features were excluded, say via a feature-selection pre-modelling step, Extra Trees achieved a bias score similar to that of Random Forest.
  • In terms of computational cost, Extra Trees is much faster than Random Forest, because it randomly selects the value at which to split a feature instead of running the greedy search used in Random Forest, as illustrated in the sketch after this list.
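To make the contrast concrete, here is a minimal sketch (not part of the original article) that fits both ensembles on the same synthetic data and prints their default bootstrap setting and fit time; the dataset sizes are illustrative assumptions, and exact timings will vary by machine.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

# Synthetic dataset (sizes chosen purely for illustration)
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)

for Model in (RandomForestClassifier, ExtraTreesClassifier):
    model = Model(n_estimators=100, random_state=42)
    start = time.perf_counter()
    model.fit(X, y)
    elapsed = time.perf_counter() - start
    # Random Forest bootstraps by default; Extra Trees uses the whole dataset
    print(f"{Model.__name__}: bootstrap={model.bootstrap}, fit time={elapsed:.2f}s")
```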

Random Forest remains the go-to ensemble tree-based model (with recent competition from XGBoost models). However, from the discussion above we see that ExtraTrees has value, especially when computational cost is a concern. Specifically, when building models that involve substantial feature-engineering/feature-selection pre-modelling steps and computational cost is an issue, ExtraTrees is a good choice over other ensemble tree-based models.

ExtraTrees can be used to build classification or regression models and is available via Scikit-learn. This tutorial covers the classification model, but the code can be adapted for regression with minor tweaks (i.e., switching from ExtraTreesClassifier to ExtraTreesRegressor).

Building a model

We will use make_classification from Scikit-learn to create a dummy classification dataset. To evaluate the model, we will use 10-fold cross-validation with accuracy as the evaluation metric.
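The original code block did not survive aggregation, so the following is a minimal sketch of the setup described above; the dataset parameters (sample and feature counts) are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

# Dummy classification dataset (sizes are assumptions for illustration)
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, random_state=42)

# Extra Trees classifier with default hyperparameters
model = ExtraTreesClassifier(random_state=42)

# 10-fold cross-validation, scored by accuracy
scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
print(f"Accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```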

Hyperparameter Tuning

The detailed list of parameters for the Extra Trees model can be found on the Scikit-learn page. The Extra Trees research paper explicitly calls out three key parameters with the following statement.

“The parameters K, nmin and M have different effects: K determines the strength of the attribute selection process, nmin the strength of averaging output noise, and M the strength of the variance reduction of the ensemble model aggregation.”

Let’s look at these parameters more closely from the implementation perspective.

  • K maps to max_features in the Scikit-learn documentation and is the number of features considered at each decision node. The higher the value of K, the more features are considered at each decision node, and hence the lower the bias of the model. However, too high a value of K reduces randomization, negating the effect of the ensemble.
  • nmin maps to min_samples_leaf and is the minimum number of samples required at a leaf node. The higher its value, the less likely the model is to overfit; smaller values allow more splits and a deeper, more specialized tree.
  • M maps to n_estimators and is the number of trees in the forest. The higher its value, the lower the variance of the model.

The best set of parameters can be selected via GridSearchCV as shown below.
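The original grid-search snippet was also lost in aggregation; the sketch below uses an illustrative (assumed) parameter grid over the three hyperparameters discussed above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, random_state=42)

# Illustrative grid over the three key hyperparameters (values are assumptions)
param_grid = {
    "max_features": ["sqrt", "log2", None],
    "min_samples_leaf": [1, 5, 10],
    "n_estimators": [100, 300, 500],
}

grid = GridSearchCV(ExtraTreesClassifier(random_state=42),
                    param_grid, cv=10, scoring="accuracy", n_jobs=-1)
grid.fit(X, y)

print("Best parameters:", grid.best_params_)
print(f"Best cross-validated accuracy: {grid.best_score_:.3f}")
```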

  • ExtraTrees Classifier is an ensemble tree-based machine learning approach that relies on randomization to reduce variance and computational cost (compared to Random Forest).
  • ExtraTrees can be used for classification or regression, and suits scenarios where computational cost is a concern and features have been carefully selected and analyzed.
  • Extra Trees can be implemented with Scikit-learn. The three important hyperparameters to tune are max_features, min_samples_leaf, and n_estimators.

That’s it! The what, when and how for ExtraTrees!


