Tree API
-
class msitrees.tree.MSIDecisionTreeClassifier(**kwargs)
MSI Decision Tree Classifier
This hyperparameter-free tree-building algorithm is based on breadth-first tree traversal. At any point in time, all current leaves are candidate nodes. The algorithm performs a temporary split on each candidate in turn and keeps the one that decreases the overall cost function the most. The branches created by that split are added to the candidate pool. Best split points are estimated with Gini-based information gain. Training ends when any new split would only add needless complexity to the tree.
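To make the split-point estimate concrete, here is a minimal sketch of Gini-based information gain for a single numeric feature. It is not the msitrees implementation: the helper names (`gini`, `gini_gain`, `best_threshold`) and the choice of midpoints between sorted unique values as candidate thresholds are assumptions made for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(values, labels, threshold):
    """Impurity decrease obtained by splitting on `value < threshold`."""
    left = [y for v, y in zip(values, labels) if v < threshold]
    right = [y for v, y in zip(values, labels) if v >= threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


def best_threshold(values, labels):
    """Return (gain, threshold) for the best candidate threshold.

    Candidates are midpoints between consecutive sorted unique values
    (an assumption of this sketch). Returns (0.0, None) when the feature
    is constant and no split is possible.
    """
    uniq = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(uniq, uniq[1:])]
    if not candidates:
        return 0.0, None
    return max((gini_gain(values, labels, t), t) for t in candidates)


gain, threshold = best_threshold([1, 2, 3, 4], [0, 0, 1, 1])
```

In this toy case the split at 2.5 separates the two classes perfectly, so the gain equals the parent impurity of 0.5.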
The cost function follows the paper implementation [1], which is based on the harmonic mean of model inaccuracy and surfeit, but modifies the approximation of I(X, M) for performance and reusability reasons.
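As a reminder of what a harmonic mean of two quantities looks like, the sketch below combines two plain numbers. It only illustrates the averaging step: how msitrees actually measures inaccuracy and surfeit, and how it approximates I(X, M), is not shown here, and the function name and the zero-handling rule are assumptions.

```python
def harmonic_mean(inaccuracy, surfeit):
    """Harmonic mean of two non-negative numbers.

    Returns 0.0 when both inputs are zero (a convention chosen for this
    sketch to avoid division by zero).
    """
    total = inaccuracy + surfeit
    if total == 0:
        return 0.0
    return 2 * inaccuracy * surfeit / total


cost = harmonic_mean(0.2, 0.4)
```

The harmonic mean is pulled toward the smaller of its two inputs, which distinguishes it from a plain arithmetic average.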
-
root
Root of the decision tree. All decision and leaf nodes are descendants of this node.
- Type
MSINode
-
fitted
Boolean flag indicating whether the tree has been fitted.
- Type
bool
-
shape
Shape of dataset X as (n_samples, n_features), or None if the tree has not been fitted yet.
- Type
tuple
-
ncls
Number of classification categories, or None if the tree has not been fitted yet.
- Type
int
-
ndim
Number of dimensions of dataset X: 1 if n_features == 1, 2 if n_features > 1, or None if the tree has not been fitted yet.
- Type
int
-
importances
Array of feature importances, or None if the tree has not been fitted yet.
- Type
np.ndarray
References
Examples
>>> from msitrees.tree import MSIDecisionTreeClassifier
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import cross_val_score
>>> data = load_iris()
>>> clf = MSIDecisionTreeClassifier()
>>> cross_val_score(clf, data['data'], data['target'], cv=10)
...
array([1.        , 1.        , 1.        , 0.93333333, 0.93333333,
       0.8       , 0.93333333, 0.86666667, 0.8       , 1.        ])
-
property feature_importances_
Returns feature importances.
Each feature's importance is calculated as the normalized sum of Gini-based information gain over the nodes that split on that feature.
- Returns
importances – Normalized array of feature importances.
- Return type
np.ndarray
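The importance calculation described for feature_importances_ can be sketched as follows. The input format (a list of `(feature_index, gain)` pairs, one per split node) is hypothetical, and "normalized" is assumed here to mean dividing by the total gain so that the importances sum to 1.

```python
def feature_importances(splits, n_features):
    """Sum the gain per feature, then normalize so the result sums to 1.

    `splits` is a list of (feature_index, gain) pairs, one per decision
    node (a hypothetical representation used only in this sketch).
    """
    totals = [0.0] * n_features
    for feature, gain in splits:
        totals[feature] += gain
    total = sum(totals)
    if total == 0:
        # No informative splits: leave all importances at zero.
        return totals
    return [t / total for t in totals]


# Feature 0 is used at two nodes, feature 2 at one, feature 1 never.
importances = feature_importances([(0, 0.3), (2, 0.1), (0, 0.1)], n_features=3)
```

A feature that is never used for a split ends up with an importance of zero.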
-
fit(x: Union[numpy.ndarray, pandas.core.frame.DataFrame, pandas.core.series.Series], y: Union[numpy.ndarray, pandas.core.series.Series]) → msitrees.tree.MSIDecisionTreeClassifier
Fits the decision tree to the training dataset.
- Parameters
x (np.ndarray) – Training data of shape (n_samples, n_features) or (n_samples, ). All values have to be numerical, so perform any required encoding before calling this method.
y (np.ndarray) – Ground truth data of shape (n_samples, ). All values have to be numerical, so perform any required encoding before calling this method.
- Returns
self – Fitted estimator.
- Return type
MSIDecisionTreeClassifier
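Because fit expects numerical values, string class labels need to be encoded first. The helper below is one possible way to do that in plain Python; `encode_labels` is a hypothetical name and is not part of msitrees.

```python
def encode_labels(labels):
    """Map arbitrary hashable labels to integers 0..n_classes-1.

    Returns the encoded labels and the sorted list of original classes,
    so that predictions can be decoded with classes[prediction].
    """
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes


encoded, classes = encode_labels(["cat", "dog", "cat", "bird"])
```

Keeping the `classes` list around makes it possible to translate integer predictions back to the original labels.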
-
get_depth() → int
Returns the depth of the decision tree.
- Returns
depth – Maximum depth of fitted decision tree.
- Return type
int
-
get_n_leaves() → int
Returns the number of tree leaves.
- Returns
num_leaves – Number of leaf nodes in fitted tree.
- Return type
int
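For intuition about what get_depth and get_n_leaves report, the sketch below computes both quantities on a toy binary tree. The `Node` class is a stand-in written for this example, not the library's MSINode, and the convention that a single-leaf tree has depth 0 is an assumption.

```python
class Node:
    """Toy binary tree node (hypothetical, not msitrees' MSINode)."""

    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

    def children(self):
        return [c for c in (self.left, self.right) if c is not None]


def depth(node):
    """Number of edges on the longest path from node to a leaf."""
    kids = node.children()
    if not kids:
        return 0
    return 1 + max(depth(child) for child in kids)


def n_leaves(node):
    """Number of nodes without children in the subtree rooted at node."""
    kids = node.children()
    if not kids:
        return 1
    return sum(n_leaves(child) for child in kids)


# Root splits into a leaf and a decision node with two leaves.
tree = Node(Node(), Node(Node(), Node()))
```

For this tree, `depth(tree)` is 2 and `n_leaves(tree)` is 3.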
-
get_params(deep: bool = True) → dict
Provided for scikit-learn API compatibility.
-
predict(x: Union[numpy.ndarray, pandas.core.frame.DataFrame]) → numpy.ndarray
Predicts class labels for input data X.
- Parameters
x (np.ndarray) – Array of samples with shape (n_samples, n_features). Class label is predicted for each sample.
- Returns
pred – Array of shape (n_samples, ) with the predicted class label for each sample.
- Return type
np.ndarray
-
predict_log_proba(x: Union[numpy.ndarray, pandas.core.frame.DataFrame]) → numpy.ndarray
Predicts class log probabilities for input data X.
The probability of a class is defined as the fraction of samples in a leaf that carry that class label.
- Parameters
x (np.ndarray) – Array of samples with shape (n_samples, n_features). Class log probabilities are predicted for each sample.
- Returns
logprobas – Array of log probabilities with shape (n_samples, n_targets). Each index corresponds to a class label and holds the predicted log probability of that class.
- Return type
np.ndarray
-
predict_proba(x: Union[numpy.ndarray, pandas.core.frame.DataFrame]) → numpy.ndarray
Predicts class probabilities for input data X.
The probability of a class is defined as the fraction of samples in a leaf that carry that class label.
- Parameters
x (np.ndarray) – Array of samples with shape (n_samples, n_features). Class probabilities are predicted for each sample.
- Returns
probas – Array of probabilities with shape (n_samples, n_targets). Each index corresponds to a class label and holds the predicted probability of that class.
- Return type
np.ndarray
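Both probability methods rely on the class fractions inside a leaf. A minimal sketch of that definition, assuming integer class labels 0..n_classes-1 and a non-empty leaf, is shown below. The function names are hypothetical, and mapping a zero probability to negative infinity is an assumption of this sketch rather than documented library behavior.

```python
import math
from collections import Counter


def leaf_proba(leaf_labels, n_classes):
    """Fraction of each class among the labels that ended up in a leaf."""
    counts = Counter(leaf_labels)
    n = len(leaf_labels)
    return [counts.get(c, 0) / n for c in range(n_classes)]


def leaf_log_proba(leaf_labels, n_classes):
    """Natural log of the class fractions; absent classes map to -inf."""
    return [
        math.log(p) if p > 0 else float("-inf")
        for p in leaf_proba(leaf_labels, n_classes)
    ]


# A leaf holding three samples of class 0 and one of class 1, out of 3 classes.
probas = leaf_proba([0, 0, 1, 0], n_classes=3)
logprobas = leaf_log_proba([0, 0, 1, 0], n_classes=3)
```

Every sample routed to the same leaf receives the same probability vector, so the output for a batch has one such row per sample.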
-
score(x: numpy.ndarray, y: numpy.ndarray) → float
Predicts class labels for X and computes the accuracy score with respect to y.
- Parameters
x (np.ndarray) – Array of samples with shape (n_samples, n_features). Class label is predicted for each sample.
y (np.ndarray) – Array of ground truth labels.
- Returns
accuracy – Accuracy score for predicted class labels.
- Return type
float
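Accuracy here is the fraction of predictions that match the ground truth. The sketch below computes it from two label sequences; `accuracy` is a hypothetical helper, whereas the score method itself performs the prediction step internally.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy of empty inputs")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Three of the four predictions are correct.
result = accuracy([0, 1, 1, 2], [0, 1, 2, 2])
```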
-
set_params(**params: dict) → msitrees.tree.MSIDecisionTreeClassifier
Provided for scikit-learn API compatibility.
-