Ensemble API
-
class
msitrees.ensemble.MSIRandomForestClassifier(n_estimators: int = 100, bootstrap: bool = True, feature_sampling: bool = True, n_jobs: int = -1, random_state: Optional[int] = None, **kwargs)
MSI Random Forest Classifier
A collection of MSI-based decision tree classifiers, each fitted on a bootstrapped sub-sample of the dataset. The final class label is decided by majority voting among all estimators.
- Parameters
n_estimators (int, default=100) – Number of tree estimators to fit
bootstrap (bool, default=True) – When true, each estimator is fitted on a bootstrap sub-sample of the original dataset.
feature_sampling (bool, default=True) – When true, the number of features considered at each split equals sqrt(n_features). This is equivalent to setting the sklearn max_features parameter to ‘auto’ (the same as ‘sqrt’ for classifiers).
n_jobs (int, default=-1) – Number of parallel jobs to run. When set to -1, all CPUs are used; 1 means no parallel processing.
random_state (int, default=None) – Seed for the class instance, used to control bootstrapping. It seeds only the generator that assigns a random state to each estimator, so each tree generally gets its own seed. When random_state is set to a number, those per-estimator seeds are reproduced on every run. When it is None, the estimators should be seeded differently each time.
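The seeding, bootstrap and feature-sampling behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the msitrees implementation: the helper names derive_estimator_seeds, bootstrap_indices and n_split_features are hypothetical, and truncating sqrt(n_features) to an integer is an assumption.

```python
import math
import random


def derive_estimator_seeds(random_state, n_estimators):
    # Hypothetical helper: one master generator, seeded once, is used
    # only to draw a separate seed for every tree in the ensemble.
    # random.Random(None) falls back to OS entropy, so seeds then differ per run.
    master = random.Random(random_state)
    return [master.randrange(2**32) for _ in range(n_estimators)]


def bootstrap_indices(n_samples, seed):
    # Hypothetical helper: draw n_samples row indices with replacement.
    rng = random.Random(seed)
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def n_split_features(n_features):
    # Assumption: sqrt(n_features) is truncated to an int, with a floor of 1.
    return max(1, int(math.sqrt(n_features)))


seeds_a = derive_estimator_seeds(42, n_estimators=5)
seeds_b = derive_estimator_seeds(42, n_estimators=5)
print(seeds_a == seeds_b)    # True: a fixed random_state reproduces the seeds
print(n_split_features(16))  # 4
```

Each tree would then draw its own bootstrap sample from its own seed, for example bootstrap_indices(150, seeds_a[0]).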
-
estimators
List of all fitted MSIDecisionTreeClassifier instances.
- Type
list
-
fitted
Boolean indicating whether the estimator has been fitted.
- Type
bool
-
shape
Shape of dataset X as (n_samples, n_features), or None if the estimator has not been fitted yet.
- Type
tuple
-
ncls
Number of classification categories, or None if the estimator has not been fitted yet.
- Type
int
-
ndim
Number of dimensions of dataset X: 1 if n_features == 1, 2 if n_features > 1, or None if the estimator has not been fitted yet.
- Type
int
-
importances
Array of feature importances, or None if the estimator has not been fitted yet.
- Type
np.ndarray
References
[1] https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8767915
[4] https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
Examples
>>> from msitrees.ensemble import MSIRandomForestClassifier
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import cross_val_score
>>> data = load_iris()
>>> clf = MSIRandomForestClassifier()
>>> cross_val_score(clf, data['data'], data['target'], cv=10)
array([1.        , 1.        , 1.        , 1.        , 0.93333333,
       0.86666667, 0.93333333, 0.86666667, 0.8       , 1.        ])
-
property
feature_importances_
Returns feature importances.
Each feature's importance is calculated as the normalized sum of Gini-based information gain over the nodes where the split was made on that feature. For the random forest classifier, the final importance is the mean over all estimators.
- Returns
importances – Normalized array of feature importances.
- Return type
np.ndarray
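The normalize-then-average computation described for feature_importances_ can be sketched as follows. This is an illustration under assumptions, not the library's code: the function names and the per-tree gain values are hypothetical, and returning zeros for a tree with no splits is an assumed convention.

```python
def normalize(gains):
    # Scale one tree's summed information gains so they add up to 1.
    total = sum(gains)
    if total == 0:
        # Assumption: a tree with no splits contributes zero importance.
        return [0.0 for _ in gains]
    return [g / total for g in gains]


def forest_importances(per_tree_gains):
    # Normalize each tree separately, then average feature-wise over trees.
    normalized = [normalize(gains) for gains in per_tree_gains]
    n_trees = len(normalized)
    return [sum(column) / n_trees for column in zip(*normalized)]


# Hypothetical summed Gini gains per feature for two trees and three features.
per_tree_gains = [
    [1.0, 3.0, 1.0],
    [2.0, 2.0, 0.0],
]
importances = forest_importances(per_tree_gains)
print(importances)
```

Because every tree's vector sums to 1 before averaging, the averaged vector also sums to 1 whenever each tree made at least one split.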
-
fit(x: Union[numpy.ndarray, pandas.core.frame.DataFrame, pandas.core.series.Series], y: Union[numpy.ndarray, pandas.core.series.Series]) → msitrees.ensemble.MSIRandomForestClassifier
Fits the random forest classifier to the training dataset.
- Parameters
x (np.ndarray) – Training data of shape (n_samples, n_features) or (n_samples, ). All values have to be numerical, so perform any required encoding before calling this method.
y (np.ndarray) – Ground truth data of shape (n_samples, ). All values have to be numerical, so perform any required encoding before calling this method.
- Returns
self – Fitted estimator.
- Return type
MSIRandomForestClassifier
-
get_params(**kwargs) → dict
Get parameters for this estimator.
Notes
scikit-learn API compatibility.
-
predict(x: Union[numpy.ndarray, pandas.core.frame.DataFrame]) → numpy.ndarray
Predicts a class label for each sample in input data X.
- Parameters
x (np.ndarray) – Array of samples with shape (n_samples, n_features). Class label is predicted for each sample.
- Returns
pred – Array of shape (n_samples, ) holding the predicted class label for each sample.
- Return type
np.ndarray
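The class description states that the final label is decided by majority voting between all estimators. A minimal sketch of that voting step is shown below; the function name and the per-tree predictions are hypothetical, and the tie-breaking rule (first label encountered wins) is an assumption, since the documentation does not specify one.

```python
from collections import Counter


def majority_vote(tree_predictions):
    # tree_predictions[t][i] is the label that tree t predicts for sample i.
    voted = []
    for sample_votes in zip(*tree_predictions):
        # most_common(1) returns the label with the highest vote count;
        # on ties it returns the label that was seen first (assumed rule).
        label, _count = Counter(sample_votes).most_common(1)[0]
        voted.append(label)
    return voted


# Hypothetical predictions from three trees for three samples.
tree_predictions = [
    [0, 1, 2],
    [0, 1, 1],
    [0, 2, 1],
]
labels = majority_vote(tree_predictions)
print(labels)  # [0, 1, 1]
```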
-
predict_log_proba(x: Union[numpy.array, pandas.core.frame.DataFrame]) → numpy.ndarray
Predicts class log probabilities for each sample in input data X.
Probability is defined as the fraction of each class label in a leaf.
- Parameters
x (np.ndarray) – Array of samples with shape (n_samples, n_features). Class log probabilities are predicted for each sample.
- Returns
logprobas – Array of shape (n_samples, n_targets) holding log probabilities. Each index corresponds to a class label and holds the predicted log probability of that class.
- Return type
np.ndarray
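As a rough illustration of how log probabilities relate to probabilities, the sketch below takes the natural logarithm of one probability row. The function name is hypothetical, and mapping a zero probability to negative infinity is an assumption; the documentation does not say how msitrees treats classes that are absent from a leaf.

```python
import math


def log_probabilities(probas):
    # Natural log of each class probability.
    # Assumption: a probability of exactly 0 is mapped to -inf.
    return [math.log(p) if p > 0 else float("-inf") for p in probas]


# Hypothetical probability row for one sample and three classes.
logprobas = log_probabilities([0.5, 0.5, 0.0])
print(logprobas)
```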
-
predict_proba(x: Union[numpy.array, pandas.core.frame.DataFrame]) → numpy.ndarray
Predicts class probabilities for each sample in input data X.
Probability is defined as the fraction of each class label in a leaf.
- Parameters
x (np.ndarray) – Array of samples with shape (n_samples, n_features). Class probabilities are predicted for each sample.
- Returns
probas – Array of shape (n_samples, n_targets) holding probabilities. Each index corresponds to a class label and holds the predicted probability of that class.
- Return type
np.ndarray
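The definition of probability as the fraction of a class label in a leaf can be illustrated for a single leaf as follows. The function name and the leaf contents are hypothetical, and the sketch assumes labels are encoded as integers 0..n_classes-1. How the forest combines per-tree probabilities is not stated in this documentation, so that step is left out.

```python
from collections import Counter


def leaf_class_fractions(leaf_labels, n_classes):
    # Fraction of the training samples in this leaf that carry each label.
    counts = Counter(leaf_labels)
    total = len(leaf_labels)
    return [counts.get(c, 0) / total for c in range(n_classes)]


# Hypothetical leaf holding four training samples: three of class 0, one of class 1.
probas = leaf_class_fractions([0, 0, 1, 0], n_classes=3)
print(probas)  # [0.75, 0.25, 0.0]
```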
-
score(x: Union[numpy.ndarray, pandas.core.frame.DataFrame], y: Union[numpy.ndarray, pandas.core.frame.DataFrame]) → float
Predicts a class label for each sample in X and computes the accuracy score with respect to the ground truth.
- Parameters
x (np.ndarray) – Array of samples with shape (n_samples, n_features). Class label is predicted for each sample.
y (np.ndarray) – Array of ground truth labels.
- Returns
accuracy – Accuracy score for predicted class labels.
- Return type
float
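Accuracy here is the share of samples whose predicted label matches the ground truth. A minimal standalone sketch, with a hypothetical function name and made-up labels:

```python
def accuracy(y_true, y_pred):
    # Fraction of positions where the predicted label equals the true label.
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: three of the four predictions are correct.
acc = accuracy([0, 1, 2, 1], [0, 1, 1, 1])
print(acc)  # 0.75
```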
-
set_params(**params) → msitrees.ensemble.MSIRandomForestClassifier
Set the parameters of this estimator.
Notes
scikit-learn API compatibility.