SelectPercentile#
- class sklearn.feature_selection.SelectPercentile(score_func=<function f_classif>, *, percentile=10)[source]#
- Select features according to a percentile of the highest scores. - Read more in the User Guide. - Parameters:
- score_funccallable, default=f_classif
- Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues) or a single array with scores. Default is f_classif (see below “See Also”). The default function only works with classification tasks. - Added in version 0.18. 
- percentileint, default=10
- Percent of features to keep. 
 
- Attributes:
- scores_array-like of shape (n_features,)
- Scores of features. 
- pvalues_array-like of shape (n_features,)
- p-values of feature scores, None if - score_funcreturned only scores.
- n_features_in_int
- Number of features seen during fit. - Added in version 0.24. 
- feature_names_in_ndarray of shape (n_features_in_,)
- Names of features seen during fit. Defined only when - Xhas feature names that are all strings.- Added in version 1.0. 
 
 - See also - f_classif
- ANOVA F-value between label/feature for classification tasks. 
- mutual_info_classif
- Mutual information for a discrete target. 
- chi2
- Chi-squared stats of non-negative features for classification tasks. 
- f_regression
- F-value between label/feature for regression tasks. 
- mutual_info_regression
- Mutual information for a continuous target. 
- SelectKBest
- Select features based on the k highest scores. 
- SelectFpr
- Select features based on a false positive rate test. 
- SelectFdr
- Select features based on an estimated false discovery rate. 
- SelectFwe
- Select features based on family-wise error rate. 
- GenericUnivariateSelect
- Univariate feature selector with configurable mode. 
 - Notes - Ties between features with equal scores will be broken in an unspecified way. - This filter supports unsupervised feature selection that only requests - Xfor computing the scores.- Examples - >>> from sklearn.datasets import load_digits >>> from sklearn.feature_selection import SelectPercentile, chi2 >>> X, y = load_digits(return_X_y=True) >>> X.shape (1797, 64) >>> X_new = SelectPercentile(chi2, percentile=10).fit_transform(X, y) >>> X_new.shape (1797, 7) - fit(X, y=None)[source]#
- Run score function on (X, y) and get the appropriate features. - Parameters:
- Xarray-like of shape (n_samples, n_features)
- The training input samples. 
- yarray-like of shape (n_samples,) or None
- The target values (class labels in classification, real numbers in regression). If the selector is unsupervised then - ycan be set to- None.
 
- Returns:
- selfobject
- Returns the instance itself. 
 
 
 - fit_transform(X, y=None, **fit_params)[source]#
- Fit to data, then transform it. - Fits transformer to - Xand- ywith optional parameters- fit_paramsand returns a transformed version of- X.- Parameters:
- Xarray-like of shape (n_samples, n_features)
- Input samples. 
- yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None
- Target values (None for unsupervised transformations). 
- **fit_paramsdict
- Additional fit parameters. 
 
- Returns:
- X_newndarray array of shape (n_samples, n_features_new)
- Transformed array. 
 
 
 - get_feature_names_out(input_features=None)[source]#
- Mask feature names according to selected features. - Parameters:
- input_featuresarray-like of str or None, default=None
- Input features. - If - input_featuresis- None, then- feature_names_in_is used as feature names in. If- feature_names_in_is not defined, then the following input feature names are generated:- ["x0", "x1", ..., "x(n_features_in_ - 1)"].
- If - input_featuresis an array-like, then- input_featuresmust match- feature_names_in_if- feature_names_in_is defined.
 
 
- Returns:
- feature_names_outndarray of str objects
- Transformed feature names. 
 
 
 - get_metadata_routing()[source]#
- Get metadata routing of this object. - Please check User Guide on how the routing mechanism works. - Returns:
- routingMetadataRequest
- A - MetadataRequestencapsulating routing information.
 
 
 - get_params(deep=True)[source]#
- Get parameters for this estimator. - Parameters:
- deepbool, default=True
- If True, will return the parameters for this estimator and contained subobjects that are estimators. 
 
- Returns:
- paramsdict
- Parameter names mapped to their values. 
 
 
 - get_support(indices=False)[source]#
- Get a mask, or integer index, of the features selected. - Parameters:
- indicesbool, default=False
- If True, the return value will be an array of integers, rather than a boolean mask. 
 
- Returns:
- supportarray
- An index that selects the retained features from a feature vector. If - indicesis False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If- indicesis True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
 
 
 - inverse_transform(X)[source]#
- Reverse the transformation operation. - Parameters:
- Xarray of shape [n_samples, n_selected_features]
- The input samples. 
 
- Returns:
- X_originalarray of shape [n_samples, n_original_features]
- Xwith columns of zeros inserted where features would have been removed by- transform.
 
 
 - set_output(*, transform=None)[source]#
- Set output container. - See Introducing the set_output API for an example on how to use the API. - Parameters:
- transform{“default”, “pandas”, “polars”}, default=None
- Configure output of - transformand- fit_transform.- "default": Default output format of a transformer
- "pandas": DataFrame output
- "polars": Polars output
- None: Transform configuration is unchanged
 - Added in version 1.4: - "polars"option was added.
 
- Returns:
- selfestimator instance
- Estimator instance. 
 
 
 - set_params(**params)[source]#
- Set the parameters of this estimator. - The method works on simple estimators as well as on nested objects (such as - Pipeline). The latter have parameters of the form- <component>__<parameter>so that it’s possible to update each component of a nested object.- Parameters:
- **paramsdict
- Estimator parameters. 
 
- Returns:
- selfestimator instance
- Estimator instance. 
 
 
 
 
     
 
 
