RoughSetSelector

class scikit_weak.feature_selection.RoughSetSelector(search_strategy='approximate', epsilon=0.0, n_iters=100, method='conservative', l=0.5, discrete=True, metric='minkowski', neighborhood='nearest', n_neighbors=3, radius=1.0, random_state=None)

A class to perform Feature Selection based on Rough Sets by searching for reducts [1]. The y input to the fit method should be given as an iterable of DiscreteWeakLabel. Supports both discrete (using Pawlak Rough Sets) and continuous (using Neighborhood Rough Sets) datasets.
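To make the notion of a reduct concrete, the following is a minimal sketch of a brute-force reduct search on a small discrete table. It is not the library's implementation: it assumes ordinary (precise) labels rather than DiscreteWeakLabel, it treats a feature subset as acceptable when objects that agree on those features also agree on the label, and the helper names is_consistent and minimal_reducts are hypothetical.

```python
from itertools import combinations


def is_consistent(X, y, features):
    """Return True if rows that agree on `features` also agree on the label."""
    seen = {}
    for row, label in zip(X, y):
        key = tuple(row[f] for f in features)
        # setdefault stores the first label seen for this equivalence class
        if seen.setdefault(key, label) != label:
            return False
    return True


def minimal_reducts(X, y):
    """Brute-force search for the smallest consistent feature subsets.

    Hypothetical helper: subsets are tried in order of increasing size,
    and all consistent subsets of the first successful size are returned.
    """
    n_features = len(X[0])
    for size in range(n_features + 1):
        found = [
            subset
            for subset in combinations(range(n_features), size)
            if is_consistent(X, y, subset)
        ]
        if found:
            return found
    return []  # the table is inconsistent even with all features


X = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
]
y = ["a", "a", "b", "b"]

reducts = minimal_reducts(X, y)
print(reducts)  # [(0,), (2,)]
```

In this toy table, feature 0 and feature 2 each determine the label on their own, while feature 1 does not, so two single-feature reducts are returned. Because every subset may have to be examined, the cost of such a search grows exponentially with the number of features.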

Parameters
  • search_strategy ({'approximate', 'brute'}, default='approximate') – The search strategy to be used. 'approximate' is similar to RFE (recursive feature elimination) and has complexity O(n^2). 'brute' is a brute-force search that evaluates all possible combinations of features, with complexity O(2^n).

  • epsilon (float, default=0.0) – The approximation factor. Should be a number in [0.0, 1.0), i.e. 1.0 is excluded.

  • n_iters (int, default=100) – Number of iterations to be used when search_strategy='approximate'. Not used if search_strategy='brute'.

  • method ({'lambda', 'conservative'}, default='conservative') – The method used to compute the fitness. If 'lambda', the algorithm solves a single-objective optimization problem. If 'conservative', it solves a two-objective problem.

  • l (float in [0,1], default=0.5) – Lambda interpolation factor. Only used if method='lambda'.

  • discrete (bool, default=True) – Whether the input X is discrete or not. If discrete=True then use equivalence-based (i.e. Pawlak) Rough Sets. If discrete=False use neighborhood-based Rough Sets.

  • metric (string or function, default='minkowski') – Metric to be used with neighborhood-based Rough Sets. Only used if discrete=False. If discrete=True, metric='hamming' is used instead.

  • neighborhood ({'delta', 'nearest'}, default='nearest') – Type of neighborhood-based Rough Sets to be used. If neighborhood='delta', then use delta-neighborhood Rough Sets: all neighbors with distance <= radius are selected. If neighborhood='nearest', then use k-nearest-neighbors Rough Sets: only the k nearest neighbors are selected. Only used if discrete=False.

  • n_neighbors (int, default=3) – Number of nearest neighbors to select. Only used if discrete=False and neighborhood='nearest'.

  • radius (float, default=1.0) – Radius to select neighbors. Only used if discrete=False and neighborhood='delta'.

  • random_state (int, default=None) – Randomization seed. Used only if search_strategy='approximate'.
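
The difference between the two neighborhood types can be illustrated with a short sketch. This is not the library's code: it assumes Euclidean distance (the default metric is 'minkowski', and the exponent used by the library is not stated here), and the helper names delta_neighborhood and nearest_neighborhood are hypothetical. Whether a point counts as its own neighbor is also an assumption made for the example.

```python
import math


def delta_neighborhood(X, i, radius=1.0):
    """Indices of all points within `radius` of point i (point i included)."""
    return [j for j, row in enumerate(X) if math.dist(X[i], row) <= radius]


def nearest_neighborhood(X, i, n_neighbors=3):
    """Indices of the n_neighbors points closest to point i (point i excluded)."""
    others = [j for j in range(len(X)) if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:n_neighbors]


X = [[0.0, 0.0], [0.5, 0.0], [0.0, 2.0], [3.0, 3.0]]

delta = delta_neighborhood(X, 0, radius=1.0)
nearest = nearest_neighborhood(X, 0, n_neighbors=2)
print(delta)    # [0, 1]
print(nearest)  # [1, 2]
```

With a delta-neighborhood the number of neighbors varies with the local density of the data, whereas a k-nearest neighborhood always has the same size regardless of how far away the neighbors are.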

Variables
  • n_classes (int) – The number of unique classes in y

  • reducts (list) – The list of minimal reducts. If search_strategy='approximate', reducts contains at most one set of features for each membership degree. If search_strategy='brute', reducts contains the list of all minimal reducts.

  • reducts_poss – The list of membership values of the minimal reducts.

fit(X, y)

Fit the RoughSetSelector model

fit_transform(X, y)

Fit and then transform data

transform(X, y=None)

Transform X by keeping only the features of a reduct selected at random. y is ignored.
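
The behavior described for transform can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the function transform_with_random_reduct is hypothetical, reducts are assumed to be tuples of column indices, and X is a plain list of rows.

```python
import random


def transform_with_random_reduct(X, reducts, seed=None):
    """Pick one reduct at random and keep only its columns.

    Hypothetical helper: returns the reduced rows and the chosen reduct.
    """
    rng = random.Random(seed)
    reduct = rng.choice(reducts)
    X_reduced = [[row[f] for f in reduct] for row in X]
    return X_reduced, reduct


X = [[0, 0, 1], [0, 1, 1], [1, 0, 0]]
reducts = [(0,), (2,)]

X_reduced, chosen = transform_with_random_reduct(X, reducts, seed=0)
print(chosen, X_reduced)
```

Since the reduct is drawn at random, repeated calls may return different columns when several reducts are available, unless the random seed is fixed.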

References

[1] Campagner, A., Ciucci, D., Hüllermeier, E. (2021). Rough set-based feature selection for weakly labeled data. International Journal of Approximate Reasoning, 136, 150-167. https://doi.org/10.1016/j.ijar.2021.06.005

[2] Campagner, A., Ciucci, D. (2021). Feature Selection and Disambiguation in Learning from Fuzzy Labels Using Rough Sets. International Joint Conference on Rough Sets, LNCS 12872, 164-179. https://doi.org/10.1007/978-3-030-87334-9_14

[3] Campagner, A., Ciucci, D., Hüllermeier, E. (2020). Feature Reduction in Superset Learning Using Rough Sets and Evidence Theory. International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, CCIS 1237, 471-484. https://doi.org/10.1007/978-3-030-50146-4_35