GeneticRoughSetSelector

class scikit_weak.feature_selection.GeneticRoughSetSelector(epsilon=0.0, method='conservative', discrete=False, l=0.5, tournament_size=0.1, p_mutate=0.1, metric='minkowski', neighborhood='nearest', n_neighbors=3, radius=1.0, random_state=None, population_size=100, n_iters=100)

A class to perform Rough Set-based feature selection, by searching for reducts, using Genetic Algorithms. The y input to the fit method should be given as an iterable of DiscreteWeakLabel. Supports both discrete (using Pawlak Rough Sets) and continuous (using Neighborhood Rough Sets) datasets.
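To make the notion of a reduct concrete, here is a minimal pure-Python sketch for the simplest setting: discrete features, ordinary single labels, and epsilon=0.0. It does not use scikit_weak and is not the library's implementation. The helper names is_consistent and is_reduct are hypothetical, and weak labels (DiscreteWeakLabel) are not modelled.

```python
def is_consistent(X, y, features):
    """Return True if rows that agree on `features` always share a label."""
    seen = {}
    for row, label in zip(X, y):
        key = tuple(row[f] for f in features)
        # setdefault stores the first label seen for this key and returns it
        if seen.setdefault(key, label) != label:
            return False
    return True


def is_reduct(X, y, features):
    """A reduct is consistent and stops being so if any feature is dropped."""
    if not is_consistent(X, y, features):
        return False
    return all(
        not is_consistent(X, y, [g for g in features if g != f])
        for f in features
    )


# Toy decision table: the label equals the value of feature 1.
X = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
]
y = [0, 1, 0, 1]

print(is_reduct(X, y, [1]))     # True: feature 1 alone determines the label
print(is_reduct(X, y, [0, 1]))  # False: consistent, but feature 0 is redundant
```

Enumerating every subset this way grows exponentially with the number of features, which is one reason to search with a heuristic such as a Genetic Algorithm.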

Parameters
  • epsilon (float, default=0.0) – The approximation factor. Should be a number in [0.0, 1.0), i.e. 1.0 is excluded.

  • method ({'lambda', 'conservative', 'dominance'}, default='conservative') – The method used to compute the fitness. If 'lambda', the algorithm solves a single-objective optimization problem. If 'conservative' or 'dominance', it solves a multi-objective optimization problem: 'conservative' gives a 2-objective problem and 'dominance' an (n+1)-objective problem, where n is the number of instances.

  • discrete (bool, default=False) – Whether the input X is discrete or not. If discrete=True, equivalence-based (i.e. Pawlak) Rough Sets are used. If discrete=False, neighborhood-based Rough Sets are used.

  • l (float in [0,1], default=0.5) – Lambda interpolation factor. Only used if method='lambda'

  • tournament_size (float in [0,1], default=0.1) – Proportion of population to select from in tournament selection.

  • p_mutate (float in [0,1], default=0.1) – Probability of point mutation

  • metric (string or function, default='minkowski') – Metric to be used with neighborhood-based Rough Sets. Only used if discrete=False. If discrete=True, metric='hamming' is used instead.

  • neighborhood ({'delta', 'nearest'}, default='nearest') – Type of neighborhood-based Rough Sets to be used. If neighborhood='delta', delta-neighborhood Rough Sets are used: all neighbors with distance <= radius are selected. If neighborhood='nearest', k-nearest-neighbors Rough Sets are used: only the k nearest neighbors are selected. Only used if discrete=False

  • n_neighbors (int, default=3) – Number of nearest neighbors to select. Only used if discrete=False and neighborhood='nearest'

  • radius (float, default=1.0) – Radius to select neighbors. Only used if discrete=False and neighborhood='delta'

  • random_state (int, default=None) – Randomization seed.

  • population_size (int, default=100) – Size of the population for the Genetic Algorithm

  • n_iters (int, default=100) – Number of generations for the Genetic Algorithm
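The difference between the two neighborhood types can be sketched in a few lines of pure Python. This is an illustration only, not the library's code: it assumes Euclidean distance (Minkowski with p=2), assumes that a point belongs to its own neighborhood, and uses the hypothetical helper names delta_neighborhood and nearest_neighborhood.

```python
import math


def delta_neighborhood(X, i, radius):
    """Indices of all points whose distance to X[i] is <= radius."""
    return [j for j, x in enumerate(X) if math.dist(X[i], x) <= radius]


def nearest_neighborhood(X, i, n_neighbors):
    """Indices of the n_neighbors points closest to X[i] (X[i] included)."""
    order = sorted(range(len(X)), key=lambda j: math.dist(X[i], X[j]))
    return sorted(order[:n_neighbors])


X = [[0.0], [0.5], [2.0], [2.2], [5.0]]

print(delta_neighborhood(X, 0, radius=1.0))         # [0, 1]
print(nearest_neighborhood(X, 0, n_neighbors=3))    # [0, 1, 2]
print(delta_neighborhood(X, 4, radius=1.0))         # [4]
print(nearest_neighborhood(X, 4, n_neighbors=3))    # [2, 3, 4]
```

Note how the isolated point at 5.0 has an empty delta-neighborhood apart from itself, while the nearest-neighbor variant always returns a fixed number of points regardless of how far away they are.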

Variables
  • n_classes (int) – The number of unique classes in y

  • best_features (ndarray) – The unique most fit feature sets.

  • best_targets – The disambiguated targets corresponding to the most fit feature sets. Can be used for transductive learning or training a downstream model.

fit(X, y)

Fit the GeneticRoughSetSelector model
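As a rough illustration of how tournament_size and p_mutate could act during the search, the sketch below implements tournament selection over a fraction of the population and point mutation on a boolean feature mask. It is a generic Genetic Algorithm sketch under stated assumptions, not the code that fit runs: the function names are hypothetical, candidate solutions are assumed to be boolean masks, and fitness is a single number (as with method='lambda'); the multi-objective methods would need a Pareto-style comparison instead of max.

```python
import random


def tournament_select(population, fitness, tournament_size, rng):
    """Pick the fittest among a random fraction of the population."""
    # tournament_size is a proportion, so convert it to a contender count
    k = max(1, round(tournament_size * len(population)))
    contenders = rng.sample(range(len(population)), k)
    winner = max(contenders, key=lambda i: fitness[i])
    return population[winner]


def point_mutate(mask, p_mutate, rng):
    """Flip each bit of the feature mask independently with prob. p_mutate."""
    return [(not bit) if rng.random() < p_mutate else bit for bit in mask]


rng = random.Random(0)  # seeded for reproducibility, like random_state
population = [
    [True, False, True],
    [False, False, True],
    [True, True, True],
]
fitness = [0.6, 0.9, 0.4]

parent = tournament_select(population, fitness, tournament_size=1.0, rng=rng)
child = point_mutate(parent, p_mutate=0.1, rng=rng)
print(parent, child)
```

With tournament_size=1.0 every individual competes, so the fittest always wins; smaller values weaken the selection pressure and keep more diversity in the population.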

fit_transform(X, y)

Fit and then transform data

transform(X, y=None)

Transform the data by selecting one of the reducts found during fitting at random and keeping only its features. Only X is transformed; y is ignored.
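The random choice of a reduct can be pictured with the following pure-Python sketch. It is an assumption-laden illustration rather than the library's transform: the function name select_random_reduct is hypothetical, and the fitted feature sets are assumed to be stored as boolean masks over the columns of X, which may differ from the actual layout of best_features.

```python
import random


def select_random_reduct(X, best_features, rng):
    """Pick one feature mask at random and keep only its columns of X."""
    mask = rng.choice(best_features)
    columns = [j for j, keep in enumerate(mask) if keep]
    return [[row[j] for j in columns] for row in X]


X = [
    [1, 2, 3],
    [4, 5, 6],
]
# Two equally fit feature sets (assumed mask representation)
best_features = [
    [True, False, True],
    [False, True, False],
]

rng = random.Random(0)
X_reduced = select_random_reduct(X, best_features, rng)
print(X_reduced)
```

Because the reduct is drawn at random, repeated calls may return different column subsets unless the random generator is seeded.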