GeneticRoughSetSelector

class scikit_weak.feature_selection.GeneticRoughSetSelector(epsilon=0.0, method='conservative', discrete=False, l=0.5, tournament_size=0.1, p_mutate=0.1, metric='minkowski', neighborhood='nearest', n_neighbors=3, radius=1.0, random_state=None, population_size=100, n_iters=100)

Performs Rough Set-based feature selection by using a Genetic Algorithm to search for reducts. The y input to the fit method should be given as an iterable of DiscreteWeakLabel. Supports both discrete datasets (using Pawlak Rough Sets) and continuous datasets (using Neighborhood Rough Sets).
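To make the distinction between the two kinds of Rough Sets concrete, here is a minimal sketch of the granule built around one instance in each case. It is illustrative only and is not the library's code: the helper names are hypothetical, Euclidean distance is assumed for the continuous case, and counting the instance itself among its own nearest neighbors is also an assumption.

```python
import math

def equivalence_class(X, i):
    """Pawlak granule: indices of rows identical to row i (discrete data)."""
    return {j for j, row in enumerate(X) if row == X[i]}

def delta_neighborhood(X, i, radius=1.0):
    """Delta granule: indices of rows at distance <= radius from row i."""
    return {j for j, row in enumerate(X) if math.dist(row, X[i]) <= radius}

def nearest_neighborhood(X, i, n_neighbors=3):
    """k-nearest granule: indices of the n_neighbors rows closest to row i.

    Row i itself (distance 0) is counted here; this is an assumption of the
    sketch, not documented behavior. Ties are broken by row index.
    """
    order = sorted(range(len(X)), key=lambda j: (math.dist(X[j], X[i]), j))
    return set(order[:n_neighbors])

X_discrete = [(0, 1), (0, 1), (1, 0), (1, 1)]
X_continuous = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.2, 0.0)]

eq = equivalence_class(X_discrete, 0)                       # rows equal to row 0
delta = delta_neighborhood(X_continuous, 0, radius=1.0)     # rows within 1.0 of row 0
knn = nearest_neighborhood(X_continuous, 2, n_neighbors=2)  # 2 closest rows to row 2
print(eq, delta, knn)
```

In both settings, a feature subset is judged by how well the granules it induces agree with the (weak) labels; only the way granules are formed differs.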

Parameters

epsilon (float, default=0.0) – The approximation factor. Should be a number in [0.0, 1.0), i.e. 1.0 is excluded.
method ({'lambda', 'conservative', 'dominance'}, default='conservative') – The method used to compute the fitness. If 'lambda', the algorithm solves a single-objective optimization problem. If 'conservative' or 'dominance', it solves a multi-objective optimization problem: a 2-objective problem for 'conservative', and an (n+1)-objective problem for 'dominance', where n is the number of instances.
discrete (bool, default=False) – Whether the input X is discrete or not. If discrete=True, use equivalence-based (i.e. Pawlak) Rough Sets. If discrete=False, use neighborhood-based Rough Sets.
l (float in [0,1], default=0.5) – Lambda interpolation factor. Only used if method='lambda'.
tournament_size (float in [0,1], default=0.1) – Proportion of the population sampled for each tournament in tournament selection.
p_mutate (float in [0,1], default=0.1) – Probability of point mutation.
metric (string or function, default='minkowski') – Metric to be used with neighborhood-based Rough Sets. Only used if discrete=False. If discrete=True, the 'hamming' metric is used instead.
neighborhood ({'delta', 'nearest'}, default='nearest') – Type of neighborhood-based Rough Sets to be used. If neighborhood='delta', use delta-neighborhood Rough Sets: all neighbors with distance <= radius are selected. If neighborhood='nearest', use k-nearest-neighbors Rough Sets: only the k nearest neighbors are selected. Only used if discrete=False.
n_neighbors (int, default=3) – Number of nearest neighbors to select. Only used if discrete=False and neighborhood='nearest'.
radius (float, default=1.0) – Radius to select neighbors. Only used if discrete=False and neighborhood='delta'.
random_state (int, default=None) – Randomization seed.
population_size (int, default=100) – Size of the population for the Genetic Algorithm
n_iters (int, default=100) – Number of generations for the Genetic Algorithm
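
As a rough illustration of how tournament_size and p_mutate could act on a population of candidate feature subsets, the sketch below encodes each candidate as a 0/1 mask over the features. It is a generic Genetic Algorithm step under stated assumptions, not the library's implementation: the helper names, the toy fitness, and the choice to flip each bit independently with probability p_mutate are all hypothetical.

```python
import random

def tournament_select(population, fitness, tournament_size, rng):
    """Return the fittest individual among a random sample of the population.

    tournament_size is a proportion of the population, as in the parameter
    description; at least one individual is always sampled.
    """
    k = max(1, int(round(tournament_size * len(population))))
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda idx: fitness[idx])
    return population[best]

def point_mutate(mask, p_mutate, rng):
    """Flip each bit of a feature mask independently with probability p_mutate.

    Per-bit flipping is an assumption of this sketch.
    """
    return [bit ^ 1 if rng.random() < p_mutate else bit for bit in mask]

rng = random.Random(0)  # plays the role of random_state
n_features, population_size = 5, 10
population = [[rng.randint(0, 1) for _ in range(n_features)]
              for _ in range(population_size)]

# Toy single-objective fitness (hypothetical): prefer fewer selected features.
fitness = [-sum(mask) for mask in population]

parent = tournament_select(population, fitness, tournament_size=0.3, rng=rng)
child = point_mutate(parent, p_mutate=0.1, rng=rng)
print(parent, child)
```

A larger tournament_size increases selection pressure, since the best individual is more likely to be among the contenders; a larger p_mutate increases exploration at the cost of disrupting good candidates.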

Variables

n_classes (int) – The number of unique classes in y
best_features (ndarray) – The unique most fit feature sets.
best_targets – The disambiguated targets corresponding to the most fit feature sets. Can be used for transductive learning or for training a downstream model.

fit(X, y): Fit the GeneticRoughSetSelector model

fit_transform(X, y): Fit and then transform data

transform(X, y=None): Transform the data by selecting a reduct at random. Only X is used; y is ignored.
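
The effect of picking a reduct at random can be sketched as follows. This is a stand-in written for illustration, not the library's code: it assumes the candidate feature sets are available as boolean masks over the columns of X, which may differ from how best_features is actually stored, and the function name is hypothetical.

```python
import random

def transform_with_random_reduct(X, best_features, rng):
    """Keep only the columns of one candidate feature set, chosen at random.

    best_features is assumed to be a list of boolean masks, one per
    candidate feature set (an assumption of this sketch).
    """
    mask = rng.choice(best_features)
    selected = [j for j, keep in enumerate(mask) if keep]
    X_reduced = [[row[j] for j in selected] for row in X]
    return X_reduced, selected

X = [[1, 2, 3], [4, 5, 6]]
best_features = [[True, False, True], [False, True, True]]

X_reduced, selected = transform_with_random_reduct(
    X, best_features, random.Random(0)
)
print(selected, X_reduced)
```

When several feature sets are equally fit, the selected columns depend on the random draw, so fixing the seed is the natural way to keep the output reproducible in a sketch like this one.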