GeneticRoughSetSelector
- class scikit_weak.feature_selection.GeneticRoughSetSelector(epsilon=0.0, method='conservative', discrete=False, l=0.5, tournament_size=0.1, p_mutate=0.1, metric='minkowski', neighborhood='nearest', n_neighbors=3, radius=1.0, random_state=None, population_size=100, n_iters=100)
A class that performs Rough Set-based feature selection by searching for reducts with a Genetic Algorithm. The y input to the fit method should be given as an iterable of DiscreteWeakLabel. Supports both discrete datasets (using Pawlak Rough Sets) and continuous datasets (using Neighborhood Rough Sets).
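To make the two rough-set models concrete, here is a minimal sketch of how each one can judge whether a feature subset is consistent with the labels. It assumes crisp labels rather than DiscreteWeakLabel objects, Euclidean distance, and k-nearest neighborhoods that exclude the instance itself. The function names are hypothetical and this is not the library's implementation. In rough set terms, a reduct is a minimal feature subset that preserves the consistency of the full feature set.

```python
import numpy as np

def pawlak_consistency(X, y, features):
    """Fraction of instances in the Pawlak positive region.

    An instance is in the positive region if every instance sharing its
    values on `features` (its equivalence class) has the same label.
    Assumes crisp labels y, not DiscreteWeakLabel objects.
    """
    X_sub = np.asarray(X)[:, features]
    y = np.asarray(y)
    # Group rows into equivalence classes by their values on the selected features
    _, class_ids = np.unique(X_sub, axis=0, return_inverse=True)
    class_ids = class_ids.ravel()  # shape differs across NumPy versions
    consistent = np.zeros(len(y), dtype=bool)
    for c in np.unique(class_ids):
        members = class_ids == c
        # A class is consistent only if all its members share a single label
        if len(np.unique(y[members])) == 1:
            consistent[members] = True
    return consistent.mean()

def knn_consistency(X, y, features, n_neighbors=3):
    """Neighborhood analogue: an instance is consistent if its n_neighbors
    nearest neighbors (Euclidean distance on `features`, self excluded)
    all share its label."""
    X_sub = np.asarray(X, dtype=float)[:, features]
    y = np.asarray(y)
    # Pairwise Euclidean distances between instances on the selected features
    dists = np.linalg.norm(X_sub[:, None, :] - X_sub[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude each instance from its own neighborhood
    nearest = np.argsort(dists, axis=1)[:, :n_neighbors]
    return np.mean(np.all(y[nearest] == y[:, None], axis=1))
```

On a toy dataset where only the first feature separates the classes, both measures score that feature at 1.0 and the other feature at 0.0.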
- Parameters
epsilon (float, default=0.0) – The approximation factor. Should be a number in [0.0, 1.0), i.e. 1.0 is excluded.
method ({'lambda', 'conservative', 'dominance'}, default='conservative') – The method used to compute the fitness. If ‘lambda’, the algorithm solves a single-objective optimization problem. If ‘conservative’ or ‘dominance’, it solves a multi-objective optimization problem: a 2-objective problem for ‘conservative’ and an (n+1)-objective problem for ‘dominance’, where n is the number of instances.
discrete (bool, default=False) – Whether the input X is discrete. If discrete=True, use equivalence-based (i.e. Pawlak) Rough Sets. If discrete=False, use neighborhood-based Rough Sets.
l (float in [0,1], default=0.5) – Lambda interpolation factor. Only used if method=‘lambda’.
tournament_size (float in [0,1], default=0.1) – Proportion of the population sampled in each round of tournament selection.
p_mutate (float in [0,1], default=0.1) – Probability of point mutation.
metric (string or function, default='minkowski') – Metric used by neighborhood-based Rough Sets. Only used if discrete=False; if discrete=True, metric=‘hamming’ is used instead.
neighborhood ({'delta', 'nearest'}, default='nearest') – Type of neighborhood-based Rough Sets to use. If neighborhood=‘delta’, use delta-neighborhood Rough Sets: all neighbors with distance <= radius are selected. If neighborhood=‘nearest’, use k-nearest-neighbor Rough Sets: only the n_neighbors nearest neighbors are selected. Only used if discrete=False.
n_neighbors (int, default=3) – Number of nearest neighbors to select. Only used if discrete=False and neighborhood=‘nearest’.
radius (float, default=1.0) – Radius used to select neighbors. Only used if discrete=False and neighborhood=‘delta’.
random_state (int, default=None) – Randomization seed for the stochastic genetic search.
population_size (int, default=100) – Size of the population for the Genetic Algorithm
n_iters (int, default=100) – Number of generations for the Genetic Algorithm
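The Genetic Algorithm parameters (population_size, n_iters, tournament_size, p_mutate, l) can be illustrated on feature subsets encoded as boolean masks. The sketch below is a hedged illustration, not the library's code: the mask encoding, the selection-plus-mutation generation step (no crossover), the lambda scalarization formula, and the objectives one would pass to `dominates` are all assumptions, and every function name is hypothetical.

```python
import numpy as np

def tournament_select(population, fitness, tournament_size, rng):
    """Pick one parent: sample a fraction `tournament_size` of the population
    and return the fittest sampled individual (higher fitness is better)."""
    k = max(1, int(round(tournament_size * len(population))))
    contenders = rng.choice(len(population), size=k, replace=False)
    winner = contenders[np.argmax(fitness[contenders])]
    return population[winner]

def point_mutate(mask, p_mutate, rng):
    """Flip each bit of a boolean feature mask independently with probability p_mutate."""
    flips = rng.random(mask.shape) < p_mutate
    return mask ^ flips

def next_generation(population, fitness, tournament_size, p_mutate, rng):
    """One generation: tournament-select each child, then point-mutate it."""
    children = [
        point_mutate(tournament_select(population, fitness, tournament_size, rng), p_mutate, rng)
        for _ in range(len(population))
    ]
    return np.array(children)

def lambda_fitness(consistency, mask, l):
    """Hypothetical scalarization for method='lambda': interpolate between
    a consistency score and the fraction of discarded features."""
    return l * consistency + (1 - l) * (1 - mask.mean())

def dominates(a, b):
    """Pareto dominance for objectives to maximize: a is no worse on every
    objective and strictly better on at least one."""
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a >= b) and np.any(a > b))
```

A full run would start from population_size random masks and call next_generation n_iters times, scoring each mask with a consistency measure. Under method='conservative' or 'dominance', individuals would instead be compared by Pareto dominance on objective vectors (as in `dominates`) rather than ranked by a single scalar.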
- Variables
n_classes (int) – The number of unique classes in y
best_features (ndarray) – The distinct feature sets with the highest fitness.
best_targets – The disambiguated targets corresponding to the fittest feature sets. Can be used for transductive learning or to train a downstream model.
- fit(X, y)
Fit the GeneticRoughSetSelector model
- fit_transform(X, y)
Fit and then transform data
- transform(X, y=None)
Transform the data by keeping only the features of a reduct selected at random. Only X is transformed; y is ignored.
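A minimal sketch of the random-reduct transform, assuming (hypothetically) that best_features holds one boolean feature mask per row. The helper name is illustrative only and does not come from the library.

```python
import numpy as np

def transform_with_random_reduct(X, best_features, rng):
    """Keep only the columns of one reduct drawn uniformly at random.

    Assumes best_features is a 2-D boolean array with one feature mask per
    row; this layout is an assumption, not documented behavior.
    """
    masks = np.atleast_2d(np.asarray(best_features, dtype=bool))
    chosen = masks[rng.integers(len(masks))]  # pick one reduct at random
    return np.asarray(X)[:, chosen]           # select that reduct's columns
```

If several reducts were found, repeated calls can return different column subsets, so a seeded generator helps with reproducibility.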