robustx.datasets.provided_datasets package
Submodules
robustx.datasets.provided_datasets.AdultDatasetLoader module
- class robustx.datasets.provided_datasets.AdultDatasetLoader.AdultDatasetLoader(seed=None)[source]
Bases:
ExampleDatasetLoader
- property X: DataFrame
Returns only the feature variables, as a pd.DataFrame.
- get_default_preprocessed_features()[source]
Returns a version of the dataset preprocessed with a default (standard) preprocessing pipeline.
- Return type:
DataFrame
- property y: Series
Returns only the target variable, as a pd.Series.
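A loader's X and y properties are usually read together. The sketch below is only an illustration: `split_features_target` is a hypothetical helper, not part of RobustX, and the import is guarded because the package may not be installed. The import path is taken from the module name documented above.

```python
try:
    # Import path taken from the module name documented above.
    from robustx.datasets.provided_datasets.AdultDatasetLoader import AdultDatasetLoader
except ImportError:
    AdultDatasetLoader = None  # RobustX is not installed in this environment


def split_features_target(loader):
    """Return (X, y) from any object exposing the documented X and y properties.

    Hypothetical helper for illustration; it is not a RobustX function.
    """
    X, y = loader.X, loader.y
    # Features and target should describe the same rows.
    if len(X) != len(y):
        raise ValueError(f"X has {len(X)} rows but y has {len(y)}")
    return X, y
```

With RobustX installed, `X, y = split_features_target(AdultDatasetLoader(seed=0))` should yield a DataFrame and a Series. That the constructor loads the data immediately is an assumption here, not something this page states.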
robustx.datasets.provided_datasets.ExampleDatasetLoader module
- class robustx.datasets.provided_datasets.ExampleDatasetLoader.ExampleDatasetLoader(categoricals, numericals, missing_val_num=nan, missing_val_cat=nan, seed=None)[source]
Bases:
DatasetLoader, ABC
An abstract extension of the DatasetLoader class that stores the example datasets provided with the library.
…
Attributes / Properties
- _categorical: list[str]
Stores the list of categorical column names
- _numerical: list[str]
Stores the list of numerical column names
- __missing_num: any
Value representing missing numerical data
- __missing_cat: any
Value representing missing categorical data
- categorical -> list[str]:
Returns the list of categorical features
- numerical -> list[str]:
Returns the list of numerical features
- get_default_preprocessed_features() -> pd.DataFrame:
Abstract method to get the default preprocessed dataset
- get_preprocessed_features() -> pd.DataFrame:
Returns the dataset preprocessed according to user specifications (imputing, scaling, encoding)
- default_preprocess() -> None:
Preprocesses and updates the dataset using the default preprocessing method
- property categorical
Returns the names of all categorical columns, as a list[str].
- default_preprocess()[source]
Replaces the data attribute with the version preprocessed by the default method. Returns None.
- abstract get_default_preprocessed_features()[source]
Returns a version of the dataset preprocessed with a default (standard) preprocessing pipeline.
- Return type:
DataFrame
- get_preprocessed_features(impute_strategy_numeric='mean', impute_strategy_categoric='most_frequent', fill_value_categoric=None, fill_value_numeric=None, scale_method='standard', encode_categorical=True, selected_features=None)[source]
Returns a preprocessed version of the dataset according to the options supplied by the user.
- Parameters:
impute_strategy_numeric: strategy for imputing missing numeric values ('mean', 'median')
impute_strategy_categoric: strategy for imputing missing categorical values ('most_frequent', 'constant')
fill_value_categoric: value to use with the constant imputing strategy for categorical features
fill_value_numeric: value to use with the constant imputing strategy for numerical features
scale_method: method for scaling numerical features ('standard', 'minmax', None)
encode_categorical: whether to encode categorical features (True/False)
selected_features: list of features to select; if None, all features are used
- Return type:
DataFrame
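The default arguments describe mean imputation, most-frequent imputation, standard scaling, and categorical encoding. The sketch below shows one way those steps could look in plain pandas on a toy frame. It is an illustration only and not RobustX's implementation. The column names and the `numerical` and `categorical` lists are made up, and both the population standard deviation (ddof=0) and one-hot encoding are assumptions about what 'standard' scaling and encoding mean here.

```python
import numpy as np
import pandas as pd

# Toy frame with one numerical and one categorical column, each with a gap.
df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "sex": ["f", "m", None],
})
numerical, categorical = ["age"], ["sex"]  # hypothetical column lists

out = df.copy()

# impute_strategy_numeric='mean': fill numeric gaps with the column mean.
out[numerical] = out[numerical].fillna(out[numerical].mean())

# impute_strategy_categoric='most_frequent': fill categorical gaps with the mode
# (ties are broken by taking the first mode pandas reports).
for col in categorical:
    out[col] = out[col].fillna(out[col].mode().iloc[0])

# scale_method='standard': centre to zero mean and divide by the population
# standard deviation (ddof=0 is an assumption of this sketch).
out[numerical] = (out[numerical] - out[numerical].mean()) / out[numerical].std(ddof=0)

# encode_categorical=True: one-hot encode the categorical columns.
out = pd.get_dummies(out, columns=categorical)

print(out)
```

The result has no missing values, a zero-mean `age` column, and one indicator column per category of `sex`.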
- property numerical: list[str]
Returns the names of all numerical columns, as a list[str].
- preprocess(impute_strategy_numeric='mean', impute_strategy_categoric='most_frequent', fill_value_categoric=None, fill_value_numeric=None, scale_method='standard', encode_categorical=True, selected_features=None)[source]
Replaces the data attribute with a version preprocessed according to the given parameters.
- Parameters:
impute_strategy_numeric: strategy for imputing missing numeric values ('mean', 'median')
impute_strategy_categoric: strategy for imputing missing categorical values ('most_frequent', 'constant')
fill_value_categoric: value to use with the constant imputing strategy for categorical features
fill_value_numeric: value to use with the constant imputing strategy for numerical features
scale_method: method for scaling numerical features ('standard', 'minmax', None)
encode_categorical: whether to encode categorical features (True/False)
selected_features: list of features to select; if None, all features are used
- Return type:
None
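The class pairs each method that returns a preprocessed copy with one that overwrites the data attribute, and it leaves get_default_preprocessed_features() abstract. The sketch below mimics that pattern with the standard abc module. `SketchExampleLoader` and `UppercaseLoader` are hypothetical stand-ins that use a list of dicts where the real loader holds a DataFrame; they are not the RobustX classes.

```python
from abc import ABC, abstractmethod


class SketchExampleLoader(ABC):
    """Hypothetical stand-in for ExampleDatasetLoader; not the RobustX class."""

    def __init__(self, rows):
        self.data = rows  # the real loader holds a pandas DataFrame

    @abstractmethod
    def get_default_preprocessed_features(self):
        """Return a preprocessed copy of the data; subclasses must implement this."""

    def default_preprocess(self):
        # Mirrors the documented behaviour: overwrite `data`, return None.
        self.data = self.get_default_preprocessed_features()


class UppercaseLoader(SketchExampleLoader):
    """Toy subclass whose 'default pipeline' just upper-cases string values."""

    def get_default_preprocessed_features(self):
        return [
            {k: v.upper() if isinstance(v, str) else v for k, v in row.items()}
            for row in self.data
        ]


loader = UppercaseLoader([{"sex": "female", "age": 29}])
preview = loader.get_default_preprocessed_features()  # a copy; loader.data is untouched
unchanged = loader.data[0]["sex"]                      # value before the in-place call
loader.default_preprocess()                            # loader.data is now replaced
```

Instantiating `SketchExampleLoader` directly raises TypeError, which is how an abstract base forces each dataset loader to supply its own default pipeline.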
robustx.datasets.provided_datasets.IonosphereDatasetLoader module
- class robustx.datasets.provided_datasets.IonosphereDatasetLoader.IonosphereDatasetLoader(seed=None)[source]
Bases:
ExampleDatasetLoader
A DataLoader class responsible for loading the Ionosphere dataset
- property X
Returns only the feature variables, as a pd.DataFrame.
- get_default_preprocessed_features()[source]
Returns a version of the dataset preprocessed with a default (standard) preprocessing pipeline, as a pd.DataFrame.
- property y: Series
Returns only the target variable, as a pd.Series.
robustx.datasets.provided_datasets.IrisDatasetLoader module
- class robustx.datasets.provided_datasets.IrisDatasetLoader.IrisDatasetLoader(seed=None)[source]
Bases:
ExampleDatasetLoader
A DataLoader class responsible for loading the Iris dataset
- property X
Returns only the feature variables, as a pd.DataFrame.
- get_default_preprocessed_features()[source]
Returns a version of the dataset preprocessed with a default (standard) preprocessing pipeline, as a pd.DataFrame.
- property y
Returns only the target variable, as a pd.Series.
robustx.datasets.provided_datasets.TitanicDatasetLoader module
- class robustx.datasets.provided_datasets.TitanicDatasetLoader.TitanicDatasetLoader(seed)[source]
Bases:
ExampleDatasetLoader
- property X: DataFrame
Returns only the feature variables, as a pd.DataFrame.
- get_default_preprocessed_features()[source]
Returns a version of the dataset preprocessed with a default (standard) preprocessing pipeline.
- Return type:
DataFrame
- property y: Series
Returns only the target variable, as a pd.Series.