robustx.datasets.provided_datasets package

Submodules

robustx.datasets.provided_datasets.AdultDatasetLoader module

class robustx.datasets.provided_datasets.AdultDatasetLoader.AdultDatasetLoader(seed=None)[source]

Bases: ExampleDatasetLoader

property X: DataFrame: Returns only feature variables as DataFrame @return: pd.DataFrame

get_default_preprocessed_features()[source]

Returns a preprocessed version of the dataset by using a default/standard preprocessing pipeline @return: pd.DataFrame

Return type: DataFrame

load_data()[source]: Loads data into data attribute @return: None

property y: Series: Returns only target variable as Series @return: pd.Series

robustx.datasets.provided_datasets.ExampleDatasetLoader module

class robustx.datasets.provided_datasets.ExampleDatasetLoader.ExampleDatasetLoader(categoricals, numericals, missing_val_num=nan, missing_val_cat=nan, seed=None)[source]

Bases: DatasetLoader, ABC

An abstract extension of the DatasetLoader class that stores the example datasets provided with the library.

…

Attributes / Properties

_categorical: list[str]: Stores the list of categorical column names
_numerical: list[str]: Stores the list of numerical column names
__missing_num: any: Value representing missing numerical data
__missing_cat: any: Value representing missing categorical data

categorical -> list[str]: Returns the list of categorical features

numerical -> list[str]: Returns the list of numerical features

load_data() -> None: Abstract method to load data into the dataset

get_default_preprocessed_features() -> pd.DataFrame: Abstract method to get the default preprocessed dataset

get_preprocessed_features() -> pd.DataFrame: Returns the dataset preprocessed according to user specifications (imputing, scaling, encoding)

default_preprocess() -> None: Preprocesses and updates the dataset using the default preprocessing method

preprocess() -> None: Preprocesses and updates the dataset based on user-provided parameters
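The split described above, where the get_*_features() methods return a preprocessed copy and the *preprocess() methods update the data attribute, can be sketched in plain Python. This is only an illustration under stated assumptions: SketchLoader and TinyLoader are hypothetical names, lists of dicts stand in for the pandas DataFrames the library works with, and the real ExampleDatasetLoader may connect these methods differently.

```python
from abc import ABC, abstractmethod


class SketchLoader(ABC):
    """Hypothetical stand-in for an abstract example-dataset loader."""

    def __init__(self, categoricals, numericals):
        self._categorical = list(categoricals)
        self._numerical = list(numericals)
        self.data = None  # populated by load_data()

    @property
    def categorical(self):
        return self._categorical

    @property
    def numerical(self):
        return self._numerical

    @abstractmethod
    def load_data(self):
        """Load raw rows into self.data."""

    @abstractmethod
    def get_default_preprocessed_features(self):
        """Return a preprocessed copy without modifying self.data."""

    def default_preprocess(self):
        # In-place variant (assumed wiring): overwrite self.data
        # with the preprocessed copy.
        self.data = self.get_default_preprocessed_features()


class TinyLoader(SketchLoader):
    """Concrete toy subclass with two hard-coded rows."""

    def __init__(self):
        super().__init__(categoricals=["colour"], numericals=["size"])

    def load_data(self):
        self.data = [
            {"colour": "Red", "size": 1.0},
            {"colour": "blue", "size": 3.0},
        ]

    def get_default_preprocessed_features(self):
        # Toy "pipeline": lower-case the categorical column only.
        return [{**row, "colour": row["colour"].lower()} for row in self.data]


loader = TinyLoader()
loader.load_data()
preview = loader.get_default_preprocessed_features()  # loader.data is unchanged here
raw_first_colour = loader.data[0]["colour"]
loader.default_preprocess()                           # loader.data is now replaced
```

Because the two abstract methods are left unimplemented in SketchLoader, instantiating it directly raises TypeError; only concrete subclasses such as the toy TinyLoader can be created.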


property categorical: Returns all categorical column names @return: list[str]

default_preprocess()[source]: Changes the data attribute to be preprocessed using the default method @return: None

abstract get_default_preprocessed_features()[source]

Returns a preprocessed version of the dataset by using a default/standard preprocessing pipeline @return: pd.DataFrame

Return type: DataFrame

get_preprocessed_features(impute_strategy_numeric='mean', impute_strategy_categoric='most_frequent', fill_value_categoric=None, fill_value_numeric=None, scale_method='standard', encode_categorical=True, selected_features=None)[source]

Returns a preprocessed version of the dataset based on what the user inputs

@param impute_strategy_numeric: strategy for imputing missing numeric values ('mean', 'median')
@param impute_strategy_categoric: strategy for imputing missing categoric values ('most_frequent', 'constant')
@param fill_value_categoric: value to use for constant imputing strategy for categorical features
@param fill_value_numeric: value to use for constant imputing strategy for numerical features
@param scale_method: method for scaling numerical features ('standard', 'minmax', None)
@param encode_categorical: whether to encode categorical features (True/False)
@param selected_features: list of features to select, if None all features are used
@return: pd.DataFrame

Return type: DataFrame
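To make the option names concrete, the following dependency-free sketch shows what 'mean' imputation, 'standard' scaling, and categorical encoding conventionally mean. It is an assumption-laden illustration, not the library's implementation: the helper names are hypothetical, plain lists replace DataFrame columns, standard scaling is taken to use the population standard deviation, and encoding is taken to be one-hot, none of which is confirmed by the documentation above.

```python
import math


def impute_mean(values):
    """Replace NaN entries with the mean of the observed entries."""
    observed = [v for v in values if not math.isnan(v)]
    mean = sum(observed) / len(observed)
    return [mean if math.isnan(v) else v for v in values]


def standard_scale(values):
    """Centre to zero mean and divide by the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def one_hot(values):
    """Encode each category as a dict of 0/1 indicator columns."""
    categories = sorted(set(values))
    return [{c: int(v == c) for c in categories} for v in values]


ages = [20.0, math.nan, 40.0]      # one missing numeric value
imputed = impute_mean(ages)        # the gap is filled with the mean of 20 and 40
scaled = standard_scale(imputed)   # zero mean, unit variance
encoded = one_hot(["a", "b", "a"])  # one indicator column per category
```

Imputation runs before scaling in this sketch so that the scaler never sees a missing value; whether the library applies the steps in the same order is not stated in the documentation.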

abstract load_data()[source]: Loads data into data attribute @return: None

property numerical: list[str]: Returns all numerical column names @return: list[str]

preprocess(impute_strategy_numeric='mean', impute_strategy_categoric='most_frequent', fill_value_categoric=None, fill_value_numeric=None, scale_method='standard', encode_categorical=True, selected_features=None)[source]

Changes the data attribute to be preprocessed based on parameters

@param impute_strategy_numeric: strategy for imputing missing numeric values ('mean', 'median')
@param impute_strategy_categoric: strategy for imputing missing categoric values ('most_frequent', 'constant')
@param fill_value_categoric: value to use for constant imputing strategy for categorical features
@param fill_value_numeric: value to use for constant imputing strategy for numerical features
@param scale_method: method for scaling numerical features ('standard', 'minmax', None)
@param encode_categorical: whether to encode categorical features (True/False)
@param selected_features: list of features to select, if None all features are used
@return: None

robustx.datasets.provided_datasets.IonosphereDatasetLoader module

class robustx.datasets.provided_datasets.IonosphereDatasetLoader.IonosphereDatasetLoader(seed=None)[source]

Bases: ExampleDatasetLoader

A DataLoader class responsible for loading the Ionosphere dataset

property X: Returns only feature variables as DataFrame @return: pd.DataFrame

get_default_preprocessed_features()[source]: Returns a preprocessed version of the dataset by using a default/standard preprocessing pipeline @return: pd.DataFrame

load_data()[source]: Loads data into data attribute @return: None

property y: Series: Returns only target variable as Series @return: pd.Series

robustx.datasets.provided_datasets.IrisDatasetLoader module

class robustx.datasets.provided_datasets.IrisDatasetLoader.IrisDatasetLoader(seed=None)[source]

Bases: ExampleDatasetLoader

A DataLoader class responsible for loading the Iris dataset

property X: Returns only feature variables as DataFrame @return: pd.DataFrame

get_default_preprocessed_features()[source]: Returns a preprocessed version of the dataset by using a default/standard preprocessing pipeline @return: pd.DataFrame

load_data()[source]: Loads data into data attribute @return: None

property y: Returns only target variable as Series @return: pd.Series

robustx.datasets.provided_datasets.TitanicDatasetLoader module

class robustx.datasets.provided_datasets.TitanicDatasetLoader.TitanicDatasetLoader(seed)[source]

Bases: ExampleDatasetLoader

property X: DataFrame: Returns only feature variables as DataFrame @return: pd.DataFrame

get_default_preprocessed_features()[source]

Returns a preprocessed version of the dataset by using a default/standard preprocessing pipeline @return: pd.DataFrame

Return type: DataFrame

get_feature_names(preprocessor, categorical_features, numerical_features)[source]

load_data()[source]: Loads data into data attribute @return: None

property y: Series: Returns only target variable as Series @return: pd.Series
