robustx.datasets.provided_datasets package

Submodules

robustx.datasets.provided_datasets.AdultDatasetLoader module

class robustx.datasets.provided_datasets.AdultDatasetLoader.AdultDatasetLoader(seed=None)

Bases: ExampleDatasetLoader

property X: DataFrame

Returns only the feature variables, as a pd.DataFrame.

get_default_preprocessed_features()

Returns a preprocessed version of the dataset (a pd.DataFrame) produced by a default/standard preprocessing pipeline.

Return type:

DataFrame

load_data()

Loads the dataset into the data attribute. Returns None.

property y: Series

Returns only the target variable, as a pd.Series.
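The members documented above suggest a simple access pattern: call load_data(), then read X and y. The following is a minimal sketch of that pattern, not code taken from the library. The helper name load_features_and_target and the stub class are hypothetical, and the sketch assumes that calling load_data() before reading X and y is safe. The commented lines show how the helper might be used with the real class when robustx is installed.

```python
def load_features_and_target(loader):
    """Return (features, target) from any object with load_data(), X and y."""
    loader.load_data()          # documented: loads data into the data attribute
    return loader.X, loader.y   # documented properties: features and target


# Possible usage with the real class (requires robustx to be installed):
# from robustx.datasets.provided_datasets.AdultDatasetLoader import AdultDatasetLoader
# X, y = load_features_and_target(AdultDatasetLoader(seed=0))


class _StubLoader:
    """Hypothetical stand-in so the helper can run without robustx."""

    def __init__(self):
        self._rows = []

    def load_data(self):
        # Made-up rows; the column names are illustrative only.
        self._rows = [{"f1": 1.0, "label": 0}, {"f1": 2.0, "label": 1}]

    @property
    def X(self):
        return [{"f1": row["f1"]} for row in self._rows]

    @property
    def y(self):
        return [row["label"] for row in self._rows]


X, y = load_features_and_target(_StubLoader())
```

With the real loaders, X and y would be a pandas DataFrame and Series rather than the plain lists used by the stub.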

robustx.datasets.provided_datasets.ExampleDatasetLoader module

class robustx.datasets.provided_datasets.ExampleDatasetLoader.ExampleDatasetLoader(categoricals, numericals, missing_val_num=nan, missing_val_cat=nan, seed=None)

Bases: DatasetLoader, ABC

An abstract extension of the DatasetLoader class that stores the example datasets provided with the library

Attributes / Properties

_categorical: list[str]

Stores the list of categorical column names

_numerical: list[str]

Stores the list of numerical column names

__missing_num: any

Value representing missing numerical data

__missing_cat: any

Value representing missing categorical data


categorical -> list[str]:

Returns the list of categorical features

numerical -> list[str]:

Returns the list of numerical features

load_data() -> None:

Abstract method to load data into the dataset

get_default_preprocessed_features() -> pd.DataFrame:

Abstract method to get the default preprocessed dataset

get_preprocessed_features() -> pd.DataFrame:

Returns the dataset preprocessed according to user specifications (imputing, scaling, encoding)

default_preprocess() -> None:

Preprocesses and updates the dataset using the default preprocessing method

preprocess() -> None:

Preprocesses and updates the dataset based on user-provided parameters
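The split between abstract and concrete members described above can be sketched in plain Python. This is an illustration of the pattern only, not the library's source: SketchExampleLoader and ToyLoader are hypothetical names, rows are plain dicts instead of a DataFrame, and the assumption that default_preprocess() simply stores the result of get_default_preprocessed_features() is a guess based on the descriptions.

```python
from abc import ABC, abstractmethod


class SketchExampleLoader(ABC):
    """Hypothetical stand-in mirroring the shape of the documented interface."""

    def __init__(self, categoricals, numericals):
        self._categorical = list(categoricals)
        self._numerical = list(numericals)
        self.data = None

    @property
    def categorical(self):
        return self._categorical

    @property
    def numerical(self):
        return self._numerical

    @abstractmethod
    def load_data(self):
        """Subclasses fill self.data here."""

    @abstractmethod
    def get_default_preprocessed_features(self):
        """Subclasses return a preprocessed copy of the data."""

    def default_preprocess(self):
        # Assumption: overwrite the stored data with the default-preprocessed version.
        self.data = self.get_default_preprocessed_features()


class ToyLoader(SketchExampleLoader):
    def __init__(self):
        super().__init__(categoricals=["colour"], numericals=["size"])

    def load_data(self):
        self.data = [{"colour": "red", "size": 2.0}, {"colour": "blue", "size": 4.0}]

    def get_default_preprocessed_features(self):
        # Toy "pipeline": divide the numerical column by its maximum.
        top = max(row["size"] for row in self.data)
        return [{**row, "size": row["size"] / top} for row in self.data]


loader = ToyLoader()
loader.load_data()
loader.default_preprocess()
```

Because the base class declares abstract methods, instantiating SketchExampleLoader directly raises TypeError; only subclasses that implement both methods can be created.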

property categorical

Returns all categorical column names, as a list[str].

default_preprocess()

Replaces the data attribute with the result of the default preprocessing method. Returns None.

abstract get_default_preprocessed_features()

Returns a preprocessed version of the dataset (a pd.DataFrame) produced by a default/standard preprocessing pipeline.

Return type:

DataFrame

get_preprocessed_features(impute_strategy_numeric='mean', impute_strategy_categoric='most_frequent', fill_value_categoric=None, fill_value_numeric=None, scale_method='standard', encode_categorical=True, selected_features=None)

Returns a preprocessed version of the dataset (a pd.DataFrame) based on the options the user passes in.

Parameters:

impute_strategy_numeric: strategy for imputing missing numeric values ('mean', 'median')
impute_strategy_categoric: strategy for imputing missing categoric values ('most_frequent', 'constant')
fill_value_categoric: value to use for the constant imputing strategy for categorical features
fill_value_numeric: value to use for the constant imputing strategy for numerical features
scale_method: method for scaling numerical features ('standard', 'minmax', None)
encode_categorical: whether to encode categorical features (True/False)
selected_features: list of features to select; if None, all features are used

Return type:

DataFrame
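To make the imputing, scaling and encoding options concrete, here is a small standard-library sketch of what such options commonly mean. It is an assumption-laden illustration, not the library's implementation: all function names are hypothetical, columns are plain lists, missing numeric values are float NaN, missing categorical values are None, and encoding is taken to mean one-hot encoding.

```python
import math
from collections import Counter
from statistics import mean, median, pstdev


def impute_numeric(values, strategy="mean"):
    """Fill NaN entries with the mean or median of the observed values."""
    observed = [v for v in values if not math.isnan(v)]
    fill = {"mean": mean, "median": median}[strategy](observed)
    return [fill if math.isnan(v) else v for v in values]


def impute_categorical(values, strategy="most_frequent", fill_value=None):
    """Fill None entries with the most frequent value or a constant."""
    if strategy == "most_frequent":
        observed = [v for v in values if v is not None]
        fill = Counter(observed).most_common(1)[0][0]
    elif strategy == "constant":
        fill = fill_value
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def scale(values, method="standard"):
    """Standard scaling (zero mean, unit variance), min-max scaling, or none."""
    if method is None:
        return list(values)
    if method == "standard":
        mu, sigma = mean(values), pstdev(values) or 1.0
        return [(v - mu) / sigma for v in values]
    if method == "minmax":
        lo, span = min(values), (max(values) - min(values)) or 1.0
        return [(v - lo) / span for v in values]
    raise ValueError(f"unknown method: {method}")


def one_hot(values):
    """Map each category to a 0/1 indicator column."""
    return {c: [1 if v == c else 0 for v in values] for c in sorted(set(values))}


ages = scale(impute_numeric([20.0, float("nan"), 40.0]), method="minmax")
towns = one_hot(impute_categorical(["york", None, "york", "bath"]))
```

Here the missing age is filled with the mean of the observed values before scaling, and the missing town is filled with the most frequent category before encoding.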

abstract load_data()

Loads the dataset into the data attribute. Returns None.

property numerical: list[str]

Returns all numerical column names, as a list[str].

preprocess(impute_strategy_numeric='mean', impute_strategy_categoric='most_frequent', fill_value_categoric=None, fill_value_numeric=None, scale_method='standard', encode_categorical=True, selected_features=None)

Updates the data attribute with a version preprocessed according to the parameters.

Parameters:

impute_strategy_numeric: strategy for imputing missing numeric values ('mean', 'median')
impute_strategy_categoric: strategy for imputing missing categoric values ('most_frequent', 'constant')
fill_value_categoric: value to use for the constant imputing strategy for categorical features
fill_value_numeric: value to use for the constant imputing strategy for numerical features
scale_method: method for scaling numerical features ('standard', 'minmax', None)
encode_categorical: whether to encode categorical features (True/False)
selected_features: list of features to select; if None, all features are used

Returns:

None
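The practical difference between get_preprocessed_features(), which returns a result, and preprocess(), which changes the data attribute, can be illustrated with a toy class. This is a hedged sketch: ToyData, get_selected and select are hypothetical names, and only the selected_features behaviour (None keeps every feature) is modelled.

```python
class ToyData:
    """Hypothetical container contrasting a returning and a mutating method."""

    def __init__(self, rows):
        self.data = rows

    def get_selected(self, selected_features=None):
        # None means "keep every feature", matching the documented default.
        if selected_features is None:
            return [dict(row) for row in self.data]
        return [{k: row[k] for k in selected_features} for row in self.data]

    def select(self, selected_features=None):
        # Mutating variant: overwrite the stored data with the selection.
        self.data = self.get_selected(selected_features)


toy = ToyData([{"a": 1, "b": 2}, {"a": 3, "b": 4}])
view = toy.get_selected(["a"])   # new structure; toy.data is untouched
before = toy.data                # still holds both columns
toy.select(["a"])                # toy.data now holds only column "a"
```

The returning variant suits one-off experiments with different settings, while the mutating variant suits a pipeline in which later steps read the data attribute.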

robustx.datasets.provided_datasets.IonosphereDatasetLoader module

class robustx.datasets.provided_datasets.IonosphereDatasetLoader.IonosphereDatasetLoader(seed=None)

Bases: ExampleDatasetLoader

A DataLoader class responsible for loading the Ionosphere dataset

property X

Returns only the feature variables, as a pd.DataFrame.

get_default_preprocessed_features()

Returns a preprocessed version of the dataset (a pd.DataFrame) produced by a default/standard preprocessing pipeline.

load_data()

Loads the dataset into the data attribute. Returns None.

property y: Series

Returns only the target variable, as a pd.Series.

robustx.datasets.provided_datasets.IrisDatasetLoader module

class robustx.datasets.provided_datasets.IrisDatasetLoader.IrisDatasetLoader(seed=None)

Bases: ExampleDatasetLoader

A DataLoader class responsible for loading the Iris dataset

property X

Returns only the feature variables, as a pd.DataFrame.

get_default_preprocessed_features()

Returns a preprocessed version of the dataset (a pd.DataFrame) produced by a default/standard preprocessing pipeline.

load_data()

Loads the dataset into the data attribute. Returns None.

property y

Returns only the target variable, as a pd.Series.

robustx.datasets.provided_datasets.TitanicDatasetLoader module

class robustx.datasets.provided_datasets.TitanicDatasetLoader.TitanicDatasetLoader(seed)

Bases: ExampleDatasetLoader

property X: DataFrame

Returns only the feature variables, as a pd.DataFrame.

get_default_preprocessed_features()

Returns a preprocessed version of the dataset (a pd.DataFrame) produced by a default/standard preprocessing pipeline.

Return type:

DataFrame

get_feature_names(preprocessor, categorical_features, numerical_features)

load_data()

Loads the dataset into the data attribute. Returns None.

property y: Series

Returns only the target variable, as a pd.Series.

Module contents