frflib.utils.ml_analysis.processing

Module for generating preprocessing pipelines with sklearndf.

Module Contents

Functions

make_static_list(→ list)

Create a list to store the preprocessing transformations we want to apply to each column.

make_dict_preprocessing(→ dict)

Create a dict with the name of transformation to apply to categorical and numerical columns.

make_pipeline(→ sklearn.compose.ColumnTransformer)

Combine the numerical pipeline and the categorical pipeline into the final pipeline that handles data preprocessing.

frflib.utils.ml_analysis.processing.make_static_list(df: pandas.DataFrame) → list

Create a list to store the preprocessing transformations we want to apply to each column.

Parameters:

df (pd.DataFrame) – Input DataFrame whose columns are to be preprocessed

Returns:

List covering all columns, giving the transformation to apply to each column

Return type:

list
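The exact structure of the returned list is not specified here. As a minimal sketch only, assuming each entry pairs a column name with a transformation label chosen from the column dtype, the idea could look like the following. The helper `sketch_static_list` and the labels `"scale"` and `"onehot"` are hypothetical and are not frflib's actual names or values.

```python
import pandas as pd


def sketch_static_list(df: pd.DataFrame) -> list:
    """Hypothetical stand-in for make_static_list: one (column, label) pair per column."""
    entries = []
    for col in df.columns:
        # Assumed rule: numeric columns get "scale", all other columns get "onehot".
        label = "scale" if pd.api.types.is_numeric_dtype(df[col]) else "onehot"
        entries.append((col, label))
    return entries


df = pd.DataFrame({"age": [31, 45], "city": ["Lyon", "Paris"]})
static_list = sketch_static_list(df)
```

Under these assumptions the list keeps the DataFrame's column order, so it can be inspected or adjusted by hand before the pipeline is built.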

frflib.utils.ml_analysis.processing.make_dict_preprocessing(data_static: list) → dict

Create a dict with the name of transformation to apply to categorical and numerical columns.

Parameters:

data_static (list) – List covering all columns, giving the transformation to apply to each column

Returns:

Dict in which each available transformation is associated with the list of names of the columns it applies to

Return type:

dict
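As an illustration of the grouping described above, here is a minimal sketch that assumes `data_static` holds `(column, transformation)` pairs. That entry format, the helper `sketch_dict_preprocessing`, and the labels are assumptions made for this example, not the library's documented behavior.

```python
from collections import defaultdict


def sketch_dict_preprocessing(data_static: list) -> dict:
    """Hypothetical stand-in: group column names by their transformation label."""
    grouped = defaultdict(list)
    for column, label in data_static:
        # Append the column to the list kept for its transformation.
        grouped[label].append(column)
    return dict(grouped)


data_static = [("age", "scale"), ("income", "scale"), ("city", "onehot")]
dict_preprocessing = sketch_dict_preprocessing(data_static)
```

Keying the dict by transformation name means each transformation can later be applied once to all of its columns.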

frflib.utils.ml_analysis.processing.make_pipeline(dict_preprocessing: dict, fill_num=DEFAULT_NUM_FILLNA, fill_cat=DEFAULT_CAT_FILLNA) → sklearn.compose.ColumnTransformer

Combine the numerical pipeline and the categorical pipeline into the final pipeline that handles data preprocessing.

Parameters:
  • dict_preprocessing (dict) – dict with the name of transformation to apply to categorical and numerical columns

  • fill_num (str, optional) – strategy used to replace missing numerical values, defaults to “mean”

  • fill_cat (str, optional) – strategy used to replace missing categorical values, defaults to “constant”

Returns:

Final pipeline with all preprocessing steps

Return type:

sklearndf.transformation.ColumnTransformerDF
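The combination of a numerical and a categorical pipeline can be sketched with plain scikit-learn rather than sklearndf. This is an illustration under stated assumptions, not the frflib implementation: the helper `sketch_pipeline`, the dict keys `"scale"` and `"onehot"`, the choice of `StandardScaler` and `OneHotEncoder`, and the use of `fill_num` and `fill_cat` as `SimpleImputer` strategies are all hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def sketch_pipeline(dict_preprocessing: dict, fill_num="mean", fill_cat="constant"):
    """Hypothetical stand-in for make_pipeline, built on plain scikit-learn."""
    # Numerical branch: impute missing values, then standardize.
    numerical = Pipeline([
        ("impute", SimpleImputer(strategy=fill_num)),
        ("scale", StandardScaler()),
    ])
    # Categorical branch: impute (fill_value is only used by "constant"), then one-hot encode.
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy=fill_cat, fill_value="missing")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    # Route each group of columns to its branch and concatenate the outputs.
    return ColumnTransformer([
        ("num", numerical, dict_preprocessing["scale"]),
        ("cat", categorical, dict_preprocessing["onehot"]),
    ])


df = pd.DataFrame({"age": [31.0, None, 45.0], "city": ["Lyon", "Paris", "Lyon"]})
pipeline = sketch_pipeline({"scale": ["age"], "onehot": ["city"]})
features = pipeline.fit_transform(df)
```

In this sketch the missing `age` value is replaced by the column mean before scaling, and the result has one scaled numeric column plus one indicator column per city.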