Data Preparation
Feature Engineering
This endpoint allows users to upload a file and apply various feature engineering transformations to the data. Users can specify transformations for numeric, datetime, categorical, and text columns.
Endpoint: POST /feature-engineering
Request Parameters
File Upload
file (required): The file to be processed. Supported formats include CSV, Excel, etc.
Form Parameters
numeric_transforms (optional): A list of transformations to apply to numeric columns.
- Supported values: scale, bin, poly.
datetime_transforms (optional): A list of transformations to apply to datetime columns.
- Supported values: year, month, day, hour.
categorical_transforms (optional): A list of transformations to apply to categorical columns.
- Supported values: onehot, freq.
text_transforms (optional): A list of transformations to apply to text columns.
- Supported values: tfidf, count.
custom_transforms (optional): A list of custom transformations to apply. Each transformation is a dictionary with the following keys:
- method: The transformation method (e.g., log, interaction, sqrt, etc.).
- columns: The columns to apply the transformation to.
- params: Additional parameters for the transformation.
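As an illustration of how the form parameters above might be assembled on the client side, here is a minimal sketch. The helper name build_form_fields is hypothetical, and sending each list as a JSON string is an assumption: the description above does not say how list-valued form fields should be encoded.

```python
import json


# Hypothetical helper. JSON-encoding each list is an assumption about the
# wire format, which the endpoint description does not specify.
def build_form_fields(numeric=None, datetime=None, categorical=None,
                      text=None, custom=None):
    """Collect the optional form parameters, skipping any that are unset."""
    candidates = {
        "numeric_transforms": numeric,
        "datetime_transforms": datetime,
        "categorical_transforms": categorical,
        "text_transforms": text,
        "custom_transforms": custom,
    }
    return {name: json.dumps(value)
            for name, value in candidates.items() if value is not None}


fields = build_form_fields(
    numeric=["scale", "bin"],
    datetime=["year", "month"],
    custom=[{"method": "log", "columns": ["price"], "params": {}}],
)

# With the third-party `requests` package, the upload could then look like:
#   with open("data.csv", "rb") as fh:
#       requests.post(BASE_URL + "/feature-engineering",
#                     files={"file": fh}, data=fields)
# BASE_URL is a placeholder for wherever the service is hosted.
```

Only the parameters that were actually set end up in the form, which matches the "optional" status of every transform list.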
Processing Logic
- Read the uploaded file: The file is converted into a Pandas DataFrame.
- Infer column types: The types of columns (numeric, datetime, categorical, text) are inferred automatically.
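The inference rules are not spelled out here, so the following is only one plausible sketch of how column types could be inferred with pandas. The function name infer_column_types and the max_categories threshold are assumptions made for illustration.

```python
import pandas as pd
from pandas.api import types as ptypes


def infer_column_types(df, max_categories=20):
    """Assign each column to one of: numeric, datetime, categorical, text."""
    kinds = {}
    for name in df.columns:
        col = df[name]
        if ptypes.is_bool_dtype(col):
            # Booleans count as numeric in pandas; treat them as categories here.
            kinds[name] = "categorical"
        elif ptypes.is_numeric_dtype(col):
            kinds[name] = "numeric"
        elif ptypes.is_datetime64_any_dtype(col):
            kinds[name] = "datetime"
        elif col.nunique(dropna=True) <= max_categories:
            # Assumed heuristic: few distinct values means categorical.
            kinds[name] = "categorical"
        else:
            kinds[name] = "text"
    return kinds


df = pd.DataFrame({
    "price": [1.5, 2.0, 3.5, 4.0],
    "when": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]),
    "color": ["red", "blue", "red", "blue"],
    "review": ["great value", "too small", "works fine", "would buy again"],
})
column_types = infer_column_types(df, max_categories=2)
```

Dates read from a CSV usually arrive as strings, so a real implementation would likely need an extra parsing attempt before the categorical/text split.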
- Apply numeric transformations (if specified):
  - scale: Standard scaling is applied to numeric columns.
  - bin: Numeric columns are binned into quantiles.
  - poly: Polynomial features are generated for numeric columns.
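The three numeric transformations can be sketched with pandas alone. This is an illustration under assumptions, not the service's actual code: the output column names, the number of bins (n_bins), and limiting poly to a squared term are all choices made for this example.

```python
import pandas as pd


def apply_numeric_transforms(df, columns, transforms, n_bins=4):
    """Add scaled, binned, and squared versions of the given numeric columns."""
    out = df.copy()
    for col in columns:
        if "scale" in transforms:
            # Standard scaling: zero mean, unit variance (population std, ddof=0).
            out[f"{col}_scaled"] = (df[col] - df[col].mean()) / df[col].std(ddof=0)
        if "bin" in transforms:
            # Quantile binning: each bin holds roughly the same number of rows.
            out[f"{col}_bin"] = pd.qcut(df[col], q=n_bins, labels=False,
                                        duplicates="drop")
        if "poly" in transforms:
            # Degree-2 polynomial feature; cross terms are omitted in this sketch.
            out[f"{col}_squared"] = df[col] ** 2
    return out


df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})
result = apply_numeric_transforms(df, ["x"], ["scale", "bin", "poly"])
```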
- Apply datetime transformations (if specified):
  - year: Extracts the year from datetime columns.
  - month: Extracts the month from datetime columns.
  - day: Extracts the day from datetime columns.
  - hour: Extracts the hour from datetime columns.
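The datetime extraction above maps closely onto the pandas .dt accessor. The sketch below assumes each extracted part is written to a new column named after the source column and the part; that naming scheme is a guess.

```python
import pandas as pd

SUPPORTED_PARTS = ("year", "month", "day", "hour")


def apply_datetime_transforms(df, columns, transforms):
    """Extract the requested date parts into new integer columns."""
    out = df.copy()
    for col in columns:
        # Parse strings into timestamps; already-parsed columns pass through.
        parsed = pd.to_datetime(df[col])
        for part in transforms:
            if part not in SUPPORTED_PARTS:
                raise ValueError(f"Unsupported datetime transform: {part}")
            out[f"{col}_{part}"] = getattr(parsed.dt, part)
    return out


df = pd.DataFrame({"ts": ["2024-03-15 13:45:00", "2023-12-01 08:00:00"]})
result = apply_datetime_transforms(df, ["ts"], ["year", "month", "day", "hour"])
```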
- Apply categorical transformations (if specified):
  - onehot: One-hot encoding is applied to categorical columns.
  - freq: Frequency encoding is applied to categorical columns.
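A rough sketch of the two categorical encodings, again using pandas. Frequency encoding is taken here to mean replacing each category with its relative frequency in the column; whether the service uses relative frequencies or raw counts, and whether it keeps the original column, is not stated, so treat those details as assumptions.

```python
import pandas as pd


def apply_categorical_transforms(df, columns, transforms):
    """Add frequency-encoded and one-hot-encoded versions of the given columns."""
    out = df.copy()
    for col in columns:
        if "freq" in transforms:
            # Share of rows holding each category, mapped back onto the rows.
            freq = df[col].value_counts(normalize=True)
            out[f"{col}_freq"] = df[col].map(freq)
        if "onehot" in transforms:
            # One 0/1 indicator column per distinct category.
            dummies = pd.get_dummies(df[col], prefix=col, dtype=int)
            out = pd.concat([out, dummies], axis=1)
    return out


df = pd.DataFrame({"color": ["red", "blue", "red", "red"]})
result = apply_categorical_transforms(df, ["color"], ["onehot", "freq"])
```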
- Apply text transformations (if specified):
  - tfidf: TF-IDF vectorization is applied to text columns.
  - count: Count vectorization is applied to text columns.
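To make the difference between the two text transformations concrete, here is a small pure-Python sketch. It uses whitespace tokenization and the plain idf = ln(N / df) weighting, which is one common textbook variant; a library-backed implementation (scikit-learn, for example) uses a smoothed formula and normalization, so its numbers would differ.

```python
import math
from collections import Counter


def count_vectorize(docs):
    """Return the sorted vocabulary and one row of term counts per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = [Counter(toks) for toks in tokenized]
    return vocab, [[row[term] for term in vocab] for row in rows]


def tfidf_vectorize(docs):
    """Weight raw term counts by ln(N / document frequency)."""
    vocab, count_rows = count_vectorize(docs)
    n_docs = len(docs)
    # Number of documents in which each vocabulary term appears at least once.
    doc_freq = [sum(1 for row in count_rows if row[j] > 0)
                for j in range(len(vocab))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    return vocab, [[tf * w for tf, w in zip(row, idf)] for row in count_rows]


docs = ["red apple", "green apple", "red red car"]
vocab, counts = count_vectorize(docs)
_, tfidf = tfidf_vectorize(docs)
```

Count vectorization keeps raw frequencies, while TF-IDF down-weights terms that appear in many documents.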
- Apply custom transformations (if specified): Custom transformations are applied based on the provided configuration.
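One way such a configuration could be dispatched is sketched below. The semantics are assumptions: interaction is read as the product of exactly two columns, and the offset entry in params is a hypothetical parameter invented for this example, since neither is defined in the description above.

```python
import numpy as np
import pandas as pd


def apply_custom_transforms(df, custom_transforms):
    """Apply each {method, columns, params} spec in order, adding new columns."""
    out = df.copy()
    for spec in custom_transforms:
        method = spec["method"]
        columns = spec["columns"]
        params = spec.get("params") or {}
        if method == "log":
            # "offset" is a hypothetical parameter, e.g. 1 to tolerate zeros.
            offset = params.get("offset", 0)
            for col in columns:
                out[f"{col}_log"] = np.log(df[col] + offset)
        elif method == "sqrt":
            for col in columns:
                out[f"{col}_sqrt"] = np.sqrt(df[col])
        elif method == "interaction":
            # Assumed meaning: element-wise product of two columns.
            first, second = columns
            out[f"{first}_x_{second}"] = df[first] * df[second]
        else:
            raise ValueError(f"Unsupported custom method: {method}")
    return out


df = pd.DataFrame({"a": [1.0, 4.0], "b": [2.0, 3.0]})
result = apply_custom_transforms(df, [
    {"method": "sqrt", "columns": ["a"], "params": {}},
    {"method": "interaction", "columns": ["a", "b"], "params": {}},
    {"method": "log", "columns": ["a"], "params": {"offset": 0}},
])
```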
- Return the modified file: The transformed file is provided for download, with _feature_engineering appended to the filename.
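The naming rule for the returned file could be implemented along these lines. The sketch assumes the suffix goes before the file extension; the description does not say whether it is placed there or after the full name.

```python
from pathlib import Path


def output_filename(uploaded_name, suffix="_feature_engineering"):
    """Insert the suffix between the file's stem and its extension."""
    path = Path(uploaded_name)
    return f"{path.stem}{suffix}{path.suffix}"


csv_name = output_filename("sales.csv")
excel_name = output_filename("report.xlsx")
```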