Data Preparation
Data Binning
This endpoint allows users to upload a CSV file and apply a specified binning method to a numeric column. Supported methods include equal-width, equal-frequency, and custom binning.
Endpoint: POST /binning
Request Parameters
File Upload
- file (required): The CSV file to be processed.
Form Parameters
- column (required): The numeric column to bin.
- method (optional, default="equal_width"): Binning method to apply. Supported values:
  - "equal_width": Divides data into bins of equal width.
  - "equal_frequency": Divides data into bins with an equal number of data points.
  - "custom": Uses custom-defined bin edges.
- num_bins (optional, default=5): Number of bins for equal-width or equal-frequency methods.
- custom_bins (optional): Comma-separated list of custom bin edges.
- labels (optional): Comma-separated list of bin labels.
- include_lowest (optional, default=True): Whether to include the lowest value in the first bin.
- right (optional, default=True): Whether the bins are right-inclusive.
- handle_outliers (optional, default="other"): How to handle outliers. Options:
  - "other": Assigns outliers to separate bins.
  - "exclude": Excludes outliers from binning.
- dtype (optional, default="categorical"): Output data type of the binned column. Supported values: "categorical", "integer", "string".
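Example: the request parameters above can be assembled on the client side roughly as follows. This is a minimal sketch, not part of the service itself. The helper name build_binning_form is hypothetical, and encoding booleans as "True"/"False" and numbers as plain strings is an assumption about how the server parses form values.

```python
def build_binning_form(column, method="equal_width", num_bins=5,
                       custom_bins=None, labels=None, include_lowest=True,
                       right=True, handle_outliers="other",
                       dtype="categorical"):
    """Assemble the form fields for POST /binning (hypothetical helper)."""
    if method not in {"equal_width", "equal_frequency", "custom"}:
        raise ValueError(f"unsupported method: {method!r}")
    if method == "custom" and not custom_bins:
        raise ValueError("custom_bins is required when method='custom'")

    form = {
        "column": column,
        "method": method,
        "num_bins": str(num_bins),
        # Assumption: the server accepts "True"/"False" for boolean fields.
        "include_lowest": str(include_lowest),
        "right": str(right),
        "handle_outliers": handle_outliers,
        "dtype": dtype,
    }
    # List-valued parameters are sent as comma-separated strings.
    if custom_bins:
        form["custom_bins"] = ",".join(str(edge) for edge in custom_bins)
    if labels:
        form["labels"] = ",".join(labels)
    return form


form = build_binning_form("age", method="custom",
                          custom_bins=[0, 18, 65], labels=["minor", "adult"])

# Sending the request (assumes the third-party `requests` package and a
# hypothetical host; neither is specified by this documentation):
#
#   import requests
#   with open("data.csv", "rb") as fh:
#       response = requests.post("http://localhost:8000/binning",
#                                files={"file": fh}, data=form)
```

The file itself travels in the multipart "file" field, while every other parameter is sent as an ordinary form field.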
Processing Logic
- File Reading: The uploaded file is read and converted into a Pandas DataFrame.
- Input Validation: Ensures the specified column exists and contains numeric data.
- Method Selection:
  - Equal Width: Creates equal-width bins based on data range.
  - Equal Frequency: Distributes data into bins with equal frequencies.
  - Custom Binning: Applies user-defined bin edges.
- Outlier Handling: Assigns outliers to separate bins or excludes them from binning, according to handle_outliers.
- Labeling and Data Type Conversion: Applies labels and converts the output to the specified data type.
- Return Transformed File: Provides a downloadable file with the binned column appended.
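The steps above can be approximated with pandas as in the sketch below. It is an illustration under stated assumptions, not the endpoint's actual implementation: the function name bin_column and the "<column>_binned" output column name are hypothetical, an "outlier" is taken to mean a non-missing value that falls outside every bin, and "other" places such values in a single extra bin. Reading the upload (for example with pd.read_csv) and writing the result back out are omitted.

```python
import numpy as np
import pandas as pd


def bin_column(df, column, method="equal_width", num_bins=5, custom_bins=None,
               labels=None, include_lowest=True, right=True,
               handle_outliers="other", dtype="categorical"):
    """Return a copy of df with a binned version of `column` appended."""
    # Input validation: the column must exist and hold numeric data.
    if column not in df.columns:
        raise ValueError(f"column {column!r} not found")
    values = df[column]
    if not pd.api.types.is_numeric_dtype(values):
        raise ValueError(f"column {column!r} is not numeric")
    if handle_outliers not in ("other", "exclude"):
        raise ValueError(f"unsupported handle_outliers {handle_outliers!r}")

    # Method selection.
    if method == "equal_width":
        binned = pd.cut(values, bins=num_bins, labels=labels,
                        include_lowest=include_lowest, right=right)
    elif method == "equal_frequency":
        binned = pd.qcut(values, q=num_bins, labels=labels)
    elif method == "custom":
        if not custom_bins:
            raise ValueError("custom_bins is required when method='custom'")
        binned = pd.cut(values, bins=sorted(custom_bins), labels=labels,
                        include_lowest=include_lowest, right=right)
    else:
        raise ValueError(f"unsupported method {method!r}")

    names = [str(c) for c in binned.cat.categories]
    codes = binned.cat.codes.to_numpy()  # -1 marks rows that fell in no bin

    # Outlier handling (assumed meaning: non-missing values outside all bins).
    outside = (codes == -1) & values.notna().to_numpy()
    if handle_outliers == "other" and outside.any():
        names.append("other")
        codes = np.where(outside, len(names) - 1, codes)
    # With "exclude", those rows keep code -1 and end up as missing values.

    # Labeling and data type conversion.
    if dtype == "integer":
        out = pd.Series(codes, index=df.index, dtype="Int64").mask(codes < 0)
    elif dtype in ("categorical", "string"):
        out = pd.Series(pd.Categorical.from_codes(codes, categories=names),
                        index=df.index)
        if dtype == "string":
            out = out.astype(object)
    else:
        raise ValueError(f"unsupported dtype {dtype!r}")

    result = df.copy()
    result[f"{column}_binned"] = out  # hypothetical output column name
    return result


frame = pd.DataFrame({"age": [5, 15, 25, 35, 120]})
print(bin_column(frame, "age", method="custom",
                 custom_bins=[0, 18, 40], labels=["minor", "adult"]))
```

In this sketch, equal-width and equal-frequency bins always cover the full data range, so the outlier setting only has a visible effect with custom bin edges.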