Data Preparation
Data Augmentation
This endpoint allows users to upload a file and apply data augmentation techniques to a specified text column. Users can choose from various augmentation methods and specify the number of augmented rows to generate.
Endpoint: POST /data-augmentation
Request Parameters
File Upload
file
(required): The file to be processed. Supported formats include CSV, Excel, etc.
Form Parameters
method
(optional, default:word_shuffling
): The data augmentation method to apply. Supported values:word_shuffling
: Shuffles words in the text.sentence_shuffling
: Shuffles sentences in the text.word_replacement
: Replaces words using a BERT-based model.syntax_tree_manipulation
: Manipulates the syntax tree of the text.random_word_insertion
: Inserts random words into the text.random_word_deletion
: Deletes random words from the text.
column
(required): The column in the file to apply the augmentation to.num_augmented
(optional, default:1
): The number of augmented rows to generate for each original row.
Processing Logic
- Read the uploaded file: The file is converted into a Pandas DataFrame.
- Validate the column: Checks if the specified column exists in the DataFrame.
- Apply the selected augmentation method:
- Word Shuffling: Shuffles the words in the text.
- Sentence Shuffling: Shuffles the sentences in the text.
- Word Replacement: Replaces words using a BERT-based model.
- Syntax Tree Manipulation: Manipulates the syntax tree of the text.
- Random Word Insertion: Inserts random words into the text.
- Random Word Deletion: Deletes random words from the text.
- Generate augmented rows: For each original row, the specified number of augmented rows is generated.
- Return the modified file: The augmented file is provided for download with the filename appended with
_augmented
.