Each group can have a maximum of 3 students. No two groups may do the
same assignment. Assignments will be allotted on a first-come,
first-served basis. One member of each group should email me stating
which assignment the group wants to do, along with the email IDs of the
other members of the group. After getting my approval, please fill in
the following form:
https://docs.google.com/spreadsheets/d/145rzZE2yrKEOGQaUZsH-98JhWxF991m82xR6BpsVwxM/edit?usp=sharing
Deadline to submit/present the assignments: Nov 22, 2024
General guidelines/suggestions to follow for all assignments:
- Show the effect of varying hyperparameters such as the number of layers,
hidden size, optimizer, batch size, dropout value, learning rate, and loss
function on the model’s performance.
- Use validation data to prevent overfitting, e.g., with an early stopping
condition.
- Apply tokenization in batches to reduce the time spent on data
preprocessing.
- Experiment with freezing the initial layers and fine-tuning only the
later layers (those closest to the output).
- Check whether the dataset for your assignment has a class imbalance
problem and, if so, try to mitigate it by assigning class weights.
- You can try parameter-efficient fine-tuning (PEFT) with approaches like
LoRA (low-rank adaptation), which trains only a small number of
parameters for faster training.
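Two of the suggestions above, class weights and early stopping, can be sketched in plain Python. This is a minimal illustration and not a required implementation: the function and class names, the toy labels, and the validation losses are all hypothetical, and the weighting rule is the common inverse-frequency heuristic (the same idea as scikit-learn's `class_weight="balanced"` option).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


class EarlyStopping:
    """Signal a stop once validation loss fails to improve `patience` times in a row."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy data (hypothetical): 8 examples of class 0 and 2 examples of class 1.
weights = balanced_class_weights([0] * 8 + [1] * 2)

stopper = EarlyStopping(patience=2)
val_losses = [0.90, 0.70, 0.72, 0.71, 0.69]  # hypothetical per-epoch validation losses
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

The resulting weights could then be passed to a weighted loss function, so that mistakes on the rarer class cost more.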
Innovation: Coming up with simple ideas/heuristics to improve the model’s
performance is encouraged. Extra credit of up to 10% will be given.
Assignment 1: Named Entity Recognition (NER) with BERT
Objective:
Fine-tune a BERT-based model for Named Entity Recognition (NER)
using a publicly available dataset like CoNLL-2003.
Tasks:
1. Preprocessing: Load and preprocess the CoNLL-2003 dataset.
This includes:
- Tokenizing the text using the BERT tokenizer.
- Structuring the data into a format compatible with the `transformers`
library for token classification.
2. Fine-Tuning:
- Fine-tune BERT on the NER task using Hugging Face’s
`transformers` library.
- Implement appropriate hyperparameter
tuning for model optimization.
3. Evaluation:
- Use precision, recall, and F1-score as evaluation metrics to
measure the model's performance.
- Compare the performance of your model with the benchmark
results from the CoNLL-2003 challenge.
4. Error Analysis:
- Perform detailed error analysis to understand the common mistakes
made by the model (e.g., confusion between similar entity types).
- Suggest potential improvements, such as using Conditional Random
Fields (CRF) or data augmentation.
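One detail of the preprocessing task that often causes trouble is that BERT splits words into sub-word tokens, while the NER labels are given per word. A minimal sketch of one common alignment strategy follows. It assumes you already have the list of word indices for each token (fast tokenizers in `transformers` expose this through `word_ids()`); the function name, the label ids, and the example values are hypothetical.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def align_labels(word_ids, word_labels, label_all_subwords=False):
    """Map word-level labels onto sub-word tokens.

    word_ids: one entry per token; None for special tokens, else the word index.
    word_labels: one integer label per original word.
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            # Special token such as [CLS] or [SEP]: never scored.
            aligned.append(IGNORE_INDEX)
        elif wid != previous:
            # First sub-word of a word carries the word's label.
            aligned.append(word_labels[wid])
        else:
            # Continuation sub-word: masked out unless requested otherwise.
            aligned.append(word_labels[wid] if label_all_subwords else IGNORE_INDEX)
        previous = wid
    return aligned


# Hypothetical example: four words, the third one split into two sub-words.
word_ids = [None, 0, 1, 2, 2, 3, None]
word_labels = [3, 0, 7, 0]
aligned = align_labels(word_ids, word_labels)
```

Labelling only the first sub-word keeps the evaluation at word level; setting `label_all_subwords=True` is an alternative worth comparing in the hyperparameter study.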
Deliverables:
- A Jupyter notebook with the implementation of preprocessing,
model training, and evaluation.
- A written report detailing the model architecture, the
hyperparameters used, and an analysis of the model’s performance
along with the error analysis.
---
Assignment 2: Sentiment Analysis with RoBERTa
Objective:
Fine-tune RoBERTa for sentiment analysis using the IMDB movie reviews
dataset for binary classification (positive/negative).
Tasks:
1. Preprocessing:
- Clean the IMDB dataset (e.g., removing special characters,
handling missing data).
- Tokenize the data using RoBERTa’s
tokenizer.
2. Fine-Tuning:
- Fine-tune RoBERTa on the sentiment classification task.
- Experiment with different hyperparameters like learning rate, batch
size, and training epochs to optimize model performance.
3. Model Evaluation:
- Evaluate the fine-tuned model using metrics like accuracy,
precision, recall, and F1-score.
- Test the model on a custom set of movie reviews and analyze the model’s
performance.
4. Hyperparameter Experimentation:
- Conduct an experiment to study the effect of various
hyperparameters on model performance (learning rate, batch size,
etc.).
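For the hyperparameter experiments above, it helps to enumerate the configurations up front so that every run is recorded in the report. The sketch below expands a small search space into a list of configurations. The hyperparameter names and values are illustrative assumptions, not recommended settings.

```python
from itertools import product


def expand_grid(search_space):
    """Return one dict per combination of the listed hyperparameter values."""
    names = list(search_space)
    return [dict(zip(names, values)) for values in product(*search_space.values())]


# Hypothetical search space; adjust the values to your compute budget.
search_space = {
    "learning_rate": [1e-5, 2e-5, 5e-5],
    "batch_size": [16, 32],
    "num_epochs": [2, 3],
}
configs = expand_grid(search_space)

for config in configs[:2]:
    print(config)  # each config would drive one fine-tuning run
```

A full grid grows multiplicatively (here 3 x 2 x 2 = 12 runs), so with limited compute you may prefer to vary one hyperparameter at a time around a baseline.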
Deliverables:
- A notebook showing the fine-tuning process and hyperparameter
experiments.
- A report analyzing the impact of hyperparameters on
performance, including any failure cases (reviews incorrectly classified).
---
Assignment 3: Document Classification using BERT
Objective:
Fine-tune BERT for document classification using the 20 Newsgroups
dataset.
Tasks:
1. Preprocessing:
- Clean and preprocess the text in the 20 Newsgroups dataset.
- Tokenize using BERT’s tokenizer, making sure to handle input length
properly by splitting long documents if necessary.
2. Fine-Tuning:
- Fine-tune BERT for multi-class classification on the dataset.
- Experiment with different pooling strategies (CLS token pooling,
mean pooling) to aggregate document representations.
3. Evaluation:
- Use accuracy, precision, recall, and F1-score to evaluate the
model’s performance.
- Perform cross-validation to obtain robust performance estimates
and reduce the risk of overfitting to a single split.
4. Analysis of Pooling Techniques:
- Compare the results of different pooling methods and analyze their
effects on model accuracy and other evaluation metrics.
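The two pooling strategies named above differ only in how the per-token vectors from the encoder are reduced to one document vector. The sketch below shows the arithmetic on plain Python lists, assuming `token_vectors` stands for the encoder's final hidden states for one document and `attention_mask` marks real tokens with 1 and padding with 0. The function names and numbers are hypothetical.

```python
def cls_pooling(token_vectors):
    """Use the vector of the first token ([CLS]) as the document representation."""
    return list(token_vectors[0])


def mean_pooling(token_vectors, attention_mask):
    """Average the token vectors, skipping positions where the mask is 0 (padding)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    return [t / max(count, 1) for t in total]


# Hypothetical hidden states: 4 token positions, hidden size 2, last position is padding.
token_vectors = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
attention_mask = [1, 1, 1, 0]

cls_vector = cls_pooling(token_vectors)
mean_vector = mean_pooling(token_vectors, attention_mask)
```

Forgetting the mask in mean pooling lets padding vectors leak into the average, which is a frequent source of misleading comparisons between the two strategies.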
Deliverables:
- Code that shows the preprocessing, fine-tuning, and evaluation
steps.
- A report comparing pooling strategies and analyzing their
effect on the model’s performance, supported by experimental
results.
---
Assignment 4: Multi-Lingual NER with mBERT
Objective:
Fine-tune mBERT for Named Entity Recognition (NER) using a
multilingual dataset like WikiAnn.
Tasks:
1. Preprocessing:
- Load and preprocess the WikiAnn dataset for multiple languages.
- Tokenize the text using the multilingual BERT tokenizer, and
structure the data accordingly.
2. Fine-Tuning:
- Fine-tune mBERT on NER for multiple languages (e.g., English,
German, French).
- Implement transfer learning: fine-tune the model
on one language and evaluate on another.
3. Evaluation:
- Compare performance across different languages using
standard NER metrics (precision, recall, F1-score).
- Perform error analysis on low-resource languages to understand
where the model struggles.
4. Transfer Learning Experiment:
- Evaluate how well the model trained on one language transfers to
another language.
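NER precision, recall, and F1 are usually computed over whole entity spans rather than individual tokens. The sketch below shows one strict way to do this, assuming BIO-style tags such as `B-PER` and `I-PER`. The function names and the toy sentences are hypothetical, and an `I-` tag that does not continue a span of the same type is simply ignored, which is one of several possible conventions.

```python
def extract_spans(tags):
    """Return the set of (entity_type, start, end) spans in one BIO-tagged sentence."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # the extra "O" closes a trailing span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def span_prf(gold_sentences, pred_sentences):
    """Micro-averaged span-level precision, recall, and F1 over a list of sentences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical example: the person span is right, the location is mislabelled as ORG.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]
precision, recall, f1 = span_prf(gold, pred)
```

Running the same function separately on each language's test set gives the per-language numbers needed for the cross-lingual comparison.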
Deliverables:
- A Jupyter notebook/code that fine-tunes mBERT and evaluates
performance across multiple languages.
- A detailed report discussing the challenges of multilingual NER and the
impact of transfer learning on low-resource languages.
---
Assignment 5: Zero-Shot Classification with RoBERTa
Objective:
Fine-tune RoBERTa for zero-shot classification on a custom set of
documents, using Hugging Face’s `transformers` library.
Tasks:
1. Model Setup:
- Load a pre-trained RoBERTa model using the `transformers`
library for zero-shot classification.
2. Data Collection:
- Create or use a custom dataset containing various categories
(e.g., news, sports, technology).
- Use the zero-shot learning setup to classify the documents into
predefined categories.
3. Evaluation:
- Evaluate the model’s performance by comparing the assigned labels
with the ground truth.
- Perform error analysis to identify which categories are difficult for the
model to classify.
4. Improvements:
- Suggest improvements based on error analysis, such as better
prompt engineering or dataset augmentation.
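One common zero-shot setup casts classification as natural language inference: each candidate label is turned into a hypothesis sentence, and the label whose hypothesis is most strongly entailed by the document wins. The sketch below shows only that control flow. The template wording, the function names, and in particular `toy_entailment_score` are hypothetical; the toy scorer is a keyword-overlap stand-in for a real NLI model and is there only so the example runs without downloading anything.

```python
def build_nli_pairs(document, labels, template="This text is about {}."):
    """Pair the document (premise) with one hypothesis per candidate label."""
    return [(document, template.format(label)) for label in labels]


def classify_zero_shot(document, labels, entailment_score, template="This text is about {}."):
    """Return the best label and the score of every label."""
    pairs = build_nli_pairs(document, labels, template)
    scores = [entailment_score(premise, hypothesis) for premise, hypothesis in pairs]
    best = max(range(len(labels)), key=scores.__getitem__)
    return labels[best], dict(zip(labels, scores))


def toy_entailment_score(premise, hypothesis):
    """Hypothetical stand-in for an NLI model: counts label keywords in the premise."""
    keywords = {
        "sports": {"match", "goal", "team"},
        "technology": {"chip", "software"},
        "news": {"election", "minister"},
    }
    label = hypothesis.rstrip(".").split()[-1]
    return len(set(premise.lower().split()) & keywords.get(label, set()))


labels = ["news", "sports", "technology"]
document = "The team scored a late goal to win the match"
predicted, scores = classify_zero_shot(document, labels, toy_entailment_score)
```

Changing the `template` string is the simplest form of the prompt engineering mentioned above, so it is worth reporting results for more than one wording.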
Deliverables:
- Code demonstrating the setup, fine-tuning, and evaluation of
RoBERTa for zero-shot classification.
- A report analyzing the model’s performance, the challenges
encountered, and potential improvements.
---
Assignment 6: Token Classification with DistilBERT
Objective:
Fine-tune DistilBERT for token classification on a task like Part-of-Speech
(POS) tagging.
Tasks:
1. Preprocessing:
- Preprocess a POS tagging dataset, ensuring that the text is
tokenized properly using DistilBERT’s tokenizer.
2. Fine-Tuning:
- Fine-tune DistilBERT on the POS tagging task, experimenting with
different batch sizes, learning rates, and epochs.
3. Evaluation:
- Evaluate the model using token-level accuracy, F1-score, and other
relevant metrics.
- Compare the performance of DistilBERT to BERT and
analyze the trade-offs between model size and performance.
4. Efficiency Analysis:
- Analyze the performance of DistilBERT in terms of computational
efficiency (e.g., training time, memory usage) compared to BERT.
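For the efficiency analysis, the same measurement code should wrap both models so that the numbers are comparable. The sketch below is a minimal helper built on the standard library. `profile_call` and `dummy_training_step` are hypothetical names, and `tracemalloc` only sees Python-level allocations on the CPU, so GPU memory would need the counters of your deep learning framework instead.

```python
import time
import tracemalloc


def profile_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds, peak_python_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()  # returns (current, peak) in bytes
    tracemalloc.stop()
    return result, elapsed, peak


def dummy_training_step(n):
    """Hypothetical placeholder for one training or inference step."""
    return sum([i * i for i in range(n)])


result, seconds, peak_bytes = profile_call(dummy_training_step, 10_000)
print(f"time: {seconds:.4f}s, peak Python memory: {peak_bytes} bytes")
```

Tracing memory slows the call down somewhat, so for timing-only comparisons it is reasonable to repeat the measurement without `tracemalloc` and average over several runs.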
Deliverables:
- A Jupyter notebook with the preprocessing, fine-tuning, and evaluation
of DistilBERT for token classification.
- A report discussing the performance trade-offs between DistilBERT
and BERT, and an analysis of computational efficiency.
Deliverables for assignments 7-12:
- A Jupyter notebook/code including tokenization, model fine-tuning,
evaluation of the model, and error handling.
- A report on data preparation and preprocessing, model
architecture, optimal hyperparameter setting, results, and error
analysis.
Assignment 7: Sequence classification with BART model
(encoder-decoder model)
Objective:
Fine-tune a BART-based model (facebook/bart-base recommended as it is
smaller and faster) for sequence classification on the CoLA dataset. Ref:
https://openreview.net/pdf?id=rJ4km2R5t7
Each example is a sequence of words annotated with whether it is a
grammatical English sentence.
Tasks:
1. Load and preprocess the CoLA dataset. This includes:
- Tokenizing the text using BartTokenizer/AutoTokenizer.
2. Fine-Tuning:
- Fine-tune BART on the sequence classification task.
- Implement appropriate hyperparameter tuning for model optimization.
3. Evaluation:
- Use the accuracy metric to measure the model's performance.
- Gold labels for the test data are not available in the CoLA dataset, so
use a small part of the development data as the development set and the
remaining development data as the test set.
- Compare the performance of your model with the benchmark result from
the BERT model.
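The development/test split described in the evaluation step should be reproducible, so that every experiment is scored on the same held-out examples. A minimal sketch, assuming the official development set is already loaded as a list; the function name, the 20% fraction, and the seed are hypothetical choices.

```python
import random


def split_dev(examples, dev_fraction=0.2, seed=13):
    """Shuffle with a fixed seed, then carve off a small dev set; the rest is the test set."""
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    n_dev = int(len(examples) * dev_fraction)
    dev = [examples[i] for i in indices[:n_dev]]
    test = [examples[i] for i in indices[n_dev:]]
    return dev, test


# Hypothetical stand-in for the official CoLA development set.
official_dev = [f"sentence {i}" for i in range(10)]
dev_set, test_set = split_dev(official_dev)
```

Hyperparameters and early stopping should be tuned on `dev_set` only, and `test_set` should be touched once for the final numbers.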
Assignment 8: Bitext classification (textual entailment classification)
with BART model (encoder-decoder model)
Objective:
Fine-tune a BART-based model for bitext classification on the RTE dataset.
Ref: https://openreview.net/pdf?id=rJ4km2R5t7
RTE stands for recognizing textual entailment, i.e., deciding whether one
sentence entails/supports another.
Example of positive entailment:
1. Italian film-maker, Fellini was awarded an honorary Oscar for lifetime
achievement. He died on October 31, 1993.
2. An Italian director is awarded an honorary Oscar.
Here, sentence 1 entails sentence 2.
Example of negative entailment:
3. A smaller proportion of Yugoslavia's Italians were settled in Slovenia
(at the 1991 national census, some 3000 inhabitants of Slovenia
declared themselves as ethnic Italians).
4. Slovenia has 3,000 inhabitants.
Here, sentence 3 does not entail sentence 4.
Tasks:
1. Load and preprocess the RTE dataset. This includes:
- Tokenizing the text using BartTokenizer/AutoTokenizer.
2. Fine-Tuning:
- Fine-tune BART on the bitext classification task.
- Implement appropriate hyperparameter tuning for model optimization.
3. Evaluation:
- Use the accuracy metric to measure the model's performance.
- Compare the performance of your model with the benchmark result from
the BERT model.
Assignment 9: Paraphrasing task with T5 model (text-to-text transfer
Transformer)
Objective:
Fine-tune a T5 model to classify whether two sentences are semantically
(meaningfully) equivalent. Use the MRPC dataset. Ref:
https://openreview.net/pdf?id=rJ4km2R5t7
Tasks:
1. Load and preprocess the MRPC dataset. This includes:
- Tokenizing the text using T5Tokenizer.
2. Fine-Tuning:
- Fine-tune T5 for the paraphrasing task.
- Implement appropriate hyperparameter tuning for model optimization.
3. Evaluation:
- Use the accuracy and F1 metrics to measure the model's performance.
- Compare the performance of your model with the benchmark result from
the BERT model.
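Because T5 is a text-to-text model, the sentence pair has to be turned into a single input string and the label into a target string, and the generated text has to be mapped back to a label before scoring. The sketch below shows one way to do this. The prefix and the target words follow the convention used in the T5 paper for MRPC, but treat them, the function names, and the toy outputs as assumptions; any consistent wording should work when fine-tuning.

```python
LABEL_TEXT = {0: "not_equivalent", 1: "equivalent"}
TEXT_LABEL = {text: label for label, text in LABEL_TEXT.items()}


def to_text_pair(sentence1, sentence2, label=None):
    """Build the (source, target) strings for one MRPC example."""
    source = f"mrpc sentence1: {sentence1} sentence2: {sentence2}"
    target = None if label is None else LABEL_TEXT[label]
    return source, target


def accuracy_and_f1(gold, generated):
    """Score generated strings; output that matches no label word counts as wrong."""
    preds = [TEXT_LABEL.get(text.strip(), -1) for text in generated]
    correct = sum(p == g for p, g in zip(preds, gold))
    tp = sum(p == 1 and g == 1 for p, g in zip(preds, gold))
    fp = sum(p == 1 and g == 0 for p, g in zip(preds, gold))
    fn = sum(p != 1 and g == 1 for p, g in zip(preds, gold))
    accuracy = correct / len(gold) if gold else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f1


source, target = to_text_pair("He left early.", "He departed early.", label=1)

# Hypothetical gold labels and model outputs.
gold = [1, 0, 1, 1]
generated = ["equivalent", "equivalent", "not_equivalent", "equivalent"]
accuracy, f1 = accuracy_and_f1(gold, generated)
```

Counting unparseable generations as errors keeps the reported accuracy honest when the model occasionally produces text outside the two label words.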
Assignment 10: Word sense disambiguation task using T5 model
(text-to-text transfer Transformer)
Objective:
Fine-tune a T5 model on the WiC dataset. Ref: https://arxiv.org/pdf/1905.00537
Do preprocessing of the dataset, and appropriate hyperparameter tuning
for model optimization.
Task description:
Input: a word w which is present in two sentences. The task is to classify
whether the given word is used in the same sense in both sentences or not.
This is a binary classification problem.
Example of different senses of “play”:
- this speech did n't play well with the american public .
- play football .
Example of the same sense of “person”:
- there was too much for one person to do .
- each person is unique , both mentally and physically .
Evaluation:
- Report accuracy.
Assignment 11: Boolean Question Answering task using T5 model
(text-to-text transfer Transformer)
Objective:
Fine-tune a T5 model on the BoolQ dataset. Ref:
https://arxiv.org/pdf/1905.00537.
Do preprocessing of the dataset, and appropriate hyperparameter tuning
for model optimization.
Task example:
Passage: Barq’s – Barq’s is an American soft drink. Its brand of root beer is
notable for having caffeine. Barq’s, created by Edward Barq and bottled
since the turn of the 20th century, is owned by the Barq family but bottled
by the Coca-Cola Company. It was known as Barq’s Famous Olde Tyme
Root Beer until 2012.
Question: is barq’s root beer a pepsi product
Answer: No
Evaluation:
- Report accuracy.
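BoolQ passages can be long, so the order of question and passage in the input string matters once the input is truncated. The sketch below puts the question first so that right-side truncation only removes passage text, and maps the generated answer back to a boolean for scoring. The prefix, the word-level clipping (a rough proxy for token-level truncation), the function names, and the toy outputs are all assumptions.

```python
def boolq_to_text(question, passage, answer=None, max_passage_words=200):
    """Build (source, target) strings; the passage is clipped, the question never is."""
    clipped = " ".join(passage.split()[:max_passage_words])
    source = f"boolq question: {question} passage: {clipped}"
    target = None if answer is None else ("yes" if answer else "no")
    return source, target


def parse_answer(generated):
    """Map generated text to True/False, or None if it is neither."""
    text = generated.strip().lower()
    if text in {"yes", "true"}:
        return True
    if text in {"no", "false"}:
        return False
    return None


def accuracy(gold, generated):
    """Fraction of examples whose parsed answer equals the gold boolean."""
    correct = sum(parse_answer(text) == label for label, text in zip(gold, generated))
    return correct / len(gold) if gold else 0.0


source, target = boolq_to_text(
    "is barq's root beer a pepsi product",
    "Barq's is an American soft drink.",
    answer=False,
)
score = accuracy([False, True, True], ["No", "yes", "maybe"])  # hypothetical outputs
```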
Assignment 12: Coreference resolution task using T5 model
(text-to-text transfer Transformer)
Objective:
Fine-tune a T5 model on the WSC dataset. Ref: https://arxiv.org/pdf/1905.00537.
Do preprocessing of the dataset, and appropriate hyperparameter tuning
for model optimization.
Task Description:
Coreference resolution is the task of determining which entity a pronoun
refers to.
Example:
Text: Mark told Pete many lies about himself, which Pete included in his
book. He should have been more truthful.
Coreference: False
Note: The output should be “False” because “He” does not refer to Pete;
the pronoun “He” refers to Mark.
Evaluation:
- Report accuracy.
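For a text-to-text model to judge a candidate/pronoun pair, the input string has to show which two spans are being compared. One simple option is to wrap them in marker characters before tokenization, as sketched below. The marker symbols, the `wsc:` prefix, the function name, and the choice of the first occurrence of "Pete" as the candidate are assumptions made for illustration.

```python
def mark_spans(text, candidate, pronoun):
    """Wrap the candidate span in [ ] and the pronoun span in * *.

    Both spans are (start, end) character offsets with end exclusive, and they
    must not overlap. Later spans are wrapped first so earlier offsets stay valid.
    """
    pieces = sorted([(candidate, "[", "]"), (pronoun, "*", "*")], reverse=True)
    for (start, end), left, right in pieces:
        text = text[:start] + left + text[start:end] + right + text[end:]
    return text


text = ("Mark told Pete many lies about himself, which Pete included in his "
        "book. He should have been more truthful.")
cand_start = text.index("Pete")       # first mention of the candidate (assumed)
pron_start = text.index("He should")  # the pronoun under test

marked = mark_spans(text, (cand_start, cand_start + 4), (pron_start, pron_start + 2))
source, target = f"wsc: {marked}", "False"
print(source)
```

Accuracy can then be computed by comparing the generated string with the target string for each example.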