UHH-LT at SemEval-2020 Task 12: Fine-Tuning of Pre-Trained Transformer Networks for Offensive Language Detection

2020, Conference contribution - Article for conference in: Proceedings of the Fourteenth Workshop on Semantic Evaluation: 1638-1644, 7 p., Barcelona (online): International Committee for Computational Linguistics

DOI: 10.18653/v1/2020.semeval-1.213

@inbook{bd1d39eb57d148d4ba197fe2c574cbd4,
title = "UHH-LT at SemEval-2020 Task 12: Fine-Tuning of Pre-Trained Transformer Networks for Offensive Language Detection",
abstract = "Fine-tuning of pre-trained transformer networks such as BERT yields state-of-the-art results for text classification tasks. Typically, fine-tuning is performed on task-specific training datasets in a supervised manner. One can also fine-tune in an unsupervised manner beforehand by further pre-training on the masked language modeling (MLM) task. In-domain data for unsupervised MLM that resembles the actual classification target dataset allows for domain adaptation of the model. In this paper, we compare current pre-trained transformer networks with and without MLM fine-tuning on their performance for offensive language detection. Our MLM fine-tuned RoBERTa-based classifier officially ranks 1st in the SemEval 2020 Shared Task 12 for the English language. Further experiments with the ALBERT model even surpass this result.",
author = "Gregor Wiedemann and Yimam, {Seid Muhie} and Chris Biemann",
year = "2020",
month = dec,
day = "1",
doi = "10.18653/v1/2020.semeval-1.213",
language = "English",
pages = "1638--1644",
editor = "Aurelie Herbelot and Xiaodan Zhu and Alexis Palmer and Nathan Schneider and Jonathan May and Ekaterina Shutova",
booktitle = "Proceedings of the Fourteenth Workshop on Semantic Evaluation",
publisher = "International Committee for Computational Linguistics",
}

Abstract

Fine-tuning of pre-trained transformer networks such as BERT yields state-of-the-art results for text classification tasks. Typically, fine-tuning is performed on task-specific training datasets in a supervised manner. One can also fine-tune in an unsupervised manner beforehand by further pre-training on the masked language modeling (MLM) task. In-domain data for unsupervised MLM that resembles the actual classification target dataset allows for domain adaptation of the model. In this paper, we compare current pre-trained transformer networks with and without MLM fine-tuning on their performance for offensive language detection. Our MLM fine-tuned RoBERTa-based classifier officially ranks 1st in the SemEval 2020 Shared Task 12 for the English language. Further experiments with the ALBERT model even surpass this result.
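
To make the unsupervised step concrete, the following is a minimal sketch of how MLM training examples can be built from unlabeled in-domain text. It is not the authors' code. It assumes the masking scheme of the original BERT paper (select about 15% of tokens; replace 80% of those with a mask symbol, 10% with a random token, and leave 10% unchanged); the abstract does not state which scheme or rates were used. The function name `mask_tokens`, the whitespace tokenization, and the toy vocabulary are hypothetical and chosen only for illustration.

```python
import random

MASK = "[MASK]"  # placeholder symbol; real tokenizers define their own mask token


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Build one MLM example: return (corrupted_tokens, labels).

    labels[i] is the original token at every selected position and None
    elsewhere, so a training loss would be computed on selected positions only.
    The 80/10/10 split is the BERT-paper scheme and is an assumption here.
    """
    if rng is None:
        rng = random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover this token
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(MASK)               # replace with mask symbol
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))  # replace with random token
            else:
                corrupted.append(tok)                # keep the token unchanged
        else:
            labels.append(None)  # position not used for the MLM loss
            corrupted.append(tok)
    return corrupted, labels


# Hypothetical in-domain example (whitespace tokenization for simplicity).
tweet = "@USER you are such a nice person".split()
corrupted, labels = mask_tokens(tweet, vocab=tweet, rng=random.Random(0))
```

A model further pre-trained on such examples from unlabeled in-domain text would then be fine-tuned on the labeled classification data in the usual supervised way.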

Publications

2020
UHH-LT at SemEval-2020 Task 12 - Fine-Tuning of Pre-Trained Transformer Networks for Offensive Language Detection
- G. Wiedemann
- S. M. Yimam
- C. Biemann
Published: Preprint
- Submitted manuscript
- https://www.edit.fis.uni-hamburg.de/ws/files/55704855/2004.11493v2.pdf

