Pretraining

nmT5: Is Parallel Data Still Relevant for Pre-training Massively Multilingual Language Models?

Examines whether incorporating parallel data still benefits the pre-training of massively multilingual language models such as mT5.