Large Language Models

SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer

Learns soft prompts on source tasks and transfers them to target tasks, improving adaptation of frozen models.

ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models

Token-free byte-to-byte language modeling at scale.

The Power of Scale for Parameter-Efficient Prompt Tuning

Prompt tuning becomes competitive with full fine-tuning as model scale grows. A minimal sketch of the technique follows this entry.
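
For readers unfamiliar with the technique, here is a minimal PyTorch sketch of soft prompt tuning: a small block of trainable embeddings is prepended to the input while every weight of the base model stays frozen. It assumes a base model that accepts embedded inputs directly; SoftPromptWrapper and its parameter names are illustrative, not from the paper.

```python
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    """Wraps a frozen model, prepending trainable soft-prompt embeddings."""

    def __init__(self, frozen_model: nn.Module, embed_dim: int, prompt_len: int = 20):
        super().__init__()
        self.model = frozen_model
        for p in self.model.parameters():
            p.requires_grad = False  # base model weights stay frozen
        # The soft prompt is the only trainable tensor (prompt_len x embed_dim).
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim) embedded input tokens
        batch_size = input_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch_size, -1, -1)
        # Prepend the learned prompt to every sequence in the batch.
        return self.model(torch.cat([prompt, input_embeds], dim=1))
```

Only the soft prompt receives gradient updates during training, which is what makes the method parameter-efficient; SPoT (above) builds on this by transferring the learned prompt across tasks.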

nmT5: Is Parallel Data Still Relevant for Pre-training Massively Multilingual Language Models?

Examines whether parallel data still benefits pre-training of massively multilingual language models.