Bridging the Gap for Tokenizer-Free Language Models

Purely character-based language models (LMs) have been lagging in quality on large scale datasets, and current state-of-the-art LMs rely on word tokenization. It has been assumed that injecting the prior knowledge of a tokenizer into the model is …

Character-Level Language Modeling with Deeper Self-Attention

LSTMs and other RNN variants have shown strong performance on character-level language modeling. These models are typically trained using truncated backpropagation through time, and it is common to assume that their success stems from their ability …

Token-Free Language Modeling

How to eliminate segmentation (the last preprocessing step) from NLP models.