The recent introduction of attention mechanisms to deep learning has brought significant gains in quality and interpretability to machine learning models, and its impact on natural language processing has been transformational. With new models such as the Transformer, T5, GPT-x, and BERT, machine understanding of language has accelerated. In this talk, we examine the problems attention aims to address, the intuition behind self-attention mechanisms, and their connections to convolutional and graph neural networks, through the lens of language modeling. We conclude with a discussion of popular self-supervised language models and the scalability challenges facing attention-based models.
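To make the self-attention intuition concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. It is a simplification for illustration only: the projection matrices (`W_q`, `W_k`, `W_v` in real models) are taken as the identity, and multi-head structure, masking, and learned parameters are omitted.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over token vectors X (n_tokens, dim).

    Simplified sketch: queries, keys, and values are all X itself
    (real models apply learned linear projections first).
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # pairwise token similarity
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ X                               # each output is a weighted mix of all tokens

# Toy example: 4 tokens with 8-dimensional embeddings.
X = np.random.default_rng(0).normal(size=(4, 8))
out = self_attention(X)
```

Each output row is a convex combination of all input tokens, which is what lets attention relate distant positions directly, in contrast to the local receptive fields of a CNN.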