Generative AI: Understanding Transformers and Generative Techniques

Venkatesh Subramanian
5 min read · Apr 24, 2024

BERT, GPT-3, GPT-4 … the backbone of these large language models (LLMs) is the transformer. The introduction of transformers marked a significant milestone in natural language processing (NLP). Before transformers, sequence models such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs) were widely used for NLP tasks. Transformers, however, brought about a paradigm shift in how these tasks are approached.

Transformer Architecture

Machines work with numbers, and word embeddings play a crucial role in enabling them to process natural language. Word embeddings represent words as dense vectors in a high-dimensional space, where similar words are mapped to nearby points.

Please check the basic word embedding techniques discussed here
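As a rough sketch of what that looks like in practice, the snippet below uses a toy lookup table (the words and vector values are invented purely for illustration; real models learn embeddings with hundreds of dimensions) and cosine similarity to show that related words sit close together in the embedding space.

```python
import numpy as np

# Toy 4-dimensional embeddings; the numbers are made up for illustration only.
embeddings = {
    "cat": np.array([0.8, 0.1, 0.3, 0.9]),
    "dog": np.array([0.7, 0.2, 0.4, 0.8]),
    "car": np.array([0.1, 0.9, 0.8, 0.2]),
}

def cosine_similarity(a, b):
    """Similarity of two word vectors: close to 1.0 means very similar directions."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Semantically related words end up with nearby vectors.
print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # high (~0.99)
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # lower (~0.39)
```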

In the case of transformers, word embeddings are indeed utilized, but they go beyond simple word representations. Transformers incorporate positional encodings to provide information about the position of words in a sequence. This helps the model distinguish between words based not only on their meaning but also on their position in the sentence or document.
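A minimal sketch of that step, assuming the fixed sinusoidal scheme from the original "Attention Is All You Need" paper and random stand-ins for the token embeddings, might look like this:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sin/cos positional encodings: even dimensions use sine, odd
    dimensions use cosine, with wavelengths growing geometrically with the
    dimension index."""
    positions = np.arange(seq_len)[:, np.newaxis]   # (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]        # (1, d_model)
    angles = positions / np.power(10000, (2 * (dims // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

# Token embeddings (random stand-ins here) plus positional encodings give the
# position-aware input vectors that are fed into the first transformer layer.
seq_len, d_model = 3, 8
token_embeddings = np.random.randn(seq_len, d_model)
inputs = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
print(inputs.shape)  # (3, 8)
```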

Positional Encoding

In the sentences “snake eats frog” and “frog eats snake,” the word “snake” would have different positional embeddings based on its position in each sentence. This helps the transformer model understand that the word “snake” in the first…
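To make that concrete, the sketch below (assuming one token per word and the same sinusoidal scheme as above) computes positional encodings for a three-word sentence and shows that the vector added to "snake" at position 0 differs from the one added at position 2, so the two sentences produce different inputs for the same word.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Same sin/cos scheme as above, repeated so this snippet runs on its own.
    pos = np.arange(seq_len)[:, None]
    dims = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (dims // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

pe = positional_encoding(seq_len=3, d_model=8)

# "snake eats frog" -> "snake" is at position 0
# "frog eats snake" -> "snake" is at position 2
snake_first = pe[0]  # positional vector added to "snake" in the first sentence
snake_last = pe[2]   # positional vector added to "snake" in the second sentence

# Same word embedding, different positional vector -> different model input.
print(np.allclose(snake_first, snake_last))  # False
```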


Written by Venkatesh Subramanian

Product development & Engineering Leader | Software Architect | AI/ML | Cloud computing | https://www.linkedin.com/in/venkatesh-subramanian-377451b4/