Rotary Positional Embedding (RoPE): Motivation and Implementation
Last Updated on June 13, 2024 by Editorial Team
Author(s): Harsh Maheshwari
Originally published on Towards AI.
Delve deeper into RoPE, along with its code, to better understand positional embeddings in LLMs
Photo by Agence Olloweb on Unsplash
Positional embedding plays a crucial role in transformer models by helping them distinguish the order of tokens in a sequence (or sentence). Without positional embedding, a transformer would treat the sentences "My name is Harsh" and "Harsh Name is My" as identical, since self-attention only considers which tokens are present and not where they appear. This blog post assumes that the reader has a basic understanding of transformer models, tokens, and embeddings.
Source: https://arxiv.org/pdf/1706.03762
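To make the claim about word order concrete, here is a minimal pure-Python sketch, assuming a single attention head with random, hypothetical token embeddings and projection weights and no positional information at all. The helper names (rand_matrix, matmul, softmax, self_attention) are illustrative, not from the original post. Capitalization of "Name" is ignored so both sentences share the same tokens. Reordering the tokens only reorders the outputs: each token receives the same vector in both sentences (up to floating-point rounding), so the model cannot tell the two orderings apart.

```python
import math
import random

def rand_matrix(rows, cols):
    """Random Gaussian matrix as a list of rows (hypothetical weights for illustration)."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention; x holds one embedding per token, no positions."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qr, kr)) / scale for kr in k] for qr in q]
    return matmul([softmax(r) for r in scores], v)

random.seed(0)
d = 4
vocab = ["My", "name", "is", "Harsh"]
emb = {tok: rand_matrix(1, d)[0] for tok in vocab}  # hypothetical token embeddings
w_q, w_k, w_v = rand_matrix(d, d), rand_matrix(d, d), rand_matrix(d, d)

sent_a = ["My", "name", "is", "Harsh"]
sent_b = ["Harsh", "name", "is", "My"]
out_a = self_attention([emb[t] for t in sent_a], w_q, w_k, w_v)
out_b = self_attention([emb[t] for t in sent_b], w_q, w_k, w_v)

# Compare the output vector each token receives in the two sentences.
for tok in vocab:
    va, vb = out_a[sent_a.index(tok)], out_b[sent_b.index(tok)]
    gap = max(abs(x - y) for x, y in zip(va, vb))
    print(f"{tok:>5}: max difference = {gap:.1e}")
```

Every printed difference is at the level of floating-point rounding. Adding position-dependent vectors to the embeddings before attention breaks this symmetry, which is exactly the job of positional embedding.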
In this blog, I will highlight the problems with absolute positional embedding and explain how Rotary Positional Embedding (RoPE) was introduced to overcome them. I will also include an implementation of RoPE and end with a few questions you can use for interview preparation or to check that you have understood the material.
The absolute sinusoidal positional embedding is added to the input token embeddings, as shown in the figure above. It is computed with sine and cosine functions of different frequencies: PE(pos, 2i) = sin(pos / 10000^{2i/d_{model}}) and PE(pos, 2i+1) = cos(pos / 10000^{2i/d_{model}}).
Here, pos is the token position, d_{model} is the model's embedding dimension, and i is the dimension index, which ranges over 0, 1, 2, …, d_{model}/2 − 1. The positional embedding has the same dimension as…
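A minimal pure-Python sketch of the sinusoidal table, assuming the standard formulas from Vaswani et al. (2017): PE(pos, 2i) = sin(pos / 10000^{2i/d_{model}}) for even dimensions and PE(pos, 2i+1) = cos(pos / 10000^{2i/d_{model}}) for odd dimensions. The function names are hypothetical and this is not the author's code; a practical version would usually be vectorized with a tensor library.

```python
import math

def sinusoidal_positional_embedding(seq_len, d_model, base=10000.0):
    """Return a seq_len x d_model table (list of rows) of sinusoidal positional embeddings."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even so that sin/cos pairs fill every dimension")
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(d_model // 2):  # i = 0, 1, ..., d_model/2 - 1
            angle = pos / base ** (2 * i / d_model)
            pe[pos][2 * i] = math.sin(angle)      # even dimension 2i uses sine
            pe[pos][2 * i + 1] = math.cos(angle)  # odd dimension 2i+1 uses cosine
    return pe

def add_positional_embedding(token_embeddings):
    """Add the positional table element-wise to a list of token embedding vectors."""
    seq_len, d_model = len(token_embeddings), len(token_embeddings[0])
    pe = sinusoidal_positional_embedding(seq_len, d_model)
    return [[t + p for t, p in zip(tok, pos_vec)] for tok, pos_vec in zip(token_embeddings, pe)]

pe = sinusoidal_positional_embedding(seq_len=4, d_model=8)
print([round(v, 4) for v in pe[0]])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

At position 0 every sine is 0 and every cosine is 1. The frequency 1 / 10000^{2i/d_{model}} shrinks as i grows, so the low dimensions change quickly from one position to the next while the high dimensions change slowly. Because the table has shape seq_len x d_model, it can be summed directly with the token embeddings.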