Advanced Attention Mechanisms — II
Author(s): Arion Das. Originally published on Towards AI.

[Figure: flash attention (from source)]

Flash Attention. You can refer to its predecessors here: KV cache, sliding window attention, MHA, MQA, uptraining, & GQA. These methods were employed to bring down memory and compute requirements, but …