LLM Inference: Data Parallel, Model Parallel, and Pipeline Parallel
Author(s): Tushar Vatsa

Originally published on Towards AI.

Image credit: www.veracity.com

In the previous post, we explored how KV cache optimization affects inference performance. Using the Phi-2 model as an example, we observed that increasing the sequence length led to a near-linear …