How to Train a Seq2Seq Summarization Model Using BERT as Both Encoder and Decoder (BERT2BERT)
Author(s): Ala Alam Falaki

Originally published on Towards AI.

BERT is a well-known and powerful pre-trained “encoder” model. Let’s see how we can use it as a “decoder” to form an encoder-decoder architecture.

Photo by Aaron Burden on Unsplash

The Transformer architecture …