Dense Passage Retrieval (2020) and Contriever (2021): The Models That Paved the Way for Future, Smarter LLMs

Author(s): Saif Ali Kheraj

Originally published on Towards AI.

Dense Passage Retriever (DPR) marked a turning point in open-domain question answering when it launched in 2020. It demonstrated that dense vector representations, learned through deep neural networks, can outperform traditional sparse retrieval methods like BM25 — especially in top-k recall benchmarks. Since DPR’s release, the dense retrieval landscape has evolved rapidly. This post provides a deep dive into the architectures, advancements, and key lessons learned.

This is intended for AI researchers, MLOps engineers, and data scientists building cutting-edge retrieval systems.

Dual-Encoder (Two-Tower) Models

Architecture

A dual-encoder model (also called a bi-encoder or Siamese network) consists of two independent BERT-based encoders: one for queries and one for documents/passages.

Figure by Author

Each encoder maps its input into a fixed-length dense vector:

  • Query embedding: Q = Encoder(query)
  • Document embedding: D = Encoder(document)

These vectors are compared using a similarity score, typically the dot product sim(Q, D) = Q · D or cosine similarity.

Unlike cross-encoders (which compute interactions jointly), dual-encoders do this independently, enabling fast and scalable retrieval. During backpropagation, the gradients from the shared loss flow individually through each encoder, updating both sets of parameters separately, even though they’re part of the same computation graph.
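
To make the two-tower flow concrete, here is a minimal sketch using the pre-trained DPR question and context encoders published on Hugging Face (the checkpoint names and the transformers classes are assumptions about the reader's setup, not part of the original paper's code):

```python
# Minimal sketch: embedding a query and passages with DPR's two encoders,
# then scoring them with a dot product.
import torch
from transformers import (
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
    DPRContextEncoder, DPRContextEncoderTokenizer,
)

q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
p_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
p_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

query = "What is the capital of Italy?"
passages = [
    "Rome is the capital of Italy and is known for the Colosseum.",
    "Paris is the capital of France and a popular tourist destination.",
]

with torch.no_grad():
    Q = q_enc(**q_tok(query, return_tensors="pt")).pooler_output                   # (1, 768)
    D = p_enc(**p_tok(passages, padding=True, return_tensors="pt")).pooler_output  # (2, 768)

scores = Q @ D.T   # dot-product similarity; the query is scored against every passage
print(scores)      # the Rome passage should receive the higher score
```

Because the passage embeddings do not depend on the query, they can be precomputed and indexed offline, which is what makes the dual-encoder design scalable.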

Goal

To maximize retrieval performance in tasks like:

  • Open-domain Question Answering (QA)
  • Dense information retrieval
  • Large-scale document search

Parameters

Typically ~220 million (using two BERT-base encoders)

  • Each encoder has ~110M parameters
  • Can be initialized separately or from the same base model

Embedding Space

Produces two independent embedding spaces for queries and documents. These are aligned during training so that relevant pairs are closer in vector space. Semantic similarity, not lexical overlap, drives retrieval.

Training

Supervised Dataset (Triplet Format):

  • Query (Q): A natural question (e.g., “What is the capital of Italy?”)
  • Positive passage (D⁺): A relevant answer passage (e.g., “Rome is the capital of Italy…”)
  • Negative passages (D⁻): Irrelevant or hard-negative passages (e.g., “Paris is the capital of France…”)

Batch Construction:

Figure by Author using mermaid
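
The figure above can be summarized in code. A hedged sketch of how one training batch might be assembled (field names and examples are illustrative, not taken from the official DPR implementation):

```python
# Each training example pairs a query with one gold (positive) passage and,
# optionally, a BM25-mined hard negative.
batch = [
    {
        "query": "What is the capital of Italy?",
        "positive": "Rome is the capital of Italy and is known for the Colosseum.",
        "hard_negative": "Paris is the capital of France and a popular tourist destination.",
    },
    {
        "query": "Who wrote Hamlet?",
        "positive": "Hamlet is a tragedy written by William Shakespeare.",
        "hard_negative": "Macbeth is another tragedy by William Shakespeare.",
    },
]

# In-batch negatives: for query i, its own positive is the gold passage, while the
# positives and hard negatives of every other example act as negatives "for free".
passages = [ex["positive"] for ex in batch] + [ex["hard_negative"] for ex in batch]
labels = list(range(len(batch)))  # query i's positive sits at index i of `passages`
```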

What the Model Learns

Semantic alignment, not just keyword matching. The model learns to recognize:

  • Synonyms
  • Paraphrases
  • Semantic relations

In doing so, it trains a ranking function that distinguishes relevant from irrelevant documents.

Loss Function

A softmax-based cross-entropy loss over the similarity scores, i.e. the negative log-likelihood of the positive passage:

L = −log [ exp(sim(Q, D⁺)) / Σᵢ exp(sim(Q, Dᵢ)) ]

Figure by Author

Where:

  • D⁺ = the positive passage
  • Dᵢ = all passages in the batch (including negatives)

Objective: Maximize similarity between query and positive passage, minimize similarity to all negative passages.
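
A minimal PyTorch sketch of this objective, assuming `q_emb` and `p_emb` are the batch of query and positive-passage embeddings produced by the two encoders (random tensors stand in for real encoder outputs here):

```python
import torch
import torch.nn.functional as F

q_emb = torch.randn(4, 768)   # 4 query embeddings (stand-in for the query encoder output)
p_emb = torch.randn(4, 768)   # their 4 positive passages, in the same order

scores = q_emb @ p_emb.T                 # (4, 4) similarity matrix
targets = torch.arange(scores.size(0))   # the diagonal entries are the positives
loss = F.cross_entropy(scores, targets)  # softmax over each row, -log prob of the positive
```

Every off-diagonal entry of the score matrix acts as an in-batch negative, which is exactly the batch construction described above.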

Zero-Shot Retrieval Use Case

DPR works especially well for Open-Domain Question Answering tasks — for example, using the Natural Questions dataset.

Example:

  • Query: “What is the capital of Italy?”
  • Positive passage (D⁺): “Rome is the capital of Italy and is known for the Colosseum.”
  • Negative passage (D⁻): “Paris is the capital of France and a popular tourist destination.”

During training, DPR learns how to match questions with correct answers — not by memorizing exact pairs, but by learning general patterns.
As a result, even if it never saw this exact question during training, it can still retrieve the right answer at test time. This is called zero-shot retrieval: the model handles new, unseen questions without being retrained. Open QA tasks are ideal for testing this ability because they contain diverse, open-ended questions, making them a good probe of zero-shot generalization.
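
In a real system, zero-shot retrieval over a corpus is just nearest-neighbour search in the embedding space. A hedged sketch using FAISS (FAISS is not discussed above, and `embed_queries` / `embed_passages` are hypothetical helpers standing in for the DPR encoders from the earlier sketch):

```python
import faiss
import numpy as np

corpus = [
    "Rome is the capital of Italy and is known for the Colosseum.",
    "Paris is the capital of France and a popular tourist destination.",
    "Mount Everest is the highest mountain above sea level.",
]

passage_vecs = embed_passages(corpus).astype(np.float32)   # (3, 768), hypothetical helper
index = faiss.IndexFlatIP(passage_vecs.shape[1])           # inner-product (dot-product) index
index.add(passage_vecs)

query_vec = embed_queries(["What is the capital of Italy?"]).astype(np.float32)  # (1, 768)
scores, ids = index.search(query_vec, k=2)
print([corpus[i] for i in ids[0]])   # the Rome passage should rank first
```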

Benefits and Limitations

Figure by Author

Evaluation

BEIR Benchmark (Zero-Shot Evaluation)

DPR’s performance on the BEIR benchmark, which assesses zero-shot retrieval capabilities across diverse datasets, varies:

  • Average nDCG@10: Approximately 39–40%

Note: While DPR excels in QA-specific datasets, its zero-shot performance on BEIR is moderate, highlighting challenges in generalizing to diverse domains without fine-tuning.

Shared Encoder Models: Contriever (2021)

Contriever, released by Meta AI in 2021, marked a significant evolution in dense retrieval by moving from supervised, dual-encoder architectures (like DPR) to unsupervised, shared encoder models. Unlike DPR, which uses separate encoders for queries and documents, Contriever uses a single shared BERT encoder for both, trained without labeled QA data.

This shift enabled powerful zero-shot generalization, outperforming supervised models like DPR on BEIR without any fine-tuning. This section explores Contriever’s architecture, training method, inference setup, and performance.

Architecture: Shared Encoder (Siamese Network)

Contriever uses one shared encoder:

  • Single BERT-base model for both query and document
  • Input agnostic: treats all inputs as generic text (no special query/document role)

Figure by Author

Similarity is computed using cosine or dot product between embeddings. This design forces the embeddings to live in a unified semantic space, improving generalization across tasks.
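
A minimal sketch of using the shared encoder for both roles, assuming the facebook/contriever checkpoint on Hugging Face and the mean-pooling scheme its model card describes:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/contriever")
model = AutoModel.from_pretrained("facebook/contriever")

def embed(texts):
    # The same encoder handles queries and documents: inputs are just "text".
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        token_emb = model(**inputs).last_hidden_state          # (batch, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    return (token_emb * mask).sum(dim=1) / mask.sum(dim=1)     # mean pooling over tokens

q = embed(["Benefits of intermittent fasting"])
d = embed(["Time-restricted eating improves insulin sensitivity."])
score = (q @ d.T).item()   # dot-product similarity in the single shared space
```

Note that, unlike the DPR sketch earlier, there is only one model object: query and document vectors land in the same space by construction.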

Goal

  • Train a general-purpose retriever that works zero-shot on diverse domains
  • Avoid dependence on QA-specific supervision
  • Achieve strong performance on BEIR and domain-shifted tasks

Parameters

A single BERT-base encoder (~110M parameters), roughly half the compute and storage cost of DPR's two BERT encoders.

Embedding Space

A shared space for both queries and documents. No role-specific specialization. Embeddings capture generic semantic structure, useful for broad retrieval tasks.

Training

Training Data:

No human supervision or QA pairs. Training uses a generic corpus such as Wikipedia or Common Crawl.

Method: Unsupervised Contrastive Pretraining

Follows the InfoNCE contrastive loss:

L = −log [ exp(sim(x, x⁺)/τ) / Σᵢ exp(sim(x, xᵢ)/τ) ]

where τ is a temperature hyperparameter and the sum runs over the positive and all other texts in the batch.

Figure by Author

  • Positives (x⁺): Augmented views of the same passage (e.g., span masking, cropping)
  • Negatives (x⁻): Other texts in the batch

Figure by Author

Data Augmentation (sketched in code below):

  • Random span masking (30%)
  • Independent (random) cropping
  • Dropout noise (p=0.1)
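
A rough sketch of how such a positive pair might be built from a single passage (this is illustrative only, with made-up parameters; it is not the official Contriever data pipeline):

```python
import random

def make_view(tokens, crop_ratio=0.5, mask_prob=0.3, mask_token="[MASK]"):
    # Random crop: keep one contiguous span of the passage.
    span_len = max(1, int(len(tokens) * crop_ratio))
    start = random.randint(0, len(tokens) - span_len)
    view = tokens[start:start + span_len]
    # Random token masking on the cropped view.
    return [mask_token if random.random() < mask_prob else t for t in view]

passage = "Time-restricted eating improves insulin sensitivity in several studies".split()
view_a = make_view(passage)   # two independently augmented views of the same passage
view_b = make_view(passage)   # form a positive pair; other passages in the batch are negatives
```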

What the Model Learns

Semantic similarity via data augmentation, not explicit answers. Learns to embed similar texts close together. Excellent zero-shot capabilities due to general-purpose alignment.

Loss Function

InfoNCE (contrastive) loss: embedding similarity is maximized for augmented pairs and minimized for the other texts in the batch.
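
The mechanics are close to the DPR loss sketched earlier, with a temperature added and augmented views replacing query/passage pairs (the temperature value below is a placeholder, not the paper's setting):

```python
import torch
import torch.nn.functional as F

def info_nce(view_a, view_b, temperature=0.05):
    # view_a, view_b: (batch, dim) embeddings of two augmented views of the same passages.
    scores = view_a @ view_b.T / temperature   # (batch, batch) similarity matrix
    targets = torch.arange(scores.size(0))     # the matching view is the positive
    return F.cross_entropy(scores, targets)    # the rest of the batch acts as negatives

loss = info_nce(torch.randn(8, 768), torch.randn(8, 768))
```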

Zero-Shot Use Case

Contriever is trained to retrieve relevant supporting facts or arguments, even in new domains (such as biomedical, legal, or news), without requiring domain-specific supervision.

Example:

  • Input 1: “Benefits of intermittent fasting”
  • Input 2 (positive passage): “Time-restricted eating improves insulin sensitivity.”

Like DPR, Contriever learns semantic similarity patterns using contrastive learning. However, Contriever's training does not rely on QA-specific supervision; it uses unsupervised data augmentation instead. As a result, it can generalize better in zero-shot settings, especially to domains it was never fine-tuned on. Contriever also uses a shared encoder: both queries and documents pass through the same model, making it more flexible for general-purpose retrieval.

Benefits and Limitations

Table by Author

Shared Encoder vs. Separate Encoders

Dual-encoder setups can either use the same underlying network (shared weights) for queries and documents or use two separate networks. DPR originally used separate BERT encoders for questions and passages (allowing them to specialize), whereas some later models (like Contriever) use one BERT model for both queries and docs (tying weights).

Figure by Author

Shared encoders enforce the embeddings to live in one common space, which can improve zero-shot transfer (the model sees generic text regardless of role). However, separate encoders can accommodate query/document length differences or specialized training (e.g. questions vs. long docs).

In practice, many systems keep separate encoder instances but initialize them identically or from the same pre-trained model. The key is ensuring the two embeddings are compatible — query vectors and doc vectors must lie in the same semantic space to be comparable. Community discussions often stress avoiding encoder mismatch, i.e. not using different model families or training objectives for query vs. doc, since that would yield incomparable embeddings.
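
The difference between the two setups is small in code but significant in behaviour. A hedged sketch (bert-base-uncased is just an illustrative checkpoint; any compatible pre-trained encoder could be substituted):

```python
from transformers import AutoModel

# Separate encoders (DPR-style): two instances, initialized from the same checkpoint,
# then fine-tuned independently so queries and passages can specialize.
query_encoder = AutoModel.from_pretrained("bert-base-uncased")
passage_encoder = AutoModel.from_pretrained("bert-base-uncased")

# Shared encoder (Contriever-style): one instance reused for both roles,
# tying the weights and forcing a single common embedding space.
shared_encoder = AutoModel.from_pretrained("bert-base-uncased")
query_tower = passage_tower = shared_encoder
```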

While both shared encoder models and DPR (Dense Passage Retriever) aim to improve dense retrieval for open-domain question answering, their training approaches differ due to their architectural designs.

Performance Comparison

Figure by Author

Conclusion

The evolution from DPR to Contriever represents a fundamental shift in how we approach dense retrieval. DPR demonstrated that dense representations could outperform sparse methods, but required supervised training data. Contriever showed that unsupervised training with a shared encoder could achieve even better zero-shot performance across diverse domains.

These models laid the foundation for modern retrieval systems used in RAG pipelines and continue to influence the development of more sophisticated retrieval architectures. The key insights — that semantic similarity matters more than lexical overlap, that unsupervised pretraining can be surprisingly effective, and that encoder alignment is crucial — remain relevant as we build the next generation of retrieval systems.

Future work in this space continues to build on these foundational concepts, exploring reasoning-aware retrieval, multi-modal approaches, and hybrid architectures that combine the best of both supervised and unsupervised methods.

References

[1] https://aclanthology.org/2020.emnlp-main.550/

[2] https://arxiv.org/pdf/2112.09118

[3] https://github.com/facebookresearch/contriever

[4] https://huggingface.co/facebook/contriever

[5] https://arxiv.org/abs/2112.09118

[6] https://genai-course.jding.org/rag/index.html

Published via Towards AI