Mastering Document Intelligence: Navigate LLM Hallucinations with the LLM Challenge Framework!

Last Updated on January 6, 2025 by Editorial Team

Author(s): Dr. Sreeram Mullankandy

Originally published on Towards AI.

Photo by Google DeepMind on Unsplash

Picture this: your company just processed 10,000 insurance claims using LLMs. Everything seems efficient until you discover that 5% of the processed data contains subtle but critical errors. That's 500 potentially problematic cases that could lead to incorrect payouts, compliance issues, or worse, loss of customer trust.

Welcome to the real world of AI-powered document processing, where hallucinations (an LLM's tendency to generate plausible but incorrect information) pose a significant business risk.

But is that the LLM’s fault?!

The LLM has no "hallucination problem". Hallucination is not a bug, it is LLM's greatest feature. The LLM Assistant (like ChatGPT) has a hallucination problem, and we should fix it.

– Andrej Karpathy

The $3 Trillion a Year Problem

According to Harvard Business Review, bad data costs the U.S. an estimated $3 trillion per year [1], driven in large part by the inability to extract data accurately. Traditional methods offer a difficult choice:

  • Manual processing: Accurate but painfully slow and expensive
  • OCR-based systems: Fast, but inaccurate due to lack of context
  • Rule-based systems: Rigid, error-prone, and require constant updates
  • LLM-based systems: Fast and context-aware, but prone to errors due to hallucinations

Introducing the LLM Challenge Framework

What if you could combine the speed of AI with human-level accuracy? That is exactly what the LLM Challenge Framework aims to achieve, and exactly what we tried out. By pitting multiple LLMs against each other and incorporating strategic human oversight, we were able to deliver:

  • 99% accuracy in field-level data extraction
  • 70% reduction in human review time
  • 85% cost savings compared to manual processing

How It Works

Let's break down the magic (a simplified code sketch of the full pipeline follows these steps):

  1. Document pre-processing:
    – Ingest the document to generate images or text of consistent quality.
    – Convert to images if you are using visual LLMs (our choice after trial and error).
    – Convert the document to text (using OCR) if you are using text-based LLMs.
  2. Dual AI processing:
    – Two independent visual LLMs process the same document.
    – Each LLM generates structured output.
    – The results are automatically compared with each other.
  3. Smart verification:
    – Matching results = high confidence → automatic approval.
    – Discrepancies = targeted human review via a human-in-the-loop approach.
  4. Reinforcement learning:
    – Collect human input (feedback) to identify the correct response.
    – Use the collected data to further fine-tune the LLMs.

Image by Author
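
To make the steps concrete, here is a minimal Python sketch of the challenge-and-verify loop. It is an illustration under stated assumptions, not our production code: the `extract_with_llm_*` functions are hypothetical stand-ins for the two deployed visual LLM endpoints, pdf2image is used only as an example pre-processing library, and the field schema is simply whatever you prompt both models to return.

```python
# Minimal, illustrative sketch of the challenge-and-verify loop (not production code).
# Assumes pdf2image (with poppler installed) for pre-processing and two model
# endpoints behind the placeholder functions below.
from pdf2image import convert_from_path


def extract_with_llm_a(page_image) -> dict:
    """Placeholder: call the first visual LLM; must return a fixed JSON schema."""
    raise NotImplementedError


def extract_with_llm_b(page_image) -> dict:
    """Placeholder: call the second, independent visual LLM with the same schema."""
    raise NotImplementedError


def compare_fields(a: dict, b: dict) -> tuple[dict, dict]:
    """Step 3: split fields into agreed values and discrepancies needing review."""
    agreed, disputed = {}, {}
    for field in a.keys() | b.keys():
        value_a, value_b = a.get(field), b.get(field)
        if value_a is not None and value_a == value_b:
            agreed[field] = value_a                                 # match -> auto-approve
        else:
            disputed[field] = {"llm_a": value_a, "llm_b": value_b}  # mismatch -> human review
    return agreed, disputed


def process_document(pdf_path: str, review_queue: list) -> dict:
    record = {}
    for page in convert_from_path(pdf_path, dpi=200):   # step 1: consistent page images
        out_a = extract_with_llm_a(page)                 # step 2: dual AI processing
        out_b = extract_with_llm_b(page)
        agreed, disputed = compare_fields(out_a, out_b)
        record.update(agreed)
        if disputed:
            review_queue.append({"doc": pdf_path, "fields": disputed})
    # Step 4: corrections coming back from the review queue are logged and later
    # reused as fine-tuning data for both models.
    return record
```

The key design choice is that both models must return the same fixed schema: disagreement at the field level, not any single model's self-reported confidence, is what triggers human review.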

Why It’s Better Than Existing Solutions

Here is a high-level comparison of the current mainstream approaches to this problem.

Image by Author

Implementation

We used an AWS-based, cloud-native architecture:

  • AWS SageMaker to build, train, and deploy the LLMs.
  • NVIDIA L40 GPUs for compute. We went with the L40 instead of the A100 or H100 to be cost-efficient, but it came with its own limitations.
  • Qwen (Qwen2-VL-2B) and InternVL (InternViT-6B) as the two competing visual LLMs. We tried other models too (including Gemma and LLaMA) but settled on Qwen and InternVL given the memory capacity of the L40 GPUs and the accuracy of the results. A minimal extraction sketch with one of the challengers follows below.
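
Here is a minimal sketch of how one of the challengers could be driven for field extraction. It follows the public Hugging Face model card recipe for Qwen2-VL-2B-Instruct rather than our SageMaker deployment, and the prompt, field names, and image path are hypothetical placeholders.

```python
# pip install torch transformers accelerate qwen-vl-utils
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

MODEL_ID = "Qwen/Qwen2-VL-2B-Instruct"

# Load the 2B visual LLM and its processor; a single L40 has ample memory for this.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Hypothetical prompt: ask for a fixed JSON schema so the two challengers'
# outputs can be diffed field by field downstream.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "file:///data/claims/claim_0001_page1.png"},
        {"type": "text", "text": (
            "Extract the following fields from this insurance claim and return only JSON: "
            "policy_number, claimant_name, date_of_service, total_amount."
        )},
    ],
}]

# Standard Qwen2-VL preprocessing: chat template plus vision inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

# Greedy decoding keeps the output deterministic, so disagreement between the
# two challengers is a cleaner signal than sampling noise.
generated_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```

The second challenger (InternVL) would be wrapped behind the same prompt and JSON schema so that the verification step can diff the two outputs directly.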

Potential Improvements

Here are some of the additional approaches that could yield better results.

  • Multiple LLM Integration: Expanding beyond two LLMs to increase confidence levels (see the voting sketch after this list).
  • Model Diversity: Optimizing the mix of large and small models. Keep in mind that the larger models may need GPUs with larger RAM, which could bump up the cost.
  • Multi-modal Capabilities: Leveraging models capable of processing both text and images.
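
For the first of these, moving from two challengers to N, the pairwise comparison generalizes to per-field voting. Below is a minimal sketch under the same assumption that every model returns one flat JSON object per document; the agreement threshold of 2 is an arbitrary illustration, not a value we benchmarked.

```python
from collections import Counter


def majority_vote(outputs: list[dict], min_agree: int = 2) -> tuple[dict, dict]:
    """Resolve each field across N model outputs; low agreement goes to human review."""
    accepted, disputed = {}, {}
    all_fields = set().union(*(o.keys() for o in outputs)) if outputs else set()
    for field in all_fields:
        votes = Counter(o[field] for o in outputs if o.get(field) is not None)
        if not votes:
            disputed[field] = None
            continue
        value, count = votes.most_common(1)[0]
        if count >= min_agree:
            accepted[field] = value        # enough models agree -> auto-approve
        else:
            disputed[field] = dict(votes)  # no consensus -> route to review queue
    return accepted, disputed


# Example with three hypothetical model outputs for one document:
a = {"policy_number": "PN-123", "total_amount": "412.50"}
b = {"policy_number": "PN-123", "total_amount": "412.60"}
c = {"policy_number": "PN-123", "total_amount": "412.50"}
print(majority_vote([a, b, c]))  # both fields accepted with the majority value; nothing disputed
```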

ROI Metrics

Here are some of the metrics that we considered while operationalizing this workflow:
– Accuracy (field-level): 99%
– Throughput: 85% faster document processing
– Cost reduction: 70% lower processing costs
– Payback period: 6 months

Relevance

Industries that deal with high volumes of critical documents, where accuracy is paramount and errors can have significant financial or legal consequences, are ideal candidates for the LLM Challenge Framework. Here are some of them:

1. Healthcare & Insurance: Processing medical claims, clinical notes, and insurance policies to extract diagnosis codes, treatment plans, and coverage details while maintaining HIPAA compliance.

2. Banking & Financial Services: Automating loan application processing, KYC documents, and financial statements to expedite customer onboarding and risk assessment.

3. Legal Services: Analyzing contracts, court documents, and legal correspondence to extract key clauses, deadlines, and legal obligations with high accuracy.

4. Supply Chain & Logistics: Processing bills of lading, customs declarations, and shipping manifests to extract shipment details, compliance information, and tracking data.

5. Manufacturing: Processing quality control reports, safety certificates, and compliance documentation to extract specifications and maintain regulatory standards.

The Bottom Line

In a world where data accuracy can make or break a business, the LLM Challenge Framework offers a revolutionary solution. It's not just about processing documents faster; it's about processing them right.

By combining the power of multiple LLMs with strategic human oversight, we can deliver unprecedented accuracy while maintaining the speed and cost benefits of automation.

The question isn't whether to modernize your document processing; it's whether you can afford not to.

Reference:

[1] T. Redman, Bad Data Costs the U.S. $3 Trillion Per Year (2016), Harvard Business Review


Published via Towards AI
