Automating Product Matching with LLMs: A Step-by-Step Guide

Author(s): Taha Azizi

Originally published on Towards AI.

A Deep Dive into Intelligent Product Matching for E-commerce and Supply Chain Efficiency

In today’s fast-paced digital economy, businesses constantly grapple with vast amounts of data, especially when managing product inventories from diverse suppliers. The challenge of accurately matching external product lists with internal catalogs can be a significant bottleneck, often relying on time-consuming, error-prone manual processes. Imagine a scenario where new shipments arrive weekly, each introducing new products that need to be seamlessly integrated into your existing system. This isn’t just a hypothetical; it’s a real problem facing many stakeholders, including those operating convenience-store-like markets.

This article delves into an intelligent, automated solution designed to tackle this very problem. We’ll explore how a combination of data engineering, advanced NLP techniques, and Large Language Models (LLMs) can create a robust system for exact product matching, focusing on manufacturer, name, and size. This approach, exemplified by a recent project on product matching, aims to maximize accuracy while minimizing human intervention.

The Core Challenge: Exact Product Matching

Our objective is precise: to map external supplier products to internal market products only when their manufacturer, name, and size are identical. This strict requirement often makes traditional matching methods fall short, as even minor discrepancies can lead to mismatches.

For instance, consider these examples of correct and incorrect matches:

Correct Matches:

| External_Product_Name | Internal_Product_Name |
| :--- | :--- |
| DIET LIPTON GREEN TEA W/ CITRUS 20 OZ | Lipton Diet Green Tea with Citrus (20oz) |
| CH-CHERRY CHS CLAW DANISH 4.25 OZ | Cloverhill Cherry Cheese Bearclaw Danish (4.25oz) |

Wrong Matches:

| External_Product_Name | Internal_Product_Name |
| :--- | :--- |
| Hersheys Almond Milk Choco 1.6 oz | Hersheys Milk Chocolate with Almonds (1.85oz) |
| COOKIE PEANUT BUTTER 2OZ | Famous Amos Peanut Butter Cookie (2oz) |

The subtle differences, like “1.6 oz” vs. “1.85oz” or “Almond Milk” vs. “Milk Chocolate,” are critical.
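One way to make the size comparison concrete is to pull the ounce quantity out of each name and compare the numbers. The sketch below is illustrative only: the regular expression, the helper names `extract_size_oz` and `sizes_agree`, and the assumption that every size is given in ounces are not taken from the original project.

```python
import re
from typing import Optional

# A number followed by "oz", with or without a space: "20 OZ", "(4.25oz)", "2OZ".
_SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*oz\b", re.IGNORECASE)


def extract_size_oz(name: str) -> Optional[float]:
    """Return the first ounce quantity found in a product name, or None."""
    match = _SIZE_RE.search(name)
    return float(match.group(1)) if match else None


def sizes_agree(external: str, internal: str) -> bool:
    """True only when both names state a size and the two sizes are equal."""
    a, b = extract_size_oz(external), extract_size_oz(internal)
    return a is not None and b is not None and a == b


pairs = [
    ("DIET LIPTON GREEN TEA W/ CITRUS 20 OZ", "Lipton Diet Green Tea with Citrus (20oz)"),
    ("Hersheys Almond Milk Choco 1.6 oz", "Hersheys Milk Chocolate with Almonds (1.85oz)"),
    ("COOKIE PEANUT BUTTER 2OZ", "Famous Amos Peanut Butter Cookie (2oz)"),
]
for external, internal in pairs:
    print(external, "|", internal, "|", sizes_agree(external, internal))
```

On these examples the Lipton pair agrees on size and the Hersheys pair does not. The cookie pair also agrees on size even though it is listed as a wrong match, so a size check alone cannot decide an exact match.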

Step 1: Data Understanding and Preprocessing

Any robust AI solution begins with a thorough understanding and cleaning of the data. We start with two CSV files: Data_Internal.csv and Data_External.csv.

Initial exploration reveals the structure and content of our product lists. Key columns include NAME, OCS_NAME, and LONG_NAME in the internal data, and PRODUCT_NAME in the external data.
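A minimal loading sketch follows. It uses the standard `csv` module and in-memory stand-ins for the two files so that it runs anywhere. The column names come from the article, while the row values, the `load_rows` helper, and the whitespace stripping are assumptions for illustration.

```python
import csv
import io


def load_rows(handle):
    """Read a CSV into a list of dicts, stripping whitespace around each value."""
    return [
        {key: (value or "").strip() for key, value in row.items()}
        for row in csv.DictReader(handle)
    ]


# In-memory stand-ins for Data_Internal.csv and Data_External.csv.
# Column names are from the article; the row values are made up.
internal_file = io.StringIO(
    "NAME,OCS_NAME,LONG_NAME\n"
    "Diet Green Tea, DIET GRN TEA ,Lipton Diet Green Tea with Citrus (20oz)\n"
)
external_file = io.StringIO(
    "PRODUCT_NAME\n"
    "DIET LIPTON GREEN TEA W/ CITRUS 20 OZ\n"
)

internal_rows = load_rows(internal_file)
external_rows = load_rows(external_file)

print(list(internal_rows[0].keys()))
print(external_rows[0]["PRODUCT_NAME"])
```

With the real files, the same helper can be called on a handle opened with `open("Data_Internal.csv", newline="", encoding="utf-8")`.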

Step 2: The Multi-Layered Matching Strategy

We employ a phased approach, progressively increasing the sophistication of our matching algorithms.

2.1 Attempt 1: Exact Matching — The Baseline

Our first instinct is always to check for perfect, direct matches. We attempt to find PRODUCT_NAME in the external data that precisely matches NAME or LONG_NAME in the internal data.
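The baseline can be sketched as a dictionary lookup. The function name `exact_match` and the sample rows are hypothetical. The comparison is deliberately a plain string equality with no normalization, which is how the step is described above.

```python
from typing import Dict, Iterable, List, Optional


def exact_match(
    external_names: Iterable[str],
    internal_rows: List[Dict[str, str]],
    columns=("NAME", "LONG_NAME"),
) -> Dict[str, Optional[Dict[str, str]]]:
    """Map each external name to the internal row whose NAME or LONG_NAME equals it."""
    lookup: Dict[str, Dict[str, str]] = {}
    for row in internal_rows:
        for column in columns:
            value = row.get(column, "")
            if value:
                # Keep the first row seen for a given string.
                lookup.setdefault(value, row)
    return {name: lookup.get(name) for name in external_names}


# Made-up rows for illustration only.
internal_rows = [
    {"NAME": "Lipton Diet Green Tea", "LONG_NAME": "Lipton Diet Green Tea with Citrus (20oz)"},
]
external_names = [
    "Lipton Diet Green Tea with Citrus (20oz)",  # identical string
    "DIET LIPTON GREEN TEA W/ CITRUS 20 OZ",     # same product, different wording
]

matches = exact_match(external_names, internal_rows)
for name, row in matches.items():
    print(name, "->", row["LONG_NAME"] if row else None)
```

The second external name refers to the same product but is not found, which is the gap the later stages try to close.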

2.2 Attempt 2: Fuzzy Matching — Embracing Variations

Real-world data rarely offers perfect consistency. Product names often have minor spelling errors, abbreviations, or reorderings. Fuzzy matching accounts for these variations by calculating a similarity score between strings. We utilize rapidfuzz's token_set_ratio, which is robust to word order and missing words.
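With rapidfuzz installed, the score is obtained by calling `fuzz.token_set_ratio(a, b)`, which returns a value between 0 and 100. To show the idea without that dependency, the sketch below is a rough standard-library approximation built on `difflib`. It is not rapidfuzz's implementation, its scores will differ, and the `threshold` value of 80 is an arbitrary assumption.

```python
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> float:
    """Similarity of two strings on a 0 to 100 scale."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def token_set_score(a: str, b: str) -> float:
    """Approximate token-set similarity: ignores word order and duplicate words."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    common = " ".join(sorted(tokens_a & tokens_b))
    only_a = " ".join(sorted(tokens_a - tokens_b))
    only_b = " ".join(sorted(tokens_b - tokens_a))
    # One token set fully contained in the other counts as a perfect score.
    if common and (not only_a or not only_b):
        return 100.0
    combined_a = (common + " " + only_a).strip()
    combined_b = (common + " " + only_b).strip()
    return max(
        _ratio(common, combined_a),
        _ratio(common, combined_b),
        _ratio(combined_a, combined_b),
    )


def best_fuzzy_match(query, candidates, threshold=80.0):
    """Return (best_candidate, score), or (None, score) when below the threshold."""
    scored = [(token_set_score(query, c), c) for c in candidates]
    score, best = max(scored, default=(0.0, None))
    return (best, score) if score >= threshold else (None, score)


print(token_set_score("green tea lipton", "Lipton Green Tea"))
print(token_set_score("lipton green tea", "lipton green tea 20oz"))
```

Both calls score 100. The second one illustrates the limitation for this task: a name with no size still scores perfectly against a name with a size, so a high fuzzy score does not guarantee an exact match on manufacturer, name, and size.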

2.3 Attempt 3: Vector Database Matching — Understanding Semantics

To move beyond superficial string comparisons, we leverage the power of semantic understanding through embedding models. We transform product names into high-dimensional numerical vectors, where similar products are represented by vectors that are close in space.

The embeddings are created with the 'all-MiniLM-L6-v2' model, loaded through the SentenceTransformer class of the sentence-transformers library. A FAISS (Facebook AI Similarity Search) index is then built over them for efficient similarity search.
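The retrieval pattern (embed every internal name, index the vectors, then fetch the top K neighbours of each external name) can be shown with a dependency-free stand-in. To be clear about the substitution: the sketch below replaces the sentence embedding model with toy character-trigram counts and replaces FAISS with a brute-force cosine search. The names `embed`, `cosine`, and `BruteForceIndex` are hypothetical, and the results are not comparable to those of a real embedding model.

```python
import math
from collections import Counter


def embed(text: str, n: int = 3) -> Counter:
    """Toy embedding: counts of character n-grams in the lowercased text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class BruteForceIndex:
    """Stores one vector per name and scans all of them at query time."""

    def __init__(self, names):
        self.names = list(names)
        self.vectors = [embed(name) for name in self.names]

    def search(self, query: str, k: int = 5):
        """Return up to k (name, similarity) pairs, most similar first."""
        q = embed(query)
        scored = sorted(
            ((cosine(q, vec), name) for vec, name in zip(self.vectors, self.names)),
            reverse=True,
        )
        return [(name, score) for score, name in scored[:k]]


index = BruteForceIndex([
    "Lipton Diet Green Tea with Citrus (20oz)",
    "Famous Amos Peanut Butter Cookie (2oz)",
    "Hersheys Milk Chocolate with Almonds (1.85oz)",
])
for name, score in index.search("DIET LIPTON GREEN TEA W/ CITRUS 20 OZ", k=2):
    print(round(score, 3), name)
```

A brute-force scan is fine for a few thousand names. The point of an index such as FAISS is to keep this lookup fast as the catalog grows.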

2.4 Attempt 4: LLM-Enhanced Vector Matching — The Power of Prompt Engineering

This is where the true “intelligent” aspect comes into play. We combine the efficiency of vector similarity search with the nuanced understanding of Large Language Models (LLMs). The idea is to use the vector database to retrieve a small set of highly relevant candidates, and then use an LLM to perform a fine-grained, rule-based validation on these candidates.

We define a prompt template that explicitly asks the LLM to compare an external product name with a potential internal match and determine if they are an exact match, considering manufacturer, name, and size. The LLM used here is ‘gemma3:27b’.
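The article does not reproduce the prompt, so the following is only a sketch of what such a validation step might look like. The prompt wording, the YES/NO reply convention, the function names, and the `j` cutoff are all assumptions. The model call is passed in as a plain function (`ask_llm`) so that the example does not depend on any particular client library or on how the model is served.

```python
from typing import Callable, Iterable, Optional

# Hypothetical prompt wording; the project's actual template is not shown in the article.
PROMPT_TEMPLATE = (
    "You are checking product listings.\n"
    "External product: {external}\n"
    "Candidate internal product: {internal}\n"
    "Do these refer to exactly the same product, with the same manufacturer, "
    "name, and size? Answer with a single word: YES or NO."
)


def is_exact_match(external: str, internal: str, ask_llm: Callable[[str], str]) -> bool:
    """Ask the model about one pair and interpret a reply starting with YES as a match."""
    reply = ask_llm(PROMPT_TEMPLATE.format(external=external, internal=internal))
    return reply.strip().upper().startswith("YES")


def validate_candidates(
    external: str,
    candidates: Iterable[str],
    ask_llm: Callable[[str], str],
    j: int = 3,
) -> Optional[str]:
    """Check at most j retrieved candidates in order and return the first accepted one."""
    for candidate in list(candidates)[:j]:
        if is_exact_match(external, candidate, ask_llm):
            return candidate
    return None
```

In use, `candidates` would be the names returned by the vector search, ordered by similarity, and `ask_llm` would wrap whatever client sends the prompt to the model. Limiting the loop to `j` candidates keeps the number of model calls per external product bounded.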

2.5 Attempt 5: Few-Shot Prompting — Learning from Examples (Unsuccessful)

To address the remaining inaccuracies, especially concerning size discrepancies, few-shot prompting was attempted. This involves providing the LLM with a few examples of correct and incorrect matches directly within the prompt, guiding its reasoning process.

Result: Few-shot prompting did not yield satisfactory results in this specific scenario. This might be due to the subtle nature of the differences or the limited number of examples provided.
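For reference, a few-shot prompt of the kind described here can be assembled as follows. The four example pairs are the ones shown in the tables earlier in this article. The prompt layout and the function name `build_few_shot_prompt` are assumptions, since the prompt actually used is not reproduced.

```python
# Example pairs taken from the correct and wrong matches shown earlier.
FEW_SHOT_EXAMPLES = [
    ("DIET LIPTON GREEN TEA W/ CITRUS 20 OZ", "Lipton Diet Green Tea with Citrus (20oz)", "YES"),
    ("CH-CHERRY CHS CLAW DANISH 4.25 OZ", "Cloverhill Cherry Cheese Bearclaw Danish (4.25oz)", "YES"),
    ("Hersheys Almond Milk Choco 1.6 oz", "Hersheys Milk Chocolate with Almonds (1.85oz)", "NO"),
    ("COOKIE PEANUT BUTTER 2OZ", "Famous Amos Peanut Butter Cookie (2oz)", "NO"),
]


def build_few_shot_prompt(external: str, internal: str, examples=FEW_SHOT_EXAMPLES) -> str:
    """Prepend worked examples to the pair under test and leave the answer open."""
    lines = [
        "Decide whether two product names describe exactly the same product "
        "(same manufacturer, name, and size).",
        "",
    ]
    for example_external, example_internal, answer in examples:
        lines += [
            f"External: {example_external}",
            f"Internal: {example_internal}",
            f"Answer: {answer}",
            "",
        ]
    lines += [f"External: {external}", f"Internal: {internal}", "Answer:"]
    return "\n".join(lines)


print(build_few_shot_prompt("SNICKERS BAR 1.86 OZ", "Snickers Bar (1.86oz)"))
```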

2.6 Attempt 6: Sequential Prompting — Double-Checking for Size Accuracy

Since few-shot prompting didn’t resolve the size issue, a sequential LLM approach was implemented. After the initial LLM validation, a second LLM call was made specifically to verify size compatibility. If a potential match passed the first LLM check but failed the second size-specific check, it was then nulled out.

This involves sending potential matches to another LLM prompt (prompt_size.txt) designed solely for size verification.

The culmination of this process is a table that lists every external item and its corresponding matched internal product. If no exact match is found (after all stages of validation, including LLM checks), the internal product column will be NULL. This directly fulfills the stakeholder's requirement for an automated mapping system.
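The second pass and the final table can be sketched together. The contents of prompt_size.txt are not shown in the article, so the `SIZE_PROMPT` text below is a placeholder, and `confirm_size`, `build_mapping`, and the injected `ask_llm` function are hypothetical names. `None` plays the role of NULL.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Placeholder wording; the real prompt_size.txt is not reproduced in the article.
SIZE_PROMPT = (
    "External product: {external}\n"
    "Internal product: {internal}\n"
    "Do both names state exactly the same size? Answer YES or NO."
)


def confirm_size(external: str, internal: str, ask_llm: Callable[[str], str]) -> bool:
    """Second LLM call that looks at size only."""
    reply = ask_llm(SIZE_PROMPT.format(external=external, internal=internal))
    return reply.strip().upper().startswith("YES")


def build_mapping(
    first_pass: Dict[str, Optional[str]],
    ask_llm: Callable[[str], str],
) -> List[Tuple[str, Optional[str]]]:
    """Keep a first-pass match only if the size check also passes; otherwise null it out."""
    mapping = []
    for external, internal in first_pass.items():
        if internal is not None and not confirm_size(external, internal, ask_llm):
            internal = None
        mapping.append((external, internal))
    return mapping
```

Here `first_pass` holds the result of the first LLM validation for every external item, with `None` for items that had no accepted candidate. Items that were already `None` are passed through without a second model call.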

Reflections and Future Directions

This project demonstrates the power of a hybrid AI approach for complex data matching. It combines the strengths of:

  1. Semantic Search (Vector Databases): For efficient retrieval of semantically similar candidates.
  2. LLM Validation (Prompt Engineering): For precise, rule-based verification of exact matches and handling nuanced details like size.
  3. Sequential Reasoning (Multi-Stage Prompting): To refine results and address specific discrepancies systematically.

This system moves beyond simplistic methods, offering a robust and scalable solution for product alignment.

However, it’s crucial to acknowledge certain aspects for future improvement:

  • Lack of Labeled Data: The current evaluation relied on manual inspection. In a real-world scenario, a labeled dataset is indispensable for quantitatively measuring accuracy, precision, and recall, and for training a supervised model.
  • Exact Match Constraint: The strict “exact match” requirement inherently increases false negatives (missed potential matches). For use cases where some flexibility is allowed, tuning parameters like K (number of vector candidates) and J (number of candidates sent to LLM) could yield more matches.
  • Prompt Engineering Optimization: The LLM prompts can always be further refined. Techniques like Chain-of-Thought (CoT) prompting could be explored to encourage more detailed reasoning from the LLM.
  • Scalability for Larger Datasets: For extremely large datasets, optimizing the vector database indexing, potentially using more advanced FAISS indices or distributed solutions, would be critical.
  • User Interface and Feedback Loop: Integrating this system into a user-friendly frontend where human reviewers can provide feedback on matches would create a powerful continuous learning loop, iteratively improving accuracy over time.
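To make the first point concrete, this is how precision and recall could be computed once a labeled mapping exists. The function name and the tiny labeled set are made up for illustration; no such dataset or measurement is reported for this project.

```python
from typing import Dict, Optional, Tuple


def precision_recall(
    predicted: Dict[str, Optional[str]],
    truth: Dict[str, Optional[str]],
) -> Tuple[float, float]:
    """Precision and recall of predicted matches against labeled matches.

    A prediction counts as correct only when it is not None and equals the label.
    """
    true_positives = sum(
        1 for external, match in predicted.items()
        if match is not None and truth.get(external) == match
    )
    predicted_positives = sum(1 for match in predicted.values() if match is not None)
    actual_positives = sum(1 for match in truth.values() if match is not None)
    precision = true_positives / predicted_positives if predicted_positives else 0.0
    recall = true_positives / actual_positives if actual_positives else 0.0
    return precision, recall


# Hypothetical labels and predictions, keyed by external product name.
truth = {"ext_a": "int_A", "ext_b": "int_B", "ext_c": "int_C", "ext_d": None}
predicted = {"ext_a": "int_A", "ext_b": "int_X", "ext_c": None, "ext_d": None}

precision, recall = precision_recall(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under the strict exact-match rule, one would expect precision to be favoured over recall. Tracking both numbers while varying K and J would show how much recall each relaxation buys.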

This project, available on GitHub (Taha-azizi/product-matching-system), serves as a strong foundation for building intelligent automation in product data management, illustrating how thoughtful AI integration can transform manual workflows into efficient, accurate processes. The future of data management is undoubtedly automated, and solutions like this are paving the way.
