
Next-Gen Search powered by Jina

Last Updated on May 3, 2021 by Editorial Team

Author(s): Shubham Saboo


Since the inception of online search, the world has changed dramatically, but the “curiosity” that fuels the business remains constant…

“The objective isn’t to make your links appear natural, the objective is that your links are natural” (Matt Cutts)

What is Neural Semantic Search?

Neural search is an intelligent approach to retrieving contextually and semantically relevant information. Instead of giving a machine a set of hand-written rules for understanding what each piece of data is, neural search does the same job with a pre-trained neural network. Developers therefore don’t have to write every little rule, which saves them time and headaches, and the system can keep improving as it is trained on more data.
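To make the idea concrete, here is a minimal sketch of what “semantic” matching boils down to once a network has turned text into vectors: stored items are ranked by cosine similarity to the query vector. The three-dimensional vectors below are made-up placeholders standing in for the output of a pre-trained encoder; they are not produced by any real model.

```python
import math
from typing import Dict, List, Tuple

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec: List[float],
                       corpus: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (text, score) pairs sorted from most to least similar to the query."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical embeddings: a pre-trained network would produce these in practice.
corpus = {
    "The laptop has a fast processor.": [0.9, 0.1, 0.0],
    "Bananas are rich in potassium.": [0.0, 0.2, 0.9],
    "She wrote software for mainframes.": [0.8, 0.3, 0.1],
}
query = [0.85, 0.2, 0.05]  # hypothetical embedding of the query "computer"

for text, score in rank_by_similarity(query, corpus):
    print(f"{score:.3f}  {text}")
```

Neither sentence about laptops or mainframes contains the word “computer”; they rank highly only because their (hypothetical) vectors point in a similar direction to the query’s.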

Conventional Search vs. Neural Search

Conventional search

  • Conventional search is symbolic and keyword-driven, so it lacks the context needed to interpret a query.
  • Conventional search is fragile because it relies on hard-coded rule engines.
  • Conventional search requires its rules to be updated whenever new data is added, which makes it time-consuming to maintain and hard to scale.
  • Conventional search requires domain knowledge to implement.
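For contrast, a conventional keyword engine can be sketched as an inverted index that maps each token to the documents containing it. This toy version (hypothetical helper names; no stemming or ranking) shows the brittleness described above: a query only matches documents that share its exact tokens, so a synonym needs a hand-written rule.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs: List[str]) -> Dict[str, Set[int]]:
    """Map every token to the set of document ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def keyword_search(index: Dict[str, Set[int]], query: str, synonyms=None) -> Set[int]:
    """Return ids of documents containing every query token (AND semantics).

    `synonyms` is a hand-maintained rule table: each query token is expanded
    with its listed synonyms before lookup.
    """
    synonyms = synonyms or {}
    result = None
    for token in tokenize(query):
        variants = {token, *synonyms.get(token, [])}
        hits = set().union(*(index.get(v, set()) for v in variants))
        result = hits if result is None else result & hits
    return result or set()

docs = [
    "The laptop has a fast processor.",
    "Bananas are rich in potassium.",
]
index = build_inverted_index(docs)
print(keyword_search(index, "computer"))                            # set(): no exact token match
print(keyword_search(index, "computer", {"computer": ["laptop"]}))  # {0}: only after adding a rule
```

Every new synonym or phrasing has to be added to the rule table by hand, which is exactly the maintenance cost the list above describes.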

Neural/Semantic Search

  • Neural search is context-driven, which enables it to find semantically relevant information.
  • Neural search is flexible in adapting to corner cases and resilient to noise.
  • Neural search, on the other hand, can learn from new data, building on what it has already learned instead of waiting for hand-written rule updates, which makes it highly scalable and efficient.
  • Neural search requires little to no domain knowledge to implement.

What is Jina?

Jina is a cloud-native neural search platform. It can be deployed in containers, in the cloud, or on-premises. It offers anything-to-anything search: text-to-text, image-to-image, video-to-video, or any other data type that you can feed to the engine as input. Jina’s primitive data type is the Document. Documents are both the pieces of data in the dataset you want to search and the input queries you use to find what you want.

In other words, Documents are the input and output data of Jina’s search workflows. Jina core comprises two main Flows, which are the heart and soul of the semantic search engine:

  • Indexing Flow: An indexing Flow makes the whole corpus searchable by sentence. It prepares and pre-processes the data to be searched: the input Documents are fed in and processed, and the output is stored as searchable indexes.
  • Querying Flow: A querying Flow takes the user’s query as an input Document (Jina’s primitive data type) and returns a list of matches ranked by the similarity between the query’s embedding and the embeddings of the indexed Documents.
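The two Flows can be pictured with a small, framework-free sketch. The `Document` dataclass and the `index_flow`/`query_flow` functions are hypothetical stand-ins written for illustration, not Jina’s actual classes or API, and `toy_encode` is a deliberately crude bag-of-words encoder where a real deployment would use a neural model.

```python
import math
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical vocabulary for the toy encoder; a real system would use a neural model.
VOCAB = ["search", "neural", "keyword", "image", "text", "vector"]

@dataclass
class Document:
    """Hypothetical stand-in for Jina's Document: used for both data and queries."""
    text: str
    embedding: Optional[List[float]] = None
    score: float = 0.0
    matches: List["Document"] = field(default_factory=list)

def toy_encode(text: str) -> List[float]:
    """Count VOCAB terms in the text and L2-normalise the counts."""
    words = re.findall(r"[a-z]+", text.lower())
    vec = [float(words.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def index_flow(docs: List[Document]) -> Dict[int, Document]:
    """Indexing Flow: encode each Document and store it under an integer id."""
    store: Dict[int, Document] = {}
    for doc_id, doc in enumerate(docs):
        doc.embedding = toy_encode(doc.text)
        store[doc_id] = doc
    return store

def query_flow(query: Document, store: Dict[int, Document], top_k: int = 2) -> Document:
    """Querying Flow: encode the query and attach the top_k ranked matches to it."""
    query.embedding = toy_encode(query.text)
    for doc in store.values():
        # Both vectors are unit length (or all zeros), so the dot product is the cosine score.
        score = sum(q * d for q, d in zip(query.embedding, doc.embedding))
        query.matches.append(Document(text=doc.text, embedding=doc.embedding, score=score))
    query.matches.sort(key=lambda m: m.score, reverse=True)
    query.matches = query.matches[:top_k]
    return query

store = index_flow([
    Document("Neural search uses vector embeddings."),
    Document("Keyword search matches exact text."),
    Document("An image of a cat."),
])
result = query_flow(Document("neural vector search"), store)
for match in result.matches:
    print(f"{match.score:.2f}  {match.text}")
```

Note that the query is itself a `Document`, and the ranked results come back attached to it as `matches`, mirroring the “Documents in, Documents out” idea described above.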

Jina Components

A Flow represents a high-level task such as indexing, searching, or training. It consists of a group of Pods and orchestrates them to accomplish that task. A Pod is a group of Executors that share the same properties; it allows multiple Executors to run in parallel and adds context and control around them.
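The Flow/Pod layering can be illustrated with plain Python. In this hypothetical sketch (the class names mirror the article’s terminology but are not Jina’s API), a `Pod` splits its input across several parallel copies of the same executor function using a thread pool, and a `Flow` chains Pods so that each one’s output feeds the next.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# An "executor" here is just a function that transforms a batch of items.
ExecutorFn = Callable[[List[str]], List[str]]

class Pod:
    """Hypothetical Pod: runs `replicas` copies of one executor in parallel."""

    def __init__(self, name: str, executor: ExecutorFn, replicas: int = 2):
        self.name = name
        self.executor = executor
        self.replicas = replicas

    def run(self, batch: List[str]) -> List[str]:
        if not batch:
            return []
        # Split the batch into at most `replicas` contiguous shards.
        size = -(-len(batch) // self.replicas)  # ceiling division
        shards = [batch[i:i + size] for i in range(0, len(batch), size)]
        with ThreadPoolExecutor(max_workers=self.replicas) as pool:
            results = list(pool.map(self.executor, shards))  # map preserves shard order
        return [item for shard in results for item in shard]

class Flow:
    """Hypothetical Flow: orchestrates Pods in sequence to accomplish one task."""

    def __init__(self):
        self.pods: List[Pod] = []

    def add(self, pod: Pod) -> "Flow":
        self.pods.append(pod)
        return self  # return self so calls can be chained

    def run(self, batch: List[str]) -> List[str]:
        for pod in self.pods:
            batch = pod.run(batch)
        return batch

flow = (
    Flow()
    .add(Pod("lowercase", lambda docs: [d.lower() for d in docs]))
    .add(Pod("strip", lambda docs: [d.strip(" .") for d in docs], replicas=3))
)
print(flow.run(["Hello World.", " Neural Search ", "JINA."]))
```

Because `ThreadPoolExecutor.map` returns results in input order, the parallel shards are reassembled in their original sequence.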

An Executor represents an algorithmic unit in Jina. Algorithms such as encoding images into vectors, storing vectors on disk, and ranking results can all be formulated as Executors. Executors provide useful interfaces that let AI developers and engineers focus on the algorithm itself. Some common Executors are as follows:

  • Crafter: Pre-processes documents and splits them into chunks.
  • Encoder: Takes the pre-processed chunks from the Crafter and encodes them into embedding vectors.
  • Indexer: Takes the encoded vectors as input, indexes them, and stores them in a key-value fashion.
  • Ranker: Runs on the indexed storage and sorts the results according to a ranking criterion, such as a similarity score.
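A toy, dependency-free version of these four Executors is sketched below. The class names follow the article’s terminology, but everything else is an assumption for illustration: the Crafter splits text into sentences with a regular expression, the Encoder uses the hashing trick on words rather than a neural network, the Indexer is an in-memory dictionary, and the Ranker sorts by cosine similarity.

```python
import math
import re
import zlib
from typing import Dict, List, Tuple

class Crafter:
    """Pre-processes a document by splitting it into sentence chunks."""
    def craft(self, text: str) -> List[str]:
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

class Encoder:
    """Encodes a chunk as a fixed-size, L2-normalised hashed bag of words."""
    def __init__(self, dim: int = 256):
        self.dim = dim

    def encode(self, chunk: str) -> List[float]:
        vec = [0.0] * self.dim
        for word in re.findall(r"[a-z0-9]+", chunk.lower()):
            # crc32 is stable across runs, unlike Python's salted hash().
            vec[zlib.crc32(word.encode()) % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

class Indexer:
    """Stores chunk vectors in a key-value fashion: chunk id -> (text, vector)."""
    def __init__(self):
        self.store: Dict[int, Tuple[str, List[float]]] = {}

    def add(self, chunk: str, vector: List[float]) -> None:
        self.store[len(self.store)] = (chunk, vector)

class Ranker:
    """Sorts indexed chunks by cosine similarity to the query vector."""
    def rank(self, query_vec: List[float], indexer: Indexer, top_k: int = 3) -> List[Tuple[str, float]]:
        scored = [
            (text, sum(q * v for q, v in zip(query_vec, vec)))  # unit vectors: dot = cosine
            for text, vec in indexer.store.values()
        ]
        return sorted(scored, key=lambda p: p[1], reverse=True)[:top_k]

crafter, encoder, indexer, ranker = Crafter(), Encoder(), Indexer(), Ranker()
document = "Jina indexes documents. Neural search ranks results. Bananas are yellow."
for chunk in crafter.craft(document):      # Crafter -> Encoder -> Indexer
    indexer.add(chunk, encoder.encode(chunk))
print(ranker.rank(encoder.encode("neural search results"), indexer, top_k=2))
```

The hashing encoder only captures shared words, not meaning; swapping it for a neural encoder is what would turn this pipeline into a semantic search.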

Search Modalities

Jina is a data-type-agnostic framework that lets you work with any type of data and run cross-modal and multi-modal search Flows.

  • Single-Modality Search: The input and the output are of the same type, as in text-to-text, image-to-image, or audio-to-audio search. Because the search is designed around a single data type, it is less flexible and fragile when given inputs of other data types.
  • Cross-Modality Search: You find relevant documents of one modality (say, images) by querying with documents of another modality (say, text).
  • Multi-Modality Search: Documents of different modalities are projected into a common embedding space, and relevant documents are found with respect to the fusion of multiple modalities. A single query merges information from different modalities, for example a combined text+image input, and the output depends on how the model interprets that fused input.
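Cross-modal and multi-modal search both rest on projecting different modalities into one embedding space. The sketch below fakes that with two hypothetical linear projections (`TEXT_PROJ` and `IMAGE_PROJ`, made-up numbers standing in for trained text and image encoders) that map differently sized raw features into the same 2-D space. Fusion for the multi-modal query is a simple average of the normalised vectors, which is only one of several possible fusion strategies.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]

# Hypothetical projection matrices standing in for trained encoders.
# Text features are 3-dimensional and image features 4-dimensional;
# both are mapped into the same 2-dimensional embedding space.
TEXT_PROJ: Matrix = [[1.0, 0.0, 0.5],
                     [0.0, 1.0, 0.5]]
IMAGE_PROJ: Matrix = [[0.5, 0.5, 0.0, 0.0],
                      [0.0, 0.0, 0.5, 0.5]]

def project(matrix: Matrix, features: List[float]) -> List[float]:
    """Multiply a projection matrix by a feature vector and L2-normalise."""
    out = [sum(w * x for w, x in zip(row, features)) for row in matrix]
    norm = math.sqrt(sum(v * v for v in out)) or 1.0
    return [v / norm for v in out]

def fuse(*vectors: List[float]) -> List[float]:
    """Multi-modal fusion: average the unit vectors, then re-normalise."""
    avg = [sum(vals) / len(vectors) for vals in zip(*vectors)]
    norm = math.sqrt(sum(v * v for v in avg)) or 1.0
    return [v / norm for v in avg]

def best_match(query: List[float], index: Dict[str, List[float]]) -> str:
    """Return the key whose vector has the highest dot product with the query."""
    return max(index, key=lambda k: sum(q * v for q, v in zip(query, index[k])))

# Index images only (their raw features are made up).
image_index = {
    "beach.jpg": project(IMAGE_PROJ, [1.0, 1.0, 0.0, 0.0]),
    "forest.jpg": project(IMAGE_PROJ, [0.0, 0.0, 1.0, 1.0]),
}

# Cross-modal: a text query searches the image index.
text_query = project(TEXT_PROJ, [1.0, 0.0, 0.0])
print(best_match(text_query, image_index))

# Multi-modal: a text part and an image part fused into one query vector.
fused_query = fuse(project(TEXT_PROJ, [0.0, 1.0, 0.0]),
                   project(IMAGE_PROJ, [0.0, 0.0, 1.0, 0.0]))
print(best_match(fused_query, image_index))
```

The key point is that comparison only works because both projections land in the same space; with separate spaces, a text vector and an image vector would not be comparable at all.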

Support for different modalities unlocks many powerful patterns and makes Jina flexible and agnostic to what can be searched.

All-in-one, data-type-agnostic search platform…

Jina in Action

To showcase a live demo, I designed a simple neural semantic search engine for textual data. The search engine indexes text taken from a random Wikipedia page. Jina runs the input document through its internal Flows (indexing followed by querying) to produce a search engine.

Frameworks/Tools Used:

  • Jina Core: Provides the indexing and querying workflows for the application.
  • Language Model: The language model comes from the BERT (Bidirectional Encoder Representations from Transformers) family; here we use “distilbert-base-cased” to understand context in Jina’s querying Flow.
  • Jina Box: An easy-to-use, lightweight, customizable front-end web component for data-type-agnostic search (text, audio, video, etc.). It can easily be connected to the Jina backend, giving users a simple and efficient interface for interacting with the search engine.
  • Python 3.7: The development environment for the Jina application.
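The article does not say how DistilBERT’s per-token outputs become one vector per sentence. A common approach (an assumption here, not necessarily what this demo does) is mean pooling: average the token embeddings while skipping padding positions flagged by the attention mask. The sketch shows only that arithmetic on made-up 2-D token vectors; in practice the inputs would come from the model’s last hidden state.

```python
from typing import List

def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the embeddings of real tokens (mask == 1), ignoring padding (mask == 0)."""
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

# Made-up token vectors for a 3-token sentence padded to length 4.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0]]
mask = [1, 1, 1, 0]
print(mean_pool(tokens, mask))  # [3.0, 4.0]
```

Masking matters: averaging the padding vector in as well would drag the sentence embedding toward zero and make sentences of different lengths less comparable.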

Example: In the search box we search for “computer” and get the following results. Interestingly, the exact word “computer” never appears in the indexed document, yet the model still finds the sentences that are contextually or semantically related to computers.

Jina doing the magic!


If you would like to learn more or want me to write more on this subject, feel free to reach out.

My social links: LinkedIn | Twitter | GitHub

If you liked this post or found it helpful, please take a minute to press the clap button; it increases the post’s visibility for other Medium users.


Next-Gen Search powered by Jina was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI
