
Why convert text to a vector?

Last Updated on January 21, 2022 by Editorial Team

Author(s): vivek

Originally published on Towards AI, the World's Leading AI and Technology News and Media Company.

Mathematics

If we convert something (here, text) into a vector, we can bring the power of linear algebra to bear on it. The main thing to think about, then, is how to convert text into a vector so that linear algebra can actually help us solve the problem.

How do you convert text to a vector?

Here, "text" refers to both words and sentences, and "vector" refers to a numerical vector. For example, given the text of a review, the aim is to convert it into a d-dimensional vector.

Fig1. Representation of reviews in a d-dimensional space: a plane π with normal w separates the positive reviews from the negative reviews.

In the figure above:

+ denotes a positive review represented in the d-dimensional space.

− denotes a negative review represented in the d-dimensional space.

Notation:

wTx: w transpose times x (w with superscript T, multiplied by x), i.e., the dot product of w and x.

xi: a point (x with subscript i).

ri: a review (r with subscript i).

vi: a vector (v with subscript i).

Suppose all the positive and negative reviews lie in a d-dimensional space and are separated by a plane π with normal w. That is, assume we can find a plane such that all the positive reviews are on one side of it and all the negative reviews are on the other. Under this assumption, the plane itself is a model that solves the problem.

Recall that for any point x lying on the side of the plane that the normal w points toward, wTx is positive (this assumes π passes through the origin, so π is the set of points where wTx = 0). Now take two points x1 and x2, where x1 is a positive point and x2 is a negative point. Then:

wTx1 is positive, because x1 lies on the side of the plane that the normal points toward.

wTx2 is negative, because x2 lies on the opposite side, away from the direction of the normal.
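A short worked equation shows why the sign of wTx tells us the side. It assumes π passes through the origin, and θ1, θ2 (symbols introduced only for this illustration) are the angles between w and the nonzero points x1 and x2.

```latex
\begin{aligned}
\pi &= \{\, x \in \mathbb{R}^{d} : w^{T}x = 0 \,\}, \qquad w^{T}x = \lVert w \rVert \, \lVert x \rVert \cos\theta \\
\theta_{1} < 90^{\circ} &\;\Rightarrow\; \cos\theta_{1} > 0 \;\Rightarrow\; w^{T}x_{1} > 0 \quad (\text{$x_1$ is on the side $w$ points toward}) \\
\theta_{2} > 90^{\circ} &\;\Rightarrow\; \cos\theta_{2} < 0 \;\Rightarrow\; w^{T}x_{2} < 0 \quad (\text{$x_2$ is on the opposite side})
\end{aligned}
```

Since the lengths of w and x are always positive, the sign of wTx is decided entirely by cos θ.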

So, under our assumption, if all our points are in a d-dimensional space and we have found a plane π with normal w that separates the positive points from the negative points, we can say:

If wTxi > 0, then ri is positive; otherwise, ri is negative.

Here, ri is any review and xi is the point that represents it.
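The decision rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the reviews have already been converted to vectors and that π passes through the origin. The vectors `w`, `x1`, `x2` and the helpers `dot` and `classify_review` are hypothetical toy choices, not from the original post.

```python
def dot(w, x):
    """Compute w^T x, the dot product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def classify_review(w, x):
    """Rule from the text: positive if w^T x > 0, otherwise negative.

    Assumes the separating plane pi passes through the origin (w^T x = 0).
    A point lying exactly on the plane (w^T x == 0) falls into the 'else' branch.
    """
    return "positive" if dot(w, x) > 0 else "negative"


# Hypothetical normal vector and review vectors in d = 3 dimensions.
w = [1.0, 2.0, -1.0]
x1 = [2.0, 1.0, 0.5]    # w^T x1 = 2 + 2 - 0.5 = 3.5
x2 = [-1.0, -0.5, 1.0]  # w^T x2 = -1 - 1 - 1 = -3.0

print(classify_review(w, x1))  # positive
print(classify_review(w, x2))  # negative
```

Note that all the real work lies in finding w and in producing good vectors; once both exist, classifying a review is a single dot product.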

So, assuming we have converted our text into d-dimensional vectors, we have found a plane that separates the text by polarity. The question is: can we convert text into a d-dimensional space in any way we like, or are there rules the conversion must follow? The most important property (rule) is as follows.

Suppose we have three reviews r1, r2, r3, represented in a d-dimensional space by the vectors v1, v2, v3, respectively, and suppose r1 and r3 are more Semantically Similar (SS) to each other than r1 and r2 are, i.e.,

SS(r1, r3) > SS(r1, r2)

Then the distance (d) between the vectors v1 and v3 must be less than the distance between the vectors v1 and v2:

d(v1, v3) < d(v1, v2)

So, if reviews r1 and r3 are more semantically similar, their vectors v1 and v3 must be closer to each other.

Fig2. Representation of Vectors in a d-dimensional space

SS(r1, r3) > SS(r1, r2)

implies length(v1 - v3) < length(v1 - v2), where length(·) is the Euclidean length of a vector, so length(v1 - v3) is the distance d(v1, v3).

In other words, similar points are closer.
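A quick numerical check of this property, as a sketch: the 2-D vectors below are hypothetical stand-ins for v1, v2, v3 (assuming r1 and r3 are the similar pair), and `length` is taken to be the Euclidean norm. The helpers `length` and `sub` are illustrative names, not from the original post.

```python
import math


def sub(a, b):
    """Component-wise difference a - b of two equal-length vectors."""
    return tuple(x - y for x, y in zip(a, b))


def length(v):
    """Euclidean length (L2 norm) of a vector."""
    return math.sqrt(sum(c * c for c in v))


# Hypothetical vectors: r1 and r3 are assumed similar, r2 is not.
v1 = (1.0, 2.0)
v2 = (-3.0, -1.0)
v3 = (1.5, 2.5)

d13 = length(sub(v1, v3))  # d(v1, v3) = length(v1 - v3)
d12 = length(sub(v1, v2))  # d(v1, v2) = length(v1 - v2)

print(f"d(v1, v3) = {d13:.3f}")  # d(v1, v3) = 0.707
print(f"d(v1, v2) = {d12:.3f}")  # d(v1, v2) = 5.000
print("property holds:", d13 < d12)  # property holds: True
```

`math.dist(v1, v3)` (Python 3.8+) computes the same Euclidean distance in one call; the explicit `length(sub(...))` form simply mirrors the notation above.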

This raises another question: why do we need the vectors of similar reviews to be close together, rather than far apart, in order to treat them as similar?

Why closer rather than farther?

Look at Fig1 once again. If all our positive reviews are close to one another compared with their distance from the negative reviews, and vice versa, it is much easier to find a plane that separates the two groups. Hence, we want similar reviews to be closer rather than farther apart.

So, the next question is how to find a method that converts text into a d-dimensional vector such that similar texts are closer (geometrically) to each other. Some techniques for converting text into a d-dimensional vector are:

1) Bag-of-Words (BoW).

2) Word2Vec (w2v).

3) Term frequency-inverse document frequency (tf-idf).

4) tf-idf w2v.

5) Average w2v.
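To make the first strategy a little more concrete, here is a minimal Bag-of-Words sketch in Python. It is a toy under stated assumptions, not necessarily the version the upcoming posts will describe: it lowercases and splits on whitespace only (no punctuation handling or stop-word removal), and the reviews and the helper names `build_vocabulary` and `bow_vector` are hypothetical. Each review becomes a vector of word counts, with d equal to the vocabulary size.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Sorted list of every distinct lowercase, whitespace-separated word."""
    return sorted({word for doc in docs for word in doc.lower().split()})


def bow_vector(doc, vocab):
    """Count how often each vocabulary word occurs in doc (d = len(vocab))."""
    counts = Counter(doc.lower().split())
    return [counts[word] for word in vocab]


# Hypothetical toy reviews: r1 and r3 are similar, r2 is not.
reviews = {
    "r1": "the food was very good",
    "r2": "terrible service and cold food",
    "r3": "the food was good",
}

vocab = build_vocabulary(reviews.values())
vectors = {name: bow_vector(text, vocab) for name, text in reviews.items()}

print("d =", len(vocab))  # d = 9
print("d(v1, v3) =", round(math.dist(vectors["r1"], vectors["r3"]), 3))  # 1.0
print("d(v1, v2) =", round(math.dist(vectors["r1"], vectors["r2"]), 3))  # 2.828
```

On these toy reviews, v1 and v3 differ only in the count for "very", so they end up closer than v1 and v2, as the property above requires. Raw word counts do not always track meaning, though; "good" and "not good" share a word, for instance, which is one motivation for the other strategies in the list.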

We will discuss these strategies in upcoming blogs.

Thank you, and happy learning!


Why convert text to a vector? was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.


Published via Towards AI
