Building AI-Powered Applications with CockroachDB Vector Search: From Theory to Practice

Last Updated on November 12, 2024 by Editorial Team

Author(s): Siddharth Kshirsagar

Originally published on Towards AI.

What are Vector Databases?

In today’s AI-driven landscape, vector databases have emerged as a critical technology for managing and querying high-dimensional data by meaning. These specialized database systems store data as mathematical vectors (numerical representations, or embeddings, that capture the semantic content of text, images, audio, or other data), enabling applications to run similarity searches based on meaning rather than exact matches. Unlike traditional databases, which excel at storing and retrieving structured data, vector databases are optimized for finding the stored vectors closest to a query vector under a distance metric. That makes them a natural fit for semantic search, recommendation systems, image similarity search, and natural language processing. Their ability to compare millions of high-dimensional vectors efficiently has made them a fundamental component of AI infrastructure, allowing organizations to build context-aware applications that would be hard to build with keyword matching in a conventional database.
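To make the idea concrete, here is a minimal brute-force sketch in plain Python of what a similarity search does: score every stored vector against a query vector with cosine similarity and keep the best matches. The item names and three-dimensional vectors are made up for illustration; real embeddings come from a model and have hundreds of dimensions, and at scale vector databases typically rely on approximate nearest-neighbor indexes instead of scanning every row.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    # Dot product divided by the product of the two vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: List[float], items: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    # Brute force: score every stored vector, then keep the k highest scores
    scored = [(name, cosine_similarity(query, vec)) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings": axis 0 ~ shirt-like, axis 1 ~ pants-like, axis 2 ~ shoe-like
items = {
    "red shirt": [1.0, 0.0, 0.0],
    "blue shirt": [0.9, 0.1, 0.0],
    "green pants": [0.0, 1.0, 0.0],
    "sneakers": [0.0, 0.1, 0.9],
}

print(top_k([0.95, 0.05, 0.0], items))
```

Because cosine similarity ignores vector length and only compares direction, both shirts rank far above the pants and sneakers for a shirt-like query, even though neither shirt vector equals the query exactly.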

What is CockroachDB?

CockroachDB is a distributed SQL database designed for resilience and scalability. At its core, it runs on a transactional, strongly consistent key-value store, offering horizontal scalability and full ACID compliance. One of its defining features is its ability to survive node and infrastructure failures with minimal disruption while keeping data consistent across distributed environments. This is achieved through automatic sharding, multi-region deployments, and the Raft consensus algorithm. Because it speaks the PostgreSQL wire protocol and a largely PostgreSQL-compatible SQL dialect, it is particularly accessible for teams already working with PostgreSQL, and its cloud-native architecture makes it well suited to globally distributed applications that need both scale and transactional integrity.

Why Use CockroachDB as a Vector Database?

The recent introduction of Vector Search capabilities in CockroachDB 24.2 has positioned it as a compelling solution for AI-driven applications and vector similarity search use cases. By implementing pgvector compatibility, CockroachDB now offers efficient storage and querying of high-dimensional vectors within its distributed architecture. This integration eliminates the need for a separate vector database while leveraging CockroachDB’s existing strengths in high availability and horizontal scalability. The ability to parallelize similarity searches across multiple nodes makes it particularly effective for large-scale vector operations in applications like semantic search, recommendation systems, and natural language processing tasks. While specialized vector databases might offer more optimized performance for pure vector search use cases, CockroachDB’s solution is particularly attractive for organizations looking to consolidate their infrastructure while maintaining robust vector search capabilities alongside their operational data.

CockroachDB’s Vector Capabilities

CockroachDB’s vector support, introduced in version 24.2, brings vector processing to its distributed SQL platform through a VECTOR data type that stores fixed-length arrays of floating-point numbers. This pgvector-compatible implementation supports three distance operators: cosine distance (<=>), Euclidean or L2 distance (<->), and negative inner product (<#>), making it suitable for various AI applications. Cosine distance is 1 minus cosine similarity, so it compares the direction of two vectors regardless of their magnitude, which is why it works well for semantic search. Euclidean distance measures the straight-line distance between two points, and the negative inner product (negated so that smaller values still mean closer) ranks results the same way as cosine distance when embeddings are normalized to unit length. pgvector also defines an L1 (Manhattan) distance, an alternative for cases where absolute differences between vector components are more relevant.
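The relationships between these metrics can be checked with a few lines of plain Python. The functions below are illustrative re-implementations of what the <=>, <-> and <#> operators compute, not CockroachDB code, and the vectors are arbitrary examples. The last lines show why the negative inner product is a handy shortcut for unit-length embeddings: it then ranks candidates in the same order as cosine distance.

```python
import math
from typing import List


def cosine_distance(a: List[float], b: List[float]) -> float:
    # 1 - cosine similarity (what <=> computes)
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def l2_distance(a: List[float], b: List[float]) -> float:
    # Straight-line (Euclidean) distance (what <-> computes)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: List[float], b: List[float]) -> float:
    # Inner product negated so that smaller still means "closer" (what <#> computes)
    return -sum(x * y for x, y in zip(a, b))


def normalize(v: List[float]) -> List[float]:
    # Scale a vector to unit length
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


query = normalize([1.0, 2.0, 0.5])
candidates = {
    name: normalize(vec)
    for name, vec in {"a": [1.0, 1.9, 0.4], "b": [2.0, 0.1, 0.0], "c": [0.2, 0.3, 3.0]}.items()
}

# For unit vectors, sorting ascending by either metric yields the same ranking
by_cosine = sorted(candidates, key=lambda k: cosine_distance(query, candidates[k]))
by_neg_ip = sorted(candidates, key=lambda k: negative_inner_product(query, candidates[k]))
print(by_cosine, by_neg_ip)
```

For unit vectors the Euclidean distance preserves this ranking too, since the squared L2 distance equals 2 minus twice the inner product.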

Implementing CockroachDB vector search using CRDB Serverless

Before implementing the vector database, we need to create a CRDB Serverless cluster. The process is simple:
1. Log in or sign up at https://cockroachlabs.cloud

2. Click the Create Cluster button.

3. Select the Basic plan.

4. Select your region.

5. Select the capacity.

6. Choose a cluster name.

7. Wait for the cluster to be created, then create a SQL user.

8. Select Python as the connection option.

Next, copy the Download CA Cert command for your operating system and run it in a terminal. If you are using Windows, run it in PowerShell.

9. We will connect with SQLAlchemy, so select it in the dropdown.

Testing the connection in Python

  1. Install the latest versions of the required packages. (I tested with Python 3.11; if errors persist on a newer Python version, try downgrading to Python 3.11.)
pip install sqlalchemy-cockroachdb 
pip install pandas
pip install psycopg2-binary

2. Test the connection

To get the connection string, copy everything after export DATABASE_URL= in the connection command, that is, the value in quotes.

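import os
from sqlalchemy import create_engine, text

# Paste the connection string copied from the console
os.environ['DATABASE_URL'] = '<Your Connection String>'

engine = create_engine(os.environ['DATABASE_URL'])

# Run a trivial query to confirm the cluster is reachable; the with-block closes the connection
with engine.connect() as conn:
    response = conn.execute(text('SELECT now()')).fetchall()
print(response)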
[(datetime.datetime(2024, 11, 6, 7, 28, 50, 688678, tzinfo=datetime.timezone.utc),)]
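If SQLAlchemy raises an error when connecting, one possible cause is a connection string that starts with postgresql:// instead of cockroachdb://. The sqlalchemy-cockroachdb package registers the cockroachdb:// dialect; with a plain postgresql:// URL, SQLAlchemy uses its PostgreSQL dialect instead, which can fail while parsing the CockroachDB server version. The helper below is a hypothetical convenience function, not part of any library, that rewrites the scheme.

```python
def to_cockroachdb_url(url: str) -> str:
    """Hypothetical helper: point a PostgreSQL-style URL at the cockroachdb:// SQLAlchemy dialect."""
    for prefix in ("postgresql+psycopg2://", "postgresql://", "postgres://"):
        if url.startswith(prefix):
            # Keep credentials, host, port, database and query parameters unchanged
            return "cockroachdb://" + url[len(prefix):]
    # Already cockroachdb:// (or another scheme): leave it as-is
    return url


# Usage, assuming DATABASE_URL is set as in the previous step:
# engine = create_engine(to_cockroachdb_url(os.environ['DATABASE_URL']))
```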

Testing a simple vector search

We will create a small table for a toy clothing dataset.

Then we will query it with a sample clothing vector to test similarity.

  1. Create a Table to store the data
# Create a table with a vector column
create_table_query = text("""
CREATE TABLE IF NOT EXISTS product_embeddings (
    id SERIAL PRIMARY KEY,
    name TEXT,
    embedding VECTOR(3) -- Using 3-dimensional vectors for simplicity
)
"""
)

with engine.connect() as conn:
    conn.execute(create_table_query)
    conn.commit()

2. Insert Data

# Insert sample products with embeddings
insert_query = text("""
INSERT INTO product_embeddings (name, embedding) VALUES
    ('Red Shirt', '[1.0, 0.0, 0.0]'),
    ('Blue Shirt', '[0.9, 0.1, 0.0]'),
    ('Green Pants', '[0.0, 1.0, 0.0]'),
    ('Blue Pants', '[0.0, 0.9, 0.1]')
"""
)

with engine.connect() as conn:
    conn.execute(insert_query)
    conn.commit()

3. Perform Similarity Search

# Perform vector similarity search: most similar first, ties broken by name
search_query = text("""
SELECT
    name,
    embedding,
    1 - cosine_distance(embedding, '[1.0,0.0,0.0]') AS similarity
FROM product_embeddings
ORDER BY similarity DESC, name
"""
)

# Fetch the rows while the connection is still open
with engine.connect() as conn:
    rows = conn.execute(search_query).fetchall()

rows

[('Red Shirt', '[1,0,0]', 1.0),
('Blue Shirt', '[0.9,0.1,0]', 0.9938837488013375),
('Blue Pants', '[0,0.9,0.1]', 0.0),
('Green Pants', '[0,1,0]', 0.0)]
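As a sanity check, the similarity column can be reproduced in plain Python: 1 minus the cosine distance is just the cosine similarity between each stored vector and the query [1.0, 0.0, 0.0]. This is an illustrative re-computation, not a call into CockroachDB; the last digits may differ slightly from the SQL output, likely because the vector type stores single-precision floats.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    # Equivalent to 1 - cosine_distance(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


products = {
    "Red Shirt": [1.0, 0.0, 0.0],
    "Blue Shirt": [0.9, 0.1, 0.0],
    "Green Pants": [0.0, 1.0, 0.0],
    "Blue Pants": [0.0, 0.9, 0.1],
}
query = [1.0, 0.0, 0.0]

# Highest similarity first; in this sketch ties (both pants score 0.0) are broken alphabetically
ranked = sorted(products, key=lambda name: (-cosine_similarity(products[name], query), name))
for name in ranked:
    print(name, round(cosine_similarity(products[name], query), 4))
```

Both pants score exactly 0.0 because their vectors share no non-zero component with the query, so the dot product is zero.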

This was a simple illustration. Now let’s see how to use vector databases to solve a real-world problem.

Smart Content Search: Building an AI-Powered Marketing Library That Understands Context, Not Just Keywords

Problem Statement
Marketing teams waste countless hours searching through past campaigns, often missing valuable content because traditional search tools only match exact keywords. When a marketer searches for “professional workspace campaign,” they might miss highly relevant content labeled as “office environment promotion” or “corporate culture initiative.” This keyword limitation means:

1. Lost Time: Teams spend 30–40% of their time searching for existing content and campaign references
2. Duplicate Work: Similar campaigns are recreated because existing ones aren’t found
3. Missed Opportunities: Valuable insights from past campaigns remain undiscovered
4. Inconsistent Messaging: Teams can’t easily find and reference similar audience targeting strategies

We’re solving this by building a vector database that:
– Understands the meaning behind marketing content, not just keywords
– Finds similar campaigns based on context and audience targeting
– Connects related content even when they use different terminology
– Makes our entire marketing library searchable by concept and intent

Simply put: We’re transforming our marketing database from a basic filing cabinet into an intelligent library that understands what our content means, not just what it says.

Content Search Illustration
Audience Search Illustration

Step 1: Install the Necessary Libraries

Sentence Transformers are a type of machine learning model designed to transform sentences into high-dimensional vectors, capturing their semantic meaning. These vectors can be used for various natural language processing tasks, such as semantic search, clustering, and classification. The sentence-transformers library provides pre-trained models that can be easily integrated into your projects. To install it, you can use the following command:

pip install sentence-transformers

Step 2: Import Necessary Libraries

import os
import json  # used by numpy_vector_to_pg_vector in Step 4
from datetime import datetime
import numpy as np
from sentence_transformers import SentenceTransformer
from sqlalchemy import create_engine, text
from typing import Dict, List
from tqdm import tqdm

engine = create_engine(os.environ['DATABASE_URL'])

Instantiating the SentenceTransformer model from sbert.net

model = SentenceTransformer('all-MiniLM-L6-v2')
model


SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
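The printed architecture carries two details the rest of the tutorial relies on: the pooling layer reports a word_embedding_dimension of 384, which matches the VECTOR(384) columns declared in the next step, and the final Normalize() module scales every embedding to unit length. A hypothetical guard like the one below (plain Python, not part of sentence-transformers) can catch a mismatched model before it produces a bad insert.

```python
import math
from typing import Sequence

EXPECTED_DIM = 384  # assumption: matches all-MiniLM-L6-v2 and the VECTOR(384) columns


def check_embedding(vec: Sequence[float], dim: int = EXPECTED_DIM, tol: float = 1e-3) -> None:
    """Hypothetical guard: raise ValueError if vec has the wrong length or is not unit length."""
    if len(vec) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(vec)}")
    # Euclidean norm; float() also accepts NumPy scalar types
    norm = math.sqrt(sum(float(x) * float(x) for x in vec))
    if abs(norm - 1.0) > tol:
        raise ValueError(f"expected a unit-length vector, got norm {norm:.4f}")


# Usage with the model loaded above:
# check_embedding(model.encode("hello world"))
```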

Step 3: Function to Create a Table to Store Marketing Content

# Create the table using raw SQL
def create_marketing_content():
    create_table_query = text("""
    CREATE TABLE IF NOT EXISTS marketing_content (
        id SERIAL PRIMARY KEY,
        title TEXT NOT NULL,
        content_type TEXT NOT NULL,
        description TEXT NOT NULL,
        target_audience TEXT,
        key_messaging TEXT,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        content_embedding VECTOR(384),
        audience_embedding VECTOR(384)
    );
    """
    )

    with engine.connect() as conn:
        conn.execute(create_table_query)
        conn.commit()

Sample Data:

MARKETING_CONTENT = [
{
"title": "Summer Fitness Challenge",
"content_type": "campaign",
"description": "30-day fitness challenge promoting our premium workout gear. Includes social media content, email sequences, and influencer partnerships.",
"target_audience": "Health-conscious millennials, aged 25-35, interested in fitness and wellness, active on Instagram",
"key_messaging": "Transform your summer with premium workout gear. Join our 30-day challenge for exclusive rewards."
},
{
"title": "Business Professional Collection",
"content_type": "product_launch",
"description": "Launch campaign for our new line of professional business attire. Focus on quality materials and modern designs.",
"target_audience": "Corporate professionals, ages 30-50, fashion-conscious, looking for high-quality business wear",
"key_messaging": "Elevate your professional wardrobe with timeless pieces designed for modern success."
},
{
"title": "Eco-Friendly Initiative",
"content_type": "brand_campaign",
"description": "Sustainability campaign highlighting our transition to 100% recycled materials and carbon-neutral manufacturing.",
"target_audience": "Environmentally conscious consumers, all ages, prioritize sustainable and ethical products",
"key_messaging": "Join us in creating a sustainable future. Every purchase makes a difference."
},
{
"title": "Student Smart Start",
"content_type": "seasonal_campaign",
"description": "Back-to-school campaign featuring student discounts, bundle deals, and campus essentials.",
"target_audience": "College students, ages 18-22, budget-conscious, looking for quality basics and study gear",
"key_messaging": "Start the semester right with student-exclusive deals and essential gear for success."
},
{
"title": "Winter Sports Collection",
"content_type": "product_launch",
"description": "Premium winter sports gear launch featuring innovative materials and cutting-edge designs for snow sports enthusiasts.",
"target_audience": "Winter sports enthusiasts, ages 25-45, passionate about skiing and snowboarding",
"key_messaging": "Conquer the slopes with gear that combines performance and style."
},
{
"title": "Mother's Day Appreciation",
"content_type": "seasonal_campaign",
"description": "Curated gift collections and special promotions celebrating mothers, featuring luxury accessories and self-care items.",
"target_audience": "Gift shoppers, ages 25-55, looking for premium gifts for mothers",
"key_messaging": "Celebrate the special women in your life with thoughtfully curated gifts."
},
{
"title": "Urban Commuter Series",
"content_type": "product_launch",
"description": "Launch of versatile clothing line designed for urban professionals who bike or walk to work.",
"target_audience": "Urban professionals, ages 25-40, environmentally conscious, active commuters",
"key_messaging": "Style meets function for the modern urban commuter."
},
{
"title": "Digital Nomad Collection",
"content_type": "product_launch",
"description": "Versatile travel-friendly clothing and accessories designed for remote workers and digital nomads.",
"target_audience": "Remote workers, ages 25-40, frequent travelers, tech-savvy professionals",
"key_messaging": "Work from anywhere in style with our travel-ready essentials."
},
{
"title": "Wellness Wednesday",
"content_type": "campaign",
"description": "Weekly wellness content series featuring workout tips, mindfulness practices, and healthy living products.",
"target_audience": "Health and wellness enthusiasts, ages 25-45, interested in holistic health",
"key_messaging": "Make wellness a priority with weekly inspiration and premium gear."
},
{
"title": "Holiday Gift Guide",
"content_type": "seasonal_campaign",
"description": "Comprehensive holiday shopping guide featuring curated gift collections for different personalities and budgets.",
"target_audience": "Holiday shoppers, all ages, looking for meaningful and quality gifts",
"key_messaging": "Find the perfect gift for everyone on your list."
},
{
"title": "Sustainable Basics",
"content_type": "product_launch",
"description": "Launch of everyday essentials made from organic and recycled materials, focusing on minimal environmental impact.",
"target_audience": "Environmentally conscious consumers, ages 20-40, interested in sustainable fashion",
"key_messaging": "Everyday essentials that feel good and do good."
},
{
"title": "Adventure Photography Series",
"content_type": "campaign",
"description": "Content series featuring outdoor photographers wearing our gear in spectacular locations.",
"target_audience": "Outdoor enthusiasts and photographers, ages 25-45, interested in adventure and photography",
"key_messaging": "Capture life's adventures in gear designed for the journey."
},
{
"title": "Spring Renewal Collection",
"content_type": "seasonal_campaign",
"description": "Fresh, vibrant spring collection featuring lightweight fabrics and nature-inspired colors.",
"target_audience": "Fashion-forward consumers, ages 25-45, looking to refresh their wardrobe",
"key_messaging": "Embrace the season with fresh styles and renewed energy."
},
{
"title": "Tech-Smart Workwear",
"content_type": "product_launch",
"description": "Innovation-focused workwear featuring smart fabrics and tech-friendly design elements.",
"target_audience": "Tech professionals, ages 25-45, interested in innovative and functional clothing",
"key_messaging": "Where technology meets style in the modern workplace."
},
{
"title": "Summer Festival Series",
"content_type": "campaign",
"description": "Festival-ready fashion collection with bohemian influences and practical features.",
"target_audience": "Festival-goers, ages 18-35, music and fashion enthusiasts",
"key_messaging": "Express yourself in style at this summer's hottest festivals."
},
{
"title": "Mindful Movement",
"content_type": "campaign",
"description": "Yoga and meditation-focused campaign promoting comfortable, sustainable activewear.",
"target_audience": "Yoga practitioners and mindfulness enthusiasts, ages 25-55",
"key_messaging": "Move mindfully in comfort and style."
},
{
"title": "Black Friday Preview",
"content_type": "seasonal_campaign",
"description": "Early access campaign for biggest sale of the year, featuring exclusive deals and member benefits.",
"target_audience": "Deal-seeking shoppers, all ages, looking for premium products at great values",
"key_messaging": "Get early access to our biggest savings of the year."
},
{
"title": "Athletic Performance Line",
"content_type": "product_launch",
"description": "High-performance athletic wear featuring moisture-wicking technology and ergonomic design.",
"target_audience": "Serious athletes, ages 20-40, focused on performance and quality",
"key_messaging": "Push your limits in gear designed for peak performance."
},
{
"title": "Capsule Wardrobe Challenge",
"content_type": "campaign",
"description": "30-day challenge promoting minimalist fashion and sustainable consumption.",
"target_audience": "Minimalism enthusiasts, ages 25-45, interested in sustainable fashion",
"key_messaging": "Do more with less. Build your perfect capsule wardrobe."
},
{
"title": "Weekend Warrior Collection",
"content_type": "product_launch",
"description": "Versatile outdoor gear designed for weekend adventures and casual outdoor activities.",
"target_audience": "Casual outdoor enthusiasts, ages 25-45, seeking versatile weekend wear",
"key_messaging": "From city to trail, gear for every weekend adventure."
},
{
"title": "Professional Petites",
"content_type": "product_launch",
"description": "Tailored professional wear designed specifically for petite frames.",
"target_audience": "Professional women with petite frames, ages 25-50, seeking well-fitted workwear",
"key_messaging": "Professional style perfectly proportioned for petite frames."
},
{
"title": "Father's Day Tech Gear",
"content_type": "seasonal_campaign",
"description": "Curated collection of tech-friendly clothing and accessories perfect for tech-savvy dads.",
"target_audience": "Gift shoppers buying for tech-interested fathers, ages 25-55",
"key_messaging": "Innovative gifts for the tech-savvy dad."
},
{
"title": "Autumn Outdoor Living",
"content_type": "seasonal_campaign",
"description": "Fall collection focused on outdoor lifestyle and layered fashion for changing weather.",
"target_audience": "Outdoor lifestyle enthusiasts, ages 25-45, interested in fall fashion",
"key_messaging": "Embrace the outdoors in style this fall."
},
{
"title": "Smart Casual Revolution",
"content_type": "brand_campaign",
"description": "Campaign redefining smart casual wear for the modern workplace.",
"target_audience": "Professional workers in casual offices, ages 25-45",
"key_messaging": "Redefining workplace style for the modern professional."
},
{
"title": "Summer Beach Essentials",
"content_type": "seasonal_campaign",
"description": "Beach-ready collection featuring UV protection and quick-dry materials.",
"target_audience": "Beach and summer lifestyle enthusiasts, ages 18-45",
"key_messaging": "Make waves this summer with essential beach gear."
},
{
"title": "Luxury Loungewear",
"content_type": "product_launch",
"description": "Premium comfort wear featuring high-end materials and sophisticated designs.",
"target_audience": "Luxury consumers, ages 30-55, valuing comfort and quality",
"key_messaging": "Elevate your downtime with luxury comfort wear."
},
{
"title": "Active Kids Collection",
"content_type": "product_launch",
"description": "Durable, comfortable clothing line designed for active children and youth.",
"target_audience": "Parents of active children, ages 30-45, seeking quality kids' wear",
"key_messaging": "Built to keep up with active kids."
},
{
"title": "Virtual Fashion Week",
"content_type": "campaign",
"description": "Digital showcase of new collections through virtual runway shows and interactive content.",
"target_audience": "Fashion enthusiasts, ages 20-45, interested in digital fashion experiences",
"key_messaging": "Experience fashion's future through immersive digital showcases."
},
{
"title": "Winter Wellness",
"content_type": "campaign",
"description": "Winter health and wellness campaign featuring cold-weather workout gear and recovery essentials.",
"target_audience": "Winter fitness enthusiasts, ages 25-45, committed to year-round wellness",
"key_messaging": "Stay active and well through the winter months."
},
{
"title": "Graduation Collection",
"content_type": "seasonal_campaign",
"description": "Professional wear and accessories curated for recent graduates entering the workforce.",
"target_audience": "Recent graduates, ages 21-25, preparing for professional life",
"key_messaging": "Start your professional journey with confidence and style."
}
]

Step 4: Insert Content

First, let’s write a helper that converts a NumPy array into the pgvector text format.

import json
import numpy as np


def numpy_vector_to_pg_vector(vector: np.ndarray) -> str:
    """
    Convert a numpy array to the pgvector string format.

    Args:
        vector (np.ndarray): Input numpy array to be converted to pgvector format

    Returns:
        str: String representation of the vector in pgvector format [x1,x2,...,xn]

    Example:
        >>> arr = np.array([1, 2, 3])
        >>> numpy_vector_to_pg_vector(arr)
        '[1,2,3]'

    Notes:
        - Flattens multi-dimensional arrays to 1D
        - Vector values should be numeric (int/float)
        - Returns a string that a VECTOR column can parse
    """
    # Compact separators so the output matches the docstring example exactly
    return json.dumps(vector.flatten().tolist(), separators=(",", ":"))
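Going the other way is occasionally useful too: the driver hands VECTOR values back as text, as the embedding column in the toy query output shows ('[0,0.9,0.1]'). A small hypothetical inverse of numpy_vector_to_pg_vector, sketched with only the standard library, turns that text back into a list of floats.

```python
import json
from typing import List


def pg_vector_to_list(text: str) -> List[float]:
    """Hypothetical helper: parse pgvector text such as '[0,0.9,0.1]' into a list of floats."""
    values = json.loads(text)  # the bracketed, comma-separated format is valid JSON
    if not isinstance(values, list):
        raise ValueError(f"not a vector literal: {text!r}")
    return [float(v) for v in values]
```

Wrapping the result in np.array(...) gives back a NumPy vector if further math is needed.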

Function to insert content

def insert_content(data_content: List[Dict[str, str]], batch_size: int = 100):
    """
    Batch insert marketing content with generated embeddings into the database.

    Args:
        data_content (List[Dict[str, str]]): List of dictionaries, each containing marketing content with keys:
            - title: Title of the marketing content
            - content_type: Type of content (campaign, ad, blog, etc.)
            - description: Content description
            - target_audience: Target audience description
            - key_messaging: Key message of the content
        batch_size (int, optional): Number of records to insert in each batch. Defaults to 100.

    Notes:
        - Uses the sentence transformer model to generate embeddings from content and audience text
        - Content embedding combines description and key messaging
        - Audience embedding is generated from the target audience text
        - Performs batched inserts for better performance
        - Commits each batch automatically

    Example:
        >>> content = {
        ...     "title": "Summer Campaign",
        ...     "content_type": "campaign",
        ...     "description": "Summer sale promotion",
        ...     "target_audience": "Young adults",
        ...     "key_messaging": "Get summer ready"
        ... }
        >>> insert_content([content], batch_size=100)
    """

    insert_query = text(
        """
        INSERT INTO marketing_content (
            title,
            content_type,
            description,
            target_audience,
            key_messaging,
            content_embedding,
            audience_embedding
        )
        VALUES (
            :title,
            :content_type,
            :description,
            :target_audience,
            :key_messaging,
            :content_embedding,
            :audience_embedding
        )
        """
    )

    for i in range(0, len(data_content), batch_size):
        batch = data_content[i : i + batch_size]
        batch_parameters = []

        for content in tqdm(batch):
            # Content search embeds the description and key messaging together
            content_text = f"{content['description']} {content['key_messaging']}"
            content_embedding = model.encode(content_text)
            # Audience search embeds the target audience text on its own
            audience_embedding = model.encode(content["target_audience"])

            batch_parameters.append(
                {
                    "title": content["title"],
                    "content_type": content["content_type"],
                    "description": content["description"],
                    "target_audience": content["target_audience"],
                    "key_messaging": content["key_messaging"],
                    "content_embedding": numpy_vector_to_pg_vector(content_embedding),
                    "audience_embedding": numpy_vector_to_pg_vector(audience_embedding),
                }
            )

        # Passing a list of parameter dicts makes SQLAlchemy execute the insert once per row
        with engine.connect() as conn:
            conn.execute(insert_query, batch_parameters)
            conn.commit()

Function to perform vector search

def search_marketing_content(query: str, search_type: str = 'content', limit: int = 5):
    """
    Search marketing content using semantic similarity based on vector embeddings.

    Args:
        query (str): Search query text to find similar content
        search_type (str, optional): Type of search to perform:
            - 'content': Search based on content description and messaging
            - 'audience': Search based on target audience description
            Defaults to 'content'; any other value searches the audience embeddings.
        limit (int, optional): Maximum number of results to return. Defaults to 5.

    Returns:
        List[Row]: List of matching records sorted by similarity score, containing:
            - id: Record ID
            - title: Content title
            - content_type: Type of content
            - description: Content description
            - target_audience: Target audience
            - key_messaging: Key message
            - created_at: Creation timestamp
            - similarity_score: Cosine similarity score (-1 to 1, higher is more similar)

    Notes:
        - Uses the sentence transformer model to convert the query to a vector embedding
        - Calculates cosine similarity between the query vector and stored embeddings
        - Higher similarity scores indicate better matches
        - Returns results sorted by similarity in descending order

    Example:
        >>> # Search content
        >>> results = search_marketing_content("sustainable products", search_type="content")
        >>>
        >>> # Search audience
        >>> results = search_marketing_content("young professionals", search_type="audience")
    """

    # Only one of these two fixed column names can reach the f-string below
    embedding_column = (
        "content_embedding" if search_type == "content" else "audience_embedding"
    )

    search_query = text(
        f"""
        SELECT
            id,
            title,
            content_type,
            description,
            target_audience,
            key_messaging,
            created_at,
            1 - cosine_distance({embedding_column}, :search_embedding) AS similarity_score
        FROM marketing_content
        ORDER BY 1 - cosine_distance({embedding_column}, :search_embedding) DESC
        LIMIT :limit
        """
    )

    with engine.connect() as conn:
        result = conn.execute(search_query, {
            "search_embedding": numpy_vector_to_pg_vector(model.encode(query)),
            "limit": limit,
        })
        return result.fetchall()
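One practical benefit of keeping vectors in CockroachDB is that a semantic ranking can be combined with ordinary SQL filters in a single query. The hypothetical builder below, which is not part of the original code, sketches that idea: it checks the search type against an allowlist (the function above interpolates the column name with an f-string) and optionally adds a WHERE clause on content_type. It only builds the SQL text and the filter parameters, so the caller still supplies :search_embedding and :limit.

```python
from typing import Dict, Optional, Tuple

# Hypothetical allowlist mapping search types to the two embedding columns
EMBEDDING_COLUMNS = {"content": "content_embedding", "audience": "audience_embedding"}


def build_search_sql(search_type: str = "content",
                     content_type: Optional[str] = None) -> Tuple[str, Dict[str, str]]:
    """Return (sql, params) for a cosine-similarity search, optionally filtered by content_type."""
    if search_type not in EMBEDDING_COLUMNS:
        raise ValueError(f"unknown search_type: {search_type!r}")
    column = EMBEDDING_COLUMNS[search_type]

    where_clause = ""
    params: Dict[str, str] = {}
    if content_type is not None:
        # Ordinary SQL filter; the value is a bound parameter, never interpolated
        where_clause = "WHERE content_type = :content_type"
        params["content_type"] = content_type

    sql = f"""
        SELECT id, title, content_type,
               1 - cosine_distance({column}, :search_embedding) AS similarity_score
        FROM marketing_content
        {where_clause}
        ORDER BY similarity_score DESC
        LIMIT :limit
    """
    return sql, params
```

Inside an engine.connect() block, the result could be run as conn.execute(text(sql), {**params, "search_embedding": ..., "limit": 5}), for example to rank only product_launch rows by similarity.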

Execute the whole pipeline to perform vector search

def main():
    # Rebind the module-level model that insert_content and search_marketing_content use
    global model
    print('Initialize the model')
    model = SentenceTransformer('all-MiniLM-L6-v2')

    # Create table and insert data
    create_marketing_content()
    print('Table Created')

    insert_content(data_content=MARKETING_CONTENT)

    # Perform content similarity search
    print('\nPerform content search')
    content_search = "sustainable eco-friendly products environmental impact"
    result = search_marketing_content(content_search, search_type='content')
    for res in result:
        print(f"Title: {res[1]}, Content Type: {res[2]}")

    # Perform audience similarity search
    print('\nPerforming audience search')
    audience_search = "young professionals interested in luxury brands and fashion"
    result = search_marketing_content(audience_search, search_type='audience')
    for res in result:
        print(f"Title: {res[1]}, Content Type: {res[2]}")

if __name__ == "__main__":
    main()
Initialize the model
Table Created
100%|██████████| 30/30 [00:00<00:00, 47.55it/s]

Perform content search
Title: Eco-Friendly Initiative, Content Type: brand_campaign
Title: Sustainable Basics, Content Type: product_launch
Title: Capsule Wardrobe Challenge, Content Type: campaign
Title: Mindful Movement, Content Type: campaign
Title: Tech-Smart Workwear, Content Type: product_launch

Performing audience search
Title: Luxury Loungewear, Content Type: product_launch
Title: Spring Renewal Collection, Content Type: seasonal_campaign
Title: Tech-Smart Workwear, Content Type: product_launch
Title: Business Professional Collection, Content Type: product_launch
Title: Sustainable Basics, Content Type: product_launch

Conclusion

In this article, we moved from the fundamental concepts of vector databases and CockroachDB to a practical implementation, showing how a distributed SQL database can support AI-driven applications. We covered several key aspects:

Key Takeaways

Vector Database Evolution

  • We’ve seen how vector databases have become essential infrastructure for AI applications, enabling sophisticated similarity searches and semantic understanding
  • Adding vector capabilities to general-purpose databases like CockroachDB lets teams run similarity search alongside their existing data, without operating a separate vector store

CockroachDB’s Vector Implementation

  • The introduction of vector search capabilities in CockroachDB 24.2 demonstrates how established databases are evolving to meet modern AI requirements
  • The pgvector compatibility provides a robust foundation for vector operations while maintaining CockroachDB’s core strengths in scalability and reliability

Practical Applications

  • Through our simple product search example, we demonstrated the basic principles of vector search implementation
  • Our detailed marketing library case study showcased a real-world application, illustrating how vector databases can transform content discovery and audience targeting

Future Implications

  • The integration of vector capabilities into distributed SQL databases like CockroachDB suggests a trend toward consolidated database solutions
  • This convergence of traditional and vector database capabilities opens new possibilities for applications requiring both structured data and semantic search

Real-World Impact

Our marketing library implementation illustrates how vector databases can help address tangible business problems:

  • Reducing time spent searching for relevant content
  • Preventing duplicate work through better content discovery
  • Enabling context-aware search that understands meaning beyond keywords
  • Improving content reusability and consistency across marketing initiatives

Published via Towards AI
