Vector Databases & Semantic Search
Embeddings, cosine similarity, pgvector, Chroma, Pinecone, RAG retrieval pipeline.
# Vector Databases & Semantic Search
## Core concept
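Convert text or images into dense vectors (embeddings). Similarity search then finds the nearest neighbors of a query vector in that space. Unlike keyword search, it matches on semantic meaning rather than exact terms, so a query can retrieve documents that share no words with it.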
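To make "nearest neighbors in vector space" concrete, here is a minimal brute-force sketch. The three-dimensional vectors are hand-made toy values, not real embeddings, and the names `cosine` and `nearest` are hypothetical helpers. It uses only the standard library so it runs without dependencies.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query: list[float], corpus: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(name, cosine(query, vec)) for name, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy vectors chosen by hand for illustration only
corpus = {
    'cat': [0.9, 0.1, 0.0],
    'kitten': [0.8, 0.2, 0.1],
    'car': [0.0, 0.1, 0.9],
}
top = nearest([1.0, 0.0, 0.0], corpus, k=2)
print([name for name, _ in top])
```

A vector database does the same ranking, but replaces the full scan with an approximate index so it stays fast at millions of vectors.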
## Generate embeddings
```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list[float]:
    resp = client.embeddings.create(
        model='text-embedding-3-small',  # 1536 dims
        input=text,
    )
    return resp.data[0].embedding

# Cosine similarity: 1.0 = same direction, 0.0 = orthogonal
def cosine_similarity(a, b) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```
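When indexing many documents, calling the API once per text is slow. The embeddings endpoint also accepts a list of strings as `input`, so a batching helper might look like the sketch below. The name `embed_batch` and the batch size of 64 are assumptions for illustration; check the current request limits of your model before picking a size.

```python
from typing import Any

def embed_batch(client: Any, texts: list[str], batch_size: int = 64) -> list[list[float]]:
    """Embed texts in batches; returns one vector per text, in input order.

    `client` is expected to be an OpenAI client (or anything with the same
    embeddings.create interface). batch_size=64 is an arbitrary choice.
    """
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        resp = client.embeddings.create(model='text-embedding-3-small', input=batch)
        # Each item carries the index of its input; sort so order is explicit
        ordered = sorted(resp.data, key=lambda item: item.index)
        vectors.extend(item.embedding for item in ordered)
    return vectors
```

Usage would be `vectors = embed_batch(client, chunks)`, with `client` being the `OpenAI()` instance created above.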
## pgvector (PostgreSQL extension)
```sql
-- Enable extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Table with vector column
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT,
  metadata JSONB,
  embedding vector(1536)  -- must match the embedding model's dimension
);
-- HNSW index for fast ANN search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Insert
INSERT INTO documents (content, embedding) VALUES ($1, $2);
-- Semantic search (top 5)
SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM documents
ORDER BY embedding <=> $1
LIMIT 5;
-- <=> cosine distance, <-> L2, <#> negative inner product
```
```python
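# Python with psycopg2 ('dbname=mydb' is a placeholder connection string)
import numpy as np
import psycopg2
from pgvector.psycopg2 import register_vector

conn = psycopg2.connect('dbname=mydb')
register_vector(conn)
cursor = conn.cursor()
# Pass a numpy array so the parameter is adapted as a vector, not a Postgres array
cursor.execute('SELECT content FROM documents ORDER BY embedding <=> %s LIMIT 5', (np.array(embedding),))
rows = cursor.fetchall()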
```
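The three pgvector operators return distances, not similarities, which is why the search query computes `1 - (embedding <=> $1)`. As a reference for what each operator measures, here is a plain-Python sketch of the same formulas. The function names are hypothetical and this is not how pgvector is implemented internally.

```python
import math

def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance, the quantity behind <->."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a: list[float], b: list[float]) -> float:
    """Negated dot product, the quantity behind <#> (smaller = more similar)."""
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, the quantity behind <=> (0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

a, b = [1.0, 0.0], [0.0, 1.0]
print(l2_distance(a, b), negative_inner_product(a, b), cosine_distance(a, b))
```

The operator in `ORDER BY` should match the operator class of the index (`vector_cosine_ops` pairs with `<=>`), otherwise the planner cannot use the index for that query.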
## Chroma (local, no infra)
```bash
pip install chromadb
```
```python
import chromadb

client = chromadb.PersistentClient(path='./chroma_db')
collection = client.get_or_create_collection(
    name='docs',
    metadata={'hnsw:space': 'cosine'},
)

# Add documents (Chroma auto-embeds with default model)
collection.add(
    documents=['Text one', 'Text two', 'Text three'],
    ids=['id1', 'id2', 'id3'],
    metadatas=[{'source': 'wiki'}, {'source': 'blog'}, {'source': 'wiki'}],
)

# Query
results = collection.query(
    query_texts=['search query here'],
    n_results=3,
    where={'source': 'wiki'},  # metadata filter
)
print(results['documents'], results['distances'])
```
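Chroma returns query results as parallel lists nested one level per query, so `results['documents'][0]` holds the hits for the first query text. A small helper can zip them into one record per hit. This is a sketch: `flatten_results` is a hypothetical name, and the sample dictionary below uses made-up distances purely to show the shape.

```python
def flatten_results(results: dict, query_index: int = 0) -> list[dict]:
    """Turn Chroma's parallel result lists into one dict per hit."""
    ids = results['ids'][query_index]
    docs = results['documents'][query_index]
    dists = results['distances'][query_index]
    return [
        {'id': doc_id, 'document': doc, 'distance': dist}
        for doc_id, doc, dist in zip(ids, docs, dists)
    ]

# Hand-written sample in the same shape as collection.query() output;
# the distance values are invented for illustration.
sample = {
    'ids': [['id1', 'id3']],
    'documents': [['Text one', 'Text three']],
    'distances': [[0.12, 0.34]],
}
hits = flatten_results(sample)
for hit in hits:
    print(hit['id'], hit['distance'], hit['document'])
```

With the cosine space configured above, a smaller distance means a closer match.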
## Pinecone (managed cloud)
```bash
pip install pinecone  # older releases were published as pinecone-client
```
```python
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ['PINECONE_API_KEY'])
index = pc.Index('my-index')  # index must already exist with a matching dimension

# embedding1, embedding2, query_embedding are lists of floats, e.g. from embed()
# Upsert vectors as (id, values, metadata) tuples
index.upsert(vectors=[
    ('id1', embedding1, {'text': 'doc text', 'source': 'url'}),
    ('id2', embedding2, {'text': 'another doc'}),
])

# Query
results = index.query(vector=query_embedding, top_k=5, include_metadata=True)
for match in results.matches:
    print(match.score, match.metadata['text'])

# Filtered search
results = index.query(
    vector=query_embedding,
    filter={'source': {'$eq': 'wiki'}},
    top_k=5,
)
```
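For larger datasets, upserts are usually sent in batches rather than in one request. A minimal sketch of a batching generator follows; the name `batched` and the batch size of 100 in the usage comment are assumptions, so check the request size limits of your Pinecone plan before relying on a particular number.

```python
from itertools import islice
from typing import Iterable, Iterator

def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

# Hypothetical usage with the index from the block above:
# for batch in batched(all_vectors, 100):
#     index.upsert(vectors=batch)
```

Because it consumes any iterable lazily, the same helper works when vectors are produced by a generator instead of being held in memory.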
## Full RAG pipeline
```python
def rag_query(question: str, collection, llm_client) -> str:
    # 1. Embed question (the collection must hold vectors from the same model;
    #    a Chroma collection filled via auto-embedding will not match embed())
    q_embedding = embed(question)
    # 2. Retrieve top-k docs
    results = collection.query(query_embeddings=[q_embedding], n_results=4)
    context = '\n\n'.join(results['documents'][0])
    # 3. Generate with context
    response = llm_client.chat.completions.create(
        model='gpt-4o-mini',
        messages=[
            {'role': 'system', 'content': f'Answer based on context:\n{context}'},
            {'role': 'user', 'content': question},
        ],
    )
    return response.choices[0].message.content
```
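Step 2 of the pipeline joins every retrieved document into the prompt. If documents can be long, it may help to number them and cap the total size. The helper below is a sketch under those assumptions: `build_context` and the 2000-character default are hypothetical, and the budget is approximate because it counts characters, not tokens, and ignores the separators.

```python
def build_context(docs: list[str], max_chars: int = 2000) -> str:
    """Join retrieved docs as numbered entries, stopping at a character budget."""
    parts: list[str] = []
    used = 0
    for n, doc in enumerate(docs, start=1):
        entry = f'[{n}] {doc}'
        if used + len(entry) > max_chars:
            break  # docs arrive ranked, so the dropped ones are the weakest matches
        parts.append(entry)
        used += len(entry)
    return '\n\n'.join(parts)

print(build_context(['First passage.', 'Second passage.'], max_chars=100))
```

In `rag_query`, this would replace the `'\n\n'.join(...)` line with `context = build_context(results['documents'][0])`. The numbering also gives the model something to cite.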
## Chunking strategies
```python
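# pip install langchain-text-splitters
# (older LangChain releases: from langchain.text_splitter import ...)
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,     # measured in characters by default, not tokens
    chunk_overlap=64,   # characters shared between neighboring chunks
    separators=['\n\n', '\n', '.', ' '],  # tried in order, coarsest first
)
chunks = splitter.split_text(long_text)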
```
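To see what `chunk_size` and `chunk_overlap` mean in isolation, here is a dependency-free fixed-window chunker. Note that this is a simpler technique than `RecursiveCharacterTextSplitter`: it cuts at fixed character offsets and does not try to break on separators. The name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError('overlap must be non-negative and smaller than chunk_size')
    step = chunk_size - overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks

print(chunk_text('abcdefghij', chunk_size=4, overlap=1))
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.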