
What is a Vector Database? How Semantic Search Powers Modern AI

Vector databases are the memory layer of AI applications. This guide explains embeddings, approximate nearest-neighbour search, HNSW indexing, and how to choose between pgvector, Pinecone, Qdrant, and Weaviate.

Gurpreet Singh
March 25, 2026

Why Traditional Databases Can't Power AI Search

Imagine you're building a customer support chatbot. When a user asks "My order hasn't arrived", you need to find the most relevant section of your documentation — maybe it's titled "Shipping Delays and Order Status". A traditional SQL LIKE query won't find it because the words don't match. A full-text search with Elasticsearch might work but only if the right keywords overlap.

What you need is semantic search — a search that understands meaning, not just keywords. That's what vector databases provide.

A vector database stores and indexes high-dimensional vectors — the numerical representations of semantic meaning produced by embedding models — and enables fast, accurate search over those vectors. It is the memory and retrieval layer of modern AI applications.

What is an Embedding Vector?

An embedding is a dense list of floating-point numbers (e.g., 1536 numbers for OpenAI's ada-002 model) that represents the semantic meaning of a piece of text, image, audio, or any other data.

The critical property: semantically similar content produces similar vectors. In the embedding space:

  • "How do I cancel my subscription?" is close to "I want to stop my plan"
  • "The weather is nice today" is far from both of those
  • "King" minus "Man" plus "Woman" approximately equals "Queen" — embeddings encode analogical relationships

This similarity is measured geometrically using cosine similarity (the cosine of the angle between two vectors) or dot product. Cosine similarity of 1.0 means identical direction (same meaning), 0.0 means orthogonal (unrelated), -1.0 means opposite.
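In code, cosine similarity is just a normalised dot product. A minimal sketch with toy 3-dimensional vectors (real embedding models produce hundreds or thousands of dimensions, and the numbers below are invented for illustration):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: dot product over the product of norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "embeddings" standing in for real model output.
cancel    = np.array([0.9, 0.1, 0.0])   # "How do I cancel my subscription?"
stop_plan = np.array([0.8, 0.2, 0.1])   # "I want to stop my plan"
weather   = np.array([0.0, 0.1, 0.9])   # "The weather is nice today"

print(cosine_similarity(cancel, stop_plan))  # near 1.0 — similar meaning
print(cosine_similarity(cancel, weather))    # near 0.0 — unrelated
```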

Embeddings turn semantic similarity into a mathematical operation — and that's what vector databases are optimised to compute at scale.

The Core Problem: Nearest Neighbour Search

Given a query vector Q and a database of N document vectors, find the K vectors most similar to Q. This is the K-Nearest Neighbour (KNN) problem.

Brute-force KNN computes the similarity between Q and every vector in the database — O(N × D) operations where D is the embedding dimension. For N = 10 million vectors with D = 1536 dimensions, that's 15.36 billion floating-point operations per query. At 100 queries/second, you need over 1.5 trillion floating-point operations per second — impractical.

The solution is Approximate Nearest Neighbour (ANN) search — algorithms that find vectors very close to the true nearest neighbours with much less computation, accepting a small accuracy trade-off.
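The brute-force baseline is straightforward to sketch with NumPy — this is the exact computation that ANN indexes approximate (the vectors here are random and purely illustrative):

```python
import numpy as np

def brute_force_knn(query: np.ndarray, vectors: np.ndarray, k: int) -> np.ndarray:
    """Exact top-k by cosine similarity: O(N * D), the baseline ANN approximates."""
    # Normalise so a plain dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q                          # one similarity score per stored vector
    # argpartition finds the k largest in O(N); then sort just those k.
    top = np.argpartition(-sims, k)[:k]
    return top[np.argsort(-sims[top])]

rng = np.random.default_rng(0)
db = rng.normal(size=(10_000, 64))       # 10k vectors, 64 dims (toy scale)
q = db[42] + 0.01 * rng.normal(size=64)  # a query very close to vector 42
print(brute_force_knn(q, db, 5))         # vector 42 should rank first
```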

HNSW: The Index Algorithm Powering Most Vector Databases

Hierarchical Navigable Small World (HNSW), proposed by Malkov and Yashunin (2018), is the dominant ANN algorithm used by Qdrant, Weaviate, pgvector, and most other vector databases. It builds a multi-layer graph where each node is a vector and edges connect nearby vectors.

HNSW works like navigating a map using a hierarchy of detail levels:

  • Top layers: Few nodes, long-range connections — a coarse map of the space. Search starts here, quickly narrowing to the right region.
  • Bottom layer: All nodes, short-range connections — a detailed local map. Final precise search happens here.

Search traverses from top to bottom, greedily moving to the nearest node at each layer until no improvement is possible. This greedy graph traversal reaches the approximate nearest neighbours in O(log N) time — orders of magnitude faster than brute force.

HNSW has two key build-time parameters. M (number of connections per node, typically 16) controls graph connectivity and memory usage; efConstruction (search beam width during index building, typically 200) controls index quality at the cost of build time. Higher values for either mean better recall but more memory and slower indexing.

At query time, ef (search beam width) controls the recall-speed trade-off: higher ef = better recall, more computation.
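The greedy traversal HNSW runs at each layer can be sketched on a single flat proximity graph. The real index stacks several such layers and keeps a beam of ef candidates rather than a single node, but the movement rule is the same (the graph and vectors below are illustrative toys):

```python
import numpy as np

def greedy_search(graph: dict, vectors: np.ndarray, entry: int, query: np.ndarray) -> int:
    """Greedy descent on one proximity-graph layer: hop to whichever neighbour
    of the current node is closer to the query, until no neighbour improves.
    HNSW runs this per layer, top to bottom, feeding each result in as the
    next layer's entry point."""
    current = entry
    best = np.linalg.norm(vectors[current] - query)
    improved = True
    while improved:
        improved = False
        for nb in graph[current]:
            d = np.linalg.norm(vectors[nb] - query)
            if d < best:
                best, current, improved = d, nb, True
    return current

# Toy layer: 5 points on a line, each linked to its immediate neighbours.
vectors = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(greedy_search(graph, vectors, entry=0, query=np.array([3.2])))  # → 3
```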

IVF: Inverted File Index

Inverted File Index (IVF) is an alternative ANN approach using clustering. The vector space is partitioned into Voronoi cells (clusters) using k-means. At query time, only the nearest nprobe cluster centroids are searched, dramatically reducing the search space.

IVF is often combined with Product Quantisation (PQ) to compress vectors — reducing memory by 8–32× at the cost of some accuracy. IVFPQ is one of the core index types in FAISS, Meta's high-performance vector search library.
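A rough sketch of the IVF query path. In a real index the centroids come from k-means training; sampling data points as centroids here is a shortcut for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
vectors = rng.normal(size=(1_000, 32))

# Build step: assign every vector to its nearest centroid (its Voronoi cell).
n_cells = 16
centroids = vectors[rng.choice(len(vectors), size=n_cells, replace=False)]
assignment = np.argmin(
    np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2), axis=1)
cells = {c: np.where(assignment == c)[0] for c in range(n_cells)}

def ivf_search(query: np.ndarray, nprobe: int = 4, k: int = 3) -> np.ndarray:
    """Rank cells by centroid distance, then scan only the nprobe nearest —
    most of the database is never touched."""
    nearest_cells = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
    candidates = np.concatenate([cells[c] for c in nearest_cells])
    dists = np.linalg.norm(vectors[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:k]]

q = vectors[7] + 0.01 * rng.normal(size=32)
print(ivf_search(q))  # vector 7 should come back first
```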

Metadata Filtering: The Killer Feature

Pure vector search returns the most semantically similar vectors globally. But in production systems, you almost always need to combine vector similarity with structured filters:

  • Find documents similar to this query where category = "returns" and date > 2024-01-01
  • Find products similar to this description where price < 100 and in_stock = true
  • Find support articles similar to this question where product_version = "3.x"

This is filtered vector search. The challenge is doing it efficiently — applying filters after vector search (post-filtering) wastes computation on irrelevant vectors. Applying filters before vector search (pre-filtering) may leave too few candidates. Modern vector databases like Qdrant use payload filtering that integrates metadata into the HNSW graph structure for efficient combined search.
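The pre- versus post-filtering distinction can be illustrated with exact search, where both strategies return the same answer and only the amount of wasted work differs (the data and category names below are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
vectors = rng.normal(size=(500, 16))
categories = rng.choice(["returns", "billing", "shipping"], size=500)

def post_filtered(query: np.ndarray, category: str, k: int = 5) -> list[int]:
    """Post-filtering: rank every vector, then discard non-matching results.
    All the similarity work on filtered-out vectors is wasted."""
    order = np.argsort(np.linalg.norm(vectors - query, axis=1))
    return [int(i) for i in order if categories[i] == category][:k]

def pre_filtered(query: np.ndarray, category: str, k: int = 5) -> list[int]:
    """Pre-filtering: restrict first, then rank only the survivors. Cheap for
    exact search, but with an ANN index a harsh filter can leave too few
    candidates — which is why integrated (payload) filtering exists."""
    idx = np.where(categories == category)[0]
    order = np.argsort(np.linalg.norm(vectors[idx] - query, axis=1))
    return idx[order[:k]].tolist()

q = rng.normal(size=16)
print(post_filtered(q, "returns") == pre_filtered(q, "returns"))  # True: same answer
```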

Choosing a Vector Database

pgvector — PostgreSQL Extension

Best for: Applications already using PostgreSQL that need vector search without additional infrastructure.

pgvector adds a vector data type and two index types to PostgreSQL: ivfflat (IVF, lower memory, slower build) and hnsw (faster queries, more memory). Queries look like standard SQL: ORDER BY embedding <=> $1 LIMIT 5 where <=> is cosine distance.
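A minimal pgvector setup might look like the following — the docs table and its columns are illustrative, not a prescribed schema:

```sql
-- Enable the extension, then store embeddings in an ordinary table.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE docs (
  id        bigserial PRIMARY KEY,
  category  text,
  body      text,
  embedding vector(1536)
);

-- HNSW index using cosine distance.
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);

-- Top-5 semantic matches, filtered with an ordinary WHERE clause.
SELECT id, body
FROM docs
WHERE category = 'returns'
ORDER BY embedding <=> $1
LIMIT 5;
```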

Pros: No new service to manage, ACID transactions, metadata filtering via standard SQL WHERE clauses, and JOINs against existing relational data.

Limitations: Performance degrades above ~1 million vectors without careful tuning. Not as fast as specialised databases at very large scales. HNSW index build is slow.

My recommendation: Perfect for most applications. With proper HNSW indexing, pgvector handles up to roughly 5 million vectors — enough for the vast majority of business use cases.

Qdrant — Open-Source, High Performance

Best for: Production RAG systems at scale, self-hosted deployments, performance-critical applications.

Written in Rust, Qdrant is built exclusively for vector search and optimised at every layer. It supports HNSW with advanced payload filtering, sparse vectors (for BM25 hybrid search), named vectors per record (text + image embeddings on the same record), and on-disk index storage for large-scale deployments.

Pros: Exceptional query performance (sub-millisecond at millions of vectors), excellent filtering, quantisation support (scalar, product), gRPC API, built-in snapshots and collection management, active development.

Limitations: Separate service to run and maintain. No SQL interface.

Pinecone — Fully Managed Cloud

Best for: Teams that want zero infrastructure management, AWS/GCP integration, and are comfortable with per-query pricing.

Pinecone abstracts all infrastructure — no servers, no index tuning, automatic scaling. The API is simple: upsert vectors, query vectors, filter by metadata. Serverless tier starts free.

Pros: Zero ops, scales automatically, very simple API, good documentation.

Limitations: Vendor lock-in, per-vector pricing becomes expensive at large scale, data leaves your infrastructure (compliance concern), limited filtering compared to Qdrant.

Weaviate

Best for: Multi-modal search (text + image), GraphQL-native teams, built-in LLM integrations.

Weaviate has a unique schema-based approach where objects have typed properties. It supports multi-modal embeddings natively and has modules for direct integration with OpenAI, Cohere, and HuggingFace — so it can auto-vectorise text as it's inserted.

Hybrid Search: Vector + Keyword

Pure semantic search misses exact keyword matches. "GPT-4o" searched semantically might return results about "language models" — correct conceptually but missing the exact product. BM25 (traditional keyword search) catches exact matches but misses semantic intent.

Hybrid search combines both: run BM25 and vector search in parallel, then merge the ranked lists using Reciprocal Rank Fusion (RRF). The combined ranking is consistently better than either alone. Qdrant supports native sparse+dense vector hybrid search. Weaviate's hybrid search module and Elasticsearch with KNN plugin also support this.
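RRF itself is a few lines: each document earns 1/(k + rank) from every list it appears in, with k = 60 the conventional damping constant (the document IDs below are invented):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Sum 1/(k + rank) for each document across all ranked lists, then
    re-sort by total score. Documents ranked well by both BM25 and vector
    search rise to the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_results   = ["gpt4o-release-notes", "pricing", "models-overview"]
vector_results = ["models-overview", "gpt4o-release-notes", "embeddings-guide"]
print(reciprocal_rank_fusion([bm25_results, vector_results]))
```

Note how "gpt4o-release-notes" wins the fused ranking: it appears near the top of both lists, while each other document scores from only one list or a lower rank.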

Real-World Sizing Example

A support chatbot whose knowledge base is 50,000 documentation chunks of 512 tokens each:

  • Embedding dimension: 1536 (OpenAI text-embedding-3-small)
  • Memory per vector: 1536 × 4 bytes = 6KB
  • Raw vector storage: 50,000 × 6KB = 300MB
  • HNSW index overhead (M=16): approximately 300MB additional
  • Total memory: ~600MB — fits comfortably in a $20/month Qdrant instance
  • Query latency: 5–15ms for top-10 retrieval

At 1 million chunks (large enterprise knowledge base): ~12GB memory, still runs on a single optimised server. Beyond that, Qdrant's distributed mode or Pinecone handles horizontal scaling.
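The sizing arithmetic above generalises to a one-line estimator, assuming — as in the worked example — float32 vectors and an HNSW graph overhead at M = 16 roughly equal to the raw vector storage:

```python
def index_memory_gb(n_vectors: int, dim: int = 1536) -> float:
    """Raw float32 storage, doubled for assumed HNSW graph overhead at M=16
    (the same rule of thumb used in the worked example above)."""
    raw_bytes = n_vectors * dim * 4   # 4 bytes per float32 component
    return 2 * raw_bytes / 1e9

print(index_memory_gb(50_000))     # 0.6144 — the ~600MB figure
print(index_memory_gb(1_000_000))  # 12.288 — the ~12GB figure
```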

#Vector Database #Embeddings #pgvector #Qdrant #Pinecone #Semantic Search #RAG #HNSW
Gurpreet Singh

Senior Full Stack Developer — Laravel, Vue.js, Nuxt.js & AI. Available for freelance projects.

