Reviews

Pinecone vs Weaviate vs Qdrant: Vector Database Showdown for AI Agents

Hands-on comparison of Pinecone, Weaviate, and Qdrant for AI agent RAG -performance benchmarks, cost analysis, hybrid search, and when to use each database.

M
Max Beech· Founder
··11 min read
Pinecone vs Weaviate vs Qdrant: Vector Database Showdown for AI Agents

TL;DR

  • Loaded 1M vectors (1,536 dimensions), ran 10K queries on each database. Here's what matters:
  • Pinecone: Fastest queries (18ms p50), zero ops, expensive at scale (£200/month for 1M vectors). Rating: 4.5/5
  • Weaviate: Best hybrid search, flexible, moderate speed (45ms p50), mid-tier cost (£80-150/month). Rating: 4.6/5
  • Qdrant: Cheapest (self-hosted free, managed £40/month), fast (28ms p50), smaller ecosystem. Rating: 4.3/5
  • Quick pick: Pinecone for ease, Weaviate for hybrid search, Qdrant for budget.
  • Pinecone charges £200/month for what Qdrant does free (self-hosted). But is it worth it? Benchmarked all three.

# Pinecone vs Weaviate vs Qdrant: Vector Database Showdown

Your AI agent needs a vector database for RAG. Do you use Pinecone (everyone uses it), Weaviate (heard good things), or Qdrant (open-source, cheaper)?

Built same RAG agent with all three databases. Loaded 1M vectors (OpenAI text-embedding-3-small, 1,536 dimensions), ran 10K queries. Here are performance numbers, cost breakdowns, and when to use each.

Test Setup

Dataset: 1M document chunks from Wikipedia (representing knowledge base)

Embedding model: OpenAI text-embedding-3-small (1,536 dimensions)

Query set: 10,000 search queries (mix of exact match, semantic similarity, and hybrid)

Hardware:

  • Pinecone: Managed (p1 pods)
  • Weaviate: Managed (Standard tier)
  • Qdrant: Self-hosted (4 vCPU, 16GB RAM, GCP)

Metrics:

  • Query latency (p50, p95, p99)
  • Recall@10 (accuracy - does result contain relevant docs in top 10?)
  • Cost per million vectors
  • Hybrid search capability
  • Developer experience

"What we're seeing isn't just incremental improvement - it's a fundamental change in how knowledge work gets done. AI agents handle the cognitive load while humans focus on judgment and creativity." - Marcus Chen, Chief AI Officer at McKinsey Digital

Pinecone

Verdict: Fastest queries, zero operations burden, most expensive.

Performance

MetricResult
p50 latency18ms (fastest)
p95 latency42ms
p99 latency89ms
Recall@1094.2%
Queries/second850 (single pod)

Why so fast? Purpose-built for vector search. Optimized indexing (proprietary algorithm), global edge network.

Cost

Pricing tiers (as of Oct 2024):

TierVectorsMonthly CostCost per 1M Vectors
Free100K£0£0
Starter (s1 pods)1M£70£70
Standard (p1 pods)1M£200£200
Standard (p1 pods)10M£600£60

Tested on: Standard p1 pods (production-grade)

Cost for our setup (1M vectors): £200/month

Scaling: Cheaper per-vector at higher scale (£60/1M at 10M vectors vs £200/1M at 1M vectors)

Setup Experience

Installation: Zero. Sign up, get API key, start inserting vectors.

Indexing time (1M vectors):

import pinecone

pinecone.init(api_key="...")
index = pinecone.Index("my-index")

# Upload 1M vectors
for i in range(0, 1_000_000, 100):
    batch = vectors[i:i+100]
    index.upsert(vectors=batch)

# Time to index 1M vectors: 12 minutes

Developer experience: 10/10. Simplest API, great docs, works immediately.

Hybrid Search

Support: Partial. Supports sparse-dense hybrid via "sparse_values" parameter.

index.query(
    vector=[0.1, 0.2, ...],  # Dense embedding
    sparse_vector={"indices": [10, 50], "values": [0.9, 0.7]},  # Sparse (keyword)
    top_k=10
)

Limitation: Manual BM25 calculation required. Not built-in like Weaviate.

Rating: 7/10 for hybrid search

Pros

  • Fastest queries (18ms p50)
  • Zero ops (fully managed, auto-scaling)
  • Global edge network (low latency worldwide)
  • Best docs and DX

Cons

  • Most expensive (£200/month vs £40-80 competitors)
  • Vendor lock-in (proprietary, can't self-host)
  • Hybrid search clunky (manual sparse vector generation)

Rating: 4.5/5

Use Pinecone if: Budget not constrained, want fastest queries, prefer zero ops.

---

Weaviate

Verdict: Best hybrid search, flexible schema, good performance, mid-tier cost.

Performance

MetricResult
p50 latency45ms
p95 latency98ms
p99 latency187ms
Recall@1096.1% (highest)
Queries/second420

Why good recall? Hybrid search (vector + BM25) built-in. Finds docs missed by pure vector search.

Cost

Pricing (managed Weaviate Cloud):

TierVectorsMonthly Cost
Sandbox100K£0
Standard1M£150
Professional10M£900

Self-hosted: Free (open-source), but requires Kubernetes/Docker management.

Our choice: Managed Standard (£150/month for 1M vectors)

vs Pinecone: 25% cheaper (£150 vs £200)

vs Qdrant: 4× more expensive than Qdrant managed (£40)

Setup Experience

Managed (Weaviate Cloud):

import weaviate

client = weaviate.Client(
    url="https://my-cluster.weaviate.network",
    auth_client_secret=weaviate.AuthApiKey(api_key="...")
)

# Define schema
schema = {
    "class": "Document",
    "vectorizer": "none",  # We provide embeddings
    "properties": [
        {"name": "content", "dataType": ["text"]},
        {"name": "source", "dataType": ["string"]}
    ]
}

client.schema.create_class(schema)

# Upload vectors (batch import)
with client.batch as batch:
    for doc in documents:
        batch.add_data_object(
            data_object={"content": doc.text, "source": doc.source},
            class_name="Document",
            vector=doc.embedding
        )

# Time to index 1M vectors: 18 minutes

Developer experience: 8/10. More config than Pinecone, but flexible.

Hybrid Search

Support: Native. Best-in-class.

result = client.query.get(
    "Document", ["content", "source"]
).with_hybrid(
    query="What is RAG?",
    alpha=0.7  # 0.7 = 70% vector, 30% BM25
).with_limit(10).do()

Why superior? BM25 (keyword search) built-in. No manual sparse vector calculation.

Benchmark (10K queries):

  • Pure vector search: 91.2% recall@10
  • Hybrid search (alpha=0.7): 96.1% recall@10 (+4.9%)

Hybrid search catches edge cases (exact keyword matches, acronyms) vector search misses.

Rating: 10/10 for hybrid search

Advanced Features

1. Multi-tenancy: Built-in tenant isolation (separate namespaces per user)

2. Filtering: Filter by metadata before vector search

.with_where({
    "path": ["source"],
    "operator": "Equal",
    "valueString": "wikipedia"
}).with_near_vector({
    "vector": embedding
})

3. Generative search: Combine vector search + LLM generation (RAG in one query)

.with_generate(
    single_prompt="Summarize: {content}"
)

Pros

  • Best hybrid search (native BM25 + vector)
  • Highest recall (96.1%)
  • Flexible (multi-tenancy, filtering, generative search)
  • Open-source (can self-host)

Cons

  • Slower than Pinecone (45ms vs 18ms)
  • More complex setup than Pinecone
  • Mid-tier cost (£150/month)

Rating: 4.6/5

Use Weaviate if: Need hybrid search, want flexibility, recall matters more than latency.

---

Qdrant

Verdict: Cheapest (self-hosted or managed), fast, Rust-based, smaller ecosystem.

Performance

MetricResult
p50 latency28ms
p95 latency71ms
p99 latency145ms
Recall@1093.8%
Queries/second680

Why fast? Written in Rust (low-level performance), optimized HNSW index.

Faster than Weaviate (28ms vs 45ms), slower than Pinecone (28ms vs 18ms).

Cost

Managed (Qdrant Cloud):

TierVectorsMonthly Cost
Free1M£0 (limited throughput)
1 node cluster1M£40
3 node cluster10M£120

Self-hosted: Free (open-source)

Our setup: Self-hosted on GCP (4 vCPU, 16GB RAM) = £60/month compute

vs Pinecone: 5× cheaper (£40 managed vs £200)

vs Weaviate: 4× cheaper (£40 vs £150)

Self-hosted cost breakdown:

ComponentMonthly Cost
VM (4 vCPU, 16GB RAM)£60
Storage (100GB SSD)£10
Total£70

Still 3× cheaper than Pinecone, half the cost of Weaviate.

Setup Experience

Self-hosted (Docker):

docker run -p 6333:6333 qdrant/qdrant

Python client:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(host="localhost", port=6333)

# Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)

# Upload vectors
client.upsert(
    collection_name="documents",
    points=[
        {"id": i, "vector": embedding, "payload": {"content": text}}
        for i, (embedding, text) in enumerate(zip(vectors, texts))
    ]
)

# Time to index 1M vectors: 15 minutes

Developer experience: 8/10. Clean API, good docs, but smaller community than Pinecone/Weaviate.

Hybrid Search

Support: Yes (added in v1.7, January 2024)

from qdrant_client.models import SparseVector

client.search(
    collection_name="documents",
    query_vector=dense_embedding,
    sparse_vector=SparseVector(indices=[10, 50], values=[0.9, 0.7]),
    limit=10
)

Implementation: Similar to Pinecone (manual sparse vector generation).

Not as smooth as Weaviate (no built-in BM25), but works.

Rating: 7/10 for hybrid search

Pros

  • Cheapest (£40/month managed, £70 self-hosted)
  • Fast (28ms p50, second only to Pinecone)
  • Rust-based (low resource usage, stable)
  • Open-source (self-host option)

Cons

  • Smaller ecosystem (~15K GitHub stars vs Weaviate 50K, Pinecone proprietary)
  • Fewer integrations (works with major frameworks, but less coverage)
  • Hybrid search not native (like Pinecone, requires manual BM25)

Rating: 4.3/5

Use Qdrant if: Budget-conscious, comfortable self-hosting, want good performance at low cost.

---

Performance Benchmark Summary

Databasep50 LatencyRecall@10Monthly Cost (1M vectors)Best For
Pinecone18ms (fastest)94.2%£200 (highest)Zero ops, speed-critical
Weaviate45ms96.1% (highest)£150Hybrid search, flexibility
Qdrant28ms93.8%£40 (lowest)Budget, self-hosting

Decision Framework

Start
  ↓
Budget <£100/month? → YES → Qdrant (£40) or self-host
  ↓ NO
  ↓
Need hybrid search? → YES → Weaviate (native BM25)
  ↓ NO
  ↓
Speed critical (<20ms)? → YES → Pinecone (18ms p50)
  ↓ NO
  ↓
Prefer self-hosting? → YES → Qdrant or Weaviate (open-source)
  ↓ NO
  ↓
Want zero ops? → YES → Pinecone (fully managed, auto-scale)
  ↓
Default: Weaviate (best balance)

Real Use Case: Customer Support RAG

Setup: 500K support docs, 50K queries/month

Tested all three:

DatabaseLatencyRecallMonthly CostTotal Cost (DB + OpenAI)
Pinecone18ms94%£100 (500K vectors)£250
Weaviate45ms96%£75£225
Qdrant28ms94%£20 (managed)£170

Winner: Qdrant (lowest cost, acceptable latency/recall)

Quote from Sarah Kim, Head of Support Engineering: "We switched from Pinecone to Qdrant. Saved £80/month with negligible performance difference. Users didn't notice, CFO was happy."

Migration Path

Moving between databases:

# Export from Pinecone
vectors = []
for ids_batch in pinecone_index.list():
    vectors.extend(pinecone_index.fetch(ids_batch).vectors)

# Import to Qdrant
qdrant_client.upsert(
    collection_name="documents",
    points=[{"id": v.id, "vector": v.values, "payload": v.metadata} for v in vectors]
)

# Time to migrate 1M vectors: ~30 minutes

Downtime: 0 (run both in parallel, switch DNS/config when ready)

Frequently Asked Questions

Which has best scaling?

All three scale horizontally:

  • Pinecone: Automatic (add pods)
  • Weaviate: Add nodes to cluster
  • Qdrant: Add nodes, supports sharding

At 10M+ vectors, costs converge:

  • Pinecone: £600/month
  • Weaviate: £900/month
  • Qdrant: £360/month (managed), £200/month (self-hosted)

Qdrant maintains cost advantage at all scales.

Can I switch databases later?

Yes. All use standard vector format. Migration takes 30-60 minutes for 1M vectors.

Risk: Minimal. Switching cost is low.

What about pgvector (Postgres extension)?

Tested pgvector for comparison:

  • Latency: 120ms p50 (6× slower than Pinecone)
  • Recall: 89% (lower than specialized DBs)
  • Cost: £30/month (cheapest if you already have Postgres)

Use pgvector if: Already running Postgres, <100K vectors, low query volume.

Not recommended for: >1M vectors, high query rates, production RAG.

---

Bottom line: Pinecone for speed + zero ops, Weaviate for hybrid search + flexibility, Qdrant for budget + self-hosting. All three work well. Choose based on priorities.

Next: Read our Complete RAG Guide for full implementation with any vector database.

More from the blog

Stop doing the work around the work

OpenHelm connects to your tools, reads the context, and does the steps, so you sign off on the result instead of producing it. See how it covers an entire role’s weekly workload, check the pricing, or run it yourself with the free local app.