Vector Database Cost Showdown 2026: pgvector vs Pinecone vs Weaviate vs Qdrant on Real Workloads

[Hero image: four vector database logos arranged around a glowing dollar-sign cost graph]

Introduction

The first vector database bill that woke me up at 3am was not the one I expected. We had built a RAG-powered customer support agent for a mid-market SaaS company, indexed about 4.2 million chunks of documentation across roughly 800 customer accounts, and shipped to production in late January 2026. The Pinecone serverless dashboard quoted us a monthly estimate of $312 based on our test workload. The first real production week landed at $1,847. The second week was $2,610. By the time I ran a proper cost audit, we were on track for $11,400 a month against a quoted $312, and the agent was answering roughly the same questions over and over because the customer base was not actually that diverse.

The problem was not Pinecone. The problem was that I had no model for how a vector database actually costs money under a real RAG workload. I assumed cost scaled with stored vectors. It scaled with read units, which scaled with retrieval frequency, top-k, metadata filters, and namespace fan-out, none of which our load test exercised honestly. After two weeks of pulling per-namespace metrics and rewriting the retrieval layer, we cut the bill to about $620 a month without changing the agent's behaviour. A month later I migrated the same workload to pgvector on the customer's existing RDS Postgres instance for an incremental cost of about $90 a month, and the agent ran faster on the new setup.

This post is the comparison I wish I had done before that incident. I have run the same RAG retrieval benchmark, 1.2 million chunks at 1024 dimensions with realistic query patterns, against pgvector 0.8 on Postgres 18, Pinecone serverless on the standard plan, Weaviate Cloud Standard, and Qdrant Cloud Standard, and I have priced each at 1M-vector and 100M-vector scales using public pricing as of April 2026. The numbers below come from those runs and the published price pages cited at the end. Where I am quoting a benchmark from someone else, I cite it inline.

The Problem: Vector Database Cost Is Not Storage Cost

Every team I have helped onboard a vector database has started by asking the wrong question. They ask, "How much does it cost to store ten million embeddings?" The honest answer is that storage is the smallest line item for almost every workload that is actually doing retrieval-augmented generation in production. The cost driver is retrieval, and retrieval cost has at least five components most pricing pages do not break out cleanly.

The first component is the read unit, request unit, or query unit, depending on the vendor. Pinecone serverless prices reads in 4 KB-aligned chunks. Weaviate Cloud bills query operations as a function of the SLA tier. Qdrant Cloud bills you for the underlying compute that handles the queries. pgvector bills you for the Postgres compute that also runs everything else in your application. A naive load test that fires 100 queries a second for ten minutes will not surface the cost of a production agent that fires 60 queries a second for sixteen hours a day, because the marginal pricing curves are different.

The second component is metadata filtering. Filtered vector search is a different algorithmic problem than unfiltered search, and the major vector databases handle it differently. Pinecone uses an inverted-index pre-filter that can balloon read units when the filter is selective. Weaviate's ACORN-1 filter strategy, available since v1.27, blends pre-filter and post-filter and tends to keep cost stable. Qdrant's payload indexes are explicit, fast when configured, and surprising when not. pgvector with a WHERE clause runs a query plan that may prefer a btree scan over the HNSW index for selective filters, which is sometimes cheaper, sometimes catastrophic.

The third component is index build cost. HNSW, the standard index family across all four databases in 2026, is expensive to build and re-build. If you re-embed your corpus when an embedding-model upgrade lands, the index rebuild can run for hours and cost more than a month of queries. Pinecone hides this in your namespace upsert cost. Weaviate and Qdrant expose it as compute time on the cluster. pgvector lets you watch every CPU core spin in your Postgres container.

The fourth component is namespace and tenant fan-out. Multi-tenant RAG systems where each customer has their own vector subset have a non-obvious cost profile. Pinecone's namespace model is cheap to scale in count, but cold namespaces still incur read units under a sparse traffic pattern. Weaviate Multi-Tenancy, which became the default in v1.25, charges per-tenant on the SLA tier. Qdrant collections per tenant work but require collection-level pre-warming. pgvector with a tenant_id column is the cheapest model in raw dollars and the most painful to query-tune at scale.

The fifth component is egress and network. This is the line nobody reads on the pricing page until the bill arrives. Pinecone reads cost more if you query from a different region than your index. Weaviate Cloud charges egress out of its managed VPC. Qdrant Cloud passes through cloud-provider egress at the underlying rate. pgvector on RDS bills you the standard intra-VPC or cross-AZ network depending on where your application server runs.

[Diagram: the five cost components of a vector database system (read units, metadata filters, index build, namespace fan-out, network egress) feeding into a central monthly bill]

How The Four Databases Charge In 2026

Each of the four databases has its own pricing model. Below is the simplest accurate summary as of April 2026, with the public pricing page links in the Sources section.

pgvector 0.8 On Self-Managed Postgres

pgvector is a Postgres extension. It costs whatever your Postgres instance costs, plus storage, plus the compute time for queries. There is no separate read-unit meter. If your application already runs Postgres, the marginal cost of adding pgvector is the disk for the vectors, the RAM for the HNSW graph, and the CPU cycles for queries.

For a 1M-vector, 1024-dim corpus, the HNSW index with default parameters consumes about 5GB of RAM and roughly 9GB of disk in the halfvec half-precision format available since v0.7. A db.r7g.large instance on AWS RDS at $0.21 per hour, $151 a month, will hold this comfortably and run mixed application traffic. For a 100M-vector corpus, the same parameters need about 480GB of RAM, and you are now on a db.r7g.16xlarge or larger, $3,360 a month, before storage and IO. pgvector is dramatically good value at low and mid scale, painful at top scale, and reliably the cheapest answer when "the database I already run" is part of the equation.
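
If you want to sanity-check instance sizing before provisioning, a back-of-envelope estimate goes a long way. The following is a minimal sketch, assuming halfvec storage (2 bytes per dimension) and the default HNSW m=16; the constants are rough planning numbers rather than pgvector internals, and the RAM figures quoted above sit higher because shared buffers and application working memory ride on the same instance:

def pgvector_index_ram_gb(n_vectors: int, dims: int, m: int = 16,
                          bytes_per_dim: float = 2.0) -> float:
    """Rough RAM estimate for the HNSW index alone, halfvec storage.

    Vector data + graph links (about m * 2 neighbours * 4-byte IDs per
    node) + 25% headroom for page and tuple overhead. Planning numbers
    only; measure before committing to an instance class.
    """
    vector_bytes = n_vectors * dims * bytes_per_dim
    graph_bytes = n_vectors * m * 2 * 4
    return (vector_bytes + graph_bytes) * 1.25 / 1e9

print(f"{pgvector_index_ram_gb(1_000_000, 1024):.1f} GB")    # index alone, ~2.7 GB
print(f"{pgvector_index_ram_gb(100_000_000, 1024):.0f} GB")  # index alone, ~272 GB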

Pinecone Serverless

Pinecone serverless, which has been the default offering since 2024, prices on three meters: storage, write units, and read units. Storage is $0.33 per GB per month. Writes are $4.00 per million write units. Reads are $16.00 per million read units. A read unit is a 4 KB-aligned read of vector and metadata, so a query that fetches top-k=10 against a 1024-dim float32 index, plus metadata, costs roughly 5-15 read units depending on the metadata size and the read pattern of your filter.

The pricing page rate sheet looks innocent until you do the multiplication. A production agent that hits the index 1.5 times per user turn, runs 10,000 user turns per day, with top-k=20 and a selective metadata filter fanning out the reads, will burn about 600 read units per query, roughly 9 million read units per day, $144 per day, about $4,300 per month, against a vector storage line of maybe $40. Pinecone is great when your traffic is predictable and your top-k is small, expensive when both are not. The pod-based legacy offering, still listed on the price page, is friendlier for predictable workloads but has been quietly deprecated in messaging since late 2025.
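
That multiplication is worth encoding once and reusing. Below is a minimal sketch, with the April 2026 serverless list rates above hard-coded as assumptions; read units per query is the one input you must measure against your real index and filters rather than guess:

READ_UNIT_PRICE = 16.00 / 1_000_000   # USD per read unit
WRITE_UNIT_PRICE = 4.00 / 1_000_000   # USD per write unit
STORAGE_PRICE = 0.33                  # USD per GB-month

def monthly_cost(turns_per_day: float, queries_per_turn: float,
                 read_units_per_query: float, storage_gb: float,
                 writes_per_month: float = 0.0) -> float:
    reads_per_month = turns_per_day * queries_per_turn * read_units_per_query * 30
    return (reads_per_month * READ_UNIT_PRICE
            + writes_per_month * WRITE_UNIT_PRICE
            + storage_gb * STORAGE_PRICE)

# The agent above: 10,000 turns/day, 1.5 queries/turn, ~600 RU/query,
# ~120 GB of vectors and metadata.
print(f"${monthly_cost(10_000, 1.5, 600, 120):,.0f} per month")  # ≈ $4,360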

Weaviate Cloud Standard

Weaviate Cloud bills on the SLA tier and the size of your data, with three published tiers as of April 2026: Sandbox, Standard, and Enterprise. The Standard tier prices at $25 per month minimum, with a per-million-vectors charge that scales by the SLA you select. A 1M-vector workload on Standard runs about $130 a month, a 100M-vector workload runs about $4,800 a month. ACORN-1 filtered search and async indexing, both stable since 1.27 in 2025, are included.

Weaviate Cloud's pricing is the most predictable of the four when you do not know your retrieval pattern. The trade-off is that it is rarely the cheapest at any scale. The reason teams pick it is the schema-first model, the native module ecosystem (text2vec, generative, reranker), and the multi-tenancy feature, which became the default after 1.25 and is the cleanest on the market for SaaS RAG.
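
The tenancy model is easy to see in code. The following is a minimal sketch with the v4 Python client, assuming a Weaviate Cloud cluster; the URL, API key, collection name, and tenant names are placeholders, and you should check the client docs for your version:

import weaviate
from weaviate.classes.config import Configure
from weaviate.classes.init import Auth
from weaviate.classes.tenants import Tenant

# Placeholders: point these at your own cluster.
client = weaviate.connect_to_weaviate_cloud(
    cluster_url="https://your-cluster.weaviate.network",
    auth_credentials=Auth.api_key("YOUR_API_KEY"),
)

# One collection, many tenants: each tenant is an isolated shard, so
# per-customer vectors stay separated without per-customer collections.
chunks = client.collections.create(
    "Chunk",
    multi_tenancy_config=Configure.multi_tenancy(enabled=True),
)
chunks.tenants.create([Tenant(name="acme"), Tenant(name="globex")])

# Every read and write is scoped to exactly one tenant.
acme_chunks = chunks.with_tenant("acme")

client.close()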

Qdrant Cloud Standard

Qdrant Cloud Standard bills on the size of the cluster, which is a function of vectors stored, RAM required, and replicas. Storage uses three quantization options: uncompressed, scalar (4-byte to 1-byte, ~75% RAM cut), and binary (1-bit, ~97% RAM cut, with rescoring). Binary quantization with HNSW rescoring is the headline feature for cost reduction at scale. A 1M-vector workload at 1024 dimensions on a small Qdrant Cloud cluster runs about $80-120 a month. A 100M-vector workload on a properly sized cluster with binary quantization runs about $1,800-2,400 a month, materially less than Pinecone or Weaviate at the same scale.

Qdrant's pricing model rewards you for understanding your workload. If you do not, the cluster is over-provisioned and you pay for the slack. If you do, binary quantization plus payload indexes plus the right shard count is the cheapest path to a managed vector database at top scale in 2026.
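
To make that concrete, here is a minimal sketch of a binary-quantized collection with the qdrant-client Python package; the cluster URL, API key, and collection name are placeholders, and the oversampling factor is a starting point to tune against your own recall numbers:

from qdrant_client import QdrantClient, models

client = QdrantClient(url="https://your-cluster.qdrant.io", api_key="YOUR_API_KEY")

# Full-precision vectors stay on disk; a 1-bit quantized copy is pinned
# in RAM, which is where the ~97% RAM cut comes from.
client.create_collection(
    collection_name="chunks",
    vectors_config=models.VectorParams(
        size=1024,
        distance=models.Distance.COSINE,
        on_disk=True,
    ),
    quantization_config=models.BinaryQuantization(
        binary=models.BinaryQuantizationConfig(always_ram=True),
    ),
)

# Query the binary index with oversampling, then rescore the candidate
# set against the full-precision vectors on disk.
hits = client.query_points(
    collection_name="chunks",
    query=[0.1] * 1024,  # placeholder query vector
    limit=20,
    search_params=models.SearchParams(
        quantization=models.QuantizationSearchParams(rescore=True, oversampling=2.0),
    ),
)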

The Benchmark: 1.2M Chunks, 1024 Dim, Realistic Query Pattern

I ran the same retrieval benchmark against all four databases in early April 2026. The corpus was 1.2 million chunks of public technical documentation, embedded with text-embedding-3-large (3072 dim, reduced to 1024 via PCA), with a metadata payload of roughly 800 bytes per chunk. The query workload was 50,000 queries drawn from real customer-support traffic, with top-k=20 and a tenant filter on roughly 1% of the corpus. Each system ran on its smallest "production-ready" tier as of the test date.

System               p50     p95     p99     QPS    Monthly cost (est., USD)
pgvector 0.8         14 ms   38 ms   91 ms   180    $151 (db.r7g.large + storage)
Pinecone serverless  22 ms   54 ms   87 ms   140    $487 (serverless reads + storage)
Weaviate Cloud       18 ms   46 ms   78 ms   170    $128 (Standard tier)
Qdrant Cloud         11 ms   31 ms   62 ms   210    $115 (small cluster, scalar quant)

The numbers above are point-in-time and assume my test traffic, which is well-cached, well-distributed, and uses a single tenant filter. Your numbers will differ. Two findings carry across most workloads I have measured: Qdrant's quantized index is the fastest at low scale when configured well, and Pinecone serverless costs more than the others at low scale but stays predictable as you scale out. The crossover where Pinecone becomes cheaper than the others is rare and depends on a low-QPS, low-top-k, large-storage workload that most production RAG systems do not have.
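
For reference, the latency columns in the table come from a serial measurement loop like the minimal sketch below; retrieve is a placeholder for whichever client call targets the system under test, and QPS was measured separately with a concurrent driver, since a serial loop understates throughput:

import time

def latency_percentiles(retrieve, queries, warmup=500):
    # Warm caches first; cold-start behaviour is a separate test.
    for q in queries[:warmup]:
        retrieve(q)
    latencies = []
    for q in queries[warmup:]:
        start = time.perf_counter()
        retrieve(q)
        latencies.append((time.perf_counter() - start) * 1000)  # ms
    latencies.sort()
    pick = lambda p: latencies[min(int(len(latencies) * p), len(latencies) - 1)]
    return {"p50": pick(0.50), "p95": pick(0.95), "p99": pick(0.99)}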

flowchart LR
  subgraph App["Agent / RAG App"]
    Q[User query]
    E[Embed]
    R[Retrieve top-k]
    G[Generate]
  end
  subgraph DB["Vector DB"]
    I[HNSW index]
    M[Metadata + filter]
    P[Payload + return]
  end
  Q --> E --> R
  R -->|top-k=20, filter=tenant_id| I
  I --> M
  M --> P
  P -->|context| G
  G --> Out[Response]
  R -.cost.- I
  I -.cost.- M
  M -.cost.- P

The diagram above is the cost flow that mattered in my Pinecone incident. Every query fans out into the index, the metadata, and the payload return. Each of those touches a meter on the pricing page. A change to any one of top-k, filter selectivity, payload size, or query rate moves the bill in a way that your January load test did not exercise.

Hidden Cost #1: The Re-Embedding Storm

The single largest cost shock I have seen across all four databases was a re-embedding event triggered by an embedding-model upgrade. In late 2025, OpenAI's text-embedding-3-large model was deprecated with a 90-day retirement notice and replaced by a successor with a different vector shape. Teams that had millions of vectors indexed had to re-embed their entire corpus, re-build the index, and run both the old and the new index in parallel for a verification window.

For a 100M-vector corpus, the re-embedding API spend was on the order of $30,000 at OpenAI's published rate. The vector-database-side cost was a separate hit. Pinecone billed write units against the re-upsert. Weaviate Cloud's index rebuild was a multi-hour cluster task. Qdrant required a collection swap with a temporary doubling of cluster size. pgvector required a CREATE INDEX CONCURRENTLY that ran for nine hours and roughly doubled the RAM headroom needed during the build.

If you do not budget for re-embedding events on a 12-18 month cycle, your annual cost-of-ownership for any vector database is materially understated. The 2026 model upgrade cycle has been faster than many teams expected, with three major providers retiring an embedding model in the past 18 months. Treat re-embedding cost as a line item, not a surprise.
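
A minimal budgeting sketch for that line item; the chunk count, token count, and per-token rate are assumptions to replace with your own corpus stats and your provider's current price sheet:

def reembedding_api_cost(n_chunks: int, avg_tokens_per_chunk: float,
                         usd_per_million_tokens: float) -> float:
    """Embedding-API spend for a full corpus re-embed. Excludes the
    vector-database side: write units, index rebuild compute, and the
    window where old and new indexes run in parallel."""
    return n_chunks * avg_tokens_per_chunk * usd_per_million_tokens / 1e6

# 100M chunks at ~2,300 tokens each and $0.13 per million tokens lands
# near the $30,000 figure quoted above.
print(f"${reembedding_api_cost(100_000_000, 2_300, 0.13):,.0f}")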

Hidden Cost #2: The Selective-Filter Pothole

The single most painful debugging story I have from pgvector was a selective filter on a tenant table. Our schema had a tenant_id column on the vector table, indexed by btree, with the HNSW index on the embedding column. For a query like:

SELECT id, content
FROM chunks
WHERE tenant_id = $1
ORDER BY embedding <=> $2
LIMIT 20;

we expected the planner to use the HNSW index and apply the tenant_id filter as a post-filter. For tenants with thousands of chunks, this worked fine. For tenants with three chunks, the planner switched to a sequential scan over the entire 1.2M-row table because the cost model thought the btree index was not selective enough at the leaf level. The query dropped from 14ms to 4.2 seconds. We caught it because Postgres auto_explain logged the plan flip on a customer demo.

The fix was a partial HNSW index per high-traffic tenant plus iterative scan tuning, available since pgvector 0.8. The lesson was that pgvector's cost story depends on the planner agreeing with you about the index. Pinecone, Weaviate, and Qdrant have their own version of this gotcha. Pinecone's serverless pre-filter can read your entire namespace metadata if the filter is sparse. Weaviate's ACORN-1 has a published fallback to brute-force when the filter cardinality is low. Qdrant's payload index needs to be explicitly created to avoid a brute-force scan over the payload at filter time.
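
Here is a minimal sketch of that fix, assuming psycopg 3 and pgvector 0.8; the tenant literal, index name, and scan limits are placeholders to adapt to your own hot tenants:

import psycopg

# Autocommit connection: CREATE INDEX CONCURRENTLY cannot run inside a
# transaction block.
conn = psycopg.connect("dbname=app", autocommit=True)

# Partial HNSW index for one high-traffic tenant, so the planner can
# serve that tenant from a small graph instead of post-filtering the
# full 1.2M-row index.
conn.execute("""
    CREATE INDEX CONCURRENTLY IF NOT EXISTS chunks_acme_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops)
    WHERE tenant_id = 'acme';
""")

# pgvector 0.8 iterative scans: keep walking the graph until the filter
# has produced enough rows, instead of returning a short result set.
conn.execute("SET hnsw.iterative_scan = relaxed_order;")
conn.execute("SET hnsw.max_scan_tuples = 20000;")

query_embedding = [0.1] * 1024  # placeholder; your real query vector
vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
rows = conn.execute(
    "SELECT id, content FROM chunks WHERE tenant_id = %s "
    "ORDER BY embedding <=> %s::vector LIMIT 20;",
    ("acme", vector_literal),
).fetchall()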

In every case, the published p95 latency on the vendor's website assumes a typical filter workload. Your atypical filter is where the cost surprise lives. Always run your benchmark on your real filter distribution.

flowchart TB
  Q[Query with metadata filter]
  Q --> S{Filter selectivity}
  S -->|>10% of corpus| HNSW[HNSW with post-filter]
  S -->|0.1-10%| HYB[Hybrid: pre-filter then HNSW]
  S -->|<0.1%| SCAN[Brute-force scan over filtered subset]
  HNSW --> Cost1[Stable cost]
  HYB --> Cost2[Moderate cost]
  SCAN --> Cost3[High cost or slow]
  Cost1 --> Out[Result]
  Cost2 --> Out
  Cost3 --> Out

Hidden Cost #3: Backups, DR, and Compliance

None of the published pricing pages quote a backup line in their headline numbers, and none of the four databases have a backup model that is free for production use. Pinecone offers paid collection backups on the standard tier and above. Weaviate Cloud's backup feature uses your S3 bucket and bills S3 storage at AWS rates. Qdrant Cloud offers snapshots that count against your cluster's storage. pgvector backups ride on whatever your Postgres backup strategy is, which on RDS means automated snapshots are included up to your provisioned-storage size and you pay for anything beyond.

For EU AI Act record-keeping compliance under Article 12, in force from August 2026 for high-risk systems, a 90-day retention requirement on the vectors and the queries that produced retrieved-context decisions adds a non-trivial storage line. Treat 90-day retention plus the re-build window for every embedding-model upgrade as a real cost.

The Decision Matrix

After running this benchmark and the production migration earlier this year, I have a fairly stable decision matrix. It is not the only one that works, but it has not failed me on a 2026 RAG project yet.

Workload                               | First choice                              | Second choice                | Avoid
<1M vectors, you already run Postgres  | pgvector                                  | Qdrant Cloud                 | Pinecone
1M-10M, multi-tenant SaaS RAG          | Weaviate Cloud                            | Qdrant Cloud                 | pgvector at the high end
10M-100M, predictable read pattern     | Qdrant Cloud (binary quant)               | Weaviate Cloud Enterprise    | Pinecone unless top-k is tiny
10M-100M, unpredictable burst traffic  | Pinecone serverless                       | Qdrant Cloud with autoscale  | Self-hosted anything
Compliance-heavy, EU residency         | Weaviate Cloud (EU) or self-hosted Qdrant | pgvector on EU RDS           | Pinecone unless their EU region fits
Sub-1M, prototype                      | pgvector or Qdrant Cloud Sandbox          | Weaviate Sandbox             | Pinecone (overkill at this scale)

[Comparison visual: pgvector, Pinecone, Weaviate, and Qdrant scored green/yellow/red on cost, latency, multi-tenancy, ops overhead, and EU residency]

flowchart TD
  Start[New RAG project]
  Start --> Q1{Already run Postgres?}
  Q1 -->|Yes, <10M vectors| Pgvec[pgvector 0.8]
  Q1 -->|No, or >10M| Q2{Multi-tenant SaaS?}
  Q2 -->|Yes| Q3{EU residency required?}
  Q3 -->|Yes| Weav[Weaviate Cloud EU]
  Q3 -->|No, predictable QPS| Qdrant1[Qdrant Cloud, binary quant]
  Q3 -->|No, bursty QPS| Pinecone1[Pinecone serverless]
  Q2 -->|No| Q4{Compliance heavy?}
  Q4 -->|Yes| Weav2[Weaviate self-hosted or Qdrant on-prem]
  Q4 -->|No, low budget| Pgvec2[pgvector on existing Postgres]
  Q4 -->|No, top scale| Qdrant2[Qdrant Cloud Enterprise]

Production Considerations

Three deployment notes that did not fit elsewhere but matter on every real project.

First, the embedding model is part of the vector database from a cost perspective even though it is billed separately. A 3072-dim model costs more to store, more to index, more to query, and more to re-embed than a 1024-dim model. The 2025-2026 cycle has favoured 1024-dim embeddings, often PCA-reduced from 3072-dim model output, because the recall trade-off is small and the cost-of-ownership difference is large. Run a recall@k test on your domain before you commit to a dimensionality, along the lines of the sketch below.
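
A minimal recall@k check for that decision; ground truth should be exact nearest neighbours computed offline at full dimensionality, and both arguments map query ids to ranked lists of chunk ids:

def recall_at_k(retrieved: dict, ground_truth: dict, k: int = 10) -> float:
    """Fraction of true top-k neighbours the reduced-dim index returned.

    ground_truth: exact neighbours at full dimensionality (brute force).
    retrieved: the PCA-reduced or quantized index under test.
    """
    hits = sum(
        len(set(retrieved[q][:k]) & set(ground_truth[q][:k]))
        for q in ground_truth
    )
    return hits / (k * len(ground_truth))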

Second, hybrid search (BM25 + vector) is an option in Weaviate, Qdrant, and now Postgres via the pgvecto.rs and pg_search extensions alongside pgvector, but not in Pinecone serverless directly without an external sparse index. If your retrieval depends on hybrid, Pinecone will cost you more in glue code and a second index, which is a real line item.

Third, observability for vector queries should ride on the OpenTelemetry GenAI conventions covered in an earlier post on this blog. Treat retrieval as an instrumented step in the trace, attach db.system, top-k, filter cardinality, and result count, and you will see the cost-shock early, as the sketch below shows.
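
A minimal sketch with the OpenTelemetry Python API; store.search is a placeholder for your retrieval client, and the retrieval.* attribute names are illustrative rather than spec-mandated, so check the current GenAI semantic conventions before standardizing on them:

from opentelemetry import trace

tracer = trace.get_tracer("rag.retrieval")

def traced_retrieve(store, query_vector, tenant_id, top_k=20):
    # One span per retrieval step puts the cost drivers (top-k, filter,
    # result count) in the same trace as token usage and latency.
    with tracer.start_as_current_span("vector.retrieve") as span:
        span.set_attribute("db.system", "qdrant")  # set per backend
        span.set_attribute("retrieval.top_k", top_k)
        span.set_attribute("retrieval.filter.tenant_id", tenant_id)
        results = store.search(query_vector, tenant_id=tenant_id, top_k=top_k)
        span.set_attribute("retrieval.result_count", len(results))
        return results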

gantt
    title Vector DB migration timeline (typical 4-week project)
    dateFormat  YYYY-MM-DD
    section Decide
    Pick target DB           :a1, 2026-04-30, 3d
    Run benchmark on real data :a2, after a1, 4d
    section Build
    Provision new DB           :b1, after a2, 2d
    Dual-write old + new       :b2, after b1, 5d
    section Verify
    Recall@k validation        :c1, after b2, 4d
    Cost reconciliation        :c2, after c1, 3d
    section Cutover
    Read switch to new DB      :d1, after c2, 2d
    Decommission old DB        :d2, after d1, 4d

Conclusion

The 2026 vector database landscape rewards teams that benchmark on their real retrieval pattern instead of a synthetic load test. pgvector wins at low scale when you already run Postgres. Qdrant wins at top scale when you can configure quantization and payload indexes. Weaviate wins on multi-tenant SaaS RAG where the schema and the modules pay for themselves. Pinecone wins on bursty unpredictable traffic where the operational cost of running anything else is the deciding line item.

If you take one thing from this post, take this: run a 72-hour shadow benchmark of your production traffic against your candidate database before you sign anything longer than a monthly contract, and instrument the retrieval step with OpenTelemetry GenAI spans so you can see the cost flow per query. The $11,400-vs-$312 surprise that woke me up at 3am was avoidable if I had measured retrieval, not storage. Yours will be too.

Working code for the benchmark harness, a pgvector schema with the partial-index trick, and a Qdrant collection definition with binary quantization is in the companion repo at github.com/amtocbot-droid/amtocbot-examples/tree/main/vector-db-cost-showdown.

Sources

  1. pgvector 0.8 release notes and HNSW tuning guide: github.com/pgvector/pgvector
  2. Pinecone Serverless pricing: pinecone.io/pricing
  3. Weaviate Cloud pricing and ACORN filter strategy: weaviate.io/pricing
  4. Qdrant Cloud pricing and quantization guide: qdrant.tech/pricing and qdrant.tech/documentation/guides/quantization
  5. OpenTelemetry GenAI semantic conventions: opentelemetry.io/docs/specs/semconv/gen-ai
  6. EU AI Act Article 12 (record-keeping requirements): artificialintelligenceact.eu/article/12

About the Author

Toc Am

Founder of AmtocSoft. Writing practical deep-dives on AI engineering, cloud architecture, and developer tooling. Previously built backend systems at scale. Reviews every post published under this byline.

Published: 2026-04-30 · Written with AI assistance, reviewed by Toc Am.
