Building Tannin, an AI wine sommelier app, taught me that the best infrastructure is the one you're already running.
## The Problem: Three Services to Do One Thing
Tannin is a wine discovery app. You scan a label, get a detailed wine card, and chat with an AI sommelier that actually knows what it's talking about. Behind the scenes, the sommelier's knowledge comes from a RAG (Retrieval-Augmented Generation) pipeline: I crawl wine websites, chunk the content into passages, embed them as vectors, and retrieve the most relevant ones at chat time.
My original pipeline looked like this:
[Firecrawl](https://www.firecrawl.dev/) (crawl) -> MongoDB Atlas (store) -> OpenAI (embed) -> MongoDB Atlas (search)
Every chunk left MongoDB, traveled to OpenAI's API, got embedded, then came back to MongoDB for storage and vector search. The pipeline worked. It was just wider than it needed to be. I was already using Atlas as the operational database and the vector store, so embeddings were the one part of the flow still happening somewhere else.
When I started expanding the sommelier with food-pairing knowledge, I decided I wanted to narrow the surface area instead of widening it. That is when I looked more closely at what Atlas already offered.
## The Discovery: Atlas Has an Embedding API
MongoDB Atlas now provides direct access to Voyage AI's embedding models through a single endpoint:
```
POST https://ai.mongodb.com/v1/embeddings
Authorization: Bearer <your-model-api-key>
```
The API is OpenAI-compatible. Same request format, same response shape. The model I chose, voyage-4-large (served via Atlas's AI endpoint), produces 1024-dimension vectors and has strong retrieval performance in published Voyage benchmarks.
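Concretely, a request and an abridged response follow the OpenAI embeddings format. This example is illustrative rather than captured from a real call, so treat the exact field values as placeholders:

```
// POST https://ai.mongodb.com/v1/embeddings
{ "model": "voyage-4-large", "input": ["Riesling pairs well with Thai curry"] }

// Response (abridged)
{
  "object": "list",
  "data": [{ "object": "embedding", "index": 0, "embedding": [0.0123, -0.0456, ...] }],
  "model": "voyage-4-large",
  "usage": { "total_tokens": 9 }
}
```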
The switch in my Node.js code was three lines:

```javascript
// Before
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const model = 'text-embedding-3-small';

// After
const client = new OpenAI({
  apiKey: process.env.VOYAGE_API_KEY,
  baseURL: 'https://ai.mongodb.com/v1',
});
const model = 'voyage-4-large';
```

That's it. Same OpenAI SDK, different endpoint. Every existing call to `embeddings.create()` worked without modification.
## The Migration: 40,000 Chunks, One Command
At the time of this migration, before later pruning and reshaping the corpus, I had 40,068 knowledge chunks already embedded with OpenAI at 1536 dimensions. Voyage 4 Large produces 1024-dimension vectors. You can't mix dimensions in a single vector search index, so I had to re-embed everything.
This sounds scary. It wasn't. My `generateEmbeddings.js` script already supported `--force` (re-embed all chunks) and batching. I changed the provider config and ran:

```shell
node scripts/generateEmbeddings.js --force
```
40,068 chunks. 401 batches. 7.4M tokens. $0.15. Zero errors. Done.
Then I updated the Atlas Vector Search index from 1536 to 1024 dimensions and the pipeline was fully migrated.
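The script's internals aren't shown in this post, but the shape of a batched re-embed loop is straightforward. Here is a minimal sketch, with all helper and field names being my assumptions (`client` is the OpenAI SDK pointed at Atlas's AI endpoint, and `collection` comes from the standard `mongodb` driver):

```javascript
// ~40,068 chunks across 401 batches works out to roughly 100 chunks per batch.
const BATCH_SIZE = 100;

// Pure helper: split an array into fixed-size batches.
function toBatches(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical re-embed loop: embed each batch, then write the new
// 1024-dim vectors back onto their chunk documents.
async function reembedAll(collection, client) {
  const chunks = await collection.find({}, { projection: { text: 1 } }).toArray();
  for (const batch of toBatches(chunks, BATCH_SIZE)) {
    const { data } = await client.embeddings.create({
      model: 'voyage-4-large',
      input: batch.map((c) => c.text),
    });
    await collection.bulkWrite(
      batch.map((c, i) => ({
        updateOne: {
          filter: { _id: c._id },
          update: { $set: { embedding: data[i].embedding } },
        },
      }))
    );
  }
}
```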
## Why This Matters
### A cleaner AI stack on the same platform
Before: `OPENAI_API_KEY` for embeddings + `MONGODB_URI` for storage + search. After: `VOYAGE_API_KEY` for embeddings + `MONGODB_URI` for storage + search.
Same number of keys, but embeddings now run through Atlas's AI endpoint, so the workflow is more coherent. One platform for storage, vector search, and embedding calls.
### Simpler failure modes
When your embedding provider and your vector database sit this close together, there is one less network hop and one less moving piece to think about. The RAG pipeline now has two real dependencies: Firecrawl for crawling, and Atlas for the rest of the retrieval path.
### Strong embeddings, smaller vectors
Voyage 4 Large at 1024 dimensions uses 33% less storage per vector than 1536-dimension embeddings. For the 40,068 chunks I was migrating at the time, that was a meaningful reduction in index size while keeping retrieval quality high.
### The food-pairing pipeline that made the choice obvious
The migration happened because I was adding food-pairing knowledge to the sommelier. I pulled together pairing material from public sources and reference sites using Firecrawl, chunked the content, and needed embeddings. With Atlas handling embedding calls through its AI endpoint, the entire pipeline is:
[Firecrawl](https://www.firecrawl.dev/) (crawl) -> MongoDB Atlas (store + embed + search)
1,003 food-pairing chunks from 131 pages. The sommelier can now tell you why Riesling works with Thai curry (sweet counters spice) or why Chablis cuts through lobster bisque (acid meets cream). All grounded in real editorial content, all searchable via vector similarity.
## The Technical Details
For anyone considering this migration, here's what I learned:
**Dimension mismatch requires a full re-embed.** You cannot mix 1536-dim and 1024-dim vectors in one Atlas Vector Search index. That is not a MongoDB problem so much as a normal consequence of changing embedding models. Plan for a one-time re-embed of the whole corpus. In my case, the 40,068 historical chunks in the migration batch cost $0.15 and finished cleanly.

**The OpenAI SDK works as-is.** Set `baseURL` to `https://ai.mongodb.com/v1` and pass your Voyage API key. No new dependencies.

**Don't pass the `dimensions` parameter.** Voyage determines output dimensions from the model name. Passing `dimensions` in the request body returns an error. My code conditionally skips it:

```javascript
const opts = { model: 'voyage-4-large', input: texts };
if (!useVoyage) opts.dimensions = 1536; // Only for OpenAI
const response = await openai.embeddings.create(opts);
```

**Check which models your key has access to.** My Atlas Model API key only had access to voyage-4-large, not voyage-3. The error message is clear ("Model voyage-3-lite is not available for caller"), but it's worth verifying upfront.

**Update your vector search index.** After re-embedding, update `numDimensions` in your Atlas Vector Search index definition to match the new model's output (1024 for Voyage 4 Large).
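For reference, here is roughly what the relevant part of the index definition looks like. The `path` values reflect my schema assumptions (`embedding` for the vector, `category` as a filter field), so adjust them to your own documents:

```json
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1024,
      "similarity": "cosine"
    },
    { "type": "filter", "path": "category" }
  ]
}
```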
## Post-Migration Discovery: Keep One Embedding Standard
A month after the initial migration, I found a gap. I had re-embedded all 40,068 chunks in a single `--force` run, but the batch had only processed the winery and food-pairing chunks that existed at that time. Educational content (grape, region, concept articles imported from Wikipedia) was added later using the old OpenAI pipeline and landed in the database with 1536-dimension vectors.
Atlas Vector Search expects the index definition and the stored vectors to agree. In practice, that means mismatched vectors simply do not participate in retrieval. I discovered the gap because my evals showed that "What is malolactic fermentation?" was returning winery boilerplate instead of the actual educational article that should have surfaced.
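In hindsight, a quick audit query would have surfaced the drift long before the evals did. This is a sketch I'd use now, with the `embedding` field name assumed from my schema: it groups stored vectors by their array length, so a split corpus shows up as two groups instead of one.

```javascript
// Hypothetical audit pipeline: count stored vectors by dimension.
// A healthy corpus returns a single group with _id 1024; a split
// corpus (1536-dim vs 1024-dim) returns two.
const dimensionAuditPipeline = [
  { $match: { embedding: { $type: 'array' } } },
  { $group: { _id: { $size: '$embedding' }, count: { $sum: 1 } } },
  { $sort: { count: -1 } },
];

// Usage: await collection.aggregate(dimensionAuditPipeline).toArray();
```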
The fix was straightforward: re-embed the three educational categories with Voyage.
```shell
node scripts/generateEmbeddings.js --force --category grape    # 1,174 chunks
node scripts/generateEmbeddings.js --force --category concept  # 1,517 chunks
node scripts/generateEmbeddings.js --force --category region   # 1,848 chunks
```
4,539 chunks. 1.47M tokens. $0.03. Zero errors.
The lesson: once you pick an embedding standard, every import path needs to honor it. A `--force` re-embed fixes the initial corpus, but later imports can still drift if they use the old model. I added a dimension check to the import scripts so the pipeline now fails early instead of quietly splitting into two standards.
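The check itself is small. A sketch of the guard, with the function name and constant being my own naming rather than the actual script's:

```javascript
// Guard for every import path: fail fast if an embedding does not match
// the corpus standard, instead of silently writing a mismatched vector.
const EXPECTED_DIMENSIONS = 1024; // voyage-4-large output size

function assertEmbeddingDimensions(vector, source = 'import') {
  if (!Array.isArray(vector) || vector.length !== EXPECTED_DIMENSIONS) {
    throw new Error(
      `${source}: expected ${EXPECTED_DIMENSIONS}-dim embedding, got ` +
        (Array.isArray(vector) ? vector.length : typeof vector)
    );
  }
  return vector;
}
```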
## Category-Balanced Retrieval

Re-embedding fixed visibility. But I still had a relevance problem: 31,000 winery chunks versus 11,000 educational and reference chunks. A single `$vectorSearch` returns the top results globally, so winery content dominated by volume even when educational chunks were more relevant to the question.
The fix: run parallel category-scoped vector searches, one per content group, and merge results by score.
```javascript
const CATEGORY_GROUPS = [
  { categories: ['grape', 'region', 'concept'], limit: 3 },
  { categories: ['food-pairing'], limit: 3 },
  { categories: ['winery'], limit: 3 },
];
```
Each group gets its own `$vectorSearch` call with a `category: { $in: [...] }` filter (supported by the existing Atlas index). The query embedding is computed once and reused across all three calls via `Promise.all`. Results are deduplicated by `_id`, sorted by score, and trimmed to the top 6 for the system prompt.
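The merge step is ordinary array logic. A sketch of how the three result sets might be combined (function and field names are my assumptions; `score` is the value returned by `$vectorSearch`'s `vectorSearchScore` meta field):

```javascript
// Merge results from the parallel category-scoped searches:
// dedupe by _id, sort by score descending, keep the top N.
function mergeBalancedResults(resultSets, topN = 6) {
  const seen = new Map();
  for (const results of resultSets) {
    for (const doc of results) {
      const key = String(doc._id);
      // Keep the highest-scoring copy of any duplicate chunk.
      if (!seen.has(key) || seen.get(key).score < doc.score) {
        seen.set(key, doc);
      }
    }
  }
  return [...seen.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}
```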
The before/after numbers on my 8-query eval:
| Metric | Before (global top-6) | After (balanced) |
|---|---|---|
| Avg of avg scores | 0.708 | 0.743 |
| Avg of max scores | 0.748 | 0.790 |
| Expected category hit | 3/8 (37%) | 8/8 (100%) |
| Category distribution | winery(24) food-pairing(24) | food-pairing(22) grape(10) concept(8) winery(5) region(3) |
The malolactic fermentation query went from returning only winery chunks at 0.69 to returning the actual concept article at 0.84. The Burgundy terroir query found Burgundy wine region articles at 0.80. Pinot Noir returns the grape article at 0.79.
The nice part is that Atlas already gave me the primitives to do this cleanly: filtered vector search, parallel queries, and normal aggregation logic. I did not need to add a second retrieval system. I just needed to use the database more deliberately.
Critically, when a caller passes an explicit category filter (for example the food-pairing flow), balanced retrieval is bypassed and the original single-scope search runs unchanged.
## What I Was Watching Next
At the time I wrote this, I was exploring Atlas's automated embedding feature, which could push even more of this flow into the database path at write time. For this migration, though, the batch pipeline worked well for knowledge ingestion, and the Voyage endpoint handled query-time embedding cleanly.
The lesson is simple: before adding another service to your stack, check what your current platform already does well. For me, Atlas went from "the database" to "the database, the vector store, and the embedding endpoint" without forcing me to rework my application code.
---
Tannin is an iOS wine discovery app built on MongoDB Atlas, Node.js, and SwiftUI. I use Atlas for operational data, vector search, and now embeddings, all on a single M10 cluster.