The Error
You're running an aggregation pipeline with $lookup on a sharded MongoDB cluster and it fails with:
MongoServerError: $lookup from a sharded collection is not allowed
The culprit is the foreign collection โ the one named in from โ being sharded. MongoDB has strict rules about this, and older versions simply refuse to run the join.
Root Cause
MongoDB's $lookup needs to resolve the foreign collection locally on a single node. When that collection is sharded, its data is spread across multiple shards. Without a targeted routing strategy, MongoDB would have to query every shard to complete the join โ a scatter-gather that hits all N shards in your cluster.
Rather than silently executing that expensive operation, MongoDB blocks it entirely.
The specific restriction depends on version:
- MongoDB < 5.1:
$lookupto a sharded foreign collection is not allowed at all. - MongoDB 5.1+: Allowed, but only under specific conditions โ co-located data on the same shard, or cross-shard with performance trade-offs.
Common triggers:
- Running the aggregation via mongos with a sharded
fromcollection in MongoDB < 5.1 - The source collection is unsharded but the
fromcollection is sharded - Using
$lookupinside a$facetwhere either side is sharded
Fix: Multiple Approaches
Option 1 โ Upgrade to MongoDB 5.1 or Later (Recommended Long-Term)
MongoDB 5.1 added native support for $lookup on sharded collections. On 4.x or early 5.x, upgrading is the cleanest path โ no query rewrites, no schema changes.
Check your current version first:
mongosh --eval "db.version()"
Stuck on an older version for now? The options below work without upgrading.
Option 2 โ Connect Directly to a Shard (Bypass mongos)
Skip mongos entirely and connect straight to a shard's primary. The aggregation runs locally on that shard, so the $lookup restriction doesn't apply.
// Connect directly to the shard's primary
mongosh "mongodb://shard1-primary:27017/mydb"
db.orders.aggregate([
{
$lookup: {
from: "products",
localField: "product_id",
foreignField: "_id",
as: "product_info"
}
}
])
Caveat: This only works if the from collection's data actually lives on that shard. If you have 3 shards and products are spread across all of them, you'll get partial results โ silently. Verify your shard distribution before using this in production.
Option 3 โ Unshard the Foreign Collection
Not every collection in a sharded database needs to be sharded. If your from collection is a small reference table โ product catalog, user roles, config data โ keep it unsharded. The restriction only applies to sharded foreign collections.
// Check if a collection is sharded
use config
db.collections.find({ _id: "mydb.products" })
To remove sharding, your options depend on version. Before 7.0, dump and restore without a shard key. Starting with MongoDB 7.0, there's a dedicated command:
// MongoDB 7.0+ only
db.adminCommand({ unshardCollection: "mydb.products" })
Option 4 โ Use the Pipeline Form of $lookup (MongoDB 5.1+)
The pipeline syntax gives MongoDB's query planner more information to work with. On 5.1+, it can use this to optimize the join against a sharded foreign collection:
db.orders.aggregate(
[
{
$lookup: {
from: "products",
let: { pid: "$product_id" },
pipeline: [
{ $match: { $expr: { $eq: ["$_id", "$$pid"] } } }
],
as: "product_info"
}
}
],
{ allowDiskUse: true }
)
Making the join condition explicit lets MongoDB route to the right shard rather than broadcasting the query across all of them.
Option 5 โ Denormalize and Embed at Write Time
For performance-critical pipelines, skip the join entirely. Embed the fields you need directly into the source document at write time โ reads then require no cross-collection lookups at all.
// Embed product data at insert time instead of joining at read time
db.orders.insertOne({
_id: ObjectId(),
product_id: "abc123",
product_snapshot: {
name: "Widget Pro",
price: 29.99,
sku: "WP-001"
}
})
Yes, you're duplicating some data. But on a sharded cluster with millions of orders, avoiding scatter-gather joins on every read is often worth the trade-off.
Verify the Fix
Run the aggregation with a $limit: 1 to test quickly:
db.orders.aggregate([
{
$lookup: {
from: "products",
localField: "product_id",
foreignField: "_id",
as: "product_info"
}
},
{ $limit: 1 }
])
No MongoServerError? Good. Even an empty result ([]) means the pipeline ran successfully.
Next, check the explain plan. You want "stage": "EQ_LOOKUP" โ not SHARD_MERGE on the foreign side, which signals an expensive cross-shard broadcast:
db.orders.explain("executionStats").aggregate([
{
$lookup: {
from: "products",
localField: "product_id",
foreignField: "_id",
as: "product_info"
}
}
])
Prevention
- Decide sharding strategy before you shard. Collections used as
fromtargets in$lookupare usually better left unsharded โ unless they're genuinely large (tens of millions of documents). - Mirror production topology in staging. A single-node dev environment won't catch sharding restrictions. Test aggregations against a cluster that matches production's shard layout.
- Target MongoDB 5.1+ for new sharded deployments. Cross-shard
$lookupsupport alone makes the upgrade worthwhile. - Think of reference/lookup tables like dimension tables in a data warehouse โ small, stable, and better off centralized. Don't shard them reflexively just because the rest of the database is sharded.
Tips
When debugging aggregation pipelines across dev and prod, accidental edits or copy-paste errors in pipeline JSON can cause behavior differences that are hard to trace. I use ToolCraft's Hash Generator to compare pipeline file hashes between environments โ a quick way to rule out "did this file actually change?" before going deeper.

