## The Error

You’ve just finished indexing 5,000 documents into your vector store. You run your query script, expecting a smart response, but the application crashes immediately. Instead of results, you get a blunt error message:

```
chromadb.errors.NotFoundError: Collection 'my_docs' not found
```

This usually means your script is looking for a database that doesn't exist in its current context. Even if the files are on your hard drive, ChromaDB can't see them because the pathing or initialization logic is slightly off.

## Why This Happens

ChromaDB is lightweight and fast, but its default persistence behavior catches many developers off guard. Usually, the culprit is one of three things:

- Path Confusion: Your ingestion script saved data to `./db`, but your query script is looking in `./chroma_db` or a different subdirectory.
- Memory-Only Storage: You used a standard `Client()` instead of a `PersistentClient()`. This stores everything in RAM. Once the script stops, your data evaporates.
- Case Sensitivity: ChromaDB treats 'My_Docs' and 'my_docs' as completely different entities. A single capital letter will trigger a `NotFoundError`.

## Step-by-Step Fixes

### 1. Force Absolute Paths

Relative paths are dangerous in Python because they resolve against the current working directory, not the location of your script. If you run your script from `/home/user/project`, the path `./chroma_data` works. If you move into a `/src` folder and run it again, Python looks for `/home/user/project/src/chroma_data`, which is empty. Use the `os` module to lock down the exact location of your data.

```python
import os
import chromadb

# Define a fixed location for your data
current_dir = os.path.dirname(os.path.abspath(__file__))
db_path = os.path.join(current_dir, "vector_storage")

# Always use PersistentClient for RAG apps
client = chromadb.PersistentClient(path=db_path)

try:
    collection = client.get_collection(name="my_docs")
    print(f"Successfully connected. Items in collection: {collection.count()}")
except Exception as e:
    print(f"Could not find collection: {e}")
```
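
To make the working-directory problem concrete, here is a minimal sketch that resolves the same directory name from two different working directories. It uses only the standard library and does not touch ChromaDB. The helper names `resolve_relative` and `resolve_anchored`, the `query.py` script, and the temporary folder layout are hypothetical and exist only for illustration.

```python
import os
import tempfile
from pathlib import Path


def resolve_relative(dirname: str) -> Path:
    """Resolve dirname the way a bare "./dirname" would: against the CWD."""
    return (Path.cwd() / dirname).resolve()


def resolve_anchored(script_file: str, dirname: str) -> Path:
    """Resolve dirname next to the given script file, regardless of the CWD."""
    return Path(script_file).resolve().parent / dirname


def demo() -> dict:
    """Resolve the same name from two working directories and collect results."""
    original_cwd = Path.cwd()
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp).resolve() / "project"
        src = project / "src"
        src.mkdir(parents=True)
        script = project / "query.py"  # hypothetical script location
        script.touch()
        try:
            for label, cwd in (("from_project", project), ("from_src", src)):
                os.chdir(cwd)
                relative = resolve_relative("chroma_data")
                anchored = resolve_anchored(str(script), "chroma_data")
                # Report paths relative to the project root for readability
                results[label] = {
                    "relative": relative.relative_to(project).as_posix(),
                    "anchored": anchored.relative_to(project).as_posix(),
                }
        finally:
            os.chdir(original_cwd)  # restore before the temp dir is removed
    return results


if __name__ == "__main__":
    for label, paths in demo().items():
        print(label, paths)
```

The relative result changes with the directory you launch from, while the anchored result stays put. That is the property the `os.path.abspath(__file__)` approach relies on.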

### 2. Fix LangChain Persistence

If you are using LangChain, the integration can be opaque. If you don't explicitly define a `persist_directory`, LangChain might initialize a transient database that disappears after the process ends.

The Wrong Way:

```python
# This often defaults to an ephemeral in-memory store
vectorstore = Chroma(collection_name="my_docs", embedding_function=embeddings)
```

The Right Way:

```python
from langchain_chroma import Chroma

vectorstore = Chroma(
    collection_name="my_docs",
    embedding_function=embeddings,
    persist_directory="./chroma_db_storage",  # This must match your ingestion path exactly
)
```
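
One way to keep the ingestion and query scripts from drifting apart is to define the collection name and storage directory once and build the `Chroma(...)` arguments from that single definition. The sketch below is an assumption about how you might organize this, not part of LangChain: `chroma_kwargs`, `COLLECTION_NAME`, and `STORAGE_DIRNAME` are hypothetical names, and `base_dir` is whatever project root both scripts agree on.

```python
from pathlib import Path
from typing import Any, Dict

COLLECTION_NAME = "my_docs"            # name used throughout this article
STORAGE_DIRNAME = "chroma_db_storage"  # directory name used in the example above


def chroma_kwargs(base_dir: str, embeddings: Any) -> Dict[str, Any]:
    """Build the keyword arguments for Chroma(...) from one shared definition."""
    # Resolve to an absolute path so the result does not depend on the CWD later
    persist_directory = Path(base_dir).resolve() / STORAGE_DIRNAME
    return {
        "collection_name": COLLECTION_NAME,
        "embedding_function": embeddings,
        "persist_directory": str(persist_directory),
    }


# Usage in both scripts (not executed here; requires langchain_chroma):
#   from langchain_chroma import Chroma
#   vectorstore = Chroma(**chroma_kwargs(PROJECT_ROOT, embeddings))
# PROJECT_ROOT is a hypothetical constant pointing at the same folder in both scripts.
```

Because both scripts call the same function with the same root, a typo in the directory or collection name can only happen in one place.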

### 3. Debug with `list_collections()`

Before pulling your hair out over a collection name, ask the client what it actually sees. This simple script acts as a diagnostic tool to verify your database connection.

```python
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")

# Print every collection currently on disk
existing_collections = client.list_collections()
print(f"Found {len(existing_collections)} collections:")
for col in existing_collections:
    print(f" - {col.name}")

# Logical check
if "my_docs" not in [c.name for c in existing_collections]:
    print("CRITICAL: 'my_docs' is missing. Check your ingestion script logic.")
```
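
The same check can be extended to tell the three causes apart: no collections at all, a case mismatch, or a genuinely missing name. The sketch below is a hedged example, not a ChromaDB feature. `collection_names` and `diagnose` are hypothetical helpers that work with any object exposing `list_collections()`. It also assumes that, depending on the installed ChromaDB version, `list_collections()` may return either collection objects or plain name strings, so it accepts both; verify that against your own version.

```python
from typing import Any, List


def collection_names(client: Any) -> List[str]:
    """Return collection names whether list_collections() yields objects or strings."""
    names = []
    for entry in client.list_collections():
        # Check for str first: some versions may return plain names, not objects
        names.append(str(entry) if isinstance(entry, str) else entry.name)
    return names


def diagnose(client: Any, wanted: str) -> str:
    """Explain why `wanted` may be missing: absent entirely, or a case mismatch."""
    names = collection_names(client)
    if wanted in names:
        return f"OK: '{wanted}' exists."
    near = [n for n in names if n.lower() == wanted.lower()]
    if near:
        return f"Case mismatch: asked for '{wanted}', found '{near[0]}'."
    if not names:
        return "No collections at all: likely the wrong path or an in-memory client."
    return f"'{wanted}' not found. Available: {', '.join(sorted(names))}"


# Usage (not executed here; requires chromadb):
#   import chromadb
#   client = chromadb.PersistentClient(path="./chroma_db")
#   print(diagnose(client, "my_docs"))
```

An empty list usually points to path confusion or a memory-only client, while a near match points to case sensitivity.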

