Retrievers

Interface

- Retriever.search(query, filters: CanonicalFilters, k) -> List[Chunk]
- Chunk must include: text, provenance.source_id, provenance.source_type, optional provenance.url/title/domain, and metadata: entity_ids, year, optional metadata.doc_type.
- Filters come from CanonicalFilters.from_state_spec(state) (entity_ids, year, source types, doc_types, domains).
- Optional async: implement AsyncRetriever.asearch to avoid thread pools in the async runner.
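To illustrate the sync/async split described above, here is a minimal sketch of how a caller might prefer asearch when a retriever provides it and fall back to a worker thread otherwise. StubChunk, SyncOnlyRetriever, NativeAsyncRetriever, and run_search are hypothetical names made up for this example; they are not contextguard classes, and the real async runner may dispatch differently.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class StubChunk:
    """Hypothetical stand-in for contextguard's Chunk (not the real class)."""
    text: str
    source_id: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class SyncOnlyRetriever:
    """Implements only the blocking search() method."""

    def search(self, query: str, filters: Optional[Any] = None, k: int = 10) -> List[StubChunk]:
        return [StubChunk(text=f"sync:{query}", source_id="doc1")][:k]


class NativeAsyncRetriever(SyncOnlyRetriever):
    """Adds asearch(), so an async caller never needs a worker thread."""

    async def asearch(self, query: str, filters: Optional[Any] = None, k: int = 10) -> List[StubChunk]:
        await asyncio.sleep(0)  # placeholder for a real non-blocking backend call
        return [StubChunk(text=f"async:{query}", source_id="doc1")][:k]


async def run_search(retriever: Any, query: str, k: int = 10) -> List[StubChunk]:
    """Prefer asearch() when available; otherwise run search() in a thread."""
    asearch = getattr(retriever, "asearch", None)
    if asearch is not None:
        return await asearch(query, filters=None, k=k)
    # Fallback: the blocking call is moved off the event loop (Python 3.9+).
    return await asyncio.to_thread(retriever.search, query, None, k)


sync_result = asyncio.run(run_search(SyncOnlyRetriever(), "revenue"))
async_result = asyncio.run(run_search(NativeAsyncRetriever(), "revenue"))
```

The fallback branch is the thread-pool path that a native asearch implementation lets the runner skip.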

Provided adapters

- LangChain: LangChainRetrieverAdapter (override doc_to_chunk or subclass).
- LlamaIndex: LlamaIndexRetrieverAdapter (override node_to_chunk or subclass).
- Chroma: ChromaRetrieverAdapter (embed_fn + metadata filters).
- Qdrant: QdrantRetrieverAdapter (embed_fn + filter mapping).
- MockRetriever: for tests/demos.
- Resilience wrappers: RetryingRetriever (retry/backoff), CircuitBreakerRetriever (trip/half-open/close, optional concurrency guard).
- Dedup/rerank helpers: dedup_chunks, rerank_by_score (see retrieve.rerank).
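The wrapper and helper ideas in the list above can be sketched generically. The following is an illustration of the retry-with-exponential-backoff pattern and of score-based reranking followed by deduplication. It does not reproduce the actual constructors or signatures of RetryingRetriever, dedup_chunks, or rerank_by_score; every class and function name below (StubChunk, SimpleRetryingRetriever, dedup_by_text, rerank_desc, FlakyRetriever) is a hypothetical stand-in.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class StubChunk:
    """Hypothetical stand-in for Chunk with only the fields used here."""
    text: str
    score: float


class SimpleRetryingRetriever:
    """Wraps any object with search() and retries it with exponential backoff."""

    def __init__(self, inner: Any, max_attempts: int = 3, base_delay: float = 0.1,
                 sleep: Callable[[float], None] = time.sleep):
        self.inner = inner
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.sleep = sleep  # injectable so tests do not actually wait

    def search(self, query: str, filters: Optional[Any] = None, k: int = 10) -> List[StubChunk]:
        for attempt in range(self.max_attempts):
            try:
                return self.inner.search(query, filters=filters, k=k)
            except Exception:
                if attempt == self.max_attempts - 1:
                    raise  # out of attempts: surface the last error
                self.sleep(self.base_delay * (2 ** attempt))
        return []  # only reached when max_attempts <= 0


def rerank_desc(chunks: List[StubChunk]) -> List[StubChunk]:
    """Order chunks from highest to lowest score."""
    return sorted(chunks, key=lambda c: c.score, reverse=True)


def dedup_by_text(chunks: List[StubChunk]) -> List[StubChunk]:
    """Keep the first chunk for each normalized text, preserving order."""
    seen, unique = set(), []
    for chunk in chunks:
        key = chunk.text.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique


class FlakyRetriever:
    """Fails twice, then succeeds; simulates a transient backend outage."""

    def __init__(self) -> None:
        self.calls = 0

    def search(self, query: str, filters: Optional[Any] = None, k: int = 10) -> List[StubChunk]:
        self.calls += 1
        if self.calls < 3:
            raise ConnectionError("transient backend failure")
        return [StubChunk("Alpha", 0.2), StubChunk("alpha ", 0.9), StubChunk("Beta", 0.5)][:k]


delays: List[float] = []
flaky = FlakyRetriever()
retriever = SimpleRetryingRetriever(flaky, max_attempts=3, base_delay=0.1, sleep=delays.append)
chunks = dedup_by_text(rerank_desc(retriever.search("q")))
```

Reranking before deduplicating means the surviving copy of a duplicated text is the highest-scoring one.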

Implement your own

from contextguard import Retriever
from contextguard.core.specs import Chunk, Provenance, SourceType

class MyRetriever:
    def search(self, query, filters=None, k=10):
        # Call your backend (passing `filters` along where it supports them),
        # then map each result to a Chunk.
        return [
            Chunk(
                text="...",
                score=0.9,
                provenance=Provenance(source_id="doc1", source_type=SourceType.PRIMARY),
                entity_ids=["acme"],
                year=2024,
                metadata={"doc_type": "annual_report"},
            )
        ]

Tips

- Populate entity_ids and year so that gating works correctly.
- Set metadata.doc_type to benefit from diversity and coverage by doc type.
- Use filters to push down constraints to your backend when possible; gating will still hard-check.
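As a rough illustration of the last tip, the sketch below pushes entity and year constraints into a toy in-memory backend, then re-applies the same check afterwards, the way a hard gate would. StubFilters, StubChunk, matches, and InMemoryRetriever are hypothetical names for this example; the field layout of the real CanonicalFilters and the real gating logic may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StubFilters:
    """Hypothetical stand-in for CanonicalFilters (entity_ids and year only)."""
    entity_ids: List[str] = field(default_factory=list)
    year: Optional[int] = None


@dataclass
class StubChunk:
    """Hypothetical stand-in for Chunk with the gating-relevant fields."""
    text: str
    entity_ids: List[str]
    year: Optional[int]


def matches(chunk: StubChunk, filters: StubFilters) -> bool:
    """True when the chunk satisfies every constraint that is set."""
    if filters.entity_ids and not set(chunk.entity_ids) & set(filters.entity_ids):
        return False
    if filters.year is not None and chunk.year != filters.year:
        return False
    return True


class InMemoryRetriever:
    """Toy backend: substring match over a list, with filter push-down."""

    def __init__(self, corpus: List[StubChunk]) -> None:
        self.corpus = corpus

    def search(self, query: str, filters: Optional[StubFilters] = None, k: int = 10) -> List[StubChunk]:
        candidates = self.corpus
        if filters is not None:
            # Push-down: narrow the candidate set before matching the query.
            candidates = [c for c in candidates if matches(c, filters)]
        hits = [c for c in candidates if query.lower() in c.text.lower()]
        return hits[:k]


corpus = [
    StubChunk("Acme revenue rose in 2024.", ["acme"], 2024),
    StubChunk("Acme revenue fell in 2023.", ["acme"], 2023),
    StubChunk("Globex revenue rose in 2024.", ["globex"], 2024),
]
filters = StubFilters(entity_ids=["acme"], year=2024)
hits = InMemoryRetriever(corpus).search("revenue", filters=filters, k=10)

# A downstream gate can re-run the same check; with push-down it drops nothing.
gated = [c for c in hits if matches(c, filters)]
```

Pushing filters down keeps the k results from being spent on chunks that the gate would reject anyway.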