AuditedLoader is a LangChain-compatible document loader that audits each file before it enters your chain. It masks PII in place and skips documents that fall below your minimum quality grade, so your vector store only receives content that has already been masked and quality-filtered.

Install

1. Install both packages

pip install flexorch-audit langchain-community

2. Load documents with AuditedLoader

from flexorch_audit.integrations.langchain import AuditedLoader

loader = AuditedLoader(
    file_paths=["contracts/agreement.pdf", "invoices/inv_001.pdf"],
    min_grade="B",                        # Skip documents graded C or D
    mask_pii=True,                        # Replace PII before loading
    locales=["universal", "tr", "de"],    # Restrict detection to these jurisdictions
)

docs = loader.load()

for doc in docs:
    print(doc.metadata["quality_grade"])       # "A"
    print(doc.metadata["pii_findings_count"])  # 2
    print(doc.page_content[:200])              # PII already masked

Documents that don’t meet min_grade are excluded from the returned list. Check loader.skipped after calling load() to see which files were dropped and why.
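The metadata keys printed in the loop can also drive a quick audit summary before indexing. The sketch below uses a small stand-in class instead of real loader output, so it runs without any packages installed. `FakeDoc` and `summarize` are hypothetical names; only the `quality_grade` and `pii_findings_count` keys come from the example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Stand-in for a loaded document (hypothetical, for illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize(docs):
    """Count documents per quality grade and total the PII findings."""
    grades = Counter(doc.metadata["quality_grade"] for doc in docs)
    total_pii = sum(doc.metadata["pii_findings_count"] for doc in docs)
    return dict(grades), total_pii


# Made-up sample data standing in for the list returned by loader.load().
docs = [
    FakeDoc("...", {"quality_grade": "A", "pii_findings_count": 2}),
    FakeDoc("...", {"quality_grade": "B", "pii_findings_count": 0}),
    FakeDoc("...", {"quality_grade": "A", "pii_findings_count": 1}),
]
grades, total_pii = summarize(docs)
print(grades, total_pii)
```

With real loader output, the list returned by loader.load() would take the place of the stand-in list, assuming every returned document carries both metadata keys.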

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| file_paths | list[str] | required | Paths to the documents you want to load |
| min_grade | str | "D" | Minimum quality grade to include ("A", "B", "C", or "D") |
| mask_pii | bool | True | Replace PII spans with [MASKED_...] placeholders before loading |
| locales | list[str] | all | Restrict PII detection to specific jurisdictions (e.g. "tr", "de", "us") |
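
To make the min_grade semantics concrete, here is a minimal sketch of the threshold check, assuming grades are ordered from "A" (best) to "D" (worst) as the parameter description implies. `meets_min_grade` and `GRADE_ORDER` are hypothetical names, not part of the library, and the library's actual comparison may be implemented differently.

```python
GRADE_ORDER = ["A", "B", "C", "D"]  # assumed ordering, best to worst


def meets_min_grade(grade: str, min_grade: str) -> bool:
    """Return True when `grade` is at least as good as `min_grade`."""
    return GRADE_ORDER.index(grade) <= GRADE_ORDER.index(min_grade)


# With min_grade="B", only A and B documents pass.
print([g for g in GRADE_ORDER if meets_min_grade(g, "B")])
# With the default min_grade="D", every grade passes.
print([g for g in GRADE_ORDER if meets_min_grade(g, "D")])
```

Under this reading, the default of "D" filters nothing, which matches a loader that includes every document unless you raise the threshold.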

In a RAG pipeline

Use AuditedLoader as a drop-in replacement for any LangChain document loader. The documents it returns are already masked and quality-filtered, so you can pass them straight to your embeddings and vector store.

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from flexorch_audit.integrations.langchain import AuditedLoader

loader = AuditedLoader(
    file_paths=["docs/"],
    min_grade="B",
    mask_pii=True,
)
docs = loader.load()

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(docs, embeddings)

retriever = vectorstore.as_retriever()

Documents with a quality grade below min_grade are silently skipped. After calling load(), inspect loader.skipped for a list of excluded files along with their grades.

As a pre-processing step

If you’re working with user-supplied text rather than files, use redact_for_llm() inline with a RunnableLambda to strip PII before it reaches your chain.

from langchain_core.runnables import RunnableLambda
from flexorch_audit import redact_for_llm

# redact_for_llm returns (clean_text, summary); keep only the text.
# your_chain is a placeholder for the Runnable you already have.
safe_chain = RunnableLambda(lambda text: redact_for_llm(text)[0]) | your_chain

redact_for_llm() always applies the redact strategy, replacing each PII span with a [MASKED_TYPE] label. See Masking if you need a different strategy.
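
The tuple return value and the [MASKED_TYPE] labels are easier to see with a toy stand-in. The sketch below is not the library's detector: `toy_redact` is a hypothetical function that only matches email addresses with a simple regular expression, and the `EMAIL` label and the shape of the summary dictionary are assumptions. It exists only to mirror the (clean_text, summary) shape described for redact_for_llm().

```python
import re

# Deliberately simple pattern; real PII detection covers far more than this.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def toy_redact(text: str) -> tuple[str, dict]:
    """Replace each email with a [MASKED_EMAIL] label and count the hits."""
    clean_text, count = EMAIL_RE.subn("[MASKED_EMAIL]", text)
    return clean_text, {"EMAIL": count}


# Same pattern as the chain step: keep only element 0, the cleaned text.
take_text = lambda text: toy_redact(text)[0]

print(take_text("Contact ada@example.com or bob@example.org today."))
```

Indexing with [0] discards the summary. If you want to log or count findings, unpack both values inside the wrapped function before returning the text.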