AuditedLoader is a LangChain-compatible document loader that audits each file before it enters your chain. It masks PII in place and skips documents that fall below your minimum quality grade, so your vector store only receives content that has passed both checks.
Install
Install both packages
pip install flexorch-audit langchain-community
Load documents with AuditedLoader
from flexorch_audit.integrations.langchain import AuditedLoader
loader = AuditedLoader(
    file_paths=["contracts/agreement.pdf", "invoices/inv_001.pdf"],
    min_grade="B",                     # Skip documents graded C or D
    mask_pii=True,                     # Replace PII before loading
    locales=["universal", "tr", "de"], # Restrict detection to these jurisdictions
)
docs = loader.load()
for doc in docs:
    print(doc.metadata["quality_grade"])       # "A"
    print(doc.metadata["pii_findings_count"])  # 2
    print(doc.page_content[:200])              # PII already masked
Documents that don’t meet min_grade are excluded from the returned list. Check loader.skipped after calling load() to see which files were dropped and why.
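The filtering rule can be pictured with a small standalone sketch. It does not use flexorch-audit and is not the library's implementation: the helper names (meets_min_grade, partition_by_grade) and the dictionary shape used for skipped files are hypothetical, and the only thing taken from this page is the grade scale "A" through "D", with "A" best.

```python
# Hypothetical illustration of min_grade filtering; not flexorch-audit's code.
GRADE_ORDER = ["A", "B", "C", "D"]  # best to worst, as listed on this page


def meets_min_grade(grade: str, min_grade: str) -> bool:
    """Return True when `grade` is at least as good as `min_grade`."""
    return GRADE_ORDER.index(grade) <= GRADE_ORDER.index(min_grade)


def partition_by_grade(graded_files: dict[str, str], min_grade: str):
    """Split {path: grade} into (kept paths, skipped {path: grade})."""
    kept, skipped = [], {}
    for path, grade in graded_files.items():
        if meets_min_grade(grade, min_grade):
            kept.append(path)
        else:
            skipped[path] = grade  # assumed shape; check loader.skipped for the real one
    return kept, skipped


kept, skipped = partition_by_grade(
    {"contracts/agreement.pdf": "A", "invoices/inv_001.pdf": "C"},
    min_grade="B",
)
print(kept)     # ['contracts/agreement.pdf']
print(skipped)  # {'invoices/inv_001.pdf': 'C'}
```

With min_grade="D", the default, every graded file passes this check, which matches the table's description of "D" as the lowest grade.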
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| file_paths | list[str] | required | Paths to the documents you want to load |
| min_grade | str | "D" | Minimum quality grade to include ("A", "B", "C", or "D") |
| mask_pii | bool | True | Replace PII spans with [MASKED_...] placeholders before loading |
| locales | list[str] | all | Restrict PII detection to specific jurisdictions (e.g. "tr", "de", "us") |
In a RAG pipeline
Use AuditedLoader as a drop-in replacement for any LangChain document loader. The documents it returns are already masked and quality-filtered, so you can pass them straight to your embeddings and vector store.
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from flexorch_audit.integrations.langchain import AuditedLoader
loader = AuditedLoader(
    file_paths=["docs/"],
    min_grade="B",
    mask_pii=True,
)
docs = loader.load()
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(docs, embeddings)
retriever = vectorstore.as_retriever()
Documents with a quality grade below min_grade are silently skipped. After calling load(), inspect loader.skipped for a list of excluded files along with their grades.
As a pre-processing step
If you’re working with user-supplied text rather than files, use redact_for_llm() inline with a RunnableLambda to strip PII before it reaches your chain.
from langchain_core.runnables import RunnableLambda
from flexorch_audit import redact_for_llm
# redact_for_llm returns (clean_text, summary) — take only the text
safe_chain = RunnableLambda(lambda text: redact_for_llm(text)[0]) | your_chain
redact_for_llm() always applies the redact strategy, replacing each PII span with a [MASKED_TYPE] label. See Masking if you need a different strategy.
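To make the redact strategy concrete, here is a minimal standalone sketch of span replacement. It is an illustration under stated assumptions, not how redact_for_llm() detects PII: the regex patterns, the function name redact_sketch, the type names EMAIL and PHONE, and the per-type count summary are all hypothetical. Only the (clean_text, summary) tuple shape and the [MASKED_TYPE] label format come from this page.

```python
import re

# Hypothetical patterns for illustration only; the library's detectors differ.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def redact_sketch(text: str) -> tuple[str, dict[str, int]]:
    """Replace each matched span with a [MASKED_<TYPE>] label.

    Returns (clean_text, summary) to mirror the tuple shape noted above;
    the summary format (match counts per type) is an assumption.
    """
    summary = {}
    for pii_type, pattern in PATTERNS.items():
        # subn returns the rewritten text and the number of replacements made
        text, count = pattern.subn(f"[MASKED_{pii_type}]", text)
        if count:
            summary[pii_type] = count
    return text, summary


clean, summary = redact_sketch("Mail ada@example.com or call +49 30 1234567.")
print(clean)    # Mail [MASKED_EMAIL] or call [MASKED_PHONE].
print(summary)  # {'EMAIL': 1, 'PHONE': 1}
```

Because the label keeps the PII type, a downstream prompt still tells the model that an email address or phone number was present without exposing the value.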