AuditedReader is a LlamaIndex-compatible reader that audits each document before it enters your index. It masks PII and drops files that fall below a quality threshold, so every document you index has already passed the quality check and had its PII masked.
Install
Install both packages:

```bash
pip install flexorch-audit llama-index
```
Load documents with AuditedReader
```python
from flexorch_audit.integrations.llamaindex import AuditedReader

reader = AuditedReader(
    min_grade="B",                # Exclude documents graded C or D
    mask_pii=True,                # Mask PII before indexing
    locales=["universal", "tr"],  # Restrict detection to these jurisdictions
)

documents = reader.load_data(
    file_paths=["contracts/agreement.pdf", "reports/q1.docx"]
)

for doc in documents:
    print(doc.metadata["quality_grade"])       # e.g. "A"
    print(doc.metadata["pii_findings_count"])  # e.g. 3
    print(doc.text[:200])                      # PII already masked
```
Documents that don’t meet min_grade are excluded from the returned list. After calling load_data(), inspect reader.skipped to see which files were dropped and the reason for each exclusion.
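To make the min_grade rule concrete, here is a minimal, self-contained sketch of a grade threshold check. It assumes grades are ordered "A" (best) to "D" (worst), as the min_grade="B" example above implies. It does not call flexorch-audit: meets_min_grade, filter_by_grade, and the SkippedDocument record (file path, grade, reason) are hypothetical names for illustration, not the library's API, and the grades in the usage lines are made up.

```python
from dataclasses import dataclass

# Assumed grade ordering, best to worst (min_grade="B" excludes C and D).
GRADE_ORDER = ["A", "B", "C", "D"]


@dataclass
class SkippedDocument:
    """Hypothetical record for one excluded file."""
    file_path: str
    quality_grade: str
    reason: str


def meets_min_grade(grade, min_grade):
    """True if `grade` is at least as good as `min_grade`."""
    # A lower index in GRADE_ORDER means a better grade.
    return GRADE_ORDER.index(grade) <= GRADE_ORDER.index(min_grade)


def filter_by_grade(graded, min_grade="D"):
    """Split a {file_path: grade} mapping into kept paths and skipped records."""
    kept, skipped = [], []
    for path, grade in graded.items():
        if meets_min_grade(grade, min_grade):
            kept.append(path)
        else:
            reason = f"grade {grade} is below min_grade {min_grade}"
            skipped.append(SkippedDocument(path, grade, reason))
    return kept, skipped


kept, skipped = filter_by_grade(
    {"contracts/agreement.pdf": "A", "reports/q1.docx": "C"}, min_grade="B"
)
print(kept)  # ['contracts/agreement.pdf']
for item in skipped:
    print(item.file_path, item.quality_grade, item.reason)
```

With the default min_grade="D", every file in this sketch is kept, which matches the table's description of "D" as the most permissive setting.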
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| min_grade | str | "D" | Minimum quality grade to include ("A", "B", "C", or "D") |
| mask_pii | bool | True | Mask PII spans before the document is indexed |
| locales | list[str] | all | Restrict PII detection to specific jurisdictions (e.g. "tr", "us") |
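The locales parameter scopes which detectors run. The sketch below shows one way that scoping could work, assuming detectors are grouped by locale and that "universal" holds patterns that apply in every jurisdiction (here, email addresses). The regexes, labels, and the mask_text helper are hypothetical and far simpler than a real PII detector; this is not flexorch-audit's implementation.

```python
import re

# Hypothetical detectors grouped by locale; deliberately simplified.
DETECTORS = {
    "universal": {"EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")},
    # 11-digit Turkish national ID, first digit non-zero (no checksum validation).
    "tr": {"TR_NATIONAL_ID": re.compile(r"\b[1-9]\d{10}\b")},
    "us": {"US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b")},
}


def mask_text(text, locales=None):
    """Mask spans found by the selected locales' detectors.

    locales=None runs every locale. Returns (masked_text, findings_count).
    """
    active = list(DETECTORS) if locales is None else locales
    findings = 0
    for locale in active:
        for label, pattern in DETECTORS.get(locale, {}).items():
            text, n = pattern.subn(f"[{label}]", text)
            findings += n
    return text, findings


sample = "Contact ayse@example.com, TC 12345678901, SSN 123-45-6789"
print(mask_text(sample, ["universal", "tr"]))
# ('Contact [EMAIL], TC [TR_NATIONAL_ID], SSN 123-45-6789', 2)
print(mask_text(sample))
# ('Contact [EMAIL], TC [TR_NATIONAL_ID], SSN [US_SSN]', 3)
```

Restricting locales to ["universal", "tr"] leaves the US-style SSN unmasked. Narrowing locales can cut irrelevant matches from other jurisdictions, but it also skips those jurisdictions' detectors entirely.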
Full index pipeline
Pass the documents returned by AuditedReader directly to VectorStoreIndex.from_documents() to build a queryable index over content whose PII has already been masked.
```python
from llama_index.core import VectorStoreIndex
from flexorch_audit.integrations.llamaindex import AuditedReader

reader = AuditedReader(min_grade="B", mask_pii=True)
documents = reader.load_data(file_paths=["data/"])

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

response = query_engine.query("What are the payment terms in the contracts?")
print(response)
```
Documents excluded by min_grade or another filter are available in reader.skipped as a list of objects containing the file path, quality grade, and the reason the document was skipped.
When you build the index with VectorStoreIndex.from_documents(), LlamaIndex's default node parser splits each document into smaller chunks (nodes). Because PII masking happens at the document level during load_data(), before any chunking, every chunk inherits the already-masked text.
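The ordering matters because a PII span can straddle a chunk boundary. The sketch below compares mask-then-chunk with chunk-then-mask, using a hypothetical single email detector and a naive fixed-width chunker. Neither is part of flexorch-audit or LlamaIndex (LlamaIndex's default splitter is sentence-aware, not fixed-width); they only illustrate why document-level masking is the safer order.

```python
import re

# Hypothetical single detector standing in for the real PII audit.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_document(text):
    """Mask every email in the given text."""
    return EMAIL.sub("[EMAIL]", text)


def chunk(text, size, overlap=0):
    """Naive fixed-width character chunker standing in for a node parser."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


doc = "Invoices go to billing.team@example.com every month."

# Mask first, then chunk: every chunk inherits the masked text.
safe_chunks = chunk(mask_document(doc), size=31)

# Chunk first, then mask: the address straddles the boundary at index 31,
# so neither fragment matches the pattern and both halves leak.
leaky_chunks = [mask_document(c) for c in chunk(doc, size=31)]

print(safe_chunks)   # ['Invoices go to [EMAIL] every mo', 'nth.']
print(leaky_chunks)  # ['Invoices go to billing.team@exa', 'mple.com every month.']
```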