AuditedReader is a LlamaIndex-compatible reader that audits each document before it enters your index. It masks PII and filters out low-quality files automatically, so the documents you index are already clean and safe.

Install

1. Install both packages

pip install flexorch-audit llama-index

2. Load documents with AuditedReader

from flexorch_audit.integrations.llamaindex import AuditedReader

reader = AuditedReader(
    min_grade="B",                     # Exclude documents graded C or D
    mask_pii=True,                     # Mask PII before indexing
    locales=["universal", "tr"],       # Restrict detection to these jurisdictions
)

documents = reader.load_data(
    file_paths=["contracts/agreement.pdf", "reports/q1.docx"]
)

for doc in documents:
    print(doc.metadata["quality_grade"])       # "A"
    print(doc.metadata["pii_findings_count"])  # 3
    print(doc.text[:200])                      # PII already masked

Documents that don’t meet min_grade are excluded from the returned list. After calling load_data(), inspect reader.skipped to see which files were dropped and the reason for each exclusion.
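
A minimal sketch of how the skipped list might be summarized after a load. The `SkippedRecord` dataclass and its field names (`path`, `grade`, `reason`) are hypothetical stand-ins chosen for illustration, and the sample entries are invented. The real attribute names on the objects in `reader.skipped` may differ, so check them before adapting this.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SkippedRecord:
    """Hypothetical stand-in for one entry in reader.skipped."""
    path: str
    grade: str
    reason: str


def group_by_reason(skipped):
    """Map each skip reason to the list of file paths skipped for it."""
    grouped = defaultdict(list)
    for record in skipped:
        grouped[record.reason].append(record.path)
    return dict(grouped)


# Sample data invented for the example; real entries would come from reader.skipped.
sample = [
    SkippedRecord("data/scan_01.pdf", "D", "below min_grade"),
    SkippedRecord("data/notes.docx", "C", "below min_grade"),
    SkippedRecord("data/empty.pdf", "D", "no extractable text"),
]

for reason, paths in group_by_reason(sample).items():
    print(f"{reason}: {paths}")
```

Grouping by reason makes it easier to tell whether files were dropped by the grade threshold or by some other filter before deciding to lower min_grade.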

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| min_grade | str | "D" | Minimum quality grade to include ("A", "B", "C", or "D") |
| mask_pii | bool | True | Mask PII spans before the document is indexed |
| locales | list[str] | all | Restrict PII detection to specific jurisdictions (e.g. "tr", "us") |
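
The min_grade threshold can be illustrated with a small standalone sketch. It assumes grades are ordered from "A" (best) to "D" (worst), which is consistent with the earlier example where min_grade="B" excludes documents graded C or D. The helper `meets_min_grade` is hypothetical and is not part of flexorch-audit.

```python
# Illustrative only: a stand-in for the threshold check, not flexorch-audit's code.
GRADE_ORDER = ["A", "B", "C", "D"]  # assumed ordering, best to worst


def meets_min_grade(grade: str, min_grade: str) -> bool:
    """Return True when `grade` is at least as good as `min_grade`."""
    return GRADE_ORDER.index(grade) <= GRADE_ORDER.index(min_grade)


# With min_grade="B", only A and B documents would be kept.
kept = [g for g in GRADE_ORDER if meets_min_grade(g, "B")]
print(kept)  # ['A', 'B']

# With the default min_grade="D", every grade passes the threshold.
print([g for g in GRADE_ORDER if meets_min_grade(g, "D")])  # ['A', 'B', 'C', 'D']
```

Under this reading, the default of "D" means no document is excluded on grade alone.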

Full index pipeline

Pass the documents returned by AuditedReader directly to VectorStoreIndex to build a queryable index over privacy-safe content.

from llama_index.core import VectorStoreIndex
from flexorch_audit.integrations.llamaindex import AuditedReader

reader = AuditedReader(min_grade="B", mask_pii=True)
documents = reader.load_data(file_paths=["data/"])

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

response = query_engine.query("What are the payment terms in the contracts?")
print(response)

Documents excluded by min_grade or another filter are available in reader.skipped as a list of objects containing the file path, quality grade, and the reason the document was skipped.

LlamaIndex’s default node parser may split documents into smaller chunks after loading. PII masking happens at the document level before chunking, so every chunk inherits the masked text.
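
The effect of masking before chunking can be seen in a standalone sketch. It uses a simple email regex and a fixed-width splitter as stand-ins: neither is flexorch-audit's detector nor LlamaIndex's node parser, and the `[EMAIL]` placeholder is an assumption made for the example.

```python
import re

# Simplified email pattern, used only as a stand-in PII detector.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_emails(text: str) -> str:
    """Replace every email-like span with a placeholder."""
    return EMAIL_RE.sub("[EMAIL]", text)


def split_fixed(text: str, size: int) -> list[str]:
    """Naive chunker: cut the text into pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


raw = "Contact alice@example.com for invoices. Escalate to bob@example.org if unpaid."

# Mask at the document level first, then chunk the masked text.
masked = mask_emails(raw)
chunks = split_fixed(masked, 30)

# No chunk can contain an address because none survived in the masked text.
assert not any(EMAIL_RE.search(chunk) for chunk in chunks)
print(chunks)
```

The order matters in this sketch: if the raw text were chunked first, an address could be cut across a chunk boundary, where a pattern-based detector like this one would no longer recognize it.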