LlamaIndex Integration with flexorch-audit AuditedReader

AuditedReader is a LlamaIndex-compatible reader that audits each document before it enters your index. It masks PII and drops files that fall below your quality threshold, so only masked, quality-checked documents reach the index.

Install

Install both packages:

pip install flexorch-audit llama-index

Load documents with AuditedReader

from flexorch_audit.integrations.llamaindex import AuditedReader

reader = AuditedReader(
    min_grade="B",                     # Exclude documents graded C or D
    mask_pii=True,                     # Mask PII before indexing
    locales=["universal", "tr"],       # Restrict detection to these jurisdictions
)

documents = reader.load_data(
    file_paths=["contracts/agreement.pdf", "reports/q1.docx"]
)

for doc in documents:
    print(doc.metadata["quality_grade"])       # "A"
    print(doc.metadata["pii_findings_count"])  # 3
    print(doc.text[:200])                      # PII already masked

Documents that don’t meet min_grade are excluded from the returned list. After calling load_data(), inspect reader.skipped to see which files were dropped and the reason for each exclusion.
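
The metadata keys shown in the loop above can also drive a quick post-load check. The following is a minimal sketch that tallies grades and PII findings from plain metadata dictionaries. The helper name `summarize_metadata` and the sample values are illustrative assumptions, not part of flexorch-audit; only the `quality_grade` and `pii_findings_count` keys come from the example above.

```python
from collections import Counter


def summarize_metadata(metadata_list):
    """Tally quality grades and total PII findings across metadata dicts.

    Hypothetical helper: it relies only on the "quality_grade" and
    "pii_findings_count" keys shown in the loading example.
    """
    grades = Counter(m["quality_grade"] for m in metadata_list)
    total_pii = sum(m["pii_findings_count"] for m in metadata_list)
    return dict(grades), total_pii


# Stand-in metadata with made-up values; with real documents you would
# pass [doc.metadata for doc in documents] instead.
sample = [
    {"quality_grade": "A", "pii_findings_count": 3},
    {"quality_grade": "B", "pii_findings_count": 0},
    {"quality_grade": "A", "pii_findings_count": 1},
]

grade_counts, pii_total = summarize_metadata(sample)
print(grade_counts)  # {'A': 2, 'B': 1}
print(pii_total)     # 4
```

A tally like this makes it easy to notice when a batch contains unexpectedly many PII findings before the documents are embedded.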

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `min_grade` | `str` | `"D"` | Minimum quality grade to include (`"A"`, `"B"`, `"C"`, or `"D"`) |
| `mask_pii` | `bool` | `True` | Mask PII spans before the document is indexed |
| `locales` | `list[str]` | all | Restrict PII detection to specific jurisdictions (e.g. `"tr"`, `"us"`) |
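
To make the `min_grade` threshold concrete, here is a small sketch of the comparison it implies: grades run from `"A"` (best) to `"D"` (worst), and a document is kept when its grade is at least as good as the minimum. The names `GRADE_ORDER` and `meets_min_grade` are assumptions made for illustration; this is not the library's own implementation.

```python
# Grades from best to worst, as listed in the parameter table.
GRADE_ORDER = ["A", "B", "C", "D"]


def meets_min_grade(grade: str, min_grade: str = "D") -> bool:
    """Return True when `grade` is at least as good as `min_grade`.

    Illustrative only: mirrors the documented behavior, where
    min_grade="B" excludes documents graded C or D.
    """
    return GRADE_ORDER.index(grade) <= GRADE_ORDER.index(min_grade)


print(meets_min_grade("A", "B"))  # True
print(meets_min_grade("C", "B"))  # False
print(meets_min_grade("D"))       # True
```

With the default of `"D"`, every graded document passes, so filtering only takes effect once you raise the threshold.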

Full index pipeline

Pass the documents returned by AuditedReader directly to VectorStoreIndex to build a queryable index over privacy-safe content.

from llama_index.core import VectorStoreIndex
from flexorch_audit.integrations.llamaindex import AuditedReader

reader = AuditedReader(min_grade="B", mask_pii=True)
documents = reader.load_data(file_paths=["data/"])

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

response = query_engine.query("What are the payment terms in the contracts?")
print(response)

Documents excluded by min_grade or another filter are available in reader.skipped as a list of objects containing the file path, quality grade, and the reason the document was skipped.
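
A short report over the skipped entries might look like the sketch below. The text above says each entry carries the file path, quality grade, and skip reason, but it does not name the attributes, so `file_path`, `quality_grade`, and `reason` are assumed names, and `SkippedEntry` is a hypothetical stand-in class with made-up sample data. Check the actual objects in `reader.skipped` before relying on these names.

```python
from dataclasses import dataclass


@dataclass
class SkippedEntry:
    """Hypothetical stand-in for one item in reader.skipped.

    The attribute names are assumptions; the real objects may differ.
    """
    file_path: str
    quality_grade: str
    reason: str


def format_skipped(skipped):
    """Render one human-readable line per skipped document."""
    return [
        f"{entry.file_path}: grade {entry.quality_grade} ({entry.reason})"
        for entry in skipped
    ]


# Made-up sample entry; in practice you would pass reader.skipped
# (adjusting the attribute names if they differ).
entries = [SkippedEntry("reports/draft.docx", "C", "below min_grade")]
for line in format_skipped(entries):
    print(line)  # reports/draft.docx: grade C (below min_grade)
```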

LlamaIndex’s default node parser may split documents into smaller chunks after loading. PII masking happens at the document level before chunking, so every chunk inherits the masked text.
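
The ordering matters because a PII span that straddles a chunk boundary can escape pattern-based detection. The sketch below illustrates this with a deliberately simple email regex and a fixed-width splitter. Both are stand-ins chosen for illustration: they are not flexorch-audit's masking logic or LlamaIndex's node parser.

```python
import re

# Simplified email pattern, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_emails(text: str) -> str:
    """Replace every email-like span with a placeholder."""
    return EMAIL_RE.sub("[EMAIL]", text)


def chunk(text: str, size: int) -> list[str]:
    """Split text into fixed-width pieces (a toy stand-in for a node parser)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


raw = "Contact jane.doe@example.com for the invoice. Payment is due in 30 days."

# Mask first, then chunk: every chunk inherits the masked text.
masked_chunks = chunk(mask_emails(raw), 25)
assert all("@" not in c for c in masked_chunks)

# Chunk first, then mask: the address is split across two chunks,
# so the pattern never sees it whole and nothing is masked.
late_masked = [mask_emails(c) for c in chunk(raw, 20)]
assert "".join(late_masked) == raw
```

Masking the whole document up front avoids this boundary problem regardless of how the text is later split.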
