After you detect PII with detect_pii(), pass the findings to mask() to replace each sensitive span with a safe placeholder. mask() never modifies the original text in place; it always returns a new string.
Basic usage
from flexorch_audit import detect_pii, mask
text = "Email us at hello@example.com or call +49 30 1234567."
findings = detect_pii(text)
masked = mask(text, findings)
print(masked)
# Email us at [MASKED_EMAIL] or call [MASKED_PHONE_DE].
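To make the span-replacement idea concrete, here is a minimal standalone sketch of redact-style masking. It does not use flexorch_audit: the Finding type, its fields (start, end, pii_type), and the mask_spans() helper are hypothetical names chosen for illustration, and the library's real findings format may differ.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    """Hypothetical finding shape: character offsets plus a PII type label."""
    start: int
    end: int
    pii_type: str


def mask_spans(text: str, findings: list[Finding]) -> str:
    """Return a new string with each span replaced by [MASKED_<TYPE>]."""
    pieces = []
    cursor = 0
    # Walk spans left to right, copying the untouched text between them.
    # This sketch assumes the spans do not overlap.
    for f in sorted(findings, key=lambda f: f.start):
        pieces.append(text[cursor:f.start])
        pieces.append(f"[MASKED_{f.pii_type.upper()}]")
        cursor = f.end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Email us at hello@example.com or call +49 30 1234567."
email = "hello@example.com"
phone = "+49 30 1234567"
findings = [
    Finding(text.index(email), text.index(email) + len(email), "email"),
    Finding(text.index(phone), text.index(phone) + len(phone), "phone_de"),
]
masked = mask_spans(text, findings)
print(masked)
# Email us at [MASKED_EMAIL] or call [MASKED_PHONE_DE].
```

Building the result from pieces leaves the input string untouched, which matches the "returns a new string" behavior described above.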
Masking strategies
mask() supports four strategies. Pass your chosen strategy as the third argument, strategy; if you omit it, mask() uses redact.
| Strategy | Example output | Best for |
|---|---|---|
| redact (default) | [MASKED_EMAIL] | Production datasets and compliance logs |
| replace | user@example.com | Synthetic plausible data for testing |
| token | <EMAIL_1> | Structure-preserving NLP pipelines |
| hash | a3f2b19c... | Deterministic anonymization that stays consistent across documents |
# Token strategy — keeps grammatical structure intact
masked = mask(text, findings, strategy="token")
# Email us at <EMAIL_1> or call <PHONE_DE_1>.
# Hash strategy — same input always produces the same hash
masked = mask(text, findings, strategy="hash")
# Replace strategy — substitutes plausible synthetic values
masked = mask(text, findings, strategy="replace")
Use token when you need to preserve sentence structure for downstream NLP. Use hash when you need to consistently anonymize the same value across multiple documents.
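The difference between the two strategies can be sketched without the library. The helpers below are hypothetical illustrations, not flexorch_audit internals: the choice of SHA-256, the 8-character truncation, the optional salt, and the rule that a repeated value reuses its token number are all assumptions.

```python
import hashlib
from collections import defaultdict


def hash_placeholder(value: str, salt: str = "") -> str:
    """Deterministic placeholder: the same value and salt give the same digest."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:8]


def token_placeholders(values_with_types: list[tuple[str, str]]) -> list[str]:
    """Number each distinct value per type: <EMAIL_1>, <EMAIL_2>, ..."""
    seen: dict[str, dict[str, int]] = defaultdict(dict)
    out = []
    for value, pii_type in values_with_types:
        per_type = seen[pii_type]
        if value not in per_type:
            per_type[value] = len(per_type) + 1
        out.append(f"<{pii_type.upper()}_{per_type[value]}>")
    return out


# The same email hashed in two different documents yields the same placeholder.
doc_a = hash_placeholder("hello@example.com")
doc_b = hash_placeholder("hello@example.com")
print(doc_a == doc_b)
# True

tokens = token_placeholders([
    ("hello@example.com", "email"),
    ("bob@example.com", "email"),
    ("hello@example.com", "email"),
])
print(tokens)
# ['<EMAIL_1>', '<EMAIL_2>', '<EMAIL_1>']
```

Token numbers only mean something inside one document, while a hash placeholder links the same value across documents. Unsalted hashes of low-entropy values such as phone numbers can be recovered by brute force, so a secret salt is worth considering if you write something similar yourself.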
One-liner: redact_for_llm()
To detect and mask in a single call, for example when preparing text as LLM input, use redact_for_llm(). It runs detect_pii() and mask() internally and returns the cleaned text alongside a compact summary.
from flexorch_audit import redact_for_llm
clean_text, summary = redact_for_llm(text)
print(clean_text)
# Email us at [MASKED_EMAIL] or call [MASKED_PHONE_DE].
print(summary)
# {"count": 2, "types": ["email", "phone_de"]}
redact_for_llm() always uses the redact strategy. If you need a different strategy, call detect_pii() and mask() separately.
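If you want the one-call convenience with another strategy, a small wrapper is one option. The sketch below is hypothetical: redact_with_strategy() is not part of the library, it takes the detect and mask functions as parameters so it can run on its own, and it assumes each finding is a mapping with a "type" key, which the real findings format may not match.

```python
from typing import Callable


def redact_with_strategy(
    text: str,
    detect: Callable[[str], list],
    mask: Callable[..., str],
    strategy: str = "token",
) -> tuple[str, dict]:
    """Detect, mask with the given strategy, and build a compact summary."""
    findings = detect(text)
    clean_text = mask(text, findings, strategy=strategy)
    # Assumption: each finding is a mapping that exposes its PII type
    # under the "type" key.
    types = sorted({f["type"] for f in findings})
    summary = {"count": len(findings), "types": types}
    return clean_text, summary
```

With the library installed you would call it as redact_with_strategy(text, detect_pii, mask, strategy="hash"), adjusting the type lookup to whatever the findings actually expose.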