FlexOrch is built around a small set of well-defined primitives. Before you integrate the API or start exploring the dashboard, familiarise yourself with these concepts — they appear throughout the documentation, SDK, and API responses.

Document

A document is any file you upload to FlexOrch for processing. Supported formats include PDF, DOCX, TXT, XLSX, HTML, XML, EML, MSG, JPG, PNG, TIFF, and more — 15+ formats in total. Every uploaded file receives a unique document_id and is retained for the duration defined by your plan’s retention policy. The document itself is immutable once uploaded; all processing results are stored separately in the execution record.

Job

A job represents one processing run for a single document. When you call POST /v1/data-process/async, FlexOrch creates a job and runs the pipeline asynchronously in the background. You track progress by polling GET /v1/jobs/{job_id}. Jobs move through the following states:
| State | Meaning |
| --- | --- |
| queued | Waiting to be picked up by a worker |
| running | The pipeline is actively processing the document |
| completed | Processing finished successfully; results are available |
| failed | Processing could not complete; inspect failure_reason |
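
The polling loop described above can be sketched as follows. This is a minimal illustration, not the SDK's implementation: `fetch_job` is a hypothetical callable that you would implement as a wrapper around GET /v1/jobs/{job_id}, and the `state` key in the response is an assumed field name based on the states listed above.

```python
import time
from typing import Callable, Dict

# States after which a job no longer changes (from the list above).
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(
    fetch_job: Callable[[str], Dict],  # hypothetical wrapper around GET /v1/jobs/{job_id}
    job_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll a job until it is completed or failed, or until the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job(job_id)
        # "state" is an assumed response field name.
        if job["state"] in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} is still {job['state']} after {timeout_s}s")
        sleep(interval_s)
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP client and makes it easy to exercise with a stub. When the returned job is in the failed state, inspect its failure_reason.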

Pipeline

The pipeline is the automated six-step sequence FlexOrch runs on every document:
  1. Extract — Parse raw text from the file. Scanned PDFs and images go through OCR first.
  2. Classify — Detect the document type (invoice, payroll slip, purchase order, and more).
  3. Extract fields — Pull structured data using deterministic rules, with LLM fallback for complex or ambiguous fields.
  4. Detect PII — Identify personal and sensitive data across 46 PII types and three jurisdictions (TR, EU, US).
  5. Quality score — Compute a score from 0 to 100 and assign a letter grade based on extraction completeness, noise ratio, and OCR confidence.
  6. Deliver — Write all results to the pipeline execution record and mark the job as completed.
You don’t configure individual steps — the pipeline runs in full every time. See Pipeline Deep Dive for more detail.

Execution

A pipeline execution is the output record produced by one complete pipeline run. It contains:
  • Extracted structured fields (e.g., vendor name, amounts, dates)
  • Detected document language
  • Quality score and grade
  • PII summary (types found, count, positions)
  • Masked text (when PII was detected and masking was applied)
Every completed job has exactly one associated execution. You query it through the job response or the executions endpoint.

Dataset

A dataset is a curated collection of pipeline executions that you explicitly build and export. Rather than exporting individual job results one at a time, you select the completed jobs you want, build a dataset, and export the whole collection in a single operation. Dataset builds and exports do not consume credits; only document processing does. Datasets can be exported in nine formats, including JSONL, CSV, Parquet, Markdown, XML, XLSX, and HuggingFace Arrow.
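
As a rough client-side illustration of the two ends of that workflow, the sketch below picks out completed jobs as dataset candidates and reads a JSONL export back into Python. The `job_id` and `state` keys are assumed field names, and the shape of an exported record is not specified here; the only thing relied on is that JSONL stores one JSON object per line.

```python
import json
from typing import Dict, Iterable, List


def select_completed_job_ids(jobs: Iterable[Dict]) -> List[str]:
    """Return the IDs of jobs that finished successfully (assumed keys: job_id, state)."""
    return [job["job_id"] for job in jobs if job.get("state") == "completed"]


def parse_jsonl(text: str) -> List[Dict]:
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```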

Quality Grade

Every processed document receives a quality score from 0 to 100 and a corresponding letter grade. The grade reflects extraction completeness, OCR confidence, and field noise levels.
| Grade | Score Range | Meaning |
| --- | --- | --- |
| A | 85 – 100 | High quality; ready for production use |
| B | 65 – 84 | Good quality; minor gaps or low-confidence fields |
| C | 45 – 64 | Moderate quality; some fields missing or noisy |
| D | 0 – 44 | Low quality; significant extraction issues, review recommended |
Use quality grades to filter datasets before fine-tuning or RAG ingestion. Most teams set Grade B as the minimum.
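
The score-to-grade mapping and a minimum-grade filter can be expressed in a few lines. This is a sketch that assumes integer scores and uses the ranges listed above; the function names are illustrative and not part of the FlexOrch SDK.

```python
def grade_for_score(score: int) -> str:
    """Map a 0-100 quality score to its letter grade using the ranges above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 85:
        return "A"
    if score >= 65:
        return "B"
    if score >= 45:
        return "C"
    return "D"


def meets_minimum(grade: str, minimum: str = "B") -> bool:
    """Return True if grade is at least as good as the minimum grade."""
    order = "ABCD"  # best to worst
    return order.index(grade) <= order.index(minimum)
```

For example, a record with a score of 72 maps to Grade B and passes the default filter, while a score of 60 maps to Grade C and is excluded.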

PII Type

A PII type identifies the specific category of personal data found in a document — for example, email, phone_tr, national_id_tr, or iban. FlexOrch detects 46 PII types spanning Turkish (KVKK), European (GDPR), and US regulatory jurisdictions. When PII is detected, the execution record includes a findings summary with the type, count, and character positions of each match. You can choose to apply masking, which replaces sensitive values with anonymised placeholders before storage. See PII Detection for the complete type catalog.
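
To make the idea of position-based masking concrete, here is a minimal sketch of how findings with character positions could be turned into masked text. The finding keys (`type`, `start`, `end`) and the `[TYPE]` placeholder style are assumptions made for illustration; FlexOrch's actual findings schema and placeholder format may differ.

```python
from typing import Dict, List


def mask_text(text: str, findings: List[Dict]) -> str:
    """Replace each finding's character span with a placeholder such as [EMAIL].

    Assumes non-overlapping findings with keys: type, start, end (end exclusive).
    """
    masked = text
    # Work from the last span to the first so earlier offsets stay valid.
    for finding in sorted(findings, key=lambda f: f["start"], reverse=True):
        placeholder = f"[{finding['type'].upper()}]"
        masked = masked[: finding["start"]] + placeholder + masked[finding["end"]:]
    return masked
```

Replacing spans from right to left matters because each placeholder can be longer or shorter than the value it replaces, which would otherwise shift the positions of later matches.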

Credit

One credit is consumed each time a document is processed — one job equals one credit, regardless of file size or document type. Dataset builds, exports, and re-queries of existing execution records do not consume credits. Credits are tracked per billing period and reset on your plan’s renewal date. You can check your current balance at any time via GET /v1/usage.
| Plan | Credits | Billing |
| --- | --- | --- |
| Trial | 1,200 | Per 30 days (no credit card required) |
| Starter | 1,200 | Per month |
| Pro | 6,000 | Per month |
| Enterprise | Custom | Custom |
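
Because one job always costs one credit, budgeting is simple arithmetic. The sketch below uses the plan allowances listed above; the function name and lowercase plan keys are illustrative. Enterprise is omitted because its allowance is custom, and behaviour once credits run out is not covered here.

```python
# Credits per billing period, taken from the plan list above.
PLAN_CREDITS = {"trial": 1200, "starter": 1200, "pro": 6000}


def remaining_credits(plan: str, jobs_processed: int) -> int:
    """Credits left in the current period: allowance minus one credit per job."""
    if jobs_processed < 0:
        raise ValueError("jobs_processed must not be negative")
    return PLAN_CREDITS[plan] - jobs_processed
```

For the authoritative balance, query GET /v1/usage rather than relying on a local count.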

Connector

A connector is a configured connection to external cloud storage — Amazon S3, Google Cloud Storage, or Azure Blob Storage. With a connector in place, you can ingest documents directly from a bucket without uploading them manually through the API. Connectors support scheduled sync for automated ingestion pipelines.

Workspace

A workspace is the tenant-scoped container that holds all of your FlexOrch resources — documents, jobs, executions, datasets, API keys, and connectors. Every request you make is automatically scoped to your workspace via your API key. Each FlexOrch account has exactly one workspace. Enterprise plans support team management features that allow multiple users to operate within the same workspace.