After processing your documents, FlexOrch lets you group completed jobs into a named dataset and export the structured output in the format your downstream pipeline expects. This page covers building, exporting, profiling, and deleting datasets using the TypeScript SDK.

Build a dataset

Pass an array of completed job IDs and a human-readable name to client.datasets.build(). The method returns a Promise<Dataset> that resolves once the dataset is assembled.
import { FlexOrch } from "@flexorch/sdk";

const client = new FlexOrch({
  apiKey: process.env.FLEXORCH_API_KEY,
});

const dataset = await client.datasets.build({
  jobIds: ["job_abc123", "job_def456", "job_ghi789"],
  name: "q1-invoices",
});

console.log(`Dataset ID:  ${dataset.id}`);
console.log(`Name:        ${dataset.name}`);
console.log(`Status:      ${dataset.status}`);
console.log(`Job count:   ${dataset.jobCount}`);
build() only includes jobs that are in the completed state. By default, jobs with status queued, running, or failed are silently skipped. Pass strict: true to make the Promise reject with an IncompleteJobError instead.
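
Because non-completed jobs are skipped without any warning by default, it can help to check statuses yourself before calling build(). The following is a minimal, SDK-independent sketch. The JobSummary shape is an assumption that mirrors only the id and status fields used on this page, and splitByCompletion is a hypothetical helper name, not part of the SDK.

```typescript
// Assumed minimal job shape; only `id` and `status` are taken from this page.
interface JobSummary {
  id: string;
  status: string; // e.g. "completed", "queued", "running", "failed"
}

interface CompletionSplit {
  readyIds: string[];
  skipped: JobSummary[];
}

// Hypothetical helper: separate the jobs build() would include
// from the ones it would skip.
function splitByCompletion(jobs: JobSummary[]): CompletionSplit {
  const readyIds: string[] = [];
  const skipped: JobSummary[] = [];
  for (const job of jobs) {
    if (job.status === "completed") {
      readyIds.push(job.id);
    } else {
      skipped.push(job);
    }
  }
  return { readyIds, skipped };
}

const completionSplit = splitByCompletion([
  { id: "job_abc123", status: "completed" },
  { id: "job_def456", status: "running" },
  { id: "job_ghi789", status: "failed" },
]);

console.log("Ready:", completionSplit.readyIds);
console.log("Skipped:", completionSplit.skipped.map((j) => `${j.id} (${j.status})`));
```

You could then pass readyIds as jobIds and log or retry the entries in skipped, rather than discovering a lower-than-expected jobCount afterwards.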

Build from filtered jobs

A common pattern is to filter jobs by quality grade before building:
const allJobs = await client.jobs.list({ limit: 200 });

const highQualityIds = allJobs
  .filter((j) => ["A", "B"].includes(j.qualityGrade ?? "") && j.status === "completed")
  .map((j) => j.id);

const dataset = await client.datasets.build({
  jobIds: highQualityIds,
  name: "high-quality-contracts",
});
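
Before settling on a grade cutoff, it may be useful to see how grades are distributed across the completed jobs. This is a small sketch under assumptions: qualityGrade is treated as an optional string (as the ?? fallback in the filter suggests), and tallyGrades and the "ungraded" bucket are hypothetical names introduced here for illustration.

```typescript
// Assumed job shape, limited to the fields used in the filter example.
interface GradedJob {
  id: string;
  status: string;
  qualityGrade?: string | null;
}

// Hypothetical helper: count completed jobs per quality grade.
// Jobs without a grade are counted under "ungraded".
function tallyGrades(jobs: GradedJob[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const job of jobs) {
    if (job.status !== "completed") continue; // ignore jobs build() would skip
    const grade = job.qualityGrade ?? "ungraded";
    counts[grade] = (counts[grade] ?? 0) + 1;
  }
  return counts;
}

const gradeTally = tallyGrades([
  { id: "job_1", status: "completed", qualityGrade: "A" },
  { id: "job_2", status: "completed", qualityGrade: "A" },
  { id: "job_3", status: "completed", qualityGrade: "C" },
  { id: "job_4", status: "running", qualityGrade: "B" },
  { id: "job_5", status: "completed" },
]);

console.log(gradeTally);
```

If most completed jobs fall below your cutoff, the resulting dataset may be too small to be useful, which is worth knowing before you build it.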

Export a dataset

client.datasets.export() returns a Promise<Buffer> containing the dataset bytes. Write them to disk using Node’s fs module.
import { writeFileSync } from "fs";

const data = await client.datasets.export("ds_xyz789", { format: "jsonl" });

writeFileSync("q1-invoices.jsonl", data);
console.log("Export saved.");
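
A quick sanity check on a JSON Lines export is to parse it line by line. The sketch below works on a plain string, so with the Buffer returned by export() you would pass data.toString("utf8"). The record schema is not specified on this page, so records are typed as unknown and the sample fields are made up; parseJsonl is a hypothetical helper name.

```typescript
// Hypothetical helper: parse JSON Lines text into an array of records.
// Blank lines (including a trailing newline) are ignored.
function parseJsonl(text: string): unknown[] {
  const records: unknown[] = [];
  const lines = text.split(/\r?\n/);
  lines.forEach((line, index) => {
    if (line.trim() === "") return;
    try {
      records.push(JSON.parse(line));
    } catch {
      // Report the 1-based line number so a bad record is easy to locate.
      throw new Error(`Invalid JSON on line ${index + 1}`);
    }
  });
  return records;
}

// Illustrative input only; real exports will have their own fields.
const sampleJsonl = '{"id":1,"text":"first"}\n{"id":2,"text":"second"}\n';
const sampleRecords = parseJsonl(sampleJsonl);

console.log(`Parsed ${sampleRecords.length} records`);
```

Loading the whole export into memory like this is only reasonable for small datasets; for large exports a streaming line reader would be a better fit.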

Supported export formats

| Format | format value | Best for |
| --- | --- | --- |
| JSON Lines | "jsonl" | LLM fine-tuning, streaming ingestion |
| CSV | "csv" | Spreadsheet tools, quick inspection |
| Parquet | "parquet" | Columnar analytics, Spark / DuckDB pipelines |
| Markdown | "markdown" | Human review, RAG document stores |
| Arrow | "arrow" | High-performance in-memory data exchange |
import { writeFileSync } from "fs";

const formats = ["jsonl", "parquet", "csv"] as const;

for (const fmt of formats) {
  const data = await client.datasets.export("ds_xyz789", { format: fmt });
  writeFileSync(`q1-invoices.${fmt}`, data);
  console.log(`Saved q1-invoices.${fmt}`);
}
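
When the format comes from user input, such as a CLI flag or config file, it may be worth validating it against the values in the table before calling export(). This is a sketch only: ExportFormat, isExportFormat, and exportFileName are hypothetical names defined here, and the SDK may already ship its own type for this.

```typescript
// Format values copied from the table above.
const EXPORT_FORMATS = ["jsonl", "csv", "parquet", "markdown", "arrow"] as const;

// Hypothetical union type derived from the list.
type ExportFormat = (typeof EXPORT_FORMATS)[number];

// Type guard: narrows an arbitrary string to a supported format value.
function isExportFormat(value: string): value is ExportFormat {
  return (EXPORT_FORMATS as readonly string[]).includes(value);
}

// Builds an output file name using the format value as the extension,
// matching the naming used in the loop above.
function exportFileName(baseName: string, format: ExportFormat): string {
  return `${baseName}.${format}`;
}

const requestedFormat: string = "parquet"; // e.g. read from a CLI argument
if (isExportFormat(requestedFormat)) {
  console.log(exportFileName("q1-invoices", requestedFormat));
} else {
  console.error(`Unsupported export format: ${requestedFormat}`);
}
```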

Profile a dataset

client.datasets.profile() returns statistical metadata about your dataset — token counts, field coverage, PII distribution, and grade breakdown — useful for quality checks before fine-tuning.
const profile = await client.datasets.profile("ds_xyz789");

console.log(`Total records:       ${profile.recordCount}`);
console.log(`Total tokens:        ${profile.totalTokens}`);
console.log(`Grade A records:     ${profile.gradeCounts["A"]}`);
console.log(`Unique PII types:    ${profile.piiTypeCount}`);
console.log(`Avg quality score:   ${profile.avgQualityScore.toFixed(2)}`);
Run profile() after building and before exporting to catch low-quality datasets early — especially when assembling training data for fine-tuning.
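
One way to act on the profile is to turn it into a simple pass/fail gate before exporting. The sketch below uses only recordCount, avgQualityScore, and gradeCounts from the example above. The threshold values and the names ProfileSummary, ProfileThresholds, and findProfileIssues are assumptions for illustration, and the scale of avgQualityScore is not documented here, so choose limits that match what your profiles actually report.

```typescript
// Subset of the profile fields shown above; the full profile has more.
interface ProfileSummary {
  recordCount: number;
  avgQualityScore: number;
  gradeCounts: Record<string, number>;
}

// Hypothetical thresholds; tune these for your own data.
interface ProfileThresholds {
  minRecords: number;
  minAvgQualityScore: number;
  minGradeAShare: number; // fraction between 0 and 1
}

// Returns a list of human-readable problems; an empty list means the gate passes.
function findProfileIssues(p: ProfileSummary, t: ProfileThresholds): string[] {
  const issues: string[] = [];
  if (p.recordCount < t.minRecords) {
    issues.push(`Only ${p.recordCount} records (minimum ${t.minRecords})`);
  }
  if (p.avgQualityScore < t.minAvgQualityScore) {
    issues.push(`Average quality score ${p.avgQualityScore.toFixed(2)} is below ${t.minAvgQualityScore}`);
  }
  const gradeA = p.gradeCounts["A"] ?? 0;
  const gradeAShare = p.recordCount > 0 ? gradeA / p.recordCount : 0;
  if (gradeAShare < t.minGradeAShare) {
    issues.push(`Grade A share ${(gradeAShare * 100).toFixed(1)}% is below ${t.minGradeAShare * 100}%`);
  }
  return issues;
}

// Made-up numbers for illustration.
const sampleProfile: ProfileSummary = {
  recordCount: 1000,
  avgQualityScore: 0.62,
  gradeCounts: { A: 300, B: 500, C: 200 },
};

const profileIssues = findProfileIssues(sampleProfile, {
  minRecords: 500,
  minAvgQualityScore: 0.7,
  minGradeAShare: 0.25,
});

console.log(profileIssues.length === 0 ? "Profile OK" : profileIssues.join("\n"));
```

In a pipeline, you might skip the export step and rebuild with a stricter job filter whenever the returned list is non-empty.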

Delete a dataset

Deleting a dataset removes the assembled export artifact. The underlying jobs and their extracted data are not deleted.
await client.datasets.delete("ds_xyz789");
console.log("Dataset deleted.");
Deletion is immediate and irreversible. If you need the data again, you must call client.datasets.build() to reassemble it from the original jobs.

Next steps

Jobs

Learn how to filter and manage jobs before building datasets.

API Reference

Full parameter and return-type reference for all dataset methods.