Build a dataset
Pass a list of completed job IDs and a human-readable name to client.datasets.build(). The method returns a Dataset object once the dataset is assembled.
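A minimal sketch of the call. It assumes build() accepts the job IDs and the name as keyword arguments called job_ids and name. These keyword names are hypothetical, so check the API Reference for the exact signature. The helper also drops duplicate IDs while preserving their order. That is a client-side precaution, not documented behavior. It passes through the strict flag described in the next paragraph.

```python
from typing import Any, Iterable


def build_dataset(client: Any, job_ids: Iterable[str], name: str, strict: bool = False) -> Any:
    """Build a dataset from completed jobs and return the resulting Dataset object."""
    # Client-side precaution (not documented behavior): drop duplicate IDs,
    # keeping the first occurrence of each so the original order is preserved.
    unique_ids = list(dict.fromkeys(job_ids))
    # ASSUMPTION: the keyword names `job_ids` and `name` are hypothetical;
    # `strict` is the flag documented for build().
    return client.datasets.build(job_ids=unique_ids, name=name, strict=strict)


# Usage (job IDs and dataset name are placeholders):
# dataset = build_dataset(client, ["job_123", "job_456"], name="support-tickets-v1")
```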
All job IDs passed to build() must be in the completed state. Jobs with status queued, running, or failed are silently skipped unless you set strict=True, in which case build() raises IncompleteJobError instead.
Build from filtered jobs
A common pattern is to filter jobs by quality grade first, then pass only the IDs of the jobs that pass the filter to build().
Export a dataset
client.datasets.export() returns the dataset contents as raw bytes. Write them to disk with standard Python file I/O.
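Here is a sketch of the export-and-save step. It assumes export() takes the dataset ID as its first positional argument and the format as a format= keyword that accepts the values listed in the table below. Verify both assumptions against the API Reference. The bytes are written in binary mode, so Parquet and Arrow payloads reach disk unaltered.

```python
from pathlib import Path
from typing import Any, Union


def export_to_file(client: Any, dataset_id: str, path: Union[str, Path], fmt: str = "jsonl") -> Path:
    """Export a dataset and write the raw bytes returned by export() to disk."""
    # ASSUMPTION: positional dataset ID and a `format=` keyword are hypothetical.
    data = client.datasets.export(dataset_id, format=fmt)
    out = Path(path)
    # Create any missing parent directories before writing.
    out.parent.mkdir(parents=True, exist_ok=True)
    # write_bytes opens the file in binary mode, so no text encoding is applied.
    out.write_bytes(data)
    return out


# Usage (dataset ID and output path are placeholders):
# export_to_file(client, "ds_123", "exports/train.parquet", fmt="parquet")
```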
Supported export formats
| Format | format value | Best for |
|---|---|---|
| JSON Lines | "jsonl" | LLM fine-tuning, streaming ingestion |
| CSV | "csv" | Spreadsheet tools, quick inspection |
| Parquet | "parquet" | Columnar analytics, Spark / DuckDB pipelines |
| Markdown | "markdown" | Human review, RAG document stores |
| Arrow | "arrow" | High-performance in-memory data exchange |
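To inspect a text export without writing it to disk, you can decode the raw bytes in memory with the standard library. This sketch handles only the jsonl and csv formats and assumes the exports are UTF-8 encoded. Parquet and Arrow are binary columnar formats that need a library such as pyarrow. Markdown has no row structure to parse.

```python
import csv
import io
import json


def load_export(data: bytes, fmt: str) -> list:
    """Decode a jsonl or csv export into a list of row dicts."""
    # ASSUMPTION: text exports are UTF-8 encoded.
    text = data.decode("utf-8")
    if fmt == "jsonl":
        # One JSON object per line; blank lines are ignored.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if fmt == "csv":
        # The first row is the header; every value is returned as a string.
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"load_export handles only 'jsonl' and 'csv', got {fmt!r}")
```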
Profile a dataset
client.datasets.profile() returns statistics about your dataset, including token counts, field coverage, PII distribution, and grade breakdown. Use them for quality checks before fine-tuning.
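One way to turn the profile into a pre-fine-tuning gate is sketched below. The shape of the profile is not specified here, so the sketch assumes a dict with two hypothetical keys. field_coverage maps each field name to the fraction of records that contain it. pii maps each PII type to a number of detections. Adapt the key names and thresholds to the actual return type.

```python
def quality_issues(profile: dict, min_coverage: float = 0.95) -> list:
    """Return human-readable problems found in a dataset profile."""
    # ASSUMPTION: "field_coverage" and "pii" are hypothetical key names.
    issues = []
    for field, coverage in profile.get("field_coverage", {}).items():
        if coverage < min_coverage:
            issues.append(f"field {field!r} present in only {coverage:.0%} of records")
    for pii_type, count in profile.get("pii", {}).items():
        if count > 0:
            issues.append(f"{count} PII detection(s) of type {pii_type!r}")
    return issues


# Usage (the profile() argument is an assumption):
# problems = quality_issues(client.datasets.profile(dataset_id))
# if problems: print("\n".join(problems))
```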
Delete a dataset
Deleting a dataset removes the assembled export artifact. The underlying jobs and their extracted data are not deleted.
Next steps
Jobs
Learn how to filter and manage jobs before building datasets.
API Reference
Full parameter and return-type reference for all dataset methods.