/extract), a web data agent that turns a page into structured rows. Given a seed url and a natural-language query describing the rows you want, the agent extracts the matching records and returns them as a downloadable file.
Asynchronous lifecycle. Optional row schema and URL verification.
Extract is in closed beta and access is granted by request. Request access to try it.
How to use
- Send a `POST /extract` request with a seed `url` and a `q` describing which rows to extract. The endpoint immediately returns a task identifier and a `status` of `"pending"`.
- Poll `GET /extract/{id}` to retrieve a specific task, or `GET /extract` to list all your extract tasks.
- While running, `status` is `"pending"` or `"processing"`. Once the task is `"completed"`, read `output.resultUrl` and download the rows. If `"failed"`, inspect `error`.
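
The lifecycle above can be sketched in Python as a submit-then-poll loop. This is a minimal sketch under stated assumptions, not the official client: the base URL, the Bearer authorization header, and the `id` field name for the task identifier are placeholders that this page does not specify, so check them against the full reference. The `send` parameter is a hypothetical hook that makes the transport swappable.

```python
import json
import time
import urllib.request

# Assumptions (not taken from this page): the base URL, the Bearer auth
# header, and the "id" field name in responses are placeholders.
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"


def http_json(method, path, body=None):
    """Send a JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_extract(url, q, send=http_json, poll_seconds=5.0, max_polls=120):
    """Create an extract task, then poll until it is completed or failed."""
    task = send("POST", "/extract", {"url": url, "q": q})
    task_id = task["id"]  # assumed field name for the task identifier
    for _ in range(max_polls):
        task = send("GET", f"/extract/{task_id}")
        status = task["status"]
        if status == "completed":
            return task["output"]  # holds resultUrl, among other fields
        if status == "failed":
            raise RuntimeError(f"extract task failed: {task.get('error')}")
        # "pending" or "processing": wait before polling again
        time.sleep(poll_seconds)
    raise TimeoutError(f"task {task_id} still running after {max_polls} polls")
```

Calling `run_extract("https://example.com/catalog", "All products with name and price")` would return the task's `output` object once the task completes. The polling interval and the cap on attempts are arbitrary choices here, and a production client might prefer exponential backoff.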
Parameters
| Parameter | Description |
|---|---|
| `q` | Natural-language query describing which rows to extract and what each row should contain. Required. |
| `url` | The seed URL the extract task starts from. Required. |
| `schema` | Optional JSON schema describing a single extracted row. When provided, every returned row must match it. |
| `verifyUrls` | Whether URLs found in extracted rows are checked for reachability after extraction. Defaults to `false`. |
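
As an illustration of how these parameters might combine, the sketch below assembles a request body in Python. It assumes that `schema` is sent as a nested JSON object using standard JSON Schema keywords, which this page does not confirm. The helper `build_extract_body`, the example field names (`name`, `price`, `productUrl`), and the seed URL are hypothetical.

```python
import json


def build_extract_body(url, q, schema=None, verify_urls=False):
    """Assemble a JSON body for POST /extract from the documented parameters."""
    if not url or not q:
        raise ValueError("both 'url' and 'q' are required")
    body = {"url": url, "q": q}
    if schema is not None:
        body["schema"] = schema  # assumed to be sent as a nested JSON object
    if verify_urls:
        body["verifyUrls"] = True  # omitted otherwise, since the default is false
    return body


# Hypothetical row schema: each row is a product with a name, price, and link.
row_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "productUrl": {"type": "string"},
    },
    "required": ["name", "productUrl"],
}

body = build_extract_body(
    url="https://example.com/catalog",
    q="All products listed on the page, with name, price, and product URL",
    schema=row_schema,
    verify_urls=True,
)
print(json.dumps(body, indent=2))
```

Describing the row in `schema` as well as in `q` keeps the output shape predictable, since every returned row must then match the schema.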
Output
The extracted rows are not returned inline. Once the task is `"completed"`, `output` contains a `resultUrl`: a download link, valid for 24 hours, to a newline-delimited JSON (NDJSON) file with one row per line. `output` also reports `rowsReturned` and `creditsUsed`, the credits used by the task.
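
Because the result is an NDJSON file, reading it amounts to parsing one JSON document per line. The following is a minimal Python sketch; it assumes the download link can be fetched without additional authentication headers, which this page does not state, and the sample rows are invented for illustration.

```python
import json
import urllib.request


def parse_ndjson(text):
    """Parse newline-delimited JSON into a list of rows, skipping blank lines."""
    # Split on "\n" only: str.splitlines() would also break on characters
    # such as U+2028, which may legally appear inside JSON strings.
    return [json.loads(line) for line in text.split("\n") if line.strip()]


def download_rows(result_url):
    """Fetch the result file and parse it (assumes no extra auth is needed)."""
    with urllib.request.urlopen(result_url) as resp:
        return parse_ndjson(resp.read().decode("utf-8"))


# Invented sample data in the same one-row-per-line shape.
sample = '{"name": "Widget", "price": 9.5}\n{"name": "Gadget", "price": 12}\n'
rows = parse_ndjson(sample)
print(len(rows), rows[0]["name"])  # prints: 2 Widget
```

Since the link expires after 24 hours, it is safer to download and store the rows soon after the task completes than to keep the URL for later use.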
Read the full reference: POST /extract · GET /extract · GET /extract/:id. For operational guidance, see extract best practices.