Extract overview - Linkup API Documentation

Extract is in closed beta. Access is limited to a small group of users while we refine it — request access. Parameters, behavior, and response shape may change.

Extract is Linkup’s web data agent. Given a seed URL and a natural-language description of the rows you want, it extracts the matching records and returns them as structured rows. Use cases include:

turning a listing page (team, catalog, directory) into a table of records,
pulling repeated entities (people, products, jobs) with consistent fields, and
collecting links from a page and verifying they resolve.

The agent starts from the url you provide, extracts the rows described by q, and writes the result to a downloadable NDJSON file.

Parameters

Parameter	Type	Default	Description
`q`	`string`	(required)	Natural-language query describing which rows to extract and what each row should contain.
`url`	`string`	(required)	The seed URL the extract task starts from.
`schema`	`object`	`null`	JSON schema describing a single extracted row. When provided, every returned row must match it.
`verifyUrls`	`boolean`	`false`	Whether URLs found in extracted rows are checked for reachability after extraction.

The `schema` parameter

When schema is omitted, the agent infers the shape of each row from q and from the data structure on the website. Pass an explicit schema to pin the fields and types of every row. The schema describes one row — the agent applies it to each record it extracts. For example, to extract one row per team member:

{
  "type": "object",
  "properties": {
    "name": { "type": "string", "description": "Full name" },
    "role": { "type": "string" },
    "profileUrl": { "type": "string", "format": "uri" }
  },
  "required": ["name", "profileUrl"]
}

The `verifyUrls` parameter

When verifyUrls is true, any URL found in an extracted row is checked for reachability after extraction. This adds latency but filters out dead links. Leave it false when you only need the raw extracted values.

Output

Extract does not return the extracted rows in the API response. The output object contains a resultUrl — a download link to a separate file. You must make a second request to that URL to get the data. The link is valid for 24 hours.

Once the task is "completed", output holds a link to a newline-delimited JSON (NDJSON) file — one extracted row per line — rather than the rows themselves:

Field	Type	Description
`creditsUsed`	`number`	Credits used by this extract task.
`resultUrl`	`string`	Download link for the extracted rows as an NDJSON file. The rows live here, not in the API response. Valid for 24 hours.
`rowsReturned`	`integer`	Number of rows in the downloadable file.

To get the data: poll GET /v1/extract/:id until status is "completed", read output.resultUrl, then make a separate GET request to that URL to download and parse the NDJSON file (one JSON object per line).
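The download-and-parse step can be sketched in Python with only the standard library. The function names are hypothetical, and the sketch assumes the file is UTF-8 encoded and that the download link needs no extra headers.

```python
import json
import urllib.request


def parse_ndjson(text):
    """Parse newline-delimited JSON into a list of rows, skipping blank lines."""
    rows = []
    # Split on "\n" only: str.splitlines() would also split on characters
    # that may legally appear unescaped inside a JSON string.
    for line_number, line in enumerate(text.split("\n"), start=1):
        line = line.strip()
        if not line:
            continue
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc
    return rows


def download_rows(result_url):
    """Fetch output.resultUrl and return the parsed rows (assumes UTF-8)."""
    with urllib.request.urlopen(result_url) as response:
        return parse_ndjson(response.read().decode("utf-8"))
```

Comparing `len(download_rows(result_url))` with `output.rowsReturned` is a cheap way to confirm the file was read completely.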

Pricing

Extract uses variable pricing: cost scales with crawl complexity, meaning page size, number of rows extracted, pagination depth, and whether verifyUrls is enabled. Most completed tasks fall in the $2–10 range. The exact charge appears as creditsUsed in the task output once status is "completed". No credits are deducted for failed tasks. A minimum account balance of $10 is required to submit an extract task. See pricing for billing details.

Async lifecycle

POST /v1/extract returns immediately with a task identifier and status set to "pending". Subsequent calls to GET /v1/extract/:id return the current state until status is "completed" or "failed". GET /v1/extract is also available to list all extract tasks for the account. See the list reference.

POST /v1/extract            GET /v1/extract/:id           GET /v1/extract/:id
        │                            │                            │
        ▼                            ▼                            ▼
   { id, status:           { id, status: "processing" }   { id, status: "completed",
     "pending" }                    (poll)                     output: { resultUrl } }

Poll every 30 seconds for long-running tasks. Polling faster than 1 request per second will be rate-limited.
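A minimal polling loop that follows this guidance might look like the sketch below. `wait_for_task` and its `fetch_task` callback are hypothetical names: `fetch_task` stands for whatever code performs GET /v1/extract/:id and returns the parsed task envelope. The overall timeout is an assumption of this sketch, not a documented limit.

```python
import time


def wait_for_task(fetch_task, interval_seconds=30.0, timeout_seconds=3600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Call fetch_task() until the task status is "completed" or "failed".

    fetch_task: zero-argument callable returning the task envelope as a dict.
    timeout_seconds: client-side cutoff chosen for this sketch (an assumption).
    """
    if interval_seconds < 1.0:
        # Polling faster than 1 request per second would be rate-limited.
        raise ValueError("interval_seconds must be at least 1")
    deadline = clock() + timeout_seconds
    while True:
        task = fetch_task()
        if task["status"] in ("completed", "failed"):
            return task
        if clock() + interval_seconds > deadline:
            raise TimeoutError("extract task did not finish before the timeout")
        sleep(interval_seconds)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting.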

Example

Get your API key

Create a Linkup account for free to get your API key.

# 1. Submit
curl -X POST "https://api.linkup.so/v1/extract" \
  -H "Authorization: Bearer $LINKUP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "q": "All engineering team members with their name, role, and profile page",
    "url": "https://example.com/team",
    "schema": {
      "type": "object",
      "properties": {
        "name": { "type": "string", "description": "Full name" },
        "role": { "type": "string" },
        "profileUrl": { "type": "string", "format": "uri" }
      },
      "required": ["name", "profileUrl"]
    },
    "verifyUrls": true
  }'
# → { "id": "01234-abcd-56789", "status": "pending", ... }

# 2. Poll
curl "https://api.linkup.so/v1/extract/01234-abcd-56789" \
  -H "Authorization: Bearer $LINKUP_API_KEY"

# 3. Download the rows once status is "completed"
curl "<resultUrl from output>"
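For readers who prefer Python, the submit step could be sketched with the standard library as follows. The endpoint, headers, and body fields mirror the curl call above; the function names are hypothetical, and error handling for non-2xx responses is left out.

```python
import json
import urllib.request

API_BASE = "https://api.linkup.so/v1"


def build_extract_request(api_key, q, url, schema=None, verify_urls=False):
    """Build (but do not send) the POST /v1/extract request."""
    body = {"q": q, "url": url, "verifyUrls": verify_urls}
    if schema is not None:
        # schema is optional; when omitted the agent infers the row shape.
        body["schema"] = schema
    return urllib.request.Request(
        f"{API_BASE}/extract",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_extract(api_key, q, url, schema=None, verify_urls=False):
    """Send the request and return the parsed task envelope."""
    request = build_extract_request(api_key, q, url, schema, verify_urls)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Splitting request construction from sending makes it possible to inspect the payload without spending credits.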

POST /v1/extract returns the task envelope immediately, with status set to "pending" and output set to null. GET /v1/extract/:id returns the same envelope; once status is "completed", output is populated:

{
  "id": "01234-abcd-56789",
  "type": "extract",
  "status": "completed",
  "createdAt": "2026-01-01T00:00:00.000Z",
  "updatedAt": "2026-01-01T00:03:12.000Z",
  "error": null,
  "input": {
    "q": "All engineering team members with their name, role, and profile page",
    "url": "https://example.com/team",
    "schema": {
      "type": "object",
      "properties": {
        "name": { "type": "string", "description": "Full name" },
        "role": { "type": "string" },
        "profileUrl": { "type": "string", "format": "uri" }
      },
      "required": ["name", "profileUrl"]
    },
    "verifyUrls": true
  },
  "output": {
    "creditsUsed": 2.84,
    "resultUrl": "https://<download-host>/extract/01234-abcd-56789.ndjson?...",
    "rowsReturned": 24
  }
}
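Client code usually needs to branch on the envelope's status before touching output. The sketch below is one way to do that; `result_url` is a hypothetical helper, and because the shape of the error field on failure is not shown here, it is included in the exception message as-is.

```python
def result_url(task):
    """Return output.resultUrl for a completed task, or None if still running.

    Raises RuntimeError when the task has failed. The structure of
    task["error"] is not assumed; it is passed through unchanged.
    """
    status = task.get("status")
    if status == "failed":
        raise RuntimeError(f"extract task {task.get('id')} failed: {task.get('error')}")
    if status != "completed":
        # output is null until the task completes.
        return None
    return task["output"]["resultUrl"]
```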

Each line of the downloaded NDJSON file is one row matching schema (or the shape inferred from q and the data structure on the website):

{"name": "Ada Lovelace", "role": "Staff Engineer", "profileUrl": "https://example.com/team/ada"}
{"name": "Alan Turing", "role": "Principal Engineer", "profileUrl": "https://example.com/team/alan"}

Next

Best practices

Query phrasing, schema design, and polling.

For AI agents

Tool definition and integration prompt.

API reference

Full parameter spec and response schema.