This page is structured for direct use as integration context for a coding agent, or as a function-calling tool definition. Operational guidance is repeated inline so the page is self-contained.

Linkup /extract integration guide

You are integrating the Linkup /extract API, a web data agent in closed beta (access is granted by request) that starts from a seed URL and returns structured rows of data. Given a url and a natural-language query q describing the rows to collect, the agent extracts the matching records and writes them to a downloadable NDJSON file. Use it to turn listing pages (teams, catalogs, directories, job boards) into tables of consistent records. A request takes two required fields (q, url), an optional row schema, and an optional verifyUrls flag. The lifecycle is asynchronous: POST returns the task (with its id) immediately; poll GET /extract/:id until status is "completed", then download output.resultUrl (the rows are not in the API response).

When to use it

Use /extract when the data consists of repeated records on a known page and you want them as structured rows. If you only need one page’s content as markdown, use /fetch. If you need multi-source synthesis with citations, use /research. Other endpoints in the API:
  • Search (/search): synchronous web search.
  • Fetch (/fetch): one known URL as markdown.
  • Research (/research): autonomous multi-source research agent.
  • Tasks (/tasks): asynchronous batch wrapper.

Setup

export LINKUP_API_KEY="your-api-key"

Example (curl; adapt to the project’s HTTP client)

# /extract is async: POST returns immediately with an id. Poll until completed,
# then download the NDJSON result from output.resultUrl (valid 24h).

# 1. Submit
curl -X POST "https://api.linkup.so/v1/extract" \
  -H "Authorization: Bearer $LINKUP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "q": "All engineering team members with their name, role, and profile page",
    "url": "https://example.com/team",
    "schema": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "role": { "type": "string" },
        "profileUrl": { "type": "string", "format": "uri" }
      },
      "required": ["name", "profileUrl"]
    },
    "verifyUrls": true
  }'

# 2. Poll (initial 2s, back off to 10s, max 1 request/second)
curl "https://api.linkup.so/v1/extract/<id>" \
  -H "Authorization: Bearer $LINKUP_API_KEY"

# 3. Download rows once status == "completed"
curl "<output.resultUrl>"

Tool definition (OpenAI function-calling format)

For the Anthropic format, remove the "type": "function" envelope (name and description move to the top level) and rename parameters to input_schema. Note that this tool is async: the handler should poll on the model’s behalf and return the completed result (or the downloaded rows), not the task id.
{
  "type": "function",
  "function": {
    "name": "linkup_extract",
    "description": "Extracts structured rows of data from a web page. Provide a seed URL and a natural-language description of the rows to collect; returns the rows as an NDJSON file. Use for repeated records on a known page (team members, products, jobs, directory entries). Async: the handler should poll until completion and download the result.",
    "parameters": {
      "type": "object",
      "properties": {
        "q": {
          "type": "string",
          "description": "Natural-language query describing which rows to extract and what each row should contain. Name the unit of repetition (one person, one product, one job) and the fields each row must include."
        },
        "url": {
          "type": "string",
          "description": "The seed URL the extract task starts from. Point it at the page that already lists the records, not the site homepage."
        },
        "schema": {
          "type": "object",
          "description": "Optional JSON schema describing a single extracted row. When provided, every returned row must match it. Keep it flat; use format 'uri' for link fields. Omit to let the agent infer the shape from q and the data structure on the website."
        },
        "verifyUrls": {
          "type": "boolean",
          "description": "Whether URLs found in extracted rows are checked for reachability after extraction. Adds latency. Defaults to false.",
          "default": false
        }
      },
      "required": ["q", "url"]
    }
  }
}
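
The conversion to the Anthropic format described above can be sketched in Python. This is a minimal illustration, not part of the Linkup API: to_anthropic_tool is a hypothetical helper name, and the definition passed to it is abbreviated to keep the example short.

```python
import json


def to_anthropic_tool(openai_tool: dict) -> dict:
    """Unwrap an OpenAI function-calling tool into the Anthropic tool shape."""
    if openai_tool.get("type") != "function" or "function" not in openai_tool:
        raise ValueError("expected an OpenAI tool with a 'function' envelope")
    fn = openai_tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # Anthropic calls the JSON schema 'input_schema' instead of 'parameters'.
        "input_schema": fn["parameters"],
    }


# Abbreviated copy of the definition above, for illustration only.
openai_tool = {
    "type": "function",
    "function": {
        "name": "linkup_extract",
        "description": "Extracts structured rows of data from a web page.",
        "parameters": {
            "type": "object",
            "properties": {
                "q": {"type": "string"},
                "url": {"type": "string"},
                "schema": {"type": "object"},
                "verifyUrls": {"type": "boolean", "default": False},
            },
            "required": ["q", "url"],
        },
    },
}

anthropic_tool = to_anthropic_tool(openai_tool)
print(json.dumps(anthropic_tool, indent=2))
```

The schema body itself is passed through unchanged; only the wrapper and the key name differ.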

Operational guidance (inline)

Seed URL

Point url at the page that already contains the records (listing, directory, catalog), not a homepage. The more directly the seed page lists the rows, the more reliable the extraction.

Query phrasing

q should name the entity that defines one row and the fields each row must contain, plus any filter limiting which records qualify.

Schema design

Flat schemas with primitive fields are most reliable. Mark only mandatory fields as required; over-constraining drops valid rows. Reshape into nested structures client-side after extraction.
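
As an illustration of reshaping client-side, the sketch below nests flat rows after extraction. It assumes rows shaped like the schema in the curl example (name, role, profileUrl); the reshape_row helper, the nested layout, and the sample values are hypothetical.

```python
def reshape_row(row: dict) -> dict:
    """Turn one flat extracted row into a nested record (hypothetical layout)."""
    return {
        "person": {
            "name": row["name"],       # required in the example schema
            "role": row.get("role"),   # optional there, so it may be missing
        },
        "links": {
            "profile": row["profileUrl"],  # required in the example schema
        },
    }


# Made-up sample rows in the flat shape the example schema describes.
flat_rows = [
    {"name": "Ada Example", "role": "Staff Engineer",
     "profileUrl": "https://example.com/team/ada"},
    {"name": "Lin Example", "profileUrl": "https://example.com/team/lin"},
]

nested_rows = [reshape_row(row) for row in flat_rows]
```

Keeping role optional in both the schema and the reshaping step means a row without a listed role is kept rather than dropped.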

URL verification

Set verifyUrls to true only when extracted links must resolve; it adds latency.
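
The request fields covered above (q, url, schema, verifyUrls) can be assembled with a small helper before the POST. This is a sketch only: build_extract_payload is a hypothetical name, and the client-side checks (non-empty q, http or https url) are assumptions about sensible input, not documented API rules.

```python
from urllib.parse import urlparse


def build_extract_payload(q, url, schema=None, verify_urls=False):
    """Assemble the JSON body for POST /extract (hypothetical helper)."""
    if not q or not q.strip():
        raise ValueError("q must describe the rows to extract")
    parsed = urlparse(url)
    # Assumption: the seed URL is an absolute http(s) URL.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("url must be an absolute http(s) URL")

    payload = {"q": q.strip(), "url": url}
    if schema is not None:
        payload["schema"] = schema  # optional: describes a single row
    if verify_urls:
        payload["verifyUrls"] = True  # omitted otherwise; the default is false
    return payload


payload = build_extract_payload(
    q="All engineering team members with their name, role, and profile page",
    url="https://example.com/team",
    schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "role": {"type": "string"},
            "profileUrl": {"type": "string", "format": "uri"},
        },
        "required": ["name", "profileUrl"],
    },
    verify_urls=True,
)
```

The resulting dictionary matches the body of the curl example and can be serialized with json.dumps by whichever HTTP client the project uses.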

Reading the result

output.resultUrl points to an NDJSON file (one JSON object per line), valid for 24 hours. Download and persist it promptly. output.rowsReturned gives the row count. output.creditsUsed gives the credits used by the task.
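
A minimal sketch of reading the downloaded file line by line follows. parse_ndjson is a hypothetical helper, the sample text is made up, and the optional comparison against output.rowsReturned is a suggested sanity check rather than documented behavior.

```python
import json


def parse_ndjson(text, expected_rows=None):
    """Parse NDJSON text into a list of rows, one JSON object per line."""
    rows = []
    # Split on "\n" only: str.splitlines() would also split on characters
    # that may legally appear inside a JSON string.
    for line_number, line in enumerate(text.split("\n"), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines and a trailing newline
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc
    if expected_rows is not None and len(rows) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, parsed {len(rows)}")
    return rows


# Made-up content standing in for the file behind output.resultUrl.
sample = (
    '{"name": "Ada Example", "profileUrl": "https://example.com/team/ada"}\n'
    '{"name": "Lin Example", "profileUrl": "https://example.com/team/lin"}\n'
)
rows = parse_ndjson(sample, expected_rows=2)
```

Passing the whole file to json.loads would fail, because the file is a sequence of JSON objects rather than a single document.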

Polling

  • Interval: start at about 2 seconds and back off to 10 seconds, as in the curl example; for long-running tasks, poll roughly every 30 seconds.
  • Maximum poll rate: 1 request per second.
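
The polling rules can be sketched as a loop with backoff. This is an illustration under assumptions: poll_until_completed is a hypothetical helper, fetch_task stands for a caller-supplied function that performs GET /extract/:id and returns the parsed JSON body, status is assumed to be a top-level field, and the backoff factor and overall timeout are arbitrary choices. The 2 second start and 10 second cap come from the curl example.

```python
import time
from typing import Callable


def poll_until_completed(
    fetch_task: Callable[[], dict],
    sleep: Callable[[float], None] = time.sleep,
    initial_delay: float = 2.0,
    max_delay: float = 10.0,
    factor: float = 1.5,      # assumption: growth rate is not documented
    timeout: float = 1800.0,  # assumption: give up after 30 minutes
) -> dict:
    """Poll until the task reports status "completed", backing off between calls."""
    # Never wait less than 1 second, to stay under 1 request per second.
    delay = max(1.0, initial_delay)
    waited = 0.0
    while True:
        task = fetch_task()
        if task.get("status") == "completed":
            return task
        # Other terminal states are not documented here, so the loop relies
        # on the timeout instead of guessing status names.
        if waited + delay > timeout:
            raise TimeoutError(f"task not completed after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * factor, max_delay)


# Demo with canned responses; "in-progress" is a placeholder value, not a
# documented status. Sleeps are recorded instead of actually waiting.
responses = iter([
    {"id": "task-1", "status": "in-progress"},
    {"id": "task-1", "status": "in-progress"},
    {"id": "task-1", "status": "completed", "output": {"rowsReturned": 2}},
])
recorded_sleeps = []
result = poll_until_completed(lambda: next(responses), sleep=recorded_sleeps.append)
```

Injecting fetch_task and sleep keeps the loop independent of any HTTP client and makes it easy to exercise without network access.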

Constraints

  • Parse the result as NDJSON (line by line), not as a single JSON document.
  • The resultUrl expires after 24 hours.
  • Polling without backoff triggers rate limits without reducing time-to-completion.

Beta status

The Extract endpoint is currently in beta. Behavior, parameters, and response shape may change.