Linkup /extract integration guide
You are integrating the Linkup /extract API: a closed-beta web data agent
(access is granted by request) that starts from a seed URL and returns
structured rows of data. Given a url and
a natural-language query q describing the rows to collect, the agent
extracts the matching records and writes them to a downloadable NDJSON file.
Use it to turn listing pages (teams, catalogs, directories, job boards) into
tables of consistent records.
A request takes two required fields (q, url), an optional row schema, and an
optional verifyUrls flag. The lifecycle is asynchronous: POST /extract returns
the task (with its id) immediately; poll GET /extract/:id until status is
"completed", then download output.resultUrl (the rows are not included in the
API response).
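A minimal sketch of assembling the POST /extract body from the fields above, with the two required fields checked client-side. The function name build_extract_body is hypothetical, and the JSON key for the optional row schema is assumed to be "schema" (the guide names the concept but not the key); q, url, and verifyUrls come from the guide.

```python
from typing import Any, Optional


def build_extract_body(
    q: str,
    url: str,
    schema: Optional[dict[str, Any]] = None,
    verify_urls: bool = False,
) -> dict[str, Any]:
    """Assemble a POST /extract body: q and url required, schema and verifyUrls optional."""
    if not q.strip():
        raise ValueError("q (the natural-language query) is required")
    if not url.strip():
        raise ValueError("url (the seed URL) is required")
    body: dict[str, Any] = {"q": q, "url": url}
    if schema is not None:
        # Assumption: the row schema is sent under the key "schema".
        body["schema"] = schema
    if verify_urls:
        # Only send the flag when links must resolve; verification adds latency.
        body["verifyUrls"] = True
    return body
```

Omitting the optional keys entirely, rather than sending null values, keeps the request minimal.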
When to use it
Use /extract when the data consists of repeated records on a known page and
you want them as structured rows. If you only need one page’s content as
markdown, use /fetch. If the value lies in multi-source synthesis with
citations, use /research.
Other endpoints in the API:
- Search (/search): synchronous web search.
- Fetch (/fetch): one known URL as markdown.
- Research (/research): autonomous multi-source research agent.
- Tasks (/tasks): asynchronous batch wrapper.
Setup
Example (curl; adapt to the project’s HTTP client)
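A minimal Python sketch using only the standard library's urllib.request, covering the two calls in the lifecycle: POST /extract and GET /extract/:id. BASE_URL and API_KEY are hypothetical placeholders because the guide gives neither the host nor the authentication scheme; bearer-token auth is an assumption.

```python
import json
import urllib.request

# Hypothetical placeholders: the guide does not give the base URL or auth scheme.
BASE_URL = "https://api.example.invalid/v1"
API_KEY = "YOUR_API_KEY"


def _headers() -> dict[str, str]:
    # Assumption: bearer-token authentication.
    return {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}


def create_task_request(body: dict) -> urllib.request.Request:
    """POST /extract: the response is the task (including its id), returned immediately."""
    return urllib.request.Request(
        f"{BASE_URL}/extract",
        data=json.dumps(body).encode("utf-8"),
        headers=_headers(),
        method="POST",
    )


def get_task_request(task_id: str) -> urllib.request.Request:
    """GET /extract/:id: the response is the task with its current status."""
    return urllib.request.Request(
        f"{BASE_URL}/extract/{task_id}", headers=_headers(), method="GET"
    )


def send(req: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Building the Request objects separately from send() keeps them easy to inspect or to swap onto another HTTP client.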
Tool definition (OpenAI function-calling format)
For the Anthropic format, remove the "type": "function" envelope and rename
parameters to input_schema. Note that this tool is asynchronous: the handler
should poll on the model’s behalf and return the completed result (or the
downloaded rows), not the task id.
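A sketch of a tool definition in OpenAI function-calling format, plus the conversion described above. The tool name linkup_extract and the description strings are hypothetical; only the parameter names q, url, and verifyUrls come from the guide. The optional row schema is left out of the tool's parameters here for simplicity.

```python
import copy

# Hypothetical tool name and descriptions; parameter names follow the guide.
OPENAI_TOOL = {
    "type": "function",
    "function": {
        "name": "linkup_extract",
        "description": "Extract structured rows from a listing page, starting at a seed URL.",
        "parameters": {
            "type": "object",
            "properties": {
                "q": {
                    "type": "string",
                    "description": "Entity that defines one row, the fields each row needs, and any filter.",
                },
                "url": {"type": "string", "description": "Seed page that already lists the records."},
                "verifyUrls": {
                    "type": "boolean",
                    "description": "Check that extracted links resolve (adds latency).",
                },
            },
            "required": ["q", "url"],
        },
    },
}


def to_anthropic(openai_tool: dict) -> dict:
    """Drop the "type": "function" envelope and rename parameters to input_schema."""
    fn = copy.deepcopy(openai_tool["function"])  # leave the OpenAI definition untouched
    fn["input_schema"] = fn.pop("parameters")
    return fn
```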
Operational guidance (inline)
Seed URL
Point url at the page that already contains the records (listing,
directory, catalog), not at a homepage. The more directly the seed page lists
the rows, the more reliable the extraction.
Query phrasing
q should name the entity that defines one row and the fields each row must
contain, plus any filter limiting which records qualify.
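One way to keep q consistent across calls is to build it from the three parts named above. This is a hypothetical helper with an illustrative phrasing template, not a format the API requires; q is free-form natural language.

```python
def compose_query(entity: str, fields: list[str], row_filter: str = "") -> str:
    """Build q from the row entity, the fields each row needs, and an optional filter."""
    query = f"Each row is one {entity}. Include: {', '.join(fields)}."
    if row_filter:
        query += f" Only include {row_filter}."
    return query
```

For example, compose_query("job posting", ["title", "location"], "remote roles") names the row entity, the required fields, and the qualifying filter in one sentence each.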
Schema design
Flat schemas with primitive fields are the most reliable. Mark only mandatory
fields as required; over-constraining drops valid rows. Reshape into nested
structures client-side after extraction.
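A sketch of the pattern: a flat row schema with primitive fields and a single required field, then a client-side reshape into a nested object. Expressing the schema as JSON Schema is an assumption suggested by the "required" wording above; the field names and the office_ prefix convention are hypothetical.

```python
# Flat JSON Schema for one row: primitive fields only, minimal "required".
ROW_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "role": {"type": "string"},
        "office_city": {"type": "string"},
        "office_country": {"type": "string"},
    },
    "required": ["name"],  # only the truly mandatory field
}


def nest_office(row: dict) -> dict:
    """Client-side reshape: fold office_* fields into a nested "office" object."""
    nested = {k: v for k, v in row.items() if not k.startswith("office_")}
    office = {k.removeprefix("office_"): v for k, v in row.items() if k.startswith("office_")}
    if office:
        nested["office"] = office
    return nested
```

Rows missing the optional office fields still pass the schema and simply come back without an "office" key.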
URL verification
Set verifyUrls to true only when extracted links must resolve; it adds
latency.
Reading the result
output.resultUrl points to an NDJSON file (one JSON object per line) that is
valid for 24 hours, so download and persist it promptly. output.rowsReturned
gives the row count, and output.creditsUsed gives the credits the task consumed.
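A minimal sketch of reading the downloaded file line by line and cross-checking the row count against output.rowsReturned. It assumes the completed task is a dict with an "output" object holding the fields named above; skipping blank lines (such as a trailing newline) is a defensive choice, not something the guide specifies.

```python
import json
from typing import Any, Iterable


def parse_ndjson(lines: Iterable[str]) -> list[dict[str, Any]]:
    """Parse NDJSON: one JSON object per line; blank lines are skipped."""
    rows = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid JSON on line {lineno}: {exc}") from exc
    return rows


def summarize(task: dict[str, Any], rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Compare the downloaded row count with output.rowsReturned."""
    output = task["output"]
    return {
        "rows": len(rows),
        "rowsReturned": output.get("rowsReturned"),
        "creditsUsed": output.get("creditsUsed"),
        "counts_match": len(rows) == output.get("rowsReturned"),
    }
```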
Polling
- Interval: poll roughly every 30 seconds for long-running tasks.
- Maximum poll rate: 1 request per second.
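A sketch of a delay schedule that respects both limits: exponential growth from a short initial delay, never faster than one request per second and capped at the 30-second interval. The initial delay of 2 seconds and the doubling factor are illustrative choices, not values from the guide.

```python
from typing import Iterator

MIN_INTERVAL_S = 1.0   # guide: at most 1 request per second
MAX_INTERVAL_S = 30.0  # guide: roughly every 30 s for long-running tasks


def poll_delays(initial: float = 2.0, factor: float = 2.0) -> Iterator[float]:
    """Yield delays between polls: exponential growth, clamped to [1 s, 30 s]."""
    delay = initial
    while True:
        yield min(max(delay, MIN_INTERVAL_S), MAX_INTERVAL_S)
        delay = min(delay * factor, MAX_INTERVAL_S)  # stop growing once at the cap
```

With the defaults, the first six delays are 2, 4, 8, 16, 30 and 30 seconds.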
Constraints
- Parse the result as NDJSON (line by line), not as a single JSON document.
- The resultUrl expires after 24 hours.
- Polling without backoff triggers rate limits without reducing time-to-completion.
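A minimal sketch of the async tool handler: it creates the task, polls with capped backoff, downloads the result once, and returns parsed rows rather than the task id. The create, get, and download callables are hypothetical caller-supplied wrappers around POST /extract, GET /extract/:id, and the resultUrl download. The guide only names the "completed" status, so this sketch treats every other status as still in progress and relies on a wait limit (max_wait_s, an illustrative default) instead of guessing at failure states.

```python
import json
import time
from typing import Any, Callable


def run_extract(
    create: Callable[[dict], dict],
    get: Callable[[str], dict],
    download: Callable[[str], str],
    body: dict,
    *,
    max_wait_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[dict[str, Any]]:
    """Create the task, poll until status is "completed", return the parsed rows."""
    task = create(body)  # POST /extract returns the task immediately
    task_id = task["id"]
    delay, waited = 2.0, 0.0
    while task.get("status") != "completed":
        if waited >= max_wait_s:
            raise TimeoutError(f"task {task_id} not completed after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, 30.0)  # back off, capped at the ~30 s interval
        task = get(task_id)  # GET /extract/:id
    # Download right away: resultUrl expires after 24 hours.
    text = download(task["output"]["resultUrl"])
    # NDJSON: parse line by line, skipping blank lines.
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Injecting sleep and the HTTP wrappers keeps the handler testable without network access; in production, sleep defaults to time.sleep.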