Get outputs for a scraping group
Fetches all outputs for a given scraping group. The results are a materialized view of the group's outputs, which means they are deduplicated. How often this view is updated depends on how often the group is scheduled to be re-scraped.
The results are paginated and sorted by the created_at field in ascending order. You can fetch the next page by using the next_url or next_cursor fields in the response metadata.
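The pagination loop can be sketched roughly as follows. This is a minimal illustration, not the official client: only the next_cursor field name comes from this page. The `fetch_page` callable, the `data` key for the list of outputs, and the `metadata` key for the response metadata are assumptions made for the example.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Assumed page shape: {"data": [...], "metadata": {"next_cursor": <int or None>}}
Page = Dict[str, Any]


def iter_outputs(fetch_page: Callable[[Optional[int]], Page]) -> Iterator[Dict[str, Any]]:
    """Yield every output across all pages by following next_cursor."""
    cursor: Optional[int] = None  # None means "start from the first page"
    while True:
        page = fetch_page(cursor)
        # "data" is an assumed key for the list of outputs on a page.
        yield from page.get("data", [])
        # "metadata" is an assumed key for the response metadata.
        next_cursor = page.get("metadata", {}).get("next_cursor")
        # Stop when there is no next page, or if the cursor fails to advance.
        if next_cursor is None or next_cursor == cursor:
            return
        cursor = next_cursor
```

In practice, `fetch_page` would wrap the HTTP request for this endpoint and pass the cursor along as a query parameter. Keeping the transport separate from the loop makes the paging logic easy to test with canned pages.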
Typically, you’d also want to provide a created_after filter to fetch only outputs created after a certain date. This is useful when you want to fetch only the outputs added since your last fetch, which lets you maintain a “real-time” view of the outputs.
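One way to keep such an incremental view is sketched below. It rests on several assumptions that this page does not confirm: that each output carries an ISO 8601 created_at string beginning with YYYY-MM-DD, that each output has a unique `id` field, and that the date filter is inclusive at day granularity, so outputs from the boundary day can be returned again. The helper names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional, Set


def next_created_after(outputs: Iterable[Dict[str, Any]], previous: Optional[str]) -> Optional[str]:
    """Return the newest YYYY-MM-DD date seen, to use as the next created_after."""
    latest = previous
    for output in outputs:
        day = output["created_at"][:10]  # assumes an ISO 8601 timestamp
        # ISO dates compare correctly as plain strings.
        if latest is None or day > latest:
            latest = day
    return latest


def drop_seen(outputs: Iterable[Dict[str, Any]], seen_ids: Set[str]) -> List[Dict[str, Any]]:
    """Keep only outputs whose (assumed) id has not been processed yet."""
    fresh = []
    for output in outputs:
        if output["id"] not in seen_ids:
            seen_ids.add(output["id"])  # remember it for later fetches
            fresh.append(output)
    return fresh
```

After each fetch, store the value returned by `next_created_after` and pass it as created_after on the next request. `drop_seen` then filters out the boundary-day outputs that were already handled.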
Path Parameters
ID of the scraping group you want to fetch outputs for. Can be found in the URL of the scraping group page.
Query Parameters
ID of the root scraping job you want to fetch outputs for. If not provided, all jobs will be used. Using this parameter allows you to fetch results for a specific domain.
"03583f9c-6c90-4f3c-9afd-186258d6f4d6"
Date (inclusive) in format YYYY-MM-DD
"2023-02-14"
Number of results to return per page
1 <= x <= 1000
Cursor to paginate through results
x >= 0
Name of the country to filter by (e.g., United States)
"United States"
ISO 3166-2 code of the region to filter by (e.g., US-CA)
"US-CA"
Response
Represents a single output from a scraping job. This is the data that was extracted from a website by a scraper. The data field contains the extracted data normalized to the schema of the scraping group.
The files field contains the files that were extracted by the scraper. Each file can be downloaded using the URL in its s3_url field.
The change type indicates whether the output was created for the first time or whether it is an update to, or a deletion of, an existing record. See ChangesetAction for more details.