Get outputs for a scraping group

curl --request GET \
  --url https://api.reworkd.dev/v1/outputs/{group_id} \
  --header 'Authorization: Bearer <token>'

{
  "metadata": {
    "next_url": "https://api.reworkd.dev/v1/items?cursor=2612294",
    "next_cursor": 2612294
  },
  "items": [
    {
      "id": 123,
      "output_id": "206fdc72-10c0-468c-996f-ff3f8cc51592",
      "root_job_id": "497dcba3-ecbf-4587-a2dd-5eb0665e6880",
      "is_approved": true,
      "create_date": "2023-02-14T00:00:00",
      "update_date": "2023-02-14T16:00:00",
      "last_scraped_date": "2023-02-14T16:00:00",
      "source_url": "http://example.com",
      "change_type": "CREATE",
      "key_hash": "d41d8cd98f00b204e9800998ecf8427e",
      "value_hash": "d41d8cd98f00b204e9800998ecf8427e",
      "data": {
        "age": 30,
        "email": "john@example.com",
        "images": [
          "http://example.com/profile.jpg",
          "http://example.com/photo.jpg"
        ],
        "name": "John Doe"
      },
      "tags": {
        "country": "US",
        "external_customer_id": "12345",
        "region": "US-CA"
      },
      "files": [
        {
          "create_date": "2023-02-14T00:00:00",
          "field": "image[0]",
          "file_checksum": "d41d8cd98f00b204e9800998ecf8427e",
          "file_metadata": {},
          "file_type": "jpg",
          "file_url": "http://example.com/example.jpg",
          "id": "206fdc72-10c0-468c-996f-ff3f8cc51592",
          "s3_key": "example.jpg",
          "s3_url": "https://files.reworkd.dev/d41d8cd98f00b204e9800998ecf8427e.jpg",
          "source_url": "http://example.com",
          "url_etag_hash": "d41d8cd98f00b204e9800998ecf8427e"
        }
      ]
    }
  ]
}


Fetches all outputs for a given scraping group. The results are a materialized view of the group's outputs, which means they are deduplicated. How often this view is updated depends on how often the group is scheduled to be re-scraped.

The results are paginated and sorted by the create_date of items in ascending order. You can fetch the next page by using the next_url or next_cursor fields in the response metadata.
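The pagination described above can be sketched as a small generator. This is a minimal illustration, not an official client: `iter_outputs`, `fake_fetch`, and the canned pages are hypothetical names, and the stopping rule (an empty page, a missing `next_cursor`, or a cursor that does not advance) is an assumption, since the page does not say how the last page is signaled.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_outputs(fetch_page: Callable[[int], Page], start_cursor: int = 0,
                 max_pages: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield every item, following metadata.next_cursor from page to page."""
    cursor: Optional[int] = start_cursor
    for _ in range(max_pages):  # safety cap so a bad cursor cannot loop forever
        page = fetch_page(cursor)
        items = page.get("items", [])
        yield from items
        next_cursor = page.get("metadata", {}).get("next_cursor")
        # Assumption: an empty page, a missing cursor, or an unchanged cursor
        # means there is nothing more to fetch.
        if not items or next_cursor is None or next_cursor == cursor:
            return
        cursor = next_cursor


# Stand-in for a real HTTP call: two canned pages keyed by cursor.
_FAKE_PAGES = {
    0: {"metadata": {"next_cursor": 2}, "items": [{"id": 1}, {"id": 2}]},
    2: {"metadata": {"next_cursor": None}, "items": [{"id": 3}]},
}


def fake_fetch(cursor: int) -> Page:
    return _FAKE_PAGES[cursor]


ids = [item["id"] for item in iter_outputs(fake_fetch)]
print(ids)
```

In real use, `fetch_page` would issue the GET request with the `cursor` query parameter set to the value it receives and return the decoded JSON body.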

Typically, you’d also provide a created_after filter to fetch only outputs created after a certain date. This is useful for fetching outputs that are new since your last fetch, which lets you maintain a “real-time” view of the outputs.
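As a rough sketch of an incremental fetch, the snippet below formats the time of the previous sync as an ISO 8601 timestamp and puts it in the `created_after` query parameter. The helper names and the `my-group-id` value are hypothetical, and treating naive datetimes as UTC is an assumption made for the example.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE_URL = "https://api.reworkd.dev/v1/outputs"  # taken from the curl example


def format_created_after(moment: datetime) -> str:
    """Render a datetime as YYYY-MM-DDTHH:MM:SSZ in UTC."""
    if moment.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def incremental_url(group_id: str, last_sync: datetime, limit: int = 10) -> str:
    """Build the request URL for outputs created since the last sync."""
    query = urlencode({"created_after": format_created_after(last_sync),
                       "limit": limit})
    return f"{BASE_URL}/{group_id}?{query}"


last_sync = datetime(2025, 1, 15, 10, 0, 0, tzinfo=timezone.utc)
url = incremental_url("my-group-id", last_sync)  # hypothetical group ID
print(url)
```

Storing the time of each successful sync and passing it on the next run keeps each request limited to recent outputs.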

GET /v1/outputs/{group_id}


Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
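For illustration, here is one way to attach the bearer header using Python's standard library. The `build_request` helper, the `REWORKD_API_TOKEN` environment variable name, and the `my-group-id` value are assumptions made for this sketch. Nothing is sent over the network here.

```python
import os
import urllib.request


def build_request(group_id: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GET request."""
    url = f"https://api.reworkd.dev/v1/outputs/{group_id}"
    headers = {"Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


# Hypothetical environment variable; keep real tokens out of source code.
token = os.environ.get("REWORKD_API_TOKEN", "")
request = build_request("my-group-id", token)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would perform the call.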

Path Parameters

group_id (string, required)

ID of the scraping group you want to fetch outputs for. This can be found on the groups page in the individual group card.

Query Parameters

job_id (string | null)

ID of the root scraping job to filter outputs by. Useful when you need to fetch results from a specific domain or data source within a group. When omitted, outputs from all jobs in the group will be returned.

Examples: "03583f9c-6c90-4f3c-9afd-186258d6f4d6", null

created_after (string<date-time> | null)

Filter outputs to only include those created or updated on or after this timestamp. Accepts ISO 8601 format (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SSZ). Essential for incremental data syncing to avoid fetching the entire dataset on each request.

Examples: "2025-01-15T10:00:00Z", "2025-01-15", null

url (string | null)

Complete URL to fetch outputs for, including protocol and path. Must match exactly the URL that was processed by the scraper.

Examples: "https://www.example.com/product/product_id_123"

limit (integer, default: 10)

Number of results to return per page.

Required range: 1 <= x <= 1000

cursor (integer, default: 0)

Cursor to paginate through results.

Required range: x >= 0
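The documented ranges for limit and cursor can be checked on the client before a request is made. The helper below is a hypothetical sketch; how the API itself responds to out-of-range values is not covered on this page.

```python
def validate_paging(limit: int = 10, cursor: int = 0) -> dict:
    """Return paging parameters after checking the documented ranges."""
    if not 1 <= limit <= 1000:
        raise ValueError(f"limit must be between 1 and 1000, got {limit}")
    if cursor < 0:
        raise ValueError(f"cursor must be 0 or greater, got {cursor}")
    return {"limit": limit, "cursor": cursor}


print(validate_paging())            # documented defaults
print(validate_paging(limit=1000))  # largest allowed page size
```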

country (string | null)

Name of the country to filter by (e.g., United States).

Examples: "United States"

region (string | null)

ISO 3166-2 code of the region to filter by (e.g., US-CA).

Examples: "US-CA", "US-TX"

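The optional filters above can be combined into a single query string. The sketch below uses a hypothetical `build_filter_query` helper and assumes that leaving a parameter out has the same effect as passing null; the page states this explicitly only for job_id.

```python
from typing import Optional
from urllib.parse import urlencode


def build_filter_query(job_id: Optional[str] = None, url: Optional[str] = None,
                       country: Optional[str] = None,
                       region: Optional[str] = None) -> str:
    """Encode only the filters that were actually provided."""
    params = {"job_id": job_id, "url": url, "country": country, "region": region}
    # Assumption: omitted parameters behave like null on the server side.
    return urlencode({k: v for k, v in params.items() if v is not None})


by_place = build_filter_query(country="United States", region="US-CA")
by_url = build_filter_query(url="https://www.example.com/product/product_id_123")
print(by_place)
print(by_url)
```

`urlencode` percent-encodes the url value, so the scraped URL reaches the API as one parameter and is not parsed as part of the request path.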
Response

200 (application/json)

Successful Response

The response is of type object.
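As an illustration of reading the response object, the snippet below parses a trimmed copy of the example response and pulls out a few fields. The `summarize` helper is hypothetical, and the field selection is only an example; real payloads carry the full set of fields shown in the sample response.

```python
import json
from typing import Any, Dict, List, Tuple

# Trimmed copy of the example response from this page.
SAMPLE = """
{
  "metadata": {"next_cursor": 2612294},
  "items": [
    {
      "output_id": "206fdc72-10c0-468c-996f-ff3f8cc51592",
      "change_type": "CREATE",
      "source_url": "http://example.com",
      "data": {"name": "John Doe", "age": 30},
      "files": [
        {"field": "image[0]",
         "s3_url": "https://files.reworkd.dev/d41d8cd98f00b204e9800998ecf8427e.jpg"}
      ]
    }
  ]
}
"""


def summarize(payload: Dict[str, Any]) -> List[Tuple[str, str, List[str]]]:
    """Return (output_id, change_type, file URLs) for each item."""
    rows = []
    for item in payload.get("items", []):
        file_urls = [f["s3_url"] for f in item.get("files", [])]
        rows.append((item["output_id"], item["change_type"], file_urls))
    return rows


payload = json.loads(SAMPLE)
rows = summarize(payload)
next_cursor = payload["metadata"]["next_cursor"]
print(rows)
print(next_cursor)
```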
