Document Capture

Note: Before reading further, please make sure you have read the use case summary section.

Overview

The Document Capture APIs allow a third-party add-on to create and manage PaperCut integrated scanning destinations. The new destinations will be available to PaperCut Hive administrators when creating a Quick Scan, and will appear alongside existing PaperCut-provided destinations such as “Scan to Google Drive” and “Scan to Email”.

The PaperCut Hive administrator experience

This video shows one of our distinguished engineers, Geoff, demonstrating how a custom document capture workflow appears to a PaperCut Hive administrator.

Some things to note:

The custom Quick Scan destination is displayed in the PaperCut Hive admin console in the same way as all other destinations.

You’ll need to provide PaperCut with an image for your add-on: 72 x 72 px with no padding and a transparent background. Here are two examples for solutions that PaperCut have written:

For horizontally oriented logos, like Box, vertical padding is OK to preserve the aspect ratio.
Output resolution: In the add-on API, developers can limit the values offered for a setting, so that excluded values are not available to the admin or end user in the Quick Scan editor UI.
Setting field visibility: The administrator can select each data entry field defined in your custom Quick Scan destination and set it to read-only, invisible, or editable. That is:
- Admin made setting visible, but read-only to the end-user at the MFD.
- Admin made setting invisible to the end-user at the MFD, but field value is still supplied in the payload.
- Admin made setting visible and editable by the end-user at the MFD.
This video explains it a bit more:

OCR Add-on requirement

A number of document capture features (listed below) require the customer to obtain the PaperCut Hive OCR Add-on, which may result in additional PaperCut Hive subscription charges. Customers should contact their PaperCut Software partner for assistance with this add-on. The relevant document capture features are:

OCR
Convert to PDF/A
Blank page removal
Batch splitting on blank
Batch splitting on page number
Deskew
Despeckle
Cloud compression

API overview

You use the primary API endpoint to add a new scan destination to PaperCut Hive. The API also allows you to update or remove scan destinations.

Scan files are delivered to the add-on via a webhook callback. The public URL for this callback is declared in the add-on manifest, and the add-on service must listen for incoming messages on this URL.

The add-on is responsible for validating each incoming message by verifying the signature on the authorization header bearer token.

Define a scan destination

To add or update a scan destination, send an HTTPS POST request with a JSON payload. For example:

{
  "id": "acme",
  "label": "ACME accounting",
  "template": {
    "name": "Invoices",
    "settings": {
      "fileName": {
        "visible": true,
        "editable": true,
        "default": "invoice-${date_utc}"
      },
      "fileType": {
        "visible": true,
        "editable": true,
        "options": [
          "jpg", "pdf", "png"
        ],
        "default": "pdf"
      },
      "paperSize": {
        "visible": true,
        "editable": true,
        "options": [
          "A4", "Letter"
        ],
        "default": "A4"
      },
      "dpi": {
        "visible": false,
        "editable": false,
        "default": "200"
      }
    },
    "capture": [
      {
        "name": "invoice-page1",
        "source": "template",
        "fields": [
          {
            "name": "invoiceno",
            "label": "Invoice Number",
            "visible": true,
            "editable": true,
            "multiValue": false,
            "validation": [
              "hasValue"
            ]
          }
        ]
      }
    ]
  }
}

To update an existing destination, call the endpoint again with the same value for id.
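As a rough illustration of the request described above, the following Python sketch builds the POST without sending it. The endpoint URL, the bearer-token Authorization header, and the names API_URL and build_destination_request are all assumptions made for this example; the real endpoint and authentication scheme come from PaperCut. The destination shown is a trimmed version of the payload above.

```python
import json
import urllib.request

# Placeholder only: this is NOT a real PaperCut URL.
API_URL = "https://api.example.com/scan-destinations"


def build_destination_request(url: str, token: str, destination: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTPS POST carrying the destination as JSON."""
    if not url.startswith("https://"):
        raise ValueError("the destination endpoint must be called over HTTPS")
    body = json.dumps(destination).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


destination = {
    "id": "acme",  # posting again with the same id updates this destination
    "label": "ACME accounting",
    "template": {"name": "Invoices", "settings": {}, "capture": []},
}

request = build_destination_request(API_URL, "example-token", destination)
# To send it: urllib.request.urlopen(request)  (needs a real URL and token)
```

Keeping request construction separate from sending makes it easy to check the payload before any network call is made.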

Notes on scan destination settings

Output filename

Currently, the default filename (which can be made read-only or invisible) must be a fixed string. In the future, PaperCut will add support for “File Name variables”. For example, in a future API release you’ll be able to embed the username or timestamp in the final filename at the time the scan is performed.

Paper settings

The API supports eight different page settings. Larger page sizes only support landscape. (Larger paper sizes cannot be fed into office MFDs in portrait orientation.)

auto: Auto detection of mixed paper sizes in job (Refer to note below)
a4_p: ISO A4 in portrait mode
a4_l: ISO A4 in landscape mode
a3_l: ISO A3 in landscape mode (Note: A3 portrait mode is not physically supported)
letter_p: ANSI letter portrait mode
letter_l: ANSI letter landscape mode
legal_l: ANSI legal landscape mode (Note: Legal portrait mode is not physically supported)
ledger_l: ANSI ledger landscape mode (Note: Ledger portrait mode is not physically supported)

Notes:

If your solution is only deployed in a single locale, you should still assume the PaperCut Hive administrator might override your settings. For example, embassies usually use paper sizes of their country of origin, and your solution will need to accept any value when you receive the job content.
Generally, the auto setting detects the paper size automatically, both for jobs with a single paper size and for jobs that mix paper sizes across multiple pages.

However on HP devices, the auto option is renamed to “Mixed” on the MFD UI and is specific to auto detection of mixed paper sizes in a job (multiple pages).

If an HP user has a single-size scan job, they will choose that specific size on the MFD UI.
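The paper size codes above can be handled with a small lookup. The sketch below is one possible approach, not part of the PaperCut API: KNOWN_PAPER_SIZES and parse_paper_size are hypothetical names. In line with the note about administrators overriding your settings, unknown codes are passed through rather than rejected.

```python
from typing import Optional, Tuple

# The eight codes listed above, mapped to (size, orientation).
KNOWN_PAPER_SIZES = {
    "auto": ("auto", None),
    "a4_p": ("a4", "portrait"),
    "a4_l": ("a4", "landscape"),
    "a3_l": ("a3", "landscape"),
    "letter_p": ("letter", "portrait"),
    "letter_l": ("letter", "landscape"),
    "legal_l": ("legal", "landscape"),
    "ledger_l": ("ledger", "landscape"),
}


def parse_paper_size(code: str) -> Tuple[str, Optional[str]]:
    """Return (size, orientation); unrecognized codes are kept as-is."""
    return KNOWN_PAPER_SIZES.get(code, (code, None))
```

For example, parse_paper_size("a3_l") gives ("a3", "landscape"), while an unlisted code is returned unchanged with no orientation.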

Color mode

Color mode affects file size. If your add-on needs small files to save storage and the documents are text-heavy, like contracts or invoices, then mono (BW) produces the smallest files.

Remember that grayscale isn’t the same as BW.

Example of color settings

Webhook response

After a user scans a document, the add-on receives a webhook callback message with a JSON payload. It’s the add-on’s responsibility to process this information and deliver the documents to the correct final destination.

The “files” array contains a temporary access URL for each scanned document. The add-on is responsible for downloading the document from this URL and delivering it. As this process might take some time, the add-on should:

verify the authorization header bearer token
create an asynchronous task to process the webhook message
return success
asynchronously process the webhook message.

See below for instructions on verifying the authorization header bearer token.
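The four steps above could be sketched in Python as follows. This is a minimal illustration, not a reference implementation: handle_webhook, worker, and the verify_token callback are hypothetical names, verify_token stands in for the real bearer token check, and the 400 and 401 status codes are choices made for this example.

```python
import json
import queue
import threading
from typing import Callable, Mapping

jobs: "queue.Queue[dict]" = queue.Queue()
processed = []  # job ids handled by the worker, kept only for illustration


def handle_webhook(headers: Mapping[str, str], body: bytes,
                   verify_token: Callable[[str], bool]) -> int:
    """Return an HTTP status code for one incoming webhook message."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    # Step 1: verify the bearer token before touching the payload.
    if scheme.lower() != "bearer" or not verify_token(token):
        return 401
    try:
        message = json.loads(body)
    except ValueError:
        return 400
    # Step 2: queue the message for asynchronous processing.
    jobs.put(message)
    # Step 3: return success straight away.
    return 200


def worker() -> None:
    """Step 4: process queued messages outside the request path."""
    while True:
        message = jobs.get()
        try:
            # A real add-on would download and deliver the files here.
            processed.append(message.get("jobId"))
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()
```

An in-memory queue is used here for brevity. Messages held only in memory are lost if the process restarts, so a production add-on may prefer a durable queue.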

Example webhook payload:

{
  "orgId": "string",
  "id": "string",
  "jobId": "qscan-12345",
  "destId": "acme",
  "files": [
    {
      "name": "string",
      "url": "string",
      "bytes": 1234
    }
  ],
  "metadata": {
    "userName": "string",
    "fullName": "string",
    "email": "string",
    "locale": "string",
    "captureTime": "2021-03-23T13:15:22Z",
    "capture": {
      "invoiceno": "inv1234"
    },
    "settings": {
      "fileName": "string",
      "fileType": "pdf",
      "promptMore": "false",
      "scanDuplex": "simplex",
      "paperSize": "a4_p",
      "dpi": "300",
      "colorMode": "color",
      "rotate": "0"
    },
    "processing": {
      "property1": "string"
    }
  }
}
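As an illustration of reading this payload, the Python sketch below pulls out the file entries and a captured field. ScanFile, extract_files, and capture_value are hypothetical helper names, and the sample file name and URL are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScanFile:
    name: str
    url: str   # temporary access URL for the scanned document
    size: int  # value of the "bytes" field


def extract_files(payload: dict) -> List[ScanFile]:
    """Collect one ScanFile per entry in the "files" array."""
    return [ScanFile(f["name"], f["url"], f["bytes"]) for f in payload.get("files", [])]


def capture_value(payload: dict, field: str, default: str = "") -> str:
    """Look up a captured field such as "invoiceno" under metadata.capture."""
    return payload.get("metadata", {}).get("capture", {}).get(field, default)


sample = {
    "jobId": "qscan-12345",
    "destId": "acme",
    "files": [{"name": "invoice.pdf", "url": "https://files.example.com/tmp/abc", "bytes": 1234}],
    "metadata": {"capture": {"invoiceno": "inv1234"}},
}

files = extract_files(sample)
invoice_number = capture_value(sample, "invoiceno")
```

Missing optional sections fall back to empty values, so a payload without capture fields does not raise an error.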

Verifying the webhook authorization header bearer token

Both PaperCut and the add-on developers are responsible for ensuring the secure delivery of customer documents to their destination.

In particular, the webhook message containing sensitive document information must be secured from both PaperCut’s and the add-on’s point of view.

Webhook security

For PaperCut, we must ensure that the URL we send to belongs to the authorized and approved add-on developer. We do this by requiring that the webhook URL be provided in the add-on manifest, maintained by PaperCut. It is not possible to change this URL via an API call.

The add-on developer must ensure that each webhook message received is a genuine message from PaperCut. The add-on does this by verifying the authorization bearer token signature using a PaperCut-provided public key.

The steps are as follows:

Webhook verification

PaperCut uses a signed JSON Web Token (JWT) as the bearer token. The key ID is found in the JWT header, under the key “kid”. The RSA signature is verified in the standard way using the public key provided. Detailed documentation and sample source code for verifying the token will be provided to the add-on developer.

List installed destinations

GET request

Returns a list of installed scan destinations.

Get an installed destination