Troubleshooting Document Processing scans, including OCR

The scan action the user selected is meant to create a text-searchable document with OCR, so why is the scanned file not searchable or not accurately detecting the text?

OCR technology is impressive, but not perfect. A failure usually has one of the following causes:

The document has been loaded into the MFD in the wrong position. PaperCut MF attempts to detect the page orientation; however, this is not always accurate.
The DPI of the scan job is too low. Try scanning again at a higher DPI.
The fonts are complex or artistic.
The image quality of the original file is poor, for example, the page is damaged or skewed, the page has a lot of speckling, or there isn’t much contrast between the text and the background.
The language used in the document has not been enabled for OCR on the Capture tab of the Admin web interface.
The Scan Action does not have OCR enabled for the selected file type (PDF).

Turning on the Despeckle and Deskew options can help improve OCR detection accuracy by enhancing image quality. This can result in slower performance, depending on your infrastructure.

If your document is not private in nature, or you are able to reproduce the problem with another non-private document, send the document to PaperCut support to help us improve OCR accuracy.

I have set up a locally hosted Document Processing server and I get an error message on the Capture page.

Ensure that the OCR service is functioning and can be reached from the PaperCut MF Application Server.
If you are using a firewall other than the Windows Firewall (which is configured automatically by the installer), open port 9181 (inbound) to allow connections from the PaperCut MF Application Server.
On the Capture page, make sure there are no typos in the Document Processing server hostname or IP address.
Wait for a minute and then refresh the page to see if the error message goes away.
Uninstall the Document Processing service and install it again.
Investigate any potential network problems.
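
The reachability check described above can be scripted. The following is a minimal sketch, not part of PaperCut MF: it only tests whether a TCP connection to the Document Processing server can be opened on port 9181, and it should be run from the PaperCut MF Application Server. The function name `can_reach` and the hostname used in the note below are placeholders.

```python
import socket


def can_reach(host: str, port: int = 9181, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the hostname and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `can_reach("docproc.example.local")` returns False if the hostname has a typo, the service is stopped, or a firewall is blocking the port. A True result shows only that the port is open, not that the OCR service itself is healthy.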

When using Batch Splitting, the output documents aren't separated correctly.

Ensure the correct duplex mode is set in the scan action Input settings.
Ensure the number of pages for each document in the input batch matches the number of pages for each document set in the scan action.
Check that the automatic document feeder isn’t grabbing multiple pages at a time.
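
The arithmetic behind these checks can be sketched as follows. This is an illustration only, assuming a fixed number of pages per document and assuming that each side of a duplex sheet counts as one page. How PaperCut MF counts pages internally is not stated here, and the function name is hypothetical.

```python
from typing import Tuple


def documents_from_batch(sheets: int, duplex: bool, pages_per_document: int) -> Tuple[int, int]:
    """Return (complete_documents, leftover_pages) for a fixed-length split."""
    if pages_per_document <= 0:
        raise ValueError("pages_per_document must be positive")
    # Assumption: a duplex scan yields two page images per sheet.
    pages = sheets * (2 if duplex else 1)
    return divmod(pages, pages_per_document)


print(documents_from_batch(12, False, 3))  # (4, 0): four 3-page documents
print(documents_from_batch(12, True, 3))   # (8, 0): wrong duplex mode doubles the count
print(documents_from_batch(5, True, 3))    # (3, 1): one leftover page
```

A non-zero leftover, or a document count that differs from what was loaded, suggests that the duplex mode or the pages-per-document setting does not match the input batch, or that the feeder pulled more than one sheet at once.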

With Blank Page Removal turned on, my documents are still being delivered containing blank pages.

Blank Page Removal works at the scan action level, not globally. Make sure it’s enabled for the specific scan action used to produce the documents.
Any content on the page will cause it to be classified as not-blank. This includes:
- headers and footers
- page numbers
- text like “this page left intentionally blank”
- any image content
- logos
- watermarks
- smudges or shadows.
The blank page detection threshold can be manually tuned to reduce sensitivity using [THIS CONFIG KEY].
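
To show why a small smudge can stop a page from being treated as blank, and what a sensitivity threshold changes, here is a simplified sketch. It is not PaperCut MF's detection algorithm. The function name, the `dark_level` cutoff, and the threshold values are all assumptions chosen for illustration.

```python
def is_blank(page, dark_level=128, threshold=0.0):
    """Classify a page as blank if its share of dark pixels is at most `threshold`.

    page: 2D list of grayscale values (0 = black, 255 = white).
    """
    total = sum(len(row) for row in page)
    if total == 0:
        return True
    dark = sum(1 for row in page for px in row if px < dark_level)
    return dark / total <= threshold


clean = [[255] * 10 for _ in range(10)]
smudged = [row[:] for row in clean]
smudged[4][4] = 0  # a single dark pixel, 1% of the page

print(is_blank(clean))                     # True
print(is_blank(smudged))                   # False: any content counts
print(is_blank(smudged, threshold=0.02))   # True: less sensitive
```

With a threshold of zero, a single dark pixel is enough to keep the page. Raising the threshold tolerates light marks, at the risk of dropping pages that contain only a small amount of real content such as a page number.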

A user's document hasn't arrived. What should I do?

If the user received a scan success email:

Ask the user to check that the destination they are looking in is correct.

If the user did not receive a scan success email:

Check the PaperCut MF Admin interface Logs > Job Log to see if the job is still in progress.
If it is, give it a few more minutes to come through. The larger and more complex the document, the longer it takes to process, particularly if Document Processing enhancements like OCR are enabled. For example, a one-page black and white scan at 300 DPI should be delivered to the destination in less than a minute. However, a 20-page color scan with OCR, Blank Page Removal, and Deskew at 600 DPI will take approximately 20 minutes, depending on the speed of your network.
If the job has failed, check the Application Log for messages about the problem:
1. Select Logs > Application Log.
  The Application Log page is displayed.

App Log message	Cause	Action
Failed to activate PaperCut MF Cloud Services with the license provided. Ensure you have active Maintenance & Support (M&S) and that your network connection is stable.	This could happen for the following reasons:
	Your PaperCut MF license is invalid or expired.	Contact your reseller to renew your license.
	A network problem occurred while trying to activate PaperCut MF Cloud Services (for example, communication is blocked by a firewall)	Check that the correct URLs and port are allowed through your firewall outbound.
PaperCut MF Cloud Services have been deactivated, which means your Scan to Cloud Storage and text-searchable document scan actions are disabled. Please try to re-activate PaperCut MF Cloud Services.	This could happen for the following reasons: The license used to activate PaperCut MF Cloud Services has expired. You have changed to a new license with a different CRN. The API key that was retrieved after activating PaperCut MF Cloud Services has been inadvertently deleted. You do not have active Maintenance & Support (M&S).	Reactivate PaperCut MF Cloud Services by attempting to edit either a Scan to Cloud Storage scan action or a scan action with OCR enabled. For more information about creating scan actions, see Setting up Integrated Scanning. If you do not have active M&S, a message will be displayed when you try to reactivate. Contact your reseller to reactivate your M&S.
Unable to connect to cloud.papercut.com. Endpoint refused to connect.	This error can occur for the following reasons:
	PaperCut MF is behind a proxy or firewall.	Check that the correct URLs and port are allowed through your firewall outbound.
	The configured URL for the PaperCut Cloud OCR Service has been changed.	Check that the config key `system.scan.ocr.api-url` is set to `https://ocr.cloud.papercut.com`, or one of the regional URLs from PaperCut MF Cloud Services port and URLs.
	There has been an SSL certificate mismatch.	Check that your Certificate Authority SSL certificates are current and valid.
	There was a network outage.	Ask the user to retry the scan when the outage is fixed.
	The server clock timing is incorrect.	Fix the server system time.
Failed to upload the scanned file by {username} on {deviceName} to the PaperCut MF OCR Service.	This error can occur for the following reasons:
	The time taken to transmit the document from the Application Server to the PaperCut Cloud OCR Service has exceeded the configured timeout period.	Increase the timeout period via the `system.scan.ocr.transfer-retry-timeout-mins` config key. For more information, see Configure advanced Integrated Scanning (config keys).
	There was a network interruption. There was a thread interruption. There was an I/O error.	Ask the user to retry the scan.
The PaperCut OCR Service could not convert the scan job by {username} on {deviceName} to a searchable format.	This error can occur for various reasons, including: the scanned document is invalid or corrupted; there is a problem with PaperCut MF Cloud Services.	Check the server log for more details about the error.
The OCR process has timed out after {x} minutes while converting the scan job by {username} on {deviceName} to a searchable document.	The time taken to convert the document to a text-searchable format has exceeded the configured timeout period.	Increase the timeout period via the `system.scan.ocr.cloud.download-polling-timeout-secs` config key. For more information, see Configure advanced Integrated Scanning (config keys).
Failed to download the searchable document scanned by {username} on {deviceName} from the PaperCut MF OCR Service.	This error can occur for the following reasons:
	The time taken to transmit the document from the PaperCut MF Cloud OCR Service to the Application server has exceeded the configured timeout period.	Increase the timeout period via the `system.scan.ocr.cloud.download-polling-timeout-secs` config key. For more information, see Configure advanced Integrated Scanning (config keys).
	There was a network interruption. There was a thread interruption. There was an I/O error.	Ask the user to retry the scan.
Scan file exceeds size limit.	The scan job has been converted to a text-searchable format and sent back to the Application Server, but it could not be delivered to the destination because the scan job is larger than the configured maximum file size.	The user needs to reduce the file size (for example, by reducing the DPI or splitting the file in two). Alternatively, you could increase the maximum file size allowed using one of the following config keys: `system.scan.email-max-job-size-kb`, `system.scan.folder-max-job-size-kb`, `system.scan.cloud-max-job-size-kb`. For more information, see Configure advanced Integrated Scanning (config keys).

If you can’t resolve the problem, then contact your PaperCut MF reseller with your Support ID.
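
For the "Scan file exceeds size limit" message in the table above, a quick way to see how far a file is over the limit is to compare its size with the value of the relevant `*-max-job-size-kb` config key. The sketch below is an illustration under assumptions: it treats 1 KB as 1024 bytes, which may not match how PaperCut MF measures size, and the function names are hypothetical.

```python
import os


def scan_size_kb(path: str) -> float:
    """Return the size of the file at `path` in KB (assuming 1 KB = 1024 bytes)."""
    return os.path.getsize(path) / 1024


def exceeds_limit(path: str, max_job_size_kb: int) -> bool:
    """Return True if the file is larger than the given limit in KB."""
    return scan_size_kb(path) > max_job_size_kb
```

Pass the value you read from the config key as `max_job_size_kb`. If the file is only slightly over the limit, reducing the DPI may be enough; if it is several times over, splitting the scan or raising the limit is more likely to help.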

Some users' scan jobs are failing with the error "The OCR process has timed out after {0} minutes while converting the scan job by {1} on {2} to a searchable document."

The full error message is as follows:
The OCR process has timed out after {0} minutes while converting the scan job by {1} on {2} to a searchable document. Refer to the Integrated Scanning Troubleshooting section in the PaperCut manual.

Follow these steps to adjust the timeout setting.

In the PaperCut MF Admin interface, navigate to Options > General.
In the Actions menu, click Config editor (advanced).
Search for `system.scan.ocr.processing-timeout-mins`.
Change the value to the number of minutes after which the job will time out (the default is 30). We recommend 60.
Click Update.
Restart the PaperCut Application Server service.

Your OCR jobs will now time out after the number of minutes you just set.
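
If you manage several servers, the same config key can in principle be set from a script rather than through the Config editor. The sketch below assumes the PaperCut XML-RPC web services API is enabled on the Application Server, that it exposes `api.setConfigValue` and `api.getConfigValue`, and that an auth token has been configured. The server URL, port, and token shown in the comments are placeholders, and the helper name `set_ocr_timeout` is hypothetical. The sketch does not restart the Application Server service, so that step from the list above still applies.

```python
import xmlrpc.client

CONFIG_KEY = "system.scan.ocr.processing-timeout-mins"


def set_ocr_timeout(api, auth_token: str, minutes: int) -> str:
    """Set the OCR processing timeout and return the value read back.

    `api` is any object exposing setConfigValue and getConfigValue,
    for example the `.api` attribute of an xmlrpc.client.ServerProxy.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    # Config values are passed as strings.
    api.setConfigValue(auth_token, CONFIG_KEY, str(minutes))
    return api.getConfigValue(auth_token, CONFIG_KEY)


# Example usage (placeholders, not run here):
# server = xmlrpc.client.ServerProxy("http://papercut.example.local:9191/rpc/api/xmlrpc")
# print(set_ocr_timeout(server.api, "<auth token>", 60))
```

Reading the value back after setting it confirms that the change was stored before you restart the service.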
