
OCR for scanned documents

Understand how Polaris reads scanned text and what happens when OCR is partial or not included.

OCR lets Polaris read text inside images and scanned PDFs. It is useful when a page shows visible text but the file itself contains no selectable text.

When it applies

OCR can apply to:

  • Scanned PDFs.
  • Images with text.
  • Screenshots where text is readable.

If a PDF already has selectable text, Polaris can process it as a text document without relying on OCR.

Plans

  • Starter: OCR not included.
  • Pro: OCR included.
  • Business: OCR included with monthly limits.
  • Enterprise: OCR included with custom limits.

Availability can also depend on workspace configuration.

If OCR is not included

The document can still upload successfully, but its scanned text will not be used for answers. In Documents, you may see warnings such as "OCR not included" or "Partial indexing".

If the file also contains normal text, Polaris keeps that part.

Partial OCR and timeout

Partial OCR means Polaris read part of the document, but not every page.

It can happen because of:

  • a page that took too long;
  • a very long PDF;
  • an image that is difficult to read;
  • plan limits.

When this happens, Polaris keeps the text it could read and shows warnings such as "Partial OCR" or "OCR timeout".
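
To make partial OCR concrete, here is a minimal sketch in Python. It is not how Polaris is implemented: the names ocr_document, read_page, page_budget, and fake_reader are hypothetical, and the per-page reader is simulated. It only illustrates the behavior described above, where unreadable pages are skipped, the text from the remaining pages is kept, and a warning is recorded.

```python
from typing import Callable, Optional


def ocr_document(
    pages: list[str],
    read_page: Callable[[str], Optional[str]],
    page_budget: int,
) -> dict:
    """Read up to `page_budget` pages and keep whatever text was recovered.

    `read_page` is a hypothetical per-page reader that returns the page
    text, or None when the page timed out or could not be read.
    """
    texts: dict[int, str] = {}
    warnings: list[str] = []

    for number, page in enumerate(pages, start=1):
        if number > page_budget:
            # Assumed stand-in for a plan limit or a very long PDF.
            warnings.append("Partial OCR")
            break
        text = read_page(page)
        if text is None:
            # The page is skipped, but earlier and later pages are kept.
            warnings.append("OCR timeout")
            continue
        texts[number] = text

    # Any missing page means the document was only partially read.
    if len(texts) < len(pages) and "Partial OCR" not in warnings:
        warnings.append("Partial OCR")
    return {"texts": texts, "warnings": sorted(set(warnings))}


def fake_reader(page: str) -> Optional[str]:
    # Simulated reader: a page marked "slow" stands in for a timeout.
    return None if page == "slow" else page.upper()


result = ocr_document(["intro", "slow", "summary"], fake_reader, page_budget=10)
```

In this simulated run, pages 1 and 3 are kept, page 2 is skipped, and both warnings are recorded.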

Best practices

  • Use sharp, properly oriented scans.
  • Avoid very long scanned PDFs.
  • Upload a version with selectable text when available.
  • Split large documents into smaller files.
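
If you split a large scan yourself, it helps to plan the page ranges first. The sketch below is a hypothetical helper, not a Polaris feature. The 50-page chunk size is an arbitrary assumption, since this article does not state a page limit. It only computes the ranges; the actual split is done in whatever PDF tool you use.

```python
def split_page_ranges(total_pages: int, max_pages: int) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (first, last) page ranges of at most max_pages."""
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages >= 1")
    return [
        (start, min(start + max_pages - 1, total_pages))
        for start in range(1, total_pages + 1, max_pages)
    ]


# Example: a 120-page scan split into files of at most 50 pages each.
ranges = split_page_ranges(total_pages=120, max_pages=50)
for first, last in ranges:
    print(f"pages {first}-{last}")
```

For a 120-page scan this yields three files: pages 1-50, 51-100, and 101-120.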
