Processing PDF Files (Enhanced)

Limina supports scanning PDF files for PII and creating de-identified or redacted copies. Limina’s supported entity types work across every file type, and localized variants of PII (Personally Identifiable Information), PHI (Protected Health Information), and PCI (Payment Card Industry) entities are detected. The Supported Languages and Supported Entity Types pages provide more detail.

If you’d like to try it yourself, please sign up for an account to get a free API key.

How PDFs Are Processed (Enhanced)

PDFs are processed as follows:

First, the PDF pages are scanned for text as well as Object Entities using OCR.
The detected objects are replaced with white boxes, and in-place replacements are done for the text.
PDF metadata and any included attachments are removed from the PDF.
A new PDF is returned with the in-place replacements and without any attachments.

Parameters

The parameters below control the behaviour of the PDF de-identifier. Specify them under pdf_options in the request.

Parameter	Explanation	Default
`approach`	Selects which PDF processing approach is used: "standard" or "enhanced".	"standard"
`density`	The DPI value used to convert PDF pages into images. Lower values produce lower-resolution images, which take up less storage space and process faster, at the cost of output quality and redaction accuracy.	200
`max_resolution`	PDFs are converted into images using the `density` DPI value. Any resulting image whose longest side is larger than this value is resized down to it, preserving aspect ratio.	3000
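
To make the interaction between `density` and `max_resolution` concrete, here is a minimal sketch of the arithmetic. It assumes the page size is known in inches and that dimensions are rounded to the nearest pixel. The function name `rendered_size` and the rounding behaviour are illustrative assumptions, not part of the Limina API.

```python
def rendered_size(width_in, height_in, density=200, max_resolution=3000):
    """Estimate the pixel size of a page rendered at `density` DPI.

    Hypothetical helper: the rounding used here is an assumption, the
    service may round differently.
    """
    # DPI means pixels per inch, so pixels = inches * density.
    width_px = round(width_in * density)
    height_px = round(height_in * density)

    # If the longest side exceeds max_resolution, scale both sides by the
    # same factor so the aspect ratio is preserved.
    longest = max(width_px, height_px)
    if longest > max_resolution:
        scale = max_resolution / longest
        width_px = round(width_px * scale)
        height_px = round(height_px * scale)
    return width_px, height_px


# US Letter (8.5 x 11 in) at the default settings stays under the cap.
print(rendered_size(8.5, 11))               # (1700, 2200)
# At 300 DPI the longest side would be 3300 px, so the image is resized.
print(rendered_size(8.5, 11, density=300))  # (2318, 3000)
```

Under these assumptions, raising `density` beyond the point where the longest side reaches `max_resolution` does not produce a larger image unless `max_resolution` is raised as well.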

See PDF Approaches for the differences between Standard and Enhanced PDF processing.

Support Matrix

	CPU Container	GPU Container	Community API	Professional API
Supported	Yes	Yes	No	No

Sample Request

Connect with one of our privacy experts to run this code.

{
  "file": {
    "data": "<file_content_base64>",
    "content_type": "application/pdf"
  },
  "entity_detection": {
    "return_entity": true
  },
  "pdf_options": {
    "approach": "enhanced"
  }
}
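
A request body like the one above can be assembled from raw PDF bytes as sketched here. The helper name `build_pdf_request` is hypothetical, and the sketch deliberately stops before sending the request, because the endpoint URL and authentication details are not given on this page.

```python
import base64
import json


def build_pdf_request(pdf_bytes, approach="enhanced", density=None, max_resolution=None):
    """Build a request body matching the sample request (hypothetical helper)."""
    pdf_options = {"approach": approach}
    # Only include optional parameters when set, so the documented
    # defaults (200 and 3000) apply otherwise.
    if density is not None:
        pdf_options["density"] = density
    if max_resolution is not None:
        pdf_options["max_resolution"] = max_resolution

    return {
        "file": {
            # The file content is sent as a base64-encoded string.
            "data": base64.b64encode(pdf_bytes).decode("ascii"),
            "content_type": "application/pdf",
        },
        "entity_detection": {"return_entity": True},
        "pdf_options": pdf_options,
    }


# Placeholder bytes stand in for a real file; in practice, read the PDF
# with open(path, "rb").read().
payload = build_pdf_request(b"%PDF-1.4 placeholder bytes", density=150)
print(json.dumps(payload, indent=2))
```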

Sample Response

{
  "processed_file": "Base64 Encoded File Content of the Redacted File",
  "processed_text": "string",
  "entities": "List[Entity]",
  "entities_present": true,
  "languages_detected": {"lang_1": 0.67, "lang_2": 0.74}
}
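
Since `processed_file` holds the redacted PDF as base64, a client needs to decode it before saving. The sketch below shows one way to do that, assuming the response has already been parsed into a Python dict with the fields shown above. The helper name `save_processed_pdf` and the stand-in response are illustrative only.

```python
import base64
import os
import tempfile


def save_processed_pdf(response, output_path):
    """Decode `processed_file` and write it to disk (hypothetical helper).

    Returns the number of bytes written.
    """
    pdf_bytes = base64.b64decode(response["processed_file"])
    with open(output_path, "wb") as fh:
        fh.write(pdf_bytes)
    return len(pdf_bytes)


# Stand-in response with the same fields as the sample; the content is
# placeholder data, not real service output.
example_response = {
    "processed_file": base64.b64encode(b"%PDF-1.4 redacted placeholder").decode("ascii"),
    "processed_text": "string",
    "entities": [],
    "entities_present": True,
    "languages_detected": {"lang_1": 0.67},
}

output_path = os.path.join(tempfile.mkdtemp(), "redacted.pdf")
written = save_processed_pdf(example_response, output_path)
print(f"wrote {written} bytes to {output_path}")
```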
