Limina supports scanning PDF files for PII and creating de-identified or redacted copies. Limina’s supported entity types work across every file type, and localized variants of PII (Personally Identifiable Information), PHI (Protected Health Information), and PCI (Payment Card Industry) entities are detected. Our Supported Languages and Supported Entity Types page provides a more detailed look.
If you’d like to try it yourself, please visit our free interactive web demo. No code or account is necessary.

How PDFs Are Processed (Enhanced)

PDFs are processed as follows:
  1. First, if the PDF has an invisible text layer (copy-pasteable text), it is extracted from the PDF.
  2. Any content that doesn’t have an invisible text layer is scanned for text as well as for Object Entities.
  3. The detected objects are replaced with white boxes, and in-place replacements are done for the text.
  4. PDF Metadata as well as any included attachments are removed from the PDF.
  5. A new PDF is returned with the in-place replacements and without any attachments.

Parameters

Below are the parameters that control the behaviour of the PDF De-identifier. These parameters shall be specified under pdf_options.
| Parameter | Explanation | Default |
| --- | --- | --- |
| approach | This parameter changes which PDF approach is used. | "automatic" |
| density | PDFs are converted into images using this DPI value. Smaller values result in lower-resolution images, which take up less storage space and process faster, at the cost of output quality and redaction accuracy. | 200 |
| max_resolution | PDFs are converted into images using the density DPI value. Any resulting image whose longest side is larger than this value is resized down to it, preserving the aspect ratio. | 3000 |
PDF Approaches shows the differences between Standard and Enhanced PDF processing.
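
To make the interaction between density and max_resolution concrete, here is a minimal sketch that estimates the pixel dimensions of a rendered page. It assumes page sizes are given in PDF points (1/72 inch) and that the limit applies to the longest side of the image; the function name rendered_size and the rounding behaviour are illustrative assumptions, not part of the Limina API.

```python
def rendered_size(width_pt, height_pt, density=200, max_resolution=3000):
    """Estimate the pixel size of a PDF page after rasterization.

    Hypothetical helper: the exact rounding and resizing done by the
    service may differ from this approximation.
    """
    # One PDF point is 1/72 inch, so pixels = points / 72 * DPI.
    width_px = round(width_pt / 72 * density)
    height_px = round(height_pt / 72 * density)

    # If the longest side exceeds the limit, scale both sides down by the
    # same factor so the aspect ratio is preserved.
    longest = max(width_px, height_px)
    if longest > max_resolution:
        scale = max_resolution / longest
        width_px = round(width_px * scale)
        height_px = round(height_px * scale)
    return width_px, height_px


# US Letter (612 x 792 points) at the default density stays under the limit.
print(rendered_size(612, 792))               # (1700, 2200)
# At 300 DPI the longest side would be 3300 px, so the page is scaled down.
print(rendered_size(612, 792, density=300))  # (2318, 3000)
```

Under these assumptions, raising density beyond the point where the longest side reaches max_resolution yields no additional detail unless max_resolution is raised as well.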

Support Matrix

| | CPU Container | GPU Container | Community API | Professional API |
| --- | --- | --- | --- | --- |
| Supported | Yes | Yes | No | No |

Sample Request

{
  "file": {
    "data": "<file_content_base64>",
    "content_type": "application/pdf"
  },
  "entity_detection": {
    "return_entity": true
  },
  "pdf_options": {
    "approach":"enhanced"
  }
}
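
The request body above can be assembled programmatically. The following is a minimal sketch that base64-encodes PDF bytes and builds the same JSON structure; the helper name build_pdf_request is hypothetical, and sending the body to your deployment (URL, headers, authentication) is deliberately left out because those details are not covered here.

```python
import base64
import json


def build_pdf_request(pdf_bytes, approach="enhanced", density=None, max_resolution=None):
    """Build a request body shaped like the sample request (hypothetical helper)."""
    pdf_options = {"approach": approach}
    # Only include optional parameters when set, so server defaults apply otherwise.
    if density is not None:
        pdf_options["density"] = density
    if max_resolution is not None:
        pdf_options["max_resolution"] = max_resolution

    return {
        "file": {
            # The file content is sent as a base64-encoded string.
            "data": base64.b64encode(pdf_bytes).decode("ascii"),
            "content_type": "application/pdf",
        },
        "entity_detection": {"return_entity": True},
        "pdf_options": pdf_options,
    }


# Placeholder bytes stand in for a real file, e.g. Path("input.pdf").read_bytes().
payload = build_pdf_request(b"%PDF-1.4 placeholder bytes")
body = json.dumps(payload)  # JSON string ready to send with an HTTP client of your choice
```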

Sample Response

{
  "processed_file": "Base64 Encoded File Content of the Redacted File",
  "processed_text": "string",
  "entities": "List[Entity]",
  "entities_present": true,
  "languages_detected": {"lang_1": 0.67, "lang_2": 0.74}
}
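
As a rough sketch of how a client might consume this response, the snippet below decodes processed_file back into PDF bytes and picks the highest-scoring detected language. It assumes processed_file is a base64 string and languages_detected maps language codes to scores, as the sample suggests; the helper names and the example response are made up for illustration.

```python
import base64
import json


def decode_processed_file(response):
    """Return the redacted PDF as bytes (assumes a base64-encoded field)."""
    return base64.b64decode(response["processed_file"])


def top_language(response):
    """Return the language code with the highest score, or None if absent."""
    scores = response.get("languages_detected") or {}
    return max(scores, key=scores.get) if scores else None


# Fabricated response for illustration only; a real one comes from the service.
example = json.loads(json.dumps({
    "processed_file": base64.b64encode(b"%PDF-1.4 redacted").decode("ascii"),
    "entities_present": True,
    "languages_detected": {"lang_1": 0.67, "lang_2": 0.74},
}))

pdf_bytes = decode_processed_file(example)
# To save the result: Path("redacted.pdf").write_bytes(pdf_bytes)
```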