Limina supports scanning Microsoft Excel XLS and XLSX files for PII and creating de-identified or redacted copies. Limina’s supported entity types apply to every file type, and localized variants of PII (Personally Identifiable Information), PHI (Protected Health Information), and PCI (Payment Card Industry) entities are detected. Our Supported Languages and Supported Entity Types pages provide a more detailed look.

How XLSX Files Are Processed

Similar to CSV files, cell contents of XLSX files are processed using the method described for Tabular Data in the Structured Data Guide. In addition to cell contents, the following elements are handled:

| Property Type | Details | Behaviour |
|---|---|---|
| Core properties | Author, Category, Comments, Content Status, Identifier, Keywords, Language, Last Modified By, Subject, Title, Version | Redact |
| Headers and footers | Any content in headers and footers, such as text and images. Can appear when the document is printed | Passthrough, will change to Redact in a future release |
| Images | The Images page provides a more detailed look at Image processing | Redact, unsupported image types are removed |
| Text boxes | Floating text boxes | Passthrough, will change to Remove in a future release |
| Embedded links | Hyperlinks to internet pages or documents | Remove |
| External elements | Tables and charts embedded from another document or file, such as an Excel chart or table object | Passthrough, please process these separately |
| Embedded audio & video | Videos and audio clips | Remove |
| Review comments | Comments from document reviews | Passthrough, will change to Remove in a future release |
| Shape objects | Shapes containing text | Passthrough, will change to Redact in a future release |

Graphical content where text is present will be OCRed and then redacted. You can configure the OCR System by setting it as an Environment Variable or sending it in the request object. Check out our OCR Guide to further understand the OCR modes and their usage.

How XLS Files Are Processed

XLS files are converted to XLSX, processed as described above, and then converted back to XLS.

Constraints

  • Cell contents of XLSX files are processed using the method described for Tabular Data in the Structured Data Guide. This requires the data to be column-oriented and the headers to be on the first non-empty row.
  • Shape objects will not be preserved.
  • Formulas may not be preserved after redaction.
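
The layout constraint in the first bullet can be checked before a file is submitted. The sketch below is a client-side heuristic only, not Limina's own validation: it assumes the sheet has already been loaded into a list of rows by a spreadsheet library of your choice, and the function names `first_non_empty_row` and `has_header_row` are hypothetical.

```python
def first_non_empty_row(rows):
    """Return the index of the first row with any non-blank cell, or None."""
    for index, row in enumerate(rows):
        if any(cell is not None and str(cell).strip() for cell in row):
            return index
    return None


def has_header_row(rows):
    """Heuristic: the first non-empty row consists only of non-blank strings.

    This is an assumption about what a usable header row looks like; it is
    not the check the service itself performs.
    """
    index = first_non_empty_row(rows)
    if index is None:
        return False
    return all(isinstance(cell, str) and bool(cell.strip()) for cell in rows[index])


if __name__ == "__main__":
    sheet = [
        [None, None],                  # leading empty row is skipped
        ["name", "email"],             # headers on the first non-empty row
        ["Ann", "ann@example.com"],
    ]
    print(has_header_row(sheet))
```

A sheet that fails this check, for example one with data laid out in rows rather than columns, may need restructuring before it is processed.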

Support Matrix

| | CPU Container | GPU Container | Community API | Professional API |
|---|---|---|---|---|
| Supported | Yes | Yes | Up to 10 MiB | No |

Sample Request

{
  "file": {
    "data": "<file_content_base64>",
    "content_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
  },
  "entity_detection": {
    "return_entity": true
  }
}
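
As a rough illustration, the request body above could be assembled in Python as follows. The helper name `build_request` and the extension-to-MIME-type mapping are assumptions made for this sketch, not part of Limina's API. In particular, whether the service expects `application/vnd.ms-excel` for XLS files should be confirmed against the API reference. Sending the payload is left out because the endpoint URL and authentication depend on your deployment.

```python
import base64
import json

# Standard MIME types for Excel formats. Whether the API accepts the XLS
# value is an assumption; only the XLSX value appears in the sample request.
CONTENT_TYPES = {
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".xls": "application/vnd.ms-excel",
}


def build_request(file_bytes: bytes, extension: str = ".xlsx") -> dict:
    """Build a request body shaped like the sample request (hypothetical helper)."""
    return {
        "file": {
            # The file content is sent as a Base64-encoded string.
            "data": base64.b64encode(file_bytes).decode("ascii"),
            "content_type": CONTENT_TYPES[extension.lower()],
        },
        "entity_detection": {"return_entity": True},
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a workbook read with open(path, "rb").
    payload = build_request(b"placeholder workbook bytes")
    print(json.dumps(payload, indent=2))
```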

Sample Response

{
  "processed_file": "Base64 Encoded File Content of the Redacted File",
  "processed_text": "string",
  "entities": "List[Entity]",
  "entities_present": true,
  "languages_detected": {"lang_1": 0.67, "lang_2": 0.74}
}
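
To turn a response like the one above back into a spreadsheet, the `processed_file` field has to be Base64-decoded. The following is a minimal sketch that assumes the response has already been parsed into a Python dictionary with the field names shown in the sample; the helper names `extract_processed_file` and `top_language` are hypothetical.

```python
import base64


def extract_processed_file(response: dict) -> bytes:
    """Decode the Base64 `processed_file` field into raw file bytes."""
    return base64.b64decode(response["processed_file"])


def top_language(response: dict):
    """Return the detected language with the highest score, or None if absent."""
    scores = response.get("languages_detected") or {}
    return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    # Stand-in response; a real one would come from the parsed API reply.
    sample = {
        "processed_file": base64.b64encode(b"redacted workbook bytes").decode("ascii"),
        "entities_present": True,
        "languages_detected": {"lang_1": 0.67, "lang_2": 0.74},
    }
    redacted_bytes = extract_processed_file(sample)
    # Write redacted_bytes to disk with open(path, "wb") to obtain the file.
    print(len(redacted_bytes), top_language(sample))
```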