
# Processing Excel (XLS/XLSX) Files

> This guide will get you started with XLS and XLSX de-identification.

Limina supports scanning Microsoft Excel XLS and XLSX files for PII and creating de-identified or redacted copies. Limina’s supported entity types work across both file types, including localized variants of **PII** (Personally Identifiable Information) entities, **PHI** (Protected Health Information) entities, and **PCI** (Payment Card Industry) entities. The [Supported Languages](/languages) and [Supported Entity Types](/entities) pages provide a more detailed look.

## How XLSX Files Are Processed

Similar to CSV files, cell contents of XLSX files are processed using the method described for Tabular Data in the [Structured Data Guide](/configuration-and-operations/working-with-files/structured-data). In addition to cell contents, the following elements are handled:

| Property Type          | Details                                                                                                                                      | Behaviour                                              |
| ---------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------ |
| Core properties        | Author, Category, Comments, Content Status, Identifier, Keywords, Language, Last Modified By, Subject, Title, Version                        | Redact                                                 |
| Headers and footers    | Any content in headers and footers, such as text and images. Can appear when the document is printed                                         | Passthrough, will change to Redact in a future release |
| Images                 | The [Images](/configuration-and-operations/working-with-files/processing-files/image) page provides a more detailed look at Image processing | Redact, unsupported image types are removed            |
| Text boxes             | Floating text boxes                                                                                                                          | Passthrough, will change to Remove in a future release |
| Embedded links         | Hyperlinks to internet pages or documents                                                                                                    | Remove                                                 |
| External elements      | Tables and charts embedded from another document or file, such as an Excel chart or table object                                             | Passthrough, please process these separately           |
| Embedded audio & video | Videos and audio clips                                                                                                                       | Remove                                                 |
| Review comments        | Comments from document reviews                                                                                                               | Passthrough, will change to Remove in a future release |
| Shape objects          | Shapes containing text                                                                                                                       | Passthrough, will change to Redact in a future release |
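To see which core properties a workbook carries before or after processing, the package can be inspected directly. The following is a minimal sketch using only the Python standard library; it relies on the fact that XLSX files are ZIP packages that store core properties in `docProps/core.xml`. The helper name `read_core_properties` is hypothetical and not part of any Limina client, and the sketch makes no claim about what redacted values look like.

```python
import zipfile
import xml.etree.ElementTree as ET


def read_core_properties(xlsx_path):
    """Return the core properties of an XLSX file as {local_tag_name: text}.

    Hypothetical helper for inspection only; accepts a path or a file-like object.
    """
    with zipfile.ZipFile(xlsx_path) as archive:
        try:
            # Office Open XML packages keep core properties in this part.
            raw_xml = archive.read("docProps/core.xml")
        except KeyError:
            return {}  # the part is optional, so it may be absent

    root = ET.fromstring(raw_xml)
    properties = {}
    for element in root:
        # Tags look like "{namespace}creator"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        properties[local_name] = (element.text or "").strip()
    return properties
```

In the package, Author is stored as `creator` and Comments as `description`, so those are the keys to look for when comparing an original file with its processed copy.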

<Info>
  Graphical content where text is present will be OCRed and then redacted. You can configure the OCR System by setting it as an [Environment Variable](/configuration-and-operations/container-management/environment-variables) or sending it in the request object. Check out our [OCR Guide](/configuration-and-operations/working-with-files/processing-files/ocr-modes) to further understand the OCR modes and their usage.
</Info>

## How XLS Files Are Processed

XLS files are converted to XLSX, processed as described above, and then converted back to XLS.
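Because the two formats use different containers, a client can tell them apart from the first bytes of the file and pick a content type accordingly. The sketch below is illustrative only: `guess_excel_content_type` is a hypothetical helper, and while `application/vnd.ms-excel` is the registered media type for XLS, treating it as the value the API expects for XLS uploads is an assumption to verify against the API reference.

```python
# Legacy binary Office files (including XLS) use the OLE2 compound file container.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# XLSX files are ZIP packages, which start with a local file header signature.
ZIP_MAGIC = b"PK\x03\x04"

XLS_CONTENT_TYPE = "application/vnd.ms-excel"  # assumption: accepted for XLS uploads
XLSX_CONTENT_TYPE = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def guess_excel_content_type(file_bytes):
    """Guess the content type from the container signature (hypothetical helper)."""
    if file_bytes.startswith(OLE2_MAGIC):
        return XLS_CONTENT_TYPE
    if file_bytes.startswith(ZIP_MAGIC):
        return XLSX_CONTENT_TYPE
    raise ValueError("not an OLE2 or ZIP container, so neither XLS nor XLSX")
```

The signature identifies only the container. Other Office formats share the same containers, so this check complements the file extension rather than replacing it.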

## Constraints

* Cell contents of XLSX files are processed using the method described for Tabular Data in the [Structured Data Guide](/configuration-and-operations/working-with-files/structured-data). This requires the data to be column-oriented and the headers to be on the first non-empty row.
* Shape objects will not be preserved.
* Formulas may not be preserved after redaction.
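The first constraint can be checked before a file is submitted. The sketch below is a rough pre-flight heuristic, not Limina's actual detection logic: it takes a sheet already loaded as a list of rows (each a list of cell values, with `None` for empty cells) and reports columns that hold data but have no text header on the first non-empty row. The function names are hypothetical.

```python
def is_empty(cell):
    """Treat None and whitespace-only values as empty cells."""
    return cell is None or str(cell).strip() == ""


def first_non_empty_row(rows):
    """Return the index of the first row containing any value, or None."""
    for index, row in enumerate(rows):
        if any(not is_empty(cell) for cell in row):
            return index
    return None


def header_problems(rows):
    """List reasons the sheet may not be column-oriented with a header row."""
    header_index = first_non_empty_row(rows)
    if header_index is None:
        return ["sheet has no non-empty rows"]

    header = rows[header_index]
    data_rows = rows[header_index + 1:]
    width = max(len(row) for row in rows[header_index:])

    problems = []
    for col in range(width):
        name = header[col] if col < len(header) else None
        has_data = any(col < len(r) and not is_empty(r[col]) for r in data_rows)
        if has_data and is_empty(name):
            problems.append(f"column {col + 1} has data but no header")
        elif not is_empty(name) and not isinstance(name, str):
            problems.append(f"column {col + 1} header is not text: {name!r}")
    return problems
```

An empty result means only that this heuristic found nothing suspicious. Row-oriented sheets whose first row happens to contain text will still pass.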

## Support Matrix

|           | CPU Container | GPU Container | Community API | Professional API |
| --------- | ------------- | ------------- | ------------- | ---------------- |
| Supported | Yes           | Yes           | Up to 10 MiB  | No               |
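A client can check a file against the Community API limit before uploading. This sketch assumes the 10 MiB limit applies to the raw file size and is inclusive; the table does not say whether the limit is measured before or after base64 encoding, so the encoded size is computed as well. Both function names are hypothetical.

```python
COMMUNITY_API_LIMIT_BYTES = 10 * 1024 * 1024  # 10 MiB, from the support matrix


def base64_length(raw_size):
    """Length in characters of the padded base64 encoding of raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)


def fits_community_limit(raw_size, limit_bytes=COMMUNITY_API_LIMIT_BYTES):
    """Assumption: the limit is inclusive and applies to the raw file size."""
    return raw_size <= limit_bytes
```

For a file on disk, `os.path.getsize(path)` supplies `raw_size`. Base64 adds roughly a third to the payload, which matters if the limit turns out to apply to the encoded request.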

## Sample Request

<Info>
  [Connect with one of our privacy experts](https://getlimina.ai/contact-us/?utm_source=docs\&utm_medium=website) to run this code.
</Info>

<CodeGroup>
  ```json Request Body wrap lines theme={"theme":"poimandres"}
  {
    "file": {
      "data": "<file_content_base64>",
      "content_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
    },
    "entity_detection": {
      "return_entity": true
    }
  }
  ```

  ```shell curl wrap lines theme={"theme":"poimandres"}
  echo '{
            "file": {"data": "'$(base64 -w 0 sample.xlsx)'",
            "content_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"},
            "entity_detection": {"return_entity": true}
        }' \
  | curl --request POST --url 'https://api.private-ai.com/community/v4/process/files/base64' \
         -H 'Content-Type: application/json' \
         -H 'x-api-key: <YOUR KEY HERE>' \
         -d @- \
         | jq -r .processed_file \
         | base64 -d > 'sample.redacted.xlsx'
  ```

  ```python python wrap lines theme={"theme":"poimandres"}
  import requests
  import base64

  file_url = "https://paidocumentation.blob.core.windows.net/$web/sample.xlsx"
  filename_out = "/path/to/output/sample.redacted.xlsx"
  file_content = requests.get(file_url).content
  file_content_base64 = base64.b64encode(file_content).decode()

  url = "https://api.private-ai.com/community/v4/process/files/base64"

  headers = {"Content-Type": "application/json", "x-api-key": "<INSERT API KEY>"}

  payload = {
    "file":{
      "data": file_content_base64,
      "content_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    },
    "entity_detection": {
      "return_entity": True
    }
  }

  response = requests.post(url, json=payload, headers=headers)
  response.raise_for_status()  # stop on HTTP errors before reading the body
  with open(filename_out, "wb") as f:
      f.write(base64.b64decode(response.json()["processed_file"]))
  ```

  ```python Python Client wrap lines theme={"theme":"poimandres"}
  from privateai_client import PAIClient
  from privateai_client.objects import request_objects
  import base64

  filename_in = "sample.xlsx"
  filename_out = "sample.redacted.xlsx"

  file_type = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
  client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

  with open(filename_in, "rb") as b64_file:
      file_data = base64.b64encode(b64_file.read())
      file_data = file_data.decode("ascii")

  file_obj = request_objects.file_obj(data=file_data, content_type=file_type)
  request_obj = request_objects.file_base64_obj(file=file_obj)
  resp = client.process_files_base64(request_object=request_obj)

  with open(filename_out, 'wb') as redacted_file:
      processed_file = resp.processed_file.encode("ascii")
      processed_file = base64.b64decode(processed_file, validate=True)
      redacted_file.write(processed_file)
  ```
</CodeGroup>

## Sample Response

```json Response wrap lines theme={"theme":"poimandres"}
{
  "processed_file": "Base64 Encoded File Content of the Redacted File",
  "processed_text": "string",
  "entities": "List[Entity]",
  "entities_present": true,
  "languages_detected": {"lang_1": 0.67, "lang_2": 0.74}
}
```
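A response of this shape can be handled with a couple of small helpers. The sketch below uses only the field names shown in the sample response. The helper names are hypothetical, and reading `languages_detected` as a mapping where a higher value means a more likely language is an assumption.

```python
import base64


def save_processed_file(response_body, output_path):
    """Decode `processed_file` and write it to disk; return the byte count."""
    encoded = response_body.get("processed_file")
    if not encoded:
        raise ValueError("response has no processed_file field")
    # validate=True rejects characters outside the base64 alphabet.
    data = base64.b64decode(encoded, validate=True)
    with open(output_path, "wb") as output_file:
        output_file.write(data)
    return len(data)


def most_likely_language(response_body):
    """Return the key with the highest value in `languages_detected`, or None."""
    scores = response_body.get("languages_detected") or {}
    if not scores:
        return None
    return max(scores, key=scores.get)
```

With the `requests` example above, `response.json()` would be passed as `response_body`.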
