# Processing Files

> This guide will get you started with file deidentification.

<Info>
  [Connect with one of our privacy experts](https://getlimina.ai/contact-us/?utm_source=docs\&utm_medium=website) to run this code.
</Info>

Limina supports scanning a wide range of [different file types](/configuration-and-operations/working-with-files/supported-file-types) for PII and creating de-identified or redacted copies. Limina’s supported entity types work across every file type, and localized variants of PII (Personally Identifiable Information), PHI (Protected Health Information), and PCI (Payment Card Industry) entities are detected. Our Supported Languages and Supported Entity Types pages provide a more detailed look.

## How Does It Work?

Limina supports file processing through two endpoints, which accept either base64-encoded files or URIs: `/process/files/base64` and `/process/files/uri`.

### Base64

The `base64` endpoint is the recommended way to process files for most users, as there is no need to mount a volume into the container and ensure that permissions are set correctly. To use the `base64` endpoint, read the file into memory, base64-encode its contents, and send the encoded string to the endpoint. You also need to pass the MIME type of the file as a hint to the file processing pipeline. The [Supported File Types](/configuration-and-operations/working-with-files/supported-file-types) page lists the extensions and MIME types for both endpoints.
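
The steps above can be sketched as a small helper. This is a minimal illustration, not part of Limina's tooling: the function name `build_base64_payload` is hypothetical, and guessing the MIME type from the file extension with Python's `mimetypes` module is a local convenience assumed here. Always confirm the guessed value against the Supported File Types page.

```python
import base64
import mimetypes
from pathlib import Path


def build_base64_payload(path: str) -> dict:
    """Build the JSON body for /process/files/base64 from a local file.

    Hypothetical helper: the MIME type is guessed locally from the
    file extension and should be checked against the supported types.
    """
    file_path = Path(path)

    # Guess the MIME type from the extension (e.g. ".pdf" -> "application/pdf").
    content_type, _ = mimetypes.guess_type(file_path.name)
    if content_type is None:
        raise ValueError(f"Cannot infer a MIME type for {file_path.name!r}")

    # Read the whole file into memory and base64-encode it.
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    return {"file": {"data": encoded, "content_type": content_type}}
```

The returned dictionary has the same shape as the request body shown later on this page and can be passed as the `json=` argument of `requests.post`.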

### URI

Available on the container only, the `uri` endpoint is suitable for larger data volumes and has the following advantages over the base64 endpoint:

* No overhead of base64 encoding.
* No need to first read the file in memory.
* The processed file is saved automatically by the container.

API calls are made by pointing to a file on a mounted drive. The redacted contents are automatically saved at a user-specified location, with the `.redacted` suffix added to the original name. For example, given the file `/some/path/my-doc.pdf`, the `uri` endpoint redacts it and creates a file `my-doc.redacted.pdf` with the redacted contents at the location specified by the user. When using the `uri` endpoint, the file extension is used to determine the file type.
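
The naming rule can be reproduced locally to predict where a redacted file will land. The sketch below is an assumption-laden illustration: the helper name `expected_redacted_path` is hypothetical, and it assumes `.redacted` is inserted before the final extension only (the behavior for multi-part extensions such as `.tar.gz` is not described here).

```python
from pathlib import PurePosixPath


def expected_redacted_path(input_uri: str, output_dir: str) -> str:
    """Predict the output path for a file sent to the uri endpoint.

    Hypothetical helper; assumes ".redacted" goes before the last extension.
    """
    source = PurePosixPath(input_uri)
    # "my-doc.pdf" -> stem "my-doc", suffix ".pdf" -> "my-doc.redacted.pdf"
    redacted_name = f"{source.stem}.redacted{source.suffix}"
    return str(PurePosixPath(output_dir) / redacted_name)


print(expected_redacted_path("/some/path/my-doc.pdf", "/home/admin/output"))
# /home/admin/output/my-doc.redacted.pdf
```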

A mounted drive can be connected to remote object storage or NAS. For instructions on connecting a mounted drive to S3, see [Using remote storage](/configuration-and-operations/working-with-files/processing-files/mounted-storage).

<Warning>
  Passing a file with no extension or with the wrong extension to the `uri` endpoint may lead to unexpected behavior.
</Warning>
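
A simple pre-flight check can catch missing or unexpected extensions before a request is sent. The sketch below is illustrative only: `check_uri_extension` and `ALLOWED_EXTENSIONS` are hypothetical names, and the allow-list contains just `.pdf` as a placeholder; fill it from the Supported File Types page. It cannot detect a file whose extension does not match its actual contents.

```python
from pathlib import PurePosixPath

# Placeholder allow-list for illustration; take the real values from the
# Supported File Types page.
ALLOWED_EXTENSIONS = frozenset({".pdf"})


def check_uri_extension(uri: str, allowed: frozenset = ALLOWED_EXTENSIONS) -> str:
    """Return the lowercased extension of `uri`, or raise if it is unusable."""
    suffix = PurePosixPath(uri).suffix.lower()
    if not suffix:
        raise ValueError(f"{uri!r} has no file extension")
    if suffix not in allowed:
        raise ValueError(f"{uri!r} has an unlisted extension {suffix!r}")
    return suffix
```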

## Diving Deeper

### **Processing files with the `base64` endpoint**

When using the `/process/files/base64` endpoint, there is no need to mount a folder into the container.

```shell Docker Command wrap theme={"theme":"poimandres"}
docker run --rm -v <full path to your license.json file>:/app/license/license.json \
-p 8080:8080 -it crprivateaiprod.azurecr.io/deid:<version>
```

The file is first read into memory and its contents are base64-encoded; the encoded contents are then passed to the file processing endpoint. On Linux, the encoding can be done with the `base64` shell command. Assuming you have the file `sample.pdf` saved in the current folder:

<CodeGroup>
  ```json Request Body wrap lines theme={"theme":"poimandres"}
  {
    "file": {
      "data": "<base64-encoded contents of sample.pdf>",
      "content_type": "application/pdf"
    }
  }
  ```

  ```shell cURL wrap lines theme={"theme":"poimandres"}
  echo '{"file": {"data": "'$(base64 -w 0 sample.pdf)'", "content_type": "application/pdf"}}' \
  | curl --request POST --url 'http://localhost:8080/process/files/base64' \
  -H 'Content-Type: application/json' -d @- | jq -r .processed_file | base64 -d > 'sample.redacted.pdf'
  ```

  ```python Python wrap lines theme={"theme":"poimandres"}
  import base64
  import requests

  # Specify the input and output file paths
  filename_in = "sample.pdf"
  filename_out = "sample.redacted.pdf"

  # Read the file and do base64 encoding
  with open(filename_in, "rb") as f:
      b64_file_content = base64.b64encode(f.read())
      b64_file_content = b64_file_content.decode("utf-8")

  # Make the request and load the results as JSON
  r = requests.post(url="http://localhost:8080/process/files/base64", 
                    json={"file": {"data": b64_file_content, "content_type": "application/pdf"}})
  results = r.json()

  # Decode and write the file to disk
  with open(filename_out, "wb") as f:
      f.write(base64.b64decode(results["processed_file"]))
  ```

  ```python Python Client wrap lines theme={"theme":"poimandres"}
  from privateai_client import PAIClient
  from privateai_client.objects import request_objects
  import base64

  # Specify the input and output file paths
  filename_in = "sample.pdf"
  filename_out = "sample.redacted.pdf"

  file_type= "application/pdf"
  client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key='<YOUR API KEY>')

  # Read from file
  with open(filename_in, "rb") as b64_file:
      file_data = base64.b64encode(b64_file.read())
      file_data = file_data.decode("ascii")

  # Make the request
  file_obj = request_objects.file_obj(data=file_data, content_type=file_type)
  request_obj = request_objects.file_base64_obj(file=file_obj)
  resp = client.process_files_base64(request_object=request_obj)

  # Write to file
  with open(filename_out, 'wb') as redacted_file:
      processed_file = resp.processed_file.encode("ascii")
      processed_file = base64.b64decode(processed_file, validate=True)
      redacted_file.write(processed_file)
  ```
</CodeGroup>

Each of the examples above redacts the file contents. The response returns the redacted document as a base64-encoded string in the `processed_file` field.
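
Before writing the result to disk, it can help to validate the response. The following is a minimal sketch, assuming only what the examples above show: that the JSON response carries the redacted file in a `processed_file` string. The helper name `decode_processed_file` is hypothetical.

```python
import base64
import binascii


def decode_processed_file(results: dict) -> bytes:
    """Extract and decode the redacted file from a base64 endpoint response."""
    encoded = results.get("processed_file")
    if not isinstance(encoded, str):
        raise KeyError("response has no 'processed_file' string")
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        return base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("'processed_file' is not valid base64") from exc
```

With the `requests` example above, this would be called as `decode_processed_file(r.json())` after checking the HTTP status with `r.raise_for_status()`.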

<Info>
  An example Python script showing how to process files with Limina's Python client using the base64 route can be found [here](https://github.com/privateai/deid-examples/blob/main/python/examples/process_file_base64.py).
</Info>

<Warning>
  It is important that the proper MIME type is provided with the base64-encoded string. Failing to pass the proper MIME type may lead to unexpected behavior. Check out the [Supported File Types](/configuration-and-operations/working-with-files/supported-file-types) page for proper MIME types.
</Warning>

Check out the API reference for more details on the [base64 endpoint](/latest/process-files-base64).

### **Processing files with the `uri` endpoint**

To process files with the `/process/files/uri` endpoint you are required to mount a volume when starting the container.

In addition, the service requires access to a folder where the redacted files will be stored. This is done with the `PAI_OUTPUT_FILE_DIR` environment variable. This variable must point to a folder that is mounted into the container as output folder.

```shell Docker Command wrap theme={"theme":"poimandres"}
docker run --rm -v <full path to your license.json file>:/app/license/license.json \
-v <full path to files>:<path in container> \
-v <full path to output>:<path in container> \
-e PAI_OUTPUT_FILE_DIR=<path to mounted folder in container> \
-p 8080:8080 -it crprivateaiprod.azurecr.io/deid:<version>
```

The following example mounts the `files` folder in the `admin` home folder as the input location and the `output` folder as the output location.

```shell Docker Command wrap theme={"theme":"poimandres"}
docker run --rm -v /home/admin/license.json:/app/license/license.json \
-v /home/admin/files:/home/admin/files \
-v /home/admin/output:/home/admin/output \
-e PAI_OUTPUT_FILE_DIR=/home/admin/output \
-p 8080:8080 -it crprivateaiprod.azurecr.io/deid:3.1.0-cpu
```

<Warning>
  **Common Pitfall**

  Mounting a folder onto an existing OS or application folder in the container may lead to unexpected behavior.
</Warning>

<Info>
  An example Python script showing how to process files with Limina's Python client using the URI route can be found [here](https://github.com/privateai/deid-examples/blob/main/python/examples/process_file_uri.py).
</Info>

Once the container is running with the above command, you can redact files with:

<CodeGroup>
  ```json Request Body wrap lines theme={"theme":"poimandres"}
  {
    "uri": "/home/admin/files/sample.pdf"
  }
  ```

  ```shell cURL wrap lines theme={"theme":"poimandres"}
  echo '{"uri": "/home/admin/files/sample.pdf"}' \
  | curl --request POST --url 'http://localhost:8080/process/files/uri' \
  -H 'Content-Type: application/json' -d @- | jq .
  ```

  ```python Python wrap lines theme={"theme":"poimandres"}
  import requests

  PATH_TO_PDF_FILE = "/home/admin/files/sample.pdf"

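  response = requests.post(
      "http://localhost:8080/process/files/uri",
      json={
          "uri": PATH_TO_PDF_FILE
      }
  )

  # Fail loudly on HTTP errors, then show the JSON response
  response.raise_for_status()
  print(response.json())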
  ```

  ```python Python Client wrap lines theme={"theme":"poimandres"}
  from privateai_client import PAIClient
  from privateai_client.objects import request_objects

  # The uri endpoint is available on the container only, so point the client at it
  client = PAIClient(url="http://localhost:8080")
  filepath = "/home/admin/files/sample.pdf"
  req_obj = request_objects.file_uri_obj(uri=filepath)
  resp = client.process_files_uri(req_obj)

  # The container writes the redacted file to the folder set in PAI_OUTPUT_FILE_DIR
  ```
</CodeGroup>

Upon successful completion, the container saves the redacted file as `/home/admin/output/sample.redacted.pdf`.
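
In the example above, the host and container paths of each mount are identical, so a path inside the container is also valid on the host. When the two sides of a `-v` mount differ, a container path has to be translated back before the file can be opened on the host. The sketch below shows one way to do that; the helper `container_to_host` and the `/data/out` host folder are hypothetical.

```python
from pathlib import PurePosixPath


def container_to_host(container_path: str, mounts: dict[str, str]) -> str:
    """Translate a path inside the container to the matching host path.

    `mounts` maps host directory -> container directory, mirroring the
    `-v <host>:<container>` pairs passed to `docker run`.
    """
    target = PurePosixPath(container_path)
    for host_dir, container_dir in mounts.items():
        try:
            relative = target.relative_to(container_dir)
        except ValueError:
            continue  # path is not under this mount; try the next one
        return str(PurePosixPath(host_dir) / relative)
    raise ValueError(f"{container_path!r} is not under any mounted folder")


# Hypothetical mount: -v /data/out:/home/admin/output
print(container_to_host("/home/admin/output/sample.redacted.pdf",
                        {"/data/out": "/home/admin/output"}))
# /data/out/sample.redacted.pdf
```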

<Note>
  **A note on permissions**

  Files created by the container will have the owner and permissions of the user running the Docker service. This is commonly `root` in default installations. However, you can change the user running the container with the Docker `--user` option.

  This command runs the same container as the current user.

  ```shell Docker Command wrap theme={"theme":"poimandres"}
  docker run --rm -v /home/admin/license.json:/app/license/license.json \
  -e PAI_OUTPUT_FILE_DIR=/home/admin/output \
  -v /home/admin/files:/home/admin/files \
  -v /home/admin/output:/home/admin/output \
  --user $(id -u):$(id -g) \
  -p 8080:8080 -it crprivateaiprod.azurecr.io/deid:3.1.0-cpu
  ```
</Note>
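
To confirm that the `--user` option took effect, you can compare the owner of a redacted file with the current user. This is a minimal sketch for POSIX systems; the helper name `owned_by_current_user` is hypothetical.

```python
import os
from pathlib import Path


def owned_by_current_user(path: str) -> bool:
    """Return True if `path` is owned by the user running this script (POSIX only)."""
    return Path(path).stat().st_uid == os.getuid()
```

For example, `owned_by_current_user("/home/admin/output/sample.redacted.pdf")` would be expected to return `False` when the container ran as `root` and the script runs as a regular user.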

Check out the API reference for more details on the [uri endpoint](/latest/process-files-uri).
