> ## Documentation Index
> Fetch the complete documentation index at: https://docs.getlimina.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Processing XML Files

> This guide will get you started with XML deidentification.

Limina supports scanning XML files for PII and creating de-identified or redacted copies. Limina’s supported entity types function across each file type, with localized variants of different **PII** (Personally Identifiable Information) entities, **PHI** (Protected Health Information) entities, and **PCI** (Payment Card Industry) entities being detected. Our [Supported Languages](/languages) and [Supported Entity Types](/entities) page provides a more detailed look.

## How XML Files Are Processed

Similar to JSON files, XML files are processed using the method described in the [Structured Data Guide](/configuration-and-operations/working-with-files/structured-data). The output file retains its original format with redaction markers of the format "\[LABEL\_X]" in place of the detected PII.

## Constraints

<Info>
  Please consider writing a handler for your specific application using the [Structured Data Guide](/configuration-and-operations/working-with-files/structured-data) to get around any of the constraints listed below.
</Info>

* The file processing routes are synchronous; large files over 5MB in size may take a long time to process.
* Only node text contents and attributes are redacted. Node names are assumed to not contain PII and are not redacted.
* Entity detections numbering is consistent within individual values only, not across the entire file.
* The XML must be entirely human-readable content. Encoded content such as Base64 is not natively supported and may lead to inaccurate PII detection. Please process this content separately.
* External sources, such as hyperlinks, images, and references (including custom schemas) in the XML are not loaded, processed, or redacted and will be handled verbatim as text. If you require this type of data to be handled in the XML, please consider writing your own handler.
* Because tag and element values are modified, it is possible for the de-identified XML to not match the original file schema. For example, `<money currency="US">23</money>` will be redacted as `<money currency="[LOCATION_COUNTRY]">[MONEY]</money>`. Please adjust the [enabled entity types](/configuration-and-operations/entity-detection-and-redaction/customizing-detection) for your needs.

## Support Matrix

|           | CPU Container | GPU Container | Community API | Professional API |
| --------- | ------------- | ------------- | ------------- | ---------------- |
| Supported | Yes           | Yes           | Up to 250 KiB | No               |

## Sample Request

<Info>
  [Connect with one of our privacy experts](https://getlimina.ai/en/contact-us/?utm_source=docs\&utm_medium=website) to run this code.
</Info>

<CodeGroup>
  ```json Request Body wrap lines theme={"theme":"poimandres"}
  {
    "file": {
      "data": "<file_content_base64>",
      "content_type": "application/xml"
    },
    "entity_detection": {
      "return_entity": true
    }
  }
  ```

  ```shell curl wrap lines theme={"theme":"poimandres"}
  echo '{
            "file": {"data": "'$(base64 -w 0 sample.xml)'", 
            "content_type": "application/xml"}, 
            "entity_detection": {"return_entity": "True"}
        }' \
  | curl --request POST --url 'https://api.private-ai.com/community/v4/process/files/base64' \
         -H 'Content-Type: application/json' \
         -H 'x-api-key: <YOUR KEY HERE>' \
         -d @- \
         | jq -r .processed_file \
         | base64 -d > 'sample.redacted.xml'
  ```

  ```python python wrap lines theme={"theme":"poimandres"}
  import requests
  import base64

  file_url = "https://paidocumentation.blob.core.windows.net/$web/sample.xml"
  filename_out = "/path/to/output/sample.redacted.xml"
  file_content = requests.get(file_url).content
  file_content_base64 = base64.b64encode(file_content).decode()

  url = "https://api.private-ai.com/community/v4/process/files/base64"

  headers = {"Content-Type": "application/json", "x-api-key": "<INSERT API KEY>"}

  payload = {
    "file":{
      "data": file_content_base64,
      "content_type": "application/xml",
    },
    "entity_detection": {
      "return_entity": True
    }
  }

  response = requests.post(url, json=payload, headers=headers)
  with open(filename_out, "wb") as f:
      f.write(base64.b64decode(response.json()["processed_file"]))
  ```

  ```python Python Client wrap lines theme={"theme":"poimandres"}
  from privateai_client import PAIClient
  from privateai_client.objects import request_objects
  import base64

  filename_in = "sample.xml"
  filename_out = "sample.redacted.xml"

  file_type= "application/xml"
  client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

  with open(filename_in, "rb") as b64_file:
      file_data = base64.b64encode(b64_file.read())
      file_data = file_data.decode("ascii")

  file_obj = request_objects.file_obj(data=file_data, content_type=file_type)
  request_obj = request_objects.file_base64_obj(file=file_obj)
  resp = client.process_files_base64(request_object=request_obj)

  with open(filename_out, 'wb') as redacted_file:
      processed_file = resp.processed_file.encode("ascii")
      processed_file = base64.b64decode(processed_file, validate=True)
      redacted_file.write(processed_file)
  ```
</CodeGroup>

## Sample Response

```json Response wrap lines theme={"theme":"poimandres"}
{
  "processed_file": "Base64 Encoded File Content of the Redacted File",
  "processed_text": "string",
  "entities": "List[Entity]",
  "entities_present": true,
  "languages_detected": {"lang_1": 0.67, "lang_2": 0.74}
}
```
