Processing TXT Files - Limina Docs

Limina supports scanning TXT files for PII and creating de-identified or redacted copies. Limina’s supported entity types work across every file type, and localized variants of PII (Personally Identifiable Information), PHI (Protected Health Information), and PCI (Payment Card Industry) entities are detected. See Supported Languages and Supported Entity Types for more detail.

If you’d like to try it yourself, please sign up for an account to get a free API key.

How TXT Files Are Processed

TXT files are processed by reading the file contents verbatim and passing them through Limina’s text module. The resulting file contains the labelled and redacted version of the original contents.

Constraints

Limina currently supports only UTF-8 encoding for text files.
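
Because only UTF-8 is accepted, it can help to check a file's encoding locally before submitting it. The following is a minimal sketch in Python, assuming you already know the source encoding when a conversion is needed. `is_utf8` and `to_utf8` are hypothetical helper names and are not part of any Limina client.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known source encoding (assumed, not detected) to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")
```

For example, a Latin-1 file could be read with `open(path, "rb")`, checked with `is_utf8`, and passed through `to_utf8(data, "latin-1")` before base64 encoding.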

Support Matrix

	CPU Container	GPU Container	Community API	Professional API
Supported	Yes	Yes	Up to 250 KiB	No
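
For the Community API, a quick local size check can catch oversized files early. The sketch below assumes the 250 KiB limit applies to the raw file size; this page does not say whether the limit is measured before or after base64 encoding. `fits_community_limit` is a hypothetical helper.

```python
# 250 KiB expressed in bytes (1 KiB = 1024 bytes).
COMMUNITY_LIMIT_BYTES = 250 * 1024


def fits_community_limit(data: bytes, limit: int = COMMUNITY_LIMIT_BYTES) -> bool:
    """Return True if the raw file content is within the assumed size limit."""
    return len(data) <= limit
```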

Sample Request

Connect with one of our privacy experts to run this code.

{
  "file": {
    "data": "<file_content_base64>",
    "content_type": "text/plain"
  },
  "entity_detection": {
    "return_entity": true
  }
}

echo '{
          "file": {"data": "'$(base64 -w 0 sample.txt)'", 
          "content_type": "text/plain"}, 
          "entity_detection": {"return_entity": true}
      }' \
| curl --request POST --url 'https://api.getlimina.ai/community/v4/process/files/base64' \
       -H 'Content-Type: application/json' \
       -H 'x-api-key: <YOUR KEY HERE>' \
       -d @- \
       | jq -r .processed_file \
       | base64 -d > 'sample.redacted.txt'

import requests
import base64

file_url = "https://paidocumentation.blob.core.windows.net/$web/sample.txt"
filename_out = "/path/to/output/sample.redacted.txt"
file_content = requests.get(file_url).content
file_content_base64 = base64.b64encode(file_content).decode()

url = "https://api.getlimina.ai/community/v4/process/files/base64"

headers = {"Content-Type": "application/json", "x-api-key": "<INSERT API KEY>"}

payload = {
  "file":{
    "data": file_content_base64,
    "content_type": "text/plain",
  },
  "entity_detection": {
    "return_entity": True
  }
}

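response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()  # fail early on HTTP errors instead of decoding an error body
with open(filename_out, "wb") as f:
    # The redacted file comes back base64-encoded in "processed_file"
    f.write(base64.b64decode(response.json()["processed_file"]))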

from privateai_client import PAIClient
from privateai_client.objects import request_objects
import base64

filename_in = "sample.txt"
filename_out = "sample.redacted.txt"

file_type = "text/plain"
client = PAIClient(url="https://api.getlimina.ai/community/v4/", api_key="<YOUR API KEY>")

with open(filename_in, "rb") as b64_file:
    file_data = base64.b64encode(b64_file.read())
    file_data = file_data.decode("ascii")

file_obj = request_objects.file_obj(data=file_data, content_type=file_type)
request_obj = request_objects.file_base64_obj(file=file_obj)
resp = client.process_files_base64(request_object=request_obj)

with open(filename_out, 'wb') as redacted_file:
    processed_file = resp.processed_file.encode("ascii")
    processed_file = base64.b64decode(processed_file, validate=True)
    redacted_file.write(processed_file)

Sample Response

{
  "processed_file": "Base64 Encoded File Content of the Redacted File",
  "processed_text": "string",
  "entities": "List[Entity]",
  "entities_present": true,
  "languages_detected": {"lang_1": 0.67, "lang_2": 0.74}
}
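
Here is a short sketch of reading such a response in Python. It assumes the body has the shape shown above, that `processed_file` decodes to UTF-8 text (reasonable for TXT input, but not stated on this page), and that `languages_detected` maps language codes to scores. The helper names are hypothetical.

```python
import base64
from typing import Optional


def decode_processed_file(response_body: dict) -> str:
    """Decode the base64 "processed_file" field into text (UTF-8 assumed)."""
    raw = base64.b64decode(response_body["processed_file"], validate=True)
    return raw.decode("utf-8")


def top_language(response_body: dict) -> Optional[str]:
    """Return the language with the highest score, or None if none are reported."""
    scores = response_body.get("languages_detected") or {}
    return max(scores, key=scores.get) if scores else None
```

With the `requests` sample above, these would be called as `decode_processed_file(response.json())` and `top_language(response.json())`.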
