# Python Client

> Documentation for developing using Limina's Python client.

This document describes how to use Limina's Python client to interact with the container or cloud API. In addition to this guide, you might find the [GitHub repository](https://github.com/privateai/pai-thin-client/) helpful. It contains further examples and usage options.

## Installation

The Python client is available on [pypi.org](https://pypi.org/project/privateai-client/) and can be installed with pip:

```shell Pip Command theme={"theme":"poimandres"}
pip install privateai_client
```

## Quickstart

```python Python Client lines wrap theme={"theme":"poimandres"}
from privateai_client import PAIClient
from privateai_client import request_objects

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key='<YOUR API KEY>')
text_request = request_objects.process_text_obj(text=["My sample name is John Smith"])
response = client.process_text(text_request)

print(text_request.text)
print(response.processed_text)
```

Output:

```text Output theme={"theme":"poimandres"}
['My sample name is John Smith']
['My sample name is [NAME_1]']
```

## Working with the Client

### Initializing the Client for self-hosted container

The Limina client requires a scheme, a host, and an optional port to initialize. Alternatively, a full URL can be used. Once created, the connection can be tested with the client's `ping` function.

```python Python Client lines theme={"theme":"poimandres"}
from privateai_client import PAIClient
scheme = 'http'
host = 'localhost'
port = '8080'
client = PAIClient(scheme, host, port)

client.ping()


url = "http://localhost:8080"
client = PAIClient(url=url)

client.ping()
```

Output:

```text Output theme={"theme":"poimandres"}
True
True
```
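
When a self-hosted container is still starting up, a single `ping` may fail even though the service becomes available a moment later. The sketch below retries a ping callable a few times before giving up. The helper name `wait_until_ready` and its parameters are hypothetical and not part of the client; the sketch only assumes that the callable returns a truthy value on success.

```python
import time
from typing import Callable


def wait_until_ready(
    ping: Callable[[], bool],
    attempts: int = 5,
    delay_seconds: float = 1.0,
) -> bool:
    """Call `ping` up to `attempts` times and report whether it ever succeeded."""
    for attempt in range(attempts):
        try:
            if ping():
                return True
        except Exception:
            # Treat errors (for example a refused connection) as "not ready yet".
            pass
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


# Stand-in for client.ping, used here so the sketch runs without a container.
fake_results = iter([False, False, True])
ready = wait_until_ready(lambda: next(fake_results), attempts=3, delay_seconds=0)
print(ready)
```

With a real client, the call would look like `wait_until_ready(client.ping)`.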

##### Note: The container is hosted with your provisioned application license and does not manage authentication to the API or authorization of API requests. Access to the container is at the discretion of the user. For recommendations on how to deploy in an enterprise context including authorized use, please contact us.

### Initializing the Client for our cloud-API offering

To access the cloud API, you need to authenticate with your API key. You can get one from the [customer portal](https://portal.getlimina.ai/).

```python Python Client lines wrap theme={"theme":"poimandres"}
from privateai_client import PAIClient
# Adding credentials on initialization
client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key='<YOUR API KEY>')

# Adding credentials after initialization
client = PAIClient(url="https://api.private-ai.com/community/v4/")
client.ping()
client.add_api_key('<YOUR API KEY>')
client.ping()
```

Output:

```text Output theme={"theme":"poimandres"}
The request returned with a 401 Unauthorized
True
```
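
To avoid hard-coding the API key in source files, it can be read from the environment before the client is created. This is a minimal sketch: the helper `load_api_key` and the variable name `LIMINA_API_KEY` are assumptions made for illustration, not something the client defines.

```python
import os


def load_api_key(env_var: str = "LIMINA_API_KEY") -> str:
    """Return the API key stored in the given environment variable."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return api_key


# Demonstration only: a real key would be set outside the program.
os.environ["LIMINA_API_KEY"] = "<YOUR API KEY>"
print(load_api_key())
```

The returned value can then be passed as `api_key=load_api_key()` when constructing `PAIClient`.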

### Making Requests

Once initialized, the client can be used to make any request listed in the [API documentation](/latest/process-text).

Available requests:

| Client Function          | Endpoint                |
| ------------------------ | ----------------------- |
| `get_version()`          | `/`                     |
| `ping()`                 | `/healthz`              |
| `get_metrics()`          | `/metrics`              |
| `get_diagnostics()`      | `/diagnostics`          |
| `ner_text()`             | `/ner/text`             |
| `process_text()`         | `/process/text`         |
| `analyze_text()`         | `/analyze/text`         |
| `process_files_uri()`    | `/process/files/uri`    |
| `process_files_base64()` | `/process/files/base64` |
| `bleep()`                | `/bleep`                |

Requests can be made using dictionaries:

```python Python Client lines wrap theme={"theme":"poimandres"}
sample_text = ["This is John Smith's sample dictionary request"]
text_dict_request = {"text": sample_text}

response = client.process_text(text_dict_request)
print(response.processed_text)
```

Output:

```text Output wrap theme={"theme":"poimandres"}
["This is [NAME_1]'s sample dictionary request"]
```

or using built-in request objects:

```python Python Client lines wrap theme={"theme":"poimandres"}
from privateai_client import request_objects

sample_text = "This is John Smith's sample process text object request"
text_request_object =  request_objects.process_text_obj(text=[sample_text])

response = client.process_text(text_request_object)
print(response.processed_text)
```

Output:

```text Output wrap theme={"theme":"poimandres"}
["This is [NAME_1]'s sample process text object request"]
```

## Request Objects

Request objects are a simple way of creating request bodies without the tedium of writing dictionaries. Every POST request (as listed in the [Limina API documentation](/latest/process-text)) has its own request object.

```python Python Client lines wrap theme={"theme":"poimandres"}
from privateai_client import request_objects

sample_obj = request_objects.file_uri_obj(uri='path/to/file.jpg')
sample_obj.uri
```

Output:

```text Output wrap theme={"theme":"poimandres"}
'path/to/file.jpg'
```

Additionally there are request objects for each nested dictionary of a request:

```python Python Client lines wrap theme={"theme":"poimandres"}
from privateai_client import request_objects

sample_text = "This is John Smith's sample process text object request where names won't be removed"

# sub-dictionary of entity_detection
sample_entity_type_selector = request_objects.entity_type_selector_obj(type="DISABLE", value=['NAME', 'NAME_GIVEN', 'NAME_FAMILY'])

# sub-dictionary of a process text request
sample_entity_detection = request_objects.entity_detection_obj(entity_types=[sample_entity_type_selector])

# request object created using the sub-dictionaries
sample_request = request_objects.process_text_obj(text=[sample_text], entity_detection=sample_entity_detection)
response = client.process_text(sample_request)
print(response.processed_text)
```

Output:

```text Output wrap theme={"theme":"poimandres"}
["This is John Smith's sample process text object request where names won't be removed"]
```

### Building Request Objects

Request objects can be initialized by passing in all the values required for the request as arguments, or from a dictionary, using the object's `fromdict()` function:

```python Python Client lines wrap theme={"theme":"poimandres"}
# Passing arguments
sample_data = "JVBERi0xLjQKJdPr6eEKMSAwIG9iago8PC9UaXRsZSAoc2FtcGxlKQovUHJvZHVj..."
sample_content_type = "application/pdf"

sample_file_obj = request_objects.file_obj(data=sample_data, content_type=sample_content_type)

# Passing a dictionary using .fromdict()
sample_dict = {"data": "JVBERi0xLjQKJdPr6eEKMSAwIG9iago8PC9UaXRsZSAoc2FtcGxlKQovUHJvZHVj...",
               "content_type": "application/pdf"}

sample_file_obj2 = request_objects.file_obj.fromdict(sample_dict)
```

Request objects can also be formatted as dictionaries, using the request object's `to_dict()` function:

```python lines wrap theme={"theme":"poimandres"}
from privateai_client import request_objects

sample_text = "Sample text."
# Create the nested request objects
sample_entity_type_selector = request_objects.entity_type_selector_obj(type="DISABLE", value=['HIPAA_SAFE_HARBOR'])
sample_entity_detection = request_objects.entity_detection_obj(entity_types=[sample_entity_type_selector])
# Create the request object
sample_request = request_objects.process_text_obj(text=[sample_text], entity_detection=sample_entity_detection)

# All nested request objects are also formatted
print(sample_request.to_dict())
```

Output:

```python Output wrap lines theme={"theme":"poimandres"}
{
 'text': ['Sample text.'],
 'link_batch': False,
 'entity_detection': {'accuracy': 'high', 'entity_types': [{'type': 'DISABLE', 'value': ['HIPAA_SAFE_HARBOR']}], 'filter': [], 'return_entity': True},
 'processed_text': {'type': 'MARKER', 'pattern': '[UNIQUE_NUMBERED_ENTITY_TYPE]'}
}
```

## Sample Use

### Processing a directory of files with URI route

```python Python Client lines wrap theme={"theme":"poimandres"}
from privateai_client import PAIClient
from privateai_client.objects import request_objects
import os
import logging

file_dir = "/path/to/file/directory"
client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key='<YOUR API KEY>')
for file_name in os.listdir(file_dir):
    filepath = os.path.join(file_dir, file_name)
    if not os.path.isfile(filepath):
        continue
    req_obj = request_objects.file_uri_obj(uri=filepath)
    # NOTE: this method of file processing requires the container to have the input and output directories mounted
    resp = client.process_files_uri(req_obj)
    if not resp.ok:
        logging.error(f"response for file {file_name} returned with {resp.status_code}")
```

### Processing a file with Base64 route

```python Python Client lines wrap theme={"theme":"poimandres"}
from privateai_client import PAIClient
from privateai_client.objects import request_objects
import base64
import os
import logging

file_dir = "/path/to/your/file"
file_name = 'sample_file.pdf'
filepath = os.path.join(file_dir, file_name)
file_type = "type/of_file"  # e.g. application/pdf
client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key='<YOUR API KEY>')

# Read from file
with open(filepath, "rb") as b64_file:
    file_data = base64.b64encode(b64_file.read())
    file_data = file_data.decode("ascii")

# Make the request
file_obj = request_objects.file_obj(data=file_data, content_type=file_type)
request_obj = request_objects.file_base64_obj(file=file_obj)
resp = client.process_files_base64(request_object=request_obj)
if not resp.ok:
    logging.error(f"response for file {file_name} returned with {resp.status_code}")

# Write to file
with open(os.path.join(file_dir,f"redacted-{file_name}"), 'wb') as redacted_file:
    processed_file = resp.processed_file.encode("ascii")
    processed_file = base64.b64decode(processed_file, validate=True)
    redacted_file.write(processed_file)
```
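
The base64 read and write steps above can be factored into two small helpers. This is a sketch using only the standard library; the function names `read_file_as_base64` and `write_base64_to_file` are hypothetical and not part of the client. The demonstration at the end round-trips a temporary file instead of calling the API.

```python
import base64
import tempfile
from pathlib import Path


def read_file_as_base64(path: str) -> str:
    """Read a file and return its content as an ASCII base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def write_base64_to_file(data: str, path: str) -> None:
    """Decode an ASCII base64 string and write the raw bytes to a file."""
    Path(path).write_bytes(base64.b64decode(data.encode("ascii"), validate=True))


# Round trip on a temporary file, without any API call.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "sample.bin"
    source.write_bytes(b"%PDF-1.4 sample bytes")
    encoded = read_file_as_base64(str(source))
    target = Path(tmp) / "copy.bin"
    write_base64_to_file(encoded, str(target))
    round_trip_ok = target.read_bytes() == source.read_bytes()

print(round_trip_ok)
```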

### Bleep an audio file

```python Python Client lines wrap theme={"theme":"poimandres"}
from privateai_client import PAIClient
from privateai_client.objects import request_objects
import base64
import os
import logging

file_dir = "/path/to/your/file"
file_name = "test_audio.mp3"
filepath = os.path.join(file_dir, file_name)
file_type = "audio/mp3"  # or audio/wav
client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key='<YOUR API KEY>')

# Read and base64-encode the audio file
with open(filepath, "rb") as b64_file:
    file_data = base64.b64encode(b64_file.read())
    file_data = file_data.decode("ascii")

file_obj = request_objects.file_obj(data=file_data, content_type=file_type)
timestamp = request_objects.timestamp_obj(start=1.12, end=2.14)
request_obj = request_objects.bleep_obj(file=file_obj, timestamps=[timestamp])

resp = client.bleep(request_object=request_obj)
if not resp.ok:
    logging.error(f"response for file {file_name} returned with {resp.status_code}")
with open(os.path.join(file_dir,f"redacted-{file_name}"), 'wb') as redacted_file:
    processed_file = resp.bleeped_file.encode("ascii")
    processed_file = base64.b64decode(processed_file, validate=True)
    redacted_file.write(processed_file)
```

## Analyze Text Post-Processing

The [`analyze/text`](/latest/analyze-text) route returns rich, structured detections you can post-process with the Limina Python client. It is a route specifically developed for text understanding. For more details on its capabilities, refer to the [analyze/text documentation](/configuration-and-operations/advanced-features/analyze-text). In this section, we describe how the Python client can be used to post-process the analyze text response. The Python client provides utilities to iterate through detected entities and apply transformation rules, such as masking, pseudonymizing, validating, or normalizing values.

The following example introduces the required pieces for post-processing, which we describe in detail.

```python Python Client lines wrap theme={"theme":"poimandres"}
# This code assumes that you have a Limina API key for the cloud API. To use a local container instead,
# pass its URL (for example http://localhost:8080) to PAIClient.
# It also assumes that you have installed the Limina python client.
from privateai_client import PAIClient
from privateai_client.components import AnalyzeTextRequest
from privateai_client.post_processing import deidentify_text
from privateai_client.post_processing.processors import MarkerEntityProcessor

client = PAIClient(
    url="https://api.private-ai.com/community/v4/", api_key="<YOUR-API-KEY>"
)

text = [
    "Jenna is a 32 year old female diagnosed with asthma."
]

request = {
    "text": text,
    "locale": "en-US",
    "entity_detection": {
        "accuracy": "high",
        "entity_types": [{"type": "ENABLE", "value": ["AGE", "NAME"]}],
    },
}

text_request = AnalyzeTextRequest.fromdict(request)
resp = client.analyze_text(text_request)

# THIS IS THE CUSTOM LOGIC TO IMPLEMENT
class AgeBucketEntityProcessor:
    def __init__(self, bucket_size: int = 5):
        self.bucket_size = bucket_size

    def __call__(self, entity: dict) -> str:
        age = entity["analysis_result"].get("formatted")
        if not age:
            return "[%-%]"
        start = (age // self.bucket_size) * self.bucket_size
        end = start + self.bucket_size
        return f"[{start}-{end}]"


entity_processors = {"AGE": AgeBucketEntityProcessor(bucket_size=10)}

deidentified_texts = deidentify_text(
    text,
    resp,
    entity_processors=entity_processors,
    default_processor=MarkerEntityProcessor(),
)
for t in deidentified_texts:
    print(t)
```

The output of this code replaces the age with the corresponding range.

```text Output wrap theme={"theme":"poimandres"}
[NAME_1] is a [30-40] year old female diagnosed with asthma.
```

At the core of this workflow is the `deidentify_text` function which allows for entity replacements by invoking various entity processors. Each processor defines the exact behavior for a given entity type, making it easy to implement custom redaction tailored to your use case.

The function `deidentify_text(...)` takes the original texts plus the `analyze/text` response, walks through every detected entity in left-to-right order, and replaces each entity span using the appropriate processor. It also automatically adjusts the character offsets of the entity locations after their replacements.

```python Python Client lines wrap theme={"theme":"poimandres"}
from typing import Callable
from privateai_client.components import AnalyzeTextResponse

EntityProcessor = Callable[[dict], str]

def deidentify_text(
    text: list[str],
    response: AnalyzeTextResponse,
    entity_processors: dict[str, EntityProcessor],
    default_processor: EntityProcessor,
) -> list[str]:
    ...
```

* `text` - The original list of text messages that were passed into `PAIClient.analyze_text()`
* `response` - The structured response returned by the `analyze_text` call
* `entity_processors` - Mapping of entity type to entity processor, e.g. `{"DATE": redact_date, "CREDIT_CARD": redact_credit_card}`
  * Each processor is a callable that accepts an entity dictionary and returns the replacement string for that entity.
  * Invoked when the entity `best_label` matches a key in this dictionary.
* `default_processor` - A fallback processor applied to all entity types not explicitly listed in `entity_processors`. This ensures every entity is handled, even if you only configure custom processors for a subset of the enabled entities.

The function returns a list of de-identified text strings.
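
To make the replacement step more concrete, here is a simplified, standalone illustration of the idea for a single text. It is not the client's implementation: `replace_entities` is a hypothetical helper, and the entity dictionaries are hand-built with only the fields used elsewhere on this page (`text`, `best_label`, and `location` with `stt_idx` and `end_idx`). Instead of adjusting offsets after each replacement, the sketch assembles the output from the untouched text between entities, which gives the same result.

```python
from typing import Callable

EntityProcessor = Callable[[dict], str]


def replace_entities(
    text: str,
    entities: list[dict],
    processors: dict[str, EntityProcessor],
    default: EntityProcessor,
) -> str:
    """Replace each entity span, left to right, with its processor's output."""
    pieces = []
    cursor = 0
    for entity in sorted(entities, key=lambda e: e["location"]["stt_idx"]):
        start = entity["location"]["stt_idx"]
        end = entity["location"]["end_idx"]
        pieces.append(text[cursor:start])  # text before this entity
        processor = processors.get(entity["best_label"], default)
        pieces.append(processor(entity))   # replacement for the entity span
        cursor = end
    pieces.append(text[cursor:])           # text after the last entity
    return "".join(pieces)


# Hand-built entities for illustration; real ones come from analyze/text.
sample_text = "Jenna is 32."
sample_entities = [
    {"text": "32", "best_label": "AGE", "location": {"stt_idx": 9, "end_idx": 11}},
    {"text": "Jenna", "best_label": "NAME", "location": {"stt_idx": 0, "end_idx": 5}},
]
result = replace_entities(
    sample_text,
    sample_entities,
    processors={"AGE": lambda entity: "[30-40]"},
    default=lambda entity: f"[{entity['best_label']}]",
)
print(result)  # [NAME] is [30-40].
```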

#### Entity Processors

Processors are callables (`Callable[[dict], str]`) that take a detected entity dictionary and return the replacement text for that span. A processor can be as simple as a function, or a class that implements the `__call__` method. In the example above, we created the `AgeBucketEntityProcessor`, which places an `AGE` entity into a bucket.

The potential use cases are broad. A few common examples include:

* Hide all but the last 4 digits in a `CREDIT_CARD` number;
* Keep only the year in a `DATE` entity;
* Shift all dates by an offset in a `DATE` entity;
* Replace names with initials only;
* Preserve email domain, mask the username in an `EMAIL_ADDRESS` entity;
* Leave only the less sensitive characters in a `LOCATION_ZIP` code;
* Redact entities based on fuzzy similarity to a list of identifiable terms;
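
As an illustration of one item from this list, the following sketch masks the username of an email address while preserving the domain. The function name `mask_email_username` is hypothetical, and the sketch assumes only that the entity dictionary carries the matched span under the `text` key, as in the other examples on this page.

```python
def mask_email_username(entity: dict) -> str:
    """Mask the part before the last '@' and keep the domain."""
    text = entity["text"]
    username, separator, domain = text.rpartition("@")
    if not separator or not username:
        # Not a well-formed address: mask the whole span to stay on the safe side.
        return "#" * len(text)
    return "#" * len(username) + "@" + domain


# Hand-built entity for illustration only.
print(mask_email_username({"text": "jane.doe@example.com"}))  # ########@example.com
```

Such a function could be registered as `{"EMAIL_ADDRESS": mask_email_username}` in `entity_processors`.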

#### Built-in processors

In addition to writing your own processors, the client ships with three built-in entity processors, with more planned in future releases:

* `MaskEntityProcessor` and `MarkerEntityProcessor` - intended to be used for default processing.
* `FuzzyMatchEntityProcessor` - configurable processor that matches entities against a list of known words using Damerau–Levenshtein distance. It can automatically catch misspellings or near-duplicates, and be set to allow or block specific entities while doing the opposite for all others of the same type. A complete [example](#fuzzy-matching-against-list-of-known-words) is provided below.

The sections below showcase how some of these can be implemented in more detail.

### Custom redaction of credit card numbers

```python Python Client lines wrap theme={"theme":"poimandres"}
# This code assumes that you have installed the Limina python client.
from privateai_client.post_processing import deidentify_text
from privateai_client.post_processing.processors import MarkerEntityProcessor
from privateai_client import PAIClient
from privateai_client.components import AnalyzeTextRequest

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key='<YOUR API KEY>')

text = [
    "Okay, hang on just a second because I got to get it. Okay, it is 6578-7790-4346-2237. Expiration. 1224.",
    "All right, I'm ready. 800 678-457-7896. Expiration is one. 224.",
    "CC_type: Diners Club International RuPay Visa JCB Amex CCN: 30569309025904 4242424242424242 4222222222222 6172873484776530 378282246310005 CC_CVC: 480 902 182 765 143 CC_Expiredate: 5/28 6/67 12/67 11/29 9/70",
]

request = {"text": text, "locale": "en-US", "entity_detection": {"accuracy": "high", "entity_types": [{"type": "ENABLE", "value": ["CREDIT_CARD"]}]}}

text_request = AnalyzeTextRequest.fromdict(request)
resp = client.analyze_text(text_request)

# THIS IS THE CUSTOM LOGIC TO IMPLEMENT
def redact_credit_card(entity) -> str:
    """Redacts credit card numbers"""

    analysis_result = entity["analysis_result"]
    for assertion in analysis_result["validation_assertions"]:
        if assertion["provider"] == "luhn":
            if assertion["status"] == "valid":
                return f"[{'*' * 12}{analysis_result['formatted'][-4:]}]"
            else:
                return f"{analysis_result['formatted']} [INVALID]"
    return f"{entity['text']}"


entity_processors = {"CREDIT_CARD": redact_credit_card}

deidentified_text = deidentify_text(text, resp, entity_processors=entity_processors, default_processor=MarkerEntityProcessor())
for example in deidentified_text:
    print(example)
```

The `redact_credit_card` function contains the necessary logic to redact credit card numbers as follows:

* If the credit card number is valid, hide everything except the last four characters (which could include spaces).
* If the number parses as a credit card number but fails the Luhn check, it is invalid. In this case, leave the number visible and append an INVALID tag. This makes invalid credit card numbers easier to find in a later review.
* If the number fails to parse as a credit card number, leave it unchanged. The code assumes that it is not a credit card number.

The above code output looks like this:

```text Output wrap theme={"theme":"poimandres"}
Okay, hang on just a second because I got to get it. Okay, it is 6578 7790 4346 2237 [INVALID]. Expiration. 1224.
All right, I'm ready. 800 678-457-7896. Expiration is one. 224.
CC_type: Diners Club International RuPay Visa JCB Amex CCN: [************ 904] [************4242] [************ 222] 6172 8734 8477 6530 [INVALID] [************ 005] CC_CVC: 480 902 182 765 143 CC_Expiredate: 5/28 6/67 12/67 11/29 9/70
```

Notice that the credit card number in the first example was not redacted; an INVALID marker was added right after it instead. In the second example, the `800 678-457-7896` entity was left unredacted, as expected: it is possibly a phone number rather than a credit card number. Finally, the last example contains several valid credit card numbers and a single invalid one. The valid numbers were masked except for their last four characters.
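
For readers unfamiliar with the Luhn check mentioned above, the following standalone sketch shows the standard checksum algorithm. It is for illustration only and makes no claim about how the service computes its `luhn` validation assertion; the function name `luhn_valid` is hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4242424242424242"))     # True
print(luhn_valid("6578-7790-4346-2237"))  # False
```

Both results agree with the output above: the first number was masked as valid and the second was tagged INVALID.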

### Custom redaction of dates

```python Python Client lines wrap theme={"theme":"poimandres"}
# This code assumes that you have installed the Limina python client.
from privateai_client.post_processing import deidentify_text
from privateai_client.post_processing.processors import MarkerEntityProcessor
from privateai_client import PAIClient
from privateai_client.components import AnalyzeTextRequest

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key='<YOUR API KEY>')

text = [
    "$MDT $MRK $QRVO $TSS &amp; 5 more stock picks for LONG swings:  https://t.co/CbkieXxqoR (July 10 2018) https://t.co/eit53RUY4g",
    "Short sale volume (not short interest) for $KBE on 2018-07-09 is 42%. https://t.co/7pWbgjJ8Ag $FOXA 38% $TVIX 34% $LITE 54% $HIG 60%",
    "$WLTW high OI range is 160 to 155 for option expiration 07/20/2018 #options https://t.co/BnVElKBKkJ",
]

request = {
    "text": text,
    "locale": "en-US",
    "entity_detection": {"accuracy": "high", "entity_types": [{"type": "ENABLE", "value": ["DATE", "DOB", "DAY", "MONTH", "YEAR"]}]},
}

text_request = AnalyzeTextRequest.fromdict(request)
resp = client.analyze_text(text_request)


# THIS IS THE CUSTOM LOGIC TO IMPLEMENT
def redact_date(entity) -> str:
    """Redacts days and months from dates"""

    offset = entity["location"]["stt_idx"]
    text = entity["text"]
    for subtype in entity["analysis_result"]["subtypes"]:
        if subtype["label"] in ["DAY", "MONTH"] and "location" in subtype:
            stt = subtype["location"]["stt_idx"] - offset
            end = subtype["location"]["end_idx"] - offset
            text = text[:stt] + "#" * (end - stt) + text[end:]

    return text


entity_processors = {"DATE": redact_date, "DOB": redact_date}

deidentified_text = deidentify_text(text, resp, entity_processors=entity_processors, default_processor=MarkerEntityProcessor())
for example in deidentified_text:
    print(example)
```

The output of this request is provided below:

```text Output wrap theme={"theme":"poimandres"}
$MDT $MRK $QRVO $TSS &amp; 5 more stock picks for LONG swings:  https://t.co/CbkieXxqoR (#### ## 2018) https://t.co/eit53RUY4g
Short sale volume (not short interest) for $KBE on 2018-##-## is 42%. https://t.co/7pWbgjJ8Ag $FOXA 38% $TVIX 34% $LITE 54% $HIG 60%
$WLTW high OI range is 160 to 155 for option expiration ##/##/2018 #options https://t.co/BnVElKBKkJ
```

Notice how the dates have been partially redacted. A similar approach can be used to instead shift the dates. To do so, simply replace the date processor in the above code with this one:

```python Python Client lines wrap theme={"theme":"poimandres"}
import random
from datetime import datetime, timedelta


def redact_date(entity) -> str:
    """Shifts date by a random number of weeks (0 to 20 weeks)"""

    random_week_offset = random.randint(0, 20)
    if "formatted" in entity["analysis_result"]:
        formatted_datetime = datetime.fromisoformat(entity["analysis_result"]["formatted"])
        return str((formatted_datetime+timedelta(weeks=random_week_offset)).date())
    else:
        return entity["text"]
```

This is an example output of this date processor.

```text Output wrap theme={"theme":"poimandres"}
$MDT $MRK $QRVO $TSS &amp; 5 more stock picks for LONG swings:  https://t.co/CbkieXxqoR (2018-07-17) https://t.co/eit53RUY4g
Short sale volume (not short interest) for $KBE on 2018-08-20 is 42%. https://t.co/7pWbgjJ8Ag $FOXA 38% $TVIX 34% $LITE 54% $HIG 60%
$WLTW high OI range is 160 to 155 for option expiration 2018-09-28 #options https://t.co/BnVElKBKkJ
```

Notice how the dates are replaced with dates that have been shifted by a random number of weeks.
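
Because the offset is drawn separately for every entity, the intervals between dates in the same text are not preserved. If that matters, one option is to fix the offset once and reuse it for all dates. The sketch below shows this variant; `make_date_shifter` is a hypothetical helper, and it assumes, as the processor above does, that `analysis_result["formatted"]` holds an ISO-formatted date string.

```python
from datetime import datetime, timedelta
from typing import Callable


def make_date_shifter(weeks: int) -> Callable[[dict], str]:
    """Build a processor that shifts every date by the same number of weeks."""
    shift = timedelta(weeks=weeks)

    def shift_date(entity: dict) -> str:
        formatted = entity.get("analysis_result", {}).get("formatted")
        if not formatted:
            # No parsed date available: leave the original text unchanged.
            return entity["text"]
        return str((datetime.fromisoformat(formatted) + shift).date())

    return shift_date


# Hand-built entity for illustration; real entities come from analyze/text.
sample_entity = {"text": "July 10 2018", "analysis_result": {"formatted": "2018-07-10"}}
shift_two_weeks = make_date_shifter(weeks=2)
print(shift_two_weeks(sample_entity))  # 2018-07-24
```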

### Custom redaction of ages

```python Python Client lines wrap theme={"theme":"poimandres"}
# This code assumes that you have installed the Limina python client.
from privateai_client.post_processing import deidentify_text
from privateai_client.post_processing.processors import MarkerEntityProcessor
from privateai_client import PAIClient
from privateai_client.components import AnalyzeTextRequest

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key='<YOUR API KEY>')

text = [
    "A 32-year old Black female German citizen living in Germany wants to travel to the United States for leisure.",
    "West Point Public School division provides school-based preschool services for children from two through nine years of age who are children at risk and children with identified disabilities or delays.",
]

request = {"text": text, "locale": "en-US", "entity_detection": {"accuracy": "high", "entity_types": [{"type": "ENABLE", "value": ["AGE"]}]}}

text_request = AnalyzeTextRequest.fromdict(request)
resp = client.analyze_text(text_request)


# THIS IS THE CUSTOM LOGIC TO IMPLEMENT
def redact_age(entity) -> str:
    """Round to the closest multiple of ten"""

    if "formatted" in entity["analysis_result"]:
        age = entity["analysis_result"]["formatted"]
        return str(int(round(age * 10, -2) / 10))
    else:
        # No parsed age available: mask the entity instead
        return "#"


entity_processors = {"AGE": redact_age}

deidentified_text = deidentify_text(text, resp, entity_processors=entity_processors, default_processor=MarkerEntityProcessor())
for example in deidentified_text:
    print(example)
```

The output of this code shows that ages have been bucketed to the closest multiple of ten.

```text Output wrap theme={"theme":"poimandres"}
A 30-year old Black female German citizen living in Germany wants to travel to the United States for leisure.
West Point Public School division provides school-based preschool services for children from 0 through 10 years of age who are children at risk and children with identified disabilities or delays.
```
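
Rounding and bucketing behave differently near the boundaries, which is worth checking before choosing one. The sketch below compares the rounding expression used in `redact_age` with the floor-based bucketing of the `AgeBucketEntityProcessor` shown earlier, on plain integers. Note that Python's `round` rounds ties to the nearest even multiple, so 25 becomes 20 while 35 becomes 40. The helper names are hypothetical.

```python
def round_to_nearest_ten(age: int) -> int:
    """Same rounding expression as in redact_age."""
    return int(round(age * 10, -2) / 10)


def bucket(age: int, size: int = 10) -> str:
    """Floor-based bucket, as in AgeBucketEntityProcessor."""
    start = (age // size) * size
    return f"[{start}-{start + size}]"


for age in (25, 32, 35):
    print(age, round_to_nearest_ten(age), bucket(age))
# 25 20 [20-30]
# 32 30 [30-40]
# 35 40 [30-40]
```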

### Custom redaction of locations

```python Python Client wrap theme={"theme":"poimandres"}
# This code assumes that you have installed the Limina python client.
from privateai_client.post_processing import deidentify_text
from privateai_client.post_processing.processors import MarkerEntityProcessor
from privateai_client import PAIClient
from privateai_client.components import AnalyzeTextRequest

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key='<YOUR API KEY>')

text = [
    "Please deliver this to 45, Clybaun Heights, Galway City, Ireland H91 AKK3",
    "3255 M-A-D-D-A-M-S street, huntington, west virginia is his birthplace",
    "My favorite city is San Francisco, California 94110, United States, 37.7749° N, 122.4194° W",
]

request = {"text": text, "locale": "en-US", "entity_detection": {"accuracy": "high"}}

text_request = AnalyzeTextRequest.fromdict(request)
resp = client.analyze_text(text_request)

# THIS IS THE CUSTOM LOGIC TO IMPLEMENT
def redact_address(entity) -> str:
    """Redacts address to hide the most sensitive info"""

    analysis_result = entity["analysis_result"]
    subtypes = sorted(analysis_result["subtypes"], key=lambda x: x["location"]["stt_idx"])
    address_parts = []
    for subtype in subtypes:
        if subtype["label"] in ["LOCATION_COUNTRY", "LOCATION_STATE", "LOCATION_CITY"]:
            address_parts.append(subtype["text"])
        elif subtype["label"] in ["LOCATION_ZIP"]:
            address_parts.append(subtype["text"][:3] + "#" * (len(subtype["text"]) - 3))
        else:
            address_parts.append(f"""[{subtype["label"]}]""")
    return " ".join(address_parts)


entity_processors = {"LOCATION": redact_address, "LOCATION_ADDRESS": redact_address}

deidentified_text = deidentify_text(text, resp, entity_processors=entity_processors, default_processor=MarkerEntityProcessor())
for example in deidentified_text:
    print(example)
```

The output of the code above shows the redacted addresses. Only the first three characters of each postal or ZIP code are kept, and street addresses, when present, are replaced with a label. The last example shows that GPS coordinates are redacted as well.

```text Output wrap theme={"theme":"poimandres"}
Please deliver this to [LOCATION_ADDRESS_STREET] Galway City Ireland H91#####
[LOCATION_ADDRESS_STREET] huntington west virginia is his birthplace
My favorite city is San Francisco California 941## United States [LOCATION_COORDINATE]
```
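
In the first line of the output, the space inside the postal code `H91 AKK3` is masked along with the other characters. If the original layout should stay visible, a small variant can leave whitespace untouched. This is a sketch with a hypothetical helper name, `mask_postal_code`, operating on a plain string.

```python
def mask_postal_code(code: str, keep: int = 3) -> str:
    """Keep the first `keep` characters and mask the rest, preserving whitespace."""
    masked_tail = "".join(ch if ch.isspace() else "#" for ch in code[keep:])
    return code[:keep] + masked_tail


print(mask_postal_code("94110"))     # 941##
print(mask_postal_code("H91 AKK3"))  # H91 ####
```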

### Custom redaction of coreferenced names

```python Python Client lines wrap theme={"theme":"poimandres"}
# This code assumes that you have installed the Limina python client.
from privateai_client.post_processing import deidentify_text
from privateai_client import PAIClient
from privateai_client.components import AnalyzeTextRequest

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key='<YOUR API KEY>')

text = [
    "Nikola Jokić is a basketball player. LeBron James is also a basketball player. "
    "Jokić and James played against each other. Jokić led his team with a triple-double performance. "
    "After the game, Nikola praised his teammates for their effort. "
    "Many fans consider Nikola Jokić one of the best centers in NBA history."
]

request = {
    "text": text,
    "locale": "en-US",
    "entity_detection": {"accuracy": "high"},
    "relation_detection": {"coreference_resolution": "model_prediction"},
}

text_request = AnalyzeTextRequest.fromdict(request)
resp = client.analyze_text(text_request)

# THIS IS THE CUSTOM LOGIC TO IMPLEMENT
coref_to_initials: dict[str, str] = {}

def replace_with_initials(entity: dict) -> str:
    """Replace any detected person with initials in the style A.B."""
    coref_id = entity.get("coreference_id")
    original_text = entity["text"]

    if not coref_id:
        return original_text

    if coref_id in coref_to_initials:
        return coref_to_initials[coref_id]

    parts = original_text.split()
    initials = "".join(p[0].upper() + "." for p in parts if p)

    coref_to_initials[coref_id] = initials
    return initials

entity_processors = {
    "NAME": replace_with_initials,
    "NAME_GIVEN": replace_with_initials,
    "NAME_FAMILY": replace_with_initials,
}

deidentified_text = deidentify_text(
    text,
    resp,
    entity_processors=entity_processors,
    default_processor=lambda entity: entity["text"]
)

for example in deidentified_text:
    print(example)
```

The output of running this code replaces names with the corresponding initials of the people mentioned in the text.

```text Output wrap theme={"theme":"poimandres"}
N.J. is a basketball player. L.J. is also a basketball player. N.J. and L.J. played against each other. N.J. led his team with a triple-double performance. After the game, N.J. praised his teammates for their effort. Many fans consider N.J. one of the best centers in NBA history.
```
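
The module-level `coref_to_initials` dictionary keeps its content between calls, so mappings from one document could leak into the next if coreference identifiers repeat. One way to avoid this is to hold the mapping inside a processor object and create a fresh instance per document. The class below is a sketch with a hypothetical name; the entities are hand-built, and `"c1"` is a made-up coreference identifier.

```python
class InitialsProcessor:
    """Replace names with initials, keeping one mapping per instance."""

    def __init__(self) -> None:
        self._initials_by_coref: dict[str, str] = {}

    def __call__(self, entity: dict) -> str:
        coref_id = entity.get("coreference_id")
        if not coref_id:
            return entity["text"]
        if coref_id not in self._initials_by_coref:
            parts = entity["text"].split()
            initials = "".join(part[0].upper() + "." for part in parts)
            self._initials_by_coref[coref_id] = initials
        return self._initials_by_coref[coref_id]


processor = InitialsProcessor()
print(processor({"text": "Nikola Jokić", "coreference_id": "c1"}))  # N.J.
print(processor({"text": "Jokić", "coreference_id": "c1"}))         # N.J.
print(processor({"text": "Denver"}))                                # Denver
```

As in the function above, the initials are taken from the first mention of each coreference cluster, so a text whose first mention is only a family name would yield a single initial for the whole cluster.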

In the following example, we explore the capabilities of the built-in `FuzzyMatchEntityProcessor` in more depth.

### Fuzzy matching against list of known words

```python Python Client lines wrap theme={"theme":"poimandres"}
# This code assumes that you have installed the Limina python client.
from privateai_client import PAIClient
from privateai_client.components import AnalyzeTextRequest
from privateai_client.post_processing import deidentify_text
from privateai_client.post_processing.processors import (
    MaskEntityProcessor,
    FuzzyMatchEntityProcessor,
)

client = PAIClient(
    url="https://api.private-ai.com/community/v4/",
    api_key="<YOUR-API-KEY>",
)


text = [
    "Limian released a new API.",
    "Our partners include ExampleSoft, OpenAI, and limina.",
    "The conference in Toronto featured Google and LUMINA on stage.",
]
request = {
    "text": text,
    "locale": "en",
    "entity_detection": {
        "accuracy": "high",
        "entity_types": [{"type": "ENABLE", "value": ["ORGANIZATION"]}],
    },
}
request_object = AnalyzeTextRequest.fromdict(request)
analyze_text_rsp = client.analyze_text(request_object)

default_mask_processor = MaskEntityProcessor()
fuzzy_processor = FuzzyMatchEntityProcessor(
    known_words_list=["Limina"],
    threshold=2,
    strategy="BLOCK",
    process_type="MASK",
    ignore_casing=True,
)

text_out = deidentify_text(
    text=text,
    response=analyze_text_rsp,
    entity_processors={"ORGANIZATION": fuzzy_processor},
    default_processor=default_mask_processor,
)
for t in text_out:
    print(t)
```

The output of running this code is:

```text Output wrap theme={"theme":"poimandres"}
###### released a new API.
Our partners include ExampleSoft, OpenAI, and ######.
The conference in Toronto featured Google and ###### on stage.
```

This example contains intentional misspellings to demonstrate fuzzy matching. All variants of "Limina" are masked consistently. Other company names remain unchanged because they do not match any entry in the known-words list, which holds the names we intend to mask.
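
To give a feel for what a threshold of 2 means, the sketch below computes an edit distance that counts insertions, deletions, substitutions, and swaps of adjacent characters. It implements the restricted (optimal string alignment) variant of Damerau-Levenshtein distance; the built-in processor may use a different variant or normalization, so treat this purely as an illustration. The function name `osa_distance` is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between two strings."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# Lowercased to mirror ignore_casing=True in the example above.
for word in ("limian", "limina", "lumina", "google"):
    print(word, osa_distance(word, "limina"))
```

The three variants of "Limina" are within one edit of the known word, while "google" is much further away than the threshold of 2.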

### Combining synthetic replacements with custom redaction

```python Python wrap theme={"theme":"poimandres"}
# This code assumes that you have installed the Limina python client.
from privateai_client.post_processing import deidentify_text
from privateai_client.post_processing.processors import SyntheticReplacementProcessor
from privateai_client import PAIClient
from privateai_client.components import AnalyzeTextRequest

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key='<YOUR API KEY>')

text = [
    "Jenna is a 32 year old female diagnosed with asthma."
]

request = {
    "text": text,
    "locale": "en-US",
    "entity_detection": {"accuracy": "high"},
    "synthetic_replacements": {
        "accuracy": "standard_automatic",
        "entity_types": [
            {
                "type": "ENABLE",
                "value": [
                    "NAME", "NAME_GIVEN"
                ]
            }
        ]
    },
}

text_request = AnalyzeTextRequest.fromdict(request)
resp = client.analyze_text(text_request)

# THIS CUSTOM LOGIC IS DUPLICATED FROM THE AGE EXAMPLE
def redact_age(entity) -> str:
    """Round to the closest multiple of ten"""

    if "formatted" in entity["analysis_result"]:
        age = entity["analysis_result"]["formatted"]
        return str(int(round(age * 10, -2) / 10))
    else:
        # No parsed age available: mask the entity instead
        return "#"

synthetic_processor = SyntheticReplacementProcessor()
entity_processors = {"NAME_GIVEN": synthetic_processor, "AGE": redact_age}

deidentified_text = deidentify_text(
    text,
    resp,
    entity_processors=entity_processors,
    default_processor=lambda entity: entity["text"],
)

for example in deidentified_text:
    print(example)
```

The output of running this code replaces names with synthetic values and buckets ages to the nearest multiple of ten.

```text Output wrap theme={"theme":"poimandres"}
Sarah is a 30 year old female diagnosed with asthma.
```

Note that synthetic data generation is non-deterministic, so each request may produce different replacement values. For more details on the `synthetic_replacements` request field and its configuration options, see the [analyze/text synthetic replacements guide](/configuration-and-operations/advanced-features/analyze-text#synthetic-replacements).
