Installation
The Python client is available for download on pypi.org or with pip:
Pip Command
Quickstart
Python Client
Output
Working with the Client
Initializing the Client for self-hosted container
The Limina client requires a scheme, a host, and an optional port to initialize. Alternatively, a full URL can be used. Once created, the connection can be tested with the client's ping function.
Python Client
Output
Note: The container is hosted with your provisioned application license and does not manage authentication to the API or authorization of API requests. Access to the container is at the discretion of the user. For recommendations on how to deploy in an enterprise context including authorized use, please contact us.
Initializing the Client for our cloud-API offering
To access the cloud API, you need to authenticate with your API key. You can get one from the customer portal.
Python Client
Output
Making Requests
Once initialized, the client can be used to make any request listed in the [API documentation][/latest/process-text].
Available requests:
| Client Function | Endpoint |
|---|---|
| get_version() | / |
| ping() | /healthz |
| get_metrics() | /metrics |
| get_diagnostics() | /diagnostics |
| ner_text() | /ner/text |
| process_text() | /process/text |
| analyze_text() | /analyze/text |
| process_files_uri() | /process/files/uri |
| process_files_base64() | /process/files/base64 |
| bleep() | /bleep |
Python Client
Output
Python Client
Output
Request Objects
Request objects are a simple way of creating request bodies without the tedium of writing dictionaries. Every POST request (as listed in the [Limina API documentation][/latest/process-text]) has its own request object.
Python Client
Output
Python Client
Output
Building Request Objects
Request objects can be initialized by passing in all the values required for the request as arguments, or from a dictionary, using the object's fromdict() function:
Python Client
Request objects can also be converted to dictionaries, using the object's to_dict() function:
Output
Sample Use
Processing a directory of files with URI route
Python Client
Processing a file with Base64 route
Python Client
Bleep an audio file
Python Client
Analyze Text Post-Processing
The analyze/text route is developed specifically for text understanding and returns rich, structured detections. For more details on its capabilities, refer to the analyze/text documentation. This section describes how the Limina Python client can be used to post-process the analyze/text response. The client provides utilities to iterate through detected entities and apply transformation rules, such as masking, pseudonymizing, validating, or normalizing values.
The following example introduces the required pieces for post-processing, which we describe in detail.
Python Client
Output
The example relies on the deidentify_text function, which performs entity replacements by invoking entity processors. Each processor defines the exact behavior for a given entity type, making it easy to implement custom redaction tailored to your use case.
The function deidentify_text(...) takes the original texts plus the analyze/text response, walks through every detected entity in left-to-right order, and replaces each entity span using the appropriate processor. It also automatically adjusts the character offsets of the entity locations after their replacements.
Python Client
- text - The original list of text messages that were passed into PAIClient.analyze_text()
- response - The structured response returned by the analyze_text call
- entity_processors - Mapping of entity type to entity processor, e.g. {"DATE": redact_date, "CREDIT_CARD": redact_credit_card}
  - Each processor is a callable that accepts an entity dictionary and returns the replacement string for that entity.
  - Invoked when the entity best_label matches a key in this dictionary.
- default_processor - A fallback processor applied to all entity types not explicitly listed in entity_processors. This ensures every entity is handled, even if you only configure custom processors for a subset of the enabled entities.
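The left-to-right replacement with offset adjustment described above can be sketched in plain Python. This is an illustration of the technique only, not the client's implementation: the function name deidentify_sketch, the marker fallback, and the flat start/end keys on each entity are hypothetical, and the real analyze/text response has its own structure.

```python
from typing import Callable, Dict, List

Processor = Callable[[dict], str]


def marker(entity: dict) -> str:
    """Hypothetical fallback: replace the span with a [LABEL] marker."""
    return f"[{entity['best_label']}]"


def deidentify_sketch(
    text: str,
    entities: List[dict],
    entity_processors: Dict[str, Processor],
    default_processor: Processor = marker,
) -> str:
    """Replace each entity span left to right, tracking the offset shift."""
    result = text
    shift = 0  # cumulative length change caused by earlier replacements
    for entity in sorted(entities, key=lambda e: e["start"]):
        # Pick the processor registered for this label, else the fallback.
        processor = entity_processors.get(entity["best_label"], default_processor)
        replacement = processor(entity)
        start = entity["start"] + shift
        end = entity["end"] + shift
        result = result[:start] + replacement + result[end:]
        shift += len(replacement) - (entity["end"] - entity["start"])
    return result


# Assumed entity shape for illustration: label, surface text, character span.
text = "Anna was born on 1990-05-17 in Toronto."
entities = [
    {"best_label": "NAME", "text": "Anna", "start": 0, "end": 4},
    {"best_label": "DATE", "text": "1990-05-17", "start": 17, "end": 27},
    {"best_label": "LOCATION", "text": "Toronto", "start": 31, "end": 38},
]
processors = {"DATE": lambda entity: entity["text"][:4]}  # keep only the year
print(deidentify_sketch(text, entities, processors))
```

Only DATE has a dedicated processor here, so NAME and LOCATION fall through to the fallback, mirroring the role of default_processor.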
Entity Processors
The processors are callables (Callable[[dict], str]) that take a detected entity dictionary and return the replacement text for that span. A processor can be as simple as a function, or a class that implements the __call__ method. In the example above we created the AgeBucketEntityProcessor, which puts the entity AGE into a bucket.
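As a rough sketch of the class-based form, a processor that buckets ages might look like the following. The class name AgeBucketProcessor, the bucket_size parameter, and the assumption that the entity dictionary exposes the matched span under a text key are all hypothetical; this is a stand-in, not the AgeBucketEntityProcessor from the original example.

```python
import re


class AgeBucketProcessor:
    """Replace an age with a coarse range such as [AGE 30-39]."""

    def __init__(self, bucket_size: int = 10):
        self.bucket_size = bucket_size

    def __call__(self, entity: dict) -> str:
        # Assumption: the matched span is available under the "text" key.
        match = re.search(r"\d+", entity["text"])
        if match is None:
            # No number to bucket (e.g. "thirty"), so emit a plain marker.
            return "[AGE]"
        age = int(match.group())
        low = (age // self.bucket_size) * self.bucket_size
        high = low + self.bucket_size - 1
        return f"[AGE {low}-{high}]"


bucket = AgeBucketProcessor()
print(bucket({"best_label": "AGE", "text": "37 years old"}))
```

A class is convenient here because configuration such as the bucket size lives on the instance, while the call signature stays the same as for a plain function.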
The potential use cases are broad. A few common examples include:
- Hide all but the last 4 digits in a CREDIT_CARD number;
- Keep only the year in a DATE entity;
- Shift all dates by an offset in a DATE entity;
- Replace names with initials only;
- Preserve email domain, mask the username in an EMAIL_ADDRESS entity;
- Leave only the less sensitive characters in a LOCATION_ZIP code;
- Redact entities based on fuzzy similarity to a list of identifiable terms.
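A few of these use cases can be sketched as plain functions. The function names are hypothetical, and the sketch assumes each entity dictionary carries the matched span under a text key; the year pattern is also a simplification that only recognizes four-digit years starting with 19 or 20.

```python
import re


def keep_year(entity: dict) -> str:
    """Keep only a four-digit year from a DATE entity, if one is present."""
    match = re.search(r"\b(?:19|20)\d{2}\b", entity["text"])
    return match.group() if match else "[DATE]"


def mask_email_username(entity: dict) -> str:
    """Preserve the domain of an EMAIL_ADDRESS entity and mask the username."""
    username, separator, domain = entity["text"].rpartition("@")
    if not separator or not username:
        # Not shaped like an email address, so replace the whole span.
        return "[EMAIL_ADDRESS]"
    return "*" * len(username) + "@" + domain


def initials_only(entity: dict) -> str:
    """Replace each part of a name with its initial."""
    parts = entity["text"].split()
    if not parts:
        return "[NAME]"
    return " ".join(part[0].upper() + "." for part in parts)


print(keep_year({"text": "March 3, 1984"}))
print(mask_email_username({"text": "jane.doe@example.com"}))
print(initials_only({"text": "John Smith"}))
```

Each function could then be registered under its entity type in the entity_processors mapping, for example under DATE, EMAIL_ADDRESS, and NAME.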
Built-in processors
In addition to writing your own processors, the client ships with three built-in entity processors, with more planned in future releases:
- MaskEntityProcessor and MarkerEntityProcessor - intended to be used for default processing.
- FuzzyMatchEntityProcessor - configurable processor that matches entities against a list of known words using Damerau–Levenshtein distance. It can automatically catch misspellings or near-duplicates, and be set to allow or block specific entities while doing the opposite for all others of the same type. A complete example is provided below.
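To make the matching idea concrete, here is a minimal sketch of edit-distance matching against a word list. It uses the optimal string alignment variant of Damerau-Levenshtein distance, which may differ from the variant the client uses, and it is not the FuzzyMatchEntityProcessor itself. The names osa_distance, matches_known_word, and block_listed, the max_distance threshold, and the text key on the entity are assumptions made for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insert, delete, substitute, adjacent swap."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "on" typed as "no".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def matches_known_word(text: str, known_words, max_distance: int = 1) -> bool:
    """Case-insensitive check for a near match in the word list."""
    candidate = text.lower()
    return any(osa_distance(candidate, word.lower()) <= max_distance
               for word in known_words)


def block_listed(entity: dict, known_words=("Toronto", "Montreal")) -> str:
    """Redact entities close to a known word; leave all others untouched."""
    if matches_known_word(entity["text"], known_words):
        return f"[{entity['best_label']}]"
    return entity["text"]


print(block_listed({"best_label": "LOCATION", "text": "Tornoto"}))
print(block_listed({"best_label": "LOCATION", "text": "Paris"}))
```

Swapping the two return values in block_listed gives the opposite behavior: near matches are kept and every other entity of that type is redacted.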
Custom redaction of credit card numbers
Python Client
The redact_credit_card function contains the logic to redact credit card numbers as follows:
- If the credit card number is valid, hide it except for the last four characters (which could include spaces).
- If the credit card number parses correctly but fails the Luhn check, the number is invalid. In this case, leave the number visible and add an INVALID tag after it. This makes invalid credit card numbers easier to identify in text for later review.
- If the number fails to parse as a credit card number, do nothing. The code assumes that this is not a credit card number.
Output
The 800 678-457-7896 entity was left unredacted, as expected. This entity is possibly a phone number and not a credit card number. Finally, the last line shows several examples of valid credit card numbers and a single invalid one. The valid credit card numbers were masked except for their last characters, as expected.
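The three rules above can be sketched with the standard library alone. This is not the original redact_credit_card code. The parsing rule used here (13 to 19 digits, separated only by spaces or hyphens, with a leading digit between 2 and 6) is a simplifying assumption standing in for a real card-number parser, and the function names and the text key on the entity are hypothetical.

```python
import re


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit counted from the right."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def parse_card_digits(text: str):
    """Return the bare digits if the text looks like a card number, else None."""
    if not re.fullmatch(r"[\d\s-]+", text):
        return None
    digits = re.sub(r"\D", "", text)
    # Assumption: plausible length and a leading digit used by major issuers.
    if not 13 <= len(digits) <= 19 or digits[0] not in "23456":
        return None
    return digits


def redact_card_sketch(entity: dict) -> str:
    text = entity["text"]
    digits = parse_card_digits(text)
    if digits is None:
        return text  # does not parse as a card number: leave unchanged
    if not luhn_valid(digits):
        return f"{text} [INVALID]"  # parses but fails the Luhn check
    # Valid: mask every digit outside the last four characters.
    return re.sub(r"\d", "#", text[:-4]) + text[-4:]


for sample in ("4111 1111 1111 1111", "4111 1111 1111 1112", "800 678-457-7896"):
    print(redact_card_sketch({"best_label": "CREDIT_CARD", "text": sample}))
```

Under this parsing assumption, the 800 678-457-7896 string is rejected because of its leading digit, so it is returned unchanged rather than tagged as invalid.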
Custom redaction of dates
Python Client
Output
Python Client
Output
Custom redaction of ages
Python Client
Output
Custom redaction of locations
Python Client
Output
Custom redaction of coreferenced names
Python Client
Output
The next section covers the FuzzyMatchEntityProcessor in more depth.
Fuzzy matching against a list of known words
Python Client
Output