This guide relies on Limina’s cloud API. Please sign up for a free API key to run the code examples. Alternatively, if you are using the container instead of the cloud API, follow the container quickstart first and adjust the API endpoint in each example as required.
Basic Use
The process/text endpoint accepts a list of text strings and replaces each piece of PII found with a redaction marker. A simple request looks like this:
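A minimal Python sketch of such a request, using only the standard library. The base URL, the x-api-key header name and the sample sentence are assumptions made for illustration; only the process/text path and the list-valued text field come from this guide. Substitute the endpoint and key from your own account or container.

```python
import json
import urllib.request

# Assumption: placeholder URL. Replace with your cloud or container endpoint.
API_URL = "https://example.invalid/process/text"
API_KEY = "<YOUR_API_KEY>"


def build_request(texts, url=API_URL, api_key=API_KEY):
    """Build (but do not send) a POST request carrying a list of text strings."""
    body = json.dumps({"text": list(texts)}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "x-api-key": api_key,  # assumed header name
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(request):
    """Send a prepared request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


# Invented sample input; nothing is sent until send(request) is called.
request = build_request(["My name is John and my SSN is 123-45-6789."])
```

Keeping request construction separate from sending makes it easy to inspect the payload before any text leaves your machine.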
The response contains:
- processed_text, the redacted, masked or synthetic text as defined by processed_text in the input
- entities, a list of each PII found, which is useful for PII detection and NER (Named Entity Recognition)
Response
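As a rough sketch of how the two response fields might be consumed, the helper below assumes the parsed response is a list with one object per input string, each carrying processed_text and entities. That list shape, the marker text and the fields inside each entity are assumptions; the sample is hand-written for illustration and is not real API output.

```python
def summarize(response):
    """Return (processed_text, entity_count) for each item in a parsed response."""
    return [(item["processed_text"], len(item["entities"])) for item in response]


# Hand-written stand-in for a parsed response, not real API output.
sample_response = [
    {
        "processed_text": "My name is [NAME_1].",  # marker format is illustrative
        "entities": [{"text": "John", "label": "NAME"}],  # entity fields are assumed
    }
]

for text, count in summarize(sample_response):
    print(f"{count} entity(ies) found: {text}")
```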
Processing Related Examples
If the strings in the list are related, please set link_batch to true. The strings are then processed together as "My phone number is 2345435" instead of "My phone number is" and, separately, "2345435", allowing the phone number to be identified correctly.
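The payload for the phone number case above can be sketched as follows. Placing link_batch at the top level of the request body is an assumption here; the guide only states that it should be set to true.

```python
import json


def build_payload(texts, link_batch=False):
    """Build a request body; link_batch asks for the strings to be treated as related."""
    payload = {"text": list(texts)}
    if link_batch:
        payload["link_batch"] = True  # assumed to be a top-level boolean field
    return payload


# The two fragments only make sense as a phone number when read together.
payload = build_payload(["My phone number is", "2345435"], link_batch=True)
print(json.dumps(payload, indent=2))
```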
Customizing Entity Detection With Selective Redaction
The above example identifies and removes all non-beta entity types. The types of PII that are identified can be customized using Entity Selectors. For example, a request can be configured to redact only the SSN.
Adding Allow & Block Lists via Regexes
It is also possible to customize PII detection and de-identification/redaction via regex-based Filters, allowing for custom behaviour on specific entity types such as employee IDs, internal database IDs, and other data unique to a company. Below is an example demonstrating how to combine the Entity Selectors presented above with Filters to provide fine-grained control and customization. In this hypothetical HR claim scenario, an employee has a medical injury and requires accommodation. Here, we demonstrate:
- Two regex-based block filters that define custom entity types for employee IDs and business units, overriding Limina’s default entity types.
- Disabling detection of the injury entity type, which could be important information for an insurance claim that the employer might have to make.
- We also see that the text element in the payload is a list, as you would expect from a conversational use case. In this case, we want to ensure that we keep the context of redactions across an entire thread of conversations by setting link_batch to true.
- Disabling numbering of redaction markers.
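The combination described in the list above can be sketched as a request body. Every field name under entity_detection (entity_types, filter, type, value, entity_type, pattern), the INJURY, EMPLOYEE_ID and BUSINESS_UNIT labels, and both regexes are assumptions chosen for illustration, so check them against the API reference before use. The marker-numbering option is left out because its field name is not given in this section.

```python
import re

# Hypothetical ID formats, invented for this example.
EMPLOYEE_ID_PATTERN = r"\bEMP-\d{6}\b"
BUSINESS_UNIT_PATTERN = r"\bBU-[A-Z]{3}\b"


def build_payload(texts):
    """Combine an entity selector with two block filters (assumed schema)."""
    return {
        "text": list(texts),
        "link_batch": True,  # keep redaction context across the whole thread
        "entity_detection": {
            # Entity selector: leave injury mentions in place.
            "entity_types": [{"type": "DISABLE", "value": ["INJURY"]}],
            # Block filters: custom entity types defined by regex.
            "filter": [
                {"type": "BLOCK", "entity_type": "EMPLOYEE_ID",
                 "pattern": EMPLOYEE_ID_PATTERN},
                {"type": "BLOCK", "entity_type": "BUSINESS_UNIT",
                 "pattern": BUSINESS_UNIT_PATTERN},
            ],
        },
    }


thread = [
    "Hi, this is employee EMP-104233 from BU-OPS.",
    "I sprained my wrist and need an accommodation.",
]
payload = build_payload(thread)

# Check the regexes locally before sending them to the API.
employee_ids = re.findall(EMPLOYEE_ID_PATTERN, thread[0])
business_units = re.findall(BUSINESS_UNIT_PATTERN, thread[0])
```

Testing the patterns locally with re first is a cheap way to catch a regex that matches too much or too little before it affects redaction.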
Redacted Text
Generating Synthetic Entities (Beta)
In addition to replacing PII entities with redaction markers, tokens and mask characters, Limina can generate fake or synthetic replacements for each entity. This is done using an ML-based approach that produces realistic examples that fit the context of the surrounding text. This has a number of advantages:
- Unlike other synthetic data generators which generate completely new data, data with synthetic PII is mostly the original data. This minimizes the chance that the synthetic data generator introduces biases into the data, maximizing the utility for downstream tasks like sentiment analysis.
- Our PII detection engine leads the market, although it isn’t perfect. Synthetic PII ensures that any PII detection misses are hidden amongst realistic, fake PII, providing a higher level of protection against re-identification.
- Less impact on downstream ML systems: Synthetic entities look more like natural text than redaction markers or hashing.
To generate synthetic entities, set the processed_text object in the API request to have a marker type of SYNTHETIC.
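A minimal sketch of such a request body follows. The key name type inside processed_text is an assumption; the guide only says that the marker type should be SYNTHETIC. The sample sentence is invented.

```python
import json


def build_synthetic_payload(texts):
    """Request synthetic replacements instead of redaction markers."""
    return {
        "text": list(texts),
        "processed_text": {"type": "SYNTHETIC"},  # assumed key name for the marker type
    }


payload = build_synthetic_payload(["Hello, my name is May and I live in Toronto."])
print(json.dumps(payload, indent=2))
```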