
Concurrency and Batching

The following code examples outline how to increase throughput when using Limina.
Title | Description
AsyncIO | How to use Python’s AsyncIO library to make concurrent calls to Limina
Threading | How to use Python’s threading and concurrent.futures libraries to make concurrent calls to Limina
Batch Requests | How to process multiple inputs in a single API call using Python
To choose the optimal number of concurrent requests, please visit the recommended concurrency levels page.
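As a rough illustration of bounded concurrency, the sketch below caps the number of in-flight calls with an asyncio semaphore. The `send` coroutine is a hypothetical stand-in for whatever function actually calls Limina, and `max_concurrency` is an assumed parameter whose value should come from the recommended concurrency levels page; neither name is part of the Limina API.

```python
import asyncio


async def run_bounded(texts, send, max_concurrency=4):
    """Call send(text) for every text, with at most max_concurrency calls in flight.

    `send` is a hypothetical coroutine supplied by the caller (for example,
    a wrapper around an HTTP client); it is not a Limina function.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(text):
        # The semaphore blocks here once max_concurrency calls are running.
        async with semaphore:
            return await send(text)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(worker(text) for text in texts))


async def fake_send(text):
    """Placeholder that simulates network latency instead of calling an API."""
    await asyncio.sleep(0.01)
    return f"processed: {text}"


if __name__ == "__main__":
    results = asyncio.run(run_bounded(["a", "b", "c"], fake_send, max_concurrency=2))
    print(results)
```

Passing the transport in as a parameter keeps the concurrency logic independent of any particular HTTP library, so the same helper can be exercised with a stub before pointing it at a real deployment.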

Context

Limina relies on machine learning to detect PII based on context, rather than on pattern-matching approaches such as regular expressions. For best performance, it is therefore advisable to send text in the largest possible chunks that still meet your latency requirements. For example, the following chat log should be sent in a single call with link_batch enabled, rather than line by line:
Text Example
"Hi John, how are you?"

"I'm good thanks"

"Great, hope Atlanta is treating you well"
The Batch Requests code example listed above shows how to implement this. Similarly, text documents should be sent in a single request, rather than paragraph by paragraph or sentence by sentence. In addition to improving accuracy, this minimizes the number of API calls made.
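A minimal sketch of what a single batched request body might look like for the chat log above. The `link_batch` option is named in this page; the `text` field name and the overall body shape are assumptions made for illustration, so check them against the API reference before use.

```python
import json


def build_batch_payload(messages, link_batch=True):
    """Build one request body that carries every message in a single call.

    The "text" key is an assumed field name; "link_batch" is the option
    mentioned in this page for sharing context across batch items.
    """
    if not messages:
        raise ValueError("messages must not be empty")
    return {"text": list(messages), "link_batch": link_batch}


chat_log = [
    "Hi John, how are you?",
    "I'm good thanks",
    "Great, hope Atlanta is treating you well",
]

payload = build_batch_payload(chat_log)
print(json.dumps(payload, indent=2))
```

Sending the three messages together, rather than in three separate calls, gives the model the surrounding conversation as context and reduces the request count from three to one.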

Capitalization

The PII detection models are optimized for normal English capitalization, e.g. "Robert is from Sydney, Australia. Muhab is from Wales". If your data is not capitalized this way, our solution will still work, but some performance will be lost; please contact Limina so that we can provide you with the optimal model for your use case. That said, Limina is optimized for processing text that contains emojis or ASR transcription errors.
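One way to spot data that departs from normal capitalization before sending it is a quick pre-check like the sketch below. This is a simple heuristic written for illustration, not a Limina feature; the function name and its return labels are hypothetical.

```python
def capitalization_profile(text):
    """Classify text as 'upper', 'lower', 'mixed', or 'uncased'.

    Heuristic only: 'mixed' means both cases occur, which is what normally
    capitalized English looks like. It does not verify that the
    capitalization is actually correct.
    """
    if text.isupper():
        return "upper"
    if text.islower():
        return "lower"
    if any(ch.isupper() or ch.islower() for ch in text):
        return "mixed"
    # No cased characters at all (digits, punctuation, uncased scripts).
    return "uncased"


for sample in (
    "Robert is from Sydney, Australia.",
    "robert is from sydney, australia.",
    "ROBERT IS FROM SYDNEY, AUSTRALIA.",
):
    print(capitalization_profile(sample), "->", sample)
```

If most of a dataset comes back as "upper" or "lower", that is a reasonable signal to raise the capitalization question with Limina.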

ASR Transcripts

When processing audio transcripts, it is recommended to use the following input format:
Text Example
"<speaker id>: <message>, <speaker id>: <message>,"

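The recommended transcript format above can be produced from a list of speaker turns with a few lines of code. The sketch below assumes the turns are available as (speaker id, message) pairs; the function name is hypothetical.

```python
def format_transcript(turns):
    """Join (speaker_id, message) pairs as '<speaker id>: <message>,' segments."""
    return " ".join(f"{speaker}: {message.strip()}," for speaker, message in turns)


turns = [
    ("1", "Hi John, how are you?"),
    ("2", "I'm good thanks"),
]
print(format_transcript(turns))
```

Each turn ends with a comma, matching the trailing comma shown in the format string.
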
Model Tuning

Limina’s PII detection models generally perform well out of the box. However, model tuning may be beneficial in order to tailor our solution to unique use cases. The tuning process is outlined below.

1. Client-supplied Data Sample

Prepare a sample of at least 20 examples illustrating the problem to be addressed. See the table at the bottom of this page for examples. The sample should contain examples that are:
  • anonymized such that all PII is manually replaced with synthetic entities
  • long enough to provide enough context, ideally including ~30 words before and after the problematic entity
  • diverse enough to be representative of the issues encountered
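The context guideline above can be applied mechanically when assembling a sample. The sketch below trims an example to roughly 30 words on either side of the problematic entity, given the entity's character offsets; the function and parameter names are hypothetical, and the PII must still be replaced with synthetic entities by hand.

```python
def context_window(text, start, end, words=30):
    """Return the entity at text[start:end] with up to `words` words on each side.

    Whitespace is normalized to single spaces in the returned string.
    """
    before = text[:start].split()
    after = text[end:].split()
    # max(0, ...) keeps the slice correct when fewer than `words` words exist
    # and when words == 0.
    left = before[max(0, len(before) - words):]
    right = after[:words]
    entity = text[start:end].strip()
    return " ".join(left + [entity] + right)


example = "a b c d e CARD f g h"
start = example.index("CARD")
print(context_window(example, start, start + len("CARD"), words=2))
```

With the default of 30 words, short examples are returned whole, so the helper only trims text that exceeds the suggested window.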

2. Secure Data Transfer

Data samples can be shared with Limina via a secure transfer mechanism built on Microsoft Azure. Contact us for an access token.

3. Manual Data Anonymization

Limina’s data team manually anonymizes the provided data a second time, to ensure that no personal data is ever stored or used for training.

4. Model Tuning & Delivery

Using few-shot learning techniques, Limina improves the PII detection models by optimizing them for your use case. Updated models are released via a new container version. Updates are usually delivered in the next regular release, but can be delivered via a patch release in as little as 72 hours, depending on the SLA and the severity of the problem.

Example Data Samples

Potential Deidentification Issue* | Illustrative Example
CVV number not redacted | Okay. I'm just pulling it up. All right, I can go ahead and take that card number whenever you're ready. Okay, card number 4622-6542-1425-3511. All right, and the expiration date? 0226. And the three digits on the back. Six. 25. Thank you. I'll be charging your card for 243. Let me see what that amount was. 243 16. Would you like your receipt emailed?
DOSE entities not redacted | Blood test done, results normal Medications Norvasc (AMLODIPINE 10MG, 1 Tablet(s) PO OD Catapres (CLONIDINE) 75MG, 1 Tablet(s) SL PRN Hydrochlorothiazide 25 MG PO PRN Aspirin 81 MG PO OD Cenolate (ASCORBIC ACID) 500MG, 1 Tablet(s) PO OD Allergies No known allergies Physical Exam Vital signs 200/130 135bpm
DOB misclassified as DATE | Ms. Richmond, to verify your account can I get your date of birth and your phone number, please? Absolutely, yes. So my number is 907 563 2834 and then February 14, 1990. Okay, February 14, perfect, and it'll just be a moment while I access your file here.
*Note that these examples are used only for illustration. Limina correctly picks up the PII in each of these cases.