Concurrency and Batching
The following code examples outline how to increase throughput when using Limina.

| Title | Description |
|---|---|
| AsyncIO | How to use Python’s AsyncIO library to make concurrent calls to Limina |
| Threading | How to use Python’s threading and concurrent.futures libraries to make concurrent calls to Limina |
| Batch Requests | How to process multiple inputs in a single API call using Python |
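As a rough illustration of the concurrency patterns listed above, the sketch below fans a list of texts out over a thread pool and, alternatively, over an asyncio event loop with a concurrency cap. It does not call Limina: `send` and `send_async` are hypothetical placeholders for whatever function issues the actual request in your setup, and the response shape returned by the stand-in functions is invented for the demo.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Awaitable, Callable, List


def process_with_threads(
    texts: List[str],
    send: Callable[[str], dict],
    max_workers: int = 8,
) -> List[dict]:
    """Call `send` on every text from a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of `texts`, not completion order.
        return list(pool.map(send, texts))


async def process_with_asyncio(
    texts: List[str],
    send_async: Callable[[str], Awaitable[dict]],
    limit: int = 8,
) -> List[dict]:
    """Await `send_async` for every text, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(text: str) -> dict:
        async with semaphore:
            return await send_async(text)

    # gather returns results in the order the awaitables were passed in.
    return list(await asyncio.gather(*(bounded(t) for t in texts)))


def fake_send(text: str) -> dict:
    # Hypothetical stand-in for a blocking HTTP request to Limina.
    return {"input": text, "length": len(text)}


async def fake_send_async(text: str) -> dict:
    # Hypothetical stand-in for a non-blocking HTTP request to Limina.
    await asyncio.sleep(0)
    return {"input": text, "length": len(text)}


texts = ["Hello Robert", "Hi Muhab"]
thread_results = process_with_threads(texts, fake_send, max_workers=2)
async_results = asyncio.run(process_with_asyncio(texts, fake_send_async, limit=2))
```

Both helpers return results in input order, so responses can be matched back to their source texts by position. The worker count and concurrency limit are illustrative defaults, not recommended values.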
Context
Limina relies on machine learning to detect PII based on context, rather than on pattern-matching approaches such as regular expressions. For best performance, it is therefore advisable to send text through in the largest possible chunks that still meet latency requirements. For example, the following chat log should be sent through in one call with link_batch enabled, as opposed to line-by-line:
Text Example
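To make the single-call approach concrete, here is a minimal sketch that packs the lines of a chat log into one request body with link_batch enabled. The `link_batch` option comes from the text above; the `text` field name, the overall payload shape, and the sample chat lines are assumptions for illustration and should be checked against the API reference.

```python
from typing import Dict, List, Union


def build_batch_payload(
    lines: List[str], link_batch: bool = True
) -> Dict[str, Union[List[str], bool]]:
    """Build one request body covering all lines (assumed payload shape)."""
    return {
        "text": list(lines),       # assumed field name for the list of inputs
        "link_batch": link_batch,  # share context across the lines in the batch
    }


# Illustrative chat log with synthetic names.
chat_log = [
    "Hi, who am I speaking with?",
    "Robert",
    "Thanks Robert, and where are you calling from?",
]

# One call carrying the whole conversation ...
payload = build_batch_payload(chat_log)

# ... as opposed to one call per line, where "Robert" arrives with no context.
per_line_payloads = [build_batch_payload([line], link_batch=False) for line in chat_log]
```

The point of the contrast is that a short reply such as a bare name carries little context on its own, while the batched payload lets the surrounding lines inform detection.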
Capitalization
The PII detection models are optimized for normal English capitalization, e.g. "Robert is from Sydney, Australia. Muhab is from Wales". If your data is not capitalized this way, please contact Limina so that we can provide you with the optimal model for your use case. Our solution will still work, but some performance will be lost.
That said, Limina is optimized for processing text with emojis and text containing ASR transcription errors.
ASR Transcripts
When processing audio transcripts, it is recommended to use the following input format:
Text Example
Model Tuning
Limina’s PII detection models generally perform well out of the box. However, model tuning may be beneficial in order to tailor our solution to unique use cases. The tuning process is outlined below.
1. Client-supplied Data Sample
Prepare a sample of at least 20 examples illustrating the problem to be addressed. See the table at the bottom of this page for examples. The sample should contain examples that are:
- anonymized such that all PII is manually replaced with synthetic entities
- long enough to provide enough context, ideally including ~30 words before and after the problematic entity
- diverse enough to be representative of the issues encountered
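When preparing samples, the context requirement above can be met with a small helper that trims a document to roughly 30 words on either side of the problematic entity. The sketch below is one possible approach, not a Limina tool; the function name and parameters are hypothetical, and it assumes the text has already been anonymized with synthetic entities.

```python
import re


def context_window(text: str, entity: str, n_words: int = 30) -> str:
    """Return `entity` plus up to `n_words` whitespace-separated words on each side."""
    if not entity.strip():
        raise ValueError("entity must contain at least one non-space character")
    start_char = text.find(entity)  # first occurrence only
    if start_char == -1:
        raise ValueError("entity not found in text")
    end_char = start_char + len(entity)

    # Character spans of every whitespace-delimited word in the text.
    spans = [m.span() for m in re.finditer(r"\S+", text)]
    # Indices of the words that overlap the entity's character range.
    hits = [i for i, (s, e) in enumerate(spans) if s < end_char and e > start_char]

    first = max(0, hits[0] - n_words)
    last = min(len(spans) - 1, hits[-1] + n_words)
    # Slice the original string so punctuation and spacing are preserved.
    return text[spans[first][0]:spans[last][1]]


# Excerpt from the CVV example in the table at the bottom of this page.
transcript = (
    "All right, and the expiration date? 0226. And the three digits "
    "on the back. Six. 25. Thank you."
)
snippet = context_window(transcript, "Six. 25", n_words=3)
```

The helper only locates the first occurrence of the entity, so texts with repeated mentions may need an explicit character offset instead.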
2. Secure Data Transfer
Data samples can be shared with Limina via a secure transfer mechanism built on Microsoft Azure. Contact us for an access token.
3. Manual Data Anonymization
Limina’s data team will manually anonymize the provided data a second time, to ensure that no personal data is ever stored or used for training.
4. Model Tuning & Delivery
Using few-shot learning techniques, Limina improves the PII Detection models by optimizing for your use case. Updated models are released via a new container version. Updates are usually delivered in the next regular release, but can be delivered via a patch release in as little as 72 hours, depending on the SLA and severity of the problem.
Example Data Samples
| Potential Deidentification Issue* | Illustrative Example |
|---|---|
| CVV number not redacted | Okay. I'm just pulling it up. All right, I can go ahead and take that card number whenever you're ready. Okay, card number 4622-6542-1425-3511. All right, and the expiration date? 0226. And the three digits on the back. Six. 25. Thank you. I'll be charging your card for 243. Let me see what that amount was. 243 16. Would you like your receipt emailed? |
| DOSE entities not redacted | Blood test done, results normal Medications Norvasc (AMLODIPINE 10MG, 1 Tablet(s) PO OD Catapres (CLONIDINE) 75MG, 1 Tablet(s) SL PRN Hydrochlorothiazide 25 MG PO PRN Aspirin 81 MG PO OD Cenolate (ASCORBIC ACID) 500MG, 1 Tablet(s) PO OD Allergies No known allergies Physical Exam Vital signs 200/130 135bpm |
| DOB misclassified as DATE | Ms. Richmond, to verify your account can I get your date of birth and your phone number, please? Absolutely, yes. So my number is 907 563 2834 and then February 14, 1990. Okay, February 14, perfect, and it'll just be a moment while I access your file here. |