In order to run the example code in this guide, please sign up for your free test API key here.
The analyze/text route described below is an essential tool for exploring and structuring your data, as well as for creating statistics about it. In this guide, we demonstrate how to use the analyze/text endpoint, introduced in 4.1, to return analysis results for the detected entities, with examples of how these results can be applied to your own use cases.
Analyze entities in text (new in 4.1)
The analyze/text route returns a list of detected entities along with the formatted text for each entity and a description of its subtypes. In this guide, we provide payloads for Limina’s analyze/text REST API route and document the associated responses.
To better illustrate how this information can be used, we proceed by giving a series of common use cases.
Validation and custom redaction of credit card numbers
Some numerical entities integrate a checksum in their values. This checksum is used to confirm the entity’s validity and to minimize the chance of error during transcription. This is the case for credit card numbers, which must satisfy the Luhn algorithm. The analyze/text route implements this algorithm on top of the NER model detection. This provides an additional safeguard by ensuring that the detected number is indeed a valid credit card number. Let’s look at three specific examples including credit card numbers.
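For reference, the Luhn checksum itself is short enough to sketch locally. The function below is a generic implementation of the public algorithm, not Limina's internal code, and the function name is ours. The test number is a commonly used dummy card number.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digits in `number` satisfy the Luhn checksum."""
    # Ignore spaces, dashes, and any other separators.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_is_valid("4111 1111 1111 1111"))  # True
print(luhn_is_valid("4111 1111 1111 1112"))  # False
```

A single mistyped digit always changes the checksum, which is why the check is useful for catching transcription errors.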
The request contains two fields, text and entity_detection, that are shared by the analyze/text, ner/text, and process/text routes. The text field contains the text to analyze, and the entity_detection field contains the NER configuration (e.g., the list of entities to detect). One last field in the request, locale, is unique to the analyze/text request. The locale field is used as a hint to the analyzer to help parse dates and other locale-dependent entities. For example, setting locale to en-US will force the analyzer to interpret the date 12-10-2020 as December 10, 2020 instead of October 12, 2020. Several examples of the values these fields can take are provided below.
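As a rough sketch of how these three fields fit together, the helper below assembles a request body. The helper name is ours, and the exact layout (for instance, passing text as a list of strings and the contents of entity_detection) is an assumption to check against the API reference. The second half only illustrates the ambiguity that locale resolves, using Python's standard library rather than the analyzer itself.

```python
import json
from datetime import datetime


def build_analyze_request(texts, entity_detection, locale=None):
    """Assemble an analyze/text request body (field layout is an assumption)."""
    payload = {"text": list(texts), "entity_detection": entity_detection}
    if locale is not None:
        payload["locale"] = locale  # hint for locale-dependent entities
    return payload


# entity_detection is left empty here; fill it in with your NER configuration.
payload = build_analyze_request(
    ["The meeting was on 12-10-2020."], entity_detection={}, locale="en-US"
)
print(json.dumps(payload, indent=2))

# The same string read month-first (en-US style) and day-first.
raw = "12-10-2020"
month_first = datetime.strptime(raw, "%m-%d-%Y").date()
day_first = datetime.strptime(raw, "%d-%m-%Y").date()
print(month_first.isoformat())  # 2020-12-10
print(day_first.isoformat())    # 2020-10-12
```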
The full response above is a mouthful, so let’s look at the first example’s response in more detail.
JSON Response with CREDIT_CARD entity
- the entity information, including its text and its location. These fields are shared with other routes, including the ner/text and process/text routes, and have the same use.
- the formatted text of the entity. This field is unique to the analyze/text route and provides a “standard” format for the entity. This can facilitate the introduction of post-processing logic on detected entities. The formats are described in the following table.
| Entity Type | Format | Example |
|---|---|---|
| CREDIT_CARD | space-separated groups of 3 to 5 digits | 6578 7790 4346 2237 |
| DATE | ISO-8601 | 2025-03-20T18:00:00+00:00 |
| DOB | ISO-8601 | 2025-03-20 |
| AGE | decimal numeral | 12 |
| All other entity types | no formatting | - |
- a list of validation assertions on the entity, which is also unique to the analyze/text route. It contains a list of objects that are specific to the entity being detected. In this example, the provider is the Luhn algorithm that was run on the credit card number, and the result of the algorithm is provided in the status field. Currently, only credit card numbers contain validation assertions, but more assertion providers will be added in the future.
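The three parts described above can be read from an entity with a few lines of code. In this sketch, the sample entity is illustrative only: the key name `validation_assertions`, the `label` key, and the status value `"passed"` are assumptions, so check them against an actual response before relying on them.

```python
def summarize_entity(entity: dict) -> dict:
    """Collect the formatted text and validation results of one entity."""
    analysis = entity.get("analysis_result") or {}
    assertions = analysis.get("validation_assertions") or []  # assumed key name
    return {
        "label": entity.get("label"),
        "formatted": analysis.get("formatted"),
        # Map each assertion provider to the status it reported.
        "checks": {a["provider"]: a["status"] for a in assertions},
    }


# Illustrative entity, not a real API response.
sample_entity = {
    "text": "4111-1111-1111-1111",
    "label": "CREDIT_CARD",
    "analysis_result": {
        "formatted": "4111 1111 1111 1111",
        "validation_assertions": [{"provider": "luhn", "status": "passed"}],
    },
}

summary = summarize_entity(sample_entity)
print(summary["formatted"])
print(summary["checks"])
```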
formatted field. However, although the number matches the credit card number format, the Luhn check failed on the number, so it is not a valid credit card number. This could be the result of a transcription error, for example.
The information included in the analysis result allows the creation of custom redaction of entities, using the post-processing framework, as shown in this section.
Date shifting and custom redaction of dates
Dates are one type of PII that is encountered in almost every dataset. Redaction is one way to ensure that sensitive dates do not create privacy issues. However, fully redacting dates often reduces the utility of the redacted data. For dates, it is often preferable to use other obfuscation methods that preserve their utility. Two well-known techniques are date shifting and date bucketing. Let’s consider three examples containing dates.
Response body with formatted date entities
analysis_result object. First, it is possible to access the formatted date “2018-07-10T00:00:00” from the field analysis_result.formatted. If you plan to implement logic on the dates found in the text, it might be easier to access the formatted dates rather than the original, non-standard date formats (e.g., “July 10 2018”).
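Because the formatted value is ISO-8601, date shifting can be sketched with the standard library alone. The function name and the fixed 30-day offset are illustrative choices. In practice the offset is usually chosen per record or per individual and kept secret.

```python
from datetime import datetime, timedelta


def shift_date(formatted: str, days: int) -> str:
    """Shift an ISO-8601 date string by a number of days."""
    shifted = datetime.fromisoformat(formatted) + timedelta(days=days)
    return shifted.isoformat()


print(shift_date("2018-07-10T00:00:00", 30))   # 2018-08-09T00:00:00
print(shift_date("2018-07-10T00:00:00", -30))  # 2018-06-10T00:00:00
```

Applying the same offset to every date that belongs to one person preserves the intervals between events, which is what keeps shifted data useful.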
Also, it is possible to directly access the day, month, and year of the date entity via the response fields in analysis_result.subtypes. This information can be used to partially redact or to bucketize dates. An example of redacting the day and month but keeping the year is provided in the custom redaction of dates guide.
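A minimal sketch of partial redaction and bucketing from the subtypes follows. The shape of each subtype (a `label` such as `YEAR` plus a `text` value) is an assumption made for illustration, as are the helper names. The custom redaction of dates guide shows the supported approach.

```python
def subtype_text(subtypes, label):
    """Return the text of the first subtype with the given label, if any."""
    for subtype in subtypes:
        if subtype.get("label") == label:
            return subtype.get("text")
    return None


def keep_year_only(subtypes, placeholder="[DATE]"):
    """Redact the day and month but keep the year when it is available."""
    year = subtype_text(subtypes, "YEAR")
    return f"{placeholder} {year}" if year else placeholder


def year_bucket(year: int, width: int = 10) -> str:
    """Map a year to a fixed-width range such as a decade."""
    start = (year // width) * width
    return f"{start}-{start + width - 1}"


# Hypothetical subtypes for the date "July 10 2018".
subtypes = [
    {"label": "MONTH", "text": "July"},
    {"label": "DAY", "text": "10"},
    {"label": "YEAR", "text": "2018"},
]
print(keep_year_only(subtypes))  # [DATE] 2018
print(year_bucket(2018))         # 2010-2019
```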
Age bucketing and custom redaction of numbers
Similar to dates, it is possible to analyze ages and other numerical entities to create custom redactions. Consider these two examples.
analyze/text response to bucketize ages, as shown here.
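Since the formatted value of an AGE entity is a decimal numeral, bucketing reduces to integer arithmetic. The function below is a sketch with illustrative defaults: ten-year buckets and a single open-ended bucket for ages of 90 and above, a common convention for avoiding rare, identifying ages.

```python
def bucketize_age(formatted_age: str, width: int = 10, cap: int = 90) -> str:
    """Map a formatted age such as "12" to a range such as "10-19"."""
    age = int(float(formatted_age))
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= cap:
        return f"{cap}+"  # open-ended bucket for the oldest ages
    start = (age // width) * width
    return f"{start}-{start + width - 1}"


print(bucketize_age("12"))  # 10-19
print(bucketize_age("93"))  # 90+
```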
Custom redaction of addresses
The GDPR and other privacy laws impose strict requirements regarding the redaction of addresses. In the following scenario, we demonstrate how to partially redact an address by leaving only the less sensitive characters of a zip/postal code and removing all other address information (e.g., civic number, street name, and so on).
The analyze/text response contains the result of the analysis. This response, along with the corresponding Limina client post-processing code, can be used to mask street addresses, in order to hide the most sensitive information.
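The masking step can be sketched independently of the client library. Both helpers below are hypothetical, and the component labels in the sample (such as `LOCATION_ZIP`) and the choice to keep three characters are assumptions for illustration, not the Limina client's post-processing API.

```python
def mask_postal_code(postal_code: str, keep: int = 3, mask_char: str = "#") -> str:
    """Keep the first `keep` letters or digits and mask the remaining ones."""
    kept = 0
    masked = []
    for ch in postal_code:
        if not ch.isalnum():
            masked.append(ch)  # preserve spaces and dashes
        elif kept < keep:
            masked.append(ch)
            kept += 1
        else:
            masked.append(mask_char)
    return "".join(masked)


def redact_address(components):
    """Replace every component with its label, except a masked postal code."""
    parts = []
    for component in components:
        if component["label"] == "LOCATION_ZIP":  # assumed label
            parts.append(mask_postal_code(component["text"]))
        else:
            parts.append(f"[{component['label']}]")
    return " ".join(parts)


# Hypothetical address components.
address = [
    {"label": "LOCATION_ADDRESS_STREET", "text": "123 Main St"},
    {"label": "LOCATION_CITY", "text": "Toronto"},
    {"label": "LOCATION_ZIP", "text": "M5V 3L9"},
]
print(redact_address(address))  # [LOCATION_ADDRESS_STREET] [LOCATION_CITY] M5V ###
```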
Relation detection
Relation detection refers to the broader natural language processing (NLP) capability of understanding how entities in a text are connected. While entity recognition tells us what the entities are (e.g., a person’s name, a company, a location), relation detection tells us how those entities are related. Relation detection covers tasks like coreference resolution and relation extraction, both of which are supported, and together provide a deeper understanding of unstructured text. The analyze/text route can be used to configure relation detection by using the optional relation_detection field in the request.
Coreference Resolution
Coreference resolution is the task of identifying different entity mentions in a given text that refer to the same real-world entity. The relation_detection field offers a configurable option for coreference resolution:
- coreference_resolution: Specifies the method for identifying coreferential entities:
  - heuristics: Uses rule-based methods
  - model_prediction: Uses machine learning models
  - combined: Uses both approaches
- coreference_id: A unique identifier added to each entity that groups coreferential entities under a common label. This behavior matches the /process/text endpoint when processed_text is set to MARKER and coreference resolution is applied. For example, “Nikola Jokić”, “Никола Јокић”, “nǐkola jôkitɕ”, and “Jokić” all share the same coreference_id (56c15276-33da-4726-bc81-369074049222), indicating that they refer to the same person.
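Once each entity carries a coreference_id, grouping mentions is a dictionary operation. The sketch below assumes each entity is a dict with `text` and `coreference_id` keys. The exact position of coreference_id within the response is an assumption, and the second ID is made up for the example.

```python
from collections import defaultdict


def group_by_coreference(entities):
    """Group entity mentions that share the same coreference_id."""
    groups = defaultdict(list)
    for entity in entities:
        coreference_id = entity.get("coreference_id")
        if coreference_id is not None:
            groups[coreference_id].append(entity["text"])
    return dict(groups)


# Illustrative entities; the first ID is the one quoted in the text above.
entities = [
    {"text": "Nikola Jokić", "coreference_id": "56c15276-33da-4726-bc81-369074049222"},
    {"text": "Denver", "coreference_id": "hypothetical-id-2"},
    {"text": "Jokić", "coreference_id": "56c15276-33da-4726-bc81-369074049222"},
]
for coreference_id, mentions in group_by_coreference(entities).items():
    print(coreference_id, mentions)
```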
Relation Extraction
Relation extraction is the task of identifying meaningful relations between entities in text, such as person-to-person or person-to-location links. It helps unlock document-level understanding by connecting pieces of information and making it easier to de-identify related data. Let’s look at an example:
Relation Extraction Sample Text
- Nessa Jonsson (NAME)
- Sweden (LOCATION_COUNTRY)
- Erik (NAME_GIVEN)
- the United States (LOCATION_COUNTRY)
- 1980 (DATE_INTERVAL)
- Nessa Jonsson is born in Sweden
- Nessa Jonsson is the daughter of Erik
- Erik is the father of Nessa Jonsson
- Erik lived in the United States
- Erik died in 1980
Limina and Relation Extraction (Beta)
Limina’s de-identification service offers the ability to use relation extraction on its analyze/text endpoint. Relation extraction is currently implemented on top of both the named entity recognition (NER) and the coreference resolution models. It is, therefore, limited to predicting relations between clusters of coreferenced entities.
Currently, the system supports a single generalized relation type: RELATED_TO, which is used to capture all of the supported semantic relations between a person and another entity:
- Kinship - a relation between two NAMEs (or other variants, e.g. NAME_GIVEN) indicating family or close personal relationships between individuals. These may include parent-child, siblings, spouses, etc. A kinship relation is always bi-directional.
- Place of birth - a relation between NAME and LOCATION entities, indicating the location where the person was born. This can refer to a city, state, country, or region.
- Citizenship - a relation between NAME and LOCATION or ORIGIN entities, indicating nationality or legal citizenship of the person.
- Origin - a relation between NAME and ORIGIN entities, indicating the country a person originally comes from, reflecting ancestry or cultural background rather than legal status or birthplace.
- Date of birth - a relation between NAME and DOB entities, indicating birthdate.
- Date of death - a relation between NAME and DATE or DATE_INTERVAL entities, indicating the date of death of a person.
- Nessa Jonsson → RELATED_TO → Sweden
- Nessa Jonsson → RELATED_TO → Erik
- Erik → RELATED_TO → Nessa Jonsson
Relation extraction is enabled on the analyze/text endpoint by setting the field enable_relation_extraction to true.
- enable_relation_extraction: Controls whether relation extraction is performed during analysis.
  - true: Enables relation extraction
  - false (default): Disables relation extraction
Relation Extraction and Coreference Resolution
Relation extraction relies on coreference resolution to group mentions of people in text. Make sure a non-null value is set for coreference_resolution before setting enable_relation_extraction to true.
analyze/text endpoint. Notice the enable_relation_extraction field within the relation_detection object.
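The dependency between the two options can be enforced before a request is sent. The helper below is a hypothetical sketch: the field names and allowed values come from this guide, while the helper itself and its error messages are ours.

```python
ALLOWED_METHODS = {"heuristics", "model_prediction", "combined"}


def build_relation_detection(coreference_resolution=None,
                             enable_relation_extraction=False):
    """Build a relation_detection object, rejecting inconsistent settings."""
    if (coreference_resolution is not None
            and coreference_resolution not in ALLOWED_METHODS):
        raise ValueError(f"unknown method: {coreference_resolution!r}")
    if enable_relation_extraction and coreference_resolution is None:
        # Relation extraction relies on coreference resolution.
        raise ValueError("set coreference_resolution before enabling "
                         "relation extraction")
    config = {"enable_relation_extraction": enable_relation_extraction}
    if coreference_resolution is not None:
        config["coreference_resolution"] = coreference_resolution
    return config


relation_detection = build_relation_detection("combined", True)
print(relation_detection)
```

The resulting dict would be placed under the relation_detection field of the request body.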
- relations: A list of extracted relations involving the entity. Each relation object includes:
  - coreference_id: The ID of the related entity, taken from the coreference_id field of another entity in the response.
  - label: The type of relation detected. Currently, only one relation is supported, the generic RELATED_TO relation.
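Because each relation points to another entity by coreference_id, turning relations into readable triples takes one lookup table. In this sketch, the entity layout (`text`, `coreference_id`, and `relations` as sibling keys) and the IDs are assumptions for illustration. The sample mirrors the Nessa Jonsson example above.

```python
def resolve_relations(entities):
    """Return (source text, label, target text) triples for all relations."""
    # Use the first mention seen for each coreference_id as its display name.
    names = {}
    for entity in entities:
        coreference_id = entity.get("coreference_id")
        if coreference_id is not None and coreference_id not in names:
            names[coreference_id] = entity["text"]
    triples = []
    for entity in entities:
        for relation in entity.get("relations", []):
            target_id = relation["coreference_id"]
            # Fall back to the raw ID when the target entity is not listed.
            triples.append(
                (entity["text"], relation["label"], names.get(target_id, target_id))
            )
    return triples


# Hypothetical entities with made-up IDs.
sample = [
    {"text": "Nessa Jonsson", "coreference_id": "id-nessa",
     "relations": [{"coreference_id": "id-sweden", "label": "RELATED_TO"},
                   {"coreference_id": "id-erik", "label": "RELATED_TO"}]},
    {"text": "Sweden", "coreference_id": "id-sweden"},
    {"text": "Erik", "coreference_id": "id-erik",
     "relations": [{"coreference_id": "id-nessa", "label": "RELATED_TO"}]},
]
for source, label, target in resolve_relations(sample):
    print(f"{source} -> {label} -> {target}")
```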
Limitations
The relation extraction model is provided as an experimental feature and is not intended for production use.
It currently supports English text and is constrained to inputs of up to 1024 tokens. Any text beyond this limit will be ignored during processing.
Relation predictions may be inaccurate or missed, particularly in complex contexts where related entities occur far apart within the text.