# Detect, Parse, and Validate Entities in Text

> This guide explains how to use Limina to analyze text by detecting, parsing, and validating the PII it contains.

<Info>
  In order to run the example code in this guide, please sign up for your [free test API key here](https://portal.getlimina.ai/?utm_source=docs\&utm_medium=website\&utm_campaign=guides\&utm_content=guides).
</Info>

In addition to de-identification and redaction, Limina supports entity detection and validation. The [`analyze/text`](/latest/analyze-text) route described below is an essential tool for exploring and structuring your data, as well as for computing statistics about it. This guide shows how to use the [`analyze/text`](/latest/analyze-text) endpoint, introduced in `4.1`, to return analysis results for the detected entities, with examples of how these results can be applied to your own use cases.

## Analyze entities in text <Badge color="blue">(new in 4.1)</Badge>

The [`analyze/text`](/latest/analyze-text) route returns a list of detected entities, along with the formatted text of each entity, a description of its subtypes, and any validation assertions. Below, we provide example payloads for Limina's [`analyze/text`](/latest/analyze-text) REST API route and document the corresponding responses.

To illustrate how this information can be used, the following sections walk through a series of common use cases.

### Validation and custom redaction of credit card numbers

Some numerical entities embed a checksum in their values. The checksum is used to confirm the entity's validity and to catch errors introduced during transcription. This is the case for credit card numbers, which must satisfy the [Luhn algorithm](https://en.wikipedia.org/wiki/Luhn_algorithm). The [`analyze/text`](/latest/analyze-text) route runs this algorithm on top of the NER model's detections, which provides an additional safeguard by checking that a detected number is indeed a valid credit card number. Let's look at three examples containing credit card numbers.
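
For reference, the Luhn check itself is short. The following is a minimal Python sketch of the standard algorithm, not Limina's internal implementation. It ignores separators such as spaces and dashes, so it can be applied directly to the raw entity text.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digits in `number` satisfy the Luhn checksum.

    Separators such as spaces and dashes are ignored.
    """
    digits = [int(ch) for ch in number if "0" <= ch <= "9"]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost (check) digit and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # equivalent to adding the two digits of the product
        total += digit
    return total % 10 == 0


for candidate in ["6578-7790-4346-2237", "4242424242424242", "378282246310005"]:
    print(candidate, luhn_is_valid(candidate))
```

Applied to the six numbers that carry a `luhn` assertion in the response below, this sketch reproduces the same `valid`/`invalid` statuses.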

<CodeGroup>
  ```json Request Body wrap lines theme={"theme":"poimandres"}
  {
    "text": [
      "Okay, hang on just a second because I got to get it. Okay, it is 6578-7790-4346-2237. Expiration. 1224.",
      "All right, I'm ready. 800 678-457-7896. Expiration is one. 224.",
      "CC_type: Diners Club International RuPay Visa JCB Amex CCN: 30569309025904 4242424242424242 4222222222222 6172873484776530 378282246310005 CC_CVC: 480 902 182 765 143 CC_Expiredate: 5/28 6/67 12/67 11/29 9/70"
    ],
    "locale": "en-US",
    "entity_detection": {
      "accuracy": "high",
      "entity_types": [
        {
          "type": "ENABLE",
          "value": ["CREDIT_CARD"]
        }
      ]
    }
  }
  ```

  ```json Response Body wrap lines theme={"theme":"poimandres"}
  [
    {
      "entities": [
        {
          "text": "6578-7790-4346-2237",
          "location": {
            "stt_idx": 65,
            "end_idx": 84
          },
          "best_label": "CREDIT_CARD",
          "labels": {
            "CREDIT_CARD": 0.9022786834023215
          },
          "analysis_result": {
            "formatted": "6578 7790 4346 2237",
            "subtypes": [],
            "validation_assertions": [
              {
                "provider": "luhn",
                "status": "invalid"
              }
            ]
          }
        }
      ],
      "entities_present": true,
      "characters_processed": 103,
      "languages_detected": {
        "en": 0.9202778935432434
      }
    },
    {
      "entities": [
        {
          "text": "800 678-457-7896",
          "location": {
            "stt_idx": 22,
            "end_idx": 40
          },
          "best_label": "CREDIT_CARD",
          "labels": {
            "CREDIT_CARD": 0.9012922777069939
          },
          "analysis_result": {
            "subtypes": [],
            "validation_assertions": []
          }
        }
      ],
      "entities_present": true,
      "characters_processed": 65,
      "languages_detected": {
        "en": 0.8164065480232239
      }
    },
    {
      "entities": [
        {
          "text": "30569309025904",
          "location": {
            "stt_idx": 60,
            "end_idx": 74
          },
          "best_label": "CREDIT_CARD",
          "labels": {
            "CREDIT_CARD": 1.0
          },
          "analysis_result": {
            "formatted": "3056 9309 025 904",
            "subtypes": [],
            "validation_assertions": [
              {
                "provider": "luhn",
                "status": "valid"
              }
            ]
          }
        },
        {
          "text": "4242424242424242",
          "location": {
            "stt_idx": 75,
            "end_idx": 91
          },
          "best_label": "CREDIT_CARD",
          "labels": {
            "CREDIT_CARD": 1.0
          },
          "analysis_result": {
            "formatted": "4242 4242 4242 4242",
            "subtypes": [],
            "validation_assertions": [
              {
                "provider": "luhn",
                "status": "valid"
              }
            ]
          }
        },
        {
          "text": "4222222222222",
          "location": {
            "stt_idx": 92,
            "end_idx": 105
          },
          "best_label": "CREDIT_CARD",
          "labels": {
            "CREDIT_CARD": 1.0
          },
          "analysis_result": {
            "formatted": "4222 222 222 222",
            "subtypes": [],
            "validation_assertions": [
              {
                "provider": "luhn",
                "status": "valid"
              }
            ]
          }
        },
        {
          "text": "6172873484776530",
          "location": {
            "stt_idx": 106,
            "end_idx": 122
          },
          "best_label": "CREDIT_CARD",
          "labels": {
            "CREDIT_CARD": 0.9088553956576756
          },
          "analysis_result": {
            "formatted": "6172 8734 8477 6530",
            "subtypes": [],
            "validation_assertions": [
              {
                "provider": "luhn",
                "status": "invalid"
              }
            ]
          }
        },
        {
          "text": "378282246310005",
          "location": {
            "stt_idx": 123,
            "end_idx": 138
          },
          "best_label": "CREDIT_CARD",
          "labels": {
            "CREDIT_CARD": 1.0
          },
          "analysis_result": {
            "formatted": "3782 8224 6310 005",
            "subtypes": [],
            "validation_assertions": [
              {
                "provider": "luhn",
                "status": "valid"
              }
            ]
          }
        }
      ],
      "entities_present": true,
      "characters_processed": 208,
      "languages_detected": {
        "en": 0.24319741129875183
      }
    }
  ]
  ```
</CodeGroup>

The request above contains three fields. Two of them, `text` and `entity_detection`, are shared by the [`analyze/text`](/latest/analyze-text), [`ner/text`](/latest/ner-text), and [`process/text`](/latest/process-text) routes: `text` contains the text to analyze, and `entity_detection` contains the NER configuration (e.g., the list of entity types to detect). The third field, `locale`, is unique to the [`analyze/text`](/latest/analyze-text) request. It serves as a hint to the analyzer when parsing dates and other locale-dependent entities. For example, setting `locale` to `en-US` makes the analyzer interpret the date `12-10-2020` as December 10, 2020 rather than October 12, 2020. More examples of values these fields can take are provided in the sections below.
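
As a sketch, request bodies like the ones in this guide can be assembled programmatically. The `build_analyze_request` helper below is hypothetical (it is not part of the Limina client) and only reproduces the JSON shape shown in the examples. It does not send anything: the endpoint URL and authentication header depend on your deployment and should be taken from the API reference.

```python
import json
from typing import Any, Dict, List, Optional


def build_analyze_request(
    texts: List[str],
    locale: str = "en-US",
    enabled_types: Optional[List[str]] = None,
    accuracy: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an analyze/text request body shaped like the examples in this guide."""
    payload: Dict[str, Any] = {"text": texts, "locale": locale}
    entity_detection: Dict[str, Any] = {}
    if accuracy is not None:
        entity_detection["accuracy"] = accuracy
    if enabled_types:
        # Restrict detection to the listed entity types, as in the request above.
        entity_detection["entity_types"] = [{"type": "ENABLE", "value": enabled_types}]
    if entity_detection:
        payload["entity_detection"] = entity_detection
    return payload


body = build_analyze_request(
    ["Okay, it is 6578-7790-4346-2237. Expiration. 1224."],
    enabled_types=["CREDIT_CARD"],
    accuracy="high",
)
print(json.dumps(body, indent=2))
```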

The full response above is a mouthful, so let's look at the first example's response in more detail.

```json JSON Response with CREDIT_CARD entity wrap lines highlight={13-22} theme={"theme":"poimandres"}
{
  "entities": [
    {
      "text": "6578-7790-4346-2237",
      "location": {
        "stt_idx": 65,
        "end_idx": 84
      },
      "best_label": "CREDIT_CARD",
      "labels": {
        "CREDIT_CARD": 0.9022786834023215
      },
      "analysis_result": {
        "formatted": "6578 7790 4346 2237",
        "subtypes": [],
        "validation_assertions": [
          {
            "provider": "luhn",
            "status": "invalid"
          }
        ]
      }
    }
  ],
  "entities_present": true,
  "characters_processed": 103,
  "languages_detected": {
    "en": 0.9202778935432434
  }
}
```

The response contains three main parts:

* the entity information, including its **text** and **location**. These fields are shared with other routes, including [`ner/text`](/latest/ner-text) and [`process/text`](/latest/process-text), and have the same meaning there.
* the **formatted** text of the entity. This field is unique to the [`analyze/text`](/latest/analyze-text) route and provides a standard format for the entity, which simplifies writing post-processing logic for detected entities. The formats are described in the following table.

| Entity Type            | Format                                  | Example                   |
| ---------------------- | --------------------------------------- | ------------------------- |
| CREDIT\_CARD           | space-separated groups of 3 to 5 digits | 6578 7790 4346 2237       |
| DATE                   | ISO-8601                                | 2025-03-20T18:00:00+00:00 |
| DOB                    | ISO-8601                                | 2025-03-20                |
| AGE                    | decimal numeral                         | 12                        |
| All other entity types | no formatting                           | -                         |

* a list of **validation assertions** on the entity, which is also unique to the [`analyze/text`](/latest/analyze-text) route. Each object in the list is specific to the type of entity detected. In this example, the `provider` field identifies the Luhn algorithm, which was run on the credit card number, and the `status` field gives its result. Currently, only credit card numbers carry validation assertions, but more assertion providers will be added in the future.

In summary, the analysis of this first example shows that the credit card number was successfully parsed, and the parsed result is stored in the `formatted` field. However, although the number matches the credit card number format, it fails the Luhn check, so it is not a valid credit card number. This could be the result of a transcription error, for example.
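
To act on this in code, a short sketch like the following lists every `CREDIT_CARD` entity in a response together with its formatted value and Luhn status. It is plain Python over the parsed JSON response, and the function names are hypothetical. Entities without a `luhn` assertion, such as the second example above, are reported as `"unchecked"`; that label is this sketch's own convention, not an API value.

```python
from typing import Any, Dict, List, Optional, Tuple


def luhn_status(entity: Dict[str, Any]) -> str:
    """Return the status of the entity's "luhn" assertion, or "unchecked" if there is none."""
    analysis = entity.get("analysis_result", {})
    for assertion in analysis.get("validation_assertions", []):
        if assertion.get("provider") == "luhn":
            return assertion.get("status", "unchecked")
    return "unchecked"


def credit_card_report(
    response: List[Dict[str, Any]],
) -> List[Tuple[int, str, Optional[str], str]]:
    """Return (text index, raw text, formatted value, Luhn status) for each CREDIT_CARD entity."""
    rows = []
    for text_index, result in enumerate(response):
        for entity in result.get("entities", []):
            if entity.get("best_label") != "CREDIT_CARD":
                continue
            formatted = entity.get("analysis_result", {}).get("formatted")
            rows.append((text_index, entity["text"], formatted, luhn_status(entity)))
    return rows


# Trimmed version of the first two results in the response above.
response = [
    {"entities": [{
        "text": "6578-7790-4346-2237",
        "best_label": "CREDIT_CARD",
        "analysis_result": {
            "formatted": "6578 7790 4346 2237",
            "validation_assertions": [{"provider": "luhn", "status": "invalid"}],
        },
    }]},
    {"entities": [{
        "text": "800 678-457-7896",
        "best_label": "CREDIT_CARD",
        "analysis_result": {"subtypes": [], "validation_assertions": []},
    }]},
]

for row in credit_card_report(response):
    print(row)
```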

The information in the analysis result makes it possible to build custom redaction of entities with the post-processing framework, as shown in [this section](/product-guides/thin-client#custom-redaction-of-credit-card-numbers).

### Date shifting and custom redaction of dates

Dates are a type of PII found in almost every dataset. Redaction is one way to ensure that sensitive dates do not create privacy issues. However, fully redacting dates often reduces the utility of the redacted data, so it is often preferable to use obfuscation methods that preserve some of that utility. Two well-known techniques are date shifting and date bucketing. Let's consider three examples containing dates.

<CodeGroup>
  ```json Request Body wrap lines theme={"theme":"poimandres"}
  {
    "text": [
      "$MDT $MRK $QRVO $TSS &amp; 5 more stock picks for LONG swings:  https://t.co/CbkieXxqoR (July 10 2018) https://t.co/eit53RUY4g",
      "Short sale volume (not short interest) for $KBE on 2018-07-09 is 42%. https://t.co/7pWbgjJ8Ag $FOXA 38% $TVIX 34% $LITE 54% $HIG 60%",
      "$WLTW high OI range is 160 to 155 for option expiration 07/20/2018 #options https://t.co/BnVElKBKkJ"
    ],
    "locale": "en-US",
    "entity_detection": {
      "entity_types": [
        {
          "type": "ENABLE",
          "value": ["DATE", "DOB", "DAY", "MONTH", "YEAR"]
        }
      ]
    }
  }
  ```

  ```json Response Body wrap lines theme={"theme":"poimandres"}
  [
    {
      "entities": [
        {
          "text": "July 10 2018",
          "location": {
            "stt_idx": 89,
            "end_idx": 101
          },
          "best_label": "DATE",
          "labels": {
            "DATE": 0.9400081038475037,
            "MONTH": 0.3111259341239929,
            "DAY": 0.31207050879796344,
            "YEAR": 0.29245950778325397
          },
          "analysis_result": {
            "formatted": "2018-07-10T00:00:00",
            "subtypes": [
              {
                "text": "10",
                "formatted": "10",
                "label": "DAY",
                "location": {
                  "stt_idx": 94,
                  "end_idx": 96
                }
              },
              {
                "text": "July",
                "formatted": "7",
                "label": "MONTH",
                "location": {
                  "stt_idx": 89,
                  "end_idx": 93
                }
              },
              {
                "text": "2018",
                "formatted": "2018",
                "label": "YEAR",
                "location": {
                  "stt_idx": 97,
                  "end_idx": 101
                }
              }
            ],
            "validation_assertions": []
          }
        }
      ],
      "entities_present": true,
      "characters_processed": 126,
      "languages_detected": {
        "en": 0.6427053809165955
      }
    },
    {
      "entities": [
        {
          "text": "2018-07-09",
          "location": {
            "stt_idx": 51,
            "end_idx": 61
          },
          "best_label": "DATE",
          "labels": {
            "DATE": 0.9267139077186585,
            "YEAR": 0.17909334897994994,
            "MONTH": 0.18299812078475952,
            "DAY": 0.18503443002700806
          },
          "analysis_result": {
            "formatted": "2018-07-09T00:00:00",
            "subtypes": [
              {
                "text": "09",
                "formatted": "9",
                "label": "DAY",
                "location": {
                  "stt_idx": 59,
                  "end_idx": 61
                }
              },
              {
                "text": "07",
                "formatted": "7",
                "label": "MONTH",
                "location": {
                  "stt_idx": 56,
                  "end_idx": 58
                }
              },
              {
                "text": "2018",
                "formatted": "2018",
                "label": "YEAR",
                "location": {
                  "stt_idx": 51,
                  "end_idx": 55
                }
              }
            ],
            "validation_assertions": []
          }
        }
      ],
      "entities_present": true,
      "characters_processed": 132,
      "languages_detected": {
        "en": 0.5451536178588867
      }
    },
    {
      "entities": [
        {
          "text": "07/20/2018",
          "location": {
            "stt_idx": 56,
            "end_idx": 66
          },
          "best_label": "DATE",
          "labels": {
            "DATE": 0.9359936833381652,
            "MONTH": 0.18900736570358276,
            "DAY": 0.18550281524658202,
            "YEAR": 0.18460171222686766
          },
          "analysis_result": {
            "formatted": "2018-07-20T00:00:00",
            "subtypes": [
              {
                "text": "20",
                "formatted": "20",
                "label": "DAY",
                "location": {
                  "stt_idx": 59,
                  "end_idx": 61
                }
              },
              {
                "text": "07",
                "formatted": "7",
                "label": "MONTH",
                "location": {
                  "stt_idx": 56,
                  "end_idx": 58
                }
              },
              {
                "text": "2018",
                "formatted": "2018",
                "label": "YEAR",
                "location": {
                  "stt_idx": 62,
                  "end_idx": 66
                }
              }
            ],
            "validation_assertions": []
          }
        }
      ],
      "entities_present": true,
      "characters_processed": 99,
      "languages_detected": {
        "en": 0.7047932744026184
      }
    }
  ]
  ```
</CodeGroup>

Let's look at one specific date entity in the above response.

```json Response body with formatted date entities lines highlight={14-46} theme={"theme":"poimandres"}
{
  "text": "July 10 2018",
  "location": {
    "stt_idx": 89,
    "end_idx": 101
  },
  "best_label": "DATE",
  "labels": {
    "DATE": 0.9400081038475037,
    "MONTH": 0.3111259341239929,
    "DAY": 0.31207050879796344,
    "YEAR": 0.29245950778325397
  },
  "analysis_result": {
    "formatted": "2018-07-10T00:00:00",
    "subtypes": [
      {
        "text": "10",
        "formatted": "10",
        "label": "DAY",
        "location": {
          "stt_idx": 94,
          "end_idx": 96
        }
      },
      {
        "text": "July",
        "formatted": "7",
        "label": "MONTH",
        "location": {
          "stt_idx": 89,
          "end_idx": 93
        }
      },
      {
        "text": "2018",
        "formatted": "2018",
        "label": "YEAR",
        "location": {
          "stt_idx": 97,
          "end_idx": 101
        }
      }
    ],
    "validation_assertions": []
  }
}
```

The `analysis_result` object exposes several useful pieces of information. First, the formatted date "2018-07-10T00:00:00" is available in the `analysis_result.formatted` field. If you plan to implement logic on the dates found in the text, it is usually easier to work with these formatted dates than with the original, non-standard formats (e.g., "July 10 2018").
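
For example, a common date-shifting approach moves every date in a document by the same random offset, which hides the real dates while preserving the intervals between them. The sketch below does this with the Python standard library on top of `analysis_result.formatted`. The function names, the seed-per-document scheme, and the ISO date output are assumptions for illustration, not Limina behavior.

```python
import random
from datetime import datetime, timedelta
from typing import Any, Dict, List


def document_offset(seed: int, max_days: int = 365) -> timedelta:
    """Draw one reproducible shift per document so intervals between its dates are preserved."""
    return timedelta(days=random.Random(seed).randint(-max_days, max_days))


def shift_dates(text: str, entities: List[Dict[str, Any]], offset: timedelta) -> str:
    """Replace each parsed DATE/DOB entity with its formatted value shifted by `offset`."""
    dates = [
        e for e in entities
        if e.get("best_label") in {"DATE", "DOB"} and "formatted" in e.get("analysis_result", {})
    ]
    # Work right to left so the offsets of entities not yet replaced stay valid.
    for entity in sorted(dates, key=lambda e: e["location"]["stt_idx"], reverse=True):
        parsed = datetime.fromisoformat(entity["analysis_result"]["formatted"])
        shifted = (parsed + offset).date().isoformat()
        start, end = entity["location"]["stt_idx"], entity["location"]["end_idx"]
        text = text[:start] + shifted + text[end:]
    return text


text = "$WLTW high OI range is 160 to 155 for option expiration 07/20/2018 #options https://t.co/BnVElKBKkJ"
entities = [{
    "text": "07/20/2018",
    "location": {"stt_idx": 56, "end_idx": 66},
    "best_label": "DATE",
    "analysis_result": {"formatted": "2018-07-20T00:00:00"},
}]
print(shift_dates(text, entities, document_offset(seed=2024)))
```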

In addition, the day, month, and year of the date entity are directly accessible in `analysis_result.subtypes`. This information can be used to partially redact or to bucket dates. \
An example of redacting the day and month but keeping the year is provided in the [custom redaction of dates guide](/product-guides/thin-client#custom-redaction-of-dates).
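
Without the client, the same idea can be sketched directly on the response: the subtype `location` offsets point into the original text, so the day and month can be replaced while the year is kept. In this sketch, subtypes that come back without a `location` field are simply skipped. The function name and the `[DAY]`/`[MONTH]` placeholder format are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple


def redact_date_parts(
    text: str,
    entities: List[Dict[str, Any]],
    keep: Iterable[str] = ("YEAR",),
) -> str:
    """Replace every DATE/DOB subtype not listed in `keep` with a [LABEL] placeholder."""
    kept = set(keep)
    spans: List[Tuple[int, int, str]] = []
    for entity in entities:
        if entity.get("best_label") not in {"DATE", "DOB"}:
            continue
        for subtype in entity.get("analysis_result", {}).get("subtypes", []):
            location = subtype.get("location")
            if location is None or subtype["label"] in kept:
                continue  # no offsets to work with, or a component we keep
            spans.append((location["stt_idx"], location["end_idx"], subtype["label"]))
    # Replace right to left so earlier offsets are not shifted.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


text = "$MDT $MRK $QRVO $TSS &amp; 5 more stock picks for LONG swings:  https://t.co/CbkieXxqoR (July 10 2018) https://t.co/eit53RUY4g"
entities = [{
    "text": "July 10 2018",
    "location": {"stt_idx": 89, "end_idx": 101},
    "best_label": "DATE",
    "analysis_result": {"subtypes": [
        {"text": "10", "label": "DAY", "location": {"stt_idx": 94, "end_idx": 96}},
        {"text": "July", "label": "MONTH", "location": {"stt_idx": 89, "end_idx": 93}},
        {"text": "2018", "label": "YEAR", "location": {"stt_idx": 97, "end_idx": 101}},
    ]},
}]
print(redact_date_parts(text, entities))
```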

### Age bucketing and custom redaction of numbers

As with dates, ages and other numerical entities can be analyzed to build custom redactions. Consider these two examples.

<CodeGroup>
  ```json Request Body wrap lines theme={"theme":"poimandres"}
  {
    "text": [
      "A 32-year old Black female German citizen living in Germany wants to travel to the United States for leisure.",
      "West Point Public School division provides school-based preschool services for children from two through nine years of age who are children at risk and children with identified disabilities or delays."
    ],
    "link_batch": false,
    "locale": "en-US",
    "entity_detection": {
      "entity_types": [
        {
          "type": "ENABLE",
          "value": ["AGE"]
        }
      ]
    }
  }
  ```

  ```json Response Body wrap lines theme={"theme":"poimandres"}
  [
    {
      "entities": [
        {
          "text": "32",
          "location": {
            "stt_idx": 2,
            "end_idx": 4
          },
          "best_label": "AGE",
          "labels": {
            "AGE": 0.9668179750442505
          },
          "analysis_result": {
            "formatted": 32,
            "subtypes": [],
            "validation_assertions": []
          }
        }
      ],
      "entities_present": true,
      "characters_processed": 109,
      "languages_detected": {
        "en": 0.9611877202987671
      }
    },
    {
      "entities": [
        {
          "text": "two",
          "location": {
            "stt_idx": 93,
            "end_idx": 96
          },
          "best_label": "AGE",
          "labels": {
            "AGE": 0.9462096095085144
          },
          "analysis_result": {
            "formatted": 2,
            "subtypes": [],
            "validation_assertions": []
          }
        },
        {
          "text": "nine",
          "location": {
            "stt_idx": 105,
            "end_idx": 109
          },
          "best_label": "AGE",
          "labels": {
            "AGE": 0.9411536455154419
          },
          "analysis_result": {
            "formatted": 9,
            "subtypes": [],
            "validation_assertions": []
          }
        }
      ],
      "entities_present": true,
      "characters_processed": 200,
      "languages_detected": {
        "en": 0.9786704778671265
      }
    }
  ]
  ```
</CodeGroup>

Using the [Limina Python client](https://pypi.org/project/privateai-client/), you can use the above [`analyze/text`](/latest/analyze-text) response to bucket ages, as shown [here](/product-guides/thin-client#custom-redaction-of-ages).
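
For comparison, a plain-Python sketch that works directly on the JSON response, without the client, could look like the following. It relies on `analysis_result.formatted` holding a number for `AGE` entities, as in the response above. The ten-year bucket width and the function names are assumptions chosen for illustration.

```python
from typing import Any, Dict, List


def age_bucket(age: int, width: int = 10) -> str:
    """Map an age to a fixed-width range, e.g. 32 -> "30-39" for width 10."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def bucket_ages(text: str, entities: List[Dict[str, Any]], width: int = 10) -> str:
    """Replace each AGE entity whose formatted value is numeric with its bucket."""
    ages = [
        e for e in entities
        if e.get("best_label") == "AGE"
        and isinstance(e.get("analysis_result", {}).get("formatted"), (int, float))
    ]
    # Replace right to left so earlier offsets remain valid.
    for entity in sorted(ages, key=lambda e: e["location"]["stt_idx"], reverse=True):
        bucket = age_bucket(int(entity["analysis_result"]["formatted"]), width)
        start, end = entity["location"]["stt_idx"], entity["location"]["end_idx"]
        text = text[:start] + bucket + text[end:]
    return text


text = "A 32-year old Black female German citizen living in Germany wants to travel to the United States for leisure."
entities = [{
    "text": "32",
    "location": {"stt_idx": 2, "end_idx": 4},
    "best_label": "AGE",
    "analysis_result": {"formatted": 32},
}]
print(bucket_ages(text, entities))
```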

### Custom redaction of addresses

The GDPR and other privacy laws impose strict requirements on the redaction of addresses. In the following scenario, we demonstrate how to partially redact an address by keeping only the less sensitive characters of the zip/postal code and removing all other address information (e.g., civic number, street name, and so on).

<CodeGroup>
  ```json Request Body wrap lines theme={"theme":"poimandres"}
  {
    "text": [
      "Please deliver this to 45, Clybaun Heights, Galway City, Ireland H91 AKK3",
      "3255 M-A-D-D-A-M-S street, huntington, west virginia is his birthplace",
      "My favorite city is San Francisco, California 94110, United States, 37.7749° N, 122.4194° W"
    ],
    "locale": "en-US"
  }
  ```

  ```json Response Body wrap lines theme={"theme":"poimandres"}
  [
    {
      "entities": [
        {
          "text": "45, Clybaun Heights, Galway City, Ireland H91 AKK3",
          "location": {
            "stt_idx": 23,
            "end_idx": 73
          },
          "best_label": "LOCATION_ADDRESS",
          "labels": {
            "LOCATION_ADDRESS_STREET": 0.3171827793121338,
            "LOCATION": 0.9123516889179454,
            "LOCATION_ADDRESS": 0.9221759648884044,
            "LOCATION_CITY": 0.16148114204406738,
            "LOCATION_COUNTRY": 0.05482322678846471,
            "LOCATION_ZIP": 0.26978740271400004
          },
          "analysis_result": {
            "subtypes": [
              {
                "text": "45, Clybaun Heights",
                "label": "LOCATION_ADDRESS_STREET",
                "location": {
                  "stt_idx": 23,
                  "end_idx": 42
                }
              },
              {
                "text": "Galway City",
                "label": "LOCATION_CITY",
                "location": {
                  "stt_idx": 44,
                  "end_idx": 55
                }
              },
              {
                "text": "Ireland",
                "label": "LOCATION_COUNTRY",
                "location": {
                  "stt_idx": 57,
                  "end_idx": 64
                }
              },
              {
                "text": "H91 AKK3",
                "label": "LOCATION_ZIP",
                "location": {
                  "stt_idx": 65,
                  "end_idx": 73
                }
              }
            ],
            "validation_assertions": []
          }
        }
      ],
      "entities_present": true,
      "characters_processed": 73,
      "languages_detected": {
        "en": 0.8342836499214172
      }
    },
    {
      "entities": [
        {
          "text": "3255 M-A-D-D-A-M-S street, huntington, west virginia",
          "location": {
            "stt_idx": 0,
            "end_idx": 52
          },
          "best_label": "LOCATION_ADDRESS",
          "labels": {
            "LOCATION_ADDRESS_STREET": 0.6232224106788635,
            "LOCATION_ADDRESS": 0.9109632035960322,
            "LOCATION": 0.8909260371456975,
            "LOCATION_CITY": 0.07817105106685472,
            "LOCATION_STATE": 0.12203486328539641
          },
          "analysis_result": {
            "subtypes": [
              {
                "text": "3255 M-A-D-D-A-M-S street",
                "label": "LOCATION_ADDRESS_STREET",
                "location": {
                  "stt_idx": 0,
                  "end_idx": 25
                }
              },
              {
                "text": "huntington",
                "label": "LOCATION_CITY",
                "location": {
                  "stt_idx": 27,
                  "end_idx": 37
                }
              },
              {
                "text": "west virginia",
                "label": "LOCATION_STATE",
                "location": {
                  "stt_idx": 39,
                  "end_idx": 52
                }
              }
            ],
            "validation_assertions": []
          }
        }
      ],
      "entities_present": true,
      "characters_processed": 70,
      "languages_detected": {
        "en": 0.9467829465866089
      }
    },
    {
      "entities": [
        {
          "text": "San Francisco, California 94110, United States, 37.7749\u00b0 N, 122.4194\u00b0 W",
          "location": {
            "stt_idx": 20,
            "end_idx": 91
          },
          "best_label": "LOCATION",
          "labels": {
            "LOCATION_CITY": 0.080466923614343,
            "LOCATION": 0.8993716637293497,
            "LOCATION_ADDRESS": 0.200799106930693,
            "LOCATION_STATE": 0.03926792989174525,
            "LOCATION_ZIP": 0.12127648045619328,
            "LOCATION_COUNTRY": 0.07723071426153183,
            "LOCATION_COORDINATE": 0.4833615819613139
          },
          "analysis_result": {
            "subtypes": [
              {
                "text": "San Francisco",
                "label": "LOCATION_CITY",
                "location": {
                  "stt_idx": 20,
                  "end_idx": 33
                }
              },
              {
                "text": "California",
                "label": "LOCATION_STATE",
                "location": {
                  "stt_idx": 35,
                  "end_idx": 45
                }
              },
              {
                "text": "94110",
                "label": "LOCATION_ZIP",
                "location": {
                  "stt_idx": 46,
                  "end_idx": 51
                }
              },
              {
                "text": "United States",
                "label": "LOCATION_COUNTRY",
                "location": {
                  "stt_idx": 53,
                  "end_idx": 66
                }
              },
              {
                "text": "37.7749\u00b0 N, 122.4194\u00b0 W",
                "label": "LOCATION_COORDINATE",
                "location": {
                  "stt_idx": 68,
                  "end_idx": 91
                }
              }
            ],
            "validation_assertions": []
          }
        }
      ],
      "entities_present": true,
      "characters_processed": 91,
      "languages_detected": {
        "en": 0.7658711075782776
      }
    }
  ]
  ```
</CodeGroup>

The request above contains three texts with addresses, and the corresponding [`analyze/text`](/latest/analyze-text) response contains the result of their analysis. Together with the [corresponding Limina client post-processing code](/product-guides/thin-client#custom-redaction-of-locations), this response can be used to mask street addresses and hide their most sensitive information.
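
As an illustration of the partial redaction described at the start of this section, the sketch below replaces each address entity with a placeholder that keeps only the first three characters of its `LOCATION_ZIP` subtype (for example, the "H91" routing key of an Eircode or the "941" prefix of a US ZIP code) and drops everything else. Treating `LOCATION` and `LOCATION_ADDRESS` as addresses, keeping three characters, and the placeholder format are all assumptions of this sketch; choose values that match your own privacy requirements.

```python
from typing import Any, Dict, List

# Assumption: entities with these best_label values are treated as addresses to redact.
ADDRESS_LABELS = {"LOCATION", "LOCATION_ADDRESS"}


def redact_addresses(text: str, entities: List[Dict[str, Any]], prefix_len: int = 3) -> str:
    """Replace each address entity with a placeholder that keeps only a zip/postal code prefix."""
    targets = [e for e in entities if e.get("best_label") in ADDRESS_LABELS]
    # Replace right to left so the offsets of earlier entities remain valid.
    for entity in sorted(targets, key=lambda e: e["location"]["stt_idx"], reverse=True):
        zip_codes = [
            subtype["text"]
            for subtype in entity.get("analysis_result", {}).get("subtypes", [])
            if subtype.get("label") == "LOCATION_ZIP" and "text" in subtype
        ]
        if zip_codes:
            placeholder = f"[LOCATION {zip_codes[0][:prefix_len]}]"
        else:
            placeholder = "[LOCATION]"
        start, end = entity["location"]["stt_idx"], entity["location"]["end_idx"]
        text = text[:start] + placeholder + text[end:]
    return text


text = "Please deliver this to 45, Clybaun Heights, Galway City, Ireland H91 AKK3"
entities = [{
    "text": "45, Clybaun Heights, Galway City, Ireland H91 AKK3",
    "location": {"stt_idx": 23, "end_idx": 73},
    "best_label": "LOCATION_ADDRESS",
    "analysis_result": {"subtypes": [
        {"text": "45, Clybaun Heights", "label": "LOCATION_ADDRESS_STREET"},
        {"text": "H91 AKK3", "label": "LOCATION_ZIP"},
    ]},
}]
print(redact_addresses(text, entities))
```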

## Relation detection

Relation detection refers to the broader natural language processing (NLP) capability of understanding how entities in a text are connected. While entity recognition tells us what the entities are (e.g., a person's name, a company, a location), relation detection tells us how those entities are related. It covers tasks such as [coreference resolution](#coreference-resolution) and [relation extraction](#relation-extraction), both of which are supported and which together provide a deeper understanding of unstructured text.

Relation detection is configured on the [`analyze/text`](/latest/analyze-text) route through the optional `relation_detection` field in the request.

### Coreference Resolution

[Coreference resolution](/configuration-and-operations/entity-detection-and-redaction/customizing-redaction#what-is-coreference-resolution) is the task of identifying different entity mentions in a given text that refer to the same real-world entity. The `relation_detection` field offers a configurable option for coreference resolution:

* **coreference\_resolution**: Specifies the method for identifying coreferential entities:
  * `heuristics`: Uses rule-based methods
  * `model_prediction`: Uses machine learning models
  * `combined`: Uses both approaches
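
Each entity in the response below carries a `coreference_id`, and mentions that share an ID were resolved to the same real-world entity. A minimal sketch for collecting the mentions of each entity from a response like that one might look as follows (plain Python; the function name is hypothetical):

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_coreferences(entities: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each coreference_id to the texts of the entity mentions that share it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        coref_id = entity.get("coreference_id")
        if coref_id is not None:
            groups[coref_id].append(entity["text"])
    return dict(groups)


# A few entities copied (trimmed) from the response below.
entities = [
    {"text": "Nikola Jokić", "coreference_id": "56c15276-33da-4726-bc81-369074049222"},
    {"text": "Serbian Cyrillic", "coreference_id": "0d6296d4-c453-4c73-9415-5abc527a38e5"},
    {"text": "Никола Јокић", "coreference_id": "56c15276-33da-4726-bc81-369074049222"},
    {"text": "nǐkola jôkitɕ", "coreference_id": "56c15276-33da-4726-bc81-369074049222"},
]

# Keep only entities mentioned more than once.
groups = group_coreferences(entities)
print({cid: mentions for cid, mentions in groups.items() if len(mentions) > 1})
```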

<CodeGroup>
  ```json Request Body lines wrap theme={"theme":"poimandres"}
  {
    "text": [
      "Nikola Jokić (Serbian Cyrillic: Никола Јокић, pronounced [nǐkola jôkitɕ] ⓘ; born February 19, 1995) is a Serbian professional basketball player who is a center for the Denver Nuggets of the National Basketball Association (NBA). Jokić was born in the city of Sombor in the northern part of Serbia. He grew up in a cramped two-bedroom apartment that housed him and his two brothers."
    ],
    "entity_detection": {
      "accuracy": "high"
    },
    "locale": "en-US",
    "relation_detection": {
      "coreference_resolution": "model_prediction"
    }
  }
  ```

  ```json Response Body lines theme={"theme":"poimandres"}
  [
    {
      "entities": [
        {
          "text": "Nikola Jokić",
          "location": {
            "stt_idx": 0,
            "end_idx": 12
          },
          "best_label": "NAME",
          "labels": {
            "NAME_GIVEN": 0.2300402671098709,
            "NAME": 0.9172913134098053,
            "NAME_FAMILY": 0.6867769062519073
          },
          "coreference_id": "56c15276-33da-4726-bc81-369074049222"
        },
        {
          "text": "Serbian Cyrillic",
          "location": {
            "stt_idx": 14,
            "end_idx": 30
          },
          "best_label": "LANGUAGE",
          "labels": {
            "LANGUAGE": 0.94222651720047
          },
          "coreference_id": "0d6296d4-c453-4c73-9415-5abc527a38e5"
        },
        {
          "text": "Никола Јокић",
          "location": {
            "stt_idx": 32,
            "end_idx": 44
          },
          "best_label": "NAME_GIVEN",
          "labels": {
            "NAME": 0.842899182013103,
            "NAME_GIVEN": 0.6497380946363721,
            "NAME_FAMILY": 0.07980045356920787
          },
          "coreference_id": "56c15276-33da-4726-bc81-369074049222"
        },
        {
          "text": "nǐkola jôkitɕ",
          "location": {
            "stt_idx": 58,
            "end_idx": 71
          },
          "best_label": "NAME",
          "labels": {
            "NAME_GIVEN": 0.4644567847251892,
            "NAME": 0.8982340276241303,
            "NAME_FAMILY": 0.44664961099624634
          },
          "coreference_id": "56c15276-33da-4726-bc81-369074049222"
        },
        {
          "text": "February 19, 1995",
          "location": {
            "stt_idx": 81,
            "end_idx": 98
          },
          "best_label": "DOB",
          "labels": {
            "DOB": 0.9391335248947144
          },
          "analysis_result": {
            "formatted": "1995-02-19T00:00:00",
            "subtypes": [
              {
                "formatted": "19",
                "label": "DAY"
              },
              {
                "formatted": "2",
                "label": "MONTH"
              },
              {
                "formatted": "1995",
                "label": "YEAR"
              }
            ],
            "validation_assertions": []
          },
          "coreference_id": "65e20278-31c8-4cfb-ad73-1e24db5fcd8e"
        },
        {
          "text": "Serbian",
          "location": {
            "stt_idx": 105,
            "end_idx": 112
          },
          "best_label": "ORIGIN",
          "labels": {
            "ORIGIN": 0.9151841402053833
          },
          "coreference_id": "43af91fe-7868-4469-a70b-c22cfcd917e2"
        },
        {
          "text": "professional basketball player",
          "location": {
            "stt_idx": 113,
            "end_idx": 143
          },
          "best_label": "OCCUPATION",
          "labels": {
            "OCCUPATION": 0.8843509753545126
          },
          "coreference_id": "e8fc3654-89ab-4cec-806a-ec97f13a9673"
        },
        {
          "text": "center",
          "location": {
            "stt_idx": 153,
            "end_idx": 159
          },
          "best_label": "OCCUPATION",
          "labels": {
            "OCCUPATION": 0.8316260576248169
          },
          "coreference_id": "8ca28112-9a34-492e-8b1f-4b9fc72c0b1f"
        },
        {
          "text": "Denver Nuggets",
          "location": {
            "stt_idx": 168,
            "end_idx": 182
          },
          "best_label": "ORGANIZATION",
          "labels": {
            "LOCATION_CITY": 0.48198258876800537,
            "ORGANIZATION": 0.9154168367385864,
            "LOCATION": 0.4703272879123688
          },
          "coreference_id": "de750d67-78eb-4606-8aec-6c2f697e9c50"
        },
        {
          "text": "National Basketball Association",
          "location": {
            "stt_idx": 190,
            "end_idx": 221
          },
          "best_label": "ORGANIZATION",
          "labels": {
            "ORGANIZATION": 0.9143192768096924
          },
          "coreference_id": "b907f506-1492-40b2-915a-1c472fc1efe8"
        },
        {
          "text": "NBA",
          "location": {
            "stt_idx": 223,
            "end_idx": 226
          },
          "best_label": "ORGANIZATION",
          "labels": {
            "ORGANIZATION": 0.8653480410575867
          },
          "coreference_id": "b907f506-1492-40b2-915a-1c472fc1efe8"
        },
        {
          "text": "Jokić",
          "location": {
            "stt_idx": 229,
            "end_idx": 234
          },
          "best_label": "NAME_FAMILY",
          "labels": {
            "NAME_FAMILY": 0.9177489876747131,
            "NAME": 0.9098437031110128
          },
          "coreference_id": "56c15276-33da-4726-bc81-369074049222"
        },
        {
          "text": "Sombor",
          "location": {
            "stt_idx": 259,
            "end_idx": 265
          },
          "best_label": "LOCATION_CITY",
          "labels": {
            "LOCATION_CITY": 0.925605853398641,
            "LOCATION": 0.9114498297373453
          },
          "coreference_id": "4b204f6e-a2ed-4c45-a268-d5a20d477478"
        },
        {
          "text": "Serbia",
          "location": {
            "stt_idx": 290,
            "end_idx": 296
          },
          "best_label": "LOCATION_COUNTRY",
          "labels": {
            "LOCATION_COUNTRY": 0.9711890816688538,
            "LOCATION": 0.9073220491409302
          },
          "coreference_id": "ef54dfa2-5bae-4081-8b9d-2dc0ddf8868c"
        }
      ],
      "entities_present": true,
      "characters_processed": 381,
      "languages_detected": {
        "en": 0.9837551116943359
      }
    }
  ]
  ```
</CodeGroup>

The response includes a key element for each entity:

* **coreference\_id**: An identifier added to each entity and shared by every mention that refers to the same real-world entity, so that coreferential entities are grouped under a common label. This grouping matches the behavior of the `/process/text` endpoint when `processed_text` is set to `MARKER` and coreference resolution is applied. For example, "Nikola Jokić", "Никола Јокић", "nǐkola jôkitɕ", and "Jokić" all share the same `coreference_id` (`56c15276-33da-4726-bc81-369074049222`), indicating that they refer to the same person.
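Because all mentions of the same entity share a `coreference_id`, grouping the `entities` list by that field recovers each cluster. The following Python sketch assumes the response has already been parsed into Python objects (for example with `json.loads`) and uses an abridged copy of the entities above. `group_by_coreference` is a hypothetical helper, not part of any Limina SDK.

```python
from collections import defaultdict


def group_by_coreference(entities):
    """Map each coreference_id to the texts of the mentions that share it."""
    clusters = defaultdict(list)
    for entity in entities:
        coref_id = entity.get("coreference_id")
        # Skip entities that carry no coreference_id.
        if coref_id is not None:
            clusters[coref_id].append(entity["text"])
    return dict(clusters)


JOKIC = "56c15276-33da-4726-bc81-369074049222"
NBA = "b907f506-1492-40b2-915a-1c472fc1efe8"

# Abridged "entities" list from the response above (other fields omitted).
entities = [
    {"text": "Nikola Jokić", "coreference_id": JOKIC},
    {"text": "Serbian Cyrillic", "coreference_id": "0d6296d4-c453-4c73-9415-5abc527a38e5"},
    {"text": "Никола Јокић", "coreference_id": JOKIC},
    {"text": "nǐkola jôkitɕ", "coreference_id": JOKIC},
    {"text": "National Basketball Association", "coreference_id": NBA},
    {"text": "NBA", "coreference_id": NBA},
    {"text": "Jokić", "coreference_id": JOKIC},
]

clusters = group_by_coreference(entities)
print(clusters[JOKIC])  # ['Nikola Jokić', 'Никола Јокић', 'nǐkola jôkitɕ', 'Jokić']
```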

For an example of how to use the coreference information from the API to replace all mentions of a person with their initials, see the [custom redaction of coreferenced names](/product-guides/thin-client#custom-redaction-of-coreferenced-names) example.

### Relation Extraction

Relation extraction is the task of identifying meaningful relations between entities in text, such as person-to-person or person-to-location links. It helps unlock document-level understanding by connecting pieces of information and making it easier to de-identify related data.

Let's look at an example:

```text Relation Extraction Sample Text wrap theme={"theme":"poimandres"}
Nessa Jonsson was born and raised in Sweden. Her father, Erik, emigrated to the United States when she was a baby. He died in 1980.
```

In the text above, there are five entities:

* Nessa Jonsson (`NAME`)
* Sweden (`LOCATION_COUNTRY`)
* Erik (`NAME_GIVEN`)
* the United States (`LOCATION_COUNTRY`)
* 1980 (`DATE_INTERVAL`)

Relation extraction can be used to identify the semantic connections between those entities, such as:

* Nessa Jonsson **was born in** Sweden
* Nessa Jonsson **is the daughter of** Erik
* Erik **is the father of** Nessa Jonsson
* Erik **lived in** the United States
* Erik **died in** 1980

Relation extraction plays a key role in document understanding by uncovering how entities are connected, which enables systems to move toward structured, contextualized information. In domains like healthcare and finance, this unlocks the potential of unstructured text by identifying relationships like family connections, places of origin, or dates of birth.

### Limina and Relation Extraction (Beta)

Limina's de-identification service supports relation extraction on its `analyze/text` endpoint. Relation extraction is currently built on top of both the named entity recognition (NER) and coreference resolution models, so it is limited to predicting relations between clusters of coreferenced entities.

Currently, the system supports a single generalized relation type: `RELATED_TO`, which is used to capture all of the supported semantic relations between a person and another entity:

* Kinship - a relation between two `NAME`s (or other variants, e.g. `NAME_GIVEN`) indicating family or close personal relationships between individuals. These may include parent-child, siblings, spouses, etc. A kinship relation is always bi-directional.
* Place of birth - a relation between `NAME` and `LOCATION` entities, indicating the location where the person was born. This can refer to a city, state, country, or region.
* Citizenship - a relation between `NAME` and `LOCATION` or `ORIGIN` entities, indicating nationality or legal citizenship of the person.
* Origin - a relation between `NAME` and `ORIGIN` entities, indicating the country a person originally comes from, reflecting ancestry or cultural background rather than legal status or birthplace.
* Date of birth - a relation between `NAME` and `DOB` entities, indicating birth date.
* Date of death - a relation between `NAME` and `DATE` or `DATE_INTERVAL` entities, indicating the date of death of a person.
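Because every supported relation is reported with the same `RELATED_TO` label, you may want to narrow down which of the categories above a given relation could represent. The following Python sketch is a hypothetical helper, not part of the Limina API: it encodes the label pairs listed above, and collapsing variants such as `NAME_GIVEN` or `LOCATION_COUNTRY` into `NAME` or `LOCATION` is an assumption made for illustration.

```python
# Hypothetical helper, not part of the Limina API: it encodes the label pairs
# listed above to suggest which categories a RELATED_TO relation could represent.
RELATION_CATEGORIES = {
    "kinship": {("NAME", "NAME")},
    "place of birth": {("NAME", "LOCATION")},
    "citizenship": {("NAME", "LOCATION"), ("NAME", "ORIGIN")},
    "origin": {("NAME", "ORIGIN")},
    "date of birth": {("NAME", "DOB")},
    "date of death": {("NAME", "DATE"), ("NAME", "DATE_INTERVAL")},
}


def base_label(label):
    """Collapse variants such as NAME_GIVEN or LOCATION_COUNTRY (an assumption)."""
    for family in ("NAME", "LOCATION"):
        if label == family or label.startswith(family + "_"):
            return family
    return label


def candidate_categories(label_a, label_b):
    """Return the relation categories whose label pair matches, in either order."""
    pair = (base_label(label_a), base_label(label_b))
    reverse = (pair[1], pair[0])
    return sorted(
        name
        for name, pairs in RELATION_CATEGORIES.items()
        if pair in pairs or reverse in pairs
    )


print(candidate_categories("NAME", "LOCATION_COUNTRY"))  # ['citizenship', 'place of birth']
print(candidate_categories("NAME_GIVEN", "NAME"))        # ['kinship']
print(candidate_categories("NAME", "CONDITION"))         # []
```

A name and location pair can match both place of birth and citizenship, so entity labels alone cannot always pin down a single category; the surrounding text is still needed.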

For the example above, the system will extract the following relations:

* Nessa Jonsson → `RELATED_TO` → Sweden
* Nessa Jonsson → `RELATED_TO` → Erik
* Erik → `RELATED_TO` → Nessa Jonsson

The relation extraction feature can be enabled as part of the [`analyze/text`](/latest/analyze-text) endpoint by setting the field `enable_relation_extraction` to `true`.

* `enable_relation_extraction`: Controls whether relation extraction is performed during analysis.
  * `true`: Enables relation extraction
  * `false` (default): Disables relation extraction

<Note>
  **Relation Extraction and Coreference Resolution**

  Relation extraction relies on coreference resolution to group mentions of people in text. Make sure a non-null value is set for `coreference_resolution` before setting `enable_relation_extraction` to `true`.
</Note>
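The constraint in the note above can be enforced when building the request. The following Python sketch uses a hypothetical helper (not a Limina SDK function) that builds the same request body as the example that follows and raises an error if relation extraction is requested without a `coreference_resolution` value. It only builds the JSON body; take the endpoint URL and authentication header for sending it from the API reference.

```python
import json


def build_analyze_request(texts, *, coreference_resolution=None,
                          enable_relation_extraction=False,
                          locale="en-US", accuracy="high"):
    """Build an analyze/text request body (hypothetical helper; sending it is not shown)."""
    if enable_relation_extraction and coreference_resolution is None:
        # Relation extraction groups mentions of people via coreference
        # resolution, so a non-null coreference_resolution value is required.
        raise ValueError("set coreference_resolution before enabling relation extraction")

    body = {
        "text": list(texts),
        "entity_detection": {"accuracy": accuracy},
        "locale": locale,
    }
    relation_detection = {}
    if coreference_resolution is not None:
        relation_detection["coreference_resolution"] = coreference_resolution
    if enable_relation_extraction:
        relation_detection["enable_relation_extraction"] = True
    if relation_detection:
        body["relation_detection"] = relation_detection
    return body


body = build_analyze_request(
    ["Nessa Jonsson was born on March 17, 1995 in Sweden and currently resides there. "
     "Her sister, Erika, has a history of hypertension."],
    coreference_resolution="model_prediction",
    enable_relation_extraction=True,
)
print(json.dumps(body, indent=2, ensure_ascii=False))
```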

Here is an example of how to enable relation extraction in your request using the `analyze/text` endpoint. Notice the `enable_relation_extraction` field within the `relation_detection` object.

<CodeGroup>
  ```json Request Body wrap lines theme={"theme":"poimandres"}
  {
    "text": [
      "Nessa Jonsson was born on March 17, 1995 in Sweden and currently resides there. Her sister, Erika, has a history of hypertension."
    ],
    "entity_detection": {
      "accuracy": "high"
    },
    "locale": "en-US",
    "relation_detection": {
      "coreference_resolution": "model_prediction",
      "enable_relation_extraction": true
    }
  }
  ```

  ```json Response Body wrap lines highlight={19-20, 23-24, 27-28} theme={"theme":"poimandres"}
  [
    {
      "entities": [
        {
          "text": "Nessa Jonsson",
          "location": {
            "stt_idx": 0,
            "end_idx": 13
          },
          "best_label": "NAME",
          "labels": {
            "NAME_GIVEN": 0.36283198595046995,
            "NAME": 0.9023965716361999,
            "NAME_FAMILY": 0.5529729723930359
          },
          "coreference_id": "7e688543-9aff-4b9a-a386-5277d2ee8954",
          "relations": [
            {
              "coreference_id": "60108305-97e7-4019-9d02-0dc0549b27ea",
              "label": "RELATED_TO"
            },
            {
              "coreference_id": "c83c10bb-609c-4f5f-b2a1-9ba6ac367614",
              "label": "RELATED_TO"
            },
            {
              "coreference_id": "add43460-23ff-4100-b529-b17f8b9a71f4",
              "label": "RELATED_TO"
            }
          ]
        },
        {
          "text": "March 17, 1995",
          "location": {
            "stt_idx": 26,
            "end_idx": 40
          },
          "best_label": "DOB",
          "labels": {
            "DOB": 0.9576807171106339
          },
          "analysis_result": {
            "formatted": "1995-03-17T00:00:00",
            "subtypes": [
              {
                "formatted": "17",
                "label": "DAY"
              },
              {
                "formatted": "3",
                "label": "MONTH"
              },
              {
                "formatted": "1995",
                "label": "YEAR"
              }
            ],
            "validation_assertions": []
          },
          "coreference_id": "add43460-23ff-4100-b529-b17f8b9a71f4",
          "relations": []
        },
        {
          "text": "Sweden",
          "location": {
            "stt_idx": 44,
            "end_idx": 50
          },
          "best_label": "LOCATION_COUNTRY",
          "labels": {
            "LOCATION_COUNTRY": 0.9481291770935059,
            "LOCATION": 0.9080604314804077
          },
          "coreference_id": "60108305-97e7-4019-9d02-0dc0549b27ea",
          "relations": []
        },
        {
          "text": "Erika",
          "location": {
            "stt_idx": 92,
            "end_idx": 97
          },
          "best_label": "NAME_GIVEN",
          "labels": {
            "NAME_GIVEN": 0.9050805866718292,
            "NAME": 0.8937010765075684
          },
          "coreference_id": "c83c10bb-609c-4f5f-b2a1-9ba6ac367614",
          "relations": [
            {
              "coreference_id": "7e688543-9aff-4b9a-a386-5277d2ee8954",
              "label": "RELATED_TO"
            }
          ]
        },
        {
          "text": "hypertension",
          "location": {
            "stt_idx": 116,
            "end_idx": 128
          },
          "best_label": "CONDITION",
          "labels": {
            "CONDITION": 0.9360405206680298
          },
          "coreference_id": "676cbf92-3eac-46dd-ac39-96d2368c09da",
          "relations": []
        }
      ],
      "entities_present": true,
      "characters_processed": 129,
      "languages_detected": {
        "en": 0.9928773641586304
      }
    }
  ]
  ```
</CodeGroup>

With relation extraction enabled, each entity in the response now contains an additional field capturing its relations:

* `relations`: A list of extracted relations involving the entity. Each relation object includes:
  * `coreference_id`: The coreference ID of the related entity. It matches the `coreference_id` field of one or more other entities in the response.
  * `label`: The type of relation detected. Currently, only one relation type is supported: the generic `RELATED_TO` relation.
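Since each relation points at a coreference cluster rather than a single mention, resolving a relation to readable text means looking up every mention that carries the target `coreference_id`. A minimal Python sketch, using an abridged copy of the response above; `resolve_relations` is a hypothetical helper, not part of any Limina SDK:

```python
def resolve_relations(entities):
    """Return (source_text, label, target_texts) triples for every extracted relation."""
    # Index mention texts by coreference cluster so relation targets can be looked up.
    texts_by_cluster = {}
    for entity in entities:
        coref_id = entity.get("coreference_id")
        if coref_id is not None:
            texts_by_cluster.setdefault(coref_id, []).append(entity["text"])

    triples = []
    for entity in entities:
        for relation in entity.get("relations", []):
            targets = texts_by_cluster.get(relation["coreference_id"], [])
            triples.append((entity["text"], relation["label"], targets))
    return triples


NESSA = "7e688543-9aff-4b9a-a386-5277d2ee8954"
DOB = "add43460-23ff-4100-b529-b17f8b9a71f4"
SWEDEN = "60108305-97e7-4019-9d02-0dc0549b27ea"
ERIKA = "c83c10bb-609c-4f5f-b2a1-9ba6ac367614"

# Abridged "entities" list from the response above (labels and locations omitted).
entities = [
    {"text": "Nessa Jonsson", "coreference_id": NESSA, "relations": [
        {"coreference_id": SWEDEN, "label": "RELATED_TO"},
        {"coreference_id": ERIKA, "label": "RELATED_TO"},
        {"coreference_id": DOB, "label": "RELATED_TO"},
    ]},
    {"text": "March 17, 1995", "coreference_id": DOB, "relations": []},
    {"text": "Sweden", "coreference_id": SWEDEN, "relations": []},
    {"text": "Erika", "coreference_id": ERIKA, "relations": [
        {"coreference_id": NESSA, "label": "RELATED_TO"},
    ]},
    {"text": "hypertension", "coreference_id": "676cbf92-3eac-46dd-ac39-96d2368c09da",
     "relations": []},
]

for source, label, targets in resolve_relations(entities):
    print(f"{source} → {label} → {', '.join(targets)}")
# Nessa Jonsson → RELATED_TO → Sweden
# Nessa Jonsson → RELATED_TO → Erika
# Nessa Jonsson → RELATED_TO → March 17, 1995
# Erika → RELATED_TO → Nessa Jonsson
```

Here Nessa Jonsson and Erika point at each other, which is consistent with the bi-directional kinship relation described earlier. The API itself only reports `RELATED_TO`, so any finer-grained interpretation is up to your application.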

<Note>
  **Limitations**
  The relation extraction model is provided as an experimental feature and is not intended for production use.
  It currently supports only English text and is limited to inputs of up to **1024 tokens**. Any text beyond this limit is ignored during processing.
  Relation predictions may be inaccurate or missed, particularly in complex contexts where related entities occur far apart within the text.
</Note>

## Synthetic replacements

The [`analyze/text`](/latest/analyze-text) endpoint supports generating synthetic replacements for detected entities.
This allows you to combine entity analysis with synthetic data generation in a single request. For a general overview of synthetic entity generation and its use cases, see the [Synthetic PII guide](/configuration-and-operations/entity-detection-and-redaction/customizing-redaction#synthetic-pii).

To enable synthetic replacements, add the optional `synthetic_replacements` object to your request. This object supports the following fields:

* **accuracy**: Selects the synthetic model accuracy. This accepts the same options as `synthetic_entity_accuracy` in the [`process/text`](/latest/process-text) route (`standard`, `standard_multilingual`, `standard_automatic`).
* **entity\_types**: Following the same pattern as `entity_detection.entity_types`, this field can be used to enable or disable synthetic generation for specific entity types.

When synthetic replacements are enabled, each entity in the response will include an additional `synthetic_text` field containing the generated synthetic value for that entity.

<Note>
  **Synthetic processing is optional**

  When the `synthetic_replacements` field is omitted, the response will not include synthetic text for detected entities.
</Note>

Here is an example of how you can enable synthetic replacements in your `analyze/text` request.

<CodeGroup>
  ```json Request Body wrap lines highlight={9-19} theme={"theme":"poimandres"}
  {
    "text": [
      "Nessa Jonsson was born on March 17, 1995."
    ],
    "entity_detection": {
      "accuracy": "high"
    },
    "locale": "en-US",
    "synthetic_replacements": {
      "accuracy": "standard_automatic",
      "entity_types": [
        {
          "type": "ENABLE",
          "value": [
            "NAME"
          ]
        }
      ]
    }
  }
  ```

  ```json Response Body wrap lines highlight={16-16} theme={"theme":"poimandres"}
  [
    {
      "entities": [
        {
          "text": "Nessa Jonsson",
          "location": {
            "stt_idx": 0,
            "end_idx": 13
          },
          "best_label": "NAME",
          "labels": {
            "NAME_GIVEN": 0.36283198595046995,
            "NAME": 0.9023965716361999,
            "NAME_FAMILY": 0.5529729723930359
          },
          "synthetic_text": "Hanna Lindahl"
        },
        {
          "text": "March 17, 1995",
          "location": {
            "stt_idx": 26,
            "end_idx": 40
          },
          "best_label": "DOB",
          "labels": {
            "DOB": 0.9576807171106339
          },
          "analysis_result": {
            "formatted": "1995-03-17T00:00:00",
            "subtypes": [
              {
                "formatted": "17",
                "label": "DAY"
              },
              {
                "formatted": "3",
                "label": "MONTH"
              },
              {
                "formatted": "1995",
                "label": "YEAR"
              }
            ],
            "validation_assertions": []
          }
        }
      ],
      "entities_present": true,
      "characters_processed": 41,
      "languages_detected": {
        "en": 0.9928773641586304
      }
    }
  ]
  ```
</CodeGroup>

Because the request enables synthetic replacements only for `NAME` entities, the `DOB` entity in the response has no `synthetic_text` field. Where it is present, `synthetic_text` contains a value that can be used to replace the original entity span.
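A minimal Python sketch of substituting `synthetic_text` back into the input text. It assumes that `stt_idx` and `end_idx` are character offsets with an exclusive end (consistent with the examples above, where "Nessa Jonsson" spans 0 to 13) and that entity spans do not overlap. `apply_synthetic_replacements` is a hypothetical helper, not part of any Limina SDK. Spans are replaced from right to left so that earlier offsets stay valid when a replacement changes the text length.

```python
def apply_synthetic_replacements(text, entities):
    """Replace every entity span that has synthetic_text; leave other spans unchanged."""
    replaceable = [e for e in entities if "synthetic_text" in e]
    # Work from right to left so earlier offsets remain valid after each replacement.
    for entity in sorted(replaceable, key=lambda e: e["location"]["stt_idx"], reverse=True):
        start = entity["location"]["stt_idx"]
        end = entity["location"]["end_idx"]
        text = text[:start] + entity["synthetic_text"] + text[end:]
    return text


text = "Nessa Jonsson was born on March 17, 1995."
# Abridged "entities" list from the response above.
entities = [
    {"text": "Nessa Jonsson", "location": {"stt_idx": 0, "end_idx": 13},
     "synthetic_text": "Hanna Lindahl"},
    {"text": "March 17, 1995", "location": {"stt_idx": 26, "end_idx": 40}},
]

print(apply_synthetic_replacements(text, entities))
# Hanna Lindahl was born on March 17, 1995.
```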

For an example of using synthetic replacements from the API to replace names, see [combining synthetic replacements with custom redaction](/product-guides/thin-client#combining-synthetic-replacements-with-custom-redaction).