# NER Text

> Detect entities such as PII, PHI or PCI in the provided text strings using Private AI's entity detection engine.

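The request and response shapes for this endpoint can be sketched in Python as follows. This is a minimal illustration, not an official client. It assumes a container is reachable at the `http://localhost:8080` local server listed in the spec, and it sends no credentials because the spec does not describe authentication. The helper names `build_ner_request`, `post_ner_text` and `entity_spans` are hypothetical, and `sample_item` is a hand-written stand-in for one response item, not real API output.

```python
import json
import urllib.request

# Assumption: the "Local Server" URL from the spec; adjust for your deployment.
BASE_URL = "http://localhost:8080"


def build_ner_request(texts, accuracy="high", link_batch=False, return_entity=True):
    """Build a NerTextRequest body. Only `text` is required by the schema."""
    return {
        "text": list(texts),
        "link_batch": link_batch,
        "entity_detection": {"accuracy": accuracy, "return_entity": return_entity},
    }


def post_ner_text(payload, base_url=BASE_URL, timeout=30):
    """POST the payload to /ner/text and return the parsed JSON list."""
    request = urllib.request.Request(
        f"{base_url}/ner/text",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def entity_spans(text, response_item):
    """Slice each entity out of `text` using its stt_idx/end_idx location."""
    spans = []
    for entity in response_item["entities"]:
        location = entity.get("location")  # not listed as required in the schema
        if location is not None:
            spans.append((entity["label"], text[location["stt_idx"]:location["end_idx"]]))
    return spans


payload = build_ner_request(["Hello John and Jane"])
print(json.dumps(payload, indent=2))

# Hand-written illustration of one NerTextResponseItem (values are made up).
sample_item = {
    "entities": [
        {"text": "John", "location": {"stt_idx": 6, "end_idx": 10},
         "label": "NAME_GIVEN", "likelihood": 0.9},
        {"text": "Jane", "location": {"stt_idx": 15, "end_idx": 19},
         "label": "NAME_GIVEN", "likelihood": 0.9},
    ],
    "entities_present": True,
    "characters_processed": 19,
    "languages_detected": {"en": 0.9},
}
print(entity_spans("Hello John and Jane", sample_item))
```

With a server running, `post_ner_text(payload)` should return one response item per input string, and `entity_spans` can be applied to each item alongside its matching input text.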

## OpenAPI

````yaml /openapi/privateai_4.4.0.json post /ner/text
openapi: 3.1.0
info:
  title: API Reference
  description: Private AI API Reference
  termsOfService: https://www.getlimina.ai/en/terms-of-use
  contact:
    url: https://www.getlimina.ai/en/contact-us
    email: info@getlimina.ai
  version: 4.4.0
servers:
  - url: https://api.getlimina.ai/community/v4
    description: Private AI Community API
  - url: https://api.getlimina.ai/professional/v4
    description: Private AI Professional API
  - url: http://localhost:8080
    description: Local Server
security: []
paths:
  /ner/text:
    post:
      summary: NER Text
      description: >-
        Detect entities such as PII, PHI or PCI in the provided text strings
        using Private AI's entity detection engine.
      operationId: ner_text_ner_text_post
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/NerTextRequest'
            examples:
              simple_ner:
                summary: Simple Named Entity Recognition
                value:
                  text:
                    - Hello John and Jane
                  link_batch: false
                  entity_detection:
                    accuracy: high
                    return_entity: true
              enabled_entity_types:
                summary: Enabled Entity Types
                value:
                  text:
                    - Hello, My name is Mike and I am 24 years old
                  link_batch: false
                  entity_detection:
                    accuracy: high
                    entity_types:
                      - type: ENABLE
                        value:
                          - NAME
                    return_entity: true
              allow_list:
                summary: Allow List
                value:
                  text:
                    - >-
                      Hello Xavier, I broke my right leg on the 31st. Id of PAI
                      is PAI_ID-44444 yeah, my birthday, My id is 78549 and I'm
                      waiting for my x-ray results. dr. zhang, mercer health
                      centre.
                  link_batch: false
                  entity_detection:
                    accuracy: high
                    filter:
                      - type: ALLOW
                        pattern: PAI_ID-\d{5}
                    return_entity: true
              block_list:
                summary: Block List
                value:
                  text:
                    - >-
                      Hello Xavier, I broke my right leg on the 31st. Id of PAI
                      is PAI_ID-44444 yeah, my birthday, My id is 78549 and I'm
                      waiting for my x-ray results. dr. zhang, mercer health
                      centre.
                  link_batch: false
                  entity_detection:
                    accuracy: high
                    filter:
                      - type: BLOCK
                        entity_type: MY_5_DIGIT_PAI_ID
                        pattern: PAI_ID-\d{5}
                    return_entity: true
              link_batch:
                summary: Link Batch
                value:
                  text:
                    - >-
                      Hi, my name is Penelope, could you tell me your phone
                      number please?
                    - Sure, x234
                    - and your DOB please?
                    - fourth of Feb nineteen 86
                  link_batch: true
                  entity_detection:
                    accuracy: high
                    return_entity: true
              multiple_input:
                summary: Multiple Input
                value:
                  text:
                    - Hello John and Jane
                    - Mark is 42 years old
                  link_batch: false
                  entity_detection:
                    accuracy: high
                    return_entity: true
        required: true
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/NerTextResponse'
        '400':
          description: Bad Request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/UserErrorResponseModel'
        '500':
          description: Internal Server Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/InternalErrorResponseModel'
        4XX:
          description: Client Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/UserErrorResponseModel'
components:
  schemas:
    NerTextRequest:
      properties:
        text:
          items:
            type: string
          type: array
          title: Text
          description: >-
            UTF-8 encoded message(s) to process. E.g. `["My name is Adam"]` or
            `["I live at", "263 Spadina Av"]`. Request processing time increases
            linearly with input text length, therefore maximum length is
            dependent on provisioned hardware and any timeouts set by the user.
            Private AI has tested up to 500K characters on the CPU and GPU
            containers.
        link_batch:
          type: boolean
          title: Link Batch
          description: >-
            When set to `True`, the list of inputs provided in `text` will be
            processed together as a single input by the Private AI PII detection
            model. This shares context between the different inputs and is
            useful when processing a sequence of short inputs, such as an SMS
            chat log. This option should only be enabled when the inputs are
            related, otherwise PII detection performance could be degraded.
          default: false
        entity_detection:
          $ref: '#/components/schemas/PIIDetectionParams'
          description: >-
            This section contains a set of parameters to control the PII
            detection process. All fields have sensible defaults that can be
            changed for specific needs.
        project_id:
          type: string
          maxLength: 60
          pattern: ^[a-zA-Z0-9\-_\:]*$
          title: Project Id
          description: >-
            Used to categorize requests for reporting purposes. Limited to
            alphanumeric characters or the following special characters :_-
          default: main
        enable_gibberish_detection:
          type: boolean
          title: Enable Gibberish Detection
          description: >-
            When set to `True`, the gibberish detector is enabled, returning a
            `gibberish_score` for each entity in the response.
          default: false
      additionalProperties: false
      type: object
      required:
        - text
      title: NerTextRequest
      description: API 3.9 Spec Definition
    NerTextResponse:
      items:
        $ref: '#/components/schemas/NerTextResponseItem'
      type: array
      title: NerTextResponse
    UserErrorResponseModel:
      properties:
        detail:
          anyOf:
            - $ref: '#/components/schemas/ErrorMessage'
            - $ref: '#/components/schemas/ValidationErrorModel'
          title: Detail
          description: >-
            The details of the error, usually relating to an input validation
            error
      type: object
      required:
        - detail
      title: UserErrorResponseModel
    InternalErrorResponseModel:
      properties:
        detail:
          type: string
          title: Detail
          description: >-
            The details of the error, usually relating to an unhandled
            processing error
      type: object
      required:
        - detail
      title: InternalErrorResponseModel
    PIIDetectionParams:
      properties:
        accuracy:
          $ref: '#/components/schemas/AccuracyMode'
          description: >-
            Selects the model used to identify PII in the input text. By
            default, the `high_automatic` accuracy model is used. This default
            automatically chooses either the high or high_multilingual model.
            Whilst the models used by the Private AI solution are highly
            optimized (~25X faster than a reference transformer implementation),
            in high-throughput cases it is possible to trade accuracy for speed
            by selecting either the `standard` or `standard_high` accuracy
            modes. Multilingual support can be enabled by using one of the
            multilingual models, namely `standard_high_multilingual` (GPU
            container only) and `high_multilingual`. The multilingual models
            process all supported languages including English, without the need
            to specify language. It is advisable to use the English-only models
            where possible, as they perform slightly better on English.
            Automatic Models can determine which model to use (English or
            Multilingual) depending on the languages detected, provided
            Multilingual models are available. More information on different
            accuracy modes can be found here:
            https://docs.getlimina.ai/configuration-and-operations/entity-detection-and-redaction/accuracy-modes
        entity_types:
          items:
            oneOf:
              - $ref: '#/components/schemas/EnableEntityTypeSelector'
              - $ref: '#/components/schemas/DisableEntityTypeSelector'
            discriminator:
              propertyName: type
              mapping:
                DISABLE:
                  $ref: '#/components/schemas/DisableEntityTypeSelector'
                ENABLE:
                  $ref: '#/components/schemas/EnableEntityTypeSelector'
          type: array
          title: Entity Types
          description: >-
            Controls which entity types and legislation sets are detected. See
            [Supported Entity
            Types](https://docs.getlimina.ai/entities/supported-entity-types)
            for the list of possible entities and legislation sets. By default,
            all entities are detected and removed. You can specify one or more
            selectors, which can be either an individual entity type such as
            `LOCATION_CITY` or a legislation like `GDPR`.
            `EnableEntityTypeSelector` selectors will add entity types to
            detect. On the contrary, `DisableEntityTypeSelector` selectors will
            ignore entities of the specified types. If only
            `DisableEntityTypeSelector` selectors are specified, they are
            assumed to be ignoring entity types from the entire supported list
            of entity types.
        filter:
          items:
            oneOf:
              - $ref: '#/components/schemas/AllowFilter'
              - $ref: '#/components/schemas/BlockFilter'
              - $ref: '#/components/schemas/AllowTextFilter'
            discriminator:
              propertyName: type
              mapping:
                ALLOW:
                  $ref: '#/components/schemas/AllowFilter'
                ALLOW_TEXT:
                  $ref: '#/components/schemas/AllowTextFilter'
                BLOCK:
                  $ref: '#/components/schemas/BlockFilter'
          type: array
          title: Filter
          description: >-
            This field contains a list of filters expressed as regular
            expressions. These regular expressions can help users customize PII
            detection in a few ways. `ALLOW` filters can disable redaction of
            entities containing specific text (e.g. a document id with format
            ID-1212 that is not sensitive). On the other hand, the `BLOCK`
            filters can augment the existing set of entities with custom
            entities (e.g. detecting sensitive medical code with format A18.32
            as IDC_NUMBER). Finally, the `ALLOW_TEXT` filters can select a
            section of text in a document that should not be redacted (e.g.
            author names in scientific references or dates in an audit trail
            log).
        return_entity:
          type: boolean
          title: Return Entity
          description: >-
            Controls whether the PII list in the response contains the `text`
            field. Turning this off means that no sensitive PII is returned in
            the response.
          default: true
        enable_non_max_suppression:
          type: boolean
          title: Enable Non Max Suppression
          description: >-
            When set to `True`, if the best label (i.e., the label with the
            highest likelihood) of an entity is disabled then the entity will
            not be redacted. This could be useful to minimize false-positives
            when disabling an entity label like `ACCOUNT_NUMBER` that is related
            but distinct from other entities like `BANK_ACCOUNT`. Note that on
            class hierarchies like `NAME` and `LOCATION`, the behaviour of this
            field is slightly different since it is often impossible to identify
            a best label. If the label of a sub-entity is disabled (e.g.,
            `NAME_GIVEN`) then the entity will not be redacted independently of
            the value of the likelihood for this label.
          default: false
      additionalProperties: false
      type: object
      title: PIIDetectionParams
    NerTextResponseItem:
      properties:
        entities:
          items:
            $ref: '#/components/schemas/NerEntityItem'
          type: array
          title: Entities
          description: A list of all entities found in the text.
        entities_present:
          type: boolean
          title: Entities Present
          description: Returns `True` if the list of detected entities is not empty.
        characters_processed:
          type: integer
          title: Characters Processed
          description: The number of characters in all the text inputs.
        languages_detected:
          additionalProperties:
            type: number
          type: object
          title: Languages Detected
          description: >-
            A dictionary containing ISO 639-1 language labels and the likelihood
            of their detection in the request payload.
      type: object
      required:
        - entities
        - entities_present
        - characters_processed
        - languages_detected
      title: NerTextResponseItem
    ErrorMessage:
      type: string
      title: ErrorMessage
      description: An error message
    ValidationErrorModel:
      properties:
        description:
          type: string
          title: Description
          description: User-friendly error message description.
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Loc
          description: >-
            The location where the error has occurred. This could be location of
            a character if the error is a parsing error, or it could be a value
            in the request if there is a validation error.
        msg:
          type: string
          title: Msg
          description: The error message.
        type:
          type: string
          title: Type
          description: The type of error that has been encountered.
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationErrorModel
    AccuracyMode:
      type: string
      enum:
        - standard
        - standard_high
        - standard_high_multilingual
        - standard_high_automatic
        - high
        - high_multilingual
        - high_automatic
      title: AccuracyMode
      description: >-
        Selects the model used to identify PII in the input text. By default,
        the "high_automatic" accuracy model is used. This default automatically
        chooses either the high or high_multilingual model. Whilst the models
        used by the Private AI solution are highly optimized (~25X faster than a
        reference transformer implementation), in high-throughput cases it is
        possible to trade accuracy for speed by selecting either the "standard"
        or "standard_high" accuracy modes. Multilingual support can be enabled
        by using one of the multilingual models, namely
        "standard_high_multilingual" (GPU container only) and
        "high_multilingual". The multilingual models process all supported
        languages including English, without the need to specify language. It is
        advisable to use the English-only models where possible, as they perform
        slightly better on English. Automatic Models can determine which model
        to use (English or Multilingual) depending on the languages detected,
        provided Multilingual models are available. More information on
        different accuracy modes can be found here:
        https://docs.getlimina.ai/configuration-and-operations/entity-detection-and-redaction/accuracy-modes
    EnableEntityTypeSelector:
      properties:
        type:
          type: string
          enum:
            - ENABLE
          const: ENABLE
          title: Type
          default: ENABLE
        value:
          items:
            type: string
          type: array
          title: Value
          description: >-
            A list of entity types to detect and remove. See [Supported Entity
            Types](https://docs.getlimina.ai/entities/supported-entity-types)
            for a complete list of entity types. This can also be one or many
            legislations. We currently support these legislations: ['APPI',
            'APPI_SENSITIVE', 'CORE_ENTITIES', 'CPRA', 'GDPR', 'GDPR_SENSITIVE',
            'HEALTH_INFORMATION', 'HIPAA_SAFE_HARBOR', 'LIDI', 'PCI',
            'QUEBEC_PRIVACY_ACT', 'CCI', 'NUMERICAL_EXCL_PCI'].
      additionalProperties: false
      type: object
      title: EnableEntityTypeSelector
    DisableEntityTypeSelector:
      properties:
        type:
          type: string
          enum:
            - DISABLE
          const: DISABLE
          title: Type
          default: DISABLE
        value:
          items:
            type: string
          type: array
          title: Value
          description: >-
            A list of entity types to ignore. See [Supported Entity
            Types](https://docs.getlimina.ai/entities/supported-entity-types)
            for a complete list of entity types.
      additionalProperties: false
      type: object
      title: DisableEntityTypeSelector
    AllowFilter:
      properties:
        type:
          type: string
          enum:
            - ALLOW
          const: ALLOW
          title: Type
          description: >-
            Entities with text matching the provided regex pattern will be
            discarded. It is also possible to set this option via environment
            variable. See [Environment
            Variables](https://docs.getlimina.ai/configuration-and-operations/container-management/environment-variables).
          default: ALLOW
        pattern:
          type: string
          title: Pattern
          description: >-
            A python regex pattern (e.g. `r"^ID-[\d]{4}$"`). This pattern will
            be used to match the entity text. If it matches zero or more
            characters at the **beginning of the entity text**, the entity will
            be ignored. Be sure to use the end of string character `$` if you
            want to only allow entities when the entirety of the text matches.
            It is also important to note that regex patterns may require
            escaping when used in JSON objects.
      additionalProperties: false
      type: object
      required:
        - pattern
      title: AllowFilter
    BlockFilter:
      properties:
        type:
          type: string
          enum:
            - BLOCK
          const: BLOCK
          title: Type
          description: >-
            The block feature allows you to extend the functionality of the
            Private AI models by using regular expressions. This way, you can
            define a Python regex pattern that will be used to identify
            additional tokens with the given PII label.


            Several block list filters can be specified with their own regex
            pattern.


            Lastly, for supported labels, if you would like the model to pick up
            only the tokens from the block list, you can use the enabled entity
            type feature together with the block list feature. This can be done
            by defining a list of enabled entity types and not including the
            supported label you are adding to the block list. For example, if
            you would like the label `ORGANIZATION` to only pick up Microsoft,
            you can define the enabled entity types as `[{"type": "ENABLE",
            "value": ["NAME"]}, {"type": "ENABLE", "value": ["LOCATION"]},
            {"type": "ENABLE", "value": ["AGE"]}, ...]` (omitting
            `ORGANIZATION`) and the block list as `[{"type": "BLOCK",
            "entity_type": "ORGANIZATION", "pattern": "Microsoft"}]`.
          default: BLOCK
        entity_type:
          type: string
          title: Entity Type
          description: >-
            Name of the custom entity type. It can either be a completely new
            entity type such as `CUSTOM_ID` or an existing entity, such as
            `NAME`.
        pattern:
          type: string
          title: Pattern
          description: >-
            This is a pattern to match in the text. This feature uses regex
            patterns: you can either pass a word (e.g. the, word, custom, etc.)
            or you can pass a valid Python regex pattern. It is important to
            note that regex patterns may require escaping when used in JSON
            objects. To give an example, if you would like to send the regex
            pattern `r"\b\w{4}\b"` which will catch every 4-character word, you
            need to send it as `"\\b\\w{4}\\b"`. A complete JSON grammar is
            found here: https://www.json.org/json-en.html. More information on
            how to write a python regex is found here:
            https://docs.python.org/3/library/re.html


            It is important to note also that only non-overlapping matches are
            returned.
        threshold:
          type: number
          title: Threshold
          description: >-
            Defines a likelihood threshold for the custom entity. This
            threshold is compared against the likelihood predicted by the model;
            if the threshold is greater, the custom entity is output instead of
            the model-predicted entity. By default the threshold is set to 1.0,
            which ensures that blocked entities are always preferred over a
            matching model-predicted entity. This can be any value between 0
            and 1.
          default: 1
      additionalProperties: false
      type: object
      required:
        - entity_type
        - pattern
      title: BlockFilter
    AllowTextFilter:
      properties:
        type:
          type: string
          enum:
            - ALLOW_TEXT
          const: ALLOW_TEXT
          title: Type
          description: >-
            Input text matching the provided regex pattern will not be redacted.
            It is currently not possible to set this option via environment
            variable.
          default: ALLOW_TEXT
        pattern:
          type: string
          title: Pattern
          description: >-
            A python regex pattern (e.g. `r"^ID-[\d]{4}$"`). This pattern will
            be used to match the input text. Entities detected **inside the
            matched text** will be ignored. Note that capturing groups can be
            used in the regex pattern. If present, only the text matching a
            capturing group will be left unredacted. It is also important to
            note that regex patterns may require escaping when used in JSON
            objects.
      additionalProperties: false
      type: object
      required:
        - pattern
      title: AllowTextFilter
    NerEntityItem:
      properties:
        text:
          type: string
          title: Text
          description: >-
            The entity text. When the `return_entity` option is set to `False`,
            this returns a null string
        location:
          $ref: '#/components/schemas/NerTextLocation'
          description: >-
            Location object containing the start-end character indexes of a
            given entity in the input text.
        label:
          type: string
          title: Label
          description: The entity label.
        likelihood:
          type: number
          title: Likelihood
          description: >-
            The likelihood of the entity being of this specific label. Note that
            these are not strictly probabilities and do not sum to 1, as an
            entity can have multiple types.


            Note that the likelihoods have also been thresholded, so no
            additional thresholding is necessary.
        gibberish_score:
          anyOf:
            - type: number
              maximum: 1
              minimum: 0
            - type: 'null'
          title: Gibberish Score
          description: >-
            The score produced by the gibberish detector. This field is only
            present if the `enable_gibberish_detection` option is set to `True`
            in the request. The score ranges from 0 to 1, where higher values
            indicate text that is likely nonsensical (e.g., random character
            sequences or OCR output from low-quality images), and lower values
            indicate coherent, grammatically structured text. A high gibberish
            score may suggest that detected entities are false positives.
      additionalProperties: false
      type: object
      required:
        - text
        - label
        - likelihood
      title: NerEntityItem
    NerTextLocation:
      properties:
        stt_idx:
          type: integer
          title: Stt Idx
          description: Start character index of the entity in the original text.
        end_idx:
          type: integer
          title: End Idx
          description: >-
            Index of the character immediately following the entity, such that
            end_idx - stt_idx = number of characters in the entity.
      type: object
      required:
        - stt_idx
        - end_idx
      title: NerTextLocation
      description: Start and end indices of the entity in the original text.

````
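
The filter descriptions in the spec raise two practical points: regex patterns need their backslashes escaped in JSON, and an `ALLOW` pattern is matched at the beginning of the entity text. The sketch below illustrates both locally with Python's `json` and `re` modules. It assumes the server's `ALLOW` matching behaves like `re.match`, which is how the description reads but is not confirmed here. The `would_allow` helper and the `FOUR_LETTER_WORD` label are hypothetical.

```python
import json
import re

# Raw Python pattern from the BlockFilter description: every 4-character word.
block_pattern = r"\b\w{4}\b"

# Hypothetical custom label; any new or existing entity type name is accepted.
block_filter = {
    "type": "BLOCK",
    "entity_type": "FOUR_LETTER_WORD",
    "pattern": block_pattern,
}

# json.dumps doubles each backslash, producing the escaped form the spec shows.
encoded = json.dumps(block_filter["pattern"])
print(encoded)  # "\\b\\w{4}\\b"

# ALLOW patterns taken from the allow_list example, without and with `$`.
allow_loose = r"PAI_ID-\d{5}"
allow_strict = r"PAI_ID-\d{5}$"


def would_allow(pattern, entity_text):
    """Local emulation (assumption): match at the start of the entity text."""
    return re.match(pattern, entity_text) is not None


for text in ["PAI_ID-44444", "PAI_ID-444445", "my PAI_ID-44444"]:
    print(text, would_allow(allow_loose, text), would_allow(allow_strict, text))
```

Building the request body as a Python dict and serialising it with `json.dumps` avoids hand-escaping patterns. The loop shows why the trailing `$` matters: without it, an entity that merely starts with the allowed format would also be ignored.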