# Analyze Text

> Detect entities in the provided text strings using Private AI's entity detection engine and return the results of the analysis and validation of each entity.



## OpenAPI

````yaml /openapi/privateai_4.3.0.json post /analyze/text
openapi: 3.1.0
info:
  title: API Reference
  description: Private AI API Reference
  termsOfService: https://www.getlimina.ai/en/terms-of-use
  contact:
    url: https://www.getlimina.ai/en/contact-us
    email: info@getlimina.ai
  version: 4.3.0
servers:
  - url: https://api.private-ai.com/community
    description: Private AI Community API
  - url: https://api.private-ai.com/cloud
    description: Private AI Cloud API
  - url: http://localhost:8080
    description: Local Server
security: []
paths:
  /analyze/text:
    post:
      summary: Analyze Text
      description: >-
        Detect entities in the provided text strings using Private AI's entity
        detection engine and return the results of the analysis and validation
        of each entity.
      operationId: analyze_text_analyze_text_post
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/AnalyzeTextRequest'
            examples:
              simple_analyze:
                summary: Simple Text Analysis
                value:
                  text:
                    - Hello John and Jane
                  link_batch: false
                  entity_detection:
                    accuracy: high
                    return_entity: true
        required: true
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/AnalyzeTextResponse'
        '400':
          description: Bad Request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/UserErrorResponseModel'
        '500':
          description: Internal Server Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/InternalErrorResponseModel'
        4XX:
          description: Client Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/UserErrorResponseModel'
components:
  schemas:
    AnalyzeTextRequest:
      properties:
        text:
          items:
            type: string
          type: array
          title: Text
          description: >-
            UTF-8 encoded message(s) to process. E.g. `["My name is Adam"]` or
            `["I live at", "263 Spadina Av"]`. Request processing time increases
            linearly with input text length, therefore maximum length is
            dependent on provisioned hardware and any timeouts set by the user.
            Private AI has tested up to 500K characters on the CPU and GPU
            containers.
        link_batch:
          type: boolean
          title: Link Batch
          description: >-
            When set to `True`, the list of inputs provided in `text` will be
            processed together as a single input by the Private AI PII detection
            model. This shares context between the different inputs and is
            useful when processing a sequence of short inputs, such as an SMS
            chat log. This option should only be enabled when the inputs are
            related, otherwise PII detection performance could be degraded.
          default: false
        entity_detection:
          $ref: '#/components/schemas/PIIDetectionParams'
          description: >-
            This section contains a set of parameters to control the PII
            detection process. All fields have sensible defaults that can be
            changed for specific needs.
        project_id:
          type: string
          maxLength: 60
          pattern: ^[a-zA-Z0-9\-_\:]*$
          title: Project Id
          description: >-
            Used to categorize requests for reporting purposes. Limited to
            alphanumeric characters or the following special characters :_-
          default: main
        locale:
          anyOf:
            - type: string
            - type: 'null'
          title: Locale
          description: >-
            This optional field serves as a hint to the analyzer to interpret
            locale-dependent entity types like dates. When set to `en-CA`, the
            date 12-10-2024 will be interpreted as October 12, 2024. However, if
            the hint is set to `en-US`, the date will be interpreted as December
            10, 2024. If no locale is provided the language of the input text
            will be used.
        relation_detection:
          anyOf:
            - $ref: '#/components/schemas/RelationDetectionParams'
            - type: 'null'
          description: >-
            This section contains a set of parameters to control the relation
            detection process. It allows the user to select the coreference
            resolution mode and to enable relation extraction among entities.
      additionalProperties: false
      type: object
      required:
        - text
      title: AnalyzeTextRequest
      description: API 4.1 Spec Definition
    AnalyzeTextResponse:
      items:
        $ref: '#/components/schemas/AnalyzeTextResponseItem'
      type: array
      title: AnalyzeTextResponse
    UserErrorResponseModel:
      properties:
        detail:
          anyOf:
            - $ref: '#/components/schemas/ErrorMessage'
            - $ref: '#/components/schemas/ValidationErrorModel'
          title: Detail
          description: >-
            The details of the error, usually relating to an input validation
            error
      type: object
      required:
        - detail
      title: UserErrorResponseModel
    InternalErrorResponseModel:
      properties:
        detail:
          type: string
          title: Detail
          description: >-
            The details of the error, usually relating to an unhandled
            processing error
      type: object
      required:
        - detail
      title: InternalErrorResponseModel
    PIIDetectionParams:
      properties:
        accuracy:
          $ref: '#/components/schemas/AccuracyMode'
          description: >-
            Selects the model used to identify PII in the input text. By
            default, the `high_automatic` accuracy model is used. This default
            automatically chooses either the high or high_multilingual model.
            Whilst the models used by the Private AI solution are highly
            optimized (~25X faster than a reference transformer implementation),
            in high-throughput cases it is possible to trade accuracy for speed
            by selecting either the `standard` or `standard_high` accuracy
            modes. Multilingual support can be enabled by using one of the
            multilingual models, namely `standard_high_multilingual` (GPU
            container only) and `high_multilingual`. The multilingual models
            process all supported languages including English, without the need
            to specify language. It is advisable to use the English-only models
            where possible, as they perform slightly better on English.
            Automatic Models can determine which model to use (English or
            Multilingual) depending on the languages detected, provided
            Multilingual models are available. More information on different
            accuracy modes can be found here:
            https://docs.getlimina.ai/configuration-and-operations/entity-detection-and-redaction/accuracy-modes
        entity_types:
          items:
            oneOf:
              - $ref: '#/components/schemas/EnableEntityTypeSelector'
              - $ref: '#/components/schemas/DisableEntityTypeSelector'
            discriminator:
              propertyName: type
              mapping:
                DISABLE:
                  $ref: '#/components/schemas/DisableEntityTypeSelector'
                ENABLE:
                  $ref: '#/components/schemas/EnableEntityTypeSelector'
          type: array
          title: Entity Types
          description: >-
            Controls which entity types and legislation sets are detected. See
            [Supported Entity
            Types](https://docs.getlimina.ai/entities/supported-entity-types)
            for the list of possible entities and legislation sets. By default,
            all entities are detected and removed. You can specify one of many
            selectors, which can be either an individual entity type such as
            `LOCATION_CITY` or a legislation like `GDPR`.
            `EnableEntityTypeSelector` selectors will add entity types to
            detect. On the contrary, `DisableEntityTypeSelector` selectors will
            ignore entities of the specified types. If only
            `DisableEntityTypeSelector` selectors are specified, they are
            assumed to be ignoring entity types from the entire supported list
            of entity types.
        filter:
          items:
            oneOf:
              - $ref: '#/components/schemas/AllowFilter'
              - $ref: '#/components/schemas/BlockFilter'
              - $ref: '#/components/schemas/AllowTextFilter'
            discriminator:
              propertyName: type
              mapping:
                ALLOW:
                  $ref: '#/components/schemas/AllowFilter'
                ALLOW_TEXT:
                  $ref: '#/components/schemas/AllowTextFilter'
                BLOCK:
                  $ref: '#/components/schemas/BlockFilter'
          type: array
          title: Filter
          description: >-
            This field contains a list of filters expressed as regular
            expressions. These regular expressions can help users customize PII
            detection in a few ways. `ALLOW` filters can disable redaction of
            entities containing specific text (e.g. a document id with format
            ID-1212 that is not sensitive). On the other hand, the `BLOCK`
            filters can augment the existing set of entities with custom
            entities (e.g. detecting sensitive medical code with format A18.32
            as IDC_NUMBER). Finally, the `ALLOW_TEXT` filters can select a
            section of text in a document that should not be redacted (e.g.
            author names in scientific references or dates in an audit trail
            log).
        return_entity:
          type: boolean
          title: Return Entity
          description: >-
            Controls whether the PII list in the response contains the `text`
            field. Turning this off means that no sensitive PII is returned in
            the response.
          default: true
        enable_non_max_suppression:
          type: boolean
          title: Enable Non Max Suppression
          description: >-
            When set, if the best label of an entity is disabled then the entity
            will not be redacted. This could be useful to minimize
            false-positives when disabling an entity label like `ACCOUNT_NUMBER`
            that is closely related to another entity like `BANK_ACCOUNT`. Note
            that the effect of this flag is difficult to predict when disabling
            labels like `LOCATION_COUNTRY` or `NAME_GIVEN` that are hierarchical
            by nature (i.e. a `NAME_GIVEN` entity is also a `NAME` entity). We
            don't recommend setting that flag to true when some of these
            hierarchical entities are disabled.
          default: false
      additionalProperties: false
      type: object
      title: PIIDetectionParams
    RelationDetectionParams:
      properties:
        coreference_resolution:
          anyOf:
            - $ref: '#/components/schemas/CoreferenceResolutionMode'
            - type: 'null'
          description: >-
            (Experimental) Turns on the experimental coreference resolution.
            Specifies whether multiple instances of the same entity should share
            the same identifier or not. For example, with
            `coreference_resolution` set: "Hi John and Rosha, John nice to meet
            you" both instances of John will have the same identifier.
        enable_relation_extraction:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Enable Relation Extraction
          description: >-
            (Experimental) Turns on the experimental relation extraction. Note
            that relation extraction relies on coreference resolution. When this
            field is set to True, one must make sure that the
            `coreference_resolution` field is also set.
          default: false
      additionalProperties: false
      type: object
      title: RelationDetectionParams
    AnalyzeTextResponseItem:
      properties:
        entities:
          items:
            $ref: '#/components/schemas/AnalyzeEntityItem'
          type: array
          title: Entities
          description: >-
            A list of entities found in the text along with their analysis
            results.
        entities_present:
          type: boolean
          title: Entities Present
          description: Returns `True` if the list of detected entities is not empty.
        characters_processed:
          type: integer
          title: Characters Processed
          description: The number of characters in all the text inputs.
        languages_detected:
          additionalProperties:
            type: number
          type: object
          title: Languages Detected
          description: >-
            A dictionary containing ISO 639-1 language labels and the likelihood
            of their detection in the request payload.
      type: object
      required:
        - entities
        - entities_present
        - characters_processed
        - languages_detected
      title: AnalyzeTextResponseItem
    ErrorMessage:
      type: string
      title: ErrorMessage
      description: An error message
    ValidationErrorModel:
      properties:
        description:
          type: string
          title: Description
          description: User-friendly error message description.
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Loc
          description: >-
            The location where the error has occurred. This could be location of
            a character if the error is a parsing error, or it could be a value
            in the request if there is a validation error.
        msg:
          type: string
          title: Msg
          description: The error message.
        type:
          type: string
          title: Type
          description: The type of error that has been encountered.
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationErrorModel
    AccuracyMode:
      type: string
      enum:
        - standard
        - standard_high
        - standard_high_multilingual
        - standard_high_automatic
        - high
        - high_multilingual
        - high_automatic
      title: AccuracyMode
      description: >-
        Selects the model used to identify PII in the input text. By default,
        the "high_automatic" accuracy model is used. This default automatically
        chooses either the high or high_multilingual model. Whilst the models
        used by the Private AI solution are highly optimized (~25X faster than a
        reference transformer implementation), in high-throughput cases it is
        possible to trade accuracy for speed by selecting either the "standard"
        or "standard_high" accuracy modes. Multilingual support can be enabled
        by using one of the multilingual models, namely
        "standard_high_multilingual" (GPU container only) and
        "high_multilingual". The multilingual models process all supported
        languages including English, without the need to specify language. It is
        advisable to use the English-only models where possible, as they perform
        slightly better on English. Automatic Models can determine which model
        to use (English or Multilingual) depending on the languages detected,
        provided Multilingual models are available. More information on
        different accuracy modes can be found here:
        https://docs.getlimina.ai/configuration-and-operations/entity-detection-and-redaction/accuracy-modes
    EnableEntityTypeSelector:
      properties:
        type:
          type: string
          enum:
            - ENABLE
          const: ENABLE
          title: Type
          default: ENABLE
        value:
          items:
            type: string
          type: array
          title: Value
          description: >-
            A list of entity types to detect and remove. See [Supported Entity
            Types](https://docs.getlimina.ai/entities/supported-entity-types)
            for a complete list of entity types. This can also be one or many
            legislations. We currently support these legislations: ['APPI',
            'APPI_SENSITIVE', 'CORE_ENTITIES', 'CPRA', 'GDPR', 'GDPR_SENSITIVE',
            'HEALTH_INFORMATION', 'HIPAA_SAFE_HARBOR', 'LIDI', 'PCI',
            'QUEBEC_PRIVACY_ACT', 'CCI', 'NUMERICAL_EXCL_PCI'].
      additionalProperties: false
      type: object
      title: EnableEntityTypeSelector
    DisableEntityTypeSelector:
      properties:
        type:
          type: string
          enum:
            - DISABLE
          const: DISABLE
          title: Type
          default: DISABLE
        value:
          items:
            type: string
          type: array
          title: Value
          description: >-
            A list of entity types to ignore. See [Supported Entity
            Types](https://docs.getlimina.ai/entities/supported-entity-types)
            for a complete list of entity types.
      additionalProperties: false
      type: object
      title: DisableEntityTypeSelector
    AllowFilter:
      properties:
        type:
          type: string
          enum:
            - ALLOW
          const: ALLOW
          title: Type
          description: >-
            Entities with text matching the provided regex pattern will be
            discarded. It is also possible to set this option via environment
            variable. See [Environment
            Variables](https://docs.getlimina.ai/configuration-and-operations/container-management/environment-variables).
          default: ALLOW
        pattern:
          type: string
          title: Pattern
          description: >-
            A Python regex pattern (e.g. `r"^ID-[\d]{4}$"`). This pattern will
            be used to match the entity text. If it matches zero or more
            characters at the **beginning of the entity text**, the entity will
            be ignored. Be sure to use the end of string character `$` if you
            want to only allow entities when the entirety of the text matches.
            It is also important to note that regex patterns may require
            escaping when used in JSON objects.
      additionalProperties: false
      type: object
      required:
        - pattern
      title: AllowFilter
    BlockFilter:
      properties:
        type:
          type: string
          enum:
            - BLOCK
          const: BLOCK
          title: Type
          description: >-
            The block feature allows you to extend the functionality of the
            Private AI models by using regular expressions. This way, you can
            define a Python regex pattern that will be used to identify
            additional tokens with the given PII label.


            Several block list filters can be specified with their own regex
            pattern.


            Lastly, for supported labels, if you would like the model to pick up
            only the tokens from the block list, you can use the enabled entity
            type feature together with the block list feature. This can be done
            by defining a list of enabled entity types and not including the
            supported label you are adding to the block list. For example, if
            you would like the label `ORGANIZATION` to only pick up Microsoft,
            you can define the enabled entity types as `[{"type": "ENABLE",
            "value": ["NAME"]}, {"type": "ENABLE", "value": ["LOCATION"]},
            {"type": "ENABLE", "value": ["AGE"]}, ...]` (omitting
            `ORGANIZATION`) and the block list as `[{"type": "BLOCK",
            "entity_type": "ORGANIZATION", "pattern": "Microsoft"}]`.
          default: BLOCK
        entity_type:
          type: string
          title: Entity Type
          description: >-
            Name of the custom entity type. It can either be a completely new
            entity type such as `CUSTOM_ID` or an existing entity, such as
            `NAME`.
        pattern:
          type: string
          title: Pattern
          description: >-
            This is a pattern to match in the text. This feature uses regex
            patterns: you can either pass a word (e.g. the, word, custom, etc.)
            or you can pass a valid Python regex pattern. It is important to
            note that regex patterns may require escaping when used in JSON
            objects. To give an example, if you would like to send the regex
            pattern `r"\b\w{4}\b"` which will catch every 4-character word, you
            need to send it as `"\\b\\w{4}\\b"`. A complete JSON grammar is
            found here: https://www.json.org/json-en.html. More information on
            how to write a Python regex is found here:
            https://docs.python.org/3/library/re.html


            It is important to note also that only non-overlapping matches are
            returned.
        threshold:
          type: number
          title: Threshold
          description: >-
            Defines a likelihood threshold for the custom entity. This
            likelihood is compared against the likelihood predicted by the model
            and, if it is greater, the custom entity is output instead of the
            model-predicted entity. By default this threshold is set to 1.0,
            which ensures that the blocked entities will always be preferred
            over a matching model-predicted entity. This can be any value
            between 0 and 1.
          default: 1
      additionalProperties: false
      type: object
      required:
        - entity_type
        - pattern
      title: BlockFilter
    AllowTextFilter:
      properties:
        type:
          type: string
          enum:
            - ALLOW_TEXT
          const: ALLOW_TEXT
          title: Type
          description: >-
            Input text matching the provided regex pattern will not be redacted.
            It is currently not possible to set this option via environment
            variable.
          default: ALLOW_TEXT
        pattern:
          type: string
          title: Pattern
          description: >-
            A Python regex pattern (e.g. `r"^ID-[\d]{4}$"`). This pattern will
            be used to match the input text. Entities detected **inside the
            matched text** will be ignored. Note that capturing groups can be
            used in the regex pattern. If present, only the text matching a
            capturing group will be left unredacted. It is also important to
            note that regex patterns may require escaping when used in JSON
            objects.
      additionalProperties: false
      type: object
      required:
        - pattern
      title: AllowTextFilter
    CoreferenceResolutionMode:
      type: string
      enum:
        - heuristics
        - combined
        - model_prediction
      title: CoreferenceResolutionMode
    AnalyzeEntityItem:
      properties:
        text:
          type: string
          title: Text
          description: >-
            The entity text. When the `return_entity` option is set to `False`,
            this returns a null string
        location:
          $ref: '#/components/schemas/NerTextLocation'
          default: >-
            Location object containing the start-end character indexes of the
            entity.
        best_label:
          type: string
          title: Best Label
          description: The entity label with the highest likelihood.
        labels:
          additionalProperties:
            type: number
          type: object
          title: Labels
          description: >-
            A dictionary containing any other labels found, together with
            associated likelihoods. Note that these are not strictly
            probabilities and do not sum to 1, as an entity can have multiple
            types.


            Note that the likelihoods have also been thresholded, so no
            additional thresholding is necessary.
        analysis_result:
          anyOf:
            - $ref: '#/components/schemas/AnalysisResult'
            - type: 'null'
          description: >-
            The results of the entity analysis. This field is not present on
            entities that were not analyzed.
        coreference_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Coreference Id
          description: >-
            The coreference id (e.g., NAME_1, ORGANIZATION_1) of each entity.
            Names with the same coreference id refer to the same person (e.g.,
            John Doe and J. Doe). Organization names with the same coreference
            id refer to the same organization (e.g., Coke and Coca-Cola). This
            field is not present if `coreference_resolution` is not set in the
            request.
        relations:
          anyOf:
            - items:
                $ref: '#/components/schemas/RelationItem'
              type: array
            - type: 'null'
          title: Relations
          description: List of entities related to this entity.
      type: object
      required:
        - text
        - best_label
        - labels
        - coreference_id
        - relations
      title: AnalyzeEntityItem
    NerTextLocation:
      properties:
        stt_idx:
          type: integer
          title: Stt Idx
          description: Start character index of the entity in the original text.
        end_idx:
          type: integer
          title: End Idx
          description: >-
            Index of the character immediately following the entity, such that
            end_idx - stt_idx = number of characters in the entity.
      type: object
      required:
        - stt_idx
        - end_idx
      title: NerTextLocation
      description: Start and end indices of the entity in the original text.
    AnalysisResult:
      properties:
        formatted:
          anyOf:
            - type: integer
            - type: string
            - type: 'null'
          title: Formatted
          description: >-
            The formatted entity. Each entity type is formatted according to its
            own standard (e.g., dates are formatted according to ISO-8601). Note
            that this field will not be present if the entity failed to parse
            according to its defining standard.
        subtypes:
          items:
            $ref: '#/components/schemas/EntitySubType'
          type: array
          title: Subtypes
          description: >-
            The list of sub-entities contained in the parent. For example, an
            address could contain the sub-types `LOCATION_COUNTRY` for its
            country and `LOCATION_ADDRESS_STREET` for its street address.
        validation_assertions:
          items:
            $ref: '#/components/schemas/ValidationAssertion'
          type: array
          title: Validation Assertions
          description: >-
            The results of the validations that are run on the entity. Note that
            the list of validation assertions will be empty if the parsing of
            the entity failed.
      type: object
      title: AnalysisResult
    RelationItem:
      properties:
        coreference_id:
          type: string
          title: Coreference Id
          description: >-
            Relations are drawn from clusters of coreferential entities. This is
            the id of an entity cluster related to this entity. Note that
            relations form directed edges, that is, going from a person to its
            "attributes". For example, in "John is 23-years old", the NAME entity
            `John` will contain a relation to the DOB entity `23-years old` and
            not the other way round.
        label:
          type: string
          title: Label
          description: The relation label.
          default: RELATED_TO
      type: object
      required:
        - coreference_id
      title: RelationItem
    EntitySubType:
      properties:
        text:
          anyOf:
            - type: string
            - type: 'null'
          title: Text
          description: The text of the sub-entity in the input text
        formatted:
          anyOf:
            - type: string
            - type: 'null'
          title: Formatted
          description: >-
            A formatted version of the input text. The format follows the
            standard that applies to this entity type.
        label:
          type: string
          title: Label
          description: The label that best applies to this entity
        location:
          anyOf:
            - $ref: '#/components/schemas/NerTextLocation'
            - type: 'null'
          description: >-
            Location object containing the start-end character indexes of the
            entity in the input text.
      type: object
      required:
        - label
      title: EntitySubType
    ValidationAssertion:
      properties:
        provider:
          $ref: '#/components/schemas/ValidationAlgorithm'
          description: >-
            The number validation algorithm, service or standard that is used
            for validation. Currently, only the Luhn algorithm for validating
            the checksum of credit card numbers is supported.
        status:
          $ref: '#/components/schemas/ValidationStatus'
          description: >-
            The result of the validation. If it is `valid`, this means that the
            validation done on the entity succeeded. This could be used as
            added evidence that the entity is of the detected type and that it
            is formatted following the expected standard. If it is `invalid`,
            this means that the validation done on the entity failed. Note that
            it is possible for an entity to be valid according to one validation
            provider but invalid according to another one. If it is `unknown`,
            it means that the provider was not able to complete the validation
            on this entity. This may indicate that additional text formatting or
            pre-processing might be needed on the entity text.
      type: object
      required:
        - provider
        - status
      title: ValidationAssertion
      description: Provides the results of the entity validation processes.
    ValidationAlgorithm:
      type: string
      enum:
        - luhn
      const: luhn
      title: ValidationAlgorithm
    ValidationStatus:
      type: string
      enum:
        - valid
        - invalid
        - unknown
      title: ValidationStatus

````
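
The request and response shapes in the spec above can be exercised without a running server. The following is a minimal Python sketch, not an official client: the helper names `build_analyze_request` and `entity_spans`, the `CUSTOM_WORD` entity type, and the hand-written `sample_item` are all hypothetical and exist only for illustration. It assembles an `AnalyzeTextRequest` body with an `ENABLE` selector plus `ALLOW` and `BLOCK` filters, shows that JSON serialization doubles the backslashes in regex patterns as the `pattern` descriptions warn, and slices entity text out of the input using `stt_idx` and `end_idx`.

```python
import json
import re


def build_analyze_request(texts, enable=None, allow_patterns=None, block=None):
    """Assemble an AnalyzeTextRequest body as a plain dict (hypothetical helper)."""
    detection = {"accuracy": "high", "return_entity": True}
    if enable:
        # Per the schema, `value` is an array of entity types or legislations.
        detection["entity_types"] = [{"type": "ENABLE", "value": list(enable)}]
    filters = []
    for pattern in allow_patterns or []:
        re.compile(pattern)  # fail early on an invalid regex
        filters.append({"type": "ALLOW", "pattern": pattern})
    for entity_type, pattern in block or []:
        re.compile(pattern)
        filters.append(
            {"type": "BLOCK", "entity_type": entity_type, "pattern": pattern}
        )
    if filters:
        detection["filter"] = filters
    return {"text": list(texts), "link_batch": False, "entity_detection": detection}


def entity_spans(text, response_item):
    """Return (best_label, substring) pairs using each entity's location."""
    spans = []
    for entity in response_item["entities"]:
        loc = entity.get("location")  # `location` is not in the required list
        if loc is None:
            continue
        # end_idx is exclusive, so a plain slice recovers the entity text.
        spans.append((entity["best_label"], text[loc["stt_idx"]:loc["end_idx"]]))
    return spans


payload = build_analyze_request(
    ["Hello John and Jane, ticket ID-1212"],
    enable=["NAME", "LOCATION"],
    allow_patterns=[r"^ID-[\d]{4}$"],
    block=[("CUSTOM_WORD", r"\b\w{4}\b")],  # CUSTOM_WORD is a made-up label
)
body = json.dumps(payload)  # backslashes in the patterns are escaped here

# Hand-written item shaped like AnalyzeTextResponseItem; NOT real API output.
sample_item = {
    "entities": [
        {
            "text": "John",
            "location": {"stt_idx": 6, "end_idx": 10},
            "best_label": "NAME_GIVEN",
            "labels": {"NAME_GIVEN": 0.9, "NAME": 0.9},
            "coreference_id": None,
            "relations": None,
        }
    ],
    "entities_present": True,
    "characters_processed": 35,
    "languages_detected": {"en": 0.9},
}
spans = entity_spans(payload["text"][0], sample_item)
```

The resulting `body` string could then be sent as the JSON body of a `POST` to `/analyze/text` on one of the listed servers with any HTTP client. Authentication is not described in this spec, so it is left out of the sketch.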