# Process Files Base64

> Detect entities such as PII, PHI or PCI in a base64-encoded file using Private AI's entity detection engine. After entity detection, a copy of the file with all entities removed is created and returned.

This route is similar to `/process/files/uri`, but passes the file in the POST request itself. This route allows for simple setup and testing, as no folders or volumes need to be mounted to the container.
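
As a minimal sketch of how a client might call this route, the Python below base64-encodes a file, wraps it in the two required `file` fields (`data` and `content_type`), and posts it to the local server URL listed in the spec. The helper names `build_payload` and `process_file` are hypothetical, and the sketch assumes a container reachable at `http://localhost:8080` with no authentication; the hosted endpoints may need credentials that are not shown here.

```python
import base64
import json
import urllib.request


def build_payload(file_bytes: bytes, content_type: str) -> dict:
    """Build a request body containing the two required `file` fields."""
    return {
        "file": {
            # The schema expects base64 text, so decode the encoded bytes to ASCII.
            "data": base64.b64encode(file_bytes).decode("ascii"),
            "content_type": content_type,
        },
        # Optional: leave the detected text out of each returned entity.
        "entity_detection": {"return_entity": False},
    }


def process_file(payload: dict, base_url: str = "http://localhost:8080") -> bytes:
    """POST the payload and return the redacted file as raw bytes.

    `base_url` is an assumption taken from the "Local Server" entry in the spec.
    """
    request = urllib.request.Request(
        f"{base_url}/process/files/base64",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # `processed_file` holds the base64-encoded redacted copy of the input.
    return base64.b64decode(body["processed_file"])


# Building the payload needs no running server; only process_file() does.
payload = build_payload(b"%PDF-1.4 sample", "application/pdf")
```

Calling `process_file(payload)` would then send the request. Separating payload construction from the network call keeps the encoding step easy to check without a running container.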

This route supports the following content types: application/dicom, application/json, application/msword, application/pdf, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/vnd.openxmlformats-officedocument.presentationml.presentation, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/xml, audio/m4a, audio/mp3, audio/mp4, audio/mp4a-latm, audio/mpeg, audio/wav, audio/webm, audio/x-wav, image/bmp, image/gif, image/jpeg, image/jpg, image/png, image/tif, image/tiff, image/x-ms-bmp, text/csv, text/plain
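
Because `content_type` must be one of the values listed above, a client may want to check it before sending a large payload. The sketch below copies that list into a set and rejects anything else. The names `SUPPORTED_CONTENT_TYPES` and `require_supported` are hypothetical, and the exact, case-sensitive comparison is an assumption about how the service matches the value.

```python
# Content types copied from the list above.
SUPPORTED_CONTENT_TYPES = frozenset({
    "application/dicom", "application/json", "application/msword",
    "application/pdf", "application/vnd.ms-excel",
    "application/vnd.ms-powerpoint",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/xml", "audio/m4a", "audio/mp3", "audio/mp4",
    "audio/mp4a-latm", "audio/mpeg", "audio/wav", "audio/webm",
    "audio/x-wav", "image/bmp", "image/gif", "image/jpeg", "image/jpg",
    "image/png", "image/tif", "image/tiff", "image/x-ms-bmp",
    "text/csv", "text/plain",
})


def require_supported(content_type: str) -> str:
    """Return the content type unchanged, or raise if it is not in the list."""
    # Assumption: the service compares the value exactly as written.
    if content_type not in SUPPORTED_CONTENT_TYPES:
        raise ValueError(f"Unsupported content type: {content_type!r}")
    return content_type
```

For example, `require_supported("application/pdf")` returns the string unchanged, while `require_supported("video/mp4")` raises `ValueError` before any request is made.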


## OpenAPI

````yaml /openapi/privateai_4.4.0.json post /process/files/base64
openapi: 3.1.0
info:
  title: API Reference
  description: Private AI API Reference
  termsOfService: https://www.getlimina.ai/en/terms-of-use
  contact:
    url: https://www.getlimina.ai/en/contact-us
    email: info@getlimina.ai
  version: 4.4.0
servers:
  - url: https://api.getlimina.ai/community/v4
    description: Private AI Community API
  - url: https://api.getlimina.ai/professional/v4
    description: Private AI Professional API
  - url: http://localhost:8080
    description: Local Server
security: []
paths:
  /process/files/base64:
    post:
      summary: Process Files Base64
      description: >-
        Detect entities such as PII, PHI or PCI in a base64-encoded file using
        Private AI's entity detection engine. After entity detection, a copy of
        the file with all entities removed is created and returned.


        This route is similar to `/process/files/uri`, but passes the file in
        the POST request itself. This route allows for simple setup and testing,
        as no folders or volumes need to be mounted to the container.


        This route supports the following content types: application/dicom,
        application/json, application/msword, application/pdf,
        application/vnd.ms-excel, application/vnd.ms-powerpoint,
        application/vnd.openxmlformats-officedocument.presentationml.presentation,
        application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,
        application/vnd.openxmlformats-officedocument.wordprocessingml.document,
        application/xml, audio/m4a, audio/mp3, audio/mp4, audio/mp4a-latm,
        audio/mpeg, audio/wav, audio/webm, audio/x-wav, image/bmp, image/gif,
        image/jpeg, image/jpg, image/png, image/tif, image/tiff, image/x-ms-bmp,
        text/csv, text/plain
      operationId: process_files_base64_process_files_base64_post
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ProcessFileRequestBase64'
            examples:
              process_file:
                summary: Process File with base64-encoded payload
                value:
                  file:
                    data: >-
                      JVBERi0xLjQKJdPr6eEKMSAwIG9iago8PC9UaXRsZSAoc2FtcGxlKQovUHJvZHVj...
                    content_type: application/pdf
                  entity_detection:
                    return_entity: true
                  pdf_options:
                    density: 150
                    max_resolution: 2000
                  audio_options:
                    bleep_start_padding: 0
                    bleep_end_padding: 0
                  ocr_options:
                    ocr_system: azure_computer_vision
        required: true
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ProcessFileResponseBase64'
        '400':
          description: Bad Request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/UserErrorResponseModel'
        '500':
          description: Internal Server Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/InternalErrorResponseModel'
        4XX:
          description: Client Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/UserErrorResponseModel'
components:
  schemas:
    ProcessFileRequestBase64:
      properties:
        file:
          $ref: '#/components/schemas/File'
          description: Base64 encoded file content.
        entity_detection:
          $ref: '#/components/schemas/PIIDetectionParams'
          description: >-
            This section contains a set of parameters to control the PII
            detection process. All fields have sensible defaults that can be
            changed for specific needs.
        object_entity_detection:
          $ref: '#/components/schemas/ObjectEntityDetection'
          description: >-
            This section contains a set of parameters to control the object
            entity detection process. It allows the user to select the object
            entity types to detect (e.g., to detect FACE but not LICENSE_PLATE).
        pdf_options:
          $ref: '#/components/schemas/PDFOptions'
          description: >-
            Options to process PDF files, such as the rendering quality when
            each page is turned into an image.
        office_options:
          $ref: '#/components/schemas/OfficeOptions'
          description: Options to process Office files, such as table and chart behaviour.
        image_options:
          $ref: '#/components/schemas/ImageOptions'
          description: Options to process image files, such as the masking mode.
        audio_options:
          $ref: '#/components/schemas/AudioOptions'
          description: >-
            Options to process audio files, such as the padding to add while
            redacting audio segments.
        processed_text:
          oneOf:
            - $ref: '#/components/schemas/MarkerRedactedText'
            - $ref: '#/components/schemas/SyntheticRedactedText'
            - $ref: '#/components/schemas/MaskRedactedText'
          title: Processed Text
          description: >-
            This section allows the user to generate redacted (default),
            synthetic or masked text.
          default:
            type: MARKER
            pattern: '[UNIQUE_NUMBERED_ENTITY_TYPE]'
            marker_language: en
            coreference_resolution: heuristics
          discriminator:
            propertyName: type
            mapping:
              MARKER:
                $ref: '#/components/schemas/MarkerRedactedText'
              MASK:
                $ref: '#/components/schemas/MaskRedactedText'
              SYNTHETIC:
                $ref: '#/components/schemas/SyntheticRedactedText'
        project_id:
          type: string
          maxLength: 60
          pattern: ^[a-zA-Z0-9\-_\:]*$
          title: Project Id
          description: >-
            Used to categorize requests for reporting purposes. Limited to
            alphanumeric characters or the following special characters :_-
          default: main
        ocr_options:
          oneOf:
            - $ref: '#/components/schemas/AWSTextractOCROptions'
            - $ref: '#/components/schemas/AzureComputerVisionOCROptions'
            - $ref: '#/components/schemas/AzureDocIntelligenceOCROptions'
            - $ref: '#/components/schemas/PaddleOCROptions'
            - $ref: '#/components/schemas/HybridOCROptions'
          title: Ocr Options
          description: >-
            Options to provide Optical Character Recognition (OCR) details, such
            as choice of OCR system.
          default:
            ocr_system: paddleocr
            padding_ratio: 0.15
          discriminator:
            propertyName: ocr_system
            mapping:
              aws_textract:
                $ref: '#/components/schemas/AWSTextractOCROptions'
              azure_computer_vision:
                $ref: '#/components/schemas/AzureComputerVisionOCROptions'
              azure_doc_intelligence:
                $ref: '#/components/schemas/AzureDocIntelligenceOCROptions'
              hybrid:
                $ref: '#/components/schemas/HybridOCROptions'
              paddleocr:
                $ref: '#/components/schemas/PaddleOCROptions'
        return_processed_text:
          type: boolean
          title: Return Processed Text
          description: >-
            Controls whether the response contains the `processed_text` field.
            Turning this off can significantly decrease the size of the
            response.
          default: true
      additionalProperties: false
      type: object
      required:
        - file
      title: ProcessFileRequestBase64
    ProcessFileResponseBase64:
      properties:
        processed_file:
          type: string
          title: Processed File
          description: The base64-encoded content of the redacted file.
        processed_text:
          anyOf:
            - type: string
            - type: 'null'
          title: Processed Text
          description: >-
            This field contains the redacted version of any text that was
            extracted from the input file. It corresponds to a redacted ASR
            transcript for audio files and any text found inside a document such
            as a PDF or image file.
        entities:
          items:
            anyOf:
              - $ref: '#/components/schemas/FileEntityItem'
              - $ref: '#/components/schemas/PaginatedFileEntityItem'
          type: array
          title: Entities
          description: A list of all entities found in the provided file.
        objects:
          items:
            $ref: '#/components/schemas/FileObjectEntityItem'
          type: array
          title: Objects
          description: >-
            A list of all object entities found in the provided file using
            object detection.
        entities_present:
          type: boolean
          title: Entities Present
          description: Returns `True` if the list of detected entities is not empty.
        objects_present:
          type: boolean
          title: Objects Present
          description: Returns `True` if the list of detected objects is not empty.
          default: false
        languages_detected:
          additionalProperties:
            type: number
          type: object
          title: Languages Detected
          description: >-
            A dictionary containing ISO 639-1 language labels and the likelihood
            of their detection in the request payload.
        audio_duration:
          anyOf:
            - type: number
            - type: 'null'
          title: Audio Duration
          description: The length of the audio file in seconds.
        page_count:
          anyOf:
            - type: integer
            - type: 'null'
          title: Page Count
          description: The number of pages in the file.
      type: object
      required:
        - processed_file
        - processed_text
        - entities
        - entities_present
        - languages_detected
      title: ProcessFileResponseBase64
    UserErrorResponseModel:
      properties:
        detail:
          anyOf:
            - $ref: '#/components/schemas/ErrorMessage'
            - $ref: '#/components/schemas/ValidationErrorModel'
          title: Detail
          description: >-
            The details of the error, usually relating to an input validation
            error
      type: object
      required:
        - detail
      title: UserErrorResponseModel
    InternalErrorResponseModel:
      properties:
        detail:
          type: string
          title: Detail
          description: >-
            The details of the error, usually relating to an unhandled
            processing error
      type: object
      required:
        - detail
      title: InternalErrorResponseModel
    File:
      properties:
        data:
          type: string
          title: Data
          description: Base64 encoded ASCII text data of the file to process.
        content_type:
          type: string
          title: Content Type
          description: >-
            Content type of the file. Currently ['application/dicom',
            'application/json', 'application/msword', 'application/pdf',
            'application/vnd.ms-excel', 'application/vnd.ms-powerpoint',
            'application/vnd.openxmlformats-officedocument.presentationml.presentation',
            'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
            'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
            'application/xml', 'audio/m4a', 'audio/mp3', 'audio/mp4',
            'audio/mp4a-latm', 'audio/mpeg', 'audio/wav', 'audio/webm',
            'audio/x-wav', 'image/bmp', 'image/gif', 'image/jpeg', 'image/jpg',
            'image/png', 'image/tif', 'image/tiff', 'image/x-ms-bmp',
            'text/csv', 'text/plain'] are supported.
      additionalProperties: false
      type: object
      required:
        - data
        - content_type
      title: File
    PIIDetectionParams:
      properties:
        accuracy:
          $ref: '#/components/schemas/AccuracyMode'
          description: >-
            Selects the model used to identify PII in the input text. By
            default, the `high_automatic` accuracy model is used. This default
            automatically chooses either the high or high_multilingual model.
            Whilst the models used by the Private AI solution are highly
            optimized (~25X faster than a reference transformer implementation),
            in high-throughput cases it is possible to trade accuracy for speed
            by selecting either the `standard` or `standard_high` accuracy
            modes. Multilingual support can be enabled by using one of the
            multilingual models, namely `standard_high_multilingual` (GPU
            container only) and `high_multilingual`. The multilingual models
            process all supported languages including English, without the need
            to specify language. It is advisable to use the English-only models
            where possible, as they perform slightly better on English.
            Automatic Models can determine which model to use (English or
            Multilingual) depending on the languages detected, provided
            Multilingual models are available. More information on different
            accuracy modes can be found here:
            https://docs.getlimina.ai/configuration-and-operations/entity-detection-and-redaction/accuracy-modes
        entity_types:
          items:
            oneOf:
              - $ref: '#/components/schemas/EnableEntityTypeSelector'
              - $ref: '#/components/schemas/DisableEntityTypeSelector'
            discriminator:
              propertyName: type
              mapping:
                DISABLE:
                  $ref: '#/components/schemas/DisableEntityTypeSelector'
                ENABLE:
                  $ref: '#/components/schemas/EnableEntityTypeSelector'
          type: array
          title: Entity Types
          description: >-
            Controls which entity types and legislation sets are detected. See
            [Supported Entity
            Types](https://docs.getlimina.ai/entities/supported-entity-types)
            for the list of possible entities and legislation sets. By default,
            all entities are detected and removed. You can specify one or more
            selectors, each of which is either an individual entity type such as
            `LOCATION_CITY` or a legislation set such as `GDPR`.
            `EnableEntityTypeSelector` selectors will add entity types to
            detect. On the contrary, `DisableEntityTypeSelector` selectors will
            ignore entities of the specified types. If only
            `DisableEntityTypeSelector` selectors are specified, they are
            assumed to be ignoring entity types from the entire supported list
            of entity types.
        filter:
          items:
            oneOf:
              - $ref: '#/components/schemas/AllowFilter'
              - $ref: '#/components/schemas/BlockFilter'
              - $ref: '#/components/schemas/AllowTextFilter'
            discriminator:
              propertyName: type
              mapping:
                ALLOW:
                  $ref: '#/components/schemas/AllowFilter'
                ALLOW_TEXT:
                  $ref: '#/components/schemas/AllowTextFilter'
                BLOCK:
                  $ref: '#/components/schemas/BlockFilter'
          type: array
          title: Filter
          description: >-
            This field contains a list of filters expressed as regular
            expressions. These regular expressions can help users customize PII
            detection in a few ways. `ALLOW` filters can disable redaction of
            entities containing specific text (e.g. a document id with format
            ID-1212 that is not sensitive). On the other hand, the `BLOCK`
            filters can augment the existing set of entities with custom
            entities (e.g. detecting sensitive medical code with format A18.32
            as IDC_NUMBER). Finally, the `ALLOW_TEXT` filters can select a
            section of text in a document that should not be redacted (e.g.
            author names in scientific references or dates in an audit trail
            log).
        return_entity:
          type: boolean
          title: Return Entity
          description: >-
            Controls whether the PII list in the response contains the `text`
            field. Turning this off means that no sensitive PII is returned in
            the response.
          default: true
        enable_non_max_suppression:
          type: boolean
          title: Enable Non Max Suppression
          description: >-
            When set to `True`, if the best label (i.e., the label with the
            highest likelihood) of an entity is disabled then the entity will
            not be redacted. This could be useful to minimize false-positives
            when disabling an entity label like `ACCOUNT_NUMBER` that is related
            but distinct from other entities like `BANK_ACCOUNT`. Note that on
            class hierarchies like `NAME` and `LOCATION`, the behaviour of this
            field is slightly different since it is often impossible to identify
            a best label. If the label of a sub-entity is disabled (e.g.,
            `NAME_GIVEN`) then the entity will not be redacted independently of
            the value of the likelihood for this label.
          default: false
      additionalProperties: false
      type: object
      title: PIIDetectionParams
    ObjectEntityDetection:
      properties:
        object_entity_types:
          items:
            anyOf:
              - $ref: '#/components/schemas/EnableObjectEntityTypeSelector'
              - $ref: '#/components/schemas/DisableObjectEntityTypeSelector'
          type: array
          title: Object Entity Types
          description: >-
            Controls which objects are detected.
            `EnableObjectEntityTypeSelector` selectors will add object entity
            types to detect. On the contrary, `DisableObjectEntityTypeSelector`
            selectors will ignore object entities of the specified types. If
            only `DisableObjectEntityTypeSelector` selectors are specified, they are
            assumed to be ignoring types from the entire supported list of
            object entity types.
      additionalProperties: false
      type: object
      title: ObjectEntityDetection
    PDFOptions:
      properties:
        approach:
          $ref: '#/components/schemas/PDFApproachMode'
          description: >-
            This PDF option controls whether to process the PDF pages as images
            or as native PDF files.
          default: standard
        density:
          type: integer
          title: Density
          description: >-
            PDFs are converted into images using this DPI value. Smaller values
            result in images with smaller resolutions, which will take up less
            storage space and process faster, at the cost of output quality &
            redaction accuracy.
          default: 200
        max_resolution:
          type: integer
          title: Max Resolution
          description: >-
            PDFs are converted into images using the `density` DPI value. Any
            resulting images with a maximum side length larger than this will be
            resized to this value, whilst preserving aspect ratio.
          default: 3000
        enable_pdf_text_layer:
          type: boolean
          title: Enable Pdf Text Layer
          description: >-
            This PDF option controls whether a text layer is included in the PDF
            files generated by the process. When set to `true`, the application
            will include a text layer in the PDFs, allowing for text selection,
            search, and accessibility features. If set to any other value or not
            set at all, the text layer will be disabled, potentially resulting
            in smaller file sizes, faster processing, but with reduced
            functionality regarding text selection and search. The default value
            is `True`, but can be adjusted via `PAI_ENABLE_PDF_TEXT_LAYER`
            environment variable by setting it to `False`.
          default: true
        use_inline_replacement:
          type: boolean
          title: Use Inline Replacement
          description: >-
            Applicable for the enhanced PDF approach only. When `True`
            (default), text PII is replaced in place with the processed_text
            (marker/synthetic/mask). When `False`, PII regions are instead
            covered using the masking method configured in
            `image_options.masking_method` (blur or blackbox).
          default: true
      additionalProperties: false
      type: object
      title: PDFOptions
    OfficeOptions:
      properties:
        chart_behaviour:
          $ref: '#/components/schemas/OfficeChartApproach'
          description: >-
            This option controls the behaviour for chart deidentification in
            Office files. Available options are ['deidentify', 'set_to_zero']
          default: deidentify
        unsupported_image_behaviour:
          $ref: '#/components/schemas/OfficeUnsupportedImageApproach'
          description: >-
            This option controls the behaviour for images in Office files that
            aren't compatible with the Image Deidentifier. Available options are
            ['replace', 'keep']. The `"replace"` option will replace unsupported
            images with a blank image. The `"keep"` option will keep unsupported
            images as they are without any modifications. Note that the selected
            option is also used for supported images that fail deidentification.
          default: replace
        structure_tables:
          type: boolean
          title: Structure Tables
          description: >-
            Specifies whether the tables will be structured. If set to True, the
            table will be structured with the assumption that the first row
            contains headers.
          default: true
        original_file_integrity_check:
          type: boolean
          title: Original File Integrity Check
          description: >-
            If this option is enabled, the deidentifier will raise an exception
            if the file is corrupted. If disabled, it will raise a warning.
          default: true
        bin_containing_file_behaviour:
          $ref: '#/components/schemas/OfficeBinApproach'
          description: >-
            Certain Office XML files have inner .bin files that are not
            readable. These files can contain PII. If convert is selected, the
            deidentifier will remove the .bin file which could result in lost
            data. If leave is selected, the .bin file will be left as-is and the
            rest of the document will be deidentified.
          default: convert
        element_types_to_process:
          items: {}
          type: array
          title: Element Types To Process
          description: >-
            Controls which elements in an Office file are detected and
            processed. If not set, all elements will be deidentified. Available
            elements are ['Paragraphs', 'Tables', 'Charts', 'Hyperlinks',
            'Headers', 'Footnotes', 'ReviewComments', 'ReviewCommentAuthors',
            'SmartArts', 'CoreProperties', 'AltText', 'InkDrawings'].
          default:
            - Paragraphs
            - Tables
            - Charts
            - Hyperlinks
            - Headers
            - Footnotes
            - ReviewComments
            - ReviewCommentAuthors
            - SmartArts
            - CoreProperties
            - AltText
            - InkDrawings
      additionalProperties: false
      type: object
      title: OfficeOptions
    ImageOptions:
      properties:
        masking_method:
          $ref: '#/components/schemas/MaskMode'
      additionalProperties: false
      type: object
      title: ImageOptions
    AudioOptions:
      properties:
        bleep_start_padding:
          type: number
          minimum: 0
          title: Bleep Start Padding
          description: Additional padding at the start of bleep, in seconds.
          default: 0.5
        bleep_end_padding:
          type: number
          minimum: 0
          title: Bleep End Padding
          description: Additional padding at the end of bleep, in seconds.
          default: 0.2
        distortion_steps:
          type: integer
          title: Distortion Steps
          description: >-
            Specifies how the distortion will be made. A value greater than 0
            will result in a higher tone, and a value less than 0 will result in
            a lower tone.
          default: 0
        bleep_frequency:
          type: integer
          title: Bleep Frequency
          description: >-
            The `bleep_frequency` parameter configures the frequency of the sine
            wave used for the bleep sound in an audio segment. This setting
            allows users to adjust the pitch of the bleep, with higher values
            resulting in a higher pitch and vice versa. Ideal for customizing
            the bleep tone to suit various audio environments, it is expressed
            in Hertz (Hz) and should be chosen considering the balance and
            clarity needed in the audio. The default setting is 600 Hz, which
            represents a standard bleep tone.
          default: 600
        bleep_gain:
          type: number
          title: Bleep Gain
          description: >-
            The `bleep_gain` parameter sets the gain level, in decibels (dB),
            for the bleep sound within the audio segment. It controls the
            relative loudness of the bleep, allowing for precise volume
            adjustments. A value of 0.0 dB maintains the original amplitude of
            the bleep, positive values increase its loudness, and negative
            values decrease it.
          default: -3
      additionalProperties: false
      type: object
      title: AudioOptions
    MarkerRedactedText:
      properties:
        type:
          type: string
          enum:
            - MARKER
          const: MARKER
          title: Type
          default: MARKER
        pattern:
          type: string
          title: Pattern
          description: >-
            Specifies a custom redaction marker format. The format must contain
            one and only one of these predefined keywords: `BEST_ENTITY_TYPE`,
            `ALL_ENTITY_TYPES`, `UNIQUE_NUMBERED_ENTITY_TYPE`,
            `UNIQUE_HASHED_ENTITY_TYPE`, which will be replaced as follows:


            The keyword `BEST_ENTITY_TYPE` will be replaced by the entity type.
            For example, `--[BEST_ENTITY_TYPE]--` will transform `My name is
            John` into `My name is --[NAME_GIVEN]--`.


            The keyword `UNIQUE_NUMBERED_ENTITY_TYPE` is similar, but it will
            also make the marker unique by appending an index at the end. For
            example `--[UNIQUE_NUMBERED_ENTITY_TYPE]--` yields `My name is
            --[NAME_GIVEN_1]--`.


            The keyword `ALL_ENTITY_TYPES` will be replaced with all the entity
            types applicable to the entity. For example `--[ALL_ENTITY_TYPES]--`
            yields `My name is --[NAME,NAME_GIVEN]--`.


            It is also possible to set this option via environment variable. See
            [Environment
            Variables](https://docs.getlimina.ai/configuration-and-operations/container-management/environment-variables).
          default: '[UNIQUE_NUMBERED_ENTITY_TYPE]'
        marker_language:
          $ref: '#/components/schemas/MarkerLanguage'
          description: >-
            The ISO 639 code of the language that the markers will be in. If set
            to `auto`, the detected language of the text is used. This feature
            is only available for `nl`, `en`, `fr`, `de`, `hi`, `it`, `ja`,
            `ko`, `zh`, `pt`, `ru`, `es`, `tl`, `uk`. If this feature is not
            available for the detected language, `en` will be used. More
            information on the translated labels used by the marker_language is
            found here:
            https://docs.getlimina.ai/entities/translated-entity-labels
          default: en
        coreference_resolution:
          $ref: '#/components/schemas/CoreferenceResolutionMode'
          description: >-
            (Experimental) Turns on the experimental coreference resolution.
            Specifies whether multiple instances of the same entity should be
            replaced with the same marker or not. For example, with
            `coreference_resolution` set: "Hi John and Rosha, John nice to meet
            you" -> "Hi [NAME_1] and [NAME_2], [NAME_1] nice to meet you".
            Without `coreference_resolution` set: "Hi [NAME_1] and [NAME_2],
            [NAME_3] nice to meet you". Note that this option is only available
            if the pattern field contains either the
            `UNIQUE_NUMBERED_ENTITY_TYPE` keyword or the
            `UNIQUE_HASHED_ENTITY_TYPE` keyword.
          default: heuristics
      additionalProperties: false
      type: object
      title: MarkerRedactedText
    SyntheticRedactedText:
      properties:
        type:
          type: string
          enum:
            - SYNTHETIC
          const: SYNTHETIC
          title: Type
          default: SYNTHETIC
        coreference_resolution:
          anyOf:
            - $ref: '#/components/schemas/CoreferenceResolutionMode'
            - type: 'null'
          description: >-
            (Experimental) Turns on the experimental coreference resolution.
            Specifies whether multiple instances of the same entity should have
            the same generated synthetic entity or not. For example, with
            `coreference_resolution` set: "Hi John and Rosha, John nice to meet
            you" -> "Hi Harry and Alev, Harry nice to meet you". Without
            `coreference_resolution` set: "Hi John and Rosha, John nice to meet
            you" -> "Hi Harry and Alev, Sulav nice to meet you". If this option
            is not set, no attempt will be made to replace multiple mentions of
            the same entity with the same synthetic value. Set this field to null
            if you want to disable coreference resolution for synthetic entities.
          default: heuristics
        synthetic_entity_accuracy:
          $ref: '#/components/schemas/SyntheticPIIAccuracyMode'
          description: >-
            (Beta) Enable synthetic entity generation using the specified model.
            Supported values are `standard`, `standard_multilingual` and
            `standard_automatic`. Use `standard_automatic` to let the service
            automatically choose between `standard` and `standard_multilingual`.
            The default value is `standard_automatic` if the container is
            equipped with the synthetic multilingual model; otherwise, the
            default value is `standard`.
      additionalProperties: false
      type: object
      title: SyntheticRedactedText
    MaskRedactedText:
      properties:
        type:
          type: string
          enum:
            - MASK
          const: MASK
          title: Type
          default: MASK
        mask_character:
          type: string
          title: Mask Character
          description: >-
            Replaces redacted PII with the specified mask character. Note that
            this input may only be a single character.
          default: '#'
      additionalProperties: false
      type: object
      title: MaskRedactedText
    AWSTextractOCROptions:
      properties:
        ocr_system:
          type: string
          enum:
            - aws_textract
          const: aws_textract
          title: Ocr System
          default: aws_textract
      additionalProperties: false
      type: object
      title: AWSTextractOCROptions
    AzureComputerVisionOCROptions:
      properties:
        ocr_system:
          type: string
          enum:
            - azure_computer_vision
          const: azure_computer_vision
          title: Ocr System
          default: azure_computer_vision
      additionalProperties: false
      type: object
      title: AzureComputerVisionOCROptions
    AzureDocIntelligenceOCROptions:
      properties:
        ocr_system:
          type: string
          enum:
            - azure_doc_intelligence
          const: azure_doc_intelligence
          title: Ocr System
          default: azure_doc_intelligence
      additionalProperties: false
      type: object
      title: AzureDocIntelligenceOCROptions
    PaddleOCROptions:
      properties:
        ocr_system:
          type: string
          enum:
            - paddleocr
          const: paddleocr
          title: Ocr System
          default: paddleocr
        padding_ratio:
          type: number
          maximum: 1
          minimum: 0
          title: Padding Ratio
          description: >-
            This parameter adjusts the size of the extra padding around the
            added masked boxes for text snippets in the processed image.
          default: 0.15
      additionalProperties: false
      type: object
      title: PaddleOCROptions
    HybridOCROptions:
      properties:
        ocr_system:
          type: string
          enum:
            - hybrid
          const: hybrid
          title: Ocr System
          default: hybrid
        padding_ratio:
          type: number
          maximum: 1
          minimum: 0
          title: Padding Ratio
          description: >-
            This parameter adjusts the size of the extra padding around the
            added masked boxes for text snippets in the processed image.
          default: 0.15
      additionalProperties: false
      type: object
      title: HybridOCROptions
    FileEntityItem:
      properties:
        processed_text:
          type: string
          title: Processed Text
          description: >-
            The corresponding marker in the de-identified text (result field),
            where the entity exists. Note that this field is only populated for
            text-based formats such as `.txt`.
        text:
          type: string
          title: Text
          description: >-
            The text corresponding to the entity. For images the text is
            obtained using OCR, whilst for audio the text is obtained using ASR.
        location:
          anyOf:
            - $ref: '#/components/schemas/TextLocation'
            - $ref: '#/components/schemas/PaginatedTextLocation'
            - $ref: '#/components/schemas/ImageLocation'
            - $ref: '#/components/schemas/AudioLocation'
          title: Location
        best_label:
          type: string
          title: Best Label
          description: The entity label with the highest likelihood.
        labels:
          additionalProperties:
            type: number
          type: object
          title: Labels
          description: >-
            A dictionary of all possible labels, together with associated
            likelihoods. Note that these are not strictly probabilities and do
            not sum to 1, as a word can belong to multiple classes. The scores
            have also been thresholded, so no additional thresholding is
            necessary.
      type: object
      required:
        - processed_text
        - text
        - location
        - best_label
        - labels
      title: FileEntityItem
      description: >-
        An entity detected in a file, together with its text, location and
        label likelihoods.
    PaginatedFileEntityItem:
      properties:
        processed_text:
          type: string
          title: Processed Text
          description: >-
            The corresponding marker in the de-identified text (result field),
            where the entity exists. Note that this field is only populated for
            text-based formats such as `.txt`.
        text:
          type: string
          title: Text
          description: >-
            The text corresponding to the entity. For images the text is
            obtained using OCR, whilst for audio the text is obtained using ASR.
        location:
          anyOf:
            - $ref: '#/components/schemas/TextLocation'
            - $ref: '#/components/schemas/PaginatedTextLocation'
            - $ref: '#/components/schemas/ImageLocation'
            - $ref: '#/components/schemas/AudioLocation'
          title: Location
        best_label:
          type: string
          title: Best Label
          description: The entity label with the highest likelihood.
        labels:
          additionalProperties:
            type: number
          type: object
          title: Labels
          description: >-
            A dictionary of all possible labels, together with associated
            likelihoods. Note that these are not strictly probabilities and do
            not sum to 1, as a word can belong to multiple classes. The scores
            have also been thresholded, so no additional thresholding is
            necessary.
        element:
          type: string
          title: Element
          description: Which element the entity is found in.
      type: object
      required:
        - processed_text
        - text
        - location
        - best_label
        - labels
      title: PaginatedFileEntityItem
      description: >-
        Location of an entity item in a given Paginated file. Also specifies
        what element the entity is found in.
    FileObjectEntityItem:
      properties:
        type:
          $ref: '#/components/schemas/ObjectEntityType'
          description: Type of the detected object (e.g., LICENSE_PLATE, FACE).
        location:
          $ref: '#/components/schemas/ObjectLocation'
          description: Location coordinates of the object within the document.
      type: object
      required:
        - type
        - location
      title: FileObjectEntityItem
      description: >-
        Represents an object detected in the document, including its type and
        location coordinates.
    ErrorMessage:
      type: string
      title: ErrorMessage
      description: An error message
    ValidationErrorModel:
      properties:
        description:
          type: string
          title: Description
          description: User-friendly error message description.
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Loc
          description: >-
            The location where the error has occurred. This could be the location
            of a character if the error is a parsing error, or it could be a value
            in the request if there is a validation error.
        msg:
          type: string
          title: Msg
          description: The error message.
        type:
          type: string
          title: Type
          description: The type of error that has been encountered.
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationErrorModel
    AccuracyMode:
      type: string
      enum:
        - standard
        - standard_high
        - standard_high_multilingual
        - standard_high_automatic
        - high
        - high_multilingual
        - high_automatic
      title: AccuracyMode
      description: >-
        Selects the model used to identify PII in the input text. By default,
        the "high_automatic" accuracy model is used. This default automatically
        chooses either the high or high_multilingual model. Whilst the models
        used by the Private AI solution are highly optimized (~25X faster than a
        reference transformer implementation), in high-throughput cases it is
        possible to trade accuracy for speed by selecting either the "standard"
        or "standard_high" accuracy modes. Multilingual support can be enabled
        by using one of the multilingual models, namely
        "standard_high_multilingual" (GPU container only) and
        "high_multilingual". The multilingual models process all supported
        languages including English, without the need to specify language. It is
        advisable to use the English-only models where possible, as they perform
        slightly better on English. Automatic Models can determine which model
        to use (English or Multilingual) depending on the languages detected,
        provided Multilingual models are available. More information on
        different accuracy modes can be found here:
        https://docs.getlimina.ai/configuration-and-operations/entity-detection-and-redaction/accuracy-modes
    EnableEntityTypeSelector:
      properties:
        type:
          type: string
          enum:
            - ENABLE
          const: ENABLE
          title: Type
          default: ENABLE
        value:
          items:
            type: string
          type: array
          title: Value
          description: >-
            A list of entity types to detect and remove. See [Supported Entity
            Types](https://docs.getlimina.ai/entities/supported-entity-types)
            for a complete list of entity types. This can also be one or more
            legislations. We currently support these legislations: ['APPI',
            'APPI_SENSITIVE', 'CORE_ENTITIES', 'CPRA', 'GDPR', 'GDPR_SENSITIVE',
            'HEALTH_INFORMATION', 'HIPAA_SAFE_HARBOR', 'LIDI', 'PCI',
            'QUEBEC_PRIVACY_ACT', 'CCI', 'NUMERICAL_EXCL_PCI'].
      additionalProperties: false
      type: object
      title: EnableEntityTypeSelector
    DisableEntityTypeSelector:
      properties:
        type:
          type: string
          enum:
            - DISABLE
          const: DISABLE
          title: Type
          default: DISABLE
        value:
          items:
            type: string
          type: array
          title: Value
          description: >-
            A list of entity types to ignore. See [Supported Entity
            Types](https://docs.getlimina.ai/entities/supported-entity-types)
            for a complete list of entity types.
      additionalProperties: false
      type: object
      title: DisableEntityTypeSelector
    AllowFilter:
      properties:
        type:
          type: string
          enum:
            - ALLOW
          const: ALLOW
          title: Type
          description: >-
            Entities with text matching the provided regex pattern will be
            discarded. It is also possible to set this option via environment
            variable. See [Environment
            Variables](https://docs.getlimina.ai/configuration-and-operations/container-management/environment-variables).
          default: ALLOW
        pattern:
          type: string
          title: Pattern
          description: >-
            A python regex pattern (e.g. `r"^ID-[\d]{4}$"`). This pattern will
            be used to match the entity text. If it matches zero or more
            characters at the **beginning of the entity text**, the entity will
            be ignored. Be sure to use the end of string character `$` if you
            want to only allow entities when the entirety of the text matches.
            It is also important to note that regex patterns may require
            escaping when used in JSON objects.
      additionalProperties: false
      type: object
      required:
        - pattern
      title: AllowFilter
    BlockFilter:
      properties:
        type:
          type: string
          enum:
            - BLOCK
          const: BLOCK
          title: Type
          description: >-
            The block feature allows you to extend the functionality of the
            Private AI models by using regular expressions. This way, you can
            define a Python regex pattern that will be used to identify
            additional tokens with the given PII label.


            Several block list filters can be specified with their own regex
            pattern.


            Lastly, for supported labels, if you would like the model to pick up
            only the tokens from the block list, you can use the enabled entity
            type feature together with the block list feature. This can be done
            by defining a list of enabled entity types and not including the
            supported label you are adding to the block list. For example, if
            you would like the label `ORGANIZATION` to only pick up Microsoft,
            you can define the enabled entity types as `[{"type": "ENABLE",
            "value": ["NAME", "LOCATION", "AGE", ...]}]` (omitting
            `ORGANIZATION`) and the block list as `[{"type": "BLOCK",
            "entity_type": "ORGANIZATION", "pattern": "Microsoft"}]`.
          default: BLOCK
        entity_type:
          type: string
          title: Entity Type
          description: >-
            Name of the custom entity type. It can either be a completely new
            entity type such as `CUSTOM_ID` or an existing entity, such as
            `NAME`.
        pattern:
          type: string
          title: Pattern
          description: >-
            This is a pattern to match in the text. This feature uses regex
            patterns: you can pass either a word (e.g. the, word, custom) or a
            valid Python regex pattern. It is important to
            note that regex patterns may require escaping when used in JSON
            objects. To give an example, if you would like to send the regex
            pattern `r"\b\w{4}\b"` which will catch every 4-character word, you
            need to send it as `"\\b\\w{4}\\b"`. A complete JSON grammar is
            found here: https://www.json.org/json-en.html. More information on
            how to write a python regex is found here:
            https://docs.python.org/3/library/re.html


            It is important to note also that only non-overlapping matches are
            returned.
        threshold:
          type: number
          title: Threshold
          description: >-
            Defines a likelihood threshold for the custom entity. This threshold
            is compared against the model's predicted likelihood; if it is
            greater, the custom entity is output instead of the model-predicted
            entity. By default this threshold is set to 1.0, which ensures that
            blocked entities are always preferred over a matching
            model-predicted entity. This can be any value between 0 and 1.
          default: 1
      additionalProperties: false
      type: object
      required:
        - entity_type
        - pattern
      title: BlockFilter
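      # Illustrative request fragment, not part of the generated schema. It
      # assumes the custom entity type CUSTOM_ID mentioned in the entity_type
      # description and reuses the 4-character-word regex from the pattern
      # description. In a JSON request body the same pattern would be written
      # as "\\b\\w{4}\\b".
      examples:
        - type: BLOCK
          entity_type: CUSTOM_ID
          pattern: '\b\w{4}\b'
          threshold: 1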
    AllowTextFilter:
      properties:
        type:
          type: string
          enum:
            - ALLOW_TEXT
          const: ALLOW_TEXT
          title: Type
          description: >-
            Input text matching the provided regex pattern will not be redacted.
            It is currently not possible to set this option via environment
            variable.
          default: ALLOW_TEXT
        pattern:
          type: string
          title: Pattern
          description: >-
            A python regex pattern (e.g. `r"^ID-[\d]{4}$"`). This pattern will
            be used to match the input text. Entities detected **inside the
            matched text** will be ignored. Note that capturing groups can be
            used in the regex pattern. If present, only the text matching a
            capturing group will be left unredacted. It is also important to
            note that regex patterns may require escaping when used in JSON
            objects.
      additionalProperties: false
      type: object
      required:
        - pattern
      title: AllowTextFilter
    EnableObjectEntityTypeSelector:
      properties:
        type:
          type: string
          enum:
            - ENABLE
          const: ENABLE
          title: Type
          default: ENABLE
        value:
          items:
            $ref: '#/components/schemas/ObjectEntityType'
          type: array
          title: Value
          description: >-
            A list of object types to detect and process (e.g., FACE,
            LICENSE_PLATE).
      additionalProperties: false
      type: object
      title: EnableObjectEntityTypeSelector
    DisableObjectEntityTypeSelector:
      properties:
        type:
          type: string
          enum:
            - DISABLE
          const: DISABLE
          title: Type
          default: DISABLE
        value:
          items:
            $ref: '#/components/schemas/ObjectEntityType'
          type: array
          title: Value
          description: A list of object types to ignore (e.g., FACE, LICENSE_PLATE).
      additionalProperties: false
      type: object
      title: DisableObjectEntityTypeSelector
    PDFApproachMode:
      type: string
      enum:
        - enhanced
        - standard
        - auto
      title: PDFApproachMode
    OfficeChartApproach:
      type: string
      enum:
        - deidentify
        - set_to_zero
      title: OfficeChartApproach
      description: >-
        This Enum defines the options for chart deidentification in an Office
        file.
    OfficeUnsupportedImageApproach:
      type: string
      enum:
        - replace
        - keep
      title: OfficeUnsupportedImageApproach
      description: >-
        This Enum defines the options for INCOMPATIBLE image deidentification in
        an Office file.
    OfficeBinApproach:
      type: string
      enum:
        - convert
        - leave
      title: OfficeBinApproach
      description: >-
        This Enum defines the options for deidentification for Office files
        containing inner .bin files.
    MaskMode:
      type: string
      enum:
        - blur
        - blackbox
      title: MaskMode
    MarkerLanguage:
      type: string
      enum:
        - auto
        - nl
        - en
        - fr
        - de
        - hi
        - it
        - ja
        - ko
        - zh
        - pt
        - ru
        - es
        - tl
        - uk
      title: MarkerLanguage
    CoreferenceResolutionMode:
      type: string
      enum:
        - heuristics
        - combined
        - model_prediction
      title: CoreferenceResolutionMode
    SyntheticPIIAccuracyMode:
      type: string
      enum:
        - standard
        - standard_multilingual
        - standard_automatic
      title: SyntheticPIIAccuracyMode
    TextLocation:
      properties:
        stt_idx:
          type: integer
          title: Stt Idx
          description: Start character index of the entity in the original text.
        end_idx:
          type: integer
          title: End Idx
          description: >-
            Index of the character immediately following the entity, such that
            end_idx - stt_idx = number of characters in the entity.
        stt_idx_processed:
          type: integer
          title: Stt Idx Processed
          description: Start character index of the entity in the processed text.
        end_idx_processed:
          type: integer
          title: End Idx Processed
          description: >-
            Index of the character immediately following the entity in the
            processed text.
      type: object
      required:
        - stt_idx
        - end_idx
        - stt_idx_processed
        - end_idx_processed
      title: TextLocation
      description: Start and end indices of the entity in a text.
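      # Illustrative values, not part of the generated schema. They assume the
      # input text "Hi John" and a marker of the form [NAME_1], which would
      # give the processed text "Hi [NAME_1]". Both ranges are half-open:
      # end_idx - stt_idx = 4 = len("John") and
      # end_idx_processed - stt_idx_processed = 8 = len("[NAME_1]").
      examples:
        - stt_idx: 3
          end_idx: 7
          stt_idx_processed: 3
          end_idx_processed: 11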
    PaginatedTextLocation:
      properties:
        stt_idx:
          type: integer
          title: Stt Idx
          description: Start character index of the entity in the original text.
        end_idx:
          type: integer
          title: End Idx
          description: >-
            Index of the character immediately following the entity, such that
            end_idx - stt_idx = number of characters in the entity.
        stt_idx_processed:
          type: integer
          title: Stt Idx Processed
          description: Start character index of the entity in the processed text.
        end_idx_processed:
          type: integer
          title: End Idx Processed
          description: >-
            Index of the character immediately following the entity in the
            processed text.
        page:
          type: integer
          title: Page
          description: >-
            Page info for the entity. Only applies to files with paginated text.
            This includes Office and native PDF documents.
          default: 0
      type: object
      required:
        - stt_idx
        - end_idx
        - stt_idx_processed
        - end_idx_processed
      title: PaginatedTextLocation
      description: Start and end indices of the entity in a text.
    ImageLocation:
      properties:
        page:
          type: integer
          title: Page
          description: >-
            The page or layer that the entity occurs on. This corresponds to
            page in a PDF document or layer in a TIFF image.
        x0:
          type: number
          maximum: 1
          minimum: 0
          title: X0
          description: X coordinate of the upper left point of the entity bounding box.
        x1:
          type: number
          maximum: 1
          minimum: 0
          title: X1
          description: X coordinate of the lower right point of the entity bounding box.
        y0:
          type: number
          maximum: 1
          minimum: 0
          title: Y0
          description: Y coordinate of the upper left point of the entity bounding box.
        y1:
          type: number
          maximum: 1
          minimum: 0
          title: Y1
          description: Y coordinate of the lower right point of the entity bounding box.
      type: object
      required:
        - page
        - x0
        - x1
        - y0
        - y1
      title: ImageLocation
      description: >-
        Bounding box of the entity in an image or PDF file (PDF files are
        converted to images). The origin is the upper left pixel of the image.
        Coordinates are given as a fraction of the X and Y image dimensions.
    AudioLocation:
      properties:
        stt_time:
          type: number
          title: Stt Time
          description: The start timestamp of the entity, in seconds
        end_time:
          type: number
          title: End Time
          description: The end timestamp of the entity, in seconds
      type: object
      required:
        - stt_time
        - end_time
      title: AudioLocation
      description: Timestamp of the entity in an audio file such as `.wav`.
    ObjectEntityType:
      type: string
      enum:
        - FACE
        - LICENSE_PLATE
        - LOGO
        - SIGNATURE
      title: ObjectEntityType
    ObjectLocation:
      properties:
        page:
          type: integer
          title: Page
          description: >-
            The page or layer that the entity occurs on. This corresponds to
            page in a PDF document or layer in a TIFF image.
        x0:
          type: number
          maximum: 1
          minimum: 0
          title: X0
          description: X coordinate of the upper left point of the entity bounding box.
        x1:
          type: number
          maximum: 1
          minimum: 0
          title: X1
          description: X coordinate of the lower right point of the entity bounding box.
        y0:
          type: number
          maximum: 1
          minimum: 0
          title: Y0
          description: Y coordinate of the upper left point of the entity bounding box.
        y1:
          type: number
          maximum: 1
          minimum: 0
          title: Y1
          description: Y coordinate of the lower right point of the entity bounding box.
      type: object
      required:
        - page
        - x0
        - x1
        - y0
        - y1
      title: ObjectLocation
      description: >-
        Bounding box of the Object entity in an image or PDF file (PDF files are
        converted to images). The origin is the upper left pixel of the image.
        Coordinates are given as a fraction of the X and Y image dimensions.

````
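
The `pattern` descriptions for `AllowFilter` and `BlockFilter` in the schema above note that regex patterns may need escaping in JSON, and that an allow pattern is matched from the beginning of the entity text. The following is a minimal sketch of both points. It assumes the service's matching behaves like Python's `re.match`, which is how the description reads but is not confirmed here; the helper names `to_json_filter` and `matches_from_start` are hypothetical and not part of any Private AI client.

```python
import json
import re


def to_json_filter(filter_type: str, pattern: str, **extra: object) -> str:
    """Serialize a filter object; json.dumps doubles the backslashes."""
    re.compile(pattern)  # fail early (re.error) on an invalid pattern
    return json.dumps({"type": filter_type, "pattern": pattern, **extra})


def matches_from_start(pattern: str, entity_text: str) -> bool:
    """Assumed semantics: match at the beginning of the entity text."""
    return re.match(pattern, entity_text) is not None


# Block pattern from the description: every 4-character word.
block_json = to_json_filter("BLOCK", r"\b\w{4}\b", entity_type="CUSTOM_ID")
print(block_json)

# Allow pattern from the description, with and without the `$` anchor.
print(matches_from_start(r"^ID-[\d]{4}$", "ID-1234"))
print(matches_from_start(r"^ID-[\d]{4}$", "ID-12345"))
print(matches_from_start(r"^ID-[\d]{4}", "ID-12345"))
```

Building the JSON with a serializer rather than by hand avoids having to double the backslashes manually. Without the trailing `$`, the allow pattern also matches longer strings such as `ID-12345`, which is why the description recommends anchoring the end of the pattern.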