# Processing CSV Files

> This guide gets you started with CSV de-identification.

Limina supports scanning Comma Separated Value (CSV) files for PII and creating de-identified or redacted copies. Limina’s supported entity types work across every file type, and localized variants of **PII** (Personally Identifiable Information), **PHI** (Protected Health Information), and **PCI** (Payment Card Industry) entities are detected. Our [Supported Languages](/languages) and [Supported Entity Types](/entities) pages provide a more detailed look.

## How CSV Files Are Processed

Similar to Excel files, CSV files are processed using the method described for Tabular Data in the [Structured Data Guide](/configuration-and-operations/working-with-files/structured-data). The output file retains its original format with labels in place of the detected PII.

## Constraints

<Info>
  Please consider writing a handler for your specific application using the [Structured Data Guide](/configuration-and-operations/working-with-files/structured-data) to get around any of the constraints listed below.
</Info>

* The file processing routes are synchronous, so files larger than 10 MB may take a long time to process.
* The data in the CSV file must be row-oriented (i.e., each row represents a separate record), and the headers must be on the first row.
* Files must adhere to the [CSV file standards](https://www.loc.gov/preservation/digital/formats/fdd/fdd000323.shtml).
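
The constraints above can be checked locally before a file is submitted. The following is a minimal sketch using only the Python standard library. The function name `preflight_csv`, the 10 MiB threshold, and the warning messages are assumptions made for illustration and are not part of the Limina API.

```python
import csv
import io

# Assumption: "10MB" is treated as 10 MiB here; adjust to your own tolerance.
MAX_COMFORTABLE_BYTES = 10 * 1024 * 1024


def preflight_csv(raw: bytes, encoding: str = "utf-8") -> list[str]:
    """Return a list of warnings; an empty list means no issue was found."""
    warnings = []
    if len(raw) > MAX_COMFORTABLE_BYTES:
        warnings.append("file exceeds 10 MB and may be slow to process synchronously")

    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError as exc:
        warnings.append(f"file could not be decoded as {encoding}: {exc}")
        return warnings

    # strict=True makes the csv module raise csv.Error on malformed quoting.
    reader = csv.reader(io.StringIO(text, newline=""), strict=True)
    try:
        rows = list(reader)
    except csv.Error as exc:
        warnings.append(f"file is not well-formed CSV: {exc}")
        return warnings

    if not rows:
        warnings.append("file is empty")
        return warnings

    header = rows[0]
    if any(not cell.strip() for cell in header):
        warnings.append("first row has empty cells, so it may not be a header row")

    # Row-oriented data should have the same number of fields in every record.
    for record_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            warnings.append(
                f"record {record_no} has {len(row)} fields, expected {len(header)}"
            )
            break
    return warnings
```

A file that passes these checks is not guaranteed to be accepted; the checks only catch the most common structural problems early.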

## Additional Options

CSV file processing also accepts a specialized pre-processing parameter called `csv_options`. This parameter enables a sampling mode that speeds up overall file processing. In sampling mode, a selection of cells within each column is examined, the entity type is predicted from that sample, and the entire column is redacted using the detected entity types. This can improve performance substantially when working with very large CSV files.
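
To make the idea of column sampling concrete, here is a toy sketch in plain Python. It is not Limina's implementation and does not use `csv_options`. The regex-based `toy_detect` function, the two patterns, the `sample_size` parameter, and the `[LABEL]` output format are all hypothetical stand-ins chosen for illustration.

```python
import random
import re
from collections import Counter

# Toy stand-in for a real entity detector; the patterns are illustrative only.
TOY_PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "PHONE_NUMBER": re.compile(r"^\+?[\d\s().-]{7,}$"),
}


def toy_detect(cell):
    """Return the first matching toy label for a cell, or None."""
    for label, pattern in TOY_PATTERNS.items():
        if pattern.match(cell):
            return label
    return None


def redact_by_sampling(rows, sample_size=3, seed=0):
    """rows is a list of equal-length lists whose first row is the header."""
    header, body = rows[0], rows[1:]
    rng = random.Random(seed)

    # Step 1: inspect only a sample of non-empty cells in each column.
    column_labels = []
    for col in range(len(header)):
        cells = [row[col] for row in body if row[col]]
        sample = rng.sample(cells, min(sample_size, len(cells)))
        votes = Counter(label for label in map(toy_detect, sample) if label)
        # Step 2: the most common label in the sample decides the column.
        column_labels.append(votes.most_common(1)[0][0] if votes else None)

    # Step 3: replace every cell of a labeled column, sampled or not.
    redacted = [header]
    for row in body:
        redacted.append(
            [f"[{label}]" if label else cell for cell, label in zip(row, column_labels)]
        )
    return redacted
```

The sketch shows where the speed-up comes from: detection runs on a few cells per column rather than on every cell. The trade-off is that a column mixing several kinds of content is treated as a single unit, so sampling suits files whose columns are homogeneous.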

## Support Matrix

|           | CPU Container | GPU Container | Community API | Professional API |
| --------- | ------------- | ------------- | ------------- | ---------------- |
| Supported | Yes           | Yes           | Up to 250 KiB | No               |

## Sample Request

<Info>
  [Connect with one of our privacy experts](https://getlimina.ai/contact-us/?utm_source=docs\&utm_medium=website) to run this code.
</Info>

<CodeGroup>
  ```json Request Body wrap lines theme={"theme":"poimandres"}
  {
    "file": {
      "data": "<file_content_base64>",
      "content_type": "text/csv"
    },
    "entity_detection": {
      "return_entity": true
    }
  }
  ```

  ```shell curl wrap lines theme={"theme":"poimandres"}
  echo '{
          "file": {
            "data": "'$(base64 -w 0 sample.csv)'",
            "content_type": "text/csv"
          },
          "entity_detection": {"return_entity": true}
        }' \
  | curl --request POST --url 'https://api.private-ai.com/community/v4/process/files/base64' \
         -H 'Content-Type: application/json' \
         -H 'x-api-key: <YOUR KEY HERE>' \
         -d @- \
         | jq -r .processed_file \
         | base64 -d > 'sample.redacted.csv'
  ```

  ```python python wrap lines theme={"theme":"poimandres"}
  import requests
  import base64

  file_url = "https://paidocumentation.blob.core.windows.net/$web/sample.csv"
  filename_out = "/path/to/output/sample.redacted.csv"
  file_content = requests.get(file_url).content
  file_content_base64 = base64.b64encode(file_content).decode()

  headers = {"Content-Type": "application/json", "x-api-key": "<INSERT API KEY>"}

  url = "https://api.private-ai.com/community/v4/process/files/base64"

  payload = {
    "file":{
      "data": file_content_base64,
      "content_type": "text/csv",
    },
    "entity_detection": {
      "return_entity": True
    }
  }

  response = requests.post(url, json=payload, headers=headers)
  response.raise_for_status()  # fail early on HTTP errors instead of a missing key below
  with open(filename_out, "wb") as f:
      f.write(base64.b64decode(response.json()["processed_file"]))
  ```

  ```python Python Client wrap lines theme={"theme":"poimandres"}
  from privateai_client import PAIClient
  from privateai_client.objects import request_objects
  import base64

  filename_in = "sample.csv"
  filename_out = "sample.redacted.csv"

  file_type = "text/csv"
  client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

  with open(filename_in, "rb") as b64_file:
      file_data = base64.b64encode(b64_file.read())
      file_data = file_data.decode("ascii")

  file_obj = request_objects.file_obj(data=file_data, content_type=file_type)
  request_obj = request_objects.file_base64_obj(file=file_obj)
  resp = client.process_files_base64(request_object=request_obj)

  with open(filename_out, 'wb') as redacted_file:
      processed_file = resp.processed_file.encode("ascii")
      processed_file = base64.b64decode(processed_file, validate=True)
      redacted_file.write(processed_file)
  ```
</CodeGroup>

## Sample Response

```json Response wrap lines theme={"theme":"poimandres"}
{
  "processed_file": "Base64 Encoded File Content of the Redacted File",
  "processed_text": "string",
  "entities": "List[Entity]",
  "entities_present": true,
  "languages_detected": {"lang_1": 0.67, "lang_2": 0.74}
}
```
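
The `processed_file` field holds the redacted CSV as a base64 string, so it has to be decoded before use. The sketch below shows one way to do that on an already-parsed response. The helper name `decode_processed_file` and the contents of `fake_response` (including the placeholder label in the redacted row) are made up for illustration and are not real API output.

```python
import base64


def decode_processed_file(response_json: dict) -> bytes:
    """Decode the base64 `processed_file` field into raw file bytes."""
    encoded = response_json["processed_file"]
    # validate=True raises an error on characters outside the base64 alphabet.
    return base64.b64decode(encoded, validate=True)


# Hypothetical response used only to exercise the helper.
fake_response = {
    "processed_file": base64.b64encode(b"name,email\nAda,[EMAIL_ADDRESS_1]\n").decode("ascii"),
    "entities_present": True,
}

redacted_bytes = decode_processed_file(fake_response)
if fake_response["entities_present"]:
    print(redacted_bytes.decode("utf-8"))
```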
