This guide shows how to use the ner/text endpoint introduced in 3.9 to return entities in text and describes an approach to do the same in files.
Detect entities in text (new in 3.9)
The ner/text route introduced in 3.9 returns a list of detected entities. It can be thought of as a cut-down version of process/text that only returns the list of detected entities, with a key difference described in the next section. In this snippet we use our Python SDK to invoke the ner/text route on a short sentence and return a list of detected entities:
Text NER
The detected entities are returned in the entities field:
NER Response
Process vs Detect Entities
There is a key difference between the entities returned by the process/text route and by ner/text: process/text groups overlapping entity detections into a single entity object, while ner/text does not. This is evident from the previous example, where John Smith is detected as three different entities: John Smith, John and Smith. The corresponding process/text entity list is:
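To make the grouping idea concrete, here is a minimal sketch that collapses overlapping detections into a single group by comparing character spans. It is only an illustration, not the logic process/text actually uses. The `text`, `label`, `start` and `end` field names, the label values and the `group_overlapping` helper are all assumptions made for this example and may not match the real response schema.

```python
from typing import Dict, List


def group_overlapping(entities: List[Dict]) -> List[Dict]:
    """Merge detections whose character spans overlap into one group.

    Hypothetical helper: field names are assumed, not taken from the API.
    """
    grouped: List[Dict] = []
    # Sort by start offset; for equal starts, put the longest span first.
    for ent in sorted(entities, key=lambda e: (e["start"], -e["end"])):
        if grouped and ent["start"] < grouped[-1]["end"]:
            # Overlaps the current group: widen it and record the member.
            grouped[-1]["end"] = max(grouped[-1]["end"], ent["end"])
            grouped[-1]["members"].append(ent["text"])
        else:
            grouped.append(
                {"start": ent["start"], "end": ent["end"], "members": [ent["text"]]}
            )
    return grouped


# Raw, ungrouped detections in the style of the John Smith example.
raw = [
    {"text": "John Smith", "label": "NAME", "start": 0, "end": 10},
    {"text": "John", "label": "NAME_GIVEN", "start": 0, "end": 4},
    {"text": "Smith", "label": "NAME_FAMILY", "start": 5, "end": 10},
]
groups = group_overlapping(raw)
print(len(raw), "raw detections,", len(groups), "grouped entity")
```

The three raw detections share the same stretch of text, so they end up in one group, which mirrors the single entity object that process/text reports for John Smith.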
Process Text Json Object
ner/text provides the raw output of the entity detection engine and is recommended if details about all entities discovered in a text fragment, including overlapping ones, are required. With the ner/text route you can answer questions like "Does this text contain zip codes?" or "Does it contain a complete address?" This extra flexibility means that you should be ready to implement your own post-processing logic.
Use the process/text route if non-overlapping logical entities are required, e.g. to count the number of detected entities.
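The kind of post-processing this implies can be quite small. The sketch below answers the two example questions from a list of raw detections. The `label` field name, the label values (`LOCATION_ADDRESS`, `LOCATION_CITY`, `LOCATION_ZIP`) and the rule for what counts as a "complete address" are assumptions chosen for illustration; check the API Reference for the actual schema and label set.

```python
from typing import Dict, List, Set

# Assumed rule: an address is "complete" when all of these labels appear.
COMPLETE_ADDRESS_LABELS = {"LOCATION_ADDRESS", "LOCATION_CITY", "LOCATION_ZIP"}


def labels_present(entities: List[Dict]) -> Set[str]:
    """Collect the distinct labels found in a list of detections."""
    return {ent["label"] for ent in entities}


def contains_zip_code(entities: List[Dict]) -> bool:
    """True if at least one detection carries the (assumed) zip code label."""
    return "LOCATION_ZIP" in labels_present(entities)


def contains_complete_address(entities: List[Dict]) -> bool:
    """True if every label required by the assumed rule was detected."""
    return COMPLETE_ADDRESS_LABELS <= labels_present(entities)


# Hypothetical raw detections for a sentence containing a mailing address.
detections = [
    {"text": "10 Main St", "label": "LOCATION_ADDRESS"},
    {"text": "Toronto", "label": "LOCATION_CITY"},
    {"text": "M5V 2T6", "label": "LOCATION_ZIP"},
]
print(contains_zip_code(detections), contains_complete_address(detections))
```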
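Counting is straightforward once the entities are grouped, because each logical entity appears exactly once. The sketch below tallies grouped entities per label. The `best_label` field name and the sample entities are assumptions for this example, not output copied from the API.

```python
from collections import Counter
from typing import Dict, List


def count_by_label(entities: List[Dict]) -> Counter:
    """Tally grouped entities by their (assumed) best_label field."""
    return Counter(ent["best_label"] for ent in entities)


# Hypothetical grouped entities, one object per logical entity.
grouped_entities = [
    {"text": "John Smith", "best_label": "NAME"},
    {"text": "Toronto", "best_label": "LOCATION_CITY"},
    {"text": "Jane Doe", "best_label": "NAME"},
]
counts = count_by_label(grouped_entities)
print(sum(counts.values()), "entities:", dict(counts))
```

Running the same tally on raw, overlapping detections would count John Smith, John and Smith separately, which is why grouped output is the better fit for counting.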
Detect entities in files
While the ner/text route only supports text at this time, it is still possible to achieve similar behaviour for files, with the caveat mentioned in the previous section: only grouped entities are accessible for files.
In this snippet we use our Python SDK to process a file as base64.
Processing File Via Base64 Route
Next, take the .entities object from the API response and add it to a dictionary with the original file path set in the path key. In this case we create one dictionary that maps the file to its entities, but to process an entire directory of files you can build a list where each element is a dictionary of this form, or emit the dictionary to a datastore of your choosing.
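The directory case can be sketched as follows. The API call itself is replaced by `fake_process_file`, a hypothetical stand-in that returns a canned response, so the example runs without a deployment; in practice you would pass a function that sends the base64 payload through the SDK. The `entities` dictionary key, the dict-shaped response and the helper names are assumptions for this sketch.

```python
import base64
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def encode_file(path: Path) -> str:
    """Read a file and return its content as a base64 string."""
    return base64.b64encode(path.read_bytes()).decode("ascii")


def fake_process_file(encoded_file: str) -> Dict:
    """Hypothetical stand-in for the real API call; returns a canned response."""
    content = base64.b64decode(encoded_file).decode("utf-8")
    found = "John Smith" in content
    return {"entities": [{"text": "John Smith", "best_label": "NAME"}] if found else []}


def entities_for_file(path: Path, process_file: Callable[[str], Dict]) -> Dict:
    """Map one file to its entities, keeping the original path in 'path'."""
    response = process_file(encode_file(path))
    return {"path": str(path), "entities": response["entities"]}


def entities_for_directory(directory: Path, process_file: Callable[[str], Dict]) -> List[Dict]:
    """Build one dictionary per regular file in the directory, sorted by name."""
    return [
        entities_for_file(p, process_file)
        for p in sorted(directory.iterdir())
        if p.is_file()
    ]


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.txt").write_text("Hello John Smith", encoding="utf-8")
    (folder / "b.txt").write_text("Nothing to see here", encoding="utf-8")
    records = entities_for_directory(folder, fake_process_file)

for record in records:
    print(Path(record["path"]).name, len(record["entities"]))
```

Passing the processing function as an argument keeps the bookkeeping separate from the API call, so the same loop can write each dictionary to a datastore instead of collecting a list.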
File NER
File NER JSON Response
Wrap Up
Getting a list of entities contained in a text input or in a file is equally simple. The key in this guide is to access the entities field in the response. It’s that simple 😀. See the API Reference to learn more about the other response fields like processed_text and processed_file.