Raw and XML formats with custom tags, supporting PubMed and BioMed Central articles.
Process target data.
Modules for sentence splitting, tokenization, dependency parsing, concept recognition (dictionary and machine learning), and more.
Get concept tree.
Innovative concept tree with nested and intersecting annotations, each supporting multiple identifiers.
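To make the idea of nested and intersecting annotations concrete, here is a minimal Python sketch of such a tree. It is an illustration only, not Neji's actual data structure or API: the `Annotation` class, the `build_tree` helper and the identifiers (`PROT:0001`, `DISO:0001`, ...) are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    start: int                 # inclusive character offset
    end: int                   # exclusive character offset
    text: str
    ids: List[str]             # one annotation may carry several identifiers
    children: List["Annotation"] = field(default_factory=list)

    def contains(self, other: "Annotation") -> bool:
        """True if `other` lies entirely within this span (nested)."""
        return self.start <= other.start and other.end <= self.end

    def intersects(self, other: "Annotation") -> bool:
        """True if the spans overlap partially, without nesting."""
        overlap = self.start < other.end and other.start < self.end
        return overlap and not self.contains(other) and not other.contains(self)


def build_tree(annotations: List[Annotation]) -> Annotation:
    """Attach each annotation under the smallest annotation that contains it."""
    root = Annotation(0, max((a.end for a in annotations), default=0), "", [])
    # Earlier and longer spans first, so containers are placed before their contents.
    for ann in sorted(annotations, key=lambda a: (a.start, -(a.end - a.start))):
        node = root
        while True:
            parent = next((c for c in node.children if c.contains(ann)), None)
            if parent is None:
                break
            node = parent
        node.children.append(ann)
    return root


sentence = "breast cancer type 1 protein"
whole = Annotation(0, 28, sentence[0:28], ["PROT:0001", "PROT:0002"])  # placeholder ids
disease = Annotation(0, 13, sentence[0:13], ["DISO:0001"])             # "breast cancer"
partial = Annotation(7, 20, sentence[7:20], ["MISC:0001"])             # "cancer type 1"

root = build_tree([partial, disease, whole])
```

In this sketch the two shorter annotations are nested under the full-sentence annotation, and because they overlap only partially they end up as intersecting siblings rather than parent and child.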
Store your concepts.
Various known output formats:
Easy, yet advanced.
Neji is ready to make complex biomedical concept recognition a simple routine task.
Easy to use.
Using the CLI tool, annotating millions of documents is as simple as running a bash command. Programmatically, developing custom pipelines is straightforward, taking advantage of helpers to deal with resources, batch processing and concept trees.
Flexible and scalable.
The architecture makes it easy to develop new modules and pipelines, and supports parallel document processing.
Neji annotates documents quickly, with speed depending on the complexity of the models and dictionaries used. In a typical use case, it can process up to 400 sentences/second.
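As a rough back-of-the-envelope illustration of what that rate means for a large corpus, the following Python sketch estimates processing time. The corpus size and the average of 10 sentences per document are assumptions chosen for the example, not measurements, and since 400 sentences/second is quoted as an upper bound, the result is a best-case figure for a single pipeline.

```python
def estimated_hours(num_documents: int,
                    sentences_per_document: float,
                    sentences_per_second: float = 400.0) -> float:
    """Best-case processing time in hours at a fixed sentence throughput."""
    total_sentences = num_documents * sentences_per_document
    return total_sentences / sentences_per_second / 3600


# Assumed corpus: 1,000,000 documents with about 10 sentences each.
hours = estimated_hours(1_000_000, 10)
print(f"{hours:.1f} hours")
```

With these assumed numbers, 10,000,000 sentences at 400 sentences/second take 25,000 seconds, or a little under 7 hours.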
For you and the community.
Use, change, distribute and contribute.
You are free to use, change and distribute the framework, and to contribute to its development. The Neji source code is available on GitHub.