Neji
Flexible, easy and powerful framework for faster biomedical concept recognition.
Automatically extract dozens of heterogeneous biomedical concepts using the most appropriate and optimized techniques.

Build your processing pipeline.

Dozens of modules available to fit your needs.

Read your documents.

Raw and XML formats with custom tags, supporting PubMed and BioMed Central articles.

Process target data.

Modules for sentence splitting, tokenization, dependency parsing, concept recognition (dictionary and machine learning), and more.
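As a rough illustration of how such modules chain together, the sketch below runs a naive sentence splitter and a whitespace tokenizer in sequence. Every name in it (`PipelineSketch`, `Doc`, `splitSentences`, `tokenize`, `run`) is hypothetical and is not part of Neji's API; the real modules are considerably more sophisticated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch: none of these types are part of Neji's API.
class PipelineSketch {

    // A document carries its text plus whatever the modules have produced so far.
    static final class Doc {
        final String text;
        final List<String> sentences = new ArrayList<>();
        final List<List<String>> tokens = new ArrayList<>();
        Doc(String text) { this.text = text; }
    }

    // Naive sentence splitter: breaks after '.', '!' or '?' followed by whitespace.
    static Doc splitSentences(Doc doc) {
        for (String s : doc.text.split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) doc.sentences.add(s);
        }
        return doc;
    }

    // Naive tokenizer: whitespace split of each sentence found so far.
    static Doc tokenize(Doc doc) {
        for (String sentence : doc.sentences) {
            doc.tokens.add(List.of(sentence.split("\\s+")));
        }
        return doc;
    }

    // Runs the modules in order, each one working on the previous one's output.
    static Doc run(Doc doc, List<UnaryOperator<Doc>> modules) {
        Doc current = doc;
        for (UnaryOperator<Doc> module : modules) {
            current = module.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<UnaryOperator<Doc>> modules =
                List.of(PipelineSketch::splitSentences, PipelineSketch::tokenize);
        Doc doc = run(new Doc("BRCA1 is a gene. It is linked to cancer."), modules);
        System.out.println(doc.sentences.size() + " sentences, tokens: " + doc.tokens);
    }
}
```

The order of the list is the order of execution, so a module that needs tokens must come after the tokenizer.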

Get a concept tree.

Innovative concept tree with nested and intersecting annotations, each supporting multiple identifiers.
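One way to picture such a tree is sketched below: each annotation covers a character span, may carry several identifiers, holds the annotations nested inside it as children, and keeps links to annotations that only partially overlap it. The class and method names, and the identifiers such as `SRC1:0001`, are placeholders made up for this illustration and do not reflect Neji's actual classes. The sketch also assumes annotations are inserted longest span first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these types are illustrative, not Neji's classes.
class ConceptTreeSketch {

    // One annotation: a character span [start, end) with zero or more identifiers.
    static final class Annotation {
        final int start;
        final int end;
        final List<String> ids;
        final List<Annotation> children = new ArrayList<>();      // nested annotations
        final List<Annotation> intersections = new ArrayList<>(); // partial overlaps

        Annotation(int start, int end, String... ids) {
            this.start = start;
            this.end = end;
            this.ids = List.of(ids);
        }

        boolean contains(Annotation o) { return start <= o.start && o.end <= end; }
        boolean overlaps(Annotation o) { return start < o.end && o.start < end; }
    }

    // Inserts 'a' below 'node'. Assumes longest-span-first insertion order, so a
    // new annotation never has to adopt existing siblings as its children.
    static void insert(Annotation node, Annotation a) {
        for (Annotation child : node.children) {
            if (child.contains(a)) {       // fully inside a child: descend
                insert(child, a);
                return;
            }
        }
        for (Annotation child : node.children) {
            if (child.overlaps(a)) {       // partial overlap: link both ways
                child.intersections.add(a);
                a.intersections.add(child);
            }
        }
        node.children.add(a);
    }

    public static void main(String[] args) {
        Annotation sentence = new Annotation(0, 40);                   // root span
        Annotation wide = new Annotation(0, 20, "SRC1:0001", "SRC2:0042");
        Annotation crossing = new Annotation(15, 30, "SRC3:0007");
        Annotation inner = new Annotation(0, 6, "SRC1:0002");
        insert(sentence, wide);
        insert(sentence, crossing);
        insert(sentence, inner);
        System.out.println("top-level: " + sentence.children.size()
                + ", nested in wide: " + wide.children.size()
                + ", intersecting wide: " + wide.intersections.size());
    }
}
```

Keeping intersections as cross-links rather than forcing them into the hierarchy lets the structure remain a tree while still recording that two concepts share part of their text.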

Store your concepts.

Various known output formats: XML, A1, CoNLL, JSON, and Neji.

Easy, yet advanced.

Neji is ready to make complex biomedical concept recognition a simple routine task.

Easy to use.

Using the CLI tool, annotating millions of documents is as simple as running a bash command. Programmatically, developing custom pipelines is straightforward, taking advantage of helpers to deal with resources, batch processing and concept trees.

Flexible and scalable.

The architecture allows easy development of new modules and pipelines, and supports processing documents in parallel.
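The general idea of parallel document processing can be sketched with a fixed thread pool from the standard library, as below. This is only an illustration under assumptions: `ParallelBatchSketch` and `processAll` are hypothetical names, the "pipeline" is a stand-in function, and Neji's own batch helpers are not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: shows the idea only, not Neji's batch processing API.
class ParallelBatchSketch {

    // Submits one task per document and collects the results in input order.
    static <R> List<R> processAll(List<String> documents,
                                  Function<String, R> pipeline,
                                  int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (String doc : documents) {
                Callable<R> task = () -> pipeline.apply(doc);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get());   // blocks until that document is done
            }
            return results;
        } finally {
            pool.shutdown();                 // let worker threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = List.of("First document.", "Second one. Two sentences.");
        // Stand-in pipeline: counts the periods in each document.
        List<Long> counts = processAll(docs,
                d -> d.chars().filter(c -> c == '.').count(), 2);
        System.out.println(counts);
    }
}
```

Because documents are independent of one another, each can be handled by its own task; the pipeline function must be safe to call from several threads at once.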

Fast.

Neji annotates documents quickly, with speed depending on the complexity of the models and dictionaries used. In a typical use case, it can process up to 400 sentences per second.

For you and the community.

Use, change, distribute and contribute.

Open source.

You are free to use, change, and distribute the framework, and to contribute to its development. The Neji source code is available on GitHub.

License.

Attribution, non-commercial use, and share-alike (CC BY-NC-SA 3.0).

Documentation.

Complete Javadoc and tutorials are provided for both the CLI tool and the framework.