FALCON is an alignment-free, unsupervised system that ranks the sequences of a database by their similarity to a set of reads, reporting the top matches. This machine learning system can be used, for example, to classify metagenomic samples. The core of the method is the relative algorithmic entropy, a notion that relies on model freezing and on information taken exclusively from a reference, which allows it to use far fewer computational resources. Moreover, it uses a variable number of threads without multiplying the memory for each thread, so it runs efficiently on anything from a powerful server to a common laptop. To measure similarity, the system builds multiple finite-context (Markovian) models over the reference sequence and freezes them once the end of the reference is reached. The target reads are then measured using a mixture of the frozen models. The mixture estimates probabilities by weighting each model according to its performance, which lets it adapt the usage of the models to the nature of the target sequence. Furthermore, it uses fault-tolerant Markovian models (tolerant to substitution edits) that bridge the gap between context sizes. Several running modes are available for different hardware and speed requirements. The system automatically learns to measure similarity, a property characteristic of the Artificial Intelligence field.
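
The freeze-then-measure idea described above can be illustrated with a minimal Python sketch. It is not FALCON's implementation: the names `FrozenContextModel` and `relative_similarity`, the model orders, the smoothing parameter `alpha`, and the weight-update rule with forgetting factor `gamma` are all assumptions chosen for illustration. The substitution-tolerant models and multi-threading are omitted, and the sketch assumes sequences contain only A, C, G and T.

```python
import math

ALPHABET = "ACGT"


class FrozenContextModel:
    """Order-k finite-context model whose counts come from the reference only."""

    def __init__(self, order, alpha=1.0):
        self.order = order
        self.alpha = alpha      # additive smoothing (assumed value)
        self.counts = {}        # context string -> per-symbol counts

    def train(self, reference):
        k = self.order
        for i in range(k, len(reference)):
            row = self.counts.setdefault(reference[i - k:i], [0, 0, 0, 0])
            row[ALPHABET.index(reference[i])] += 1

    def prob(self, context, symbol):
        # Read-only lookup: the model is frozen, so the target never updates it.
        row = self.counts.get(context, (0, 0, 0, 0))
        total = sum(row) + len(ALPHABET) * self.alpha
        return (row[ALPHABET.index(symbol)] + self.alpha) / total


def relative_similarity(reference, target, orders=(2, 4, 8), gamma=0.9):
    """Average bits per base needed for target given reference, divided by 2."""
    models = [FrozenContextModel(k) for k in orders]
    for m in models:
        m.train(reference)

    start = max(orders)
    if len(target) <= start:
        raise ValueError("target must be longer than the largest model order")

    weights = [1.0 / len(models)] * len(models)
    bits = 0.0
    for i in range(start, len(target)):
        probs = [m.prob(target[i - m.order:i], target[i]) for m in models]
        bits -= math.log2(sum(w * p for w, p in zip(weights, probs)))
        # Reward models that predicted well; gamma < 1 forgets old performance.
        weights = [(w ** gamma) * p for w, p in zip(weights, probs)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return bits / ((len(target) - start) * math.log2(len(ALPHABET)))


# Toy data: a read copied from the reference versus an unrelated read.
reference = "ACGTTGCAAGGCTTAACCGGTATCGA" * 20
similar = relative_similarity(reference, reference[5:85])
dissimilar = relative_similarity(reference, "A" * 80)
print(similar, dissimilar)
```

In this sketch, lower values mean the frozen reference models predict the target well, so the two sequences share more information. Because only the mixture weights change while a target is processed, the same frozen models could in principle be shared read-only by several workers, which is consistent with the memory behaviour described above.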


A paper describing FALCON has been submitted. Until it is published, please cite the URL: bioinformatics.ua.pt/software/falcon.