===
The comprof-cmd-src package contains the following files:

README               This file.
Makefile             Type "make" to build "comprof".
comprof.c            The main program.
mem.c mem.h          Functions for accounting memory usage.
context.c context.h  Functions for handling the finite-context models.
seqs.c seqs.h        Functions for reading sequences and handling alphabets.
defs.h               Definition of some data types.
scI.fa.gz            S. cerevisiae S288c chromosome I sequence, used as an
                     example. It is in gzipped FASTA format.
sp3.fa.gz            S. pombe 972h- chromosome III sequence, used as an
                     example. It is in gzipped FASTA format.

===
The comprof program has the following interface:

comprof [ -h (help) ]
        [ -o outFile ]
        [ -wig chrom ]
        [ -v (print more info) ]
        [ -g gamma (def 0.990) ]
        [ -k order n/d ]
        [ -k order n/d ]
        ...
        [ -ic (do not update inverted complements) ]
        [ -ms modelSeq ]
        [ -mr start:end (model range) ]
        [ -tr start:end (target range) ]
        [ -or start:end (output range) ]
        [ -pow (standard lib pow) ]
        [ -1 (omit index col) ]
        [ -m ms/md/min/lr/rl (def min) ]
        [ -s S (def 1; sub-sampling 1:S) ]
        [ -w M (def 0 or 5*S if -s present; window size 2M+1) ]
        [ -hm (Hamming, def) ]
        [ -hn (Hann) ]
        [ -bk (Blackman) ]
        [ -rc (Rectangular) ]
        [ -p (gnuplot) ]
        targetSeq

-h
    Displays additional information.

-o outFile
    If present, the information profile is written to the file "outFile".

-wig chrom
    The output file is written in wiggle format (accepted, for example,
    by the UCSC Genome Browser custom tracks and by the Integrative
    Genomics Viewer of the Broad Institute). "chrom" should be the
    chromosome name that is recognized by the browser.

-v
    Verbose. Some additional information is displayed.

-g gamma (def 0.990)
    The value of gamma (for further details, see the paper) used for
    mixing the probabilities estimated by the finite-context models.

-k order n/d
    The order of a finite-context model, optionally followed by the
    alpha of the probability estimator (for further details, see the
    paper) in the form of a fraction, where "n" is the numerator and
    "d" is the denominator.

-ic
    By default, the corresponding inverted complements are updated in
    the finite-context models. This flag disables that behaviour.

-ms modelSeq
    The model sequence, when required.

-mr start:end
    Range of bases to be considered in the model sequence.

-tr start:end
    Range of bases to be considered in the target sequence.

-or start:end
    Range of bases (in the target sequence) for which an information
    profile is output.

-pow
    Use the standard library power function instead of a (faster)
    approximation.

-1
    Do not write the index column in the output file.

-m ms/md/min/lr/rl (def min)
    The operation mode of the program. For more details, run
    "comprof -h".

-s S (def 1)
    Sub-sampling factor (1:S) of the information profile.

-w M (def 0 or 5*S if -s present)
    The window size (2M+1) for low-pass filtering (smoothing) of the
    information profile. If -w is omitted and the flag -s is present,
    M is calculated as M = 5*S, where S is the sub-sampling factor
    described above; otherwise, the default value is 0.

-hm (def)
    Use a Hamming window for low-pass filtering.

-hn
    Use a Hann window for low-pass filtering.

-bk
    Use a Blackman window for low-pass filtering.

-rc
    Use a rectangular window for low-pass filtering.

-p
    Display the output information profile using "gnuplot".

targetSeq
    The DNA sequence to be analyzed (FASTA and gzipped FASTA files are
    accepted).

===
Example 1, using the supplied file sp3.fa.gz and the default parameters,
and making use of the gnuplot program for displaying the information
profile.

shell prompt> comprof -o sp3.inf sp3.fa.gz
comprof version 1.000
The target sequence has a total of 2452883 bases
Needed 0.02 s for reading the sequence(s)
Creating 64 pModels (tSize: 3, delta = 1/1)
Creating 16384 pModels (tSize: 7, delta = 1/1)
Creating 4194304 pModels (tSize: 11, delta = 1/1)
Creating 268435456 pModels (tSize: 14, delta = 1/20)
Run mode: SELF_MIN
Target sequence range: 1:2452883
Output sequence range: 1:2452883
Current memory allocated by (m/c/re)alloc: 225.77 MiB
Reseting cModels... ...done
Current memory allocated by (m/c/re)alloc: 140.67 MiB
Needed 7.22 s for generating the profile
Using a window size of 491 bp for low-pass filtering
Total cpu time used: 7.24 s
Total memory allocated by (m/c/re)alloc: 235.13 MiB
shell prompt> gnuplot -p -e 'plot "sp3.inf" w l'
shell prompt>

===
Example 2, using the supplied file scI.fa.gz and the default parameters,
for generating a WIG file that is compatible with the UCSC Genome Browser
and the Integrative Genomics Viewer.

shell prompt> comprof -wig chrI -o scI.wig scI.fa.gz
comprof version 1.000
The target sequence has a total of 230218 bases
Needed 0.01 s for reading the sequence(s)
Creating 64 pModels (tSize: 3, delta = 1/1)
Creating 16384 pModels (tSize: 7, delta = 1/1)
Creating 4194304 pModels (tSize: 11, delta = 1/1)
Creating 268435456 pModels (tSize: 14, delta = 1/20)
Run mode: SELF_MIN
Target sequence range: 1:230218
Output sequence range: 1:230218
Current memory allocated by (m/c/re)alloc: 138.52 MiB
Reseting cModels... ...done
Current memory allocated by (m/c/re)alloc: 130.28 MiB
Needed 0.68 s for generating the profile
Using a window size of 41 bp for low-pass filtering
Total cpu time used: 0.69 s
Total memory allocated by (m/c/re)alloc: 139.40 MiB
shell prompt>

===
Example 3, using the supplied file sp3.fa.gz and custom parameters, and
making use of the gnuplot program for displaying the information profile.

shell prompt> comprof -k 2 -k 4 -k 6 -k 8 -k 10 -k 12 -k 14 1/20 -k 16 1/20 -s 100 -o sp3.inf sp3.fa.gz
comprof version 1.000
The target sequence has a total of 2452883 bases
Needed 0.03 s for reading the sequence(s)
Creating 16 pModels (tSize: 2, delta = 1/1)
Creating 256 pModels (tSize: 4, delta = 1/1)
Creating 4096 pModels (tSize: 6, delta = 1/1)
Creating 65536 pModels (tSize: 8, delta = 1/1)
Creating 1048576 pModels (tSize: 10, delta = 1/1)
Creating 16777216 pModels (tSize: 12, delta = 1/1)
Creating 268435456 pModels (tSize: 14, delta = 1/20)
Creating 4294967296 pModels (tSize: 16, delta = 1/20)
Run mode: SELF_MIN
Target sequence range: 1:2452883
Output sequence range: 1:2452883
Current memory allocated by (m/c/re)alloc: 479.07 MiB
Reseting cModels... ...done
Current memory allocated by (m/c/re)alloc: 340.45 MiB
Needed 15.8 s for generating the profile
Using a window size of 1001 bp for low-pass filtering
Total cpu time used: 15.83 s
Total memory allocated by (m/c/re)alloc: 488.43 MiB
shell prompt> gnuplot -p -e 'plot "sp3.inf" w l'
shell prompt>
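
===
A note on the numbers in the logs above. The pModel counts reported for
each "-k order" are consistent with 4^order (one entry per context over
the four-letter DNA alphabet), and the 1001 bp window of Example 3 is
consistent with the documented rule 2M+1 with M = 5*S for "-s 100". The
following shell sketch reproduces that arithmetic. The function names
num_contexts and window_size are hypothetical helpers written for this
note; they are not part of comprof, and the 4^order reading is inferred
from the logs rather than taken from the source code.

```shell
#!/bin/sh
# Illustrative helpers only; these are NOT part of the comprof package.

# Number of contexts of an order-k model over a 4-symbol alphabet: 4^k.
# (Assumption: this is what the "Creating N pModels" log lines count.)
num_contexts() {
    order=$1
    n=1
    i=0
    while [ "$i" -lt "$order" ]; do
        n=$(( n * 4 ))
        i=$(( i + 1 ))
    done
    echo "$n"
}

# Smoothing window size 2M+1. If M is not given as a second argument,
# it falls back to M = 5*S, as the -w description states for the case
# where only -s S is present.
window_size() {
    s=$1
    m=${2:-$(( 5 * s ))}
    echo $(( 2 * m + 1 ))
}

# Orders used in the examples above.
for k in 3 7 11 14 16; do
    printf 'order %2d -> %s contexts\n' "$k" "$(num_contexts "$k")"
done

# Example 3 uses -s 100 without -w.
printf 'S=100 -> window of %s bp\n' "$(window_size 100)"
```

For orders 3, 7, 11, 14 and 16 this prints 64, 16384, 4194304, 268435456
and 4294967296, the same counts as in the logs, and a window of 1001 bp
for S=100. Since the count grows by a factor of 4 per order, it can help
to check it before adding a high-order model with -k.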