Tools for manipulating sequence clusters.
This is a collection of tools I needed at some point for manipulating sequence clusters. See the README for details. The tools included are:
filter - remove unwanted sequences from a clustering.
hist - produce a histogram of cluster sizes from a "label"-formatted clustering.
clusc - compare clusterings, calculating numerous pair-based and entropy-based indices.
add_single - add singletons to a clustering.
ace2contigs - parse an ACE assembly file, and output the contigs in a FASTA file.
ace2fasta - parse an ACE assembly, and output each assembly in a separate FASTA file.
ace2clusters - parse an ACE assembly, and output clusters in TGICL format.
clusterlibs - given a table of regular expressions and library names, along with a clustering (TGICL-format), output a table of cluster sizes per library.
xcerpt - extract sequences from a list of sequence labels.
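As a rough illustration of the kind of output hist produces, here is a minimal Haskell sketch of a cluster-size histogram. It is not the code of the hist tool: the Clustering type, the sizeHistogram function, and the sample labels are all hypothetical, and the sketch works on an in-memory list of clusters rather than parsing the "label" format.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical in-memory representation (an assumption, not the tool's own):
-- a clustering is a list of clusters, each cluster a list of sequence labels.
type Clustering = [[String]]

-- Map each cluster size to the number of clusters of that size.
sizeHistogram :: Clustering -> Map.Map Int Int
sizeHistogram clusters =
  Map.fromListWith (+) [(length cluster, 1) | cluster <- clusters]

main :: IO ()
main = do
  -- Made-up sample data: one cluster of three, one of two, two singletons.
  let clustering =
        [ ["seq1", "seq2", "seq3"]
        , ["seq4"]
        , ["seq5", "seq6"]
        , ["seq7"]
        ]
  -- Print one "size<TAB>count" line per distinct cluster size, ascending.
  mapM_ (\(size, count) -> putStrLn (show size ++ "\t" ++ show count))
        (Map.toList (sizeHistogram clustering))
```

Using a Map keeps the sizes sorted, so the histogram prints in ascending order of cluster size without a separate sort step.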
The Darcs repository is at: http://malde.org/~ketil/biohaskell/cluster_tools.