Description
Simple library for keyword extraction.
GNU libextractor is a library used to extract meta-data from files of arbitrary type. It is designed to use helper libraries to perform the actual extraction and to be trivially extended by linking against external extractors for additional file types.
The goal is to provide developers of file-sharing networks or
WWW-indexing bots with a universal library to obtain simple keywords
to match against queries. libextractor also provides a shell command,
extract, which, similar to the well-known file command, can extract
meta-data from a file and print the results to stdout.
Currently, libextractor supports the following formats: HTML, PDF,
PS, OLE2 (DOC, XLS, PPT), OpenOffice (sxw), StarOffice (sdw), DVI,
MAN, FLAC, MP3 (ID3v1 and ID3v2), NSF(E) (NES music), SID (C64
music), OGG, WAV, EXIV2, JPEG, GIF, PNG, TIFF, DEB, RPM, TAR(.GZ),
ZIP, ELF, S3M (Scream Tracker 3), XM (eXtended Module), IT (Impulse
Tracker), FLV, REAL, RIFF (AVI), MPEG, QT and ASF. Also, various
additional MIME types are detected.