README

README: An introduction to ChupaText, a text extraction utility

Name

ChupaText

Author

Nobuyoshi Nakada <nakada@clear-code.com>
Kouhei Sutou <kou@clear-code.com>

License

Source: LGPLv2.1 or later. (detail: license/lgpl-2.1.txt)
Document: Triple license: LGPL, GFDL and/or CC.
- LGPL: v2.1 or later. (detail: license/lgpl-2.1.txt)
- GFDL: v1.3 or later. (detail: license/gfdl-1.3.txt)
- CC: BY-SA
Exceptions:
- modules/excel/: GPLv2. (detail: license/gpl-2.txt) These files are included in Gnumeric.
- ...

What's this?

ChupaText is a text extraction utility. It can extract text and metadata from PDF and office documents. You can use it as a library, from the command line, or as a Web service.

Dependent libraries and software

Required:

GLib >= 2.24
libgsf

Optional:

Poppler
wv
libgoffice
Gnumeric
LibreOffice, OpenOffice.org or unoconv
Ruby >= 1.9.2
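
The GLib version requirement above can be checked before building. The following is a minimal sketch, assuming that pkg-config is installed and that GLib is registered under the pkg-config module name glib-2.0. The helper names glib_check_command and glib_is_new_enough are hypothetical and are not part of ChupaText.

```python
import shutil
import subprocess


def glib_check_command(min_version="2.24"):
    """Build the pkg-config invocation that tests the GLib version."""
    # Assumption: GLib is registered with pkg-config as "glib-2.0".
    return ["pkg-config", f"--atleast-version={min_version}", "glib-2.0"]


def glib_is_new_enough(min_version="2.24"):
    """Return True or False, or None when pkg-config is not available."""
    if shutil.which("pkg-config") is None:
        return None
    # pkg-config exits with status 0 when the installed version is new enough.
    result = subprocess.run(
        glib_check_command(min_version),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

A result of None means only that the check could not be run, not that GLib is missing.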

Get

tar.gz: <URL:http://rubyforge.org/frs/?group_id=8073>

Repository

The ChupaText repository is hosted on GitHub.

% git clone git://github.com/ranguba/chupatext.git

Install

See install.

Usage

% chupatext [OPTION ...] FILE ...

FILE is a file that you want to extract text from.

See chupatext for more details.
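
The command line above can also be driven from a script. The following is a minimal sketch, assuming that chupatext is on PATH and that it writes the extracted text to standard output (an assumption, not confirmed here). No options are passed, because this README does not list any. The helper names build_command and extract_text are hypothetical.

```python
import shutil
import subprocess


def build_command(files, executable="chupatext"):
    """Return the argument list for `chupatext FILE ...` with no options."""
    if not files:
        raise ValueError("at least one FILE is required")
    return [executable, *files]


def extract_text(files, executable="chupatext"):
    """Run chupatext on the given files and return what it prints."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} was not found on PATH")
    # Assumption: the extracted text arrives on standard output.
    result = subprocess.run(
        build_command(files, executable),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, extract_text(["sample.pdf"]) would run chupatext on a hypothetical file named sample.pdf and return its output as a string.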

Thanks

Yuto Hayamizu
