pdf2djvu (1) - Linux Manuals

NAME

pdf2djvu - creates DjVu files from PDF files

SYNOPSIS

pdf2djvu [{-o | --output} output-djvu-file] [option...] pdf-file...
pdf2djvu {-i | --indirect} index-djvu-file [option...] pdf-file...
pdf2djvu {--version | --help | -h}

DESCRIPTION

This program creates a DjVu file from one or more Portable Document Format files.

OPTIONS

pdf2djvu accepts the following options:

Document type, file names

-o, --output=output-djvu-file

: Generate a bundled multi-page document. Write the file into output-djvu-file instead of standard output.

-i, --indirect=index-djvu-file

: Generate an indirect multi-page document. Use index-djvu-file as the index file name; put the component files into the same directory. The directory must exist and be writable.

--pageid-template=template

: Specifies the naming scheme for page identifiers. Consult the "TEMPLATE LANGUAGE" section for the template language description.

The default template is "p{page:04*}.djvu".

For portability reasons, page identifiers:

: • must consist only of lowercase ASCII letters, digits, _, +, - and dot,

: • cannot start with a +, - or a dot,

: • cannot contain two consecutive dots,

: • must end with the .djvu or the .djv extension.
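The portability rules above can be checked mechanically. The following is a minimal sketch, not part of pdf2djvu; the function name `is_portable_page_id` is hypothetical, and each check maps to one bullet of the list.

```python
import re

# Allowed alphabet: lowercase ASCII letters, digits, "_", "+", "-" and dot.
_ALLOWED = re.compile(r"[a-z0-9_+.-]+")


def is_portable_page_id(page_id: str) -> bool:
    """Return True if page_id satisfies the four portability rules."""
    if not _ALLOWED.fullmatch(page_id):
        return False  # empty, or contains a disallowed character
    if page_id[0] in "+-.":
        return False  # must not start with "+", "-" or a dot
    if ".." in page_id:
        return False  # must not contain two consecutive dots
    return page_id.endswith((".djvu", ".djv"))
```

For instance, an identifier produced by the default template, such as `p0001.djvu`, passes all four checks.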

--pageid-prefix=prefix

: Equivalent to "--pageid-template=prefix{page:04*}.djvu".

--page-title-template=template

: Specifies the template for page titles. Consult the "TEMPLATE LANGUAGE" section for the template language description.
The default is to set no page titles.

Resolution, page size

-d, --dpi=resolution

: Sets the desired resolution, in dots per inch. The default is 300 dpi. The allowed range is 72 ≤ resolution ≤ 6000.

--media-box

: Use MediaBox to determine page size. CropBox is used by default.

--page-size=widthxheight

: Sets the preferred page size to width × height pixels. The actual page size may be altered in order to respect the aspect ratio and DjVu limitations on resolution. This option takes precedence over -d/--dpi.

--guess-dpi

: Try to guess native resolution by inspecting embedded images. Use with care.

Image quality

--bg-slices=n+...+n, --bg-slices=n,...,n

: Specifies the encoding quality of the IW44 background layer. This option is similar to the -slice option of c44. Consult the c44(1) manual page for details. The default is 72+11+10+10.

--bg-subsample=n

: Specifies the background subsampling ratio. The default is 3. Valid values are integers between 1 and 12, inclusive.

--fg-colors=default

: Try to preserve all the foreground layer colors. This is the default.

--fg-colors=web

: Reduce foreground layer colors to the web palette (216 colors). This option is not recommended.

--fg-colors=n

: Use GraphicsMagick to reduce the number of distinct colors in the foreground layer to n. Valid values are integers between 1 and 4080. This option is not recommended.

--fg-colors=black

: Discard any color information from the foreground layer.

--monochrome

: Render pages as monochrome bitmaps. With this option, --bg-... and --fg-... options are not respected.

--loss-level=n

: Specifies the aggressiveness of the lossy compression. The default is 0 (lossless). Valid values are integers between 0 and 200, inclusive. This option is similar to the -losslevel option of cjb2; consult the cjb2(1) manual page for details. This option is respected only along with the --monochrome option.

--lossy

: Synonym for --loss-level=100.

--anti-alias

: Enable font and vector anti-aliasing. This option is not recommended.

Extraction

--no-metadata

: Don't extract the metadata.

By default:

: • The following entries of the document information dictionary are extracted: Title, Author, Subject, Creator, Producer, CreationDate, ModDate. Timestamps are formatted according to RFC 3339[1], with date and time components separated by a single space.

: • The XMP metadata is extracted (or created) and updated accordingly.

: Note
If multiple input documents are specified, only metadata of the first one is taken into account.
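As an illustration of the timestamp format described above (RFC 3339 with a single space between date and time), here is a minimal Python sketch. The function name `format_timestamp` is hypothetical, and details such as how pdf2djvu writes the UTC offset are assumptions rather than documented behaviour.

```python
from datetime import datetime, timedelta, timezone


def format_timestamp(dt: datetime) -> str:
    """Format a timezone-aware datetime in RFC 3339 style, using a space
    instead of "T" between the date and time components."""
    return dt.isoformat(sep=" ", timespec="seconds")


# Example: 17 May 2019, 13:45:00 at UTC+02:00.
stamp = format_timestamp(
    datetime(2019, 5, 17, 13, 45, 0, tzinfo=timezone(timedelta(hours=2)))
)
print(stamp)  # 2019-05-17 13:45:00+02:00
```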

--verbatim-metadata

: Keep the original metadata intact.

--no-outline

: Don't extract the document outline.

--hyperlinks=border-avis

: Make hyperlink borders always visible.
By default, a hyperlink border is visible only when the mouse is over the hyperlink.

--hyperlinks=#RRGGBB

: Force the specified border color for hyperlinks.

--no-hyperlinks, --hyperlinks=none

: Don't extract hyperlinks.

--no-text

: Don't extract the text.

--words

: Extract the text. Record the location of every word. This is the default.

--lines

: Extract the text. Record the location of every line, rather than every word.

--crop-text

: Extract no text outside the page boundary.

--no-nfkc

: Don't NFKC[2]-normalize the text.

--filter-text=command-line

: Filter the text through the command-line. The provided filter must preserve whitespace, control characters and decimal digits.
This option implies --no-nfkc.

-p, --pages=page-range

: Specifies pages to convert. page-range is a comma-separated list of sub-ranges. Each sub-range is either a single page (e.g. 17) or a contiguous range of pages (e.g. 37-42). Pages are numbered from 1.
The default is to convert all pages.
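The page-range syntax can be illustrated with a small parser. This is a sketch only: the function name `parse_page_range` is hypothetical, and rejecting reversed sub-ranges (such as 42-37) is an assumption, since the text above does not say how pdf2djvu treats them.

```python
from typing import List


def parse_page_range(spec: str) -> List[int]:
    """Expand a comma-separated page-range string into a list of page numbers."""
    pages: List[int] = []
    for part in spec.split(","):
        first, sep, last = part.partition("-")
        low = int(first)
        high = int(last) if sep else low  # a single page is a range of length 1
        if low < 1 or high < low:  # pages are numbered from 1
            raise ValueError(f"invalid sub-range: {part!r}")
        pages.extend(range(low, high + 1))
    return pages
```

With this sketch, `parse_page_range("17,37-42")` yields pages 17 and 37 through 42.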

Performance

-j, --jobs=n

: Use n threads to perform conversion. The default is to use one thread.

-j0, --jobs=0

: Determine automatically how many threads to use to perform conversion.

Verbosity, help

-v, --verbose

: Display more informational messages while converting the file.

-q, --quiet

: Don't display informational messages while converting the file.

--version

: Output version information and exit.

-h, --help

: Display help and exit.
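To show how the options above combine, here is a hedged Python sketch that assembles an argument list for a bundled conversion. The helper `build_command` and its parameter names are hypothetical; only the flags themselves (-o, --dpi, --pages, --jobs) come from this manual.

```python
from typing import List, Optional


def build_command(
    pdf_files: List[str],
    output: str,
    dpi: int = 300,
    pages: Optional[str] = None,
    jobs: Optional[int] = None,
) -> List[str]:
    """Build a pdf2djvu argument list for a bundled multi-page document."""
    if not 72 <= dpi <= 6000:  # range documented for -d/--dpi
        raise ValueError("dpi must be between 72 and 6000")
    argv = ["pdf2djvu", "-o", output, f"--dpi={dpi}"]
    if pages is not None:
        argv.append(f"--pages={pages}")
    if jobs is not None:
        argv.append(f"--jobs={jobs}")  # 0 lets pdf2djvu pick the thread count
    argv.extend(pdf_files)
    return argv
```

If pdf2djvu is installed, the resulting list could be passed to `subprocess.run(argv, check=True)`.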

ENVIRONMENT

The following environment variables affect pdf2djvu on Unix systems:

OMP_*

: Details of runtime behaviour with respect to parallelism can be controlled by several environment variables. Please refer to the OpenMP API specification[3] for details.

TMPDIR

: pdf2djvu makes heavy use of temporary files. It will store them in a directory specified by this variable. The default is /tmp.

TEMPLATE LANGUAGE

Template syntax

The template language is roughly modelled on the Python string formatting syntax[4].

A template is a piece of text which contains fields, surrounded by curly braces {}. Fields are replaced with appropriately formatted values when the template is evaluated. Moreover, {{ is replaced with a single { and }} is replaced with a single }.

Field syntax

Each field consists of a variable name, optionally followed by a shift, optionally followed by a format specification.

The shift is a signed (i.e. starting with a + or - character) integer.

The format specification consists of a colon, followed by a width specification.

The width specification is a decimal integer defining the minimum field width. If not specified, then the field width will be determined by the content. Preceding the width specification with a zero (0) character enables zero-padding.

The width specification is optionally followed by an asterisk (*) character, which increases the minimum field width to the width of the longest possible content of the variable.

Available variables

page, spage

: Page number in the PDF document.

dpage

: Page number in the DjVu document.
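The template rules above can be approximated with a short Python sketch. It is not pdf2djvu's implementation: the function `render` and its `max_values` argument are hypothetical, and two points are assumptions, namely that the asterisk widens the field to the width of the largest value the variable can take (for example, the total page count) and that the shift is applied before that width is computed.

```python
import re

# Matches "{{", "}}", or a field: name, optional signed shift, and an
# optional ":" + optional "0" + optional width + optional "*".
_FIELD = re.compile(r"\{\{|\}\}|\{(\w+)([+-]\d+)?(?::(0?)(\d*)(\*?))?\}")


def render(template, values, max_values):
    """Evaluate template using values; max_values gives, per variable,
    the largest value it can take (used only by the "*" modifier)."""

    def replace(match):
        token = match.group(0)
        if token == "{{":
            return "{"
        if token == "}}":
            return "}"
        name, shift, zero, width, star = match.groups()
        offset = int(shift or 0)
        min_width = int(width or 0)
        if star:
            # Assumption: widen to the longest possible (shifted) value.
            min_width = max(min_width, len(str(max_values[name] + offset)))
        text = str(values[name] + offset)
        return text.zfill(min_width) if zero else text.rjust(min_width)

    return _FIELD.sub(replace, template)
```

Under these assumptions, the default page identifier template `p{page:04*}.djvu` renders page 7 of a 123-page document as `p0007.djvu`, and the field grows beyond four digits only when the document has more than 9999 pages.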

IMPLEMENTATION DETAILS

Layer separation algorithm

Unless the --monochrome option is on, pdf2djvu uses the following naïve layer separation algorithm:

1. For each page, do the following:

: 1. Raster the page into a pixmap, in the usual manner.

: 2. Raster the page into another pixmap, omitting the following page elements:

: • text,

: • 1 bit-per-pixel raster images,

: • vector elements (except fills of large areas).

: 3. Compare both pixmaps, pixel by pixel:

: 1. If their colors match, classify the pixel as a part of the background layer.

: 2. Otherwise, classify the pixel as a part of the foreground layer.
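The comparison step of the algorithm above can be sketched in Python as follows. This is an illustration only: the rasterization itself is out of scope, so both pixmaps are taken as inputs, and the function name `separate_layers` and the list-of-rows pixmap representation are assumptions.

```python
def separate_layers(full, stripped):
    """Compare two equally sized pixmaps pixel by pixel.

    full     -- the page rasterized in the usual manner
    stripped -- the page rasterized without text, 1 bit-per-pixel images
                and most vector elements
    Returns a mask of the same shape: True marks a foreground pixel,
    False marks a background pixel.
    """
    if len(full) != len(stripped) or any(
        len(a) != len(b) for a, b in zip(full, stripped)
    ):
        raise ValueError("pixmaps must have identical dimensions")
    # Matching colors -> background (False); differing colors -> foreground (True).
    return [
        [pixel_full != pixel_stripped
         for pixel_full, pixel_stripped in zip(row_full, row_stripped)]
        for row_full, row_stripped in zip(full, stripped)
    ]
```

Pixels may be any comparable values, for example `(r, g, b)` tuples.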

BUG REPORTS

If you find a bug in pdf2djvu, please report it at the issue tracker[5].

AUTHOR

Jakub Wilk <jwilk [at] jwilk.net>

: Author.

NOTES

1.

RFC 3339

: https://www.ietf.org/rfc/rfc3339

2.

NFKC

: http://unicode.org/reports/tr15/

3.

OpenMP API specification

: http://openmp.org/wp/openmp-specifications/

4.

Python string formatting syntax

: https://docs.python.org/library/string.html#format-string-syntax

5.

the issue tracker

: https://bitbucket.org/jwilk/pdf2djvu/issues