Bug#926056: poppler-utils: pdftohtml with -xml options generate corrupted xml file
Robert Paciorek
2019-03-31 00:40:02 UTC
Package: poppler-utils
Version: 0.71.0-3
Severity: important


pdftohtml with the -xml option puts incorrect characters (binary data?) in the "id" and "size" attributes of the <fontspec/> tag.

The bug appeared in version 0.71; 0.69.0 doesn't have this issue (but it has another problem with missing whitespace, which is not present in 0.71).

It looks like the bug is fixed in 0.74 (0.74 from Ubuntu works OK).
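
To check whether a given output file is affected, a rough sketch like the following may help. It is not part of poppler; the function name find_bad_fontspecs is made up for illustration. It assumes that healthy output writes "id" and "size" as plain decimal integers, which is an assumption based on typical pdftohtml output and not something stated above. It scans raw bytes with regular expressions because a corrupted file may not parse as XML at all.

```python
import re

# Matches one <fontspec ...> tag in raw bytes. The file is read as bytes
# because corrupted output may be neither valid XML nor valid UTF-8.
FONTSPEC_RE = re.compile(rb"<fontspec\b[^>]*>")


def find_bad_fontspecs(data: bytes):
    """Return (attribute_name, raw_tag) pairs for suspicious <fontspec> tags.

    Assumption: in healthy output, "id" and "size" are plain decimal
    integers. An attribute that is missing or holds anything else is flagged.
    """
    bad = []
    for match in FONTSPEC_RE.finditer(data):
        tag = match.group(0)
        for name in (b"id", b"size"):
            # Look for name="<digits>" inside this tag only.
            if not re.search(rb"\b" + name + rb'="\d+"', tag):
                bad.append((name.decode("ascii"), tag))
    return bad


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python check_fontspec.py output.xml
    with open(sys.argv[1], "rb") as fh:
        for attr, tag in find_bad_fontspecs(fh.read()):
            print(f"suspicious {attr!r} in {tag!r}")
```

An empty result only means that no <fontspec> tag failed this narrow check; it does not prove the rest of the file is well-formed.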

Best Regards,
Robert Paciorek

-- System Information:
Debian Release: buster/sid
APT prefers testing
APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-1-amd64 (SMP w/6 CPU cores)
Kernel taint flags: TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to C.UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to C.UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: unable to detect

Versions of packages poppler-utils depends on:
ii libc6 2.28-8
ii libcairo2 1.16.0-4
ii libfreetype6 2.9.1-3
ii liblcms2-2 2.9-3
ii libpoppler82 0.71.0-3
ii libstdc++6 8.3.0-2

poppler-utils recommends no packages.

poppler-utils suggests no packages.

-- no debconf information
Grégoire Sutre
2019-11-08 12:40:02 UTC

I was hit by the same bug. I fixed it by applying this upstream commit:


Note that the package that this commit fixes is in fact libpoppler82.