Bug#774422: perl: please make perl builds reproducible
Jérémy Bobbio
2015-01-02 13:50:02 UTC
Source: perl
Version: 5.20.1-4
Severity: wishlist
Tags: patch
User: reproducible-***@lists.alioth.debian.org
Usertags: timestamps fileordering

Hi!

While working on the “reproducible builds” effort [1], we have noticed
that perl could not be built reproducibly.

The attached patches will fix that with our current experimental
framework. I hope the description of each patch is enough to understand
their purpose.

[1]: https://wiki.debian.org/ReproducibleBuilds
--
Lunar .''`.
***@debian.org : :Ⓐ : # apt-get install anarchism
`. `'`
`-
Niko Tyni
2015-01-09 19:50:01 UTC
Post by Jérémy Bobbio
Source: perl
Version: 5.20.1-4
Severity: wishlist
Tags: patch
Usertags: timestamps fileordering
The attached patches will fix that with our current experimental
framework. I hope the description of each patch is enough to understand
their purpose.
Thanks, this is awesome! I only had a quick look so just a couple of
notes and questions for now.
Post by Jérémy Bobbio
Subject: [PATCH] Fix mtimes before building binary packages
To enable perl to build reproducibly, mtimes of any files created
after the date of the latest debian/changelog entry will be changed to
that date.
Is this because of the date header in manpages? Setting the POD_MAN_DATE
environment variable could/should suffice for that, I think. See
debian/patches/fixes/pod_man_reproducible_date.diff
Post by Jérémy Bobbio
Subject: [PATCH] Stop recording build date and time
In order to make the package build reproducibly, we remove the
recording of the build date and time. This was already optional
in case the __DATE__ C pre-processor macro was not available.
I expect this needs to be made configurable for upstream to accept
it. Also, it might be safer to replace __DATE__ and __TIME__ with
some placeholders rather than dropping them, at least until this is
upstreamed. There might well be some crazy things parsing 'perl -V'
output or something like that which could choke if the lines are left
out altogether.
Post by Jérémy Bobbio
Subject: [PATCH] Create libperl.a using deterministic mode
In order to make Perl builds reproducible, create libperl.a using ar
in deterministic mode.
This patch was duplicated in your mail: first
Post by Jérémy Bobbio
- $(AR) rcu $(LIBPERL) $(obj) $(DYNALOADER)
+ $(AR) Drcu $(LIBPERL) $(obj) $(DYNALOADER)
and later
Post by Jérémy Bobbio
- $(AR) rcu $(LIBPERL) $(obj) $(DYNALOADER)
+ $(AR) Drc $(LIBPERL) $(obj) $(DYNALOADER)
I assume the first is the correct one.
Post by Jérémy Bobbio
Subject: [PATCH] Set mtime of patchlevel.h to highest mtime of Debian patches
$patchlevel_date in perlbug is determined by looking at patchlevel.h mtime.
In order to make Perl builds reproducible, we thus set this value to
the highest mtime of all the Debian patches.
---
debian/gen-patchlevel | 10 ++++++++++
Not sure the 'touch' part belongs in gen-patchlevel, which currently
just prints to STDOUT. But I can see it would be nice to pick up the
mtime while reading the patches anyway. I wonder if we could/should
use the changelog date instead, though. The whole thing of writing
$patchlevel_date into perlbug to see how old this perl is feels weird...
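
For reference, the "highest mtime" idea in that patch description can be sketched in a few lines of shell. This is not the code from the patch: `newest_mtime` and `sync_mtime_to_newest` are hypothetical helper names, and the sketch assumes GNU stat and touch.

```shell
# Print the newest mtime (seconds since the epoch) among the given files.
newest_mtime() {
    stat -c %Y "$@" | sort -n | tail -n 1
}

# Set the mtime of the target ($1) to the newest mtime of the remaining files.
sync_mtime_to_newest() {
    target=$1
    shift
    touch -d "@$(newest_mtime "$@")" "$target"
}

# Illustrative call (the glob is an assumption about the patch layout):
#   sync_mtime_to_newest patchlevel.h debian/patches/*/*.diff
```

Using the changelog date instead would only change where the reference timestamp comes from; the touch step stays the same.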
--
Niko Tyni ***@debian.org
--
To UNSUBSCRIBE, email to debian-bugs-dist-***@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact ***@lists.debian.org
Jérémy Bobbio
2015-01-09 20:10:01 UTC
Post by Niko Tyni
Post by Jérémy Bobbio
Subject: [PATCH] Fix mtimes before building binary packages
To enable perl to build reproducibly, mtimes of any files created
after the date of the latest debian/changelog entry will be changed to
that date.
Is this because of the date header in manpages? Setting the POD_MAN_DATE
environment variable could/should suffice for that, I think. See
debian/patches/fixes/pod_man_reproducible_date.diff
This is needed to have reproducible mtimes in data.tar and control.tar.
This is done right before calling dpkg-source.
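
A rough sketch of that clamping step, not the patch itself: any file newer than a reference date gets its mtime set back to that date. It assumes GNU find, xargs and touch; `clamp_mtimes` and the `debian/tmp` path in the usage comment are illustrative names only.

```shell
# Set the mtime of every file under $1 that is newer than the date $2
# back to exactly that date. Older files are left untouched.
clamp_mtimes() {
    dir=$1
    ref_date=$2
    find "$dir" -newermt "$ref_date" -print0 |
        xargs -0r touch --no-dereference --date="$ref_date"
}

# Illustrative call, taking the date of the latest changelog entry:
#   clamp_mtimes debian/tmp "$(dpkg-parsechangelog -SDate)"
```

Files that already predate the reference date keep their original mtimes, so only timestamps created during the build are normalized.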
Post by Niko Tyni
Post by Jérémy Bobbio
Subject: [PATCH] Stop recording build date and time
In order to make the package build reproducibly, we remove the
recording of the build date and time. This was already optional
in case the __DATE__ C pre-processor macro was not available.
I expect this needs to be made configurable for upstream to accept
it. Also, it might be safer to replace __DATE__ and __TIME__ with
some placeholders rather than dropping them, at least until this is
upstreamed. There might well be some crazy things parsing 'perl -V'
output or something like that which could choke if the lines are left
out altogether.
I went ahead with removing the values because there were already
#ifdefs. But maybe the value of cf_time should be passed through `-D` or
something similar. I'm not sure what the best way is.
Post by Niko Tyni
Post by Jérémy Bobbio
Subject: [PATCH] Create libperl.a using deterministic mode
In order to make Perl builds reproducible, create libperl.a using ar
in deterministic mode.
This patch was duplicated in your mail: first
Post by Jérémy Bobbio
- $(AR) rcu $(LIBPERL) $(obj) $(DYNALOADER)
+ $(AR) Drcu $(LIBPERL) $(obj) $(DYNALOADER)
and later
Post by Jérémy Bobbio
- $(AR) rcu $(LIBPERL) $(obj) $(DYNALOADER)
+ $(AR) Drc $(LIBPERL) $(obj) $(DYNALOADER)
Oops. The latter is the one to pick. 'D' is incompatible with 'u'.
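
To illustrate why 'u' does not fit here: in deterministic mode the archive stores zeroed timestamps, so "update only if newer" has nothing meaningful to compare against. A minimal sketch, assuming GNU ar from binutils; `make_deterministic_archive` is a hypothetical helper, not part of the perl Makefile.

```shell
# Build a static archive whose bytes do not depend on the members'
# mtimes or owners (assumes GNU ar from binutils).
make_deterministic_archive() {
    out=$1
    shift
    rm -f "$out"          # always rebuild from scratch instead of using 'u'
    ar Drc "$out" "$@"    # D: deterministic mode, r: insert, c: create quietly
}

# Illustrative call:
#   make_deterministic_archive libperl.a *.o
```

Building the same members twice, even after touching them in between, should then give byte-identical archives.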
Post by Niko Tyni
Post by Jérémy Bobbio
Subject: [PATCH] Set mtime of patchlevel.h to highest mtime of Debian patches
$patchlevel_date in perlbug is determined by looking at patchlevel.h mtime.
In order to make Perl builds reproducible, we thus set this value to
the highest mtime of all the Debian patches.
---
debian/gen-patchlevel | 10 ++++++++++
Not sure the 'touch' part belongs in gen-patchlevel, which currently
just prints to STDOUT. But I can see it would be nice to pick up the
mtime while reading the patches anyway. I wonder if we could/should
use the changelog date instead, though. The whole thing of writing
$patchlevel_date into perlbug to see how old this perl is feels weird...
I believe this is a matter of taste. :)

Thanks for having a look,
--
Lunar .''`.
***@debian.org : :Ⓐ : # apt-get install anarchism
`. `'`
`-
Niko Tyni
2015-01-22 20:20:03 UTC
Post by Jérémy Bobbio
Post by Niko Tyni
Post by Jérémy Bobbio
Subject: [PATCH] Fix mtimes before building binary packages
Is this because of the date header in manpages? Setting the POD_MAN_DATE
environment variable could/should suffice for that, I think. See
debian/patches/fixes/pod_man_reproducible_date.diff
This is needed to have reproducible mtimes in data.tar and control.tar.
This is done right before calling dpkg-source.
Ah, right. Sorry about that.

A few more notes:

- the build system also embeds information about the build host, at
least the kernel version and hostname. Those need to be stripped too.
From 'perl -V':

osname=linux, osvers=3.16.0-4-amd64, archname=x86_64-linux-gnu-thread-multi
uname='linux estella 3.16.0-4-amd64 #1 smp debian 3.16.7-ckt2-1 (2014-12-08) x86_64 gnulinux '

I assume varying uname et al. isn't actively tested yet?

- I would expect some of the generated manual pages to embed the build
date, at least for patched modules like Net::SMTP. Are builds from
different days compared currently and/or are you setting POD_MAN_DATE
externally? (see #759405)

- I don't think 0003-Allow-cf_time-to-be-set-externally is needed,
as config.over can override cf_time without it AFAICS.
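
On the last point, a config.over fragment along these lines might be enough. This is only a sketch: `fixed_cf_time` is a made-up helper, and taking the reference date from SOURCE_DATE_EPOCH (falling back to epoch 0) is an assumption, not something the packaging is known to do. It assumes GNU date.

```shell
# Hypothetical helper: format an epoch in the default C-locale `date`
# layout, independent of the build host's locale and timezone.
fixed_cf_time() {
    LC_ALL=C date -u -d "@$1" '+%a %b %e %H:%M:%S UTC %Y'
}

# Assumption: the reference date comes from SOURCE_DATE_EPOCH, else epoch 0.
cf_time=$(fixed_cf_time "${SOURCE_DATE_EPOCH:-0}")
```

Because the value no longer depends on when the build runs, two builds of the same source would record the same cf_time.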

Sorry I'm a bit slow with this... :)
--
Niko
Jérémy Bobbio
2015-05-04 12:40:02 UTC
Hi!

Here's an update after rebasing my patches on 5.20.2-4.
Post by Niko Tyni
- the build system also embeds information about the build host, at
least the kernel version and hostname. Those need to be stripped too.
osname=linux, osvers=3.16.0-4-amd64, archname=x86_64-linux-gnu-thread-multi
uname='linux estella 3.16.0-4-amd64 #1 smp debian 3.16.7-ckt2-1 (2014-12-08) x86_64 gnulinux '
I assume varying uname et al. isn't actively tested yet?
We do now test it by calling `linux64 --uname-2.6`. It will make the
version look like 2.6.56-4. And indeed, this is an issue.

The kernel version shows in Config.pm (`osvers`), Config_heavy.pl
(`osvers`).

The full uname is shown in Config_heavy.pl (in a comment, and in
`myuname`), in CORE/config.h (in a comment, in `OSVERS`), and in the
binaries.
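
A simple way to locate such leaks in a build tree is to search it for the literal kernel release or hostname. The sketch below assumes GNU grep; `find_leaks` and the `debian/tmp` path are illustrative names.

```shell
# List files under directory $1 that contain the literal string $2,
# for example the build host's kernel release or hostname.
find_leaks() {
    dir=$1
    needle=$2
    grep -rlF -- "$needle" "$dir" | sort
}

# Illustrative calls:
#   find_leaks debian/tmp "$(uname -r)"
#   find_leaks debian/tmp "$(hostname)"
```

With -l, grep reports matching binary files by name as well, so the same call covers both the text files and the compiled binaries.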

I'm not sure what's the best answer here. Always use 2.6.42? As in
Debian we can't really know which version of the kernel the package is
going to be used with, it should stay compatible with older kernels as
much as possible.


Another issue that surfaced now that we are doing timezone variations is
that LOCALTIME_MIN and LOCALTIME_MAX get different values depending on
the value of the TZ environment variable.

This shows in CORE/config.h, in Config_heavy.pl, and in the binaries.

If I read it right, `sLOCALTIME_min` and `sLOCALTIME_max` can be
overloaded from `Configure`.

The minimum I had on my amd64 system is with TZ=UTC-24, -62167305600.
The maximum is with TZ=UTC and is 67768036191590399.

It feels like a bug to have something that can be configured through an
environment variable on a running system affect what gets encoded in the
binary.
--
Lunar .''`.
***@debian.org : :Ⓐ : # apt-get install anarchism
`. `'`
`-
Dominic Hargreaves
2015-06-01 22:00:02 UTC
Post by Jérémy Bobbio
Hi!
Here's an update after rebasing my patches on 5.20.2-4.
Post by Niko Tyni
- the build system also embeds information about the build host, at
least the kernel version and hostname. Those need to be stripped too.
osname=linux, osvers=3.16.0-4-amd64, archname=x86_64-linux-gnu-thread-multi
uname='linux estella 3.16.0-4-amd64 #1 smp debian 3.16.7-ckt2-1 (2014-12-08) x86_64 gnulinux '
I assume varying uname et al. isn't actively tested yet?
We do now test it by calling `linux64 --uname-2.6`. It will make the
version look like 2.6.56-4. And indeed, this is an issue.
The kernel version shows in Config.pm (`osvers`), Config_heavy.pl
(`osvers`).
The full uname is shown in Config_heavy.pl (in a comment, and in
`myuname`), in CORE/config.h (in a comment, in `OSVERS`), and in the
binaries.
I'm not sure what's the best answer here. Always use 2.6.42? As in
Debian we can't really know which version of the kernel the package is
going to be used with, it should stay compatible with older kernels as
much as possible.
Another issue that surfaced now that we are doing timezone variations is
that LOCALTIME_MIN and LOCALTIME_MAX gets different values depending on
the value of the TZ environment variable.
This shows in CORE/conf.h, in Config_heavy.pl, and in the binaries.
If I read it right, `sLOCALTIME_min` and `sLOCALTIME_max` can be
overloaded from `Configure`.
The minimum I had on my amd64 system is with TZ=UTC-24, -62167305600.
The maximum is with TZ=UTC and is 67768036191590399.
It feels like a bug to have something that can be configured through an
environment variable on a running system affect what gets encoded in the
binary.
Hello,

Thanks for the update! I noticed that you didn't include your
rebased patches as attachments, however.

We've now uploaded perl 5.22.0~rc2-2 to experimental, and that will
be a good base on which to forward patches upstream, so if you were
able to do one more rebasing that'd be excellent.

Cheers,
Dominic.
Niko Tyni
2015-07-03 20:30:02 UTC
clone 774422 -1
retitle -1 perl: build timezone affects LOCALTIME_{MIN,MAX}
severity -1 normal
thanks
Post by Jérémy Bobbio
Here's an update after rebasing my patches on 5.20.2-4.
Thanks. I had a look at this and will try to get a reproducible 5.22
package into experimental soonish. It looks like the only thing that
needs upstream source changes (as opposed to configuration) is the
__DATE__/__TIME__ stuff. I understand the 'ar D' patch isn't necessary
anymore since binutils was changed.

I'll discuss at least the __DATE__ part upstream, but I think disabling
it at this phase should be good enough.
Post by Jérémy Bobbio
Post by Niko Tyni
I assume varying uname et al. isn't actively tested yet?
We do now test it by calling `linux64 --uname-2.6`. It will make the
version look like 2.6.56-4. And indeed, this is an issue.
I'm not sure what's the best answer here. Always use 2.6.42? As in
Debian we can't really know which version of the kernel the package is
going to be used with, it should stay compatible with older kernels as
much as possible.
It gets worse when we take kfreebsd and hurd into account too, but
maybe we shouldn't care about those at this point.

I suspect the uname (stored as $Config{myuname}) doesn't matter much:
codesearch.debian.net only finds libcrypt-openssl-x509-perl using it
(and even that should probably use $^O instead, which gives the runtime
OS name instead of the build time one.)

As for osvers, which has many more hits, I think it should be good enough
to hardcode a version that approximates a ~current Debian stable kernel.

My current candidate for an override in config.debian is this monstrosity:

myhostname=localhost
case "$osname" in
    linux)
        osvers=3.16.0
        osdesc="#1 smp debian $osvers"
        os=gnulinux
        ;;
    gnu)
        osvers=0.6
        osdesc="gnu-mach"
        os=gnu
        ;;
    gnukfreebsd)
        osvers=9.0
        osdesc="#0"
        os=gnukfreebsd
        ;;
esac
if [ -n "$osdesc" ]; then
    machine_uname=$(uname -m | tr '[A-Z]' '[a-z]' | sed -e "s,['/],,g")
    myuname="$osname $myhostname $osvers $osdesc $machine_uname $os "
fi

which probably is too much work for little gain.

Not sure if "leaking" uname -m output is appropriate, but making
that constant between architectures doesn't feel right either.
Post by Jérémy Bobbio
Another issue that surfaced now that we are doing timezone variations is
that LOCALTIME_MIN and LOCALTIME_MAX gets different values depending on
the value of the TZ environment variable.
The minimum I had on my amd64 system is with TZ=UTC-24, -62167305600.
The maximum is with TZ=UTC and is 67768036191590399.
It feels like a bug to have something that can be configured through an
environment variable on a running system affect what gets encoded in the
binary.
This feels like a bug to me too, and should be handled separately.
I'm cloning this and will export TZ=UTC in debian/rules, at least
for now.
--
Niko Tyni ***@debian.org
Dominic Hargreaves
2015-08-18 15:30:03 UTC
Post by Niko Tyni
clone 774422 -1
retitle -1 perl: build timezone affects LOCALTIME_{MIN,MAX}
severity -1 normal
thanks
Post by Jérémy Bobbio
Here's an update after rebasing my patches on 5.20.2-4.
Thanks. I had a look at this and will try to get a reproducible 5.22
package into experimental soonish. It looks like the only thing that
needs upstream source changes (as opposed to configuration) is the
__DATE__/__TIME__ stuff. I understand the 'ar D' patch isn't necessary
anymore since binutils was changed.
I'll discuss at least the __DATE__ part upstream, but I think disabling
it at this phase should be good enough.
Just to provide an update on this: the branch dom/reproducible_builds
(with the heavy lifting all done by Niko) is ready to close this bug
with passing tests, but yesterday and today we got some feedback from
upstream about preferring the other patch[1].

Niko, do you have any preferences? I guess the main difference is
that the first version keeps on saying 'Compiled at' even when it's
not actually the compile date. This is probably acceptable for now, though.

Cheers,
Dominic.

[1] <https://rt.perl.org/Public/Bug/Display.html?id=125830#txn-1361236>
Niko Tyni
2019-10-23 19:50:02 UTC
Control: found -1 5.30.0-8
Post by Niko Tyni
Post by Jérémy Bobbio
Another issue that surfaced now that we are doing timezone variations is
that LOCALTIME_MIN and LOCALTIME_MAX gets different values depending on
the value of the TZ environment variable.
The minimum I had on my amd64 system is with TZ=UTC-24, -62167305600.
The maximum is with TZ=UTC and is 67768036191590399.
It feels like a bug to have something that can be configured through an
environment variable on a running system affect what gets encoded in the
binary.
This feels like a bug to me too, and should be handled separately.
I'm cloning this and will export TZ=UTC in debian/rules, at least
for now.
The TZ=UTC part was accidentally dropped in the build system debhelper
conversion for 5.30 packaging. This resulted in a reproducibility
regression that Holger pointed out to me on IRC (thanks!).

I'll re-instate TZ=UTC in 5.30.0-9 or so, but clearly the underlying
issue remains.
--
Niko Tyni ***@debian.org
Guillem Jover
2019-10-28 11:40:01 UTC
Hi!
Post by Niko Tyni
Post by Niko Tyni
Post by Jérémy Bobbio
Another issue that surfaced now that we are doing timezone variations is
that LOCALTIME_MIN and LOCALTIME_MAX gets different values depending on
the value of the TZ environment variable.
The minimum I had on my amd64 system is with TZ=UTC-24, -62167305600.
The maximum is with TZ=UTC and is 67768036191590399.
It feels like a bug to have something that can be configured through an
environment variable on a running system affect what gets encoded in the
binary.
This feels like a bug to me too, and should be handled separately.
I'm cloning this and will export TZ=UTC in debian/rules, at least
for now.
The TZ=UTC part was accidentally dropped in the build system debhelper
conversion for 5.30 packaging. This resulted in a reproducibility
regression that Holger pointed out to me on IRC (thanks!).
I'll re-instate TZ=UTC in 5.30.0-9 or so, but clearly the underlying
issue remains.
Just noticed this change from the changelog. :) UTC is not really a
proper timezone specification, the format requires an offset, so here
it would be UTC0 (see «man timezone»).
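
As a small illustration of the fully specified form: the sketch below runs a command with TZ set to the POSIX string UTC0 (name "UTC", offset 0), so the result does not depend on a zone file named "UTC" being installed. `with_utc` is a hypothetical wrapper name; the date invocation in the comment assumes GNU date.

```shell
# Run a command with a fixed, fully specified POSIX timezone.
with_utc() {
    env TZ=UTC0 "$@"
}

# Illustrative call:
#   with_utc date -d @0 '+%Y-%m-%d %H:%M:%S %z'
```

In debian/rules the equivalent would presumably be a plain `export TZ=UTC0` rather than a wrapper function.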

Thanks,
Guillem
Niko Tyni
2019-10-29 19:10:01 UTC
Post by Guillem Jover
Post by Niko Tyni
I'll re-instate TZ=UTC in 5.30.0-9 or so, but clearly the underlying
issue remains.
Just noticed this change from the changelog. :) UTC is not really a
proper timezone specification, the format requires an offset, so here
it would be UTC0 (see «man timezone»).
Oh! Thanks for the note. This is probably a very common misconception.
I think the reproducible builds docs have advised setting TZ=UTC in
the past, and I see https://reproducible-builds.org/docs/timezones/
mentions it currently.

Also, codesearch.debian.net reports 95 packages matching TZ=UTC
but only two match TZ=UTC[0-9]. Time for a mass bug filing? :)
--
Niko
Dominic Hargreaves
2020-11-14 17:40:02 UTC
Post by Niko Tyni
Control: found -1 5.30.0-8
Post by Niko Tyni
Post by Jérémy Bobbio
Another issue that surfaced now that we are doing timezone variations is
that LOCALTIME_MIN and LOCALTIME_MAX gets different values depending on
the value of the TZ environment variable.
The minimum I had on my amd64 system is with TZ=UTC-24, -62167305600.
The maximum is with TZ=UTC and is 67768036191590399.
It feels like a bug to have something that can be configured through an
environment variable on a running system affect what gets encoded in the
binary.
This feels like a bug to me too, and should be handled separately.
I'm cloning this and will export TZ=UTC in debian/rules, at least
for now.
The TZ=UTC part was accidentally dropped in the build system debhelper
conversion for 5.30 packaging. This resulted in a reproducibility
regression that Holger pointed out to me on IRC (thanks!).
I'll re-instate TZ=UTC in 5.30.0-9 or so, but clearly the underlying
issue remains.
Hi Niko,

I'm struggling to see the practical problem with having the timezone
vary LOCALTIME_{MIN,MAX} (other than reproducibility, which AIUI has
already been addressed). I don't agree with the starting point that
an environment variable shouldn't be able to influence the contents
of the binary (this is clearly a very common and necessary pattern).

Could you elaborate on your reasoning for keeping this bug open?

Thanks
Dominic
Niko Tyni
2020-11-16 16:20:02 UTC
Control: submitter -1 !
Control: severity -1 minor
Control: tag -1 upstream
Post by Dominic Hargreaves
I'm struggling to see the practical problem with having the timezone
vary LOCALTIME_{MIN,MAX} (other than reproducibility, which AIUI has
already been addressed).
I'm not aware of any practical problems here. I suspect nothing
uses $Config{sLOCALTIME_max} et al.

Reproducibility has been addressed in a Debian-specific way. Ideally,
it would be fixed upstream so that the build result would be reproducible
regardless of the build timezone (which we are currently overriding.)
Post by Dominic Hargreaves
I don't agree with the starting point that
an environment variable shouldn't be able to influence the contents
of the binary (this is clearly a very common and necessary pattern).
I think it depends on the environment variable and its main purpose.
Something like BUILD_BZIP2 does and should influence the result, that's
what it's there for. But what's the use for encoding the local timezone
into the binaries? Binaries can be copied between hosts in different time
zones (our buildd results certainly are), users connect to hosts from
different time zones, and even hosts (think laptops) can move between
time zones.

I don't really mind closing this, it's just a minor detail and I obviously
haven't got around to doing anything about it so far. But I do think
the current TZ=UTC solution is more a workaround than a fix.

I'm updating the metadata at least, feel free to close if you're not
convinced :)
--
Niko
Holger Levsen
2015-01-23 11:10:02 UTC
Hi Niko,
A quick search indicates that there's no separate namespace for other
uname(2) information than the host name and domain name. This suggests
that something like http://www.bstern.org/libuname/ is needed. I'm not
aware of anything in Debian already that does that. Time for an RFP maybe
:)
it builds fine but doesn't work:

***@jenkins:~/u/libuname-1.0.0$ make
gcc -Wall -Werror -O2 -fPIC -c -o libuname.o libuname.c
if [ "`uname -s`" = "SunOS" ]; then \
ld -G -dy -z text -Qn -o libuname.so libuname.o; \
else \
ld -shared -fPIC -o libuname.so libuname.o; \
fi
***@jenkins:~/u/libuname-1.0.0$ LD_PRELOAD=$PWD/libuname.so LIBUNAME='Linux;bar;2.6.15;#1;Mon Feb 37 22:33:44 UTC 2006;i686;unknown' uname -a
uname: symbol lookup error: /var/lib/jenkins/u/libuname-1.0.0/libuname.so: undefined symbol: dlsym


cheers,
Holger
Daniel Kahn Gillmor
2015-01-23 15:50:03 UTC
Post by Holger Levsen
Hi Niko,
A quick search indicates that there's no separate namespace for other
uname(2) information than the host name and domain name. This suggests
that something like http://www.bstern.org/libuname/ is needed. I'm not
aware of anything in Debian already that does that. Time for an RFP maybe
:)
gcc -Wall -Werror -O2 -fPIC -c -o libuname.o libuname.c
if [ "`uname -s`" = "SunOS" ]; then \
ld -G -dy -z text -Qn -o libuname.so libuname.o; \
else \
ld -shared -fPIC -o libuname.so libuname.o; \
fi
LIBUNAME='Linux;bar;2.6.15;#1;Mon Feb 37 22:33:44 UTC 2006;i686;unknown' uname -a
undefined symbol: dlsym
This is resolved by the attached patch.

--dkg