Discussion:
Bug#868255: openjdk-9: Please build with --with-debug-level=slowdebug on Zero-only architectures
John Paul Adrian Glaubitz
2017-07-13 20:10:02 UTC
Source: openjdk-9
Version: 9~b177-3
Severity: normal

Hi!

openjdk-9 currently FTBFS on architectures which exclusively rely on the Zero
VM. This happens because the JVM segfaults during build at some point [1].

The exact reason for the segmentation fault has not been discovered yet, but
we know that building with "--with-debug-level=slowdebug" instead of "=release"
resolves the issue.

Thus, I suggest building openjdk-9 with --with-debug-level=slowdebug on
Zero-only architectures for the time being as a work-around.

Thanks,
Adrian
[1] http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-June/027117.html
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer - ***@debian.org
`. `' Freie Universitaet Berlin - ***@physik.fu-berlin.de
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
Andrew Haley
2017-07-17 11:10:03 UTC
Post by John Paul Adrian Glaubitz
Source: openjdk-9
Version: 9~b177-3
Severity: normal
If no-one at Debian can fix this, you could send me a login.
--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
John Paul Adrian Glaubitz
2017-07-27 09:50:02 UTC
Interestingly, this issue does not seem to affect all architectures.

On sh4 (which is Zero-only), for example, openjdk-9 builds fine
despite being built with --with-debug-level=release [1].

So far, it's only m68k and powerpc which are affected (and presumably
powerpcspe as well, which is powerpc just without FPU and AltiVec).
Also, we are building Zero on the HotSpot architectures as well, and
it works there with the release debug level, too.

Thus, I suggest only switching the debug level to slowdebug on m68k,
powerpc and powerpcspe:

--- debian/rules.orig 2017-07-26 16:10:01.192537186 +0200
+++ debian/rules 2017-07-27 11:39:07.527175594 +0200
@@ -548,11 +548,11 @@
--with-version-pre=$(distribution) \
--with-version-opt=$(PKGVERSION) \

-ifneq (,$(filter $(DEB_HOST_ARCH),$(hotspot_archs)))
- DEFAULT_CONFIGURE_ARGS += --with-debug-level=release
- ZERO_CONFIGURE_ARGS += --with-debug-level=slowdebug
+# see #868255
+ifneq (,$(filter $(DEB_HOST_ARCH),m68k powerpc powerpcspe))
+ COMMON_CONFIGURE_ARGS += --with-debug-level=slowdebug
else
- DEFAULT_CONFIGURE_ARGS += --with-debug-level=slowdebug
+ COMMON_CONFIGURE_ARGS += --with-debug-level=release
endif

COMMON_CONFIGURE_ARGS += \

Attaching patch.

However, before applying this, let me test whether building openjdk-9
with gcc-7 resolves the issue. We've seen gcc-6 miscompiling code
where gcc-7 worked fine [2].
[1] https://buildd.debian.org/status/package.php?p=openjdk-9&suite=sid
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=869373
John Paul Adrian Glaubitz
2017-07-27 12:30:01 UTC
Post by John Paul Adrian Glaubitz
However, before applying this, let me test whether this might resolve
by building openjdk-9 with gcc-7. We've seen gcc-6 miscompiling code
where gcc-7 worked fine [2].
That doesn't help, unfortunately. So, for the time being, we have to
use the suggested patch above until someone has figured out why the
JVM segfaults on m68k and powerpc.

Adrian
John Paul Adrian Glaubitz
2017-08-01 17:20:02 UTC
Post by John Paul Adrian Glaubitz
So far, it's only m68k and powerpc which are affected (and presumably
powerpcspe as well which is powerpc just without FPU and Altivec).
Also, as we're building Zero on the Hotspot architectures as well and
it works there with release debug level as well.
Thus, I suggest only switching the debug level to slowdebug on m68k,
Now that we know what the problem on powerpc was (see #870403 [1]),
this patch can be reduced to using the slowdebug debug level on m68k
only:

--- debian/rules.orig 2017-07-24 13:20:07.000000000 +0200
+++ debian/rules 2017-08-01 19:02:54.325839488 +0200
@@ -548,15 +548,13 @@
--with-version-pre=$(distribution) \
--with-version-opt=$(PKGVERSION) \

-ifneq (,$(filter $(DEB_HOST_ARCH),$(hotspot_archs)))
- DEFAULT_CONFIGURE_ARGS += --with-debug-level=release
- ZERO_CONFIGURE_ARGS += --with-debug-level=slowdebug
+ifneq (,$(filter $(DEB_HOST_ARCH),m68k))
+ COMMON_CONFIGURE_ARGS += --with-debug-level=slowdebug
else
- DEFAULT_CONFIGURE_ARGS += --with-debug-level=slowdebug
+ COMMON_CONFIGURE_ARGS += --with-debug-level=release
endif

COMMON_CONFIGURE_ARGS += \
- --with-debug-level=release \
--enable-unlimited-crypto \
--with-zlib=system \
--with-giflib=system \

Attaching an updated patch.

Thanks,
Adrian
Post by John Paul Adrian Glaubitz
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=870403
John Paul Adrian Glaubitz
2017-07-22 18:50:01 UTC
Control: reopen -1

That didn't work, unfortunately. On powerpc, it's still building with debug level
release and consequently fails [1], as powerpc is a Zero-only target:

(No custom hook found at /«PKGBUILDDIR»/src/closed/autoconf/custom-hook.m4)
cd build && MAKE_VERBOSE=y QUIETLY= LOG=debug IGNORE_OLD_CONFIG=true LIBFFI_LIBS=-lffi_pic DEBUG_BINARIES=true FULL_DEBUG_SYMBOLS=0 ZIP_DEBUGINFO_FILES=0
STRIP_POLICY=none POST_STRIP_CMD=true ../src/configure \
--host=powerpc-linux-gnu --build=powerpc-linux-gnu --with-jvm-variants=zero --with-boot-jdk=/usr/lib/jvm/java-8-openjdk-powerpc
--with-boot-jdk-jvmargs="-XX:ThreadStackSize=2240" --with-extra-cflags='-Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/«PKGBUILDDIR»=. -Wformat
-fno-stack-protector -Wno-deprecated-declarations -Wdate-time -D_FORTIFY_SOURCE=2' --with-extra-cxxflags='-Wdate-time -D_FORTIFY_SOURCE=2 -g
-fdebug-prefix-map=/«PKGBUILDDIR»=. -Wformat -fno-stack-protector -Wno-deprecated-declarations' --with-extra-ldflags='-Xlinker -z -Xlinker relro -Xlinker
-Bsymbolic-functions' \
--disable-ccache --with-jtreg=/usr --with-version-pre=Debian --with-version-opt=9~b179-1 --with-debug-level=release --enable-unlimited-crypto
--with-zlib=system --with-giflib=system --with-libpng=system --with-libjpeg=system --with-lcms=system --with-pcsclite=system --with-stdc++lib=dynamic
--disable-warnings-as-errors --disable-javac-server --with-num-cores=4
Warning: You are using legacy autoconf cross-compilation flags.
It is recommended that you use --openjdk-target instead.

From the debian/rules file:

COMMON_CONFIGURE_ARGS += \
--with-debug-level=release \
--enable-unlimited-crypto \
--with-zlib=system \
--with-giflib=system \
--with-libpng=system \
--with-libjpeg=system \
--with-lcms=system \
--with-pcsclite=system \
--with-stdc++lib=dynamic \
--disable-warnings-as-errors \
--disable-javac-server \

This has to be "--with-debug-level=slowdebug" for Zero targets.

Thus, I suggest the following change:

--- debian/rules.orig 2017-07-21 16:02:19.000000000 +0200
+++ debian/rules 2017-07-22 20:43:09.146954686 +0200
@@ -524,9 +524,11 @@
else
DEFAULT_CONFIGURE_ARGS += --with-jvm-variants=server
endif
+ DEFAULT_CONFIGURE_ARGS += --with-debug-level=release
else
DEFAULT_CONFIGURE_ARGS += --host=$(DEB_HOST_GNU_TYPE) --build=$(DEB_BUILD_GNU_TYPE)
DEFAULT_CONFIGURE_ARGS += --with-jvm-variants=zero
+ DEFAULT_CONFIGURE_ARGS += --with-debug-level=slowdebug
endif

ZERO_CONFIGURE_ARGS += --with-jvm-variants=zero

@@ -560,7 +562,6 @@
--with-version-opt=$(PKGVERSION) \

COMMON_CONFIGURE_ARGS += \
- --with-debug-level=release \
--enable-unlimited-crypto \
--with-zlib=system \
--with-giflib=system \

[1] https://buildd.debian.org/status/fetch.php?pkg=openjdk-9&arch=powerpc&ver=9%7Eb179-1&stamp=1500747475&raw=0
John Paul Adrian Glaubitz
2017-07-24 15:10:01 UTC
Control: reopen -1

The build still fails because "--with-debug-level=release" is still
part of the common configure flags:

COMMON_CONFIGURE_ARGS += \
--with-debug-level=release \

Adrian
John Paul Adrian Glaubitz
2017-08-01 13:10:02 UTC
Post by Andrew Haley
So I found two bugs in the package which stop it from building, one
yours and one ours. The first one is
debian/patches/8073754-stack-overflow-9-build.diff, which sets the
thread stack size to 2240: this is too small, and the build aborts. I
think this problem may be due to the use of 64k pages.
Interesting.
NOTE THAT you should not increase the thread sizes in
os_linux_zero.cpp: these are minimums. Change the values in
hotspot/src/os_cpu/linux_zero/vm/globals_linux_zero.hpp and
common/autoconf/boot-jdk.m4 .
Ok, I will test that.
The second one is more subtle. Zero is so called because it uses zero
assembly language, but this is not quite true: there is a tiny bit of
assembly language, and it is wrong.
Yeah, I already assumed that because of the fact that the Zero build
fails on powerpc with --with-debug-level=release but not on sh4, for
example.
Here is the PPC32 definition of
atomic_copy64. It uses a floating-point register to copy a 64-bit
// Atomically copy 64 bits of data
static void atomic_copy64(volatile void *src, volatile void *dst) {
#if defined(PPC32) && !defined(__NO_FPRS__)
  double tmp;
  asm volatile ("lfd %0, 0(%1)\n"
                "stfd %0, 0(%2)\n"
                : "=f"(tmp)
                : "b"(src), "b"(dst));
The eagle-eyed among you might have noticed the bug: this asm has no
memory effect. It has no memory inputs, no memory outputs, and no
memory clobber. So, as far as GCC is concerned atomic_copy64 does not
touch memory at all, and there is no need to store the source operand
into memory. For all GCC knows, the asm might just be doing some
arithmetic on the pointers. We need a better definition of
// Atomically copy 64 bits of data
static void atomic_copy64(volatile void *src, volatile void *dst) {
#if defined(PPC32) && !defined(__NO_FPRS__)
  double tmp;
  asm volatile ("lfd %0, %2\n"
                "stfd %0, %1\n"
                : "=&f"(tmp), "=Q"(*(volatile double*)dst)
                : "Q"(*(volatile double*)src));
Wow, that's indeed very subtle.
Note that we dereference src and dst and pass the actual memory
operands to the asm, not just pointers to them.
(This might be more detail than you need, and I'm sorry this isn't a
real patch, but if you base a patch on what I've said here, it should
build. Let me know.)
Ok, I'll give it a try. Thanks a lot for digging this out!

Adrian
Andrew Haley
2017-08-01 13:10:02 UTC
Post by John Paul Adrian Glaubitz
openjdk-9 currently FTBFS on architectures which exclusively rely on the Zero
VM. This happens because the JVM segfaults during build at some point [1].
So I found two bugs in the package which stop it from building, one
yours and one ours. The first one is
debian/patches/8073754-stack-overflow-9-build.diff, which sets the
thread stack size to 2240: this is too small, and the build aborts. I
think this problem may be due to the use of 64k pages.

NOTE THAT you should not increase the thread sizes in
os_linux_zero.cpp: these are minimums. Change the values in
hotspot/src/os_cpu/linux_zero/vm/globals_linux_zero.hpp and
common/autoconf/boot-jdk.m4 .

The second one is more subtle. Zero is so called because it uses zero
assembly language, but this is not quite true: there is a tiny bit of
assembly language, and it is wrong. Here is the PPC32 definition of
atomic_copy64. It uses a floating-point register to copy a 64-bit
doubleword atomically:

// Atomically copy 64 bits of data
static void atomic_copy64(volatile void *src, volatile void *dst) {
#if defined(PPC32) && !defined(__NO_FPRS__)
  double tmp;
  asm volatile ("lfd %0, 0(%1)\n"
                "stfd %0, 0(%2)\n"
                : "=f"(tmp)
                : "b"(src), "b"(dst));

The eagle-eyed among you might have noticed the bug: this asm has no
memory effect. It has no memory inputs, no memory outputs, and no
memory clobber. So, as far as GCC is concerned atomic_copy64 does not
touch memory at all, and there is no need to store the source operand
into memory. For all GCC knows, the asm might just be doing some
arithmetic on the pointers. We need a better definition of
atomic_copy64, and this is mine:

// Atomically copy 64 bits of data
static void atomic_copy64(volatile void *src, volatile void *dst) {
#if defined(PPC32) && !defined(__NO_FPRS__)
  double tmp;
  asm volatile ("lfd %0, %2\n"
                "stfd %0, %1\n"
                : "=&f"(tmp), "=Q"(*(volatile double*)dst)
                : "Q"(*(volatile double*)src));

Note that we dereference src and dst and pass the actual memory
operands to the asm, not just pointers to them.
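For anyone who wants to try the corrected version outside the HotSpot tree, here is a minimal stand-alone sketch that completes the excerpt above with the #else/#endif branch it leaves out. Two parts are assumptions, not HotSpot code: the PPC32 fallback definition derived from GCC's predefined __powerpc__/__powerpc64__ macros, and the plain 64-bit copy used on every other target. On a non-PPC32 host only that fallback branch is compiled, so the inline asm itself is exercised only on a 32-bit PowerPC build with FPRs.

```cpp
#include <cassert>   // assert(), used by the usage check
#include <cstdint>   // int64_t

// Assumption: inside HotSpot the build system defines PPC32 for 32-bit
// PowerPC. For a stand-alone test we derive it from GCC's predefined macros.
#if !defined(PPC32) && defined(__powerpc__) && !defined(__powerpc64__)
#define PPC32 1
#endif

// Atomically copy 64 bits of data
static void atomic_copy64(volatile void *src, volatile void *dst) {
#if defined(PPC32) && !defined(__NO_FPRS__)
  double tmp;
  // *src and *dst are passed as "Q" (base-register) memory operands, so GCC
  // knows the asm reads *src and writes *dst. Any earlier store to *src
  // must therefore be emitted before the asm runs, and *dst is treated as
  // modified afterwards. "=&f" keeps tmp in an FPR as an early-clobber output.
  asm volatile ("lfd %0, %2\n"
                "stfd %0, %1\n"
                : "=&f"(tmp), "=Q"(*(volatile double*)dst)
                : "Q"(*(volatile double*)src));
#else
  // Hypothetical fallback for all other targets: a plain 64-bit copy. It is
  // only atomic where an aligned 64-bit load/store is single-copy atomic
  // (e.g. typical 64-bit hosts); that is an assumption of this sketch.
  *(volatile int64_t *)dst = *(const volatile int64_t *)src;
#endif
}
```

Usage: with int64_t a = 42, b = 0; calling atomic_copy64(&a, &b) leaves b == 42. Passing the dereferenced objects as memory operands, rather than adding a blanket "memory" clobber, tells GCC exactly which memory the asm touches.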

(This might be more detail than you need, and I'm sorry this isn't a
real patch, but if you base a patch on what I've said here, it should
build. Let me know.)
Andrew Haley
2017-08-01 13:10:02 UTC
Post by Andrew Haley
NOTE THAT you should not increase the thread sizes in
os_linux_zero.cpp: these are minimums. Change the values in
hotspot/src/os_cpu/linux_zero/vm/globals_linux_zero.hpp and
common/autoconf/boot-jdk.m4 .
Sorry, I should have said: set the size to 2560.
John Paul Adrian Glaubitz
2017-08-01 14:40:02 UTC
Post by Andrew Haley
So I found two bugs in the package which stop it from building, one
yours and one ours. The first one is
debian/patches/8073754-stack-overflow-9-build.diff, which sets the
thread stack size to 2240: this is too small, and the build aborts. I
think this problem may be due to the use of 64k pages.
NOTE THAT you should not increase the thread sizes in
os_linux_zero.cpp: these are minimums. Change the values in
hotspot/src/os_cpu/linux_zero/vm/globals_linux_zero.hpp and
common/autoconf/boot-jdk.m4 .
Oh, I forgot. We already dropped this particular patch because it also
broke the build on ppc64. So it seems we only need to deal with the
second problem.

Adrian
John Paul Adrian Glaubitz
2017-08-01 20:10:02 UTC
Hi Andrew!
Post by Andrew Haley
The second one is more subtle. Zero is so called because it uses zero
assembly language, but this is not quite true: there is a tiny bit of
assembly language, and it is wrong. Here is the PPC32 definition of
atomic_copy64. It uses a floating-point register to copy a 64-bit
// Atomically copy 64 bits of data
static void atomic_copy64(volatile void *src, volatile void *dst) {
#if defined(PPC32) && !defined(__NO_FPRS__)
  double tmp;
  asm volatile ("lfd %0, 0(%1)\n"
                "stfd %0, 0(%2)\n"
                : "=f"(tmp)
                : "b"(src), "b"(dst));
The eagle-eyed among you might have noticed the bug: this asm has no
memory effect. It has no memory inputs, no memory outputs, and no
memory clobber. So, as far as GCC is concerned atomic_copy64 does not
touch memory at all, and there is no need to store the source operand
into memory. For all GCC knows, the asm might just be doing some
arithmetic on the pointers. We need a better definition of
// Atomically copy 64 bits of data
static void atomic_copy64(volatile void *src, volatile void *dst) {
#if defined(PPC32) && !defined(__NO_FPRS__)
  double tmp;
  asm volatile ("lfd %0, %2\n"
                "stfd %0, %1\n"
                : "=&f"(tmp), "=Q"(*(volatile double*)dst)
                : "Q"(*(volatile double*)src));
Note that we dereference src and dst and pass the actual memory
operands to the asm, not just pointers to them.
This patch fixes the build for me. Could you get it merged upstream?

I assume it will go into the jdk10 branch because jdk9 isn't taking
any particular fixes at the moment. Am I correct?

Adrian
John Paul Adrian Glaubitz
2017-08-01 20:20:02 UTC
Post by John Paul Adrian Glaubitz
This patch fixes the build for me. Could you get it merged upstream?
Hmm, wait a second. I just had the JVM lock up. Will do a clean build
to re-test.

Adrian
John Paul Adrian Glaubitz
2017-08-02 05:40:02 UTC
Post by John Paul Adrian Glaubitz
Hmm, wait a second. I just had the JVM lock up. Will do a clean build
to re-test.
Ok, I should have just waited :-). Your patch works:

Build Architecture: powerpc
Build Type: any
Build-Space: 4093428
Build-Time: 23425
Distribution: sid
Host Architecture: powerpc
Install-Time: 115
Job: /var/lib/buildd/debian/openjdk-9_9~b179-2.dsc
Machine Architecture: powerpc
Package: openjdk-9
Package-Time: 23548
Source-Version: 9~b179-2
Space: 4093428
Status: successful
Version: 9~b179-2
--------------------------------------------------------------------------------
Finished at 2017-08-02T02:42:34Z
Build needed 06:32:28, 4093428k disk space
John Paul Adrian Glaubitz
2017-08-02 08:30:03 UTC
Post by Andrew Haley
Well, this one is a serious crasher, and it only affects Zero,
so it's possible. On the other hand, we could just patch the
packages. I'll put it into JDK 10.
My general stance on patches is to avoid carrying them around in distributions
and rather get them merged upstream. This way the fix is available to all
downstreams and not just Debian. Plus, it relieves the package maintainer of
the burden of carrying patches around and rebasing them.

So, if it were possible to get this fix into JDK9 as well, that would be
great! Besides Debian, Gentoo and openSUSE still have releases for PPC32,
so they would profit from having the fix merged for JDK9 as well.

Thanks,
Adrian
Andrew Haley
2017-08-02 08:30:03 UTC
Post by John Paul Adrian Glaubitz
I assume it will go into the jdk10 branch because jdk9 isn't taking
any particular fixes at the moment. Am I correct?
Well, this one is a serious crasher, and it only affects Zero,
so it's possible. On the other hand, we could just patch the
packages. I'll put it into JDK 10.