Bug#1093124: libhsa-runtime64-1: HSA exception: Queue create failed at hsaKmtCreateQueue with multiple programs
Dieter Faulbaum
2025-01-15 11:50:01 UTC
Package: libhsa-runtime64-1
Version: 5.7.1-3
Severity: important

Dear Maintainer,

As an example, darktable can't use my GPU anymore.

Programs I used to reproduce the error:
darktable-cltest, clpeak and clinfo -l

With all of these programs I get lines like this:
HSA exception: Queue create failed at hsaKmtCreateQueue
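
The reproduction above can be scripted. The following is a minimal Python sketch, not part of the original report: it runs the three programs named above and counts how often the error line appears. The helper names `count_errors` and `run_and_count` and the 120-second timeout are assumptions made for illustration.

```python
import shutil
import subprocess

# Commands named in the report, with exactly the arguments given there.
COMMANDS = (["darktable-cltest"], ["clpeak"], ["clinfo", "-l"])
ERROR = "HSA exception: Queue create failed at hsaKmtCreateQueue"


def count_errors(output, needle=ERROR):
    """Count output lines that contain the error message."""
    return sum(1 for line in output.splitlines() if needle in line)


def run_and_count(command, timeout=120):
    """Run one command; return the error count, or None if it is not installed."""
    if shutil.which(command[0]) is None:
        return None
    # The timeout is an arbitrary choice; subprocess.run raises TimeoutExpired
    # if the program takes longer.
    result = subprocess.run(command, capture_output=True, text=True,
                            errors="replace", timeout=timeout, check=False)
    # The message may be written to either stream, so scan both.
    return count_errors(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    for command in COMMANDS:
        print(" ".join(command), "->", run_and_count(command))
```

A count above zero for a program means it hit the queue creation failure; `None` means the program was not found on `PATH`.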

Maybe the "best" info is from darktable-cltest:

darktable 5.0.0
Copyright (C) 2012-2024 Johannes Hanika and other contributors.

Compile options:
Bit depth -> 64 bit
Debug -> DISABLED
SSE2 optimizations -> ENABLED
OpenMP -> ENABLED
OpenCL -> ENABLED
Lua -> ENABLED - API version 9.4.0
Colord -> ENABLED
gPhoto2 -> ENABLED
GMIC -> ENABLED - Compressed LUTs are supported
GraphicsMagick -> ENABLED
ImageMagick -> DISABLED
libavif -> ENABLED
libheif -> ENABLED
libjxl -> ENABLED
LibRaw -> ENABLED - Version 0.22.0-Devel202403
OpenJPEG -> ENABLED
OpenEXR -> ENABLED
WebP -> ENABLED

See https://www.darktable.org/resources/ for detailed documentation.
See https://github.com/darktable-org/darktable/issues/new/choose to report bugs.

0.3710 [dt_get_sysresource_level] switched to 1 as `default'
0.3710 total mem: 64229MB
0.3710 mipmap cache: 8028MB
0.3710 available mem: 32114MB
0.3710 singlebuff: 501MB
0.3747 [opencl_init] opencl disabled via darktable preferences
0.3748 [opencl_init] opencl library 'libOpenCL' found on your system and loaded, preference 'default path'
0.4307 [opencl_init] found 1 platform
[opencl_init] found 1 device

[dt_opencl_device_init]
DEVICE: 0: 'gfx803'
CONF KEY: cldevice_v5_amdacceleratedparallelprocessinggfx803
PLATFORM, VENDOR & ID: AMD Accelerated Parallel Processing, Advanced Micro Devices, Inc., ID=4098
CANONICAL NAME: amdacceleratedparallelprocessinggfx803
DRIVER VERSION: 3590.0 (HSA1.1,LC)
DEVICE VERSION: OpenCL 1.2
DEVICE_TYPE: GPU, dedicated mem
GLOBAL MEM SIZE: 8192 MB
MAX MEM ALLOC: 6963 MB
MAX IMAGE SIZE: 16384 x 16384
MAX WORK GROUP SIZE: 256
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 1024 1024 1024 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: NO
AVOID ATOMICS: NO
MICRO NAP: 250
ROUNDUP WIDTH & HEIGHT 16x16
CHECK EVENT HANDLES: 128
TILING ADVANTAGE: 0.000
DEFAULT DEVICE: NO
HSA exception: Queue create failed at hsaKmtCreateQueue

HSA exception: Queue create failed at hsaKmtCreateQueue

HSA exception: Queue create failed at hsaKmtCreateQueue

HSA exception: Queue create failed at hsaKmtCreateQueue

HSA exception: Queue create failed at hsaKmtCreateQueue

HSA exception: Queue create failed at hsaKmtCreateQueue

HSA exception: Queue create failed at hsaKmtCreateQueue

HSA exception: Queue create failed at hsaKmtCreateQueue

HSA exception: Queue create failed at hsaKmtCreateQueue

*** could not create command queue *** CL_OUT_OF_HOST_MEMORY
[opencl_init] no suitable devices found.
0.4530 [opencl_init] FINALLY: opencl PREFERENCE=OFF is NOT AVAILABLE and NOT ENABLED.


-- System Information:
Debian Release: trixie/sid
APT prefers testing
APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 6.12.6-amd64 (SMP w/16 CPU threads; PREEMPT)
Locale: LANG=en_US.utf8, LC_CTYPE=de_DE.utf8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libhsa-runtime64-1 depends on:
ii libc6 2.40-5
ii libdrm-amdgpu1 2.4.123-1
ii libdrm2 2.4.123-1
ii libelf1t64 0.192-4
ii libgcc-s1 14.2.0-12
ii libhsakmt1 5.7.0-1
ii libstdc++6 14.2.0-12

libhsa-runtime64-1 recommends no packages.

libhsa-runtime64-1 suggests no packages.

-- no debconf information
Cordell Bloor
2025-01-16 18:50:01 UTC
Hi Dieter,

What GPU are you using? I see that it's a gfx803 card. It sounds like
this was working for you in the past. Do you know what changed? Maybe a
new kernel version?
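
One way to narrow down what changed is to look through the dpkg log for recent upgrades. The following is a minimal Python sketch, assuming the standard Debian log at /var/log/dpkg.log with its usual "DATE TIME upgrade PACKAGE:ARCH OLD NEW" line layout. The `WATCHED` prefixes are only guesses at packages that might matter here, and `upgrades` is a hypothetical helper name.

```python
from pathlib import Path

# Hypothetical watch list: package-name prefixes that might affect OpenCL on amdgpu.
WATCHED = ("firmware-amd-graphics", "linux-image", "libhsa", "rocm-opencl-icd")


def upgrades(log_text, watched=WATCHED):
    """Return (date, package, old_version, new_version) for watched upgrades."""
    found = []
    for line in log_text.splitlines():
        fields = line.split()
        # Expected layout: DATE TIME upgrade PACKAGE:ARCH OLD NEW
        if len(fields) != 6 or fields[2] != "upgrade":
            continue
        package = fields[3].split(":")[0]
        if package.startswith(watched):
            found.append((fields[0], package, fields[4], fields[5]))
    return found


if __name__ == "__main__":
    log = Path("/var/log/dpkg.log")
    if log.exists():
        for date, package, old, new in upgrades(log.read_text(errors="replace")):
            print(f"{date} {package}: {old} -> {new}")
```

Older entries may have been rotated into /var/log/dpkg.log.1 or compressed files, which this sketch does not read.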

Unfortunately, I'm not hopeful about long-term support for gfx803.
It's my understanding that there are serious bugs in rocm-opencl-icd for
gfx803 on ROCm 6 and later [1]. It's unlikely that Debian will be able
to keep rocm-opencl-icd working on that hardware as AMD is no longer
supporting it. Somebody could fork/fix the necessary ROCm libraries and
maintain gfx803 support if they cared about it strongly enough, but I
don't think the Debian AI Team has the bandwidth to do that.

Sincerely,
Cory Bloor

[1]: https://aur.archlinux.org/packages/opencl-amd#comment-949758
Cordell Bloor
2025-01-17 00:10:01 UTC
> It's a Sapphire Pulse Radeon RX 570 8G G5 (from 2019).
> I think it was the package firmware-amd-graphics (not totally sure, but
> fairly sure); the package libhsa-runtime64-1 has not changed since
> 2024-08-26.
> And this card still worked "some" weeks ago.
Thanks. I have an XFX Radeon RX 570 (8GB) available to me. I'll try to
reproduce and investigate this bug on my test bench.
> OK. I'm thinking about buying a new card:
> Sapphire Pulse Radeon RX 7600 XT OC, 16GB GDDR6
> Or would an Intel card be a better choice (I'm not a gamer and need it
> only for OpenCL; I don't like Nvidia ;-)?
My day job is at AMD, so I'm not sure I can give unbiased advice.

The AMD Radeon RX 7900 GRE (Navi 31) is the cheapest RDNA 3 GPU that is
listed on AMD's official support list for ROCm on Linux [1]. The AMD
Radeon RX 7800 XT (Navi 32) may be a reasonable alternative. The RX 7800
XT is not officially supported on Linux, but it is officially supported
on Windows [2] and it may benefit from work done for the AMD Radeon PRO
V710 (Navi 32), which is officially supported on Linux.

I personally own a PowerColor Hellhound AMD Radeon RX 7600XT (Navi 33)
that I purchased for testing software on Debian. The RX 7600 XT is
officially supported by AMD on Windows, but it is not officially
supported on Linux (nor are there any other Navi 33 GPUs that have
official support on Linux). The ROCm math libraries distributed by AMD
are built for Navi 33 despite that lack of official support, but I don't
know if that extends to all AI libraries and frameworks.

It's perhaps also worth noting that RDNA 4 is just around the corner
[3]. Though, it may be some time before support for RDNA 4 hardware is
available in Debian.

Sincerely,
Cory Bloor

[1]: https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.1/reference/system-requirements.html#supported-gpus
[2]: https://rocm.docs.amd.com/projects/install-on-windows/en/docs-6.3.1/reference/system-requirements.html#windows-supported-gpus
[3]: https://community.amd.com/t5/part-recommendations/rdna-4-emerges-amd-unveils-red-hot-radeon-rx-9000-gpus-with-big/m-p/737069
Cordell Bloor
2025-01-21 06:10:02 UTC
Hi Dieter,

Thanks for introducing me to darktable. That looks like a useful program.
> It's a Sapphire Pulse Radeon RX 570 8G G5 (from 2019)
I can reproduce this bug on my XFX Radeon RX 570 8G. Of course, the
cause remains unknown.
> Does your PowerColor Hellhound AMD Radeon RX 7600XT work with opencl?
Yes. I tried using it for clpeak and darktable and it seemed to work. I
did see one crash in darktable while I was playing around with 'expand
canvas', but I have no idea if that was related to the GPU.
Unfortunately, I didn't have a debugger installed to catch it.

The one caveat for the RX 7600 XT is that I do see a dmesg warning about
an access to an unmapped page when I run darktable-cltest or clpeak.
That probably indicates that _something_ is broken for Navi 33 GPUs, but
I'm not sure what. The darktable and clpeak programs seemed to work just
fine despite the warning. I'm probably not going to bother looking into
it until after we get the ROCm stack updated to the latest upstream release.
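
For anyone who wants to check for similar kernel warnings on their own machine, here is a minimal Python sketch that filters the kernel log of the current boot for lines mentioning amdgpu. It assumes a systemd system where `journalctl -k -b` is readable by the current user. The exact wording of the warning is not reproduced here, so the filter only matches the driver name, and `amdgpu_lines` is a hypothetical helper name.

```python
import shutil
import subprocess


def amdgpu_lines(kernel_log):
    """Return kernel log lines that mention the amdgpu driver."""
    return [line for line in kernel_log.splitlines() if "amdgpu" in line.lower()]


if __name__ == "__main__":
    if shutil.which("journalctl") is not None:
        # -k selects kernel messages; -b limits them to the current boot.
        result = subprocess.run(["journalctl", "-k", "-b", "--no-pager"],
                                capture_output=True, text=True,
                                errors="replace", check=False)
        for line in amdgpu_lines(result.stdout):
            print(line)
```

Running it right after darktable-cltest or clpeak and comparing with the output from before the run shows which messages the OpenCL program triggered.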

Sincerely,
Cory Bloor
