Bug#928631: firmware-amd-graphics: Update to 20190502-1 causus hang of system directly after grub
Michael Becker
2019-05-18 12:30:01 UTC
Package: firmware-amd-graphics
Followup-For: Bug #928631

Dear Maintainer,

I had the same problem after upgrading to 20190502-1. I downgraded
back to 20190114-1 and was able to boot normally into my system.
But I did not notice any spontaneous reboots so far.

The following lspci output is from after I downgraded the package.
Hope this helps in some way.


lspci -vvv -s 0b:00.0

0b:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] (rev c3) (prog-if 00 [VGA controller])
Subsystem: Sapphire Technology Limited Vega 10 XL/XT [Radeon RX Vega 56/64]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 151
Region 0: Memory at e0000000 (64-bit, prefetchable) [size=256M]
Region 2: Memory at f0000000 (64-bit, prefetchable) [size=2M]
Region 4: I/O ports at d000 [size=256]
Region 5: Memory at fcc00000 (32-bit, non-prefetchable) [size=512K]
Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [64] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR+, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee00000 Data: 0000
Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [200 v1] #15
Capabilities: [270 v1] #19
Capabilities: [2a0 v1] Access Control Services
ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [2b0 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable+, Smallest Translation Unit: 00
Capabilities: [2c0 v1] Page Request Interface (PRI)
PRICtl: Enable- Reset-
PRISta: RF- UPRGI- Stopped+
Page Request Capacity: 00000020, Page Request Allocation: 00000000
Capabilities: [2d0 v1] Process Address Space ID (PASID)
PASIDCap: Exec+ Priv+, Max PASID Width: 10
PASIDCtl: Enable- Exec- Priv-
Capabilities: [320 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Kernel driver in use: amdgpu
Kernel modules: amdgpu


-- System Information:
Debian Release: 10.0
APT prefers unstable-debug
APT policy: (500, 'unstable-debug'), (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.19.0-5-amd64 (SMP w/16 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

firmware-amd-graphics depends on no packages.

firmware-amd-graphics recommends no packages.

Versions of packages firmware-amd-graphics suggests:
ii initramfs-tools 0.133

-- no debconf information
Diederik de Haas
2019-05-18 13:30:01 UTC
Post by Michael Becker
But I did not notice any spontaneous reboots so far.
The spontaneous reboots seem to be an entirely different issue, see
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=924895#15 for details.
Diederik de Haas
2019-05-21 19:30:01 UTC
Hi,
firmware-amd-graphics 20190502-1 is based on upstream commit
92e17d0dd2437140fab044ae62baf69b35d7d1fa, that is commit "amdgpu: update
vega20 to the latest 19.10 firmware". Two commits before it there is commit
"amdgpu: update vega10 to the latest 19.10 firmware", which is already
included in firmware-amd-graphics 20190502-1.
Could you try to revert "amdgpu: update vega10 to the latest 19.10
firmware"? That is, try to use the vega10 firmware from before this
commit. Does it work for you?
1. Use linux-firmware.git with last HEAD in the master branch
2. [...] update polaris11 to the latest 19.10 firmware", that is the commit
before bumping vega10 to 19.10)
3. Copy vega10 binary blobs to /lib/firmware/amdgpu
Does it work?
Yes, that does work.
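For anyone wanting to repeat the test without looking up hashes by hand, here is a minimal sketch of one way to find "the commit before" a given firmware bump in a linux-firmware.git checkout. It is an illustration only: the helper name `parent_of_subject` is made up, and it assumes the newest commit whose message contains the given subject is the one you mean (a later revert quoting the same subject would match first).

```shell
#!/bin/sh
# Sketch only: print the parent of the newest commit whose message contains
# a given subject line. The helper name parent_of_subject is hypothetical.
parent_of_subject() {
    repo=$1      # path to a linux-firmware.git checkout
    subject=$2   # e.g. 'amdgpu: update vega10 to the latest 19.10 firmware'

    # Newest commit whose message contains the subject as a fixed string.
    commit=$(git -C "$repo" log -1 --format=%H --fixed-strings --grep="$subject") || return 1
    [ -n "$commit" ] || { echo "no commit matches: $subject" >&2; return 1; }

    # "<commit>^" names the first parent, i.e. the state just before that change.
    git -C "$repo" rev-parse "${commit}^"
}

# Example (not run here):
#   git -C linux-firmware checkout "$(parent_of_subject linux-firmware \
#       'amdgpu: update vega10 to the latest 19.10 firmware')"
```

The helper only prints a hash, so it is easy to inspect the result with `git show` before checking anything out.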
What did surprise me is that I saw a blinking cursor, which I don't see with
firmware-amd-graphics version 20190114-1.
$ git log --oneline -- amdgpu/vega10_ce.bin
0f22c85 Revert "amdgpu: update vega10 fw for 18.50 release"
ec4b0cd amdgpu: update vega10 fw for 18.50 release
ac5f8bd amdgpu: update vega10 firmware to 18.40
10e2971 amdgpu: sync up vega10 firmware with 18.20 release
0d672f7 amdgpu: sync up vega10 firmware with 18.10 release
f0698be amdgpu: add initial vega10 firmware

This tells me I'm actually running "ac5f8bd amdgpu: update vega10
firmware to 18.40"
https://tracker.debian.org/news/1021249/accepted-firmware-nonfree-20190114-1-source-into-unstable/ contains:
- amd-graphics:
+ "Polaris10", "Polaris11", "Raven" firmware updates to sync with
18.50 release
+ "Fiji", "Tonga", "Vega10", "Carrizo" firmware updates to sync with
18.40 release

So in both cases I'm supposed to run the exact same firmware version, so even
the minor change in behavior (blinking cursor) surprises me.
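Whether the two setups really load identical blobs can be checked directly instead of inferred from 'git log'. A minimal sketch, assuming both sets of files are available as directories (for example the git checkout's amdgpu dir and /lib/firmware/amdgpu); the helper name `report_blob_diffs` and the default `vega10*` pattern are choices made for this illustration.

```shell
#!/bin/sh
# Sketch only: byte-compare firmware blobs between two directories.
# report_blob_diffs DIR_A DIR_B [PATTERN]   (hypothetical helper name)
# Prints one line per file that is missing from DIR_B or differs, and
# returns 0 only if every matching file in DIR_A is identical in DIR_B.
report_blob_diffs() {
    dir_a=$1
    dir_b=$2
    pattern=${3:-vega10*}   # glob for the files to compare
    status=0
    for file_a in "$dir_a"/$pattern; do
        [ -f "$file_a" ] || continue      # glob matched nothing
        name=${file_a##*/}                # strip the directory part
        if [ ! -f "$dir_b/$name" ]; then
            echo "missing: $name"
            status=1
        elif ! cmp -s "$file_a" "$dir_b/$name"; then
            echo "differs: $name"
            status=1
        fi
    done
    return $status
}

# Example (not run here):
#   report_blob_diffs linux-firmware/amdgpu /lib/firmware/amdgpu
```

No output and a zero exit status would mean the compared files are byte-identical, in which case any difference in behavior would have to come from somewhere other than those blobs.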

What was the reason for the test?
Checking 'git log' for that specific file before I did the test made me conclude
it wouldn't make a difference with packaged version 20190114-1 (but did the
test anyway as requested).

Cheers,
Diederik
Dean Loros
2019-06-21 15:00:02 UTC
Can I confirm that this is a problem with AMD graphics only--or will this
affect all systems regardless of Video card type?
Diederik de Haas
2019-06-22 11:10:01 UTC
Post by Dean Loros
Can I confirm that this is a problem with AMD graphics only--or will this
affect all systems regardless of Video card type?
Yes, AMD graphics only (the bug is filed against firmware-amd-graphics).


Michael Becker: what CPU do you have?
Antonio De Luci: Are you running Sid as well?


I feel 'bad' because this bug is preventing the whole of firmware-nonfree from
migrating to testing/Buster and thus causes Buster to be released with
(somewhat) dated firmware. I suspect many would benefit from version 20190502-1.

It very much looks to be an issue with the Vega 56/64 card + AMD Ryzen (7?)
CPU, i.e. a very specific combination. The people who ran into this issue are
running Sid (at least 2 of them), so they wouldn't be affected by an 'old'
version in the next Stable.

I remember there being 'stretch-ignore' tags on bugs, and I would be fine if
'buster-ignore' were applied to this bug so firmware-nonfree can migrate to
testing/Buster.
Diederik de Haas
2019-07-14 21:30:01 UTC
Hi,
20190502-1 is already outdated, since amdgpu firmware had some updates:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/log/amdgpu
Does this problem still occur if you use latest upstream firmware?
I first updated my local git repo to 3d1e5537dbd8ac36c01fc33e7bf525e5c8e4b708.

I then upgraded back to the latest package versions and rebooted and
encountered the same issue as reported.

Started SystemRescueCD and chrooted into my system (just like before), but I
now copied the vega10* files from the amdgpu dir to /lib/firmware/amdgpu/ and
rebooted again.
And now it succeeded :D
Thanks a lot for the hint!
By the way, it works well for me (Ryzen 7 2700X / Vega 56), but I'm running
kernel 5.2.1 in Debian testing.
I didn't upgrade my kernel, so I'm still on 4.19.0-5-amd64 (4.19.37-5) in
Debian Sid.
So the only thing that was needed for me was updating the firmware files to the
latest version.
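The copy step described above can be sketched as a small helper. This is an illustration under assumptions, not a record of the exact commands used: the name `copy_vega10_blobs` is made up, and the source and destination directories are passed in so it can be tried on scratch directories first.

```shell
#!/bin/sh
# Sketch only: copy the vega10* blobs from a linux-firmware checkout into
# the firmware directory. copy_vega10_blobs is a hypothetical helper name.
# copy_vega10_blobs SRC_DIR DEST_DIR
copy_vega10_blobs() {
    src=$1    # e.g. linux-firmware/amdgpu
    dest=$2   # e.g. /lib/firmware/amdgpu (needs root)

    # Expand the glob into the function's own positional parameters.
    set -- "$src"/vega10*
    [ -e "$1" ] || { echo "no vega10* files in $src" >&2; return 1; }

    cp -- "$@" "$dest"/
}

# Example (not run here, run as root or from the chroot):
#   copy_vega10_blobs linux-firmware/amdgpu /lib/firmware/amdgpu
```

If amdgpu is loaded from the initramfs, the initramfs may also need to be regenerated afterwards (for example with `update-initramfs -u`); the post does not say whether that was necessary here.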

Cheers,
Diederik
Estevo Paz Freire
2019-08-12 11:20:02 UTC
I have the same problem with the latest driver version, 20190717-1.
In my case the problem is related to the HDMI connection:
if I boot with HDMI plugged in, the system freezes at the same point.

If I plug it in after logging in to an X session, the system freezes:
[drm] amdgpu_dm_irq_schedule_work FAILED src 2

I'm not sure if the problem comes from firmware-amd-graphics
or linux-image-amd64_4.18+99_amd64.deb

Let me know if I should open a new bug.


Thanks.
