FrogAi 659adb6457 openpilot v0.9.7 release
date: 2024-03-17T10:14:38
master commit: 7e9a909e0e57ecb31df4c87c5b9a06b1204fd034
2024-05-24 17:43:27 -07:00

# run two tinygrad matrix examples in a loop
# amdgpu-6.0.5-1581431.20.04
# NOT fixed in kernel 6.2.14
[ 553.016624] gmc_v11_0_process_interrupt: 30 callbacks suppressed
[ 553.016631] amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:9 pasid:32770, for process python3 pid 10001 thread python3 pid 10001)
[ 553.016790] amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x00007f0000000000 from client 10
[ 553.016892] amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00901A30
[ 553.016974] amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: SDMA0 (0xd)
[ 553.017051] amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x0
[ 553.017111] amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
[ 553.017173] amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 553.017238] amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 553.017300] amdgpu 0000:0b:00.0: amdgpu: RW: 0x0
[ 553.123921] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=2
[ 553.124153] amdgpu: failed to add hardware queue to MES, doorbell=0x1a16
[ 553.124195] amdgpu: MES might be in unrecoverable state, issue a GPU reset
[ 553.124237] amdgpu: Failed to restore queue 2
[ 553.124266] amdgpu: Failed to restore process queues
[ 553.124270] amdgpu: Failed to evict queue 3
[ 553.124297] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
# alternative crash in kernel 6.2.14
[ 151.097948] gmc_v11_0_process_interrupt: 30 callbacks suppressed
[ 151.097953] amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:8 pasid:32771, for process python3 pid 7525 thread python3 pid 7525)
[ 151.097993] amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x00007f0000000000 from client 10
[ 151.098008] amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00801A30
[ 151.098020] amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: SDMA0 (0xd)
[ 151.098032] amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x0
[ 151.098042] amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
[ 151.098052] amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 151.098062] amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 151.098071] amdgpu 0000:0b:00.0: amdgpu: RW: 0x0
[ 151.209517] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=2
[ 151.209724] amdgpu: failed to add hardware queue to MES, doorbell=0x1002
[ 151.209734] amdgpu: MES might be in unrecoverable state, issue a GPU reset
[ 151.209743] amdgpu: Failed to restore queue 1
[ 151.209751] amdgpu: Failed to restore process queues
[ 151.209759] amdgpu: amdgpu_amdkfd_restore_userptr_worker: Failed to resume KFD
[ 151.209858] amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!
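In both crashes the GCVM_L2_PROTECTION_FAULT_STATUS words (0x00901A30 and 0x00801A30) differ only in bits 20-23, and each decodes to the same SDMA0 permission fault (PERMISSION_FAULTS 0x3) at 0x00007f0000000000. The sketch below unpacks a raw status word into the fields amdgpu prints. It is a minimal sketch under stated assumptions: the shift/width table in `FIELDS` is not given in this log. It is assumed from the amdgpu gfx10/gfx11 register headers and is only checked against the values printed above. `decode_fault_status` and `CLIENT_NAMES` are hypothetical helpers, not part of tinygrad or the kernel.

```python
# Bitfield layout of GCVM_L2_PROTECTION_FAULT_STATUS as (shift, width in bits).
# ASSUMPTION: taken to match the amdgpu register headers; verified here only
# against the field values amdgpu printed in the two logs above.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),
    "RW": (18, 1),
    "VMID": (20, 4),
}

# UTCL2 client names: only the single ID that appears in these logs.
CLIENT_NAMES = {0xD: "SDMA0"}


def decode_fault_status(status: int) -> dict[str, int]:
    """Split a raw 32-bit fault-status word into its named bitfields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # The two status words reported in the logs above.
    for raw in (0x00901A30, 0x00801A30):
        fields = decode_fault_status(raw)
        client = CLIENT_NAMES.get(fields["CID"], "unknown")
        summary = " ".join(f"{name}=0x{value:x}" for name, value in fields.items())
        print(f"0x{raw:08X} client={client} {summary}")
```

With this layout the VMID field decodes to 9 and 8, matching `vmid:9` and `vmid:8` in the corresponding page-fault lines, while CID, PERMISSION_FAULTS and RW reproduce the printed `SDMA0 (0xd)`, `0x3` and `0x0`.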