Jean Delvare [Wed, 4 Sep 2019 09:12:48 +0000 (11:12 +0200)]
drm/amd: be quiet when no SAD block is found
It is fine for displays without audio functionality to not provide
any SAD block in their EDID. Do not log an error in that case,
just return quietly.
This fixes half of bug fdo#107825:
https://bugs.freedesktop.org/show_bug.cgi?id=107825
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Trek [Sat, 31 Aug 2019 19:25:36 +0000 (21:25 +0200)]
drm/amdgpu: Check for valid number of registers to read
Do not try to allocate any amount of memory requested by the user.
Instead limit it to 128 registers. Actually the longest series of
consecutive allowed registers are 48, mmGB_TILE_MODE0-31 and
mmGB_MACROTILE_MODE0-15 (0x2644-0x2673).
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=111273 Signed-off-by: Trek <trek00@inbox.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hans de Goede [Sat, 7 Sep 2019 20:32:38 +0000 (22:32 +0200)]
drm/radeon: Bail earlier when radeon.cik_/si_support=0 is passed
Bail from the pci_driver probe function instead of from the drm_driver
load function.
This avoid /dev/dri/card0 temporarily getting registered and then
unregistered again, sending unwanted add / remove udev events to
userspace.
Specifically this avoids triggering the (userspace) bug fixed by this
plymouth merge-request:
https://gitlab.freedesktop.org/plymouth/plymouth/merge_requests/59
Note that despite that being an userspace bug, not sending unnecessary
udev events is a good idea in general.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1490490 Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
VG20 did not require this workaround, as the fix is in the VBIOS.
Leave VG10/12 workaround as some older shipped cards do not have the
VBIOS fix in place, and the kernel workaround is required in those
situations
Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Fri, 7 Dec 2018 14:18:43 +0000 (15:18 +0100)]
drm/amdgpu: add graceful VM fault handling v3
Next step towards HMM support. For now just silence the retry fault and
optionally redirect the request to the dummy page.
v2: make sure the VM is not destroyed while we handle the fault.
v3: fix VM destroy check, cleanup comments
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Wed, 17 Jul 2019 07:58:47 +0000 (09:58 +0200)]
drm/amdgpu: reserve the root PD while freeing PASIDs
Free the pasid only while the root PD is reserved. This prevents use after
free in the page fault handling.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Mon, 16 Sep 2019 15:35:53 +0000 (10:35 -0500)]
drm/amdgpu: allocate PDs/PTs with no_gpu_wait in a page fault
While handling a page fault we can't wait for other ongoing GPU
operations or we can potentially run into deadlocks.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Thu, 28 Mar 2019 09:53:33 +0000 (10:53 +0100)]
drm/amdgpu: allow direct submission of clears
For handling PD/PT clears directly in the fault handler.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Wed, 27 Mar 2019 12:59:23 +0000 (13:59 +0100)]
drm/amdgpu: allow direct submission of PTE updates
For handling PTE updates directly in the fault handler.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Thu, 14 Mar 2019 08:10:01 +0000 (09:10 +0100)]
drm/amdgpu: allow direct submission of PDE updates v2
For handling PDE updates directly in the fault handler.
v2: fix typo in comment
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Mon, 16 Sep 2019 15:33:28 +0000 (10:33 -0500)]
drm/amdgpu: allow direct submission in the VM backends v2
This allows us to update page tables directly while in a page fault.
v2: use direct/delayed entities and still wait for moves
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Fri, 19 Jul 2019 12:41:12 +0000 (14:41 +0200)]
drm/amdgpu: split the VM entity into direct and delayed
For page fault handling we need to use a direct update which can't be
blocked by ongoing user CS.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Mon, 16 Sep 2019 15:20:47 +0000 (10:20 -0500)]
drm/ttm: return -EBUSY on pipelining with no_gpu_wait (v2)
Setting the no_gpu_wait flag means that the allocate BO must be available
immediately and we can't wait for any GPU operation to finish.
v2: squash in mem leak fix, rebase
Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Mon, 9 Sep 2019 11:57:32 +0000 (13:57 +0200)]
drm/amdgpu: grab the id mgr lock while accessing passid_mapping
Need to make sure that we actually dropping the right fence.
Could be done with RCU as well, but to complicated for a fix.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jack Zhang [Tue, 10 Sep 2019 04:29:14 +0000 (12:29 +0800)]
drm/amdgpu/sriov: add ring_stop before ring_create in psp v11 code
psp v11 code missed ring stop in ring create function(VMR)
while psp v3.1 code had the code. This will cause VM destroy1
fail and psp ring create fail.
For SIOV-VF, ring_stop should not be deleted in ring_create
function.
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Wed, 11 Sep 2019 11:39:34 +0000 (19:39 +0800)]
drm/amd/powerplay: check SMU engine readiness before proceeding on S3 resume
This is especially needed for non-psp loading way.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Wed, 11 Sep 2019 11:35:45 +0000 (19:35 +0800)]
drm/amd/powerplay: properly set mp1 state for SW SMU suspend/reset routine
Set mp1 state properly for SW SMU suspend/reset routine.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Thu, 5 Sep 2019 23:22:02 +0000 (19:22 -0400)]
drm/amdgpu: Fix KFD-related kernel oops on Hawaii
Hawaii needs to flush caches explicitly, submitting an IB in a user
VMID from kernel mode. There is no s_fence in this case.
Fixes: eb3961a57424 ("drm/amdgpu: remove fence context from the job") Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Problem:
amdgpu_ras_reserve_bad_pages was moved to amdgpu_ras_reset_gpu
because writing to EEPROM during ASIC reset was unstable.
But for ERREVENT_ATHUB_INTERRUPT amdgpu_ras_reset_gpu is called
directly from ISR context and so locking is not allowed. Also it's
irrelevant for this partilcular interrupt as this is generic RAS
interrupt and not memory errors specific.
Fix:
Avoid calling amdgpu_ras_reserve_bad_pages if not in task context.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The table grows quickly during debug/development effort when
multiple RAS errors are injected. Allow to avoid this by setting
table header back to empty if needed.
v2: Switch to debugfs entry instead of load time parameter.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Shirish S [Thu, 12 Sep 2019 06:33:55 +0000 (12:03 +0530)]
drm/amdgpu: remove needless usage of #ifdef
define sched_policy in case CONFIG_HSA_AMD is not
enabled, with this there is no need to check for CONFIG_HSA_AMD
else where in driver code.
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Shirish S <shirish.s@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Shirish S [Tue, 10 Sep 2019 07:15:01 +0000 (12:45 +0530)]
drm/amdgpu: fix build error without CONFIG_HSA_AMD
If CONFIG_HSA_AMD is not set, build fails:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.o: In function `amdgpu_device_ip_early_init':
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1626: undefined reference to `sched_policy'
Use CONFIG_HSA_AMD to guard this.
Fixes: 1abb680ad371 ("drm/amdgpu: disable gfxoff while use no H/W scheduling policy") Signed-off-by: Shirish S <shirish.s@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Mon, 2 Sep 2019 14:39:40 +0000 (16:39 +0200)]
drm/amdgpu: cleanup PTE flag generation v3
Move the ASIC specific code into a new callback function.
v2: mask the flags for SI and CIK instead of a BUG_ON().
v3: remove last missed BUG_ON().
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Fri, 6 Sep 2019 17:20:40 +0000 (13:20 -0400)]
drm/amdgpu: check if nbio->ras_if exist
To avoid NULL function pointer access. This happens on VG10, reboot
command hangs and have to power off/on to reboot the machine. This is
serial console log:
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: disable gfxoff while use no H/W scheduling policy
While gfxoff is enabled, the mmVM_XXX registers will be 0xfffffff while the GFX
is in "off" state. KFD queue creattion doesn't use ring based method, so it will
trigger a VM fault.
Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yong Zhao [Fri, 30 Aug 2019 22:09:10 +0000 (18:09 -0400)]
drm/amdgpu: Add a kernel parameter for specifying the asic type
As more and more new asics start to reuse the old device IDs before
launch, there is a need to quickly override the existing asic type
corresponding to the reused device ID through a kernel parameter. With
this, engineers no longer need to rely on local hack patches,
facilitating cooperation across teams.
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Bayan Zabihiyan [Mon, 19 Aug 2019 19:18:43 +0000 (15:18 -0400)]
drm/amd/display: Isolate DSC module from driver dependencies
[Why]
Edid Utility wishes to include DSC module from driver instead
of doing it's own logic which will need to be updated every time
someone modifies the driver logic.
[How]
Modify some functions such that we dont need to pass the entire
DC structure as parameter.
-Remove DC inclusion from module.
-Filter out problematic types and inclusions
Signed-off-by: Bayan Zabihiyan <bayan.zabihiyan@amd.com> Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jaehyun Chung [Mon, 19 Aug 2019 20:45:05 +0000 (16:45 -0400)]
drm/amd/display: OTC underflow fix
[Why] Underflow occurs on some display setups(repro'd on 3x4K HDR) on boot,
mode set, and hot-plugs with. Underflow occurs because mem clk
is not set high after disabling pstate switching. This behaviour occurs
because some calculations assumed displays were synchronized.
[How] Add a condition to check if timing sync is disabled so that
synchronized vblank can be set to false.
Signed-off-by: Jaehyun Chung <jaehyun.chung@amd.com> Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jun Lei [Thu, 15 Aug 2019 19:22:34 +0000 (15:22 -0400)]
drm/amd/display: remove hw access from dc_destroy
[why]
dc_destroy should only clean up SW, this is because GPUs may be
removed before driver unload, leading to HW to be unavailable.
[how]
remove GPIO close as part of GPIO destroy, this is unnecessary because
GPIO is not shared, and GPIOs are generally closed after being opened
Add tracking to HW access during destructor to make future issues
easier to pinpoint, and block access to prevent hangs.
Signed-off-by: Jun Lei <Jun.Lei@amd.com> Reviewed-by: Yongqiang Sun <yongqiang.sun@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
- Need to add missing surface address register definitions.
- RGBE+A does not work in a stereo configuration because
surface addresses are no programmed correctly.
[How]
Added the register definitions and surface address programming.
Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com> Reviewed-by: Charlene Liu <Charlene.Liu@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nikola Cornij [Fri, 16 Aug 2019 18:26:56 +0000 (14:26 -0400)]
drm/amd/display: config to override DSC start slice height
[why]
It's sometimes useful to have this option when debugging
[how]
Add a config flag. If the flag is not set, use driver default policy.
If the flag is set, use the value from the flag as the starting DSC slice
height.
Signed-off-by: Nikola Cornij <nikola.cornij@amd.com> Reviewed-by: Martin Leung <Martin.Leung@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Krunoslav Kovac [Fri, 9 Aug 2019 21:16:18 +0000 (17:16 -0400)]
drm/amd/display: Subsample mode suboptimal for YCbCr4:2:2
[Why&How]
Driver defaults to 1-tap subsample mode for 4:2:2.
DCE11.2 added 3-tap. The policy is:
DCE8-DCE11 - change to 2-tap, it's better than 1-tap.
DCE11.2+ - use 3-tap
Note that 4:2:0 was added in DCE11.2 and already uses 3-tap always.
Note 2 is that DCE not covered on Linux, only DCN+
Signed-off-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com> Reviewed-by: Charlene Liu <Charlene.Liu@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lewis Huang [Wed, 7 Aug 2019 10:05:49 +0000 (18:05 +0800)]
drm/amd/display: refine i2c over aux
[Why]
When user mode use i2c over aux through ADL or DDI, the function
dal_ddc_service_query_ddc_data will be called. There are two issues.
1. When read/write length > 16byte, current always return false because
the DEFAULT_AUX_MAX_DATA_SIZE != length.
2. When usermode only need to read data through i2c, driver will write
mot = true at the same address and cause i2c sink confused. Therefore
the following read command will get garbage.
[How]
1. Add function dal_dcc_submit_aux_command to handle length > 16 byte.
2. Check read size and write size when query ddc data.
Signed-off-by: Lewis Huang <Lewis.Huang@amd.com> Reviewed-by: Charlene Liu <Charlene.Liu@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Fri, 30 Aug 2019 11:50:39 +0000 (19:50 +0800)]
drm/amdgpu: move the call of ras recovery_init and bad page reserve to proper place
ras recovery_init should be called after ttm init,
bad page reserve should be put in front of gpu reset since i2c
may be unstable during gpu reset.
add cleanup for recovery_init and recovery_fini
v2: add more comment and print.
remove cancel_work_sync in recovery_init.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Fri, 30 Aug 2019 04:10:34 +0000 (00:10 -0400)]
drm/amdgpu: Disable page faults while reading user wptrs
These wptrs must be pinned and GPU accessible when this is called
from hqd_load functions. So they should never fault. This resolves
a circular lock dependency issue involving four locks including the
DQM lock and mmap_sem.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Oak Zeng <Oak.Zeng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aaron Liu [Wed, 4 Sep 2019 03:47:48 +0000 (11:47 +0800)]
drm/amdgpu: disable stutter mode for renoir
With stutter mode enabled, NMI prints frequently.
Disable stutter for the moment because NMI warning storm, and will
enable it back till the issue is addressed
Signed-off-by: Aaron Liu <aaron.liu@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Kevin Wang <kevin1.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>