]> asedeno.scripts.mit.edu Git - linux.git/log
linux.git
4 years agodrm/i915: Mark the removal of the i915_request from the sched.link
Chris Wilson [Wed, 22 Jan 2020 14:02:43 +0000 (14:02 +0000)]
drm/i915: Mark the removal of the i915_request from the sched.link

Keep the rq->fence.flags consistent with the status of the
rq->sched.link, and clear the associated bits when decoupling the link
on retirement (as we may wish to inspect those flags independent of
other state).

Fixes: c3f1ed90e6ff ("drm/i915/gt: Allow temporary suspension of inflight requests")
References: https://gitlab.freedesktop.org/drm/intel/issues/997
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-3-chris@chris-wilson.co.uk
(cherry picked from commit b4a9a149f91ea345da76bcfe3f8a39715ac346a6)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915/execlists: Reclaim the hanging virtual request
Chris Wilson [Wed, 22 Jan 2020 14:02:42 +0000 (14:02 +0000)]
drm/i915/execlists: Reclaim the hanging virtual request

If we encounter a hang on a virtual engine, as we process the hang the
request may already have been moved back to the virtual engine (we are
processing the hang on the physical engine). We need to reclaim the
request from the virtual engine so that the locking is consistent and
local to the real engine on which we will hold the request for error
state capturing.

v2: Pull the reclamation into execlists_hold() and assert that cannot be
called from outside of the reset (i.e. with the tasklet disabled).
v3: Added selftest
v4: Drop the reference owned by the virtual engine

Fixes: ad18ba7b5eeb ("drm/i915/execlists: Offline error capture")
Testcase: igt/gem_exec_balancer/hang
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-2-chris@chris-wilson.co.uk
(cherry picked from commit 989df3a7bd2abe566521e61d1aebf603eb013b7f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915/execlists: Take a reference while capturing the guilty request
Chris Wilson [Wed, 22 Jan 2020 14:02:41 +0000 (14:02 +0000)]
drm/i915/execlists: Take a reference while capturing the guilty request

Thanks to preempt-to-busy, we leave the request on the HW as we submit
the preemption request. This means that the request may complete at any
moment as we process HW events, and in particular the request may be
retired as we are planning to capture it for a preemption timeout.

Be more careful while obtaining the request to capture after a
preemption timeout, and check to see if it completed before we were able
to put it on the on-hold list. If we do see it did complete just before
we capture the request, proclaim the preemption-timeout a false positive
and pardon the reset as we should hit an arbitration point momentarily
and so be able to process the preemption.

Note that even after we move the request to be on hold it may be retired
(as the reset to stop the HW comes after), so we do require to hold our
own reference as we work on the request for capture (and all of the
peeking at state within the request needs to be carefully protected).

Fixes: c3f1ed90e6ff ("drm/i915/gt: Allow temporary suspension of inflight requests")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/997
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-1-chris@chris-wilson.co.uk
(cherry picked from commit 4ba5c086a1d8e38d6927967ae1a3271a6ab7a927)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915/execlists: Offline error capture
Chris Wilson [Thu, 16 Jan 2020 18:47:54 +0000 (18:47 +0000)]
drm/i915/execlists: Offline error capture

Currently, we skip error capture upon forced preemption. We apply forced
preemption when there is a higher priority request that should be
running but is being blocked, and we skip inline error capture so that
the preemption request is not further delayed by a user controlled
capture -- extending the denial of service.

However, preemption reset is also used for heartbeats and regular GPU
hangs. By skipping the error capture, we remove the ability to debug GPU
hangs.

In order to capture the error without delaying the preemption request
further, we can do an out-of-line capture by removing the guilty request
from the execution queue and scheduling a worker to dump that request.
When removing a request, we need to remove the entire context and all
descendants from the execution queue, so that they do not jump past.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/738
Fixes: 3a7a92aba8fb ("drm/i915/execlists: Force preemption")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-3-chris@chris-wilson.co.uk
(cherry picked from commit 748317386afb235e11616098d2c7772e49776b58)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915/gt: Allow temporary suspension of inflight requests
Chris Wilson [Thu, 16 Jan 2020 18:47:53 +0000 (18:47 +0000)]
drm/i915/gt: Allow temporary suspension of inflight requests

In order to support out-of-line error capture, we need to remove the
active request from HW and put it to one side while a worker compresses
and stores all the details associated with that request. (As that
compression may take an arbitrary user-controlled amount of time, we
want to let the engine continue running on other workloads while the
hanging request is dumped.) Not only do we need to remove the active
request, but we also have to remove its context and all requests that
were dependent on it (both in flight, queued and future submission).

Finally once the capture is complete, we need to be able to resubmit the
request and its dependents and allow them to execute.

v2: Replace stack recursion with a simple list.
v3: Check all the parents, not just the first, when searching for a
stuck ancestor!

References: https://gitlab.freedesktop.org/drm/intel/issues/738
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-2-chris@chris-wilson.co.uk
(cherry picked from commit 32ff621fd74496f0c33644125fb69ff175859b1f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915: Keep track of request among the scheduling lists
Chris Wilson [Thu, 16 Jan 2020 18:47:52 +0000 (18:47 +0000)]
drm/i915: Keep track of request among the scheduling lists

If we keep track of when the i915_request.sched.link is on the HW
runlist, or in the priority queue we can simplify our interactions with
the request (such as during rescheduling). This also simplifies the next
patch where we introduce a new in-between list, for requests that are
ready but neither on the run list or in the queue.

v2: Update i915_sched_node.link explanation for current usage where it
is a link on both the queue and on the runlists.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-1-chris@chris-wilson.co.uk
(cherry picked from commit 672c368f9398042b629740cc9816e8e939eff2db)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agoMerge tag 'gvt-fixes-2020-02-12' of https://github.com/intel/gvt-linux into drm-intel...
Jani Nikula [Wed, 12 Feb 2020 14:50:04 +0000 (16:50 +0200)]
Merge tag 'gvt-fixes-2020-02-12' of https://github.com/intel/gvt-linux into drm-intel-next-fixes

gvt-fixes-2020-02-12

- fix possible high-order allocation fail for late load (Igor)
- fix one missed lock for ppgtt mm LRU list (Igor)

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
From: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200212065912.GB4997@zhen-hp.sh.intel.com
4 years agodrm/i915/gem: Tighten checks and acquiring the mmap object
Chris Wilson [Thu, 30 Jan 2020 14:39:31 +0000 (14:39 +0000)]
drm/i915/gem: Tighten checks and acquiring the mmap object

Make sure we hold the rcu lock as we acquire the rcu protected reference
of the object when looking it up from the associated mmap vma.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1083
Fixes: cc662126b413 ("drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200130143931.1906301-1-chris@chris-wilson.co.uk
(cherry picked from commit 280d14a69da2e71f43408537c008f2775d5e5360)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915: Fix preallocated barrier list append
José Roberto de Souza [Wed, 29 Jan 2020 23:23:45 +0000 (15:23 -0800)]
drm/i915: Fix preallocated barrier list append

Only the first and the last nodes were being added to
ref->preallocated_barriers.

Renaming variables to make it more easy to read.

Fixes: 841350223816 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200129232345.84512-1-jose.souza@intel.com
(cherry picked from commit d4c3c0b8221a72107eaf35c80c40716b81ca463e)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915/gt: Acquire ce->active before ce->pin_count/ce->pin_mutex
Chris Wilson [Mon, 27 Jan 2020 15:28:29 +0000 (15:28 +0000)]
drm/i915/gt: Acquire ce->active before ce->pin_count/ce->pin_mutex

Similar to commit ac0e331a628b ("drm/i915: Tighten atomicity of
i915_active_acquire vs i915_active_release") we have the same race of
trying to pin the context underneath a mutex while allowing the
decrement to be atomic outside of that mutex. This leads to the problem
where two threads may simultaneously try to pin the context and the
second not notice that they needed to repin the context.

<2> [198.669621] kernel BUG at drivers/gpu/drm/i915/gt/intel_timeline.c:387!
<4> [198.669703] invalid opcode: 0000 [#1] PREEMPT SMP PTI
<4> [198.669712] CPU: 0 PID: 1246 Comm: gem_exec_create Tainted: G     U  W         5.5.0-rc6-CI-CI_DRM_7755+ #1
<4> [198.669723] Hardware name:  /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
<4> [198.669776] RIP: 0010:timeline_advance+0x7b/0xe0 [i915]
<4> [198.669785] Code: 00 48 c7 c2 10 f1 46 a0 48 c7 c7 70 1b 32 a0 e8 bb dd e7 e0 bf 01 00 00 00 e8 d1 af e7 e0 31 f6 bf 09 00 00 00 e8 35 ef d8 e0 <0f> 0b 48 c7 c1 48 fa 49 a0 ba 84 01 00 00 48 c7 c6 10 f1 46 a0 48
<4> [198.669803] RSP: 0018:ffffc900004c3a38 EFLAGS: 00010296
<4> [198.669810] RAX: ffff888270b35140 RBX: ffff88826f32ee00 RCX: 0000000000000006
<4> [198.669818] RDX: 00000000000017c5 RSI: 0000000000000000 RDI: 0000000000000009
<4> [198.669826] RBP: ffffc900004c3a64 R08: 0000000000000000 R09: 0000000000000000
<4> [198.669834] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88826f9b5980
<4> [198.669841] R13: 0000000000000cc0 R14: ffffc900004c3dc0 R15: ffff888253610068
<4> [198.669849] FS:  00007f63e663fe40(0000) GS:ffff888276c00000(0000) knlGS:0000000000000000
<4> [198.669857] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [198.669864] CR2: 00007f171f8e39a8 CR3: 000000026b1f6005 CR4: 00000000003606f0
<4> [198.669872] Call Trace:
<4> [198.669924]  intel_timeline_get_seqno+0x12/0x40 [i915]
<4> [198.669977]  __i915_request_create+0x76/0x5a0 [i915]
<4> [198.670024]  i915_request_create+0x86/0x1c0 [i915]
<4> [198.670068]  i915_gem_do_execbuffer+0xbf2/0x2500 [i915]
<4> [198.670082]  ? __lock_acquire+0x460/0x15d0
<4> [198.670128]  i915_gem_execbuffer2_ioctl+0x11f/0x470 [i915]
<4> [198.670171]  ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915]
<4> [198.670181]  drm_ioctl_kernel+0xa7/0xf0
<4> [198.670188]  drm_ioctl+0x2e1/0x390
<4> [198.670233]  ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915]

Fixes: 841350223816 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin")
References: ac0e331a628b ("drm/i915: Tighten atomicity of i915_active_acquire vs i915_active_release")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200127152829.2842149-1-chris@chris-wilson.co.uk
(cherry picked from commit e5429340bfa2dc43a07c3329e0c30cdae4cc0b35)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915: Tighten atomicity of i915_active_acquire vs i915_active_release
Chris Wilson [Sun, 26 Jan 2020 10:23:43 +0000 (10:23 +0000)]
drm/i915: Tighten atomicity of i915_active_acquire vs i915_active_release

As we use a mutex to serialise the first acquire (as it may be a lengthy
operation), but only an atomic decrement for the release, we have to
be careful in case a second thread races and completes both
acquire/release as the first finishes its acquire.

Thread A Thread B
i915_active_acquire i915_active_acquire
  atomic_read() == 0   atomic_read() == 0
  mutex_lock()   mutex_lock()
  atomic_read() == 0
    ref->active();
  atomic_inc()
  mutex_unlock()
  atomic_read() == 1
i915_active_release
  atomic_dec_and_test() -> 0
    ref->retire()
  atomic_inc() -> 1
  mutex_unlock()

So thread A has acquired the ref->active_count but since the ref was
still active at the time, it did not initialise it. By switching the
check inside the mutex to an atomic increment only if already active, we
close the race.

Fixes: c9ad602feabe ("drm/i915: Split i915_active.mutex into an irq-safe spinlock for the rbtree")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200126102346.1877661-3-chris@chris-wilson.co.uk
(cherry picked from commit ac0e331a628b5ded087eab09fad2ffb082ac61ba)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915: Stub out i915_gpu_coredump_put
Chris Wilson [Fri, 24 Jan 2020 19:22:55 +0000 (19:22 +0000)]
drm/i915: Stub out i915_gpu_coredump_put

i915_gpu_coreddump_put is currently only defined if
CONFIG_DRM_I915_CAPTURE_ERROR is enabled, provide a stub otherwise.

Reported-by: Mike Lothian <mike@fireburn.co.uk>
Fixes: 742379c0c400 ("drm/i915: Start chopping up the GPU error capture")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mike Lothian <mike@fireburn.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200124192255.541355-1-chris@chris-wilson.co.uk
(cherry picked from commit 7e36505d0cf82f2920f2fd22ebb14a8b540396a3)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915: Check activity on i915_vma after confirming pin_count==0
Chris Wilson [Thu, 23 Jan 2020 22:44:58 +0000 (22:44 +0000)]
drm/i915: Check activity on i915_vma after confirming pin_count==0

Only assert that the i915_vma is now idle if and only if no other pins
are present. If another user has the i915_vma pinned, they may submit
more work to the i915_vma skipping the vm->mutex used to serialise the
unbind. We need to wait again, if we want to continue and unbind this
vma.

However, if we own the i915_vma (we hold the vm->mutex for the unbind
and the pin_count is 0), we can assert that the vma remains idle as we
unbind.

Fixes: 2850748ef876 ("drm/i915: Pull i915_vma_pin under the vm->mutex")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/530
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200123224459.38128-1-chris@chris-wilson.co.uk
(cherry picked from commit 60e94557fff1f5514c7fc4da7ddc2c7a13ffff26)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915/gem: Detect overflow in calculating dumb buffer size
Chris Wilson [Thu, 23 Jan 2020 12:59:34 +0000 (12:59 +0000)]
drm/i915/gem: Detect overflow in calculating dumb buffer size

To multiply 2 u32 numbers to generate a u64 in C requires a bit of
forewarning for the compiler.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200123125934.1401755-1-chris@chris-wilson.co.uk
(cherry picked from commit 0f8f8a64300092852b9361cd835395ee71e6a7d6)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915: Don't show the blank process name for internal/simulated errors
Chris Wilson [Tue, 21 Jan 2020 13:21:07 +0000 (13:21 +0000)]
drm/i915: Don't show the blank process name for internal/simulated errors

For a simulated preemption reset, we don't populate the request and so
do not fill in the guilty context name.

[   79.991294] i915 0000:00:02.0: GPU HANG: ecode 9:1:e757fefe, in  [0]

Just don't mention the empty string in the logs!

Fixes: 742379c0c400 ("drm/i915: Start chopping up the GPU error capture")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200121132107.267709-1-chris@chris-wilson.co.uk
(cherry picked from commit 29baf3ae8daa4c673de58106ff41c7236dff57f4)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915/gem: Store mmap_offsets in an rbtree rather than a plain list
Chris Wilson [Mon, 20 Jan 2020 10:49:22 +0000 (10:49 +0000)]
drm/i915/gem: Store mmap_offsets in an rbtree rather than a plain list

Currently we create a new mmap_offset for every call to
mmap_offset_ioctl. This exposes ourselves to an abusive client that may
simply create new mmap_offsets ad infinitum, which will exhaust physical
memory and the virtual address space. In addition to the exhaustion, a
very long linear list of mmap_offsets causes other clients using the
object to incur long list walks -- these long lists can also be
generated by simply having many clients generate their own mmap_offset.

However, we can simply use the drm_vma_node itself to manage the file
association (allow/revoke) dropping our need to keep an mmo per-file.
Then if we keep a small rbtree of per-type mmap_offsets, we can lookup
duplicate requests quickly.

Fixes: cc662126b413 ("drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200120104924.4000706-3-chris@chris-wilson.co.uk
(cherry picked from commit 7865559872074a9ab169c87915504661d630addf)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915/execlists: Leave resetting ring to intel_ring
Chris Wilson [Wed, 15 Jan 2020 17:58:29 +0000 (17:58 +0000)]
drm/i915/execlists: Leave resetting ring to intel_ring

We need to allow concurrent intel_context_unpin, which means avoiding
doing destructive operations like intel_ring_reset(). This was already
fixed for intel_ring_unpin() in commit 0725d9a31869 ("drm/i915/gt: Make
intel_ring_unpin() safe for concurrent pint"), but I overlooked that
execlists_context_unpin() also made the same mistake.

Reported-by: Matthew Brost <matthew.brost@intel.com>
Fixes: 841350223816 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin")
References: 0725d9a31869 ("drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200115175829.2761329-1-chris@chris-wilson.co.uk
(cherry picked from commit f3c0efc9fe7a4e61544034f525348a3aa86ac5aa)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915/gt: Use the BIT when checking the flags, not the index
Chris Wilson [Wed, 15 Jan 2020 12:25:09 +0000 (12:25 +0000)]
drm/i915/gt: Use the BIT when checking the flags, not the index

In converting over to using set_bit()/test_bit(), when manually
inspecting the rq->fence.flags, we need to use BIT().

Fixes: e1c31fb5dde3 ("drm/i915: Merge i915_request.flags with i915_request.fence.flags")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200115122509.2673075-1-chris@chris-wilson.co.uk
(cherry picked from commit 72ff2b8d5f2dcb09bfa37b902c23311eec426496)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915/selftests: Add a mock i915_vma to the mock_ring
Chris Wilson [Tue, 14 Jan 2020 16:00:30 +0000 (16:00 +0000)]
drm/i915/selftests: Add a mock i915_vma to the mock_ring

Add a i915_vma to the mock_engine/mock_ring so that the core code can
always assume the presence of ring->vma.

Fixes: 8ccfc20a7d56 ("drm/i915/gt: Mark ring->vma as active while pinned")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200114160030.2468927-1-chris@chris-wilson.co.uk
(cherry picked from commit b63b4feaef7363d2cf46dd76bb6e87e060b2b0de)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915: Make a copy of the ggtt view for slave plane
Ville Syrjälä [Fri, 10 Jan 2020 18:32:23 +0000 (20:32 +0200)]
drm/i915: Make a copy of the ggtt view for slave plane

intel_prepare_plane_fb() will always pin plane_state->hw.fb whenever
it is present. We copy that from the master plane to the slave plane,
but we fail to copy the corresponding ggtt view. Thus when it comes time
to pin the slave plane's fb we use some stale ggtt view left over from
the last time the plane was used as a non-slave plane. If that previous
use involved 90/270 degree rotation or remapping we'll try to shuffle
the pages of the new fb around accordingingly. However the new
fb may be backed by a bo with less pages than what the ggtt view
rotation/remapped info requires, and so we we trip a GEM_BUG().

Steps to reproduce on icl:
1. plane 1: whatever
   plane 6: largish !NV12 fb + 90 degree rotation
2. plane 1: smallish NV12 fb
   plane 6: make invisible so it gets slaved to plane 1
3. GEM_BUG()

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/issues/951
Fixes: 1f594b209fe1 ("drm/i915: Remove special case slave handling during hw programming, v3.")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200110183228.8199-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit 103605e0d1e77cfb5d0f5a9e8aba7d97f1b49339)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915/gem: Take local vma references for the parser
Chris Wilson [Mon, 13 Jan 2020 15:45:55 +0000 (15:45 +0000)]
drm/i915/gem: Take local vma references for the parser

Take and hold a reference to each of the vma (and their objects) as we
process them with the cmdparser. This stops them being freed during the
work if the GEM execbuf is interrupted and the request we expected to
keep the objects alive is incomplete.

Fixes: 686c7c35abc2 ("drm/i915/gem: Asynchronous cmdparser")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/970
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200113154555.1909639-1-chris@chris-wilson.co.uk
(cherry picked from commit 36c8e356a76e147f0b631fd29838147c01b50d04)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915/pmu: Correct the rc6 offset upon enabling
Chris Wilson [Tue, 14 Jan 2020 10:56:47 +0000 (10:56 +0000)]
drm/i915/pmu: Correct the rc6 offset upon enabling

The rc6 residency starts ticking from 0 from BIOS POST, but the kernel
starts measuring the time from its boot. If we start measuruing
I915_PMU_RC6_RESIDENCY while the GT is idle, we start our sampling from
0 and then upon first activity (park/unpark) add in all the rc6
residency since boot. After the first park with the sampler engaged, the
sleep/active counters are aligned.

v2: With a wakeref to be sure

Closes: https://gitlab.freedesktop.org/drm/intel/issues/973
Fixes: df6a42053513 ("drm/i915/pmu: Ensure monotonic rc6")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200114105648.2172026-1-chris@chris-wilson.co.uk
(cherry picked from commit f4e9894b6952a2819937f363cd42e7cd7894a1e4)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915/gvt: more locking for ppgtt mm LRU list
Igor Druzhinin [Mon, 3 Feb 2020 15:07:01 +0000 (15:07 +0000)]
drm/i915/gvt: more locking for ppgtt mm LRU list

When the lock was introduced in commit 72aabfb862e40 ("drm/i915/gvt: Add mutual
lock for ppgtt mm LRU list") one place got lost.

Fixes: 72aabfb862e4 ("drm/i915/gvt: Add mutual lock for ppgtt mm LRU list")
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1580742421-25194-1-git-send-email-igor.druzhinin@citrix.com
4 years agodrm/i915/gvt: fix high-order allocation failure on late load
Igor Druzhinin [Wed, 22 Jan 2020 20:10:24 +0000 (20:10 +0000)]
drm/i915/gvt: fix high-order allocation failure on late load

If the module happens to be loaded later at runtime there is a chance
memory is already fragmented enough to fail allocation of firmware
blob storage and consequently GVT init. Since it doesn't seem to be
necessary to have the blob contiguous, use vmalloc() instead to avoid
the issue.

Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1579723824-25711-1-git-send-email-igor.druzhinin@citrix.com
4 years agodrm/i915: Fix i915_error_state_store error defination
Zhang Xiaoxu [Fri, 17 Jan 2020 07:34:36 +0000 (15:34 +0800)]
drm/i915: Fix i915_error_state_store error defination

Since commit 742379c0c4001 ("drm/i915: Start chopping up the GPU error
capture"), function 'i915_error_state_store' was defined and used with
only one parameter.

But if no 'CONFIG_DRM_I915_CAPTURE_ERROR', this function was defined
with two parameter.

This may lead compile error. This patch fix it.

Fixes: 742379c0c400 ("drm/i915: Start chopping up the GPU error capture")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200117073436.6507-1-zhangxiaoxu5@huawei.com
(cherry picked from commit 04062c58faafddf62006c6f8e5077dc050e8207e)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915/bios: Fix the timing parameters
Vandita Kulkarni [Fri, 24 Jan 2020 12:58:29 +0000 (18:28 +0530)]
drm/i915/bios: Fix the timing parameters

Fix htotal and vtotal parameters derived from DTD block of VBT. The
values miss the back porch.

Fixes: 33ef6d4fd8df ("drm/i915/vbt: Handle generic DTD block")
Signed-off-by: Vandita Kulkarni <vandita.kulkarni@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200124125829.16973-1-vandita.kulkarni@intel.com
(cherry picked from commit ad278f358446707d03a1fe89f880e6ac80ca06cd)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915/dsi: Ensure that the ACPI adapter lookup overrides the bus num
Vivek Kasireddy [Sat, 18 Jan 2020 00:58:48 +0000 (16:58 -0800)]
drm/i915/dsi: Ensure that the ACPI adapter lookup overrides the bus num

Remove the i2c_bus_num >= 0 check from the adapter lookup function
as this would prevent ACPI bus number override. This check was mainly
there to return early if the bus number has already been found but we
anyway return in the next line if the slave address does not match.

Fixes: 8cbf89db2941 ("drm/i915/dsi: Parse the I2C element from the VBT MIPI sequence block (v3)")
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Nabendu Maiti <nabendu.bikash.maiti@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Bob Paauwe <bob.j.paauwe@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200118005848.20382-1-vivek.kasireddy@intel.com
(cherry picked from commit de409661c4c90d63cfc64579edbad0a6b10bd50d)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915: Fix post-fastset modeset check for port sync
Ville Syrjälä [Wed, 15 Jan 2020 19:08:09 +0000 (21:08 +0200)]
drm/i915: Fix post-fastset modeset check for port sync

The post-fastset "does anyone still need a full modeset?" for
port sync looks busted. The outer loop bails out of a full modeset
is still needed by the current crtc, and then we skip forcing
a full modeset on the related crtcs. That's totally the opposite
of what we want.

The MST path has the logic mostly the other way around so it
looks correct. To fix the port sync case let's follow the MST
logic for both. So, if the current crtc already needs a modeset
we do nothing. otherwise we check if any of the related crtcs
needs a modeset, and if so we force a full modeset for the
current crtc.

And while at let's change the else if to a plain if to so
we don't have needless coupling between the MST and port sync
checks.

Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Fixes: 05a8e45136ca ("drm/i915/display: Use external dependency loop for port sync")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200115190813.17971-1-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
(cherry picked from commit d0eed1545fe75f115a548691a008e94b0e7abc45)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agodrm/i915/dsi: Lookup the i2c bus from ACPI NS only if CONFIG_ACPI=y (v2)
Vivek Kasireddy [Wed, 15 Jan 2020 01:23:05 +0000 (17:23 -0800)]
drm/i915/dsi: Lookup the i2c bus from ACPI NS only if CONFIG_ACPI=y (v2)

Perform the i2c bus/adapter lookup from ACPI Namespace only if ACPI is
enabled in the kernel config. If ACPI is not enabled or if the lookup
fails, we'll fallback to using the VBT for identifying the i2c bus.

v2: Add fixes tag (Jani)

Fixes: 8cbf89db2941 ("drm/i915/dsi: Parse the I2C element from the VBT MIPI sequence block (v3)")
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Nabendu Maiti <nabendu.bikash.maiti@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Bob Paauwe <bob.j.paauwe@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200115012305.27395-1-vivek.kasireddy@intel.com
(cherry picked from commit 960287ca58fd549af9826ff1cb735fe17d031486)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
4 years agoMerge tag 'amd-drm-next-5.6-2020-02-05' of git://people.freedesktop.org/~agd5f/linux...
Dave Airlie [Fri, 7 Feb 2020 02:29:35 +0000 (12:29 +1000)]
Merge tag 'amd-drm-next-5.6-2020-02-05' of git://people.freedesktop.org/~agd5f/linux into drm-next

amd-drm-next-5.6-2020-02-05:

amdgpu:
- EDC fixes for Arcturus
- GDDR6 memory training fixe
- Fix for reading gfx clockgating registers while in GFXOFF state
- i2c freq fixes
- Misc display fixes
- TLB invalidation fix when using semaphores
- VCN 2.5 instancing fixes
- Switch raven1 gfxoff to a blacklist
- Coreboot workaround for KV/KB
- Root cause dongle fixes for display and revert workaround
- Enable GPU reset for renoir and navi
- Navi overclocking fixes
- Fix up confusing warnings in display clock validation on raven

amdkfd:
- SDMA fix

radeon:
- Misc LUT fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200206035458.3894-1-alexander.deucher@amd.com
4 years agoMerge branch 'linux-5.6' of git://github.com/skeggsb/linux into drm-next
Dave Airlie [Fri, 7 Feb 2020 02:23:24 +0000 (12:23 +1000)]
Merge branch 'linux-5.6' of git://github.com/skeggsb/linux into drm-next

Just a couple of fixes to Volta/Turing modesetting on some systems.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Ben Skeggs <skeggsb@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/
4 years agoMerge tag 'drm/tegra/for-5.6-rc1-fixes' of git://anongit.freedesktop.org/tegra/linux...
Dave Airlie [Fri, 7 Feb 2020 02:22:23 +0000 (12:22 +1000)]
Merge tag 'drm/tegra/for-5.6-rc1-fixes' of git://anongit.freedesktop.org/tegra/linux into drm-next

drm/tegra: Fixes for v5.6-rc1

These are a couple of quick fixes for regressions that were found during
the first two weeks of the merge window.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thierry Reding <thierry.reding@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200206172753.2185390-1-thierry.reding@gmail.com
4 years agogpu: host1x: Set DMA direction only for DMA-mapped buffer objects
Thierry Reding [Tue, 4 Feb 2020 13:59:26 +0000 (14:59 +0100)]
gpu: host1x: Set DMA direction only for DMA-mapped buffer objects

The DMA direction is only used by the DMA API, so there is no use in
setting it when a buffer object isn't mapped with the DMA API.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
4 years agodrm/tegra: Reuse IOVA mapping where possible
Thierry Reding [Tue, 4 Feb 2020 13:59:25 +0000 (14:59 +0100)]
drm/tegra: Reuse IOVA mapping where possible

This partially reverts the DMA API support that was recently merged
because it was causing performance regressions on older Tegra devices.
Unfortunately, the cache maintenance performed by dma_map_sg() and
dma_unmap_sg() causes performance to drop by a factor of 10.

The right solution for this would be to cache mappings for buffers per
consumer device, but that's a bit involved. Instead, we simply revert to
the old behaviour of sharing IOVA mappings when we know that devices can
do so (i.e. they share the same IOMMU domain).

Cc: <stable@vger.kernel.org> # v5.5
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
4 years agodrm/tegra: Relax IOMMU usage criteria on old Tegra
Thierry Reding [Tue, 4 Feb 2020 13:59:24 +0000 (14:59 +0100)]
drm/tegra: Relax IOMMU usage criteria on old Tegra

Older Tegra devices only allow addressing 32 bits of memory, so whether
or not the host1x is attached to an IOMMU doesn't matter. host1x IOMMU
attachment is only needed on devices that can address memory beyond the
32-bit boundary and where the host1x doesn't support the wide GATHER
opcode that allows it to access buffers at higher addresses.

Cc: <stable@vger.kernel.org> # v5.5
Signed-off-by: Thierry Reding <treding@nvidia.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
4 years agodrm/amd/dm/mst: Ignore payload update failures
Lyude Paul [Fri, 24 Jan 2020 19:10:46 +0000 (14:10 -0500)]
drm/amd/dm/mst: Ignore payload update failures

Disabling a display on MST can potentially happen after the entire MST
topology has been removed, which means that we can't communicate with
the topology at all in this scenario. Likewise, this also means that we
can't properly update payloads on the topology and as such, it's a good
idea to ignore payload update failures when disabling displays.
Currently, amdgpu makes the mistake of halting the payload update
process when any payload update failures occur, resulting in leaving
DC's local copies of the payload tables out of date.

This ends up causing problems with hotplugging MST topologies, and
causes modesets on the second hotplug to fail like so:

[drm] Failed to updateMST allocation table forpipe idx:1
------------[ cut here ]------------
WARNING: CPU: 5 PID: 1511 at
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link.c:2677
update_mst_stream_alloc_table+0x11e/0x130 [amdgpu]
Modules linked in: cdc_ether usbnet fuse xt_conntrack nf_conntrack
nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4
nft_counter nft_compat nf_tables nfnetlink tun bridge stp llc sunrpc
vfat fat wmi_bmof uvcvideo snd_hda_codec_realtek snd_hda_codec_generic
snd_hda_codec_hdmi videobuf2_vmalloc snd_hda_intel videobuf2_memops
videobuf2_v4l2 snd_intel_dspcfg videobuf2_common crct10dif_pclmul
snd_hda_codec videodev crc32_pclmul snd_hwdep snd_hda_core
ghash_clmulni_intel snd_seq mc joydev pcspkr snd_seq_device snd_pcm
sp5100_tco k10temp i2c_piix4 snd_timer thinkpad_acpi ledtrig_audio snd
wmi soundcore video i2c_scmi acpi_cpufreq ip_tables amdgpu(O)
rtsx_pci_sdmmc amd_iommu_v2 gpu_sched mmc_core i2c_algo_bit ttm
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm
crc32c_intel serio_raw hid_multitouch r8152 mii nvme r8169 nvme_core
rtsx_pci pinctrl_amd
CPU: 5 PID: 1511 Comm: gnome-shell Tainted: G           O      5.5.0-rc7Lyude-Test+ #4
Hardware name: LENOVO FA495SIT26/FA495SIT26, BIOS R12ET22W(0.22 ) 01/31/2019
RIP: 0010:update_mst_stream_alloc_table+0x11e/0x130 [amdgpu]
Code: 28 00 00 00 75 2b 48 8d 65 e0 5b 41 5c 41 5d 41 5e 5d c3 0f b6 06
49 89 1c 24 41 88 44 24 08 0f b6 46 01 41 88 44 24 09 eb 93 <0f> 0b e9
2f ff ff ff e8 a6 82 a3 c2 66 0f 1f 44 00 00 0f 1f 44 00
RSP: 0018:ffffac428127f5b0 EFLAGS: 00010202
RAX: 0000000000000002 RBX: ffff8d1e166eee80 RCX: 0000000000000000
RDX: ffffac428127f668 RSI: ffff8d1e166eee80 RDI: ffffac428127f610
RBP: ffffac428127f640 R08: ffffffffc03d94a8 R09: 0000000000000000
R10: ffff8d1e24b02000 R11: ffffac428127f5b0 R12: ffff8d1e1b83d000
R13: ffff8d1e1bea0b08 R14: 0000000000000002 R15: 0000000000000002
FS:  00007fab23ffcd80(0000) GS:ffff8d1e28b40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f151f1711e8 CR3: 00000005997c0000 CR4: 00000000003406e0
Call Trace:
 ? mutex_lock+0xe/0x30
 dc_link_allocate_mst_payload+0x9a/0x210 [amdgpu]
 ? dm_read_reg_func+0x39/0xb0 [amdgpu]
 ? core_link_enable_stream+0x656/0x730 [amdgpu]
 core_link_enable_stream+0x656/0x730 [amdgpu]
 dce110_apply_ctx_to_hw+0x58e/0x5d0 [amdgpu]
 ? dcn10_verify_allow_pstate_change_high+0x1d/0x280 [amdgpu]
 ? dcn10_wait_for_mpcc_disconnect+0x3c/0x130 [amdgpu]
 dc_commit_state+0x292/0x770 [amdgpu]
 ? add_timer+0x101/0x1f0
 ? ttm_bo_put+0x1a1/0x2f0 [ttm]
 amdgpu_dm_atomic_commit_tail+0xb59/0x1ff0 [amdgpu]
 ? amdgpu_move_blit.constprop.0+0xb8/0x1f0 [amdgpu]
 ? amdgpu_bo_move+0x16d/0x2b0 [amdgpu]
 ? ttm_bo_handle_move_mem+0x118/0x570 [ttm]
 ? ttm_bo_validate+0x134/0x150 [ttm]
 ? dm_plane_helper_prepare_fb+0x1b9/0x2a0 [amdgpu]
 ? _cond_resched+0x15/0x30
 ? wait_for_completion_timeout+0x38/0x160
 ? _cond_resched+0x15/0x30
 ? wait_for_completion_interruptible+0x33/0x190
 commit_tail+0x94/0x130 [drm_kms_helper]
 drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
 drm_atomic_helper_set_config+0x70/0xb0 [drm_kms_helper]
 drm_mode_setcrtc+0x194/0x6a0 [drm]
 ? _cond_resched+0x15/0x30
 ? mutex_lock+0xe/0x30
 ? drm_mode_getcrtc+0x180/0x180 [drm]
 drm_ioctl_kernel+0xaa/0xf0 [drm]
 drm_ioctl+0x208/0x390 [drm]
 ? drm_mode_getcrtc+0x180/0x180 [drm]
 amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
 do_vfs_ioctl+0x458/0x6d0
 ksys_ioctl+0x5e/0x90
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x55/0x1b0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fab2121f87b
Code: 0f 1e fa 48 8b 05 0d 96 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff
ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 8b 0d dd 95 2c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffd045f9068 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffd045f90a0 RCX: 00007fab2121f87b
RDX: 00007ffd045f90a0 RSI: 00000000c06864a2 RDI: 000000000000000b
RBP: 00007ffd045f90a0 R08: 0000000000000000 R09: 000055dbd2985d10
R10: 000055dbd2196280 R11: 0000000000000246 R12: 00000000c06864a2
R13: 000000000000000b R14: 0000000000000000 R15: 000055dbd2196280
---[ end trace 6ea888c24d2059cd ]---

Note as well, I have only been able to reproduce this on setups with 2
MST displays.

Changes since v1:
* Don't return false when part 1 or part 2 of updating the payloads
  fails, we don't want to abort at any step of the process even if
  things fail

Reviewed-by: Mikita Lipski <Mikita.Lipski@amd.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu: update default voltage for boot od table for navi1x
Alex Deucher [Tue, 4 Feb 2020 14:07:19 +0000 (09:07 -0500)]
drm/amdgpu: update default voltage for boot od table for navi1x

It needed to be updated as well so it will show the proper values
if you reset to the defaults.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/1020
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu/smu10: fix smu10_get_clock_by_type_with_voltage
Alex Deucher [Wed, 29 Jan 2020 17:42:57 +0000 (12:42 -0500)]
drm/amdgpu/smu10: fix smu10_get_clock_by_type_with_voltage

Cull out 0 clocks to avoid a warning in DC.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/963
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu/smu10: fix smu10_get_clock_by_type_with_latency
Alex Deucher [Tue, 28 Jan 2020 18:19:51 +0000 (13:19 -0500)]
drm/amdgpu/smu10: fix smu10_get_clock_by_type_with_latency

Only send non-0 clocks to DC for validation.  This mirrors
what the windows driver does.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/963
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu/display: handle multiple numbers of fclks in dcn_calcs.c (v2)
Alex Deucher [Tue, 28 Jan 2020 19:39:45 +0000 (14:39 -0500)]
drm/amdgpu/display: handle multiple numbers of fclks in dcn_calcs.c (v2)

We might get different numbers of clocks from powerplay depending
on what the OEM has populated.

v2: add assert for at least one level

Bug: https://gitlab.freedesktop.org/drm/amd/issues/963
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu: fetch default VDDC curve voltages (v2)
Alex Deucher [Sat, 25 Jan 2020 18:30:45 +0000 (13:30 -0500)]
drm/amdgpu: fetch default VDDC curve voltages (v2)

Ask the SMU for the default VDDC curve voltage values.  This
properly reports the VDDC values in the OD interface.

v2: only update if the original values are 0

Bug: https://gitlab.freedesktop.org/drm/amd/issues/1020
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.5.x
4 years agodrm/amdgpu/smu_v11_0: Correct behavior of restoring default tables (v2)
Matt Coffin [Sat, 25 Jan 2020 18:04:05 +0000 (13:04 -0500)]
drm/amdgpu/smu_v11_0: Correct behavior of restoring default tables (v2)

Previously, the syfs functionality for restoring the default powerplay
table was sourcing it's information from the currently-staged powerplay
table.

This patch adds a step to cache the first overdrive table that we see on
boot, so that it can be used later to "restore" the powerplay table

v2: sqaush my original with Matt's fix

Bug: https://gitlab.freedesktop.org/drm/amd/issues/1020
Signed-off-by: Matt Coffin <mcoffin13@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.5.x
4 years agodrm/amdgpu/navi10: add OD_RANGE for navi overclocking
Alex Deucher [Sat, 25 Jan 2020 16:27:06 +0000 (11:27 -0500)]
drm/amdgpu/navi10: add OD_RANGE for navi overclocking

So users can see the range of valid values.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/1020
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.5.x
4 years agodrm/amdgpu/navi: fix index for OD MCLK
Alex Deucher [Sat, 25 Jan 2020 16:51:41 +0000 (11:51 -0500)]
drm/amdgpu/navi: fix index for OD MCLK

You can only adjust the max mclk, not the min.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/1020
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.5.x
4 years agodrm/amd/display: Fix HW/SW state mismatch
Bhawanpreet Lakha [Fri, 6 Dec 2019 18:16:08 +0000 (13:16 -0500)]
drm/amd/display: Fix HW/SW state mismatch

[Why]
When we disable a connector we don't explicitly remove it from the module so the
display is still cached(SW) in the hdcp_module.

SST: no issues because we can only have 1 display per link

MST: We have x displays per link, now if we disable 1 we don't remove it from the
module so the module has x display cached(SW).

If we try to enable HDCP, psp verification will fail because we are reporting x
displays while the HW only has x-1 display enabled

[How]
Check the callback for when we disable stream and call remove display.

Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amd/display: Fix a typo when computing dsc configuration
Mikita Lipski [Fri, 31 Jan 2020 14:51:23 +0000 (09:51 -0500)]
drm/amd/display: Fix a typo when computing dsc configuration

[why]
Remove a backslash symbol accidentally left in increase bpp function
when computing mst dsc configuration.

Signed-off-by: Mikita Lipski <mikita.lipski@amd.com>
Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amd/powerplay: fix navi10 system intermittent reboot issue V2
Evan Quan [Thu, 30 Jan 2020 08:46:38 +0000 (16:46 +0800)]
drm/amd/powerplay: fix navi10 system intermittent reboot issue V2

This workaround is needed only for Navi10 12 Gbps SKUs.

V2: added SMU firmware version guard

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
4 years agodrm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode
Yong Zhao [Thu, 30 Jan 2020 00:55:47 +0000 (19:55 -0500)]
drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode

The sdma_queue_count increment should be done before
execute_queues_cpsch(), which calls pm_calc_rlib_size() where
sdma_queue_count is used to calculate whether over_subscription is
triggered.

With the previous code, when a SDMA queue is created,
compute_queue_count in pm_calc_rlib_size() is one more than the
actual compute queue number, because the queue_count has been
incremented while sdma_queue_count has not. This patch fixes that.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amd/display: Only enable cursor on pipes that need it
Nicholas Kazlauskas [Thu, 30 Jan 2020 18:29:05 +0000 (13:29 -0500)]
drm/amd/display: Only enable cursor on pipes that need it

[Why]
In current code we're essentially drawing the cursor on every pipe
that contains it. This only works when the planes have the same
scaling for src to dest rect, otherwise we'll get "double cursor" where
one cursor is incorrectly filtered and offset from the real position.

[How]
Without dedicated cursor planes on DCN we require at least one pipe
that matches the scaling of the current timing.

This is an optimization and workaround for the most common case where
the top-most plane is not scaled but the bottom-most plane is scaled.

Whenever a pipe has a parent pipe in the blending tree whose recout
fully contains the current pipe we can disable the pipe.

This only applies when the pipe is actually visible of course.

Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/nouveau/kms/gv100-: avoid sending a core update until the first modeset
Ben Skeggs [Mon, 3 Feb 2020 08:37:07 +0000 (03:37 -0500)]
drm/nouveau/kms/gv100-: avoid sending a core update until the first modeset

The OR routing logic in NVKM does not expect to receive supervisor
interrupts until the DD has provided consistent information on the
ORs it's using and the EVO/NVD assembly state to match.

The combination of changing window ownership + core channel update
during display init triggered a situation where we'd disconnect an
OR from the pad it was meant to still be driving on some systems.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
4 years agodrm/nouveau/kms/gv100-: move window ownership setup into modesetting path
Ben Skeggs [Mon, 3 Feb 2020 08:36:30 +0000 (03:36 -0500)]
drm/nouveau/kms/gv100-: move window ownership setup into modesetting path

For various complicated reasons, we need to avoid sending a core update
method during display init.  Something, which we've been required to do
on GV100 and up because we've been assigning windows to heads there and
the HW is rather picky about when that's allowed.

This moves window assignment into the modesetting path at a point where
it's much safer to send our first update methods to NVDisplay.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
4 years agodrm/nouveau/disp/gv100-: halt NV_PDISP_FE_RM_INTR_STAT_CTRL_DISP_ERROR storms
Ben Skeggs [Mon, 3 Feb 2020 06:58:45 +0000 (01:58 -0500)]
drm/nouveau/disp/gv100-: halt NV_PDISP_FE_RM_INTR_STAT_CTRL_DISP_ERROR storms

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
4 years agodrm/tegra: sor: Initialize runtime PM before use
Thierry Reding [Fri, 31 Jan 2020 16:59:10 +0000 (17:59 +0100)]
drm/tegra: sor: Initialize runtime PM before use

Commit fd67e9c6ed5a ("drm/tegra: Do not implement runtime PM") replaced
the generic runtime PM usage by a host1x bus-specific implementation in
order to work around some assumptions baked into runtime PM that are in
conflict with the requirements in the Tegra DRM driver.

Unfortunately the new runtime PM callbacks are not setup yet at the time
when the SOR driver first needs to resume the device to register the SOR
pad clock, and accesses to register will cause the system to hang.

Note that this only happens on Tegra124 and Tegra210 because those are
the only SoCs where the SOR pad clock is registered from the SOR driver.
Later generations use a SOR pad clock provided by the BPMP.

Fix this by moving the registration of the SOR pad clock after the
host1x client has been registered. That's somewhat suboptimal because
this could potentially, though it's very unlikely, cause the Tegra DRM
to be probed if the SOR happens to be the last subdevice to register,
only to be immediately removed again if the SOR pad output clock fails
to register. That's just a minor annoyance, though, and doesn't justify
implementing a workaround.

Fixes: fd67e9c6ed5a ("drm/tegra: Do not implement runtime PM")
Signed-off-by: Thierry Reding <treding@nvidia.com>
4 years agodrm/tegra: sor: Disable runtime PM on probe failure
Thierry Reding [Fri, 31 Jan 2020 16:59:09 +0000 (17:59 +0100)]
drm/tegra: sor: Disable runtime PM on probe failure

If the driver fails to probe, make sure to disable runtime PM again.

While at it, make the cleanup code in ->remove() symmetric.

Signed-off-by: Thierry Reding <treding@nvidia.com>
4 years agodrm/tegra: sor: Suspend on clock registration failure
Thierry Reding [Fri, 31 Jan 2020 16:59:08 +0000 (17:59 +0100)]
drm/tegra: sor: Suspend on clock registration failure

Make sure the SOR module is suspenden after we fail to register the SOR
pad output clock.

Signed-off-by: Thierry Reding <treding@nvidia.com>
4 years agoMerge branch 'ttm-prot-fix' of git://people.freedesktop.org/~thomash/linux into drm-next
Dave Airlie [Fri, 31 Jan 2020 05:18:21 +0000 (15:18 +1000)]
Merge branch 'ttm-prot-fix' of git://people.freedesktop.org/~thomash/linux into drm-next

A small fix for the long-standing ttm vm page protection hack.

Sent as a separate PR as it touches mm, has all acks in place.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellström (VMware) <thellstrom@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116102411.3056-1-thomas_os@shipmail.org
4 years agodrm/amdgpu/navi10: add mclk to navi10_get_clock_by_type_with_latency
Alex Deucher [Sat, 25 Jan 2020 16:30:25 +0000 (11:30 -0500)]
drm/amdgpu/navi10: add mclk to navi10_get_clock_by_type_with_latency

Doesn't seem to be used, but add it just in case.

Reviewed-by: Matt Coffin <mcoffin13@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu: Fix implicit enum conversion in gfx_v9_4_ras_error_inject
Nathan Chancellor [Thu, 30 Jan 2020 01:24:35 +0000 (18:24 -0700)]
drm/amdgpu: Fix implicit enum conversion in gfx_v9_4_ras_error_inject

Clang warns:

../drivers/gpu/drm/amd/amdgpu/gfx_v9_4.c:967:35: warning: implicit
conversion from enumeration type 'enum amdgpu_ras_block' to different
enumeration type 'enum ta_ras_block' [-Wenum-conversion]
        block_info.block_id = info->head.block;
                            ~ ~~~~~~~~~~~^~~~~
1 warning generated.

Use the function added in commit 828cfa29093f ("drm/amdgpu: Fix amdgpu
ras to ta enums conversion") that handles this conversion explicitly.

Fixes: 4c461d89db4f ("drm/amdgpu: add RAS support for the gfx block of Arcturus")
Link: https://github.com/ClangBuiltLinux/linux/issues/849
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agoradeon: completely remove lut leftovers
Daniel Vetter [Wed, 29 Jan 2020 08:09:05 +0000 (09:09 +0100)]
radeon: completely remove lut leftovers

This is an oversight from

commit 42585395ebc1034a98937702849669f17eadb35f
Author: Peter Rosin <peda@axentia.se>
Date:   Thu Jul 13 18:25:36 2017 +0200

    drm: radeon: remove dead code and pointless local lut storage

v2: Also remove leftover local variable.

Cc: Peter Rosin <peda@axentia.se>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amd/display: Move drm_dp_mst_atomic_check() to the front of dc_validate_global_st...
Zhan Liu [Tue, 28 Jan 2020 21:38:53 +0000 (16:38 -0500)]
drm/amd/display: Move drm_dp_mst_atomic_check() to the front of dc_validate_global_state()

[Why]
Need to do atomic check first, then validate global state.
If not, when connecting both MST and HDMI displays and
set a bad mode via xrandr, system will hang.

[How]
Move drm_dp_mst_atomic_check() to the front of
dc_validate_global_state().

Signed-off-by: Zhan Liu <zhan.liu@amd.com>
Reviewed-by: Mikita Lipski <mikita.lipski@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agoradeon: insert 10ms sleep in dce5_crtc_load_lut
Daniel Vetter [Tue, 28 Jan 2020 16:09:52 +0000 (17:09 +0100)]
radeon: insert 10ms sleep in dce5_crtc_load_lut

Per at least one tester this is enough magic to recover the regression
introduced for some people (but not all) in

commit b8e2b0199cc377617dc238f5106352c06dcd3fa2
Author: Peter Rosin <peda@axentia.se>
Date:   Tue Jul 4 12:36:57 2017 +0200

    drm/fb-helper: factor out pseudo-palette

which for radeon had the side-effect of refactoring out a seemingly
redudant writing of the color palette.

10ms in a fairly slow modeset path feels like an acceptable form of
duct-tape, so maybe worth a shot and see what sticks.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=198123
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amd/display: fix spelling mistake link_integiry_check -> link_integrity_check
Colin Ian King [Tue, 28 Jan 2020 11:28:27 +0000 (11:28 +0000)]
drm/amd/display: fix spelling mistake link_integiry_check -> link_integrity_check

There is a spelling mistake on the struct field name link_integiry_check,
fix this by renaming it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agoamdgpu: using vmalloc requires includeing vmalloc.h
Stephen Rothwell [Tue, 28 Jan 2020 04:42:27 +0000 (15:42 +1100)]
amdgpu: using vmalloc requires includeing vmalloc.h

Fixes: 240c811ccde4 ("drm/amdgpu: fix VRAM partially encroached issue in GDDR6 memory training(V2)")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu: allocate entities on demand
Nirmoy Das [Tue, 21 Jan 2020 14:53:53 +0000 (15:53 +0100)]
drm/amdgpu: allocate entities on demand

Currently we pre-allocate entities and fences for all the HW IPs on
context creation and some of which are might never be used.

This patch tries to resolve entity/fences wastage by creating entity
only when needed.

v2: allocate memory for entity and fences together

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu: Enable DISABLE_BARRIER_WAITCNT for Arcturus
Joseph Greathouse [Mon, 27 Jan 2020 22:08:11 +0000 (16:08 -0600)]
drm/amdgpu: Enable DISABLE_BARRIER_WAITCNT for Arcturus

In previous gfx9 parts, S_BARRIER shader instructions are implicitly
S_WAITCNT 0 instructions as well. This setting turns off that
mechanism in Arcturus and beyond. With this, shaders must follow the
ISA guide insofar as putting in explicit S_WAITCNT operations even
after an S_BARRIER.

v2: Fix patch title to list component

Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agoMerge branch 'linux-5.6' of git://github.com/skeggsb/linux into drm-next
Dave Airlie [Thu, 30 Jan 2020 05:18:33 +0000 (15:18 +1000)]
Merge branch 'linux-5.6' of git://github.com/skeggsb/linux into drm-next

A couple of OOPS fixes, fixes for TU1xx if firmware isn't available,
better behaviour in the face of GPU faults, and a patch to make HD
audio work again after runpm changes.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Ben Skeggs <skeggsb@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/
4 years agodrm/nouveau/fb/gp102-: allow module to load even when scrubber binary is missing
Ben Skeggs [Wed, 29 Jan 2020 01:29:05 +0000 (11:29 +1000)]
drm/nouveau/fb/gp102-: allow module to load even when scrubber binary is missing

Without relaxing this requirement, TU10x boards will fail to load without
an updated linux-firmware, and TU11x will completely fail to load because
FW isn't available yet.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
4 years agodrm/nouveau/acr: return error when registering LSF if ACR not supported
Ben Skeggs [Wed, 29 Jan 2020 01:20:34 +0000 (11:20 +1000)]
drm/nouveau/acr: return error when registering LSF if ACR not supported

This fixes an oops on TU11x GPUs where SEC2 attempts to register its falcon,
and triggers a NULL-pointer deref because ACR isn't yet supported.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
4 years agodrm/nouveau/disp/gv100-: not all channel types support reporting error codes
Ben Skeggs [Tue, 28 Jan 2020 05:03:25 +0000 (15:03 +1000)]
drm/nouveau/disp/gv100-: not all channel types support reporting error codes

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
4 years agodrm/nouveau/disp/nv50-: prevent oops when no channel method map provided
Ben Skeggs [Tue, 28 Jan 2020 04:39:26 +0000 (14:39 +1000)]
drm/nouveau/disp/nv50-: prevent oops when no channel method map provided

The implementations for most channel types contains a map of methods to
priv registers in order to provide debugging info when a disp exception
has been raised.

This info is missing from the implementation of PIO channels as they're
rather simplistic already, however, if an exception is raised by one of
them, we'd end up triggering a NULL-pointer deref.  Not ideal...

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206299
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
4 years agodrm/nouveau: support synchronous pushbuf submission
Ben Skeggs [Thu, 23 Jan 2020 05:52:12 +0000 (15:52 +1000)]
drm/nouveau: support synchronous pushbuf submission

This is useful for debugging GPU hangs.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
4 years agodrm/nouveau: signal pending fences when channel has been killed
Ben Skeggs [Thu, 23 Jan 2020 05:39:27 +0000 (15:39 +1000)]
drm/nouveau: signal pending fences when channel has been killed

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
4 years agodrm/nouveau: reject attempts to submit to dead channels
Ben Skeggs [Thu, 23 Jan 2020 03:47:12 +0000 (13:47 +1000)]
drm/nouveau: reject attempts to submit to dead channels

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
4 years agodrm/nouveau: zero vma pointer even if we only unreference it rather than free
Ben Skeggs [Thu, 23 Jan 2020 03:55:19 +0000 (13:55 +1000)]
drm/nouveau: zero vma pointer even if we only unreference it rather than free

I'm not sure this affects anything, but best be safe.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
4 years agodrm/nouveau: Add HD-audio component notifier support
Takashi Iwai [Mon, 13 Jan 2020 14:17:21 +0000 (15:17 +0100)]
drm/nouveau: Add HD-audio component notifier support

This patch adds the support for the notification of HD-audio hotplug
via the already existing drm_audio_component framework.  This allows
us more reliable hotplug notification and ELD transfer without
accessing HD-audio bus; it's more efficient, and more importantly, it
works without waking up the runtime PM.

The implementation is rather simplistic: nouveau driver provides the
get_eld ops for HD-audio, and it notifies the audio hotplug via
pin_eld_notify callback upon each nv50_audio_enable() and _disable()
call.  As the HD-audio pin assignment seems corresponding to the CRTC,
the crtc->index number is passed directly as the zero-based port
number.

The bind and unbind callbacks handle the device-link so that it
assures the PM call order.

Link: https://lore.kernel.org/r/20190722143815.7339-3-tiwai@suse.de
Reviewed-by: Lyude Paul <lyude@redhat.com>
Cc: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
4 years agodrm/amd/powerplay: fix spelling mistake "Attemp" -> "Attempt"
Colin Ian King [Sat, 25 Jan 2020 20:26:13 +0000 (20:26 +0000)]
drm/amd/powerplay: fix spelling mistake "Attemp" -> "Attempt"

There are several spelling mistakes in PP_ASSERT_WITH_CODE messages.
Fix these.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amd/display: fix for-loop with incorrectly sized loop counter (v2)
Colin Ian King [Fri, 17 Jan 2020 13:33:05 +0000 (13:33 +0000)]
drm/amd/display: fix for-loop with incorrectly sized loop counter (v2)

A for-loop is iterating from 0 up to 1000 however the loop variable count
is a u8 and hence not large enough.  Fix this by making count an int.
Also remove the redundant initialization of count since this is never used
and add { } on the loop statement make the loop block clearer.

v2: drop useless else (Walter Harms)

Addresses-Coverity: ("Operands don't affect result")
Fixes: ed581a0ace44 ("drm/amd/display: wait for update when setting dpg test pattern")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu: enable GPU reset by default on renoir
Alex Deucher [Mon, 27 Jan 2020 19:35:10 +0000 (14:35 -0500)]
drm/amdgpu: enable GPU reset by default on renoir

Everything is in place.

Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu: enable GPU reset by default on Navi
Alex Deucher [Mon, 27 Jan 2020 19:31:49 +0000 (14:31 -0500)]
drm/amdgpu: enable GPU reset by default on Navi

Has been working fine for a while.

Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amd/display: do not allocate display_mode_lib unnecessarily
Dor Askayo [Sat, 4 Jan 2020 12:22:15 +0000 (14:22 +0200)]
drm/amd/display: do not allocate display_mode_lib unnecessarily

This allocation isn't required and can fail when resuming from suspend.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/1009
Signed-off-by: Dor Askayo <dor.askayo@gmail.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu: add coreboot workaround for KV/KB
Christian König [Thu, 16 Jan 2020 13:06:59 +0000 (14:06 +0100)]
drm/amdgpu: add coreboot workaround for KV/KB

Coreboot seems to have a problem correctly setting up access to the stolen VRAM
on KV/KB. Use the direct access only when necessary.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reported-and-tested-by: Fredrik Bruhn <fredrik.bruhn@unibap.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agoRevert "drm/amd/display: Don't skip link training for empty dongle"
Harry Wentland [Tue, 21 Jan 2020 21:29:54 +0000 (16:29 -0500)]
Revert "drm/amd/display: Don't skip link training for empty dongle"

This reverts commit 80adaebd2d411b7d6872a097634848a71eb13d20.

[WHY]
This change was working around a regression that occured in this:
commit 0301ccbaf67d ("drm/amd/display: DP Compliance 400.1.1 failure")

With the fix to run verify_link_cap when the SINK_COUNT of
dongles becomes non-zero this change is no longer needed.

Cc: Louis Li <Ching-shih.Li@amd.com>
Cc: Wenjing Liu <Wenjing.Liu@amd.com>
Cc: Hersen Wu <hersenxs.wu@amd.com>
Cc: Eric Yang <Eric.Yang2@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amd/display: Retrain dongles when SINK_COUNT becomes non-zero
Harry Wentland [Tue, 21 Jan 2020 21:12:45 +0000 (16:12 -0500)]
drm/amd/display: Retrain dongles when SINK_COUNT becomes non-zero

[WHY]
Two years ago the patch referenced by the Fixes tag stopped running
dp_verify_link_cap_with_retries during DP detection when the reason
for the detection was a short-pulse interrupt. This effectively meant
that we were no longer doing the verify_link_cap training on active
dongles when their SINK_COUNT changed from 0 to 1.

A year ago this was partly remedied with:
commit 80adaebd2d41 ("drm/amd/display: Don't skip link training for empty dongle")

This made sure that we trained the dongle on initial hotplug (without
connected downstream devices).

This is all fine and dandy if it weren't for the fact that there are
some dongles on the market that don't like link training when SINK_COUNT
is 0 These dongles will in fact indicate a SINK_COUNT of 0 immediately
after hotplug, even when a downstream device is connected, and then
trigger a shortpulse interrupt indicating a SINK_COUNT change to 1.

In order to play nicely we will need our policy to not link train an
active DP dongle when SINK_COUNT is 0 but ensure we train it when the
SINK_COUNT changes to 1.

[HOW]
Call dp_verify_link_cap_with_retries on detection even when the detection
is triggered from a short pulse interrupt.

With this change we can also revert this commit which we'll do in a separate
follow-up change:
commit 80adaebd2d41 ("drm/amd/display: Don't skip link training for empty dongle")

Fixes: 0301ccbaf67d ("drm/amd/display: DP Compliance 400.1.1 failure")
Suggested-by: Louis Li <Ching-shih.Li@amd.com>
Tested-by: Louis Li <Ching-shih.Li@amd.com>
Cc: Wenjing Liu <Wenjing.Liu@amd.com>
Cc: Hersen Wu <hersenxs.wu@amd.com>
Cc: Eric Yang <Eric.Yang2@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu: original raven doesn't support full asic reset
Alex Deucher [Wed, 15 Jan 2020 17:56:37 +0000 (12:56 -0500)]
drm/amdgpu: original raven doesn't support full asic reset

So don't use it.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu: attempt to enable gfxoff on more raven1 boards (v2)
Alex Deucher [Wed, 15 Jan 2020 17:26:51 +0000 (12:26 -0500)]
drm/amdgpu: attempt to enable gfxoff on more raven1 boards (v2)

Switch to a blacklist so we can disable specific boards
that are problematic.

v2: make the blacklist non-raven specific.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amd/amdgpu: fix spelling mistake "to" -> "too"
Colin Ian King [Thu, 23 Jan 2020 00:22:16 +0000 (00:22 +0000)]
drm/amd/amdgpu: fix spelling mistake "to" -> "too"

There is a spelling mistake in a DRM_ERROR message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amd/powerplay: use true, false for bool variable in smu7_hwmgr.c
zhengbin [Wed, 22 Jan 2020 07:53:11 +0000 (15:53 +0800)]
drm/amd/powerplay: use true, false for bool variable in smu7_hwmgr.c

Fixes coccicheck warning:

drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c:723:2-50: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c:733:3-52: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c:747:3-51: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu: fix doc by clarifying sched_list definition
Nirmoy Das [Wed, 22 Jan 2020 09:37:56 +0000 (10:37 +0100)]
drm/amdgpu: fix doc by clarifying sched_list definition

expand sched_list definition for better understanding.
Also fix a typo atleast -> at least

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu: initialize bo_va_list when add gws to process
xinhui pan [Wed, 22 Jan 2020 03:03:30 +0000 (11:03 +0800)]
drm/amdgpu: initialize bo_va_list when add gws to process

bo_va_list is list_head, so initialize it.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu/vcn: use inst_idx relacing inst
James Zhu [Tue, 21 Jan 2020 21:33:21 +0000 (16:33 -0500)]
drm/amdgpu/vcn: use inst_idx relacing inst

Use inst_idx relacing inst in SOC15_DPG_MODE macro to avoid confusion.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu/vcn: fix typo error
James Zhu [Tue, 21 Jan 2020 21:28:07 +0000 (16:28 -0500)]
drm/amdgpu/vcn: fix typo error

Fix typo error, should be inst_idx instead of inst.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu/vcn: fix vcn2.5 instance issue
James Zhu [Tue, 21 Jan 2020 02:44:07 +0000 (21:44 -0500)]
drm/amdgpu/vcn: fix vcn2.5 instance issue

Fix vcn2.5 instance issue, vcn0 and vcn1 have same register offset

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu/vcn2.5: fix a bug for the 2nd vcn instance (v2)
James Zhu [Mon, 20 Jan 2020 20:47:35 +0000 (15:47 -0500)]
drm/amdgpu/vcn2.5: fix a bug for the 2nd vcn instance (v2)

Fix a bug for the 2nd vcn instance at start and stop.

v2: squash in unused label removal.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu/vcn: Share vcn_v2_0_dec_ring_test_ring to vcn2.5
James Zhu [Mon, 20 Jan 2020 20:43:04 +0000 (15:43 -0500)]
drm/amdgpu/vcn: Share vcn_v2_0_dec_ring_test_ring to vcn2.5

Share vcn_v2_0_dec_ring_test_ring to vcn2.5 to support
vcn software ring.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu: Use the correct flush_type in flush_gpu_tlb_pasid
Felix Kuehling [Sat, 18 Jan 2020 01:08:42 +0000 (20:08 -0500)]
drm/amdgpu: Use the correct flush_type in flush_gpu_tlb_pasid

The flush_type was incorrectly hard-coded to 0 when calling falling back
to MMIO-based invalidation in flush_gpu_tlb_pasid.

Fixes: ea930000a6dc ("drm/amdgpu: export function to flush TLB via pasid")
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agodrm/amdgpu: Fix TLB invalidation request when using semaphore
Felix Kuehling [Sat, 18 Jan 2020 00:54:45 +0000 (19:54 -0500)]
drm/amdgpu: Fix TLB invalidation request when using semaphore

Use a more meaningful variable name for the invalidation request
that is distinct from the tmp variable that gets overwritten when
acquiring the invalidation semaphore.

Fixes: 4ed8a03740d0 ("drm/amdgpu: invalidate mmhub semaphore workaround in gmc9/gmc10")
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 years agoMerge branch 'linux-5.6' of git://github.com/skeggsb/linux into drm-next
Dave Airlie [Wed, 22 Jan 2020 23:39:02 +0000 (09:39 +1000)]
Merge branch 'linux-5.6' of git://github.com/skeggsb/linux into drm-next

Just a couple of fixes to GP10x ACR support after the work, and a
(fairly severe if you're running piglit a lot) memory leak fix.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Ben Skeggs <skeggsb@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/
4 years agoMerge tag 'exynos-drm-next-for-v5.6' of git://git.kernel.org/pub/scm/linux/kernel...
Dave Airlie [Wed, 22 Jan 2020 23:32:54 +0000 (09:32 +1000)]
Merge tag 'exynos-drm-next-for-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-next

Change Exynos DRM specific callback function names
- it changes enable and disable callback functions names of
  struct exynos_drm_crtc_ops to atomic_enable and atomic_disable
  for consistency.
Modify "EXYNOS" prefix to "Exynos"
- "Exynos" name is a regular trademarked name promoted by its
  manufacturer, Samsung Electronics Co., Ltd.. This patch
  corrects the name.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Inki Dae <inki.dae@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1579567970-4467-1-git-send-email-inki.dae@samsung.com
4 years agoMerge branch 'vmwgfx-next' of git://people.freedesktop.org/~thomash/linux into drm-next
Dave Airlie [Mon, 20 Jan 2020 04:16:32 +0000 (14:16 +1000)]
Merge branch 'vmwgfx-next' of git://people.freedesktop.org/~thomash/linux into drm-next

vmwgfx updates + new logging uapi

https://patchwork.freedesktop.org/patch/349809/ is appropriate userpsace patch.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m=20=28VMware=29?=
Link: https://patchwork.freedesktop.org/patch/msgid/20200116092934.5276-1-thomas_os@shipmail.org
4 years agodrm/nouveau: fix build error without CONFIG_IOMMU_API
Chen Zhou [Thu, 16 Jan 2020 12:50:10 +0000 (20:50 +0800)]
drm/nouveau: fix build error without CONFIG_IOMMU_API

If CONFIG_IOMMU_API is n, build fails:

vers/gpu/drm/nouveau/nvkm/subdev/ltc/gp10b.c:37:9: error: implicit declaration of function dev_iommu_fwspec_get; did you mean iommu_fwspec_free? [-Werror=implicit-function-declaration]
  spec = dev_iommu_fwspec_get(device->dev);
         ^~~~~~~~~~~~~~~~~~~~
         iommu_fwspec_free
drivers/gpu/drm/nouveau/nvkm/subdev/ltc/gp10b.c:37:7: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
  spec = dev_iommu_fwspec_get(device->dev);
       ^
drivers/gpu/drm/nouveau/nvkm/subdev/ltc/gp10b.c:39:17: error: struct iommu_fwspec has no member named ids
   u32 sid = spec->ids[0] & 0xffff;

Seletc IOMMU_API under config DRM_NOUVEAU to fix this.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>