]> asedeno.scripts.mit.edu Git - linux.git/log
linux.git
4 years agoseq_tab_next() should increase position index
Vasily Averin [Thu, 23 Jan 2020 07:11:08 +0000 (10:11 +0300)]
seq_tab_next() should increase position index

if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotcp: do not leave dangling pointers in tp->highest_sack
Eric Dumazet [Thu, 23 Jan 2020 05:03:00 +0000 (21:03 -0800)]
tcp: do not leave dangling pointers in tp->highest_sack

Latest commit 853697504de0 ("tcp: Fix highest_sack and highest_sack_seq")
apparently allowed syzbot to trigger various crashes in TCP stack [1]

I believe this commit only made things easier for syzbot to find
its way into triggering use-after-frees. But really the bugs
could lead to bad TCP behavior or even plain crashes even for
non malicious peers.

I have audited all calls to tcp_rtx_queue_unlink() and
tcp_rtx_queue_unlink_and_free() and made sure tp->highest_sack would be updated
if we are removing from rtx queue the skb that tp->highest_sack points to.

These updates were missing in three locations :

1) tcp_clean_rtx_queue() [This one seems quite serious,
                          I have no idea why this was not caught earlier]

2) tcp_rtx_queue_purge() [Probably not a big deal for normal operations]

3) tcp_send_synack()     [Probably not a big deal for normal operations]

[1]
BUG: KASAN: use-after-free in tcp_highest_sack_seq include/net/tcp.h:1864 [inline]
BUG: KASAN: use-after-free in tcp_highest_sack_seq include/net/tcp.h:1856 [inline]
BUG: KASAN: use-after-free in tcp_check_sack_reordering+0x33c/0x3a0 net/ipv4/tcp_input.c:891
Read of size 4 at addr ffff8880a488d068 by task ksoftirqd/1/16

CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x197/0x210 lib/dump_stack.c:118
 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
 __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506
 kasan_report+0x12/0x20 mm/kasan/common.c:639
 __asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:134
 tcp_highest_sack_seq include/net/tcp.h:1864 [inline]
 tcp_highest_sack_seq include/net/tcp.h:1856 [inline]
 tcp_check_sack_reordering+0x33c/0x3a0 net/ipv4/tcp_input.c:891
 tcp_try_undo_partial net/ipv4/tcp_input.c:2730 [inline]
 tcp_fastretrans_alert+0xf74/0x23f0 net/ipv4/tcp_input.c:2847
 tcp_ack+0x2577/0x5bf0 net/ipv4/tcp_input.c:3710
 tcp_rcv_established+0x6dd/0x1e90 net/ipv4/tcp_input.c:5706
 tcp_v4_do_rcv+0x619/0x8d0 net/ipv4/tcp_ipv4.c:1619
 tcp_v4_rcv+0x307f/0x3b40 net/ipv4/tcp_ipv4.c:2001
 ip_protocol_deliver_rcu+0x5a/0x880 net/ipv4/ip_input.c:204
 ip_local_deliver_finish+0x23b/0x380 net/ipv4/ip_input.c:231
 NF_HOOK include/linux/netfilter.h:307 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_local_deliver+0x1e9/0x520 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:442 [inline]
 ip_rcv_finish+0x1db/0x2f0 net/ipv4/ip_input.c:428
 NF_HOOK include/linux/netfilter.h:307 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_rcv+0xe8/0x3f0 net/ipv4/ip_input.c:538
 __netif_receive_skb_one_core+0x113/0x1a0 net/core/dev.c:5148
 __netif_receive_skb+0x2c/0x1d0 net/core/dev.c:5262
 process_backlog+0x206/0x750 net/core/dev.c:6093
 napi_poll net/core/dev.c:6530 [inline]
 net_rx_action+0x508/0x1120 net/core/dev.c:6598
 __do_softirq+0x262/0x98c kernel/softirq.c:292
 run_ksoftirqd kernel/softirq.c:603 [inline]
 run_ksoftirqd+0x8e/0x110 kernel/softirq.c:595
 smpboot_thread_fn+0x6a3/0xa40 kernel/smpboot.c:165
 kthread+0x361/0x430 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

Allocated by task 10091:
 save_stack+0x23/0x90 mm/kasan/common.c:72
 set_track mm/kasan/common.c:80 [inline]
 __kasan_kmalloc mm/kasan/common.c:513 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:486
 kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:521
 slab_post_alloc_hook mm/slab.h:584 [inline]
 slab_alloc_node mm/slab.c:3263 [inline]
 kmem_cache_alloc_node+0x138/0x740 mm/slab.c:3575
 __alloc_skb+0xd5/0x5e0 net/core/skbuff.c:198
 alloc_skb_fclone include/linux/skbuff.h:1099 [inline]
 sk_stream_alloc_skb net/ipv4/tcp.c:875 [inline]
 sk_stream_alloc_skb+0x113/0xc90 net/ipv4/tcp.c:852
 tcp_sendmsg_locked+0xcf9/0x3470 net/ipv4/tcp.c:1282
 tcp_sendmsg+0x30/0x50 net/ipv4/tcp.c:1432
 inet_sendmsg+0x9e/0xe0 net/ipv4/af_inet.c:807
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg+0xd7/0x130 net/socket.c:672
 __sys_sendto+0x262/0x380 net/socket.c:1998
 __do_sys_sendto net/socket.c:2010 [inline]
 __se_sys_sendto net/socket.c:2006 [inline]
 __x64_sys_sendto+0xe1/0x1a0 net/socket.c:2006
 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 10095:
 save_stack+0x23/0x90 mm/kasan/common.c:72
 set_track mm/kasan/common.c:80 [inline]
 kasan_set_free_info mm/kasan/common.c:335 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:474
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:483
 __cache_free mm/slab.c:3426 [inline]
 kmem_cache_free+0x86/0x320 mm/slab.c:3694
 kfree_skbmem+0x178/0x1c0 net/core/skbuff.c:645
 __kfree_skb+0x1e/0x30 net/core/skbuff.c:681
 sk_eat_skb include/net/sock.h:2453 [inline]
 tcp_recvmsg+0x1252/0x2930 net/ipv4/tcp.c:2166
 inet_recvmsg+0x136/0x610 net/ipv4/af_inet.c:838
 sock_recvmsg_nosec net/socket.c:886 [inline]
 sock_recvmsg net/socket.c:904 [inline]
 sock_recvmsg+0xce/0x110 net/socket.c:900
 __sys_recvfrom+0x1ff/0x350 net/socket.c:2055
 __do_sys_recvfrom net/socket.c:2073 [inline]
 __se_sys_recvfrom net/socket.c:2069 [inline]
 __x64_sys_recvfrom+0xe1/0x1a0 net/socket.c:2069
 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8880a488d040
 which belongs to the cache skbuff_fclone_cache of size 456
The buggy address is located 40 bytes inside of
 456-byte region [ffff8880a488d040ffff8880a488d208)
The buggy address belongs to the page:
page:ffffea0002922340 refcount:1 mapcount:0 mapping:ffff88821b057000 index:0x0
raw: 00fffe0000000200 ffffea00022a5788 ffffea0002624a48 ffff88821b057000
raw: 0000000000000000 ffff8880a488d040 0000000100000006 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8880a488cf00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8880a488cf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8880a488d000: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                          ^
 ffff8880a488d080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880a488d100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 853697504de0 ("tcp: Fix highest_sack and highest_sack_seq")
Fixes: 50895b9de1d3 ("tcp: highest_sack fix")
Fixes: 737ff314563c ("tcp: use sequence distance to detect reordering")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cambda Zhu <cambda@linux.alibaba.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/rose: fix spelling mistake "to" -> "too"
Colin Ian King [Thu, 23 Jan 2020 09:27:30 +0000 (09:27 +0000)]
net/rose: fix spelling mistake "to" -> "too"

There is a spelling mistake in a printk message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agocaif_usb: fix spelling mistake "to" -> "too"
Colin Ian King [Thu, 23 Jan 2020 00:45:55 +0000 (00:45 +0000)]
caif_usb: fix spelling mistake "to" -> "too"

There is a spelling mistake in a pr_warn message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipvs: fix spelling mistake "to" -> "too"
Colin Ian King [Thu, 23 Jan 2020 00:43:28 +0000 (00:43 +0000)]
ipvs: fix spelling mistake "to" -> "too"

There is a spelling mistake in a IP_VS_ERR_RL message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoi40e: fix spelling mistake "to" -> "too"
Colin Ian King [Thu, 23 Jan 2020 00:14:29 +0000 (00:14 +0000)]
i40e: fix spelling mistake "to" -> "too"

There is a spelling mistake in a hw_dbg message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodmaengine: Create symlinks between DMA channels and slaves
Geert Uytterhoeven [Fri, 17 Jan 2020 15:30:56 +0000 (16:30 +0100)]
dmaengine: Create symlinks between DMA channels and slaves

Currently it is not easy to find out which DMA channels are in use, and
which slave devices are using which channels.

Fix this by creating two symlinks between the DMA channel and the actual
slave device when a channel is requested:
  1. A "slave" symlink from DMA channel to slave device,
  2. A "dma:<name>" symlink slave device to DMA channel.
When the channel is released, the symlinks are removed again.
The latter requires keeping track of the slave device and the channel
name in the dma_chan structure.

Note that this is limited to channel request functions for requesting an
exclusive slave channel that take a device pointer (dma_request_chan()
and dma_request_slave_channel*()).

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>
Link: https://lore.kernel.org/r/20200117153056.31363-1-geert+renesas@glider.be
Signed-off-by: Vinod Koul <vkoul@kernel.org>
4 years agodmaengine: hisilicon: Add Kunpeng DMA engine support
Zhou Wang [Thu, 16 Jan 2020 06:10:57 +0000 (14:10 +0800)]
dmaengine: hisilicon: Add Kunpeng DMA engine support

This patch adds a driver for HiSilicon Kunpeng DMA engine. This DMA engine
which is an PCIe iEP offers 30 channels, each channel has a send queue, a
complete queue and an interrupt to help to do tasks. This DMA engine can do
memory copy between memory blocks or between memory and device buffer.

Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Zhenfa Qiu <qiuzhenfa@hisilicon.com>
Link: https://lore.kernel.org/r/1579155057-80523-1-git-send-email-wangzhou1@hisilicon.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
4 years agodmaengine: idxd: add char driver to expose submission portal to userland
Dave Jiang [Tue, 21 Jan 2020 23:44:29 +0000 (16:44 -0700)]
dmaengine: idxd: add char driver to expose submission portal to userland

Create a char device region that will allow acquisition of user portals in
order to allow applications to submit DMA operations. A char device will be
created per work queue that gets exposed. The workqueue type "user"
is used to mark a work queue for user char device. For example if the
workqueue 0 of DSA device 0 is marked for char device, then a device node
of /dev/dsa/wq0.0 will be created.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/157965026985.73301.976523230037106742.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
4 years agodmaengine: idxd: connect idxd to dmaengine subsystem
Dave Jiang [Tue, 21 Jan 2020 23:44:23 +0000 (16:44 -0700)]
dmaengine: idxd: connect idxd to dmaengine subsystem

Add plumbing for dmaengine subsystem connection. The driver register a DMA
device per DSA device. The channels are dynamically registered when a
workqueue is configured to be "kernel:dmanegine" type.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/157965026376.73301.13867988830650740445.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
4 years agodmaengine: idxd: add descriptor manipulation routines
Dave Jiang [Tue, 21 Jan 2020 23:44:17 +0000 (16:44 -0700)]
dmaengine: idxd: add descriptor manipulation routines

This commit adds helper functions for DSA descriptor allocation,
submission, and free operations.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/157965025757.73301.12692876585357550065.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
4 years agodmaengine: idxd: add sysfs ABI for idxd driver
Jing Lin [Tue, 21 Jan 2020 23:44:11 +0000 (16:44 -0700)]
dmaengine: idxd: add sysfs ABI for idxd driver

Add the sysfs ABI information for idxd driver in
Documentation/ABI/stable directory.

Signed-off-by: Jing Lin <jing.lin@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/157965025170.73301.13428570530450446901.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
4 years agodmaengine: idxd: add configuration component of driver
Dave Jiang [Tue, 21 Jan 2020 23:44:05 +0000 (16:44 -0700)]
dmaengine: idxd: add configuration component of driver

The device is left unconfigured when the driver is loaded. Various
components are configured via the driver sysfs attributes. Once
configuration is done, the device can be enabled by writing the device name
to the bind attribute of the device driver sysfs. Disabling can be done
similarly. Also the individual work queues can also be enabled and disabled
through the bind/unbind attributes. A constructed hierarchy is created
through the struct device framework in order to provide appropriate
configuration points and device state and status. This hierarchy is
presented off the virtual DSA bus.

i.e. /sys/bus/dsa/...

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/157965024585.73301.6431413676230150589.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
4 years agodmaengine: idxd: Init and probe for Intel data accelerators
Dave Jiang [Tue, 21 Jan 2020 23:43:59 +0000 (16:43 -0700)]
dmaengine: idxd: Init and probe for Intel data accelerators

The idxd driver introduces the Intel Data Stream Accelerator [1] that will
be available on future Intel Xeon CPUs. One of the kernel access
point for the driver is through the dmaengine subsystem. It will initially
provide the DMA copy service to the kernel.

Some of the main functionality introduced with this accelerator
are: shared virtual memory (SVM) support, and descriptor submission using
Intel CPU instructions movdir64b and enqcmds. There will be additional
accelerator devices that share the same driver with variations to
capabilities.

This commit introduces the probe and initialization component of the
driver.

[1]: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/157965023991.73301.6186843973135311580.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
4 years agodmaengine: add support to dynamic register/unregister of channels
Dave Jiang [Tue, 21 Jan 2020 23:43:53 +0000 (16:43 -0700)]
dmaengine: add support to dynamic register/unregister of channels

With the channel registration routines broken out, now add support code to
allow independent registering and unregistering of channels in a hotplug fashion.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/157965023364.73301.7821862091077299040.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
4 years agodmaengine: break out channel registration
Dave Jiang [Tue, 21 Jan 2020 23:43:47 +0000 (16:43 -0700)]
dmaengine: break out channel registration

In preparation for dynamic channel registration, the code segment that
does the channel registration is broken out to its own function.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/157965022778.73301.8929944324898985438.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
4 years agox86/asm: add iosubmit_cmds512() based on MOVDIR64B CPU instruction
Dave Jiang [Tue, 21 Jan 2020 23:43:41 +0000 (16:43 -0700)]
x86/asm: add iosubmit_cmds512() based on MOVDIR64B CPU instruction

With the introduction of MOVDIR64B instruction, there is now an instruction
that can write 64 bytes of data atomically.

Quoting from Intel SDM:
"There is no atomicity guarantee provided for the 64-byte load operation
from source address, and processor implementations may use multiple
load operations to read the 64-bytes. The 64-byte direct-store issued
by MOVDIR64B guarantees 64-byte write-completion atomicity. This means
that the data arrives at the destination in a single undivided 64-byte
write transaction."

We have identified at least 3 different use cases for this instruction in
the format of func(dst, src, count):
1) Clear poison / Initialize MKTME memory
   @dst is normal memory.
   @src in normal memory. Does not increment. (Copy same line to all
   targets)
   @count (to clear/init multiple lines)
2) Submit command(s) to new devices
   @dst is a special MMIO region for a device. Does not increment.
   @src is normal memory. Increments.
   @count usually is 1, but can be multiple.
3) Copy to iomem in big chunks
   @dst is iomem and increments
   @src in normal memory and increments
   @count is number of chunks to copy

Add support for case #2 to support device that will accept commands via
this instruction. We provide a @count in order to submit a batch of
preprogrammed descriptors in virtually contiguous memory. This
allows the caller to submit multiple descriptors to a device with a single
submission. The special device requires the entire 64bytes descriptor to
be written atomically and will accept MOVDIR64B instruction.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/157965022175.73301.10174614665472962675.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
4 years agoMerge tag 'mmc-v5.5-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Fri, 24 Jan 2020 00:02:00 +0000 (16:02 -0800)]
Merge tag 'mmc-v5.5-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
 "A couple of MMC host fixes:

   - sdhci: Fix minimum clock rate for v3 controllers

   - sdhci-tegra: Fix SDR50 tuning override

   - sdhci_am654: Fixup tuning issues and support for CQHCI

   - sdhci_am654: Remove wrong write protect flag"

* tag 'mmc-v5.5-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci: fix minimum clock rate for v3 controller
  mmc: tegra: fix SDR50 tuning override
  mmc: sdhci_am654: Fix Command Queuing in AM65x
  mmc: sdhci_am654: Reset Command and Data line after tuning
  mmc: sdhci_am654: Remove Inverted Write Protect flag

4 years agohwmon: (k10temp) Display up to eight sets of CCD temperatures
Guenter Roeck [Thu, 23 Jan 2020 02:41:18 +0000 (18:41 -0800)]
hwmon: (k10temp) Display up to eight sets of CCD temperatures

In HWiNFO, we see support for Tccd1, Tccd3, Tccd5, and Tccd7 temperature
sensors on Zen2 based Threadripper CPUs. Checking register maps on
Threadripper 3970X confirms SMN register addresses and values for those
sensors.

Register values observed in an idle system:

0x059950: 00000000 00000abc 00000000 00000ad8
0x059960: 00000000 00000ade 00000000 00000ae4

Under load:

0x059950: 00000000 00000c02 00000000 00000c14
0x059960: 00000000 00000c30 00000000 00000c22

More analysis shows that EPYC CPUs support up to 8 CCD temperature
sensors. EPYC 7601 supports three CCD temperature sensors. Unlike
Zen2 CPUs, the register space in Zen1 CPUs supports a maximum of four
sensors, so only search for a maximum of four sensors on Zen1 CPUs.

On top of that, in thm_10_0_sh_mask.h in the Linux kernel, we find
definitions for THM_DIE{1-3}_TEMP__VALID_MASK, set to 0x00000800, as well
as matching SMN addresses. This lets us conclude that bit 11 of the
respective registers is a valid bit. With this assumption, the temperature
offset is now 49 degrees C. This conveniently matches the documented
temperature offset for Tdie, again suggesting that above registers indeed
report temperatures sensor values. Assume that bit 11 is indeed a valid
bit, and add support for the additional sensors.

With this patch applied, output from 3970X (idle) looks as follows:

k10temp-pci-00c3
Adapter: PCI adapter
Tdie:         +55.9°C
Tctl:         +55.9°C
Tccd1:        +39.8°C
Tccd3:        +43.8°C
Tccd5:        +43.8°C
Tccd7:        +44.8°C

Tested-by: Michael Larabel <michael@phoronix.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agoMerge tag 'amd-drm-fixes-5.5-2020-01-23' of git://people.freedesktop.org/~agd5f/linux...
Dave Airlie [Thu, 23 Jan 2020 22:58:12 +0000 (08:58 +1000)]
Merge tag 'amd-drm-fixes-5.5-2020-01-23' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

amd-drm-fixes-5.5-2020-01-23:

amdgpu:
- remove the experimental flag from renoir

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200123191424.3849-1-alexander.deucher@amd.com
4 years agoMerge tag 'drm-intel-fixes-2020-01-23' of git://anongit.freedesktop.org/drm/drm-intel...
Dave Airlie [Thu, 23 Jan 2020 22:57:37 +0000 (08:57 +1000)]
Merge tag 'drm-intel-fixes-2020-01-23' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Avoid overflow with huge userptr objects
- uAPI fix to correctly handle negative values in
  engine->uabi_class/instance (cc: stable)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200123135045.GA12584@jlahtine-desk.ger.corp.intel.com
4 years agohwmon: (k10temp) Add debugfs support
Guenter Roeck [Wed, 22 Jan 2020 05:33:54 +0000 (21:33 -0800)]
hwmon: (k10temp) Add debugfs support

Show thermal and SVI registers for Family 17h CPUs.

Tested-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (k10temp) Don't show temperature limits on Ryzen (Zen) CPUs
Guenter Roeck [Fri, 17 Jan 2020 14:43:20 +0000 (06:43 -0800)]
hwmon: (k10temp) Don't show temperature limits on Ryzen (Zen) CPUs

The maximum Tdie or Tctl is not published for Ryzen CPUs. What is
known, however, is that the traditional value of 70 degrees C is no
longer correct. On top of that, the limit applies to Tctl, not to Tdie.
Displaying it in either context is meaningless, confusing, and wrong.
Stop doing it.

Tested-by: Brad Campbell <lists2009@fnarfbargle.com>
Tested-by: Holger Kiehl <holger.kiehl@dwd.de>
Tested-by: Michael Larabel <michael@phoronix.com>
Tested-by: Jonathan McDowell <noodles@earth.li>
Tested-by: Ken Moffat <zarniwhoop73@googlemail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (k10temp) Show core and SoC current and voltages on Ryzen CPUs
Guenter Roeck [Wed, 15 Jan 2020 01:54:05 +0000 (17:54 -0800)]
hwmon: (k10temp) Show core and SoC current and voltages on Ryzen CPUs

Ryzen CPUs report core and SoC voltages and currents. Add support
for it to the k10temp driver.

For the time being, only report voltages and currents for Ryzen
CPUs. Threadripper and EPYC appear to use a different mechanism.

Tested-by: Brad Campbell <lists2009@fnarfbargle.com>
Tested-by: Bernhard Gebetsberger <bernhard.gebetsberger@gmx.at>
Tested-by: Holger Kiehl <holger.kiehl@dwd.de>
Tested-by: Michael Larabel <michael@phoronix.com>
Tested-by: Jonathan McDowell <noodles@earth.li>
Tested-by: Ken Moffat <zarniwhoop73@googlemail.com>
Tested-by: Darren Salt <devspam@moreofthesa.me.uk>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (k10temp) Report temperatures per CPU die
Guenter Roeck [Wed, 15 Jan 2020 01:40:12 +0000 (17:40 -0800)]
hwmon: (k10temp) Report temperatures per CPU die

Zen2 reports reporting temperatures per CPU die (called Core Complex Dies,
or CCD, by AMD). Add support for it to the k10temp driver.

Tested-by: Brad Campbell <lists2009@fnarfbargle.com>
Tested-by: Bernhard Gebetsberger <bernhard.gebetsberger@gmx.at>
Tested-by: Holger Kiehl <holger.kiehl@dwd.de>
Tested-by: Michael Larabel <michael@phoronix.com>
Tested-by: Jonathan McDowell <noodles@earth.li>
Tested-by: Ken Moffat <zarniwhoop73@googlemail.com>
Tested-by: Darren Salt <devspam@moreofthesa.me.uk>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohmon: (k10temp) Convert to use devm_hwmon_device_register_with_info
Guenter Roeck [Tue, 24 Dec 2019 15:20:55 +0000 (07:20 -0800)]
hmon: (k10temp) Convert to use devm_hwmon_device_register_with_info

Convert driver to use devm_hwmon_device_register_with_info to simplify
the code and to reduce its size.

Old size (x86_64):
   text    data     bss     dec     hex filename
   8247    4488      64   12799    31ff drivers/hwmon/k10temp.o
New size:
   text    data     bss     dec     hex filename
   6778    2792      64    9634    25a2 drivers/hwmon/k10temp.o

Tested-by: Brad Campbell <lists2009@fnarfbargle.com>
Tested-by: Bernhard Gebetsberger <bernhard.gebetsberger@gmx.at>
Tested-by: Holger Kiehl <holger.kiehl@dwd.de>
Tested-by: Michael Larabel <michael@phoronix.com>
Tested-by: Jonathan McDowell <noodles@earth.li>
Tested-by: Ken Moffat <zarniwhoop73@googlemail.com>
Tested-by: Darren Salt <devspam@moreofthesa.me.uk>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (k10temp) Use bitops
Guenter Roeck [Sun, 29 Apr 2018 15:39:24 +0000 (08:39 -0700)]
hwmon: (k10temp) Use bitops

Using bitops makes bit masks and shifts easier to read.

Tested-by: Brad Campbell <lists2009@fnarfbargle.com>
Tested-by: Bernhard Gebetsberger <bernhard.gebetsberger@gmx.at>
Tested-by: Holger Kiehl <holger.kiehl@dwd.de>
Tested-by: Michael Larabel <michael@phoronix.com>
Tested-by: Jonathan McDowell <noodles@earth.li>
Tested-by: Ken Moffat <zarniwhoop73@googlemail.com>
Tested-by: Darren Salt <devspam@moreofthesa.me.uk>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (pwm-fan) stop fan on shutdown
Akinobu Mita [Mon, 20 Jan 2020 15:32:24 +0000 (00:32 +0900)]
hwmon: (pwm-fan) stop fan on shutdown

The pwm-fan driver stops the fan in suspend but leaves the fan on in
shutdown.  It seems strange to leave the fan on in shutdown because there
is no use case in my mind and the gpio-fan driver on the other hand stops
in shutdown.

This change turns off the fan in shutdown.  If anyone complains then we'll
add an optional property to switch the behavior.

Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Kamil Debski <kamil@wypas.org>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Link: https://lore.kernel.org/r/1579534344-11694-1-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agoMAINTAINERS: add entry for ADM1177 driver
Beniamin Bia [Tue, 14 Jan 2020 11:21:59 +0000 (13:21 +0200)]
MAINTAINERS: add entry for ADM1177 driver

Add Beniamin Bia and Michael Hennerich as a maintainer for ADM1177 ADC.

Signed-off-by: Beniamin Bia <beniamin.bia@analog.com>
Link: https://lore.kernel.org/r/20200114112159.25998-3-beniamin.bia@analog.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agodt-binding: hwmon: Add documentation for ADM1177
Beniamin Bia [Tue, 14 Jan 2020 11:21:58 +0000 (13:21 +0200)]
dt-binding: hwmon: Add documentation for ADM1177

Documentation for ADM1177 was added.

Signed-off-by: Beniamin Bia <beniamin.bia@analog.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20200114112159.25998-2-beniamin.bia@analog.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (adm1177) Add ADM1177 Hot Swap Controller and Digital Power Monitor driver
Beniamin Bia [Tue, 14 Jan 2020 11:21:57 +0000 (13:21 +0200)]
hwmon: (adm1177) Add ADM1177 Hot Swap Controller and Digital Power Monitor driver

ADM1177 is a Hot Swap Controller and Digital Power Monitor with
Soft Start Pin.

Datasheet:
Link: https://www.analog.com/media/en/technical-documentation/data-sheets/ADM1177.pdf
Signed-off-by: Beniamin Bia <beniamin.bia@analog.com>
Link: https://lore.kernel.org/r/20200114112159.25998-1-beniamin.bia@analog.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agodocs: hwmon: Include 'xdpe12284.rst' into docs
Vadim Pasternak [Mon, 13 Jan 2020 15:08:41 +0000 (15:08 +0000)]
docs: hwmon: Include 'xdpe12284.rst' into docs

Add documentation for 'xdpe122' devices.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Link: https://lore.kernel.org/r/20200113150841.17670-7-vadimp@mellanox.com
[groeck: Added to index.rst]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (pmbus) Add support for Infineon Multi-phase xdpe122 family controllers
Vadim Pasternak [Mon, 13 Jan 2020 15:08:39 +0000 (15:08 +0000)]
hwmon: (pmbus) Add support for Infineon Multi-phase xdpe122 family controllers

Add support for devices XDPE12254, XDPE12284.

All these device support two pages.
The below lists of VOUT_MODE command readout with their related VID
protocols, Digital to Analog Converter steps, supported by these
devices:
VR12.0 mode, 5-mV DAC - 0x01;
VR12.5 mode, 10-mV DAC - 0x02;
IMVP9 mode, 5-mV DAC - 0x03;
AMD mode 6.25mV - 0x10.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Link: https://lore.kernel.org/r/20200113150841.17670-5-vadimp@mellanox.com
[groeck: Added missing break statement]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (pmbus/tps53679) Extend device list supported by driver
Vadim Pasternak [Mon, 13 Jan 2020 15:08:38 +0000 (15:08 +0000)]
hwmon: (pmbus/tps53679) Extend device list supported by driver

Extends driver with support of the additional devices:
Texas Instruments Dual channel DCAP+ multiphase controllers: TPS53688.

Extend Kconfig with added device.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Link: https://lore.kernel.org/r/20200113150841.17670-4-vadimp@mellanox.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (pmbus/core) Add support for Intel IMVP9 and AMD 6.25mV modes
Vadim Pasternak [Mon, 13 Jan 2020 15:08:37 +0000 (15:08 +0000)]
hwmon: (pmbus/core) Add support for Intel IMVP9 and AMD 6.25mV modes

Extend "vrm_version" with the type for Intel IMVP9 and AMD 6.25mV VID
modes.
Add calculation for those types.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Link: https://lore.kernel.org/r/20200113150841.17670-3-vadimp@mellanox.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (pmbus/core) Add support for vid mode detection per page bases
Vadim Pasternak [Mon, 13 Jan 2020 15:08:36 +0000 (15:08 +0000)]
hwmon: (pmbus/core) Add support for vid mode detection per page bases

Add support for VID protocol detection per page bases, instead of
detecting it based on "PMBU_VOUT" readout from page 0 for all the pages
supported by particular device.
The reason that some devices allows to configure different VID modes
per page within the same device.
Patch modifies the field "vrm_version" within the structure
"pmbus_driver_info" to be per page array.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Link: https://lore.kernel.org/r/20200113150841.17670-2-vadimp@mellanox.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (pmbus/ibm-cffps) Prevent writing on_off_config with bad data
Eddie James [Tue, 7 Jan 2020 15:40:40 +0000 (09:40 -0600)]
hwmon: (pmbus/ibm-cffps) Prevent writing on_off_config with bad data

If the user write parameters resulted in no bytes being written to the
temporary buffer, then ON_OFF_CONFIG will be written with uninitialized
data. Prevent this by bailing out in this case.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/1578411640-16929-1-git-send-email-eajames@linux.ibm.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (w83627ehf) Remove set but not used variable 'fan4min'
YueHaibing [Wed, 8 Jan 2020 03:45:14 +0000 (03:45 +0000)]
hwmon: (w83627ehf) Remove set but not used variable 'fan4min'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/hwmon/w83627ehf.c: In function 'w83627ehf_check_fan_inputs':
drivers/hwmon/w83627ehf.c:1296:24: warning:
 variable 'fan4min' set but not used [-Wunused-but-set-variable]

commit 62000264cfa8 ("hwmon: (w83627ehf) remove nct6775 and nct6776 support")
left behind this unused variable.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20200108034514.50130-1-yuehaibing@huawei.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: Driver for disk and solid state drives with temperature sensors
Guenter Roeck [Fri, 29 Nov 2019 05:34:40 +0000 (21:34 -0800)]
hwmon: Driver for disk and solid state drives with temperature sensors

Reading the temperature of ATA drives has been supported for years
by userspace tools such as smarttools or hddtemp. The downside of
such tools is that they need to run with super-user privilege, that
the temperatures are not reported by standard tools such as 'sensors'
or 'libsensors', and that drive temperatures are not available for use
in the kernel's thermal subsystem.

This driver solves this problem by adding support for reading the
temperature of ATA drives from the kernel using the hwmon API and
by adding a temperature zone for each drive.

With this driver, the hard disk temperature can be read using the
unprivileged 'sensors' application:

$ sensors drivetemp-scsi-1-0
drivetemp-scsi-1-0
Adapter: SCSI adapter
temp1:        +23.0°C

or directly from sysfs:

$ grep . /sys/class/hwmon/hwmon9/{name,temp1_input}
/sys/class/hwmon/hwmon9/name:drivetemp
/sys/class/hwmon/hwmon9/temp1_input:23000

If the drive supports SCT transport and reports temperature limits,
those are reported as well.

drivetemp-scsi-0-0
Adapter: SCSI adapter
temp1:        +27.0°C  (low  =  +0.0°C, high = +60.0°C)
                       (crit low = -41.0°C, crit = +85.0°C)
                       (lowest = +23.0°C, highest = +34.0°C)

The driver attempts to use SCT Command Transport to read the drive
temperature. If the SCT Command Transport feature set is not available,
or if it does not report the drive temperature, drive temperatures may
be readable through SMART attributes. Since SMART attributes are not well
defined, this method is only used as fallback mechanism.

Cc: Chris Healy <cphealy@gmail.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (pmbus/ibm-cffps) Fix the LED behavior when turned off
Eddie James [Thu, 19 Dec 2019 20:50:07 +0000 (14:50 -0600)]
hwmon: (pmbus/ibm-cffps) Fix the LED behavior when turned off

The driver should remain in control of the LED on the PSU, even while
off, not the PSU firmware as previously indicated.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/1576788607-13567-4-git-send-email-eajames@linux.ibm.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (pmbus/ibm-cffps) Add the VMON property for version 2
Eddie James [Thu, 19 Dec 2019 20:50:06 +0000 (14:50 -0600)]
hwmon: (pmbus/ibm-cffps) Add the VMON property for version 2

Version 2 of the PSU supports reading an auxiliary voltage. Use the
pmbus VMON property and associated virtual register to read it.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/1576788607-13567-3-git-send-email-eajames@linux.ibm.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (pmbus/ibm-cffps) Add new manufacturer debugfs entries
Eddie James [Thu, 19 Dec 2019 20:50:05 +0000 (14:50 -0600)]
hwmon: (pmbus/ibm-cffps) Add new manufacturer debugfs entries

Add support for a number of manufacturer-specific registers in the
debugfs entries, as well as support to read and write the
PMBUS_ON_OFF_CONFIG register through debugfs.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/1576788607-13567-2-git-send-email-eajames@linux.ibm.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (pmbus) Driver for MAX20730, MAX20734, and MAX20743
Guenter Roeck [Fri, 6 Dec 2019 04:26:24 +0000 (20:26 -0800)]
hwmon: (pmbus) Driver for MAX20730, MAX20734, and MAX20743

Add support for Maxim MAX20730, MAX20734, MAX20743 Integrated,
Step-Down Switching Regulators with PMBus support.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (w83627ehf) Now only one intrusion channel
Dr. David Alan Gilbert [Wed, 25 Dec 2019 02:32:25 +0000 (02:32 +0000)]
hwmon: (w83627ehf) Now only one intrusion channel

The 2nd intrusion channel was only used on the nct6776

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://lore.kernel.org/r/20191225023225.2785-4-linux@treblig.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (w83627ehf) Remove code not needed after nct677* removal
Dr. David Alan Gilbert [Wed, 25 Dec 2019 02:32:24 +0000 (02:32 +0000)]
hwmon: (w83627ehf) Remove code not needed after nct677* removal

Now the nct677* are gone, we can clean up some flags that are
always the same now and simplify some code.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://lore.kernel.org/r/20191225023225.2785-3-linux@treblig.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (w83627ehf) remove nct6775 and nct6776 support
Dr. David Alan Gilbert [Wed, 25 Dec 2019 02:32:23 +0000 (02:32 +0000)]
hwmon: (w83627ehf) remove nct6775 and nct6776 support

The nct6775 and nct6776 are supported by the separate nct6775.c driver,
so remove the code from the w83627ehf driver.

Suggested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://lore.kernel.org/r/20191225023225.2785-2-linux@treblig.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (pmbus) Add MAX20796 to devices supported by generic pmbus driver
Guenter Roeck [Fri, 13 Dec 2019 21:36:36 +0000 (13:36 -0800)]
hwmon: (pmbus) Add MAX20796 to devices supported by generic pmbus driver

MAX20796 is a dual-phase scalable integrated voltage regulator with
PMBus interface.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (w83627ehf) make sensor_dev_attr_##_name variables static
Chen Zhou [Fri, 13 Dec 2019 01:56:05 +0000 (09:56 +0800)]
hwmon: (w83627ehf) make sensor_dev_attr_##_name variables static

Fix sparse warning:

drivers/hwmon/w83627ehf.c:1202:1: warning:
symbol 'sensor_dev_attr_pwm1_target' was not declared. Should it be static?

and many more similar messages.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Link: https://lore.kernel.org/r/20191213015605.172472-1-chenzhou10@huawei.com
[groeck: Dropped all but one log message from description]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (pmbus) Detect if chip is write protected
Guenter Roeck [Thu, 12 Dec 2019 17:14:34 +0000 (09:14 -0800)]
hwmon: (pmbus) Detect if chip is write protected

If a chip is write protected, we can not change any limits, and we can
not clear status flags. This may be the reason why clearing status flags
is reported to not work for some chips. Detect the condition in the pmbus
core. If the chip is write protected, set limit attributes as read-only,
and set the flag indicating that the status flag should be ignored.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: Driver for MAX31730
Guenter Roeck [Sat, 23 Nov 2019 19:11:26 +0000 (11:11 -0800)]
hwmon: Driver for MAX31730

MAX31730 is a 3-Channel Remote Temperature Sensor.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: Add support for enable attributes to hwmon core
Guenter Roeck [Tue, 17 Jul 2018 17:17:19 +0000 (10:17 -0700)]
hwmon: Add support for enable attributes to hwmon core

The hwmon ABI supports enable attributes since commit fb41a710f84e
("hwmon: Document the sensor enable attribute"), but did not
add support for those attributes to the hwmon core. Do that now.

Since the enable attributes are logically the most important attributes,
they are added as first attribute to the attribute list. Move
hwmon_in_enable from last to first place for consistency.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (w83627ehf) convert to with_info interface
Dr. David Alan Gilbert [Sun, 24 Nov 2019 20:20:30 +0000 (20:20 +0000)]
hwmon: (w83627ehf) convert to with_info interface

Convert the old hwmon_device_register code to
devm_hwmon_device_register_with_info.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://lore.kernel.org/r/20191124202030.45360-3-linux@treblig.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: (pmbus/ucd9000) Add support for UCD90320 Power Sequencer
Jim Wright [Thu, 5 Dec 2019 23:24:11 +0000 (17:24 -0600)]
hwmon: (pmbus/ucd9000) Add support for UCD90320 Power Sequencer

Add support for the UCD90320 chip and its expanded set of GPIO pins.

Signed-off-by: Jim Wright <wrightj@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20191205232411.21492-3-wrightj@linux.vnet.ibm.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agodt-bindings: hwmon/pmbus: Add ti,ucd90320 power sequencer
Jim Wright [Thu, 5 Dec 2019 23:24:10 +0000 (17:24 -0600)]
dt-bindings: hwmon/pmbus: Add ti,ucd90320 power sequencer

Document the UCD90320 device tree binding.

Signed-off-by: Jim Wright <wrightj@linux.vnet.ibm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20191205232411.21492-2-wrightj@linux.vnet.ibm.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agohwmon: Add intrusion templates
Dr. David Alan Gilbert [Sun, 24 Nov 2019 20:20:29 +0000 (20:20 +0000)]
hwmon: Add intrusion templates

Add templates for intrusion%d_alarm and intrusion%d_beep.
Note, these start at 0.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://lore.kernel.org/r/20191124202030.45360-2-linux@treblig.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
4 years agonet_sched: fix datalen for ematch
Cong Wang [Wed, 22 Jan 2020 23:42:02 +0000 (15:42 -0800)]
net_sched: fix datalen for ematch

syzbot reported an out-of-bound access in em_nbyte. As initially
analyzed by Eric, this is because em_nbyte sets its own em->datalen
in em_nbyte_change() other than the one specified by user, but this
value gets overwritten later by its caller tcf_em_validate().
We should leave em->datalen untouched to respect their choices.

I audit all the in-tree ematch users, all of those implement
->change() set em->datalen, so we can just avoid setting it twice
in this case.

Reported-and-tested-by: syzbot+5af9a90dad568aa9f611@syzkaller.appspotmail.com
Reported-by: syzbot+2f07903a5b05e7f36410@syzkaller.appspotmail.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'Fixes-for-SONIC-ethernet-driver'
David S. Miller [Thu, 23 Jan 2020 20:24:37 +0000 (21:24 +0100)]
Merge branch 'Fixes-for-SONIC-ethernet-driver'

Finn Thain says:

====================
Fixes for SONIC ethernet driver

Various SONIC driver problems have become apparent over the years,
including tx watchdog timeouts, lost packets and duplicated packets.

The problems are mostly caused by bugs in buffer handling, locking and
(re-)initialization code.

This patch series resolves these problems.

This series has been tested on National Semiconductor hardware (macsonic),
qemu-system-m68k (macsonic) and qemu-system-mips64el (jazzsonic).

The emulated dp8393x device used in QEMU also has bugs.
I have fixed the bugs that I know of in a series of patches at,
https://github.com/fthain/qemu/commits/sonic

Changed since v1:
 - Minor revisions as described in commit logs.
 - Deferred net-next patches.
Changed since v2:
 - Minor revisions as described in commit logs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Prevent tx watchdog timeout
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Prevent tx watchdog timeout

Section 5.5.3.2 of the datasheet says,

    If FIFO Underrun, Byte Count Mismatch, Excessive Collision, or
    Excessive Deferral (if enabled) errors occur, transmission ceases.

In this situation, the chip asserts a TXER interrupt rather than TXDN.
But the handler for the TXDN is the only way that the transmit queue
gets restarted. Hence, an aborted transmission can result in a watchdog
timeout.

This problem can be reproduced on congested link, as that can result in
excessive transmitter collisions. Another way to reproduce this is with
a FIFO Underrun, which may be caused by DMA latency.

In event of a TXER interrupt, prevent a watchdog timeout by restarting
transmission.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Fix CAM initialization
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Fix CAM initialization

Section 4.3.1 of the datasheet says,

    This bit [TXP] must not be set if a Load CAM operation is in
    progress (LCAM is set). The SONIC will lock up if both bits are
    set simultaneously.

Testing has shown that the driver sometimes attempts to set LCAM
while TXP is set. Avoid this by waiting for command completion
before and after giving the LCAM command.

After issuing the Load CAM command, poll for !SONIC_CR_LCAM rather than
SONIC_INT_LCD, because the SONIC_CR_TXP bit can't be used until
!SONIC_CR_LCAM.

When in reset mode, take the opportunity to reset the CAM Enable
register.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Fix command register usage
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Fix command register usage

There are several issues relating to command register usage during
chip initialization.

Firstly, the SONIC sometimes comes out of software reset with the
Start Timer bit set. This gets logged as,

    macsonic macsonic eth0: sonic_init: status=24, i=101

Avoid this by giving the Stop Timer command earlier than later.

Secondly, the loop that waits for the Read RRA command to complete has
the break condition inverted. That's why the for loop iterates until
its termination condition. Call the helper for this instead.

Finally, give the Receiver Enable command after clearing interrupts,
not before, to avoid the possibility of losing an interrupt.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Quiesce SONIC before re-initializing descriptor memory
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Quiesce SONIC before re-initializing descriptor memory

Make sure the SONIC's DMA engine is idle before altering the transmit
and receive descriptors. Add a helper for this as it will be needed
again.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Fix receive buffer replenishment
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Fix receive buffer replenishment

As soon as the driver is finished with a receive buffer it allocs a new
one and overwrites the corresponding RRA entry with a new buffer pointer.

Problem is, the buffer pointer is split across two word-sized registers.
It can't be updated in one atomic store. So this operation races with the
chip while it stores received packets and advances its RRP register.
This could result in memory corruption by a DMA write.

Avoid this problem by adding buffers only at the location given by the
RWP register, in accordance with the National Semiconductor datasheet.

Re-factor this code into separate functions to calculate a RRA pointer
and to update the RWP.

Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Improve receive descriptor status flag check
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Improve receive descriptor status flag check

After sonic_tx_timeout() calls sonic_init(), it can happen that
sonic_rx() will subsequently encounter a receive descriptor with no
flags set. Remove the comment that says that this can't happen.

When giving a receive descriptor to the SONIC, clear the descriptor
status field. That way, any rx descriptor with flags set can only be
a newly received packet.

Don't process a descriptor without the LPKT bit set. The buffer is
still in use by the SONIC.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Avoid needless receive descriptor EOL flag updates
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Avoid needless receive descriptor EOL flag updates

The while loop in sonic_rx() traverses the rx descriptor ring. It stops
when it reaches a descriptor that the SONIC has not used. Each iteration
advances the EOL flag so the SONIC can keep using more descriptors.
Therefore, the while loop has no definite termination condition.

The algorithm described in the National Semiconductor literature is quite
different. It consumes descriptors up to the one with its EOL flag set
(which will also have its "in use" flag set). All freed descriptors are
then returned to the ring at once, by adjusting the EOL flags (and link
pointers).

Adopt the algorithm from datasheet as it's simpler, terminates quickly
and avoids a lot of pointless descriptor EOL flag changes.

Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Fix receive buffer handling
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Fix receive buffer handling

The SONIC can sometimes advance its rx buffer pointer (RRP register)
without advancing its rx descriptor pointer (CRDA register). As a result
the index of the current rx descriptor may not equal that of the current
rx buffer. The driver mistakenly assumes that they are always equal.
This assumption leads to incorrect packet lengths and possible packet
duplication. Avoid this by calling a new function to locate the buffer
corresponding to a given descriptor.

Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Fix interface error stats collection
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Fix interface error stats collection

The tx_aborted_errors statistic should count packets flagged with EXD,
EXC, FU, or BCM bits because those bits denote an aborted transmission.
That corresponds to the bitmask 0x0446, not 0x0642. Use macros for these
constants to avoid mistakes. Better to leave out FIFO Underruns (FU) as
there's a separate counter for that purpose.

Don't lump all these errors in with the general tx_errors counter as
that's used for tx timeout events.

On the rx side, don't count RDE and RBAE interrupts as dropped packets.
These interrupts don't indicate a lost packet, just a lack of resources.
When a lack of resources results in a lost packet, this gets reported
in the rx_missed_errors counter (along with RFO events).

Don't double-count rx_frame_errors and rx_crc_errors.

Don't use the general rx_errors counter for events that already have
special counters.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Use MMIO accessors
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Use MMIO accessors

The driver accesses descriptor memory which is simultaneously accessed by
the chip, so the compiler must not be allowed to re-order CPU accesses.
sonic_buf_get() used 'volatile' to prevent that. sonic_buf_put() should
have done so too but was overlooked.

Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Clear interrupt flags immediately
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Clear interrupt flags immediately

The chip can change a packet's descriptor status flags at any time.
However, an active interrupt flag gets cleared rather late. This
allows a race condition that could theoretically lose an interrupt.
Fix this by clearing asserted interrupt flags immediately.

Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Add mutual exclusion for accessing shared state
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Add mutual exclusion for accessing shared state

The netif_stop_queue() call in sonic_send_packet() races with the
netif_wake_queue() call in sonic_interrupt(). This causes issues
like "NETDEV WATCHDOG: eth0 (macsonic): transmit queue 0 timed out".
Fix this by disabling interrupts when accessing tx_skb[] and next_tx.
Update a comment to clarify the synchronization properties.

Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: fsl/fman: rename IF_MODE_XGMII to IF_MODE_10G
Madalin Bucur [Wed, 22 Jan 2020 14:15:14 +0000 (16:15 +0200)]
net: fsl/fman: rename IF_MODE_XGMII to IF_MODE_10G

As the only 10G PHY interface type defined at the moment the code
was developed was XGMII, although the PHY interface mode used was
not XGMII, XGMII was used in the code to denote 10G. This patch
renames the 10G interface mode to remove the ambiguity.

Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'net-fsl-fman-address-erratum-A011043'
David S. Miller [Thu, 23 Jan 2020 20:17:13 +0000 (21:17 +0100)]
Merge branch 'net-fsl-fman-address-erratum-A011043'

Madalin Bucur says:

====================
net: fsl/fman: address erratum A011043

This addresses a HW erratum on some QorIQ DPAA devices.

MDIO reads to internal PCS registers may result in having
the MDIO_CFG[MDIO_RD_ER] bit set, even when there is no
error and read data (MDIO_DATA[MDIO_DATA]) is correct.
Software may get false read error when reading internal
PCS registers through MDIO. As a workaround, all internal
MDIO accesses should ignore the MDIO_CFG[MDIO_RD_ER] bit.
When the issue was present, one could see such errors
during boot:

  mdio_bus ffe4e5000: Error while reading PHY0 reg at 3.3
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/fsl: treat fsl,erratum-a011043
Madalin Bucur [Wed, 22 Jan 2020 13:20:29 +0000 (15:20 +0200)]
net/fsl: treat fsl,erratum-a011043

When fsl,erratum-a011043 is set, adjust for erratum A011043:
MDIO reads to internal PCS registers may result in having
the MDIO_CFG[MDIO_RD_ER] bit set, even when there is no
error and read data (MDIO_DATA[MDIO_DATA]) is correct.
Software may get false read error when reading internal
PCS registers through MDIO. As a workaround, all internal
MDIO accesses should ignore the MDIO_CFG[MDIO_RD_ER] bit.

Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agopowerpc/fsl/dts: add fsl,erratum-a011043
Madalin Bucur [Wed, 22 Jan 2020 13:20:28 +0000 (15:20 +0200)]
powerpc/fsl/dts: add fsl,erratum-a011043

Add fsl,erratum-a011043 to internal MDIO buses.
Software may get false read error when reading internal
PCS registers through MDIO. As a workaround, all internal
MDIO accesses should ignore the MDIO_CFG[MDIO_RD_ER] bit.

Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodt-bindings: net: add fsl,erratum-a011043
Madalin Bucur [Wed, 22 Jan 2020 13:20:27 +0000 (15:20 +0200)]
dt-bindings: net: add fsl,erratum-a011043

Add an entry for erratum A011043: the MDIO_CFG[MDIO_RD_ER]
bit may be falsely set when reading internal PCS registers.
MDIO reads to internal PCS registers may result in having
the MDIO_CFG[MDIO_RD_ER] bit set, even when there is no
error and read data (MDIO_DATA[MDIO_DATA]) is correct.
Software may get false read error when reading internal
PCS registers through MDIO. As a workaround, all internal
MDIO accesses should ignore the MDIO_CFG[MDIO_RD_ER] bit.

Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoqlcnic: Fix CPU soft lockup while collecting firmware dump
Manish Chopra [Wed, 22 Jan 2020 09:43:38 +0000 (01:43 -0800)]
qlcnic: Fix CPU soft lockup while collecting firmware dump

Driver while collecting firmware dump takes longer time to
collect/process some of the firmware dump entries/memories.
Bigger capture masks makes it worse as it results in larger
amount of data being collected and results in CPU soft lockup.
Place cond_resched() in some of the driver flows that are
expectedly time consuming to relinquish the CPU to avoid CPU
soft lockup panic.

Signed-off-by: Shahed Shaikh <shshaikh@marvell.com>
Tested-by: Yonggen Xu <Yonggen.Xu@dell.com>
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge tag 'xarray-5.5' of git://git.infradead.org/users/willy/linux-dax
Linus Torvalds [Thu, 23 Jan 2020 19:37:19 +0000 (11:37 -0800)]
Merge tag 'xarray-5.5' of git://git.infradead.org/users/willy/linux-dax

Pull XArray fixes from Matthew Wilcox:
 "Primarily bugfixes, mostly around handling index wrap-around
  correctly.

  A couple of doc fixes and adding missing APIs.

  I had an oops live on stage at linux.conf.au this year, and it turned
  out to be a bug in xas_find() which I can't prove isn't triggerable in
  the current codebase. Then in looking for the bug, I spotted two more
  bugs.

  The bots have had a few days to chew on this with no problems
  reported, and it passes the test-suite (which now has more tests to
  make sure these problems don't come back)"

* tag 'xarray-5.5' of git://git.infradead.org/users/willy/linux-dax:
  XArray: Add xa_for_each_range
  XArray: Fix xas_find returning too many entries
  XArray: Fix xa_find_after with multi-index entries
  XArray: Fix infinite loop with entry at ULONG_MAX
  XArray: Add wrappers for nested spinlocks
  XArray: Improve documentation of search marks
  XArray: Fix xas_pause at ULONG_MAX

4 years agoMerge tag 'trace-v5.5-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Thu, 23 Jan 2020 19:23:37 +0000 (11:23 -0800)]
Merge tag 'trace-v5.5-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Various tracing fixes:

   - Fix a function comparison warning for a xen trace event macro

   - Fix a double perf_event linking to a trace_uprobe_filter for
     multiple events

   - Fix suspicious RCU warnings in trace event code for using
     list_for_each_entry_rcu() when the "_rcu" portion wasn't needed.

   - Fix a bug in the histogram code when using the same variable

   - Fix a NULL pointer dereference when tracefs lockdown enabled and
     calling trace_set_default_clock()

   - A fix to a bug found with the double perf_event linking patch"

* tag 'trace-v5.5-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/uprobe: Fix to make trace_uprobe_filter alignment safe
  tracing: Do not set trace clock if tracefs lockdown is in effect
  tracing: Fix histogram code when expression has same var as value
  tracing: trigger: Replace unneeded RCU-list traversals
  tracing/uprobe: Fix double perf_event linking on multiprobe uprobe
  tracing: xen: Ordered comparison of function pointers

4 years agoMerge tag 'ceph-for-5.5-rc8' of https://github.com/ceph/ceph-client
Linus Torvalds [Thu, 23 Jan 2020 19:21:35 +0000 (11:21 -0800)]
Merge tag 'ceph-for-5.5-rc8' of https://github.com/ceph/ceph-client

Pull ceph fix from Ilya Dryomov:
 "A fix for a potential use-after-free from Jeff, marked for stable"

* tag 'ceph-for-5.5-rc8' of https://github.com/ceph/ceph-client:
  ceph: hold extra reference to r_parent over life of request

4 years agoMerge tag 'pm-5.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Thu, 23 Jan 2020 19:10:21 +0000 (11:10 -0800)]
Merge tag 'pm-5.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
 "Prevent the kernel from crashing during resume from hibernation if
  free pages contain leftover data from the restore kernel and
  init_on_free is set (Alexander Potapenko)"

* tag 'pm-5.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: hibernate: fix crashes with init_on_free=1

4 years agoMerge tag 'pci-v5.5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Linus Torvalds [Thu, 23 Jan 2020 19:08:15 +0000 (11:08 -0800)]
Merge tag 'pci-v5.5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fix from Bjorn Helgaas:
 "Mark ATS as broken on AMD Navi14 GPU rev 0xc5 (Alex Deucher)"

* tag 'pci-v5.5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken

4 years agopartitions/ldm: fix spelling mistake "to" -> "too"
Colin Ian King [Thu, 23 Jan 2020 00:29:21 +0000 (00:29 +0000)]
partitions/ldm: fix spelling mistake "to" -> "too"

There is a spelling mistake in a ldm_error message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobcache: reap from tail of c->btree_cache in bch_mca_scan()
Coly Li [Thu, 23 Jan 2020 17:01:42 +0000 (01:01 +0800)]
bcache: reap from tail of c->btree_cache in bch_mca_scan()

When shrink btree node cache from c->btree_cache in bch_mca_scan(),
no matter the selected node is reaped or not, it will be rotated from
the head to the tail of c->btree_cache list. But in bcache journal
code, when flushing the btree nodes with oldest journal entry, btree
nodes are iterated and slected from the tail of c->btree_cache list in
btree_flush_write(). The list_rotate_left() in bch_mca_scan() will
make btree_flush_write() iterate more nodes in c->btree_list in reverse
order.

This patch just reaps the selected btree node cache, and not move it
from the head to the tail of c->btree_cache list. Then bch_mca_scan()
will not mess up c->btree_cache list to btree_flush_write().

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobcache: reap c->btree_cache_freeable from the tail in bch_mca_scan()
Coly Li [Thu, 23 Jan 2020 17:01:41 +0000 (01:01 +0800)]
bcache: reap c->btree_cache_freeable from the tail in bch_mca_scan()

In order to skip the most recently freed btree node cahce, currently
in bch_mca_scan() the first 3 caches in c->btree_cache_freeable list
are skipped when shrinking bcache node caches in bch_mca_scan(). The
related code in bch_mca_scan() is,

 737 list_for_each_entry_safe(b, t, &c->btree_cache_freeable, list) {
 738         if (nr <= 0)
 739                 goto out;
 740
 741         if (++i > 3 &&
 742             !mca_reap(b, 0, false)) {
              lines free cache memory
 746         }
 747         nr--;
 748 }

The problem is, if virtual memory code calls bch_mca_scan() and
the calculated 'nr' is 1 or 2, then in the above loop, nothing will
be shunk. In such case, if slub/slab manager calls bch_mca_scan()
for many times with small scan number, it does not help to shrink
cache memory and just wasts CPU cycles.

This patch just selects btree node caches from tail of the
c->btree_cache_freeable list, then the newly freed host cache can
still be allocated by mca_alloc(), and at least 1 node can be shunk.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobcache: remove member accessed from struct btree
Coly Li [Thu, 23 Jan 2020 17:01:40 +0000 (01:01 +0800)]
bcache: remove member accessed from struct btree

The member 'accessed' of struct btree is used in bch_mca_scan() when
shrinking btree node caches. The original idea is, if b->accessed is
set, clean it and look at next btree node cache from c->btree_cache
list, and only shrink the caches whose b->accessed is cleaned. Then
only cold btree node cache will be shrunk.

But when I/O pressure is high, it is very probably that b->accessed
of a btree node cache will be set again in bch_btree_node_get()
before bch_mca_scan() selects it again. Then there is no chance for
bch_mca_scan() to shrink enough memory back to slub or slab system.

This patch removes member accessed from struct btree, then once a
btree node ache is selected, it will be immediately shunk. By this
change, bch_mca_scan() may release btree node cahce more efficiently.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobcache: print written and keys in trace_bcache_btree_write
Guoju Fang [Thu, 23 Jan 2020 17:01:38 +0000 (01:01 +0800)]
bcache: print written and keys in trace_bcache_btree_write

It's useful to dump written block and keys on btree write, this patch
add them into trace_bcache_btree_write.

Signed-off-by: Guoju Fang <fangguoju@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobcache: avoid unnecessary btree nodes flushing in btree_flush_write()
Coly Li [Thu, 23 Jan 2020 17:01:37 +0000 (01:01 +0800)]
bcache: avoid unnecessary btree nodes flushing in btree_flush_write()

the commit 91be66e1318f ("bcache: performance improvement for
btree_flush_write()") was an effort to flushing btree node with oldest
btree node faster in following methods,
- Only iterate dirty btree nodes in c->btree_cache, avoid scanning a lot
  of clean btree nodes.
- Take c->btree_cache as a LRU-like list, aggressively flushing all
  dirty nodes from tail of c->btree_cache util the btree node with
  oldest journal entry is flushed. This is to reduce the time of holding
  c->bucket_lock.

Guoju Fang and Shuang Li reported that they observe unexptected extra
write I/Os on cache device after applying the above patch. Guoju Fang
provideed more detailed diagnose information that the aggressive
btree nodes flushing may cause 10x more btree nodes to flush in his
workload. He points out when system memory is large enough to hold all
btree nodes in memory, c->btree_cache is not a LRU-like list any more.
Then the btree node with oldest journal entry is very probably not-
close to the tail of c->btree_cache list. In such situation much more
dirty btree nodes will be aggressively flushed before the target node
is flushed. When slow SATA SSD is used as cache device, such over-
aggressive flushing behavior will cause performance regression.

After spending a lot of time on debug and diagnose, I find the real
condition is more complicated, aggressive flushing dirty btree nodes
from tail of c->btree_cache list is not a good solution.
- When all btree nodes are cached in memory, c->btree_cache is not
  a LRU-like list, the btree nodes with oldest journal entry won't
  be close to the tail of the list.
- There can be hundreds dirty btree nodes reference the oldest journal
  entry, before flushing all the nodes the oldest journal entry cannot
  be reclaimed.
When the above two conditions mixed together, a simply flushing from
tail of c->btree_cache list is really NOT a good idea.

Fortunately there is still chance to make btree_flush_write() work
better. Here is how this patch avoids unnecessary btree nodes flushing,
- Only acquire c->journal.lock when getting oldest journal entry of
  fifo c->journal.pin. In rested locations check the journal entries
  locklessly, so their values can be changed on other cores
  in parallel.
- In loop list_for_each_entry_safe_reverse(), checking latest front
  point of fifo c->journal.pin. If it is different from the original
  point which we get with locking c->journal.lock, it means the oldest
  journal entry is reclaim on other cores. At this moment, all selected
  dirty nodes recorded in array btree_nodes[] are all flushed and clean
  on other CPU cores, it is unncessary to iterate c->btree_cache any
  longer. Just quit the list_for_each_entry_safe_reverse() loop and
  the following for-loop will skip all the selected clean nodes.
- Find a proper time to quit the list_for_each_entry_safe_reverse()
  loop. Check the refcount value of orignial fifo front point, if the
  value is larger than selected node number of btree_nodes[], it means
  more matching btree nodes should be scanned. Otherwise it means no
  more matching btee nodes in rest of c->btree_cache list, the loop
  can be quit. If the original oldest journal entry is reclaimed and
  fifo front point is updated, the refcount of original fifo front point
  will be 0, then the loop will be quit too.
- Not hold c->bucket_lock too long time. c->bucket_lock is also required
  for space allocation for cached data, hold it for too long time will
  block regular I/O requests. When iterating list c->btree_cache, even
  there are a lot of maching btree nodes, in order to not holding
  c->bucket_lock for too long time, only BTREE_FLUSH_NR nodes are
  selected and to flush in following for-loop.
With this patch, only btree nodes referencing oldest journal entry
are flushed to cache device, no aggressive flushing for  unnecessary
btree node any more. And in order to avoid blocking regluar I/O
requests, each time when btree_flush_write() called, at most only
BTREE_FLUSH_NR btree nodes are selected to flush, even there are more
maching btree nodes in list c->btree_cache.

At last, one more thing to explain: Why it is safe to read front point
of c->journal.pin without holding c->journal.lock inside the
list_for_each_entry_safe_reverse() loop ?

Here is my answer: When reading the front point of fifo c->journal.pin,
we don't need to know the exact value of front point, we just want to
check whether the value is different from the original front point
(which is accurate value because we get it while c->jouranl.lock is
held). For such purpose, it works as expected without holding
c->journal.lock. Even the front point is changed on other CPU core and
not updated to local core, and current iterating btree node has
identical journal entry local as original fetched fifo front point, it
is still safe. Because after holding mutex b->write_lock (with memory
barrier) this btree node can be found as clean and skipped, the loop
will quite latter when iterate on next node of list c->btree_cache.

Fixes: 91be66e1318f ("bcache: performance improvement for btree_flush_write()")
Reported-by: Guoju Fang <fangguoju@gmail.com>
Reported-by: Shuang Li <psymon@bonuscloud.io>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobcache: add code comments for state->pool in __btree_sort()
Coly Li [Thu, 23 Jan 2020 17:01:36 +0000 (01:01 +0800)]
bcache: add code comments for state->pool in __btree_sort()

To explain the pages allocated from mempool state->pool can be
swapped in __btree_sort(), because state->pool is a page pool,
which allocates pages by alloc_pages() indeed.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agolib: crc64: include <linux/crc64.h> for 'crc64_be'
Ben Dooks (Codethink) [Thu, 23 Jan 2020 17:01:35 +0000 (01:01 +0800)]
lib: crc64: include <linux/crc64.h> for 'crc64_be'

The crc64_be() is declared in <linux/crc64.h> so include
this where the symbol is defined to avoid the following
warning:

lib/crc64.c:43:12: warning: symbol 'crc64_be' was not declared. Should it be static?

Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobcache: use read_cache_page_gfp to read the superblock
Christoph Hellwig [Thu, 23 Jan 2020 17:01:34 +0000 (01:01 +0800)]
bcache: use read_cache_page_gfp to read the superblock

Avoid a pointless dependency on buffer heads in bcache by simply open
coding reading a single page.  Also add a SB_OFFSET define for the
byte offset of the superblock instead of using magic numbers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobcache: store a pointer to the on-disk sb in the cache and cached_dev structures
Christoph Hellwig [Thu, 23 Jan 2020 17:01:33 +0000 (01:01 +0800)]
bcache: store a pointer to the on-disk sb in the cache and cached_dev structures

This allows to properly build the superblock bio including the offset in
the page using the normal bio helpers.  This fixes writing the superblock
for page sizes larger than 4k where the sb write bio would need an offset
in the bio_vec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobcache: return a pointer to the on-disk sb from read_super
Christoph Hellwig [Thu, 23 Jan 2020 17:01:32 +0000 (01:01 +0800)]
bcache: return a pointer to the on-disk sb from read_super

Returning the properly typed actual data structure insteaf of the
containing struct page will save the callers some work going
forward.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobcache: transfer the sb_page reference to register_{bdev,cache}
Christoph Hellwig [Thu, 23 Jan 2020 17:01:31 +0000 (01:01 +0800)]
bcache: transfer the sb_page reference to register_{bdev,cache}

Avoid an extra reference count roundtrip by transferring the sb_page
ownership to the lower level register helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobcache: fix use-after-free in register_bcache()
Coly Li [Thu, 23 Jan 2020 17:01:30 +0000 (01:01 +0800)]
bcache: fix use-after-free in register_bcache()

The patch "bcache: rework error unwinding in register_bcache" introduces
a use-after-free regression in register_bcache(). Here are current code,
2510 out_free_path:
2511         kfree(path);
2512 out_module_put:
2513         module_put(THIS_MODULE);
2514 out:
2515         pr_info("error %s: %s", path, err);
2516         return ret;
If some error happens and the above code path is executed, at line 2511
path is released, but referenced at line 2515. Then KASAN reports a use-
after-free error message.

This patch changes line 2515 in the following way to fix the problem,
2515         pr_info("error %s: %s", path?path:"", err);

Signed-off-by: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobcache: properly initialize 'path' and 'err' in register_bcache()
Coly Li [Thu, 23 Jan 2020 17:01:29 +0000 (01:01 +0800)]
bcache: properly initialize 'path' and 'err' in register_bcache()

Patch "bcache: rework error unwinding in register_bcache" from
Christoph Hellwig changes the local variables 'path' and 'err'
in undefined initial state. If the code in register_bcache() jumps
to label 'out:' or 'out_module_put:' by goto, these two variables
might be reference with undefined value by the following line,

out_module_put:
        module_put(THIS_MODULE);
out:
        pr_info("error %s: %s", path, err);
        return ret;

Therefore this patch initializes these two local variables properly
in register_bcache() to avoid such issue.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobcache: rework error unwinding in register_bcache
Christoph Hellwig [Thu, 23 Jan 2020 17:01:28 +0000 (01:01 +0800)]
bcache: rework error unwinding in register_bcache

Split the successful and error return path, and use one goto label for each
resource to unwind.  This also fixes some small errors like leaking the
module reference count in the reboot case (which seems entirely harmless)
or printing the wrong warning messages for early failures.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobcache: use a separate data structure for the on-disk super block
Christoph Hellwig [Thu, 23 Jan 2020 17:01:27 +0000 (01:01 +0800)]
bcache: use a separate data structure for the on-disk super block

Split out an on-disk version struct cache_sb with the proper endianness
annotations.  This fixes a fair chunk of sparse warnings, but there are
some left due to the way the checksum is defined.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobcache: cached_dev_free needs to put the sb page
Liang Chen [Thu, 23 Jan 2020 17:01:26 +0000 (01:01 +0800)]
bcache: cached_dev_free needs to put the sb page

Same as cache device, the buffer page needs to be put while
freeing cached_dev.  Otherwise a page would be leaked every
time a cached_dev is stopped.

Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoreaddir: make user_access_begin() use the real access range
Linus Torvalds [Wed, 22 Jan 2020 20:37:25 +0000 (12:37 -0800)]
readdir: make user_access_begin() use the real access range

In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.

That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.

However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.

It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one.  So we end up backfilling the offset in the previous entry as we
walk along.

But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:

/*
 * Note! This range-checks 'previous' (which may be NULL).
 * The real range was checked in getdents
 */
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;

and it all worked fine.

But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".

This is a complete rewrite of Christophe's original patch.

By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field.  No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.

Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoreaddir: be more conservative with directory entry names
Linus Torvalds [Thu, 23 Jan 2020 18:05:05 +0000 (10:05 -0800)]
readdir: be more conservative with directory entry names

Commit 8a23eb804ca4 ("Make filldir[64]() verify the directory entry
filename is valid") added some minimal validity checks on the directory
entries passed to filldir[64]().  But they really were pretty minimal.

This fleshes out at least the name length check: we used to disallow
zero-length names, but really, negative lengths or oevr-long names
aren't ok either.  Both could happen if there is some filesystem
corruption going on.

Now, most filesystems tend to use just an "unsigned char" or similar for
the length of a directory entry name, so even with a corrupt filesystem
you should never see anything odd like that.  But since we then use the
name length to create the directory entry record length, let's make sure
it actually is half-way sensible.

Note how POSIX states that the size of a path component is limited by
NAME_MAX, but we actually use PATH_MAX for the check here.  That's
because while NAME_MAX is generally the correct maximum name length
(it's 255, for the same old "name length is usually just a byte on
disk"), there's nothing in the VFS layer that really cares.

So the real limitation at a VFS layer is the total pathname length you
can pass as a filename: PATH_MAX.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agodrm/amdgpu: remove the experimental flag for renoir
Alex Deucher [Wed, 22 Jan 2020 16:17:24 +0000 (11:17 -0500)]
drm/amdgpu: remove the experimental flag for renoir

Should work properly with the latest sbios on 5.5 and newer
kernels.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>