]> asedeno.scripts.mit.edu Git - linux.git/log
linux.git
5 years agosctp: implement memory accounting on rx path
Xin Long [Mon, 15 Apr 2019 09:15:07 +0000 (17:15 +0800)]
sctp: implement memory accounting on rx path

sk_forward_alloc's updating is also done on rx path, but to be consistent
we change to use sk_mem_charge() in sctp_skb_set_owner_r().

In sctp_eat_data(), it's not enough to check sctp_memory_pressure only,
which doesn't work for mem_cgroup_sockets_enabled, so we change to use
sk_under_memory_pressure().

When it's under memory pressure, sk_mem_reclaim() and sk_rmem_schedule()
should be called on both RENEGE or CHUNK DELIVERY path exit the memory
pressure status as soon as possible.

Note that sk_rmem_schedule() is using datalen to make things easy there.

Reported-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: implement memory accounting on tx path
Xin Long [Mon, 15 Apr 2019 09:15:06 +0000 (17:15 +0800)]
sctp: implement memory accounting on tx path

Now when sending packets, sk_mem_charge() and sk_mem_uncharge() have been
used to set sk_forward_alloc. We just need to call sk_wmem_schedule() to
check if the allocated should be raised, and call sk_mem_reclaim() to
check if the allocated should be reduced when it's under memory pressure.

If sk_wmem_schedule() returns false, which means no memory is allowed to
allocate, it will block and wait for memory to become available.

Note different from tcp, sctp wait_for_buf happens before allocating any
skb, so memory accounting check is done with the whole msg_len before it
too.

Reported-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoroute: Avoid crash from dereferencing NULL rt->from
Jonathan Lemon [Sun, 14 Apr 2019 21:21:29 +0000 (14:21 -0700)]
route: Avoid crash from dereferencing NULL rt->from

When __ip6_rt_update_pmtu() is called, rt->from is RCU dereferenced, but is
never checked for null - rt6_flush_exceptions() may have removed the entry.

[ 1913.989004] RIP: 0010:ip6_rt_cache_alloc+0x13/0x170
[ 1914.209410] Call Trace:
[ 1914.214798]  <IRQ>
[ 1914.219226]  __ip6_rt_update_pmtu+0xb0/0x190
[ 1914.228649]  ip6_tnl_xmit+0x2c2/0x970 [ip6_tunnel]
[ 1914.239223]  ? ip6_tnl_parse_tlv_enc_lim+0x32/0x1a0 [ip6_tunnel]
[ 1914.252489]  ? __gre6_xmit+0x148/0x530 [ip6_gre]
[ 1914.262678]  ip6gre_tunnel_xmit+0x17e/0x3c7 [ip6_gre]
[ 1914.273831]  dev_hard_start_xmit+0x8d/0x1f0
[ 1914.283061]  sch_direct_xmit+0xfa/0x230
[ 1914.291521]  __qdisc_run+0x154/0x4b0
[ 1914.299407]  net_tx_action+0x10e/0x1f0
[ 1914.307678]  __do_softirq+0xca/0x297
[ 1914.315567]  irq_exit+0x96/0xa0
[ 1914.322494]  smp_apic_timer_interrupt+0x68/0x130
[ 1914.332683]  apic_timer_interrupt+0xf/0x20
[ 1914.341721]  </IRQ>

Fixes: a68886a69180 ("net/ipv6: Make from in rt6_info rcu protected")
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'mlxsw-Add-neighbour-offload-indication'
David S. Miller [Mon, 15 Apr 2019 20:29:21 +0000 (13:29 -0700)]
Merge branch 'mlxsw-Add-neighbour-offload-indication'

Ido Schimmel says:

====================
mlxsw: Add neighbour offload indication

Neighbour entries are programmed to the device's table so that the
correct destination MAC will be specified in a packet after it was
routed.

Despite being programmed to the device and unlike routes and FDB
entries, neighbour entries are currently not marked as offloaded. This
patchset changes that.

Patch #1 is a preparatory patch to make sure we only mark a neighbour as
offloaded in case it was successfully programmed to the device.

Patch #2 sets the offload indication on neighbours.

Patch #3 adds a test to verify above mentioned functionality.

Patched iproute2 version that prints the offload indication is available
here [1].

[1] https://github.com/idosch/iproute2/tree/idosch-next
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: mlxsw: Test neighbour offload indication
Ido Schimmel [Sun, 14 Apr 2019 18:57:50 +0000 (18:57 +0000)]
selftests: mlxsw: Test neighbour offload indication

Test that neighbour entries are marked as offloaded.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_router: Add neighbour offload indication
Ido Schimmel [Sun, 14 Apr 2019 18:57:49 +0000 (18:57 +0000)]
mlxsw: spectrum_router: Add neighbour offload indication

In a similar fashion to routes and FDB entries, the neighbour table is
reflected to the device.

Set an offload indication on the neighbour in case it was programmed to
the device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_router: Propagate neighbour update errors
Ido Schimmel [Sun, 14 Apr 2019 18:57:47 +0000 (18:57 +0000)]
mlxsw: spectrum_router: Propagate neighbour update errors

Next patch will add offload indication to neighbours, but the indication
should only be altered in case the neighbour was successfully added to /
deleted from the device.

Propagate neighbour update errors, so that they could be taken into
account by the next patch.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMAINTAINERS: normalize Woojung Huh's email address
Lukas Bulwahn [Sat, 13 Apr 2019 07:52:15 +0000 (09:52 +0200)]
MAINTAINERS: normalize Woojung Huh's email address

MAINTAINERS contains a lower-case and upper-case variant of
Woojung Huh' s email address.

Only keep the lower-case variant in MAINTAINERS.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobonding: fix event handling for stacked bonds
Sabrina Dubroca [Fri, 12 Apr 2019 13:04:10 +0000 (15:04 +0200)]
bonding: fix event handling for stacked bonds

When a bond is enslaved to another bond, bond_netdev_event() only
handles the event as if the bond is a master, and skips treating the
bond as a slave.

This leads to a refcount leak on the slave, since we don't remove the
adjacency to its master and the master holds a reference on the slave.

Reproducer:
  ip link add bondL type bond
  ip link add bondU type bond
  ip link set bondL master bondU
  ip link del bondL

No "Fixes:" tag, this code is older than git history.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoRevert "net-sysfs: Fix memory leak in netdev_register_kobject"
Wang Hai [Fri, 12 Apr 2019 20:36:33 +0000 (16:36 -0400)]
Revert "net-sysfs: Fix memory leak in netdev_register_kobject"

This reverts commit 6b70fc94afd165342876e53fc4b2f7d085009945.

The reverted bugfix will cause another issue.
Reported by syzbot+6024817a931b2830bc93@syzkaller.appspotmail.com.
See https://syzkaller.appspot.com/x/log.txt?x=1737671b200000 for
details.

Signed-off-by: Wang Hai <wanghai26@huawei.com>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
David S. Miller [Mon, 15 Apr 2019 19:07:35 +0000 (12:07 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Remove the broute pseudo hook, implement this from the bridge
   prerouting hook instead. Now broute becomes real table in ebtables,
   from Florian Westphal. This also includes a size reduction patch for the
   bridge control buffer area via squashing boolean into bitfields and
   a selftest.

2) Add OS passive fingerprint version matching, from Fernando Fernandez.

3) Support for gue encapsulation for IPVS, from Jacky Hu.

4) Add support for NAT to the inet family, from Florian Westphal.
   This includes support for masquerade, redirect and nat extensions.

5) Skip interface lookup in flowtable, use device in the dst object.

6) Add jiffies64_to_msecs() and use it, from Li RongQing.

7) Remove unused parameter in nf_tables_set_desc_parse(), from Colin Ian King.

8) Statify several functions, patches from YueHaibing and Florian Westphal.

9) Add an optimized version of nf_inet_addr_cmp(), from Li RongQing.

10) Merge route extension to core, also from Florian.

11) Use IS_ENABLED(CONFIG_NF_NAT) instead of NF_NAT_NEEDED, from Florian.

12) Merge ip/ip6 masquerade extensions, from Florian. This includes
    netdevice notifier unification.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'wireless-drivers-for-davem-2019-04-15' of git://git.kernel.org/pub/scm...
David S. Miller [Mon, 15 Apr 2019 19:02:29 +0000 (12:02 -0700)]
Merge tag 'wireless-drivers-for-davem-2019-04-15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 5.1

Second set of fixes for 5.1.

iwlwifi

* add some new PCI IDs (plus a struct name change they depend on)

* fix crypto with new devices, namely 22560 and above

* fix for a potential deadlock in the TX path

* a fix for offloaded rate-control

* support new PCI HW IDs which use a new FW

mt76

* fix lock initialisation and a possible deadlock

* aggregation fixes

rt2x00

* fix sequence numbering during retransmits
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobridge: only include nf_queue.h if needed
Stephen Rothwell [Sat, 13 Apr 2019 04:03:36 +0000 (14:03 +1000)]
bridge: only include nf_queue.h if needed

After merging the netfilter-next tree, today's linux-next build (powerpc
ppc44x_defconfig) failed like this:

In file included from net/bridge/br_input.c:19:
include/net/netfilter/nf_queue.h:16:23: error: field 'state' has incomplete type
  struct nf_hook_state state;
                       ^~~~~

Fixes: 971502d77faa ("bridge: netfilter: unroll NF_HOOK helper in bridge input path")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agoKVM: x86/mmu: Fix an inverted list_empty() check when zapping sptes
Sean Christopherson [Sat, 13 Apr 2019 02:55:41 +0000 (19:55 -0700)]
KVM: x86/mmu: Fix an inverted list_empty() check when zapping sptes

A recently introduced helper for handling zap vs. remote flush
incorrectly bails early, effectively leaking defunct shadow pages.
Manifests as a slab BUG when exiting KVM due to the shadow pages
being alive when their associated cache is destroyed.

==========================================================================
BUG kvm_mmu_page_header: Objects remaining in kvm_mmu_page_header on ...
--------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Slab 0x00000000fc436387 objects=26 used=23 fp=0x00000000d023caee ...
CPU: 6 PID: 4315 Comm: rmmod Tainted: G    B             5.1.0-rc2+ #19
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Call Trace:
 dump_stack+0x46/0x5b
 slab_err+0xad/0xd0
 ? on_each_cpu_mask+0x3c/0x50
 ? ksm_migrate_page+0x60/0x60
 ? on_each_cpu_cond_mask+0x7c/0xa0
 ? __kmalloc+0x1ca/0x1e0
 __kmem_cache_shutdown+0x13a/0x310
 shutdown_cache+0xf/0x130
 kmem_cache_destroy+0x1d5/0x200
 kvm_mmu_module_exit+0xa/0x30 [kvm]
 kvm_arch_exit+0x45/0x60 [kvm]
 kvm_exit+0x6f/0x80 [kvm]
 vmx_exit+0x1a/0x50 [kvm_intel]
 __x64_sys_delete_module+0x153/0x1f0
 ? exit_to_usermode_loop+0x88/0xc0
 do_syscall_64+0x4f/0x100
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: a21136345cb6f ("KVM: x86/mmu: Split remote_flush+zap case out of kvm_mmu_flush_or_zap()")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
5 years agoLinux 5.1-rc5 v5.1-rc5
Linus Torvalds [Sun, 14 Apr 2019 22:17:41 +0000 (15:17 -0700)]
Linux 5.1-rc5

5 years agoMerge branch 'page-refs' (page ref overflow)
Linus Torvalds [Sun, 14 Apr 2019 22:09:40 +0000 (15:09 -0700)]
Merge branch 'page-refs' (page ref overflow)

Merge page ref overflow branch.

Jann Horn reported that he can overflow the page ref count with
sufficient memory (and a filesystem that is intentionally extremely
slow).

Admittedly it's not exactly easy.  To have more than four billion
references to a page requires a minimum of 32GB of kernel memory just
for the pointers to the pages, much less any metadata to keep track of
those pointers.  Jann needed a total of 140GB of memory and a specially
crafted filesystem that leaves all reads pending (in order to not ever
free the page references and just keep adding more).

Still, we have a fairly straightforward way to limit the two obvious
user-controllable sources of page references: direct-IO like page
references gotten through get_user_pages(), and the splice pipe page
duplication.  So let's just do that.

* branch page-refs:
  fs: prevent page refcount overflow in pipe_buf_get
  mm: prevent get_user_pages() from overflowing page refcount
  mm: add 'try_get_page()' helper function
  mm: make page ref count overflow check tighter and more explicit

5 years agoMerge tag 'mlx5-fixes-2019-04-09' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Sun, 14 Apr 2019 22:07:30 +0000 (15:07 -0700)]
Merge tag 'mlx5-fixes-2019-04-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2019-04-09

This series provides some fixes to mlx5 driver.

I've cc'ed some of the checksum fixes to Eric Dumazet and i would like to get
his feedback before you pull.

For -stable v4.19
('net/mlx5: FPGA, tls, idr remove on flow delete')
('net/mlx5: FPGA, tls, hold rcu read lock a bit longer')

For -stable v4.20
('net/mlx5e: Rx, Check ip headers sanity')
('Revert "net/mlx5e: Enable reporting checksum unnecessary also for L3 packets"')
('net/mlx5e: Rx, Fixup skb checksum for packets with tail padding')

For -stable v5.0
('net/mlx5e: Switch to Toeplitz RSS hash by default')
('net/mlx5e: Protect against non-uplink representor for encap')
('net/mlx5e: XDP, Avoid checksum complete when XDP prog is loaded')
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agortnetlink: fix rtnl_valid_stats_req() nlmsg_len check
Eric Dumazet [Sun, 14 Apr 2019 18:02:05 +0000 (11:02 -0700)]
rtnetlink: fix rtnl_valid_stats_req() nlmsg_len check

Jakub forgot to either use nlmsg_len() or nlmsg_msg_size(),
allowing KMSAN to detect a possible uninit-value in rtnl_stats_get

BUG: KMSAN: uninit-value in rtnl_stats_get+0x6d9/0x11d0 net/core/rtnetlink.c:4997
CPU: 0 PID: 10428 Comm: syz-executor034 Not tainted 5.1.0-rc2+ #24
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x173/0x1d0 lib/dump_stack.c:113
 kmsan_report+0x131/0x2a0 mm/kmsan/kmsan.c:619
 __msan_warning+0x7a/0xf0 mm/kmsan/kmsan_instr.c:310
 rtnl_stats_get+0x6d9/0x11d0 net/core/rtnetlink.c:4997
 rtnetlink_rcv_msg+0x115b/0x1550 net/core/rtnetlink.c:5192
 netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2485
 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5210
 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
 netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1336
 netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1925
 sock_sendmsg_nosec net/socket.c:622 [inline]
 sock_sendmsg net/socket.c:632 [inline]
 ___sys_sendmsg+0xdb3/0x1220 net/socket.c:2137
 __sys_sendmsg net/socket.c:2175 [inline]
 __do_sys_sendmsg net/socket.c:2184 [inline]
 __se_sys_sendmsg+0x305/0x460 net/socket.c:2182
 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2182
 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x63/0xe7

Fixes: 51bc860d4a99 ("rtnetlink: stats: validate attributes in get as well as dumps")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'qed-doorbell-overflow-recovery'
David S. Miller [Sun, 14 Apr 2019 20:59:49 +0000 (13:59 -0700)]
Merge branch 'qed-doorbell-overflow-recovery'

Denis Bolotin says:

====================
qed: Fix the Doorbell Overflow Recovery mechanism

This patch series fixes and improves the doorbell recovery mechanism.
The main goals of this series are to fix missing attentions from the
doorbells block (DORQ) or not handling them properly, and execute the
recovery from periodic handler instead of the attention handler.

Please consider applying the series to net.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Fix the DORQ's attentions handling
Denis Bolotin [Sun, 14 Apr 2019 14:23:08 +0000 (17:23 +0300)]
qed: Fix the DORQ's attentions handling

Separate the overflow handling from the hardware interrupt status analysis.
The interrupt status is a single register and is common for all PFs. The
first PF reading the register is not necessarily the one who overflowed.
All PFs must check their overflow status on every attention.
In this change we clear the sticky indication in the attention handler to
allow doorbells to be processed again as soon as possible, but running
the doorbell recovery is scheduled for the periodic handler to reduce the
time spent in the attention handler.
Checking the need for DORQ flush was changed to "db_bar_no_edpm" because
qed_edpm_enabled()'s result could change dynamically and might have
prevented a needed flush.

Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Fix missing DORQ attentions
Denis Bolotin [Sun, 14 Apr 2019 14:23:07 +0000 (17:23 +0300)]
qed: Fix missing DORQ attentions

When the DORQ (doorbell block) is overflowed, all PFs get attentions at the
same time. If one PF finished handling the attention before another PF even
started, the second PF might miss the DORQ's attention bit and not handle
the attention at all.
If the DORQ attention is missed and the issue is not resolved, another
attention will not be sent, therefore each attention is treated as a
potential DORQ attention.
As a result, the attention callback is called more frequently so the debug
print was moved to reduce its quantity.
The number of periodic doorbell recovery handler schedules was reduced
because it was the previous way to mitigating the missed attention issue.

Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Fix the doorbell address sanity check
Denis Bolotin [Sun, 14 Apr 2019 14:23:06 +0000 (17:23 +0300)]
qed: Fix the doorbell address sanity check

Fix the condition which verifies that doorbell address is inside the
doorbell bar by checking that the end of the address is within range
as well.

Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Delete redundant doorbell recovery types
Denis Bolotin [Sun, 14 Apr 2019 14:23:05 +0000 (17:23 +0300)]
qed: Delete redundant doorbell recovery types

DB_REC_DRY_RUN (running doorbell recovery without sending doorbells) is
never used. DB_REC_ONCE (send a single doorbell from the doorbell recovery)
is not needed anymore because by running the periodic handler we make sure
we check the overflow status later instead.
This patch is needed because in the next patches, the only doorbell
recovery type being used is DB_REC_REAL_DEAL, and the fixes are much
cleaner without this enum.

Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: change irq handler to always trigger NAPI polling
Heiner Kallweit [Sun, 14 Apr 2019 09:48:39 +0000 (11:48 +0200)]
r8169: change irq handler to always trigger NAPI polling

This check isn't really needed and we can simplify the code and save
some CPU cycles by removing it. Only in case of an error none of these
bits are set, and calling the NAPI callback doesn't hurt in this case.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'r8169-phy-func-ptr-arrays'
David S. Miller [Sun, 14 Apr 2019 20:50:05 +0000 (13:50 -0700)]
Merge branch 'r8169-phy-func-ptr-arrays'

Heiner Kallweit says:

====================
r8169: create function pointer arrays for PHY and chip hw init functions

Using function pointer arrays makes the code easier to read and better
maintainable. AFAIK function pointer arrays cause some performance
drawback due to Spectre mitigation, but we're not in a hot path.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: create function pointer array for chip hw init functions
Heiner Kallweit [Sun, 14 Apr 2019 08:32:07 +0000 (10:32 +0200)]
r8169: create function pointer array for chip hw init functions

Using a function pointer array makes this easier to read and better
maintainable.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: create function pointer array for PHY init functions
Heiner Kallweit [Sun, 14 Apr 2019 08:30:24 +0000 (10:30 +0200)]
r8169: create function pointer array for PHY init functions

Using a function pointer array makes this easier to read and better
maintainable. AFAIK function pointer arrays cause some performance
drawback due to Spectre mitigation, but we're not in a hot path here.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'hns3-next'
David S. Miller [Sun, 14 Apr 2019 20:47:35 +0000 (13:47 -0700)]
Merge branch 'hns3-next'

Huazhong Tan says:

====================
code optimizations & bugfixes for HNS3 driver

This patch-set includes code optimizations and bugfixes for the HNS3
ethernet controller driver.

[patch 1/12 - 4/12] optimizes the VLAN freature and adds support for port
based VLAN, fixes some related bugs about the current implementation.

[patch 5/12 - 12/12] includes some other code optimizations for the HNS3
ethernet controller driver.

Change log:
V1->V2: modifies some patches' commint log and code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: code optimization for command queue' spin lock
Peng Li [Sun, 14 Apr 2019 01:47:46 +0000 (09:47 +0800)]
net: hns3: code optimization for command queue' spin lock

This patch removes some redundant BH disable when initializing
and uninitializing command queue.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: free the pending skb when clean RX ring
Peng Li [Sun, 14 Apr 2019 01:47:45 +0000 (09:47 +0800)]
net: hns3: free the pending skb when clean RX ring

If there is pending skb in RX flow when close the port, and the
pending buffer is not cleaned, the new packet will be added to
the pending skb when the port opens again, and the first new
packet has error data.

This patch cleans the pending skb when clean RX ring.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: do not initialize MDIO bus when PHY is inexistent
Jian Shen [Sun, 14 Apr 2019 01:47:44 +0000 (09:47 +0800)]
net: hns3: do not initialize MDIO bus when PHY is inexistent

For some cases, PHY may not be connected to MDIO bus, then
the driver will initialize fail since MDIO bus initialization
fails.

This patch fixes it by skipping the MDIO bus initialization
when PHY is inexistent.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: set dividual reset level for all RAS and MSI-X errors
Weihang Li [Sun, 14 Apr 2019 01:47:43 +0000 (09:47 +0800)]
net: hns3: set dividual reset level for all RAS and MSI-X errors

According to hardware description, reset level that should be
triggered are not consistent in a module. For example, in SSU
common errors, the first two bits has no need to do reset,
but the other bits need global reset.

This patch sets separate reset level for all RAS and MSI-X
interrupts by adding a reset_lvel field in struct hclge_hw_error,
and fixes some incorrect reset level.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: divide shared buffer between TC
Yunsheng Lin [Sun, 14 Apr 2019 01:47:42 +0000 (09:47 +0800)]
net: hns3: divide shared buffer between TC

Currently hardware may have not enough buffer to receive packet
when it has used more than two MPS(maximum packet size) of
buffer, but there are still a lot of shared buffer left unused
when TC num is small.

This patch divides shared buffer to be used between TC when
the port supports DCB, and adjusts the waterline and threshold
according to user manual for the port that does not support
DCB.

This patch also change hclge_get_tc_num's return type to u32
to avoid signed-unsigned mix with divide.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: always assume no drop TC for performance reason
Yunsheng Lin [Sun, 14 Apr 2019 01:47:41 +0000 (09:47 +0800)]
net: hns3: always assume no drop TC for performance reason

Currently RX shared buffer' threshold size for speific TC is
set to smaller value when the TC's PFC is not enabled, which may
cause performance problem because hardware may not have enough
hardware buffer when PFC is not enabled.

This patch sets the same threshold size for all TC no matter if
the specific TC's PFC is enabled.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: add hns3_gro_complete for HW GRO process
Yunsheng Lin [Sun, 14 Apr 2019 01:47:40 +0000 (09:47 +0800)]
net: hns3: add hns3_gro_complete for HW GRO process

When a GRO packet is received by driver, the cwr field in the
struct tcphdr needs to be checked to decide whether to set the
SKB_GSO_TCP_ECN for skb_shinfo(skb)->gso_type.

So this patch adds hns3_gro_complete to do that, and adds the
hns3_handle_bdinfo to handle the hns3_gro_complete and
hns3_rx_checksum.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: minor refactor for hns3_rx_checksum
Yunsheng Lin [Sun, 14 Apr 2019 01:47:39 +0000 (09:47 +0800)]
net: hns3: minor refactor for hns3_rx_checksum

Change the parameters of hns3_rx_checksum to be more specific to
what is used internally, rather than passing in a pointer to the
whole hns3_desc. Reduces duplicate code and bring this function
inline with the approach used in hns3_set_gro_param.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix set port based VLAN issue for VF
Jian Shen [Sun, 14 Apr 2019 01:47:38 +0000 (09:47 +0800)]
net: hns3: fix set port based VLAN issue for VF

In original codes, ndo_set_vf_vlan() in hns3 driver was implemented
wrong. It adds or removes VLAN into VLAN filter for VF, but VF is
unaware of it.

This patch fixes it. When VF loads up, it firstly queries the port
based VLAN state from PF. When user change port based VLAN state
from PF, PF firstly checks whether the VF is alive. If the VF is
alive, then PF notifies the VF the modification; otherwise PF
configure the port based VLAN state directly.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix set port based VLAN for PF
Jian Shen [Sun, 14 Apr 2019 01:47:37 +0000 (09:47 +0800)]
net: hns3: fix set port based VLAN for PF

In original codes, ndo_set_vf_vlan() in hns3 driver was implemented
wrong. It adds or removes VLAN into VLAN filter for VF, but VF is
unaware of it.

Indeed, ndo_set_vf_vlan() is expected to enable or disable port based
VLAN (hardware inserts a specified VLAN tag to all TX packets for a
specified VF) . When enable port based VLAN, we use port based VLAN id
as VLAN filter entry. When disable port based VLAN, we use VLAN id of
VLAN device.

This patch fixes it for PF, enable/disable port based VLAN when calls
ndo_set_vf_vlan().

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix VLAN offload handle for VLAN inserted by port
Jian Shen [Sun, 14 Apr 2019 01:47:36 +0000 (09:47 +0800)]
net: hns3: fix VLAN offload handle for VLAN inserted by port

Currently, in TX direction, driver implements the TX VLAN offload
by checking the VLAN header in skb, and filling it into TX descriptor.
Usually it works well, but if enable inserting VLAN header based on
port, it may conflict when out_tag field of TX descriptor is already
used, and cause RAS error.

In RX direction, hardware supports stripping max two VLAN headers.
For vlan_tci in skb can only store one VLAN tag, when RX VLAN offload
enabled, driver tells hardware to strip one VLAN header from RX
packet; when RX VLAN offload disabled, driver tells hardware not to
strip VLAN header from RX packet. Now if port based insert VLAN
enabled, all RX packets will have the port based VLAN header. This
header is useless for stack, driver needs to ask hardware to strip
it. Unfortunately, hardware can't drop this VLAN header, and always
fill it into RX descriptor, so driver has to identify and drop it.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: modify VLAN initialization to be compatible with port based VLAN
Jian Shen [Sun, 14 Apr 2019 01:47:35 +0000 (09:47 +0800)]
net: hns3: modify VLAN initialization to be compatible with port based VLAN

Our hardware supports inserting a specified VLAN header for each
function when sending packets. User can enable it with command
"ip link set <devname> vf  <vfid> vlan <vlan id>".
For this VLAN header is inserted by hardware, not from stack,
hardware also needs to strip it from received packets before
sending to stack.  In this case, driver needs to tell
hardware which VLAN to insert or strip.

The current VLAN initialization doesn't allow inserting
VLAN header by hardware, this patch modifies it, in order be
compatible with VLAN inserted base on port.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv4: ensure rcu_read_lock() in ipv4_link_failure()
Eric Dumazet [Sun, 14 Apr 2019 00:32:21 +0000 (17:32 -0700)]
ipv4: ensure rcu_read_lock() in ipv4_link_failure()

fib_compute_spec_dst() needs to be called under rcu protection.

syzbot reported :

WARNING: suspicious RCU usage
5.1.0-rc4+ #165 Not tainted
include/linux/inetdevice.h:220 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by swapper/0/0:
 #0: 0000000051b67925 ((&n->timer)){+.-.}, at: lockdep_copy_map include/linux/lockdep.h:170 [inline]
 #0: 0000000051b67925 ((&n->timer)){+.-.}, at: call_timer_fn+0xda/0x720 kernel/time/timer.c:1315

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.1.0-rc4+ #165
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 lockdep_rcu_suspicious+0x153/0x15d kernel/locking/lockdep.c:5162
 __in_dev_get_rcu include/linux/inetdevice.h:220 [inline]
 fib_compute_spec_dst+0xbbd/0x1030 net/ipv4/fib_frontend.c:294
 spec_dst_fill net/ipv4/ip_options.c:245 [inline]
 __ip_options_compile+0x15a7/0x1a10 net/ipv4/ip_options.c:343
 ipv4_link_failure+0x172/0x400 net/ipv4/route.c:1195
 dst_link_failure include/net/dst.h:427 [inline]
 arp_error_report+0xd1/0x1c0 net/ipv4/arp.c:297
 neigh_invalidate+0x24b/0x570 net/core/neighbour.c:995
 neigh_timer_handler+0xc35/0xf30 net/core/neighbour.c:1081
 call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
 expire_timers kernel/time/timer.c:1362 [inline]
 __run_timers kernel/time/timer.c:1681 [inline]
 __run_timers kernel/time/timer.c:1649 [inline]
 run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694
 __do_softirq+0x266/0x95a kernel/softirq.c:293
 invoke_softirq kernel/softirq.c:374 [inline]
 irq_exit+0x180/0x1d0 kernel/softirq.c:414
 exiting_irq arch/x86/include/asm/apic.h:536 [inline]
 smp_apic_timer_interrupt+0x14a/0x570 arch/x86/kernel/apic/apic.c:1062
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807

Fixes: ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-phy-shrink-PHY-settings-array-and-add-200Gbps-support'
David S. Miller [Sun, 14 Apr 2019 20:37:06 +0000 (13:37 -0700)]
Merge branch 'net-phy-shrink-PHY-settings-array-and-add-200Gbps-support'

Heiner Kallweit says:

====================
net: phy: shrink PHY settings array and add 200Gbps support

The definition of array settings[] is quite lengthy meanwhile. Add a
macro to shrink the definition.

When doing this I saw that the new 200Gbps modes and few 100Gbps/50Gbps
modes aren't supported in phylib yet. So add this.

To avoid ethtool and phylib mode definitions getting out of sync, add
a build bug to check for this.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agophy: warn if phylib and ethtool PHY mode definitions are out of sync
Heiner Kallweit [Sat, 13 Apr 2019 18:53:43 +0000 (20:53 +0200)]
phy: warn if phylib and ethtool PHY mode definitions are out of sync

If new PHY modes are added people may miss to update all relevant places
in the kernel. Therefore add a build bug check for new modes in enum
ethtool_link_mode_bit_indices that haven't been added to phylib yet.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: add support for new modes in phylib
Heiner Kallweit [Sat, 13 Apr 2019 18:50:24 +0000 (20:50 +0200)]
net: phy: add support for new modes in phylib

Recently new modes have been added to ethtool.h, but the related
extension to phylib hasn't been done yet. So add support for these
modes.

v2:
- add missing 100Gbps and 50Gbps modes

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: shrink PHY settings array
Heiner Kallweit [Sat, 13 Apr 2019 18:48:55 +0000 (20:48 +0200)]
net: phy: shrink PHY settings array

The definition of array settings[] is quite lengthy meanwhile. Add a
macro to shrink the definition.

v2:
- Fix an indentation issue

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agofs: prevent page refcount overflow in pipe_buf_get
Matthew Wilcox [Fri, 5 Apr 2019 21:02:10 +0000 (14:02 -0700)]
fs: prevent page refcount overflow in pipe_buf_get

Change pipe_buf_get() to return a bool indicating whether it succeeded
in raising the refcount of the page (if the thing in the pipe is a page).
This removes another mechanism for overflowing the page refcount.  All
callers converted to handle a failure.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm: prevent get_user_pages() from overflowing page refcount
Linus Torvalds [Thu, 11 Apr 2019 17:49:19 +0000 (10:49 -0700)]
mm: prevent get_user_pages() from overflowing page refcount

If the page refcount wraps around past zero, it will be freed while
there are still four billion references to it.  One of the possible
avenues for an attacker to try to make this happen is by doing direct IO
on a page multiple times.  This patch makes get_user_pages() refuse to
take a new page reference if there are already more than two billion
references to the page.

Reported-by: Jann Horn <jannh@google.com>
Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm: add 'try_get_page()' helper function
Linus Torvalds [Thu, 11 Apr 2019 17:14:59 +0000 (10:14 -0700)]
mm: add 'try_get_page()' helper function

This is the same as the traditional 'get_page()' function, but instead
of unconditionally incrementing the reference count of the page, it only
does so if the count was "safe".  It returns whether the reference count
was incremented (and is marked __must_check, since the caller obviously
has to be aware of it).

Also like 'get_page()', you can't use this function unless you already
had a reference to the page.  The intent is that you can use this
exactly like get_page(), but in situations where you want to limit the
maximum reference count.

The code currently does an unconditional WARN_ON_ONCE() if we ever hit
the reference count issues (either zero or negative), as a notification
that the conditional non-increment actually happened.

NOTE! The count access for the "safety" check is inherently racy, but
that doesn't matter since the buffer we use is basically half the range
of the reference count (ie we look at the sign of the count).

Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm: make page ref count overflow check tighter and more explicit
Linus Torvalds [Thu, 11 Apr 2019 17:06:20 +0000 (10:06 -0700)]
mm: make page ref count overflow check tighter and more explicit

We have a VM_BUG_ON() to check that the page reference count doesn't
underflow (or get close to overflow) by checking the sign of the count.

That's all fine, but we actually want to allow people to use a "get page
ref unless it's already very high" helper function, and we want that one
to use the sign of the page ref (without triggering this VM_BUG_ON).

Change the VM_BUG_ON to only check for small underflows (or _very_ close
to overflowing), and ignore overflows which have strayed into negative
territory.

Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoMerge tag 'for-linus-20190412' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 13 Apr 2019 23:23:16 +0000 (16:23 -0700)]
Merge tag 'for-linus-20190412' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Set of fixes that should go into this round. This pull is larger than
  I'd like at this time, but there's really no specific reason for that.
  Some are fixes for issues that went into this merge window, others are
  not. Anyway, this contains:

   - Hardware queue limiting for virtio-blk/scsi (Dongli)

   - Multi-page bvec fixes for lightnvm pblk

   - Multi-bio dio error fix (Jason)

   - Remove the cache hint from the io_uring tool side, since we didn't
     move forward with that (me)

   - Make io_uring SETUP_SQPOLL root restricted (me)

   - Fix leak of page in error handling for pc requests (Jérôme)

   - Fix BFQ regression introduced in this merge window (Paolo)

   - Fix break logic for bio segment iteration (Ming)

   - Fix NVMe cancel request error handling (Ming)

   - NVMe pull request with two fixes (Christoph):
       - fix the initial CSN for nvme-fc (James)
       - handle log page offsets properly in the target (Keith)"

* tag 'for-linus-20190412' of git://git.kernel.dk/linux-block:
  block: fix the return errno for direct IO
  nvmet: fix discover log page when offsets are used
  nvme-fc: correct csn initialization and increments on error
  block: do not leak memory in bio_copy_user_iov()
  lightnvm: pblk: fix crash in pblk_end_partial_read due to multipage bvecs
  nvme: cancel request synchronously
  blk-mq: introduce blk_mq_complete_request_sync()
  scsi: virtio_scsi: limit number of hw queues by nr_cpu_ids
  virtio-blk: limit number of hw queues by nr_cpu_ids
  block, bfq: fix use after free in bfq_bfqq_expire
  io_uring: restrict IORING_SETUP_SQPOLL to root
  tools/io_uring: remove IOCQE_FLAG_CACHEHIT
  block: don't use for-inside-for in bio_for_each_segment_all

5 years agoMerge tag 'nfs-for-5.1-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Linus Torvalds [Sat, 13 Apr 2019 21:47:06 +0000 (14:47 -0700)]
Merge tag 'nfs-for-5.1-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

  Stable fix:

   - Fix a deadlock in close() due to incorrect draining of RDMA queues

  Bugfixes:

   - Revert "SUNRPC: Micro-optimise when the task is known not to be
     sleeping" as it is causing stack overflows

   - Fix a regression where NFSv4 getacl and fs_locations stopped
     working

   - Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.

   - Fix xfstests failures due to incorrect copy_file_range() return
     values"

* tag 'nfs-for-5.1-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  Revert "SUNRPC: Micro-optimise when the task is known not to be sleeping"
  NFSv4.1 fix incorrect return value in copy_file_range
  xprtrdma: Fix helper that drains the transport
  NFS: Fix handling of reply page vector
  NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.

5 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 13 Apr 2019 21:37:49 +0000 (14:37 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fix from James Bottomley:
 "One obvious fix for a ciostor data corruption on error bug"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: csiostor: fix missing data copy in csio_scsi_err_handler()

5 years agoMerge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 13 Apr 2019 21:33:56 +0000 (14:33 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "Here's more than a handful of clk driver fixes for changes that came
  in during the merge window:

   - Fix the AT91 sama5d2 programmable clk prescaler formula

   - A bunch of Amlogic meson clk driver fixes for the VPU clks

   - A DMI quirk for Intel's Bay Trail SoC's driver to properly mark pmc
     clks as critical only when really needed

   - Stop overwriting CLK_SET_RATE_PARENT flag in mediatek's clk gate
     implementation

   - Use the right structure to test for a frequency table in i.MX's
     PLL_1416x driver"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: imx: Fix PLL_1416X not rounding rates
  clk: mediatek: fix clk-gate flag setting
  platform/x86: pmc_atom: Drop __initconst on dmi table
  clk: x86: Add system specific quirk to mark clocks as critical
  clk: meson: vid-pll-div: remove warning and return 0 on invalid config
  clk: meson: pll: fix rounding and setting a rate that matches precisely
  clk: meson-g12a: fix VPU clock parents
  clk: meson: g12a: fix VPU clock muxes mask
  clk: meson-gxbb: round the vdec dividers to closest
  clk: at91: fix programmable clock for sama5d2

5 years agoMerge tag 'pci-v5.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Linus Torvalds [Sat, 13 Apr 2019 21:29:21 +0000 (14:29 -0700)]
Merge tag 'pci-v5.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:

 - Add a DMA alias quirk for another Marvell SATA device (Andre
   Przywara)

 - Fix a pciehp regression that broke safe removal of devices (Sergey
   Miroshnichenko)

* tag 'pci-v5.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: pciehp: Ignore Link State Changes after powering off a slot
  PCI: Add function 1 DMA alias quirk for Marvell 9170 SATA controller

5 years agoMerge tag 'powerpc-5.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Linus Torvalds [Sat, 13 Apr 2019 16:03:09 +0000 (09:03 -0700)]
Merge tag 'powerpc-5.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "A minor build fix for 64-bit FLATMEM configs.

  A fix for a boot failure on 32-bit powermacs.

  My commit to fix CLOCK_MONOTONIC across Y2038 broke the 32-bit VDSO on
  64-bit kernels, ie. compat mode, which is only used on big endian.

  The rewrite of the SLB code we merged in 4.20 missed the fact that the
  0x380 exception is also used with the Radix MMU to report out of range
  accesses. This could lead to an oops if userspace tried to read from
  addresses outside the user or kernel range.

  Thanks to: Aneesh Kumar K.V, Christophe Leroy, Larry Finger, Nicholas
  Piggin"

* tag 'powerpc-5.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm: Define MAX_PHYSMEM_BITS for all 64-bit configs
  powerpc/64s/radix: Fix radix segment exception handling
  powerpc/vdso32: fix CLOCK_MONOTONIC on PPC64
  powerpc/32: Fix early boot failure with RTAS built-in

5 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Sat, 13 Apr 2019 15:57:00 +0000 (08:57 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "The main thing is a fix to our FUTEX_WAKE_OP implementation which was
  unbelievably broken, but did actually work for the one scenario that
  GLIBC used to use.

  Summary:

   - Fix stack unwinding so we ignore user stacks

   - Fix ftrace module PLT trampoline initialisation checks

   - Fix terminally broken implementation of FUTEX_WAKE_OP atomics"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value
  arm64: backtrace: Don't bother trying to unwind the userspace stack
  arm64/ftrace: fix inadvertent BUG() in trampoline check

5 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 13 Apr 2019 03:54:40 +0000 (20:54 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "Fix typos in user-visible resctrl parameters, and also fix assembly
  constraint bugs that might result in miscompilation"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Use stricter assembly constraints in bitops
  x86/resctrl: Fix typos in the mba_sc mount option

5 years agoMerge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 13 Apr 2019 03:52:28 +0000 (20:52 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
 "Fix the alarm_timer_remaining() return value"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  alarmtimer: Return correct remaining time

5 years agoMerge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 13 Apr 2019 03:50:43 +0000 (20:50 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Ingo Molnar:
 "Fix a NULL pointer dereference crash in certain environments"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Do not re-read ->h_load_next during hierarchical load calculation

5 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 13 Apr 2019 03:42:30 +0000 (20:42 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
 "Six kernel side fixes: three related to NMI handling on AMD systems, a
  race fix, a kexec initialization fix and a PEBS sampling fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix perf_event_disable_inatomic() race
  x86/perf/amd: Remove need to check "running" bit in NMI handler
  x86/perf/amd: Resolve NMI latency issues for active PMCs
  x86/perf/amd: Resolve race condition when disabling PMC
  perf/x86/intel: Initialize TFA MSR
  perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS

5 years agoMerge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 13 Apr 2019 03:31:08 +0000 (20:31 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fix from Ingo Molnar:
 "Fixes a crash when accessing /proc/lockdep"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/lockdep: Zap lock classes even with lock debugging disabled

5 years agoMerge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 13 Apr 2019 03:21:59 +0000 (20:21 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Ingo Molnar:
 "Two genirq fixes, plus an irqchip driver error handling fix"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Respect IRQCHIP_SKIP_SET_WAKE in irq_chip_set_wake_parent()
  genirq: Initialize request_mutex if CONFIG_SPARSE_IRQ=n
  irqchip/irq-ls1x: Missing error code in ls1x_intc_of_init()

5 years agoMerge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 13 Apr 2019 03:13:13 +0000 (20:13 -0700)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core fixes from Ingo Molnar:
 "Fix an objtool warning plus fix a u64_to_user_ptr() macro expansion
  bug"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Add rewind_stack_do_exit() to the noreturn list
  linux/kernel.h: Use parentheses around argument in u64_to_user_ptr()

5 years agoMerge branch 'rhashtable-bit-locking-m68k'
David S. Miller [Sat, 13 Apr 2019 00:34:46 +0000 (17:34 -0700)]
Merge branch 'rhashtable-bit-locking-m68k'

NeilBrown says:

====================
Fix rhashtable bit-locking for m68k

As reported by Guenter Roeck, the new rhashtable bit-locking
doesn't work on m68k as it only requires 2-byte alignment, so BIT(1)
is addresses is not unused.

We current use BIT(0) to identify a NULLS marker, but that is only
needed in ->next pointers.  The bucket head does not need a NULLS
marker, so the lsb there can be used for locking.

the first 4 patches make some small improvements and re-arrange some
code.  The final patch converts to using only BIT(0) for these two
different special purposes.

I had previously suggested dropping the series until I fix it.  Given
that this was fairly easy, I retract that I think it best simply to
add these patches to fix the code.
====================

Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorhashtable: use BIT(0) for locking.
NeilBrown [Fri, 12 Apr 2019 01:52:08 +0000 (11:52 +1000)]
rhashtable: use BIT(0) for locking.

As reported by Guenter Roeck, the new bit-locking using
BIT(1) doesn't work on the m68k architecture.  m68k only requires
2-byte alignment for words and longwords, so there is only one
unused bit in pointers to structs - We current use two, one for the
NULLS marker at the end of the linked list, and one for the bit-lock
in the head of the list.

The two uses don't need to conflict as we never need the head of the
list to be a NULLS marker - the marker is only needed to check if an
object has moved to a different table, and the bucket head cannot
move.  The NULLS marker is only needed in a ->next pointer.

As we already have different types for the bucket head pointer (struct
rhash_lock_head) and the ->next pointers (struct rhash_head), it is
fairly easy to treat the lsb differently in each.

So: Initialize buckets heads to NULL, and use the lsb for locking.
When loading the pointer from the bucket head, if it is NULL (ignoring
the lock big), report as being the expected NULLS marker.
When storing a value into a bucket head, if it is a NULLS marker,
store NULL instead.

And convert all places that used bit 1 for locking, to use bit 0.

Fixes: 8f0db018006a ("rhashtable: use bit_spin_locks to protect hash bucket.")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorhashtable: replace rht_ptr_locked() with rht_assign_locked()
NeilBrown [Fri, 12 Apr 2019 01:52:08 +0000 (11:52 +1000)]
rhashtable: replace rht_ptr_locked() with rht_assign_locked()

The only times rht_ptr_locked() is used, it is to store a new
value in a bucket-head.  This is the only time it makes sense
to use it too.  So replace it by a function which does the
whole task:  Sets the lock bit and assigns to a bucket head.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorhashtable: move dereference inside rht_ptr()
NeilBrown [Fri, 12 Apr 2019 01:52:08 +0000 (11:52 +1000)]
rhashtable: move dereference inside rht_ptr()

Rather than dereferencing a pointer to a bucket and then passing the
result to rht_ptr(), we now pass in the pointer and do the dereference
in rht_ptr().

This requires that we pass in the tbl and hash as well to support RCU
checks, and means that the various rht_for_each functions can expect a
pointer that can be dereferenced without further care.

There are two places where we dereference a bucket pointer
where there is no testable protection - in each case we know
that we much have exclusive access without having taken a lock.
The previous code used rht_dereference() to pretend that holding
the mutex provided protects, but holding the mutex never provides
protection for accessing buckets.

So instead introduce rht_ptr_exclusive() that can be used when
there is known to be exclusive access without holding any locks.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorhashtable: reorder some inline functions and macros.
NeilBrown [Fri, 12 Apr 2019 01:52:08 +0000 (11:52 +1000)]
rhashtable: reorder some inline functions and macros.

This patch only moves some code around, it doesn't
change the code at all.
A subsequent patch will benefit from this as it needs
to add calls to functions which are now defined before the
call-site, but weren't before.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorhashtable: fix some __rcu annotation errors
NeilBrown [Fri, 12 Apr 2019 01:52:07 +0000 (11:52 +1000)]
rhashtable: fix some __rcu annotation errors

With these annotations, the rhashtable now gets no
warnings when compiled with "C=1" for sparse checking.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorhashtable: use struct_size() in kvzalloc()
Gustavo A. R. Silva [Thu, 11 Apr 2019 23:43:06 +0000 (18:43 -0500)]
rhashtable: use struct_size() in kvzalloc()

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along with
memory for some number of elements for that array.  For example:

struct foo {
    int stuff;
    struct boo entry[];
};

size = sizeof(struct foo) + count * sizeof(struct boo);
instance = kvzalloc(size, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'nfp-update-to-control-structures'
David S. Miller [Sat, 13 Apr 2019 00:29:15 +0000 (17:29 -0700)]
Merge branch 'nfp-update-to-control-structures'

Jakub Kicinski says:

====================
nfp: update to control structures

This series prepares NFP control structures for crypto offloads.
So far we mostly dealt with configuration requests under rtnl lock.
This will no longer be the case with crypto.  Additionally we will
try to reuse the BPF control message format, so we move common code
out of BPF.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: split out common control message handling code
Jakub Kicinski [Fri, 12 Apr 2019 03:27:07 +0000 (20:27 -0700)]
nfp: split out common control message handling code

BPF's control message handler seems like a good base to built
on for request-reply control messages.  Split it out to allow
for reuse.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: move vNIC reset before netdev init
Jakub Kicinski [Fri, 12 Apr 2019 03:27:06 +0000 (20:27 -0700)]
nfp: move vNIC reset before netdev init

During probe we clear vNIC configuration in case the device
wasn't closed cleanly by previous driver.  Move that code
before netdev init, so netdev init can already try to apply
its config parameters.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: add a mutex lock for the vNIC ctrl BAR
Jakub Kicinski [Fri, 12 Apr 2019 03:27:05 +0000 (20:27 -0700)]
nfp: add a mutex lock for the vNIC ctrl BAR

Soon we will try to write to the vNIC mailbox without RTNL held.
Add a new mutex to protect access to specific parts of the PCI
control BAR.

Move the mailbox size checking to the mailbox lock() helper, where
it can be more effective (happen prior to potential overwrite of
other data).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: opportunistically poll for reconfig result
Dirk van der Merwe [Fri, 12 Apr 2019 03:27:04 +0000 (20:27 -0700)]
nfp: opportunistically poll for reconfig result

If the reconfig was a quick update, we could have results available from
firmware within 200us.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv4: recompile ip options in ipv4_link_failure
Stephen Suryaputra [Fri, 12 Apr 2019 20:19:27 +0000 (16:19 -0400)]
ipv4: recompile ip options in ipv4_link_failure

Recompile IP options since IPCB may not be valid anymore when
ipv4_link_failure is called from arp_error_report.

Refer to the commit 3da1ed7ac398 ("net: avoid use IPCB in cipso_v4_error")
and the commit before that (9ef6b42ad6fd) for a similar issue.

Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6: Remove flowi6_oif compare from __ip6_route_redirect
David Ahern [Fri, 12 Apr 2019 17:31:14 +0000 (10:31 -0700)]
ipv6: Remove flowi6_oif compare from __ip6_route_redirect

In the review of 0b34eb004347 ("ipv6: Refactor __ip6_route_redirect"),
Martin noted that the flowi6_oif compare is moved to the new helper and
should be removed from __ip6_route_redirect. Fix the oversight.

Fixes: 0b34eb004347 ("ipv6: Refactor __ip6_route_redirect")
Reported-by: Martin Lau <kafai@fb.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'rxrpc-fixes'
David S. Miller [Fri, 12 Apr 2019 23:57:23 +0000 (16:57 -0700)]
Merge branch 'rxrpc-fixes'

David Howells says:

====================
rxrpc: Fixes

Here is a collection of fixes for rxrpc:

 (1) rxrpc_error_report() needs to call sock_error() to clear the error
     code from the UDP transport socket, lest it be unexpectedly revisited
     on the next kernel_sendmsg() call.  This has been causing all sorts of
     weird effects in AFS as the effects have typically been felt by the
     wrong RxRPC call.

 (2) Allow a kernel user of AF_RXRPC to easily detect if an rxrpc call has
     completed.

 (3) Allow errors incurred by attempting to transmit data through the UDP
     socket to get back up the stack to AFS.

 (4) Make AFS use (2) to abort the synchronous-mode call waiting loop if
     the rxrpc-level call completed.

 (5) Add a missing tracepoint case for tracing abort reception.

 (6) Fix detection and handling of out-of-order ACKs.

====================

Tested-by: Jonathan Billings <jsbillin@umich.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorxrpc: Fix detection of out of order acks
Jeffrey Altman [Fri, 12 Apr 2019 15:34:16 +0000 (16:34 +0100)]
rxrpc: Fix detection of out of order acks

The rxrpc packet serial number cannot be safely used to compute out of
order ack packets for several reasons:

 1. The allocation of serial numbers cannot be assumed to imply the order
    by which acks are populated and transmitted.  In some rxrpc
    implementations, delayed acks and ping acks are transmitted
    asynchronously to the receipt of data packets and so may be transmitted
    out of order.  As a result, they can race with idle acks.

 2. Serial numbers are allocated by the rxrpc connection and not the call
    and as such may wrap independently if multiple channels are in use.

In any case, what matters is whether the ack packet provides new
information relating to the bounds of the window (the firstPacket and
previousPacket in the ACK data).

Fix this by discarding packets that appear to wind back the window bounds
rather than on serial number procession.

Fixes: 298bc15b2079 ("rxrpc: Only take the rwind and mtu values from latest ACK")
Signed-off-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorxrpc: Trace received connection aborts
David Howells [Fri, 12 Apr 2019 15:34:09 +0000 (16:34 +0100)]
rxrpc: Trace received connection aborts

Trace received calls that are aborted due to a connection abort, typically
because of authentication failure.  Without this, connection aborts don't
show up in the trace log.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoafs: Check for rxrpc call completion in wait loop
Marc Dionne [Fri, 12 Apr 2019 15:34:02 +0000 (16:34 +0100)]
afs: Check for rxrpc call completion in wait loop

Check the state of the rxrpc call backing an afs call in each iteration of
the call wait loop in case the rxrpc call has already been terminated at
the rxrpc layer.

Interrupt the wait loop and mark the afs call as complete if the rxrpc
layer call is complete.

There were cases where rxrpc errors were not passed up to afs, which could
result in this loop waiting forever for an afs call to transition to
AFS_CALL_COMPLETE while the rx call was already complete.

Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorxrpc: Allow errors to be returned from rxrpc_queue_packet()
Marc Dionne [Fri, 12 Apr 2019 15:33:54 +0000 (16:33 +0100)]
rxrpc: Allow errors to be returned from rxrpc_queue_packet()

Change rxrpc_queue_packet()'s signature so that it can return any error
code it may encounter when trying to send the packet.

This allows the caller to eventually do something in case of error - though
it should be noted that the packet has been queued and a resend is
scheduled.

Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorxrpc: Make rxrpc_kernel_check_life() indicate if call completed
Marc Dionne [Fri, 12 Apr 2019 15:33:47 +0000 (16:33 +0100)]
rxrpc: Make rxrpc_kernel_check_life() indicate if call completed

Make rxrpc_kernel_check_life() pass back the life counter through the
argument list and return true if the call has not yet completed.

Suggested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorxrpc: Clear socket error
Marc Dionne [Fri, 12 Apr 2019 15:33:40 +0000 (16:33 +0100)]
rxrpc: Clear socket error

When an ICMP or ICMPV6 error is received, the error will be attached
to the socket (sk_err) and the report function will get called.
Clear any pending error here by calling sock_error().

This would cause the following attempt to use the socket to fail with
the error code stored by the ICMP error, resulting in unexpected errors
with various side effects depending on the context.

Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jonathan Billings <jsbillin@umich.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqede: fix write to free'd pointer error and double free of ptp
Colin Ian King [Fri, 12 Apr 2019 14:13:27 +0000 (15:13 +0100)]
qede: fix write to free'd pointer error and double free of ptp

The err2 error return path calls qede_ptp_disable that cleans up
on an error and frees ptp. After this, the free'd ptp is dereferenced
when ptp->clock is set to NULL and the code falls-through to error
path err1 that frees ptp again.

Fix this by calling qede_ptp_disable and exiting via an error
return path that does not set ptp->clock or kfree ptp.

Addresses-Coverity: ("Write to pointer after free")
Fixes: 035744975aec ("qede: Add support for PTP resource locking.")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agovxge: fix return of a free'd memblock on a failed dma mapping
Colin Ian King [Fri, 12 Apr 2019 13:45:12 +0000 (14:45 +0100)]
vxge: fix return of a free'd memblock on a failed dma mapping

Currently if a pci dma mapping failure is detected a free'd
memblock address is returned rather than a NULL (that indicates
an error). Fix this by ensuring NULL is returned on this error case.

Addresses-Coverity: ("Use after free")
Fixes: 528f727279ae ("vxge: code cleanup and reorganization")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'netdevsim-Mostly-cleanup-in-sdev-bpf-iface-area'
David S. Miller [Fri, 12 Apr 2019 23:49:54 +0000 (16:49 -0700)]
Merge branch 'netdevsim-Mostly-cleanup-in-sdev-bpf-iface-area'

Jiri Pirko says:

====================
netdevsim: Mostly cleanup in sdev/bpf iface area

This patches does mainly internal netdevsim code shuffle. Nothing
serious, just small changes to help readability and preparations for
future work.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonetdevsim: move sdev-specific init/uninit code into separate functions
Jiri Pirko [Fri, 12 Apr 2019 12:49:29 +0000 (14:49 +0200)]
netdevsim: move sdev-specific init/uninit code into separate functions

In order to improve readability and prepare for future code changes,
move sdev specific init/uninit code into separate functions.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonetdevsim: make bpf_offload_dev_create() per-sdev instead of first ns
Jiri Pirko [Fri, 12 Apr 2019 12:49:28 +0000 (14:49 +0200)]
netdevsim: make bpf_offload_dev_create() per-sdev instead of first ns

offload dev is stored in sdev struct. However, first netdevsim instance
is used as a priv. Change this to be sdev to as it is shared among
multiple netdevsim instances.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonetdevsim: move sdev specific bpf debugfs files to sdev dir
Jiri Pirko [Fri, 12 Apr 2019 12:49:27 +0000 (14:49 +0200)]
netdevsim: move sdev specific bpf debugfs files to sdev dir

Some netdevsim bpf debugfs files are per-sdev, yet they are defined per
netdevsim instance. Move them under sdev directory.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonetdevsim: move shared dev creation and destruction into separate file
Jiri Pirko [Fri, 12 Apr 2019 12:49:26 +0000 (14:49 +0200)]
netdevsim: move shared dev creation and destruction into separate file

To make code easier to read, move shared dev bits into a separate file.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoDocumentation: net: dsa: transition to the rst format
Ioana Ciornei [Fri, 12 Apr 2019 11:55:18 +0000 (14:55 +0300)]
Documentation: net: dsa: transition to the rst format

This patch also performs some minor adjustments such as numbering for
the receive path sequence, conversion of keywords to inline literals and
adding an index page so it looks better in the output of 'make htmldocs'.

Signed-off-by: Ioana Ciornei <ciorneiioana@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: veth: use generic helper to report timestamping info
Julian Wiedmann [Fri, 12 Apr 2019 11:06:15 +0000 (13:06 +0200)]
net: veth: use generic helper to report timestamping info

For reporting the common set of SW timestamping capabilities, use
ethtool_op_get_ts_info() instead of re-implementing it.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: loopback: use generic helper to report timestamping info
Julian Wiedmann [Fri, 12 Apr 2019 11:06:14 +0000 (13:06 +0200)]
net: loopback: use generic helper to report timestamping info

For reporting the common set of SW timestamping capabilities, use
ethtool_op_get_ts_info() instead of re-implementing it.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dummy: use generic helper to report timestamping info
Julian Wiedmann [Fri, 12 Apr 2019 11:06:13 +0000 (13:06 +0200)]
net: dummy: use generic helper to report timestamping info

For reporting the common set of SW timestamping capabilities, use
ethtool_op_get_ts_info() instead of re-implementing it.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoclk: imx: Fix PLL_1416X not rounding rates
Leonard Crestez [Fri, 12 Apr 2019 14:10:03 +0000 (14:10 +0000)]
clk: imx: Fix PLL_1416X not rounding rates

Code which initializes the "clk_init_data.ops" checks pll->rate_table
before that field is ever assigned to so it always picks
"clk_pll1416x_min_ops".

This breaks dynamic rate rounding for features such as cpufreq.

Fix by checking pll_clk->rate_table instead, here pll_clk refers to
the constant initialization data coming from per-soc clk driver.

Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Fixes: 8646d4dcc7fb ("clk: imx: Add PLLs driver for imx8mm soc")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
5 years agoMerge tag 'iwlwifi-for-kalle-2019-04-03' of git://git.kernel.org/pub/scm/linux/kernel...
Kalle Valo [Fri, 12 Apr 2019 18:34:27 +0000 (21:34 +0300)]
Merge tag 'iwlwifi-for-kalle-2019-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes

Second batch of iwlwifi fixes intended for v5.1

* fix for a potential deadlock in the TX path;
* a fix for offloaded rate-control;
* support new PCI HW IDs which use a new FW;

5 years agomt76x02: avoid status_list.lock and sta->rate_ctrl_lock dependency
Stanislaw Gruszka [Fri, 5 Apr 2019 11:42:56 +0000 (13:42 +0200)]
mt76x02: avoid status_list.lock and sta->rate_ctrl_lock dependency

Move ieee80211_tx_status_ext() outside of status_list lock section
in order to avoid locking dependency and possible deadlock reposed by
LOCKDEP in below warning.

Also do mt76_tx_status_lock() just before it's needed.

[  440.224832] WARNING: possible circular locking dependency detected
[  440.224833] 5.1.0-rc2+ #22 Not tainted
[  440.224834] ------------------------------------------------------
[  440.224835] kworker/u16:28/2362 is trying to acquire lock:
[  440.224836] 0000000089b8cacf (&(&q->lock)->rlock#2){+.-.}, at: mt76_wake_tx_queue+0x4c/0xb0 [mt76]
[  440.224842]
               but task is already holding lock:
[  440.224842] 000000002cfedc59 (&(&sta->lock)->rlock){+.-.}, at: ieee80211_stop_tx_ba_cb+0x32/0x1f0 [mac80211]
[  440.224863]
               which lock already depends on the new lock.

[  440.224863]
               the existing dependency chain (in reverse order) is:
[  440.224864]
               -> #3 (&(&sta->lock)->rlock){+.-.}:
[  440.224869]        _raw_spin_lock_bh+0x34/0x40
[  440.224880]        ieee80211_start_tx_ba_session+0xe4/0x3d0 [mac80211]
[  440.224894]        minstrel_ht_get_rate+0x45c/0x510 [mac80211]
[  440.224906]        rate_control_get_rate+0xc1/0x140 [mac80211]
[  440.224918]        ieee80211_tx_h_rate_ctrl+0x195/0x3c0 [mac80211]
[  440.224930]        ieee80211_xmit_fast+0x26d/0xa50 [mac80211]
[  440.224942]        __ieee80211_subif_start_xmit+0xfc/0x310 [mac80211]
[  440.224954]        ieee80211_subif_start_xmit+0x38/0x390 [mac80211]
[  440.224956]        dev_hard_start_xmit+0xb8/0x300
[  440.224957]        __dev_queue_xmit+0x7d4/0xbb0
[  440.224968]        ip6_finish_output2+0x246/0x860 [ipv6]
[  440.224978]        mld_sendpack+0x1bd/0x360 [ipv6]
[  440.224987]        mld_ifc_timer_expire+0x1a4/0x2f0 [ipv6]
[  440.224989]        call_timer_fn+0x89/0x2a0
[  440.224990]        run_timer_softirq+0x1bd/0x4d0
[  440.224992]        __do_softirq+0xdb/0x47c
[  440.224994]        irq_exit+0xfa/0x100
[  440.224996]        smp_apic_timer_interrupt+0x9a/0x220
[  440.224997]        apic_timer_interrupt+0xf/0x20
[  440.224999]        cpuidle_enter_state+0xc1/0x470
[  440.225000]        do_idle+0x21a/0x260
[  440.225001]        cpu_startup_entry+0x19/0x20
[  440.225004]        start_secondary+0x135/0x170
[  440.225006]        secondary_startup_64+0xa4/0xb0
[  440.225007]
               -> #2 (&(&sta->rate_ctrl_lock)->rlock){+.-.}:
[  440.225009]        _raw_spin_lock_bh+0x34/0x40
[  440.225022]        rate_control_tx_status+0x4f/0xb0 [mac80211]
[  440.225031]        ieee80211_tx_status_ext+0x142/0x1a0 [mac80211]
[  440.225035]        mt76x02_send_tx_status+0x2e4/0x340 [mt76x02_lib]
[  440.225037]        mt76x02_tx_status_data+0x31/0x40 [mt76x02_lib]
[  440.225040]        mt76u_tx_status_data+0x51/0xa0 [mt76_usb]
[  440.225042]        process_one_work+0x237/0x5d0
[  440.225043]        worker_thread+0x3c/0x390
[  440.225045]        kthread+0x11d/0x140
[  440.225046]        ret_from_fork+0x3a/0x50
[  440.225047]
               -> #1 (&(&list->lock)->rlock#8){+.-.}:
[  440.225049]        _raw_spin_lock_bh+0x34/0x40
[  440.225052]        mt76_tx_status_skb_add+0x51/0x100 [mt76]
[  440.225054]        mt76x02u_tx_prepare_skb+0xbd/0x116 [mt76x02_usb]
[  440.225056]        mt76u_tx_queue_skb+0x5f/0x180 [mt76_usb]
[  440.225058]        mt76_tx+0x93/0x190 [mt76]
[  440.225070]        ieee80211_tx_frags+0x148/0x210 [mac80211]
[  440.225081]        __ieee80211_tx+0x75/0x1b0 [mac80211]
[  440.225092]        ieee80211_tx+0xde/0x110 [mac80211]
[  440.225105]        __ieee80211_tx_skb_tid_band+0x72/0x90 [mac80211]
[  440.225122]        ieee80211_send_auth+0x1f3/0x360 [mac80211]
[  440.225141]        ieee80211_auth.cold.40+0x6c/0x100 [mac80211]
[  440.225156]        ieee80211_mgd_auth.cold.50+0x132/0x15f [mac80211]
[  440.225171]        cfg80211_mlme_auth+0x149/0x360 [cfg80211]
[  440.225181]        nl80211_authenticate+0x273/0x2e0 [cfg80211]
[  440.225183]        genl_family_rcv_msg+0x196/0x3a0
[  440.225184]        genl_rcv_msg+0x47/0x8e
[  440.225185]        netlink_rcv_skb+0x3a/0xf0
[  440.225187]        genl_rcv+0x24/0x40
[  440.225188]        netlink_unicast+0x16d/0x210
[  440.225189]        netlink_sendmsg+0x204/0x3b0
[  440.225191]        sock_sendmsg+0x36/0x40
[  440.225193]        ___sys_sendmsg+0x259/0x2b0
[  440.225194]        __sys_sendmsg+0x47/0x80
[  440.225196]        do_syscall_64+0x60/0x1f0
[  440.225197]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  440.225198]
               -> #0 (&(&q->lock)->rlock#2){+.-.}:
[  440.225200]        lock_acquire+0xb9/0x1a0
[  440.225202]        _raw_spin_lock_bh+0x34/0x40
[  440.225204]        mt76_wake_tx_queue+0x4c/0xb0 [mt76]
[  440.225215]        ieee80211_agg_start_txq+0xe8/0x2b0 [mac80211]
[  440.225225]        ieee80211_stop_tx_ba_cb+0xb8/0x1f0 [mac80211]
[  440.225235]        ieee80211_ba_session_work+0x1c1/0x2f0 [mac80211]
[  440.225236]        process_one_work+0x237/0x5d0
[  440.225237]        worker_thread+0x3c/0x390
[  440.225239]        kthread+0x11d/0x140
[  440.225240]        ret_from_fork+0x3a/0x50
[  440.225240]
               other info that might help us debug this:

[  440.225241] Chain exists of:
                 &(&q->lock)->rlock#2 --> &(&sta->rate_ctrl_lock)->rlock --> &(&sta->lock)->rlock

[  440.225243]  Possible unsafe locking scenario:

[  440.225244]        CPU0                    CPU1
[  440.225244]        ----                    ----
[  440.225245]   lock(&(&sta->lock)->rlock);
[  440.225245]                                lock(&(&sta->rate_ctrl_lock)->rlock);
[  440.225246]                                lock(&(&sta->lock)->rlock);
[  440.225247]   lock(&(&q->lock)->rlock#2);
[  440.225248]
                *** DEADLOCK ***

[  440.225249] 5 locks held by kworker/u16:28/2362:
[  440.225250]  #0: 0000000048fcd291 ((wq_completion)phy0){+.+.}, at: process_one_work+0x1b5/0x5d0
[  440.225252]  #1: 00000000f1c6828f ((work_completion)(&sta->ampdu_mlme.work)){+.+.}, at: process_one_work+0x1b5/0x5d0
[  440.225254]  #2: 00000000433d2b2c (&sta->ampdu_mlme.mtx){+.+.}, at: ieee80211_ba_session_work+0x5c/0x2f0 [mac80211]
[  440.225265]  #3: 000000002cfedc59 (&(&sta->lock)->rlock){+.-.}, at: ieee80211_stop_tx_ba_cb+0x32/0x1f0 [mac80211]
[  440.225276]  #4: 000000009d7b9a44 (rcu_read_lock){....}, at: ieee80211_agg_start_txq+0x33/0x2b0 [mac80211]
[  440.225286]
               stack backtrace:
[  440.225288] CPU: 2 PID: 2362 Comm: kworker/u16:28 Not tainted 5.1.0-rc2+ #22
[  440.225289] Hardware name: LENOVO 20KGS23S0P/20KGS23S0P, BIOS N23ET55W (1.30 ) 08/31/2018
[  440.225300] Workqueue: phy0 ieee80211_ba_session_work [mac80211]
[  440.225301] Call Trace:
[  440.225304]  dump_stack+0x85/0xc0
[  440.225306]  print_circular_bug.isra.38.cold.58+0x15c/0x195
[  440.225307]  check_prev_add.constprop.48+0x5f0/0xc00
[  440.225309]  ? check_prev_add.constprop.48+0x39d/0xc00
[  440.225311]  ? __lock_acquire+0x41d/0x1100
[  440.225312]  __lock_acquire+0xd98/0x1100
[  440.225313]  ? __lock_acquire+0x41d/0x1100
[  440.225315]  lock_acquire+0xb9/0x1a0
[  440.225317]  ? mt76_wake_tx_queue+0x4c/0xb0 [mt76]
[  440.225319]  _raw_spin_lock_bh+0x34/0x40
[  440.225321]  ? mt76_wake_tx_queue+0x4c/0xb0 [mt76]
[  440.225323]  mt76_wake_tx_queue+0x4c/0xb0 [mt76]
[  440.225334]  ieee80211_agg_start_txq+0xe8/0x2b0 [mac80211]
[  440.225344]  ieee80211_stop_tx_ba_cb+0xb8/0x1f0 [mac80211]
[  440.225354]  ieee80211_ba_session_work+0x1c1/0x2f0 [mac80211]
[  440.225356]  process_one_work+0x237/0x5d0
[  440.225358]  worker_thread+0x3c/0x390
[  440.225359]  ? wq_calc_node_cpumask+0x70/0x70
[  440.225360]  kthread+0x11d/0x140
[  440.225362]  ? kthread_create_on_node+0x40/0x40
[  440.225363]  ret_from_fork+0x3a/0x50

Cc: stable@vger.kernel.org
Fixes: 88046b2c9f6d ("mt76: add support for reporting tx status with skb")
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
5 years agort2x00: do not increment sequence number while re-transmitting
Vijayakumar Durai [Wed, 27 Mar 2019 10:03:17 +0000 (11:03 +0100)]
rt2x00: do not increment sequence number while re-transmitting

Currently rt2x00 devices retransmit the management frames with
incremented sequence number if hardware is assigning the sequence.

This is HW bug fixed already for non-QOS data frames, but it should
be fixed for management frames except beacon.

Without fix retransmitted frames have wrong SN:

 AlphaNet_e8:fb:36 Vivotek_52:31:51 Authentication, SN=1648, FN=0, Flags=........C Frame is not being retransmitted 1648 1
 AlphaNet_e8:fb:36 Vivotek_52:31:51 Authentication, SN=1649, FN=0, Flags=....R...C Frame is being retransmitted 1649 1
 AlphaNet_e8:fb:36 Vivotek_52:31:51 Authentication, SN=1650, FN=0, Flags=....R...C Frame is being retransmitted 1650 1

With the fix SN stays correctly the same:

 88:6a:e3:e8:f9:a2 8c:f5:a3:88:76:87 Authentication, SN=1450, FN=0, Flags=........C
 88:6a:e3:e8:f9:a2 8c:f5:a3:88:76:87 Authentication, SN=1450, FN=0, Flags=....R...C
 88:6a:e3:e8:f9:a2 8c:f5:a3:88:76:87 Authentication, SN=1450, FN=0, Flags=....R...C

Cc: stable@vger.kernel.org
Signed-off-by: Vijayakumar Durai <vijayakumar.durai1@vivint.com>
[sgruszka: simplify code, change comments and changelog]
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
5 years agomt76: mt7603: send BAR after powersave wakeup
Felix Fietkau [Tue, 26 Mar 2019 08:34:21 +0000 (09:34 +0100)]
mt76: mt7603: send BAR after powersave wakeup

Now that the sequence number allocation is fixed, we can finally send a BAR
at powersave wakeup time to refresh the receiver side reorder window

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>