asedeno.scripts.mit.edu Git

net: updating dst lastusage is an unlikely event.

Since commit 0da4af00b2ed ("ipv6: only update __use and lastusetime
once per jiffy at most"), updating the dst lastuse field is an
unlikely action: it happens at most once per jiffy, out of
potentially millions of calls per second.

Mark explicitly the code as such, and let the compiler generate
better code.

Note: gcc 7.2 and several older versions do actually generate
different - better - code when the unlikely() hint is in place,
avoid jump in the fast path and keeping better code locality.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp-smc-rendezvous'

Ursula Braun says:

====================
TCP experimental option for SMC rendezvous

SMC-capability is to be negotiated with a TCP experimental option.
As requested during code review of our previous approach using
netfilter hooks, here's a new version. It touches tcp-code in the
first patch and exploits the new tcp flag in the smc-code.

Changelog:

V3:
* move include for linux/unaligned/access_ok.h to tcp_input.c

V2:
* switch to current jump labels API
* remove static key checking in smc_set_capability()
  (comment from Eric Dumazet)
* use inet_request_sock parameter for smc_set_option_cond()
* smc_listen_work(): replace local variable lgr_lock_taken by new labels
                     and separate this change into a prerequisite first
                     patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

smc: add SMC rendezvous protocol

The SMC protocol [1] uses a rendezvous protocol to negotiate SMC
capability between peers. The current Linux implementation does not yet
use this rendezvous protocol and, thus, is not compliant to RFC7609 and
incompatible with other SMC implementations like in zOS.
This patch adds support for the SMC rendezvous protocol. It uses a new
TCP experimental option. With this option, SMC capabilities are
exchanged between the peers during the TCP three way handshake.

[1] SMC-R Informational RFC: http://www.rfc-editor.org/info/rfc7609

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: TCP experimental option for SMC

The SMC protocol [1] relies on the use of a new TCP experimental
option [2, 3]. With this option, SMC capabilities are exchanged
between peers during the TCP three way handshake. This patch adds
support for this experimental option to TCP.

References:
[1] SMC-R Informational RFC: http://www.rfc-editor.org/info/rfc7609
[2] Shared Use of TCP Experimental Options RFC 6994:
https://tools.ietf.org/rfc/rfc6994.txt
[3] IANA ExID SMCR:
http://www.iana.org/assignments/tcp-parameters/tcp-parameters.xhtml#tcp-exids

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc: fix mutex unlocks during link group creation

Link group creation is synchronized with the smc_create_lgr_pending
lock. In smc_listen_work() this mutex is sometimes unlocked, even
though it has not been locked before. This issue will surface in
presence of the SMC rendezvous code.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: bpftool: try to mount bpffs if required for pinning objects

One possible cause of failure for `bpftool {prog|map} pin * file FILE`
is the FILE not being in an eBPF virtual file system (bpffs). In this
case, make bpftool attempt to mount bpffs on the parent directory of the
FILE. Then, if this operation is successful, try again to pin the
object.

The code for mnt_bpffs() is a copy of function bpf_mnt_fs() from
iproute2 package (under lib/bpf.c, taken at commit 4b73d52f8a81), with
modifications regarding handling of error messages.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: fix a dangling pointer

tsk->group is set to grp earlier, but we forget to unset it
after grp is freed.

Fixes: 75da2163dbb6 ("tipc: introduce communication groups")
Reported-by: syzkaller bot
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vsock: always call vsock_init_tables()

Although CONFIG_VSOCKETS_DIAG depends on CONFIG_VSOCKETS,
vsock_init_tables() is not always called, it is called only
if other modules call its caller. Therefore if we only
enable CONFIG_VSOCKETS_DIAG, it would crash kernel on uninitialized
vsock_bind_table.

This patch fixes it by moving vsock_init_tables() to its own
module_init().

Fixes: 413a4317aca7 ("VSOCK: add sock_diag interface")
Reported-by: syzkaller bot
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: lan9303: Do not disable switch fabric port 0 at .probe

Make the LAN9303 work when lan9303_probe() is called twice.

For some unknown reason the LAN9303 switch fail to forward data when switch
fabric port 0 TX is disabled during probe. (Write of LAN9303_MAC_TX_CFG_0
in lan9303_disable_processing_port().)

In that situation the switch fabric seem to receive frames, because the ALR
is learning addresses. But no frames are transmitted on any of the ports.

In our system lan9303_probe() is called twice, first time
dsa_register_switch() return -EPROBE_DEFER. As an experiment, modified the
code to skip writing LAN9303_MAC_TX_CFG_0, port 0 during the first probe.
Then the switch works as expected.

Resolve the problem by not calling lan9303_disable_processing_port() on
port 0 during probe. Ports 1 and 2 are still disabled.

Although unsatisfying that the exact failure mechanism is not known,
the patch should not cause any harm.

Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: fix overflow in collecting IBQ and OBQ dump

Destination buffer already has offset added. So, don't add offset
again.

Fetch actual size of configured OBQ from hardware, instead of using
hardcoded value.

Fixes: 7c075ce221cf ("cxgb4: collect IBQ and OBQ dumps")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'hns3-fixes'

Lipeng says:

====================
net: hns3: fix some bugs for HNS3 driver

This patchset fixes some bugs reported by Hisilicon test team.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: fix the bug when reuse command description in hclge_add_mac_vlan_tbl

When reusing a command description read from HW, driver should set
IN_VLD bit, WR bit and NO_INTR bit. If IN_VLD bit and NO_INTR bit
are not set, the command fails and driver prints error message:

[ 135.261284] hns3 0000:7d:00.0: cmdq execute failed for get_mac_vlan_cmd_status,status=2.
[ 135.270983] hns3 0000:7d:00.0: add mac addr failed for cmd_send, ret =-5.

This patch fixes the bug.
Fixes: 46a3df9 (net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support)
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: fix a bug in hclge_uninit_client_instance

HNS3 driver initialize hdev->roce_client and vport->roce.client in
hclge_init_client_instance, and need set hdev->roce_client and
vport->roce.client NULL.

If do not set them NULL when uninit, it will fail in the scene:
insmod hns3.ko, hns-roce.ko, hns-roce-hw-v3.ko successfully, but
rmmod hns3.ko after rmmod hns-roce-hw-v2.ko and hns-roce.ko.
This patch fixes the issue.

Fixes: 46a3df9 (net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support)
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: add nic_client check when initialize roce base information

Roce driver works base on HNS3 driver.If insmod Roce driver before
NIC driver there is a error because do not check nic_client. This patch
adds nic_client check when initialize roce base information.

Fixes: 46a3df9 (net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support)
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: fix the bug of hns3_set_txbd_baseinfo

The SC bits of TX BD mean switch control. For this area, value 0
indicates no switch control, the packet is routed according to the
forwarding table. Value 1 indicates that the packet is transmitted
to the network bypassing the forwarding table.

As HNS3 driver need support VF later, VF conmunicate with its own
PF need forwarding table. This patch sets SC bits of TX BD 0 and use
forwarding table.

Fixes: 76ad4f0 (net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC)
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-dont-unmask-port-bitmaps'

Vivien Didelot says:

====================
net: dsa: don't unmask port bitmaps

DSA has several bitmaps to store the type of ports: cpu_port_mask,
dsa_port_mask and enabled_port_mask. But the code is inconsistently
unmasking them.

The legacy code tries to unmask cpu_port_mask and dsa_port_mask but
skips enabled_port_mask.

The new bindings unmasks cpu_port_mask and enabled_port_mask but skips
dsa_port_mask.

In fact there is no need to unmask them because we are in the error
path, and they won't be used after. Instead of fixing the unmasking,
simply remove them.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: don't unmask port bitmaps

The unapply functions are called on the error path.

As for dsa_port_mask, enabled_port_mask and cpu_port_mask won't be used
after so there's no need to unmask the corresponding port bit from them.

This makes dsa_cpu_port_unapply() and dsa_dsa_port_unapply() identical,
which can be factorized later.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: legacy: don't unmask port bitmaps

The legacy code does not unmask the cpu_port_mask and dsa_port_mask as
stated. But this is done on the error path and those masks won't be used
after that. So instead of fixing the bit operation, simply remove it.

Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bcmgenet-start-stop-sequence-refinement'

Doug Berger says:

====================
net: bcmgenet: start/stop sequence refinement

This commit set is the result of an investigation into an issue that
occurred when bringing the interface up and down repeatedly with an
external 100BASE-T PHY. In some cases the MAC would experience mass
receive packet duplication that could in rare cases lead to a stall
from overflow. The fix for this is contained in the third commit.

The first 3 commits represent bug fixes that should be applied to the
net repository and are candidates for backporting to stable releases.
The remaining commits are enhancements which is why the set is being
submitted to net-next but they are implemented on top of the fixes.

The first fix is provided as justification for why the set isn't
split between a net submission and a net-next submission.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: use dev->phydev instead of priv->phydev

Now that the software reset of the PHY has been removed it is no
longer necessary to retain a private pointer to the phydev for
use when the PHY is detached (which isn't generally safe anyway).

The driver now uses the phydev member attached to the net_device.

For ethtool commands that have a PHY component, an explicit check
is made to prevent accessing an invalid phydev pointer when one
is not attached (e.g. interface is down).

Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "net: bcmgenet: Software reset EPHY after power on"

With commit f7d72996e222 ("net: bcmgenet: enable loopback during
UniMAC sw_reset") it is no longer necessary to force the software
reset of the internal EPHY before resetting the UniMAC to ensure a
clean reset.

Therefore this commit reverts commit 5dbebbb44a6a ("net: bcmgenet:
Software reset EPHY after power on").

Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: relax lock constraints to reduce IRQ latency

Since the ring locks are not used in a hard IRQ context it is often
not necessary to disable global IRQs while waiting on a lock.

Using less restrictive lock and unlock calls improves the real-time
responsiveness of the system.

Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: rework bcmgenet_netif_start and bcmgenet_netif_stop

This commit consolidates more common functionality from
bcmgenet_close and bcmgenet_suspend into bcmgenet_netif_stop and
modifies the start and stop sequences to better suit the design
of the GENET hardware.

Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: cleanup ring interrupt masking and unmasking

Since the NAPI interrupts are basically ignored when NAPI is
disabled we don't need to mask them within the functions
bcmgenet_disable_tx_napi() and bcmgenet_disable_rx_napi().
So wait until all NAPI instances are disabled and mask all of the
bcmgenet driver interrupts together in bcmgenet_netif_stop().

The interrupts can still be enabled in the functions
bcmgenet_enable_tx_napi() and bcmgenet_enable_rx_napi(), but use
the ring context int_enable() method to keep the functionality
consistent and the code cleaner.

Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: move NAPI initialization to ring initialization

Since each ring has its own NAPI instance it might as well be
initialized along with the other ring context.

Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: enable loopback during UniMAC sw_reset

It is necessary for the UniMAC to be clocked at least 5 cycles
while the sw_reset is asserted to ensure a clean reset.

It was discovered that this condition was not being met when
connected to an external RGMII PHY that disabled the Rx clock in
the Power Save state.

This commit modifies the reset_umac function to place the (RG)MII
interface into a local loopback mode where the Rx clock comes
from the GENET sourced Tx clk during the sw_reset to ensure the
presence and stability of the clock.

In addition, it turns out that the sw_reset of the UniMAC is not
self clearing, but this was masked by a bug in the timeout code.

The sw_reset is now explicitly cleared by zeroing the UMAC_CMD
register before returning from reset_umac which makes it no
longer necessary to do so in init_umac and makes the clearing of
CMD_TX_EN and CMD_RX_EN by umac_enable_set redundant. The
timeout code (and its associated bug) are removed so reset_umac
no longer needs to return a result, and that means init_umac
that calls reset_umac does not need to as well.

Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: prevent duplicate calls of bcmgenet_dma_teardown

When bcmgenet_dma_teardown is called from bcmgenet_fini_dma it ends
up getting called twice from the bcmgenet_close and bcmgenet_suspend
functions (once directly and once inside the bcmgenet_fini_dma call).

This commit removes the call from bcmgenet_fini_dma and ensures that
bcmgenet_dma_teardown is called before bcmgenet_fini_dma in all paths
of execution.

Fixes: 4a0c081eff43 ("net: bcmgenet: call bcmgenet_dma_teardown in bcmgenet_fini_dma")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: correct bad merge

As noted in the net-next submission for GENETv5 support [1], there
were merge conflicts with an earlier net submission [2] that had not
yet found its way to the net-next repository.

Unfortunately, when the branches were merged the conflicts were not
correctly resolved. This commit attempts to correct that.

[1] https://lkml.org/lkml/2017/3/13/1145
[2] https://lkml.org/lkml/2017/3/9/890

Fixes: 101c431492d2 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvlan: remove unused fields in struct macvlan_dev

commit 635b8c8ecdd2 ("tap: Renaming tap related APIs, data structures,
macros") captured all the tap related fields into a new struct tap_dev.
However, it failed to remove those fields from struct macvlan_dev.
Those fields are currently unused and must be removed. While there
I moved the comment for MAX_TAP_QUEUES to the right place.

Fixes: 635b8c8ecdd27142 (tap: Renaming tap related APIs, data structures, macros)
Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: cavium: octeon: Switch to using netdev_info().

Signed-off-by: Steven J. Hill <Steven.Hill@cavium.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: eliminate KASAN warning

The following warning was reported by syzbot on Oct 24. 2017:
KASAN: slab-out-of-bounds Read in tipc_nametbl_lookup_dst_nodes

This is a harmless bug, but we still want to get rid of the warning,
so we swap the two conditions in question.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net: wan/sbni: Convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: David Howells <dhowells@redhat.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net: sis: Convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Francois Romieu <romieu@fr.zoreil.com>
Cc: Daniele Venzano <venza@brownhat.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Daniele Venzano <venza@brownhat.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atm/mpc: Stop using open-coded timer .data field

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using an explicit static variable to hold
additional expiration details.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Bhumika Goyal <bhumirks@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Reshetova, Elena" <elena.reshetova@intel.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: af_packet: Convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Mike Maloney <maloney@google.com>
Cc: Jarno Rajahalme <jarno@ovn.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hsr: Convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Arvid Brodin <arvid.brodin@alten.se>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dccp: Convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly. Adds a pointer back to the sock.

Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: dccp@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet/sfc: Convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: LLC: Convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hans Liljestrand <ishkamiel@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Reshetova, Elena" <elena.reshetova@intel.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ax25: Convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Joerg Reuter <jreuter@yaina.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-hams@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: Convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-sctp@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-remove-rtmsg_ifinfo-used-in-bridge-and-bonding'

Xin Long says:

====================
net: remove rtmsg_ifinfo used in bridge and bonding

It's better to send notifications to userspace by the events in
rtnetlink_event, instead of calling rtmsg_ifinfo directly.

This patcheset is to remove rtmsg_ifinfo called in bonding and
bridge, the notifications can be handled by NETDEV_CHANGEUPPER
and NETDEV_CHANGELOWERSTATE events in rtnetlink_event.

It could also fix some redundant notifications from bonding and
bridge.

v1->v2:
  - post to net-next.git instead of net.git, for it's more like an
    improvement for bonding
v2->v3:
  - add patch 1/4 to remove rtmsg_ifinfo called in add_del_if
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: remove rtmsg_ifinfo called after bond_lower_state_changed

After the patch 'rtnetlink: bring NETDEV_CHANGELOWERSTATE event
process back to rtnetlink_event', bond_lower_state_changed would
generate NETDEV_CHANGEUPPER event which would send a notification
to userspace in rtnetlink_event.

There's no need to call rtmsg_ifinfo to send the notification
any more. So this patch is to remove it from these places after
bond_lower_state_changed.

Besides, after this, rtmsg_ifinfo is not needed to be exported.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: bring NETDEV_CHANGELOWERSTATE event process back to rtnetlink_event

This patch is to bring NETDEV_CHANGELOWERSTATE event process back
to rtnetlink_event so that bonding could use it instead of calling
rtmsg_ifinfo to send a notification to userspace after netdev lower
state is changed in the later patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: remove rtmsg_ifinfo called in bond_master_upper_dev_link

Since commit 42e52bf9e3ae ("net: add netnotifier event for upper device
change"), netdev_master_upper_dev_link has generated NETDEV_CHANGEUPPER
event which would send a notification to userspace in rtnetlink_event.

There's no need to call rtmsg_ifinfo to send the notification any more.
So this patch is to remove it from bond_master_upper_dev_link as well
as bond_upper_dev_unlink to avoid the redundant notifications.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: remove rtmsg_ifinfo called in add_del_if

Since commit dc709f375743 ("rtnetlink: bring NETDEV_CHANGEUPPER event
process back in rtnetlink_event"), rtnetlink_event would process the
NETDEV_CHANGEUPPER event and send a notification to userspace.

In add_del_if, it would generate NETDEV_CHANGEUPPER event by whether
netdev_master_upper_dev_link or netdev_upper_dev_unlink. There's
no need to call rtmsg_ifinfo to send the notification any more.

So this patch is to remove it from add_del_if also to avoid redundant
notifications.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-permit-multiple-bpf-attachments-for-a-single-perf-tracepoint-event'

Yonghong Song says:

====================
bpf: permit multiple bpf attachments for a single perf tracepoint event

This patch set adds support to permit multiple bpf prog attachments
for a single perf tracepoint event. Patch 1 does some cleanup such
that perf_event_{set|free}_bpf_handler is called under the
same condition. Patch 2 has the core implementation, and
Patch 3 adds a test case.

Changelogs:
v2 -> v3:
  . fix compilation error.
v1 -> v2:
  . fix a potential deadlock issue discovered by Daniel.
  . fix some coding style issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add a test case to test single tp multiple bpf attachment

The bpf sample program syscall_tp is modified to
show attachment of more than bpf programs
for a particular kernel tracepoint.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: permit multiple bpf attachments for a single perf event

This patch enables multiple bpf attachments for a
kprobe/uprobe/tracepoint single trace event.
Each trace_event keeps a list of attached perf events.
When an event happens, all attached bpf programs will
be executed based on the order of attachment.

A global bpf_event_mutex lock is introduced to protect
prog_array attaching and detaching. An alternative will
be introduce a mutex lock in every trace_event_call
structure, but it takes a lot of extra memory.
So a global bpf_event_mutex lock is a good compromise.

The bpf prog detachment involves allocation of memory.
If the allocation fails, a dummy do-nothing program
will replace to-be-detached program in-place.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: use the same condition in perf event set/free bpf handler

This is a cleanup such that doing the same check in
perf_event_free_bpf_prog as we already do in
perf_event_set_bpf_prog step.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_tunnel: Allow rcv/xmit even if remote address is a local address

Currently, ip6_tnl_xmit_ctl drops tunneled packets if the remote
address (outer v6 destination) is one of host's locally configured
addresses.
Same applies to ip6_tnl_rcv_ctl: it drops packets if the remote address
(outer v6 source) is a local address.

This prevents using ipxip6 (and ip6_gre) tunnels whose local/remote
endpoints are on same host; OTOH v4 tunnels (ipip or gre) allow such
configurations.

An example where this proves useful is a system where entities are
identified by their unique v6 addresses, and use tunnels to encapsulate
traffic between them. The limitation prevents placing several entities
on same host.

Introduce IP6_TNL_F_ALLOW_LOCAL_REMOTE which allows to bypass this
restriction.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-Various-fixes'

Jiri Pirko says:

====================
mlxsw: Various fixes

Yotam says:

Cleanup some issues reported by sparse.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: mr_tcam: Include the mr_tcam header file

Make the spectrum_mr_tcam.c include the spectrum_mr_tcam.h header file.

Cleans up sparse warning:
symbol 'mlxsw_sp_mr_tcam_ops' was not declared. Should it be static?

Fixes: 0e14c7777acb6 ("mlxsw: spectrum: Add the multicast routing hardware logic")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: mr: Make the function mlxsw_sp_mr_dev_vif_lookup static

The function is only used internally in spectrum_mr.c and is not declared
in the header file, thus make it static.

Cleans up sparse warning:
symbol 'mlxsw_sp_mr_dev_vif_lookup' was not declared. Should it be static?

Fixes: c011ec1bbfd6 ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: mr: Fix various endianness issues

Fix various endianness issues in comparisons and assignments. The fix is
entirely cosmetic as all the values fixed are endianness-agnostic.

Cleans up sparse warnings:
spectrum_mr.c:156:49: warning: restricted __be32 degrades to integer
spectrum_mr.c:206:26: warning: restricted __be32 degrades to integer
spectrum_mr.c:212:31: warning: incorrect type in assignment (different
  base types)
spectrum_mr.c:212:31:    expected restricted __be32 [usertype] addr4
spectrum_mr.c:212:31:    got unsigned int
spectrum_mr.c:214:32: warning: incorrect type in assignment (different
  base types)
spectrum_mr.c:214:32:    expected restricted __be32 [usertype] addr4
spectrum_mr.c:214:32:    got unsigned int
spectrum_mr.c:461:16: warning: restricted __be32 degrades to integer
spectrum_mr.c:461:49: warning: restricted __be32 degrades to integer

Fixes: c011ec1bbfd6 ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_dpipe: Fix entries dump of the adjacency table

During the dump the per netlink packet entry counter should be zeroed out
when new packet is created.

Fixes: 190d38a52a73 ("mlxsw: spectrum_dpipe: Add support for adjacency table dump")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: Fix actions list corruption when adding offloaded tc flows

Prior to commit b3f55bdda8df, the networking core doesn't wire an in-place
actions list the when the low level driver is called to offload the flow,
but all low level drivers do that (call tcf_exts_to_list()) in their
offloading "add" logic.

Now, the in-place list is set in the core which goes over the list in a loop,
but also by the hw driver when their offloading code is invoked indirectly:

cls_xxx add flow -> tc_setup_cb_call -> tc_exts_setup_cb_egdev_call -> hw driver

which messes up the core list instance upon driver return. Fix that by avoiding
in-place list on the net core code that deals with adding flows.

Fixes: b3f55bdda8df ('net: sched: introduce per-egress action device callbacks')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: pass date and time info to NIC firmware

Pass date and time information to NIC at the time of loading
firmware and periodically update the host time to NIC firmware.
This is to make NIC firmware use the same time reference as Host,
so that it is easy to correlate logs from firmware and host for debugging.

Signed-off-by: Veerasenareddy Burru <veerasenareddy.burru@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: add ip6_null_entry check in rt6_select()

In rt6_select(), fn->leaf could be pointing to net->ipv6.ip6_null_entry.
In this case, we should directly return instead of trying to carry on
with the rest of the process.
If not, we could crash at:
  spin_lock_bh(&leaf->rt6i_table->rt6_lock);
because net->ipv6.ip6_null_entry does not have rt6i_table set.

Syzkaller recently reported following issue on net-next:
Use struct sctp_sack_info instead
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
sctp: [Deprecated]: syz-executor4 (pid 26496) Use of struct sctp_assoc_value in delayed_ack socket option.
Use struct sctp_sack_info instead
CPU: 1 PID: 26523 Comm: syz-executor6 Not tainted 4.14.0-rc4+ #85
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801d147e3c0 task.stack: ffff8801a4328000
RIP: 0010:debug_spin_lock_before kernel/locking/spinlock_debug.c:83 [inline]
RIP: 0010:do_raw_spin_lock+0x23/0x1e0 kernel/locking/spinlock_debug.c:112
RSP: 0018:ffff8801a432ed70 EFLAGS: 00010207
RAX: dffffc0000000000 RBX: 0000000000000018 RCX: 0000000000000000
RDX: 0000000000000003 RSI: 0000000000000000 RDI: 000000000000001c
RBP: ffff8801a432ed90 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: ffffffff8482b279 R12: ffff8801ce2ff3a0
sctp: [Deprecated]: syz-executor1 (pid 26546) Use of int in maxseg socket option.
Use struct sctp_assoc_value instead
R13: dffffc0000000000 R14: ffff8801d971e000 R15: ffff8801ce2ff0d8
FS:  00007f56e82f5700(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001ddbc22000 CR3: 00000001a4a04000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:136 [inline]
_raw_spin_lock_bh+0x39/0x40 kernel/locking/spinlock.c:175
spin_lock_bh include/linux/spinlock.h:321 [inline]
rt6_select net/ipv6/route.c:786 [inline]
ip6_pol_route+0x1be3/0x3bd0 net/ipv6/route.c:1650
sctp: [Deprecated]: syz-executor1 (pid 26576) Use of int in maxseg socket option.
Use struct sctp_assoc_value instead
TCP: request_sock_TCPv6: Possible SYN flooding on port 20002. Sending cookies.  Check SNMP counters.
ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1843
fib6_rule_lookup+0x9e/0x2a0 net/ipv6/ip6_fib.c:309
ip6_route_output_flags+0x1f1/0x2b0 net/ipv6/route.c:1871
ip6_route_output include/net/ip6_route.h:80 [inline]
ip6_dst_lookup_tail+0x4ea/0x970 net/ipv6/ip6_output.c:953
ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1076
sctp_v6_get_dst+0x675/0x1c30 net/sctp/ipv6.c:274
sctp_transport_route+0xa8/0x430 net/sctp/transport.c:287
sctp_assoc_add_peer+0x4fe/0x1100 net/sctp/associola.c:656
__sctp_connect+0x251/0xc80 net/sctp/socket.c:1187
sctp_connect+0xb4/0xf0 net/sctp/socket.c:4209
inet_dgram_connect+0x16b/0x1f0 net/ipv4/af_inet.c:541
SYSC_connect+0x20a/0x480 net/socket.c:1642
SyS_connect+0x24/0x30 net/socket.c:1623
entry_SYSCALL_64_fastpath+0x1f/0xbe

Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Configure TFO without cookie per socket and/or per route

We already allow to enable TFO without a cookie by using the
fastopen-sysctl and setting it to TFO_SERVER_COOKIE_NOT_REQD (or
TFO_CLIENT_NO_COOKIE).
This is safe to do in certain environments where we know that there
isn't a malicous host (aka., data-centers) or when the
application-protocol already provides an authentication mechanism in the
first flight of data.

A server however might be providing multiple services or talking to both
sides (public Internet and data-center). So, this server would want to
enable cookie-less TFO for certain services and/or for connections that
go to the data-center.

This patch exposes a socket-option and a per-route attribute to enable such
fine-grained configurations.

Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sock: Update sk rcu iterator macro.

Mark hlist node in sk rcu iterator as protected by the rcu.
hlist_next_rcu accomplishes this and silences the warnings
sparse throws.

Found with make C=1 net/ipv4/udp.o on linux-next tag
next-20171009.

Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: tcp_minisocks: use BUG_ON instead of if condition followed by BUG

Use BUG_ON instead of if condition followed by BUG in tcp_time_wait.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: icmp: use BUG_ON instead of if condition followed by BUG

Use BUG_ON instead of if condition followed by BUG in icmp_timestamp.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: cpumap fix potential lost wake-up problem

As pointed out by Michael, commit 1c601d829ab0 ("bpf: cpumap xdp_buff
to skb conversion and allocation") contains a classical example of the
potential lost wake-up problem.

We need to recheck the condition __ptr_ring_empty() after changing
current->state to TASK_INTERRUPTIBLE, this avoids a race between
wake_up_process() and schedule(). After this, a race with
wake_up_process() will simply change the state to TASK_RUNNING, and
the schedule() call not really put us to sleep.

Fixes: 1c601d829ab0 ("bpf: cpumap xdp_buff to skb conversion and allocation")
Reported-by: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: smc_close: mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in this particular case I placed the "fall through" comment
on its own line, which is what GCC is expecting to find.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: rxrpc: mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipv6-addrconf-hash-improvements-and-cleanups'

Eric Dumazet says:

====================
ipv6: addrconf: hash improvements and cleanups

Remove unecessary BH blocking, and bring IPv6 addrconf to modern world,
with per netns hash perturbation and decent hash size.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: addrconf: do not block BH in ipv6_chk_home_addr()

rcu_read_lock() is enough here, no need to block BH.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: addrconf: do not block BH in /proc/net/if_inet6 handling

Table is really RCU protected, no need to block BH

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: addrconf: do not block BH in ipv6_get_ifaddr()

rcu_read_lock() is enough here, no need to block BH.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: addrconf: do not block BH in ipv6_chk_addr_and_flags()

rcu_read_lock() is enough here, as inet6_ifa_finish_destroy()
uses kfree_rcu()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: addrconf: add per netns perturbation in inet6_addr_hash()

Bring IPv6 in par with IPv4 :

- Use net_hash_mix() to spread addresses a bit more.
- Use 256 slots hash table instead of 16

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: addrconf: factorize inet6_addr_hash() call

ipv6_add_addr_hash() can compute the hash value outside of
locked section and pass it to ipv6_chk_same_addr().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: addrconf: move ipv6_chk_same_addr() to avoid forward declaration

ipv6_chk_same_addr() is only used by ipv6_add_addr_hash(),
so moving it avoids a forward declaration.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'nfp-bpf-stack-support-in-offload'

Jakub Kicinski says:

====================
nfp: bpf: stack support in offload

This series brings stack support for offload.

We use the LMEM (Local memory) register file as memory to store
the stack.  Since this is a register file we need to do appropriate
shifts on unaligned accesses.  Verifier's state tracking helps us
with that.

LMEM can't be accessed directly, so we add support for setting
pointer registers through which one can read/write LMEM.

This set does not support accessing the stack when the alignment
is not known.  This can be added later (most likely using the byte_align
instructions).  There is also a number of optimizations which have been
left out:
- in more complex non aligned accesses, double shift and rotation
   can save us a cycle.  This, however, leads to code explosion
   since all access sizes have to be coded separately;
- since setting LM pointers costs around 5 cycles, we should be
   tracking their values to make sure we don't move them when
   they're already set correctly for earlier access;
- in case of 8 byte access aligned to 4 bytes and crossing
   32 byte boundary but not crossing a 64 byte boundary we don't
   have to increment the pointer, but this seems like a pretty
   rare case to justify the added complexity.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: bpf: optimize mov64 a little

Loading 64bit constants require up to 4 load immediates, since
we can only load 16 bits at a time. If the 32bit halves of
the 64bit constant are the same, however, we can save a cycle
by doing a register move instead of two loads of 16 bits.

Note that we don't optimize the normal ALU64 load because even
though it's a 64 bit load the upper half of the register is
a coming from sign extension so we can load it in one cycle
anyway.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: bpf: support stack accesses via non-constant pointers

If stack pointer has a different value on different paths
but the alignment to words (4B) remains the same, we can
set a new LMEM access pointer to the calculated value and
access whichever word it's pointing to.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: bpf: support accessing the stack beyond 64 bytes

To access beyond 64th byte of the stack we need to set a new
stack pointer register (LMEM is accessed indirectly through
those pointers).  Add a function for encoding local CSR access
instruction.  Use stack pointer number 3.

Note that stack pointer registers allow us to index into 32
bytes of LMEM (with shift operations i.e. when operands are
restricted).  This means if access is crossing 32 byte boundary
we must not use offsetting, we have to set the pointer to the
exact address and move it with post-increments.

We depend on the datapath placing the stack base address in
GPR A22 for our use.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: bpf: allow stack accesses via modified stack registers

As long as the verifier tells us the stack offset exactly we
can render the LMEM reads quite easily. Simply make sure that
the offset is constant for a given instruction and add it to
the instruction's offset.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: bpf: optimize the RMW for stack accesses

When we are performing unaligned stack accesses in the 32-64B window
we have to do a read-modify-write cycle.  E.g. for reading 8 bytes
from address 17:

0:  tmp    = stack[16]
1:  gprLo  = tmp >> 8
2:  tmp    = stack[20]
3:  gprLo |= tmp << 24
4:  tmp    = stack[20]
5:  gprHi  = tmp >> 8
6:  tmp    = stack[24]
7:  gprHi |= tmp << 24

The load on line 4 is unnecessary, because tmp already contains data
from stack[20].

For write we can optimize both loads and writebacks away.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: bpf: add stack read support

Add simple stack read support, similar to write in every aspect,
but data flowing the other way. Note that unlike write which can
be done in smaller than word quantities, if registers are loaded
with less-than-word of stack contents - the values have to be
zero extended.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: bpf: add stack write support

Stack is implemented by the LMEM register file.  Unaligned accesses
to LMEM are not allowed.  Accesses also have to be 4B wide.

To support stack we need to make sure offsets of pointers are known
at translation time (for now) and perform correct load/mask/shift
operations.

Since we can access first 64B of LMEM without much effort support
only stacks not bigger than 64B.  Following commits will extend
the possible sizes beyond that.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: bpf: refactor nfp_bpf_check_ptr()

nfp_bpf_check_ptr() mostly looks at the pointer register.
Add a temporary variable to shorten the code.

While at it make sure we print error messages if translation
fails to help users identify the problem (to be carried in
ext_ack in due course).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: bpf: add helper for emitting nops

The need to emitting a few nops will become more common soon
as we add stack and map support. Add a helper. This allows
for code to be shorter but also may be handy for marking the
nops with a "reason" to ease applying optimizations.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpftool-JSON'

Jakub Kicinski says:

====================
tools: bpftool: Add JSON output to bpftool

Quentin says:

This series introduces support for JSON output to all bpftool commands. It
adds option parsing, and several options are created:

  * -j, --json     Switch to JSON output.
  * -p, --pretty   Switch to JSON and print it in a human-friendly fashion.
  * -h, --help     Print generic help message.
  * -V, --version  Print version number.

This code uses a "json_writer", which is a copy of the one written by
Stephen Hemminger in iproute2.
---
I don't know if there is an easy way to share the code for json_write
without copying the file, so I am very open to suggestions on this matter.
====================

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: bpftool: update documentation for --json and --pretty usage

Update the documentation to provide help about JSON output generation,
and add an example in bpftool-prog manual page.

Also reintroduce an example that was left aside when the tool was moved
from GitHub to the kernel sources, in order to show how to mount the
bpffs file system (to pin programs) inside the bpftool-prog manual page.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: bpftool: add cosmetic changes for the manual pages

Make the look-and-feel of the manual pages somewhat closer to other
manual pages, such as the ones from the utilities from iproute2, by
highlighting more keywords.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: bpftool: provide JSON output for all possible commands

As all commands can now return JSON output (possibly just a "null"
value), output of `bpftool --json batch file FILE` should also be fully
JSON compliant.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: bpftool: turn err() and info() macros into functions

Turn err() and info() macros into functions.

In order to avoid naming conflicts with variables in the code, rename
them as p_err() and p_info() respectively.

The behavior of these functions is similar to the one of the macros for
plain output. However, when JSON output is requested, these macros
return a JSON-formatted "error" object instead of printing a message to
stderr.

To handle error messages correctly with JSON, a modification was brought
to their behavior nonetheless: the functions now append a end-of-line
character at the end of the message. This way, we can remove end-of-line
characters at the end of the argument strings, and not have them in the
JSON output.

All error messages are formatted to hold in a single call to p_err(), in
order to produce a single JSON field.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: bpftool: add JSON output for `bpftool batch file FILE` command

`bpftool batch file FILE` takes FILE as an argument and executes all the
bpftool commands it finds inside (or stops if an error occurs).

To obtain a consistent JSON output, create a root JSON array, then for
each command create a new object containing two fields: one with the
command arguments, the other with the output (which is the JSON object
that the command would have produced, if called on its own).

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: bpftool: add JSON output for `bpftool map *` commands

Reuse the json_writer API introduced in an earlier commit to make
bpftool able to generate JSON output on
`bpftool map { show | dump | lookup | getnext }` commands. Remaining
commands produce no output.

Some functions have been spit into plain-output and JSON versions in
order to remain readable.

Outputs for sample maps have been successfully tested against a JSON
validator.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: bpftool: add JSON output for `bpftool prog dump xlated *` command

Add a new printing function to dump translated eBPF instructions as
JSON. As for plain output, opcodes are printed only on request (when
`opcodes` is provided on the command line).

The disassembled output is generated by the same code that is used by
the kernel verifier.

Example output:

    $ bpftool --json --pretty prog dump xlated id 1
    [{
            "disasm": "(bf) r6 = r1"
        },{
            "disasm": "(61) r7 = *(u32 *)(r6 +16)"
        },{
            "disasm": "(95) exit"
        }
    ]

    $ bpftool --json --pretty prog dump xlated id 1 opcodes
    [{
            "disasm": "(bf) r6 = r1",
            "opcodes": {
                "code": "0xbf",
                "src_reg": "0x1",
                "dst_reg": "0x6",
                "off": ["0x00","0x00"
                ],
                "imm": ["0x00","0x00","0x00","0x00"
                ]
            }
        },{
            "disasm": "(61) r7 = *(u32 *)(r6 +16)",
            "opcodes": {
                "code": "0x61",
                "src_reg": "0x6",
                "dst_reg": "0x7",
                "off": ["0x10","0x00"
                ],
                "imm": ["0x00","0x00","0x00","0x00"
                ]
            }
        },{
            "disasm": "(95) exit",
            "opcodes": {
                "code": "0x95",
                "src_reg": "0x0",
                "dst_reg": "0x0",
                "off": ["0x00","0x00"
                ],
                "imm": ["0x00","0x00","0x00","0x00"
                ]
            }
        }
    ]

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: bpftool: add JSON output for `bpftool prog dump jited *` command

Reuse the json_writer API introduced in an earlier commit to make
bpftool able to generate JSON output on `bpftool prog show *` commands.
A new printing function is created to be passed as an argument to the
disassembler.

Similarly to plain output, opcodes are printed on request.

Outputs from sample programs have been successfully tested against a
JSON validator.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: bpftool: add JSON output for `bpftool prog show *` command

Reuse the json_writer API introduced in an earlier commit to make
bpftool able to generate JSON output on `bpftool prog show *` commands.

For readability, the code from show_prog() has been split into two
functions, one for plain output, one for JSON.

Outputs from sample programs have been successfully tested against a
JSON validator.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: bpftool: introduce --json and --pretty options

These two options can be used to ask for a JSON output (--j or -json),
and to make this JSON human-readable (-p or --pretty).

A json_writer object is created when JSON is required, and will be used
in follow-up commits to produce JSON output.

Note that --pretty implies --json.

Update for the manual pages and interactive help messages comes in a
later patch of the series.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: bpftool: add option parsing to bpftool, --help and --version

Add an option parsing facility to bpftool, in prevision of future
options for demanding JSON output. Currently, two options are added:
--help and --version, that act the same as the respective commands
`help` and `version`.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: bpftool: copy JSON writer from iproute2 repository

In prevision of following commits, supposed to add JSON output to the
tool, two files are copied from the iproute2 repository (taken at commit
268a9eee985f): lib/json_writer.c and include/json_writer.h.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp-tracepoints'

Song Liu says:

====================
net: add a set of tracepoints to tcp stack

Changes from v1:

Fix build error (with ipv6 as ko) by adding EXPORT_TRACEPOINT_SYMBOL_GPL
for trace_tcp_send_reset.

These patches add the following tracepoints to tcp stack.

tcp_send_reset
tcp_receive_reset
tcp_destroy_sock
tcp_set_state

These tracepoints can be used to track TCP state changes. Such state
changes include but are not limited to: connection establish,
connection termination, tx and rx of RST, various retransmits.

Currently, we use the following kprobes to trace these events:

int kprobe__tcp_validate_incoming
int kprobe__tcp_send_active_reset
int kprobe__tcp_v4_send_reset
int kprobe__tcp_v6_send_reset
int kprobe__tcp_v4_destroy_sock
int kprobe__tcp_set_state
int kprobe__tcp_retransmit_skb

These tracepoints will help us simplify this work.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add tracepoint trace_tcp_set_state()

This patch adds tracepoint trace_tcp_set_state. Besides usual fields
(s/d ports, IP addresses), old and new state of the socket is also
printed with TP_printk, with __print_symbolic().

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add tracepoint trace_tcp_destroy_sock

This patch adds trace event trace_tcp_destroy_sock.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>