]> asedeno.scripts.mit.edu Git - linux.git/log
linux.git
5 years agonetfilter: bridge: add connection tracking system
Pablo Neira Ayuso [Wed, 29 May 2019 11:25:37 +0000 (13:25 +0200)]
netfilter: bridge: add connection tracking system

This patch adds basic connection tracking support for the bridge,
including initial IPv4 support.

This patch register two hooks to deal with the bridge forwarding path,
one from the bridge prerouting hook to call nf_conntrack_in(); and
another from the bridge postrouting hook to confirm the entry.

The conntrack bridge prerouting hook defragments packets before passing
them to nf_conntrack_in() to look up for an existing entry, otherwise a
new entry is allocated and it is attached to the skbuff. The conntrack
bridge postrouting hook confirms new conntrack entries, ie. if this is
the first packet seen, then it adds the entry to the hashtable and (if
needed) it refragments the skbuff into the original fragments, leaving
the geometry as is if possible. Exceptions are linearized skbuffs, eg.
skbuffs that are passed up to nfqueue and conntrack helpers, as well as
cloned skbuff for the local delivery (eg. tcpdump), also in case of
bridge port flooding (cloned skbuff too).

The packet defragmentation is done through the ip_defrag() call.  This
forces us to save the bridge control buffer, reset the IP control buffer
area and then restore it after call. This function also bumps the IP
fragmentation statistics, it would be probably desiderable to have
independent statistics for the bridge defragmentation/refragmentation.
The maximum fragment length is stored in the control buffer and it is
used to refragment the skbuff from the postrouting path.

The new fraglist splitter and fragment transformer APIs are used to
implement the bridge refragmentation code. The br_ip_fragment() function
drops the packet in case the maximum fragment size seen is larger than
the output port MTU.

This patchset follows the principle that conntrack should not drop
packets, so users can do it through policy via invalid state matching.

Like br_netfilter, there is no refragmentation for packets that are
passed up for local delivery, ie. prerouting -> input path. There are
calls to nf_reset() already in several spots in the stack since time ago
already, eg. af_packet, that show that skbuff fraglist handling from the
netif_rx path is supported already.

The helpers are called from the postrouting hook, before confirmation,
from there we may see packet floods to bridge ports. Then, although
unlikely, this may result in exercising the helpers many times for each
clone. It would be good to explore how to pass all the packets in a list
to the conntrack hook to do this handle only once for this case.

Thanks to Florian Westphal for handing me over an initial patchset
version to add support for conntrack bridge.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonetfilter: nf_conntrack: allow to register bridge support
Pablo Neira Ayuso [Wed, 29 May 2019 11:25:36 +0000 (13:25 +0200)]
netfilter: nf_conntrack: allow to register bridge support

This patch adds infrastructure to register and to unregister bridge
support for the conntrack module via nf_ct_bridge_register() and
nf_ct_bridge_unregister().

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ipv4: place control buffer handling away from fragmentation iterators
Pablo Neira Ayuso [Wed, 29 May 2019 11:25:35 +0000 (13:25 +0200)]
net: ipv4: place control buffer handling away from fragmentation iterators

Deal with the IPCB() area away from the iterators.

The bridge codebase has its own control buffer layout, move specific
IP control buffer handling into the IPv4 codepath.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ipv6: split skbuff into fragments transformer
Pablo Neira Ayuso [Wed, 29 May 2019 11:25:34 +0000 (13:25 +0200)]
net: ipv6: split skbuff into fragments transformer

This patch exposes a new API to refragment a skbuff. This allows you to
split either a linear skbuff or to force the refragmentation of an
existing fraglist using a different mtu. The API consists of:

* ip6_frag_init(), that initializes the internal state of the transformer.
* ip6_frag_next(), that allows you to fetch the next fragment. This function
  internally allocates the skbuff that represents the fragment, it pushes
  the IPv6 header, and it also copies the payload for each fragment.

The ip6_frag_state object stores the internal state of the splitter.

This code has been extracted from ip6_fragment(). Symbols are also
exported to allow to reuse this iterator from the bridge codepath to
build its own refragmentation routine by reusing the existing codebase.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ipv4: split skbuff into fragments transformer
Pablo Neira Ayuso [Wed, 29 May 2019 11:25:33 +0000 (13:25 +0200)]
net: ipv4: split skbuff into fragments transformer

This patch exposes a new API to refragment a skbuff. This allows you to
split either a linear skbuff or to force the refragmentation of an
existing fraglist using a different mtu. The API consists of:

* ip_frag_init(), that initializes the internal state of the transformer.
* ip_frag_next(), that allows you to fetch the next fragment. This function
  internally allocates the skbuff that represents the fragment, it pushes
  the IPv4 header, and it also copies the payload for each fragment.

The ip_frag_state object stores the internal state of the splitter.

This code has been extracted from ip_do_fragment(). Symbols are also
exported to allow to reuse this iterator from the bridge codepath to
build its own refragmentation routine by reusing the existing codebase.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ipv6: add skbuff fraglist splitter
Pablo Neira Ayuso [Wed, 29 May 2019 11:25:32 +0000 (13:25 +0200)]
net: ipv6: add skbuff fraglist splitter

This patch adds the skbuff fraglist split iterator. This API provides an
iterator to transform the fraglist into single skbuff objects, it
consists of:

* ip6_fraglist_init(), that initializes the internal state of the
  fraglist iterator.
* ip6_fraglist_prepare(), that restores the IPv6 header on the fragment.
* ip6_fraglist_next(), that retrieves the fragment from the fraglist and
  updates the internal state of the iterator to point to the next
  fragment in the fraglist.

The ip6_fraglist_iter object stores the internal state of the iterator.

This code has been extracted from ip6_fragment(). Symbols are also
exported to allow to reuse this iterator from the bridge codepath to
build its own refragmentation routine by reusing the existing codebase.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ipv4: add skbuff fraglist splitter
Pablo Neira Ayuso [Wed, 29 May 2019 11:25:31 +0000 (13:25 +0200)]
net: ipv4: add skbuff fraglist splitter

This patch adds the skbuff fraglist splitter. This API provides an
iterator to transform the fraglist into single skbuff objects, it
consists of:

* ip_fraglist_init(), that initializes the internal state of the
  fraglist splitter.
* ip_fraglist_prepare(), that restores the IPv4 header on the
  fragments.
* ip_fraglist_next(), that retrieves the fragment from the fraglist and
  it updates the internal state of the splitter to point to the next
  fragment skbuff in the fraglist.

The ip_fraglist_iter object stores the internal state of the iterator.

This code has been extracted from ip_do_fragment(). Symbols are also
exported to allow to reuse this iterator from the bridge codepath to
build its own refragmentation routine by reusing the existing codebase.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'add-TFO-backup-key'
David S. Miller [Thu, 30 May 2019 20:41:26 +0000 (13:41 -0700)]
Merge branch 'add-TFO-backup-key'

Jason Baron says:

====================
add TFO backup key

Christoph, Igor, and I have worked on an API that facilitates TFO key
rotation. This is a follow up to the series that Christoph previously
posted, with an API that meets both of our use-cases. Here's a
link to the previous work:
https://patchwork.ozlabs.org/cover/1013753/

Changes in v2:
  -spelling fixes in ip-sysctl.txt (Jeremy Sowden)
  -re-base to latest net-next
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests/net: add TFO key rotation selftest
Jason Baron [Wed, 29 May 2019 16:34:01 +0000 (12:34 -0400)]
selftests/net: add TFO key rotation selftest

Demonstrate how the primary and backup TFO keys can be rotated while
minimizing the number of client cookies that are rejected.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoDocumentation: ip-sysctl.txt: Document tcp_fastopen_key
Jason Baron [Wed, 29 May 2019 16:34:00 +0000 (12:34 -0400)]
Documentation: ip-sysctl.txt: Document tcp_fastopen_key

Add docs for /proc/sys/net/ipv4/tcp_fastopen_key

Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Cc: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: add support for optional TFO backup key to net.ipv4.tcp_fastopen_key
Jason Baron [Wed, 29 May 2019 16:33:59 +0000 (12:33 -0400)]
tcp: add support for optional TFO backup key to net.ipv4.tcp_fastopen_key

Add the ability to add a backup TFO key as:

# echo "x-x-x-x,x-x-x-x" > /proc/sys/net/ipv4/tcp_fastopen_key

The key before the comma acks as the primary TFO key and the key after the
comma is the backup TFO key. This change is intended to be backwards
compatible since if only one key is set, userspace will simply read back
that single key as follows:

# echo "x-x-x-x" > /proc/sys/net/ipv4/tcp_fastopen_key
# cat /proc/sys/net/ipv4/tcp_fastopen_key
x-x-x-x

Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: add support to TCP_FASTOPEN_KEY for optional backup key
Jason Baron [Wed, 29 May 2019 16:33:58 +0000 (12:33 -0400)]
tcp: add support to TCP_FASTOPEN_KEY for optional backup key

Add support for get/set of an optional backup key via TCP_FASTOPEN_KEY, in
addition to the current 'primary' key. The primary key is used to encrypt
and decrypt TFO cookies, while the backup is only used to decrypt TFO
cookies. The backup key is used to maximize successful TFO connections when
TFO keys are rotated.

Currently, TCP_FASTOPEN_KEY allows a single 16-byte primary key to be set.
This patch now allows a 32-byte value to be set, where the first 16 bytes
are used as the primary key and the second 16 bytes are used for the backup
key. Similarly, for getsockopt(), we can receive a 32-byte value as output
if requested. If a 16-byte value is used to set the primary key via
TCP_FASTOPEN_KEY, then any previously set backup key will be removed.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: add backup TFO key infrastructure
Jason Baron [Wed, 29 May 2019 16:33:57 +0000 (12:33 -0400)]
tcp: add backup TFO key infrastructure

We would like to be able to rotate TFO keys while minimizing the number of
client cookies that are rejected. Currently, we have only one key which can
be used to generate and validate cookies, thus if we simply replace this
key clients can easily have cookies rejected upon rotation.

We propose having the ability to have both a primary key and a backup key.
The primary key is used to generate as well as to validate cookies.
The backup is only used to validate cookies. Thus, keys can be rotated as:

1) generate new key
2) add new key as the backup key
3) swap the primary and backup key, thus setting the new key as the primary

We don't simply set the new key as the primary key and move the old key to
the backup slot because the ip may be behind a load balancer and we further
allow for the fact that all machines behind the load balancer will not be
updated simultaneously.

We make use of this infrastructure in subsequent patches.

Suggested-by: Igor Lubashev <ilubashe@akamai.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: introduce __tcp_fastopen_cookie_gen_cipher()
Christoph Paasch [Wed, 29 May 2019 16:33:56 +0000 (12:33 -0400)]
tcp: introduce __tcp_fastopen_cookie_gen_cipher()

Restructure __tcp_fastopen_cookie_gen() to take a 'struct crypto_cipher'
argument and rename it as __tcp_fastopen_cookie_gen_cipher(). Subsequent
patches will provide different ciphers based on which key is being used for
the cookie generation.

Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'mlxsw-Hardware-monitoring-enhancements'
David S. Miller [Thu, 30 May 2019 19:59:53 +0000 (12:59 -0700)]
Merge branch 'mlxsw-Hardware-monitoring-enhancements'

Ido Schimmel says:

====================
mlxsw: Hardware monitoring enhancements

This patchset from Vadim provides various hardware monitoring related
improvements for mlxsw.

Patch #1 allows querying firmware version from the switch driver when
the underlying bus is I2C. This is useful for baseboard management
controller (BMC) systems that communicate with the ASIC over I2C.

Patch #2 improves driver's performance over I2C by utilizing larger
transactions sizes, if possible.

Patch #3 re-orders driver's initialization sequence to enforce a
specific firmware version before new firmware features are utilized.
This is a prerequisite for patches #4-#6.

Patches #4-#6 expose the temperature of inter-connect devices
(gearboxes) that are present in Mellanox SN3800 systems and split
2x50Gb/s lanes to 4x25Gb/s lanes.

Patches #7-#8 reduce the transaction size when reading SFP modules
temperatures, which is crucial when working over I2C.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: core: Reduce buffer size in transactions for SFP modules temperature readout
Vadim Pasternak [Wed, 29 May 2019 08:47:22 +0000 (11:47 +0300)]
mlxsw: core: Reduce buffer size in transactions for SFP modules temperature readout

Obtain SFP modules temperatures through MTMP register instead of MTBR
register, because the first one utilizes shorter transaction buffer size
for request. It improves performance in case low frequency interface
(I2C) is used for communication with a chip.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: core: Extend the index size for temperature sensors readout
Vadim Pasternak [Wed, 29 May 2019 08:47:21 +0000 (11:47 +0300)]
mlxsw: core: Extend the index size for temperature sensors readout

Extend sensor index size for Management Temperature Bulk Register
(MTBR) and Management Temperature Register (MTMP) upto 12 bits in
order to align registers description with new version of PRM document.
Add define for base sensor index for SFP modules temperature reading
for MTMP register.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: core: Extend hwmon interface with inter-connect temperature attributes
Vadim Pasternak [Wed, 29 May 2019 08:47:20 +0000 (11:47 +0300)]
mlxsw: core: Extend hwmon interface with inter-connect temperature attributes

Add new attributes to hwmon object for exposing inter-connects temperature
input, highest, reset_history temperatures and label. Temperatures are read
from Management Temperature Register.
The number of inter-connect devices is read from Management General
Peripheral Information Register.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: reg: Add Management General Peripheral Information Register
Vadim Pasternak [Wed, 29 May 2019 08:47:19 +0000 (11:47 +0300)]
mlxsw: reg: Add Management General Peripheral Information Register

Add MGPIR - Management General Peripheral Information Register, which
allows software to query the hardware and firmware general information
of peripheral entities as Gearboxes etc.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: reg: Extend sensor index field size of Management Temperature Register
Vadim Pasternak [Wed, 29 May 2019 08:47:18 +0000 (11:47 +0300)]
mlxsw: reg: Extend sensor index field size of Management Temperature Register

Extend the size of sensor_index field of MTMP (Management Temperature
Register), from 8 to 12 bits due to hardware change.
Add define for sensor index for Gear Box (inter-connects) temperature
reading.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: core: Re-order initialization sequence
Ido Schimmel [Wed, 29 May 2019 08:47:17 +0000 (11:47 +0300)]
mlxsw: core: Re-order initialization sequence

The driver core first registers with the hwmon and thermal subsystems
and only then proceeds to initialize the switch driver (e.g.,
mlxsw_spectrum). It is only during the last stage that the current
firmware version is validated and a newer one flashed, if necessary.

The above means that if a new firmware feature is utilized by the
hwmon/thermal code, the driver will not be able to load.

Solve this by re-ordering initializing the switch driver before
registering with the hwmon and thermal subsystems.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: i2c: Allow flexible setting of I2C transactions size
Vadim Pasternak [Wed, 29 May 2019 08:47:16 +0000 (11:47 +0300)]
mlxsw: i2c: Allow flexible setting of I2C transactions size

Current implementation uses fixed size of I2C data transaction buffer.
Allow to set size of I2C transactions according to I2C physical adapter
capability. For that purpose adapter read and write size is obtained
from the I2C physical adapter and buffer size is set according to the
minimum of these two values. If adapter does not provide such info,
default buffer size is to be used.
It allows to improve performance of I2C access to silicon when long
size transactions are used.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: i2c: Extend initialization with querying firmware info
Vadim Pasternak [Wed, 29 May 2019 08:47:15 +0000 (11:47 +0300)]
mlxsw: i2c: Extend initialization with querying firmware info

Extend initialization flow with query request for firmware info in
order to obtain firmware version info.
This info is to be provided to minimal driver to support ethtool
get_drvinfo() interface.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-stmmac-selftests-Two-fixes'
David S. Miller [Thu, 30 May 2019 19:59:07 +0000 (12:59 -0700)]
Merge branch 'net-stmmac-selftests-Two-fixes'

Jose Abreu says:

====================
net: stmmac: selftests: Two fixes

Two fixes reported by kbuild.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: selftests: Use kfree_skb() instead of kfree()
Jose Abreu [Wed, 29 May 2019 08:30:26 +0000 (10:30 +0200)]
net: stmmac: selftests: Use kfree_skb() instead of kfree()

kfree_skb() shall be used instead of kfree(). Fix it.

Fixes: 091810dbded9 ("net: stmmac: Introduce selftests support")
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: selftests: Fix sparse warning
Jose Abreu [Wed, 29 May 2019 08:30:25 +0000 (10:30 +0200)]
net: stmmac: selftests: Fix sparse warning

Variable shall be __be16. Fix it.

Fixes: 091810dbded9 ("net: stmmac: Introduce selftests support")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv4: tcp_input: fix stack out of bounds when parsing TCP options.
Young Xiao [Wed, 29 May 2019 08:10:59 +0000 (16:10 +0800)]
ipv4: tcp_input: fix stack out of bounds when parsing TCP options.

The TCP option parsing routines in tcp_parse_options function could
read one byte out of the buffer of the TCP options.

1         while (length > 0) {
2                 int opcode = *ptr++;
3                 int opsize;
4
5                 switch (opcode) {
6                 case TCPOPT_EOL:
7                         return;
8                 case TCPOPT_NOP:        /* Ref: RFC 793 section 3.1 */
9                         length--;
10                        continue;
11                default:
12                        opsize = *ptr++; //out of bound access

If length = 1, then there is an access in line2.
And another access is occurred in line 12.
This would lead to out-of-bound access.

Therefore, in the patch we check that the available data length is
larger enough to pase both TCP option code and size.

Signed-off-by: Young Xiao <92siuyang@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'mlxsw-Two-small-fixes'
David S. Miller [Thu, 30 May 2019 19:30:47 +0000 (12:30 -0700)]
Merge branch 'mlxsw-Two-small-fixes'

Ido Schimmel says:

====================
mlxsw: Two small fixes

Patch #1 from Jiri fixes an issue specific to Spectrum-2 where the
insertion of two identical flower filters with different priorities
would trigger a warning.

Patch #2 from Amit prevents the driver from trying to configure a port
with a speed of 56Gb/s and autoneg off as this is not supported and
results in error messages from firmware.

Please consider patch #1 for stable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum: Prevent force of 56G
Amit Cohen [Wed, 29 May 2019 07:59:45 +0000 (10:59 +0300)]
mlxsw: spectrum: Prevent force of 56G

Force of 56G is not supported by hardware in Ethernet devices. This
configuration fails with a bad parameter error from firmware.

Add check of this case. Instead of trying to set 56G with autoneg off,
return a meaningful error.

Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_acl: Avoid warning after identical rules insertion
Jiri Pirko [Wed, 29 May 2019 07:59:44 +0000 (10:59 +0300)]
mlxsw: spectrum_acl: Avoid warning after identical rules insertion

When identical rules are inserted, the latter one goes to C-TCAM. For
that, a second eRP with the same mask is created. These 2 eRPs by the
nature cannot be merged and also one cannot be parent of another.
Teach mlxsw_sp_acl_erp_delta_fill() about this possibility and handle it
gracefully.

Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Fixes: c22291f7cf45 ("mlxsw: spectrum: acl: Implement delta for ERP")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6xxx: fix handling of upper half of STATS_TYPE_PORT
Rasmus Villemoes [Wed, 29 May 2019 07:02:11 +0000 (07:02 +0000)]
net: dsa: mv88e6xxx: fix handling of upper half of STATS_TYPE_PORT

Currently, the upper half of a 4-byte STATS_TYPE_PORT statistic ends
up in bits 47:32 of the return value, instead of bits 31:16 as they
should.

Fixes: 6e46e2d821bb ("net: dsa: mv88e6xxx: Fix u64 statistics")
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: fix MAC address being lost in PCI D3
Heiner Kallweit [Wed, 29 May 2019 05:44:01 +0000 (07:44 +0200)]
r8169: fix MAC address being lost in PCI D3

(At least) RTL8168e forgets its MAC address in PCI D3. To fix this set
the MAC address when resuming. For resuming from runtime-suspend we
had this in place already, for resuming from S3/S5 it was missing.

The commit referenced as being fixed isn't wrong, it's just the first
one where the patch applies cleanly.

Fixes: 0f07bd850d36 ("r8169: use dev_get_drvdata where possible")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reported-by: Albert Astals Cid <aacid@kde.org>
Tested-by: Albert Astals Cid <aacid@kde.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoinet: frags: Remove unnecessary smp_store_release/READ_ONCE
Herbert Xu [Wed, 29 May 2019 05:40:26 +0000 (13:40 +0800)]
inet: frags: Remove unnecessary smp_store_release/READ_ONCE

The smp_store_release call in fqdir_exit cannot protect the setting
of fqdir->dead as claimed because its memory barrier is only
guaranteed to be one-way and the barrier precedes the setting of
fqdir->dead.

IOW it doesn't provide any barriers between fq->dir and the following
hash table destruction.

In fact, the code is safe anyway because call_rcu does provide both
the memory barrier as well as a guarantee that when the destruction
work starts executing all RCU readers will see the updated value for
fqdir->dead.

Therefore this patch removes the unnecessary smp_store_release call
as well as the corresponding READ_ONCE on the read-side in order to
not confuse future readers of this code.  Comments have been added
in their places.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: mvpp2: cls: Remove unnessesary check in mvpp2_ethtool_cls_rule_ins
YueHaibing [Wed, 29 May 2019 02:59:06 +0000 (10:59 +0800)]
net: mvpp2: cls: Remove unnessesary check in mvpp2_ethtool_cls_rule_ins

Fix smatch warning:

drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c:1236
 mvpp2_ethtool_cls_rule_ins() warn: unsigned 'info->fs.location' is never less than zero.

'info->fs.location' is u32 type, never less than zero.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: Switch to devm_alloc_etherdev_mqs
Jisheng Zhang [Wed, 29 May 2019 02:26:07 +0000 (02:26 +0000)]
net: stmmac: Switch to devm_alloc_etherdev_mqs

Make use of devm_alloc_etherdev_mqs() to simplify the code.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotua6100: Avoid build warnings.
David S. Miller [Thu, 30 May 2019 18:36:15 +0000 (11:36 -0700)]
tua6100: Avoid build warnings.

Rename _P to _P_VAL and _R to _R_VAL to avoid global
namespace conflicts:

drivers/media/dvb-frontends/tua6100.c: In function ‘tua6100_set_params’:
drivers/media/dvb-frontends/tua6100.c:79: warning: "_P" redefined
 #define _P 32

In file included from ./include/acpi/platform/aclinux.h:54,
                 from ./include/acpi/platform/acenv.h:152,
                 from ./include/acpi/acpi.h:22,
                 from ./include/linux/acpi.h:34,
                 from ./include/linux/i2c.h:17,
                 from drivers/media/dvb-frontends/tua6100.h:30,
                 from drivers/media/dvb-frontends/tua6100.c:32:
./include/linux/ctype.h:14: note: this is the location of the previous definition
 #define _P 0x10 /* punct */

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'mlx5-fixes-2019-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Thu, 30 May 2019 18:34:38 +0000 (11:34 -0700)]
Merge tag 'mlx5-fixes-2019-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2019-05-28

This series introduces some fixes to mlx5 driver.

Please pull and let me know if there is any problem.

For -stable v4.13:
('net/mlx5: Allocate root ns memory using kzalloc to match kfree')

For -stable v4.16:
('net/mlx5: Avoid double free in fs init error unwinding path')

For -stable v4.18:
('net/mlx5e: Disable rxhash when CQE compress is enabled')
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'Enable-SFP-on-ACPI-based-systems'
David S. Miller [Thu, 30 May 2019 18:27:47 +0000 (11:27 -0700)]
Merge branch 'Enable-SFP-on-ACPI-based-systems'

Ruslan Babayev says:

====================
Enable SFP on ACPI based systems

Changes:
v2:
- more descriptive commit body
v3:
- made 'i2c_acpi_find_adapter_by_handle' static inline
v4:
- don't initialize i2c_adapter to NULL. Instead see below...
- handle the case of neither DT nor ACPI present as invalid.
- alphabetical includes.
- use has_acpi_companion().
- use the same argument name in i2c_acpi_find_adapter_by_handle()
  in both stubbed and non-stubbed cases.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: sfp: enable i2c-bus detection on ACPI based systems
Ruslan Babayev [Tue, 28 May 2019 23:02:33 +0000 (16:02 -0700)]
net: phy: sfp: enable i2c-bus detection on ACPI based systems

Lookup I2C adapter using the "i2c-bus" device property on ACPI based
systems similar to how it's done with DT.

An example DSD describing an SFP on an ACPI based system:

Device (SFP0)
{
    Name (_HID, "PRP0001")
    Name (_CRS, ResourceTemplate()
    {
        GpioIo(Exclusive, PullDefault, 0, 0, IoRestrictionNone,
               "\\_SB.PCI0.RP01.GPIO", 0, ResourceConsumer)
            { 0, 1, 2, 3, 4 }
    })
    Name (_DSD, Package ()
    {
        ToUUID ("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
        Package () {
            Package () { "compatible", "sff,sfp" },
            Package () { "i2c-bus", \_SB.PCI0.RP01.I2C.MUX.CH0 },
            Package () { "maximum-power-milliwatt", 1000 },
            Package () { "tx-disable-gpios", Package () { ^SFP0, 0, 0, 1} },
            Package () { "reset-gpio",       Package () { ^SFP0, 0, 1, 1} },
            Package () { "mod-def0-gpios",   Package () { ^SFP0, 0, 2, 1} },
            Package () { "tx-fault-gpios",   Package () { ^SFP0, 0, 3, 0} },
            Package () { "los-gpios",        Package () { ^SFP0, 0, 4, 1} },
        },
    })
}

Device (PHY0)
{
    Name (_HID, "PRP0001")
    Name (_DSD, Package ()
    {
        ToUUID ("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
        Package () {
            Package () { "compatible", "ethernet-phy-ieee802.3-c45" },
            Package () { "sfp", \_SB.PCI0.RP01.SFP0 },
            Package () { "managed", "in-band-status" },
            Package () { "phy-mode", "sgmii" },
        },
    })
}

Signed-off-by: Ruslan Babayev <ruslan@babayev.com>
Cc: xe-linux-external@cisco.com
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoi2c: acpi: export i2c_acpi_find_adapter_by_handle
Ruslan Babayev [Tue, 28 May 2019 23:02:32 +0000 (16:02 -0700)]
i2c: acpi: export i2c_acpi_find_adapter_by_handle

This allows drivers to lookup i2c adapters on ACPI based systems similar to
of_get_i2c_adapter_by_node() with DT based systems.

Signed-off-by: Ruslan Babayev <ruslan@babayev.com>
Cc: xe-linux-external@cisco.com
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'XDP-generic-fixes'
David S. Miller [Thu, 30 May 2019 18:12:21 +0000 (11:12 -0700)]
Merge branch 'XDP-generic-fixes'

Stephen Hemminger says:

====================
XDP generic fixes

This set of patches came about while investigating XDP
generic on Azure. The split brain nature of the accelerated
networking exposed issues with the stack device model.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: core: support XDP generic on stacked devices.
Stephen Hemminger [Tue, 28 May 2019 18:47:31 +0000 (11:47 -0700)]
net: core: support XDP generic on stacked devices.

When a device is stacked like (team, bonding, failsafe or netvsc) the
XDP generic program for the parent device was not called.

Move the call to XDP generic inside __netif_receive_skb_core where
it can be done multiple times for stacked case.

Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonetvsc: unshare skb in VF rx handler
Stephen Hemminger [Tue, 28 May 2019 18:47:30 +0000 (11:47 -0700)]
netvsc: unshare skb in VF rx handler

The netvsc VF skb handler should make sure that skb is not
shared. Similar logic already exists in bonding and team device
drivers.

This is not an issue in practice because the VF devicex
does not send up shared skb's. But the netvsc driver
should do the right thing if it did.

Fixes: 0c195567a8f6 ("netvsc: transparent VF management")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoudp: Avoid post-GRO UDP checksum recalculation
Sean Tranchetti [Tue, 28 May 2019 18:22:54 +0000 (12:22 -0600)]
udp: Avoid post-GRO UDP checksum recalculation

Currently, when resegmenting an unexpected UDP GRO packet, the full UDP
checksum will be calculated for every new SKB created by skb_segment()
because the netdev features passed in by udp_rcv_segment() lack any
information about checksum offload capabilities.

Usually, we have no need to perform this calculation again, as
  1) The GRO implementation guarantees that any packets making it to the
     udp_rcv_segment() function had correct checksums, and, more
     importantly,
  2) Upon the successful return of udp_rcv_segment(), we immediately pull
     the UDP header off and either queue the segment to the socket or
     hand it off to a new protocol handler.

Unless userspace has set the IP_CHECKSUM sockopt to indicate that they
want the final checksum values, we can pass the needed netdev feature
flags to __skb_gso_segment() to avoid checksumming each segment in
skb_segment().

Fixes: cf329aa42b66 ("udp: cope with UDP GRO packet misdirection")
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoice: Trivial cosmetic changes
Anirudh Venkataramanan [Tue, 16 Apr 2019 17:35:03 +0000 (10:35 -0700)]
ice: Trivial cosmetic changes

This patch mostly capitalizes abbreviations in code comments. Fixed some
typos and removed some unnecessary newlines as well.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Recognize higher speeds
Anirudh Venkataramanan [Tue, 16 Apr 2019 17:35:02 +0000 (10:35 -0700)]
ice: Recognize higher speeds

In ice_print_link_msg, add cases for 50GB and 100GB speeds. This
results in the right speed being reported on load, instead of
"Unknownbps".

When VF link if forced (in ice_set_pfe_link_forced), report
max speed 100GB.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Use a different ICE_DBG bit for firmware log messages
Jacob Keller [Tue, 16 Apr 2019 17:35:01 +0000 (10:35 -0700)]
ice: Use a different ICE_DBG bit for firmware log messages

Replace the use of the ICE_DBG_AQ_MSG bit when dumping firmware logging
messages with a separate distinct type ICE_DBG_FW_LOG. This is useful
so that developers may enable ICE_DBG_FW_LOG and get firmware logging
messages, without also dumping AdminQ messages at the same time.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Update function header
Anirudh Venkataramanan [Tue, 16 Apr 2019 17:35:00 +0000 (10:35 -0700)]
ice: Update function header

Add some details to the function header for ice_deinit_hw.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Move define for ICE_AQC_DRIVER_UNLOADING
Anirudh Venkataramanan [Tue, 16 Apr 2019 17:34:59 +0000 (10:34 -0700)]
ice: Move define for ICE_AQC_DRIVER_UNLOADING

The define describing the bits for the struct field should be below
the field itself.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Align to updated AQ command formats
Anirudh Venkataramanan [Tue, 16 Apr 2019 17:34:58 +0000 (10:34 -0700)]
ice: Align to updated AQ command formats

The current specification has updates to the command formats for
manage MAC opcodes (opcodes 0x0107 and 0x0108) and get PHY caps
(opcode 0x0600). Update the code to reflect this.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Use continue instead of an else block
Anirudh Venkataramanan [Tue, 16 Apr 2019 17:34:57 +0000 (10:34 -0700)]
ice: Use continue instead of an else block

For style consistency, use continue instead of an else block in
ice_pf_dcb_recfg.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Change minimum descriptor count value for Tx/Rx rings
Preethi Banala [Tue, 16 Apr 2019 17:34:56 +0000 (10:34 -0700)]
ice: Change minimum descriptor count value for Tx/Rx rings

Change minimum number of descriptor count from 32 to 64. This is to have
a feature parity with previous Intel NIC drivers.

Signed-off-by: Preethi Banala <preethi.banala@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Add switch rules to handle LLDP packets
Dave Ertman [Tue, 16 Apr 2019 17:34:55 +0000 (10:34 -0700)]
ice: Add switch rules to handle LLDP packets

Add call to configure dropping egress LLDP packets in ice_vsi_setup
and remove the rule in ice_vsi_release.

Add calls to add/remove rule to route LLDP packets to default VSI when
FW LLDP engine is disabled/enabled and remove rule if applied during
ice_vsi_release.

In the function ice_add_eth_mac(), there is a line that hard codes the
filter info flag to TX. This is incorrect as this flag will be set by
the calling function that built the list of filters to add. So remove
the hard coded value.

This patch also contains a fix to stop treating the DCBx state of
"Not Started" as an error state that kicks DCB in SW mode. This will
address having non-cabled interfaces automatically go into SW mode
with the FW engine running.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Cleanup ice_update_link_info
Bruce Allan [Tue, 16 Apr 2019 17:34:54 +0000 (10:34 -0700)]
ice: Cleanup ice_update_link_info

Do not allocate memory for the Get PHY Abilities command data buffer when
it is not necessary, change one local variable to another to reduce the
number of de-references, reduce the scope of some local variables, and
reorder the code and change exit points to get rid of an unnecessary goto
label.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Use right type for ice_cfg_vsi_lan return
Akeem G Abodunrin [Tue, 16 Apr 2019 17:34:53 +0000 (10:34 -0700)]
ice: Use right type for ice_cfg_vsi_lan return

ice_cfg_vsi_lan returns a value of type enum ice_status. So
use a local of the same type to capture the return value.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Add support for Forward Error Correction (FEC)
Paul Greenwalt [Tue, 16 Apr 2019 17:34:52 +0000 (10:34 -0700)]
ice: Add support for Forward Error Correction (FEC)

This patch adds driver support for Forward Error Correction (FEC)
and ethtool handlers to set/get FEC params.

Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Add support for virtchnl_vector_map.[rxq|txq]_map
Anirudh Venkataramanan [Tue, 16 Apr 2019 17:34:51 +0000 (10:34 -0700)]
ice: Add support for virtchnl_vector_map.[rxq|txq]_map

Add support for virtchnl_vector_map.[rxq|txq]_map to use bitmap to
associate indicated queues with the specified vector. This support is
needed since the Windows AVF driver calls VIRTCHNL_OP_CONFIG_IRQ_MAP for
each vector and used the bitmap to indicate the associated queues.

Updated ice_vc_dis_qs_msg to not subtract one from
virtchnl_irq_map_info.num_vectors, and changed the VSI vector index to
the vector id. This change supports the Windows AVF driver which maps
one vector at a time and sets num_vectors to one. Using vectors_id to
index the vector array .

Add check for vector_id zero, and return VIRTCHNL_STATUS_ERR_PARAM
if vector_id is zero and there are rings associated with that vector.
Vector_id zero is for the OICR.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Introduce ice_init_mac_fltr and move ice_napi_del
Tony Nguyen [Tue, 16 Apr 2019 17:34:50 +0000 (10:34 -0700)]
ice: Introduce ice_init_mac_fltr and move ice_napi_del

Consolidate adding unicast and broadcast MAC filters in a single new
function ice_init_mac_fltr.

Move ice_napi_del to ice_lib.c

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Use GLINT_DYN_CTL to disable VF's interrupts
Brett Creeley [Tue, 16 Apr 2019 17:30:52 +0000 (10:30 -0700)]
ice: Use GLINT_DYN_CTL to disable VF's interrupts

Currently in ice_free_vf_res() we are writing to the VFINT_DYN_CTLN
register in the PF's function space to disable all VF's interrupts. This
is incorrect because this register is only for use in the VF's function
space. This becomes obvious when seeing that the valid indices used for
the VFINT_DYN_CTLN register is from 0-63, which is the maximum number of
interrupts for a VF (not including the OICR interrupt). Fix this by
writing to the GLINT_DYN_CTL register for each VF. We can do this
because we keep track of each VF's first_vector_idx inside of the PF's
function space and the number of interrupts given to each VF.

Also in ice_free_vfs() we were disabling Rx/Tx queues after calling
pci_disable_sriov(). One part of disabling the Tx queues causes the PF
driver to trigger a software interrupt, which causes the VF's napi
routine to run. This doesn't currently work because pci_disable_sriov()
causes iavf_remove() to be called which disables interrupts. Fix this by
disabling Rx/Tx queues prior to pci_disable_sriov().

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agonet: phy: tja11xx: Switch to HWMON_CHANNEL_INFO()
Marek Vasut [Tue, 28 May 2019 18:15:41 +0000 (20:15 +0200)]
net: phy: tja11xx: Switch to HWMON_CHANNEL_INFO()

The HWMON_CHANNEL_INFO macro simplifies the code, reduces the likelihood
of errors, and makes the code easier to read.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: linux-hwmon@vger.kernel.org
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: cpsw: correct .ndo_open error path
Ivan Khoronzhuk [Tue, 28 May 2019 17:45:19 +0000 (20:45 +0300)]
net: ethernet: ti: cpsw: correct .ndo_open error path

It's found while review and probably never happens, but real number
of queues is set per device, and error path should be per device.
So split error path based on usage_count.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'Decoupling-PHYLINK-from-struct-net_device'
David S. Miller [Thu, 30 May 2019 04:48:54 +0000 (21:48 -0700)]
Merge branch 'Decoupling-PHYLINK-from-struct-net_device'

Ioana Ciornei says:

====================
Decoupling PHYLINK from struct net_device

Following two separate discussion threads in:
  https://www.spinics.net/lists/netdev/msg569087.html
and:
  https://www.spinics.net/lists/netdev/msg570450.html

Previous RFC patch set: https://www.spinics.net/lists/netdev/msg571995.html

PHYLINK was reworked in order to accept multiple operation types,
PHYLINK_NETDEV and PHYLINK_DEV, passed through a phylink_config
structure alongside the corresponding struct device.

One of the main concerns expressed in the RFC was that using notifiers
to signal the corresponding phylink_mac_ops would break PHYLINK's API
unity and that it would become harder to grep for its users.
Using the current approach, we maintain a common API for all users.
Also, printing useful information in PHYLINK, when decoupled from a
net_device, is achieved using dev_err&co on the struct device received
(in DSA's case is the device corresponding to the dsa_switch).

PHYLIB (which PHYLINK uses) was reworked to the extent that it does not
crash when connecting to a PHY and the net_device pointer is NULL.

Lastly, DSA has been reworked in its way that it handles PHYs for ports
that lack a net_device (CPU and DSA ports).  For these, it was
previously using PHYLIB and is now using the PHYLINK_DEV operation type.
Previously, a driver that wanted to support PHY operations on CPU/DSA
ports has to implement .adjust_link(). This patch set not only gives
drivers the options to use PHYLINK uniformly but also urges them to
convert to it. For compatibility, the old code is kept but it will be
removed once all drivers switch over.

The patchset was tested on the NXP LS1021A-TSN board having the
following Ethernet layout:
  https://lkml.org/lkml/2019/5/5/279
The CPU port was moved from the internal RGMII fixed-link (enet2 ->
switch port 4) to an external loopback Cat5 cable between the enet1 port
and the front-facing swp2 SJA1105 port. In this mode, both the master
and the CPU port have an attached PHY which detects link change events:

[   49.105426] fsl-gianfar soc:ethernet@2d50000 eth1: Link is Down
[   50.305486] sja1105 spi0.1: Link is Down
[   53.265596] fsl-gianfar soc:ethernet@2d50000 eth1: Link is Up - 1Gbps/Full - flow control off
[   54.466304] sja1105 spi0.1: Link is Up - 1Gbps/Full - flow control off

Changes in v2:
  - fixed sparse warnings
  - updated 'Documentation/ABI/testing/sysfs-class-net-phydev'
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: sja1105: Fix broken fixed-link interfaces on user ports
Vladimir Oltean [Tue, 28 May 2019 17:38:17 +0000 (20:38 +0300)]
net: dsa: sja1105: Fix broken fixed-link interfaces on user ports

PHYLIB and PHYLINK handle fixed-link interfaces differently. PHYLIB
wraps them in a software PHY ("pseudo fixed link") phydev construct such
that .adjust_link driver callbacks see an unified API. Whereas PHYLINK
simply creates a phylink_link_state structure and passes it to
.mac_config.

At the time the driver was introduced, DSA was using PHYLIB for the
CPU/cascade ports (the ones with no net devices) and PHYLINK for
everything else.

As explained below:

commit aab9c4067d2389d0adfc9c53806437df7b0fe3d5
Author: Florian Fainelli <f.fainelli@gmail.com>
Date:   Thu May 10 13:17:36 2018 -0700

  net: dsa: Plug in PHYLINK support

  Drivers that utilize fixed links for user-facing ports (e.g: bcm_sf2)
  will need to implement phylink_mac_ops from now on to preserve
  functionality, since PHYLINK *does not* create a phy_device instance
  for fixed links.

In the above patch, DSA guards the .phylink_mac_config callback against
a NULL phydev pointer.  Therefore, .adjust_link is not called in case of
a fixed-link user port.

This patch fixes the situation by converting the driver from using
.adjust_link to .phylink_mac_config.  This can be done now in a unified
fashion for both slave and CPU/cascade ports because DSA now uses
PHYLINK for all ports.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: Use PHYLINK for the CPU/DSA ports
Ioana Ciornei [Tue, 28 May 2019 17:38:16 +0000 (20:38 +0300)]
net: dsa: Use PHYLINK for the CPU/DSA ports

For DSA switches that do not have an .adjust_link callback, aka those
who transitioned totally to the PHYLINK-compliant API, use PHYLINK to
drive the CPU/DSA ports.

The PHYLIB usage and .adjust_link are kept but deprecated, and users are
asked to transition from it.  The reason why we can't do anything for
them is because PHYLINK does not wrap the fixed-link state behind a
phydev object, so we cannot wrap .phylink_mac_config into .adjust_link
unless we fabricate a phy_device structure.

For these ports, the newly introduced PHYLINK_DEV operation type is
used and the dsa_switch device structure is passed to PHYLINK for
printing purposes.  The handling of the PHYLINK_NETDEV and PHYLINK_DEV
PHYLINK instances is common from the perspective of the driver.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: Move the phylink driver calls into port.c
Ioana Ciornei [Tue, 28 May 2019 17:38:15 +0000 (20:38 +0300)]
net: dsa: Move the phylink driver calls into port.c

In order to have a common handling of PHYLINK for the slave and non-user
ports, the DSA core glue logic (between PHYLINK and the driver) must use
an API that does not rely on a struct net_device.

These will also be called by the CPU-port-handling code in a further
patch.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phylink: Add phylink_{printk, err, warn, info, dbg} macros
Ioana Ciornei [Tue, 28 May 2019 17:38:14 +0000 (20:38 +0300)]
net: phylink: Add phylink_{printk, err, warn, info, dbg} macros

With the latest addition to the PHYLINK infrastructure, we are faced
with a decision on when to print necessary info using the struct
net_device and when with the struct device.

Add a series of macros that encapsulate this decision and replace all
uses of netdev_err&co with phylink_err.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phylink: Add PHYLINK_DEV operation type
Ioana Ciornei [Tue, 28 May 2019 17:38:13 +0000 (20:38 +0300)]
net: phylink: Add PHYLINK_DEV operation type

In the PHYLINK_DEV operation type, the PHYLINK infrastructure can work
without an attached net_device. For printing usecases, instead, a struct
device * should be passed to PHYLINK using the phylink_config structure.

Also, netif_carrier_* calls ar guarded by the presence of a valid
net_device. When using the PHYLINK_DEV operation type, we cannot check
link status using the netif_carrier_ok() API so instead, keep an
internal state of the MAC and call mac_link_{down,up} only when the link
changed.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phylink: Add struct phylink_config to PHYLINK API
Ioana Ciornei [Tue, 28 May 2019 17:38:12 +0000 (20:38 +0300)]
net: phylink: Add struct phylink_config to PHYLINK API

The phylink_config structure will encapsulate a pointer to a struct
device and the operation type requested for this instance of PHYLINK.
This patch does not make any functional changes, it just transitions the
PHYLINK internals and all its users to the new API.

A pointer to a phylink_config structure will be passed to
phylink_create() instead of the net_device directly. Also, the same
phylink_config pointer will be passed back to all phylink_mac_ops
callbacks instead of the net_device. Using this mechanism, a PHYLINK
user can get the original net_device using a structure such as
'to_net_dev(config->dev)' or directly the structure containing the
phylink_config using a container_of call.

At the moment, only the PHYLINK_NETDEV is defined as a valid operation
type for PHYLINK. In this mode, a valid reference to a struct device
linked to the original net_device should be passed to PHYLINK through
the phylink_config structure.

This API changes is mainly driven by the necessity of adding a new
operation type in PHYLINK that disconnects the phy_device from the
net_device and also works when the net_device is lacking.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phylink: Add phylink_mac_link_{up, down} wrapper functions
Ioana Ciornei [Tue, 28 May 2019 17:38:11 +0000 (20:38 +0300)]
net: phylink: Add phylink_mac_link_{up, down} wrapper functions

This is a cosmetic patch that reduces the clutter in phylink_resolve
around calling the .mac_link_up/.mac_link_down driver callbacks.  In a
further patch this logic will be extended to emit notifications in case
a net device does not exist.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: Add phy_standalone sysfs entry
Ioana Ciornei [Tue, 28 May 2019 17:38:10 +0000 (20:38 +0300)]
net: phy: Add phy_standalone sysfs entry

Export a phy_standalone device attribute that is meant to give the
indication that this PHY lacks an attached_dev and its corresponding
sysfs link. The attribute will be created only when the
phy_attach_direct() function will be called with a NULL net_device.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: Check against net_device being NULL
Ioana Ciornei [Tue, 28 May 2019 17:38:09 +0000 (20:38 +0300)]
net: phy: Check against net_device being NULL

In general, we don't want MAC drivers calling phy_attach_direct with the
net_device being NULL. Add checks against this in all the functions
calling it: phy_attach() and phy_connect_direct().

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: Guard against the presence of a netdev
Ioana Ciornei [Tue, 28 May 2019 17:38:08 +0000 (20:38 +0300)]
net: phy: Guard against the presence of a netdev

A prerequisite for PHYLIB to work in the absence of a struct net_device
is to not access pointers to it.

Changes are needed in the following areas:

 - Printing: In some places netdev_err was replaced with phydev_err.

 - Incrementing reference count to the parent MDIO bus driver: If there
   is no net device, then the reference count should definitely be
   incremented since there is no chance that it was an Ethernet driver
   who registered the MDIO bus.

 - Sysfs links are not created in case there is no attached_dev.

 - No netif_carrier_off is done if there is no attached_dev.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: Add phy_sysfs_create_links helper function
Vladimir Oltean [Tue, 28 May 2019 17:38:07 +0000 (20:38 +0300)]
net: phy: Add phy_sysfs_create_links helper function

This is a cosmetic patch that wraps the operation of creating sysfs
links between the netdev->phydev and the phydev->attached_dev.

This is needed to keep the indentation level in check in a follow-up
patch where this function will be guarded against the existence of a
phydev->attached_dev.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: Introduce act_ctinfo action
Kevin 'ldir' Darbyshire-Bryant [Tue, 28 May 2019 17:03:50 +0000 (17:03 +0000)]
net: sched: Introduce act_ctinfo action

ctinfo is a new tc filter action module.  It is designed to restore
information contained in firewall conntrack marks to other packet fields
and is typically used on packet ingress paths.  At present it has two
independent sub-functions or operating modes, DSCP restoration mode &
skb mark restoration mode.

The DSCP restore mode:

This mode copies DSCP values that have been placed in the firewall
conntrack mark back into the IPv4/v6 diffserv fields of relevant
packets.

The DSCP restoration is intended for use and has been found useful for
restoring ingress classifications based on egress classifications across
links that bleach or otherwise change DSCP, typically home ISP Internet
links.  Restoring DSCP on ingress on the WAN link allows qdiscs such as
but by no means limited to CAKE to shape inbound packets according to
policies that are easier to set & mark on egress.

Ingress classification is traditionally a challenging task since
iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT
lookups, hence are unable to see internal IPv4 addresses as used on the
typical home masquerading gateway.  Thus marking the connection in some
manner on egress for later restoration of classification on ingress is
easier to implement.

Parameters related to DSCP restore mode:

dscpmask - a 32 bit mask of 6 contiguous bits and indicate bits of the
conntrack mark field contain the DSCP value to be restored.

statemask - a 32 bit mask of (usually) 1 bit length, outside the area
specified by dscpmask.  This represents a conditional operation flag
whereby the DSCP is only restored if the flag is set.  This is useful to
implement a 'one shot' iptables based classification where the
'complicated' iptables rules are only run once to classify the
connection on initial (egress) packet and subsequent packets are all
marked/restored with the same DSCP.  A mask of zero disables the
conditional behaviour ie. the conntrack mark DSCP bits are always
restored to the ip diffserv field (assuming the conntrack entry is found
& the skb is an ipv4/ipv6 type)

e.g. dscpmask 0xfc000000 statemask 0x01000000

|----0xFC----conntrack mark----000000---|
| Bits 31-26 | bit 25 | bit24 |~~~ Bit 0|
| DSCP       | unused | flag  |unused   |
|-----------------------0x01---000000---|
      |                   |
      |                   |
      ---|             Conditional flag
         v             only restore if set
|-ip diffserv-|
| 6 bits      |
|-------------|

The skb mark restore mode (cpmark):

This mode copies the firewall conntrack mark to the skb's mark field.
It is completely the functional equivalent of the existing act_connmark
action with the additional feature of being able to apply a mask to the
restored value.

Parameters related to skb mark restore mode:

mask - a 32 bit mask applied to the firewall conntrack mark to mask out
bits unwanted for restoration.  This can be useful where the conntrack
mark is being used for different purposes by different applications.  If
not specified and by default the whole mark field is copied (i.e.
default mask of 0xffffffff)

e.g. mask 0x00ffffff to mask out the top 8 bits being used by the
aforementioned DSCP restore mode.

|----0x00----conntrack mark----ffffff---|
| Bits 31-24 |                          |
| DSCP & flag|      some value here     |
|---------------------------------------|
|
|
v
|------------skb mark-------------------|
|            |                          |
|  zeroed    |                          |
|---------------------------------------|

Overall parameters:

zone - conntrack zone

control - action related control (reclassify | pipe | drop | continue |
ok | goto chain <CHAIN_INDEX>)

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: remove 1000/Half from supported modes
Heiner Kallweit [Tue, 28 May 2019 16:43:46 +0000 (18:43 +0200)]
r8169: remove 1000/Half from supported modes

MAC on the GBit versions supports 1000/Full only, however the PHY
partially claims to support 1000/Half. So let's explicitly remove
this mode.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: mscc: ocelot: Implement port policers via tc command
Joergen Andreasen [Tue, 28 May 2019 12:49:17 +0000 (14:49 +0200)]
net: mscc: ocelot: Implement port policers via tc command

Hardware offload of matchall classifier and police action are now
supported via the tc command.
Supported police parameters are: rate and burst.

Example:

Add:
tc qdisc add dev eth3 handle ffff: ingress
tc filter add dev eth3 parent ffff: prio 1 handle 2 \
matchall skip_sw \
action police rate 100Mbit burst 10000

Show:
tc -s -d qdisc show dev eth3
tc -s -d filter show dev eth3 ingress

Delete:
tc filter del dev eth3 parent ffff: prio 1
tc qdisc del dev eth3 handle ffff: ingress

Signed-off-by: Joergen Andreasen <joergen.andreasen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agolibbpf: reduce unnecessary line wrapping
Andrii Nakryiko [Wed, 29 May 2019 17:36:11 +0000 (10:36 -0700)]
libbpf: reduce unnecessary line wrapping

There are a bunch of lines of code or comments that are unnecessary
wrapped into multi-lines. Fix that without violating any code
guidelines.

Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agolibbpf: typo and formatting fixes
Andrii Nakryiko [Wed, 29 May 2019 17:36:10 +0000 (10:36 -0700)]
libbpf: typo and formatting fixes

A bunch of typo and formatting fixes.

Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agolibbpf: simplify two pieces of logic
Andrii Nakryiko [Wed, 29 May 2019 17:36:09 +0000 (10:36 -0700)]
libbpf: simplify two pieces of logic

Extra check for type is unnecessary in first case.

Extra zeroing is unnecessary, as snprintf guarantees that it will
zero-terminate string.

Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agolibbpf: use negative fd to specify missing BTF
Andrii Nakryiko [Wed, 29 May 2019 17:36:08 +0000 (10:36 -0700)]
libbpf: use negative fd to specify missing BTF

0 is a valid FD, so it's better to initialize it to -1, as is done in
other places. Also, technically, BTF type ID 0 is valid (it's a VOID
type), so it's more reliable to check btf_fd, instead of
btf_key_type_id, to determine if there is any BTF associated with a map.

Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agolibbpf: fix error code returned on corrupted ELF
Andrii Nakryiko [Wed, 29 May 2019 17:36:07 +0000 (10:36 -0700)]
libbpf: fix error code returned on corrupted ELF

All of libbpf errors are negative, except this one. Fix it.

Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agolibbpf: check map name retrieved from ELF
Andrii Nakryiko [Wed, 29 May 2019 17:36:06 +0000 (10:36 -0700)]
libbpf: check map name retrieved from ELF

Validate there was no error retrieving symbol name corresponding to
a BPF map.

Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agolibbpf: simplify endianness check
Andrii Nakryiko [Wed, 29 May 2019 17:36:05 +0000 (10:36 -0700)]
libbpf: simplify endianness check

Rewrite endianness check to use "more canonical" way, using
compiler-defined macros, similar to few other places in libbpf. It also
is more obvious and shorter.

Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agolibbpf: preserve errno before calling into user callback
Andrii Nakryiko [Wed, 29 May 2019 17:36:04 +0000 (10:36 -0700)]
libbpf: preserve errno before calling into user callback

pr_warning ultimately may call into user-provided callback function,
which can clobber errno value, so we need to save it before that.

Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agolibbpf: fix detection of corrupted BPF instructions section
Andrii Nakryiko [Wed, 29 May 2019 17:36:03 +0000 (10:36 -0700)]
libbpf: fix detection of corrupted BPF instructions section

Ensure that size of a section w/ BPF instruction is exactly a multiple
of BPF instruction size.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Wed, 29 May 2019 21:51:23 +0000 (14:51 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2019-05-29

This series contains updates to ice driver only.

Bruce cleans up white space issues and fixes complaints about using
bitop assignments using operands of different sizes.

Anirudh cleans up code that is no longer needed now that the firmware
supports the functionality.  Adds support for ethtool selftestto the ice
driver, which includes testing link, interrupts, eeprom, registers and
packet loopback.  Also, cleaned up duplicate code.

Tony implements support for toggling receive VLAN filter via ethtool.

Brett bumps up the minimum receive descriptor count per queue to resolve
dropped packets.  Refactored the interrupt tracking for the ice driver
to resolve issues seen with the co-existence of features and SR-IOV, so
instead of having a hardware IRQ tracker and a software IRQ tracker,
simply use one tracker.  Also adds a helper function to trigger software
interrupts.

Mitch changes how Malicious Driver Detection (MDD) events are handled,
to ensure all VFs checked for MDD events and just log the event instead
of disabling the VF, which was preventing proper release of resources if
the VF is rebooted or the VF driver reloaded.

Dave cleans up a redundant call to register LLDP MIB change events.

Dan adds support to retrieve the current setting of firmware logging
from the hardware to properly initialize the hardware structure.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'docs-5.2-fixes2' of git://git.lwn.net/linux
Linus Torvalds [Wed, 29 May 2019 21:36:41 +0000 (14:36 -0700)]
Merge tag 'docs-5.2-fixes2' of git://git.lwn.net/linux

Pull documentation fixes from Jonathan Corbet:
 "The Sphinx 2.0 release contained a few incompatible API changes that
  broke our extensions and, thus, the documentation build in general.
  Who knew that those deprecation warnings it was outputting actually
  meant we should change something? This set of fixes makes the build
  work again with Sphinx 2.0 and eliminates the warnings for 1.8. As
  part of that, we also need a few fixes to the docs for places where
  the new Sphinx is more strict.

  It is a bit late in the cycle for this kind of change, but it does fix
  problems that people are experiencing now.

  There has been some talk of raising the minimum version of Sphinx we
  support. I don't want to do that abruptly, though, so these changes
  add some glue to continue to support versions back to 1.3. We will be
  adding some infrastructure soon to nudge users of old versions
  forward, with the idea of maybe increasing our minimum version (and
  removing this glue) sometime in the future"

* tag 'docs-5.2-fixes2' of git://git.lwn.net/linux:
  drm/i915: Maintain consistent documentation subsection ordering
  scripts/sphinx-pre-install: make it handle Sphinx versions
  docs: Fix conf.py for Sphinx 2.0
  docs: fix multiple doc build warnings in enumeration.rst
  lib/list_sort: fix kerneldoc build error
  docs: fix numaperf.rst and add it to the doc tree
  doc: Cope with the deprecation of AutoReporter
  doc: Cope with Sphinx logging deprecations

5 years agoMerge branch 'net-phy-dp83867-add-some-fixes'
David S. Miller [Wed, 29 May 2019 21:28:48 +0000 (14:28 -0700)]
Merge branch 'net-phy-dp83867-add-some-fixes'

Max Uvarov says:

====================
net: phy: dp83867: add some fixes

v3: use phy_modify_mmd()
v2: fix minor comments by Heiner Kallweit and Florian Fainelli
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: dp83867: Set up RGMII TX delay
Max Uvarov [Tue, 28 May 2019 10:00:52 +0000 (13:00 +0300)]
net: phy: dp83867: Set up RGMII TX delay

PHY_INTERFACE_MODE_RGMII_RXID is less then TXID
so code to set tx delay is never called.

Fixes: 2a10154abcb75 ("net: phy: dp83867: Add TI dp83867 phy")
Signed-off-by: Max Uvarov <muvarov@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: dp83867: do not call config_init twice
Max Uvarov [Tue, 28 May 2019 10:00:51 +0000 (13:00 +0300)]
net: phy: dp83867: do not call config_init twice

Phy state machine calls _config_init just after
reset.

Signed-off-by: Max Uvarov <muvarov@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: dp83867: increase SGMII autoneg timer duration
Max Uvarov [Tue, 28 May 2019 10:00:50 +0000 (13:00 +0300)]
net: phy: dp83867: increase SGMII autoneg timer duration

After reset SGMII Autoneg timer is set to 2us (bits 6 and 5 are 01).
That is not enough to finalize autonegatiation on some devices.
Increase this timer duration to maximum supported 16ms.

Signed-off-by: Max Uvarov <muvarov@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: dp83867: fix speed 10 in sgmii mode
Max Uvarov [Tue, 28 May 2019 10:00:49 +0000 (13:00 +0300)]
net: phy: dp83867: fix speed 10 in sgmii mode

For supporting 10Mps speed in SGMII mode DP83867_10M_SGMII_RATE_ADAPT bit
of DP83867_10M_SGMII_CFG register has to be cleared by software.
That does not affect speeds 100 and 1000 so can be done on init.

Signed-off-by: Max Uvarov <muvarov@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: marvell10g: report if the PHY fails to boot firmware
Russell King [Tue, 28 May 2019 09:34:42 +0000 (10:34 +0100)]
net: phy: marvell10g: report if the PHY fails to boot firmware

Some boards do not have the PHY firmware programmed in the 3310's flash,
which leads to the PHY not working as expected.  Warn the user when the
PHY fails to boot the firmware and refuse to initialise.

Fixes: 20b2af32ff3f ("net: phy: add Marvell Alaska X 88X3310 10Gigabit PHY support")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phylink: ensure consistent phy interface mode
Russell King [Tue, 28 May 2019 09:27:21 +0000 (10:27 +0100)]
net: phylink: ensure consistent phy interface mode

Ensure that we supply the same phy interface mode to mac_link_down() as
we did for the corresponding mac_link_up() call.  This ensures that MAC
drivers that use the phy interface mode in these methods can depend on
mac_link_down() always corresponding to a mac_link_up() call for the
same interface mode.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: Fix build error without CONFIG_INET
YueHaibing [Tue, 28 May 2019 09:10:40 +0000 (17:10 +0800)]
net: stmmac: Fix build error without CONFIG_INET

Fix gcc build error while CONFIG_INET is not set

drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.o: In function `__stmmac_test_loopback':
stmmac_selftests.c:(.text+0x8ec): undefined reference to `ip_send_check'
stmmac_selftests.c:(.text+0xacc): undefined reference to `udp4_hwcsum'

Add CONFIG_INET dependency to fix this.

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 091810dbded9 ("net: stmmac: Introduce selftests support")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorhashtable: Add rht_ptr_rcu and improve rht_ptr
Herbert Xu [Tue, 28 May 2019 07:02:31 +0000 (15:02 +0800)]
rhashtable: Add rht_ptr_rcu and improve rht_ptr

This patch moves common code between rht_ptr and rht_ptr_exclusive
into __rht_ptr.  It also adds a new helper rht_ptr_rcu exclusively
for the RCU case.  This way rht_ptr becomes a lock-only construct
so we can use the lighter rcu_dereference_protected primitive.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: use dev_info() before netdev is registered
Jisheng Zhang [Tue, 28 May 2019 07:02:07 +0000 (07:02 +0000)]
net: stmmac: use dev_info() before netdev is registered

Before the netdev is registered, calling netdev_info() will emit
something as "(unnamed net device) (uninitialized)", looks confusing.

Before this patch:
[    3.155028] stmmaceth f7b60000.ethernet (unnamed net_device) (uninitialized): device MAC address 52:1a:55:18:9e:9d

After this patch:
[    3.155028] stmmaceth f7b60000.ethernet: device MAC address 52:1a:55:18:9e:9d

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: fix spelling mistake "inculde" -> "include"
Colin Ian King [Tue, 28 May 2019 06:52:17 +0000 (07:52 +0100)]
qed: fix spelling mistake "inculde" -> "include"

There is a spelling mistake in a DP_INFO message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sh_eth: fix mdio access in sh_eth_close() for R-Car Gen2 and RZ/A1 SoCs
Yoshihiro Shimoda [Tue, 28 May 2019 04:10:46 +0000 (13:10 +0900)]
net: sh_eth: fix mdio access in sh_eth_close() for R-Car Gen2 and RZ/A1 SoCs

The sh_eth_close() resets the MAC and then calls phy_stop()
so that mdio read access result is incorrect without any error
according to kernel trace like below:

ifconfig-216   [003] .n..   109.133124: mdio_access: ee700000.ethernet-ffffffff read  phy:0x01 reg:0x00 val:0xffff

According to the hardware manual, the RMII mode should be set to 1
before operation the Ethernet MAC. However, the previous code was not
set to 1 after the driver issued the soft_reset in sh_eth_dev_exit()
so that the mdio read access result seemed incorrect. To fix the issue,
this patch adds a condition and set the RMII mode register in
sh_eth_dev_exit() for R-Car Gen2 and RZ/A1 SoCs.

Note that when I have tried to move the sh_eth_dev_exit() calling
after phy_stop() on sh_eth_close(), but it gets worse (kernel panic
happened and it seems that a register is accessed while the clock is
off).

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'linux-kselftest-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 29 May 2019 20:20:02 +0000 (13:20 -0700)]
Merge tag 'linux-kselftest-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest fixes from Shuah Khan:

 - Alexandre Belloni's fixes to rtc regressions introduced in kselftest
   Makefile test run output refactoring work from Kees Cook.

 - ftrace test checkbashisms fixes from Masami Hiramatsu

* tag 'linux-kselftest-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: rtc: rtctest: specify timeouts
  selftests/harness: Allow test to configure timeout
  selftests/ftrace: Add checkbashisms meta-testcase
  selftests/ftrace: Make a script checkbashisms clean