asedeno.scripts.mit.edu Git

Merge branch 'bpf-libbpf-helpers'

Andrii Nakryiko says:

====================
This patch set makes bpf_helpers.h and bpf_endian.h a part of libbpf itself
for consumption by user BPF programs, not just selftests. It also splits off
tracing helpers into bpf_tracing.h, which also becomes part of libbpf. Some of
the legacy stuff (BPF_ANNOTATE_KV_PAIR, load_{byte,half,word}, bpf_map_def
with unsupported fields, etc, is extracted into selftests-only bpf_legacy.h.
All the selftests and samples are switched to use libbpf's headers and
selftests' ones are removed.

As part of this patch set we also add BPF_CORE_READ variadic macros, that are
simplifying BPF CO-RE reads, especially the ones that have to follow few
pointers. E.g., what in non-BPF world (and when using BCC) would be:

int x = s->a->b.c->d; /* s, a, and b.c are pointers */

Today would have to be written using explicit bpf_probe_read() calls as:

  void *t;
  int x;
  bpf_probe_read(&t, sizeof(t), s->a);
  bpf_probe_read(&t, sizeof(t), ((struct b *)t)->b.c);
  bpf_probe_read(&x, sizeof(x), ((struct c *)t)->d);

This is super inconvenient and distracts from program logic a lot. Now, with
added BPF_CORE_READ() macros, you can write the above as:

  int x = BPF_CORE_READ(s, a, b.c, d);

Up to 9 levels of pointer chasing are supported, which should be enough for
any practical purpose, hopefully, without adding too much boilerplate macro
definitions (though there is admittedly some, given how variadic and recursive
C macro have to be implemented).

There is also BPF_CORE_READ_INTO() variant, which relies on caller to allocate
space for result:

  int x;
  BPF_CORE_READ_INTO(&x, s, a, b.c, d);

Result of last bpf_probe_read() call in the chain of calls is the result of
BPF_CORE_READ_INTO(). If any intermediate bpf_probe_read() aall fails, then
all the subsequent ones will fail too, so this is sufficient to know whether
overall "operation" succeeded or not. No short-circuiting of bpf_probe_read()s
is done, though.

BPF_CORE_READ_STR_INTO() is added as well, which differs from
BPF_CORE_READ_INTO() only in that last bpf_probe_read() call (to read final
field after chasing pointers) is replaced with bpf_probe_read_str(). Result of
bpf_probe_read_str() is returned as a result of BPF_CORE_READ_STR_INTO() macro
itself, so that applications can track return code and/or length of read
string.

Patch set outline:
- patch #1 undoes previously added GCC-specific bpf-helpers.h include;
- patch #2 splits off legacy stuff we don't want to carry over;
- patch #3 adjusts CO-RE reloc tests to avoid subsequent naming conflict with
  BPF_CORE_READ;
- patch #4 splits off bpf_tracing.h;
- patch #5 moves bpf_{helpers,endian,tracing}.h and bpf_helper_defs.h
  generation into libbpf and adjusts Makefiles to include libbpf for header
  search;
- patch #6 adds variadic BPF_CORE_READ() macro family, as described above;
- patch #7 adds tests to verify all possible levels of pointer nestedness for
  BPF_CORE_READ(), as well as correctness test for BPF_CORE_READ_STR_INTO().

v4->v5:
- move BPF_CORE_READ() stuff into bpf_core_read.h header (Alexei);

v3->v4:
- rebase on latest bpf-next master;
- bpf_helper_defs.h generation is moved into libbpf's Makefile;

v2->v3:
- small formatting fixes and macro () fixes (Song);

v1->v2:
- fix CO-RE reloc tests before bpf_helpers.h move (Song);
- split off legacy stuff we don't want to carry over (Daniel, Toke);
- split off bpf_tracing.h (Daniel);
- fix samples/bpf build (assuming other fixes are applied);
- switch remaining maps either to bpf_map_def_legacy or BTF-defined maps;
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

selftests/bpf: Add BPF_CORE_READ and BPF_CORE_READ_STR_INTO macro tests

Validate BPF_CORE_READ correctness and handling of up to 9 levels of
nestedness using cyclic task->(group_leader->)*->tgid chains.

Also add a test of maximum-dpeth BPF_CORE_READ_STR_INTO() macro.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20191008175942.1769476-8-andriin@fb.com

libbpf: Add BPF_CORE_READ/BPF_CORE_READ_INTO helpers

Add few macros simplifying BCC-like multi-level probe reads, while also
emitting CO-RE relocations for each read.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20191008175942.1769476-7-andriin@fb.com

libbpf: Move bpf_{helpers, helper_defs, endian, tracing}.h into libbpf

Move bpf_helpers.h, bpf_tracing.h, and bpf_endian.h into libbpf. Move
bpf_helper_defs.h generation into libbpf's Makefile. Ensure all those
headers are installed along the other libbpf headers. Also, adjust
selftests and samples include path to include libbpf now.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20191008175942.1769476-6-andriin@fb.com

selftests/bpf: Split off tracing-only helpers into bpf_tracing.h

Split-off PT_REGS-related helpers into bpf_tracing.h header. Adjust
selftests and samples to include it where necessary.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20191008175942.1769476-5-andriin@fb.com

selftests/bpf: Adjust CO-RE reloc tests for new bpf_core_read() macro

To allow adding a variadic BPF_CORE_READ macro with slightly different
syntax and semantics, define CORE_READ in CO-RE reloc tests, which is
a thin wrapper around low-level bpf_core_read() macro, which in turn is
just a wrapper around bpf_probe_read().

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20191008175942.1769476-4-andriin@fb.com

selftests/bpf: samples/bpf: Split off legacy stuff from bpf_helpers.h

Split off few legacy things from bpf_helpers.h into separate
bpf_legacy.h file:
- load_{byte|half|word};
- remove extra inner_idx and numa_node fields from bpf_map_def and
introduce bpf_map_def_legacy for use in samples;
- move BPF_ANNOTATE_KV_PAIR into bpf_legacy.h.

Adjust samples and selftests accordingly by either including
bpf_legacy.h and using bpf_map_def_legacy, or switching to BTF-defined
maps altogether.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20191008175942.1769476-3-andriin@fb.com

selftests/bpf: Undo GCC-specific bpf_helpers.h changes

Having GCC provide its own bpf-helper.h is not the right approach and is
going to be changed. Undo bpf_helpers.h change before moving
bpf_helpers.h into libbpf.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20191008175942.1769476-2-andriin@fb.com

tun: fix memory leak in error path

syzbot reported a warning [1] that triggered after recent Jiri patch.

This exposes a bug that we hit already in the past (see commit
ff244c6b29b1 ("tun: handle register_netdevice() failures properly")
for details)

tun uses priv->destructor without an ndo_init() method.

register_netdevice() can return an error, but will
not call priv->destructor() in some cases. Jiri recent
patch added one more.

A long term fix would be to transfer the initialization
of what we destroy in ->destructor() in the ndo_init()

This looks a bit risky given the complexity of tun driver.

A simpler fix is to detect after the failed register_netdevice()
if the tun_free_netdev() function was called already.

[1]
ODEBUG: free active (active state 0) object type: timer_list hint: tun_flow_cleanup+0x0/0x280 drivers/net/tun.c:457
WARNING: CPU: 0 PID: 8653 at lib/debugobjects.c:481 debug_print_object+0x168/0x250 lib/debugobjects.c:481
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 8653 Comm: syz-executor976 Not tainted 5.4.0-rc1-next-20191004 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
panic+0x2dc/0x755 kernel/panic.c:220
__warn.cold+0x2f/0x3c kernel/panic.c:581
report_bug+0x289/0x300 lib/bug.c:195
fixup_bug arch/x86/kernel/traps.c:174 [inline]
fixup_bug arch/x86/kernel/traps.c:169 [inline]
do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267
do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286
invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1028
RIP: 0010:debug_print_object+0x168/0x250 lib/debugobjects.c:481
Code: dd 80 b9 e6 87 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 b5 00 00 00 48 8b 14 dd 80 b9 e6 87 48 c7 c7 e0 ae e6 87 e8 80 84 ff fd <0f> 0b 83 05 e3 ee 80 06 01 48 83 c4 20 5b 41 5c 41 5d 41 5e 5d c3
RSP: 0018:ffff888095997a28 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff815cb526 RDI: ffffed1012b32f37
RBP: ffff888095997a68 R08: ffff8880a92ac580 R09: ffffed1015d04101
R10: ffffed1015d04100 R11: ffff8880ae820807 R12: 0000000000000001
R13: ffffffff88fb5340 R14: ffffffff81627110 R15: ffff8880aa41eab8
__debug_check_no_obj_freed lib/debugobjects.c:963 [inline]
debug_check_no_obj_freed+0x2d4/0x43f lib/debugobjects.c:994
kfree+0xf8/0x2c0 mm/slab.c:3755
kvfree+0x61/0x70 mm/util.c:593
netdev_freemem net/core/dev.c:9384 [inline]
free_netdev+0x39d/0x450 net/core/dev.c:9533
tun_set_iff drivers/net/tun.c:2871 [inline]
__tun_chr_ioctl+0x317b/0x3f30 drivers/net/tun.c:3075
tun_chr_ioctl+0x2b/0x40 drivers/net/tun.c:3355
vfs_ioctl fs/ioctl.c:47 [inline]
file_ioctl fs/ioctl.c:539 [inline]
do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:726
ksys_ioctl+0xab/0xd0 fs/ioctl.c:743
__do_sys_ioctl fs/ioctl.c:750 [inline]
__se_sys_ioctl fs/ioctl.c:748 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:748
do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x441439
Code: e8 9c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff61c37438 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441439
RDX: 0000000020000400 RSI: 00000000400454ca RDI: 0000000000000004
RBP: 00007fff61c37470 R08: 0000000000000001 R09: 0000000100000000
R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffffff
R13: 0000000000000005 R14: 0000000000000000 R15: 0000000000000000
Kernel Offset: disabled
Rebooting in 86400 seconds..

Fixes: ff92741270bf ("net: introduce name_node struct to be used in hashlist")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

netdevsim: fix spelling mistake "forbidded" -> "forbid"

There is a spelling mistake in a NL_SET_ERR_MSG_MOD message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

net: phy: mscc: make arrays static, makes object smaller

Don't populate const arrays on the stack but instead make them
static. Makes the object code smaller by 1058 bytes.

Before:
   text    data     bss     dec     hex filename
  29879    6144       0   36023    8cb7 drivers/net/phy/mscc.o

After:
   text    data     bss     dec     hex filename
  28437    6528       0   34965    8895 drivers/net/phy/mscc.o

(gcc version 9.2.1, amd64)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

nfp: bpf: make array exp_mask static, makes object smaller

Don't populate the array exp_mask on the stack but instead make it
static. Makes the object code smaller by 224 bytes.

Before:
   text    data     bss     dec     hex filename
  77832    2290       0   80122   138fa ethernet/netronome/nfp/bpf/jit.o

After:
   text    data     bss     dec     hex filename
  77544    2354       0   79898   1381a ethernet/netronome/nfp/bpf/jit.o

(gcc version 9.2.1, amd64)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

spi: Add a PTP system timestamp to the transfer structure

SPI is one of the interfaces used to access devices which have a POSIX
clock driver (real time clocks, 1588 timers etc). The fact that the SPI
bus is slow is not what the main problem is, but rather the fact that
drivers don't take a constant amount of time in transferring data over
SPI. When there is a high delay in the readout of time, there will be
uncertainty in the value that has been read out of the peripheral.
When that delay is constant, the uncertainty can at least be
approximated with a certain accuracy which is fine more often than not.

Timing jitter occurs all over in the kernel code, and is mainly caused
by having to let go of the CPU for various reasons such as preemption,
servicing interrupts, going to sleep, etc. Another major reason is CPU
dynamic frequency scaling.

It turns out that the problem of retrieving time from a SPI peripheral
with high accuracy can be solved by the use of "PTP system
timestamping" - a mechanism to correlate the time when the device has
snapshotted its internal time counter with the Linux system time at that
same moment. This is sufficient for having a precise time measurement -
it is not necessary for the whole SPI transfer to be transmitted "as
fast as possible", or "as low-jitter as possible". The system has to be
low-jitter for a very short amount of time to be effective.

This patch introduces a PTP system timestamping mechanism in struct
spi_transfer. This is to be used by SPI device drivers when they need to
know the exact time at which the underlying device's time was
snapshotted. More often than not, SPI peripherals have a very exact
timing for when their SPI-to-interconnect bridge issues a transaction
for snapshotting and reading the time register, and that will be
dependent on when the SPI-to-interconnect bridge figures out that this
is what it should do, aka as soon as it sees byte N of the SPI transfer.
Since spi_device drivers are the ones who'd know best how the peripheral
behaves in this regard, expose a mechanism in spi_transfer which allows
them to specify which word (or word range) from the transfer should be
timestamped.

Add a default implementation of the PTP system timestamping in the SPI
core. This is not going to be satisfactory performance-wise, but should
at least increase the likelihood that SPI device drivers will use PTP
system timestamping in the future.
There are 3 entry points from the core towards the SPI controller
drivers:

- transfer_one: The driver is passed individual spi_transfers to
  execute. This is the easiest to timestamp.

- transfer_one_message: The core passes the driver an entire spi_message
  (a potential batch of spi_transfers). The core puts the same pre and
  post timestamp to all transfers within a message. This is not ideal,
  but nothing better can be done by default anyway, since the core has
  no insight into how the driver batches the transfers.

- transfer: Like transfer_one_message, but for unqueued drivers (i.e.
  the driver implements its own queue scheduling).

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20190905010114.26718-3-olteanv@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

samples: bpf: Add max_pckt_size option at xdp_adjust_tail

Currently, at xdp_adjust_tail_kern.c, MAX_PCKT_SIZE is limited
to 600. To make this size flexible, static global variable
'max_pcktsz' is added.

By updating new packet size from the user space, xdp_adjust_tail_kern.o
will use this value as a new max packet size.

This static global variable can be accesible from .data section with
bpf_object__find_map* from user space, since it is considered as
internal map (accessible with .bss/.data/.rodata suffix).

If no '-P <MAX_PCKT_SIZE>' option is used, the size of maximum packet
will be 600 as a default.

For clarity, change the helper to fetch map from 'bpf_map__next'
to 'bpf_object__find_map_fd_by_name'. Also, changed the way to
test prog_fd, map_fd from '!= 0' to '< 0', since fd could be 0
when stdin is closed.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191007172117.3916-1-danieltimlee@gmail.com

Merge branch 'enforce-global-flow-dissector'

Stanislav Fomichev says:

====================
While having a per-net-ns flow dissector programs is convenient for
testing, security-wise it's better to have only one vetted global
flow dissector implementation.

Let's have a convention that when BPF flow dissector is installed
in the root namespace, child namespaces can't override it.

The intended use-case is to attach global BPF flow dissector
early from the init scripts/systemd. Attaching global dissector
is prohibited if some non-root namespace already has flow dissector
attached. Also, attaching to non-root namespace is prohibited
when there is flow dissector attached to the root namespace.

v3:
* drop extra check and empty line (Andrii Nakryiko)

v2:
* EPERM -> EEXIST (Song Liu)
* Make sure we don't have dissector attached to non-root namespaces
when attaching the global one (Andrii Nakryiko)
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: add test for BPF flow dissector in the root namespace

Make sure non-root namespaces get an error if root flow dissector is
attached.

Cc: Petar Penkov <ppenkov@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf/flow_dissector: add mode to enforce global BPF flow dissector

Always use init_net flow dissector BPF program if it's attached and fall
back to the per-net namespace one. Also, deny installing new programs if
there is already one attached to the root namespace.
Users can still detach their BPF programs, but can't attach any
new ones (-EEXIST).

Cc: Petar Penkov <ppenkov@google.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

samples/bpf: Trivial - fix spelling mistake in usage

Fix spelling mistake.

Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20191007082636.14686-1-anton.ivanov@cambridgegreys.com

bpftool: Fix bpftool build by switching to bpf_object__open_file()

As part of libbpf in 5e61f2707029 ("libbpf: stop enforcing kern_version,
populate it for users") non-LIBBPF_API __bpf_object__open_xattr() API
was removed from libbpf.h header. This broke bpftool, which relied on
that function. This patch fixes the build by switching to newly added
bpf_object__open_file() which provides the same capabilities, but is
official and future-proof API.

v1->v2:
- fix prog_type shadowing (Stanislav).

Fixes: 5e61f2707029 ("libbpf: stop enforcing kern_version, populate it for users")
Reported-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20191007225604.2006146-1-andriin@fb.com

selftests/bpf: Fix dependency ordering for attach_probe test

Current Makefile dependency chain is not strict enough and allows
test_attach_probe.o to be built before test_progs's
prog_test/attach_probe.o is built, which leads to assembler complaining
about missing included binary.

This patch is a minimal fix to fix this issue by enforcing that
test_attach_probe.o (BPF object file) is built before
prog_tests/attach_probe.c is attempted to be compiled.

Fixes: 928ca75e59d7 ("selftests/bpf: switch tests to new bpf_object__open_{file, mem}() APIs")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191007204149.1575990-1-andriin@fb.com

Merge branch 'dpaa2-eth-misc-cleanup'

Ioana Ciornei says:

====================
dpaa2-eth: misc cleanup

This patch set consists of some cleanup patches ranging from removing dead
code to fixing a minor issue in ethtool stats. Also, unbounded while loops
are removed from the driver by adding a maximum number of retries for DPIO
portal commands.

Changes in v2:
- return -ETIMEDOUT where possible if the number of retries is hit
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-eth: Avoid unbounded while loops

Throughout the driver there are several places where we wait
indefinitely for DPIO portal commands to be executed, while
the portal returns a busy response code.

Even though in theory we are guaranteed the portals become
available eventually, in practice the QBMan hardware module
may become unresponsive in various corner cases.

Make sure we can never get stuck in an infinite while loop
by adding a retry counter for all portal commands.

Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-eth: Fix minor bug in ethtool stats reporting

Don't print error message for a successful return value.

Fixes: d84c3a4ded96 ("dpaa2-eth: Add new DPNI statistics counters")
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-eth: Cleanup dead code

Remove one function call whose result was not used anywhere.

Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: make array tick_array static, makes object smaller

Don't populate the array tick_array on the stack but instead make it
static. Makes the object code smaller by 29 bytes.

Before:
   text    data     bss     dec     hex filename
  19191     432       0   19623    4ca7 hisilicon/hns3/hns3pf/hclge_tm.o

After:
   text    data     bss     dec     hex filename
  19098     496       0   19594    4c8a hisilicon/hns3/hns3pf/hclge_tm.o

(gcc version 9.2.1, amd64)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns: make arrays static, makes object smaller

Don't populate the arrays port_map and sl_map on the stack but
instead make them static. Makes the object code smaller by 64 bytes.

Before:
   text    data     bss     dec     hex filename
  49575    6872      64   56511    dcbf hisilicon/hns/hns_dsaf_main.o

After:
   text    data     bss     dec     hex filename
  49350    7032      64   56446    dc7e hisilicon/hns/hns_dsaf_main.o

(gcc version 9.2.1, amd64)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-tls-minor-micro-optimizations'

Jakub Kicinski says:

====================
net/tls: minor micro optimizations

This set brings a number of minor code changes from my tree which
don't have a noticeable impact on performance but seem reasonable
nonetheless.

First sk_msg_sg copy array is converted to a bitmap, zeroing that
structure takes a lot of time, hence we should try to keep it
small.

Next two conditions are marked as unlikely, GCC seemed to had
little trouble correctly reasoning about those.

Patch 4 adds parameters to tls_device_decrypted() to avoid
walking structures, as all callers already have the relevant
pointers.

Lastly two boolean members of TLS context structures are
converted to a bitfield.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: store decrypted on a single bit

Use a single bit instead of boolean to remember if packet
was already decrypted.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: store async_capable on a single bit

Store async_capable on a single bit instead of a full integer
to save space.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: pass context to tls_device_decrypted()

Avoid unnecessary pointer chasing and calculations, callers already
have most of the state tls_device_decrypted() needs.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: make allocation failure unlikely

Make sure GCC realizes it's unlikely that allocations will fail.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: mark sk->err being set as unlikely

Tell GCC sk->err is not likely to be set.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sockmap: use bitmap for copy info

Don't use bool array in struct sk_msg_sg, save 12 bytes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: core: use helper skb_ensure_writable in more places

Use helper skb_ensure_writable in two more places to simplify the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Make ipv6_mc_may_pull() return bool.

Consistent with how pskb_may_pull() also now does so.

Signed-off-by: David S. Miller <davem@davemloft.net>

net: core: change return type of pskb_may_pull to bool

This function de-facto returns a bool, so let's change the return type
accordingly.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ena-set_channels'

Sameeh Jubran says:

====================
ena: Support ethtool set_channels

Difference from v2:
* ethtool's set/get channels: Switched to using combined instead of
separate rx/tx
* Fixed error handling in set_channels
* Fixed indentation and cosmetic issues as requested by Jakub Kicinski

Difference from v1:
* Dropped the print from patch 0002 - "net: ena: multiple queue creation
related cleanups" as requested by David Miller
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: ethtool: support set_channels callback

Set channels callback enables the user to change the count of queues
used by the driver using ethtool. We decided to currently support only
equal number of rx and tx queues, this might change in the future.

Also rename dev_up to dev_was_up in ena_update_queue_count() to make
it clearer.

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: remove redundant print of number of queues

The number of queues can be derived using ethtool, no need to print
it in ena_probe()

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: make ethtool -l show correct max number of queues

- Update ena_ethtool:ena_get_channels() to return adapter->max_io_queues
  so that ethtool -l returns the correct maximum queue number.

- Change the name of ena_calc_io_queue_num() to
  ena_calc_max_io_queue_num() as it returns the maximum number of io
  queues and actual number of queues can be smaller if changed
  by ethtool -L which is implemented in a later commit.

- Change variable name from io_queue_num to max_num_io_queues in
  ena_calc_max_io_queue_num() and ena_probe().

- Make all types of variables that convey the number and sizeof queues
  to be u32, for consistency with the API between the driver and the
  device.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: ethtool: get_channels: use combined only

Since we use the same IRQ and NAPI to service RX and TX then we need to
use a combined channel instead of rx and tx channels.

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: multiple queue creation related cleanups

- Rename ena_calc_queue_size() to ena_calc_io_queue_size() for clarity
  and consistency
- Remove redundant number of io queues parameter in functions
  ena_enable_msix() and ena_enable_msix_and_set_admin_interrupts(),
  which already get adapter parameter, so use adapter->num_io_queues
  in the function instead.
- Use the local variable ena_dev instead of ctx->ena_dev in
  ena_calc_io_queue_size
- Fix multi row comment alignments

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: change num_queues to num_io_queues for clarity and consistency

Most places in the code refer to the IO queues as io_queues and not
simply queues. Examples - max_io_queues_per_vf, ENA_MAX_NUM_IO_QUEUES,
ena_destroy_all_io_queues() etc..

We are also adding the new max_num_io_queues field to struct ena_adapter
in the following commit.

The changes included in this commit are:
struct ena_adapter->num_queues => struct ena_adapter->num_io_queues

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'samples-pktgen-allow-to-specify-destination-IP-range'

Daniel T. Lee says:

====================
samples: pktgen: allow to specify destination IP range

Currently, pktgen script supports specify destination port range.

To further extend the capabilities, this commit allows to specify destination
IP range with CIDR when running pktgen script.

Specifying destination IP range will be useful on various situation such as
testing RSS/RPS with randomizing n-tuple.

This patchset fixes the problem with checking the command result on proc_cmd,
and add feature to allow destination IP range.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

samples: pktgen: allow to specify destination IP range (CIDR)

Currently, kernel pktgen has the feature to specify destination
address range for sending packet. (e.g. pgset "dst_min/dst_max")

But on samples, each pktgen script doesn't have any option to achieve this.

This commit adds the feature to specify the destination address range with CIDR.

    -d : ($DEST_IP)   destination IP. CIDR (e.g. 198.18.0.0/15) is also allowed

    # ./pktgen_sample01_simple.sh -6 -d fe80::20/126 -p 3000 -n 4
    # tcpdump ip6 and udp
    05:14:18.082285 IP6 fe80::99.71 > fe80::23.3000: UDP, length 16
    05:14:18.082564 IP6 fe80::99.43 > fe80::23.3000: UDP, length 16
    05:14:18.083366 IP6 fe80::99.107 > fe80::22.3000: UDP, length 16
    05:14:18.083585 IP6 fe80::99.97 > fe80::21.3000: UDP, length 16

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

samples: pktgen: add helper functions for IP(v4/v6) CIDR parsing

This commit adds CIDR parsing and IP validate helper function to parse
single IP or range of IP with CIDR. (e.g. 198.18.0.0/15)

Validating the address should be preceded prior to the parsing.
Helpers will be used in prior to set target address in samples/pktgen.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

samples: pktgen: fix proc_cmd command result check logic

Currently, proc_cmd is used to dispatch command to 'pg_ctrl', 'pg_thread',
'pg_set'. proc_cmd is designed to check command result with grep the
"Result:", but this might fail since this string is only shown in
'pg_thread' and 'pg_set'.

This commit fixes this logic by grep-ing the "Result:" string only when
the command is not for 'pg_ctrl'.

For clarity of an execution flow, 'errexit' flag has been set.

To cleanup pktgen on exit, trap has been added for EXIT signal.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

samples: pktgen: make variable consistent with option

This commit changes variable names that can cause confusion.

For example, variable DST_MIN is quite confusing since the
keyword 'udp_dst_min' and keyword 'dst_min' is used with pg_ctrl.

On the following commit, 'dst_min' will be used to set destination IP,
and the existing variable name DST_MIN should be changed.

Variable names are matched to the exact keyword used with pg_ctrl.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'netdevsim-implement-devlink-dev_info-op'

Jiri Pirko says:

====================
netdevsim: implement devlink dev_info op

Initial implementation of devlink dev_info op - just driver name is
filled up and sent to user. Bundled with selftest.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: add netdevsim devlink dev info test

Add test to verify netdevsim driver name returned by devlink dev info.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevsim: implement devlink dev_info op

Do simple dev_info devlink operation implementation which only fills up
the driver name.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: devlink: fix reporter dump dumpit

In order for attrs to be prepared for reporter dump dumpit callback,
set GENL_DONT_VALIDATE_DUMP_STRICT instead of GENL_DONT_VALIDATE_DUMP.

Fixes: ee85da535fe3 ("devlink: have genetlink code to parse the attrs during dumpit"
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'autogen-bpf-helpers'

Andrii Nakryiko says:

====================
This patch set adds ability to auto-generate list of BPF helper definitions.
It relies on existing scripts/bpf_helpers_doc.py and include/uapi/linux/bpf.h
having a well-defined set of comments. bpf_helper_defs.h contains all BPF
helper signatures which stay in sync with latest bpf.h UAPI. This
auto-generated header is included from bpf_helpers.h, while all previously
hand-written BPF helper definitions are simultaneously removed in patch #3.
The end result is less manually maintained and redundant boilerplate code,
while also more consistent and well-documented set of BPF helpers. Generated
helper definitions are completely independent from a specific bpf.h on
a target system, because it doesn't use BPF_FUNC_xxx enums.

v3->v4:
- instead of libbpf's Makefile, integrate with selftest/bpf's Makefile (Alexei);

v2->v3:
- delete bpf_helper_defs.h properly (Alexei);

v1->v2:
- add bpf_helper_defs.h to .gitignore and `make clean` (Alexei).
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: auto-generate list of BPF helper definitions

Get rid of list of BPF helpers in bpf_helpers.h (irony...) and
auto-generate it into bpf_helpers_defs.h, which is now included from
bpf_helpers.h.

Suggested-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

scripts/bpf: teach bpf_helpers_doc.py to dump BPF helper definitions

Enhance scripts/bpf_helpers_doc.py to emit C header with BPF helper
definitions (to be included from libbpf's bpf_helpers.h).

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

uapi/bpf: fix helper docs

Various small fixes to BPF helper documentation comments, enabling
automatic header generation with a list of BPF helpers.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'stmmac-next'

Jose Abreu says:

====================
net: stmmac: Improvements for -next

Improvements for -next. More info in commit logs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Implement L3/L4 Filters in GMAC4+

GMAC4+ cores support Layer 3 and Layer 4 filtering. Add the
corresponding callbacks in these cores.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: selftests: Add tests for VLAN Perfect Filtering

Add two new tests for VLAN Perfect Filtering. While at it, increase a
little bit the tests strings lenght so that we can have more descriptive
test names.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Fallback to VLAN Perfect filtering if HASH is not available

If VLAN Hash Filtering is not available we can fallback to perfect
filtering instead. Let's implement this in XGMAC and GMAC cores and let
the user use this filter.

VLAN VID=0 always passes filter so we check if more than 2 VLANs are
created and return proper error code if so because perfect filtering
only supports 1 VID at a time.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfc: s3fwrn5: fix platform_no_drv_owner.cocci warning

Remove .owner field if calls are used which set it automatically
Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfc: nfcmrvl: fix platform_no_drv_owner.cocci warning

Remove .owner field if calls are used which set it automatically
Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: ksz9477: fix platform_no_drv_owner.cocci warning

Remove .owner field if calls are used which set it automatically
Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/rds: Add missing include file

Fix build error:

net/rds/ib_cm.c: In function rds_dma_hdrs_alloc:
net/rds/ib_cm.c:475:13: error: implicit declaration of function dma_pool_zalloc; did you mean mempool_alloc? [-Werror=implicit-function-declaration]
   hdrs[i] = dma_pool_zalloc(pool, GFP_KERNEL, &hdr_daddrs[i]);
             ^~~~~~~~~~~~~~~
             mempool_alloc

net/rds/ib.c: In function rds_ib_dev_free:
net/rds/ib.c:111:3: error: implicit declaration of function dma_pool_destroy; did you mean mempool_destroy? [-Werror=implicit-function-declaration]
   dma_pool_destroy(rds_ibdev->rid_hdrs_pool);
   ^~~~~~~~~~~~~~~~
   mempool_destroy

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 9b17f5884be4 ("net/rds: Use DMA memory pool allocation for rds_header")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-Query-number-of-modules-from-firmware'

Ido Schimmel says:

====================
mlxsw: Query number of modules from firmware

Vadim says:

The patchset adds support for a new field "num_of_modules" of Management
General Peripheral Information Register (MGPIR), providing the maximum
number of QSFP modules, which can be supported by the system.

It allows to obtain the number of QSFP modules directly from this field,
as a static data, instead of old method of getting this info through
"network port to QSFP module" mapping. With the old method, in case of
port dynamic re-configuration some modules can logically "disappear" as
a result of port split operations, which can cause some modules to
appear missing.

Such scenario can happen on a system equipped with a BMC card, while PCI
chip driver at host CPU side can perform some ports "split" or "unsplit"
operations, while BMC side I2C chip driver reads the "port-to-module"
mapping.

Add common API for FW "minor" and "subminor" versions validation and
share it between PCI and I2C based drivers.

Add FW version validation for "minimal" driver, because use of new field
"num_of_modules" in MGPIR register is not backward compatible.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: minimal: Add validation for FW version

Add validation for FW version in order to prevent driver initialization
in case FW version is older than expected. FW version validation is
necessary, because use of a new field 'num_of_modules' in MGPIR register
is not backward compatible. FW 'minor' and 'subminor' versions are
expected to be greater than or equal to 2000 and 1886, respectively.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: core: Push minor/subminor fw version check into helper

Add new API for FW "minor" and "subminor" version validation for
sharing it between "spectrum" and "minimal" drivers.
Use it in "spectrum" driver.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: thermal: Provide optimization for QSFP modules number detection

Use new field "num_of_modules" of MGPIR register for "thermal" interface
in order to get the number of modules supported by system directly from
the system configuration, instead of getting it from port to module
mapping info.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: hwmon: Provide optimization for QSFP modules number detection

Use new field "num_of_modules" of MGPIR register for "hwmon" interface
in order to get the number of modules supported by system directly from
the system configuration, instead of getting it from port to module
mapping info.

Reading this info through MGPIR register is faster and does not depend
on possible dynamic re-configuration of ports.
In case of port dynamic re-configuration some modules can logically
"disappear" as a result of port split and un-spilt operations, which
can cause missing of some modules, in case this info is taken from port
to module mapping info.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Extend MGPIR register with new field exposing the number of QSFP modules

Extend MGPIR - Management General Peripheral Information Register
with new field "num_of_modules" exposing the number of modules
supported by specific system.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'netdevsim-allow-to-test-reload-failures'

Jiri Pirko says:

====================
netdevsim: allow to test reload failures

Allow user to test devlink reload failures: Fail to reload and fail
during reload.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: test netdevsim reload forbid and fail

Extend netdevsim reload test by simulation of failures.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevsim: add couple of debugfs bools to debug devlink reload

Add flag to disallow reload and another one that causes reload to
always fail.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-genetlink-parse-attrs-for-dumpit-callback'

Jiri Pirko says:

====================
net: genetlink: parse attrs for dumpit() callback

In generic netlink, parsing attributes for doit() callback is already
implemented. They are available in info->attrs.

For dumpit() however, each user which is interested in attributes have to
parse it manually. Even though the attributes may be (depending on flag)
already validated (by parse function).

Make usage of attributes in dumpit() more convenient and prepare
info->attrs too.

Patchset also make the existing users of genl_family_attrbuf() converted
to use info->attrs and removes the helper.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: have genetlink code to parse the attrs during dumpit

Benefit from the fact that the generic netlink code can parse the attrs
for dumpit op and avoid need to parse it in the op callback.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: genetlink: remove unused genl_family_attrbuf()

genl_family_attrbuf() function is no longer used by anyone, so remove it.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: tipc: allocate attrs locally instead of using genl_family_attrbuf in compat_dumpit()

As this is the last user of genl_family_attrbuf, convert to allocate
attrs locally and do it in a similar way this is done in compat_doit().

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: tipc: have genetlink code to parse the attrs during dumpit

Benefit from the fact that the generic netlink code can parse the attrs
for dumpit op and avoid need to parse it in the op callback.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: nfc: have genetlink code to parse the attrs during dumpit

Benefit from the fact that the generic netlink code can parse the attrs
for dumpit op and avoid need to parse it in the op callback.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ieee802154: have genetlink code to parse the attrs during dumpit

Benefit from the fact that the generic netlink code can parse the attrs
for dumpit op and avoid need to parse it in the op callback.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: genetlink: parse attrs and store in contect info struct during dumpit

Extend the dumpit info struct for attrs. Instead of existing attribute
validation do parse them and save in the info struct. Caller can benefit
from this and does not have to do parse itself. In order to properly
free attrs, genl_family pointer needs to be added to dumpit info struct
as well.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: genetlink: push attrbuf allocation and parsing to a separate function

To be re-usable by dumpit as well, push the code that is taking care of
attrbuf allocation and parting from doit into separate function.
Introduce a helper to free the buffer too.

Check family->maxattr too before calling kfree() to be symmetrical with
the allocation check.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: genetlink: introduce dump info struct to be available during dumpit op

Currently the cb->data is taken by ops during non-parallel dumping.
Introduce a new structure genl_dumpit_info and store the ops there.
Distribute the info to both non-parallel and parallel dumping. Also add
a helper genl_dumpit_info() to easily get the info structure in the
dumpit callback from cb.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: genetlink: push doit/dumpit code from genl_family_rcv_msg

Currently the function genl_family_rcv_msg() is quite big. Since it is
quite convenient, push code that is related to doit and dumpit ops into
separate functions.

Do small changes on the way, like rc/err unification, NULL check etc.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Allow attaching helper in later commit

This patch allows to attach conntrack helper to a confirmed conntrack
entry.  Currently, we can only attach alg helper to a conntrack entry
when it is in the unconfirmed state.  This patch enables an use case
that we can firstly commit a conntrack entry after it passed some
initial conditions.  After that the processing pipeline will further
check a couple of packets to determine if the connection belongs to
a particular application, and attach alg helper to the connection
in a later stage.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

libbpf: Add cscope and tags targets to Makefile

Using cscope and/or TAGS files for navigating the source code is useful.
Add simple targets to the Makefile to generate the index files for both
tools.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191004153444.1711278-1-toke@redhat.com

Merge branch 'libbpf-api'

Andrii Nakryiko says:

====================
Add bpf_object__open_file() and bpf_object__open_mem() APIs that use a new
approach to providing future-proof non-ABI-breaking API changes. It relies on
APIs accepting optional self-describing "opts" struct, containing its own
size, filled out and provided by potentially outdated (as well as
newer-than-libbpf) user application. A set of internal helper macros
(OPTS_VALID, OPTS_HAS, and OPTS_GET) streamline and simplify a graceful
handling forward and backward compatibility for user applications dynamically
linked against different versions of libbpf shared library.

Users of libbpf are provided with convenience macro LIBBPF_OPTS that takes
care of populating correct structure size and zero-initializes options struct,
which helps avoid obscure issues of unitialized padding. Uninitialized padding
in a struct might turn into garbage-populated new fields understood by future
versions of libbpf.

Patch #1 removes enforcement of kern_version in libbpf and always populates
correct one on behalf of users.
Patch #2 defines necessary infrastructure for options and two new open APIs
relying on it.
Patch #3 fixes bug in bpf_object__name().
Patch #4 switches two of test_progs' tests to use new APIs as a validation
that they work as expected.

v2->v3:
- fix LIBBPF_OPTS() to ensure zero-initialization of padded bytes;
- pass through name override and relaxed maps flag for open_file() (Toke);
- fix bpf_object__name() to actually return object name;
- don't bother parsing and verifying version section (John);

v1->v2:
- use better approach for tracking last field in opts struct;
- convert few tests to new APIs for validation;
- fix bug with using offsetof(last_field) instead of offsetofend(last_field).
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: switch tests to new bpf_object__open_{file, mem}() APIs

Verify new bpf_object__open_mem() and bpf_object__open_file() APIs work
as expected by switching test_attach_probe test to use embedded BPF
object and bpf_object__open_mem() and test_reference_tracking to
bpf_object__open_file().

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: fix bpf_object__name() to actually return object name

bpf_object__name() was returning file path, not name. Fix this.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: add bpf_object__open_{file, mem} w/ extensible opts

Add new set of bpf_object__open APIs using new approach to optional
parameters extensibility allowing simpler ABI compatibility approach.

This patch demonstrates an approach to implementing libbpf APIs that
makes it easy to extend existing APIs with extra optional parameters in
such a way, that ABI compatibility is preserved without having to do
symbol versioning and generating lots of boilerplate code to handle it.
To facilitate succinct code for working with options, add OPTS_VALID,
OPTS_HAS, and OPTS_GET macros that hide all the NULL, size, and zero
checks.

Additionally, newly added libbpf APIs are encouraged to follow similar
pattern of having all mandatory parameters as formal function parameters
and always have optional (NULL-able) xxx_opts struct, which should
always have real struct size as a first field and the rest would be
optional parameters added over time, which tune the behavior of existing
API, if specified by user.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: stop enforcing kern_version, populate it for users

Kernel version enforcement for kprobes/kretprobes was removed from
5.0 kernel in 6c4fc209fcf9 ("bpf: remove useless version check for prog load").
Since then, BPF programs were specifying SEC("version") just to please
libbpf. We should stop enforcing this in libbpf, if even kernel doesn't
care. Furthermore, libbpf now will pre-populate current kernel version
of the host system, in case we are still running on old kernel.

This patch also removes __bpf_object__open_xattr from libbpf.h, as
nothing in libbpf is relying on having it in that header. That function
was never exported as LIBBPF_API and even name suggests its internal
version. So this should be safe to remove, as it doesn't break ABI.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: Fix BTF-defined map's __type macro handling of arrays

Due to a quirky C syntax of declaring pointers to array or function
prototype, existing __type() macro doesn't work with map key/value types
that are array or function prototype. One has to create a typedef first
and use it to specify key/value type for a BPF map. By using typeof(),
pointer to type is now handled uniformly for all kinds of types. Convert
one of self-tests as a demonstration.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191004040211.2434033-1-andriin@fb.com

Merge branch 'create-netdevsim-instances-in-namespace'

Jiri Pirko says:

====================
create netdevsim instances in namespace

Allow user to create netdevsim devlink and netdevice instances in a
network namespace according to the namespace where the user resides in.
Add a selftest to test this.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: test creating netdevsim inside network namespace

Add a test that creates netdevsim instance inside network namespace
and verifies that the related devlink instance and port netdevices
reside in the namespace.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevsim: create devlink and netdev instances in namespace

When user does create new netdevsim instance using sysfs bus file,
create the devlink instance and related netdev instance in the namespace
of the caller.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: devlink: export devlink net setter

For newly allocated devlink instance allow drivers to set net struct

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-tls-add-ctrl-path-tracing-and-statistics'

Jakub Kicinski says:

====================
net/tls: add ctrl path tracing and statistics

This set adds trace events related to TLS offload and basic MIB stats
for TLS.

First patch contains the TLS offload related trace points. Those are
helpful in troubleshooting offload issues, especially around the
resync paths.

Second patch adds a tracepoint to the fastpath of device offload,
it's separated out in case there will be objections to adding
fast path tracepoints. Again, it's quite useful for debugging
offload issues.

Next four patches add MIB statistics. The statistics are implemented
as per-cpu per-netns counters. Since there are currently no fast path
statistics we could move to atomic variables. Per-CPU seem more common.

Most basic statistics are number of created and live sessions, broken
out to offloaded and non-offloaded. Users seem to like those a lot.

Next there is a statistic for decryption errors. These are primarily
useful for device offload debug, in normal deployments decryption
errors should not be common.

Last but not least a counter for device RX resync.
====================

Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: add TlsDeviceRxResync statistic

Add a statistic for number of RX resyncs sent down to the NIC.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: add TlsDecryptError stat

Add a statistic for TLS record decryption errors.

Since devices are supposed to pass records as-is when they
encounter errors this statistic will count bad records in
both pure software and inline crypto configurations.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: add statistics for installed sessions

Add SNMP stats for number of sockets with successfully
installed sessions. Break them down to software and
hardware ones. Note that if hardware offload fails
stack uses software implementation, and counts the
session appropriately.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>