]> asedeno.scripts.mit.edu Git - linux.git/log
linux.git
4 years agoCIFS: Fix task struct use-after-free on reconnect
Vincent Whitchurch [Thu, 23 Jan 2020 16:09:06 +0000 (17:09 +0100)]
CIFS: Fix task struct use-after-free on reconnect

The task which created the MID may be gone by the time cifsd attempts to
call the callbacks on MIDs from cifs_reconnect().

This leads to a use-after-free of the task struct in cifs_wake_up_task:

 ==================================================================
 BUG: KASAN: use-after-free in __lock_acquire+0x31a0/0x3270
 Read of size 8 at addr ffff8880103e3a68 by task cifsd/630

 CPU: 0 PID: 630 Comm: cifsd Not tainted 5.5.0-rc6+ #119
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
 Call Trace:
  dump_stack+0x8e/0xcb
  print_address_description.constprop.5+0x1d3/0x3c0
  ? __lock_acquire+0x31a0/0x3270
  __kasan_report+0x152/0x1aa
  ? __lock_acquire+0x31a0/0x3270
  ? __lock_acquire+0x31a0/0x3270
  kasan_report+0xe/0x20
  __lock_acquire+0x31a0/0x3270
  ? __wake_up_common+0x1dc/0x630
  ? _raw_spin_unlock_irqrestore+0x4c/0x60
  ? mark_held_locks+0xf0/0xf0
  ? _raw_spin_unlock_irqrestore+0x39/0x60
  ? __wake_up_common_lock+0xd5/0x130
  ? __wake_up_common+0x630/0x630
  lock_acquire+0x13f/0x330
  ? try_to_wake_up+0xa3/0x19e0
  _raw_spin_lock_irqsave+0x38/0x50
  ? try_to_wake_up+0xa3/0x19e0
  try_to_wake_up+0xa3/0x19e0
  ? cifs_compound_callback+0x178/0x210
  ? set_cpus_allowed_ptr+0x10/0x10
  cifs_reconnect+0xa1c/0x15d0
  ? generic_ip_connect+0x1860/0x1860
  ? rwlock_bug.part.0+0x90/0x90
  cifs_readv_from_socket+0x479/0x690
  cifs_read_from_socket+0x9d/0xe0
  ? cifs_readv_from_socket+0x690/0x690
  ? mempool_resize+0x690/0x690
  ? rwlock_bug.part.0+0x90/0x90
  ? memset+0x1f/0x40
  ? allocate_buffers+0xff/0x340
  cifs_demultiplex_thread+0x388/0x2a50
  ? cifs_handle_standard+0x610/0x610
  ? rcu_read_lock_held_common+0x120/0x120
  ? mark_lock+0x11b/0xc00
  ? __lock_acquire+0x14ed/0x3270
  ? __kthread_parkme+0x78/0x100
  ? lockdep_hardirqs_on+0x3e8/0x560
  ? lock_downgrade+0x6a0/0x6a0
  ? lockdep_hardirqs_on+0x3e8/0x560
  ? _raw_spin_unlock_irqrestore+0x39/0x60
  ? cifs_handle_standard+0x610/0x610
  kthread+0x2bb/0x3a0
  ? kthread_create_worker_on_cpu+0xc0/0xc0
  ret_from_fork+0x3a/0x50

 Allocated by task 649:
  save_stack+0x19/0x70
  __kasan_kmalloc.constprop.5+0xa6/0xf0
  kmem_cache_alloc+0x107/0x320
  copy_process+0x17bc/0x5370
  _do_fork+0x103/0xbf0
  __x64_sys_clone+0x168/0x1e0
  do_syscall_64+0x9b/0xec0
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 Freed by task 0:
  save_stack+0x19/0x70
  __kasan_slab_free+0x11d/0x160
  kmem_cache_free+0xb5/0x3d0
  rcu_core+0x52f/0x1230
  __do_softirq+0x24d/0x962

 The buggy address belongs to the object at ffff8880103e32c0
  which belongs to the cache task_struct of size 6016
 The buggy address is located 1960 bytes inside of
  6016-byte region [ffff8880103e32c0ffff8880103e4a40)
 The buggy address belongs to the page:
 page:ffffea000040f800 refcount:1 mapcount:0 mapping:ffff8880108da5c0
 index:0xffff8880103e4c00 compound_mapcount: 0
 raw: 4000000000010200 ffffea00001f2208 ffffea00001e3408 ffff8880108da5c0
 raw: ffff8880103e4c00 0000000000050003 00000001ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff8880103e3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8880103e3980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 >ffff8880103e3a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                           ^
  ffff8880103e3a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8880103e3b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ==================================================================

This can be reliably reproduced by adding the below delay to
cifs_reconnect(), running find(1) on the mount, restarting the samba
server while find is running, and killing find during the delay:

   spin_unlock(&GlobalMid_Lock);
   mutex_unlock(&server->srv_mutex);

 + msleep(10000);
 +
   cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
   list_for_each_safe(tmp, tmp2, &retry_list) {
   mid_entry = list_entry(tmp, struct mid_q_entry, qhead);

Fix this by holding a reference to the task struct until the MID is
freed.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
4 years agocifs: use PTR_ERR_OR_ZERO() to simplify code
Chen Zhou [Wed, 22 Jan 2020 10:20:30 +0000 (18:20 +0800)]
cifs: use PTR_ERR_OR_ZERO() to simplify code

PTR_ERR_OR_ZERO contains if(IS_ERR(...)) + PTR_ERR, just use
PTR_ERR_OR_ZERO directly.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
4 years agocifs: add support for fallocate mode 0 for non-sparse files
Ronnie Sahlberg [Fri, 17 Jan 2020 01:45:02 +0000 (11:45 +1000)]
cifs: add support for fallocate mode 0 for non-sparse files

RHBZ 1336264

When we extend a file we must also force the size to be updated.

This fixes an issue with holetest in xfs-tests which performs the following
sequence :
1, create a new file
2, use fallocate mode==0 to populate the file
3, mmap the file
4, touch each page by reading the mmapped region.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 years agocifs: fix NULL dereference in match_prepath
Ronnie Sahlberg [Wed, 22 Jan 2020 01:07:56 +0000 (11:07 +1000)]
cifs: fix NULL dereference in match_prepath

RHBZ: 1760879

Fix an oops in match_prepath() by making sure that the prepath string is not
NULL before we pass it into strcmp().

This is similar to other checks we make for example in cifs_root_iget()

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 years agosmb3: fix default permissions on new files when mounting with modefromsid
Steve French [Fri, 17 Jan 2020 01:55:33 +0000 (19:55 -0600)]
smb3: fix default permissions on new files when mounting with modefromsid

When mounting with "modefromsid" mount parm most servers will require
that some default permissions are given to users in the ACL on newly
created files, files created with the new 'sd context' - when passing in
an sd context on create, permissions are not inherited from the parent
directory, so in addition to the ACE with the special SID which contains
the mode, we also must pass in an ACE allowing users to access the file
(GENERIC_ALL for authenticated users seemed like a reasonable default,
although later we could allow a mount option or config switch to make
it GENERIC_ALL for EVERYONE special sid).

CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-By: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
4 years agoCIFS: Add support for setting owner info, dos attributes, and create time
Boris Protopopov [Thu, 19 Dec 2019 16:52:50 +0000 (16:52 +0000)]
CIFS: Add support for setting owner info, dos attributes, and create time

This is needed for backup/restore scenarios among others.

Add extended attribute "system.cifs_ntsd" (and alias "system.smb3_ntsd")
to allow for setting owner and DACL in the security descriptor. This is in
addition to the existing "system.cifs_acl" and "system.smb3_acl" attributes
that allow for setting DACL only. Add support for setting creation time and
dos attributes using set_file_info() calls to complement the existing
support for getting these attributes via query_path_info() calls.

Signed-off-by: Boris Protopopov <bprotopopov@hotmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 years agocifs: remove set but not used variable 'server'
YueHaibing [Fri, 17 Jan 2020 02:57:17 +0000 (10:57 +0800)]
cifs: remove set but not used variable 'server'

fs/cifs/smb2pdu.c: In function 'SMB2_query_directory':
fs/cifs/smb2pdu.c:4444:26: warning:
 variable 'server' set but not used [-Wunused-but-set-variable]
  struct TCP_Server_Info *server;

It is not used, so remove it.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 years agocifs: Fix memory allocation in __smb2_handle_cancelled_cmd()
Paulo Alcantara (SUSE) [Mon, 13 Jan 2020 20:46:59 +0000 (17:46 -0300)]
cifs: Fix memory allocation in __smb2_handle_cancelled_cmd()

__smb2_handle_cancelled_cmd() is called under a spin lock held in
cifs_mid_q_entry_release(), so make its memory allocation GFP_ATOMIC.

This issue was observed when running xfstests generic/028:

[ 1722.589204] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72064 cmd: 5
[ 1722.590687] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72065 cmd: 17
[ 1722.593529] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72066 cmd: 6
[ 1723.039014] BUG: sleeping function called from invalid context at mm/slab.h:565
[ 1723.040710] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 30877, name: cifsd
[ 1723.045098] CPU: 3 PID: 30877 Comm: cifsd Not tainted 5.5.0-rc4+ #313
[ 1723.046256] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[ 1723.048221] Call Trace:
[ 1723.048689]  dump_stack+0x97/0xe0
[ 1723.049268]  ___might_sleep.cold+0xd1/0xe1
[ 1723.050069]  kmem_cache_alloc_trace+0x204/0x2b0
[ 1723.051051]  __smb2_handle_cancelled_cmd+0x40/0x140 [cifs]
[ 1723.052137]  smb2_handle_cancelled_mid+0xf6/0x120 [cifs]
[ 1723.053247]  cifs_mid_q_entry_release+0x44d/0x630 [cifs]
[ 1723.054351]  ? cifs_reconnect+0x26a/0x1620 [cifs]
[ 1723.055325]  cifs_demultiplex_thread+0xad4/0x14a0 [cifs]
[ 1723.056458]  ? cifs_handle_standard+0x2c0/0x2c0 [cifs]
[ 1723.057365]  ? kvm_sched_clock_read+0x14/0x30
[ 1723.058197]  ? sched_clock+0x5/0x10
[ 1723.058838]  ? sched_clock_cpu+0x18/0x110
[ 1723.059629]  ? lockdep_hardirqs_on+0x17d/0x250
[ 1723.060456]  kthread+0x1ab/0x200
[ 1723.061149]  ? cifs_handle_standard+0x2c0/0x2c0 [cifs]
[ 1723.062078]  ? kthread_create_on_node+0xd0/0xd0
[ 1723.062897]  ret_from_fork+0x3a/0x50

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Fixes: 9150c3adbf24 ("CIFS: Close open handle after interrupted close")
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
4 years agocifs: Fix mount options set in automount
Paulo Alcantara (SUSE) [Thu, 9 Jan 2020 13:03:19 +0000 (10:03 -0300)]
cifs: Fix mount options set in automount

Starting from 4a367dc04435, we must set the mount options based on the
DFS full path rather than the resolved target, that is, cifs_mount()
will be responsible for resolving the DFS link (cached) as well as
performing failover to any other targets in the referral.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reported-by: Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com>
Fixes: 4a367dc04435 ("cifs: Add support for failover in cifs_mount()")
Link: https://lore.kernel.org/linux-cifs/39643d7d-2abb-14d3-ced6-c394fab9a777@prodrive-technologies.com
Tested-by: Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 years agocifs: fix unitialized variable poential problem with network I/O cache lock patch
Steve French [Thu, 16 Jan 2020 21:58:00 +0000 (15:58 -0600)]
cifs: fix unitialized variable poential problem with network I/O cache lock patch

static analysis with Coverity detected an issue with the following
commit:

 Author: Paulo Alcantara (SUSE) <pc@cjr.nz>
 Date:   Wed Dec 4 17:38:03 2019 -0300

    cifs: Avoid doing network I/O while holding cache lock

Addresses-Coverity: ("Uninitialized pointer read")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 years agocifs: Fix return value in __update_cache_entry
YueHaibing [Fri, 17 Jan 2020 02:21:56 +0000 (10:21 +0800)]
cifs: Fix return value in __update_cache_entry

copy_ref_data() may return error, it should be
returned to upstream caller.

Fixes: 03535b72873b ("cifs: Avoid doing network I/O while holding cache lock")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 years agocifs: Avoid doing network I/O while holding cache lock
Paulo Alcantara (SUSE) [Wed, 4 Dec 2019 20:38:03 +0000 (17:38 -0300)]
cifs: Avoid doing network I/O while holding cache lock

When creating or updating a cache entry, we need to get an DFS
referral (get_dfs_referral), so avoid holding any locks during such
network operation.

To prevent that, do the following:
* change cache hashtable sync method from RCU sync to a read/write
  lock.
* use GFP_ATOMIC in memory allocations.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 years agocifs: Fix potential deadlock when updating vol in cifs_reconnect()
Paulo Alcantara (SUSE) [Wed, 4 Dec 2019 20:38:02 +0000 (17:38 -0300)]
cifs: Fix potential deadlock when updating vol in cifs_reconnect()

We can't acquire volume lock while refreshing the DFS cache because
cifs_reconnect() may call dfs_cache_update_vol() while we are walking
through the volume list.

To prevent that, make vol_info refcounted, create a temp list with all
volumes eligible for refreshing, and then use it without any locks
held.

Besides, replace vol_lock with a spinlock and protect cache_ttl from
concurrent accesses or changes.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 years agocifs: Merge is_path_valid() into get_normalized_path()
Paulo Alcantara (SUSE) [Wed, 4 Dec 2019 20:38:01 +0000 (17:38 -0300)]
cifs: Merge is_path_valid() into get_normalized_path()

Just do the trivial path validation in get_normalized_path().

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 years agocifs: Introduce helpers for finding TCP connection
Paulo Alcantara (SUSE) [Wed, 4 Dec 2019 20:38:00 +0000 (17:38 -0300)]
cifs: Introduce helpers for finding TCP connection

Add helpers for finding TCP connections that are good candidates for
being used by DFS refresh worker.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 years agocifs: Get rid of kstrdup_const()'d paths
Paulo Alcantara (SUSE) [Wed, 4 Dec 2019 20:37:59 +0000 (17:37 -0300)]
cifs: Get rid of kstrdup_const()'d paths

The DFS cache API is mostly used with heap allocated strings.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 years agocifs: Clean up DFS referral cache
Paulo Alcantara (SUSE) [Wed, 4 Dec 2019 20:37:58 +0000 (17:37 -0300)]
cifs: Clean up DFS referral cache

Do some renaming and code cleanup.

No functional changes.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 years agocifs: Don't use iov_iter::type directly
David Howells [Thu, 21 Nov 2019 08:13:58 +0000 (08:13 +0000)]
cifs: Don't use iov_iter::type directly

Don't use iov_iter::type directly, but rather use the new accessor
functions that have been added.  This allows the .type field to be split
and rearranged without the need to update the filesystems.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 years agocifs: set correct max-buffer-size for smb2_ioctl_init()
Ronnie Sahlberg [Wed, 8 Jan 2020 03:08:07 +0000 (13:08 +1000)]
cifs: set correct max-buffer-size for smb2_ioctl_init()

Fix two places where we need to adjust down the max response size for
ioctl when it is used together with compounding.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
4 years agocifs: use compounding for open and first query-dir for readdir()
Ronnie Sahlberg [Wed, 8 Jan 2020 03:08:06 +0000 (13:08 +1000)]
cifs: use compounding for open and first query-dir for readdir()

Combine the initial SMB2_Open and the first SMB2_Query_Directory in a compound.
This shaves one round-trip of each directory listing, changing it from 4 to 3
for small directories.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
4 years agocifs: create a helper function to parse the query-directory response buffer
Ronnie Sahlberg [Wed, 8 Jan 2020 03:08:05 +0000 (13:08 +1000)]
cifs: create a helper function to parse the query-directory response buffer

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
4 years agocifs: prepare SMB2_query_directory to be used with compounding
Ronnie Sahlberg [Wed, 8 Jan 2020 03:08:04 +0000 (13:08 +1000)]
cifs: prepare SMB2_query_directory to be used with compounding

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
4 years agofs/cifs/cifssmb.c: use true,false for bool variable
zhengbin [Wed, 25 Dec 2019 03:30:21 +0000 (11:30 +0800)]
fs/cifs/cifssmb.c: use true,false for bool variable

Fixes coccicheck warning:

fs/cifs/cifssmb.c:4622:3-22: WARNING: Assignment of 0/1 to bool variable
fs/cifs/cifssmb.c:4756:3-22: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 years agofs/cifs/smb2ops.c: use true,false for bool variable
zhengbin [Wed, 25 Dec 2019 03:30:20 +0000 (11:30 +0800)]
fs/cifs/smb2ops.c: use true,false for bool variable

Fixes coccicheck warning:

fs/cifs/smb2ops.c:807:2-36: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 years agoLinux 5.5 v5.5
Linus Torvalds [Mon, 27 Jan 2020 00:23:03 +0000 (16:23 -0800)]
Linux 5.5

4 years agoMerge tag 'io_uring-5.5-2020-01-26' of git://git.kernel.dk/linux-block
Linus Torvalds [Sun, 26 Jan 2020 20:23:04 +0000 (12:23 -0800)]
Merge tag 'io_uring-5.5-2020-01-26' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Fix for two regressions in this cycle, both reported by the postgresql
  use case.

  One removes the added restriction on who can submit IO, making it
  possible for rings shared across forks to do so. The other fixes an
  issue for the same kind of use case, where one exiting process would
  cancel all IO"

* tag 'io_uring-5.5-2020-01-26' of git://git.kernel.dk/linux-block:
  io_uring: don't cancel all work on process exit
  Revert "io_uring: only allow submit from owning task"

4 years agoMerge tag 'block-5.5-2020-01-26' of git://git.kernel.dk/linux-block
Linus Torvalds [Sun, 26 Jan 2020 20:12:36 +0000 (12:12 -0800)]
Merge tag 'block-5.5-2020-01-26' of git://git.kernel.dk/linux-block

Pull block fix from Jens Axboe:
 "Unfortunately this weekend we had a few last minute reports, one was
  for block.

  The partition disable for zoned devices was overly restrictive, it can
  work (and be supported) just fine for host-aware variants.

  Here's a fix ensuring that's the case so we don't break existing users
  of that"

* tag 'block-5.5-2020-01-26' of git://git.kernel.dk/linux-block:
  block: allow partitions on host aware zone devices

4 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sun, 26 Jan 2020 18:39:09 +0000 (10:39 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Two last minute fixes, both in drivers.

  The fnic one is a highly unlikely condition, but the RDMA one is a
  recently introduced regression that causes a kernel warning to trigger
  in every RDMA logon, which would be unsightly if it got into the final
  release"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: RDMA/isert: Fix a recently introduced regression related to logout
  scsi: fnic: do not queue commands during fwreset

4 years agoMerge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sun, 26 Jan 2020 18:33:48 +0000 (10:33 -0800)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs fix from Al Viro:
 "Fix a use-after-free in do_last() handling of sysctl_protected_...
  checks.

  The use-after-free normally doesn't happen there, but race with
  rename() and it becomes possible"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  do_last(): fetch directory ->i_mode and ->i_uid before it's too late

4 years agoio_uring: don't cancel all work on process exit
Jens Axboe [Sun, 26 Jan 2020 17:17:12 +0000 (10:17 -0700)]
io_uring: don't cancel all work on process exit

If we're sharing the ring across forks, then one process exiting means
that we cancel ALL work and prevent future work. This is overly
restrictive. As long as we cancel the work associated with the files
from the current task, it's safe to let others persist. Normal fd close
on exit will still wait (and cancel) pending work.

Fixes: fcb323cc53e2 ("io_uring: io_uring: add support for async work inheriting files")
Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: allow partitions on host aware zone devices
Christoph Hellwig [Sun, 26 Jan 2020 13:05:43 +0000 (14:05 +0100)]
block: allow partitions on host aware zone devices

Host-aware SMR drives can be used with the commands to explicitly manage
zone state, but they can also be used as normal disks.  In the former
case it makes perfect sense to allow partitions on them, in the latter
it does not, just like for host managed devices.  Add a check to
add_partition to allow partitions on host aware devices, but give
up any zone management capabilities in that case, which also catches
the previously missed case of adding a partition vs just scanning it.

Because sd can rescan the attribute at runtime it needs to check if
a disk has partitions, for which a new helper is added to genhd.h.

Fixes: 5eac3eb30c9a ("block: Remove partition support for zoned block devices")
Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoRevert "io_uring: only allow submit from owning task"
Jens Axboe [Sun, 26 Jan 2020 16:53:12 +0000 (09:53 -0700)]
Revert "io_uring: only allow submit from owning task"

This ends up being too restrictive for tasks that willingly fork and
share the ring between forks. Andres reports that this breaks his
postgresql work. Since we're close to 5.5 release, revert this change
for now.

Cc: stable@vger.kernel.org
Fixes: 44d282796f81 ("io_uring: only allow submit from owning task")
Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoafs: Fix characters allowed into cell names
David Howells [Sun, 26 Jan 2020 01:02:53 +0000 (01:02 +0000)]
afs: Fix characters allowed into cell names

The afs filesystem needs to prohibit certain characters from cell names,
such as '/', as these are used to form filenames in procfs, leading to
the following warning being generated:

WARNING: CPU: 0 PID: 3489 at fs/proc/generic.c:178

Fix afs_alloc_cell() to disallow nonprintable characters, '/', '@' and
names that begin with a dot.

Remove the check for "@cell" as that is then redundant.

This can be tested by running:

echo add foo/.bar 1.2.3.4 >/proc/fs/afs/cells

Note that we will also need to deal with:

 - Names ending in ".invalid" shouldn't be passed to the DNS.

 - Names that contain non-valid domainname chars shouldn't be passed to
   the DNS.

 - DNS replies that say "your-dns-needs-immediate-attention.<gTLD>" and
   replies containing A records that say 127.0.53.53 should be
   considered invalid.
   [https://www.icann.org/en/system/files/files/name-collision-mitigation-01aug14-en.pdf]

but these need to be dealt with by the kafs-client DNS program rather
than the kernel.

Reported-by: syzbot+b904ba7c947a37b4b291@syzkaller.appspotmail.com
Cc: stable@kernel.org
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agodo_last(): fetch directory ->i_mode and ->i_uid before it's too late
Al Viro [Sun, 26 Jan 2020 14:29:34 +0000 (09:29 -0500)]
do_last(): fetch directory ->i_mode and ->i_uid before it's too late

may_create_in_sticky() call is done when we already have dropped the
reference to dir.

Fixes: 30aba6656f61e (namei: allow restricted O_CREAT of FIFOs and regular files)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
4 years agoMerge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Linus Torvalds [Sat, 25 Jan 2020 22:32:51 +0000 (14:32 -0800)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:

 - fix ftrace relocation type filtering

 - relax arch timer version check

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8955/1: virt: Relax arch timer version check during early boot
  ARM: 8950/1: ftrace/recordmcount: filter relocation types

4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Sat, 25 Jan 2020 22:19:32 +0000 (14:19 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from David Miller:

 1) Off by one in mt76 airtime calculation, from Dan Carpenter.

 2) Fix TLV fragment allocation loop condition in iwlwifi, from Luca
    Coelho.

 3) Don't confirm neigh entries when doing ipsec pmtu updates, from Xu
    Wang.

 4) More checks to make sure we only send TSO packets to lan78xx chips
    that they can actually handle. From James Hughes.

 5) Fix ip_tunnel namespace move, from William Dauchy.

 6) Fix unintended packet reordering due to cooperation between
    listification done by GRO and non-GRO paths. From Maxim
    Mikityanskiy.

 7) Add Jakub Kicincki formally as networking co-maintainer.

 8) Info leak in airo ioctls, from Michael Ellerman.

 9) IFLA_MTU attribute needs validation during rtnl_create_link(), from
    Eric Dumazet.

10) Use after free during reload in mlxsw, from Ido Schimmel.

11) Dangling pointers are possible in tp->highest_sack, fix from Eric
    Dumazet.

12) Missing *pos++ in various networking seq_next handlers, from Vasily
    Averin.

13) CHELSIO_GET_MEM operation neds CAP_NET_ADMIN check, from Michael
    Ellerman.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (109 commits)
  firestream: fix memory leaks
  net: cxgb3_main: Add CAP_NET_ADMIN check to CHELSIO_GET_MEM
  net: bcmgenet: Use netif_tx_napi_add() for TX NAPI
  tipc: change maintainer email address
  net: stmmac: platform: fix probe for ACPI devices
  net/mlx5e: kTLS, Do not send decrypted-marked SKBs via non-accel path
  net/mlx5e: kTLS, Remove redundant posts in TX resync flow
  net/mlx5e: kTLS, Fix corner-case checks in TX resync flow
  net/mlx5e: Clear VF config when switching modes
  net/mlx5: DR, use non preemptible call to get the current cpu number
  net/mlx5: E-Switch, Prevent ingress rate configuration of uplink rep
  net/mlx5: DR, Enable counter on non-fwd-dest objects
  net/mlx5: Update the list of the PCI supported devices
  net/mlx5: Fix lowest FDB pool size
  net: Fix skb->csum update in inet_proto_csum_replace16().
  netfilter: nf_tables: autoload modules from the abort path
  netfilter: nf_tables: add __nft_chain_type_get()
  netfilter: nf_tables_offload: fix check the chain offload flag
  netfilter: conntrack: sctp: use distinct states for new SCTP connections
  ipv6_route_seq_next should increase position index
  ...

4 years agoMerge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Linus Torvalds [Sat, 25 Jan 2020 22:08:43 +0000 (14:08 -0800)]
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Olof Johansson:
 "A couple of fixes have come in that would be good to include in this
  release:

   - A fix for amount of memory on Beaglebone Black. Surfaced now since
     GRUB2 doesn't update memory size in the booted kernel.

   - A fix to make SPI interfaces work on am43x-epos-evm.

   - Small Kconfig fix for OPTEE (adds a depend on MMU) to avoid build
     failures"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  ARM: dts: am43x-epos-evm: set data pin directions for spi0 and spi1
  tee: optee: Fix compilation issue with nommu
  ARM: dts: am335x-boneblack-common: fix memory size

4 years agofirestream: fix memory leaks
Wenwen Wang [Sat, 25 Jan 2020 14:33:29 +0000 (14:33 +0000)]
firestream: fix memory leaks

In fs_open(), 'vcc' is allocated through kmalloc() and assigned to
'atm_vcc->dev_data.' In the following execution, if an error occurs, e.g.,
there is no more free channel, an error code EBUSY or ENOMEM will be
returned. However, 'vcc' is not deallocated, leading to memory leaks. Note
that, in normal cases where fs_open() returns 0, 'vcc' will be deallocated
in fs_close(). But, if fs_open() fails, there is no guarantee that
fs_close() will be invoked.

To fix this issue, deallocate 'vcc' before the error code is returned.

Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
David S. Miller [Sat, 25 Jan 2020 20:40:39 +0000 (21:40 +0100)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Missing netlink attribute sanity check for NFTA_OSF_DREG,
   from Florian Westphal.

2) Use bitmap infrastructure in ipset to fix KASAN slab-out-of-bounds
   reads, from Jozsef Kadlecsik.

3) Missing initial CLOSED state in new sctp connection through
   ctnetlink events, from Jiri Wiesner.

4) Missing check for NFT_CHAIN_HW_OFFLOAD in nf_tables offload
   indirect block infrastructure, from wenxu.

5) Add __nft_chain_type_get() to sanity check family and chain type.

6) Autoload modules from the nf_tables abort path to fix races
   reported by syzbot.

7) Remove unnecessary skb->csum update on inet_proto_csum_replace16(),
   from Praveen Chaudhary.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge tag 'for-5.5-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Linus Torvalds [Sat, 25 Jan 2020 18:55:24 +0000 (10:55 -0800)]
Merge tag 'for-5.5-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
 "Here's a last minute fix for a regression introduced in this
  development cycle.

  There's a small chance of a silent corruption when device replace and
  NOCOW data writes happen at the same time in one block group. Metadata
  or COW data writes are unaffected.

  The extra fixup patch is there to silence an unnecessary warning"

* tag 'for-5.5-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: dev-replace: remove warning for unknown return codes when finished
  btrfs: scrub: Require mandatory block group RO for dev-replace

4 years agoMerge tag 'pinctrl-v5.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Sat, 25 Jan 2020 18:46:07 +0000 (10:46 -0800)]
Merge tag 'pinctrl-v5.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fix from Linus Walleij:
 "A single fix for the Intel Sunrisepoint pin controller that makes the
  interrupts work properly on it"

* tag 'pinctrl-v5.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: sunrisepoint: Add missing Interrupt Status register offset

4 years agoMerge tag 'mlx5-fixes-2020-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Sat, 25 Jan 2020 12:46:00 +0000 (13:46 +0100)]
Merge tag 'mlx5-fixes-2020-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2020-01-24

This series introduces some fixes to mlx5 driver.

Please pull and let me know if there is any problem.

Merge conflict: once merge with net-next, a contextual conflict will
appear in drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
since the code moved in net-next.
To resolve, just delete ALL of the conflicting hunk from net.
So sorry for the small mess ..

For -stable v5.4:
 ('net/mlx5: Update the list of the PCI supported devices')
 ('net/mlx5: Fix lowest FDB pool size')
 ('net/mlx5e: kTLS, Fix corner-case checks in TX resync flow')
 ('net/mlx5e: kTLS, Do not send decrypted-marked SKBs via non-accel path')
 ('net/mlx5: Eswitch, Prevent ingress rate configuration of uplink rep')
 ('net/mlx5e: kTLS, Remove redundant posts in TX resync flow')
 ('net/mlx5: DR, Enable counter on non-fwd-dest objects')
 ('net/mlx5: DR, use non preemptible call to get the current cpu number')
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobtrfs: dev-replace: remove warning for unknown return codes when finished
David Sterba [Sat, 25 Jan 2020 11:35:38 +0000 (12:35 +0100)]
btrfs: dev-replace: remove warning for unknown return codes when finished

The fstests btrfs/011 triggered a warning at the end of device replace,

  [ 1891.998975] BTRFS warning (device vdd): failed setting block group ro: -28
  [ 1892.038338] BTRFS error (device vdd): btrfs_scrub_dev(/dev/vdd, 1, /dev/vdb) failed -28
  [ 1892.059993] ------------[ cut here ]------------
  [ 1892.063032] WARNING: CPU: 2 PID: 2244 at fs/btrfs/dev-replace.c:506 btrfs_dev_replace_start.cold+0xf9/0x140 [btrfs]
  [ 1892.074346] CPU: 2 PID: 2244 Comm: btrfs Not tainted 5.5.0-rc7-default+ #942
  [ 1892.079956] RIP: 0010:btrfs_dev_replace_start.cold+0xf9/0x140 [btrfs]

  [ 1892.096576] RSP: 0018:ffffbb58c7b3fd10 EFLAGS: 00010286
  [ 1892.098311] RAX: 00000000ffffffe4 RBX: 0000000000000001 RCX: 8888888888888889
  [ 1892.100342] RDX: 0000000000000001 RSI: ffff9e889645f5d8 RDI: ffffffff92821080
  [ 1892.102291] RBP: ffff9e889645c000 R08: 000001b8878fe1f6 R09: 0000000000000000
  [ 1892.104239] R10: ffffbb58c7b3fd08 R11: 0000000000000000 R12: ffff9e88a0017000
  [ 1892.106434] R13: ffff9e889645f608 R14: ffff9e88794e1000 R15: ffff9e88a07b5200
  [ 1892.108642] FS:  00007fcaed3f18c0(0000) GS:ffff9e88bda00000(0000) knlGS:0000000000000000
  [ 1892.111558] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1892.113492] CR2: 00007f52509ff420 CR3: 00000000603dd002 CR4: 0000000000160ee0

  [ 1892.115814] Call Trace:
  [ 1892.116896]  btrfs_dev_replace_by_ioctl+0x35/0x60 [btrfs]
  [ 1892.118962]  btrfs_ioctl+0x1d62/0x2550 [btrfs]

caused by the previous patch ("btrfs: scrub: Require mandatory block
group RO for dev-replace"). Hitting ENOSPC is possible and could happen
when the block group is set read-only, preventing NOCOW writes to the
area that's being accessed by dev-replace.

This has happend with scratch devices of size 12G but not with 5G and
20G, so this is depends on timing and other activity on the filesystem.
The whole replace operation is restartable, the space state should be
examined by the user in any case.

The error code is propagated back to the ioctl caller so the kernel
warning is causing false alerts.

Signed-off-by: David Sterba <dsterba@suse.com>
4 years agonet: cxgb3_main: Add CAP_NET_ADMIN check to CHELSIO_GET_MEM
Michael Ellerman [Fri, 24 Jan 2020 09:41:44 +0000 (20:41 +1100)]
net: cxgb3_main: Add CAP_NET_ADMIN check to CHELSIO_GET_MEM

The cxgb3 driver for "Chelsio T3-based gigabit and 10Gb Ethernet
adapters" implements a custom ioctl as SIOCCHIOCTL/SIOCDEVPRIVATE in
cxgb_extension_ioctl().

One of the subcommands of the ioctl is CHELSIO_GET_MEM, which appears
to read memory directly out of the adapter and return it to userspace.
It's not entirely clear what the contents of the adapter memory
contains, but the assumption is that it shouldn't be accessible to all
users.

So add a CAP_NET_ADMIN check to the CHELSIO_GET_MEM case. Put it after
the is_offload() check, which matches two of the other subcommands in
the same function which also check for is_offload() and CAP_NET_ADMIN.

Found by Ilja by code inspection, not tested as I don't have the
required hardware.

Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: bcmgenet: Use netif_tx_napi_add() for TX NAPI
Florian Fainelli [Thu, 23 Jan 2020 17:49:34 +0000 (09:49 -0800)]
net: bcmgenet: Use netif_tx_napi_add() for TX NAPI

Before commit 7587935cfa11 ("net: bcmgenet: move NAPI initialization to
ring initialization") moved the code, this used to be
netif_tx_napi_add(), but we lost that small semantic change in the
process, restore that.

Fixes: 7587935cfa11 ("net: bcmgenet: move NAPI initialization to ring initialization")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotipc: change maintainer email address
Jon Maloy [Thu, 23 Jan 2020 15:09:39 +0000 (10:09 -0500)]
tipc: change maintainer email address

Reflecting new realities.

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: stmmac: platform: fix probe for ACPI devices
Ajay Gupta [Thu, 23 Jan 2020 01:16:35 +0000 (17:16 -0800)]
net: stmmac: platform: fix probe for ACPI devices

Use generic device API to get phy mode to fix probe failure
with ACPI based devices.

Signed-off-by: Ajay Gupta <ajayg@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Sat, 25 Jan 2020 03:27:42 +0000 (19:27 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:

 - add sanity checks to USB endpoints in various dirvers

 - max77650-onkey was missing an OF table which was preventing module
   autoloading

 - a revert and a different fix for F54 handling in Synaptics dirver

 - a fixup for handling register in pm8xxx vibrator driver

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: pm8xxx-vib - fix handling of separate enable register
  Input: keyspan-remote - fix control-message timeouts
  Input: max77650-onkey - add of_match table
  Input: rmi_f54 - read from FIFO in 32 byte blocks
  Revert "Input: synaptics-rmi4 - don't increment rmiaddr for SMBus transfers"
  Input: sur40 - fix interface sanity checks
  Input: gtco - drop redundant variable reinit
  Input: gtco - fix extra-descriptor debug message
  Input: gtco - fix endpoint sanity check
  Input: aiptek - use descriptors of current altsetting
  Input: aiptek - fix endpoint sanity check
  Input: pegasus_notetaker - fix endpoint sanity check
  Input: sun4i-ts - add a check for devm_thermal_zone_of_sensor_register
  Input: evdev - convert kzalloc()/vzalloc() to kvzalloc()

4 years agoMerge tag 'omap-for-fixes-whenever-signed' of git://git.kernel.org/pub/scm/linux...
Olof Johansson [Fri, 24 Jan 2020 20:05:32 +0000 (12:05 -0800)]
Merge tag 'omap-for-fixes-whenever-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes

Few minor fixes for omaps

Looks like we have wrong default memory size for beaglebone black,
it has at least 512 MB of RAM and not 256 MB. This causes an issue
when booted with GRUB2 that does not seem to pass memory info to
the kernel.

And for am43x-epos-evm the SPI pin directions need to be configured
for SPI to work.

* tag 'omap-for-fixes-whenever-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: am43x-epos-evm: set data pin directions for spi0 and spi1
  ARM: dts: am335x-boneblack-common: fix memory size

Link: https://lore.kernel.org/r/pull-1579895109-287828@atomide.com
Signed-off-by: Olof Johansson <olof@lixom.net>
4 years agoMerge tag 'tee-optee-fix2-for-5.5' of https://git.linaro.org:/people/jens.wiklander...
Olof Johansson [Fri, 24 Jan 2020 20:05:07 +0000 (12:05 -0800)]
Merge tag 'tee-optee-fix2-for-5.5' of https://git.linaro.org:/people/jens.wiklander/linux-tee into arm/fixes

Fix OP-TEE compile error with nommu

* tag 'tee-optee-fix2-for-5.5' of https://git.linaro.org:/people/jens.wiklander/linux-tee:
  tee: optee: Fix compilation issue with nommu

Link: https://lore.kernel.org/r/20200123101310.GA10320@jax
Signed-off-by: Olof Johansson <olof@lixom.net>
4 years agonet/mlx5e: kTLS, Do not send decrypted-marked SKBs via non-accel path
Tariq Toukan [Mon, 20 Jan 2020 11:42:00 +0000 (13:42 +0200)]
net/mlx5e: kTLS, Do not send decrypted-marked SKBs via non-accel path

When TCP out-of-order is identified (unexpected tcp seq mismatch), driver
analyzes the packet and decides what handling should it get:
1. go to accelerated path (to be encrypted in HW),
2. go to regular xmit path (send w/o encryption),
3. drop.

Packets marked with skb->decrypted by the TLS stack in the TX flow skips
SW encryption, and rely on the HW offload.
Verify that such packets are never sent un-encrypted on the wire.
Add a WARN to catch such bugs, and prefer dropping the packet in these cases.

Fixes: 46a3ea98074e ("net/mlx5e: kTLS, Enhance TX resync flow")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: kTLS, Remove redundant posts in TX resync flow
Tariq Toukan [Mon, 13 Jan 2020 12:46:09 +0000 (14:46 +0200)]
net/mlx5e: kTLS, Remove redundant posts in TX resync flow

The call to tx_post_resync_params() is done earlier in the flow,
the post of the control WQEs is unnecessarily repeated. Remove it.

Fixes: 700ec4974240 ("net/mlx5e: kTLS, Fix missing SQ edge fill")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: kTLS, Fix corner-case checks in TX resync flow
Tariq Toukan [Sun, 12 Jan 2020 14:22:14 +0000 (16:22 +0200)]
net/mlx5e: kTLS, Fix corner-case checks in TX resync flow

There are the following cases:

1. Packet ends before start marker: bypass offload.
2. Packet starts before start marker and ends after it: drop,
   not supported, breaks contract with kernel.
3. packet ends before tls record info starts: drop,
   this packet was already acknowledged and its record info
   was released.

Add the above as comment in code.

Mind possible wraparounds of the TCP seq, replace the simple comparison
with a call to the TCP before() method.

In addition, remove logic that handles negative sync_len values,
as it became impossible.

Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Fixes: 46a3ea98074e ("net/mlx5e: kTLS, Enhance TX resync flow")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: Clear VF config when switching modes
Dmytro Linkin [Mon, 13 Jan 2020 11:46:13 +0000 (13:46 +0200)]
net/mlx5e: Clear VF config when switching modes

Currently VF in LEGACY mode are not able to go up. Also in OFFLOADS
mode, when switching to it first time, VF can go up independently to
his representor, which is not expected.
Perform clearing of VF config when switching modes and set link state
to AUTO as default value. Also, when switching to OFFLOADS mode set
link state to DOWN, which allow VF link state to be controlled by its
REP.

Fixes: 1ab2068a4c66 ("net/mlx5: Implement vports admin state backup/restore")
Fixes: 556b9d16d3f5 ("net/mlx5: Clear VF's configuration on disabling SRIOV")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: DR, use non preemptible call to get the current cpu number
Erez Shitrit [Sun, 12 Jan 2020 06:57:59 +0000 (08:57 +0200)]
net/mlx5: DR, use non preemptible call to get the current cpu number

Use raw_smp_processor_id instead of smp_processor_id() otherwise we will
get the following trace in debug-kernel:
BUG: using smp_processor_id() in preemptible [00000000] code: devlink
caller is dr_create_cq.constprop.2+0x31d/0x970 [mlx5_core]
Call Trace:
dump_stack+0x9a/0xf0
debug_smp_processor_id+0x1f3/0x200
dr_create_cq.constprop.2+0x31d/0x970
genl_family_rcv_msg+0x5fd/0x1170
genl_rcv_msg+0xb8/0x160
netlink_rcv_skb+0x11e/0x340

Fixes: 297cccebdc5a ("net/mlx5: DR, Expose an internal API to issue RDMA operations")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: E-Switch, Prevent ingress rate configuration of uplink rep
Eli Cohen [Sun, 12 Jan 2020 11:43:37 +0000 (13:43 +0200)]
net/mlx5: E-Switch, Prevent ingress rate configuration of uplink rep

Since the implementation relies on limiting the VF transmit rate to
simulate ingress rate limiting, and since either uplink representor or
ecpf are not associated with a VF, we limit the rate limit configuration
for those ports.

Fixes: fcb64c0f5640 ("net/mlx5: E-Switch, add ingress rate support")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: DR, Enable counter on non-fwd-dest objects
Erez Shitrit [Wed, 8 Jan 2020 12:17:32 +0000 (14:17 +0200)]
net/mlx5: DR, Enable counter on non-fwd-dest objects

The current code handles only counters that attached to dest, we still
have the cases where we have counter on non-dest, like over drop etc.

Fixes: 6a48faeeca10 ("net/mlx5: Add direct rule fs_cmd implementation")
Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: Update the list of the PCI supported devices
Meir Lichtinger [Thu, 12 Dec 2019 14:09:33 +0000 (16:09 +0200)]
net/mlx5: Update the list of the PCI supported devices

Add the upcoming ConnectX-7 device ID.

Fixes: 85327a9c4150 ("net/mlx5: Update the list of the PCI supported devices")
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: Fix lowest FDB pool size
Paul Blakey [Tue, 31 Dec 2019 15:04:15 +0000 (17:04 +0200)]
net/mlx5: Fix lowest FDB pool size

The pool sizes represent the pool sizes in the fw. when we request
a pool size from fw, it will return the next possible group.
We track how many pools the fw has left and start requesting groups
from the big to the small.
When we start request 4k group, which doesn't exists in fw, fw
wants to allocate the next possible size, 64k, but will fail since
its exhausted. The correct smallest pool size in fw is 128 and not 4k.

Fixes: e52c28024008 ("net/mlx5: E-Switch, Add chains and priorities")
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet: Fix skb->csum update in inet_proto_csum_replace16().
Praveen Chaudhary [Thu, 23 Jan 2020 20:33:28 +0000 (12:33 -0800)]
net: Fix skb->csum update in inet_proto_csum_replace16().

skb->csum is updated incorrectly, when manipulation for
NF_NAT_MANIP_SRC\DST is done on IPV6 packet.

Fix:
There is no need to update skb->csum in inet_proto_csum_replace16(),
because update in two fields a.) IPv6 src/dst address and b.) L4 header
checksum cancels each other for skb->csum calculation. Whereas
inet_proto_csum_replace4 function needs to update skb->csum, because
update in 3 fields a.) IPv4 src/dst address, b.) IPv4 Header checksum
and c.) L4 header checksum results in same diff as L4 Header checksum
for skb->csum calculation.

[ pablo@netfilter.org: a few comestic documentation edits ]
Signed-off-by: Praveen Chaudhary <pchaudhary@linkedin.com>
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
Signed-off-by: Andy Stracner <astracner@linkedin.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agonetfilter: nf_tables: autoload modules from the abort path
Pablo Neira Ayuso [Tue, 21 Jan 2020 15:48:03 +0000 (16:48 +0100)]
netfilter: nf_tables: autoload modules from the abort path

This patch introduces a list of pending module requests. This new module
list is composed of nft_module_request objects that contain the module
name and one status field that tells if the module has been already
loaded (the 'done' field).

In the first pass, from the preparation phase, the netlink command finds
that a module is missing on this list. Then, a module request is
allocated and added to this list and nft_request_module() returns
-EAGAIN. This triggers the abort path with the autoload parameter set on
from nfnetlink, request_module() is called and the module request enters
the 'done' state. Since the mutex is released when loading modules from
the abort phase, the module list is zapped so this is iteration occurs
over a local list. Therefore, the request_module() calls happen when
object lists are in consistent state (after fulling aborting the
transaction) and the commit list is empty.

On the second pass, the netlink command will find that it already tried
to load the module, so it does not request it again and
nft_request_module() returns 0. Then, there is a look up to find the
object that the command was missing. If the module was successfully
loaded, the command proceeds normally since it finds the missing object
in place, otherwise -ENOENT is reported to userspace.

This patch also updates nfnetlink to include the reason to enter the
abort phase, which is required for this new autoload module rationale.

Fixes: ec7470b834fe ("netfilter: nf_tables: store transaction list locally while requesting module")
Reported-by: syzbot+29125d208b3dae9a7019@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agonetfilter: nf_tables: add __nft_chain_type_get()
Pablo Neira Ayuso [Tue, 21 Jan 2020 15:07:00 +0000 (16:07 +0100)]
netfilter: nf_tables: add __nft_chain_type_get()

This new helper function validates that unknown family and chain type
coming from userspace do not trigger an out-of-bound array access. Bail
out in case __nft_chain_type_get() returns NULL from
nft_chain_parse_hook().

Fixes: 9370761c56b6 ("netfilter: nf_tables: convert built-in tables/chains to chain types")
Reported-by: syzbot+156a04714799b1d480bc@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agonetfilter: nf_tables_offload: fix check the chain offload flag
wenxu [Sun, 19 Jan 2020 05:18:30 +0000 (13:18 +0800)]
netfilter: nf_tables_offload: fix check the chain offload flag

In the nft_indr_block_cb the chain should check the flag with
NFT_CHAIN_HW_OFFLOAD.

Fixes: 9a32669fecfb ("netfilter: nf_tables_offload: support indr block call")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agoMerge tag 'iommu-fixes-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 24 Jan 2020 17:56:59 +0000 (09:56 -0800)]
Merge tag 'iommu-fixes-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:
 "Two fixes:

   - Fix NULL-ptr dereference bug in Intel IOMMU driver

   - Properly save and restore AMD IOMMU performance counter registers
     when testing if they are writable"

* tag 'iommu-fixes-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Fix IOMMU perf counter clobbering during init
  iommu/vt-d: Call __dmar_remove_one_dev_info with valid pointer

4 years agoMerge tag 'powerpc-5.5-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Linus Torvalds [Fri, 24 Jan 2020 17:49:20 +0000 (09:49 -0800)]
Merge tag 'powerpc-5.5-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Some more powerpc fixes for 5.5:

   - Fix our hash MMU code to avoid having overlapping ids between user
     and kernel, which isn't as bad as it sounds but led to crashes on
     some machines.

   - A fix for the Power9 XIVE interrupt code, which could return the
     wrong interrupt state in obscure error conditions.

   - A minor Kconfig fix for the recently added CONFIG_PPC_UV code.

  Thanks to Aneesh Kumar K.V, Bharata B Rao, Cédric Le Goater, Frederic
  Barrat"

* tag 'powerpc-5.5-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm/hash: Fix sharing context ids between kernel & userspace
  powerpc/xive: Discard ESB load value when interrupt is invalid
  powerpc: Ultravisor: Fix the dependencies for CONFIG_PPC_UV

4 years agoMerge tag 'drm-fixes-2020-01-24' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 24 Jan 2020 17:38:04 +0000 (09:38 -0800)]
Merge tag 'drm-fixes-2020-01-24' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "This one has a core mst fix and two i915 fixes. amdgpu just enables
  some hw outside experimental.

  The panfrost fix is a little bigger than I'd like at this stage but it
  fixes a fairly fundamental problem with global shared buffers in that
  driver, and since it's confined to that driver and I've taken a look
  at it, I think it's fine to get into the tree now, so it can get
  stable propagated as well.

  core/mst:
   - Fix SST branch device handling

  amdgpu:
   - enable renoir outside experimental

  i915:
   - Avoid overflow with huge userptr objects
   - uAPI fix to correctly handle negative values in
     engine->uabi_class/instance (cc: stable)

  panfrost:
   - Fix mapping of globally visible BO's (Boris)"

* tag 'drm-fixes-2020-01-24' of git://anongit.freedesktop.org/drm/drm:
  drm/amdgpu: remove the experimental flag for renoir
  drm/panfrost: Add the panfrost_gem_mapping concept
  drm/i915: Align engine->uabi_class/instance with i915_drm.h
  drm/i915/userptr: fix size calculation
  drm/dp_mst: Handle SST-only branch device case

4 years agolib: Reduce user_access_begin() boundaries in strncpy_from_user() and strnlen_user()
Christophe Leroy [Thu, 23 Jan 2020 08:34:18 +0000 (08:34 +0000)]
lib: Reduce user_access_begin() boundaries in strncpy_from_user() and strnlen_user()

The range passed to user_access_begin() by strncpy_from_user() and
strnlen_user() starts at 'src' and goes up to the limit of userspace
although reads will be limited by the 'count' param.

On 32 bits powerpc (book3s/32) access has to be granted for each
256Mbytes segment and the cost increases with the number of segments to
unlock.

Limit the range with 'count' param.

Fixes: 594cc251fdd0 ("make 'user_access_begin()' do 'access_ok()'")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agonetfilter: conntrack: sctp: use distinct states for new SCTP connections
Jiri Wiesner [Sat, 18 Jan 2020 12:10:50 +0000 (13:10 +0100)]
netfilter: conntrack: sctp: use distinct states for new SCTP connections

The netlink notifications triggered by the INIT and INIT_ACK chunks
for a tracked SCTP association do not include protocol information
for the corresponding connection - SCTP state and verification tags
for the original and reply direction are missing. Since the connection
tracking implementation allows user space programs to receive
notifications about a connection and then create a new connection
based on the values received in a notification, it makes sense that
INIT and INIT_ACK notifications should contain the SCTP state
and verification tags available at the time when a notification
is sent. The missing verification tags cause a newly created
netfilter connection to fail to verify the tags of SCTP packets
when this connection has been created from the values previously
received in an INIT or INIT_ACK notification.

A PROTOINFO event is cached in sctp_packet() when the state
of a connection changes. The CLOSED and COOKIE_WAIT state will
be used for connections that have seen an INIT and INIT_ACK chunk,
respectively. The distinct states will cause a connection state
change in sctp_packet().

Signed-off-by: Jiri Wiesner <jwiesner@suse.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4 years agoiommu/amd: Fix IOMMU perf counter clobbering during init
Shuah Khan [Thu, 23 Jan 2020 22:32:14 +0000 (15:32 -0700)]
iommu/amd: Fix IOMMU perf counter clobbering during init

init_iommu_perf_ctr() clobbers the register when it checks write access
to IOMMU perf counters and fails to restore when they are writable.

Add save and restore to fix it.

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Fixes: 30861ddc9cca4 ("perf/x86/amd: Add IOMMU Performance Counter resource management")
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
4 years agoiommu/vt-d: Call __dmar_remove_one_dev_info with valid pointer
Jerry Snitselaar [Wed, 22 Jan 2020 00:34:26 +0000 (17:34 -0700)]
iommu/vt-d: Call __dmar_remove_one_dev_info with valid pointer

It is possible for archdata.iommu to be set to
DEFER_DEVICE_DOMAIN_INFO or DUMMY_DEVICE_DOMAIN_INFO so check for
those values before calling __dmar_remove_one_dev_info. Without a
check it can result in a null pointer dereference. This has been seen
while booting a kdump kernel on an HP dl380 gen9.

Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: stable@vger.kernel.org # 5.3+
Cc: linux-kernel@vger.kernel.org
Fixes: ae23bfb68f28 ("iommu/vt-d: Detach domain before using a private one")
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
4 years agobtrfs: scrub: Require mandatory block group RO for dev-replace
Qu Wenruo [Thu, 23 Jan 2020 23:58:20 +0000 (07:58 +0800)]
btrfs: scrub: Require mandatory block group RO for dev-replace

[BUG]
For dev-replace test cases with fsstress, like btrfs/06[45] btrfs/071,
looped runs can lead to random failure, where scrub finds csum error.

The possibility is not high, around 1/20 to 1/100, but it's causing data
corruption.

The bug is observable after commit b12de52896c0 ("btrfs: scrub: Don't
check free space before marking a block group RO")

[CAUSE]
Dev-replace has two source of writes:

- Write duplication
  All writes to source device will also be duplicated to target device.

  Content: Not yet persisted data/meta

- Scrub copy
  Dev-replace reused scrub code to iterate through existing extents, and
  copy the verified data to target device.

  Content: Previously persisted data and metadata

The difference in contents makes the following race possible:
Regular Writer | Dev-replace
-----------------------------------------------------------------
  ^                             |
  | Preallocate one data extent |
  | at bytenr X, len 1M |
  v |
  ^ Commit transaction |
  | Now extent [X, X+1M) is in  |
  v commit root |
 ================== Dev replace starts =========================
   | ^
| | Scrub extent [X, X+1M)
| | Read [X, X+1M)
| | (The content are mostly garbage
| |  since it's preallocated)
  ^ | v
  | Write back happens for |
  | extent [X, X+512K) |
  | New data writes to both |
  | source and target dev. |
  v |
| ^
| | Scrub writes back extent [X, X+1M)
| | to target device.
| | This will over write the new data in
| | [X, X+512K)
| v

This race can only happen for nocow writes. Thus metadata and data cow
writes are safe, as COW will never overwrite extents of previous
transaction (in commit root).

This behavior can be confirmed by disabling all fallocate related calls
in fsstress (*), then all related tests can pass a 2000 run loop.

*: FSSTRESS_AVOID="-f fallocate=0 -f allocsp=0 -f zero=0 -f insert=0 \
   -f collapse=0 -f punch=0 -f resvsp=0"
   I didn't expect resvsp ioctl will fallback to fallocate in VFS...

[FIX]
Make dev-replace to require mandatory block group RO, and wait for current
nocow writes before calling scrub_chunk().

This patch will mostly revert commit 76a8efa171bf ("btrfs: Continue replace
when set_block_ro failed") for dev-replace path.

The side effect is, dev-replace can be more strict on avaialble space, but
definitely worth to avoid data corruption.

Reported-by: Filipe Manana <fdmanana@suse.com>
Fixes: 76a8efa171bf ("btrfs: Continue replace when set_block_ro failed")
Fixes: b12de52896c0 ("btrfs: scrub: Don't check free space before marking a block group RO")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 years agoMerge branch 'netdev-seq_file-next-functions-should-increase-position-index'
David S. Miller [Fri, 24 Jan 2020 10:42:18 +0000 (11:42 +0100)]
Merge branch 'netdev-seq_file-next-functions-should-increase-position-index'

Vasily Averin says:

====================
netdev: seq_file .next functions should increase position index

In Aug 2018 NeilBrown noticed
commit 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface")
"Some ->next functions do not increment *pos when they return NULL...
Note that such ->next functions are buggy and should be fixed.
A simple demonstration is

dd if=/proc/swaps bs=1000 skip=1

Choose any block size larger than the size of /proc/swaps.  This will
always show the whole last line of /proc/swaps"

Described problem is still actual. If you make lseek into middle of last output line
following read will output end of last line and whole last line once again.

$ dd if=/proc/swaps bs=1  # usual output
Filename Type Size Used Priority
/dev/dm-0                               partition 4194812 97536 -2
104+0 records in
104+0 records out
104 bytes copied

$ dd if=/proc/swaps bs=40 skip=1    # last line was generated twice
dd: /proc/swaps: cannot skip to specified offset
v/dm-0                               partition 4194812 97536 -2
/dev/dm-0                               partition 4194812 97536 -2
3+1 records in
3+1 records out
131 bytes copied

There are lot of other affected files, I've found 30+ including
/proc/net/ip_tables_matches and /proc/sysvipc/*

This patch-set fixes files related to netdev@

https://bugzilla.kernel.org/show_bug.cgi?id=206283
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipv6_route_seq_next should increase position index
Vasily Averin [Thu, 23 Jan 2020 07:12:06 +0000 (10:12 +0300)]
ipv6_route_seq_next should increase position index

if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agort_cpu_seq_next should increase position index
Vasily Averin [Thu, 23 Jan 2020 07:11:35 +0000 (10:11 +0300)]
rt_cpu_seq_next should increase position index

if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoneigh_stat_seq_next() should increase position index
Vasily Averin [Thu, 23 Jan 2020 07:11:28 +0000 (10:11 +0300)]
neigh_stat_seq_next() should increase position index

if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agovcc_seq_next should increase position index
Vasily Averin [Thu, 23 Jan 2020 07:11:20 +0000 (10:11 +0300)]
vcc_seq_next should increase position index

if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agol2t_seq_next should increase position index
Vasily Averin [Thu, 23 Jan 2020 07:11:13 +0000 (10:11 +0300)]
l2t_seq_next should increase position index

if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoseq_tab_next() should increase position index
Vasily Averin [Thu, 23 Jan 2020 07:11:08 +0000 (10:11 +0300)]
seq_tab_next() should increase position index

if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotcp: do not leave dangling pointers in tp->highest_sack
Eric Dumazet [Thu, 23 Jan 2020 05:03:00 +0000 (21:03 -0800)]
tcp: do not leave dangling pointers in tp->highest_sack

Latest commit 853697504de0 ("tcp: Fix highest_sack and highest_sack_seq")
apparently allowed syzbot to trigger various crashes in TCP stack [1]

I believe this commit only made things easier for syzbot to find
its way into triggering use-after-frees. But really the bugs
could lead to bad TCP behavior or even plain crashes even for
non malicious peers.

I have audited all calls to tcp_rtx_queue_unlink() and
tcp_rtx_queue_unlink_and_free() and made sure tp->highest_sack would be updated
if we are removing from rtx queue the skb that tp->highest_sack points to.

These updates were missing in three locations :

1) tcp_clean_rtx_queue() [This one seems quite serious,
                          I have no idea why this was not caught earlier]

2) tcp_rtx_queue_purge() [Probably not a big deal for normal operations]

3) tcp_send_synack()     [Probably not a big deal for normal operations]

[1]
BUG: KASAN: use-after-free in tcp_highest_sack_seq include/net/tcp.h:1864 [inline]
BUG: KASAN: use-after-free in tcp_highest_sack_seq include/net/tcp.h:1856 [inline]
BUG: KASAN: use-after-free in tcp_check_sack_reordering+0x33c/0x3a0 net/ipv4/tcp_input.c:891
Read of size 4 at addr ffff8880a488d068 by task ksoftirqd/1/16

CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x197/0x210 lib/dump_stack.c:118
 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
 __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506
 kasan_report+0x12/0x20 mm/kasan/common.c:639
 __asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:134
 tcp_highest_sack_seq include/net/tcp.h:1864 [inline]
 tcp_highest_sack_seq include/net/tcp.h:1856 [inline]
 tcp_check_sack_reordering+0x33c/0x3a0 net/ipv4/tcp_input.c:891
 tcp_try_undo_partial net/ipv4/tcp_input.c:2730 [inline]
 tcp_fastretrans_alert+0xf74/0x23f0 net/ipv4/tcp_input.c:2847
 tcp_ack+0x2577/0x5bf0 net/ipv4/tcp_input.c:3710
 tcp_rcv_established+0x6dd/0x1e90 net/ipv4/tcp_input.c:5706
 tcp_v4_do_rcv+0x619/0x8d0 net/ipv4/tcp_ipv4.c:1619
 tcp_v4_rcv+0x307f/0x3b40 net/ipv4/tcp_ipv4.c:2001
 ip_protocol_deliver_rcu+0x5a/0x880 net/ipv4/ip_input.c:204
 ip_local_deliver_finish+0x23b/0x380 net/ipv4/ip_input.c:231
 NF_HOOK include/linux/netfilter.h:307 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_local_deliver+0x1e9/0x520 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:442 [inline]
 ip_rcv_finish+0x1db/0x2f0 net/ipv4/ip_input.c:428
 NF_HOOK include/linux/netfilter.h:307 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ip_rcv+0xe8/0x3f0 net/ipv4/ip_input.c:538
 __netif_receive_skb_one_core+0x113/0x1a0 net/core/dev.c:5148
 __netif_receive_skb+0x2c/0x1d0 net/core/dev.c:5262
 process_backlog+0x206/0x750 net/core/dev.c:6093
 napi_poll net/core/dev.c:6530 [inline]
 net_rx_action+0x508/0x1120 net/core/dev.c:6598
 __do_softirq+0x262/0x98c kernel/softirq.c:292
 run_ksoftirqd kernel/softirq.c:603 [inline]
 run_ksoftirqd+0x8e/0x110 kernel/softirq.c:595
 smpboot_thread_fn+0x6a3/0xa40 kernel/smpboot.c:165
 kthread+0x361/0x430 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

Allocated by task 10091:
 save_stack+0x23/0x90 mm/kasan/common.c:72
 set_track mm/kasan/common.c:80 [inline]
 __kasan_kmalloc mm/kasan/common.c:513 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:486
 kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:521
 slab_post_alloc_hook mm/slab.h:584 [inline]
 slab_alloc_node mm/slab.c:3263 [inline]
 kmem_cache_alloc_node+0x138/0x740 mm/slab.c:3575
 __alloc_skb+0xd5/0x5e0 net/core/skbuff.c:198
 alloc_skb_fclone include/linux/skbuff.h:1099 [inline]
 sk_stream_alloc_skb net/ipv4/tcp.c:875 [inline]
 sk_stream_alloc_skb+0x113/0xc90 net/ipv4/tcp.c:852
 tcp_sendmsg_locked+0xcf9/0x3470 net/ipv4/tcp.c:1282
 tcp_sendmsg+0x30/0x50 net/ipv4/tcp.c:1432
 inet_sendmsg+0x9e/0xe0 net/ipv4/af_inet.c:807
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg+0xd7/0x130 net/socket.c:672
 __sys_sendto+0x262/0x380 net/socket.c:1998
 __do_sys_sendto net/socket.c:2010 [inline]
 __se_sys_sendto net/socket.c:2006 [inline]
 __x64_sys_sendto+0xe1/0x1a0 net/socket.c:2006
 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 10095:
 save_stack+0x23/0x90 mm/kasan/common.c:72
 set_track mm/kasan/common.c:80 [inline]
 kasan_set_free_info mm/kasan/common.c:335 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:474
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:483
 __cache_free mm/slab.c:3426 [inline]
 kmem_cache_free+0x86/0x320 mm/slab.c:3694
 kfree_skbmem+0x178/0x1c0 net/core/skbuff.c:645
 __kfree_skb+0x1e/0x30 net/core/skbuff.c:681
 sk_eat_skb include/net/sock.h:2453 [inline]
 tcp_recvmsg+0x1252/0x2930 net/ipv4/tcp.c:2166
 inet_recvmsg+0x136/0x610 net/ipv4/af_inet.c:838
 sock_recvmsg_nosec net/socket.c:886 [inline]
 sock_recvmsg net/socket.c:904 [inline]
 sock_recvmsg+0xce/0x110 net/socket.c:900
 __sys_recvfrom+0x1ff/0x350 net/socket.c:2055
 __do_sys_recvfrom net/socket.c:2073 [inline]
 __se_sys_recvfrom net/socket.c:2069 [inline]
 __x64_sys_recvfrom+0xe1/0x1a0 net/socket.c:2069
 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8880a488d040
 which belongs to the cache skbuff_fclone_cache of size 456
The buggy address is located 40 bytes inside of
 456-byte region [ffff8880a488d040ffff8880a488d208)
The buggy address belongs to the page:
page:ffffea0002922340 refcount:1 mapcount:0 mapping:ffff88821b057000 index:0x0
raw: 00fffe0000000200 ffffea00022a5788 ffffea0002624a48 ffff88821b057000
raw: 0000000000000000 ffff8880a488d040 0000000100000006 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8880a488cf00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8880a488cf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8880a488d000: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                          ^
 ffff8880a488d080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880a488d100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 853697504de0 ("tcp: Fix highest_sack and highest_sack_seq")
Fixes: 50895b9de1d3 ("tcp: highest_sack fix")
Fixes: 737ff314563c ("tcp: use sequence distance to detect reordering")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cambda Zhu <cambda@linux.alibaba.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/rose: fix spelling mistake "to" -> "too"
Colin Ian King [Thu, 23 Jan 2020 09:27:30 +0000 (09:27 +0000)]
net/rose: fix spelling mistake "to" -> "too"

There is a spelling mistake in a printk message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agocaif_usb: fix spelling mistake "to" -> "too"
Colin Ian King [Thu, 23 Jan 2020 00:45:55 +0000 (00:45 +0000)]
caif_usb: fix spelling mistake "to" -> "too"

There is a spelling mistake in a pr_warn message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipvs: fix spelling mistake "to" -> "too"
Colin Ian King [Thu, 23 Jan 2020 00:43:28 +0000 (00:43 +0000)]
ipvs: fix spelling mistake "to" -> "too"

There is a spelling mistake in a IP_VS_ERR_RL message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoi40e: fix spelling mistake "to" -> "too"
Colin Ian King [Thu, 23 Jan 2020 00:14:29 +0000 (00:14 +0000)]
i40e: fix spelling mistake "to" -> "too"

There is a spelling mistake in a hw_dbg message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge tag 'mmc-v5.5-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Fri, 24 Jan 2020 00:02:00 +0000 (16:02 -0800)]
Merge tag 'mmc-v5.5-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
 "A couple of MMC host fixes:

   - sdhci: Fix minimum clock rate for v3 controllers

   - sdhci-tegra: Fix SDR50 tuning override

   - sdhci_am654: Fixup tuning issues and support for CQHCI

   - sdhci_am654: Remove wrong write protect flag"

* tag 'mmc-v5.5-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci: fix minimum clock rate for v3 controller
  mmc: tegra: fix SDR50 tuning override
  mmc: sdhci_am654: Fix Command Queuing in AM65x
  mmc: sdhci_am654: Reset Command and Data line after tuning
  mmc: sdhci_am654: Remove Inverted Write Protect flag

4 years agoMerge tag 'amd-drm-fixes-5.5-2020-01-23' of git://people.freedesktop.org/~agd5f/linux...
Dave Airlie [Thu, 23 Jan 2020 22:58:12 +0000 (08:58 +1000)]
Merge tag 'amd-drm-fixes-5.5-2020-01-23' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

amd-drm-fixes-5.5-2020-01-23:

amdgpu:
- remove the experimental flag from renoir

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200123191424.3849-1-alexander.deucher@amd.com
4 years agoMerge tag 'drm-intel-fixes-2020-01-23' of git://anongit.freedesktop.org/drm/drm-intel...
Dave Airlie [Thu, 23 Jan 2020 22:57:37 +0000 (08:57 +1000)]
Merge tag 'drm-intel-fixes-2020-01-23' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Avoid overflow with huge userptr objects
- uAPI fix to correctly handle negative values in
  engine->uabi_class/instance (cc: stable)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200123135045.GA12584@jlahtine-desk.ger.corp.intel.com
4 years agonet_sched: fix datalen for ematch
Cong Wang [Wed, 22 Jan 2020 23:42:02 +0000 (15:42 -0800)]
net_sched: fix datalen for ematch

syzbot reported an out-of-bound access in em_nbyte. As initially
analyzed by Eric, this is because em_nbyte sets its own em->datalen
in em_nbyte_change() other than the one specified by user, but this
value gets overwritten later by its caller tcf_em_validate().
We should leave em->datalen untouched to respect their choices.

I audit all the in-tree ematch users, all of those implement
->change() set em->datalen, so we can just avoid setting it twice
in this case.

Reported-and-tested-by: syzbot+5af9a90dad568aa9f611@syzkaller.appspotmail.com
Reported-by: syzbot+2f07903a5b05e7f36410@syzkaller.appspotmail.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'Fixes-for-SONIC-ethernet-driver'
David S. Miller [Thu, 23 Jan 2020 20:24:37 +0000 (21:24 +0100)]
Merge branch 'Fixes-for-SONIC-ethernet-driver'

Finn Thain says:

====================
Fixes for SONIC ethernet driver

Various SONIC driver problems have become apparent over the years,
including tx watchdog timeouts, lost packets and duplicated packets.

The problems are mostly caused by bugs in buffer handling, locking and
(re-)initialization code.

This patch series resolves these problems.

This series has been tested on National Semiconductor hardware (macsonic),
qemu-system-m68k (macsonic) and qemu-system-mips64el (jazzsonic).

The emulated dp8393x device used in QEMU also has bugs.
I have fixed the bugs that I know of in a series of patches at,
https://github.com/fthain/qemu/commits/sonic

Changed since v1:
 - Minor revisions as described in commit logs.
 - Deferred net-next patches.
Changed since v2:
 - Minor revisions as described in commit logs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Prevent tx watchdog timeout
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Prevent tx watchdog timeout

Section 5.5.3.2 of the datasheet says,

    If FIFO Underrun, Byte Count Mismatch, Excessive Collision, or
    Excessive Deferral (if enabled) errors occur, transmission ceases.

In this situation, the chip asserts a TXER interrupt rather than TXDN.
But the handler for the TXDN is the only way that the transmit queue
gets restarted. Hence, an aborted transmission can result in a watchdog
timeout.

This problem can be reproduced on congested link, as that can result in
excessive transmitter collisions. Another way to reproduce this is with
a FIFO Underrun, which may be caused by DMA latency.

In event of a TXER interrupt, prevent a watchdog timeout by restarting
transmission.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Fix CAM initialization
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Fix CAM initialization

Section 4.3.1 of the datasheet says,

    This bit [TXP] must not be set if a Load CAM operation is in
    progress (LCAM is set). The SONIC will lock up if both bits are
    set simultaneously.

Testing has shown that the driver sometimes attempts to set LCAM
while TXP is set. Avoid this by waiting for command completion
before and after giving the LCAM command.

After issuing the Load CAM command, poll for !SONIC_CR_LCAM rather than
SONIC_INT_LCD, because the SONIC_CR_TXP bit can't be used until
!SONIC_CR_LCAM.

When in reset mode, take the opportunity to reset the CAM Enable
register.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Fix command register usage
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Fix command register usage

There are several issues relating to command register usage during
chip initialization.

Firstly, the SONIC sometimes comes out of software reset with the
Start Timer bit set. This gets logged as,

    macsonic macsonic eth0: sonic_init: status=24, i=101

Avoid this by giving the Stop Timer command earlier than later.

Secondly, the loop that waits for the Read RRA command to complete has
the break condition inverted. That's why the for loop iterates until
its termination condition. Call the helper for this instead.

Finally, give the Receiver Enable command after clearing interrupts,
not before, to avoid the possibility of losing an interrupt.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Quiesce SONIC before re-initializing descriptor memory
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Quiesce SONIC before re-initializing descriptor memory

Make sure the SONIC's DMA engine is idle before altering the transmit
and receive descriptors. Add a helper for this as it will be needed
again.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Fix receive buffer replenishment
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Fix receive buffer replenishment

As soon as the driver is finished with a receive buffer it allocs a new
one and overwrites the corresponding RRA entry with a new buffer pointer.

Problem is, the buffer pointer is split across two word-sized registers.
It can't be updated in one atomic store. So this operation races with the
chip while it stores received packets and advances its RRP register.
This could result in memory corruption by a DMA write.

Avoid this problem by adding buffers only at the location given by the
RWP register, in accordance with the National Semiconductor datasheet.

Re-factor this code into separate functions to calculate a RRA pointer
and to update the RWP.

Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Improve receive descriptor status flag check
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Improve receive descriptor status flag check

After sonic_tx_timeout() calls sonic_init(), it can happen that
sonic_rx() will subsequently encounter a receive descriptor with no
flags set. Remove the comment that says that this can't happen.

When giving a receive descriptor to the SONIC, clear the descriptor
status field. That way, any rx descriptor with flags set can only be
a newly received packet.

Don't process a descriptor without the LPKT bit set. The buffer is
still in use by the SONIC.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Avoid needless receive descriptor EOL flag updates
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Avoid needless receive descriptor EOL flag updates

The while loop in sonic_rx() traverses the rx descriptor ring. It stops
when it reaches a descriptor that the SONIC has not used. Each iteration
advances the EOL flag so the SONIC can keep using more descriptors.
Therefore, the while loop has no definite termination condition.

The algorithm described in the National Semiconductor literature is quite
different. It consumes descriptors up to the one with its EOL flag set
(which will also have its "in use" flag set). All freed descriptors are
then returned to the ring at once, by adjusting the EOL flags (and link
pointers).

Adopt the algorithm from datasheet as it's simpler, terminates quickly
and avoids a lot of pointless descriptor EOL flag changes.

Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Fix receive buffer handling
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Fix receive buffer handling

The SONIC can sometimes advance its rx buffer pointer (RRP register)
without advancing its rx descriptor pointer (CRDA register). As a result
the index of the current rx descriptor may not equal that of the current
rx buffer. The driver mistakenly assumes that they are always equal.
This assumption leads to incorrect packet lengths and possible packet
duplication. Avoid this by calling a new function to locate the buffer
corresponding to a given descriptor.

Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Fix interface error stats collection
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Fix interface error stats collection

The tx_aborted_errors statistic should count packets flagged with EXD,
EXC, FU, or BCM bits because those bits denote an aborted transmission.
That corresponds to the bitmask 0x0446, not 0x0642. Use macros for these
constants to avoid mistakes. Better to leave out FIFO Underruns (FU) as
there's a separate counter for that purpose.

Don't lump all these errors in with the general tx_errors counter as
that's used for tx timeout events.

On the rx side, don't count RDE and RBAE interrupts as dropped packets.
These interrupts don't indicate a lost packet, just a lack of resources.
When a lack of resources results in a lost packet, this gets reported
in the rx_missed_errors counter (along with RFO events).

Don't double-count rx_frame_errors and rx_crc_errors.

Don't use the general rx_errors counter for events that already have
special counters.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Use MMIO accessors
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Use MMIO accessors

The driver accesses descriptor memory which is simultaneously accessed by
the chip, so the compiler must not be allowed to re-order CPU accesses.
sonic_buf_get() used 'volatile' to prevent that. sonic_buf_put() should
have done so too but was overlooked.

Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Clear interrupt flags immediately
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Clear interrupt flags immediately

The chip can change a packet's descriptor status flags at any time.
However, an active interrupt flag gets cleared rather late. This
allows a race condition that could theoretically lose an interrupt.
Fix this by clearing asserted interrupt flags immediately.

Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/sonic: Add mutual exclusion for accessing shared state
Finn Thain [Wed, 22 Jan 2020 22:07:26 +0000 (09:07 +1100)]
net/sonic: Add mutual exclusion for accessing shared state

The netif_stop_queue() call in sonic_send_packet() races with the
netif_wake_queue() call in sonic_interrupt(). This causes issues
like "NETDEV WATCHDOG: eth0 (macsonic): transmit queue 0 timed out".
Fix this by disabling interrupts when accessing tx_skb[] and next_tx.
Update a comment to clarify the synchronization properties.

Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>