]> asedeno.scripts.mit.edu Git - linux.git/log
linux.git
5 years agolib/ubsan: remove null-pointer checks
Andrey Ryabinin [Sat, 11 Aug 2018 00:23:03 +0000 (17:23 -0700)]
lib/ubsan: remove null-pointer checks

With gcc-8 fsanitize=null become very noisy.  GCC started to complain
about things like &a->b, where 'a' is NULL pointer.  There is no NULL
dereference, we just calculate address to struct member.  It's
technically undefined behavior so UBSAN is correct to report it.  But as
long as there is no real NULL-dereference, I think, we should be fine.

-fno-delete-null-pointer-checks compiler flag should protect us from any
consequences.  So let's just no use -fsanitize=null as it's not useful
for us.  If there is a real NULL-deref we will see crash.  Even if
userspace mapped something at NULL (root can do this), with things like
SMAP should catch the issue.

Link: http://lkml.kernel.org/r/20180802153209.813-1-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoMAINTAINERS: GDB: update e-mail address
Kieran Bingham [Sat, 11 Aug 2018 00:23:00 +0000 (17:23 -0700)]
MAINTAINERS: GDB: update e-mail address

This entry was created with my personal e-mail address.  Update this entry
to my open-source kernel.org account.

Link: http://lkml.kernel.org/r/20180806143904.4716-4-kieran.bingham@ideasonboard.com
Signed-off-by: Kieran Bingham <kbingham@kernel.org>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agosmb3: create smb3 equivalent alias for cifs pseudo-xattrs
Steve French [Fri, 10 Aug 2018 23:46:58 +0000 (18:46 -0500)]
smb3: create smb3 equivalent alias for cifs pseudo-xattrs

We really, really don't want to be encouraging people to use
cifs (the dialect) since it is insecure, so to avoid confusion
we want to move them to names which include 'smb3' instead of
'cifs' - so this simply creates an alias for the pseudo-xattrs

e.g. can now do:
getfattr -n user.smb3.creationtime /mnt1/file
and
getfattr -n user.smb3.dosattrib /mnt1/file
and
getfattr -n system.smb3_acl /mnt1/file

instead of forcing you to use the string 'cifs' in
these (e.g. getfattr -n system.cifs_acl /mnt1/file)

Signed-off-by: Steve French <stfrench@microsoft.com>
5 years agopinctrl: nomadik: silence uninitialized variable warning
Dan Carpenter [Wed, 8 Aug 2018 12:04:49 +0000 (15:04 +0300)]
pinctrl: nomadik: silence uninitialized variable warning

This is harmless, but "val" isn't necessarily initialized if
abx500_get_register_interruptible() fails.  I've re-arranged the code to
just return an error code in that situation.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
5 years agopinctrl: axp209: Fix NULL pointer dereference after allocation
Anton Vasilyev [Mon, 6 Aug 2018 16:06:35 +0000 (19:06 +0300)]
pinctrl: axp209: Fix NULL pointer dereference after allocation

There is no check that allocation in axp20x_funcs_groups_from_mask
is successful.
The patch adds corresponding check and return values.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru>
Acked-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
5 years agopinctrl: samsung: Remove duplicated "wakeup" in printk
Krzysztof Kozlowski [Mon, 6 Aug 2018 16:33:40 +0000 (18:33 +0200)]
pinctrl: samsung: Remove duplicated "wakeup" in printk

Double "wakeup" appears in printed message.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
5 years agox86/mm/pti: Move user W+X check into pti_finalize()
Joerg Roedel [Wed, 8 Aug 2018 11:16:40 +0000 (13:16 +0200)]
x86/mm/pti: Move user W+X check into pti_finalize()

The user page-table gets the updated kernel mappings in pti_finalize(),
which runs after the RO+X permissions got applied to the kernel page-table
in mark_readonly().

But with CONFIG_DEBUG_WX enabled, the user page-table is already checked in
mark_readonly() for insecure mappings.  This causes false-positive
warnings, because the user page-table did not get the updated mappings yet.

Move the W+X check for the user page-table into pti_finalize() after it
updated all required mappings.

[ tglx: Folded !NX supported fix ]

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1533727000-9172-1-git-send-email-joro@8bytes.org
5 years agoxfs: repair the AGI
Darrick J. Wong [Fri, 10 Aug 2018 05:43:04 +0000 (22:43 -0700)]
xfs: repair the AGI

Rebuild the AGI header items with some help from the rmapbt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
5 years agoxfs: repair the AGFL
Darrick J. Wong [Fri, 10 Aug 2018 05:43:02 +0000 (22:43 -0700)]
xfs: repair the AGFL

Repair the AGFL from the rmap data.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
5 years agoxfs: repair the AGF
Darrick J. Wong [Fri, 10 Aug 2018 05:42:53 +0000 (22:42 -0700)]
xfs: repair the AGF

Regenerate the AGF from the rmap data.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
5 years agobcache: fix error setting writeback_rate through sysfs interface
Coly Li [Fri, 10 Aug 2018 15:45:50 +0000 (23:45 +0800)]
bcache: fix error setting writeback_rate through sysfs interface

Commit ea8c5356d390 ("bcache: set max writeback rate when I/O request
is idle") changes struct bch_ratelimit member rate from uint32_t to
atomic_long_t and uses atomic_long_set() in drivers/md/bcache/sysfs.c
to set new writeback rate, after the input is converted from memory
buf to long int by sysfs_strtoul_clamp().

The above change has a problem because there is an implicit return
inside sysfs_strtoul_clamp() so the following atomic_long_set()
won't be called. This error is detected by 0day system with following
snipped smatch warnings:

drivers/md/bcache/sysfs.c:271 __cached_dev_store() error: uninitialized
symbol 'v'.
270  sysfs_strtoul_clamp(writeback_rate, v, 1, INT_MAX);
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@271 atomic_long_set(&dc->writeback_rate.rate, v);

This patch fixes the above error by using strtoul_safe_clamp() to
convert the input buffer into a long int type result.

Fixes: ea8c5356d390 ("bcache: set max writeback rate when I/O request is idle")
Cc: Kai Krakow <kai@kaishome.de>
Cc: Stefan Priebe <s.priebe@profihost.ag>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Linus Torvalds [Fri, 10 Aug 2018 17:04:56 +0000 (10:04 -0700)]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fix from Wolfram Sang:
 "A single driver bugfix for I2C.

  The bug was found by systematically stress testing the driver, so I am
  confident to merge it that late in the cycle although it is probably
  unusually large"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: xlp9xx: Fix case where SSIF read transaction completes early

5 years agosmb3: allow previous versions to be mounted with snapshot= mount parm
Steve French [Fri, 10 Aug 2018 07:25:06 +0000 (02:25 -0500)]
smb3: allow previous versions to be mounted with snapshot= mount parm

mounting with the "snapshots=" mount parm allows a read-only
view of a previous version of a file system (see MS-SMB2
and "timewarp" tokens, section 2.2.13.2.6) based on the timestamp
passed in on the snapshots mount parm.

Add processing to optionally send this create context.

Example output:

/mnt1 is mounted with "snapshots=..." and will see an earlier
version of the directory, with three fewer files than /mnt2
the current version of the directory.

root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# cat /proc/mounts | grep cifs
//172.22.149.186/public /mnt1 cifs
ro,relatime,vers=default,cache=strict,username=smfrench,uid=0,noforceuid,gid=0,noforcegid,addr=172.22.149.186,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,snapshot=131748608570000000,actimeo=1

//172.22.149.186/public /mnt2 cifs
rw,relatime,vers=default,cache=strict,username=smfrench,uid=0,noforceuid,gid=0,noforcegid,addr=172.22.149.186,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1

root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# ls /mnt1
EmptyDir  newerdir
root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# ls /mnt1/newerdir

root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# ls /mnt2
EmptyDir  file  newerdir  newestdir  timestamp-trace.cap
root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# ls /mnt2/newerdir
new-file-not-in-snapshot

Snapshots are extremely useful for comparing previous versions of files or directories,
and recovering from data corruptions or mistakes.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
5 years agocifs: don't show domain= in mount output when domain is empty
Ronnie Sahlberg [Fri, 10 Aug 2018 01:31:10 +0000 (11:31 +1000)]
cifs: don't show domain= in mount output when domain is empty

Reported-by: Xiaoli Feng <xifeng@redhat.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
5 years agocifs: add missing support for ACLs in SMB 3.11
Ronnie Sahlberg [Fri, 10 Aug 2018 01:03:55 +0000 (11:03 +1000)]
cifs: add missing support for ACLs in SMB 3.11

We were missing the methods for get_acl and friends for the 3.11
dialect.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
5 years agoMerge branch 'spi-4.19' into spi-next
Mark Brown [Fri, 10 Aug 2018 16:51:52 +0000 (17:51 +0100)]
Merge branch 'spi-4.19' into spi-next

5 years agoMerge branch 'spi-4.18' into spi-linus
Mark Brown [Fri, 10 Aug 2018 16:51:50 +0000 (17:51 +0100)]
Merge branch 'spi-4.18' into spi-linus

5 years agoMerge branch 'regulator-4.19' into regulator-next
Mark Brown [Fri, 10 Aug 2018 16:31:24 +0000 (17:31 +0100)]
Merge branch 'regulator-4.19' into regulator-next

5 years agoMerge branch 'regulator-4.18' into regulator-linus
Mark Brown [Fri, 10 Aug 2018 16:31:22 +0000 (17:31 +0100)]
Merge branch 'regulator-4.18' into regulator-linus

5 years agoregulator: add QCOM RPMh regulator driver
David Collins [Sat, 14 Jul 2018 01:50:59 +0000 (18:50 -0700)]
regulator: add QCOM RPMh regulator driver

Add the QCOM RPMh regulator driver to manage PMIC regulators
which are controlled via RPMh on some Qualcomm Technologies, Inc.
SoCs.  RPMh is a hardware block which contains several
accelerators which are used to manage various hardware resources
that are shared between the processors of the SoC.  The final
hardware state of a regulator is determined within RPMh by
performing max aggregation of the requests made by all of the
processors.

Add support for PMIC regulator control via the voltage regulator
manager (VRM) and oscillator buffer (XOB) RPMh accelerators.
VRM supports manipulation of enable state, voltage, and mode.
XOB supports manipulation of enable state.

Signed-off-by: David Collins <collinsd@codeaurora.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
5 years agoregulator: dt-bindings: add QCOM RPMh regulator bindings
David Collins [Sat, 14 Jul 2018 01:50:58 +0000 (18:50 -0700)]
regulator: dt-bindings: add QCOM RPMh regulator bindings

Introduce bindings for RPMh regulator devices found on some
Qualcomm Technlogies, Inc. SoCs.  These devices allow a given
processor within the SoC to make PMIC regulator requests which
are aggregated within the RPMh hardware block along with requests
from other processors in the SoC to determine the final PMIC
regulator hardware state.

Signed-off-by: David Collins <collinsd@codeaurora.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
5 years agoMerge tag 'qcom-drivers-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git...
Mark Brown [Fri, 10 Aug 2018 16:29:43 +0000 (17:29 +0100)]
Merge tag 'qcom-drivers-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into regulator-4.19 for RPMH

Qualcomm ARM Based Driver Updates for v4.19

* Add Qualcomm LLCC driver
* Add Qualcomm RPMH controller
* Fix memleak in Qualcomm RMTFS
* Add dummy qcom_scm_assign_mem()
* Fix check for global partition in SMEM

5 years agohwmon: (adt7475) Change show functions to return error data correctly
Tokunori Ikegami [Wed, 8 Aug 2018 01:32:19 +0000 (10:32 +0900)]
hwmon: (adt7475) Change show functions to return error data correctly

Change update device function to return an error pointer if needed,
and report the error to user space.

Signed-off-by: Tokunori Ikegami <ikegami@allied-telesis.co.jp>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Chris Packham <chris.packham@alliedtelesis.co.nz>
[groeck: Clarified/updated description]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
5 years agohwmon: (adt7475) Change update functions to add error handling
Tokunori Ikegami [Wed, 8 Aug 2018 01:32:18 +0000 (10:32 +0900)]
hwmon: (adt7475) Change update functions to add error handling

I2C SMBus sometimes returns error codes.
In the error case, measurement values are updated incorrectly.
The sensor application then generates warning log messages and SNMP traps.
To prevent this, add error handling into the update functions.

Signed-off-by: Tokunori Ikegami <ikegami@allied-telesis.co.jp>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Chris Packham <chris.packham@alliedtelesis.co.nz>
[groeck: Update description]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
5 years agohwmon: (adt7475) Change valid parameter to bool type
Tokunori Ikegami [Wed, 8 Aug 2018 01:32:17 +0000 (10:32 +0900)]
hwmon: (adt7475) Change valid parameter to bool type

Currently the valid variable is of type char, but it is used as boolean.
So let's change it to bool.

Signed-off-by: Tokunori Ikegami <ikegami@allied-telesis.co.jp>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Chris Packham <chris.packham@alliedtelesis.co.nz>
[groeck: Update description]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
5 years agohwmon: (adt7475) Split device update function to measure and limits
Tokunori Ikegami [Wed, 8 Aug 2018 01:32:16 +0000 (10:32 +0900)]
hwmon: (adt7475) Split device update function to measure and limits

The update function reads both measurement and limit values.
Those parts can be split so split them for a maintainability.

Signed-off-by: Tokunori Ikegami <ikegami@allied-telesis.co.jp>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Chris Packham <chris.packham@alliedtelesis.co.nz>
[groeck: Clarify description]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
5 years agospi: davinci: fix a NULL pointer dereference
Bartosz Golaszewski [Fri, 10 Aug 2018 09:13:52 +0000 (11:13 +0200)]
spi: davinci: fix a NULL pointer dereference

On non-OF systems spi->controlled_data may be NULL. This causes a NULL
pointer derefence on dm365-evm.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
5 years agox86/microcode: Allow late microcode loading with SMT disabled
Josh Poimboeuf [Fri, 10 Aug 2018 07:31:10 +0000 (08:31 +0100)]
x86/microcode: Allow late microcode loading with SMT disabled

The kernel unnecessarily prevents late microcode loading when SMT is
disabled.  It should be safe to allow it if all the primary threads are
online.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
David S. Miller [Fri, 10 Aug 2018 06:18:29 +0000 (23:18 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2018-08-10

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix cpumap and devmap on teardown as they're under RCU context
   and won't have same assumption as running under NAPI protection,
   from Jesper.

2) Fix various sockmap bugs in bpf_tcp_sendmsg() code, e.g. we had
   a bug where socket error was not propagated correctly, from Daniel.

3) Fix incompatible libbpf header license for BTF code and match it
   before it gets officially released with the rest of libbpf which
   is LGPL-2.1, from Martin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosmb3: enumerating snapshots was leaving part of the data off end
Steve French [Thu, 9 Aug 2018 19:33:12 +0000 (14:33 -0500)]
smb3: enumerating snapshots was leaving part of the data off end

When enumerating snapshots, the last few bytes of the final
snapshot could be left off since we were miscalculating the
length returned (leaving off the sizeof struct SRV_SNAPSHOT_ARRAY)
See MS-SMB2 section 2.2.32.2. In addition fixup the length used
to allow smaller buffer to be passed in, in order to allow
returning the size of the whole snapshot array more easily.

Sample userspace output with a kernel patched with this
(mounted to a Windows volume with two snapshots).
Before this patch, the second snapshot would be missing a
few bytes at the end.

~/cifs-2.6# ~/enum-snapshots /mnt/file
press enter to issue the ioctl to retrieve snapshot information ...

size of snapshot array = 102
Num snapshots: 2 Num returned: 2 Array Size: 102

Snapshot 0:@GMT-2018.06.30-19.34.17
Snapshot 1:@GMT-2018.06.30-19.33.37

CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
5 years agocifs: update smb2_queryfs() to use compounding
Ronnie Sahlberg [Wed, 8 Aug 2018 05:07:49 +0000 (15:07 +1000)]
cifs: update smb2_queryfs() to use compounding

Change smb2_queryfs() to use a Create/QueryInfo/Close compound request.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
5 years agocifs: update receive_encrypted_standard to handle compounded responses
Ronnie Sahlberg [Wed, 8 Aug 2018 05:07:45 +0000 (15:07 +1000)]
cifs: update receive_encrypted_standard to handle compounded responses

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
5 years agomake sure that __dentry_kill() always invalidates d_seq, unhashed or not
Al Viro [Thu, 9 Aug 2018 14:15:54 +0000 (10:15 -0400)]
make sure that __dentry_kill() always invalidates d_seq, unhashed or not

RCU pathwalk relies upon the assumption that anything that changes
->d_inode of a dentry will invalidate its ->d_seq.  That's almost
true - the one exception is that the final dput() of already unhashed
dentry does *not* touch ->d_seq at all.  Unhashing does, though,
so for anything we'd found by RCU dcache lookup we are fine.
Unfortunately, we can *start* with an unhashed dentry or jump into
it.

We could try and be careful in the (few) places where that could
happen.  Or we could just make the final dput() invalidate the damn
thing, unhashed or not.  The latter is much simpler and easier to
backport, so let's do it that way.

Reported-by: "Dae R. Jeong" <threeearcat@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
5 years agofix __legitimize_mnt()/mntput() race
Al Viro [Thu, 9 Aug 2018 21:51:32 +0000 (17:51 -0400)]
fix __legitimize_mnt()/mntput() race

__legitimize_mnt() has two problems - one is that in case of success
the check of mount_lock is not ordered wrt preceding increment of
refcount, making it possible to have successful __legitimize_mnt()
on one CPU just before the otherwise final mntpu() on another,
with __legitimize_mnt() not seeing mntput() taking the lock and
mntput() not seeing the increment done by __legitimize_mnt().
Solved by a pair of barriers.

Another is that failure of __legitimize_mnt() on the second
read_seqretry() leaves us with reference that'll need to be
dropped by caller; however, if that races with final mntput()
we can end up with caller dropping rcu_read_lock() and doing
mntput() to release that reference - with the first mntput()
having freed the damn thing just as rcu_read_lock() had been
dropped.  Solution: in "do mntput() yourself" failure case
grab mount_lock, check if MNT_DOOMED has been set by racing
final mntput() that has missed our increment and if it has -
undo the increment and treat that as "failure, caller doesn't
need to drop anything" case.

It's not easy to hit - the final mntput() has to come right
after the first read_seqretry() in __legitimize_mnt() *and*
manage to miss the increment done by __legitimize_mnt() before
the second read_seqretry() in there.  The things that are almost
impossible to hit on bare hardware are not impossible on SMP
KVM, though...

Reported-by: Oleg Nesterov <oleg@redhat.com>
Fixes: 48a066e72d97 ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
5 years agoMIPS: Remove remnants of UASM_ISA
Paul Burton [Thu, 9 Aug 2018 21:43:42 +0000 (14:43 -0700)]
MIPS: Remove remnants of UASM_ISA

Commit 33679a50370d ("MIPS: uasm: Remove needless ISA abstraction")
removed use of the MIPS_ISA preprocessor macro, but left a couple of
unused definitions of it behind.

Remove the dead code.

Signed-off-by: Paul Burton <paul.burton@mips.com>
5 years agofix mntput/mntput race
Al Viro [Thu, 9 Aug 2018 21:21:17 +0000 (17:21 -0400)]
fix mntput/mntput race

mntput_no_expire() does the calculation of total refcount under mount_lock;
unfortunately, the decrement (as well as all increments) are done outside
of it, leading to false positives in the "are we dropping the last reference"
test.  Consider the following situation:
* mnt is a lazy-umounted mount, kept alive by two opened files.  One
of those files gets closed.  Total refcount of mnt is 2.  On CPU 42
mntput(mnt) (called from __fput()) drops one reference, decrementing component
* After it has looked at component #0, the process on CPU 0 does
mntget(), incrementing component #0, gets preempted and gets to run again -
on CPU 69.  There it does mntput(), which drops the reference (component #69)
and proceeds to spin on mount_lock.
* On CPU 42 our first mntput() finishes counting.  It observes the
decrement of component #69, but not the increment of component #0.  As the
result, the total it gets is not 1 as it should've been - it's 0.  At which
point we decide that vfsmount needs to be killed and proceed to free it and
shut the filesystem down.  However, there's still another opened file
on that filesystem, with reference to (now freed) vfsmount, etc. and we are
screwed.

It's not a wide race, but it can be reproduced with artificial slowdown of
the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.

Fix consists of moving the refcount decrement under mount_lock; the tricky
part is that we want (and can) keep the fast case (i.e. mount that still
has non-NULL ->mnt_ns) entirely out of mount_lock.  All places that zero
mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
before that mntput().  IOW, if mntput() observes (under rcu_read_lock())
a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
be dropped.

Reported-by: Jann Horn <jannh@google.com>
Tested-by: Jann Horn <jannh@google.com>
Fixes: 48a066e72d97 ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
5 years agonull_blk: add lock drop/acquire annotation
Jens Axboe [Thu, 9 Aug 2018 20:22:41 +0000 (14:22 -0600)]
null_blk: add lock drop/acquire annotation

sparse complains:

drivers/block/null_blk_main.c:816:24: sparse: context imbalance in 'null_insert_page' - unexpected unlock

Fix it by adding the necessary annotations to the function.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agohwmon: k10temp: Support Threadripper 2920X, 2970WX; simplify offset table
Guenter Roeck [Thu, 9 Aug 2018 18:50:46 +0000 (11:50 -0700)]
hwmon: k10temp: Support Threadripper 2920X, 2970WX; simplify offset table

All announced Threadripper 29xx models have a temperature offset of
27 degrees C. Simplify temperature offset table to match all 29xx
Threadripper models with a single entry. Also simplify the table to match
all 19xx Threadripper models with a single entry. This effectively drops
entries for Threadripper 1910/1920/1950 which never saw the light of day.

Cc: Michael Larabel <Michael@phoronix.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
5 years agoMerge branch 'bpf-fix-cpu-and-devmap-teardown'
Daniel Borkmann [Thu, 9 Aug 2018 19:50:45 +0000 (21:50 +0200)]
Merge branch 'bpf-fix-cpu-and-devmap-teardown'

Jesper Dangaard Brouer says:

====================
Removing entries from cpumap and devmap, goes through a number of
syncronization steps to make sure no new xdp_frames can be enqueued.
But there is a small chance, that xdp_frames remains which have not
been flushed/processed yet.  Flushing these during teardown, happens
from RCU context and not as usual under RX NAPI context.

The optimization introduced in commt 389ab7f01af9 ("xdp: introduce
xdp_return_frame_rx_napi"), missed that the flush operation can also
be called from RCU context.  Thus, we cannot always use the
xdp_return_frame_rx_napi call, which take advantage of the protection
provided by XDP RX running under NAPI protection.

The samples/bpf xdp_redirect_cpu have a --stress-mode, that is
adjusted to easier reproduce (verified by Red Hat QA).
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoxdp: fix bug in devmap teardown code path
Jesper Dangaard Brouer [Wed, 8 Aug 2018 21:00:45 +0000 (23:00 +0200)]
xdp: fix bug in devmap teardown code path

Like cpumap teardown, the devmap teardown code also flush remaining
xdp_frames, via bq_xmit_all() in case map entry is removed.  The code
can call xdp_return_frame_rx_napi, from the the wrong context, in-case
ndo_xdp_xmit() fails.

Fixes: 389ab7f01af9 ("xdp: introduce xdp_return_frame_rx_napi")
Fixes: 735fc4054b3a ("xdp: change ndo_xdp_xmit API to support bulking")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agosamples/bpf: xdp_redirect_cpu adjustment to reproduce teardown race easier
Jesper Dangaard Brouer [Wed, 8 Aug 2018 21:00:39 +0000 (23:00 +0200)]
samples/bpf: xdp_redirect_cpu adjustment to reproduce teardown race easier

The teardown race in cpumap is really hard to reproduce.  These changes
makes it easier to reproduce, for QA.

The --stress-mode now have a case of a very small queue size of 8, that helps
to trigger teardown flush to encounter a full queue, which results in calling
xdp_return_frame API, in a non-NAPI protect context.

Also increase MAX_CPUS, as my QA department have larger machines than me.

Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoxdp: fix bug in cpumap teardown code path
Jesper Dangaard Brouer [Wed, 8 Aug 2018 21:00:34 +0000 (23:00 +0200)]
xdp: fix bug in cpumap teardown code path

When removing a cpumap entry, a number of syncronization steps happen.
Eventually the teardown code __cpu_map_entry_free is invoked from/via
call_rcu.

The teardown code __cpu_map_entry_free() flushes remaining xdp_frames,
by invoking bq_flush_to_queue, which calls xdp_return_frame_rx_napi().
The issues is that the teardown code is not running in the RX NAPI
code path.  Thus, it is not allowed to invoke the NAPI variant of
xdp_return_frame.

This bug was found and triggered by using the --stress-mode option to
the samples/bpf program xdp_redirect_cpu.  It is hard to trigger,
because the ptr_ring have to be full and cpumap bulk queue max
contains 8 packets, and a remote CPU is racing to empty the ptr_ring
queue.

Fixes: 389ab7f01af9 ("xdp: introduce xdp_return_frame_rx_napi")
Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoBlk-throttle: reduce tail io latency when iops limit is enforced
Liu Bo [Thu, 9 Aug 2018 17:47:02 +0000 (01:47 +0800)]
Blk-throttle: reduce tail io latency when iops limit is enforced

When an application's iops has exceeded its cgroup's iops limit, surely it
is throttled and kernel will set a timer for dispatching, thus IO latency
includes the delay.

However, the dispatch delay which is calculated by the limit and the
elapsed jiffies is suboptimal.  As the dispatch delay is only calculated
once the application's iops is (iops limit + 1), it doesn't need to wait
any longer than the remaining time of the current slice.

The difference can be proved by the following fio job and cgroup iops
setting,
-----
$ echo 4 > /mnt/config/nullb/disk1/mbps    # limit nullb's bandwidth to 4MB/s for testing.
$ echo "253:1 riops=100 rbps=max" > /sys/fs/cgroup/unified/cg1/io.max
$ cat r2.job
[global]
name=fio-rand-read
filename=/dev/nullb1
rw=randread
bs=4k
direct=1
numjobs=1
time_based=1
runtime=60
group_reporting=1

[file1]
size=4G
ioengine=libaio
iodepth=1
rate_iops=50000
norandommap=1
thinktime=4ms
-----

wo patch:
file1: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.7-66-gedfc
Starting 1 process

   read: IOPS=99, BW=400KiB/s (410kB/s)(23.4MiB/60001msec)
    slat (usec): min=10, max=336, avg=27.71, stdev=17.82
    clat (usec): min=2, max=28887, avg=5929.81, stdev=7374.29
     lat (usec): min=24, max=28901, avg=5958.73, stdev=7366.22
    clat percentiles (usec):
     |  1.00th=[    4],  5.00th=[    4], 10.00th=[    4], 20.00th=[    4],
     | 30.00th=[    4], 40.00th=[    4], 50.00th=[    6], 60.00th=[11731],
     | 70.00th=[11863], 80.00th=[11994], 90.00th=[12911], 95.00th=[22676],
     | 99.00th=[23725], 99.50th=[23987], 99.90th=[23987], 99.95th=[25035],
     | 99.99th=[28967]

w/ patch:
file1: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.7-66-gedfc
Starting 1 process

   read: IOPS=100, BW=400KiB/s (410kB/s)(23.4MiB/60005msec)
    slat (usec): min=10, max=155, avg=23.24, stdev=16.79
    clat (usec): min=2, max=12393, avg=5961.58, stdev=5959.25
     lat (usec): min=23, max=12412, avg=5985.91, stdev=5951.92
    clat percentiles (usec):
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    4], 20.00th=[    4],
     | 30.00th=[    4], 40.00th=[    5], 50.00th=[   47], 60.00th=[11863],
     | 70.00th=[11994], 80.00th=[11994], 90.00th=[11994], 95.00th=[11994],
     | 99.00th=[11994], 99.50th=[11994], 99.90th=[12125], 99.95th=[12125],
     | 99.99th=[12387]

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agox86/relocs: Add __end_rodata_aligned to S_REL
Joerg Roedel [Thu, 9 Aug 2018 09:44:49 +0000 (11:44 +0200)]
x86/relocs: Add __end_rodata_aligned to S_REL

This new symbol needs to be in the workaround-list for buggy
binutils, otherwise the build with gcc-4.6 fails.

Fixes: 39d668e04eda ('x86/mm/pti: Make pti_clone_kernel_text() compile on 32 bit')
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linux-Next Mailing List <linux-next@vger.kernel.org>
Link: https://lkml.kernel.org/r/20180809094449.ddmnrkz7qkvo3j2x@suse.de
5 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Thu, 9 Aug 2018 17:00:15 +0000 (10:00 -0700)]
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
 "This fixes a performance regression in arm64 NEON crypto as well as a
  crash in x86 aegis/morus on unsupported CPUs"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: x86/aegis,morus - Fix and simplify CPUID checks
  crypto: arm64 - revert NEON yield for fast AEAD implementations

5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Thu, 9 Aug 2018 16:57:13 +0000 (09:57 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) The real fix for the ipv6 route metric leak Sabrina was seeing, from
    Cong Wang.

 2) Fix syzbot triggers AF_PACKET v3 ring buffer insufficient room
    conditions, from Willem de Bruijn.

 3) vsock can reinitialize active work struct, fix from Cong Wang.

 4) RXRPC keepalive generator can wedge a cpu, fix from David Howells.

 5) Fix locking in AF_SMC ioctl, from Ursula Braun.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  dsa: slave: eee: Allow ports to use phylink
  net/smc: move sock lock in smc_ioctl()
  net/smc: allow sysctl rmem and wmem defaults for servers
  net/smc: no shutdown in state SMC_LISTEN
  net: aquantia: Fix IFF_ALLMULTI flag functionality
  rxrpc: Fix the keepalive generator [ver #2]
  net/mlx5e: Cleanup of dcbnl related fields
  net/mlx5e: Properly check if hairpin is possible between two functions
  vhost: reset metadata cache when initializing new IOTLB
  llc: use refcount_inc_not_zero() for llc_sap_find()
  dccp: fix undefined behavior with 'cwnd' shift in ccid2_cwnd_restart()
  tipc: fix an interrupt unsafe locking scenario
  vsock: split dwork to avoid reinitializations
  net: thunderx: check for failed allocation lmac->dmacs
  cxgb4: mk_act_open_req() buggers ->{local, peer}_ip on big-endian hosts
  packet: refine ring v3 block size test to hold one frame
  ip6_tunnel: use the right value for ipv4 min mtu check in ip6_tnl_xmit
  ipv6: fix double refcount of fib6_metrics

5 years agoblock: paride: pd: mark expected switch fall-throughs
Gustavo A. R. Silva [Thu, 9 Aug 2018 15:54:46 +0000 (10:54 -0500)]
block: paride: pd: mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1056543 ("Missing break in switch")
Addresses-Coverity-ID: 1056544 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoi2c: xlp9xx: Fix case where SSIF read transaction completes early
George Cherian [Thu, 9 Aug 2018 06:36:48 +0000 (23:36 -0700)]
i2c: xlp9xx: Fix case where SSIF read transaction completes early

During ipmi stress tests we see occasional failure of transactions
at the boot time. This happens in the case of a I2C_M_RECV_LEN
transactions, when the read transfer completes (with the initial
read length of 34) before the driver gets a chance to handle interrupts.

The current driver code expects at least 2 interrupts for I2C_M_RECV_LEN
transactions. The length is updated during the first interrupt, and  the
buffer contents are only copied during subsequent interrupts. In case of
just one interrupt, we will complete the transaction without copying
out the bytes from RX fifo.

Update the code to drain the RX fifo after the length update,
so that the transaction completes correctly in all cases.

Signed-off-by: George Cherian <george.cherian@cavium.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
5 years agoblock: Ensure that a request queue is dissociated from the cgroup controller
Bart Van Assche [Thu, 9 Aug 2018 14:53:38 +0000 (07:53 -0700)]
block: Ensure that a request queue is dissociated from the cgroup controller

Several block drivers call alloc_disk() followed by put_disk() if
something fails before device_add_disk() is called without calling
blk_cleanup_queue(). Make sure that also for this scenario a request
queue is dissociated from the cgroup controller. This patch avoids
that loading the parport_pc, paride and pf drivers triggers the
following kernel crash:

BUG: KASAN: null-ptr-deref in pi_init+0x42e/0x580 [paride]
Read of size 4 at addr 0000000000000008 by task modprobe/744
Call Trace:
dump_stack+0x9a/0xeb
kasan_report+0x139/0x350
pi_init+0x42e/0x580 [paride]
pf_init+0x2bb/0x1000 [pf]
do_one_initcall+0x8e/0x405
do_init_module+0xd9/0x2f2
load_module+0x3ab4/0x4700
SYSC_finit_module+0x176/0x1a0
do_syscall_64+0xee/0x2b0
entry_SYSCALL_64_after_hwframe+0x42/0xb7

Reported-by: Alexandru Moise <00moses.alexander00@gmail.com>
Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") # v4.17
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Tested-by: Alexandru Moise <00moses.alexander00@gmail.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Alexandru Moise <00moses.alexander00@gmail.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: Introduce blk_exit_queue()
Bart Van Assche [Thu, 9 Aug 2018 14:53:37 +0000 (07:53 -0700)]
block: Introduce blk_exit_queue()

This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Alexandru Moise <00moses.alexander00@gmail.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblkcg: Introduce blkg_root_lookup()
Bart Van Assche [Thu, 9 Aug 2018 14:53:36 +0000 (07:53 -0700)]
blkcg: Introduce blkg_root_lookup()

This new function will be used in a later patch to verify whether a
queue has been dissociated from the cgroup controller before being
released.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alexandru Moise <00moses.alexander00@gmail.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: Remove two superfluous #include directives
Bart Van Assche [Thu, 9 Aug 2018 14:47:28 +0000 (07:47 -0700)]
block: Remove two superfluous #include directives

Commit 12f5b9314545 ("blk-mq: Remove generation seqeunce") removed the
only seqcount_t and u64_stats_sync instances from <linux/blkdev.h> but
did not remove the corresponding #include directives. Since these
include directives are no longer needed, remove them.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>,
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblk-mq: count the hctx as active before allocating tag
Jianchao Wang [Thu, 9 Aug 2018 14:34:17 +0000 (08:34 -0600)]
blk-mq: count the hctx as active before allocating tag

Currently, we count the hctx as active after allocate driver tag
successfully. If a previously inactive hctx try to get tag first
time, it may fails and need to wait. However, due to the stale tag
->active_queues, the other shared-tags users are still able to
occupy all driver tags while there is someone waiting for tag.
Consequently, even if the previously inactive hctx is waked up, it
still may not be able to get a tag and could be starved.

To fix it, we count the hctx as active before try to allocate driver
tag, then when it is waiting the tag, the other shared-tag users
will reserve budget for it.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: bvec_nr_vecs() returns value for wrong slab
Greg Edwards [Wed, 8 Aug 2018 19:27:53 +0000 (13:27 -0600)]
block: bvec_nr_vecs() returns value for wrong slab

In commit ed996a52c868 ("block: simplify and cleanup bvec pool
handling"), the value of the slab index is incremented by one in
bvec_alloc() after the allocation is done to indicate an index value of
0 does not need to be later freed.

bvec_nr_vecs() was not updated accordingly, and thus returns the wrong
value.  Decrement idx before performing the lookup.

Fixes: ed996a52c868 ("block: simplify and cleanup bvec pool handling")
Signed-off-by: Greg Edwards <gedwards@ddn.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-4.19/block
Jens Axboe [Thu, 9 Aug 2018 14:22:21 +0000 (08:22 -0600)]
Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-4.19/block

Pull NVMe updates from Christoph:

"This should be the last round of NVMe updates before the 4.19 merge
 window opens.  It conatins support for write protected (aka read-only)
 namespaces from Chaitanya, two ANA fixes from Hannes and a fabrics
 fix from Tal Shorer."

* 'nvme-4.19' of git://git.infradead.org/nvme:
  nvme-fabrics: fix ctrl_loss_tmo < 0 to reconnect forever
  nvmet: add ns write protect support
  nvme: set gendisk read only based on nsattr
  nvme.h: add support for ns write protect definitions
  nvme.h: fixup ANA group descriptor format
  nvme: fixup crash on failed discovery

5 years agobcache: trivial - remove tailing backslash in macro BTREE_FLAG
Shenghui Wang [Thu, 9 Aug 2018 07:48:51 +0000 (15:48 +0800)]
bcache: trivial - remove tailing backslash in macro BTREE_FLAG

Remove the tailing backslash in macro BTREE_FLAG in btree.h

Signed-off-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agobcache: make the pr_err statement used for ENOENT only in sysfs_attatch section
Shenghui Wang [Thu, 9 Aug 2018 07:48:50 +0000 (15:48 +0800)]
bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section

The pr_err statement in the code for sysfs_attatch section would run
for various error codes, which maybe confusing.

E.g,

Run the command twice:
   echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 > \
/sys/block/bcache0/bcache/attach
   [the backing dev got attached on the first run]
   echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 > \
/sys/block/bcache0/bcache/attach

In dmesg, after the command run twice, we can get:
bcache: bch_cached_dev_attach() Can't attach sda6: already attached
bcache: __cached_dev_store() Can't attach 796b5c05-b03c-4bc7-9cbd-\
a8df5e8be891
               : cache set not found
The first statement in the message was right, but the second was
confusing.

bch_cached_dev_attach has various pr_ statements for various error
codes, except ENOENT.

After the change, rerun above command twice:
echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 > \
/sys/block/bcache0/bcache/attach
echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 > \
/sys/block/bcache0/bcache/attach

In dmesg we only got:
bcache: bch_cached_dev_attach() Can't attach sda6: already attached
No confusing "cache set not found" message anymore.

And for some not exist SET-UUID:
echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be898 > \
/sys/block/bcache0/bcache/attach
In dmesg we can get:
bcache: __cached_dev_store() Can't attach 796b5c05-b03c-4bc7-9cbd-\
a8df5e8be898
               : cache set not found

Signed-off-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agobcache: set max writeback rate when I/O request is idle
Coly Li [Thu, 9 Aug 2018 07:48:49 +0000 (15:48 +0800)]
bcache: set max writeback rate when I/O request is idle

Commit b1092c9af9ed ("bcache: allow quick writeback when backing idle")
allows the writeback rate to be faster if there is no I/O request on a
bcache device. It works well if there is only one bcache device attached
to the cache set. If there are many bcache devices attached to a cache
set, it may introduce performance regression because multiple faster
writeback threads of the idle bcache devices will compete the btree level
locks with the bcache device who have I/O requests coming.

This patch fixes the above issue by only permitting fast writebac when
all bcache devices attached on the cache set are idle. And if one of the
bcache devices has new I/O request coming, minimized all writeback
throughput immediately and let PI controller __update_writeback_rate()
to decide the upcoming writeback rate for each bcache device.

Also when all bcache devices are idle, limited wrieback rate to a small
number is wast of thoughput, especially when backing devices are slower
non-rotation devices (e.g. SATA SSD). This patch sets a max writeback
rate for each backing device if the whole cache set is idle. A faster
writeback rate in idle time means new I/Os may have more available space
for dirty data, and people may observe a better write performance then.

Please note bcache may change its cache mode in run time, and this patch
still works if the cache mode is switched from writeback mode and there
is still dirty data on cache.

Fixes: Commit b1092c9af9ed ("bcache: allow quick writeback when backing idle")
Cc: stable@vger.kernel.org #4.16+
Signed-off-by: Coly Li <colyli@suse.de>
Tested-by: Kai Krakow <kai@kaishome.de>
Tested-by: Stefan Priebe <s.priebe@profihost.ag>
Cc: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agobcache: add code comments for bset.c
Coly Li [Thu, 9 Aug 2018 07:48:48 +0000 (15:48 +0800)]
bcache: add code comments for bset.c

This patch tries to add code comments in bset.c, to make some
tricky code and designment to be more comprehensible. Most information
of this patch comes from the discussion between Kent and I, he
offers very informative details. If there is any mistake
of the idea behind the code, no doubt that's from me misrepresentation.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agobcache: fix mistaken comments in request.c
Coly Li [Thu, 9 Aug 2018 07:48:47 +0000 (15:48 +0800)]
bcache: fix mistaken comments in request.c

This patch updates code comment in bch_keylist_realloc() by fixing
incorrected function names, to make the code to be more comprehennsible.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agobcache: fix mistaken code comments in bcache.h
Coly Li [Thu, 9 Aug 2018 07:48:46 +0000 (15:48 +0800)]
bcache: fix mistaken code comments in bcache.h

This patch updates the code comment in struct cache with correct array
names, to make the code to be more comprehensible.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agobcache: add a comment in super.c
Coly Li [Thu, 9 Aug 2018 07:48:45 +0000 (15:48 +0800)]
bcache: add a comment in super.c

This patch adds a line of code comment in super.c:register_bdev(), to
make code to be more comprehensible.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agobcache: avoid unncessary cache prefetch bch_btree_node_get()
Coly Li [Thu, 9 Aug 2018 07:48:44 +0000 (15:48 +0800)]
bcache: avoid unncessary cache prefetch bch_btree_node_get()

In bch_btree_node_get() the read-in btree node will be partially
prefetched into L1 cache for following bset iteration (if there is).
But if the btree node read is failed, the perfetch operations will
waste L1 cache space. This patch checkes whether read operation and
only does cache prefetch when read I/O succeeded.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agobcache: display rate debug parameters to 0 when writeback is not running
Coly Li [Thu, 9 Aug 2018 07:48:43 +0000 (15:48 +0800)]
bcache: display rate debug parameters to 0 when writeback is not running

When writeback is not running, writeback rate should be 0, other value is
misleading. And the following dyanmic writeback rate debug parameters
should be 0 too,
rate, proportional, integral, change
otherwise they are misleading when writeback is not running.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agobcache: do not check return value of debugfs_create_dir()
Coly Li [Thu, 9 Aug 2018 07:48:42 +0000 (15:48 +0800)]
bcache: do not check return value of debugfs_create_dir()

Greg KH suggests that normal code should not care about debugfs. Therefore
no matter successful or failed of debugfs_create_dir() execution, it is
unncessary to check its return value.

There are two functions called debugfs_create_dir() and check the return
value, which are bch_debug_init() and closure_debug_init(). This patch
changes these two functions from int to void type, and ignore return values
of debugfs_create_dir().

This patch does not fix exact bug, just makes things work as they should.

Signed-off-by: Coly Li <colyli@suse.de>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org
Cc: Kai Krakow <kai@kaishome.de>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge branch 'asoc-4.19' into asoc-next
Mark Brown [Thu, 9 Aug 2018 13:47:05 +0000 (14:47 +0100)]
Merge branch 'asoc-4.19' into asoc-next

5 years agoMerge branch 'asoc-4.18' into asoc-linus
Mark Brown [Thu, 9 Aug 2018 13:46:56 +0000 (14:46 +0100)]
Merge branch 'asoc-4.18' into asoc-linus

5 years agoASoC: adav80x: mark expected switch fall-through
Gustavo A. R. Silva [Wed, 8 Aug 2018 19:19:33 +0000 (14:19 -0500)]
ASoC: adav80x: mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1056531 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
5 years agoplatform/x86: Add ACPI i2c-multi-instantiate pseudo driver
Hans de Goede [Thu, 9 Aug 2018 11:40:46 +0000 (13:40 +0200)]
platform/x86: Add ACPI i2c-multi-instantiate pseudo driver

On systems with ACPI instantiated i2c-clients, normally there is 1 fw_node
per i2c-device and that fw-node contains 1 I2cSerialBus resource for that 1
i2c-device.

But in some rare cases the manufacturer has decided to describe multiple
i2c-devices in a single ACPI fwnode with multiple I2cSerialBus resources.

An earlier attempt to fix this in the i2c-core resulted in a lot of extra
code to support this corner-case.

This commit introduces a new i2c-multi-instantiate driver which fixes this
in a different way. This new driver can be built as a module which will
only loaded on affected systems.

This driver will instantiate a new i2c-client per I2cSerialBus resource,
using the driver_data from the acpi_device_id it is binding to to tell it
which chip-type (and optional irq-resource) to use when instantiating.

Note this driver depends on a platform device being instantiated for the
ACPI fwnode, see the i2c_multi_instantiate_ids list of ACPI device-ids in
drivers/acpi/scan.c: acpi_device_enumeration_by_parent().

Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
5 years agos390/dasd: fix hanging offline processing due to canceled worker
Stefan Haberland [Wed, 25 Jul 2018 12:00:47 +0000 (14:00 +0200)]
s390/dasd: fix hanging offline processing due to canceled worker

During offline processing two worker threads are canceled without
freeing the device reference which leads to a hanging offline process.

Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
5 years agos390/dasd: fix panic for failed online processing
Stefan Haberland [Wed, 25 Jul 2018 11:27:10 +0000 (13:27 +0200)]
s390/dasd: fix panic for failed online processing

Fix a panic that occurs for a device that got an error in
dasd_eckd_check_characteristics() during online processing.
For example the read configuration data command may have failed.

If this error occurs the device is not being set online and the earlier
invoked steps during online processing are rolled back. Therefore
dasd_eckd_uncheck_device() is called which needs a valid private
structure. But this pointer is not valid if
dasd_eckd_check_characteristics() has failed.

Check for a valid device->private pointer to prevent a panic.

Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
5 years agoMerge branch 'regmap-4.19' into regmap-next
Mark Brown [Thu, 9 Aug 2018 10:17:30 +0000 (11:17 +0100)]
Merge branch 'regmap-4.19' into regmap-next

5 years agoMerge tag 'regmap-noinc-read' into regmap-4.19
Mark Brown [Thu, 9 Aug 2018 10:15:06 +0000 (11:15 +0100)]
Merge tag 'regmap-noinc-read' into regmap-4.19

regmap: Support non-incrementing registers

Some devices have individual registers that don't autoincrement the
register address during bulk reads but instead repeatedly read the same
value, for example for monitoring GPIOs or ADCs.  Add support for these.

5 years agoACPI / x86: utils: Remove status workaround from acpi_device_always_present()
Hans de Goede [Thu, 9 Aug 2018 09:15:57 +0000 (11:15 +0200)]
ACPI / x86: utils: Remove status workaround from acpi_device_always_present()

Now that we init the status field to ACPI_STA_DEFAULT rather then to 0,
the workaround for acpi_match_device_ids() always returning -ENOENT when
status is 0 is no longer needed.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
5 years agoMerge branch 'acpi-scan' to satisfy dependencies.
Rafael J. Wysocki [Thu, 9 Aug 2018 10:12:52 +0000 (12:12 +0200)]
Merge branch 'acpi-scan' to satisfy dependencies.

5 years agoACPI / scan: Create platform device for fwnodes with multiple i2c devices
Hans de Goede [Thu, 9 Aug 2018 09:15:56 +0000 (11:15 +0200)]
ACPI / scan: Create platform device for fwnodes with multiple i2c devices

Some devices have multiple I2cSerialBus resources and for things to work
an i2c-client must be instantiated for each, each with its own
i2c_device_id.

Normally we only instantiate an i2c-client for the first resource, using
the ACPI HID as id.

This commit adds a list of HIDs of devices, which need multiple i2c-clients
instantiated from a single fwnode, to acpi_device_enumeration_by_parent and
makes acpi_device_enumeration_by_parent return false for these devices so
that a platform device will be instantiated.

This allows the drivers/platform/x86/i2c-multi-instantiate.c driver, which
knows which i2c_device_id to use for each resource, to bind to the fwnode
and initiate an i2c-client for each resource.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
5 years agoregmap: Add regmap_noinc_read API
Crestez Dan Leonard [Tue, 7 Aug 2018 14:52:17 +0000 (17:52 +0300)]
regmap: Add regmap_noinc_read API

The regmap API usually assumes that bulk read operations will read a
range of registers but some I2C/SPI devices have certain registers for
which a such a read operation will return data from an internal FIFO
instead. Add an explicit API to support bulk read without range semantics.

Some linux drivers use regmap_bulk_read or regmap_raw_read for such
registers, for example mpu6050 or bmi150 from IIO. This only happens to
work because when caching is disabled a single regmap read op will map
to a single bus read op (as desired). This breaks if caching is enabled and
reg+1 happens to be a cacheable register.

Without regmap support refactoring a driver to enable regmap caching
requires separate I2C and SPI paths. This is exactly what regmap is
supposed to help avoid.

Suggested-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Crestez Dan Leonard <leonard.crestez@intel.com>
Signed-off-by: Stefan Popa <stefan.popa@analog.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
5 years agoASoC: da7219: Add delays to capture path to remove DC offset noise
Adam Thomson [Thu, 9 Aug 2018 09:48:50 +0000 (10:48 +0100)]
ASoC: da7219: Add delays to capture path to remove DC offset noise

On some platforms it has been noted that a pop noise can be
witnessed when capturing audio, mainly for first time after a
headset jack has been inserted. This is due to a DC offset in the
Mic PGA and so to avoid this delays are required when powering
up the capture path.

This commit rectifies the problem by adding delays post Mic PGA and
post Mixin PGA. The post Mic PGA delay is determined based on
Mic Bias voltage, and is only applied the first time after a
headset jack is inserted.

Signed-off-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
5 years agoACPI / scan: Initialize status to ACPI_STA_DEFAULT
Hans de Goede [Wed, 8 Aug 2018 08:30:03 +0000 (10:30 +0200)]
ACPI / scan: Initialize status to ACPI_STA_DEFAULT

Since commit 63347db0affa "ACPI / scan: Use acpi_bus_get_status() to
initialize ACPI_TYPE_DEVICE devs" the status field of normal acpi_devices
gets set to 0 by acpi_bus_type_and_status() and filled with its actual
value later when acpi_add_single_object() calls acpi_bus_get_status().

This means that any acpi_match_device_ids() calls in between will always
fail with -ENOENT.

We already have a workaround for this, which temporary forces status to
ACPI_STA_DEFAULT in drivers/acpi/x86/utils.c: acpi_device_always_present()
and the next commit in this series adds another acpi_match_device_ids()
call between status being initialized as 0 and the acpi_bus_get_status()
call.

Rather then adding another workaround, this commit makes
acpi_bus_type_and_status() initialize status to ACPI_STA_DEFAULT, this is
safe to do as the only code looking at status between the initialization
and the acpi_bus_get_status() call is those acpi_match_device_ids() calls.

Note this does mean that we need to (re)set status to 0 in case the
acpi_bus_get_status() call fails.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
5 years agoACPI / EC: Add another entry for Thinkpad X1 Carbon 6th
Mika Westerberg [Wed, 8 Aug 2018 09:50:37 +0000 (12:50 +0300)]
ACPI / EC: Add another entry for Thinkpad X1 Carbon 6th

Commit 2c4d6baf1bc4 (ACPI / EC: Use ec_no_wakeup on more Thinkpad X1
Carbon 6th systems) changed the DMI table to match all systems where
DMI product family is "Thinkpad X1 Carbon 6th". However, the system I
have here has this string written differently (ThinkPad vs. Thinkpad)
which makes the match fail.

In addition to that, after BIOS upgrade Robin now has the same string
than my system has (perhaps newer BIOS has changed the string).

In any case add another DMI entry to acpi_ec_no_wakeup[] table hopefully
covering all the X1 Carbon 6th systems out there.

Fixes: 2c4d6baf1bc4 (ACPI / EC: Use ec_no_wakeup on more Thinkpad X1 Carbon 6th systems)
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
[ rjw: Rebase and change the ident string to match the product familiy ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
5 years agoACPI: bus: Fix a pointer coding style issue
Tom Todd [Wed, 8 Aug 2018 00:52:02 +0000 (01:52 +0100)]
ACPI: bus: Fix a pointer coding style issue

Fix white space in the argument list of acpi_device_remove().

Signed-off-by: Tom Todd <thomas.m.a.todd@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
5 years agoarm64 / ACPI: clean the additional checks before calling ghes_notify_sea()
Dongjiu Geng [Tue, 7 Aug 2018 16:26:15 +0000 (12:26 -0400)]
arm64 / ACPI: clean the additional checks before calling ghes_notify_sea()

In order to remove the additional check before calling the
ghes_notify_sea(), make stub definition when !CONFIG_ACPI_APEI_SEA.

After this cleanup, we can simply call the ghes_notify_sea() to let
APEI driver handle the SEA notification.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
5 years agoACPI / scan: Add static attribute to indirect_io_hosts[]
John Garry [Tue, 7 Aug 2018 13:15:05 +0000 (21:15 +0800)]
ACPI / scan: Add static attribute to indirect_io_hosts[]

Array indirect_io_hosts[] is declared in acpi_is_indirect_io_slave() as a
const array, which means that the array will be re-built for each call.

Optimise by adding the static attribute, which means that the array is
added to const-data pool and not re-built per function call.

Reported-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
5 years agoACPI / battery: Do not export energy_full[_design] on devices without full_charge_cap...
Hans de Goede [Tue, 7 Aug 2018 07:36:30 +0000 (09:36 +0200)]
ACPI / battery: Do not export energy_full[_design] on devices without full_charge_capacity

On some devices (with a buggy _BIX implementation) full_charge_capacity
always reports as 0. This means that our energy_full sysfs attribute will
also always be 0, which is not useful to export.

Worse we calculate our reported capacity on full_charge_capacity and if it
is 0 we always report 0. This causes userspace to immediately shutdown or
hibernate the laptop since it assumes that the battery is critically low.

This commit makes us not report energy_full[_design] or capacity on such
broken devices, avoiding the immediate shutdown / hibernate from userspace.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=83941
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
5 years agotools headers: Synchronise x86 cpufeatures.h for L1TF additions
David Woodhouse [Wed, 8 Aug 2018 10:00:16 +0000 (11:00 +0100)]
tools headers: Synchronise x86 cpufeatures.h for L1TF additions

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years agoALSA: usb-audio: Mark expected switch fall-through
Gustavo A. R. Silva [Wed, 8 Aug 2018 22:23:44 +0000 (17:23 -0500)]
ALSA: usb-audio: Mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1357413 ("Missing break in switch")
Addresses-Coverity-ID: 114917 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
5 years agoALSA: mixart: Mark expected switch fall-through
Gustavo A. R. Silva [Wed, 8 Aug 2018 22:11:48 +0000 (17:11 -0500)]
ALSA: mixart: Mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in this particular case, I replaced the code comment with
a proper "fall through" annotation, which is what GCC is expecting
to find.

Addresses-Coverity-ID: 114889 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
5 years agos390/mm: fix addressing exception after suspend/resume
Gerald Schaefer [Tue, 7 Aug 2018 16:57:11 +0000 (18:57 +0200)]
s390/mm: fix addressing exception after suspend/resume

Commit c9b5ad546e7d "s390/mm: tag normal pages vs pages used in page tables"
accidentally changed the logic in arch_set_page_states(), which is used by
the suspend/resume code. set_page_stable(page, order) was changed to
set_page_stable_dat(page, 0). After this, only the first page of higher order
pages will be set to stable, and a write to one of the unstable pages will
result in an addressing exception.

Fix this by using "order" again, instead of "0".

Fixes: c9b5ad546e7d ("s390/mm: tag normal pages vs pages used in page tables")
Cc: stable@vger.kernel.org # 4.14+
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
5 years agorseq/selftests: add s390 support
Vasily Gorbik [Mon, 9 Jul 2018 15:07:48 +0000 (17:07 +0200)]
rseq/selftests: add s390 support

Implement support for s390 in the rseq selftests, in order to sanity
check the recently enabled rseq syscall. The Implementation covers both
64-bit and 31-bit mode.

Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
5 years agodsa: slave: eee: Allow ports to use phylink
Andrew Lunn [Wed, 8 Aug 2018 18:56:40 +0000 (20:56 +0200)]
dsa: slave: eee: Allow ports to use phylink

For a port to be able to use EEE, both the MAC and the PHY must
support EEE. A phy can be provided by both a phydev or phylink. Verify
at least one of these exist, not just phydev.

Fixes: aab9c4067d23 ("net: dsa: Plug in PHYLINK support")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'smc-fixes'
David S. Miller [Thu, 9 Aug 2018 02:14:23 +0000 (19:14 -0700)]
Merge branch 'smc-fixes'

Ursula Braun says:

====================
net/smc: fixes 2018-08-08

here are small fixes for SMC: The first patch makes sure, shutdown code
is not executed for sockets in state SMC_LISTEN. The second patch resets
send and receive buffer values for accepted sockets, since TCP buffer size
optimizations for the internal CLC socket should not be forwarded to the
outer SMC socket. The third patch solves a race between connect and ioctl
reported by syzbot.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/smc: move sock lock in smc_ioctl()
Ursula Braun [Wed, 8 Aug 2018 12:13:21 +0000 (14:13 +0200)]
net/smc: move sock lock in smc_ioctl()

When an SMC socket is connecting it is decided whether fallback to
TCP is needed. To avoid races between connect and ioctl move the
sock lock before the use_fallback check.

Reported-by: syzbot+5b2cece1a8ecb2ca77d8@syzkaller.appspotmail.com
Reported-by: syzbot+19557374321ca3710990@syzkaller.appspotmail.com
Fixes: 1992d99882af ("net/smc: take sock lock in smc_ioctl()")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/smc: allow sysctl rmem and wmem defaults for servers
Ursula Braun [Wed, 8 Aug 2018 12:13:20 +0000 (14:13 +0200)]
net/smc: allow sysctl rmem and wmem defaults for servers

Without setsockopt SO_SNDBUF and SO_RCVBUF settings, the sysctl
defaults net.ipv4.tcp_wmem and net.ipv4.tcp_rmem should be the base
for the sizes of the SMC sndbuf and rcvbuf. Any TCP buffer size
optimizations for servers should be ignored.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/smc: no shutdown in state SMC_LISTEN
Ursula Braun [Wed, 8 Aug 2018 12:13:19 +0000 (14:13 +0200)]
net/smc: no shutdown in state SMC_LISTEN

Invoking shutdown for a socket in state SMC_LISTEN does not make
sense. Nevertheless programs like syzbot fuzzing the kernel may
try to do this. For SMC this means a socket refcounting problem.
This patch makes sure a shutdown call for an SMC socket in state
SMC_LISTEN simply returns with -ENOTCONN.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: aquantia: Fix IFF_ALLMULTI flag functionality
Dmitry Bogdanov [Wed, 8 Aug 2018 11:06:32 +0000 (14:06 +0300)]
net: aquantia: Fix IFF_ALLMULTI flag functionality

It was noticed that NIC always pass all multicast traffic to the host
regardless of IFF_ALLMULTI flag on the interface.
The rule in MC Filter Table in NIC, that is configured to accept any
multicast packets, is turning on if IFF_MULTICAST flag is set on the
interface. It leads to passing all multicast traffic to the host.
This fix changes the condition to turn on that rule by checking
IFF_ALLMULTI flag as it should.

Fixes: b21f502f84be ("net:ethernet:aquantia: Fix for multicast filter handling.")
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorxrpc: Fix the keepalive generator [ver #2]
David Howells [Wed, 8 Aug 2018 10:30:02 +0000 (11:30 +0100)]
rxrpc: Fix the keepalive generator [ver #2]

AF_RXRPC has a keepalive message generator that generates a message for a
peer ~20s after the last transmission to that peer to keep firewall ports
open.  The implementation is incorrect in the following ways:

 (1) It mixes up ktime_t and time64_t types.

 (2) It uses ktime_get_real(), the output of which may jump forward or
     backward due to adjustments to the time of day.

 (3) If the current time jumps forward too much or jumps backwards, the
     generator function will crank the base of the time ring round one slot
     at a time (ie. a 1s period) until it catches up, spewing out VERSION
     packets as it goes.

Fix the problem by:

 (1) Only using time64_t.  There's no need for sub-second resolution.

 (2) Use ktime_get_seconds() rather than ktime_get_real() so that time
     isn't perceived to go backwards.

 (3) Simplifying rxrpc_peer_keepalive_worker() by splitting it into two
     parts:

     (a) The "worker" function that manages the buckets and the timer.

     (b) The "dispatch" function that takes the pending peers and
       potentially transmits a keepalive packet before putting them back
       in the ring into the slot appropriate to the revised last-Tx time.

 (4) Taking everything that's pending out of the ring and splicing it into
     a temporary collector list for processing.

     In the case that there's been a significant jump forward, the ring
     gets entirely emptied and then the time base can be warped forward
     before the peers are processed.

     The warping can't happen if the ring isn't empty because the slot a
     peer is in is keepalive-time dependent, relative to the base time.

 (5) Limit the number of iterations of the bucket array when scanning it.

 (6) Set the timer to skip any empty slots as there's no point waking up if
     there's nothing to do yet.

This can be triggered by an incoming call from a server after a reboot with
AF_RXRPC and AFS built into the kernel causing a peer record to be set up
before userspace is started.  The system clock is then adjusted by
userspace, thereby potentially causing the keepalive generator to have a
meltdown - which leads to a message like:

watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:23]
...
Workqueue: krxrpcd rxrpc_peer_keepalive_worker
EIP: lock_acquire+0x69/0x80
...
Call Trace:
 ? rxrpc_peer_keepalive_worker+0x5e/0x350
 ? _raw_spin_lock_bh+0x29/0x60
 ? rxrpc_peer_keepalive_worker+0x5e/0x350
 ? rxrpc_peer_keepalive_worker+0x5e/0x350
 ? __lock_acquire+0x3d3/0x870
 ? process_one_work+0x110/0x340
 ? process_one_work+0x166/0x340
 ? process_one_work+0x110/0x340
 ? worker_thread+0x39/0x3c0
 ? kthread+0xdb/0x110
 ? cancel_delayed_work+0x90/0x90
 ? kthread_stop+0x70/0x70
 ? ret_from_fork+0x19/0x24

Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'mlx5-fixes'
David S. Miller [Thu, 9 Aug 2018 02:07:38 +0000 (19:07 -0700)]
Merge branch 'mlx5-fixes'

Saeed Mahameed says:

====================
Mellanox, mlx5e fixes 2018-08-07

I know it is late into 4.18 release, and this is why I am submitting
only two mlx5e ethernet fixes.

The first one from Or, is needed for -stable and it fixes hairpin
for "same device" check.

The second fix is a non risk fix from Huy which cleans up and improves
error return value reporting for dcbnl_ieee_setapp.

For -stable v4.16
- net/mlx5e: Properly check if hairpin is possible between two functions
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/mlx5e: Cleanup of dcbnl related fields
Huy Nguyen [Wed, 8 Aug 2018 22:48:08 +0000 (15:48 -0700)]
net/mlx5e: Cleanup of dcbnl related fields

Remove unused netdev_registered_init/remove in en.h
Return ENOSUPPORT if the check MLX5_DSCP_SUPPORTED fails.
Remove extra white space

Fixes: 2a5e7a1344f4 ("net/mlx5e: Add dcbnl dscp to priority support")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Cc: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/mlx5e: Properly check if hairpin is possible between two functions
Or Gerlitz [Wed, 8 Aug 2018 22:48:07 +0000 (15:48 -0700)]
net/mlx5e: Properly check if hairpin is possible between two functions

The current check relies on function BDF addresses and can get
us wrong e.g when two VFs are assigned into a VM and the PCI
v-address is set by the hypervisor.

Fixes: 5c65c564c962 ('net/mlx5e: Support offloading TC NIC hairpin flows')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Alaa Hleihel <alaa@mellanox.com>
Tested-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocifs: create SMB2_open_init()/SMB2_open_free() helpers.
Ronnie Sahlberg [Wed, 8 Aug 2018 05:07:46 +0000 (15:07 +1000)]
cifs: create SMB2_open_init()/SMB2_open_free() helpers.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>