Dmytro Linkin [Thu, 25 Apr 2019 08:52:02 +0000 (08:52 +0000)]
net/mlx5e: Add missing ethtool driver info for representors
For all representors added firmware version info to show in
ethtool driver info.
For uplink representor, because only it is tied to the pci device
sysfs, added pci bus info.
Fixes: ff9b85de5d5d ("net/mlx5e: Add some ethtool port control entries to the uplink rep netdev") Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Reviewed-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eli Britstein [Mon, 13 May 2019 09:06:02 +0000 (09:06 +0000)]
net/mlx5e: Fix number of vports for ingress ACL configuration
With the cited commit, ACLs are configured for the VF ports. The loop
for the number of ports had the wrong number. Fix it.
Fixes: 184867373d8c ("net/mlx5e: ACLs for priority tag mode") Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Saeed Mahameed [Tue, 7 May 2019 19:59:38 +0000 (12:59 -0700)]
net/mlx5e: Fix ethtool rxfh commands when CONFIG_MLX5_EN_RXNFC is disabled
ethtool user spaces needs to know ring count via ETHTOOL_GRXRINGS when
executing (ethtool -x) which is retrieved via ethtool get_rxnfc callback,
in mlx5 this callback is disabled when CONFIG_MLX5_EN_RXNFC=n.
This patch allows only ETHTOOL_GRXRINGS command on mlx5e_get_rxnfc() when
CONFIG_MLX5_EN_RXNFC is disabled, so ethtool -x will continue working.
net/mlx5: E-Switch, Correct type to u16 for vport_num and int for vport_index
To avoid any ambiguity between vport index and vport number,
rename functions that had vport, to vport_num or vport_index appropriately.
vport_num is u16 hence change mlx5_eswitch_index_to_vport_num() return
type to u16.
vport_index is an int in vport array. Hence change input type of vport
index in mlx5_eswitch_index_to_vport_num() to int.
Correct multiple eswitch representor interfaces use type u16 of
rep->vport as type int vport_index.
Send vport FW commands with correct eswitch u16 vport_num instead
host int vport_index.
Fixes: 5ae5162066d8 ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
net/mlx5: Add meaningful return codes to status_to_err function
Current version of function status_to_err return -1 for any
status returned by mlx5_cmd_invoke function. In case status is
MLX5_DRIVER_STATUS_ABORTED we should return 0 to the caller as we
assume command completed successfully on FW. If error returned we are
getting confusing messages in dmesg. In addition, currently returned
value -1 is confusing with -EPERM.
New implementation actually fix original commit and return meaningful
codes for commands delivery status and print message in case of failure.
Junwei Hu [Fri, 17 May 2019 11:27:34 +0000 (19:27 +0800)]
tipc: fix modprobe tipc failed after switch order of device registration
Error message printed:
modprobe: ERROR: could not insert 'tipc': Address family not
supported by protocol.
when modprobe tipc after the following patch: switch order of
device registration, commit 7e27e8d6130c
("tipc: switch order of device registration to fix a crash")
Because sock_create_kern(net, AF_TIPC, ...) is called by
tipc_topsrv_create_listener() in the initialization process
of tipc_net_ops, tipc_socket_init() must be execute before that.
I move tipc_socket_init() into function tipc_init_net().
Fixes: 7e27e8d6130c
("tipc: switch order of device registration to fix a crash") Signed-off-by: Junwei Hu <hujunwei4@huawei.com> Reported-by: Wang Wang <wangwang2@huawei.com> Reviewed-by: Kang Zhou <zhoukang7@huawei.com> Reviewed-by: Suanming Mou <mousuanming@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Variable 'entropy' was wrongly documented as 'seed', changed comment to
reflect actual variable name.
../lib/random32.c:179: warning: Function parameter or member 'entropy' not described in 'prandom_seed'
../lib/random32.c:179: warning: Excess function parameter 'seed' description in 'prandom_seed'
Signed-off-by: Philippe Mazenauer <philippe.mazenauer@outlook.de> Acked-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 17 May 2019 17:33:30 +0000 (10:33 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- support for SVE and Pointer Authentication in guests
- PMU improvements
POWER:
- support for direct access to the POWER9 XIVE interrupt controller
- memory and performance optimizations
x86:
- support for accessing memory not backed by struct page
- fixes and refactoring
Generic:
- dirty page tracking improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (155 commits)
kvm: fix compilation on aarch64
Revert "KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU"
kvm: x86: Fix L1TF mitigation for shadow MMU
KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible
KVM: PPC: Book3S: Remove useless checks in 'release' method of KVM device
KVM: PPC: Book3S HV: XIVE: Fix spelling mistake "acessing" -> "accessing"
KVM: PPC: Book3S HV: Make sure to load LPID for radix VCPUs
kvm: nVMX: Set nested_run_pending in vmx_set_nested_state after checks complete
tests: kvm: Add tests for KVM_SET_NESTED_STATE
KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS state before setting new state
tests: kvm: Add tests for KVM_CAP_MAX_VCPUS and KVM_CAP_MAX_CPU_ID
tests: kvm: Add tests to .gitignore
KVM: Introduce KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
KVM: Fix kvm_clear_dirty_log_protect off-by-(minus-)one
KVM: Fix the bitmap range to copy during clear dirty
KVM: arm64: Fix ptrauth ID register masking logic
KVM: x86: use direct accessors for RIP and RSP
KVM: VMX: Use accessors for GPRs outside of dedicated caching logic
KVM: x86: Omit caching logic for always-available GPRs
kvm, x86: Properly check whether a pfn is an MMIO or not
...
Heiner Kallweit [Thu, 16 May 2019 21:13:09 +0000 (23:13 +0200)]
i2c: core: add device-managed version of i2c_new_dummy
i2c_new_dummy is typically called from the probe function of the
driver for the primary i2c client. It requires calls to
i2c_unregister_device in the error path of the probe function and
in the remove function.
This can be simplified by introducing a device-managed version.
Note the changed error case return value type: i2c_new_dummy returns
NULL whilst devm_i2c_new_dummy_device returns an ERR_PTR.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
[wsa: rename new functions and fix minor kdoc issues] Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Peter Rosin <peda@axentia.se> Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Reviewed-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Heiner Kallweit [Thu, 16 May 2019 21:13:08 +0000 (23:13 +0200)]
i2c: core: improve return value handling of i2c_new_device and i2c_new_dummy
Currently i2c_new_device and i2c_new_dummy return just NULL in error
case although they have more error details internally. Therefore move
the functionality into new functions returning detailed errors and
add wrappers for compatibility with the current API.
This allows to use these functions with detailed error codes within
the i2c core or for API extensions.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
[wsa: rename new functions and fix minor kdoc issues] Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Peter Rosin <peda@axentia.se> Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Reviewed-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Linus Torvalds [Fri, 17 May 2019 17:08:59 +0000 (10:08 -0700)]
Merge tag 's390-5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Martin Schwidefsky:
- Enhancements for the QDIO layer
- Remove the RCP trace event
- Avoid three build issues
- Move the defconfig to the configs directory
* tag 's390-5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: move arch/s390/defconfig to arch/s390/configs/defconfig
s390/qdio: optimize state inspection of HW-owned SBALs
s390/qdio: use get_buf_state() in debug_get_buf_state()
s390/qdio: allow to scan all Output SBALs in one go
s390/cio: Remove tracing for rchp instruction
s390/kasan: adapt disabled_wait usage to avoid build error
latent_entropy: avoid build error when plugin cflags are not set
s390/boot: fix compiler error due to missing awk strtonum
Linus Torvalds [Fri, 17 May 2019 16:46:31 +0000 (09:46 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs mount updates from Al Viro:
"Propagation of new syscalls to other architectures + cosmetic change
from Christian (fscontext didn't follow the convention for anon inode
names)"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
uapi: Wire up the mount API syscalls on non-x86 arches [ver #2]
uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]
uapi, fsopen: use square brackets around "fscontext" [ver #2]
powerpc/mm/hash: Fix get_region_id() for invalid addresses
Accesses by userspace to random addresses outside the user or kernel
address range will generate an SLB fault. When we handle that fault we
classify the effective address into several classes, eg. user, kernel
linear, kernel virtual etc.
For addresses that are completely outside of any valid range, we
should not insert an SLB entry at all, and instead immediately an
exception.
In the past this was handled in two ways. Firstly we would check the
top nibble of the address (using REGION_ID(ea)) and that would tell us
if the address was user (0), kernel linear (c), kernel virtual (d), or
vmemmap (f). If the address didn't match any of these it was invalid.
Then for each type of address we would do a secondary check. For the
user region we check against H_PGTABLE_RANGE, for kernel linear we
would mask the top nibble of the address and then check the address
against MAX_PHYSMEM_BITS.
As part of commit 0034d395f89d ("powerpc/mm/hash64: Map all the kernel
regions in the same 0xc range") we replaced REGION_ID() with
get_region_id() and changed the masking of the top nibble to only mask
the top two bits, which introduced a bug.
Addresses less than (4 << 60) are still handled correctly, they are
either less than (1 << 60) in which case they are subject to the
H_PGTABLE_RANGE check, or they are correctly checked against
MAX_PHYSMEM_BITS.
However addresses from (4 << 60) to ((0xc << 60) - 1), are incorrectly
treated as kernel linear addresses in get_region_id(). Then the top
two bits are cleared by EA_MASK in slb_allocate_kernel() and the
address is checked against MAX_PHYSMEM_BITS, which it passes due to
the masking. The end result is we incorrectly insert SLB entries for
those addresses.
That is not actually catastrophic, having inserted the SLB entry we
will then go on to take a page fault for the address and at that point
we detect the problem and report it as a bad fault.
Still we should not be inserting those entries, or treating them as
kernel linear addresses in the first place. So fix get_region_id() to
detect addresses in that range and return an invalid region id, which
we cause use to not insert an SLB entry and directly report an
exception.
Fixes: 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Drop change to EA_MASK for now, rewrite change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paolo Bonzini [Fri, 17 May 2019 12:08:53 +0000 (14:08 +0200)]
kvm: fix compilation on aarch64
Commit e45adf665a53 ("KVM: Introduce a new guest mapping API", 2019-01-31)
introduced a build failure on aarch64 defconfig:
$ make -j$(nproc) ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- O=out defconfig \
Image.gz
...
../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:
In function '__kvm_map_gfn':
../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1763:9: error:
implicit declaration of function 'memremap'; did you mean 'memset_p'?
../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1763:46: error:
'MEMREMAP_WB' undeclared (first use in this function)
../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:
In function 'kvm_vcpu_unmap':
../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1795:3: error:
implicit declaration of function 'memunmap'; did you mean 'vm_munmap'?
because these functions are declared in <linux/io.h> rather than <asm/io.h>,
and the former was being pulled in already on x86 but not on aarch64.
Reported-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Ard Biesheuvel [Thu, 16 May 2019 21:31:59 +0000 (23:31 +0200)]
fbdev/efifb: Ignore framebuffer memmap entries that lack any memory types
The following commit:
38ac0287b7f4 ("fbdev/efifb: Honour UEFI memory map attributes when mapping the FB")
updated the EFI framebuffer code to use memory mappings for the linear
framebuffer that are permitted by the memory attributes described by the
EFI memory map for the particular region, if the framebuffer happens to
be covered by the EFI memory map (which is typically only the case for
framebuffers in shared memory). This is required since non-x86 systems
may require cacheable attributes for memory mappings that are shared
with other masters (such as GPUs), and this information cannot be
described by the Graphics Output Protocol (GOP) EFI protocol itself,
and so we rely on the EFI memory map for this.
As reported by James, this breaks some x86 systems:
[ 1.173368] efifb: probing for efifb
[ 1.173386] efifb: abort, cannot remap video memory 0x1d5000 @ 0xcf800000
[ 1.173395] Trying to free nonexistent resource <00000000cf800000-00000000cf9d4bff>
[ 1.173413] efi-framebuffer: probe of efi-framebuffer.0 failed with error -5
The problem turns out to be that the memory map entry that describes the
framebuffer has no memory attributes listed at all, and so we end up with
a mem_flags value of 0x0.
So work around this by ensuring that the memory map entry's attribute field
has a sane value before using it to mask the set of usable attributes.
Reported-by: James Hilliard <james.hilliard1@gmail.com> Tested-by: James Hilliard <james.hilliard1@gmail.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: <stable@vger.kernel.org> # v4.19+ Cc: Borislav Petkov <bp@alien8.de> Cc: James Morse <james.morse@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Fixes: 38ac0287b7f4 ("fbdev/efifb: Honour UEFI memory map attributes when ...") Link: http://lkml.kernel.org/r/20190516213159.3530-2-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Daniel Axtens [Thu, 16 May 2019 15:40:02 +0000 (01:40 +1000)]
crypto: vmx - ghash: do nosimd fallback manually
VMX ghash was using a fallback that did not support interleaving simd
and nosimd operations, leading to failures in the extended test suite.
If I understood correctly, Eric's suggestion was to use the same
data format that the generic code uses, allowing us to call into it
with the same contexts. I wasn't able to get that to work - I think
there's a very different key structure and data layout being used.
So instead steal the arm64 approach and perform the fallback
operations directly if required.
Fixes: cc333cd68dfa ("crypto: vmx - Adding GHASH routines for VMX module") Cc: stable@vger.kernel.org # v4.1+ Reported-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Daniel Axtens [Wed, 15 May 2019 10:24:50 +0000 (20:24 +1000)]
crypto: vmx - CTR: always increment IV as quadword
The kernel self-tests picked up an issue with CTR mode:
alg: skcipher: p8_aes_ctr encryption test failed (wrong result) on test vector 3, cfg="uneven misaligned splits, may sleep"
In the aesp8-ppc code from OpenSSL, there are two paths that
increment IVs: the bulk (8 at a time) path, and the individual
path which is used when there are fewer than 8 AES blocks to
process.
In the bulk path, the IV is incremented with vadduqm: "Vector
Add Unsigned Quadword Modulo", which does 128-bit addition.
In the individual path, however, the IV is incremented with
vadduwm: "Vector Add Unsigned Word Modulo", which instead
does 4 32-bit additions. Thus the IV would instead become FFFFFFFFFFFFFFFFFFFFFFFF00000000, throwing off the result.
Use vadduqm.
This was probably a typo originally, what with q and w being
adjacent. It is a pretty narrow edge case: I am really
impressed by the quality of the kernel self-tests!
Fixes: 5c380d623ed3 ("crypto: vmx - Add support for VMS instructions by ASM") Cc: stable@vger.kernel.org Signed-off-by: Daniel Axtens <dja@axtens.net> Acked-by: Nayna Jain <nayna@linux.ibm.com> Tested-by: Nayna Jain <nayna@linux.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eric Biggers [Tue, 14 May 2019 23:13:15 +0000 (16:13 -0700)]
crypto: hash - fix incorrect HASH_MAX_DESCSIZE
The "hmac(sha3-224-generic)" algorithm has a descsize of 368 bytes,
which is greater than HASH_MAX_DESCSIZE (360) which is only enough for
sha3-224-generic. The check in shash_prepare_alg() doesn't catch this
because the HMAC template doesn't set descsize on the algorithms, but
rather sets it on each individual HMAC transform.
This causes a stack buffer overflow when SHASH_DESC_ON_STACK() is used
with hmac(sha3-224-generic).
Fix it by increasing HASH_MAX_DESCSIZE to the real maximum. Also add a
sanity check to hmac_init().
This was detected by the improved crypto self-tests in v5.2, by loading
the tcrypt module with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y enabled. I
didn't notice this bug when I ran the self-tests by requesting the
algorithms via AF_ALG (i.e., not using tcrypt), probably because the
stack layout differs in the two cases and that made a difference here.
KASAN report:
BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:359 [inline]
BUG: KASAN: stack-out-of-bounds in shash_default_import+0x52/0x80 crypto/shash.c:223
Write of size 360 at addr ffff8880651defc8 by task insmod/3689
Vincent Chen [Tue, 5 Mar 2019 03:23:35 +0000 (11:23 +0800)]
riscv: Support BUG() in kernel module
The kernel module is loaded into vmalloc region which is located below
to the PAGE_OFFSET. Hence the condition, pc < PAGE_OFFSET, in the
is_valid_bugaddr() will filter out all trap exceptions triggered
by kernel module. To support BUG() in kernel module, the condition is
changed to pc < VMALLOC_START.
Signed-off-by: Vincent Chen <vincentc@andestech.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Vincent Chen [Tue, 5 Mar 2019 03:23:34 +0000 (11:23 +0800)]
riscv: Add the support for c.ebreak check in is_valid_bugaddr()
The macro __BUG_INSN currently is defined as the "ebreak" opcode.
The is_valid_bugaddr() function compares the instruction pointed to by
$sepc with macro __BUG_INSN to check whether the current trap exception
is caused by an "ebreak" instruction. However, this check flow is possibly
erroneous because if C extension is supported, the expected trap
instruction "ebreak" is possibly translated to "c.ebreak" by the assembler.
Therefore, it requires a mechanism to distinguish the length of the
instruction in $spec and compare it to the correct trap instruction.
Signed-off-by: Vincent Chen <vincentc@andestech.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Vincent Chen [Tue, 5 Mar 2019 03:23:33 +0000 (11:23 +0800)]
riscv: support trap-based WARN()
The WARN() related function will trigger a debug exception. This can help
developers to analyze the cause of WARN() because if the debugger is
connected, the control flow will be transferred to debugging
environment.
Signed-off-by: Vincent Chen <vincentc@andestech.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Gary Guo [Wed, 27 Mar 2019 00:41:29 +0000 (00:41 +0000)]
riscv: fix sbi_remote_sfence_vma{,_asid}.
Currently sbi_remote_sfence_vma{,_asid} does not pass their arguments
to SBI at all, which is semantically incorrect.
Neither BBL nor OpenSBI is using these arguments at the moment, and
they just do a global flush instead. However we still need to provide
correct arguments.
Signed-off-by: Gary Guo <gary@garyguo.net> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Gary Guo [Wed, 27 Mar 2019 00:41:29 +0000 (00:41 +0000)]
riscv: move switch_mm to its own file
switch_mm is an expensive operations that has two users.
flush_icache_deferred is only called within switch_mm and can be moved
together. The function is expected to be more complicated when ASID
support is added, so clean up eagerly.
By moving them to a separate file we also removes some excessive
dependency of tlbflush.h and cacheflush.h.
Signed-off-by: Gary Guo <gary@garyguo.net> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Gary Guo [Wed, 27 Mar 2019 00:41:25 +0000 (00:41 +0000)]
riscv: move flush_icache_{all,mm} to cacheflush.c
Currently, flush_icache_all is macro-expanded into a SBI call, yet no
asm/sbi.h is included in asm/cacheflush.h. This could be moved to
mm/cacheflush.c instead (SBI call will dominate performance-wise and
there is no worry to not have it inlined.
Currently, flush_icache_mm stays in kernel/smp.c, which looks like a
hack to prevent it from being compiled when CONFIG_SMP=n. It should
also be in mm/cacheflush.c.
Signed-off-by: Gary Guo <gary@garyguo.net> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
tty: Don't force RISCV SBI console as preferred console
The Linux kernel will auto-disables all boot consoles whenever it
gets a preferred real console.
Currently on RISC-V systems, if we have a real console which is not
RISCV SBI console then boot consoles (such as earlycon=sbi) are not
auto-disabled when a real console (ttyS0 or ttySIF0) is available.
This results in duplicate prints at boot-time after kernel starts
using real console (i.e. ttyS0 or ttySIF0) if "earlycon=" kernel
parameter was passed by bootloader.
The reason for above issue is that RISCV SBI console always adds
itself as preferred console which is causing other real consoles
to be not used as preferred console.
Ideally "console=" kernel parameter passed by bootloaders should
be the one selecting a preferred real console.
This patch fixes above issue by not forcing RISCV SBI console as
preferred console.
We should prefer accessing CSRs using their CSR numbers because:
1. It compiles fine with older toolchains.
2. We can use latest CSR names in #define macro names of CSR numbers
as-per RISC-V spec.
3. We can access newly added CSRs even if toolchain does not recognize
newly addes CSRs by name.
RISC-V: Add interrupt related SCAUSE defines in asm/csr.h
This patch adds SCAUSE interrupt flag and SCAUSE interrupt related
defines to asm/csr.h. We also use these defines in kernel/irq.c and
express SIE/SIP flags in-terms of SCAUSE interrupt causes.
RISC-V: Use tabs to align macro values in asm/csr.h
The spacing between macro name and value is not consistent in
asm/csr.h. This patch beautifies asm/csr.h by using tabs to align
macro values instead of spaces.
Linus Torvalds [Fri, 17 May 2019 02:10:37 +0000 (19:10 -0700)]
Merge tag 'for-linus-20190516' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"A small set of fixes for io_uring.
This contains:
- smp_rmb() cleanup for io_cqring_events() (Jackie)
- io_cqring_wait() simplification (Jackie)
- removal of dead 'ev_flags' passing (me)
- SQ poll CPU affinity verification fix (me)
- SQ poll wait fix (Roman)
- SQE command prep cleanup and fix (Stefan)"
* tag 'for-linus-20190516' of git://git.kernel.dk/linux-block:
io_uring: use wait_event_interruptible for cq_wait conditional wait
io_uring: adjust smp_rmb inside io_cqring_events
io_uring: fix infinite wait in khread_park() on io_finish_async()
io_uring: remove 'ev_flags' argument
io_uring: fix failure to verify SQ_AFF cpu
io_uring: fix race condition reading SQE data
Linus Torvalds [Fri, 17 May 2019 02:08:15 +0000 (19:08 -0700)]
Merge tag 'for-5.2/block-post-20190516' of git://git.kernel.dk/linux-block
Pull more block updates from Jens Axboe:
"This is mainly some late lightnvm changes that came in just before the
merge window, as well as fixes that have been queued up since the
initial pull request was frozen.
- NVMe pull request with minor fixes all over the map (via Christoph)
- remove redundant error print in sata_rcar (Geert)
- struct_size() cleanup (Jackie)
- dasd CONFIG_LBADF warning fix (Ming)
- brd cond_resched() improvement (Mikulas)"
* tag 'for-5.2/block-post-20190516' of git://git.kernel.dk/linux-block: (41 commits)
block/bio-integrity: use struct_size() in kmalloc()
nvme: validate cntlid during controller initialisation
nvme: change locking for the per-subsystem controller list
nvme: trace all async notice events
nvme: fix typos in nvme status code values
nvme-fabrics: remove unused argument
nvme-multipath: avoid crash on invalid subsystem cntlid enumeration
nvme-fc: use separate work queue to avoid warning
nvme-rdma: remove redundant reference between ib_device and tagset
nvme-pci: mark expected switch fall-through
nvme-pci: add known admin effects to augument admin effects log page
nvme-pci: init shadow doorbell after each reset
brd: add cond_resched to brd_free_pages
sata_rcar: Remove ata_host_alloc() error printing
s390/dasd: fix build warning in dasd_eckd_build_cp_raw
lightnvm: pblk: use nvm_rq_to_ppa_list()
lightnvm: pblk: simplify partial read path
lightnvm: do not remove instance under global lock
lightnvm: track inflight target creations
lightnvm: pblk: recover only written metadata
...
Linus Torvalds [Fri, 17 May 2019 02:05:35 +0000 (19:05 -0700)]
Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull more clk framework updates from Stephen Boyd:
"One more patch to remove io.h from clk-provider.h.
We used to need this include when we had clk_readl() and clk_writel(),
but those are gone now so this patch pushes the dependency out to the
users of clk-provider.h"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: Remove io.h from clk-provider.h
- Fix to kselftest install to use default install path
- Fix to kselftest KBUILD_OUTPUT builds to not clutter main
KBUILD_OUTPUT directory with selftest objects
- .gitignore fixes (Kelsey Skunberg)
- rseq selftests updates (Mathieu Desnoyers and Martin Schwidefsky)
They change the per-architecture pre-abort signatures to ensure those
are valid trap instructions.
The way exit points are presented to debuggers is enhanced, ensuring
all exit points are present, so debuggers don't have to disassemble
rseq critical section to properly skip over them.
Discussions with the glibc community is reaching a consensus of
exposing a __rseq_handled symbol from glibc to coexist with rseq
early adopters. Update the rseq selftest code to expose and use this
symbol.
Support for compiling asm goto with clang is added with the
"-no-integrated-as" compiler switch, similarly to the top level
kernel Makefile.
- kselftest Makefile test run output refactoring and making test output
TAP13 compliant from Kees Cook:
This re-factors the selftest Makefiles to extract the test running
logic to be reused between "run_tests" and "emit_tests", while also
fixing up the test output to be TAP version 13 compliant:
- added "plan" line
- fixed result line syntax
- moved all test output to be "# "-prefixed as TAP "diagnostic"
lines
The prefixing code includes a fallback mode for limited execution
environments.
Additionally, the plan lines are fixed for all callers of
kselftest.h.
* tag 'linux-kselftest-5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (25 commits)
selftests: avoid KBUILD_OUTPUT dir cluttering with selftest objects
selftests: drivers: Create .gitignore to include /dma-buf/udmabuf
selftests: pidfd: Create .gitignore to include pidfd_test
selftests: fix bpf build/test workflow regression when KBUILD_OUTPUT is set
selftests: fix install target to use default install path
rseq/selftests: add -no-integrated-as for clang
rseq/selftests: mips: use break instruction for RSEQ_SIG
rseq/selftests: powerpc code signature: generate valid instructions
rseq/selftests: aarch64 code signature: handle big-endian environment
rseq/selftests: arm: use udf instruction for RSEQ_SIG
rseq/selftests: s390: use trap4 for RSEQ_SIG
rseq/selftests: x86: use ud1 instruction as RSEQ_SIG opcode
rseq/selftests: s390: use jg instruction for jumps outside of the asm
rseq/selftests: Use __rseq_handled symbol to coexist with glibc
rseq/selftests: Introduce __rseq_cs_ptr_array, rename __rseq_table to __rseq_cs
rseq/selftests: Add __rseq_exit_point_array section for debuggers
rseq/selftests: x86: Work-around bogus gcc-8 optimisation
selftests: Add test plan API to kselftest.h and adjust callers
selftests: Remove KSFT_TAP_LEVEL
selftests: Move test output to diagnostic lines
...
Linus Torvalds [Fri, 17 May 2019 00:18:41 +0000 (17:18 -0700)]
Merge tag 'afs-fixes-b-20190516' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS callback promise fixes from David Howells:
"This series fixes a bunch of problems in callback promise handling,
where a callback promise indicates a promise on the part of the server
to notify the client in the event of some sort of change to a file or
volume. In the event of a break, the client has to go and refetch the
client status from the server and discard any cached permission
information as the ACL might have changed.
The problem in the current code is that changes made by other clients
aren't always noticed, primarily because the file status information
and the callback information aren't updated in the same critical
section, even if these are carried in the same reply from an RPC
operation, and so the AFS_VNODE_CB_PROMISED flag is unreliable.
Arranging for them to be done in the same critical section during
reply decoding is tricky because of the FS.InlineBulkStatus op - which
has all the statuses in the reply arriving and then all the callbacks,
so they have to be buffered. It simplifies things a lot to move the
critical section out of the decode phase and do it after the RPC
function returns.
Also new inodes (either newly fetched or newly created) aren't
properly managed against a callback break happening before we get the
local inode up and running.
Fix this by:
- There's now a combined file status and callback record (struct
afs_status_cb) to carry both plus some flags.
- Each operation wrapper function allocates sufficient afs_status_cb
records for all the vnodes it is interested in and passes them into
RPC operations to be filled in from the reply.
- The FileStatus and CallBack record decoders no longer apply the
new/revised status and callback information to the inode/vnode at
the point of decoding and instead store the information into the
record from (2).
- afs_vnode_commit_status() then revises the file status, detects
deletion and notes callback information inside of a single critical
section. It also checks the callback break counters and cancels the
callback promise if they changed during the operation.
[*] Note that "callback break counters" are counters of server
events that cancel one or more callback promises that the client
thinks it has. The client counts the events and compares the
counters before and after an operation to see if the callback
promise it thinks it just got evaporated before it got recorded
under lock.
- Volume and server callback break counters are passed into
afs_iget() allowing callback breaks concurrent with inode set up to
be detected and the callback promise thence to be cancelled.
- AFS validation checks are now done under RCU conditions using a
read lock on cb_lock. This requires vnode->cb_interest to be made
RCU safe.
- If the checks in (6) fail, the callback breaker is then called
under write lock on the cb_lock - but only if the callback break
counter didn't change from the value read before the checks were
made.
- Results from FS.InlineBulkStatus that correspond to inodes we
currently have in memory are now used to update those inodes'
status and callback information rather than being discarded. This
requires those inodes to be looked up before the RPC op is made and
all their callback break values saved.
To aid in this, the following changes have also been made:
- Don't pass the vnode into the reply delivery functions or the
decoders. The vnode shouldn't be altered anywhere in those paths.
The only exception, for the moment, is for the call done hook for
file lock ops that wants access to both the vnode and the call -
this can be fixed at a later time.
- Get rid of the call->reply[] void* array and replace it with named
and typed members. This avoids confusion since different ops were
mapping different reply[] members to different things.
- Fix an order-1 kmalloc allocation in afs_do_lookup() and replace it
with kvcalloc().
- Always get the reply time. Since callback, lock and fileserver
record expiry times are calculated for several RPCs, make this
mandatory.
- Call afs_pages_written_back() from the operation wrapper rather
than from the delivery function.
- Don't store the version and type from a callback promise in a reply
as the information in them is of very limited use"
* tag 'afs-fixes-b-20190516' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix application of the results of a inline bulk status fetch
afs: Pass pre-fetch server and volume break counts into afs_iget5_set()
afs: Fix unlink to handle YFS.RemoveFile2 better
afs: Clear AFS_VNODE_CB_PROMISED if we detect callback expiry
afs: Make vnode->cb_interest RCU safe
afs: Split afs_validate() so first part can be used under LOOKUP_RCU
afs: Don't save callback version and type fields
afs: Fix application of status and callback to be under same lock
afs: Always get the reply time
afs: Fix order-1 allocation in afs_do_lookup()
afs: Get rid of afs_call::reply[]
afs: Don't pass the vnode pointer through into the inline bulk status op
Linus Torvalds [Fri, 17 May 2019 00:00:13 +0000 (17:00 -0700)]
Merge tag 'afs-fixes-20190516' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull misc AFS fixes from David Howells:
"This fixes a set of miscellaneous issues in the afs filesystem,
including:
- leak of keys on file close.
- broken error handling in xattr functions.
- missing locking when updating VL server list.
- volume location server DNS lookup whereby preloaded cells may not
ever get a lookup and regular DNS lookups to maintain server lists
consume power unnecessarily.
- incorrect error propagation and handling in the fileserver
iteration code causes operations to sometimes apparently succeed.
- interruption of server record check/update side op during
fileserver iteration causes uninterruptible main operations to fail
unexpectedly.
- callback promise expiry time miscalculation.
- over invalidation of the callback promise on directories.
- double locking on callback break waking up file locking waiters.
- double increment of the vnode callback break counter.
Note that it makes some changes outside of the afs code, including:
- an extra parameter to dns_query() to allow the dns_resolver key
just accessed to be immediately invalidated. AFS is caching the
results itself, so the key can be discarded.
- an interruptible version of wait_var_event().
- an rxrpc function to allow the maximum lifespan to be set on a
call.
- a way for an rxrpc call to be marked as non-interruptible"
* tag 'afs-fixes-20190516' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix double inc of vnode->cb_break
afs: Fix lock-wait/callback-break double locking
afs: Don't invalidate callback if AFS_VNODE_DIR_VALID not set
afs: Fix calculation of callback expiry time
afs: Make dynamic root population wait uninterruptibly for proc_cells_lock
afs: Make some RPC operations non-interruptible
rxrpc: Allow the kernel to mark a call as being non-interruptible
afs: Fix error propagation from server record check/update
afs: Fix the maximum lifespan of VL and probe calls
rxrpc: Provide kernel interface to set max lifespan on a call
afs: Fix "kAFS: AFS vnode with undefined type 0"
afs: Fix cell DNS lookup
Add wait_var_event_interruptible()
dns_resolver: Allow used keys to be invalidated
afs: Fix afs_cell records to always have a VL server list record
afs: Fix missing lock when replacing VL server list
afs: Fix afs_xattr_get_yfs() to not try freeing an error value
afs: Fix incorrect error handling in afs_xattr_get_acl()
afs: Fix key leak in afs_release() and afs_evict_inode()
Linus Torvalds [Thu, 16 May 2019 23:24:01 +0000 (16:24 -0700)]
Merge tag 'ceph-for-5.2-rc1' of git://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"On the filesystem side we have:
- a fix to enforce quotas set above the mount point (Luis Henriques)
- support for exporting snapshots through NFS (Zheng Yan)
- proper statx implementation (Jeff Layton). statx flags are mapped
to MDS caps, with AT_STATX_{DONT,FORCE}_SYNC taken into account.
- some follow-up dentry name handling fixes, in particular
elimination of our hand-rolled helper and the switch to __getname()
as suggested by Al (Jeff Layton)
- a set of MDS client cleanups in preparation for async MDS requests
in the future (Jeff Layton)
- a fix to sync the filesystem before remounting (Jeff Layton)
On the rbd side, work is on-going on object-map and fast-diff image
features"
* tag 'ceph-for-5.2-rc1' of git://github.com/ceph/ceph-client: (29 commits)
ceph: flush dirty inodes before proceeding with remount
ceph: fix unaligned access in ceph_send_cap_releases
libceph: make ceph_pr_addr take an struct ceph_entity_addr pointer
libceph: fix unaligned accesses in ceph_entity_addr handling
rbd: don't assert on writes to snapshots
rbd: client_mutex is never nested
ceph: print inode number in __caps_issued_mask debugging messages
ceph: just call get_session in __ceph_lookup_mds_session
ceph: simplify arguments and return semantics of try_get_cap_refs
ceph: fix comment over ceph_drop_caps_for_unlink
ceph: move wait for mds request into helper function
ceph: have ceph_mdsc_do_request call ceph_mdsc_submit_request
ceph: after an MDS request, do callback and completions
ceph: use pathlen values returned by set_request_path_attr
ceph: use __getname/__putname in ceph_mdsc_build_path
ceph: use ceph_mdsc_build_path instead of clone_dentry_name
ceph: fix potential use-after-free in ceph_mdsc_build_path
ceph: dump granular cap info in "caps" debugfs file
ceph: make iterate_session_caps a public symbol
ceph: fix NULL pointer deref when debugging is enabled
...
Linus Torvalds [Thu, 16 May 2019 23:16:18 +0000 (16:16 -0700)]
Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management updates from Zhang Rui:
- Remove the 'module' Kconfig option for thermal subsystem framework
because the thermal framework are required to be ready as early as
possible to avoid overheat at boot time (Daniel Lezcano)
- Fix a bug that thermal framework pokes disabled thermal zones upon
resume (Wei Wang)
- A couple of cleanups and trivial fixes on int340x thermal drivers
(Srinivas Pandruvada, Zhang Rui, Sumeet Pawnikar)
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
drivers: thermal: processor_thermal: Downgrade error message
mlxsw: Remove obsolete dependency on THERMAL=m
hwmon/drivers/core: Simplify complex dependency
thermal/drivers/core: Fix typo in the option name
thermal/drivers/core: Remove depends on THERMAL in Kconfig
thermal/drivers/core: Remove module unload code
thermal/drivers/core: Remove the module Kconfig's option
thermal: core: skip update disabled thermal zones after suspend
thermal: make device_register's type argument const
thermal: intel: int340x: processor_thermal_device: simplify to get driver data
thermal/int3403_thermal: favor _TMP instead of PTYP
Linus Torvalds [Thu, 16 May 2019 22:55:48 +0000 (15:55 -0700)]
Merge tag 'for-5.2/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Improve DM snapshot target's scalability by using finer grained
locking. Requires some list_bl interface improvements.
- Add ability for DM integrity to use a bitmap mode, that tracks
regions where data and metadata are out of sync, instead of using a
journal.
- Improve DM thin provisioning target to not write metadata changes to
disk if the thin-pool and associated thin devices are merely
activated but not used. This avoids metadata corruption due to
concurrent activation of thin devices across different OS instances
(e.g. split brain scenarios, which ultimately would be avoided if
proper device filters were used -- but not having proper filtering
has proven a very common configuration mistake)
- Fix missing call to path selector type->end_io in DM multipath. This
fixes reported performance problems due to inaccurate path selector
IO accounting causing an imbalance of IO (e.g. avoiding issuing IO to
particular path due to it seemingly being heavily used).
- Fix bug in DM cache metadata's loading of its discard bitset that
could lead to all cache blocks being discarded if the very first
cache block was discarded (thankfully in practice the first cache
block is generally in use; be it FS superblock, partition table, disk
label, etc).
- Add testing-only DM dust target which simulates a device that has
failing sectors and/or read failures.
- Fix a DM init error path reference count hang that caused boot hangs
if user supplied malformed input on kernel commandline.
- Fix a couple issues with DM crypt target's logging being overly
verbose or lacking context.
- Various other small fixes to DM init, DM multipath, DM zoned, and DM
crypt.
* tag 'for-5.2/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (42 commits)
dm: fix a couple brace coding style issues
dm crypt: print device name in integrity error message
dm crypt: move detailed message into debug level
dm ioctl: fix hang in early create error condition
dm integrity: whitespace, coding style and dead code cleanup
dm integrity: implement synchronous mode for reboot handling
dm integrity: handle machine reboot in bitmap mode
dm integrity: add a bitmap mode
dm integrity: introduce a function add_new_range_and_wait()
dm integrity: allow large ranges to be described
dm ingerity: pass size to dm_integrity_alloc_page_list()
dm integrity: introduce rw_journal_sectors()
dm integrity: update documentation
dm integrity: don't report unused options
dm integrity: don't check null pointer before kvfree and vfree
dm integrity: correctly calculate the size of metadata area
dm dust: Make dm_dust_init and dm_dust_exit static
dm dust: remove redundant unsigned comparison to less than zero
dm mpath: always free attached_handler_name in parse_path()
dm init: fix max devices/targets checks
...
Qian Cai [Thu, 16 May 2019 19:57:41 +0000 (15:57 -0400)]
slab: remove /proc/slab_allocators
It turned out that DEBUG_SLAB_LEAK is still broken even after recent
recue efforts that when there is a large number of objects like
kmemleak_object which is normal on a debug kernel,
reading /proc/slab_allocators could easily loop forever while processing
the kmemleak_object cache and any additional freeing or allocating
objects will trigger a reprocessing. To make a situation worse,
soft-lockups could easily happen in this sitatuion which will call
printk() to allocate more kmemleak objects to guarantee an infinite
loop.
Also, since it seems no one had noticed when it was totally broken
more than 2-year ago - see the commit fcf88917dd43 ("slab: fix a crash
by reading /proc/slab_allocators"), probably nobody cares about it
anymore due to the decline of the SLAB. Just remove it entirely.
Baolin Wang [Wed, 10 Apr 2019 07:22:50 +0000 (15:22 +0800)]
arm64: dts: sprd: Add clock properties for serial devices
We've introduced power management logics for the Spreadtrum serial
controller by commit 062ec2774c8a ("serial: sprd: Add power management
for the Spreadtrum serial controller"), thus add related clock properties
to support this feature.
Signed-off-by: Baolin Wang <baolin.wang@linaro.org> Signed-off-by: Olof Johansson <olof@lixom.net>
Wei Wang [Thu, 16 May 2019 20:30:54 +0000 (13:30 -0700)]
ipv6: fix src addr routing with the exception table
When inserting route cache into the exception table, the key is
generated with both src_addr and dest_addr with src addr routing.
However, current logic always assumes the src_addr used to generate the
key is a /128 host address. This is not true in the following scenarios:
1. When the route is a gateway route or does not have next hop.
(rt6_is_gw_or_nonexthop() == false)
2. When calling ip6_rt_cache_alloc(), saddr is passed in as NULL.
This means, when looking for a route cache in the exception table, we
have to do the lookup twice: first time with the passed in /128 host
address, second time with the src_addr stored in fib6_info.
This solves the pmtu discovery issue reported by Mikael Magnusson where
a route cache with a lower mtu info is created for a gateway route with
src addr. However, the lookup code is not able to find this route cache.
Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") Reported-by: Mikael Magnusson <mikael.kernel@lists.m7n.se> Bisected-by: David Ahern <dsahern@gmail.com> Signed-off-by: Wei Wang <weiwan@google.com> Cc: Martin Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 16 May 2019 17:41:31 +0000 (10:41 -0700)]
selftests: pmtu.sh: Remove quotes around commands in setup_xfrm
The first command in setup_xfrm is failing resulting in the test getting
skipped:
+ ip netns exec ns-B ip -6 xfrm state add src fd00:1::a dst fd00:1::b spi 0x1000 proto esp aead 'rfc4106(gcm(aes))' 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f 128 mode tunnel
+ out=RTNETLINK answers: Function not implemented
...
xfrm6 not supported
TEST: vti6: PMTU exceptions [SKIP]
xfrm4 not supported
TEST: vti4: PMTU exceptions [SKIP]
...
The setup command started failing when the run_cmd option was added.
Removing the quotes fixes the problem:
...
TEST: vti6: PMTU exceptions [ OK ]
TEST: vti4: PMTU exceptions [ OK ]
...
Fixes: 56490b623aa0 ("selftests: Add debugging options to pmtu.sh") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 16 May 2019 15:09:57 +0000 (08:09 -0700)]
net: avoid weird emergency message
When host is under high stress, it is very possible thread
running netdev_wait_allrefs() returns from msleep(250)
10 seconds late.
This leads to these messages in the syslog :
[...] unregister_netdevice: waiting for syz_tun to become free. Usage count = 0
If the device refcount is zero, the wait is over.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Tue, 14 May 2019 11:33:10 +0000 (12:33 +0100)]
afs: Fix application of the results of a inline bulk status fetch
Fix afs_do_lookup() such that when it does an inline bulk status fetch op,
it will update inodes that are already extant (something that afs_iget()
doesn't do) and to cache permits for each inode created (thereby avoiding a
follow up FS.FetchStatus call to determine this).
Extant inodes need looking up in advance so that their cb_break counters
before and after the operation can be compared. To this end, the inode
pointers are cached so that they don't need looking up again after the op.
Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup") Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Tue, 14 May 2019 11:23:43 +0000 (12:23 +0100)]
afs: Pass pre-fetch server and volume break counts into afs_iget5_set()
Pass the server and volume break counts from before the status fetch
operation that queried the attributes of a file into afs_iget5_set() so
that the new vnode's break counters can be initialised appropriately.
This allows detection of a volume or server break that happened whilst we
were fetching the status or setting up the vnode.
Fixes: c435ee34551e ("afs: Overhaul the callback handling") Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Tue, 14 May 2019 11:29:11 +0000 (12:29 +0100)]
afs: Fix unlink to handle YFS.RemoveFile2 better
Make use of the status update for the target file that the YFS.RemoveFile2
RPC op returns to correctly update the vnode as to whether the file was
actually deleted or just had nlink reduced.
Fixes: 30062bd13e36 ("afs: Implement YFS support in the fs client") Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 9 May 2019 13:15:11 +0000 (14:15 +0100)]
afs: Clear AFS_VNODE_CB_PROMISED if we detect callback expiry
Fix afs_validate() to clear AFS_VNODE_CB_PROMISED on a vnode if we detect
any condition that causes the callback promise to be broken implicitly,
including server break (cb_s_break), volume break (cb_v_break) or callback
expiry.
Fixes: ae3b7361dc0e ("afs: Fix validation/callback interaction") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 13 May 2019 15:14:32 +0000 (16:14 +0100)]
afs: Make vnode->cb_interest RCU safe
Use RCU-based freeing for afs_cb_interest struct objects and use RCU on
vnode->cb_interest. Use that change to allow afs_check_validity() to use
read_seqbegin_or_lock() instead of read_seqlock_excl().
This also requires the caller of afs_check_validity() to hold the RCU read
lock across the call.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Tue, 14 May 2019 14:35:44 +0000 (15:35 +0100)]
afs: Don't save callback version and type fields
Don't save callback version and type fields as the version is about the
format of the callback information and the type is relative to the
particular RPC call.
Signed-off-by: David Howells <dhowells@redhat.com>
First and second original patches are already dropped from stable,
No need to stable-queue the third patch as it has no functional impact,
just a logic cleanup.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 16 May 2019 14:52:25 +0000 (14:52 +0000)]
aqc111: cleanup mtu related logic
Original fix b8b277525e9d was done under impression that invalid data
could be written for mtu configuration higher that 16334.
But the high limit will anyway be rejected my max_mtu check in caller.
Thus, make the code cleaner and allow it doing the configuration without
checking for maximum mtu value.
Fixes: b8b277525e9d ("aqc111: fix endianness issue in aqc111_change_mtu") Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Thu, 16 May 2019 09:28:16 +0000 (11:28 +0200)]
xfrm: ressurrect "Fix uninitialized memory read in _decode_session4"
This resurrects commit 8742dc86d0c7a9628
("xfrm4: Fix uninitialized memory read in _decode_session4"),
which got lost during a merge conflict resolution between ipsec-next
and net-next tree.
c53ac41e3720 ("xfrm: remove decode_session indirection from afinfo_policy")
in ipsec-next moved the (buggy) _decode_session4 from
net/ipv4/xfrm4_policy.c to net/xfrm/xfrm_policy.c.
In mean time, 8742dc86d0c7a was applied to ipsec.git and fixed the
problem in the "old" location.
When the trees got merged, the moved, old function was kept.
This applies the "lost" commit again, to the new location.
Fixes: a658a3f2ecbab ("Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Rob Herring [Fri, 10 May 2019 18:51:11 +0000 (13:51 -0500)]
dt-bindings: Convert vendor prefixes to json-schema
Convert the vendor prefix registry to a schema. This will enable checking
that new vendor prefixes are added (in addition to the less than perfect
checkpatch.pl check) and will also check against adding other prefixes
which are not vendors.
Converted vendor-prefixes.txt using the following sed script:
sed -e 's/\([a-zA-Z0-9\-]*\)[[:space:]]*\([a-zA-Z0-9].*\)/ "^\1,\.\*\":\n description: \2/'
Andrii Nakryiko [Thu, 16 May 2019 03:39:27 +0000 (20:39 -0700)]
libbpf: move logging helpers into libbpf_internal.h
libbpf_util.h header was recently exposed as public as a dependency of
xsk.h. In addition to memory barriers, it contained logging helpers,
which are not supposed to be exposed. This patch moves those into
libbpf_internal.h, which is kept as an internal header.
Cc: Stanislav Fomichev <sdf@google.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Fixes: 7080da890984 ("libbpf: add libbpf_util.h to header install.") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Junwei Hu [Thu, 16 May 2019 02:51:15 +0000 (10:51 +0800)]
tipc: switch order of device registration to fix a crash
When tipc is loaded while many processes try to create a TIPC socket,
a crash occurs:
PANIC: Unable to handle kernel paging request at virtual
address "dfff20000000021d"
pc : tipc_sk_create+0x374/0x1180 [tipc]
lr : tipc_sk_create+0x374/0x1180 [tipc]
Exception class = DABT (current EL), IL = 32 bits
Call trace:
tipc_sk_create+0x374/0x1180 [tipc]
__sock_create+0x1cc/0x408
__sys_socket+0xec/0x1f0
__arm64_sys_socket+0x74/0xa8
...
This is due to race between sock_create and unfinished
register_pernet_device. tipc_sk_insert tries to do
"net_generic(net, tipc_net_id)".
but tipc_net_id is not initialized yet.
So switch the order of the two to close the race.
This can be reproduced with multiple processes doing socket(AF_TIPC, ...)
and one process doing module removal.
Fixes: a62fbccecd62 ("tipc: make subscriber server support net namespace") Signed-off-by: Junwei Hu <hujunwei4@huawei.com> Reported-by: Wang Wang <wangwang2@huawei.com> Reviewed-by: Xiaogang Wang <wangxiaogang3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 16 May 2019 02:39:52 +0000 (19:39 -0700)]
ipv6: prevent possible fib6 leaks
At ipv6 route dismantle, fib6_drop_pcpu_from() is responsible
for finding all percpu routes and set their ->from pointer
to NULL, so that fib6_ref can reach its expected value (1).
The problem right now is that other cpus can still catch the
route being deleted, since there is no rcu grace period
between the route deletion and call to fib6_drop_pcpu_from()
This can leak the fib6 and associated resources, since no
notifier will take care of removing the last reference(s).
I decided to add another boolean (fib6_destroying) instead
of reusing/renaming exception_bucket_flushed to ease stable backports,
and properly document the memory barriers used to implement this fix.
This patch has been co-developped with Wei Wang.
Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Wei Wang <weiwan@google.com> Cc: David Ahern <dsahern@gmail.com> Cc: Martin Lau <kafai@fb.com> Acked-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Wed, 15 May 2019 17:29:16 +0000 (13:29 -0400)]
net: test nouarg before dereferencing zerocopy pointers
Zerocopy skbs without completion notification were added for packet
sockets with PACKET_TX_RING user buffers. Those signal completion
through the TP_STATUS_USER bit in the ring. Zerocopy annotation was
added only to avoid premature notification after clone or orphan, by
triggering a copy on these paths for these packets.
The mechanism had to define a special "no-uarg" mode because packet
sockets already use skb_uarg(skb) == skb_shinfo(skb)->destructor_arg
for a different pointer.
Before deferencing skb_uarg(skb), verify that it is a real pointer.
Fixes: 5cd8d46ea1562 ("packet: copy user buffers before orphan or clone") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: phy: aquantia: readd XGMII support for AQR107
XGMII interface mode no longer works on AQR107 after the recent changes,
adding back support.
Fixes: 570c8a7d5303 ("net: phy: aquantia: check for supported interface modes in config_init") Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: bpfilter: fallback to netfilter if failed to load bpfilter kernel module
If bpfilter is not available return ENOPROTOOPT to fallback to netfilter.
Function request_module() returns both errors and userspace exit codes.
Just ignore them. Rechecking bpfilter_ops is enough.
Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Muthuswamy [Wed, 15 May 2019 00:56:05 +0000 (00:56 +0000)]
hv_sock: Add support for delayed close
Currently, hvsock does not implement any delayed or background close
logic. Whenever the hvsock socket is closed, a FIN is sent to the peer, and
the last reference to the socket is dropped, which leads to a call to
.destruct where the socket can hang indefinitely waiting for the peer to
close it's side. The can cause the user application to hang in the close()
call.
This change implements proper STREAM(TCP) closing handshake mechanism by
sending the FIN to the peer and the waiting for the peer's FIN to arrive
for a given timeout. On timeout, it will try to terminate the connection
(i.e. a RST). This is in-line with other socket providers such as virtio.
This change does not address the hang in the vmbus_hvsock_device_unregister
where it waits indefinitely for the host to rescind the channel. That
should be taken up as a separate fix.
Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 16 May 2019 19:02:42 +0000 (12:02 -0700)]
Merge branch 'flow_offload-fix-CVLAN-support'
Edward Cree says:
====================
flow_offload: fix CVLAN support
When the flow_offload infrastructure was added, CVLAN matches weren't
plumbed through, and flow_rule_match_vlan() was incorrectly called in
the mlx5 driver when populating CVLAN match information. This series
adds flow_rule_match_cvlan(), and uses it in the mlx5 code.
Both patches should also go to 5.1 stable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jianbo Liu [Tue, 14 May 2019 20:18:50 +0000 (21:18 +0100)]
net/mlx5e: Fix calling wrong function to get inner vlan key and mask
When flow_rule_match_XYZ() functions were first introduced,
flow_rule_match_cvlan() for inner vlan is missing.
In mlx5_core driver, to get inner vlan key and mask, flow_rule_match_vlan()
is just called, which is wrong because it obtains outer vlan information by
FLOW_DISSECTOR_KEY_VLAN.
This commit fixes this by changing to call flow_rule_match_cvlan() after
it's added.
Fixes: 8f2566225ae2 ("flow_offload: add flow_rule and flow_match structures and use them") Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 16 May 2019 18:57:16 +0000 (11:57 -0700)]
Merge tag 'media/v5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"Some fixes for some platform drivers (rockchip, atmel, omap, daVinci,
tegra-cec, coda and rcar).
Also includes a fix on one of the V4L2 uAPI doc, explaining a border
case"
* tag 'media/v5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: rockchip/vpu: Fix/re-order probe-error/remove path
media: rockchip/vpu: Initialize mdev->bus_info
media: rockchip/vpu: Get vdev from the file arg in vidioc_querycap()
media: rockchip/vpu: Add missing dont_use_autosuspend() calls
media: rockchip/vpu: Do not request id 0 for our video device
media: tegra-cec: fix cec_notifier_parse_hdmi_phandle return check
media: davinci/vpbe: array underflow in vpbe_enum_outputs()
media: field-order.rst: clarify FIELD_ANY and FIELD_NONE
media: staging/imx: add media device to capture register
media: rcar-csi2: Propagate the FLD signal for NTSC and PAL
media: rcar-csi2: restart CSI-2 link if error is detected
media: omap_vout: potential buffer overflow in vidioc_dqbuf()
media: coda: fix unset field and fail on invalid field in buf_prepare
media: atmel: atmel-isc: fix asd memory allocation
media: atmel: atmel-isc: fix INIT_WORK misplacement
media: atmel: atmel-isc: limit incoming pixels per frame
Linus Torvalds [Thu, 16 May 2019 18:55:35 +0000 (11:55 -0700)]
Merge tag 'edac_fixes_for_5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC fixes from Borislav Petkov:
- Do not build mpc85_edac as a module (Michael Ellerman)
- Correct edac_mc_find()'s return value on error (Robert Richter)
* tag 'edac_fixes_for_5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC/mc: Fix edac_mc_find() in case no device is found
EDAC/mpc85xx: Prevent building as a module
Yonghong Song [Thu, 16 May 2019 17:17:31 +0000 (10:17 -0700)]
tools/bpftool: move set_max_rlimit() before __bpf_object__open_xattr()
For a host which has a lower rlimit for max locked memory (e.g., 64KB),
the following error occurs in one of our production systems:
# /usr/sbin/bpftool prog load /paragon/pods/52877437/home/mark.o \
/sys/fs/bpf/paragon_mark_21 type cgroup/skb \
map idx 0 pinned /sys/fs/bpf/paragon_map_21
libbpf: Error in bpf_object__probe_name():Operation not permitted(1).
Couldn't load basic 'r0 = 0' BPF program.
Error: failed to open object file
The reason is due to low locked memory during bpf_object__probe_name()
which probes whether program name is supported in kernel or not
during __bpf_object__open_xattr().
bpftool program load already tries to relax mlock rlimit before
bpf_object__load(). Let us move set_max_rlimit() before
__bpf_object__open_xattr(), which fixed the issue here.
Fixes: 47eff61777c7 ("bpf, libbpf: introduce bpf_object__probe_caps to test BPF capabilities") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Chenbo Feng [Wed, 15 May 2019 02:42:57 +0000 (19:42 -0700)]
bpf: relax inode permission check for retrieving bpf program
For iptable module to load a bpf program from a pinned location, it
only retrieve a loaded program and cannot change the program content so
requiring a write permission for it might not be necessary.
Also when adding or removing an unrelated iptable rule, it might need to
flush and reload the xt_bpf related rules as well and triggers the inode
permission check. It might be better to remove the write premission
check for the inode so we won't need to grant write access to all the
processes that flush and restore iptables rules.
Linus Torvalds [Thu, 16 May 2019 18:26:37 +0000 (11:26 -0700)]
Merge tag 'asm-generic-nommu' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull nommu generic uaccess updates from Arnd Bergmann:
"asm-generic: kill <asm/segment.h> and improve nommu generic uaccess helpers
Christoph Hellwig writes:
This is a series doing two somewhat interwinded things. It improves
the asm-generic nommu uaccess helper to optionally be entirely
generic and not require any arch helpers for the actual uaccess.
For the generic uaccess.h to actually be generically useful I also
had to kill off the mess we made of <asm/segment.h>, which really
shouldn't exist on most architectures"
* tag 'asm-generic-nommu' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: optimize generic uaccess for 8-byte loads and stores
asm-generic: provide entirely generic nommu uaccess
arch: mostly remove <asm/segment.h>
asm-generic: don't include <asm/segment.h> from <asm/uaccess.h>
Olof Johansson [Thu, 16 May 2019 18:05:11 +0000 (11:05 -0700)]
Merge tag 'at91-5.2-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into arm/late
AT91 SoC for 5.2
- PM changes for SAM9X60
* tag 'at91-5.2-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
ARM: at91: pm: do not disable/enable PLLA for ULP modes
ARM: at91: pm: disable RC oscillator in ULP0
ARM: at91: pm: add ULP1 support for SAM9X60
ARM: at91: pm: add support for per SoC wakeup source configuration
ARM: at91: pm: keep at91_pm_backup_init() only for SAMA5D2 SoCs
ARM: at91: pm: initial PM support for SAM9X60
dt-bindings: arm: atmel: add binding for SAM9X60 SoC
ARM: at91: pm: introduce at91_soc_pm structure
Linus Torvalds [Thu, 16 May 2019 18:02:27 +0000 (11:02 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes and updates:
- a handful of MDS documentation/comment updates
- a cleanup related to hweight interfaces
- a SEV guest fix for large pages
- a kprobes LTO fix
- and a final cleanup commit for vDSO HPET support removal"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation/mds: Improve CPU buffer clear documentation
x86/speculation/mds: Revert CPU buffer clear on double fault exit
x86/kconfig: Disable CONFIG_GENERIC_HWEIGHT and remove __HAVE_ARCH_SW_HWEIGHT
x86/mm: Do not use set_{pud, pmd}_safe() when splitting a large page
x86/kprobes: Make trampoline_handler() global and visible
x86/vdso: Remove hpet_page from vDSO
Linus Torvalds [Thu, 16 May 2019 18:00:20 +0000 (11:00 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull time fixes from Ingo Molnar:
"A TIA adjtimex interface extension, and a POSIX compliance ABI fix for
timespec64 users"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ntp: Allow TAI-UTC offset to be set to zero
y2038: Make CONFIG_64BIT_TIME unconditional
Olof Johansson [Thu, 16 May 2019 17:59:33 +0000 (10:59 -0700)]
Merge tag 'at91-5.2-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into arm/late
AT91 defconfig for 5.2
- ov2640 driver as module
- selecting HAVE_FB_ATMEL for SAMA5 SoCs is useless
* tag 'at91-5.2-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
ARM: at91: sama5: make ov2640 as a module
ARM: at91: remove HAVE_FB_ATMEL for sama5 SoC as they use DRM
Linus Torvalds [Thu, 16 May 2019 17:58:54 +0000 (10:58 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"An x86 PMU constraint fix, an interface fix, and a Sparse fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Allow PEBS multi-entry in watermark mode
perf/x86/intel: Fix INTEL_FLAGS_EVENT_CONSTRAINT* masking
perf/x86/amd/iommu: Make the 'amd_iommu_attr_groups' symbol static