Mike Snitzer [Tue, 5 Feb 2019 22:07:58 +0000 (17:07 -0500)]
dm: don't use bio_trim() afterall
bio_trim() has an early return, which makes it _not_ idempotent, if the
offset is 0 and the bio's bi_size already matches the requested size.
Prior to DM, all users of bio_trim() were fine with this. But DM has
exposed the fact that bio_trim()'s early return is incompatible with a
cloned bio whose integrity payload must be trimmed via
bio_integrity_trim().
Fix this by reverting DM back to doing the equivalent of bio_trim() but
in an idempotent manner (so bio_integrity_trim is always performed).
Follow-on work is needed to assess what benefit bio_trim()'s early
return is providing to its existing callers.
Reported-by: Milan Broz <gmazyland@gmail.com> Fixes: 57c36519e4b94 ("dm: fix clone_bio() to trigger blk_recount_segments()") Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Mikulas Patocka [Tue, 5 Feb 2019 10:09:00 +0000 (05:09 -0500)]
dm: add memory barrier before waitqueue_active
Block core changes to switch bio-based IO accounting to be percpu had a
side-effect of altering DM core to now rely on calling waitqueue_active
(in both bio-based and request-based) to check if another task is in
dm_wait_for_completion().
A memory barrier is needed before calling waitqueue_active(). DM core
doesn't piggyback on a preceding memory barrier so it must explicitly
use its own.
For more details on why using waitqueue_active() without a preceding
barrier is unsafe, please see the comment before the waitqueue_active()
definition in include/linux/wait.h.
Add the missing memory barrier by switching to using wq_has_sleeper().
Fixes: 6f75723190d8 ("dm: remove the pending IO accounting") Fixes: c4576aed8d85 ("dm: fix request-based dm's use of dm_wait_for_completion") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Dan Carpenter [Wed, 6 Feb 2019 15:35:15 +0000 (18:35 +0300)]
net: dsa: Fix NULL checking in dsa_slave_set_eee()
This function can't succeed if dp->pl is NULL. It will Oops inside the
call to return phylink_ethtool_get_eee(dp->pl, e);
Fixes: 1be52e97ed3e ("dsa: slave: eee: Allow ports to use phylink") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Chuck Lever [Fri, 25 Jan 2019 21:54:54 +0000 (16:54 -0500)]
svcrdma: Remove max_sge check at connect time
Two and a half years ago, the client was changed to use gathered
Send for larger inline messages, in commit 655fec6987b ("xprtrdma:
Use gathered Send for large inline messages"). Several fixes were
required because there are a few in-kernel device drivers whose
max_sge is 3, and these were broken by the change.
Apparently my memory is going, because some time later, I submitted
commit 25fd86eca11c ("svcrdma: Don't overrun the SGE array in
svc_rdma_send_ctxt"), and after that, commit f3c1fd0ee294 ("svcrdma:
Reduce max_send_sges"). These too incorrectly assumed in-kernel
device drivers would have more than a few Send SGEs available.
The fix for the server side is not the same. This is because the
fundamental problem on the server is that, whether or not the client
has provisioned a chunk for the RPC reply, the server must squeeze
even the most complex RPC replies into a single RDMA Send. Failing
in the send path because of Send SGE exhaustion should never be an
option.
Therefore, instead of failing when the send path runs out of SGEs,
switch to using a bounce buffer mechanism to handle RPC replies that
are too complex for the device to send directly. That allows us to
remove the max_sge check to enable drivers with small max_sge to
work again.
Reported-by: Don Dutile <ddutile@redhat.com> Fixes: 25fd86eca11c ("svcrdma: Don't overrun the SGE array in ...") Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Trond Myklebust [Mon, 21 Jan 2019 20:58:38 +0000 (15:58 -0500)]
nfsd: Fix error return values for nfsd4_clone_file_range()
If the parameter 'count' is non-zero, nfsd4_clone_file_range() will
currently clobber all errors returned by vfs_clone_file_range() and
replace them with EINVAL.
Fixes: 42ec3d4c0218 ("vfs: make remap_file_range functions take and...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When something let __find_get_block_slow() hit all_mapped path, it calls
printk() for 100+ times per a second. But there is no need to print same
message with such high frequency; it is just asking for stall warning, or
at least bloating log files.
Russell King [Wed, 6 Feb 2019 10:54:54 +0000 (10:54 +0000)]
MAINTAINERS: add maintainer for SFF/SFP/SFP+ support
Add maintainer entry for SFF/SFP/SFP+ support.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Chengguang Xu [Fri, 1 Feb 2019 03:26:02 +0000 (11:26 +0800)]
m68k: set proper major_num when specifying module param major_num
When calling register_blkdev() with specified major
device number, the return code is 0 on success.
So it seems not correct direct assign return code to
variable major_num in this case.
Hans de Goede [Sun, 3 Feb 2019 09:02:07 +0000 (10:02 +0100)]
libata: Add NOLPM quirk for SAMSUNG MZ7TE512HMHP-000L1 SSD
We've received a bugreport that using LPM with a SAMSUNG
MZ7TE512HMHP-000L1 SSD leads to system instability, we already have
a quirk for the MZ7TD256HAFV-000L9, which is also a Samsun EVO 840 /
PM851 OEM model, so it seems some of these models have a LPM issue.
This commits adds a NOLPM quirk for the model string from the new
bugeport, to avoid the reported stability issues.
Eric Dumazet [Mon, 4 Feb 2019 16:36:06 +0000 (08:36 -0800)]
rxrpc: bad unlock balance in rxrpc_recvmsg
When either "goto wait_interrupted;" or "goto wait_error;"
paths are taken, socket lock has already been released.
This patch fixes following syzbot splat :
WARNING: bad unlock balance detected!
5.0.0-rc4+ #59 Not tainted
-------------------------------------
syz-executor223/8256 is trying to release lock (sk_lock-AF_RXRPC) at:
[<ffffffff86651353>] rxrpc_recvmsg+0x6d3/0x3099 net/rxrpc/recvmsg.c:598
but there are no more locks to release!
other info that might help us debug this:
1 lock held by syz-executor223/8256:
#0: 00000000fa9ed0f4 (slock-AF_RXRPC){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline]
#0: 00000000fa9ed0f4 (slock-AF_RXRPC){+...}, at: release_sock+0x20/0x1c0 net/core/sock.c:2798
Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Howells <dhowells@redhat.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
x86/boot/compressed/64: Do not corrupt EDX on EFER.LME=1 setting
RDMSR in the trampoline code overwrites EDX but that register is used
to indicate whether 5-level paging has to be enabled and if clobbered,
leads to failure to boot on a 5-level paging machine.
Preserve EDX on the stack while we are dealing with EFER.
Fixes: b677dfae5aa1 ("x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode") Reported-by: Kyle D Pelton <kyle.d.pelton@intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: dave.hansen@linux.intel.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Huang <wei@redhat.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190206115253.1907-1-kirill.shutemov@linux.intel.com
Keith Busch [Thu, 24 Jan 2019 01:46:11 +0000 (18:46 -0700)]
nvme-pci: fix rapid add remove sequence
A surprise removal may fail to tear down request queues if it is racing
with the initial asynchronous probe. If that happens, the remove path
won't see the queue resources to tear down, and the controller reset
path may create a new request queue on a removed device, but will not
be able to make forward progress, deadlocking the pci removal.
Protect setting up non-blocking resources from a shutdown by holding the
same mutex, and transition to the CONNECTING state after these resources
are initialized so the probe path may see the dead controller state
before dispatching new IO.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202081 Reported-by: Alex Gagniuc <Alex_Gagniuc@Dellteam.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Tested-by: Alex Gagniuc <mr.nuke.me@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Keith Busch [Mon, 28 Jan 2019 16:46:07 +0000 (09:46 -0700)]
nvme: lock NS list changes while handling command effects
If a controller supports the NS Change Notification, the namespace
scan_work is automatically triggered after attaching a new namespace.
Occasionally the namespace scan_work may append the new namespace to the
list before the admin command effects handling is completed. The effects
handling unfreezes namespaces, but if it unfreezes the newly attached
namespace, its request_queue freeze depth will be off and we'll hit the
warning in blk_mq_unfreeze_queue().
On the next namespace add, we will fail to freeze that queue due to the
previous bad accounting and deadlock waiting for frozen.
Fix that by preventing scan work from altering the namespace list while
command effects handling needs to pair freeze with unfreeze.
Reported-by: Wen Xiong <wenxiong@us.ibm.com> Tested-by: Wen Xiong <wenxiong@us.ibm.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Tomi Valkeinen [Fri, 11 Jan 2019 03:50:35 +0000 (05:50 +0200)]
drm/omap: dsi: Hack-fix DSI bus flags
Since commit b4935e3a3cfa ("drm/omap: Store bus flags in the
omap_dss_device structure") video mode flags are managed by the omapdss
(and later omapdrm) core based on bus flags stored in omap_dss_device.
This works fine for all devices whose video modes are set by the omapdss
and omapdrm core, but breaks DSI operation as the DSI still uses legacy
code paths and sets the DISPC timings manually.
To fix the problem properly we should move the DSI encoder to the new
encoder model. This will however require a considerable amount of work.
Restore DSI operation by adding back video mode flags handling in the
DSI encoder driver as a hack in the meantime.
Tomi Valkeinen [Fri, 11 Jan 2019 03:50:34 +0000 (05:50 +0200)]
drm/omap: dsi: Fix OF platform depopulate
Commit edb715dffdee ("drm/omap: dss: dsi: Move initialization code from
bind to probe") moved the of_platform_populate() call from dsi_bind() to
dsi_probe(), but failed to move the corresponding
of_platform_depopulate() from dsi_unbind() to dsi_remove(). This results
in OF child devices being potentially removed multiple times. Fix it by
placing the of_platform_depopulate() call where it belongs.
Tomi Valkeinen [Fri, 11 Jan 2019 03:50:33 +0000 (05:50 +0200)]
drm/omap: dsi: Fix crash in DSI debug dumps
Reading any of the DSI debugfs files results in a crash, as wrong
pointer is passed to the dump functions, and the dump functions use a
wrong pointer. This patch fixes DSI debug dumps.
mtd: rawnand: gpmi: fix MX28 bus master lockup problem
Disable BCH soft reset according to MX23 erratum #2847 ("BCH soft
reset may cause bus master lock up") for MX28 too. It has the same
problem.
Observed problem: once per 100,000+ MX28 reboots NAND read failed on
DMA timeout errors:
[ 1.770823] UBI: attaching mtd3 to ubi0
[ 2.768088] gpmi_nand: DMA timeout, last DMA :1
[ 3.958087] gpmi_nand: BCH timeout, last DMA :1
[ 4.156033] gpmi_nand: Error in ECC-based read: -110
[ 4.161136] UBI warning: ubi_io_read: error -110 while reading 64
bytes from PEB 0:0, read only 0 bytes, retry
[ 4.171283] step 1 error
[ 4.173846] gpmi_nand: Chip: 0, Error -1
Without BCH soft reset we successfully executed 1,000,000 MX28 reboots.
I have a quote from NXP regarding this problem, from July 18th 2016:
"As the i.MX23 and i.MX28 are of the same generation, they share many
characteristics. Unfortunately, also the erratas may be shared.
In case of the documented erratas and the workarounds, you can also
apply the workaround solution of one device on the other one. This have
been reported, but I’m afraid that there are not an estimated date for
updating the Errata documents.
Please accept our apologies for any inconveniences this may cause."
Fixes: 6f2a6a52560a ("mtd: nand: gpmi: reset BCH earlier, too, to avoid NAND startup problems") Cc: stable@vger.kernel.org Signed-off-by: Manfred Schlaegl <manfred.schlaegl@ginzinger.com> Signed-off-by: Martin Kepplinger <martin.kepplinger@ginzinger.com> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Fabio Estevam <festevam@gmail.com> Acked-by: Han Xu <han.xu@nxp.com> Signed-off-by: Boris Brezillon <bbrezillon@kernel.org>
Ville Syrjälä [Tue, 5 Feb 2019 14:18:46 +0000 (16:18 +0200)]
drm/i915: Try to sanitize bogus DPLL state left over by broken SNB BIOSen
Certain SNB machines (eg. ASUS K53SV) seem to have a broken BIOS
which misprograms the hardware badly when encountering a suitably
high resolution display. The programmed pipe timings are somewhat
bonkers and the DPLL is totally misprogrammed (P divider == 0).
That will result in atomic commit timeouts as apparently the pipe
is sufficiently stuck to not signal vblank interrupts.
IIRC something like this was also observed on some other SNB
machine years ago (might have been a Dell XPS 8300) but a BIOS
update cured it. Sadly looks like this was never fixed for the
ASUS K53SV as the latest BIOS (K53SV.320 11/11/2011) is still
broken.
The quickest way to deal with this seems to be to shut down
the pipe+ports+DPLL. Unfortunately doing this during the
normal sanitization phase isn't quite soon enough as we
already spew several WARNs about the bogus hardware state.
But it's better than hanging the boot for a few dozen seconds.
Since this is limited to a few old machines it doesn't seem
entirely worthwile to try and rework the readout+sanitization
code to handle it more gracefully.
v2: Fix potential NULL deref (kbuild test robot)
Constify has_bogus_dpll_config()
Takashi Iwai [Tue, 5 Feb 2019 16:57:27 +0000 (17:57 +0100)]
ALSA: hda/ca0132 - Fix build error without CONFIG_PCI
A call of pci_iounmap() call without CONFIG_PCI leads to a build error
on some architectures. We tried to address this and add a check of
IS_ENABLED(CONFIG_PCI), but this still doesn't seem enough for sh.
Ideally we should fix it globally, it's really a corner case, so let's
paper over it with a simpler ifdef.
Eric Dumazet [Tue, 5 Feb 2019 23:38:44 +0000 (15:38 -0800)]
mISDN: fix a race in dev_expire_timer()
Since mISDN_close() uses dev->pending to iterate over active
timers, there is a chance that one timer got removed from the
->pending list in dev_expire_timer() but that the thread
has not called yet wake_up_interruptible()
So mISDN_close() could miss this and free dev before
completion of at least one dev_expire_timer()
syzbot was able to catch this race :
BUG: KASAN: use-after-free in register_lock_class+0x140c/0x1bf0 kernel/locking/lockdep.c:827
Write of size 8 at addr ffff88809fc18948 by task syz-executor1/24769
Memory state around the buggy address: ffff88809fc18800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88809fc18880: 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88809fc18900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff88809fc18980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff88809fc18a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Karsten Keil <isdn@linux-pingi.de> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 5 Feb 2019 23:02:58 +0000 (00:02 +0100)]
net: dsa: mv88e6xxx: Fix counting of ATU violations
The ATU port vector contains a bit per port of the switch. The code
wrongly used it as a port number, and incremented a port counter. This
resulted in the wrong interfaces counter being incremented, and
potentially going off the end of the array of ports.
Fix this by using the source port ID for the violation, which really
is a port number.
Reported-by: Chris Healy <Chris.Healy@zii.aero> Tested-by: Chris Healy <Chris.Healy@zii.aero> Fixes: 65f60e4582bd ("net: dsa: mv88e6xxx: Keep ATU/VTU violation statistics") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
drm/amd/display: Attach VRR properties for eDP connectors
[Why]
eDP was missing in the checks for supported VRR connectors.
[How]
Attach the properties for eDP connectors too.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202449 Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdkfd: Fix if preprocessor statement above kfd_fill_iolink_info_for_cpu
Clang warns:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:866:5: warning:
'CONFIG_X86_64' is not defined, evaluates to 0 [-Wundef]
^
1 warning generated.
Fixes: d1c234e2cd10 ("drm/amdkfd: Allow building KFD on ARM64 (v2)") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Wed, 30 Jan 2019 20:21:16 +0000 (15:21 -0500)]
drm/amdgpu: use spin_lock_irqsave to protect vm_manager.pasid_idr
amdgpu_vm_get_task_info is called from interrupt handler and sched timeout
workqueue, we should use irq version spin_lock to avoid deadlock.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Boris Brezillon [Wed, 30 Jan 2019 11:55:52 +0000 (12:55 +0100)]
mtd: Make sure mtd->erasesize is valid even if the partition is of size 0
Commit 33f45c44d68b ("mtd: Do not allow MTD devices with inconsistent
erase properties") introduced a check to make sure ->erasesize and
->_erase values are consistent with the MTD_NO_ERASE flag.
This patch did not take the 0 bytes partition case into account which
can happen when the defined partition is outside the flash device memory
range. Fix that by setting the partition erasesize to the parent
erasesize.
Fixes: 33f45c44d68b ("mtd: Do not allow MTD devices with inconsistent erase properties") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: <stable@vger.kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Boris Brezillon <bbrezillon@kernel.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Charles Keepax [Tue, 5 Feb 2019 16:29:40 +0000 (16:29 +0000)]
ALSA: compress: Fix stop handling on compressed capture streams
It is normal user behaviour to start, stop, then start a stream
again without closing it. Currently this works for compressed
playback streams but not capture ones.
The states on a compressed capture stream go directly from OPEN to
PREPARED, unlike a playback stream which moves to SETUP and waits
for a write of data before moving to PREPARED. Currently however,
when a stop is sent the state is set to SETUP for both types of
streams. This leaves a capture stream in the situation where a new
start can't be sent as that requires the state to be PREPARED and
a new set_params can't be sent as that requires the state to be
OPEN. The only option being to close the stream, and then reopen.
Correct this issues by allowing snd_compr_drain_notify to set the
state depending on the stream direction, as we already do in
set_params.
Fixes: 49bb6402f1aa ("ALSA: compress_core: Add support for capture streams") Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
Or Gerlitz [Thu, 10 Jan 2019 18:37:36 +0000 (20:37 +0200)]
net/mlx5e: Properly set steering match levels for offloaded TC decap rules
The match level computed by the driver gets to be wrong for decap
rules with wildcarded inner packet match such as:
tc filter add dev vxlan_sys_4789 protocol all parent ffff: prio 2 flower
enc_dst_ip 192.168.0.9 enc_key_id 100 enc_dst_port 4789
action tunnel_key unset
action mirred egress redirect dev eth1
The FW errs for a missing matching meta-data indicator for the outer
headers (where we do have a match), and a wrong matching meta-data
indicator for the inner headers (where we don't have a match).
Fix that by taking into account the matching on the tunnel info and
relating the match level of the encapsulated packet to the firmware
inner headers indicator in case of decap.
As for vxlan we mandate a match on the tunnel udp dst port, and in general
we practically madndate a match on the source or dest ip for any IP tunnel,
the fix was done in a minimal manner around the tunnel match parsing code.
Fixes: d708f902989b ('net/mlx5e: Get the required HW match level while parsing TC flow matches') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Slava Ovsiienko <viacheslavo@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Raed Salem [Mon, 17 Dec 2018 09:40:06 +0000 (11:40 +0200)]
net/mlx5e: FPGA, fix Innova IPsec TX offload data path performance
At Innova IPsec TX offload data path a special software parser metadata
is used to pass some packet attributes to the hardware, this metadata
is passed using the Ethernet control segment of a WQE (a HW descriptor)
header.
The cited commit might nullify this header, hence the metadata is lost,
this caused a significant performance drop during hw offloading
operation.
Fix by restoring the metadata at the Ethernet control segment in case
it was nullified.
The following patchset contains Netfilter fixes for net:
1) Use CONFIG_NF_TABLES_INET from seltests, not NF_TABLES_INET.
From Naresh Kamboju.
2) Add a test to cover masquerading and redirect case, from Florian
Westphal.
3) Two packets coming from the same socket may race to set up NAT,
ending up with different tuples and the packet losing race being
dropped. Update nf_conntrack_tuple_taken() to exercise clash
resolution for this case. From Martynas Pumputis and Florian
Westphal.
4) Unbind anonymous sets from the commit and abort path, this fixes
a splat due to double set list removal/release in case that the
transaction needs to be aborted.
5) Do not preserve original output interface for packets that are
redirected in the output chain when ip6_route_me_harder() is
called. Otherwise packets end up going not going to the loopback
device. From Eli Cooper.
6) Fix bogus splat in nft_compat with CONFIG_REFCOUNT_FULL=y, this
also simplifies the existing logic to deal with the list insertions
of the xtables extensions. From Florian Westphal.
Diffstat look rather larger than usual because of the new selftest, but
Florian and I consider that having tests soon into the tree is good to
improve coverage. If there's a different policy in this regard, please,
let me know.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Udo Eberhardt [Tue, 5 Feb 2019 16:20:47 +0000 (17:20 +0100)]
ALSA: usb-audio: Add support for new T+A USB DAC
This patch adds the T+A VID to the generic check in order to enable
native DSD support for T+A devices. This works with the new T+A USB
DAC model SD3100HV and will also work with future devices which
support the XMOS/Thesycon style DSD format.
signal: Always attempt to allocate siginfo for SIGSTOP
Since 2.5.34 the code has had the potential to not allocate siginfo
for SIGSTOP signals. Except for ptrace this is perfectly fine as only
ptrace can use PTRACE_PEEK_SIGINFO and see what the contents of
the delivered siginfo are.
Users of PTRACE_PEEK_SIGINFO that care about the contents siginfo
for SIGSTOP are rare, but they do exist. A seccomp self test
has cared and lldb cares.
Jack Andersen <jackoalan@gmail.com> writes:
> The patch titled
> `signal: Never allocate siginfo for SIGKILL or SIGSTOP`
> created a regression for users of PTRACE_GETSIGINFO needing to
> discern signals that were raised via the tgkill syscall.
>
> A notable user of this tgkill+ptrace combination is lldb while
> debugging a multithreaded program. Without the ability to detect a
> SIGSTOP originating from tgkill, lldb does not have a way to
> synchronize on a per-thread basis and falls back to SIGSTOP-ing the
> entire process.
Everyone affected by this please note. The kernel can still fail to
allocate a siginfo structure. The allocation is with GFP_KERNEL and
is best effort only. If memory is tight when the signal allocation
comes in this will fail to allocate a siginfo.
So I strongly recommend looking at more robust solutions for
synchronizing with a single thread such as PTRACE_INTERRUPT. Or if
that does not work persuading your friendly local kernel developer to
build the interface you need.
Reported-by: Tycho Andersen <tycho@tycho.ws> Reported-by: Kees Cook <keescook@chromium.org> Reported-by: Jack Andersen <jackoalan@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Christian Brauner <christian@brauner.io> Cc: stable@vger.kernel.org Fixes: f149b3155744 ("signal: Never allocate siginfo for SIGKILL or SIGSTOP") Fixes: 6dfc88977e42 ("[PATCH] shared thread signals")
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Tony Jones [Thu, 24 Jan 2019 00:52:29 +0000 (16:52 -0800)]
perf script python: Add Python3 support to tests/attr.py
Support both Python 2 and Python 3 in tests/attr.py
The use of "except as" syntax implies the minimum supported Python2 version is
now v2.6
Committer testing:
$ make -C tools/perf PYTHON3=python install-bin
Before:
# perf test attr
16: Setup struct perf_event_attr : FAILED!
48: Synthesize attr update : Ok
[root@quaco ~]# perf test -v attr
16: Setup struct perf_event_attr :
--- start ---
test child forked, pid 3121
File "/home/acme/libexec/perf-core/tests/attr.py", line 324
except Unsup, obj:
^
SyntaxError: invalid syntax
test child finished with -1
---- end ----
Setup struct perf_event_attr: FAILED!
48: Synthesize attr update :
--- start ---
test child forked, pid 3124
test child finished with 0
---- end ----
Synthesize attr update: Ok
#
After:
# perf test attr
16: Setup struct perf_event_attr : Ok
48: Synthesize attr update : Ok
#
Signed-off-by: Tony Jones <tonyj@suse.de> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Seeteena Thoufeek <s1seetee@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20190124005229.16146-7-tonyj@suse.de Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
netfilter: nft_compat: don't use refcount_inc on newly allocated entry
When I moved the refcount to refcount_t type I missed the fact that
refcount_inc() will result in use-after-free warning with
CONFIG_REFCOUNT_FULL=y builds.
The correct fix would be to init the reference count to 1 at allocation
time, but, unfortunately we cannot do this, as we can't undo that
in case something else fails later in the batch.
So only solution I see is to special-case the 'new entry' condition
and replace refcount_inc() with a "delayed" refcount_set(1) in this case,
as done here.
The .activate callback can be removed to simplify things, we only
need to make sure that deactivate() decrements/unlinks the entry
from the list at end of transaction phase (commit or abort).
Fixes: 12c44aba6618 ("netfilter: nft_compat: use refcnt_t type for nft_xt reference count") Reported-by: Jordan Glover <Golden_Miller83@protonmail.ch> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eli Cooper [Mon, 21 Jan 2019 10:45:27 +0000 (18:45 +0800)]
netfilter: ipv6: Don't preserve original oif for loopback address
Commit 508b09046c0f ("netfilter: ipv6: Preserve link scope traffic
original oif") made ip6_route_me_harder() keep the original oif for
link-local and multicast packets. However, it also affected packets
for the loopback address because it used rt6_need_strict().
REDIRECT rules in the OUTPUT chain rewrite the destination to loopback
address; thus its oif should not be preserved. This commit fixes the bug
that redirected local packets are being dropped. Actually the packet was
not exactly dropped; Instead it was sent out to the original oif rather
than lo. When a packet with daddr ::1 is sent to the router, it is
effectively dropped.
Fixes: 508b09046c0f ("netfilter: ipv6: Preserve link scope traffic original oif") Signed-off-by: Eli Cooper <elicooper@gmx.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Thomas Hellstrom [Mon, 28 Jan 2019 09:31:33 +0000 (10:31 +0100)]
drm/vmwgfx: Fix setting of dma masks
Previously we set only the dma mask and not the coherent mask. Fix that.
Also, for clarity, make sure both are initially set to 64 bits.
Cc: <stable@vger.kernel.org> Fixes: 0d00c488f3de: ("drm/vmwgfx: Fix the driver for large dma addresses") Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Deepak Rawat <drawat@vmware.com>
Deepak Rawat [Fri, 21 Dec 2018 22:38:35 +0000 (14:38 -0800)]
drm/vmwgfx: Also check for crtc status while checking for DU active
During modeset check it is possible to have all crtc_state's in atomic
state. Check for crtc enable status while checking for display unit
active status. Only error if enabling a crtc while display unit is not
active.
Cc: <stable@vger.kernel.org> Fixes: 9da6e26c0aae: ("drm/vmwgfx: Fix a layout race condition") Signed-off-by: Deepak Rawat <drawat@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Thomas Hellstrom [Thu, 31 Jan 2019 09:52:21 +0000 (10:52 +0100)]
drm/vmwgfx: Fix an uninitialized fence handle value
if vmw_execbuf_fence_commands() fails, The handle value will be
uninitialized and a bogus fence handle might be copied to user-space.
Cc: <stable@vger.kernel.org> Fixes: 2724b2d54cda: ("drm/vmwgfx: Use new validation interface for the modesetting code v2") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> #v1 Reviewed-by: Sinclair Yeh <syeh@vmware.com> #v1 Reviewed-by: Deepak Rawat <drawat@vmware.com>
Thomas Hellstrom [Thu, 31 Jan 2019 09:55:37 +0000 (10:55 +0100)]
drm/vmwgfx: Return error code from vmw_execbuf_copy_fence_user
The function was unconditionally returning 0, and a caller would have to
rely on the returned fence pointer being NULL to detect errors. However,
the function vmw_execbuf_copy_fence_user() would expect a non-zero error
code in that case and would BUG otherwise.
So make sure we return a proper non-zero error code if the fence pointer
returned is NULL.
Tony Lindgren [Thu, 10 Jan 2019 15:59:16 +0000 (07:59 -0800)]
i2c: omap: Use noirq system sleep pm ops to idle device for suspend
We currently get the following error with pixcir_ts driver during a
suspend resume cycle:
omap_i2c 4802a000.i2c: controller timed out
pixcir_ts 1-005c: pixcir_int_enable: can't read reg 0x34 : -110
pixcir_ts 1-005c: Failed to disable interrupt generation: -110
pixcir_ts 1-005c: Failed to stop
dpm_run_callback(): pixcir_i2c_ts_resume+0x0/0x98
[pixcir_i2c_ts] returns -110
PM: Device 1-005c failed to resume: error -110
And at least am437x based devices with pixcir_ts will fail to resume
to a touchscreen that is configured as the wakeup-source in device
tree for these devices.
This is because pixcir_ts tries to reconfigure it's registers for
noirq suspend which fails. This also leaves i2c-omap in enabled state
for suspend.
Let's fix the pixcir_ts issue and make sure i2c-omap is suspended by
adding SET_NOIRQ_SYSTEM_SLEEP_PM_OPS.
Let's also get rid of some ifdefs while at it and replace them with
__maybe_unused as SET_RUNTIME_PM_OPS and SET_NOIRQ_SYSTEM_SLEEP_PM_OPS
already deal with the various PM Kconfig options.
Reported-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Acked-by: Vignesh R <vigneshr@ti.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Vaibhav Jain [Wed, 30 Jan 2019 12:26:51 +0000 (17:56 +0530)]
scsi: cxlflash: Prevent deadlock when adapter probe fails
Presently when an error is encountered during probe of the cxlflash
adapter, a deadlock is seen with cpu thread stuck inside
cxlflash_remove(). Below is the trace of the deadlock as logged by
khungtaskd:
cxlflash 0006:00:00.0: cxlflash_probe: init_afu failed rc=-16
INFO: task kworker/80:1:890 blocked for more than 120 seconds.
Not tainted 5.0.0-rc4-capi2-kexec+ #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/80:1 D 0 890 2 0x00000808
Workqueue: events work_for_cpu_fn
Call Trace:
0x4d72136320 (unreliable)
__switch_to+0x2cc/0x460
__schedule+0x2bc/0xac0
schedule+0x40/0xb0
cxlflash_remove+0xec/0x640 [cxlflash]
cxlflash_probe+0x370/0x8f0 [cxlflash]
local_pci_probe+0x6c/0x140
work_for_cpu_fn+0x38/0x60
process_one_work+0x260/0x530
worker_thread+0x280/0x5d0
kthread+0x1a8/0x1b0
ret_from_kernel_thread+0x5c/0x80
INFO: task systemd-udevd:5160 blocked for more than 120 seconds.
The deadlock occurs as cxlflash_remove() is called from cxlflash_probe()
without setting 'cxlflash_cfg->state' to STATE_PROBED and the probe thread
starts to wait on 'cxlflash_cfg->reset_waitq'. Since the device was never
successfully probed the 'cxlflash_cfg->state' never changes from
STATE_PROBING hence the deadlock occurs.
We fix this deadlock by setting the variable 'cxlflash_cfg->state' to
STATE_PROBED in case an error occurs during cxlflash_probe() and just
before calling cxlflash_remove().
Cc: stable@vger.kernel.org Fixes: c21e0bbfc485("cxlflash: Base support for IBM CXL Flash Adapter") Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
It added a warning whose intent was to check whether the rport was still
linked into the peer list. It doesn't work as intended and gives false
positive warnings for two reasons:
1) If the rport is never linked into the peer list it will not be
considered empty since the list_head is never initialized.
2) If the rport is deleted from the peer list using list_del_rcu(), then
the list_head is in an undefined state and it is not considered empty.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Wed, 30 Jan 2019 06:54:58 +0000 (15:54 +0900)]
scsi: sd_zbc: Fix zone information messages
Commit bf5054569653 ("block: Introduce blk_revalidate_disk_zones()")
inadvertently broke the message output of sd_zbc_print_zones() because the
zone information initialization of the scsi disk structure was moved to the
second scan run while sd_zbc_print_zones() is called on the first
scan. This leads to the following incorrect message to be printed for any
ZBC or ZAC zoned disks.
"...[sdX] 4294967295 zones of 0 logical blocks + 1 runt zone"
Fix this by initializing sdkp zone size and number of zones early on the
first scan. This does not impact the execution of
blk_revalidate_zones(). This functions is still called only once the block
device capacity is set on the second revalidate run on boot, or if the disk
zone configuration changed (i.e. the disk changed).
Fixes: bf5054569653 ("block: Introduce blk_revalidate_disk_zones()") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi: target: make the pi_prot_format ConfigFS path readable
pi_prot_format conversion to write-only caused userspace breakage. Make the
ConfigFS path readable again and hardcode the "0\n" content, matching
previous output.
Fixes: 6baca7601bde ("scsi: target: drop unused pi_prot_format attribute storage") Link: https://bugzilla.redhat.com/show_bug.cgi?id=1667505 Reported-by: Lee Duncan <lduncan@suse.com> Reported-by: Laura Abbott <labbott@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This looks fairly harmless (no actual deadlock occurs), and is
fixed in a similar way to c6894dec8ea9 ("bridge: fix lockdep
addr_list_lock false positive splat") by putting the addr_list_lock
in its own lockdep class.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Rundong Ge [Sat, 2 Feb 2019 14:29:35 +0000 (14:29 +0000)]
net: dsa: slave: Don't propagate flag changes on down slave interfaces
The unbalance of master's promiscuity or allmulti will happen after ifdown
and ifup a slave interface which is in a bridge.
When we ifdown a slave interface , both the 'dsa_slave_close' and
'dsa_slave_change_rx_flags' will clear the master's flags. The flags
of master will be decrease twice.
In the other hand, if we ifup the slave interface again, since the
slave's flags were cleared the 'dsa_slave_open' won't set the master's
flag, only 'dsa_slave_change_rx_flags' that triggered by 'br_add_if'
will set the master's flags. The flags of master is increase once.
Only propagating flag changes when a slave interface is up makes
sure this does not happen. The 'vlan_dev_change_rx_flags' had the
same problem and was fixed, and changes here follows that fix.
Fixes: 91da11f870f0 ("net: Distributed Switch Architecture protocol support") Signed-off-by: Rundong Ge <rdong.ge@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jun-Ru Chang [Tue, 29 Jan 2019 03:56:07 +0000 (11:56 +0800)]
MIPS: Remove function size check in get_frame_info()
Patch (b6c7a324df37b "MIPS: Fix get_frame_info() handling of
microMIPS function size.") introduces additional function size
check for microMIPS by only checking insn between ip and ip + func_size.
However, func_size in get_frame_info() is always 0 if KALLSYMS is not
enabled. This causes get_frame_info() to return immediately without
calculating correct frame_size, which in turn causes "Can't analyze
schedule() prologue" warning messages at boot time.
This patch removes func_size check, and let the frame_size check run
up to 128 insns for both MIPS and microMIPS.
Signed-off-by: Jun-Ru Chang <jrjang@realtek.com> Signed-off-by: Tony Wu <tonywu@realtek.com> Signed-off-by: Paul Burton <paul.burton@mips.com> Fixes: b6c7a324df37b ("MIPS: Fix get_frame_info() handling of microMIPS function size.") Cc: <ralf@linux-mips.org> Cc: <jhogan@kernel.org> Cc: <macro@mips.com> Cc: <yamada.masahiro@socionext.com> Cc: <peterz@infradead.org> Cc: <mingo@kernel.org> Cc: <linux-mips@vger.kernel.org> Cc: <linux-kernel@vger.kernel.org>
Paul Burton [Mon, 4 Feb 2019 19:53:53 +0000 (19:53 +0000)]
MIPS: Use lower case for addresses in nexys4ddr.dts
DTC introduced an i2c_bus_reg check in v1.4.7, used since Linux v4.20,
which complains about upper case addresses used in the unit name.
nexys4ddr.dts names an I2C device node "ad7420@4B", leading to:
arch/mips/boot/dts/xilfpga/nexys4ddr.dts:109.16-112.8: Warning
(i2c_bus_reg): /i2c@10A00000/ad7420@4B: I2C bus unit address format
error, expected "4b"
Fix this by switching to lower case addresses throughout the file, as is
*mostly* the case in the file already & fairly standard throughout the
tree.
Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: stable@vger.kernel.org # v4.20+ Cc: linux-mips@vger.kernel.org
Huacai Chen [Tue, 15 Jan 2019 08:04:54 +0000 (16:04 +0800)]
MIPS: Loongson: Introduce and use loongson_llsc_mb()
On the Loongson-2G/2H/3A/3B there is a hardware flaw that ll/sc and
lld/scd is very weak ordering. We should add sync instructions "before
each ll/lld" and "at the branch-target between ll/sc" to workaround.
Otherwise, this flaw will cause deadlock occasionally (e.g. when doing
heavy load test with LTP).
Below is the explaination of CPU designer:
"For Loongson 3 family, when a memory access instruction (load, store,
or prefetch)'s executing occurs between the execution of LL and SC, the
success or failure of SC is not predictable. Although programmer would
not insert memory access instructions between LL and SC, the memory
instructions before LL in program-order, may dynamically executed
between the execution of LL/SC, so a memory fence (SYNC) is needed
before LL/LLD to avoid this situation.
Since Loongson-3A R2 (3A2000), we have improved our hardware design to
handle this case. But we later deduce a rarely circumstance that some
speculatively executed memory instructions due to branch misprediction
between LL/SC still fall into the above case, so a memory fence (SYNC)
at branch-target (if its target is not between LL/SC) is needed for
Loongson 3A1000, 3B1500, 3A2000 and 3A3000.
Our processor is continually evolving and we aim to to remove all these
workaround-SYNCs around LL/SC for new-come processor."
Here is an example:
Both cpu1 and cpu2 simutaneously run atomic_add by 1 on same atomic var,
this bug cause both 'sc' run by two cpus (in atomic_add) succeed at same
time('sc' return 1), and the variable is only *added by 1*, sometimes,
which is wrong and unacceptable(it should be added by 2).
Why disable fix-loongson3-llsc in compiler?
Because compiler fix will cause problems in kernel's __ex_table section.
This patch fix all the cases in kernel, but:
+. the fix at the end of futex_atomic_cmpxchg_inatomic is for branch-target
of 'bne', there other cases which smp_mb__before_llsc() and smp_llsc_mb() fix
the ll and branch-target coincidently such as atomic_sub_if_positive/
cmpxchg/xchg, just like this one.
+. Loongson 3 does support CONFIG_EDAC_ATOMIC_SCRUB, so no need to touch
edac.h
+. local_ops and cmpxchg_local should not be affected by this bug since
only the owner can write.
+. mips_atomic_set for syscall.c is deprecated and rarely used, just let
it go
Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Huang Pei <huangpei@loongson.cn>
[paul.burton@mips.com:
- Simplify the addition of -mno-fix-loongson3-llsc to cflags, and add
a comment describing why it's there.
- Make loongson_llsc_mb() a no-op when
CONFIG_CPU_LOONGSON3_WORKAROUNDS=n, rather than a compiler memory
barrier.
- Add a comment describing the bug & how loongson_llsc_mb() helps
in asm/barrier.h.] Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: ambrosehua@gmail.com Cc: Steven J . Hill <Steven.Hill@cavium.com> Cc: linux-mips@linux-mips.org Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Cc: Li Xuefeng <lixuefeng@loongson.cn> Cc: Xu Chenghua <xuchenghua@loongson.cn>
With a suitably defined "probe:vfs_getname" probe, 'perf trace' can
"beautify" its output, so syscalls like open() or openat() can print the
"filename" argument instead of just its hex address, like:
[root@quaco ~]# perf test vfs
65: Use vfs_getname probe to get syscall args filenames : Ok
66: Add vfs_getname probe to get syscall args filenames : Ok
67: Check open filename arg using perf trace + vfs_getname: Ok
[root@quaco ~]#
Reported-by: Michael Petlan <mpetlan@redhat.com> Tested-by: Michael Petlan <mpetlan@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-mv8kolk17xla1smvmp3qabv1@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Those symbols have no use for report or annotation and should be
skipped. Moreover they interfere with the DWARF unwind test on the PPC
arch, where they are mixed with checked symbols and then the test fails:
# perf test dwarf -v
59: Test dwarf unwind :
--- start ---
test child forked, pid 8515
unwind: .annobin_dwarf_unwind.c:ip = 0x10dba40dc (0x2740dc)
...
got: .annobin_dwarf_unwind.c 0x10dba40dc, expecting test__arch_unwind_sample
unwind: failed with 'no error'
The annobin symbols are defined as NOTYPE/LOCAL/HIDDEN:
# readelf -s ./perf | grep annobin | head -1
40: 00000000001bce4f 0 NOTYPE LOCAL HIDDEN 13 .annobin_init.c
They can still pass the check for the label symbol. Adding check for
HIDDEN and INTERNAL (as suggested by Nick below) visibility and filter
out such symbols.
> Just to be awkward, if you are going to ignore STV_HIDDEN
> symbols then you should probably also ignore STV_INTERNAL ones
> as well... Annobin does not generate them, but you never know,
> one day some other tool might create some.
Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Clifton <nickc@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190128133526.GD15461@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf symbols: Add fallback definitions for GELF_ST_VISIBILITY()
Those aren't present in Alpine Linux 3.4 to edge, so provide fallback
defines to get the next patch building there keeping the build
bisectable.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Clifton <nickc@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/n/tip-03cg3gya2ju4ba2x6ibb9fuz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
David S. Miller [Mon, 4 Feb 2019 17:43:48 +0000 (09:43 -0800)]
Merge branch 's390-qeth-fixes'
Julian Wiedmann says:
====================
s390/qeth: fixes 2019-02-04
please apply the following four fixes to -net.
Patch 1 takes care of a common resource leak in various error paths, while the
second patch fixes a misordered kfree when cleaning up after an error.
The other two patches ensure that there's no stale work dangling on workqueues
when the qeth device has already been offlined and/or removed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 4 Feb 2019 16:40:09 +0000 (17:40 +0100)]
s390/qeth: conclude all event processing before offlining a card
Work for Bridgeport events is currently placed on a driver-wide
workqueue. If the card is removed and freed while any such work is still
active, this causes a use-after-free.
So put the events on a per-card queue, where we can control their
lifetime. As we also don't want stale events to last beyond an
offline & online cycle, flush this queue when setting the card offline.
Fixes: b4d72c08b358 ("qeth: bridgeport support - basic control") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 4 Feb 2019 16:40:08 +0000 (17:40 +0100)]
s390/qeth: cancel close_dev work before removing a card
A card's close_dev work is scheduled on a driver-wide workqueue. If the
card is removed and freed while the work is still active, this causes a
use-after-free.
So make sure that the work is completed before freeing the card.
Fixes: 0f54761d167f ("qeth: Support VEPA mode") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 4 Feb 2019 16:40:07 +0000 (17:40 +0100)]
s390/qeth: fix use-after-free in error path
The error path in qeth_alloc_qdio_buffers() that takes care of
cleaning up the Output Queues is buggy. It first frees the queue, but
then calls qeth_clear_outq_buffers() with that very queue struct.
Make the call to qeth_clear_outq_buffers() part of the free action
(in the correct order), and while at it fix the naming of the helper.
Fixes: 0da9581ddb0f ("qeth: exploit asynchronous delivery of storage blocks") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 4 Feb 2019 16:40:06 +0000 (17:40 +0100)]
s390/qeth: release cmd buffer in error paths
Whenever we fail before/while starting an IO, make sure to release the
IO buffer. Usually qeth_irq() would do this for us, but if the IO
doesn't even start we obviously won't get an interrupt for it either.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Mon, 4 Feb 2019 14:50:38 +0000 (14:50 +0000)]
net: cls_flower: Remove filter from mask before freeing it
In fl_change(), when adding a new rule (i.e. fold == NULL), a driver may
reject the new rule, for example due to resource exhaustion. By that
point, the new rule was already assigned a mask, and it was added to
that mask's hash table. The clean-up path that's invoked as a result of
the rejection however neglects to undo the hash table addition, and
proceeds to free the new rule, thus leaving a dangling pointer in the
hash table.
Fix by removing fnew from the mask's hash table before it is freed.
Fixes: 35cc3cefc4de ("net/sched: cls_flower: Reject duplicated rules also under skip_sw") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 4 Feb 2019 17:11:19 +0000 (09:11 -0800)]
Merge branch 'smc-fixes'
Ursula Braun says:
====================
net/smc: fixes 2019-02-04
here are more fixes in the smc code for the net tree:
Patch 1 fixes an IB-related problem with SMCR.
Patch 2 fixes a cursor problem for one-way traffic.
Patch 3 fixes a problem with RMB-reusage.
Patch 4 fixes a closing issue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Mon, 4 Feb 2019 12:44:47 +0000 (13:44 +0100)]
net/smc: correct state change for peer closing
If some kind of closing is received from the peer while still in
state SMC_INIT, it means the peer has had an active connection and
closed the socket quickly before listen_work finished. This should
not result in a shortcut from state SMC_INIT to state SMC_CLOSED.
This patch adds the socket to the accept queue in state
SMC_APPCLOSEWAIT1. The socket reaches state SMC_CLOSED once being
accepted and closed with smc_release().
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Mon, 4 Feb 2019 12:44:46 +0000 (13:44 +0100)]
net/smc: delete rkey first before switching to unused
Once RMBs are flagged as unused they are candidates for reuse.
Thus the LLC DELETE RKEY operaton should be made before flagging
the RMB as unused.
Fixes: c7674c001b11 ("net/smc: unregister rkeys of unused buffer") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Mon, 4 Feb 2019 12:44:45 +0000 (13:44 +0100)]
net/smc: fix sender_free computation
In some scenarios a separate consumer cursor update is necessary.
The decision is made in smc_tx_consumer_cursor_update(). The
sender_free computation could be wrong:
The rx confirmed cursor is always smaller than or equal to the
rx producer cursor. The parameters in the smc_curs_diff() call
have to be exchanged, otherwise sender_free might even be negative.
And if more data arrives local_rx_ctrl.prod might be updated, enabling
a cursor difference between local_rx_ctrl.prod and rx confirmed cursor
larger than the RMB size. This case is not covered by smc_curs_diff().
Thus function smc_curs_diff_large() is introduced here.
If a recvmsg() is processed in parallel, local_tx_ctrl.cons might
change during smc_cdc_msg_send. Make sure rx_curs_confirmed is updated
with the actually sent local_tx_ctrl.cons value.
Fixes: e82f2e31f559 ("net/smc: optimize consumer cursor updates") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Mon, 4 Feb 2019 12:44:44 +0000 (13:44 +0100)]
net/smc: preallocated memory for rdma work requests
The work requests for rdma writes are built in local variables within
function smc_tx_rdma_write(). This violates the rule that the work
request storage has to stay till the work request is confirmed by
a completion queue response.
This patch introduces preallocated memory for these work requests.
The storage is allocated, once a link (and thus a queue pair) is
established.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
During sendmsg() a cloned skb is saved via dp83640_txtstamp() in
->tx_queue. After the NIC sends this packet, the PHY will reply with a
timestamp for that TX packet. If the cable is pulled at the right time I
don't see that packet. It might gets flushed as part of queue shutdown
on NIC's side.
Once the link is up again then after the next sendmsg() we enqueue
another skb in dp83640_txtstamp() and have two on the list. Then the PHY
will send a reply and decode_txts() attaches it to the first skb on the
list.
No crash occurs since refcounting works but we are one packet behind.
linuxptp/ptp4l usually closes the socket and opens a new one (in such a
timeout case) so those "stale" replies never get there. However it does
not resume normal operation anymore.
Purge old skbs in decode_txts().
Fixes: cb646e2b02b2 ("ptp: Added a clock driver for the National Semiconductor PHYTER.") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
netfilter: nf_tables: unbind set in rule from commit path
Anonymous sets that are bound to rules from the same transaction trigger
a kernel splat from the abort path due to double set list removal and
double free.
This patch updates the logic to search for the transaction that is
responsible for creating the set and disable the set list removal and
release, given the rule is now responsible for this. Lookup is reverse
since the transaction that adds the set is likely to be at the tail of
the list.
Moreover, this patch adds the unbind step to deliver the event from the
commit path. This should not be done from the worker thread, since we
have no guarantees of in-order delivery to the listener.
This patch removes the assumption that both activate and deactivate
callbacks need to be provided.
Fixes: cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase") Reported-by: Mikhail Morfikov <mmorfikov@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Will Deacon [Mon, 4 Feb 2019 14:37:38 +0000 (14:37 +0000)]
arm64: ptdump: Don't iterate kernel page tables using PTRS_PER_PXX
When 52-bit virtual addressing is enabled for userspace
(CONFIG_ARM64_USER_VA_BITS_52=y), the kernel continues to utilise 48-bit
virtual addressing in TTBR1. Consequently, PTRS_PER_PGD reflects the
larger page table size for userspace and the pgd pointer for kernel page
tables is offset before being written to TTBR1.
This means that we can't use PTRS_PER_PGD to iterate over kernel page
tables unless we apply the same offset, which is fiddly to get right and
leads to some non-idiomatic walking code. Instead, just follow the usual
pattern when walking page tables by using a while loop driven by
pXd_offset() and pXd_addr_end().
Reported-by: Qian Cai <cai@lca.pw> Tested-by: Qian Cai <cai@lca.pw> Acked-by: Steve Capper <steve.capper@arm.com> Tested-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
tools headers uapi: Sync linux/in.h copy from the kernel sources
To get the changes in this cset:
f275ee0fa3a0 ("IN_BADCLASS: fix macro to actually work")
The macros changed in this cset are not used in tools/, so this is just
to silence this perf tools build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/in.h' differs from latest version at 'include/uapi/linux/in.h'
diff -u tools/include/uapi/linux/in.h include/uapi/linux/in.h
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-xbk34kwamn8bw8ywpuxetct9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
$ make -C tools/perf CC=clang LIBCLANGLLVM=1
<SNIP>
util/c++/clang.cpp: In function 'std::unique_ptr<llvm::SmallVectorImpl<char> > perf::getBPFObjectFromModule(llvm::Module*)':
util/c++/clang.cpp:163:18: error: moving a local object in a return statement prevents copy elision [-Werror=pessimizing-move]
163 | return std::move(Buffer);
| ~~~~~~~~~^~~~~~~~
util/c++/clang.cpp:163:18: note: remove 'std::move' call
cc1plus: all warnings being treated as errors
<SNIP>
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-lehqf5x5q96l0o8myhb6blz6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ravi Bangoria [Tue, 29 Jan 2019 13:24:12 +0000 (18:54 +0530)]
perf mem/c2c: Fix perf_mem_events to support powerpc
PowerPC hardware does not have a builtin latency filter (--ldlat) for
the "mem-load" event and perf_mem_events by default includes
"/ldlat=30/" which is causing a failure on PowerPC. Refactor the code to
support "perf mem/c2c" on PowerPC.
This patch depends on kernel side changes done my Madhavan:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-December/182596.html
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Dick Fowles <fowles@inreach.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20190129132412.771-1-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Notice that the use of the bitwise OR operator '|' always leads to true
in this particular case, which seems a bit suspicious due to the context
in which this expression is being used.
Fix this by using bitwise AND operator '&' instead.
This bug was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Fixes: 6a6cd11d4e57 ("perf test: Add test for the sched tracepoint format fields") Link: http://lkml.kernel.org/r/20190122233439.GA5868@embeddedor Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
netfilter: nf_nat: skip nat clash resolution for same-origin entries
It is possible that two concurrent packets originating from the same
socket of a connection-less protocol (e.g. UDP) can end up having
different IP_CT_DIR_REPLY tuples which results in one of the packets
being dropped.
To illustrate this, consider the following simplified scenario:
1. Packet A and B are sent at the same time from two different threads
by same UDP socket. No matching conntrack entry exists yet.
Both packets cause allocation of a new conntrack entry.
2. get_unique_tuple gets called for A. No clashing entry found.
conntrack entry for A is added to main conntrack table.
3. get_unique_tuple is called for B and will find that the reply
tuple of B is already taken by A.
It will allocate a new UDP source port for B to resolve the clash.
4. conntrack entry for B cannot be added to main conntrack table
because its ORIGINAL direction is clashing with A and the REPLY
directions of A and B are not the same anymore due to UDP source
port reallocation done in step 3.
This patch modifies nf_conntrack_tuple_taken so it doesn't consider
colliding reply tuples if the IP_CT_DIR_ORIGINAL tuples are equal.
[ Florian: simplify patch to not use .allow_clash setting
and always ignore identical flows ]
Signed-off-by: Martynas Pumputis <martynas@weave.works> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fixes: 4076e755dbec ("dmatest: convert to dmaengine_unmap_data") Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vkoul@kernel.org>
Mark Rutland [Thu, 10 Jan 2019 14:27:45 +0000 (14:27 +0000)]
perf/core: Don't WARN() for impossible ring-buffer sizes
The perf tool uses /proc/sys/kernel/perf_event_mlock_kb to determine how
large its ringbuffer mmap should be. This can be configured to arbitrary
values, which can be larger than the maximum possible allocation from
kmalloc.
When this is configured to a suitably large value (e.g. thanks to the
perf fuzzer), attempting to use perf record triggers a WARN_ON_ONCE() in
__alloc_pages_nodemask():
WARNING: CPU: 2 PID: 5666 at mm/page_alloc.c:4511 __alloc_pages_nodemask+0x3f8/0xbc8
Let's avoid this by checking that the requested allocation is possible
before calling kzalloc.
Reported-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20190110142745.25495-1-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 19 Dec 2018 16:53:50 +0000 (17:53 +0100)]
perf/x86/intel: Delay memory deallocation until x86_pmu_dead_cpu()
intel_pmu_cpu_prepare() allocated memory for ->shared_regs among other
members of struct cpu_hw_events. This memory is released in
intel_pmu_cpu_dying() which is wrong. The counterpart of the
intel_pmu_cpu_prepare() callback is x86_pmu_dead_cpu().
Otherwise if the CPU fails on the UP path between CPUHP_PERF_X86_PREPARE
and CPUHP_AP_PERF_X86_STARTING then it won't release the memory but
allocate new memory on the next attempt to online the CPU (leaking the
old memory).
Also, if the CPU down path fails between CPUHP_AP_PERF_X86_STARTING and
CPUHP_PERF_X86_PREPARE then the CPU will go back online but never
allocate the memory that was released in x86_pmu_dying_cpu().
Make the memory allocation/free symmetrical in regard to the CPU hotplug
notifier by moving the deallocation to intel_pmu_cpu_dead().
This started in commit:
a7e3ed1e47011 ("perf: Add support for supplementary event registers").
In principle the bug was introduced in v2.6.39 (!), but it will almost
certainly not backport cleanly across the big CPU hotplug rewrite between v4.7-v4.15...
Reported-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> # With developer hat on Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> # With maintainer hat on Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: bp@alien8.de Cc: hpa@zytor.com Cc: jolsa@kernel.org Cc: kan.liang@linux.intel.com Cc: namhyung@kernel.org Cc: <stable@vger.kernel.org> Fixes: a7e3ed1e47011 ("perf: Add support for supplementary event registers"). Link: https://lkml.kernel.org/r/20181219165350.6s3jvyxbibpvlhtq@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kan Liang [Sun, 27 Jan 2019 14:53:14 +0000 (06:53 -0800)]
perf/x86/intel/uncore: Add Node ID mask
Some PCI uncore PMUs cannot be registered on an 8-socket system (HPE
Superdome Flex).
To understand which Socket the PCI uncore PMUs belongs to, perf retrieves
the local Node ID of the uncore device from CPUNODEID(0xC0) of the PCI
configuration space, and the mapping between Socket ID and Node ID from
GIDNIDMAP(0xD4). The Socket ID can be calculated accordingly.
The local Node ID is only available at bit 2:0, but current code doesn't
mask it. If a BIOS doesn't clear the rest of the bits, an incorrect Node ID
will be fetched.
Filter the Node ID by adding a mask.
Reported-by: Song Liu <songliubraving@fb.com> Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> # v3.7+ Fixes: 7c94ee2e0917 ("perf/x86: Add Intel Nehalem and Sandy Bridge-EP uncore support") Link: https://lkml.kernel.org/r/1548600794-33162-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Lukas Wunner [Wed, 23 Jan 2019 08:26:00 +0000 (09:26 +0100)]
dmaengine: bcm2835: Fix abort of transactions
There are multiple issues with bcm2835_dma_abort() (which is called on
termination of a transaction):
* The algorithm to abort the transaction first pauses the channel by
clearing the ACTIVE flag in the CS register, then waits for the PAUSED
flag to clear. Page 49 of the spec documents the latter as follows:
"Indicates if the DMA is currently paused and not transferring data.
This will occur if the active bit has been cleared [...]"
https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
So the function is entering an infinite loop because it is waiting for
PAUSED to clear which is always set due to the function having cleared
the ACTIVE flag. The only thing that's saving it from itself is the
upper bound of 10000 loop iterations.
The code comment says that the intention is to "wait for any current
AXI transfer to complete", so the author probably wanted to check the
WAITING_FOR_OUTSTANDING_WRITES flag instead. Amend the function
accordingly.
* The CS register is only read at the beginning of the function. It
needs to be read again after pausing the channel and before checking
for outstanding writes, otherwise writes which were issued between
the register read at the beginning of the function and pausing the
channel may not be waited for.
* The function seeks to abort the transfer by writing 0 to the NEXTCONBK
register and setting the ABORT and ACTIVE flags. Thereby, the 0 in
NEXTCONBK is sought to be loaded into the CONBLK_AD register. However
experimentation has shown this approach to not work: The CONBLK_AD
register remains the same as before and the CS register contains
0x00000030 (PAUSED | DREQ_STOPS_DMA). In other words, the control
block is not aborted but merely paused and it will be resumed once the
next DMA transaction is started. That is absolutely not the desired
behavior.
A simpler approach is to set the channel's RESET flag instead. This
reliably zeroes the NEXTCONBK as well as the CS register. It requires
less code and only a single MMIO write. This is also what popular
user space DMA drivers do, e.g.:
https://github.com/metachris/RPIO/blob/master/source/c_pwm/pwm.c
Note that the spec is contradictory whether the NEXTCONBK register
is writeable at all. On the one hand, page 41 claims:
"The value loaded into the NEXTCONBK register can be overwritten so
that the linked list of Control Block data structures can be
dynamically altered. However it is only safe to do this when the DMA
is paused."
On the other hand, page 40 specifies:
"Only three registers in each channel's register set are directly
writeable (CS, CONBLK_AD and DEBUG). The other registers (TI,
SOURCE_AD, DEST_AD, TXFR_LEN, STRIDE & NEXTCONBK), are automatically
loaded from a Control Block data structure held in external memory."
Fixes: 96286b576690 ("dmaengine: Add support for BCM2835") Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v3.14+ Cc: Frank Pavlic <f.pavlic@kunbus.de> Cc: Martin Sperl <kernel@martin.sperl.org> Cc: Florian Meier <florian.meier@koalo.de> Cc: Clive Messer <clive.m.messer@gmail.com> Cc: Matthias Reichl <hias@horus.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Acked-by: Florian Kauer <florian.kauer@koalo.de> Signed-off-by: Vinod Koul <vkoul@kernel.org>
Lukas Wunner [Wed, 23 Jan 2019 08:26:00 +0000 (09:26 +0100)]
dmaengine: bcm2835: Fix interrupt race on RT
If IRQ handlers are threaded (either because CONFIG_PREEMPT_RT_BASE is
enabled or "threadirqs" was passed on the command line) and if system
load is sufficiently high that wakeup latency of IRQ threads degrades,
SPI DMA transactions on the BCM2835 occasionally break like this:
ks8851 spi0.0: SPI transfer timed out
bcm2835-dma 3f007000.dma: DMA transfer could not be terminated
ks8851 spi0.0 eth2: ks8851_rdfifo: spi_sync() failed
The root cause is an assumption made by the DMA driver which is
documented in a code comment in bcm2835_dma_terminate_all():
/*
* Stop DMA activity: we assume the callback will not be called
* after bcm_dma_abort() returns (even if it does, it will see
* c->desc is NULL and exit.)
*/
That assumption falls apart if the IRQ handler bcm2835_dma_callback() is
threaded: A client may terminate a descriptor and issue a new one
before the IRQ handler had a chance to run. In fact the IRQ handler may
miss an *arbitrary* number of descriptors. The result is the following
race condition:
1. A descriptor finishes, its interrupt is deferred to the IRQ thread.
2. A client calls dma_terminate_async() which sets channel->desc = NULL.
3. The client issues a new descriptor. Because channel->desc is NULL,
bcm2835_dma_issue_pending() immediately starts the descriptor.
4. Finally the IRQ thread runs and writes BCM2835_DMA_INT to the CS
register to acknowledge the interrupt. This clears the ACTIVE flag,
so the newly issued descriptor is paused in the middle of the
transaction. Because channel->desc is not NULL, the IRQ thread
finalizes the descriptor and tries to start the next one.
I see two possible solutions: The first is to call synchronize_irq()
in bcm2835_dma_issue_pending() to wait until the IRQ thread has
finished before issuing a new descriptor. The downside of this approach
is unnecessary latency if clients desire rapidly terminating and
re-issuing descriptors and don't have any use for an IRQ callback.
(The SPI TX DMA channel is a case in point.)
A better alternative is to make the IRQ thread recognize that it has
missed descriptors and avoid finalizing the newly issued descriptor.
So first of all, set the ACTIVE flag when acknowledging the interrupt.
This keeps a newly issued descriptor running.
If the descriptor was finished, the channel remains idle despite the
ACTIVE flag being set. However the ACTIVE flag can then no longer be
used to check whether the channel is idle, so instead check whether
the register containing the current control block address is zero
and finalize the current descriptor only if so.
That way, there is no impact on latency and throughput if the client
doesn't care for the interrupt: Only minimal additional overhead is
introduced for non-cyclic descriptors as one further MMIO read is
necessary per interrupt to check for idleness of the channel. Cyclic
descriptors are sped up slightly by removing one MMIO write per
interrupt.
Fixes: 96286b576690 ("dmaengine: Add support for BCM2835") Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v3.14+ Cc: Frank Pavlic <f.pavlic@kunbus.de> Cc: Martin Sperl <kernel@martin.sperl.org> Cc: Florian Meier <florian.meier@koalo.de> Cc: Clive Messer <clive.m.messer@gmail.com> Cc: Matthias Reichl <hias@horus.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Acked-by: Florian Kauer <florian.kauer@koalo.de> Signed-off-by: Vinod Koul <vkoul@kernel.org>
Leonid Iziumtsev [Tue, 15 Jan 2019 17:15:23 +0000 (17:15 +0000)]
dmaengine: imx-dma: fix wrong callback invoke
Once the "ld_queue" list is not empty, next descriptor will migrate
into "ld_active" list. The "desc" variable will be overwritten
during that transition. And later the dmaengine_desc_get_callback_invoke()
will use it as an argument. As result we invoke wrong callback.
That behaviour was in place since:
commit fcaaba6c7136 ("dmaengine: imx-dma: fix callback path in tasklet").
But after commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job")
things got worse, since possible delay between tasklet_schedule()
from DMA irq handler and actual tasklet function execution got bigger.
And that gave more time for new DMA request to be submitted and
to be put into "ld_queue" list.
It has been noticed that DMA issue is causing problems for "mxc-mmc"
driver. While stressing the system with heavy network traffic and
writing/reading to/from sd card simultaneously the timeout may happen:
10013000.sdhci: mxcmci_watchdog: read time out (status = 0x30004900)
That often lead to file system corruption.
Signed-off-by: Leonid Iziumtsev <leonid.iziumtsev@gmail.com> Signed-off-by: Vinod Koul <vkoul@kernel.org> Cc: stable@vger.kernel.org
Toshiaki Makita [Thu, 31 Jan 2019 11:40:30 +0000 (20:40 +0900)]
virtio_net: Account for tx bytes and packets on sending xdp_frames
Previously virtnet_xdp_xmit() did not account for device tx counters,
which caused confusions.
To be consistent with SKBs, account them on freeing xdp_frames.
Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Airlie [Mon, 4 Feb 2019 01:05:49 +0000 (11:05 +1000)]
Merge branch 'drm-fixes-5.0' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
A few fixes for 5.0:
- Fix radeon crash on SI with VM passthrough
- Fencing fix for shared buffers
- Fix power hwmon reporting on APUs
- Powerplay fix for APUs
Xin Long [Sun, 3 Feb 2019 19:27:58 +0000 (03:27 +0800)]
sctp: check and update stream->out_curr when allocating stream_out
Now when using stream reconfig to add out streams, stream->out
will get re-allocated, and all old streams' information will
be copied to the new ones and the old ones will be freed.
So without stream->out_curr updated, next time when trying to
send from stream->out_curr stream, a panic would be caused.
This patch is to check and update stream->out_curr when
allocating stream_out.
v1->v2:
- define fa_index() to get elem index from stream->out_curr.
v2->v3:
- repost with no change.
Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Reported-by: Ying Xu <yinxu@redhat.com> Reported-by: syzbot+e33a3a138267ca119c7d@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Darrick J. Wong [Sun, 3 Feb 2019 22:03:59 +0000 (14:03 -0800)]
xfs: set buffer ops when repair probes for btree type
In xrep_findroot_block, we work out the btree type and correctness of a
given block by calling different btree verifiers on root block
candidates. However, we leave the NULL b_ops while ->verify_read
validates the block, which means that if the verifier calls
xfs_buf_verifier_error it'll crash on the null b_ops. Fix it to set
b_ops before calling the verifier and unsetting it if the verifier
fails.
Furthermore, improve the documentation around xfs_buf_ensure_ops, which
is the function that is responsible for cleaning up the b_ops state of
buffers that go through xrep_findroot_block but don't match anything.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
Brian Foster [Sun, 3 Feb 2019 22:03:06 +0000 (14:03 -0800)]
xfs: end sync buffer I/O properly on shutdown error
As of commit e339dd8d8b ("xfs: use sync buffer I/O for sync delwri
queue submission"), the delwri submission code uses sync buffer I/O
for sync delwri I/O. Instead of waiting on async I/O to unlock the
buffer, it uses the underlying sync I/O completion mechanism.
If delwri buffer submission fails due to a shutdown scenario, an
error is set on the buffer and buffer completion never occurs. This
can cause xfs_buf_delwri_submit() to deadlock waiting on a
completion event.
We could check the error state before waiting on such buffers, but
that doesn't serialize against the case of an error set via a racing
I/O completion. Instead, invoke I/O completion in the shutdown case
regardless of buffer I/O type.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Brian Foster [Fri, 1 Feb 2019 17:36:36 +0000 (09:36 -0800)]
xfs: eof trim writeback mapping as soon as it is cached
The cached writeback mapping is EOF trimmed to try and avoid races
between post-eof block management and writeback that result in
sending cached data to a stale location. The cached mapping is
currently trimmed on the validation check, which leaves a race
window between the time the mapping is cached and when it is trimmed
against the current inode size.
For example, if a new mapping is cached by delalloc conversion on a
blocksize == page size fs, we could cycle various locks, perform
memory allocations, etc. in the writeback codepath before the
associated mapping is eventually trimmed to i_size. This leaves
enough time for a post-eof truncate and file append before the
cached mapping is trimmed. The former event essentially invalidates
a range of the cached mapping and the latter bumps the inode size
such the trim on the next writepage event won't trim all of the
invalid blocks. fstest generic/464 reproduces this scenario
occasionally and causes a lost writeback and stale delalloc blocks
warning on inode inactivation.
To work around this problem, trim the cached writeback mapping as
soon as it is cached in addition to on subsequent validation checks.
This is a minor tweak to tighten the race window as much as possible
until a proper invalidation mechanism is available.
Fixes: 40214d128e07 ("xfs: trim writepage mapping to within eof") Cc: <stable@vger.kernel.org> # v4.14+ Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
net: systemport: Fix WoL with password after deep sleep
Broadcom STB chips support a deep sleep mode where all register
contents are lost. Because we were stashing the MagicPacket password
into some of these registers a suspend into that deep sleep then a
resumption would not lead to being able to wake-up from MagicPacket with
password again.
Fix this by keeping a software copy of the password and program it
during suspend.
Fixes: 83e82f4c706b ("net: systemport: add Wake-on-LAN support") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 3 Feb 2019 19:06:25 +0000 (11:06 -0800)]
Merge branch 'vsock-virtio-hot-unplug'
Stefano Garzarella says:
====================
vsock/virtio: fix issues on device hot-unplug
These patches try to handle the hot-unplug of vsock virtio transport device in
a proper way.
Maybe move the vsock_core_init()/vsock_core_exit() functions in the module_init
and module_exit of vsock_virtio_transport module can't be the best way, but the
architecture of vsock_core forces us to this approach for now.
The vsock_core proto_ops expect a valid pointer to the transport device, so we
can't call vsock_core_exit() until there are open sockets.
v2 -> v3:
- Rebased on master
v1 -> v2:
- Fixed commit message of patch 1.
- Added Reviewed-by, Acked-by tags by Stefan
====================
Signed-off-by: David S. Miller <davem@davemloft.net>