]> asedeno.scripts.mit.edu Git - linux.git/log
linux.git
4 years agoxfs: Sanity check flags of Q_XQUOTARM call
Jan Kara [Thu, 24 Oct 2019 00:00:45 +0000 (17:00 -0700)]
xfs: Sanity check flags of Q_XQUOTARM call

Flags passed to Q_XQUOTARM were not sanity checked for invalid values.
Fix that.

Fixes: 9da93f9b7cdf ("xfs: fix Q_XQUOTARM ioctl")
Reported-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: add mising include of xfs_pnfs.h for missing declarations
Ben Dooks (Codethink) [Tue, 22 Oct 2019 16:53:50 +0000 (09:53 -0700)]
xfs: add mising include of xfs_pnfs.h for missing declarations

The xfs_pnfs.c file is missing an include of xfs_pnfs.h to
add the prototypes of the functions it exports. Include this
file to fix the following sparse warnings:

fs/xfs/xfs_pnfs.c:27:1: warning: symbol 'xfs_break_leased_layouts' was not declared. Should it be static?
fs/xfs/xfs_pnfs.c:52:1: warning: symbol 'xfs_fs_get_uuid' was not declared. Should it be static?
fs/xfs/xfs_pnfs.c:77:1: warning: symbol 'xfs_fs_map_blocks' was not declared. Should it be static?
fs/xfs/xfs_pnfs.c:226:1: warning: symbol 'xfs_fs_commit_blocks' was not declared. Should it be static?

Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: don't set bmapi total block req where minleft is
Brian Foster [Mon, 21 Oct 2019 16:26:48 +0000 (09:26 -0700)]
xfs: don't set bmapi total block req where minleft is

xfs_bmapi_write() takes a total block requirement parameter that is
passed down to the block allocation code and is used to specify the
total block requirement of the associated transaction. This is used
to try and select an AG that can not only satisfy the requested
extent allocation, but can also accommodate subsequent allocations
that might be required to complete the transaction. For example,
additional bmbt block allocations may be required on insertion of
the resulting extent to an inode data fork.

While it's important for callers to calculate and reserve such extra
blocks in the transaction, it is not necessary to pass the total
value to xfs_bmapi_write() in all cases. The latter automatically
sets minleft to ensure that sufficient free blocks remain after the
allocation attempt to expand the format of the associated inode
(i.e., such as extent to btree conversion, btree splits, etc).
Therefore, any callers that pass a total block requirement of the
bmap mapping length plus worst case bmbt expansion essentially
specify the additional reservation requirement twice. These callers
can pass a total of zero to rely on the bmapi minleft policy.

Beyond being superfluous, the primary motivation for this change is
that the total reservation logic in the bmbt code is dubious in
scenarios where minlen < maxlen and a maxlen extent cannot be
allocated (which is more common for data extent allocations where
contiguity is not required). The total value is based on maxlen in
the xfs_bmapi_write() caller. If the bmbt code falls back to an
allocation between minlen and maxlen, that allocation will not
succeed until total is reset to minlen, which essentially throws
away any additional reservation included in total by the caller. In
addition, the total value is not reset until after alignment is
dropped, which means that such callers drop alignment far too
aggressively than necessary.

Update all callers of xfs_bmapi_write() that pass a total block
value of the mapping length plus bmbt reservation to instead pass
zero and rely on xfs_bmapi_minleft() to enforce the bmbt reservation
requirement. This trades off slightly less conservative AG selection
for the ability to preserve alignment in more scenarios.
xfs_bmapi_write() callers that incorporate unrelated or additional
reservations in total beyond what is already included in minleft
must continue to use the former.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: cap longest free extent to maximum allocatable
Dave Chinner [Mon, 21 Oct 2019 16:26:34 +0000 (09:26 -0700)]
xfs: cap longest free extent to maximum allocatable

Cap longest extent to the largest we can allocate based on limits
calculated at mount time. Dynamic state (such as finobt blocks)
can result in the longest free extent exceeding the size we can
allocate, and that results in failure to align full AG allocations
when the AG is empty.

Result:

xfs_io-4413  [003]   426.412459: xfs_alloc_vextent_loopfailed: dev 8:96 agno 0 agbno 32 minlen 243968 maxlen 244000 mod 0 prod 1 minleft 1 total 262148 alignment 32 minalignslop 0 len 0 type NEAR_BNO otype START_BNO wasdel 0 wasfromfl 0 resv 0 datatype 0x5 firstblock 0xffffffffffffffff

minlen and maxlen are now separated by the alignment size, and
allocation fails because args.total > free space in the AG.

[bfoster: Added xfs_bmap_btalloc() changes.]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: remove the duplicated inode log fieldmask set
kaixuxia [Mon, 21 Oct 2019 15:55:33 +0000 (08:55 -0700)]
xfs: remove the duplicated inode log fieldmask set

The xfs_bumplink() call has set the inode log fieldmask XFS_ILOG_CORE,
so the next xfs_trans_log_inode() call is not necessary.

Signed-off-by: kaixuxia <kaixuxia@tencent.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: improve the IOMAP_NOWAIT check for COW inodes
Christoph Hellwig [Sat, 19 Oct 2019 16:09:47 +0000 (09:09 -0700)]
xfs: improve the IOMAP_NOWAIT check for COW inodes

Only bail out once we know that a COW allocation is actually required,
similar to how we handle normal data fork allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: cleanup xfs_direct_write_iomap_begin
Christoph Hellwig [Sat, 19 Oct 2019 16:09:47 +0000 (09:09 -0700)]
xfs: cleanup xfs_direct_write_iomap_begin

Move more checks into the helpers that determine if we need a COW
operation or allocation and split the return path for when an existing
data for allocation has been found versus a new allocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: rename the whichfork variable in xfs_buffered_write_iomap_begin
Christoph Hellwig [Sat, 19 Oct 2019 16:09:47 +0000 (09:09 -0700)]
xfs: rename the whichfork variable in xfs_buffered_write_iomap_begin

Renaming whichfork to allocfork in xfs_buffered_write_iomap_begin makes
the usage of this variable a little more clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: split the iomap ops for buffered vs direct writes
Christoph Hellwig [Sat, 19 Oct 2019 16:09:46 +0000 (09:09 -0700)]
xfs: split the iomap ops for buffered vs direct writes

Instead of lots of magic conditionals in the main write_begin
handler this make the intent very clear.  Thing will become even
better once we support delayed allocations for extent size hints
and realtime allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: move xfs_file_iomap_begin_delay around
Christoph Hellwig [Sat, 19 Oct 2019 16:09:46 +0000 (09:09 -0700)]
xfs: move xfs_file_iomap_begin_delay around

Move xfs_file_iomap_begin_delay near the end of the file next to the
other iomap functions to prepare for additional refactoring.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: split out a new set of read-only iomap ops
Christoph Hellwig [Sat, 19 Oct 2019 16:09:45 +0000 (09:09 -0700)]
xfs: split out a new set of read-only iomap ops

Start untangling xfs_file_iomap_begin by splitting out the read-only
case into its own set of iomap_ops with a very simply iomap_begin
helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: factor out a helper to calculate the end_fsb
Christoph Hellwig [Sat, 19 Oct 2019 16:09:44 +0000 (09:09 -0700)]
xfs: factor out a helper to calculate the end_fsb

We have lots of places that want to calculate the final fsb for
a offset + count in bytes and check that the result fits into
s_maxbytes.  Factor out a helper for that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: fill out the srcmap in iomap_begin
Christoph Hellwig [Sat, 19 Oct 2019 16:09:44 +0000 (09:09 -0700)]
xfs: fill out the srcmap in iomap_begin

Replace our local hacks to report the source block in the main iomap
with the proper scrmap reporting.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: refactor xfs_file_iomap_begin_delay
Christoph Hellwig [Sat, 19 Oct 2019 16:09:43 +0000 (09:09 -0700)]
xfs: refactor xfs_file_iomap_begin_delay

Rejuggle the return path to prepare for filling out a source iomap.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: pass two imaps to xfs_reflink_allocate_cow
Christoph Hellwig [Sat, 19 Oct 2019 16:09:43 +0000 (09:09 -0700)]
xfs: pass two imaps to xfs_reflink_allocate_cow

xfs_reflink_allocate_cow consumes the source data fork imap, and
potentially returns the COW fork imap.  Split the arguments in two
to clear up the calling conventions and to prepare for returning
a source iomap from ->iomap_begin.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: remove xfs_reflink_dirty_extents
Christoph Hellwig [Sat, 19 Oct 2019 16:09:43 +0000 (09:09 -0700)]
xfs: remove xfs_reflink_dirty_extents

Now that xfs_file_unshare is not completely dumb we can just call it
directly without iterating the extent and reflink btrees ourselves.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: also call xfs_file_iomap_end_delalloc for zeroing operations
Christoph Hellwig [Sat, 19 Oct 2019 16:09:42 +0000 (09:09 -0700)]
xfs: also call xfs_file_iomap_end_delalloc for zeroing operations

There is no reason not to punch out stale delalloc blocks for zeroing
operations, as they otherwise behave exactly like normal writes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: fix inode fork extent count overflow
Dave Chinner [Thu, 17 Oct 2019 20:40:33 +0000 (13:40 -0700)]
xfs: fix inode fork extent count overflow

[commit message is verbose for discussion purposes - will trim it
down later. Some questions about implementation details at the end.]

Zorro Lang recently ran a new test to stress single inode extent
counts now that they are no longer limited by memory allocation.
The test was simply:

# xfs_io -f -c "falloc 0 40t" /mnt/scratch/big-file
# ~/src/xfstests-dev/punch-alternating /mnt/scratch/big-file

This test uncovered a problem where the hole punching operation
appeared to finish with no error, but apparently only created 268M
extents instead of the 10 billion it was supposed to.

Further, trying to punch out extents that should have been present
resulted in success, but no change in the extent count. It looked
like a silent failure.

While running the test and observing the behaviour in real time,
I observed the extent coutn growing at ~2M extents/minute, and saw
this after about an hour:

# xfs_io -f -c "stat" /mnt/scratch/big-file |grep next ; \
> sleep 60 ; \
> xfs_io -f -c "stat" /mnt/scratch/big-file |grep next
fsxattr.nextents = 127657993
fsxattr.nextents = 129683339
#

And a few minutes later this:

# xfs_io -f -c "stat" /mnt/scratch/big-file |grep next
fsxattr.nextents = 4177861124
#

Ah, what? Where did that 4 billion extra extents suddenly come from?

Stop the workload, unmount, mount:

# xfs_io -f -c "stat" /mnt/scratch/big-file |grep next
fsxattr.nextents = 166044375
#

And it's back at the expected number. i.e. the extent count is
correct on disk, but it's screwed up in memory. I loaded up the
extent list, and immediately:

# xfs_io -f -c "stat" /mnt/scratch/big-file |grep next
fsxattr.nextents = 4192576215
#

It's bad again. So, where does that number come from?
xfs_fill_fsxattr():

                if (ip->i_df.if_flags & XFS_IFEXTENTS)
                        fa->fsx_nextents = xfs_iext_count(&ip->i_df);
                else
                        fa->fsx_nextents = ip->i_d.di_nextents;

And that's the behaviour I just saw in a nutshell. The on disk count
is correct, but once the tree is loaded into memory, it goes whacky.
Clearly there's something wrong with xfs_iext_count():

inline xfs_extnum_t xfs_iext_count(struct xfs_ifork *ifp)
{
        return ifp->if_bytes / sizeof(struct xfs_iext_rec);
}

Simple enough, but 134M extents is 2**27, and that's right about
where things went wrong. A struct xfs_iext_rec is 16 bytes in size,
which means 2**27 * 2**4 = 2**31 and we're right on target for an
integer overflow. And, sure enough:

struct xfs_ifork {
        int                     if_bytes;       /* bytes in if_u1 */
....

Once we get 2**27 extents in a file, we overflow if_bytes and the
in-core extent count goes wrong. And when we reach 2**28 extents,
if_bytes wraps back to zero and things really start to go wrong
there. This is where the silent failure comes from - only the first
2**28 extents can be looked up directly due to the overflow, all the
extents above this index wrap back to somewhere in the first 2**28
extents. Hence with a regular pattern, trying to punch a hole in the
range that didn't have holes mapped to a hole in the first 2**28
extents and so "succeeded" without changing anything. Hence "silent
failure"...

Fix this by converting if_bytes to a int64_t and converting all the
index variables and size calculations to use int64_t types to avoid
overflows in future. Signed integers are still used to enable easy
detection of extent count underflows. This enables scalability of
extent counts to the limits of the on-disk format - MAXEXTNUM
(2**31) extents.

Current testing is at over 500M extents and still going:

fsxattr.nextents = 517310478

Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: remove the XLOG_STATE_DO_CALLBACK state
Christoph Hellwig [Mon, 14 Oct 2019 17:36:43 +0000 (10:36 -0700)]
xfs: remove the XLOG_STATE_DO_CALLBACK state

XLOG_STATE_DO_CALLBACK is only entered through XLOG_STATE_DONE_SYNC
and just used in a single debug check.  Remove the flag and thus
simplify the calling conventions for xlog_state_do_callback and
xlog_state_iodone_process_iclog.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
4 years agoxfs: turn ic_state into an enum
Christoph Hellwig [Mon, 14 Oct 2019 17:36:43 +0000 (10:36 -0700)]
xfs: turn ic_state into an enum

ic_state really is a set of different states, even if the values are
encoded as non-conflicting bits and we sometimes use logical and
operations to check for them.  Switch all comparisms to check for
exact values (and use switch statements in a few places to make it
more clear) and turn the values into an implicitly enumerated enum
type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
4 years agoxfs: remove the unused XLOG_STATE_ALL and XLOG_STATE_UNUSED flags
Christoph Hellwig [Mon, 14 Oct 2019 17:36:42 +0000 (10:36 -0700)]
xfs: remove the unused XLOG_STATE_ALL and XLOG_STATE_UNUSED flags

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
4 years agoxfs: remove dead ifdef XFSERRORDEBUG code
Christoph Hellwig [Mon, 14 Oct 2019 17:36:42 +0000 (10:36 -0700)]
xfs: remove dead ifdef XFSERRORDEBUG code

XFSERRORDEBUG is never set and the code isn't all that useful, so remove
it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
4 years agoxfs: call xlog_state_release_iclog with l_icloglock held
Christoph Hellwig [Mon, 14 Oct 2019 17:36:41 +0000 (10:36 -0700)]
xfs: call xlog_state_release_iclog with l_icloglock held

All but one caller of xlog_state_release_iclog hold l_icloglock and need
to drop and reacquire it to call xlog_state_release_iclog.  Switch the
xlog_state_release_iclog calling conventions to expect the lock to be
held, and open code the logic (using a shared helper) in the only
remaining caller that does not have the lock (and where not holding it
is a nice performance optimization).  Also move the refactored code to
require the least amount of forward declarations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: minor whitespace cleanup]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
4 years agoxfs: move the locking from xlog_state_finish_copy to the callers
Christoph Hellwig [Mon, 14 Oct 2019 17:36:41 +0000 (10:36 -0700)]
xfs: move the locking from xlog_state_finish_copy to the callers

This will allow optimizing various locking cycles in the following
patches.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
4 years agoxfs: remove the unused ic_io_size field from xlog_in_core
Christoph Hellwig [Mon, 14 Oct 2019 17:36:40 +0000 (10:36 -0700)]
xfs: remove the unused ic_io_size field from xlog_in_core

ic_io_size is only used inside xlog_write_iclog, where we can just use
the count parameter intead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
4 years agoxfs: pass the correct flag to xlog_write_iclog
Christoph Hellwig [Mon, 14 Oct 2019 17:36:40 +0000 (10:36 -0700)]
xfs: pass the correct flag to xlog_write_iclog

xlog_write_iclog expects a bool for the second argument.  While any
non-0 value happens to work fine this makes all calls consistent.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
4 years agoxfs: optimize near mode bnobt scans with concurrent cntbt lookups
Brian Foster [Mon, 14 Oct 2019 00:10:36 +0000 (17:10 -0700)]
xfs: optimize near mode bnobt scans with concurrent cntbt lookups

The near mode fallback algorithm consists of a left/right scan of
the bnobt. This algorithm has very poor breakdown characteristics
under worst case free space fragmentation conditions. If a suitable
extent is far enough from the locality hint, each allocation may
scan most or all of the bnobt before it completes. This causes
pathological behavior and extremely high allocation latencies.

While locality is important to near mode allocations, it is not so
important as to incur pathological allocation latency to provide the
asolute best available locality for every allocation. If the
allocation is large enough or far enough away, there is a point of
diminishing returns. As such, we can bound the overall operation by
including an iterative cntbt lookup in the broader search. The cntbt
lookup is optimized to immediately find the extent with best
locality for the given size on each iteration. Since the cntbt is
indexed by extent size, the lookup repeats with a variably
aggressive increasing search key size until it runs off the edge of
the tree.

This approach provides a natural balance between the two algorithms
for various situations. For example, the bnobt scan is able to
satisfy smaller allocations such as for inode chunks or btree blocks
more quickly where the cntbt search may have to search through a
large set of extent sizes when the search key starts off small
relative to the largest extent in the tree. On the other hand, the
cntbt search more deterministically covers the set of suitable
extents for larger data extent allocation requests that the bnobt
scan may have to search the entire tree to locate.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: factor out tree fixup logic into helper
Brian Foster [Mon, 14 Oct 2019 00:10:35 +0000 (17:10 -0700)]
xfs: factor out tree fixup logic into helper

Lift the btree fixup path into a helper function.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: refactor near mode alloc bnobt scan into separate function
Brian Foster [Mon, 14 Oct 2019 00:10:35 +0000 (17:10 -0700)]
xfs: refactor near mode alloc bnobt scan into separate function

In preparation to enhance the near mode allocation bnobt scan algorithm, lift
it into a separate function. No functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: refactor and reuse best extent scanning logic
Brian Foster [Mon, 14 Oct 2019 00:10:34 +0000 (17:10 -0700)]
xfs: refactor and reuse best extent scanning logic

The bnobt "find best" helper implements a simple btree walker
function. This general pattern, or a subset thereof, is reused in
various parts of a near mode allocation operation. For example, the
bnobt left/right scans are each iterative btree walks along with the
cntbt lastblock scan.

Rework this function into a generic btree walker, add a couple
parameters to control termination behavior from various contexts and
reuse it where applicable.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: refactor allocation tree fixup code
Brian Foster [Mon, 14 Oct 2019 00:10:34 +0000 (17:10 -0700)]
xfs: refactor allocation tree fixup code

Both algorithms duplicate the same btree allocation code. Eliminate
the duplication and reuse the fallback algorithm codepath.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: reuse best extent tracking logic for bnobt scan
Brian Foster [Mon, 14 Oct 2019 00:10:33 +0000 (17:10 -0700)]
xfs: reuse best extent tracking logic for bnobt scan

The near mode bnobt scan searches left and right in the bnobt
looking for the closest free extent to the allocation hint that
satisfies minlen. Once such an extent is found, the left/right
search terminates, we search one more time in the opposite direction
and finish the allocation with the best overall extent.

The left/right and find best searches are currently controlled via a
combination of cursor state and local variables. Clean up this code
and prepare for further improvements to the near mode fallback
algorithm by reusing the allocation cursor best extent tracking
mechanism. Update the tracking logic to deactivate bnobt cursors
when out of allocation range and replace open-coded extent checks to
calls to the common helper. In doing so, rename some misnamed local
variables in the top-level near mode allocation function.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: refactor cntbt lastblock scan best extent logic into helper
Brian Foster [Mon, 14 Oct 2019 00:10:33 +0000 (17:10 -0700)]
xfs: refactor cntbt lastblock scan best extent logic into helper

The cntbt lastblock scan checks the size, alignment, locality, etc.
of each free extent in the block and compares it with the current
best candidate. This logic will be reused by the upcoming optimized
cntbt algorithm, so refactor it into a separate helper. Note that
acur->diff is now initialized to -1 (unsigned) instead of 0 to
support the more granular comparison logic in the new helper.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: track best extent from cntbt lastblock scan in alloc cursor
Brian Foster [Mon, 14 Oct 2019 00:10:32 +0000 (17:10 -0700)]
xfs: track best extent from cntbt lastblock scan in alloc cursor

If the size lookup lands in the last block of the by-size btree, the
near mode algorithm scans the entire block for the extent with best
available locality. In preparation for similar best available
extent tracking across both btrees, extend the allocation cursor
with best extent data and lift the associated state from the cntbt
last block scan code. No functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: track allocation busy state in allocation cursor
Brian Foster [Mon, 14 Oct 2019 00:10:32 +0000 (17:10 -0700)]
xfs: track allocation busy state in allocation cursor

Extend the allocation cursor to track extent busy state for an
allocation attempt. No functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: introduce allocation cursor data structure
Brian Foster [Mon, 14 Oct 2019 00:10:31 +0000 (17:10 -0700)]
xfs: introduce allocation cursor data structure

Introduce a new allocation cursor data structure to encapsulate the
various states and structures used to perform an extent allocation.
This structure will eventually be used to track overall allocation
state across different search algorithms on both free space btrees.

To start, include the three btree cursors (one for the cntbt and two
for the bnobt left/right search) used by the near mode allocation
algorithm and refactor the cursor setup and teardown code into
helpers. This slightly changes cursor memory allocation patterns,
but otherwise makes no functional changes to the allocation
algorithm.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix sparse complaints]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: track active state of allocation btree cursors
Brian Foster [Mon, 14 Oct 2019 00:10:31 +0000 (17:10 -0700)]
xfs: track active state of allocation btree cursors

The upcoming allocation algorithm update searches multiple
allocation btree cursors concurrently. As such, it requires an
active state to track when a particular cursor should continue
searching. While active state will be modified based on higher level
logic, we can define base functionality based on the result of
allocation btree lookups.

Define an active flag in the private area of the btree cursor.
Update it based on the result of lookups in the existing allocation
btree helpers. Finally, provide a new helper to query the current
state.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: ignore extent size hints for always COW inodes
Christoph Hellwig [Mon, 14 Oct 2019 17:07:21 +0000 (10:07 -0700)]
xfs: ignore extent size hints for always COW inodes

There is no point in applying extent size hints for always COW inodes,
as we would just have to COW any extra allocation beyond the data
actually written.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: include QUOTA, FATAL ASSERT build options in XFS_BUILD_OPTIONS
yu kuai [Mon, 14 Oct 2019 17:34:32 +0000 (10:34 -0700)]
xfs: include QUOTA, FATAL ASSERT build options in XFS_BUILD_OPTIONS

In commit d03a2f1b9fa8 ("xfs: include WARN, REPAIR build options in
XFS_BUILD_OPTIONS"), Eric pointed out that the XFS_BUILD_OPTIONS string,
shown at module init time and in modinfo output, does not currently
include all available build options. So, he added in CONFIG_XFS_WARN and
CONFIG_XFS_REPAIR. However, this is not enough, add in CONFIG_XFS_QUOTA
and CONFIG_XFS_ASSERT_FATAL.

Signed-off-by: yu kuai <yukuai3@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoiomap: use a srcmap for a read-modify-write I/O
Goldwyn Rodrigues [Fri, 18 Oct 2019 23:44:10 +0000 (16:44 -0700)]
iomap: use a srcmap for a read-modify-write I/O

The srcmap is used to identify where the read is to be performed from.
It is passed to ->iomap_begin, which can fill it in if we need to read
data for partially written blocks from a different location than the
write target.  The srcmap is only supported for buffered writes so far.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
[hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as
      srcmap if not set, adjust length down to srcmap end as well]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
4 years agoiomap: renumber IOMAP_HOLE to 0
Christoph Hellwig [Fri, 18 Oct 2019 23:43:08 +0000 (16:43 -0700)]
iomap: renumber IOMAP_HOLE to 0

Instead of keeping a separate unnamed state for uninitialized iomaps,
renumber IOMAP_HOLE to zero so that an uninitialized iomap is treated
as a hole.

Suggested-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoiomap: use write_begin to read pages to unshare
Christoph Hellwig [Fri, 18 Oct 2019 23:42:50 +0000 (16:42 -0700)]
iomap: use write_begin to read pages to unshare

Use the existing iomap write_begin code to read the pages unshared
by iomap_file_unshare.  That avoids the extra ->readpage call and
extent tree lookup currently done by read_mapping_page.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoiomap: move the zeroing case out of iomap_read_page_sync
Christoph Hellwig [Fri, 18 Oct 2019 23:42:24 +0000 (16:42 -0700)]
iomap: move the zeroing case out of iomap_read_page_sync

That keeps the function a little easier to understand, and easier to
modify for pending enhancements.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoiomap: ignore non-shared or non-data blocks in xfs_file_dirty
Christoph Hellwig [Fri, 18 Oct 2019 23:41:34 +0000 (16:41 -0700)]
iomap: ignore non-shared or non-data blocks in xfs_file_dirty

xfs_file_dirty is used to unshare reflink blocks.  Rename the function
to xfs_file_unshare to better document that purpose, and skip iomaps
that are not shared and don't need zeroing.  This will allow to simplify
the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoiomap: always use AOP_FLAG_NOFS in iomap_write_begin
Christoph Hellwig [Fri, 18 Oct 2019 23:41:12 +0000 (16:41 -0700)]
iomap: always use AOP_FLAG_NOFS in iomap_write_begin

All callers pass AOP_FLAG_NOFS, so lift that flag to iomap_write_begin
to allow reusing the flags arguments for an internal flags namespace
soon.  Also remove the local index variable that is only used once.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoiomap: remove the unused iomap argument to __iomap_write_end
Christoph Hellwig [Fri, 18 Oct 2019 23:40:57 +0000 (16:40 -0700)]
iomap: remove the unused iomap argument to __iomap_write_end

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoiomap: better document the IOMAP_F_* flags
Christoph Hellwig [Fri, 18 Oct 2019 23:40:17 +0000 (16:40 -0700)]
iomap: better document the IOMAP_F_* flags

The documentation for IOMAP_F_* is a bit disorganized, and doesn't
mention the fact that most flags are set by the file system and consumed
by the iomap core, while IOMAP_F_SIZE_CHANGED is set by the core and
consumed by the file system.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoiomap: enhance writeback error message
Darrick J. Wong [Thu, 17 Oct 2019 21:02:07 +0000 (14:02 -0700)]
iomap: enhance writeback error message

If we encounter an IO error during writeback, log the inode, offset, and
sector number of the failure, instead of forcing the user to do some
sort of reverse mapping to figure out which file is affected.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 years agoiomap: pass a struct page to iomap_finish_page_writeback
Christoph Hellwig [Thu, 17 Oct 2019 20:12:22 +0000 (13:12 -0700)]
iomap: pass a struct page to iomap_finish_page_writeback

No need to pass the full bio_vec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoiomap: cleanup iomap_ioend_compare
Christoph Hellwig [Thu, 17 Oct 2019 20:12:20 +0000 (13:12 -0700)]
iomap: cleanup iomap_ioend_compare

Move the initialization of ia and ib to the declaration line and remove
a superflous else.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoiomap: move struct iomap_page out of iomap.h
Christoph Hellwig [Thu, 17 Oct 2019 20:12:19 +0000 (13:12 -0700)]
iomap: move struct iomap_page out of iomap.h

Now that all the writepage code is in the iomap code there is no
need to keep this structure public.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoiomap: warn on inline maps in iomap_writepage_map
Christoph Hellwig [Thu, 17 Oct 2019 20:12:17 +0000 (13:12 -0700)]
iomap: warn on inline maps in iomap_writepage_map

And inline mapping should never mark the page dirty and thus never end up
in writepages.  Add a check for that condition and warn if it happens.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoiomap: lift the xfs writeback code to iomap
Christoph Hellwig [Thu, 17 Oct 2019 20:12:15 +0000 (13:12 -0700)]
iomap: lift the xfs writeback code to iomap

Take the xfs writeback code and move it to fs/iomap.  A new structure
with three methods is added as the abstraction from the generic writeback
code to the file system.  These methods are used to map blocks, submit an
ioend, and cancel a page that encountered an error before it was added to
an ioend.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[darrick: rename ->submit_ioend to ->prepare_ioend to clarify what it
does]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
4 years agoiomap: lift common tracing code from xfs to iomap
Christoph Hellwig [Thu, 17 Oct 2019 20:12:13 +0000 (13:12 -0700)]
iomap: lift common tracing code from xfs to iomap

Lift the xfs code for tracing address space operations to the iomap
layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoiomap: zero newly allocated mapped blocks
Christoph Hellwig [Thu, 17 Oct 2019 20:12:12 +0000 (13:12 -0700)]
iomap: zero newly allocated mapped blocks

File systems like gfs2 don't support delayed allocations or unwritten
extents and thus allocate normal mapped blocks to fill holes.  To
cover the case of such file systems allocating new blocks to fill holes
also zero out mapped blocks with the new flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: remove the fork fields in the writepage_ctx and ioend
Christoph Hellwig [Thu, 17 Oct 2019 20:12:10 +0000 (13:12 -0700)]
xfs: remove the fork fields in the writepage_ctx and ioend

In preparation for moving the writeback code to iomap.c, replace the
XFS-specific COW fork concept with the iomap IOMAP_F_SHARED flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: turn io_append_trans into an io_private void pointer
Christoph Hellwig [Thu, 17 Oct 2019 20:12:09 +0000 (13:12 -0700)]
xfs: turn io_append_trans into an io_private void pointer

In preparation for moving the ioend structure to common code we need
to get rid of the xfs-specific xfs_trans type.  Just make it a file
system private void pointer instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: refactor the ioend merging code
Christoph Hellwig [Thu, 17 Oct 2019 20:12:07 +0000 (13:12 -0700)]
xfs: refactor the ioend merging code

Introduce two nicely abstracted helper, which can be moved to the iomap
code later.  Also use list_first_entry_or_null to simplify the code a
bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: use a struct iomap in xfs_writepage_ctx
Christoph Hellwig [Thu, 17 Oct 2019 20:12:06 +0000 (13:12 -0700)]
xfs: use a struct iomap in xfs_writepage_ctx

In preparation for moving the XFS writeback code to fs/iomap.c, switch
it to use struct iomap instead of the XFS-specific struct xfs_bmbt_irec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: set IOMAP_F_NEW more carefully
Christoph Hellwig [Thu, 17 Oct 2019 20:12:04 +0000 (13:12 -0700)]
xfs: set IOMAP_F_NEW more carefully

Don't set IOMAP_F_NEW if we COW over an existing allocated range, as
these aren't strictly new allocations.  This is required to be able to
use IOMAP_F_NEW to zero newly allocated blocks, which is required for
the iomap code to fully support file systems that don't do delayed
allocations or use unwritten extents.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: initialize iomap->flags in xfs_bmbt_to_iomap
Christoph Hellwig [Thu, 17 Oct 2019 20:12:02 +0000 (13:12 -0700)]
xfs: initialize iomap->flags in xfs_bmbt_to_iomap

Currently we don't overwrite the flags field in the iomap in
xfs_bmbt_to_iomap.  This works fine with 0-initialized iomaps on stack,
but is harmful once we want to be able to reuse an iomap in the
writeback code.  Replace the shared parameter with a set of initial
flags an thus ensures the flags field is always reinitialized.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoiomap: iomap that extends beyond EOF should be marked dirty
Dave Chinner [Thu, 17 Oct 2019 20:12:01 +0000 (13:12 -0700)]
iomap: iomap that extends beyond EOF should be marked dirty

When doing a direct IO that spans the current EOF, and there are
written blocks beyond EOF that extend beyond the current write, the
only metadata update that needs to be done is a file size extension.

However, we don't mark such iomaps as IOMAP_F_DIRTY to indicate that
there is IO completion metadata updates required, and hence we may
fail to correctly sync file size extensions made in IO completion
when O_DSYNC writes are being used and the hardware supports FUA.

Hence when setting IOMAP_F_DIRTY, we need to also take into account
whether the iomap spans the current EOF. If it does, then we need to
mark it dirty so that IO completion will call generic_write_sync()
to flush the inode size update to stable storage correctly.

Fixes: 3460cac1ca76 ("iomap: Use FUA for pure data O_DSYNC DIO writes")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: removed the ext4 part; they'll handle it separately]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: Use iomap_dio_rw to wait for unaligned direct IO
Jan Kara [Tue, 15 Oct 2019 15:43:43 +0000 (08:43 -0700)]
xfs: Use iomap_dio_rw to wait for unaligned direct IO

Use iomap_dio_rw() to wait for unaligned direct IO instead of opencoding
the wait.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoiomap: Allow forcing of waiting for running DIO in iomap_dio_rw()
Jan Kara [Tue, 15 Oct 2019 15:43:42 +0000 (08:43 -0700)]
iomap: Allow forcing of waiting for running DIO in iomap_dio_rw()

Filesystems do not support doing IO as asynchronous in some cases. For
example in case of unaligned writes or in case file size needs to be
extended (e.g. for ext4). Instead of forcing filesystem to wait for AIO
in such cases, add argument to iomap_dio_rw() which makes the function
wait for IO completion. This also results in executing
iomap_dio_complete() inline in iomap_dio_rw() providing its return value
to the caller as for ordinary sync IO.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoLinux 5.4-rc3 v5.4-rc3
Linus Torvalds [Sun, 13 Oct 2019 23:37:36 +0000 (16:37 -0700)]
Linux 5.4-rc3

4 years agoMerge tag 'trace-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Sun, 13 Oct 2019 21:47:10 +0000 (14:47 -0700)]
Merge tag 'trace-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "A few tracing fixes:

   - Remove lockdown from tracefs itself and moved it to the trace
     directory. Have the open functions there do the lockdown checks.

   - Fix a few races with opening an instance file and the instance
     being deleted (Discovered during the lockdown updates). Kept
     separate from the clean up code such that they can be backported to
     stable easier.

   - Clean up and consolidated the checks done when opening a trace
     file, as there were multiple checks that need to be done, and it
     did not make sense having them done in each open instance.

   - Fix a regression in the record mcount code.

   - Small hw_lat detector tracer fixes.

   - A trace_pipe read fix due to not initializing trace_seq"

* tag 'trace-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Initialize iter->seq after zeroing in tracing_read_pipe()
  tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency
  tracing/hwlat: Report total time spent in all NMIs during the sample
  recordmcount: Fix nop_mcount() function
  tracing: Do not create tracefs files if tracefs lockdown is in effect
  tracing: Add locked_down checks to the open calls of files created for tracefs
  tracing: Add tracing_check_open_get_tr()
  tracing: Have trace events system open call tracing_open_generic_tr()
  tracing: Get trace_array reference for available_tracers files
  ftrace: Get a reference counter for the trace_array on filter files
  tracefs: Revert ccbd54ff54e8 ("tracefs: Restrict tracefs when the kernel is locked down")

4 years agoMerge tag 'hwmon-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groec...
Linus Torvalds [Sun, 13 Oct 2019 15:40:31 +0000 (08:40 -0700)]
Merge tag 'hwmon-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

 - Update/fix inspur-ipsps1 and k10temp Documentation

 - Fix nct7904 driver

 - Fix HWMON_P_MIN_ALARM mask in hwmon core

* tag 'hwmon-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: docs: Extend inspur-ipsps1 title underline
  hwmon: (nct7904) Add array fan_alarm and vsen_alarm to store the alarms in nct7904_data struct.
  docs: hwmon: Include 'inspur-ipsps1.rst' into docs
  hwmon: Fix HWMON_P_MIN_ALARM mask
  hwmon: (k10temp) Update documentation and add temp2_input info
  hwmon: (nct7904) Fix the incorrect value of vsen_mask in nct7904_data struct

4 years agoMerge tag 'fixes-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Linus Torvalds [Sun, 13 Oct 2019 15:26:54 +0000 (08:26 -0700)]
Merge tag 'fixes-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux

Pull MTD fixes from Richard Weinberger:
 "Two fixes for MTD:

   - spi-nor: Fix for a regression in write_sr()

   - rawnand: Regression fix for the au1550nd driver"

* tag 'fixes-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  mtd: rawnand: au1550nd: Fix au_read_buf16() prototype
  mtd: spi-nor: Fix direction of the write_sr() transfer

4 years agoMerge tag 'for-linus-20191012' of git://git.kernel.dk/linux-block
Linus Torvalds [Sun, 13 Oct 2019 15:15:35 +0000 (08:15 -0700)]
Merge tag 'for-linus-20191012' of git://git.kernel.dk/linux-block

Pull io_uring fix from Jens Axboe:
 "Single small fix for a regression in the sequence logic for linked
  commands"

* tag 'for-linus-20191012' of git://git.kernel.dk/linux-block:
  io_uring: fix sequence logic for timeout requests

4 years agotracing: Initialize iter->seq after zeroing in tracing_read_pipe()
Petr Mladek [Fri, 11 Oct 2019 14:21:34 +0000 (16:21 +0200)]
tracing: Initialize iter->seq after zeroing in tracing_read_pipe()

A customer reported the following softlockup:

[899688.160002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [test.sh:16464]
[899688.160002] CPU: 0 PID: 16464 Comm: test.sh Not tainted 4.12.14-6.23-azure #1 SLE12-SP4
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] Kernel panic - not syncing: softlockup: hung tasks
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] RSP: 0018:ffffa86784d4fde8 EFLAGS: 00000257 ORIG_RAX: ffffffffffffff12
[899688.160002] RAX: ffffffff970fea00 RBX: 0000000000000001 RCX: 0000000000000000
[899688.160002] RDX: ffffffff00000001 RSI: 0000000000000080 RDI: ffffffff970fea00
[899688.160002] RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000
[899688.160002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b59014720d8
[899688.160002] R13: ffff8b59014720c0 R14: ffff8b5901471090 R15: ffff8b5901470000
[899688.160002]  tracing_read_pipe+0x336/0x3c0
[899688.160002]  __vfs_read+0x26/0x140
[899688.160002]  vfs_read+0x87/0x130
[899688.160002]  SyS_read+0x42/0x90
[899688.160002]  do_syscall_64+0x74/0x160

It caught the process in the middle of trace_access_unlock(). There is
no loop. So, it must be looping in the caller tracing_read_pipe()
via the "waitagain" label.

Crashdump analyze uncovered that iter->seq was completely zeroed
at this point, including iter->seq.seq.size. It means that
print_trace_line() was never able to print anything and
there was no forward progress.

The culprit seems to be in the code:

/* reset all but tr, trace, and overruns */
memset(&iter->seq, 0,
       sizeof(struct trace_iterator) -
       offsetof(struct trace_iterator, seq));

It was added by the commit 53d0aa773053ab182877 ("ftrace:
add logic to record overruns"). It was v2.6.27-rc1.
It was the time when iter->seq looked like:

     struct trace_seq {
unsigned char buffer[PAGE_SIZE];
unsigned int len;
     };

There was no "size" variable and zeroing was perfectly fine.

The solution is to reinitialize the structure after or without
zeroing.

Link: http://lkml.kernel.org/r/20191011142134.11997-1-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agotracing/hwlat: Don't ignore outer-loop duration when calculating max_latency
Srivatsa S. Bhat (VMware) [Thu, 10 Oct 2019 18:51:01 +0000 (11:51 -0700)]
tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency

max_latency is intended to record the maximum ever observed hardware
latency, which may occur in either part of the loop (inner/outer). So
we need to also consider the outer-loop sample when updating
max_latency.

Link: http://lkml.kernel.org/r/157073345463.17189.18124025522664682811.stgit@srivatsa-ubuntu
Fixes: e7c15cd8a113 ("tracing: Added hardware latency tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agotracing/hwlat: Report total time spent in all NMIs during the sample
Srivatsa S. Bhat (VMware) [Thu, 10 Oct 2019 18:50:46 +0000 (11:50 -0700)]
tracing/hwlat: Report total time spent in all NMIs during the sample

nmi_total_ts is supposed to record the total time spent in *all* NMIs
that occur on the given CPU during the (active portion of the)
sampling window. However, the code seems to be overwriting this
variable for each NMI, thereby only recording the time spent in the
most recent NMI. Fix it by accumulating the duration instead.

Link: http://lkml.kernel.org/r/157073343544.17189.13911783866738671133.stgit@srivatsa-ubuntu
Fixes: 7b2c86250122 ("tracing: Add NMI tracing in hwlat detector")
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agorecordmcount: Fix nop_mcount() function
Steven Rostedt (VMware) [Wed, 9 Oct 2019 15:05:38 +0000 (11:05 -0400)]
recordmcount: Fix nop_mcount() function

The removal of the longjmp code in recordmcount.c mistakenly made the return
of make_nop() being negative an exit of nop_mcount(). It should not exit the
routine, but instead just not process that part of the code. By exiting with
an error code, it would cause the update of recordmcount to fail some files
which would fail the build if ftrace function tracing was enabled.

Link: http://lkml.kernel.org/r/20191009110538.5909fec6@gandalf.local.home
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Fixes: 3f1df12019f3 ("recordmcount: Rewrite error/success handling")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agotracing: Do not create tracefs files if tracefs lockdown is in effect
Steven Rostedt (VMware) [Sat, 12 Oct 2019 00:41:41 +0000 (20:41 -0400)]
tracing: Do not create tracefs files if tracefs lockdown is in effect

If on boot up, lockdown is activated for tracefs, don't even bother creating
the files. This can also prevent instances from being created if lockdown is
in effect.

Link: http://lkml.kernel.org/r/CAHk-=whC6Ji=fWnjh2+eS4b15TnbsS4VPVtvBOwCy1jjEG_JHQ@mail.gmail.com
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agotracing: Add locked_down checks to the open calls of files created for tracefs
Steven Rostedt (VMware) [Fri, 11 Oct 2019 21:22:50 +0000 (17:22 -0400)]
tracing: Add locked_down checks to the open calls of files created for tracefs

Added various checks on open tracefs calls to see if tracefs is in lockdown
mode, and if so, to return -EPERM.

Note, the event format files (which are basically standard on all machines)
as well as the enabled_functions file (which shows what is currently being
traced) are not lockde down. Perhaps they should be, but it seems counter
intuitive to lockdown information to help you know if the system has been
modified.

Link: http://lkml.kernel.org/r/CAHk-=wj7fGPKUspr579Cii-w_y60PtRaiDgKuxVtBAMK0VNNkA@mail.gmail.com
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agotracing: Add tracing_check_open_get_tr()
Steven Rostedt (VMware) [Fri, 11 Oct 2019 21:39:57 +0000 (17:39 -0400)]
tracing: Add tracing_check_open_get_tr()

Currently, most files in the tracefs directory test if tracing_disabled is
set. If so, it should return -ENODEV. The tracing_disabled is called when
tracing is found to be broken. Originally it was done in case the ring
buffer was found to be corrupted, and we wanted to prevent reading it from
crashing the kernel. But it's also called if a tracing selftest fails on
boot. It's a one way switch. That is, once it is triggered, tracing is
disabled until reboot.

As most tracefs files can also be used by instances in the tracefs
directory, they need to be carefully done. Each instance has a trace_array
associated to it, and when the instance is removed, the trace_array is
freed. But if an instance is opened with a reference to the trace_array,
then it requires looking up the trace_array to get its ref counter (as there
could be a race with it being deleted and the open itself). Once it is
found, a reference is added to prevent the instance from being removed (and
the trace_array associated with it freed).

Combine the two checks (tracing_disabled and trace_array_get()) into a
single helper function. This will also make it easier to add lockdown to
tracefs later.

Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agotracing: Have trace events system open call tracing_open_generic_tr()
Steven Rostedt (VMware) [Fri, 11 Oct 2019 23:12:21 +0000 (19:12 -0400)]
tracing: Have trace events system open call tracing_open_generic_tr()

Instead of having the trace events system open call open code the taking of
the trace_array descriptor (with trace_array_get()) and then calling
trace_open_generic(), have it use the tracing_open_generic_tr() that does
the combination of the two. This requires making tracing_open_generic_tr()
global.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agotracing: Get trace_array reference for available_tracers files
Steven Rostedt (VMware) [Fri, 11 Oct 2019 22:19:17 +0000 (18:19 -0400)]
tracing: Get trace_array reference for available_tracers files

As instances may have different tracers available, we need to look at the
trace_array descriptor that shows the list of the available tracers for the
instance. But there's a race between opening the file and an admin
deleting the instance. The trace_array_get() needs to be called before
accessing the trace_array.

Cc: stable@vger.kernel.org
Fixes: 607e2ea167e56 ("tracing: Set up infrastructure to allow tracers for instances")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agoftrace: Get a reference counter for the trace_array on filter files
Steven Rostedt (VMware) [Fri, 11 Oct 2019 21:56:57 +0000 (17:56 -0400)]
ftrace: Get a reference counter for the trace_array on filter files

The ftrace set_ftrace_filter and set_ftrace_notrace files are specific for
an instance now. They need to take a reference to the instance otherwise
there could be a race between accessing the files and deleting the instance.

It wasn't until the :mod: caching where these file operations started
referencing the trace_array directly.

Cc: stable@vger.kernel.org
Fixes: 673feb9d76ab3 ("ftrace: Add :mod: caching infrastructure to trace_array")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agotracefs: Revert ccbd54ff54e8 ("tracefs: Restrict tracefs when the kernel is locked down")
Steven Rostedt (VMware) [Fri, 11 Oct 2019 17:54:58 +0000 (13:54 -0400)]
tracefs: Revert ccbd54ff54e8 ("tracefs: Restrict tracefs when the kernel is locked down")

Running the latest kernel through my "make instances" stress tests, I
triggered the following bug (with KASAN and kmemleak enabled):

mkdir invoked oom-killer:
gfp_mask=0x40cd0(GFP_KERNEL|__GFP_COMP|__GFP_RECLAIMABLE), order=0,
oom_score_adj=0
CPU: 1 PID: 2229 Comm: mkdir Not tainted 5.4.0-rc2-test #325
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
Call Trace:
 dump_stack+0x64/0x8c
 dump_header+0x43/0x3b7
 ? trace_hardirqs_on+0x48/0x4a
 oom_kill_process+0x68/0x2d5
 out_of_memory+0x2aa/0x2d0
 __alloc_pages_nodemask+0x96d/0xb67
 __alloc_pages_node+0x19/0x1e
 alloc_slab_page+0x17/0x45
 new_slab+0xd0/0x234
 ___slab_alloc.constprop.86+0x18f/0x336
 ? alloc_inode+0x2c/0x74
 ? irq_trace+0x12/0x1e
 ? tracer_hardirqs_off+0x1d/0xd7
 ? __slab_alloc.constprop.85+0x21/0x53
 __slab_alloc.constprop.85+0x31/0x53
 ? __slab_alloc.constprop.85+0x31/0x53
 ? alloc_inode+0x2c/0x74
 kmem_cache_alloc+0x50/0x179
 ? alloc_inode+0x2c/0x74
 alloc_inode+0x2c/0x74
 new_inode_pseudo+0xf/0x48
 new_inode+0x15/0x25
 tracefs_get_inode+0x23/0x7c
 ? lookup_one_len+0x54/0x6c
 tracefs_create_file+0x53/0x11d
 trace_create_file+0x15/0x33
 event_create_dir+0x2a3/0x34b
 __trace_add_new_event+0x1c/0x26
 event_trace_add_tracer+0x56/0x86
 trace_array_create+0x13e/0x1e1
 instance_mkdir+0x8/0x17
 tracefs_syscall_mkdir+0x39/0x50
 ? get_dname+0x31/0x31
 vfs_mkdir+0x78/0xa3
 do_mkdirat+0x71/0xb0
 sys_mkdir+0x19/0x1b
 do_fast_syscall_32+0xb0/0xed

I bisected this down to the addition of the proxy_ops into tracefs for
lockdown. It appears that the allocation of the proxy_ops and then freeing
it in the destroy_inode callback, is causing havoc with the memory system.
Reading the documentation about destroy_inode and talking with Linus about
this, this is buggy and wrong. When defining the destroy_inode() method, it
is expected that the destroy_inode() will also free the inode, and not just
the extra allocations done in the creation of the inode. The faulty commit
causes a memory leak of the inode data structure when they are deleted.

Instead of allocating the proxy_ops (and then having to free it) the checks
should be done by the open functions themselves, and not hack into the
tracefs directory. First revert the tracefs updates for locked_down and then
later we can add the locked_down checks in the kernel/trace files.

Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home
Fixes: ccbd54ff54e8 ("tracefs: Restrict tracefs when the kernel is locked down")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
4 years agoMerge tag 'char-misc-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sat, 12 Oct 2019 22:47:19 +0000 (15:47 -0700)]
Merge tag 'char-misc-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are some small char/misc driver fixes for 5.4-rc3.

  Nothing huge here. Some binder driver fixes (although it is still
  being discussed if these all fix the reported issues or not, so more
  might be coming later), some mei device ids and fixes, and a google
  firmware driver bugfix that fixes a regression, as well as some other
  tiny fixes.

  All have been in linux-next with no reported issues"

* tag 'char-misc-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  firmware: google: increment VPD key_len properly
  w1: ds250x: Fix build error without CRC16
  virt: vbox: fix memory leak in hgcm_call_preprocess_linaddr
  binder: Fix comment headers on binder_alloc_prepare_to_free()
  binder: prevent UAF read in print_binder_transaction_log_entry()
  misc: fastrpc: prevent memory leak in fastrpc_dma_buf_attach
  mei: avoid FW version request on Ibex Peak and earlier
  mei: me: add comet point (lake) LP device ids

4 years agoMerge tag 'staging-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sat, 12 Oct 2019 22:44:46 +0000 (15:44 -0700)]
Merge tag 'staging-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging/IIO driver fixes from Greg KH:
 "Here are some staging and IIO driver fixes for 5.4-rc3.

  The "biggest" thing here is a removal of the fbtft device and flexfb
  code as they have been abandoned by their authors and are no longer
  needed for that hardware.

  Other than that, the usual amount of staging driver and iio driver
  fixes for reported issues, and some speakup sysfs file documentation,
  which has been long awaited for.

  All have been in linux-next with no reported issues"

* tag 'staging-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (32 commits)
  iio: Fix an undefied reference error in noa1305_probe
  iio: light: opt3001: fix mutex unlock race
  iio: adc: ad799x: fix probe error handling
  iio: light: add missing vcnl4040 of_compatible
  iio: light: fix vcnl4000 devicetree hooks
  iio: imu: st_lsm6dsx: fix waitime for st_lsm6dsx i2c controller
  iio: adc: axp288: Override TS pin bias current for some models
  iio: imu: adis16400: fix memory leak
  iio: imu: adis16400: release allocated memory on failure
  iio: adc: stm32-adc: fix a race when using several adcs with dma and irq
  iio: adc: stm32-adc: move registers definitions
  iio: accel: adxl372: Perform a reset at start up
  iio: accel: adxl372: Fix push to buffers lost samples
  iio: accel: adxl372: Fix/remove limitation for FIFO samples
  iio: adc: hx711: fix bug in sampling of data
  staging: vt6655: Fix memory leak in vt6655_probe
  staging: exfat: Use kvzalloc() instead of kzalloc() for exfat_sb_info
  Staging: fbtft: fix memory leak in fbtft_framebuffer_alloc
  staging: speakup: document sysfs attributes
  staging: rtl8188eu: fix HighestRate check in odm_ARFBRefresh_8188E()
  ...

4 years agoMerge tag 'tty-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sat, 12 Oct 2019 22:42:19 +0000 (15:42 -0700)]
Merge tag 'tty-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver fixes from Greg KH:
 "Here are some small tty and serial driver fixes for 5.4-rc3 that
  resolve a number of reported issues and regressions.

  None of these are huge, full details are in the shortlog. There's also
  a MAINTAINERS update that I think you might have already taken in your
  tree already, but git should handle that merge easily.

  All have been in linux-next with no reported issues"

* tag 'tty-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  MAINTAINERS: kgdb: Add myself as a reviewer for kgdb/kdb
  tty: serial: imx: Use platform_get_irq_optional() for optional IRQs
  serial: fix kernel-doc warning in comments
  serial: 8250_omap: Fix gpio check for auto RTS/CTS
  serial: mctrl_gpio: Check for NULL pointer
  tty: serial: fsl_lpuart: Fix lpuart_flush_buffer()
  tty: serial: Fix PORT_LINFLEXUART definition
  tty: n_hdlc: fix build on SPARC
  serial: uartps: Fix uartps_major handling
  serial: uartlite: fix exit path null pointer
  tty: serial: linflexuart: Fix magic SysRq handling
  serial: sh-sci: Use platform_get_irq_optional() for optional interrupts
  dt-bindings: serial: sh-sci: Document r8a774b1 bindings
  serial/sifive: select SERIAL_EARLYCON
  tty: serial: rda: Fix the link time qualifier of 'rda_uart_exit()'
  tty: serial: owl: Fix the link time qualifier of 'owl_uart_exit()'

4 years agoMerge tag 'usb-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sat, 12 Oct 2019 22:37:12 +0000 (15:37 -0700)]
Merge tag 'usb-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are a lot of small USB driver fixes for 5.4-rc3.

  syzbot has stepped up its testing of the USB driver stack, now able to
  trigger fun race conditions between disconnect and probe functions.
  Because of that we have a lot of fixes in here from Johan and others
  fixing these reported issues that have been around since almost all
  time.

  We also are just deleting the rio500 driver, making all of the syzbot
  bugs found in it moot as it turns out no one has been using it for
  years as there is a userspace version that is being used instead.

  There are also a number of other small fixes in here, all resolving
  reported issues or regressions.

  All have been in linux-next without any reported issues"

* tag 'usb-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (65 commits)
  USB: yurex: fix NULL-derefs on disconnect
  USB: iowarrior: use pr_err()
  USB: iowarrior: drop redundant iowarrior mutex
  USB: iowarrior: drop redundant disconnect mutex
  USB: iowarrior: fix use-after-free after driver unbind
  USB: iowarrior: fix use-after-free on release
  USB: iowarrior: fix use-after-free on disconnect
  USB: chaoskey: fix use-after-free on release
  USB: adutux: fix use-after-free on release
  USB: ldusb: fix NULL-derefs on driver unbind
  USB: legousbtower: fix use-after-free on release
  usb: cdns3: Fix for incorrect DMA mask.
  usb: cdns3: fix cdns3_core_init_role()
  usb: cdns3: gadget: Fix full-speed mode
  USB: usb-skeleton: drop redundant in-urb check
  USB: usb-skeleton: fix use-after-free after driver unbind
  USB: usb-skeleton: fix NULL-deref on disconnect
  usb:cdns3: Fix for CV CH9 running with g_zero driver.
  usb: dwc3: Remove dev_err() on platform_get_irq() failure
  usb: dwc3: Switch to platform_get_irq_byname_optional()
  ...

4 years agoMerge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 12 Oct 2019 22:29:54 +0000 (15:29 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
 "Two fixes: a guest-cputime accounting fix, and a cgroup bandwidth
  quota precision fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/vtime: Fix guest/system mis-accounting on task switch
  sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision

4 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 12 Oct 2019 22:15:17 +0000 (15:15 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
 "Mostly tooling fixes, but also a couple of updates for new Intel
  models (which are technically hw-enablement, but to users it's a fix
  to perf behavior on those new CPUs - hope this is fine), an AUX
  inheritance fix, event time-sharing fix, and a fix for lost non-perf
  NMI events on AMD systems"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  perf/x86/cstate: Add Tiger Lake CPU support
  perf/x86/msr: Add Tiger Lake CPU support
  perf/x86/intel: Add Tiger Lake CPU support
  perf/x86/cstate: Update C-state counters for Ice Lake
  perf/x86/msr: Add new CPU model numbers for Ice Lake
  perf/x86/cstate: Add Comet Lake CPU support
  perf/x86/msr: Add Comet Lake CPU support
  perf/x86/intel: Add Comet Lake CPU support
  perf/x86/amd: Change/fix NMI latency mitigation to use a timestamp
  perf/core: Fix corner case in perf_rotate_context()
  perf/core: Rework memory accounting in perf_mmap()
  perf/core: Fix inheritance of aux_output groups
  perf annotate: Don't return -1 for error when doing BPF disassembly
  perf annotate: Return appropriate error code for allocation failures
  perf annotate: Fix arch specific ->init() failure errors
  perf annotate: Propagate the symbol__annotate() error return
  perf annotate: Fix the signedness of failure returns
  perf annotate: Propagate perf_env__arch() error
  perf evsel: Fall back to global 'perf_env' in perf_evsel__env()
  perf tools: Propagate get_cpuid() error
  ...

4 years agoMerge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 12 Oct 2019 22:08:24 +0000 (15:08 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI fixes from Ingo Molnar:
 "Misc EFI fixes all across the map: CPER error report fixes, fixes to
  TPM event log parsing, fix for a kexec hang, a Sparse fix and other
  fixes"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/tpm: Fix sanity check of unsigned tbl_size being less than zero
  efi/x86: Do not clean dummy variable in kexec path
  efi: Make unexported efi_rci2_sysfs_init() static
  efi/tpm: Only set 'efi_tpm_final_log_size' after successful event log parsing
  efi/tpm: Don't traverse an event log with no events
  efi/tpm: Don't access event->count when it isn't mapped
  efivar/ssdt: Don't iterate over EFI vars if no SSDT override was specified
  efi/cper: Fix endianness of PCIe class code

4 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 12 Oct 2019 21:46:14 +0000 (14:46 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "A handful of fixes: a kexec linking fix, an AMD MWAITX fix, a vmware
  guest support fix when built under Clang, and new CPU model number
  definitions"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add Comet Lake to the Intel CPU models header
  lib/string: Make memzero_explicit() inline instead of external
  x86/cpu/vmware: Use the full form of INL in VMWARE_PORT
  x86/asm: Fix MWAITX C-state hint value

4 years agoMerge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 12 Oct 2019 21:37:55 +0000 (14:37 -0700)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 license tag fixlets from Ingo Molnar:
 "Fix a couple of SPDX tags in x86 headers to follow the canonical
  pattern"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Use the correct SPDX License Identifier in headers

4 years agoMerge tag 'riscv/for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv...
Linus Torvalds [Sat, 12 Oct 2019 21:25:38 +0000 (14:25 -0700)]
Merge tag 'riscv/for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Paul Walmsley:

 - Fix several bugs in the breakpoint trap handler

 - Drop an unnecessary loop around calls to preempt_schedule_irq()

* tag 'riscv/for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: entry: Remove unneeded need_resched() loop
  riscv: Correct the handling of unexpected ebreak in do_trap_break()
  riscv: avoid sending a SIGTRAP to a user thread trapped in WARN()
  riscv: avoid kernel hangs when trapped in BUG()

4 years agoMerge tag 'mips_fixes_5.4_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Linus Torvalds [Sat, 12 Oct 2019 21:16:51 +0000 (14:16 -0700)]
Merge tag 'mips_fixes_5.4_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Paul Burton:

 - Build fixes for CONFIG_OPTIMIZE_INLINING=y builds in which the
   compiler may choose not to inline __xchg() & __cmpxchg().

 - A build fix for Loongson configurations with GCC 9.x.

 - Expose some extra HWCAP bits to indicate support for various
   instruction set extensions to userland.

 - Fix bad stack access in firmware handling code for old SNI
   RM200/300/400 machines.

* tag 'mips_fixes_5.4_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: Disable Loongson MMI instructions for kernel build
  MIPS: elf_hwcap: Export userspace ASEs
  MIPS: fw: sni: Fix out of bounds init of o32 stack
  MIPS: include: Mark __xchg as __always_inline
  MIPS: include: Mark __cmpxchg as __always_inline

4 years agoMerge tag 'powerpc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Linus Torvalds [Sat, 12 Oct 2019 21:13:55 +0000 (14:13 -0700)]
Merge tag 'powerpc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fix a kernel crash in spufs_create_root() on Cell machines, since the
  new mount API went in.

  Fix a regression in our KVM code caused by our recent PCR changes.

  Avoid a warning message about a failing hypervisor API on systems that
  don't have that API.

  A couple of minor build fixes.

  Thanks to: Alexey Kardashevskiy, Alistair Popple, Desnes A. Nunes do
  Rosario, Emmanuel Nicolet, Jordan Niethe, Laurent Dufour, Stephen
  Rothwell"

* tag 'powerpc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  spufs: fix a crash in spufs_create_root()
  powerpc/kvm: Fix kvmppc_vcore->in_guest value in kvmhv_switch_to_host
  selftests/powerpc: Fix compile error on tlbie_test due to newer gcc
  powerpc/pseries: Remove confusing warning message.
  powerpc/64s/radix: Fix build failure with RADIX_MMU=n

4 years agoMerge tag 'for-linus-5.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 12 Oct 2019 21:11:21 +0000 (14:11 -0700)]
Merge tag 'for-linus-5.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - correct panic handling when running as a Xen guest

 - cleanup the Xen grant driver to remove printing a pointer being
   always NULL

 - remove a soon to be wrong call of of_dma_configure()

* tag 'for-linus-5.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: Stop abusing DT of_dma_configure API
  xen/grant-table: remove unnecessary printing
  x86/xen: Return from panic notifier

4 years agoMerge tag 's390-5.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Sat, 12 Oct 2019 21:09:31 +0000 (14:09 -0700)]
Merge tag 's390-5.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

 - Fix virtio-ccw DMA regression

 - Fix compiler warnings in uaccess

* tag 's390-5.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/uaccess: avoid (false positive) compiler warnings
  s390/cio: fix virtio-ccw DMA without PV

4 years agoperf/x86/cstate: Add Tiger Lake CPU support
Kan Liang [Tue, 8 Oct 2019 15:50:10 +0000 (08:50 -0700)]
perf/x86/cstate: Add Tiger Lake CPU support

Tiger Lake is the followon to Ice Lake. From the perspective of Intel
cstate residency counters, there is nothing changed compared with
Ice Lake.

Share icl_cstates with Ice Lake.
Update the comments for Tiger Lake.

The External Design Specification (EDS) is not published yet. It comes
from an authoritative internal source.

The patch has been tested on real hardware.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1570549810-25049-10-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agoperf/x86/msr: Add Tiger Lake CPU support
Kan Liang [Tue, 8 Oct 2019 15:50:09 +0000 (08:50 -0700)]
perf/x86/msr: Add Tiger Lake CPU support

Tiger Lake is the followon to Ice Lake. PPERF and SMI_COUNT MSRs are
also supported.

The External Design Specification (EDS) is not published yet. It comes
from an authoritative internal source.

The patch has been tested on real hardware.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1570549810-25049-9-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agoperf/x86/intel: Add Tiger Lake CPU support
Kan Liang [Tue, 8 Oct 2019 15:50:08 +0000 (08:50 -0700)]
perf/x86/intel: Add Tiger Lake CPU support

Tiger Lake is the followon to Ice Lake. From the perspective of Intel
core PMU, there is little changes compared with Ice Lake, e.g. small
changes in event list. But it doesn't impact on core PMU functionality.
Share the perf code with Ice Lake. The event list patch will be submitted
later separately.

The patch has been tested on real hardware.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1570549810-25049-8-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agoperf/x86/cstate: Update C-state counters for Ice Lake
Kan Liang [Tue, 8 Oct 2019 15:50:07 +0000 (08:50 -0700)]
perf/x86/cstate: Update C-state counters for Ice Lake

There is no Core C3 C-State counter for Ice Lake.
Package C8/C9/C10 C-State counters are added for Ice Lake.

Introduce a new event list, icl_cstates, for Ice Lake.
Update the comments accordingly.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: f08c47d1f86c ("perf/x86/intel/cstate: Add Icelake support")
Link: https://lkml.kernel.org/r/1570549810-25049-7-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agoperf/x86/msr: Add new CPU model numbers for Ice Lake
Kan Liang [Tue, 8 Oct 2019 15:50:06 +0000 (08:50 -0700)]
perf/x86/msr: Add new CPU model numbers for Ice Lake

PPERF and SMI_COUNT MSRs are also supported by Ice Lake desktop and
server.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1570549810-25049-6-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agoperf/x86/cstate: Add Comet Lake CPU support
Kan Liang [Tue, 8 Oct 2019 15:50:05 +0000 (08:50 -0700)]
perf/x86/cstate: Add Comet Lake CPU support

Comet Lake is the new 10th Gen Intel processor. From the perspective of
Intel cstate residency counters, there is nothing changed compared with
Kaby Lake.

Share hswult_cstates with Kaby Lake.
Update the comments for Comet Lake.
Kaby Lake is missed in the comments for some Residency Counters. Update
the comments for Kaby Lake as well.

The External Design Specification (EDS) is not published yet. It comes
from an authoritative internal source.

The patch has been tested on real hardware.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1570549810-25049-5-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>