Chao Yu [Tue, 29 May 2018 16:20:41 +0000 (00:20 +0800)]
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
This is because in recovery flow of __exchange_data_block, we didn't pass olen to
__roll_back_blkaddrs, instead we passed len, which indicates wrong array size, result
in copying random block address into dnode page.
Later, once that random block address was accessed by is_checkpointed_data, it can
cause NULL pointer dereference.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
allocate_data_block will submit all sequential pending IOs sorted by a
FIFO list, If we failed to submit other user's IO due to unaligned write,
we will retry to allocate new block address for current IO, then it will
initialize fio.list again, if fio was in the list before, it can break
FIFO list, result in above panic.
Thread A Thread B
- do_write_page
- allocate_data_block
- list_add_tail
: fioA cached in FIFO list.
- do_write_page
- allocate_data_block
- list_add_tail
: fioB cached in FIFO list.
- f2fs_submit_page_write
: fail to submit IO
- allocate_data_block
- INIT_LIST_HEAD
- f2fs_submit_page_write
- list_del <-- NULL pointer dereference
This patch adds fio.retry parameter to indicate failure status for each
IO, and avoid bailing out if there is still pending IO in FIFO list for
fixing.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Anatoly Pugachev [Sun, 27 May 2018 23:06:37 +0000 (02:06 +0300)]
disable loading f2fs module on PAGE_SIZE > 4KB
The following patch disables loading of f2fs module on architectures
which have PAGE_SIZE > 4096 , since it is impossible to mount f2fs on
such architectures , log messages are:
mount: /mnt: wrong fs type, bad option, bad superblock on
/dev/vdiskb1, missing codepage or helper program, or other error.
/dev/vdiskb1: F2FS filesystem,
UUID=1d8b9ca4-2389-4910-af3b-10998969f09c, volume name ""
May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid
page_cache_size (8192), supports only 4KB
May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Can't find valid F2FS
filesystem in 1th superblock
May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid
page_cache_size (8192), supports only 4KB
May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Can't find valid F2FS
filesystem in 2th superblock
May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid
page_cache_size (8192), supports only 4KB
Chao Yu [Mon, 28 May 2018 08:57:32 +0000 (16:57 +0800)]
f2fs: fix to avoid race during access gc_thread pointer
Thread A Thread B
- f2fs_remount
- stop_gc_thread
- f2fs_sbi_store
sbi->gc_thread = NULL;
access sbi->gc_thread->gc_*
Previously, we allocate memory for sbi->gc_thread based on background
gc thread mount option, the memory can be released if we turn off
that mount option, but still there are several places access gc_thread
pointer without considering race condition, result in NULL point
dereference.
In order to fix this issue, use sb->s_umount to exclude those operations.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
With data_flush mount option, during recovery, in order to avoid entering
above writeback flow, let's detect recovery status and do skip in
f2fs_balance_fs_bg.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Sheng Yong [Tue, 8 May 2018 09:51:34 +0000 (17:51 +0800)]
f2fs: clear discard_wake earlier
If SBI_NEED_FSCK is set, discard_wake will never be cleared. As a
result, the condition of wait_event_interruptible_timeout() is always
true, which gets discard thread run too frequently.
Chao Yu [Mon, 7 May 2018 12:28:54 +0000 (20:28 +0800)]
f2fs: avoid stucking GC due to atomic write
f2fs doesn't allow abuse on atomic write class interface, so except
limiting in-mem pages' total memory usage capacity, we need to limit
atomic-write usage as well when filesystem is seriously fragmented,
otherwise we may run into infinite loop during foreground GC because
target blocks in victim segment are belong to atomic opened file for
long time.
Now, we will detect failure due to atomic write in foreground GC, if
the count exceeds threshold, we will drop all atomic written data in
cache, by this, I expect it can keep our system running safely to
prevent Dos attack.
In addition, his patch adds to show GC skip information in debugfs,
now it just shows count of skipped caused by atomic write.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Sat, 26 May 2018 01:00:13 +0000 (09:00 +0800)]
f2fs: keep migration IO order in LFS mode
For non-migration IO, we will keep order of data/node blocks' submitting
as allocation sequence by sorting IOs in per log io_list list, but for
migration IO, it could be out-of-order.
In LFS mode, we should keep all IOs including migration IO be ordered,
so that this patch fixes to add an additional lock to keep submitting
order.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs: fix to wait page writeback during revoking atomic write
After revoking atomic write, related LBA can be reused by others, so we
need to wait page writeback before reusing the LBA, in order to avoid
interference between old atomic written in-flight IO and new IO.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 23 May 2018 14:25:09 +0000 (22:25 +0800)]
f2fs: detect synchronous writeback more earlier
This patch changes to detect synchronous writeback more earlier before,
in order to avoid unnecessary page writeback before exiting asynchronous
writeback.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 8 May 2018 06:06:03 +0000 (14:06 +0800)]
f2fs: fix to let checkpoint guarantee atomic page persistence
1. thread A: commit_inmem_pages submit data into block layer, but
haven't waited it writeback.
2. thread A: commit_inmem_pages update related node.
3. thread B: do checkpoint, flush all nodes to disk.
4. SPOR
Then, atomic file becomes corrupted since nodes is flushed before data.
This patch fixes to treat atomic page as checkpoint guaranteed one,
then in checkpoint, we can make sure all atomic page can be writebacked
with metadata of atomic file.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 7 May 2018 12:28:52 +0000 (20:28 +0800)]
f2fs: fix to initialize i_current_depth according to inode type
i_current_depth is used only for directory inode, but its space is
shared with i_gc_failures field used for regular inode, in order to
avoid affecting i_gc_failures' value, this patch fixes to initialize
the union's fields according to inode type.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Revert "f2fs: add ovp valid_blocks check for bg gc victim to fg_gc"
For extreme case:
10 section, op = 10%, no_fggc_threshold = 90%
All section usage: 85% 85% 85% 85% 90% 90% 95% 95% 95% 95%
During foreground GC, if we skip select dirty section whose usage
is larger than no_fggc_threshold, we can only recycle 80% invalid
space from four 85% usage sections and two 90% usage sections,
result in encountering out-of-space issue.
This reverts commit e93b9865251a0503d83fd570e7d5a7c8bc351715 to
fix this issue, besides, we keep the logic that we scan all dirty
section when searching a victim, so that GC can select victim with
least valid blocks.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 28 Apr 2018 02:03:22 +0000 (19:03 -0700)]
f2fs: enhance sanity_check_raw_super() to avoid potential overflows
In order to avoid the below overflow issue, we should have checked the
boundaries in superblock before reaching out to allocation. As Linus suggested,
the right place should be sanity_check_raw_super().
Dr Silvio Cesare of InfoSect reported:
There are integer overflows with using the cp_payload superblock field in the
f2fs filesystem potentially leading to memory corruption.
include/linux/f2fs_fs.h
struct f2fs_super_block {
...
__le32 cp_payload;
fs/f2fs/f2fs.h
typedef u32 block_t; /*
* should not change u32, since it is the on-disk block
* address format, __le32.
*/
...
There will be two times potential overflow:
- old_valid_blocks - se->valid_blocks will overflow, and be a very
large number.
- sbi->discard_blks += result will overflow again, comes out a correct
result accidently.
Anyway, it should be fixed.
Fixes: d600af236da5 ("f2fs: avoid unneeded loop in build_sit_entries") Fixes: 1f43e2ad7bff ("f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
RW semphore dio_rwsem in struct f2fs_inode_info is introduced to avoid
race between dio and data gc, but now, it is more wildly used to avoid
foreground operation vs data gc. So rename it to i_gc_rwsem to improve
its readability.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+bf9253040425feb155ad@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.
IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+83699adeb2d13579c31e@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.
IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+d154ec99402c6f628887@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.
Yunlei He [Fri, 13 Apr 2018 03:08:05 +0000 (11:08 +0800)]
f2fs: stop issue discard if something wrong with f2fs
v4->v5: move data corruption check to __submit_discard_cmd, in order to
control discard io submitted more accurately, besides, increase async
thread wait time if data corruption detected.
This patch stop async thread and umount process to issue discard
if something wrong with f2fs, which is similar to fstrim.
Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Now F2FS_FL_USER_VISIBLE and F2FS_FL_USER_MODIFIABLE has included
F2FS_PROJINHERIT_FL, so remove unneeded F2FS_PROJINHERIT_FL when
using visible/modifiable flag macro.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Sometimes, we need to write meta data to new allocated block address,
then we will allocate a zeroed page in inner inode's address space, and
fill partial data in it, and leave other place with zero value which means
some fields are initial status.
There are two inner inodes (meta inode and node inode) setting __GFP_ZERO,
I have just checked them, for both of them, we can avoid using __GFP_ZERO,
and do initialization by ourselves to avoid unneeded/redundant zeroing
from mm.
Cc: <stable@vger.kernel.org> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
For buffered IO, we don't need to use block plug to cache bio,
for direct IO, generic f2fs_direct_IO has already added block
plug, so let's remove redundant one in .write_iter.
Yunlong Song [Tue, 3 Apr 2018 11:42:41 +0000 (19:42 +0800)]
f2fs: remove unmatched zero_user_segment when convert inline dentry
Since the layout of regular dentry block is different from inline dentry
block, zero_user_segment starting from MAX_INLINE_DATA(dir) is not
correct for regular dentry block, besides, bitmap is already copied and
used, so there is no necessary to zero page at all, so just remove the
zero_user_segment is OK.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Previously, we use generic FS_*_FL defined by vfs to indicate inode status
for each bit of i_flags, so f2fs's flag status definition is tied to vfs'
one, it will be hard for f2fs to reuse bits f2fs never used to indicate
new status..
In order to solve this issue, we introduce private inode status mapping,
Note, for these bits have already been persisted into disk, we should
never change their definition, for other ones, we can remap them for
later new coming status.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Now, we issue discard asynchronously in separated thread instead of in
checkpoint, after that, we won't encounter long latency in checkpoint
due to huge number of synchronous discard command handling, so, we don't
need to split checkpoint to do trim in batch, merge it and obsolete
related sysfs entry.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 29 May 2018 16:58:42 +0000 (09:58 -0700)]
f2fs: issue discard commands proactively in high fs utilization
In the high utilization like over 80%, we don't expect huge # of large discard
commands, but do many small pending discards which affects FTL GCs a lot.
Let's issue them in that case.
Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 24 May 2018 20:57:26 +0000 (13:57 -0700)]
f2fs: let fstrim issue discard commands in lower priority
The fstrim gathers huge number of large discard commands, and tries to issue
without IO awareness, which results in long user-perceive IO latencies on
READ, WRITE, and FLUSH in UFS. We've observed some of commands take several
seconds due to long discard latency.
This patch limits the maximum size to 2MB per candidate, and check IO congestion
when issuing them to disk.
Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 4 May 2018 06:26:02 +0000 (23:26 -0700)]
f2fs: avoid fsync() failure caused by EAGAIN in writepage()
pageout() in MM traslates EAGAIN, so calls handle_write_error()
-> mapping_set_error() -> set_bit(AS_EIO, ...).
file_write_and_wait_range() will see EIO error, which is critical
to return value of fsync() followed by atomic_write failure to user.
Jaegeuk Kim [Thu, 12 Apr 2018 06:09:04 +0000 (23:09 -0700)]
f2fs: clear PageError on writepage
This patch clears PageError in some pages tagged by read path, but when we
write the pages with valid contents, writepage should clear the bit likewise
ext4.
Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Eric Biggers [Wed, 18 Apr 2018 22:48:42 +0000 (15:48 -0700)]
f2fs: call unlock_new_inode() before d_instantiate()
xfstest generic/429 sometimes hangs on f2fs, caused by a thread being
unable to take a directory's i_rwsem for write in vfs_rmdir(). In the
test, one thread repeatedly creates and removes a directory, and other
threads repeatedly look up a file in the directory. The bug is that
f2fs_mkdir() calls d_instantiate() before unlock_new_inode(), resulting
in the directory inode being exposed to lookups before it has been fully
initialized. And with CONFIG_DEBUG_LOCK_ALLOC, unlock_new_inode()
reinitializes ->i_rwsem, corrupting its state when it is already held.
Fix it by calling unlock_new_inode() before d_instantiate(). This
matches what other filesystems do.
Fixes: 57397d86c62d ("f2fs: add inode operations for special inodes") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Eric Biggers [Wed, 18 Apr 2018 18:09:48 +0000 (11:09 -0700)]
f2fs: refactor read path to allow multiple postprocessing steps
Currently f2fs's ->readpage() and ->readpages() assume that either the
data undergoes no postprocessing, or decryption only. But with
fs-verity, there will be an additional authenticity verification step,
and it may be needed either by itself, or combined with decryption.
To support this, store a 'struct bio_post_read_ctx' in ->bi_private
which contains a work struct, a bitmask of postprocessing steps that are
enabled, and an indicator of the current step. The bio completion
routine, if there was no I/O error, enqueues the first postprocessing
step. When that completes, it continues to the next step. Pages that
fail any postprocessing step have PageError set. Once all steps have
completed, pages without PageError set are set Uptodate, and all pages
are unlocked.
Also replace f2fs_encrypted_file() with a new function
f2fs_post_read_required() in places like direct I/O and garbage
collection that really should be testing whether the file needs special
I/O processing, not whether it is encrypted specifically.
This may also be useful for other future f2fs features such as
compression.
Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Eric Biggers [Wed, 18 Apr 2018 18:09:47 +0000 (11:09 -0700)]
fscrypt: allow synchronous bio decryption
Currently, fscrypt provides fscrypt_decrypt_bio_pages() which decrypts a
bio's pages asynchronously, then unlocks them afterwards. But, this
assumes that decryption is the last "postprocessing step" for the bio,
so it's incompatible with additional postprocessing steps such as
authenticity verification after decryption.
Therefore, rename the existing fscrypt_decrypt_bio_pages() to
fscrypt_enqueue_decrypt_bio(). Then, add fscrypt_decrypt_bio() which
decrypts the pages in the bio synchronously without unlocking the pages,
nor setting them Uptodate; and add fscrypt_enqueue_decrypt_work(), which
enqueues work on the fscrypt_read_workqueue. The new functions will be
used by filesystems that support both fscrypt and fs-verity.
Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We already have memcpy_toio(), but not memset_io(), so let's
add the obvious version to allow building an allmodconfig kernel
without errors like
drivers/gpu/drm/ttm/ttm_bo_util.c: In function 'ttm_bo_move_memcpy':
drivers/gpu/drm/ttm/ttm_bo_util.c:390:3: error: implicit declaration of function 'memset_io' [-Werror=implicit-function-declaration]
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Linus Torvalds [Tue, 1 May 2018 16:11:45 +0000 (09:11 -0700)]
Merge tag 'xfs-4.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"Here are a few more bug fixes for xfs for 4.17-rc4. Most of them are
fixes for bad behavior.
This series has been run through a full xfstests run during LSF and
through a quick xfstests run against this morning's master, with no
major failures reported.
Summary:
- Enhance inode fork verifiers to prevent loading of corrupted
metadata.
- Fix a crash when we try to convert extents format inodes to btree
format, we run out of space, but forget to revert the in-core state
changes.
- Fix file size checks when doing INSERT_RANGE that could cause files
to end up negative size if there previously was an extent mapped at
s_maxbytes.
- Fix a bug when doing a remove-then-add ATTR_REPLACE xattr update
where we forget to clear ATTR_REPLACE after the remove, which
causes the attr to be lost and the fs to shut down due to (what it
thinks is) inconsistent in-core state"
* tag 'xfs-4.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: don't fail when converting shortform attr to long form during ATTR_REPLACE
xfs: prevent creating negative-sized file via INSERT_RANGE
xfs: set format back to extents if xfs_bmap_extents_to_btree
xfs: enhance dinode verifier
Merge tag 'errseq-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull errseq infrastructure fix from Jeff Layton:
"The PostgreSQL developers recently had a spirited discussion about the
writeback error handling in Linux, and reached out to us about a
behavoir change to the code that bit them when the errseq_t changes
were merged.
When we changed to using errseq_t for tracking writeback errors, we
lost the ability for an application to see a writeback error that
occurred before the open on which the fsync was issued. This was
problematic for PostgreSQL which offloads fsync calls to a completely
separate process from the DB writers.
This patch restores that ability. If the errseq_t value in the inode
does not have the SEEN flag set, then we just return 0 for the sample.
That ensures that any recorded error is always delivered at least
once.
Note that we might still lose the error if the inode gets evicted from
the cache before anything can reopen it, but that was the case before
errseq_t was merged. At LSF/MM we had some discussion about keeping
inodes with unreported writeback errors around in the cache for longer
(possibly indefinitely), but that's really a separate problem"
* tag 'errseq-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
errseq: Always report a writeback error once
- Fixup license text for oradax driver, from Rob Gardner.
- Release device object with put_device() instead of straight kfree(),
from Arvind Yadav.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: vio: use put_device() instead of kfree()
sparc64: Fix mistake in oradax license text
Rob Gardner [Fri, 20 Apr 2018 18:48:25 +0000 (12:48 -0600)]
sparc64: Fix mistake in oradax license text
The license text in both oradax files mistakenly specifies "version 3" of
the GNU General Public License. This is corrected to specify "version 2".
Signed-off-by: Rob Gardner <rob.gardner@oracle.com> Signed-off-by: Jonathan Helman <jonathan.helman@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Another set of x86 related updates:
- Fix the long broken x32 version of the IPC user space headers which
was noticed by Arnd Bergman in course of his ongoing y2038 work.
GLIBC seems to have non broken private copies of these headers so
this went unnoticed.
- Two microcode fixlets which address some more fallout from the
recent modifications in that area:
- Unconditionally save the microcode patch, which was only saved
when CPU_HOTPLUG was enabled causing failures in the late
loading mechanism
- Make the later loader synchronization finally work under all
circumstances. It was exiting early and causing timeout failures
due to a missing synchronization point.
- Do not use mwait_play_dead() on AMD systems to prevent excessive
power consumption as the CPU cannot go into deep power states from
there.
- Address an annoying sparse warning due to lost type qualifiers of
the vmemmap and vmalloc base address constants.
- Prevent reserving crash kernel region on Xen PV as this leads to
the wrong perception that crash kernels actually work there which
is not the case. Xen PV has its own crash mechanism handled by the
hypervisor.
- Add missing TLB cpuid values to the table to make the printout on
certain machines correct.
- Enumerate the new CLDEMOTE instruction
- Fix an incorrect SPDX identifier
- Remove stale macros"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ipc: Fix x32 version of shmid64_ds and msqid64_ds
x86/setup: Do not reserve a crash kernel region if booted on Xen PV
x86/cpu/intel: Add missing TLB cpuid values
x86/smpboot: Don't use mwait_play_dead() on AMD systems
x86/mm: Make vmemmap and vmalloc base address constants unsigned long
x86/vector: Remove the unused macro FPU_IRQ
x86/vector: Remove the macro VECTOR_OFFSET_START
x86/cpufeatures: Enumerate cldemote instruction
x86/microcode: Do not exit early from __reload_late()
x86/microcode/intel: Save microcode patch unconditionally
x86/jailhouse: Fix incorrect SPDX identifier
Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti fixes from Thomas Gleixner:
"A set of updates for the x86/pti related code:
- Preserve r8-r11 in int $0x80. r8-r11 need to be preserved, but the
int$80 entry code removed that quite some time ago. Make it correct
again.
- A set of fixes for the Global Bit work which went into 4.17 and
caused a bunch of interesting regressions:
- Triggering a BUG in the page attribute code due to a missing
check for early boot stage
- Warnings in the page attribute code about holes in the kernel
text mapping which are caused by the freeing of the init code.
Handle such holes gracefully.
- Reduce the amount of kernel memory which is set global to the
actual text and do not incidentally overlap with data.
- Disable the global bit when RANDSTRUCT is enabled as it
partially defeats the hardening.
- Make the page protection setup correct for vma->page_prot
population again. The adjustment of the protections fell through
the crack during the Global bit rework and triggers warnings on
machines which do not support certain features, e.g. NX"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry/64/compat: Preserve r8-r11 in int $0x80
x86/pti: Filter at vma->vm_page_prot population
x86/pti: Disallow global kernel text with RANDSTRUCT
x86/pti: Reduce amount of kernel text allowed to be Global
x86/pti: Fix boot warning from Global-bit setting
x86/pti: Fix boot problems from Global-bit setting
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Two fixes from the timer departement:
- Fix a long standing issue in the NOHZ tick code which causes RB
tree corruption, delayed timers and other malfunctions. The cause
for this is code which modifies the expiry time of an enqueued
hrtimer.
- Revert the CLOCK_MONOTONIC/CLOCK_BOOTTIME unification due to
regression reports. Seems userspace _is_ relying on the documented
behaviour despite our hope that it wont"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME
tick/sched: Do not mess with an enqueued hrtimer
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"The perf update contains the following bits:
x86:
- Prevent setting freeze_on_smi on PerfMon V1 CPUs to avoid #GP
perf stat:
- Keep the '/' event modifier separator in fallback, for example when
fallbacking from 'cpu/cpu-cycles/' to user level only, where it
should become 'cpu/cpu-cycles/u' and not 'cpu/cpu-cycles/:u' (Jiri
Olsa)
- Disable write_backward and other event attributes for !group events
in a group, fixing, for instance this group: '{cycles,msr/aperf/}:S'
that has leader sampling (:S) and where just the 'cycles', the
leader event, should have the write_backward attribute set, in this
case it all fails because the PMU where 'msr/aperf/' lives doesn't
accepts write_backward style sampling (Jiri Olsa)
- Only fall back group read for leader (Kan Liang)
- Fix core PMU alias list for x86 platform (Kan Liang)
- Print out hint for mixed PMU group error (Kan Liang)
- Fix duplicate PMU name for interval print (Kan Liang)
Core:
- Set main kernel end address properly when reading kernel and module
maps (Namhyung Kim)
perf mem:
- Fix incorrect entries and add missing man options (Sangwon Hong)
s/390:
- Remove s390 specific strcmp_cpuid_cmp function (Thomas Richter)
- Adapt 'perf test' case record+probe_libc_inet_pton.sh for s390
- Fix s390 undefined record__auxtrace_init() return value in 'perf
record' (Thomas Richter)"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Don't enable freeze-on-smi for PerfMon V1
perf stat: Fix duplicate PMU name for interval print
perf evsel: Only fall back group read for leader
perf stat: Print out hint for mixed PMU group error
perf pmu: Fix core PMU alias list for X86 platform
perf record: Fix s390 undefined record__auxtrace_init() return value
perf mem: Document incorrect and missing options
perf evsel: Disable write_backward for leader sampling group events
perf pmu: Fix pmu events parsing rule
perf stat: Keep the / modifier separator in fallback
perf test: Adapt test case record+probe_libc_inet_pton.sh for s390
perf list: Remove s390 specific strcmp_cpuid_cmp function
perf machine: Set main kernel end address properly
Merge tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix misc bugs and a regression for ext4"
* tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: add MODULE_SOFTDEP to ensure crc32c is included in the initramfs
ext4: fix bitmap position validation
ext4: set h_journal if there is a failure starting a reserved handle
ext4: prevent right-shifting extents beyond EXT_MAX_BLOCKS
Amir Goldstein [Mon, 5 Feb 2018 17:32:18 +0000 (19:32 +0200)]
<linux/stringhash.h>: fix end_name_hash() for 64bit long
The comment claims that this helper will try not to loose bits, but for
64bit long it looses the high bits before hashing 64bit long into 32bit
int. Use the helper hash_long() to do the right thing for 64bit long.
For 32bit long, there is no change.
All the callers of end_name_hash() either assign the result to
qstr->hash, which is u32 or return the result as an int value (e.g.
full_name_hash()). Change the helper return type to int to conform to
its users.
[ It took me a while to apply this, because my initial reaction to it
was - incorrectly - that it could make for slower code.
After having looked more at it, I take back all my complaints about
the patch, Amir was right and I was mis-reading things or just being
stupid.
I also don't worry too much about the possible performance impact of
this on 64-bit, since most architectures that actually care about
performance end up not using this very much (the dcache code is the
most performance-critical, but the word-at-a-time case uses its own
hashing anyway).
So this ends up being mostly used for filesystems that do their own
degraded hashing (usually because they want a case-insensitive
comparison function).
A _tiny_ worry remains, in that not everybody uses DCACHE_WORD_ACCESS,
and then this potentially makes things more expensive on 64-bit
architectures with slow or lacking multipliers even for the normal
case.
That said, realistically the only such architecture I can think of is
PA-RISC. Nobody really cares about performance on that, it's more of a
"look ma, I've got warts^W an odd machine" platform.
So the patch is fine, and all my initial worries were just misplaced
from not looking at this properly. - Linus ]
Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Sterba [Sat, 28 Apr 2018 17:05:04 +0000 (19:05 +0200)]
MAINTAINERS: add myself as maintainer of AFFS
The AFFS filesystem is still in use by m68k community (Link #2), but as
there was no code activity and no maintainer, the filesystem appeared on
the list of candidates for staging/removal (Link #1).
I volunteer to act as a maintainer of AFFS to collect any fixes that
might show up and to guard fs/affs/ against another spring cleaning.
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- two driver fixes
- better parameter check for the core
- Documentation updates
- part of a tree-wide HAS_DMA cleanup
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: sprd: Fix the i2c count issue
i2c: sprd: Prevent i2c accesses after suspend is called
i2c: dev: prevent ZERO_SIZE_PTR deref in i2cdev_ioctl_rdwr()
Documentation/i2c: adopt kernel commenting style in examples
Documentation/i2c: sync docs with current state of i2c-tools
Documentation/i2c: whitespace cleanup
i2c: Remove depends on HAS_DMA in case of platform dependency
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- crypto API regression that may cause sporadic alloc failures
- double-free bug in drbg
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: drbg - set freed buffers to NULL
crypto: api - fix finding algorithm currently being tested
Merge tag '4.17-rc2-smb3' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"A few security related fixes for SMB3, most importantly for SMB3.11
encryption"
* tag '4.17-rc2-smb3' of git://git.samba.org/sfrench/cifs-2.6:
cifs: smbd: Avoid allocating iov on the stack
cifs: smbd: Don't use RDMA read/write when signing is used
SMB311: Fix reconnect
SMB3: Fix 3.11 encryption to Windows and handle encrypted smb3 tcon
CIFS: set *resp_buf_type to NO_BUFFER on error
Merge tag 'powerpc-4.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"A bunch of fixes, mostly for existing code and going to stable.
Our memory hot-unplug path wasn't flushing the cache before removing
memory. That is a problem now that we are doing memory hotplug on bare
metal.
Three fixes for the NPU code that supports devices connected via
NVLink (ie. GPUs). The main one tweaks the TLB flush algorithm to
avoid soft lockups for large flushes.
A fix for our memory error handling where we would loop infinitely,
returning back to the bad access and hard lockup the CPU.
Fixes for the OPAL RTC driver, which wasn't handling some error cases
correctly.
A fix for a hardlockup in the powernv cpufreq driver.
And finally two fixes to our smp_send_stop(), required due to a recent
change to use it on shutdown.
Thanks to: Alistair Popple, Balbir Singh, Laurentiu Tudor, Mahesh
Salgaonkar, Mark Hairgrove, Nicholas Piggin, Rashmica Gupta, Shilpasri
G Bhat"
* tag 'powerpc-4.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/kvm/booke: Fix altivec related build break
powerpc: Fix deadlock with multiple calls to smp_send_stop
cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer interrupt
powerpc: Fix smp_send_stop NMI IPI handling
rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops
powerpc/mce: Fix a bug where mce loops on memory UE.
powerpc/powernv/npu: Do a PID GPU TLB flush when invalidating a large address range
powerpc/powernv/npu: Prevent overwriting of pnv_npu2_init_contex() callback parameters
powerpc/powernv/npu: Add lock to prevent race in concurrent context init/destroy
powerpc/powernv/memtrace: Let the arch hotunplug code flush cache
powerpc/mm: Flush cache on memory hot(un)plug
rMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- PSCI selection API, a leftover from 4.16 (for stable)
- Kick vcpu on active interrupt affinity change
- Plug a VMID allocation race on oversubscribed systems
- Silence debug messages
- Update Christoffer's email address (linaro -> arm)
x86:
- Expose userspace-relevant bits of a newly added feature
- Fix TLB flushing on VMX with VPID, but without EPT"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
x86/headers/UAPI: Move DISABLE_EXITS KVM capability bits to the UAPI
kvm: apic: Flush TLB after APIC mode/address change if VPIDs are in use
arm/arm64: KVM: Add PSCI version selection API
KVM: arm/arm64: vgic: Kick new VCPU on interrupt migration
arm64: KVM: Demote SVE and LORegion warnings to debug only
MAINTAINERS: Update e-mail address for Christoffer Dall
KVM: arm/arm64: Close VMID generation race
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Nothing too bad, but the spectre updates to smatch identified a few
places that may need sanitising so we've got those covered.
Details:
- Close some potential spectre-v1 vulnerabilities found by smatch
- Add missing list sentinel for CPUs that don't require KPTI
- Removal of unused 'addr' parameter for I/D cache coherency
- Removal of redundant set_fs(KERNEL_DS) calls in ptrace
- Fix single-stepping state machine handling in response to kernel
traps
- Clang support for 128-bit integers
- Avoid instrumenting our out-of-line atomics in preparation for
enabling LSE atomics by default in 4.18"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: avoid instrumenting atomic_ll_sc.o
KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_mmio_read_apr()
KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_get_irq()
arm64: fix possible spectre-v1 in ptrace_hbp_get_event()
arm64: support __int128 with clang
arm64: only advance singlestep for user instruction traps
arm64/kernel: rename module_emit_adrp_veneer->module_emit_veneer_for_adrp
arm64: ptrace: remove addr_limit manipulation
arm64: mm: drop addr parameter from sync icache and dcache
arm64: add sentinel to kpti_safe_list
Merge tag 'ceph-for-4.17-rc3' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A CephFS quota follow-up and fixes for two older issues in the
messenger layer, marked for stable"
* tag 'ceph-for-4.17-rc3' of git://github.com/ceph/ceph-client:
libceph: validate con->state at the top of try_write()
libceph: reschedule a tick in finish_hunting()
libceph: un-backoff on tick when we have a authenticated session
ceph: check if mds create snaprealm when setting quota
Merge tag 'char-misc-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char and misc driver fixes for 4.17-rc3
A variety of small things that have fallen out after 4.17-rc1 was out.
Some vboxguest fixes for systems with lots of memory, amba bus fixes,
some MAINTAINERS updates, uio_hv_generic driver fixes, and a few other
minor things that resolve problems that people reported.
The amba bus fixes took twice to get right, the first time I messed up
applying the patches in the wrong order, hence the revert and later
addition again with the correct fix, sorry about that.
All of these have been in linux-next with no reported issues"
* tag 'char-misc-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
ARM: amba: Fix race condition with driver_override
ARM: amba: Make driver_override output consistent with other buses
Revert "ARM: amba: Fix race condition with driver_override"
ARM: amba: Don't read past the end of sysfs "driver_override" buffer
ARM: amba: Fix race condition with driver_override
virt: vbox: Log an error when we fail to get the host version
virt: vbox: Use __get_free_pages instead of kmalloc for DMA32 memory
virt: vbox: Add vbg_req_free() helper function
virt: vbox: Move declarations of vboxguest private functions to private header
slimbus: Fix out-of-bounds access in slim_slicesize()
MAINTAINERS: add dri-devel&linaro-mm for Android ION
fpga-manager: altera-ps-spi: preserve nCONFIG state
MAINTAINERS: update my email address
uio_hv_generic: fix subchannel ring mmap
uio_hv_generic: use correct channel in isr
uio_hv_generic: make ring buffer attribute for primary channel
uio_hv_generic: set size of ring buffer attribute
ANDROID: binder: prevent transactions into own process.
Merge tag 'driver-core-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg Kroah-Hartman:
"Here are some small driver core and firmware fixes for 4.17-rc3
There's a kobject WARN() removal to make syzkaller a lot happier about
some "normal" error paths that it keeps hitting, which should reduce
the number of false-positives we have been getting recently.
There's also some fimware test and documentation fixes, and the
coredump() function signature change that needed to happen after -rc1
before drivers started to take advantage of it.
All of these have been in linux-next with no reported issues"
* tag 'driver-core-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
firmware: some documentation fixes
selftests:firmware: fixes a call to a wrong function name
kobject: don't use WARN for registration failures
firmware: Fix firmware documentation for recent file renames
test_firmware: fix setting old custom fw path back on exit, second try
test_firmware: Install all scripts
drivers: change struct device_driver::coredump() return type to void
Merge tag 'tty-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some tty and serial driver fixes for reported issues for
4.17-rc3.
Nothing major, but a number of small things:
- device tree fixes/updates for serial ports
- earlycon fixes
- n_gsm fixes
- tty core change reverted to help resolve syszkaller reports
- other serial driver small fixes
All of these have been in linux-next with no reported issues"
* tag 'tty-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: Use __GFP_NOFAIL for tty_ldisc_get()
tty: serial: xuartps: Setup early console when uartclk is also passed
tty: Don't call panic() at tty_ldisc_init()
tty: Avoid possible error pointer dereference at tty_ldisc_restore().
dt-bindings: mvebu-uart: DT fix s/interrupts-names/interrupt-names/
tty: serial: qcom_geni_serial: Use signed variable to get IRQ
earlycon: Use a pointer table to fix __earlycon_table stride
serial: sh-sci: Document r8a77470 bindings
dt-bindings: meson-uart: DT fix s/clocks-names/clock-names/
serial: imx: fix cached UCR2 read on software reset
serial: imx: warn user when using unsupported configuration
serial: mvebu-uart: Fix local flags handling on termios update
tty: n_gsm: Fix DLCI handling for ADM mode if debug & 2 is not set
tty: n_gsm: Fix long delays with control frame timeouts in ADM mode