Michal Hocko [Thu, 16 Nov 2017 01:33:34 +0000 (17:33 -0800)]
mm, memory_hotplug: do not fail offlining too early
Patch series "mm, memory_hotplug: redefine memory offline retry logic", v2.
While testing memory hotplug on a large 4TB machine we have noticed that
memory offlining is just too eager to fail. The primary reason is that
the retry logic is just too easy to give up. We have 4 ways out of the
offline
- we have a permanent failure (isolation or memory notifiers fail,
or hugetlb pages cannot be dropped)
- userspace sends a signal
- a hardcoded 120s timeout expires
- page migration fails 5 times
This is way too convoluted and it doesn't scale very well. We have seen
both temporary migration failures as well as 120s being triggered.
After removing those restrictions we were able to pass stress testing
during memory hot remove without any other negative side effects
observed. Therefore I suggest dropping both hard coded policies. I
couldn't have found any specific reason for them in the changelog. I
neither didn't get any response [1] from Kamezawa. If we need some
upper bound - e.g. timeout based - then we should have a proper and
user defined policy for that. In any case there should be a clear use
case when introducing it.
This patch (of 2):
Memory offlining can fail too eagerly under heavy memory pressure.
Isolation has failed here because the page is not on LRU. Most probably
because it was on the pcp LRU cache or it has been removed from the LRU
already but it hasn't been freed yet. In both cases the page doesn't
look non-migrable so retrying more makes sense.
__offline_pages seems rather cluttered when it comes to the retry logic.
We have 5 retries at maximum and a timeout. We could argue whether the
timeout makes sense but failing just because of a race when somebody
isoltes a page from LRU or puts it on a pcp LRU lists is just wrong. It
only takes it to race with a process which unmaps some pages and remove
them from the LRU list and we can fail the whole offline because of
something that is a temporary condition and actually not harmful for the
offline.
Please note that unmovable pages should be already excluded during
start_isolate_page_range. We could argue that has_unmovable_pages is
racy and MIGRATE_MOVABLE check doesn't provide any hard guarantee either
but kernel zones (aka < ZONE_MOVABLE) will very likely detect unmovable
pages in most cases and movable zone shouldn't contain unmovable pages
at all. Some of those pages might be pinned but not for ever because
that would be a bug on its own. In any case the context is still
interruptible and so the userspace can easily bail out when the
operation takes too long. This is certainly better behavior than a
hardcoded retry loop which is racy.
Fix this by removing the max retry count and only rely on the timeout
resp. interruption by a signal from the userspace. Also retry rather
than fail when check_pages_isolated sees some !free pages because those
could be a result of the race as well.
Michal Hocko [Thu, 16 Nov 2017 01:33:30 +0000 (17:33 -0800)]
mm, page_alloc: fail has_unmovable_pages when seeing reserved pages
Reserved pages should be completely ignored by the core mm because they
have a special meaning for their owners. has_unmovable_pages doesn't
check those so we rely on other tests (reference count, or PageLRU) to
fail on such pages. Althought this happens to work it is safer to
simply check for those explicitly and do not rely on the owner of the
page to abuse those fields for special purposes.
Please note that this is more of a further fortification of the code
rahter than a fix of an existing issue.
Michal Hocko [Thu, 16 Nov 2017 01:33:26 +0000 (17:33 -0800)]
mm: distinguish CMA and MOVABLE isolation in has_unmovable_pages()
Joonsoo has noticed that "mm: drop migrate type checks from
has_unmovable_pages" would break CMA allocator because it relies on
has_unmovable_pages returning false even for CMA pageblocks which in
fact don't have to be movable:
This is a result of the code sharing between CMA and memory hotplug
while each one has a different idea of what has_unmovable_pages should
return. This is unfortunate but fixing it properly would require a lot
of code duplication.
Fix the issue by introducing the requested migrate type argument and
special case MIGRATE_CMA case where CMA page blocks are handled
properly. This will work for memory hotplug because it requires
MIGRATE_MOVABLE.
Link: http://lkml.kernel.org/r/20171019122118.y6cndierwl2vnguj@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Tested-by: Ran Wang <ran.wang_1@nxp.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Igor Mammedov <imammedo@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current implementation will fail the operation after several failed
page migration attempts but we shouldn't even attempt to migrate that
memory and fail right away because this memory is clearly not
migrateable. This will become a real problem when we drop the retry
loop counter resp. timeout.
The real problem is in has_unmovable_pages in fact. We should fail if
there are any non migrateable pages in the area. In orther to guarantee
that remove the migrate type checks because MIGRATE_MOVABLE is not
guaranteed to contain only migrateable pages. It is merely a heuristic.
Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
allocate any non-migrateable pages from the block but CMA allocations
themselves are unlikely to migrateable. Therefore remove both checks.
[akpm@linux-foundation.org: remove unused local `mt'] Link: http://lkml.kernel.org/r/20171013120013.698-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ran Wang <ran.wang_1@nxp.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tahsin Erdogan [Thu, 16 Nov 2017 01:33:19 +0000 (17:33 -0800)]
mm/page-writeback.c: remove unused parameter from balance_dirty_pages()
"mapping" parameter to balance_dirty_pages() is not used anymore.
Fixes: dfb8ae567835 ("writeback: let balance_dirty_pages() work on the matching cgroup bdi_writeback") Link: http://lkml.kernel.org/r/20170927221311.23263-1-tahsin@google.com Signed-off-by: Tahsin Erdogan <tahsin@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Huang Ying [Thu, 16 Nov 2017 01:33:15 +0000 (17:33 -0800)]
mm, swap: fix false error message in __swp_swapcount()
When a page fault occurs for a swap entry, the physical swap readahead
(not the VMA base swap readahead) may readahead several swap entries
after the fault swap entry. The readahead algorithm calculates some of
the swap entries to readahead via increasing the offset of the fault
swap entry without checking whether they are beyond the end of the swap
device and it relys on the __swp_swapcount() and swapcache_prepare() to
check it. Although __swp_swapcount() checks for the swap entry passed
in, it will complain with the error message as follow for the expected
invalid swap entry. This may make the end users confused.
To fix the false error message, the swap entry checking is added in
swapin_readahead() to avoid to pass the out-of-bound swap entries and
the swap entry reserved for the swap header to __swp_swapcount() and
swapcache_prepare().
Link: http://lkml.kernel.org/r/20171102054225.22897-1-ying.huang@intel.com Fixes: e8c26ab60598 ("mm/swap: skip readahead for unreferenced swap slots") Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Reported-by: Christian Kujau <lists@nerdbynature.de> Acked-by: Minchan Kim <minchan@kernel.org> Suggested-by: Minchan Kim <minchan@kernel.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> [4.11+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Thu, 16 Nov 2017 01:33:11 +0000 (17:33 -0800)]
mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped page has no other reference
When SWP_SYNCHRONOUS_IO swapped-in pages are shared by several
processes, it can cause unnecessary memory wastage by skipping swap
cache. Because, with swapin fault by read, they could share a page if
the page were in swap cache. Thus, it avoids allocating same content
new pages.
This patch makes the swapcache skipping work only if the swap pte is
non-sharable.
[akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1507620825-5537-1-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Thu, 16 Nov 2017 01:33:07 +0000 (17:33 -0800)]
mm, swap: skip swapcache for swapin of synchronous device
With fast swap storage, the platforms want to use swap more aggressively
and swap-in is crucial to application latency.
The rw_page() based synchronous devices like zram, pmem and btt are such
fast storage. When I profile swapin performance with zram lz4
decompress test, S/W overhead is more than 70%. Maybe, it would be
bigger in nvdimm.
This patch aims to reduce swap-in latency by skipping swapcache if the
swap device is synchronous device like rw_page based device. It
enhances 45% my swapin test(5G sequential swapin, no readahead, from
2.41sec to 1.64sec).
Link: http://lkml.kernel.org/r/1505886205-9671-5-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Thu, 16 Nov 2017 01:33:04 +0000 (17:33 -0800)]
mm, swap: introduce SWP_SYNCHRONOUS_IO
If rw-page based fast storage is used for swap devices, we need to
detect it to enhance swap IO operations. This patch is preparation for
optimizing of swap-in operation with next patch.
Link: http://lkml.kernel.org/r/1505886205-9671-4-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
someday we will remove rw_page(). If so, we need something to detect
such super-fast storage on which synchronous IO operations like the
current rw_page are always a win.
Introduces BDI_CAP_SYNCHRONOUS_IO to indicate such devices. With it, we
could use various optimization techniques.
Link: http://lkml.kernel.org/r/1505886205-9671-3-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Thu, 16 Nov 2017 01:32:56 +0000 (17:32 -0800)]
zram: set BDI_CAP_STABLE_WRITES once
With fast swap storage, the platform wants to use swap more aggressively
and swap-in is crucial to application latency.
The rw_page() based synchronous devices like zram, pmem and btt are such
fast storage. When I profile swapin performance with zram lz4
decompress test, S/W overhead is more than 70%. Maybe, it would be
bigger in nvdimm.
This patchset reduces swap-in latency by skipping swapcache if the swap
device is a synchronous device like a rw_page() based device.
It enhances by 45% my swapin test (5G sequential swapin, no readahead)
from 2.41sec to 1.64sec.
This patch (of 4):
Commit 19b7ccf8651d ("block: get rid of blk_integrity_revalidate()")
fixed a weird thing (i.e., reset BDI_CAP_STABLE_WRITES flag
unconditionally whenever revalidat_disk is called) so zram doesn't need
to reset the flag any more when revalidating the bdev. Instead, set the
flag just once when the zram device is created.
It shouldn't change any behavior.
Link: http://lkml.kernel.org/r/1505886205-9671-2-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Hugh Dickins <hughd@google.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
include/linux/slab.h: add kmalloc_array_node() and kcalloc_node()
Patch series "Add kmalloc_array_node() and kcalloc_node()".
Our current memeory allocation routines suffer form an API imbalance,
for one we have kmalloc_array() and kcalloc() which check for overflows
in size multiplication and we have kmalloc_node() and kzalloc_node()
which allow for memory allocation on a certain NUMA node but don't check
for eventual overflows.
This patch (of 6):
We have kmalloc_array() and kcalloc() wrappers on top of kmalloc() which
ensure us overflow free multiplication for the size of a memory
allocation but these implementations are not NUMA-aware.
Likewise we have kmalloc_node() which is a NUMA-aware version of
kmalloc() but the implementation is not aware of any possible overflows
in eventual size calculations.
Introduce a combination of the two above cases to have a NUMA-node aware
version of kmalloc_array() and kcalloc().
Link: http://lkml.kernel.org/r/20170927082038.3782-2-jthumshirn@suse.de Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <cl@linux.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: David Rientjes <rientjes@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Doug Ledford <dledford@redhat.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mike Marciniszyn <infinipath@intel.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com> Cc: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miles Chen [Thu, 16 Nov 2017 01:32:25 +0000 (17:32 -0800)]
slub: fix sysfs duplicate filename creation when slub_debug=O
When slub_debug=O is set. It is possible to clear debug flags for an
"unmergeable" slab cache in kmem_cache_open(). It makes the "unmergeable"
cache became "mergeable" in sysfs_slab_add().
These caches will generate their "unique IDs" by create_unique_id(), but
it is possible to create identical unique IDs. In my experiment,
sgpool-128, names_cache, biovec-256 generate the same ID ":Ft-0004096" and
the kernel reports "sysfs: cannot create duplicate filename
'/kernel/slab/:Ft-0004096'".
To repeat my experiment, set disable_higher_order_debug=1,
CONFIG_SLUB_DEBUG_ON=y in kernel-4.14.
Fix this issue by setting unmergeable=1 if slub_debug=O and the the
default slub_debug contains any no-merge flags.
call path:
kmem_cache_create()
__kmem_cache_alias() -> we set SLAB_NEVER_MERGE flags here
create_cache()
__kmem_cache_create()
kmem_cache_open() -> clear DEBUG_METADATA_FLAGS
sysfs_slab_add() -> the slab cache is mergeable now
sysfs: cannot create duplicate filename '/kernel/slab/:Ft-0004096'
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x60/0x7c
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 4.14.0-rc7ajb-00131-gd4c2e9f-dirty #123
Hardware name: linux,dummy-virt (DT)
task: ffffffc07d4e0080 task.stack: ffffff8008008000
PC is at sysfs_warn_dup+0x60/0x7c
LR is at sysfs_warn_dup+0x60/0x7c
pc : lr : pstate: 60000145
Call trace:
sysfs_warn_dup+0x60/0x7c
sysfs_create_dir_ns+0x98/0xa0
kobject_add_internal+0xa0/0x294
kobject_init_and_add+0x90/0xb4
sysfs_slab_add+0x90/0x200
__kmem_cache_create+0x26c/0x438
kmem_cache_create+0x164/0x1f4
sg_pool_init+0x60/0x100
do_one_initcall+0x38/0x12c
kernel_init_freeable+0x138/0x1d4
kernel_init+0x10/0xfc
ret_from_fork+0x10/0x18
Link: http://lkml.kernel.org/r/1510365805-5155-1-git-send-email-miles.chen@mediatek.com Signed-off-by: Miles Chen <miles.chen@mediatek.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Thu, 16 Nov 2017 01:32:14 +0000 (17:32 -0800)]
mm/slab.c: only set __GFP_RECLAIMABLE once
SLAB_RECLAIM_ACCOUNT is a permanent attribute of a slab cache. Set
__GFP_RECLAIMABLE as part of its ->allocflags rather than check the
cachep flag on every page allocation.
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1710171527560.140898@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miles Chen [Thu, 16 Nov 2017 01:32:10 +0000 (17:32 -0800)]
mm/slob.c: remove an unnecessary check for __GFP_ZERO
Current flow guarantees a valid pointer when handling the __GFP_ZERO
case. So remove the unnecessary NULL pointer check.
Link: http://lkml.kernel.org/r/1507203141-11959-1-git-send-email-miles.chen@mediatek.com Signed-off-by: Miles Chen <miles.chen@mediatek.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Thu, 16 Nov 2017 01:32:07 +0000 (17:32 -0800)]
mm: oom: show unreclaimable slab info when unreclaimable slabs > user memory
The kernel may panic when an oom happens without killable process
sometimes it is caused by huge unreclaimable slabs used by kernel.
Although kdump could help debug such problem, however, kdump is not
available on all architectures and it might be malfunction sometime.
And, since kernel already panic it is worthy capturing such information
in dmesg to aid touble shooting.
Print out unreclaimable slab info (used size and total size) which
actual memory usage is not zero (num_objs * size != 0) when
unreclaimable slabs amount is greater than total user memory (LRU
pages).
Yang Shi [Thu, 16 Nov 2017 01:32:03 +0000 (17:32 -0800)]
mm: slabinfo: remove CONFIG_SLABINFO
According to discussion with Christoph
(https://marc.info/?l=linux-kernel&m=150695909709711&w=2), it sounds like
it is pointless to keep CONFIG_SLABINFO around.
This patch removes the CONFIG_SLABINFO config option, but /proc/slabinfo
is still available.
Yang Shi [Thu, 16 Nov 2017 01:31:59 +0000 (17:31 -0800)]
tools: slabinfo: add "-U" option to show unreclaimable slabs only
Patch series "oom: capture unreclaimable slab info in oom message", v10.
Recently we ran into a oom issue, kernel panic due to no killable
process. The dmesg shows huge unreclaimable slabs used almost 100%
memory, but kdump doesn't capture vmcore due to some reason.
So, it may sound better to capture unreclaimable slab info in oom
message when kernel panic to aid trouble shooting and cover the corner
case. Since kernel already panic, so capturing more information sounds
worthy and doesn't bother normal oom killer.
With the patchset, tools/vm/slabinfo has a new option, "-U", to show
unreclaimable slab only.
And, oom will print all non zero (num_objs * size != 0) unreclaimable
slabs in oom killer message.
This patch (of 3):
Add "-U" option to show unreclaimable slabs only.
"-U" and "-S" together can tell us what unreclaimable slabs use the most
memory to help debug huge unreclaimable slabs issue.
Link: http://lkml.kernel.org/r/1507152550-46205-2-git-send-email-yang.s@alibaba-inc.com Signed-off-by: Yang Shi <yang.s@alibaba-inc.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Changwei Ge [Thu, 16 Nov 2017 01:31:52 +0000 (17:31 -0800)]
ocfs2/dlm: get mle inuse only when it is initialized
When dlm_add_migration_mle returns -EEXIST, previously input mle will
not be initialized. So we can't use its associated dlm object. And we
truly don't need this mle for already launched migration progress, since
oldmle has taken this role.
Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373CED7AA61@H3CMLB14-EX.srv.huawei-3com.com Signed-off-by: Changwei Ge <ge.changwei@h3c.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
alex chen [Thu, 16 Nov 2017 01:31:48 +0000 (17:31 -0800)]
ocfs2: subsystem.su_mutex is required while accessing the item->ci_parent
The subsystem.su_mutex is required while accessing the item->ci_parent,
otherwise, NULL pointer dereference to the item->ci_parent will be
triggered in the following situation:
add node delete node
sys_write
vfs_write
configfs_write_file
o2nm_node_store
o2nm_node_local_write
do_rmdir
vfs_rmdir
configfs_rmdir
mutex_lock(&subsys->su_mutex);
unlink_obj
item->ci_group = NULL;
item->ci_parent = NULL;
to_o2nm_cluster_from_node
node->nd_item.ci_parent->ci_parent
BUG since of NULL pointer dereference to nd_item.ci_parent
Moreover, the o2nm_cluster also should be protected by the
subsystem.su_mutex.
[alex.chen@huawei.com: v2] Link: http://lkml.kernel.org/r/59EEAA69.9080703@huawei.com Link: http://lkml.kernel.org/r/59E9B36A.10700@huawei.com Signed-off-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
alex chen [Thu, 16 Nov 2017 01:31:44 +0000 (17:31 -0800)]
ocfs2: ip_alloc_sem should be taken in ocfs2_get_block()
ip_alloc_sem should be taken in ocfs2_get_block() when reading file in
DIRECT mode to prevent concurrent access to extent tree with
ocfs2_dio_end_io_write(), which may cause BUGON in the following
situation:
read file 'A' end_io of writing file 'A'
vfs_read
__vfs_read
ocfs2_file_read_iter
generic_file_read_iter
ocfs2_direct_IO
__blockdev_direct_IO
do_blockdev_direct_IO
do_direct_IO
get_more_blocks
ocfs2_get_block
ocfs2_extent_map_get_blocks
ocfs2_get_clusters
ocfs2_get_clusters_nocache()
ocfs2_search_extent_list
return the index of record which
contains the v_cluster, that is
v_cluster > rec[i]->e_cpos.
ocfs2_dio_end_io
ocfs2_dio_end_io_write
down_write(&oi->ip_alloc_sem);
ocfs2_mark_extent_written
ocfs2_change_extent_flag
ocfs2_split_extent
...
--> modify the rec[i]->e_cpos, resulting
in v_cluster < rec[i]->e_cpos.
BUG_ON(v_cluster < le32_to_cpu(rec->e_cpos))
[alex.chen@huawei.com: v3] Link: http://lkml.kernel.org/r/59EF3614.6050008@huawei.com Link: http://lkml.kernel.org/r/59EF3614.6050008@huawei.com Fixes: c15471f79506 ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Reviewed-by: Gang He <ghe@suse.com> Acked-by: Changwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
alex chen [Thu, 16 Nov 2017 01:31:40 +0000 (17:31 -0800)]
ocfs2: should wait dio before inode lock in ocfs2_setattr()
we should wait dio requests to finish before inode lock in
ocfs2_setattr(), otherwise the following deadlock will happen:
process 1 process 2 process 3
truncate file 'A' end_io of writing file 'A' receiving the bast messages
ocfs2_setattr
ocfs2_inode_lock_tracker
ocfs2_inode_lock_full
inode_dio_wait
__inode_dio_wait
-->waiting for all dio
requests finish
dlm_proxy_ast_handler
dlm_do_local_bast
ocfs2_blocking_ast
ocfs2_generic_handle_bast
set OCFS2_LOCK_BLOCKED flag
dio_end_io
dio_bio_end_aio
dio_complete
ocfs2_dio_end_io
ocfs2_dio_end_io_write
ocfs2_inode_lock
__ocfs2_cluster_lock
ocfs2_wait_for_mask
-->waiting for OCFS2_LOCK_BLOCKED
flag to be cleared, that is waiting
for 'process 1' unlocking the inode lock
inode_dio_end
-->here dec the i_dio_count, but will never
be called, so a deadlock happened.
Link: http://lkml.kernel.org/r/59F81636.70508@huawei.com Signed-off-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Acked-by: Changwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
piaojun [Thu, 16 Nov 2017 01:31:37 +0000 (17:31 -0800)]
ocfs2: clean up some unused function declarations
Link: http://lkml.kernel.org/r/59C5D7D6.9050106@huawei.com Signed-off-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Alex Chen <alex.chen@huawei.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Changwei Ge [Thu, 16 Nov 2017 01:31:33 +0000 (17:31 -0800)]
ocfs2: fix cluster hang after a node dies
When a node dies, other live nodes have to choose a new master for an
existed lock resource mastered by the dead node.
As for ocfs2/dlm implementation, this is done by function -
dlm_move_lockres_to_recovery_list which marks those lock rsources as
DLM_LOCK_RES_RECOVERING and manages them via a list from which DLM
changes lock resource's master later.
So without invoking dlm_move_lockres_to_recovery_list, no master will be
choosed after dlm recovery accomplishment since no lock resource can be
found through ::resource list.
What's worse is that if DLM_LOCK_RES_RECOVERING is not marked for lock
resources mastered a dead node, it will break up synchronization among
nodes.
So invoke dlm_move_lockres_to_recovery_list again.
Fixs: 'commit ee8f7fcbe638 ("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down")' Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373CED6E0F9@H3CMLB14-EX.srv.huawei-3com.com Signed-off-by: Changwei Ge <ge.changwei@h3c.com> Reported-by: Vitaly Mayatskih <v.mayatskih@gmail.com> Tested-by: Vitaly Mayatskikh <v.mayatskih@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
piaojun [Thu, 16 Nov 2017 01:31:29 +0000 (17:31 -0800)]
ocfs2: cleanup unused func declaration and assignment
Link: http://lkml.kernel.org/r/59E064BB.8000005@huawei.com Signed-off-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
piaojun [Thu, 16 Nov 2017 01:31:25 +0000 (17:31 -0800)]
ocfs2: no need flush workqueue before destroying it
destroy_workqueue() will do flushing work for us.
Link: http://lkml.kernel.org/r/59E06476.3090502@huawei.com Signed-off-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The m32r Kconfig provides both CPU_BIG_ENDIAN and CPU_LITTLE_ENDIAN
configuration options. As they are user-selectable and independent,
this allows invalid configurations:
- All m32r defconfigs build a big endian kernel, but CPU_BIG_ENDIAN is
not set, causing compiler warnings like:
- Since commit 5bdfca6435b82944 ("m32r: define CPU_BIG_ENDIAN"),
building an allmodconfig or allyesconfig enables both
CONFIG_CPU_BIG_ENDIAN and CONFIG_CPU_LITTLE_ENDIAN.
While this did get rid of the warning above, both options are
obviously mutually exclusive.
Fix this by making only CPU_LITTLE_ENDIAN configurable by the user, as
before, and by making sure exactly one of CPU_BIG_ENDIAN and
CPU_LITTLE_ENDIAN is always enabled.
Linus Torvalds [Wed, 15 Nov 2017 23:12:28 +0000 (15:12 -0800)]
Merge tag 'ipmi-for-4.15' of git://github.com/cminyard/linux-ipmi
Pull IPMI updates from Corey Minyard:
"This is a fairly large rework of the IPMI code, along with a bunch of
smaller fixes. The major changes have been in the next tree for a
couple of months, so they should be good to do in.
- Some users had IPMI systems where the GUID of the IPMI controller
could change. So rescanning of the GUID was added. The naming of
some sysfs things was dependent on the GUID, however, so this
resulted in the sysfs interface code in IPMI changing to remove
that dependency and name the IPMI BMCs like other sysfs devices.
- The ipmi_si_intf.c code was fairly bloated with all the different
discovery methods (PCI, ACPI, SMBIOS, OF, platform, module
parameters, hot add). The structure of how the interfaces were
added was redone to make them more modular, then the individual
methods were pulled out into their own files"
* tag 'ipmi-for-4.15' of git://github.com/cminyard/linux-ipmi: (48 commits)
ipmi_si: Delete an error message for a failed memory allocation in try_smi_init()
ipmi_si: fix memory leak on new_smi
ipmi: remove redundant initialization of bmc
ipmi: pr_err() strings should end with newlines
ipmi: Clean up some print operations
ipmi: Make the DMI probe into a generic platform probe
ipmi: Make the IPMI proc interface configurable
ipmi_ssif: Add device attrs for the things in proc
ipmi_si: Add device attrs for the things in proc
ipmi_si: remove ipmi_smi_alloc() function
ipmi_si: Move port and mem I/O handling to their own files
ipmi_si: Get rid of unused spacing and port fields
ipmi_si: Move PARISC handling to another file
ipmi_si: Move PCI setup to another file
ipmi_si: Move platform device handling to another file
ipmi_si: Move hardcode handling to a separate file.
ipmi_si: Move the hotmod handling to another file.
ipmi_si: Change ipmi_si_add_smi() to take just I/O info
ipmi_si: Move io setup into io structure
ipmi_si: Move irq setup handling into the io struct
...
- support MSI on Tango host controller (Marc Gonzalez)
- support Tegra186 PCIe host controller (Manikanta Maddireddy)
- use generic accessors on Tegra when possible (Thierry Reding)
- support V3 Semiconductor PCI host controller (Linus Walleij)
* tag 'pci-v4.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (85 commits)
PCI/ASPM: Add L1 Substates definitions
PCI/ASPM: Reformat ASPM register definitions
PCI/ASPM: Use correct capability pointer to program LTR_L1.2_THRESHOLD
PCI/ASPM: Account for downstream device's Port Common_Mode_Restore_Time
PCI: xgene: Rename xgene_pcie_probe_bridge() to xgene_pcie_probe()
PCI: xilinx: Rename xilinx_pcie_link_is_up() to xilinx_pcie_link_up()
PCI: altera: Rename altera_pcie_link_is_up() to altera_pcie_link_up()
PCI: Fix kernel-doc build warning
PCI: Fail pci_map_rom() if the option ROM is invalid
PCI: Move pci_map_rom() error path
PCI: Move PCI_QUIRKS to the PCI bus menu
alpha/PCI: Make pdev_save_srm_config() static
PCI: Remove unused declarations
PCI: Remove redundant pci_dev, pci_bus, resource declarations
PCI: Remove redundant pcibios_set_master() declarations
PCI/PME: Handle invalid data when reading Root Status
PCI: hv: Use effective affinity mask
PCI: pciehp: Do not clear Presence Detect Changed during initialization
PCI: pciehp: Fix race condition handling surprise link down
PCI: Distribute available resources to hotplug-capable bridges
...
Linus Torvalds [Wed, 15 Nov 2017 22:54:53 +0000 (14:54 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma updates from Doug Ledford:
"This is a fairly plain pull request. Lots of driver updates across the
stack, a huge number of static analysis cleanups including a close to
50 patch series from Bart Van Assche, and a number of new features
inside the stack such as general CQ moderation support.
Nothing really stands out, but there might be a few conflicts as you
take things in. In particular, the cleanups touched some of the same
lines as the new timer_setup changes.
Everything in this pull request has been through 0day and at least two
days of linux-next (since Stephen doesn't necessarily flag new
errors/warnings until day2). A few more items (about 30 patches) from
Intel and Mellanox showed up on the list on Tuesday. I've excluded
those from this pull request, and I'm sure some of them qualify as
fixes suitable to send any time, but I still have to review them
fully. If they contain mostly fixes and little or no new development,
then I will probably send them through by the end of the week just to
get them out of the way.
There was a break in my acceptance of patches which coincides with the
computer problems I had, and then when I got things mostly back under
control I had a backlog of patches to process, which I did mostly last
Friday and Monday. So there is a larger number of patches processed in
that timeframe than I was striving for.
Summary:
- Add iWARP support to qedr driver
- Lots of misc fixes across subsystem
- Multiple update series to hns roce driver
- Multiple update series to hfi1 driver
- Updates to vnic driver
- Add kref to wait struct in cxgb4 driver
- Updates to i40iw driver
- Mellanox shared pull request
- timer_setup changes
- massive cleanup series from Bart Van Assche
- Two series of SRP/SRPT changes from Bart Van Assche
- Core updates from Mellanox
- i40iw updates
- IPoIB updates
- mlx5 updates
- mlx4 updates
- hns updates
- bnxt_re fixes
- PCI write padding support
- Sparse/Smatch/warning cleanups/fixes
- CQ moderation support
- SRQ support in vmw_pvrdma"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (296 commits)
RDMA/core: Rename kernel modify_cq to better describe its usage
IB/mlx5: Add CQ moderation capability to query_device
IB/mlx4: Add CQ moderation capability to query_device
IB/uverbs: Add CQ moderation capability to query_device
IB/mlx5: Exposing modify CQ callback to uverbs layer
IB/mlx4: Exposing modify CQ callback to uverbs layer
IB/uverbs: Allow CQ moderation with modify CQ
iw_cxgb4: atomically flush the qp
iw_cxgb4: only call the cq comp_handler when the cq is armed
iw_cxgb4: Fix possible circular dependency locking warning
RDMA/bnxt_re: report vlan_id and sl in qp1 recv completion
IB/core: Only maintain real QPs in the security lists
IB/ocrdma_hw: remove unnecessary code in ocrdma_mbx_dealloc_lkey
RDMA/core: Make function rdma_copy_addr return void
RDMA/vmw_pvrdma: Add shared receive queue support
RDMA/core: avoid uninitialized variable warning in create_udata
RDMA/bnxt_re: synchronize poll_cq and req_notify_cq verbs
RDMA/bnxt_re: Flush CQ notification Work Queue before destroying QP
RDMA/bnxt_re: Set QP state in case of response completion errors
RDMA/bnxt_re: Add memory barriers when processing CQ/EQ entries
...
Linus Torvalds [Wed, 15 Nov 2017 22:29:44 +0000 (14:29 -0800)]
Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
"Cgroup2 cpu controller support is finally merged.
- Basic cpu statistics support to allow monitoring by default without
the CPU controller enabled.
- cgroup2 cpu controller support.
- /sys/kernel/cgroup files to help dealing with new / optional
features"
* 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: export list of cgroups v2 features using sysfs
cgroup: export list of delegatable control files using sysfs
cgroup: mark @cgrp __maybe_unused in cpu_stat_show()
MAINTAINERS: relocate cpuset.c
cgroup, sched: Move basic cpu stats from cgroup.stat to cpu.stat
sched: Implement interface for cgroup unified hierarchy
sched: Misc preps for cgroup unified hierarchy interface
sched/cputime: Add dummy cputime_adjust() implementation for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
cgroup: statically initialize init_css_set->dfl_cgrp
cgroup: Implement cgroup2 basic CPU usage accounting
cpuacct: Introduce cgroup_account_cputime[_field]()
sched/cputime: Expose cputime_adjust()
Linus Torvalds [Wed, 15 Nov 2017 22:17:11 +0000 (14:17 -0800)]
Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu update from Tejun Heo:
"Another minor pull request. It only contains one commit which can
reclaim a bit of memory wasted during boot on UP"
* 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: don't forget to free the temporary struct pcpu_alloc_info
Linus Torvalds [Wed, 15 Nov 2017 22:15:21 +0000 (14:15 -0800)]
Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
"There was a commit to make unbound kworkers respect cpu isolation but
it conflicted with the restructuring of cpu isolation and got
reverted, so the only thing left is the trivial comment fix.
Will retry the cpu isolation change after this merge window"
* 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Fix comment for unbound workqueue's attrbutes
Revert "workqueue: respect isolated cpus when queueing an unbound work"
workqueue: respect isolated cpus when queueing an unbound work
Linus Torvalds [Wed, 15 Nov 2017 22:11:41 +0000 (14:11 -0800)]
Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata updates from Tejun Heo:
"Nothing too interesting or alarming. Other than a new power saving
mode addition to ahci and crash fix on a tracepoint, all changes are
trivial or device-specific"
* 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (22 commits)
ahci: imx: Handle increased read failures for IMX53 temperature sensor in low frequency mode.
ata: sata_dwc_460ex: Propagate platform device ID to DMA driver
ata: fixes kernel crash while tracing ata_eh_link_autopsy event
ata: pata_pdc2027x: Fix space before '[' error.
libata: fix spelling mistake: 'ambigious' -> 'ambiguous'
ata: ceva: Add SMMU support for SATA IP
ata: ceva: Correct the suspend and resume logic for SATA
ata: ceva: Correct the AXI bus configuration for SATA ports
ata: ceva: Add CCI support for SATA if CCI is enabled
ata: ceva: Make RxWaterMark value as module parameter
ata: ceva: Disable Device Sleep capability
ata: ceva: Add gen 3 mode support in driver
ata: ceva: Move sata port phy oob settings to device-tree
devicetree: bindings: Add sata port phy config parameters in ahci-ceva
ata: mark expected switch fall-throughs
ata: sata_mv: remove a redundant assignment to pointer ehi
ahci: Add support for Cavium's fifth generation SATA controller
ata: sata_rcar: Use of_device_get_match_data() helper
libata: make ata_port_type const
libata: make static arrays const, reduces object code size
...
Linus Torvalds [Wed, 15 Nov 2017 21:46:33 +0000 (13:46 -0800)]
Merge tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull module updates from Jessica Yu:
"Summary of modules changes for the 4.15 merge window:
- treewide module_param_call() cleanup, fix up set/get function
prototype mismatches, from Kees Cook
- minor code cleanups"
* tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: Do not paper over type mismatches in module_param_call()
treewide: Fix function prototypes for module_param_call()
module: Prepare to convert all module_param_call() prototypes
kernel/module: Delete an error message for a failed memory allocation in add_module_usage()
Linus Torvalds [Wed, 15 Nov 2017 21:39:18 +0000 (13:39 -0800)]
Merge tag 'mailbox-v4.15' of git://git.linaro.org/landing-teams/working/fujitsu/integration
Pull mailbox updates from Jassi Brar:
"Change to POLL api and fixes for FlexRM and OMAP driver.
Summary:
- Core: Prefer ACK method over POLL, if both supported
- Test: use flag instead of special character
- FlexRM: Usual driver internal minor churn
- Omap: fix error path"
* tag 'mailbox-v4.15' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
mailbox/omap: unregister mbox class
mailbox: mailbox-test: don't rely on rx_buffer content to signal data ready
mailbox: reset txdone_method TXDONE_BY_POLL if client knows_txdone
mailbox: Build Broadcom FlexRM driver as loadable module for iProc SOCs
mailbox: bcm-flexrm-mailbox: Use common GPL comment header
mailbox: bcm-flexrm-mailbox: add depends on ARCH_BCM_IPROC
mailbox: bcm-flexrm-mailbox: Print ring number in errors and warnings
mailbox: bcm-flexrm-mailbox: Fix FlexRM ring flush sequence
Linus Torvalds [Wed, 15 Nov 2017 21:35:43 +0000 (13:35 -0800)]
Merge tag 'hsi-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi
Pull HSI updates from Sebastian Reichel:
- add HSI OMAP4 bindings
- misc small fixes
* tag 'hsi-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi:
dt-bindings: hsi: add omap4 hsi controller bindings
HSI: hsi_char: pr_err() strings should end with newlines
HSI: omap_ssi_core: fix kilo to be "k" not "K"
Linus Torvalds [Wed, 15 Nov 2017 21:32:56 +0000 (13:32 -0800)]
Merge tag 'selinux-pr-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull SELinux updates from Paul Moore:
"Seven SELinux patches for v4.15, although five of the seven are small
build fixes and cleanups.
Of the remaining two patches, the only one worth really calling out is
Eric's fix for the SELinux filesystem xattr set/remove code; the other
patch simply converts the SELinux hash table implementation to use
kmem_cache.
Eric's setxattr/removexattr tweak converts SELinux back to calling the
commoncap implementations when the xattr is not SELinux related. The
immediate win is to fixup filesystem capabilities in user namespaces,
but it makes things a bit saner overall; more information in the
commit description"
* tag 'selinux-pr-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: remove extraneous initialization of slots_used and max_chain_len
selinux: remove redundant assignment to len
selinux: remove redundant assignment to str
selinux: fix build warning
selinux: fix build warning by removing the unused sid variable
selinux: Perform both commoncap and selinux xattr checks
selinux: Use kmem_cache for hashtab_node
Linus Torvalds [Wed, 15 Nov 2017 21:28:48 +0000 (13:28 -0800)]
Merge tag 'audit-pr-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"Another relatively small pull request for audit, nine patches total.
The only real new bit of functionality is the patch from Richard which
adds the ability to filter records based on the filesystem type.
The remainder are bug fixes and cleanups; the bug fix highlights
include:
- ensuring that we properly audit init/PID-1 (me)
- allowing the audit daemon to shutdown the kernel/auditd connection
cleanly by setting the audit PID to zero (Steve)"
* tag 'audit-pr-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: filter PATH records keyed on filesystem magic
Audit: remove unused audit_log_secctx function
audit: Allow auditd to set pid to 0 to end auditing
audit: Add new syscalls to the perm=w filter
audit: use audit_set_enabled() in audit_enable()
audit: convert audit_ever_enabled to a boolean
audit: don't use simple_strtol() anymore
audit: initialize the audit subsystem as early as possible
audit: ensure that 'audit=1' actually enables audit for PID 1
Jann Horn [Tue, 14 Nov 2017 00:03:44 +0000 (01:03 +0100)]
mm/pagewalk.c: report holes in hugetlb ranges
This matters at least for the mincore syscall, which will otherwise copy
uninitialized memory from the page allocator to userspace. It is
probably also a correctness error for /proc/$pid/pagemap, but I haven't
tested that.
Removing the `walk->hugetlb_entry` condition in walk_hugetlb_range() has
no effect because the caller already checks for that.
This only reports holes in hugetlb ranges to callers who have specified
a hugetlb_entry callback.
This issue was found using an AFL-based fuzzer.
v2:
- don't crash on ->pte_hole==NULL (Andrew Morton)
- add Cc stable (Andrew Morton)
Pull networking updates from David Miller:
"Highlights:
1) Maintain the TCP retransmit queue using an rbtree, with 1GB
windows at 100Gb this really has become necessary. From Eric
Dumazet.
2) Multi-program support for cgroup+bpf, from Alexei Starovoitov.
3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew
Lunn.
4) Add meter action support to openvswitch, from Andy Zhou.
5) Add a data meta pointer for BPF accessible packets, from Daniel
Borkmann.
6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet.
7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli.
8) More work to move the RTNL mutex down, from Florian Westphal.
9) Add 'bpftool' utility, to help with bpf program introspection.
From Jakub Kicinski.
10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper
Dangaard Brouer.
11) Support 'blocks' of transformations in the packet scheduler which
can span multiple network devices, from Jiri Pirko.
12) TC flower offload support in cxgb4, from Kumar Sanghvi.
13) Priority based stream scheduler for SCTP, from Marcelo Ricardo
Leitner.
14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg.
15) Add RED qdisc offloadability, and use it in mlxsw driver. From
Nogah Frankel.
16) eBPF based device controller for cgroup v2, from Roman Gushchin.
17) Add some fundamental tracepoints for TCP, from Song Liu.
18) Remove garbage collection from ipv6 route layer, this is a
significant accomplishment. From Wei Wang.
19) Add multicast route offload support to mlxsw, from Yotam Gigi"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits)
tcp: highest_sack fix
geneve: fix fill_info when link down
bpf: fix lockdep splat
net: cdc_ncm: GetNtbFormat endian fix
openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start
netem: remove unnecessary 64 bit modulus
netem: use 64 bit divide by rate
tcp: Namespace-ify sysctl_tcp_default_congestion_control
net: Protect iterations over net::fib_notifier_ops in fib_seq_sum()
ipv6: set all.accept_dad to 0 by default
uapi: fix linux/tls.h userspace compilation error
usbnet: ipheth: prevent TX queue timeouts when device not ready
vhost_net: conditionally enable tx polling
uapi: fix linux/rxrpc.h userspace compilation errors
net: stmmac: fix LPI transitioning for dwmac4
atm: horizon: Fix irq release error
net-sysfs: trigger netlink notification on ifalias change via sysfs
openvswitch: Using kfree_rcu() to simplify the code
openvswitch: Make local function ovs_nsh_key_attr_size() static
openvswitch: Fix return value check in ovs_meter_cmd_features()
...
Linus Torvalds [Wed, 15 Nov 2017 19:36:08 +0000 (11:36 -0800)]
Merge tag 'mips_4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips
Pull MIPS updates from James Hogan:
"These are the main MIPS changes for 4.15.
Fixes:
- ralink: Fix MT7620 PCI build issues (4.5)
- Disable cmpxchg64() and HAVE_VIRT_CPU_ACCOUNTING_GEN for 32-bit SMP
(4.1)
- Fix MIPS64 FP save/restore on 32-bit kernels (4.0)
- ptrace: Pick up ptrace/seccomp changed syscall numbers (3.19)
- ralink: Fix MT7628 pinmux (3.19)
- BCM47XX: Fix LED inversion on WRT54GSv1 (3.17)
- Fix n32 core dumping as o32 since regset support (3.13)
- ralink: Drop obsolete USB_ARCH_HAS_HCD select
Build system:
- Default to "generic" (multiplatform) system type instead of IP22
- Use generic little endian MIPS32 r2 configuration as default
defconfig instead of ip22_defconfig
FPU emulation:
- Fix exception generation for certain R6 FPU instructions
SMP:
- Allow __cpu_number_map to be larger than NR_CPUS for sparse CPU id
spaces
Miscellaneous:
- Add iomem resource for kernel bss section for kexec/kdump
- Atomics: Nudge writes on bit unlock
- DT files: Standardise "ok" -> "okay"
Minor cleanups:
- Define virt_to_pfn()
- Make thread_saved_pc static
- Simplify 32-bit sign extension in __read_64bit_c0_split()
- DMA: Use vma_pages() helper
- FPU emulation: Replace unsigned with unsigned int
- MM: Removed unused lastpfn
- Alchemy: Make clk_ops const
- Lasat: Use setup_timer() helper
- ralink: Use BIT() in MT7620 PCI driver
Platform support:
BMIPS:
- Enable HARDIRQS_SW_RESEND
Broadcom BCM63XX:
- Add clkdev lookup support
- Update clk driver, UART driver, DTs to handle named refclk from DTs
- Split apart various clocks to more closely match hardware
- Add ethernet clocks
Cavium Octeon:
- Remove usage of cvmx_wait() in favour of __delay()
ImgTec Pistachio:
- DT: Drop deprecated dwmmc num-slots property
Ingenic JZ4780:
- Add NFS root to Ci20 defconfig
- Add watchdog to Ci20 DT & defconfig, and allow building of watchdog
driver with this SoC
Generic (multiplatform):
- Migrate xilfpga (MIPSfpga) platform to the generic platform
Lantiq xway:
- Fix ASC0/ASC1 clocks"
* tag 'mips_4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips: (46 commits)
MIPS: Add iomem resource for kernel bss section.
MIPS: cmpxchg64() and HAVE_VIRT_CPU_ACCOUNTING_GEN don't work for 32-bit SMP
MIPS: BMIPS: Enable HARDIRQS_SW_RESEND
MIPS: pci: Make use of the BIT() macro inside the mt7620 driver
MIPS: pci: Remove KERN_WARN instance inside the mt7620 driver
MIPS: pci: Remove duplicate define in mt7620 driver
MIPS: ralink: Fix typo in mt7628 pinmux function
MIPS: ralink: Fix MT7628 pinmux
MIPS: Fix odd fp register warnings with MIPS64r2
watchdog: jz4780: Allow selection of jz4740-wdt driver
MIPS/ptrace: Update syscall nr on register changes
MIPS/ptrace: Pick up ptrace/seccomp changed syscalls
MIPS: Fix an n32 core file generation regset support regression
MIPS: Fix MIPS64 FP save/restore on 32-bit kernels
MIPS: page.h: Define virt_to_pfn()
MIPS: Xilfpga: Switch to using generic defconfigs
MIPS: generic: Add support for MIPSfpga
MIPS: Set defconfig target to a generic system for 32r2el
MIPS: Kconfig: Set default MIPS system type as generic
MIPS: DTS: Remove num-slots from Pistachio SoC
...
Linus Torvalds [Wed, 15 Nov 2017 18:56:56 +0000 (10:56 -0800)]
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"The big highlight is support for the Scalable Vector Extension (SVE)
which required extensive ABI work to ensure we don't break existing
applications by blowing away their signal stack with the rather large
new vector context (<= 2 kbit per vector register). There's further
work to be done optimising things like exception return, but the ABI
is solid now.
Much of the line count comes from some new PMU drivers we have, but
they're pretty self-contained and I suspect we'll have more of them in
future.
Plenty of acronym soup here:
- initial support for the Scalable Vector Extension (SVE)
- improved handling for SError interrupts (required to handle RAS
events)
- enable GCC support for 128-bit integer types
- remove kernel text addresses from backtraces and register dumps
- use of WFE to implement long delay()s
- ACPI IORT updates from Lorenzo Pieralisi
- perf PMU driver for the Statistical Profiling Extension (SPE)
- perf PMU driver for Hisilicon's system PMUs
- misc cleanups and non-critical fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (97 commits)
arm64: Make ARMV8_DEPRECATED depend on SYSCTL
arm64: Implement __lshrti3 library function
arm64: support __int128 on gcc 5+
arm64/sve: Add documentation
arm64/sve: Detect SVE and activate runtime support
arm64/sve: KVM: Hide SVE from CPU features exposed to guests
arm64/sve: KVM: Treat guest SVE use as undefined instruction execution
arm64/sve: KVM: Prevent guests from using SVE
arm64/sve: Add sysctl to set the default vector length for new processes
arm64/sve: Add prctl controls for userspace vector length management
arm64/sve: ptrace and ELF coredump support
arm64/sve: Preserve SVE registers around EFI runtime service calls
arm64/sve: Preserve SVE registers around kernel-mode NEON use
arm64/sve: Probe SVE capabilities and usable vector lengths
arm64: cpufeature: Move sys_caps_initialised declarations
arm64/sve: Backend logic for setting the vector length
arm64/sve: Signal handling support
arm64/sve: Support vector length resetting for new processes
arm64/sve: Core task context handling
arm64/sve: Low-level CPU setup
...
Linus Torvalds [Wed, 15 Nov 2017 18:49:15 +0000 (10:49 -0800)]
Merge tag 'riscv-for-linus-4.15-arch-v9-premerge' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux
Pull RISC-V architecture support from Palmer Dabbelt:
"This contains the core RISC-V Linux port, which has been through nine
rounds of review on various mailing lists. The port is not complete:
there's some cleanup patches moving through the review process, a
whole bunch of drivers that need some work, and a lot of feature
additions that will be needed.
The patches contained in this tag have been through nine rounds of
review on the various mailing lists. I have some outstanding cleanup
patches, but since there's been so much review on these patches I
thought it would be best to submit them as-is and then submit explicit
cleanup patches so everyone can review them. This first patch set is
big enough that it's a bit of a pain to constantly rewrite, and it's
caused a few headaches with various contributors.
The port is definately a work in progress. While what's there builds
and boots with 4.14, it's a bit hard to actually see anything happen
because there are no device drivers yet. I maintain a staging branch
that contains all the device drivers and cleanup that actually works,
but those patches won't all be ready for a while. I'd like to get what
we currently have into your tree so everyone can start working from a
single base -- of particular importance is allowing the glibc
upstreaming process to proceed so we can sort out any possibly
lingering user-visible ABI problems we might have.
Copied below is the ChangeLog that contains the history of this patch
set:
(v9) As per suggestions on our v8 patch set, I've split the core
architecture code out from our drivers and would like to submit
this patch set to be included into linux-next, with the goal
being to be merged in during the next merge window. This patch
set is based on 4.14-rc2, but if it's better to have it based on
something else then I can change it around.
This patch set contains just the core arch code for RISC-V, so
while it builds an nominally boots, you can't print or take an
interrupt so it's not that useful. If you're looking to actually
boot a system it would probably be better to use the full patch
set listed below.
We've collected a handful of tags from reviewers, and the
remainder of the patch set only got minimal feedback last time.
Here's what changed:
- We now use the device tree to initialize the timer driver so
it's less tighly coupled with the arch port.
- I cleaned up the defconfigs -- there's actually now just one,
and it's empty. For now I think we're OK with what the kernel
sets as defaults, but I anticipate we'll begin to expand this
as people start to use the port more.
- The VDSO symbols version is sane.
- We WFI while spinning in the boot loop.
- A handful of comments have been added.
While there are still a handful of FIXMEs in this patch set,
we've started to get enough interest from various users and
contributors that maintaining an out of tree patch set is
starting to become a big burden. Hopefully the patches are good
enough to merge now, which will at least get everyone working in
a more reasonable manner as we clean up the remaining issues.
(v8) I know it may not be the ideal time to submit a patch set right
now, as it's the middle of the merge window, but things have
calmed down quite a bit in the last month so I thought it would
be good to get everyone on the same page. There's been a handful
of changes since the last patch set, but most of them are fairly
minor:
- We changed PAGE_OFFSET to allowing mapping more physical
memory on 64-bit systems. This is user configurable, as it
triggers a different code model that generates slightly less
efficient code.
- The device tree binding documentation is back, I'd managed to
lose it at some point.
- We now pass the atomic64 test suite
- The SBI timer driver has been refactored.
(v7) It's been a while since my last patch set, but the changes han
been fairly minimal:
- The PCI cleanup patches have been dropped, we'll do them as a
separate patch set later.
- We've the Kconfig entries from CONFIG_ISA_* to
CONFIG_RISCV_ISA_*, to make grep easier.
- There have been a handful of memory model related tweaks in
I/O land, particularly relating the PCI and the upcoming
platform specification. There are significant comments in the
relevant files. This is still a WIP, but I think we're close
to getting as good as we're going to get until we end up with
some more specifications.
(v6) As it's been only a day since the v5 patch set, the changes are
pretty minimal:
- The patch set is now based on linux-next/master, which I
believe is a better base now that we're getting closer to
upstream.
- EARLY_PRINTK is no longer an option. Since the SBI console is
reasonable, there's no penalty to enabling it (and thus no
benefit to disabling it).
- The mmap syscalls were refactored a bit.
(v5) Things have really started to calm down, so this is fairly
similar to the v4 patch set. The most interesting changes
include:
- We've moved back to a single patch set.
- SMP support has been fixed, I was accidentally running on a
non-SMP configuration. There were various mistakes all over
the tree as a result of this.
- The cmpxchg syscalls have been removed, as they were deemed a
bad idea. As a result, RISC-V Linux systems mandate the A
extension. The corresponding Kconfig entry to enable builds
on non-A systems has been removed.
- A few more atomic fixes: mostly fence changes, but those
resulted in a handful of additional macros that were no
longer necessary.
- riscv_early_sie has been removed.
(v4) There have only been a few changes since the v3 patch set:
- The cmpxchg64 syscall is no longer enabled on 32-bit systems.
It's not possible to provide this on SMP systems, and it's
not necessary as glibc knows not to call it.
- We provide a ELF_HWCAP so users can determine the ISA of the
machine the kernel is running on.
- The multi-line comments are in a better form.
- There were a handful of headers that could be replaced with
the asm-generic versions, and a few unnecessary definitions.
- We no longer use printk, but instead use pr_*.
- A few Kconfig and defconfig entries have been cleaned up.
(v3) A highlight of the changes since the v2 patch set includes:
- We've split out all our drivers into separate patch sets,
which I've already sent out to the relevant maintainers. I
haven't included those patches in this patch set, but some of
them are necessary to build our port.
- The patch set is now split up differently: rather than being
split per directory it is split per topic. Hopefully this
will make it easier to review the port on the mailing list.
The split is a bit rough, so you probably still want to look
at the patch set as a whole.
- atomic.h has been completely rewritten and is hopefully now
correct. I've attempted to sanitize the various other memory
model related code as well, and I think it should all be sane
now aside from a handful of FIXMEs commented in the code.
- We've changed the cmpexchg syscall to always exist and to not
be multiplexed. There is also a VDSO entry for compare and
exchange, which allows kernels with the A extension to
execute user code without the A extension reasonably fast.
- Our user-visible register state now contains enough space for
the Q extension for 128-bit floating point, as well as a few
words to allow extensibility to future ISA extensions like
the eventual V extension for vectors.
- A handful of driver cleanups, but these have been split into
separate patch sets now so I won't duplicate them here.
(v2) A highlight of the changes since the v1 patch set includes:
- We've split out our drivers into the right places, which
means now there's a lot more patches. I'll be submitting
these patches to various subsystem maintainers and including
them in any future RISC-V patch sets until they've been
merged.
- The SBI console driver has been completely rewritten to use
the HVC helpers and is now significantly smaller.
- We've begun to use weaker barriers as opposed to just the big
"fence". There's still some work to do here, specifically:
- We need fences in the relaxed MMIO functions.
- The non-relaxed MMIO functions are missing R/W bits on their fences.
- Many AMOs need the aq and rl bits set.
- We now have thread_info in task_struct. As a result, sscratch
now contains TP instead of SP. This was necessary because
thread_info is no longer on the stack.
- A few shared routines have been added that we use instead of
creating another arch copy"
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
* tag 'riscv-for-linus-4.15-arch-v9-premerge' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux:
RISC-V: Build Infrastructure
RISC-V: User-facing API
RISC-V: Paging and MMU
RISC-V: Device, timer, IRQs, and the SBI
RISC-V: Task implementation
RISC-V: ELF and module implementation
RISC-V: Generic library routines and assembly
RISC-V: Atomic and Locking Code
RISC-V: Init and Halt Code
dt-bindings: RISC-V CPU Bindings
lib: Add shared copies of some GCC library routines
MAINTAINERS: Add RISC-V
Linus Torvalds [Wed, 15 Nov 2017 18:21:58 +0000 (10:21 -0800)]
Merge branch 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching updates from Jiri Kosina:
- shadow variables support, allowing livepatches to associate new
"shadow" fields to existing data structures, from Joe Lawrence
- pre/post patch callbacks API, allowing livepatch writers to register
callbacks to be called before and after patch application, from Joe
Lawrence
* 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: __klp_disable_patch() should never be called for disabled patches
livepatch: Correctly call klp_post_unpatch_callback() in error paths
livepatch: add transition notices
livepatch: move transition "complete" notice into klp_complete_transition()
livepatch: add (un)patch callbacks
livepatch: Small shadow variable documentation fixes
livepatch: __klp_shadow_get_or_alloc() is local to shadow.c
livepatch: introduce shadow variable API
Linus Torvalds [Wed, 15 Nov 2017 17:43:57 +0000 (09:43 -0800)]
Merge branch 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID updates from Jiri Kosina:
- high resolution mode for Dell canvas support, from Benjamin Tissoires
- pen handling fixes for the Wacom driver, from Jason Gerecke
- i2c-hid: Apollo-Lake based laptops improvements, from Hans de Goede
- Input/Core: eraser tool support, from Ping Cheng
- new ALPS touchpad (T4, found currently on HP EliteBook 1000, Zbook
Stduio and HP Elite book x360) supportm from Masaki Ota
- other smaller assorted fixes
* 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (33 commits)
HID: cp2112: fix broken gpio_direction_input callback
HID: cp2112: fix interface specification URL
HID: Wacom: switch Dell canvas into highres mode
HID: wacom: generic: Send BTN_STYLUS3 when both barrel switches are set
HID: sony: Fix SHANWAN pad rumbling on USB
HID: i2c-hid: Add no-irq-after-reset quirk for 0911:5288 device
HID: add backlight level quirk for Asus ROG laptops
HID: cp2112: add HIDRAW dependency
HID: Add ID 044f:b605 ThrustMaster, Inc. force feedback Racing Wheel
HID: hid-logitech: remove redundant assignment to pointer value
HID: wacom: generic: Recognize WACOM_HID_WD_PEN as a type of pen collection
HID: rmi: Check that a device is a RMI device before calling RMI functions
HID: add multi-input quirk for GamepadBlock
HID: alps: add new U1 device ID
HID: alps: add support for Alps T4 Touchpad device
HID: alps: remove variables local to u1_init() from the device struct
HID: alps: properly handle max_fingers and minimum on X and Y axis
HID: alps: Separate U1 device code
HID: alps: delete unnecessary struct u1_dev devInfo
HID: usbhid: Convert timers to use timer_setup()
...
Eric Dumazet [Wed, 15 Nov 2017 05:02:19 +0000 (21:02 -0800)]
tcp: highest_sack fix
syzbot easily found a regression added in our latest patches [1]
No longer set tp->highest_sack to the head of the send queue since
this is not logical and error prone.
Only sack processing should maintain the pointer to an skb from rtx queue.
We might in the future only remember the sequence instead of a pointer to skb,
since rb-tree should allow a fast lookup.
[1]
BUG: KASAN: use-after-free in tcp_highest_sack_seq include/net/tcp.h:1706 [inline]
BUG: KASAN: use-after-free in tcp_ack+0x42bb/0x4fd0 net/ipv4/tcp_input.c:3537
Read of size 4 at addr ffff8801c154faa8 by task syz-executor4/12860
Fixes: 737ff314563c ("tcp: use sequence distance to detect reordering") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Wed, 15 Nov 2017 01:43:09 +0000 (09:43 +0800)]
geneve: fix fill_info when link down
geneve->sock4/6 were added with geneve_open and released with geneve_stop.
So when geneve link down, we will not able to show remote address and
checksum info after commit 11387fe4a98 ("geneve: fix fill_info when using
collect_metadata").
Fix this by avoid passing *_REMOTE{,6} for COLLECT_METADATA since they are
mutually exclusive, and always show UDP_ZERO_CSUM6_RX info.
Fixes: 11387fe4a98 ("geneve: fix fill_info when using collect_metadata") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 15 Nov 2017 01:15:50 +0000 (17:15 -0800)]
bpf: fix lockdep splat
pcpu_freelist_pop() needs the same lockdep awareness than
pcpu_freelist_populate() to avoid a false positive.
[ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
switchto-defaul/12508 [HC0[0]:SC0[6]:HE0:SE0] is trying to acquire:
(&htab->buckets[i].lock){......}, at: [<ffffffff9dc099cb>] __htab_percpu_map_update_elem+0x1cb/0x300
and this task is already holding:
(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...}, at: [<ffffffff9e135848>] __dev_queue_xmit+0
x868/0x1240
which would create a new lock dependency:
(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...} -> (&htab->buckets[i].lock){......}
Fixes: e19494edab82 ("bpf: introduce percpu_freelist") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Wed, 15 Nov 2017 08:35:02 +0000 (09:35 +0100)]
net: cdc_ncm: GetNtbFormat endian fix
The GetNtbFormat and SetNtbFormat requests operate on 16 bit little
endian values. We get away with ignoring this most of the time, because
we only care about USB_CDC_NCM_NTB16_FORMAT which is 0x0000. This
fails for USB_CDC_NCM_NTB32_FORMAT.
Fix comparison between LE value from device and constant by converting
the constant to LE.
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Fixes: 2b02c20ce0c2 ("cdc_ncm: Set NTB format again after altsetting switch for Huawei devices") Cc: Enrico Mioso <mrkiko.rs@gmail.com> Cc: Christian Panton <christian@panton.org> Signed-off-by: Bjørn Mork <bjorn@mork.no> Acked-By: Enrico Mioso <mrkiko.rs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Kosina [Wed, 15 Nov 2017 10:14:23 +0000 (11:14 +0100)]
Merge branch 'for-4.15/wacom' into for-linus
- High resolution mode for DEll canvas support, from Benjamin Tissoires
- A lot of improvements to pen handling in the Wacom driver, from Jason Gerecke
Jiri Kosina [Wed, 15 Nov 2017 10:02:25 +0000 (11:02 +0100)]
Merge branch 'for-4.14/upstream-fixes' into for-linus
- Wacom: recognize PEN application collection properly, from Jason Gerecke
- RMI: avoid cofusion caused by RMI functions being by mistake called on
non-RMI devices, from Andrew Duggan
- small device-ID-specific quirks/fixes
Jiri Kosina [Wed, 15 Nov 2017 09:53:24 +0000 (10:53 +0100)]
Merge branch 'for-4.15/callbacks' into for-linus
This pulls in an infrastructure/API that allows livepatch writers to
register pre-patch and post-patch callbacks that allow for running a
glue code necessary for finalizing the patching if necessary.
Conflicts:
kernel/livepatch/core.c
- trivial conflict by adding a callback call into
module going notifier vs. moving that code block
to klp_cleanup_module_patches_limited()
Jiri Kosina [Wed, 15 Nov 2017 09:49:14 +0000 (10:49 +0100)]
Merge branch 'for-4.15/shadow-variables' into for-linus
Shadow variables allow callers to associate new shadow fields to existing data
structures. This is intended to be used by livepatch modules seeking to
emulate additions to data structure definitions.
openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start
It seems that the intention of the code is to null check the value
returned by function genlmsg_put. But the current code is null
checking the address of the pointer that holds the value returned
by genlmsg_put.
Fix this by properly null checking the value returned by function
genlmsg_put in order to avoid a pontential null pointer dereference.
Addresses-Coverity-ID: 1461561 ("Dereference before null check")
Addresses-Coverity-ID: 1461562 ("Dereference null return value") Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure") Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fix compilation on 32 bit platforms (where doing modulus operation
with 64 bit requires extra glibc functions) by truncation.
The jitter for table distribution is limited to a 32 bit value
because random numbers are scaled as 32 bit value.
Also fix some whitespace.
Fixes: 99803171ef04 ("netem: add uapi to express delay and jitter in nanoseconds") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Since times are now expressed in nanosecond, need to now do
true 64 bit divide. Old code would truncate rate at 32 bits.
Rename function to better express current usage.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Make default TCP default congestion control to a per namespace
value. This changes default congestion control to a pointer to congestion ops
(rather than implicit as first element of available lsit).
The congestion control setting of new namespaces is inherited
from the current setting of the root namespace.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
So fib_seq_sum() can't use rtnl_lock() only for protection.
The possible solution could be to use rtnl_lock()
in fib_notifier_ops_unregister(), but this adds
a possible delay during net namespace creation,
so we better use rcu_read_lock() till someone
really needs the mutex (if that happens).
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Tue, 14 Nov 2017 13:21:32 +0000 (14:21 +0100)]
ipv6: set all.accept_dad to 0 by default
With commits 35e015e1f577 and a2d3f3e33853, the global 'accept_dad' flag
is also taken into account (default value is 1). If either global or
per-interface flag is non-zero, DAD will be enabled on a given interface.
This is not backward compatible: before those patches, the user could
disable DAD just by setting the per-interface flag to 0. Now, the
user instead needs to set both flags to 0 to actually disable DAD.
Restore the previous behaviour by setting the default for the global
'accept_dad' flag to 0. This way, DAD is still enabled by default,
as per-interface flags are set to 1 on device creation, but setting
them to 0 is enough to disable DAD on a given interface.
- Before 35e015e1f57a7 and a2d3f3e33853:
global per-interface DAD enabled
[default] 1 1 yes
X 0 no
X 1 yes
- After 35e015e1f577 and a2d3f3e33853:
global per-interface DAD enabled
[default] 1 1 yes
0 0 no
0 1 yes
1 0 yes
- After this fix:
global per-interface DAD enabled
1 1 yes
0 0 no
[default] 0 1 yes
1 0 yes
Fixes: 35e015e1f577 ("ipv6: fix net.ipv6.conf.all interface DAD handlers") Fixes: a2d3f3e33853 ("ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real") CC: Stefano Brivio <sbrivio@redhat.com> CC: Matteo Croce <mcroce@redhat.com> CC: Erik Kline <ek@google.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry V. Levin [Tue, 14 Nov 2017 03:30:11 +0000 (06:30 +0300)]
uapi: fix linux/tls.h userspace compilation error
Move inclusion of a private kernel header <net/tcp.h>
from uapi/linux/tls.h to its only user - net/tls.h,
to fix the following linux/tls.h userspace compilation error:
/usr/include/linux/tls.h:41:21: fatal error: net/tcp.h: No such file or directory
As to this point uapi/linux/tls.h was totaly unusuable for userspace,
cleanup this header file further by moving other redundant includes
to net/tls.h.
Fixes: 3c4d7559159b ("tls: kernel TLS support") Cc: <stable@vger.kernel.org> # v4.13+ Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
usbnet: ipheth: prevent TX queue timeouts when device not ready
iOS devices require the host to be "trusted" before servicing network
packets. Establishing trust requires the user to confirm a dialog on the
iOS device.Until trust is established, the iOS device will silently discard
network packets from the host. Currently, the ipheth driver does not detect
whether an iOS device has established trust with the host, and immediately
sets up the transmit queues.
This causes the following problems:
- Kernel taint due to WARN() in netdev watchdog.
- Dmesg spam ("TX timeout").
- Disruption of user space networking activity (dhcpd, etc...) when new
interface comes up but cannot be used.
- Unnecessary host and device wakeups and USB traffic
The last message "TX timeout" is repeated every 5 seconds until trust is
established or the device is disconnected, filling up dmesg.
The proposed patch eliminates the problem by, upon connection, keeping the
TX queue and carrier disabled until a packet is first received from the iOS
device. This is reflected by the confirmed_pairing variable in the device
structure. Only after at least one packet has been received from the iOS
device, the transmit queue and carrier are brought up during the periodic
device poll in ipheth_carrier_set. Because the iOS device will always send
a packet immediately upon trust being established, this should not delay
the interface becoming useable. To prevent failed UBRs in
ipheth_rcvbulk_callback from perpetually re-enabling the queue if it was
disabled, a new check is added so only successful transfers re-enable the
queue, whereas failed transfers only trigger an immediate poll.
This has the added benefit of removing the periodic control requests to the
iOS device until trust has been established and thus should reduce wakeup
events on both the host and the iOS device.
Signed-off-by: Alexander Kappner <agk@godking.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Mon, 13 Nov 2017 03:45:34 +0000 (11:45 +0800)]
vhost_net: conditionally enable tx polling
We always poll tx for socket, this is sub optimal since this will
slightly increase the waitqueue traversing time and more important,
vhost could not benefit from commit 9e641bdcfa4e ("net-tun:
restructure tun_do_read for better sleep/wakeup efficiency") even if
we've stopped rx polling during handle_rx(), tx poll were still left
in the waitqueue.
Pktgen from a remote host to VM over mlx4 on two 2.00GHz Xeon E5-2650
shows 11.7% improvements on rx PPS. (from 1.28Mpps to 1.44Mpps)
Cc: Wei Xu <wexu@redhat.com> Cc: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Consistently use types provided by <linux/types.h> to fix the following
linux/rxrpc.h userspace compilation errors:
/usr/include/linux/rxrpc.h:24:2: error: unknown type name 'u16'
u16 srx_service; /* service desired */
/usr/include/linux/rxrpc.h:25:2: error: unknown type name 'u16'
u16 transport_type; /* type of transport socket (SOCK_DGRAM) */
/usr/include/linux/rxrpc.h:26:2: error: unknown type name 'u16'
u16 transport_len; /* length of transport address */
Use __kernel_sa_family_t instead of sa_family_t the same way
as uapi/linux/in.h does, to fix the following
linux/rxrpc.h userspace compilation errors:
/usr/include/linux/rxrpc.h:23:2: error: unknown type name 'sa_family_t'
sa_family_t srx_family; /* address family */
/usr/include/linux/rxrpc.h:28:3: error: unknown type name 'sa_family_t'
sa_family_t family; /* transport address family */
Fixes: 727f8914477e ("rxrpc: Expose UAPI definitions to userspace") Cc: <stable@vger.kernel.org> # v4.14 Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 15 Nov 2017 02:25:40 +0000 (18:25 -0800)]
Merge tag 'devicetree-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DeviceTree updates from Rob Herring:
"A bigger diffstat than usual with the kbuild changes and a tree wide
fix in the binding documentation.
Summary:
- kbuild cleanups and improvements for dtbs
- Code clean-up of overlay code and fixing for some long standing
memory leak and race condition in applying overlays
- Improvements to DT memory usage making sysfs/kobjects optional and
skipping unflattening of disabled nodes. This is part of kernel
tinification efforts.
- Final piece of removing storing the full path for every DT node.
The prerequisite conversion of printk's to use device_node format
specifier happened in 4.14.
- Sync with current upstream dtc. This brings additional checks to
dtb compiling.
- Binding doc tree wide removal of leading 0s from examples
- RTC binding documentation adding missing devices and some
consolidation of duplicated bindings
- Vendor prefix documentation for nutsboard, Silicon Storage
Technology, shimafuji, Tecon Microprocessor Technologies, DH
electronics GmbH, Opal Kelly, and Next Thing"
* tag 'devicetree-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (55 commits)
dt-bindings: usb: add #phy-cells to usb-nop-xceiv
dt-bindings: Remove leading zeros from bindings notation
kbuild: handle dtb-y and CONFIG_OF_ALL_DTBS natively in Makefile.lib
MIPS: dts: remove bogus bcm96358nb4ser.dtb from dtb-y entry
kbuild: clean up *.dtb and *.dtb.S patterns from top-level Makefile
.gitignore: move *.dtb and *.dtb.S patterns to the top-level .gitignore
.gitignore: sort normal pattern rules alphabetically
dt-bindings: add vendor prefix for Next Thing Co.
scripts/dtc: Update to upstream version v1.4.5-6-gc1e55a5513e9
of: dynamic: fix memory leak related to properties of __of_node_dup
of: overlay: make pr_err() string unique
of: overlay: pr_err from return NOTIFY_OK to overlay apply/remove
of: overlay: remove unneeded check for NULL kbasename()
of: overlay: remove a dependency on device node full_name
of: overlay: simplify applying symbols from an overlay
of: overlay: avoid race condition between applying multiple overlays
of: overlay: loosen overly strict phandle clash check
of: overlay: expand check of whether overlay changeset can be removed
of: overlay: detect cases where device tree may become corrupt
of: overlay: minor restructuring
...
Linus Torvalds [Wed, 15 Nov 2017 02:09:31 +0000 (18:09 -0800)]
Merge tag 'leds_for_4.15rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds
Pull LED updates from Jacek Anaszewski:
"New LED class driver:
- add a driver for PC Engines APU/APU2 LEDs
New LED trigger:
- add a system activity LED trigger
LED core improvements:
- replace flags bit shift with BIT() macros
Convert timers to use timer_setup() in:
- led-core
- ledtrig-activity
- ledtrig-heartbeat
- ledtrig-transient
LED class drivers fixes:
- lp55xx: fix spelling mistake: 'cound' -> 'could'
- tca6507: Remove unnecessary reg check
- pca955x: Don't invert requested value in pca955x_gpio_set_value()
LED documentation improvements:
- update 00-INDEX file"
* tag 'leds_for_4.15rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
leds: Add driver for PC Engines APU/APU2 LEDs
leds: lp55xx: fix spelling mistake: 'cound' -> 'could'
leds: Convert timers to use timer_setup()
Documentation: leds: Update 00-INDEX file
leds: tca6507: Remove unnecessary reg check
leds: ledtrig-heartbeat: Convert timers to use timer_setup()
leds: Replace flags bit shift with BIT() macros
leds: pca955x: Don't invert requested value in pca955x_gpio_set_value()
leds: ledtrig-activity: Add a system activity LED trigger
Linus Torvalds [Wed, 15 Nov 2017 02:01:46 +0000 (18:01 -0800)]
Merge tag 'sound-4.15-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"There are no big surprising changes in this cycle, yet not too boring,
either. The biggest change from diffstat POV is the removal of the
legacy OSS driver codes that have been already disabled for a long
time. This will bring a few trivial merge conflicts.
As new features in ASoC side, there are two things: a new AC97 bus
implementation and AMD Stony platform support. Both include the
relevant changes shared with other subsystems, e.g. AC97 MFD changes
and DRM AMD changes.
Some other highlighted topics are:
- A bunch of USB-audio drivers got the hardening against the
malicious device accesses with a new helper code for endpoint
sanity check
- Lots of cleanups for ASoC Intel platform code, including support
for their open source audio firmware
- Continued ASoC core componentization works
- Support for scaling MCLK with sample rate in ASoC simple-card
- Stabler PCM hot-unplug capability, especially for ASoC usages"
* tag 'sound-4.15-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (302 commits)
Documentation: sound: hd-audio: notes.rst
ASoC: bcm2835: Support left/right justified and DSP modes
ASoC: bcm2835: Enforce full symmetry
ASoC: bcm2835: Support additional samplerates up to 384kHz
ASoC: bcm2835: Add support for TDM modes
ASoC: add mclk-fs support to audio graph card
ASoC: add mclk-fs to audio graph card binding
ASoC: rt5514: work around link error
ASoC: rt5514: mark PM functions as __maybe_unused
ASoC: rt5663: Check the JD status in the button pushing
ASoC: amd: Modified DMA transfer Mechanism for Playback
ASoC: rt5645: Wait for 400msec before concluding on value of RT5645_VENDOR_ID2
ASoC: sun4i-codec: fixed 32bit audio capture support for H3/H2+
ASoC: da7213: add support for DSP modes
ASoC: sun8i-codec: Add a comment on the LRCK inversion
ASoC: sun8i-codec: Set the BCLK divider
ASoC: rt5663: Delay and retry reading rt5663 ID register
ASoC: amd: use do_div rather than 64 bit division to fix 32 bit builds
ASoC: cs42l56: Fix reset GPIO name in example DT binding
ASoC: rt5514-spi: check irq status to schedule data copy in resume function
...
Linus Torvalds [Wed, 15 Nov 2017 01:52:21 +0000 (17:52 -0800)]
Merge branch 'i2c/for-4.15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
"This contains two bigger than usual tree-wide changes this time. They
all have proper acks, caused no merge conflicts in linux-next where
they have been for a while. They are namely:
- to-gpiod conversion of the i2c-gpio driver and its users (touching
arch/* and drivers/mfd/*)
- adding a sbs-manager based on I2C core updates to SMBus alerts
(touching drivers/power/*)
Other notable changes:
- i2c_boardinfo can now carry a dev_name to be used when the device
is created. This is because some devices in ACPI world need fixed
names to find the regulators.
- the designware driver got a long discussed overhaul of its PM
handling. img-scb and davinci got PM support, too.
- at24 driver has way better OF support. And it has a new maintainer.
Thanks Bartosz for stepping up!
The rest is regular driver updates and fixes"
* 'i2c/for-4.15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (55 commits)
ARM: sa1100: simpad: Correct I2C GPIO offsets
i2c: aspeed: Deassert reset in probe
eeprom: at24: Add OF device ID table
MAINTAINERS: new maintainer for AT24 driver
i2c: nuc900: remove platform_data, too
i2c: thunderx: Remove duplicate NULL check
i2c: taos-evm: Remove duplicate NULL check
i2c: Make i2c_unregister_device() NULL-aware
i2c: xgene-slimpro: Support v2
i2c: mpc: remove useless variable initialization
i2c: omap: Trigger bus recovery in lockup case
i2c: gpio: Add support for named gpios in DT
dt-bindings: i2c: i2c-gpio: Add support for named gpios
i2c: gpio: Local vars in probe
i2c: gpio: Augment all boardfiles to use open drain
i2c: gpio: Enforce open drain through gpiolib
gpio: Make it possible for consumers to enforce open drain
i2c: gpio: Convert to use descriptors
power: supply: sbs-message: fix some code style issues
power: supply: sbs-battery: remove unchecked return var
...
Linus Torvalds [Wed, 15 Nov 2017 01:23:44 +0000 (17:23 -0800)]
Merge tag 'gpio-v4.15-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
"This is the bulk of GPIO changes for the v4.15 kernel cycle:
Core:
- Fix the semantics of raw GPIO to actually be raw. No inversion
semantics as before, but also no open draining, and allow the raw
operations to affect lines used for interrupts as the caller
supposedly knows what they are doing if they are getting the big
hammer.
- Rewrote the __inner_function() notation calls to names that make
more sense. I just find this kind of code disturbing.
- Drop the .irq_base() field from the gpiochip since now all IRQs are
mapped dynamically. This is nice.
- Support for .get_multiple() in the core driver API. This allows us
to read several GPIO lines with a single register read. This has
high value for some usecases: it can be used to create
oscilloscopes and signal analyzers and other things that rely on
reading several lines at exactly the same instant. Also a generally
nice optimization. This uses the new assign_bit() macro from the
bitops lib that was ACKed by Andrew Morton and is implemented for
two drivers, one of them being the generic MMIO driver so everyone
using that will be able to benefit from this.
- Do not allow requests of Open Drain and Open Source setting of a
GPIO line simultaneously. If the hardware actually supports
enabling both at the same time the electrical result would be
disastrous.
- A new interrupt chip core helper. This will be helpful to deal with
"banked" GPIOs, which means GPIO controllers with several logical
blocks of GPIO inside them. This is several gpiochips per device in
the device model, in contrast to the case when there is a 1-to-1
relationship between a device and a gpiochip.
New drivers:
- Maxim MAX3191x industrial serializer, a very interesting piece of
professional I/O hardware.
- Uniphier GPIO driver. This is the GPIO block from the recent
Socionext (ex Fujitsu and Panasonic) platform.
- Tegra 186 driver. This is based on the new banked GPIO
infrastructure.
Other improvements:
- Some documentation improvements.
- Wakeup support for the DesignWare DWAPB GPIO controller.
- Reset line support on the DesignWare DWAPB GPIO controller.
- Several non-critical bug fixes and improvements for the Broadcom
BRCMSTB driver.
- Misc non-critical bug fixes like exotic errorpaths, removal of dead
code etc.
- Explicit comments on fall-through switch() statements"
* tag 'gpio-v4.15-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (65 commits)
gpio: tegra186: Remove tegra186_gpio_lock_class
gpio: rcar: Add r8a77995 (R-Car D3) support
pinctrl: bcm2835: Fix some merge fallout
gpio: Fix undefined lock_dep_class
gpio: Automatically add lockdep keys
gpio: Introduce struct gpio_irq_chip.first
gpio: Disambiguate struct gpio_irq_chip.nested
gpio: Add Tegra186 support
gpio: Export gpiochip_irq_{map,unmap}()
gpio: Implement tighter IRQ chip integration
gpio: Move lock_key into struct gpio_irq_chip
gpio: Move irq_valid_mask into struct gpio_irq_chip
gpio: Move irq_nested into struct gpio_irq_chip
gpio: Move irq_chained_parent to struct gpio_irq_chip
gpio: Move irq_default_type to struct gpio_irq_chip
gpio: Move irq_handler to struct gpio_irq_chip
gpio: Move irqdomain into struct gpio_irq_chip
gpio: Move irqchip into struct gpio_irq_chip
gpio: Introduce struct gpio_irq_chip
pinctrl: armada-37xx: remove unused variable
...
Linus Torvalds [Wed, 15 Nov 2017 00:54:12 +0000 (16:54 -0800)]
Merge tag 'dma-mapping-4.15' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- turn dma_cache_sync into a dma_map_ops instance and remove
implementation that purely are dead because the architecture doesn't
support noncoherent allocations
- add a flag for busses that need DMA configuration (Robin Murphy)
* tag 'dma-mapping-4.15' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: turn dma_cache_sync into a dma_map_ops method
sh: make dma_cache_sync a no-op
xtensa: make dma_cache_sync a no-op
unicore32: make dma_cache_sync a no-op
powerpc: make dma_cache_sync a no-op
mn10300: make dma_cache_sync a no-op
microblaze: make dma_cache_sync a no-op
ia64: make dma_cache_sync a no-op
frv: make dma_cache_sync a no-op
x86: make dma_cache_sync a no-op
floppy: consolidate the dummy fd_cacheflush definition
drivers: flag buses which demand DMA configuration
Linus Torvalds [Wed, 15 Nov 2017 00:49:31 +0000 (16:49 -0800)]
Merge tag 'dmaengine-4.15-rc1' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine updates from Vinod Koul:
"Updates for this cycle include:
- new driver for Spreadtrum dma controller, ST MDMA and DMAMUX
controllers
- PM support for IMG MDC drivers
- updates to bcm-sba-raid driver and improvements to sun6i driver
- subsystem conversion for:
- timers to use timer_setup()
- remove usage of PCI pool API
- usage of %p format specifier
- minor updates to bunch of drivers"
* tag 'dmaengine-4.15-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (49 commits)
dmaengine: ti-dma-crossbar: Correct am335x/am43xx mux value type
dmaengine: dmatest: warn user when dma test times out
dmaengine: Revert "rcar-dmac: use TCRB instead of TCR for residue"
dmaengine: stm32_mdma: activate pack/unpack feature
dmaengine: at_hdmac: Remove unnecessary 0x prefixes before %pad
dmaengine: coh901318: Remove unnecessary 0x prefixes before %pad
MAINTAINERS: Step down from a co-maintaner of DW DMAC driver
dmaengine: pch_dma: Replace PCI pool old API
dmaengine: Convert timers to use timer_setup()
dmaengine: sprd: Add Spreadtrum DMA driver
dt-bindings: dmaengine: Add Spreadtrum SC9860 DMA controller
dmaengine: sun6i: Retrieve channel count/max request from devicetree
dmaengine: Build bcm-sba-raid driver as loadable module for iProc SoCs
dmaengine: bcm-sba-raid: Use common GPL comment header
dmaengine: bcm-sba-raid: Use only single mailbox channel
dmaengine: bcm-sba-raid: serialize dma_cookie_complete() using reqs_lock
dmaengine: pl330: fix descriptor allocation fail
dmaengine: rcar-dmac: use TCRB instead of TCR for residue
dmaengine: sun6i: Add support for Allwinner A64 and compatibles
arm64: allwinner: a64: Add devicetree binding for DMA controller
...
Linus Torvalds [Wed, 15 Nov 2017 00:43:27 +0000 (16:43 -0800)]
Merge tag 'iommu-v4.15-rc1' of git://github.com/awilliam/linux-vfio
Pull IOMMU updates from Alex Williamson:
"As Joerg mentioned[1], he's out on paternity leave through the end of
the year and I'm filling in for him in the interim:
- Enforce MSI multiple IRQ alignment in AMD IOMMU
- VT-d PASID error handling fixes
- Add r8a7795 IPMMU support
- Manage runtime PM links on exynos at {add,remove}_device callbacks
- Fix Mediatek driver name to avoid conflict
- Add terminate support to qcom fault handler
- 64-bit IOVA optimizations
- Simplfy IOVA domain destruction, better use of rcache, and skip
anchor nodes on copy
- Convert to IOMMU TLB sync API in io-pgtable-arm{-v7s}
- Drop command queue lock when waiting for CMD_SYNC completion on ARM
SMMU implementations supporting MSI to cacheable memory
- iomu-vmsa cleanup inspired by missed IOTLB sync callbacks
- Fix sleeping lock with preemption disabled for RT
- Dual MMU support for TI DRA7xx DSPs
- Optional flush option on IOVA allocation avoiding overhead when
caller can try other options
[1] https://lkml.org/lkml/2017/10/22/72"
* tag 'iommu-v4.15-rc1' of git://github.com/awilliam/linux-vfio: (54 commits)
iommu/iova: Use raw_cpu_ptr() instead of get_cpu_ptr() for ->fq
iommu/mediatek: Fix driver name
iommu/ipmmu-vmsa: Hook up r8a7795 DT matching code
iommu/ipmmu-vmsa: Allow two bit SL0
iommu/ipmmu-vmsa: Make IMBUSCTR setup optional
iommu/ipmmu-vmsa: Write IMCTR twice
iommu/ipmmu-vmsa: IPMMU device is 40-bit bus master
iommu/ipmmu-vmsa: Make use of IOMMU_OF_DECLARE()
iommu/ipmmu-vmsa: Enable multi context support
iommu/ipmmu-vmsa: Add optional root device feature
iommu/ipmmu-vmsa: Introduce features, break out alias
iommu/ipmmu-vmsa: Unify ipmmu_ops
iommu/ipmmu-vmsa: Clean up struct ipmmu_vmsa_iommu_priv
iommu/ipmmu-vmsa: Simplify group allocation
iommu/ipmmu-vmsa: Unify domain alloc/free
iommu/ipmmu-vmsa: Fix return value check in ipmmu_find_group_dma()
iommu/vt-d: Clear pasid table entry when memory unbound
iommu/vt-d: Clear Page Request Overflow fault bit
iommu/vt-d: Missing checks for pasid tables if allocation fails
iommu/amd: Limit the IOVA page range to the specified addresses
...
Linus Torvalds [Wed, 15 Nov 2017 00:23:44 +0000 (16:23 -0800)]
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas,
megaraid_sas, pm80xx, mpt3sas, be2iscsi, hpsa. and a host of minor
updates.
There's no major behaviour change or additions to the core in all of
this, so the potential for regressions should be small (biggest
potential being in the scsi error handler changes)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (203 commits)
scsi: lpfc: Fix hard lock up NMI in els timeout handling.
scsi: mpt3sas: remove a stray KERN_INFO
scsi: mpt3sas: cleanup _scsih_pcie_enumeration_event()
scsi: aacraid: use timespec64 instead of timeval
scsi: scsi_transport_fc: add 64GBIT and 128GBIT port speed definitions
scsi: qla2xxx: Suppress a kernel complaint in qla_init_base_qpair()
scsi: mpt3sas: fix dma_addr_t casts
scsi: be2iscsi: Use kasprintf
scsi: storvsc: Avoid excessive host scan on controller change
scsi: lpfc: fix kzalloc-simple.cocci warnings
scsi: mpt3sas: Update mpt3sas driver version.
scsi: mpt3sas: Fix sparse warnings
scsi: mpt3sas: Fix nvme drives checking for tlr.
scsi: mpt3sas: NVMe drive support for BTDHMAPPING ioctl command and log info
scsi: mpt3sas: Add-Task-management-debug-info-for-NVMe-drives.
scsi: mpt3sas: scan and add nvme device after controller reset
scsi: mpt3sas: Set NVMe device queue depth as 128
scsi: mpt3sas: Handle NVMe PCIe device related events generated from firmware.
scsi: mpt3sas: API's to remove nvme drive from sml
scsi: mpt3sas: API 's to support NVMe drive addition to SML
...
Linus Torvalds [Wed, 15 Nov 2017 00:07:26 +0000 (16:07 -0800)]
Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
Pull MD update from Shaohua Li:
"This update mostly includes bug fixes:
- md-cluster now supports raid10 from Guoqing
- raid5 PPL fixes from Artur
- badblock regression fix from Bo
- suspend hang related fixes from Neil
- raid5 reshape fixes from Neil
- raid1 freeze deadlock fix from Nate
- memleak fixes from Zdenek
- bitmap related fixes from Me and Tao
- other fixes and cleanups"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: (33 commits)
md: free unused memory after bitmap resize
md: release allocated bitset sync_set
md/bitmap: clear BITMAP_WRITE_ERROR bit before writing it to sb
md: be cautious about using ->curr_resync_completed for ->recovery_offset
badblocks: fix wrong return value in badblocks_set if badblocks are disabled
md: don't check MD_SB_CHANGE_CLEAN in md_allow_write
md-cluster: update document for raid10
md: remove redundant variable q
raid1: remove obsolete code in raid1_write_request
md-cluster: Use a small window for raid10 resync
md-cluster: Suspend writes in RAID10 if within range
md-cluster/raid10: set "do_balance = 0" if area is resyncing
md: use lockdep_assert_held
raid1: prevent freeze_array/wait_all_barriers deadlock
md: use TASK_IDLE instead of blocking signals
md: remove special meaning of ->quiesce(.., 2)
md: allow metadata update while suspending.
md: use mddev_suspend/resume instead of ->quiesce()
md: move suspend_hi/lo handling into core md code
md: don't call bitmap_create() while array is quiesced.
...
Linus Torvalds [Tue, 14 Nov 2017 23:50:56 +0000 (15:50 -0800)]
Merge tag 'for-4.15/dm' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- a few conversions from atomic_t to ref_count_t
- a DM core fix for a race during device destruction that could result
in a BUG_ON
- a stable@ fix for a DM cache race condition that could lead to data
corruption when operating in writeback mode (writethrough is default)
- various DM cache cleanups and improvements
- add DAX support to the DM log-writes target
- a fix for the DM zoned target's ability to deal with the last zone of
the drive being smaller than all others
- a stable@ DM crypt and DM integrity fix for a negative check that was
to restrictive (prevented slab debug with XFS ontop of DM crypt from
working)
- a DM raid target fix for a panic that can occur when forcing a raid
to sync
* tag 'for-4.15/dm' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (25 commits)
dm cache: lift common migration preparation code to alloc_migration()
dm cache: remove usused deferred_cells member from struct cache
dm cache policy smq: allocate cache blocks in order
dm cache policy smq: change max background work from 10240 to 4096 blocks
dm cache background tracker: limit amount of background work that may be issued at once
dm cache policy smq: take origin idle status into account when queuing writebacks
dm cache policy smq: handle races with queuing background_work
dm raid: fix panic when attempting to force a raid to sync
dm integrity: allow unaligned bv_offset
dm crypt: allow unaligned bv_offset
dm: small cleanup in dm_get_md()
dm: fix race between dm_get_from_kobject() and __dm_destroy()
dm: allocate struct mapped_device with kvzalloc
dm zoned: ignore last smaller runt zone
dm space map metadata: use ARRAY_SIZE
dm log writes: add support for DAX
dm log writes: add support for inline data buffers
dm cache: simplify get_per_bio_data() by removing data_size argument
dm cache: remove all obsolete writethrough-specific code
dm cache: submit writethrough writes in parallel to origin and cache
...
Linus Torvalds [Tue, 14 Nov 2017 23:32:19 +0000 (15:32 -0800)]
Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block
Pull core block layer updates from Jens Axboe:
"This is the main pull request for block storage for 4.15-rc1.
Nothing out of the ordinary in here, and no API changes or anything
like that. Just various new features for drivers, core changes, etc.
In particular, this pull request contains:
- A patch series from Bart, closing the whole on blk/scsi-mq queue
quescing.
- A series from Christoph, building towards hidden gendisks (for
multipath) and ability to move bio chains around.
- NVMe
- Support for native multipath for NVMe (Christoph).
- Userspace notifications for AENs (Keith).
- Command side-effects support (Keith).
- SGL support (Chaitanya Kulkarni)
- FC fixes and improvements (James Smart)
- Lots of fixes and tweaks (Various)
- bcache
- New maintainer (Michael Lyle)
- Writeback control improvements (Michael)
- Various fixes (Coly, Elena, Eric, Liang, et al)
- lightnvm updates, mostly centered around the pblk interface
(Javier, Hans, and Rakesh).
- Removal of unused bio/bvec kmap atomic interfaces (me, Christoph)
- Writeback series that fix the much discussed hundreds of millions
of sync-all units. This goes all the way, as discussed previously
(me).
- Fix for missing wakeup on writeback timer adjustments (Yafang
Shao).
- Fix laptop mode on blk-mq (me).
- {mq,name} tupple lookup for IO schedulers, allowing us to have
alias names. This means you can use 'deadline' on both !mq and on
mq (where it's called mq-deadline). (me).
- blktrace race fix, oopsing on sg load (me).
- blk-mq optimizations (me).
- Obscure waitqueue race fix for kyber (Omar).
- NBD fixes (Josef).
- Disable writeback throttling by default on bfq, like we do on cfq
(Luca Miccio).
- Series from Ming that enable us to treat flush requests on blk-mq
like any other request. This is a really nice cleanup.
- Series from Ming that improves merging on blk-mq with schedulers,
getting us closer to flipping the switch on scsi-mq again.
- Lots of minor fixes from lots of different folks, both for core and
driver code"
* 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits)
nvme: fix visibility of "uuid" ns attribute
blk-mq: fixup some comment typos and lengths
ide: ide-atapi: fix compile error with defining macro DEBUG
blk-mq: improve tag waiting setup for non-shared tags
brd: remove unused brd_mutex
blk-mq: only run the hardware queue if IO is pending
block: avoid null pointer dereference on null disk
fs: guard_bio_eod() needs to consider partitions
xtensa/simdisk: fix compile error
nvme: expose subsys attribute to sysfs
nvme: create 'slaves' and 'holders' entries for hidden controllers
block: create 'slaves' and 'holders' entries for hidden gendisks
nvme: also expose the namespace identification sysfs files for mpath nodes
nvme: implement multipath access to nvme subsystems
nvme: track shared namespaces
nvme: introduce a nvme_ns_ids structure
nvme: track subsystems
block, nvme: Introduce blk_mq_req_flags_t
block, scsi: Make SCSI quiesce and resume work reliably
block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
...