kobject: clean up the kobject add documentation a bit more
Commit 1fd7c3b438a2 ("kobject: Improve doc clarity kobject_init_and_add()")
tried to provide more clarity, but the reference to kobject_del() was
incorrect. Fix that up by removing that line, and hopefully be more explicit
as to exactly what needs to happen here once you register a kobject with the
kobject core.
kernel-doc comments have a prescribed format. This includes parenthesis
on the function name. To be _particularly_ correct we should also
capitalise the brief description and terminate it with a period.
In preparation for adding/updating kernel-doc function comments clean up
the ones currently present.
Signed-off-by: Tobin C. Harding <tobin@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Colin Ian King [Wed, 1 May 2019 12:43:17 +0000 (13:43 +0100)]
kobject: fix dereference before null check on kobj
The kobj pointer is being null-checked so potentially it could be null,
however, the ktype declaration before the null check is dereferencing kobj
hence we have a potential null pointer deference. Fix this by moving the
assignment of ktype after kobj has been null checked.
Addresses-Coverity: ("Dereference before null check") Fixes: aa30f47cf666 ("kobject: Add support for default attribute groups to kobj_type") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
init/config: Do not select BUILD_BIN2C for IKCONFIG
Since commit 13610aa908dc ("kernel/configs: use .incbin directive to
embed config_data.gz"), IKCONFIG no longer uses BUILD_BIN2C so prevent
it from being selected in Kconfig.
Provide in-kernel headers to make extending kernel easier
Introduce in-kernel headers which are made available as an archive
through proc (/proc/kheaders.tar.xz file). This archive makes it
possible to run eBPF and other tracing programs that need to extend the
kernel for tracing purposes without any dependency on the file system
having headers.
A github PR is sent for the corresponding BCC patch at:
https://github.com/iovisor/bcc/pull/2312
On Android and embedded systems, it is common to switch kernels but not
have kernel headers available on the file system. Further once a
different kernel is booted, any headers stored on the file system will
no longer be useful. This is an issue even well known to distros.
By storing the headers as a compressed archive within the kernel, we can
avoid these issues that have been a hindrance for a long time.
The best way to use this feature is by building it in. Several users
have a need for this, when they switch debug kernels, they do not want to
update the filesystem or worry about it where to store the headers on
it. However, the feature is also buildable as a module in case the user
desires it not being part of the kernel image. This makes it possible to
load and unload the headers from memory on demand. A tracing program can
load the module, do its operations, and then unload the module to save
kernel memory. The total memory needed is 3.3MB.
By having the archive available at a fixed location independent of
filesystem dependencies and conventions, all debugging tools can
directly refer to the fixed location for the archive, without concerning
with where the headers on a typical filesystem which significantly
simplifies tooling that needs kernel headers.
The code to read the headers is based on /proc/config.gz code and uses
the same technique to embed the headers.
Other approaches were discussed such as having an in-memory mountable
filesystem, but that has drawbacks such as requiring an in-kernel xz
decompressor which we don't have today, and requiring usage of 42 MB of
kernel memory to host the decompressed headers at anytime. Also this
approach is simpler than such approaches.
Function kobject_init_and_add() is currently misused in a number of
places in the kernel. On error return kobject_put() must be called but
is at times not.
Make the function documentation more explicit about calling
kobject_put() in the error path.
Signed-off-by: Tobin C. Harding <tobin@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
driver core: platform: Fix the usage of platform device name(pdev->name)
Platform core is using pdev->name as the platform device name to do
the binding of the devices with the drivers. But, when the platform
driver overrides the platform device name with dev_set_name(),
the pdev->name is pointing to a location which is freed and becomes
an invalid parameter to do the binding match.
Kimberly Brown [Tue, 2 Apr 2019 02:51:58 +0000 (22:51 -0400)]
livepatch: Replace klp_ktype_patch's default_attrs with groups
The kobj_type default_attrs field is being replaced by the
default_groups field. Replace klp_ktype_patch's default_attrs field
with default_groups and use the ATTRIBUTE_GROUPS macro to create
klp_patch_groups.
This patch was tested by loading the livepatch-sample module and
verifying that the sysfs files for the attributes in the default groups
were created.
Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Acked-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kimberly Brown [Tue, 2 Apr 2019 02:51:53 +0000 (22:51 -0400)]
cpufreq: schedutil: Replace default_attrs field with groups
The kobj_type default_attrs field is being replaced by the
default_groups field. Replace sugov_tunables_ktype's default_attrs field
with default groups. Change "sugov_attributes" to "sugov_attrs" and use
the ATTRIBUTE_GROUPS macro to create sugov_groups.
This patch was tested by setting the scaling governor to schedutil and
verifying that the sysfs files for the attributes in the default groups
were created.
Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kimberly Brown [Tue, 2 Apr 2019 02:51:47 +0000 (22:51 -0400)]
padata: Replace padata_attr_type default_attrs field with groups
The kobj_type default_attrs field is being replaced by the
default_groups field. Replace padata_attr_type's default_attrs field
with default_groups and use the ATTRIBUTE_GROUPS macro to create
padata_default_groups.
This patch was tested by loading the pcrypt module and verifying that
the sysfs files for the attributes in the default groups were created.
Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kimberly Brown [Tue, 2 Apr 2019 02:51:41 +0000 (22:51 -0400)]
irqdesc: Replace irq_kobj_type's default_attrs field with groups
The kobj_type default_attrs field is being replaced by the
default_groups field. Replace irq_kobj_type's default_attrs field with
default_groups and use the ATTRIBUTE_GROUPS macro to create irq_groups.
This patch was tested by verifying that the sysfs files for the
attributes in the default groups were created.
Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kimberly Brown [Tue, 2 Apr 2019 02:51:35 +0000 (22:51 -0400)]
net-sysfs: Replace ktype default_attrs field with groups
The kobj_type default_attrs field is being replaced by the
default_groups field. Replace the default_attrs fields in rx_queue_ktype
and netdev_queue_ktype with default_groups. Use the ATTRIBUTE_GROUPS
macro to create rx_queue_default_groups and netdev_queue_default_groups.
This patch was tested by verifying that the sysfs files for the
attributes in the default groups were created.
Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kimberly Brown [Tue, 2 Apr 2019 02:51:30 +0000 (22:51 -0400)]
block: Replace all ktype default_attrs with groups
The kobj_type default_attrs field is being replaced by the
default_groups field. Replace all of the ktype default_attrs fields in
the block subsystem with default_groups and use the ATTRIBUTE_GROUPS
macro to create the default groups.
Remove default_ctx_attrs[] because it doesn't contain any attributes.
This patch was tested by verifying that the sysfs files for the
attributes in the default groups were created.
Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kimberly Brown [Tue, 2 Apr 2019 02:51:24 +0000 (22:51 -0400)]
samples/kobject: Replace foo_ktype's default_attrs field with groups
The kobj_type default_attrs field is being replaced by the
default_groups field. Replace foo_ktype's default_attrs field with
default_groups and use the ATTRIBUTE_GROUPS macro to create
foo_default_groups.
This patch was tested by loading the kset-example module and verifying
that the sysfs files for the attributes in the default group were
created.
Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kimberly Brown [Tue, 2 Apr 2019 02:51:18 +0000 (22:51 -0400)]
kobject: Add support for default attribute groups to kobj_type
kobj_type currently uses a list of individual attributes to store
default attributes. Attribute groups are more flexible than a list of
attributes because groups provide support for attribute visibility. So,
add support for default attribute groups to kobj_type.
In future patches, the existing uses of kobj_type’s attribute list will
be converted to attribute groups. When that is complete, kobj_type’s
attribute list, “default_attrs”, will be removed.
Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
John Garry [Thu, 28 Mar 2019 10:08:05 +0000 (18:08 +0800)]
driver core: Postpone DMA tear-down until after devres release for probe failure
In commit 376991db4b64 ("driver core: Postpone DMA tear-down until after
devres release"), we changed the ordering of tearing down the device DMA
ops and releasing all the device's resources; this was because the DMA ops
should be maintained until we release the device's managed DMA memories.
However, we have seen another crash on an arm64 system when a
device driver probe fails:
hisi_sas_v3_hw 0000:74:02.0: Adding to iommu group 2
scsi host1: hisi_sas_v3_hw
BUG: Bad page state in process swapper/0 pfn:313f5
page:ffff7e0000c4fd40 count:1 mapcount:0
mapping:0000000000000000 index:0x0
flags: 0xfffe00000001000(reserved)
raw: 0fffe00000001000ffff7e0000c4fd48ffff7e0000c4fd48 0000000000000000
raw: 0000000000000000000000000000000000000001ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags: 0x1000(reserved)
Modules linked in:
CPU: 49 PID: 1 Comm: swapper/0 Not tainted 5.1.0-rc1-43081-g22d97fd-dirty #1433
Hardware name: Huawei D06/D06, BIOS Hisilicon D06 UEFI
RC0 - V1.12.01 01/29/2019
Call trace:
dump_backtrace+0x0/0x118
show_stack+0x14/0x1c
dump_stack+0xa4/0xc8
bad_page+0xe4/0x13c
free_pages_check_bad+0x4c/0xc0
__free_pages_ok+0x30c/0x340
__free_pages+0x30/0x44
__dma_direct_free_pages+0x30/0x38
dma_direct_free+0x24/0x38
dma_free_attrs+0x9c/0xd8
dmam_release+0x20/0x28
release_nodes+0x17c/0x220
devres_release_all+0x34/0x54
really_probe+0xc4/0x2c8
driver_probe_device+0x58/0xfc
device_driver_attach+0x68/0x70
__driver_attach+0x94/0xdc
bus_for_each_dev+0x5c/0xb4
driver_attach+0x20/0x28
bus_add_driver+0x14c/0x200
driver_register+0x6c/0x124
__pci_register_driver+0x48/0x50
sas_v3_pci_driver_init+0x20/0x28
do_one_initcall+0x40/0x25c
kernel_init_freeable+0x2b8/0x3c0
kernel_init+0x10/0x100
ret_from_fork+0x10/0x18
Disabling lock debugging due to kernel taint
BUG: Bad page state in process swapper/0 pfn:313f6
page:ffff7e0000c4fd80 count:1 mapcount:0
mapping:0000000000000000 index:0x0
[ 89.322983] flags: 0xfffe00000001000(reserved)
raw: 0fffe00000001000ffff7e0000c4fd88ffff7e0000c4fd88 0000000000000000
raw: 0000000000000000000000000000000000000001ffffffff 0000000000000000
The crash occurs for the same reason.
In this case, on the really_probe() failure path, we are still clearing
the DMA ops prior to releasing the device's managed memories.
This patch fixes this issue by reordering the DMA ops teardown and the
call to devres_release_all() on the failure path.
Reported-by: Xiang Chen <chenxiang66@hisilicon.com> Tested-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andy Shevchenko [Thu, 4 Apr 2019 08:11:58 +0000 (11:11 +0300)]
driver core: platform: Propagate error from insert_resource()
Since insert_resource() might return an error we don't need
to shadow its error code and would safely propagate to the user.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrea Parri [Tue, 16 Apr 2019 12:17:11 +0000 (14:17 +0200)]
kernfs: fix barrier usage in __kernfs_new_node()
smp_mb__before_atomic() can not be applied to atomic_set(). Remove the
barrier and rely on RELEASE synchronization.
Fixes: ba16b2846a8c6 ("kernfs: add an API to get kernfs node from inode number") Cc: stable@vger.kernel.org Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Qian Cai [Sun, 7 Apr 2019 01:12:22 +0000 (21:12 -0400)]
acpi/hmat: fix an uninitialized memory_target
The commit 665ac7e92757 ("acpi/hmat: Register processor domain to its
memory") introduced an uninitialized "struct memory_target" that could
cause an incorrect branching.
drivers/acpi/hmat/hmat.c:385:6: warning: variable 'target' is used
uninitialized whenever 'if' condition is false
[-Wsometimes-uninitialized]
if (p->flags & ACPI_HMAT_MEMORY_PD_VALID) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/acpi/hmat/hmat.c:392:6: note: uninitialized use occurs here
if (target && p->flags & ACPI_HMAT_PROCESSOR_PD_VALID) {
^~~~~~
drivers/acpi/hmat/hmat.c:385:2: note: remove the 'if' if its condition
is always true
if (p->flags & ACPI_HMAT_MEMORY_PD_VALID) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/acpi/hmat/hmat.c:369:30: note: initialize the variable 'target'
to silence this warning
struct memory_target *target;
^
= NULL
Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Fixes: 665ac7e92757 ("acpi/hmat: Register processor domain to its memory") Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Qian Cai [Wed, 10 Apr 2019 02:14:50 +0000 (22:14 -0400)]
acpi/hmat: fix memory leaks in hmat_init()
The commit 665ac7e92757 ("acpi/hmat: Register processor domain to its
memory") introduced some memory leaks below due to it fails to release
the heap memory in an error path, and then those statically-allocated
__initdata memory which reference them get freed during boot renders
those heap memory as leaks. Since it is valid to pass NULL to
acpi_put_table(), it is fine to call it even if acpi_get_table() returns
an error.
Signed-off-by: Qian Cai <cai@lca.pw> Fixes: 665ac7e92757 ("acpi/hmat: Register processor domain to its memory") Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mm/memory_hotplug: Do not unlock when fails to take the device_hotplug_lock
When adding the memory by probing memory block in sysfs interface, there is an
obvious issue that we will unlock the device_hotplug_lock when fails to takes it.
That issue was introduced in Commit 8df1d0e4a265
("mm/memory_hotplug: make add_memory() take the device_hotplug_lock")
We should drop out in time when fails to take the device_hotplug_lock.
Fixes: 8df1d0e4a265 ("mm/memory_hotplug: make add_memory() take the device_hotplug_lock") Reported-by: Yang yingliang <yangyingliang@huawei.com> Signed-off-by: zhong jiang <zhongjiang@huawei.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ronald Tschalär [Mon, 15 Apr 2019 08:25:05 +0000 (01:25 -0700)]
debugfs: update documented return values of debugfs helpers
Since commit ff9fb72bc077 ("debugfs: return error values, not NULL")
these helper functions do not return NULL anymore (with the exception
of debugfs_create_u32_array()).
Fixes: ff9fb72bc077 ("debugfs: return error values, not NULL") Signed-off-by: Ronald Tschalär <ronald@innovation.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers: base: power: add proper SPDX identifiers on files that did not have them.
There were a few files in the driver core power code that did not have
SPDX identifiers on them, so fix that up. At the same time, remove the
"free form" text that specified the license of the file, as that is
impossible for any tool to properly parse.
Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
arch_topology: Make cpu_capacity sysfs node as read-only
If user updates any cpu's cpu_capacity, then the new value is going to
be applied to all its online sibling cpus. But this need not to be correct
always, as sibling cpus (in ARM, same micro architecture cpus) would have
different cpu_capacity with different performance characteristics.
So, updating the user supplied cpu_capacity to all cpu siblings
is not correct.
And another problem is, current code assumes that 'all cpus in a cluster
or with same package_id (core_siblings), would have same cpu_capacity'.
But with commit '5bdd2b3f0f8 ("arm64: topology: add support to remove
cpu topology sibling masks")', when a cpu hotplugged out, the cpu
information gets cleared in its sibling cpus. So, user supplied
cpu_capacity would be applied to only online sibling cpus at the time.
After that, if any cpu hotplugged in, it would have different cpu_capacity
than its siblings, which breaks the above assumption.
So, instead of mucking around the core sibling mask for user supplied
value, use device-tree to set cpu capacity. And make the cpu_capacity
node as read-only to know the asymmetry between cpus in the system.
While at it, remove cpu_scale_mutex usage, which used for sysfs write
protection.
Keith Busch [Mon, 11 Mar 2019 20:56:06 +0000 (14:56 -0600)]
doc/mm: New documentation for memory performance
Platforms may provide system memory where some physical address ranges
perform differently than others, or is cached by the system on the
memory side.
Add documentation describing a high level overview of such systems and the
perforamnce and caching attributes the kernel provides for applications
wishing to query this information.
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Tested-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Keith Busch [Mon, 11 Mar 2019 20:56:04 +0000 (14:56 -0600)]
acpi/hmat: Register performance attributes
Save the best performance access attributes and register these with the
memory's node if HMAT provides the locality table. While HMAT does make
it possible to know performance for all possible initiator-target
pairings, we export only the local pairings at this time.
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Tested-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Keith Busch [Mon, 11 Mar 2019 20:56:03 +0000 (14:56 -0600)]
acpi/hmat: Register processor domain to its memory
If the HMAT Subsystem Address Range provides a valid processor proximity
domain for a memory domain, or a processor domain matches the performance
access of the valid processor proximity domain, register the memory
target with that initiator so this relationship will be visible under
the node's sysfs directory.
Since HMAT requires valid address ranges have an equivalent SRAT entry,
verify each memory target satisfies this requirement.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Brice Goglin <Brice.Goglin@inria.fr> Tested-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Keith Busch [Mon, 11 Mar 2019 20:56:02 +0000 (14:56 -0600)]
node: Add memory-side caching attributes
System memory may have caches to help improve access speed to frequently
requested address ranges. While the system provided cache is transparent
to the software accessing these memory ranges, applications can optimize
their own access based on cache attributes.
Provide a new API for the kernel to register these memory-side caches
under the memory node that provides it.
The new sysfs representation is modeled from the existing cpu cacheinfo
attributes, as seen from /sys/devices/system/cpu/<cpu>/cache/. Unlike CPU
cacheinfo though, the node cache level is reported from the view of the
memory. A higher level number is nearer to the CPU, while lower levels
are closer to the last level memory.
The exported attributes are the cache size, the line size, associativity
indexing, and write back policy, and add the attributes for the system
memory caches to sysfs stable documentation.
Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Brice Goglin <Brice.Goglin@inria.fr> Tested-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Keith Busch [Mon, 11 Mar 2019 20:56:01 +0000 (14:56 -0600)]
node: Add heterogenous memory access attributes
Heterogeneous memory systems provide memory nodes with different latency
and bandwidth performance attributes. Provide a new kernel interface
for subsystems to register the attributes under the memory target
node's initiator access class. If the system provides this information,
applications may query these attributes when deciding which node to
request memory.
The following example shows the new sysfs hierarchy for a node exporting
performance attributes:
The bandwidth is exported as MB/s and latency is reported in
nanoseconds. The values are taken from the platform as reported by the
manufacturer.
Memory accesses from an initiator node that is not one of the memory's
access "Z" initiator nodes linked in the same directory may observe
different performance than reported here. When a subsystem makes use
of this interface, initiators of a different access number may not have
the same performance relative to initiators in other access numbers, or
omitted from the any access class' initiators.
Descriptions for memory access initiator performance access attributes
are added to sysfs stable documentation.
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Keith Busch [Mon, 11 Mar 2019 20:56:00 +0000 (14:56 -0600)]
node: Link memory nodes to their compute nodes
Systems may be constructed with various specialized nodes. Some nodes
may provide memory, some provide compute devices that access and use
that memory, and others may provide both. Nodes that provide memory are
referred to as memory targets, and nodes that can initiate memory access
are referred to as memory initiators.
Memory targets will often have varying access characteristics from
different initiators, and platforms may have ways to express those
relationships. In preparation for these systems, provide interfaces for
the kernel to export the memory relationship among different nodes memory
targets and their initiators with symlinks to each other.
If a system provides access locality for each initiator-target pair, nodes
may be grouped into ranked access classes relative to other nodes. The
new interface allows a subsystem to register relationships of varying
classes if available and desired to be exported.
A memory initiator may have multiple memory targets in the same access
class. The target memory's initiators in a given class indicate the
nodes access characteristics share the same performance relative to other
linked initiator nodes. Each target within an initiator's access class,
though, do not necessarily perform the same as each other.
A memory target node may have multiple memory initiators. All linked
initiators in a target's class have the same access characteristics to
that target.
The following example show the nodes' new sysfs hierarchy for a memory
target node 'Y' with access class 0 from initiator node 'X':
Keith Busch [Mon, 11 Mar 2019 20:55:59 +0000 (14:55 -0600)]
acpi/hmat: Parse and report heterogeneous memory
Systems may provide different memory types and export this information
in the ACPI Heterogeneous Memory Attribute Table (HMAT). Parse these
tables provided by the platform and report the memory access and caching
attributes to the kernel messages.
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Tested-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Keith Busch [Mon, 11 Mar 2019 20:55:58 +0000 (14:55 -0600)]
acpi: Add HMAT to generic parsing tables
The Heterogeneous Memory Attribute Table (HMAT) header has different
field lengths than the existing parsing uses. Add the HMAT type to the
parsing rules so it may be generically parsed.
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Tested-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Keith Busch [Mon, 11 Mar 2019 20:55:57 +0000 (14:55 -0600)]
acpi: Create subtable parsing infrastructure
Parsing entries in an ACPI table had assumed a generic header
structure. There is no standard ACPI header, though, so less common
layouts with different field sizes required custom parsers to go through
their subtable entry list.
Create the infrastructure for adding different table types so parsing
the entries array may be more reused for all ACPI system tables and
the common code doesn't need to be duplicated.
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Tested-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
syzbot is hitting use-after-free bug in uinput module [1]. This is because
kobject_uevent(KOBJ_REMOVE) is called again due to commit 0f4dafc0563c6c49
("Kobject: auto-cleanup on final unref") after memory allocation fault
injection made kobject_uevent(KOBJ_REMOVE) from device_del() from
input_unregister_device() fail, while uinput_destroy_device() is expecting
that kobject_uevent(KOBJ_REMOVE) is not called after device_del() from
input_unregister_device() completed.
That commit intended to catch cases where nobody even attempted to send
"remove" uevents. But there is no guarantee that an event will ultimately
be sent. We are at the point of no return as far as the rest of the kernel
is concerned; there are no repeats or do-overs.
Also, it is not clear whether some subsystem depends on that commit.
If no subsystem depends on that commit, it will be better to remove
the state_{add,remove}_uevent_sent logic. But we don't want to risk
a regression (in a patch which will be backported) by trying to remove
that logic. Therefore, as a first step, let's avoid the use-after-free bug
by making sure that kobject_uevent(KOBJ_REMOVE) won't be triggered twice.
driver: base: Disable CONFIG_UEVENT_HELPER by default
Since commit 7934779a69f1184f ("Driver-Core: disable /sbin/hotplug by
default"), the help text for the /sbin/hotplug fork-bomb says
"This should not be used today [...] creates a high system load, or
[...] out-of-memory situations during bootup". The rationale for this
was that no recent mainstream system used this anymore (in 2010!).
A few years later, the complete uevent helper support was made optional
in commit 86d56134f1b67d0c ("kobject: Make support for uevent_helper
optional."). However, if was still left enabled by default, to support
ancient userland.
Time passed by, and nothing should use this anymore, so it can be
disabled by default.
struct device is big, around 760 bytes on x86_64. It's not a critical
structure, but it is embedded everywhere, so making it smaller is always
a good thing.
With a recent patch that moved a field from struct device to the private
structure, some benchmarks showed a very odd regression, despite this
structure having nothing to do with those benchmarks. That caused me to
look into the layout of the structure. Using 'pahole', it showed a
number of holes and ways that the structure could be reordered in order
to align some cachelines better, as well as reduce the size of the
overall structure.
Move 'struct kobj' to the start of the structure, to keep that access
in the first cacheline, and try to organize things a bit more compactly
where possible
By doing these few moves, the result removes at least 8 bytes from
'struct device' on a 64bit system. Given we know there are systems with
at least 30k devices in memory at once, every little byte counts, and
this change could be a savings of 240k of kernel memory for them. On
"normal" systems the overall memory savings would be much less.
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus Torvalds [Sun, 31 Mar 2019 15:55:59 +0000 (08:55 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"A collection of x86 and ARM bugfixes, and some improvements to
documentation.
On top of this, a cleanup of kvm_para.h headers, which were exported
by some architectures even though they not support KVM at all. This is
responsible for all the Kbuild changes in the diffstat"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
Documentation: kvm: clarify KVM_SET_USER_MEMORY_REGION
KVM: doc: Document the life cycle of a VM and its resources
KVM: selftests: complete IO before migrating guest state
KVM: selftests: disable stack protector for all KVM tests
KVM: selftests: explicitly disable PIE for tests
KVM: selftests: assert on exit reason in CR4/cpuid sync test
KVM: x86: update %rip after emulating IO
x86/kvm/hyper-v: avoid spurious pending stimer on vCPU init
kvm/x86: Move MSR_IA32_ARCH_CAPABILITIES to array emulated_msrs
KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts
kvm: don't redefine flags as something else
kvm: mmu: Used range based flushing in slot_handle_level_range
KVM: export <linux/kvm_para.h> and <asm/kvm_para.h> iif KVM is supported
KVM: x86: remove check on nr_mmu_pages in kvm_arch_commit_memory_region()
kvm: nVMX: Add a vmentry check for HOST_SYSENTER_ESP and HOST_SYSENTER_EIP fields
KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)
KVM: Reject device ioctls from processes other than the VM's creator
KVM: doc: Fix incorrect word ordering regarding supported use of APIs
KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size'
KVM: nVMX: Do not inherit quadrant and invalid for the root shadow EPT
...
Linus Torvalds [Sun, 31 Mar 2019 15:37:04 +0000 (08:37 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling fixes from Thomas Gleixner:
"Core libraries:
- Fix max perf_event_attr.precise_ip detection.
- Fix parser error for uncore event alias
- Fixup ordering of kernel maps after obtaining the main kernel map
address.
Intel PT:
- Fix TSC slip where A TSC packet can slip past MTC packets so that
the timestamp appears to go backwards.
- Fixes for exported-sql-viewer GUI conversion to python3.
ARM coresight:
- Fix the build by adding a missing case value for enumeration value
introduced in newer library, that now is the required one.
tool headers:
- Syncronize kernel headers with the kernel, getting new io_uring and
pidfd_send_signal syscalls so that 'perf trace' can handle them"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf pmu: Fix parser error for uncore event alias
perf scripts python: exported-sql-viewer.py: Fix python3 support
perf scripts python: exported-sql-viewer.py: Fix never-ending loop
perf machine: Update kernel map address and re-order properly
tools headers uapi: Sync powerpc's asm/kvm.h copy with the kernel sources
tools headers: Update x86's syscall_64.tbl and uapi/asm-generic/unistd
tools headers uapi: Update drm/i915_drm.h
tools arch x86: Sync asm/cpufeatures.h with the kernel sources
tools headers uapi: Sync linux/fcntl.h to get the F_SEAL_FUTURE_WRITE addition
tools headers uapi: Sync asm-generic/mman-common.h and linux/mman.h
perf evsel: Fix max perf_event_attr.precise_ip detection
perf intel-pt: Fix TSC slip
perf cs-etm: Add missing case value
Linus Torvalds [Sun, 31 Mar 2019 15:22:12 +0000 (08:22 -0700)]
Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU hotplug fixes from Thomas Gleixner:
"Two SMT/hotplug related fixes:
- Prevent crash when HOTPLUG_CPU is disabled and the CPU bringup
aborts. This is triggered with the 'nosmt' command line option, but
can happen by any abort condition. As the real unplug code is not
compiled in, prevent the fail by keeping the CPU in zombie state.
- Enforce HOTPLUG_CPU for SMP on x86 to avoid the above situation
completely. With 'nosmt' being a popular option it's required to
unplug the half brought up sibling CPUs (due to the MCE wreckage)
completely"
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y
cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n
Linus Torvalds [Sun, 31 Mar 2019 14:47:21 +0000 (07:47 -0700)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Thomas Gleixner:
"A small set of core updates:
- Make the watchdog respect the selected CPU mask again. That was
broken by the rework of the watchdog thread management and caused
inconsistent state and NMI watchdog being unstoppable.
- Ensure that the objtool build can find the libelf location.
- Remove dead kcore stub code"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
watchdog: Respect watchdog cpumask on CPU hotplug
objtool: Query pkg-config for libelf location
proc/kcore: Remove unused kclist_add_remap()
Linus Torvalds [Sun, 31 Mar 2019 14:44:13 +0000 (07:44 -0700)]
Merge tag 'powerpc-5.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Three non-regression fixes.
- Our optimised memcmp could read past the end of one of the buffers
and potentially trigger a page fault leading to an oops.
- Some of our code to read energy management data on PowerVM had an
endian bug leading to bogus results.
- When reporting a machine check exception we incorrectly reported
TLB multihits as D-Cache multhits due to a missing entry in the
array of causes.
Thanks to: Chandan Rajendra, Gautham R. Shenoy, Mahesh Salgaonkar,
Segher Boessenkool, Vaidyanathan Srinivasan"
* tag 'powerpc-5.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries/mce: Fix misleading print for TLB mutlihit
powerpc/pseries/energy: Use OF accessor functions to read ibm,drc-indexes
powerpc/64: Fix memcmp reading past the end of src/dest
Linus Torvalds [Sun, 31 Mar 2019 14:42:39 +0000 (07:42 -0700)]
Merge tag 'dmaengine-fix-5.1-rc3' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
- Revert "dmaengine: stm32-mdma: Add a check on read_u32_array" as that
caused regression
- Fix MAINTAINER file uniphier-mdmac.c file path
* tag 'dmaengine-fix-5.1-rc3' of git://git.infradead.org/users/vkoul/slave-dma:
MAINTAINERS: Fix uniphier-mdmac.c file path
dmaengine: stm32-mdma: Revert "dmaengine: stm32-mdma: Add a check on read_u32_array"
Linus Torvalds [Sat, 30 Mar 2019 19:12:56 +0000 (12:12 -0700)]
Merge tag 'led-fixes-for-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds
Pull LED fixes from Jacek Anaszewski:
- fix refcnt leak on interface rename
- use memcpy in device_name_store() to avoid including garbage from a
previous, longer value in the device_name
- fix a potential NULL pointer dereference in case of_match_device()
cannot find a match
* tag 'led-fixes-for-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
leds: trigger: netdev: use memcpy in device_name_store
leds: pca9532: fix a potential NULL pointer dereference
leds: trigger: netdev: fix refcnt leak on interface rename
Linus Torvalds [Sat, 30 Mar 2019 18:33:34 +0000 (11:33 -0700)]
Merge tag 'gpio-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"As you can see [in the git history] I was away on leave and Bartosz
kindly stepped in and collected a slew of fixes, I pulled them into my
tree in two sets and merged some two more fixes (fixing my own caused
bugs) on top.
Summary:
- Revert the extended use of gpio_set_config() and think about how we
can do this properly.
- Fix up the SPI CS GPIO handling so it now works properly on the SPI
bus children, as intended.
- Error paths and driver fixes"
* tag 'gpio-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: mockup: use simple_read_from_buffer() in debugfs read callback
gpio: of: Fix of_gpiochip_add() error path
gpio: of: Check for "spi-cs-high" in child instead of parent node
gpio: of: Check propname before applying "cs-gpios" quirks
gpio: mockup: fix debugfs read
Revert "gpio: use new gpio_set_config() helper in more places"
gpio: aspeed: fix a potential NULL pointer dereference
gpio: amd-fch: Fix bogus SPDX identifier
gpio: adnp: Fix testing wrong value in adnp_gpio_direction_input
gpio: exar: add a check for the return value of ida_simple_get fails
Rasmus Villemoes [Thu, 14 Mar 2019 14:06:14 +0000 (15:06 +0100)]
leds: trigger: netdev: use memcpy in device_name_store
If userspace doesn't end the input with a newline (which can easily
happen if the write happens from a C program that does write(fd,
iface, strlen(iface))), we may end up including garbage from a
previous, longer value in the device_name. For example
I highly doubt anybody is relying on this behaviour, so switch to
simply copying the bytes (we've already checked that size is <
IFNAMSIZ) and unconditionally zero-terminate it; of course, we also
still have to strip a trailing newline.
This is also preparation for future patches.
Fixes: 06f502f57d0d ("leds: trigger: Introduce a NETDEV trigger") Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Linus Torvalds [Sat, 30 Mar 2019 17:35:20 +0000 (10:35 -0700)]
Merge tag 'staging-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for 5.1-rc3, and one driver
removal.
The biggest thing here is the removal of the mt7621-eth driver as a
"real" network driver was merged in 5.1-rc1 for this hardware, so this
old driver can now be removed.
Other than that, there are just a number of small fixes, all resolving
reported issues and some potential corner cases for error handling
paths.
All of these have been in linux-next with no reported issues"
* tag 'staging-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: vt6655: Remove vif check from vnt_interrupt
staging: erofs: keep corrupted fs from crashing kernel in erofs_readdir()
staging: octeon-ethernet: fix incorrect PHY mode
staging: vc04_services: Fix an error code in vchiq_probe()
staging: erofs: fix error handling when failed to read compresssed data
staging: vt6655: Fix interrupt race condition on device start up.
staging: rtlwifi: Fix potential NULL pointer dereference of kzalloc
staging: rtl8712: uninitialized memory in read_bbreg_hdl()
staging: rtlwifi: rtl8822b: fix to avoid potential NULL pointer dereference
staging: rtl8188eu: Fix potential NULL pointer dereference of kcalloc
staging, mt7621-pci: fix build without pci support
staging: speakup_soft: Fix alternate speech with other synths
staging: axis-fifo: add CONFIG_OF dependency
staging: olpc_dcon_xo_1: add missing 'const' qualifier
staging: comedi: ni_mio_common: Fix divide-by-zero for DIO cmdtest
staging: erofs: fix to handle error path of erofs_vmap()
staging: mt7621-dts: update ethernet settings.
staging: remove mt7621-eth
Linus Torvalds [Sat, 30 Mar 2019 17:30:38 +0000 (10:30 -0700)]
Merge tag 'tty-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small tty and serial driver fixes for 5.1-rc3.
Nothing major here, just a number of potential problems fixes for
error handling paths, as well as some other minor bugfixes for
reported issues with 5.1-rc1.
All of these have been in linux-next with no reported issues"
* tag 'tty-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: fix NULL pointer issue when tty_port ops is not set
Disable kgdboc failed by echo space to /sys/module/kgdboc/parameters/kgdboc
dt-bindings: serial: Add compatible for Mediatek MT8183
tty/serial: atmel: RS485 HD w/DMA: enable RX after TX is stopped
tty/serial: atmel: Add is_half_duplex helper
serial: sh-sci: Fix setting SCSCR_TIE while transferring data
serial: ar933x_uart: Fix build failure with disabled console
tty: serial: qcom_geni_serial: Initialize baud in qcom_geni_console_setup
sc16is7xx: missing unregister/delete driver on error in sc16is7xx_init()
tty: mxs-auart: fix a potential NULL pointer dereference
tty: atmel_serial: fix a potential NULL pointer dereference
serial: max310x: Fix to avoid potential NULL pointer dereference
serial: mvebu-uart: Fix to avoid a potential NULL pointer dereference
Linus Torvalds [Sat, 30 Mar 2019 17:26:36 +0000 (10:26 -0700)]
Merge tag 'usb-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes for 5.1-rc3.
Nothing major at all here, just a small collection of fixes for
reported issues, and potential problems with error handling paths.
Also a few new device ids, as normal.
All of these have been in linux-next with no reported issues"
* tag 'usb-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (25 commits)
USB: serial: option: add Olicard 600
USB: serial: cp210x: add new device id
usb: u132-hcd: fix resource leak
usb: cdc-acm: fix race during wakeup blocking TX traffic
usb: mtu3: fix EXTCON dependency
usb: usb251xb: fix to avoid potential NULL pointer dereference
usb: core: Try generic PHY_MODE_USB_HOST if usb_phy_roothub_set_mode fails
phy: sun4i-usb: Support set_mode to USB_HOST for non-OTG PHYs
xhci: Don't let USB3 ports stuck in polling state prevent suspend
usb: xhci: dbc: Don't free all memory with spinlock held
xhci: Fix port resume done detection for SS ports with LPM enabled
USB: serial: mos7720: fix mos_parport refcount imbalance on error path
USB: gadget: f_hid: fix deadlock in f_hidg_write()
usb: gadget: net2272: Fix net2272_dequeue()
usb: gadget: net2280: Fix net2280_dequeue()
usb: gadget: net2280: Fix overrun of OUT messages
usb: dwc3: pci: add support for Comet Lake PCH ID
usb: usb251xb: Remove unnecessary comparison of unsigned integer with >= 0
usb: common: Consider only available nodes for dr_mode
usb: typec: tcpm: Try PD-2.0 if sink does not respond to 3.0 source-caps
...
Linus Torvalds [Sat, 30 Mar 2019 17:09:11 +0000 (10:09 -0700)]
Merge tag 'acpi-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"This corrects a previous attempt to make Linux use its own set of ACPI
debug flags different from the upstream ACPICA's default (Erik
Schmauss)"
* tag 'acpi-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: use different default debug value than ACPICA
Linus Torvalds [Sat, 30 Mar 2019 17:06:09 +0000 (10:06 -0700)]
Merge tag 'pm-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix CPU base frequency reporting in the intel_pstate driver and
a use-after-free in the scpi-cpufreq driver.
Specifics:
- Fix the ACPI CPPC library to actually follow the specification when
decoding the guaranteed performance register information and make
the intel_pstate driver to fall back to the nominal frequency when
reporting the base frequency if the guaranteed performance register
information is not there (Srinivas Pandruvada).
- Fix use-after-free in the exit callback of the scpi-cpufreq left
after an update during the 5.0 development cycle (Vincent Stehlé)"
* tag 'pm-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: scpi: Fix use after free
cpufreq: intel_pstate: Also use CPPC nominal_perf for base_frequency
ACPI / CPPC: Fix guaranteed performance handling
Linus Torvalds [Sat, 30 Mar 2019 16:19:09 +0000 (09:19 -0700)]
Merge branch 'fixes-v5.1-a' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security layer fixes from James Morris:
"Yama and LSM config fixes"
* 'fixes-v5.1-a' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
LSM: Revive CONFIG_DEFAULT_SECURITY_* for "make oldconfig"
Yama: mark local symbols as static
Linus Torvalds [Fri, 29 Mar 2019 23:02:28 +0000 (16:02 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"22 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links
fs: fs_parser: fix printk format warning
checkpatch: add %pt as a valid vsprintf extension
mm/migrate.c: add missing flush_dcache_page for non-mapped page migrate
drivers/block/zram/zram_drv.c: fix idle/writeback string compare
mm/page_isolation.c: fix a wrong flag in set_migratetype_isolate()
mm/memory_hotplug.c: fix notification in offline error path
ptrace: take into account saved_sigmask in PTRACE{GET,SET}SIGMASK
fs/proc/kcore.c: make kcore_modules static
include/linux/list.h: fix list_is_first() kernel-doc
mm/debug.c: fix __dump_page when mapping->host is not set
mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
include/linux/hugetlb.h: convert to use vm_fault_t
iommu/io-pgtable-arm-v7s: request DMA32 memory, and improve debugging
mm: add support for kmem caches in DMA32 zone
ocfs2: fix inode bh swapping mixup in ocfs2_reflink_inodes_lock
mm/hotplug: fix offline undo_isolate_page_range()
fs/open.c: allow opening only regular files during execve()
mailmap: add Changbin Du
mm/debug.c: add a cast to u64 for atomic64_read()
...
Linus Torvalds [Fri, 29 Mar 2019 22:44:11 +0000 (15:44 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Use memblock_alloc() instead of memblock_alloc_low() in
request_standard_resources(), the latter being limited to the low 4G
memory range on arm64"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: replace memblock_alloc_low with memblock_alloc
Linus Torvalds [Fri, 29 Mar 2019 22:37:10 +0000 (15:37 -0700)]
Merge tag 'iommu-fixes-v5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
- Fix a bug in the AMD IOMMU driver not handling exclusion ranges
correctly. In fact the driver did not reserve these ranges for IOVA
allocations, so that dma-handles could be allocated in an exclusion
range, leading to data corruption. Exclusion ranges have not been
used by any firmware up to now, so this issue remained undiscovered
for quite some time.
- Fix wrong warning messages that the IOMMU core code prints when it
tries to allocate the default domain for an iommu group and the
driver does not support any of the default domain types (like Intel
VT-d).
* tag 'iommu-fixes-v5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Reserve exclusion range in iova-domain
iommu: Don't print warning when IOMMU driver only supports unmanaged domains
Linus Torvalds [Fri, 29 Mar 2019 22:07:29 +0000 (15:07 -0700)]
Merge tag 'driver-core-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here is a single driver core patch for 5.1-rc3.
After 5.1-rc1, all of the users of BUS_ATTR() are finally removed, so
we can now drop this macro from include/linux/device.h so that no more
new users will be created.
This patch has been in linux-next for a while, with no reported
issues"
* tag 'driver-core-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: remove BUS_ATTR()
Linus Torvalds [Fri, 29 Mar 2019 22:03:30 +0000 (15:03 -0700)]
Merge tag 'char-misc-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some binder, habanalabs, and vboxguest driver fixes for
5.1-rc3.
The Binder fixes resolve some reported issues found by testing, first
by the selinux developers, and then earlier today by syzbot.
The habanalabs fixes are all minor, resolving a number of tiny things.
The vboxguest patches are a bit larger. They resolve the fact that
virtual box decided to change their api in their latest release in a
way that broke the existing kernel code, despite saying that they were
never going to do that. So this is a bit of a "new feature", but is
good to get merged so that 5.1 will work with the latest release. The
changes are not large and of course virtual box "swears" they will not
break this again, but no one is holding their breath here.
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
virt: vbox: Implement passing requestor info to the host for VirtualBox 6.0.x
binder: fix race between munmap() and direct reclaim
binder: fix BUG_ON found by selinux-testsuite
habanalabs: cast to expected type
habanalabs: prevent host crash during suspend/resume
habanalabs: perform accounting for active CS
habanalabs: fix mapping with page size bigger than 4KB
habanalabs: complete user context cleanup before hard reset
habanalabs: fix bug when mapping very large memory area
habanalabs: fix MMU number of pages calculation
Linus Torvalds [Fri, 29 Mar 2019 21:58:49 +0000 (14:58 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Thirteen fixes, seven of which are for IBM fibre channel and three
additional for fairly serious bugs in drivers (qla2xxx, mpt3sas,
aacraid).
Of the three core fixes, the most significant is probably the missed
run queue causing an indefinite hang. The others are fixing a
potential use after free on device close and silencing an incorrect
warning"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ibmvfc: Clean up transport events
scsi: ibmvfc: Byte swap status and error codes when logging
scsi: ibmvfc: Add failed PRLI to cmd_status lookup array
scsi: ibmvfc: Remove "failed" from logged errors
scsi: zfcp: reduce flood of fcrscn1 trace records on multi-element RSCN
scsi: zfcp: fix scsi_eh host reset with port_forced ERP for non-NPIV FCP devices
scsi: zfcp: fix rport unblock if deleted SCSI devices on Scsi_Host
scsi: sd: Quiesce warning if device does not report optimal I/O size
scsi: sd: Fix a race between closing an sd device and sd I/O
scsi: core: Run queue when state is set to running after being blocked
scsi: qla4xxx: fix a potential NULL pointer dereference
scsi: aacraid: Insure we don't access PCIe space during AER/EEH
scsi: mpt3sas: Fix kernel panic during expander reset
Linus Torvalds [Fri, 29 Mar 2019 21:56:53 +0000 (14:56 -0700)]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"A new ID for the i801 driver and some Documentation fixes to make it
easier for people to find the bindings (which is also a basis for
further improvements in that area)"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: wmt: make bindings file name match the driver
i2c: sun6i-p2wi: make bindings file name match the driver
i2c: stu300: make bindings file name match the driver
i2c: mt65xx: make bindings file name match the driver
i2c: iop3xx: make bindings file name match the driver
i2c: i801: Add support for Intel Comet Lake
Linus Torvalds [Fri, 29 Mar 2019 21:53:33 +0000 (14:53 -0700)]
Merge tag 'sound-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The important fixes at this time are a couple fixes in ALSA core: a
fix for PCM is about the OOB access in PCM OSS plugins that has been
for long time, but hasn't hit so often until now just because we
allocated a large buffer via vmalloc(), and surfaced more often after
switching to kvmalloc(). Another fix is for a long-standing PCM
problem wrt racy PM resume.
Others are trivial nospec coverage and usual HD-audio quirks"
* tag 'sound-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Fix speakers on Acer Predator Helios 500 Ryzen laptops
ALSA: pcm: Don't suspend stream in unrecoverable PCM state
ALSA: hda/ca0132 - Simplify alt firmware loading code
ALSA: pcm: Fix possible OOB access in PCM oss plugins
ALSA: hda/realtek: Enable headset MIC of ASUS X430UN and X512DK with ALC256
ALSA: hda/realtek: Enable headset mic of ASUS P5440FF with ALC256
ALSA: hda/realtek: Enable ASUS X441MB and X705FD headset MIC with ALC256
ALSA: hda/realtek - Add support for Acer Aspire E5-523G/ES1-432 headset mic
ALSA: hda/realtek: Enable headset MIC of Acer Aspire Z24-890 with ALC286
ALSA: seq: oss: Fix Spectre v1 vulnerability
ALSA: rawmidi: Fix potential Spectre v1 vulnerability
Linus Torvalds [Fri, 29 Mar 2019 21:46:00 +0000 (14:46 -0700)]
Merge tag 'kbuild-fixes-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Remove harmful -Oz option of Clang
- Get back the original behavior (no recursion for in-tree build) for
GNU Make 4.x
- Some minor fixes for coccinelle patches
- Do not overwrite .gitignore in the output directory in case it is
version-controlled
- Fix missed record-mcount bug for dynamic ftrace
- Fix endianness bug in modversions for relative CRC
- Cater to '^H' key code in Kconfig ncurses programs
* tag 'kbuild-fixes-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig/[mn]conf: handle backspace (^H) key
kbuild: modversions: Fix relative CRC byte order interpretation
scripts: coccinelle: Fix description of badty.cocci
kbuild: strip whitespace in cmd_record_mcount findstring
kbuild: do not overwrite .gitignore in output directory
kbuild: skip parsing pre sub-make code for recursion
coccinelle: put_device: reduce false positives
kbuild: skip sub-make for in-tree build with GNU Make 4.x
Revert "kbuild: use -Oz instead of -Os when using clang"
Linus Torvalds [Fri, 29 Mar 2019 21:43:07 +0000 (14:43 -0700)]
Merge tag 'for-linus-20190329' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Small set of fixes that should go into this series. This contains:
- compat signal mask fix for io_uring (Arnd)
- EAGAIN corner case for direct vs buffered writes for io_uring
(Roman)
- NVMe pull request from Christoph with various little fixes
- sbitmap ws_active fix, which caused a perf regression for shared
tags (me)
- sbitmap bit ordering fix (Ming)
- libata on-stack DMA fix (Raymond)"
* tag 'for-linus-20190329' of git://git.kernel.dk/linux-block:
nvmet: fix error flow during ns enable
nvmet: fix building bvec from sg list
nvme-multipath: relax ANA state check
nvme-tcp: fix an endianess miss-annotation
libata: fix using DMA buffers on stack
io_uring: offload write to async worker in case of -EAGAIN
sbitmap: order READ/WRITE freed instance and setting clear bit
blk-mq: fix sbitmap ws_active for shared tags
io_uring: fix big-endian compat signal mask handling
blk-mq: update comment for blk_mq_hctx_has_pending()
blk-mq: use blk_mq_put_driver_tag() to put tag
Linus Torvalds [Fri, 29 Mar 2019 21:41:09 +0000 (14:41 -0700)]
Merge tag 'ceph-for-5.1-rc3' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A patch to avoid choking on multipage bvecs in the messenger and a
small use-after-free fix"
* tag 'ceph-for-5.1-rc3' of git://github.com/ceph/ceph-client:
ceph: fix use-after-free on symlink traversal
libceph: fix breakage caused by multipage bvecs
Linus Torvalds [Fri, 29 Mar 2019 21:36:57 +0000 (14:36 -0700)]
Merge tag 'xfs-5.1-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"Here are a few fixes for some corruption bugs and uninitialized
variable problems. The few patches here have gone through a few days
worth of fstest runs with no new problems observed.
Changes since last update:
- Fix a bunch of static checker complaints about uninitialized
variables and insufficient range checks.
- Avoid a crash when incore extent map data are corrupt.
- Disallow FITRIM when we haven't recovered the log and know the
metadata are stale.
- Fix a data corruption when doing unaligned overlapping dio writes"
* tag 'xfs-5.1-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: serialize unaligned dio writes against all other dio writes
xfs: prohibit fstrim in norecovery mode
xfs: always init bma in xfs_bmapi_write
xfs: fix btree scrub checking with regards to root-in-inode
xfs: dabtree scrub needs to range-check level
xfs: don't trip over uninitialized buffer on extent read of corrupted inode
Kees Cook [Fri, 29 Mar 2019 19:36:04 +0000 (12:36 -0700)]
LSM: Revive CONFIG_DEFAULT_SECURITY_* for "make oldconfig"
Commit 70b62c25665f636c ("LoadPin: Initialize as ordered LSM") removed
CONFIG_DEFAULT_SECURITY_{SELINUX,SMACK,TOMOYO,APPARMOR,DAC} from
security/Kconfig and changed CONFIG_LSM to provide a fixed ordering as a
default value. That commit expected that existing users (upgrading from
Linux 5.0 and earlier) will edit CONFIG_LSM value in accordance with
their CONFIG_DEFAULT_SECURITY_* choice in their old kernel configs. But
since users might forget to edit CONFIG_LSM value, this patch revives
the choice (only for providing the default value for CONFIG_LSM) in order
to make sure that CONFIG_LSM reflects CONFIG_DEFAULT_SECURITY_* from their
old kernel configs.
Note that since TOMOYO can be fully stacked against the other legacy
major LSMs, when it is selected, it explicitly disables the other LSMs
to avoid them also initializing since TOMOYO does not expect this
currently.
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Fixes: 70b62c25665f636c ("LoadPin: Initialize as ordered LSM") Co-developed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: James Morris <james.morris@microsoft.com>
Thomas Gleixner [Fri, 29 Mar 2019 20:28:58 +0000 (21:28 +0100)]
Merge tag 'perf-urgent-for-mingo-5.1-20190329' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo:
Core libraries:
Jiri Olsa:
- Fix max perf_event_attr.precise_ip detection.
Kan Liang:
- Fix parser error for uncore event alias
Wei Lin:
- Fixup ordering of kernel maps after obtaining the main kernel map address.
Intel PT:
Adrian Hunter:
- Fix TSC slip where A TSC packet can slip past MTC packets so that the
timestamp appears to go backwards.
- Fixes for exported-sql-viewer GUI conversion to python3.
ARM coresight:
Solomon Tan:
- Fix the build by adding a missing case value for enumeration value introduced
in newer library, that now is the required one.
tool headers:
Arnaldo Carvalho de Melo:
- Syncronize kernel headers with the kernel, getting new io_uring and
pidfd_send_signal syscalls so that 'perf trace' can handle them.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Fri, 29 Mar 2019 18:12:45 +0000 (11:12 -0700)]
Merge tag 'drm-fixes-2019-03-29' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Weekly fixes roundup, nothing two serious, some usb device regressions
are fixed, and i915 GVT has a bigger fix but otherwise not really much
happening here.
core:
- fb bpp check regression fix
- release/unplug fix
- use after free fixes
i915:
- fix mmap range checks
- fix gvt ppgtt mm LRU list access races
- fix selftest error pointer check
- fix a macro definition (pre-emptive for potential further backports)
- fix one AML SKU ULX status
* tag 'drm-fixes-2019-03-29' of git://anongit.freedesktop.org/drm/drm: (22 commits)
drm/i915/icl: Fix VEBOX mismatch BUG_ON()
drm/i915/selftests: Fix an IS_ERR() vs NULL check
drm/i915: Mark AML 0x87CA as ULX
drm/meson: fix TMDS clock filtering for DMT monitors
drm/meson: Uninstall IRQ handler
drm/meson: Fix invalid pointer in meson_drv_unbind()
drm/udl: Refactor edid retrieving in UDL driver (v2)
drm: Fix drm_release() and device unplug
drm/fb: avoid setting 0 depth.
drm/tegra: vic: Fix implicit function declaration warning
drm/tegra: hub: Fix dereference before check
drm/i915/icl: Fix the TRANS_DDI_FUNC_CTL2 bitfield macro
drm/amd/display: Only allow VRR when vrefresh is within supported range
drm/rockchip: vop: reset scale mode when win is disabled
drm/vkms: fix use-after-free when drm_gem_handle_create() fails
drm/vgem: fix use-after-free when drm_gem_handle_create() fails
drm/i915/gvt: Add mutual lock for ppgtt mm LRU list
drm/i915/gvt: Only assign ppgtt root at dispatch time
drm/i915/gvt: Don't submit request for error workload dispatch
drm/i915/gvt: stop scheduling workload when vgpu is inactive
...
A new dir entry can be created in get_subdir and its 'header->parent' is
set to NULL. Only after insert_header success, it will be set to 'dir',
otherwise 'header->parent' is set to NULL and drop_sysctl_table is called.
However in err handling path of get_subdir, drop_sysctl_table also be
called on 'new->header' regardless its value of parent pointer. Then
put_links is called, which triggers NULL-ptr deref when access member of
header->parent.
In fact we have multiple error paths which call drop_sysctl_table() there,
upon failure on insert_links() we also call drop_sysctl_table().And even
in the successful case on __register_sysctl_table() we still always call
drop_sysctl_table().This patch fix it.
Link: http://lkml.kernel.org/r/20190314085527.13244-1-yuehaibing@huawei.com Fixes: 0e47c99d7fe25 ("sysctl: Replace root_list with links between sysctl_table_sets") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> [3.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lars Persson [Fri, 29 Mar 2019 03:44:28 +0000 (20:44 -0700)]
mm/migrate.c: add missing flush_dcache_page for non-mapped page migrate
Our MIPS 1004Kc SoCs were seeing random userspace crashes with SIGILL
and SIGSEGV that could not be traced back to a userspace code bug. They
had all the magic signs of an I/D cache coherency issue.
Now recently we noticed that the /proc/sys/vm/compact_memory interface
was quite efficient at provoking this class of userspace crashes.
Studying the code in mm/migrate.c there is a distinction made between
migrating a page that is mapped at the instant of migration and one that
is not mapped. Our problem turned out to be the non-mapped pages.
For the non-mapped page the code performs a copy of the page content and
all relevant meta-data of the page without doing the required D-cache
maintenance. This leaves dirty data in the D-cache of the CPU and on
the 1004K cores this data is not visible to the I-cache. A subsequent
page-fault that triggers a mapping of the page will happily serve the
process with potentially stale code.
What about ARM then, this bug should have seen greater exposure? Well
ARM became immune to this flaw back in 2010, see commit c01778001a4f
("ARM: 6379/1: Assume new page cache pages have dirty D-cache").
My proposed fix moves the D-cache maintenance inside move_to_new_page to
make it common for both cases.
Link: http://lkml.kernel.org/r/20190315083502.11849-1-larper@axis.com Fixes: 97ee0524614 ("flush cache before installing new page at migraton") Signed-off-by: Lars Persson <larper@axis.com> Reviewed-by: Paul Burton <paul.burton@mips.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Makoto report a below KASAN error: zram does out-of-bounds read. Because
strscpy copies from source up to count bytes unconditionally. It could
cause out-of-bounds read on next object in slab.
To prevent it, use strlcpy which checks source's length automatically.
BUG: KASAN: slab-out-of-bounds in strscpy+0x68/0x154
Read of size 8 at addr ffffffc0c3495a00 by task system_server/1314
..
Call trace:
strscpy+0x68/0x154
idle_store+0xc4/0x34c
dev_attr_store+0x50/0x6c
sysfs_kf_write+0x98/0xb4
kernfs_fop_write+0x198/0x260
__vfs_write+0x10c/0x338
vfs_write+0x114/0x238
SyS_write+0xc8/0x168
__sys_trace_return+0x0/0x4
Allocated by task 1314:
__kmalloc+0x280/0x318
kernfs_fop_write+0xac/0x260
__vfs_write+0x10c/0x338
vfs_write+0x114/0x238
SyS_write+0xc8/0x168
__sys_trace_return+0x0/0x4
The buggy address belongs to the object at ffffffc0c3495a00
which belongs to the cache kmalloc-128 of size 128
The buggy address is located 0 bytes inside of
128-byte region [ffffffc0c3495a00, ffffffc0c3495a80)
The buggy address belongs to the page:
page:ffffffbf030d2500 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0
flags: 0x4000000000010200(slab|head)
page dumped because: kasan: bad access detected
Memory state around the buggy address: ffffffc0c3495900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffc0c3495980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffffffc0c3495a00: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffffffc0c3495a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffffffc0c3495b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Qian Cai [Fri, 29 Mar 2019 03:44:21 +0000 (20:44 -0700)]
mm/page_isolation.c: fix a wrong flag in set_migratetype_isolate()
Due to has_unmovable_pages() taking an incorrect irqsave flag instead of
the isolation flag in set_migratetype_isolate(), there are issues with
HWPOSION and error reporting where dump_page() is not called when there
is an unmovable page.
Link: http://lkml.kernel.org/r/20190320204941.53731-1-cai@lca.pw Fixes: d381c54760dc ("mm: only report isolation failures when offlining memory") Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Qian Cai <cai@lca.pw> Cc: <stable@vger.kernel.org> [5.0.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Qian Cai [Fri, 29 Mar 2019 03:44:16 +0000 (20:44 -0700)]
mm/memory_hotplug.c: fix notification in offline error path
When start_isolate_page_range() returned -EBUSY in __offline_pages(), it
calls memory_notify(MEM_CANCEL_OFFLINE, &arg) with an uninitialized
"arg". As the result, it triggers warnings below. Also, it is only
necessary to notify MEM_CANCEL_OFFLINE after MEM_GOING_OFFLINE.
Link: http://lkml.kernel.org/r/20190320204255.53571-1-cai@lca.pw Fixes: 7960509329c2 ("mm, memory_hotplug: print reason for the offlining failure") Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@vger.kernel.org> [5.0.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrei Vagin [Fri, 29 Mar 2019 03:44:13 +0000 (20:44 -0700)]
ptrace: take into account saved_sigmask in PTRACE{GET,SET}SIGMASK
There are a few system calls (pselect, ppoll, etc) which replace a task
sigmask while they are running in a kernel-space
When a task calls one of these syscalls, the kernel saves a current
sigmask in task->saved_sigmask and sets a syscall sigmask.
On syscall-exit-stop, ptrace traps a task before restoring the
saved_sigmask, so PTRACE_GETSIGMASK returns the syscall sigmask and
PTRACE_SETSIGMASK does nothing, because its sigmask is replaced by
saved_sigmask, when the task returns to user-space.
This patch fixes this problem. PTRACE_GETSIGMASK returns saved_sigmask
if it's set. PTRACE_SETSIGMASK drops the TIF_RESTORE_SIGMASK flag.
Link: http://lkml.kernel.org/r/20181120060616.6043-1-avagin@gmail.com Fixes: 29000caecbe8 ("ptrace: add ability to get/set signal-blocked mask") Signed-off-by: Andrei Vagin <avagin@gmail.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oscar Salvador [Fri, 29 Mar 2019 03:44:01 +0000 (20:44 -0700)]
mm/debug.c: fix __dump_page when mapping->host is not set
While debugging something, I added a dump_page() into do_swap_page(),
and I got the splat from below. The issue happens when dereferencing
mapping->host in __dump_page():
Swap address space does not contain an inode information, and so
mapping->host equals NULL.
Although the dump_page() call was added artificially into
do_swap_page(), I am not sure if we can hit this from any other path, so
it looks worth fixing it. We can easily do that by checking
mapping->host first.
Link: http://lkml.kernel.org/r/20190318072931.29094-1-osalvador@suse.de Fixes: 1c6fb1d89e73c ("mm: print more information about mapping in __dump_page") Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Fri, 29 Mar 2019 03:43:55 +0000 (20:43 -0700)]
mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
When MPOL_MF_STRICT was specified and an existing page was already on a
node that does not follow the policy, mbind() should return -EIO. But
commit 6f4576e3687b ("mempolicy: apply page table walker on
queue_pages_range()") broke the rule.
And commit c8633798497c ("mm: mempolicy: mbind and migrate_pages support
thp migration") didn't return the correct value for THP mbind() too.
If MPOL_MF_STRICT is set, ignore vma_migratable() to make sure it
reaches queue_pages_to_pte_range() or queue_pages_pmd() to check if an
existing page was already on a node that does not follow the policy.
And, non-migratable vma may be used, return -EIO too if MPOL_MF_MOVE or
MPOL_MF_MOVE_ALL was specified.
Tested with https://github.com/metan-ucw/ltp/blob/master/testcases/kernel/syscalls/mbind/mbind02.c
[akpm@linux-foundation.org: tweak code comment] Link: http://lkml.kernel.org/r/1553020556-38583-1-git-send-email-yang.shi@linux.alibaba.com Fixes: 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()") Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Oscar Salvador <osalvador@suse.de> Reported-by: Cyril Hrubis <chrubis@suse.cz> Suggested-by: Kirill A. Shutemov <kirill@shutemov.name> Acked-by: Rafael Aquini <aquini@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: David Rientjes <rientjes@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>> mm/memory.c:3968:21: sparse: incorrect type in assignment (different
>> base types) @@ expected restricted vm_fault_t [usertype] ret @@
>> got e] ret @@
mm/memory.c:3968:21: expected restricted vm_fault_t [usertype] ret
mm/memory.c:3968:21: got int
This patch converts to return vm_fault_t type for hugetlb_fault() when
CONFIG_HUGETLB_PAGE=n.
Regarding the sparse warning, Luc said:
: This is the expected behaviour. The constant 0 is magic regarding bitwise
: types but ({ ...; 0; }) is not, it is just an ordinary expression of type
: 'int'.
:
: So, IMHO, Souptick's patch is the right thing to do.
Link: http://lkml.kernel.org/r/20190318162604.GA31553@jordon-HP-15-Notebook-PC Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicolas Boichat [Fri, 29 Mar 2019 03:43:46 +0000 (20:43 -0700)]
iommu/io-pgtable-arm-v7s: request DMA32 memory, and improve debugging
IOMMUs using ARMv7 short-descriptor format require page tables (level 1
and 2) to be allocated within the first 4GB of RAM, even on 64-bit
systems.
For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32 is
defined (e.g. on arm64 platforms).
For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32. Note that
we do not explicitly pass GFP_DMA[32] to kmem_cache_zalloc, as this is
not strictly necessary, and would cause a warning in mm/sl*b.c, as we
did not update GFP_SLAB_BUG_MASK.
Also, print an error when the physical address does not fit in
32-bit, to make debugging easier in the future.
Link: http://lkml.kernel.org/r/20181210011504.122604-3-drinkcat@chromium.org Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32") Signed-off-by: Nicolas Boichat <drinkcat@chromium.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Hsin-Yi Wang <hsinyi@chromium.org> Cc: Huaisheng Ye <yehs1@lenovo.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sasha Levin <Alexander.Levin@microsoft.com> Cc: Tomasz Figa <tfiga@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Yong Wu <yong.wu@mediatek.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicolas Boichat [Fri, 29 Mar 2019 03:43:42 +0000 (20:43 -0700)]
mm: add support for kmem caches in DMA32 zone
Patch series "iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables",
v6.
This is a followup to the discussion in [1], [2].
IOMMUs using ARMv7 short-descriptor format require page tables (level 1
and 2) to be allocated within the first 4GB of RAM, even on 64-bit
systems.
For L1 tables that are bigger than a page, we can just use
__get_free_pages with GFP_DMA32 (on arm64 systems only, arm would still
use GFP_DMA).
For L2 tables that only take 1KB, it would be a waste to allocate a full
page, so we considered 3 approaches:
1. This series, adding support for GFP_DMA32 slab caches.
2. genalloc, which requires pre-allocating the maximum number of L2 page
tables (4096, so 4MB of memory).
3. page_frag, which is not very memory-efficient as it is unable to reuse
freed fragments until the whole page is freed. [3]
This series is the most memory-efficient approach.
stable@ note:
We confirmed that this is a regression, and IOMMU errors happen on 4.19
and linux-next/master on MT8173 (elm, Acer Chromebook R13). The issue
most likely starts from commit ad67f5a6545f ("arm64: replace ZONE_DMA
with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number of Mediatek
platforms (and maybe others?).
IOMMUs using ARMv7 short-descriptor format require page tables to be
allocated within the first 4GB of RAM, even on 64-bit systems. On arm64,
this is done by passing GFP_DMA32 flag to memory allocation functions.
For IOMMU L2 tables that only take 1KB, it would be a waste to allocate
a full page using get_free_pages, so we considered 3 approaches:
1. This patch, adding support for GFP_DMA32 slab caches.
2. genalloc, which requires pre-allocating the maximum number of L2
page tables (4096, so 4MB of memory).
3. page_frag, which is not very memory-efficient as it is unable
to reuse freed fragments until the whole page is freed.
This change makes it possible to create a custom cache in DMA32 zone using
kmem_cache_create, then allocate memory using kmem_cache_alloc.
We do not create a DMA32 kmalloc cache array, as there are currently no
users of kmalloc(..., GFP_DMA32). These calls will continue to trigger a
warning, as we keep GFP_DMA32 in GFP_SLAB_BUG_MASK.
This implies that calls to kmem_cache_*alloc on a SLAB_CACHE_DMA32
kmem_cache must _not_ use GFP_DMA32 (it is anyway redundant and
unnecessary).
Link: http://lkml.kernel.org/r/20181210011504.122604-2-drinkcat@chromium.org Signed-off-by: Nicolas Boichat <drinkcat@chromium.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Sasha Levin <Alexander.Levin@microsoft.com> Cc: Huaisheng Ye <yehs1@lenovo.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Yong Wu <yong.wu@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Tomasz Figa <tfiga@google.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Hsin-Yi Wang <hsinyi@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Darrick J. Wong [Fri, 29 Mar 2019 03:43:38 +0000 (20:43 -0700)]
ocfs2: fix inode bh swapping mixup in ocfs2_reflink_inodes_lock
ocfs2_reflink_inodes_lock() can swap the inode1/inode2 variables so that
we always grab cluster locks in order of increasing inode number.
Unfortunately, we forget to swap the inode record buffer head pointers
when we've done this, which leads to incorrect bookkeepping when we're
trying to make the two inodes have the same refcount tree.
This has the effect of causing filesystem shutdowns if you're trying to
reflink data from inode 100 into inode 97, where inode 100 already has a
refcount tree attached and inode 97 doesn't. The reflink code decides
to copy the refcount tree pointer from 100 to 97, but uses inode 97's
inode record to open the tree root (which it doesn't have) and blows up.
This issue causes filesystem shutdowns and metadata corruption!
Link: http://lkml.kernel.org/r/20190312214910.GK20533@magnolia Fixes: 29ac8e856cb369 ("ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Qian Cai [Fri, 29 Mar 2019 03:43:34 +0000 (20:43 -0700)]
mm/hotplug: fix offline undo_isolate_page_range()
Commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded
memory to zones until online") introduced move_pfn_range_to_zone() which
calls memmap_init_zone() during onlining a memory block.
memmap_init_zone() will reset pagetype flags and makes migrate type to
be MOVABLE.
However, in __offline_pages(), it also call undo_isolate_page_range()
after offline_isolated_pages() to do the same thing. Due to commit 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") changed
__first_valid_page() to skip offline pages, undo_isolate_page_range()
here just waste CPU cycles looping around the offlining PFN range while
doing nothing, because __first_valid_page() will return NULL as
offline_isolated_pages() has already marked all memory sections within
the pfn range as offline via offline_mem_sections().
Also, after calling the "useless" undo_isolate_page_range() here, it
reaches the point of no returning by notifying MEM_OFFLINE. Those pages
will be marked as MIGRATE_MOVABLE again once onlining. The only thing
left to do is to decrease the number of isolated pageblocks zone counter
which would make some paths of the page allocation slower that the above
commit introduced.
Even if alloc_contig_range() can be used to isolate 16GB-hugetlb pages
on ppc64, an "int" should still be enough to represent the number of
pageblocks there. Fix an incorrect comment along the way.
[cai@lca.pw: v4] Link: http://lkml.kernel.org/r/20190314150641.59358-1-cai@lca.pw Link: http://lkml.kernel.org/r/20190313143133.46200-1-cai@lca.pw Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> [4.13+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo Handa [Fri, 29 Mar 2019 03:43:30 +0000 (20:43 -0700)]
fs/open.c: allow opening only regular files during execve()
syzbot is hitting lockdep warning [1] due to trying to open a fifo
during an execve() operation. But we don't need to open non regular
files during an execve() operation, for all files which we will need are
the executable file itself and the interpreter programs like /bin/sh and
ld-linux.so.2 .
Since the manpage for execve(2) says that execve() returns EACCES when
the file or a script interpreter is not a regular file, and the manpage
for uselib(2) says that uselib() can return EACCES, and we use
FMODE_EXEC when opening for execve()/uselib(), we can bail out if a non
regular file is requested with FMODE_EXEC set.
Since this deadlock followed by khungtaskd warnings is trivially
reproducible by a local unprivileged user, and syzbot's frequent crash
due to this deadlock defers finding other bugs, let's workaround this
deadlock until we get a chance to find a better solution.