Benjamin Poirier [Fri, 27 Sep 2019 10:12:08 +0000 (19:12 +0900)]
staging: qlge: Replace memset with assignment
Instead of clearing the structure wholesale, it is sufficient to initialize
the skb member which is used to manage sbq instances. lbq instances are
managed according to curr_idx and clean_idx.
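As a rough illustration of the difference between the two approaches, here is a minimal userspace sketch. The structure and function names are hypothetical and only loosely modelled on the description above; the real qlge descriptor has different members.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor; not the actual qlge structure. */
struct bq_desc_sketch {
	void *skb;		/* non-NULL while a small buffer is attached */
	unsigned long dma;	/* left untouched by the targeted reset */
	unsigned int index;
};

/* Wholesale clear: every byte of the descriptor is rewritten. */
static void desc_reset_memset(struct bq_desc_sketch *d)
{
	memset(d, 0, sizeof(*d));
}

/* Targeted reset: only the member that tracks ownership is cleared. */
static void desc_reset_assign(struct bq_desc_sketch *d)
{
	d->skb = NULL;
}
```

The targeted version is only safe when, as the message argues, no other member is consulted to decide whether the descriptor is in use.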
Benjamin Poirier [Fri, 27 Sep 2019 10:12:03 +0000 (19:12 +0900)]
staging: qlge: Fix dma_sync_single calls
Using the unmap addr anywhere other than in unmap calls is a misuse of the
dma api. In anticipation of this fix, qlge kept two copies of the dma
address around ;)
Fixes: c4e84bde1d59 ("qlge: New Qlogic 10Gb Ethernet Driver.")
Fixes: 7c734359d350 ("qlge: Size RX buffers based on MTU.")
Fixes: 2c9a266afefe ("qlge: Fix receive packets drop.")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Link: https://lore.kernel.org/r/20190927101210.23856-10-bpoirier@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The qlge driver (and device) uses two kinds of buffers for reception,
so-called "small buffers" and "large buffers". The two are arranged in
rings, the sbq and lbq. These two share similar data structures and code.
Factor out data structures into a common struct qlge_bq, make required
adjustments to code and dedup the most obvious cases of copy/paste.
This patch should not introduce any functional change other than to some of
the printk format strings.
This is unneeded for two reasons:
1) the cpu does not write data for the device in the mapping
2) calls like ..._sync_..._for_device(..., ..._FROMDEVICE) are
nonsensical, see commit 3f0fb4e85b38 ("Documentation/DMA-API-HOWTO.txt:
fix misleading example")
Benjamin Poirier [Fri, 27 Sep 2019 10:12:00 +0000 (19:12 +0900)]
staging: qlge: Remove rx_ring.sbq_buf_size
Tx completion rings have sbq_buf_size = 0 but there's no case where the
code actually tests on that value. We can remove sbq_buf_size and use a
constant instead.
Benjamin Poirier [Fri, 27 Sep 2019 10:11:58 +0000 (19:11 +0900)]
staging: qlge: Deduplicate lbq_buf_size
lbq_buf_size is duplicated to every rx_ring structure whereas lbq_buf_order
is present once in the ql_adapter structure. All rings use the same buf
size, keep only one copy of it. Also factor out the calculation of
lbq_buf_size instead of having two copies.
Benjamin Poirier [Fri, 27 Sep 2019 10:11:56 +0000 (19:11 +0900)]
staging: qlge: Remove irq_cnt
qlge uses an irq enable/disable refcounting scheme that is:
* poorly implemented
Uses a spin_lock to protect accesses to the irq_cnt atomic
variable.
* buggy
Breaks when there is not a 1:1 sequence of irq - napi_poll, such as
when using SO_BUSY_POLL.
* unnecessary
The purpose of irq_cnt is to reduce irq control writes when
multiple work items result from one irq: the irq is re-enabled
after all work is done.
Analysis of the irq handler shows that there is only one case where
there might be two workers scheduled at once, and those have
separate irq masking bits.
Therefore, remove irq_cnt.
Additionally, we get a performance improvement:
perf stat -e cycles -a -r5 super_netperf 100 -H 192.168.33.1 -t TCP_RR
Benjamin Poirier [Fri, 27 Sep 2019 10:11:55 +0000 (19:11 +0900)]
staging: qlge: Fix irq masking in INTx mode
Tracing the driver operation reveals that the INTR_EN_EN bit (per-queue
interrupt control) does not immediately prevent rx completion interrupts
when the device is operating in INTx mode. This leads to interrupts being
raised while napi is scheduled/running. Those interrupts are ignored by
qlge_isr() and falsely reported as IRQ_NONE thanks to the irq_cnt scheme.
This in turn can cause frames to loiter in the receive queue until a later
frame leads to another rx interrupt that will schedule napi.
Use the INTR_EN_EI bit (master interrupt control) instead.
The chip can perform either a foreground or a background scan, but the two
cannot be mixed in the same request. So, we need to split each mac80211
request into multiple HIF requests.
Three things make this task more complex than it should be:
- The chip requires a link-id to be associated with each station. It is
the same thing as the association ID, but uses only 8 bits.
- Rate policy is sent separately from Tx frames
- The driver tries to handle power saving of stations and multicast
data itself
A few tasks remain to be done in order to finish chip initial
configuration:
- configure chip to use multi-tx confirmation (speed up data
transfer)
- configure chip to use wake-up feature (save power consumption
during runtime)
- set hardware configuration (clocks, RF, pinout, etc...) using a
Platform Data Set (PDS) file
On release, the driver completely shuts down the chip to save power.
Documentation about PDS and PDS data for sample boards are available
here[1]. One day, PDS data may find a place in device tree but,
currently, PDS is too closely tied to the firmware to allow that.
This patch also adds a "send_pds" file in debugfs to be able to
dynamically change PDS (only for debug, of course).
The chip supports encryption of the link between host and chip. This
feature is called "secure link". The driver code on github[1] supports it.
However, it relies on mbedtls for cryptographic functions, so I decided
not to import this feature in the current patch. Nevertheless, in order to
keep code synchronized between github and kernel, I imported all code
related to this feature, even if most of it is just a no-op.
The chip has multiple input buffers and can handle multiple 802.11 frames
in parallel. However, other HIF commands must be sent sequentially.
wsm_send_cmd() handles these requests.
This commit also adds send_hif_cmd in debugfs. This file allows sending
arbitrary commands to the chip. It can be used for debugging and testing.
Once the firmware is loaded, it sends a first indication to the host. This
indication signals that the host can start to communicate with the
firmware. In addition, it contains information about the chip and firmware
(MAC addresses, firmware version, etc.).
bh_work() is in charge of scheduling all HIF messages from/to the chip.
In normal operation, when an IRQ is received, the driver can get the size
of the next message from the control register. In order to save control
register accesses, when the chip sends a message, it also appends a copy
of the control register after the message (this register is not counted in
the message length declared in the message header, but must be counted in
the bus request). This copy of the control register is called the
"piggyback".
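The length bookkeeping described above can be sketched in plain C. This is only an illustration under stated assumptions: the 2-byte width of the control register copy, the little-endian 16-bit length at the start of the header, and all function names are hypothetical and not taken from the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed width of the control register copy appended to a message. */
#define PIGGYBACK_SIZE_SKETCH 2u

/* Read a 16-bit little-endian value from a byte buffer. */
static uint16_t get_le16_sketch(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Bytes to request on the bus: the declared length plus the piggyback. */
static size_t bus_read_len_sketch(size_t msg_len)
{
	return msg_len + PIGGYBACK_SIZE_SKETCH;
}

/*
 * Split a bus buffer into message length and piggybacked register.
 * Assumes the header starts with the message length (le16).
 * Returns 0 on success, -1 if the buffer is too short.
 */
static int split_piggyback_sketch(const uint8_t *buf, size_t buf_len,
				  size_t *msg_len, uint16_t *ctrl_reg)
{
	size_t len;

	if (buf_len < 2)
		return -1;
	len = get_le16_sketch(buf);
	if (len < 2 || buf_len < len + PIGGYBACK_SIZE_SKETCH)
		return -1;
	*msg_len = len;
	*ctrl_reg = get_le16_sketch(buf + len);
	return 0;
}
```

The point of the design is that the register value found after one message already tells the host how much to read next, so no separate register access is needed between messages.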
It also handles a power saving mechanism specific to the WFxxx series.
This mechanism is based on a GPIO called the "wakeup" GPIO. Obviously,
this GPIO is not part of the standard SPI/SDIO buses and must be declared
independently (this is the main reason why SDIO mode tries to get
parameters from DT).
When wakeup is enabled, the host can communicate with the chip only if it
is awake. There are two cases for waking up the chip:
- the host receives an IRQ from the chip (the chip initiates
communication): the host just has to set the wakeup GPIO before reading
data
- the host wants to send data to the chip: the host sets the wakeup GPIO,
then waits for an IRQ (in fact, waits for an empty message) and finally
sends the data
bh_work() is also in charge of tracking usage of chip buffers. Normally
each request expects a confirmation. However, note that the special
"multi tx" confirmation can acknowledge multiple requests at a time.
Finally, note that wfx_bh_request_rx() is not atomic (because of
control_reg_read()). So, in SPI mode, the hard-irq handler only postpones
all processing to wfx_spi_request_rx().
These files are shared with firmware sources. Only a subset of these
definitions is used by the driver but, for now, it is easier to import all
of them.
The API defines 3 kinds of messages:
- Requests (req) are sent from host to chip
- Confirmations (cnf) are sent by chip and are always in reply to a
request
- Indications (ind) are spontaneous messages from chip to host
One request normally generates one confirmation. There are a few
exceptions to this rule:
- the "shutdown" request is not acknowledged
- multiple tx requests can be acknowledged by a single "multi-tx"
confirmation
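The request/confirmation rule and its two exceptions can be modelled with a simple counter. This is a hypothetical sketch, not the driver's actual bookkeeping; the structure and function names are invented for illustration.

```c
/* Hypothetical bookkeeping for requests still awaiting a confirmation. */
struct hif_tracker_sketch {
	unsigned int in_flight;
};

/*
 * Record a request sent to the chip. A "shutdown" request is never
 * acknowledged, so the caller passes expects_cnf = 0 for it.
 */
static void hif_request_sent(struct hif_tracker_sketch *t, int expects_cnf)
{
	if (expects_cnf)
		t->in_flight++;
}

/*
 * Record a confirmation. acked is 1 for an ordinary confirmation and may
 * be larger for a "multi-tx" confirmation.
 */
static void hif_confirmation_received(struct hif_tracker_sketch *t,
				      unsigned int acked)
{
	if (acked > t->in_flight)
		acked = t->in_flight;	/* defensive clamp, never underflow */
	t->in_flight -= acked;
}
```

Indications do not appear here because they are not replies to anything and so never change the number of outstanding requests.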
In addition, the API defines MIBs. They are sub-structures for the
write_mib and read_mib APIs.
Note that all numbers in the API have to be little endian when sent
to/received from the chip (I didn't declare them with __le32 because the
driver also uses them internally).
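For readers unfamiliar with the byte-order requirement, here is a small portable sketch of encoding and decoding a 32-bit little-endian field. The helper names are hypothetical; in kernel code the conversion would normally go through cpu_to_le32() and le32_to_cpu() on fields typed __le32.

```c
#include <stdint.h>

/* Store v into p[0..3] in little-endian order, whatever the host order. */
static void put_le32_sketch(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Load a little-endian 32-bit value from p[0..3]. */
static uint32_t get_le32_sketch(const uint8_t *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}
```

Because the helpers work byte by byte, they give the same wire format on little-endian and big-endian hosts.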
The SDIO interface has two particularities:
1. Some parameters may be useful for the end user (I will talk about
gpio_wakeup later).
2. The SDIO VID and PID of WF200 are 0000:0001, which are too generic to
rely on.
So, the current code checks VID/PID and looks for a node in DT (since
WF200 targets embedded platforms, I don't think it is a problem to rely on
DT). DT can also be used to define parameters for the driver. Currently,
if no node is found, a warning is emitted, but it could be changed to an
error.
staging: exfat: explain the fs_sync() issue in TODO
We've seen several incorrect patches for fs_sync() calls in the exfat driver.
Add code to the TODO that explains this isn't just a delete code and refactor,
but that actual analysis of when the filesystem should be flushed to disk
needs to be done.
The majority of them were totally backwards. Change the logic
so that if DELAYED_SYNC *isn't* in the config, we actually flush to disk
before flagging the file system as clean.
That leaves two calls in the DELAYED_SYNC case. More detailed
analysis is needed to make sure that's what's really needed, or if other
call sites also need a fs_sync() call. This patch is at least "less wrong"
than the code was, but further changes should be another patch.
staging: wilc1000: look for rtc_clk clock in spi mode
If rtc_clk is provided from DT, use it and enable it.
This is optional.
The signal may be hardcoded and need not be requested,
but if DT provides it, use it.
staging: wilc1000: use RCU list to maintain vif interfaces list
Make use of RCU list to maintain virtual interfaces instead of an array.
Updates to the 'vif' list are less frequent than reads. The 'vif' list
elements are mostly accessed for read operations, so an RCU list is
better suited to this requirement.
Shifting the interface index ids when an interface is deleted is not
required. As the firmware only supports 2 interfaces, use the available
free slot index id when adding an interface.
staging: wilc1000: move wlan_deinit_locks() in wilc_netdev_cleanup()
Move deinitialization of the locks to module removal and their
initialization to wilc_cfg80211_init(). This ensures the locks are
available during module load and are freed during unload.
staging: wilc1000: remove unnecessary netdev validation check in del_key()
Remove the unnecessary check that compares the vif interface with the
zeroth element of the vif array. The caller already takes care of passing
the appropriate netdev handler during the del key operation.
Inside a nested 'else' block at the beginning of this function is a
call that assigns 'psta' to the return value of 'rtw_get_stainfo()'.
If 'rtw_get_stainfo()' returns NULL and the flow of control reaches
the 'else if' where 'psta' is dereferenced, then we will dereference
a NULL pointer.
Fix this by checking if 'psta' is not NULL before reading its
'psta->qos_option' data member.
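The shape of the fix can be shown with a tiny stand-alone sketch. The structure below is a hypothetical stand-in with a single member; only the 'psta' and 'qos_option' names come from the message above.

```c
#include <stddef.h>

/* Hypothetical stand-in for the station info structure. */
struct sta_info_sketch {
	int qos_option;
};

/*
 * Return non-zero only when a station entry exists and has QoS enabled.
 * The pointer test comes first, so a NULL lookup result is never
 * dereferenced.
 */
static int sta_uses_qos_sketch(const struct sta_info_sketch *psta)
{
	return psta != NULL && psta->qos_option;
}
```

The short-circuit behaviour of && is what makes the guard effective: the right-hand operand is evaluated only when the pointer is non-NULL.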
staging: rtl8188eu: remove dead code/vestigial do..while loop
The local variable 'bcmd_down' is always set to true almost immediately
before the do-while's condition is checked. As a result, !bcmd_down
evaluates to false which short circuits the logical AND operator meaning
that the second operand is never reached and is therefore dead code.
Furthermore, the do..while loop may be removed since it will always only
execute once because 'bcmd_down' is always set to true, so the
!bcmd_down evaluates to false and the loop exits immediately after the
first pass.
Fix this by removing the loop and its condition variables 'bcmd_down'
and 'retry_cnts'
While we're in there, also fix some checkpatch.pl suggestions regarding
spaces around arithmetic operators like '+'
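The reasoning about the loop can be checked with a reduced model. The functions below are hypothetical and keep only the control flow described above: a flag that is unconditionally set to true inside the body, and a loop condition that tests its negation first.

```c
/*
 * Reduced model of the original loop. Because bcmd_down is always true
 * when the condition is evaluated, !bcmd_down is false, the && short
 * circuits, and retry_cnts-- is never executed.
 */
static int send_cmd_with_loop(int *passes, int *retries_left)
{
	int bcmd_down = 0;
	int retry_cnts = 100;

	do {
		(*passes)++;
		bcmd_down = 1;
	} while (!bcmd_down && retry_cnts--);

	*retries_left = retry_cnts;
	return bcmd_down;
}

/* Equivalent straight-line version once the loop is removed. */
static int send_cmd_without_loop(int *passes)
{
	(*passes)++;
	return 1;
}
```

Both versions run the body exactly once, and the retry counter in the first one is left at its initial value, which is why the loop and both variables can go.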
Kefeng Wang [Fri, 20 Sep 2019 06:25:33 +0000 (14:25 +0800)]
staging: Use pr_warn instead of pr_warning
As said in commit f2c2cbcc35d4 ("powerpc: Use pr_warn instead of
pr_warning"), removing pr_warning so all logging messages use a
consistent <prefix>_warn style. Let's do it.
When the number of bytes to be printed exceeds the limit snprintf
returns the number of bytes that would have been printed (if there was
no truncation). This might cause issues, hence use scnprintf which
returns the actual number of bytes printed to buffer always.
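The difference in return values can be reproduced in userspace. scnprintf() itself is a kernel function, so the helper below is a stand-in written for this illustration; it assumes the behaviour described above (return the number of characters actually stored, excluding the terminating NUL).

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's scnprintf(); illustration only. */
static int scnprintf_sketch(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0 || size == 0)
		return 0;
	if ((size_t)n >= size)
		return (int)(size - 1);	/* output was truncated */
	return n;
}
```

With an 8-byte buffer and a 10-character string, snprintf() reports 10 while the stand-in reports 7. Code that advances a write offset by the snprintf() result can therefore step past the end of the buffer, which is the kind of issue the message alludes to.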
Jerry Lin [Fri, 6 Sep 2019 01:06:14 +0000 (09:06 +0800)]
staging: olpc_dcon: allow simultaneous XO-1 and XO-1.5 support
This patch removes the model-related configuration, since the module can
itself decide which platform data to use based on the OLPC board it is
running on.
Also change the module dependency from (GPIO_CS5535 || GPIO_CS5535=n)
to (GPIO_CS5535 || ACPI), because the original one does not make any sense
and the module only does real work when GPIO_CS5535 or ACPI is set.
staging: rtl8192u: Remove unnecessary line-breaks in function signatures
This patch fixes the function signatures for rtl8192_handle_assoc_response,
rtl8192_record_rxdesc_forlateruse, rtl819xusb_process_received_packet
and other relevant code blocks to avoid the checkpatch.pl warning:
staging: rtl8192u: ieee80211: Replace snprintf with scnprintf
When the number of bytes to be printed exceeds the limit snprintf
returns the number of bytes that would have been printed (if there was
no truncation). This might cause issues, hence use scnprintf which
returns the actual number of bytes printed to buffer always.
Merge tag 'for-5.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A bunch of fixes that accumulated in recent weeks, mostly material for
stable.
Summary:
- fix for regression from 5.3 that prevents using balance convert
with single profile
- qgroup fixes: rescan race, accounting leak with multiple writers,
potential leak after io failure recovery
- fix for use after free in relocation (reported by KASAN)
- other error handling fixups"
* tag 'for-5.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls
btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space
btrfs: Fix a regression which we can't convert to SINGLE profile
btrfs: relocation: fix use-after-free on dead relocation roots
Btrfs: fix race setting up and completing qgroup rescan workers
Btrfs: fix missing error return if writeback for extent buffer never started
btrfs: adjust dirty_metadata_bytes after writeback failure of extent buffer
Btrfs: fix selftests failure due to uninitialized i_mode in test inodes
Merge tag 'trace-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"A few more tracing fixes:
- Fix a buffer overflow by checking nr_args correctly in probes
- Fix a warning that is reported by clang
- Fix a possible memory leak in error path of filter processing
- Fix the selftest that checks for failures, but wasn't failing
- Minor clean up on call site output of a memory trace event"
* tag 'trace-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
selftests/ftrace: Fix same probe error test
mm, tracing: Print symbol name for call_site in trace events
tracing: Have error path in predicate_parse() free its allocated memory
tracing: Fix clang -Wint-in-bool-context warnings in IF_ASSIGN macro
tracing/probe: Fix to check the difference of nr_args before adding probe
Merge tag 'mmc-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull more MMC updates from Ulf Hansson:
"A couple more updates/fixes for MMC:
- sdhci-pci: Add Genesys Logic GL975x support
- sdhci-tegra: Recover loss in throughput for DMA
- sdhci-of-esdhc: Fix DMA bug"
* tag 'mmc-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: host: sdhci-pci: Add Genesys Logic GL975x support
mmc: tegra: Implement ->set_dma_mask()
mmc: sdhci: Let drivers define their DMA mask
mmc: sdhci-of-esdhc: set DMA snooping based on DMA coherence
mmc: sdhci: improve ADMA error reporting
csky: Move static keyword to the front of declaration
Move the static keyword to the front of declaration of
csky_pmu_of_device_ids, and resolve the following compiler
warning that can be seen when building with warnings
enabled (W=1):
arch/csky/kernel/perf_event.c:1340:1: warning:
‘static’ is not at beginning of declaration [-Wold-style-declaration]
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Merge tag 'char-misc-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull Documentation/process update from Greg KH:
"Here are two small Documentation/process/embargoed-hardware-issues.rst
file updates that missed my previous char/misc pull request.
The first one adds an Intel representative for the process, and the
second one cleans up the text a bit more when it comes to how the
disclosure rules work, as it was a bit confusing to some companies"
* tag 'char-misc-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Documentation/process: Clarify disclosure rules
Documentation/process: Volunteer as the ambassador for Intel
Merge tag '5.4-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull more cifs updates from Steve French:
"Fixes from the recent SMB3 Test events and Storage Developer
Conference (held the last two weeks).
Here are nine smb3 patches including an important patch for debugging
traces with wireshark, with three patches marked for stable.
Additional fixes from last week to better handle some newly discovered
reparse points, and a fix to the create/mkdir path for setting the mode
more atomically (in SMB3 Create security descriptor context), and one
for path name processing are still being tested so are not included
here"
* tag '5.4-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Fix oplock handling for SMB 2.1+ protocols
smb3: missing ACL related flags
smb3: pass mode bits into create calls
smb3: Add missing reparse tags
CIFS: fix max ea value size
fs/cifs/sess.c: Remove set but not used variable 'capabilities'
fs/cifs/smb2pdu.c: Make SMB2_notify_init static
smb3: fix leak in "open on server" perf counter
smb3: allow decryption keys to be dumped by admin for debugging
Mao Han [Wed, 25 Sep 2019 09:23:02 +0000 (17:23 +0800)]
csky: Fixup csky_pmu.max_period assignment
The csky_pmu.max_period has type u64, and BIT() can only return a
32-bit unsigned long on C-SKY. The initialization for max_period
will be incorrect when count_width is bigger than 32.
Use BIT_ULL()
Signed-off-by: Mao Han <han_mao@c-sky.com>
Signed-off-by: Guo Ren <ren_guo@c-sky.com>
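To see why the width of the macro's result matters, consider the following portable sketch. Both functions are hypothetical stand-ins, not the kernel macros: the first emulates a bit mask that is limited to 32 bits by truncating explicitly (a real shift of a 32-bit value by 32 or more would be undefined behaviour in C), the second keeps the full 64 bits. Both expect n to be below 64.

```c
#include <stdint.h>

/* Emulates a bit mask that can only hold 32 bits: bits 32..63 are lost. */
static uint64_t bit_32bit_sketch(unsigned int n)
{
	return (uint32_t)(UINT64_C(1) << n);
}

/* 64-bit variant, in the spirit of BIT_ULL(): no bit is lost. */
static uint64_t bit_64bit_sketch(unsigned int n)
{
	return UINT64_C(1) << n;
}
```

For n = 31 the two agree; for n = 40 the 32-bit version yields 0 while the 64-bit version yields the expected single bit, so any value derived from the former would be wrong for wide counters.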
This is admittedly partly "for discussion". We need to have a way
forward for the boot time deadlocks where user space ends up waiting for
more entropy, but no entropy is forthcoming because the system is
entirely idle just waiting for something to happen.
While this was triggered by what is arguably a user space bug with
GDM/gnome-session asking for secure randomness during early boot, when
they didn't even need any such truly secure thing, the issue ends up
being that our "getrandom()" interface is prone to that kind of
confusion, because people don't think very hard about whether they want
to block for sufficient amounts of entropy.
The approach here-in is to decide to not just passively wait for entropy
to happen, but to start actively collecting it if it is missing. This
is not necessarily always possible, but if the architecture has a CPU
cycle counter, there is a fair amount of noise in the exact timings of
reasonably complex loads.
We may end up tweaking the load and the entropy estimates, but this
should be at least a reasonable starting point.
As part of this, we also revert the revert of the ext4 IO pattern
improvement that ended up triggering the reported lack of external
entropy.
* getrandom() active entropy waiting:
Revert "Revert "ext4: make __ext4_get_inode_loc plug""
random: try to actively add entropy rather than passively wait for it
Instead of waiting forever for entropy that may just not happen, we now
try to actively generate entropy when required, and are thus hopefully
avoiding the problem that caused the nice ext4 IO pattern fix to be
reverted.
So revert the revert.
Cc: Ahmed S. Darwish <darwish.07@gmail.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Alexander E. Patrakov <patrakov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
random: try to actively add entropy rather than passively wait for it
For 5.3 we had to revert a nice ext4 IO pattern improvement, because it
caused a bootup regression due to lack of entropy at bootup together
with arguably broken user space that was asking for secure random
numbers when it really didn't need to.
See commit 72dbcf721566 (Revert "ext4: make __ext4_get_inode_loc plug").
This aims to solve the issue by actively generating entropy noise using
the CPU cycle counter when waiting for the random number generator to
initialize. This only works when you have a high-frequency time stamp
counter available, but that's the case on all modern x86 CPU's, and on
most other modern CPU's too.
What we do is to generate jitter entropy from the CPU cycle counter
under a somewhat complex load: calling the scheduler while also
guaranteeing a certain amount of timing noise by also triggering a
timer.
I'm sure we can tweak this, and that people will want to look at other
alternatives, but there's been a number of papers written on jitter
entropy, and this should really be fairly conservative by crediting one
bit of entropy for every timer-induced jump in the cycle counter. Not
because the timer itself would be all that unpredictable, but because
the interaction between the timer and the loop is going to be.
Even if (and perhaps particularly if) the timer actually happens on
another CPU, the cacheline interaction between the loop that reads the
cycle counter and the timer itself firing is going to add perturbations
to the cycle counter values that get mixed into the entropy pool.
As Thomas pointed out, with a modern out-of-order CPU, even quite simple
loops show a fair amount of hard-to-predict timing variability even in
the absence of external interrupts. But this tries to take that further
by actually having a fairly complex interaction.
This is not going to solve the entropy issue for architectures that have
no CPU cycle counter, but it's not clear how (and if) that is solvable,
and the hardware in question is largely starting to be irrelevant. And
by doing this we can at least avoid some of the even more contentious
approaches (like making the entropy waiting time out in order to avoid
the possibly unbounded waiting).
Cc: Ahmed Darwish <darwish.07@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Nicholas Mc Guire <hofrat@opentech.at>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Alexander E. Patrakov <patrakov@gmail.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
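The following userspace sketch illustrates only the most basic ingredient of the idea: reading a fine-grained clock in a loop and folding the readings into a pool, while counting how often the step between readings changes. It is not the kernel implementation, which reads the CPU cycle counter while invoking the scheduler and arming a timer. The function names are hypothetical, it uses the C11 timespec_get() call in place of a cycle counter, and its output must not be treated as secure randomness.

```c
#include <stdint.h>
#include <time.h>

/* Current time in nanoseconds via C11 timespec_get(); 0 on failure. */
static uint64_t now_ns_sketch(void)
{
	struct timespec ts;

	if (timespec_get(&ts, TIME_UTC) != TIME_UTC)
		return 0;
	return (uint64_t)ts.tv_sec * UINT64_C(1000000000) +
	       (uint64_t)ts.tv_nsec;
}

/*
 * Fold `rounds` clock readings into a 64-bit pool with rotate-and-xor.
 * *changes counts how many times the delta between successive readings
 * differed from the previous delta, a crude indicator of timing noise.
 */
static uint64_t mix_timing_sketch(unsigned int rounds, unsigned int *changes)
{
	uint64_t pool = 0, prev = now_ns_sketch(), prev_delta = 0;
	unsigned int i;

	*changes = 0;
	for (i = 0; i < rounds; i++) {
		uint64_t t = now_ns_sketch();
		uint64_t delta = t - prev;

		pool = ((pool << 7) | (pool >> 57)) ^ t;
		if (delta != prev_delta)
			(*changes)++;
		prev = t;
		prev_delta = delta;
	}
	return pool;
}
```

How often the delta changes depends entirely on the machine and clock resolution, so the sketch makes no claim about how much entropy, if any, the pool contains.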
Olof Johansson [Sun, 29 Sep 2019 18:19:25 +0000 (11:19 -0700)]
Merge tag 'fixes-5.4-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes
Fixes for omap variants
A few fixes for the ti-sysc interconnect target module driver for no-idle
quirks that caused nfsroot to fail on some dra7 boards.
Also fixes to get the LCD working again on logicpd boards, which broke a
while back with the removal of the panel-dpi driver. We now need to use
the generic CONFIG_DRM_PANEL_SIMPLE instead.
* tag 'fixes-5.4-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
bus: ti-sysc: Remove unpaired sysc_clkdm_deny_idle()
ARM: dts: logicpd-som-lv: Fix i2c2 and i2c3 Pin mux
ARM: dts: am3517-evm: Fix missing video
ARM: dts: logicpd-torpedo-baseboard: Fix missing video
ARM: omap2plus_defconfig: Fix missing video
bus: ti-sysc: Fix handling of invalid clocks
bus: ti-sysc: Fix clock handling for no-idle quirks
Olof Johansson [Sun, 29 Sep 2019 18:19:18 +0000 (11:19 -0700)]
Merge tag 'scmi-fixes-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into arm/fixes
ARM SCMI fixes for v5.4
A couple of fixes: one in the scmi reset driver initialising the missing
scmi handle, and another in the scmi reset API implementation fixing the
assignment of the reset state
* tag 'scmi-fixes-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
reset: reset-scmi: add missing handle initialisation
firmware: arm_scmi: reset: fix reset_state assignment in scmi_domain_reset
Merge tag 'libnvdimm-fixes-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
More libnvdimm updates from Dan Williams:
- Complete the reworks to interoperate with powerpc dynamic huge page
sizes
- Fix a crash due to missed accounting for the powerpc 'struct
page'-memmap mapping granularity
- Fix badblock initialization for volatile (DRAM emulated) pmem ranges
- Stop triggering request_key() notifications to userspace when
NVDIMM-security is disabled / not present
- Miscellaneous small fixups
* tag 'libnvdimm-fixes-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm/region: Enable MAP_SYNC for volatile regions
libnvdimm: prevent nvdimm from requesting key when security is disabled
libnvdimm/region: Initialize bad block for volatile namespaces
libnvdimm/nfit_test: Fix acpi_handle redefinition
libnvdimm/altmap: Track namespace boundaries in altmap
libnvdimm: Fix endian conversion issues
libnvdimm/dax: Pick the right alignment default when creating dax devices
powerpc/book3s64: Export has_transparent_hugepage() related functions.
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal SoC updates from Eduardo Valentin:
"This is a really small pull in the midst of a lot of pending patches.
We are in the middle of restructuring how we are maintaining the
thermal subsystem, as per discussion in our last LPC. For now, I am
sending just some changes that were pending in my tree. Looking
forward to get a more streamlined process in the next merge window"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
thermal: db8500: Rewrite to be a pure OF sensor
thermal: db8500: Use dev helper variable
thermal: db8500: Finalize device tree conversion
thermal: thermal_mmio: remove some dead code
Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull more i2c updates from Wolfram Sang:
- make Lenovo Yoga C630 boot now that the dependencies are merged
- restore BlockProcessCall for i801, accidentally removed in this merge
window
- a bugfix for the riic driver
- an improvement to the slave-eeprom driver which should have been in
the first pull request but sadly got lost in the process
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: slave-eeprom: Add read only mode
i2c: i801: Bring back Block Process Call support for certain platforms
i2c: riic: Clear NACK in tend isr
i2c: qcom-geni: Disable DMA processing on the Lenovo Yoga C630
Merge tag 'iommu-fixes-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
"A couple of fixes for the AMD IOMMU driver have piled up:
- Some fixes for the reworked IO page-table which caused memory leaks
or did not allow downgrading mappings under some conditions.
- Locking fixes to fix a couple of possible races around accessing
'struct protection_domain'. The races got introduced when the
dma-ops path became lock-less in the fast-path"
* tag 'iommu-fixes-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Lock code paths traversing protection_domain->dev_list
iommu/amd: Lock dev_data in attach/detach code paths
iommu/amd: Check for busy devices earlier in attach_device()
iommu/amd: Take domain->lock for complete attach/detach path
iommu/amd: Remove amd_iommu_devtable_lock
iommu/amd: Remove domain->updated
iommu/amd: Wait for completion of IOTLB flush in attach_device
iommu/amd: Unmap all L7 PTEs when downgrading page-sizes
iommu/amd: Introduce first_pte_l7() helper
iommu/amd: Fix downgrading default page-sizes in alloc_pte()
iommu/amd: Fix pages leak in free_pagetable()
Thomas Gleixner [Wed, 25 Sep 2019 08:29:49 +0000 (10:29 +0200)]
Documentation/process: Clarify disclosure rules
The role of the contact list provided by the disclosing party and how it
affects the disclosure process and the ability to include experts into
the development process is not really well explained.
Neither is it entirely clear when the disclosing party will be informed
about the fact that a developer who is not covered by an employer NDA needs
to be brought in and disclosed.
Explain the role of the contact list and the information policy along with
an eventual conflict resolution better.
1) Sanity check URB networking device parameters to avoid divide by
zero, from Oliver Neukum.
2) Disable global multicast filter in NCSI, otherwise LLDP and IPV6
don't work properly. Longer term this needs a better fix tho. From
Vijay Khemka.
3) Small fixes to selftests (use ping when ping6 is not present, etc.)
from David Ahern.
4) Bring back rt_uses_gateway member of struct rtable, its semantics
were not well understood and trying to remove it broke things. From
David Ahern.
5) Move usbnet sanity checking, ignore endpoints with invalid
wMaxPacketSize. From Bjørn Mork.
6) Missing Kconfig deps for sja1105 driver, from Mao Wenan.
7) Various small fixes to the mlx5 DR steering code, from Alaa Hleihel,
Alex Vesker, and Yevgeny Kliteynik
8) Missing CAP_NET_RAW checks in various places, from Ori Nimron.
9) Fix crash when removing sch_cbs entry while offloading is enabled,
from Vinicius Costa Gomes.
10) Signedness bug fixes, generally in looking at the result given by
of_get_phy_mode() and friends. From Dan Carpenter.
11) Disable preemption around BPF_PROG_RUN() calls, from Eric Dumazet.
12) Don't create VRF ipv6 rules if ipv6 is disabled, from David Ahern.
13) Fix quantization code in tcp_bbr, from Kevin Yang.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (127 commits)
net: tap: clean up an indentation issue
nfp: abm: fix memory leak in nfp_abm_u32_knode_replace
tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
sk_buff: drop all skb extensions on free and skb scrubbing
tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth
mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions
Documentation: Clarify trap's description
mlxsw: spectrum: Clear VLAN filters during port initialization
net: ena: clean up indentation issue
NFC: st95hf: clean up indentation issue
net: phy: micrel: add Asym Pause workaround for KSZ9021
net: socionext: ave: Avoid using netdev_err() before calling register_netdev()
ptp: correctly disable flags on old ioctls
lib: dimlib: fix help text typos
net: dsa: microchip: Always set regmap stride to 1
nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
net/sched: Set default of CONFIG_NET_TC_SKB_EXT to N
vrf: Do not attempt to create IPv6 mcast rule if IPv6 is disabled
net: sched: sch_sfb: don't call qdisc_put() while holding tree lock
...
Merge branch 'hugepage-fallbacks' (hugepatch patches from David Rientjes)
Merge hugepage allocation updates from David Rientjes:
"We (mostly Linus, Andrea, and myself) have been discussing offlist how
to implement a sane default allocation strategy for hugepages on NUMA
platforms.
With these reverts in place, the page allocator will happily allocate
a remote hugepage immediately rather than try to make a local hugepage
available. This incurs a substantial performance degradation when
memory compaction would have otherwise made a local hugepage
available.
This series reverts those reverts and attempts to propose a more sane
default allocation strategy specifically for hugepages. Andrea
acknowledges this is likely to fix the swap storms that he originally
reported that resulted in the patches that removed __GFP_THISNODE from
hugepage allocations.
The immediate goal is to return 5.3 to the behavior the kernel has
implemented over the past several years so that remote hugepages are
not immediately allocated when local hugepages could have been made
available because the increased access latency is untenable.
The next goal is to introduce a sane default allocation strategy for
hugepages allocations in general regardless of the configuration of
the system so that we prevent thrashing of local memory when
compaction is unlikely to succeed and can prefer remote hugepages over
remote native pages when the local node is low on memory."
Note on timing: this reverts the hugepage VM behavior changes that got
introduced fairly late in the 5.3 cycle, and that fixed a huge
performance regression for certain loads that had been around since
4.18.
Andrea had this note:
"The regression of 4.18 was that it was taking hours to start a VM
where 3.10 was only taking a few seconds, I reported all the details
on lkml when it was finally tracked down in August 2018.
__GFP_THISNODE in MADV_HUGEPAGE made the above enterprise vfio
workload degrade like in the "current upstream" above. And it still
would have been that bad as above until 5.3-rc5"
where the bad behavior ends up happening as you fill up a local node,
and without that change, you'd get into the nasty swap storm behavior
due to compaction working overtime to make room for more memory on the
nodes.
As a result 5.3 got the two performance fix reverts in rc5.
However, David Rientjes then noted that those performance fixes in turn
regressed performance for other loads - although not quite to the same
degree. He suggested reverting the reverts and instead replacing them
with two small changes to how hugepage allocations are done (patch
descriptions rephrased by me):
- "avoid expensive reclaim when compaction may not succeed": just admit
that the allocation failed when you're trying to allocate a huge-page
and compaction wasn't successful.
- "allow hugepage fallback to remote nodes when madvised": when that
node-local huge-page allocation failed, retry without forcing the
local node.
but by then I judged it too late to replace the fixes for a 5.3 release.
So 5.3 was released with behavior that harked back to the pre-4.18 logic.
But now we're in the merge window for 5.4, and we can see if this
alternate model fixes not just the horrendous swap storm behavior, but
also restores the performance regression that the late reverts caused.
Fingers crossed.
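The two proposed changes can be read as a small decision procedure. The sketch below is a deliberately simplified model of the policy as rephrased in this message; every name in it is hypothetical, and it does not reflect the actual page allocator code paths or gfp flags.

```c
/* Possible outcomes of a hugepage allocation attempt in this model. */
enum alloc_result_sketch { ALLOC_LOCAL, ALLOC_REMOTE, ALLOC_FAILED };

/* Simplified view of the memory situation at allocation time. */
struct node_state_sketch {
	int local_free_hugepage;	/* a local hugepage is free already */
	int local_compaction_succeeds;	/* compaction can produce one */
	int remote_free_hugepage;	/* a remote node has one available */
};

static enum alloc_result_sketch
alloc_hugepage_sketch(const struct node_state_sketch *s, int madvised)
{
	/* First choice: a node-local hugepage, possibly after compaction. */
	if (s->local_free_hugepage || s->local_compaction_succeeds)
		return ALLOC_LOCAL;

	/*
	 * Compaction did not help: admit the local failure rather than
	 * starting expensive reclaim on the local node.
	 */

	/* Only madvised allocations retry without forcing the local node. */
	if (madvised && s->remote_free_hugepage)
		return ALLOC_REMOTE;

	return ALLOC_FAILED;
}
```

In this model a failed result simply means the caller falls back to whatever it would do without a hugepage; the swap storm is avoided because no branch ever forces reclaim to satisfy the local attempt.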
* emailed patches from David Rientjes <rientjes@google.com>:
mm, page_alloc: allow hugepage fallback to remote nodes when madvised
mm, page_alloc: avoid expensive reclaim when compaction may not succeed
Revert "Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask""
Revert "Revert "mm, thp: restore node-local hugepage allocations""