Manish Narani [Wed, 20 Nov 2019 06:47:26 +0000 (12:17 +0530)]
mmc: sdhci-of-arasan: Add support to set clock phase delays for SD
Add support to read Clock Phase Delays from the DT and set it via
clk_set_phase() API from clock framework. Some of the controllers might
have their own handling of setting clock delays, for this keep the
set_clk_delays as function pointer which can be assigned controller
specific handling of the same.
Manish Narani [Wed, 20 Nov 2019 06:47:24 +0000 (12:17 +0530)]
mmc: sdhci-of-arasan: Add sampling clock for a phy to use
There are some operations like setting the clock delays may need to have
two clocks, one for output path and one for input path. Adding input
path clock for some phys to use.
Faiz Abbas [Mon, 18 Nov 2019 07:36:09 +0000 (13:06 +0530)]
mmc: sdhci_am654: Add Support for Command Queuing Engine to J721E
Add Support for CQHCI (Command Queuing Host Controller Interface)
for each of the host controllers present in TI's J721E devices.
Add cqhci_ops and a .irq() callback to handle cqhci specific interrupts.
Signed-off-by: Faiz Abbas <faiz_abbas@ti.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Bradley Bolen [Sun, 17 Nov 2019 01:00:45 +0000 (20:00 -0500)]
mmc: core: Fix size overflow for mmc partitions
With large eMMC cards, it is possible to create general purpose
partitions that are bigger than 4GB. The size member of the mmc_part
struct is only an unsigned int which overflows for gp partitions larger
than 4GB. Change this to a u64 to handle the overflow.
Eugeniu Rosca [Fri, 15 Nov 2019 13:44:30 +0000 (14:44 +0100)]
mmc: tmio: Add MMC_CAP_ERASE to allow erase/discard/trim requests
Isolated initially to renesas_sdhi_internal_dmac [1], Ulf suggested
adding MMC_CAP_ERASE to the TMIO mmc core:
On Fri, Nov 15, 2019 at 10:27:25AM +0100, Ulf Hansson wrote:
-- snip --
This test and due to the discussions with Wolfram and you in this
thread, I would actually suggest that you enable MMC_CAP_ERASE for all
tmio variants, rather than just for this particular one.
In other words, set the cap in tmio_mmc_host_probe() should be fine,
as it seems none of the tmio variants supports HW busy detection at
this point.
-- snip --
Testing on R-Car H3ULCB-KF doesn't reveal any issues (v5.4-rc7):
root@rcar-gen3:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
mmcblk0 179:0 0 59.2G 0 disk <--- eMMC
mmcblk0boot0 179:8 0 4M 1 disk
mmcblk0boot1 179:16 0 4M 1 disk
mmcblk1 179:24 0 30G 0 disk <--- SD card
root@rcar-gen3:~# time blkdiscard /dev/mmcblk0
real 0m8.659s
user 0m0.001s
sys 0m1.920s
root@rcar-gen3:~# time blkdiscard /dev/mmcblk1
real 0m1.176s
user 0m0.001s
sys 0m0.124s
net: wireless: ti: remove local VENDOR_ID and DEVICE_ID definitions
They are already included from mmc/sdio_ids.h and do not need
a local definition.
Fixes: 884f38607897 ("mmc: core: move some sdio IDs out of quirks file") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Acked-by: Kalle Valo <kvalo@codeaurora.org> Cc: <stable@vger.kernel.org> # v4.11+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
net: wireless: ti: wl1251 use new SDIO_VENDOR_ID_TI_WL1251 definition
SDIO_VENDOR_ID_TI_WL1251 is now defined in mmc/sdio_ids.h separately
from SDIO_VENDOR_ID_TI for wl1271.
Fixes: 884f38607897 ("mmc: core: move some sdio IDs out of quirks file") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Acked-by: Kalle Valo <kvalo@codeaurora.org> Cc: <stable@vger.kernel.org> # v4.11+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
v4.11-rc1 did introduce a patch series that rearranged the
sdio quirks into a header file. Unfortunately this did forget
to handle SDIO_VENDOR_ID_TI differently between wl1251 and
wl1271 with the result that although the wl1251 was found on
the sdio bus, the firmware did not load any more and there was
no interface registration.
This patch defines separate constants to be used by sdio quirks
and drivers.
Fixes: 884f38607897 ("mmc: core: move some sdio IDs out of quirks file") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Cc: <stable@vger.kernel.org> # v4.11+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
omap: pdata-quirks: remove openpandora quirks for mmc3 and wl1251
With a wl1251 child node of mmc3 in the device tree decoded
in omap_hsmmc.c to handle special wl1251 initialization, we do
no longer need to instantiate the mmc3 through pdata quirks.
We also can remove the wlan regulator and reset/interrupt definitions
and do them through device tree.
Fixes: 81eef6ca9201 ("mmc: omap_hsmmc: Use dma_request_chan() for requesting DMA channel") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Cc: <stable@vger.kernel.org> # v4.7+ Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
omap: pdata-quirks: revert pandora specific gpiod additions
This partly reverts the commit efdfeb079cc3 ("regulator: fixed: Convert to
use GPIO descriptor only").
We must remove this from mainline first, so that the following patch
to remove the openpandora quirks for mmc3 and wl1251 cleanly applies
to stable v4.9, v4.14, v4.19 where the above mentioned patch is not yet
present.
Since the code affected is removed (no pandora gpios in pdata-quirks
and more), there will be no matching revert-of-the-revert.
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
mmc: host: omap_hsmmc: add code for special init of wl1251 to get rid of pandora_wl1251_init_card
Pandora_wl1251_init_card was used to do special pdata based
setup of the sdio mmc interface. This does no longer work with
v4.7 and later. A fix requires a device tree based mmc3 setup.
Therefore we move the special setup to omap_hsmmc.c instead
of calling some pdata supplied init_card function.
The new code checks for a DT child node compatible to wl1251
so it will not affect other MMC3 use cases.
Generally, this code was and still is a hack and should be
moved to mmc core to e.g. read such properties from optional
DT child nodes.
Fixes: 81eef6ca9201 ("mmc: omap_hsmmc: Use dma_request_chan() for requesting DMA channel") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Cc: <stable@vger.kernel.org> # v4.7+
[Ulf: Fixed up some checkpatch complaints] Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
ARM: dts: pandora-common: define wl1251 as child node of mmc3
Since v4.7 the dma initialization requires that there is a
device tree property for "rx" and "tx" channels which is
not provided by the pdata-quirks initialization.
By conversion of the mmc3 setup to device tree this will
finally allows to remove the OpenPandora wlan specific omap3
data-quirks.
Fixes: 81eef6ca9201 ("mmc: omap_hsmmc: Use dma_request_chan() for requesting DMA channel") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Cc: <stable@vger.kernel.org> # v4.7+ Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
We will have the wl1251 defined as a child node of the mmc interface
and can read setup for gpios, interrupts and the ti,use-eeprom
property from there instead of pdata to be provided by pdata-quirks.
Fixes: 81eef6ca9201 ("mmc: omap_hsmmc: Use dma_request_chan() for requesting DMA channel") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Acked-by: Kalle Valo <kvalo@codeaurora.org> Cc: <stable@vger.kernel.org> # v4.7+
[Ulf: Fixed up some complaints from checkpatch] Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Thierry Reding [Fri, 8 Nov 2019 16:09:00 +0000 (17:09 +0100)]
mmc: mmc_spi: Use proper debounce time for CD GPIO
According to the comment, board files used to specify 1 ms for the
debounce time. gpiod_set_debounce() needs the debounce time to be
specified in units of microseconds, so make sure to multiply the value
by 1000.
Note that, according to the git log, the board files actually did
specify 1 us for bounce times, but that seems really low. Device tree
bindings for this type of GPIO typically specify the debounce times in
milliseconds, so setting this default value to 1 ms seems like it would
be somewhat safer.
Ulf Hansson [Thu, 17 Oct 2019 13:25:36 +0000 (15:25 +0200)]
mmc: core: Re-work HW reset for SDIO cards
It have turned out that it's not a good idea to unconditionally do a power
cycle and then to re-initialize the SDIO card, as currently done through
mmc_hw_reset() -> mmc_sdio_hw_reset(). This because there may be multiple
SDIO func drivers probed, who also shares the same SDIO card.
To address these scenarios, one may be tempted to use a notification
mechanism, as to allow the core to inform each of the probed func drivers,
about an ongoing HW reset. However, supporting such an operation from the
func driver point of view, may not be entirely trivial.
Therefore, let's use a more simplistic approach to solve the problem, by
instead forcing the card to be removed and re-detected, via scheduling a
rescan-work. In this way, we can rely on existing infrastructure, as the
func driver's ->remove() and ->probe() callbacks, becomes invoked to deal
with the cleanup and the re-initialization.
This solution may be considered as rather heavy, especially if a func
driver doesn't share its card with other func drivers. To address this,
let's keep the current immediate HW reset option as well, but run it only
when there is one func driver probed for the card.
Finally, to allow the caller of mmc_hw_reset(), to understand if the reset
is being asynchronously managed from a scheduled work, it returns 1
(propagated from mmc_sdio_hw_reset()). If the HW reset is executed
successfully and synchronously it returns 0, which maintains the existing
behaviour.
Reviewed-by: Douglas Anderson <dianders@chromium.org> Tested-by: Douglas Anderson <dianders@chromium.org> Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ulf Hansson [Thu, 10 Oct 2019 13:54:37 +0000 (15:54 +0200)]
mmc: core: Drop check for mmc_card_is_removable() in mmc_rescan()
Upfront in mmc_rescan() we use the host->rescan_entered flag, to allow
scanning only once for non-removable cards. Therefore, it's also not
possible that we can have a corresponding card bus attached (host->bus_ops
is NULL), when we are scanning non-removable cards.
For this reason, let' drop the check for mmc_card_is_removable() as it's
redundant.
Reviewed-by: Douglas Anderson <dianders@chromium.org> Tested-by: Douglas Anderson <dianders@chromium.org> Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ulf Hansson [Fri, 25 Oct 2019 07:28:17 +0000 (09:28 +0200)]
mwifiex: Re-work support for SDIO HW reset
The SDIO HW reset procedure in mwifiex_sdio_card_reset_work() is broken,
when the SDIO card is shared with another SDIO func driver. This is the
case when the Bluetooth btmrvl driver is being used in combination with
mwifiex. More precisely, when mwifiex_sdio_card_reset_work() runs to resets
the SDIO card, the btmrvl driver doesn't get notified about it. Beyond that
point, the btmrvl driver will fail to communicate with the SDIO card.
This is a generic problem for SDIO func drivers sharing an SDIO card, which
are about to be addressed in subsequent changes to the mmc core and the
mmc_hw_reset() interface. In principle, these changes means the
mmc_hw_reset() interface starts to return 1 if the are multiple drivers for
the SDIO card, as to indicate to the caller that the reset needed to be
scheduled asynchronously through a hotplug mechanism of the SDIO card.
Let's prepare the mwifiex driver to support the upcoming new behaviour of
mmc_hw_reset(), which means extending the mwifiex_sdio_card_reset_work() to
support the asynchronous SDIO HW reset path. This also means, we need to
allow the ->remove() callback to run, without waiting for the FW to be
loaded. Additionally, during system suspend, mwifiex_sdio_suspend() may be
called when a reset has been scheduled, but waiting to be executed. In this
scenario let's simply return -EBUSY to abort the suspend process, as to
allow the reset to be completed first.
Reviewed-by: Douglas Anderson <dianders@chromium.org> Tested-by: Douglas Anderson <dianders@chromium.org> Cc: stable@vger.kernel.org # v5.4+ Acked-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Eugen Hristev [Thu, 14 Nov 2019 12:59:26 +0000 (12:59 +0000)]
mmc: sdhci-of-at91: fix quirk2 overwrite
The quirks2 are parsed and set (e.g. from DT) before the quirk for broken
HS200 is set in the driver.
The driver needs to enable just this flag, not rewrite the whole quirk set.
Fixes: 7871aa60ae00 ("mmc: sdhci-of-at91: add quirk for broken HS200") Signed-off-by: Eugen Hristev <eugen.hristev@microchip.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ulf Hansson [Mon, 4 Nov 2019 11:33:43 +0000 (12:33 +0100)]
MAINTAINERS: Mark vub300 mmc driver as orphan
Tony's email address from elandigitalsystems.com has bounced for a long
time. Let's update MAINTAINERS to mark the driver as orphan as to reflect
the situation.
mmc: block: Add CMD13 polling for MMC IOCTLS with R1B response
MMC IOCTLS with R1B responses may cause the card to enter the busy state,
which means it's not ready to receive a new request. To prevent new
requests from being sent to the card, use a CMD13 polling loop to verify
that the card returns to the transfer state, before completing the request.
mmc: block: Make card_busy_detect() a bit more generic
To prepare for more users of card_busy_detect(), let's drop the struct
request * as an in-parameter and convert to log the error message via
dev_err() instead of pr_err().
Yangbo Lu [Wed, 9 Oct 2019 07:41:40 +0000 (15:41 +0800)]
mmc: sdhci-of-esdhc: fix up erratum A-008171 workaround
A previous patch implemented an incomplete workaround of erratum
A-008171. The complete workaround is as below. This patch is to
implement the complete workaround which uses SW tuning if HW tuning
fails, and retries both HW/SW tuning once with reduced clock if
workaround fails. This is suggested by hardware team, and the patch
had been verified on LS1046A eSDHC + Phison 32G eMMC which could
trigger the erratum.
Workaround:
/* For T1040, T2080, LS1021A, T1023 Rev 1: */
1. Program TBPTR[TB_WNDW_END_PTR] = 3*DIV_RATIO.
2. Program TBPTR[TB_WNDW_START_PTR] = 5*DIV_RATIO.
3. Program the software tuning mode by setting TBCTL[TB_MODE] = 2'h3.
4. Set SYSCTL2[EXTN] and SYSCTL2[SAMPCLKSEL].
5. Issue SEND_TUNING_BLK Command (CMD19 for SD, CMD21 for MMC).
6. Wait for IRQSTAT[BRR], buffer read ready, to be set.
7. Clear IRQSTAT[BRR].
8. Check SYSCTL2[EXTN] to be cleared.
9. Check SYSCTL2[SAMPCLKSEL], Sampling Clock Select. It's set value
indicate tuning procedure success, and clear indicate failure.
In case of tuning failure, fixed sampling scheme could be used by
clearing TBCTL[TB_EN].
/* For LS1080A Rev 1, LS2088A Rev 1.0, LA1575A Rev 1.0: */
1. Read the TBCTL[31:0] register. Write TBCTL[11:8]=4'h8 and wait for
1ms.
2. Read the TBCTL[31:0] register and rewrite again. Wait for 1ms second.
3. Read the TBSTAT[31:0] register twice.
3.1 Reset data lines by setting ESDHCCTL[RSTD] bit.
3.2 Check ESDHCCTL[RSTD] bit.
3.3 If ESDHCCTL[RSTD] is 0, go to step 3.4 else go to step 3.2.
3.4 Write 32'hFFFF_FFFF to IRQSTAT register.
4. if TBSTAT[15:8]-TBSTAT[7:0] > 4*DIV_RATIO or TBSTAT[7:0]-TBSTAT[15:8]
> 4*DIV_RATIO , then program TBPTR[TB_WNDW_END_PTR] = 4*DIV_RATIO and
program TBPTR[TB_WNDW_START_PTR] = 8*DIV_RATIO.
/* For LS1012A Rev1, LS1043A Rev 1.x, LS1046A 1.0: */
1. Read the TBCTL[0:31] register. Write TBCTL[20:23]=4'h8 and wait for
1ms.
2. Read the TBCTL[0:31] register and rewrite again. Wait for 1ms second.
3. Read the TBSTAT[0:31] register twice.
3.1 Reset data lines by setting ESDHCCTL[RSTD] bit.
3.2 Check ESDHCCTL[RSTD] bit.
3.3 If ESDHCCTL[RSTD] is 0, go to step 3.4 else go to step 3.2.
3.4 Write 32'hFFFF_FFFF to IRQSTAT register.
4. if TBSTAT[16:23]-TBSTAT[24:31] > 4*DIV_RATIO or TBSTAT[24:31]-
TBSTAT[16:23] > 4* DIV_RATIO , then program TBPTR[TB_WNDW_END_PTR] =
4*DIV_RATIO and program TBPTR[TB_WNDW_START_PTR] = 8*DIV_RATIO.
/* For LS1080A Rev 1, LS2088A Rev 1.0, LA1575A Rev 1.0 LS1012A Rev1,
* LS1043A Rev 1.x, LS1046A 1.0:
*/
5. else program TBPTR[TB_WNDW_END_PTR] = 3*DIV_RATIO and program
TBPTR[TB_WNDW_START_PTR] = 5*DIV_RATIO.
6. Program the software tuning mode by setting TBCTL[TB_MODE] = 2'h3.
7. Set SYSCTL2[EXTN], wait 1us and SYSCTL2[SAMPCLKSEL].
8. Issue SEND_TUNING_BLK Command (CMD19 for SD, CMD21 for MMC).
9. Wait for IRQSTAT[BRR], buffer read ready, to be set.
10. Clear IRQSTAT[BRR].
11. Check SYSCTL2[EXTN] to be cleared.
12. Check SYSCTL2[SAMPCLKSEL], Sampling Clock Select. It's set value
indicate tuning procedure success, and clear indicate failure.
In case of tuning failure, fixed sampling scheme could be used by
clearing TBCTL[TB_EN].
Nicolas Ferre [Tue, 8 Oct 2019 12:34:32 +0000 (14:34 +0200)]
mmc: sdhci-of-at91: add DT property to enable calibration on full reset
Add a property to keep the analog calibration cell powered.
This feature is specific to the Microchip SDHCI IP and outside
of the standard SDHCI register map.
By always keeping it on, after a full reset sequence, we make sure
that this feature is activated and not disabled.
We expose a hardware property to the DT as this feature can be used
to adapt SDHCI behavior vs. how the SDCAL SoC pin is connected
on the board.
Note that managing properly this property would reduce
power consumption on some SAMA5D2 SiP revisions.
Nicolas Ferre [Tue, 8 Oct 2019 12:34:31 +0000 (14:34 +0200)]
dt-bindings: sdhci-of-at91: add the microchip,sdcal-inverted property
Add the specific microchip,sdcal-inverted property to at91 sdhci
device binding.
This optional property describes how the SoC SDCAL pin is connected.
It could be handled at SiP, SoM or board level.
This property read by at91 sdhci driver will allow to put in place a
software workaround that would reduce power consumption.
Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ludovic Barre [Tue, 8 Oct 2019 09:56:04 +0000 (11:56 +0200)]
mmc: mmci: sdmmc: add busy_complete callback
This patch adds a specific busy_complete callback for sdmmc variant.
sdmmc has 2 status flags:
-busyd0: This is a hardware status flag (inverted value of d0 line).
it does not generate an interrupt.
-busyd0end: This indicates only end of busy following a CMD response.
On busy to Not busy changes, an interrupt is generated (if unmask)
and BUSYD0END status flag is set. Status flag is cleared by writing
corresponding interrupt clear bit in MMCICLEAR.
The legacy busy completion has no dedicated interrupt for the end
of busy, so it's must monitor step by step the busy progression.
On sdmmc variant, this procedure is not needed, it's just need
to wait the busyd0end interrupt.
Ludovic Barre [Tue, 8 Oct 2019 09:56:02 +0000 (11:56 +0200)]
mmc: mmci: add hardware busy timeout feature
In the stm32_sdmmc variant, the datatimer is active not only during
data transfers with the DPSM, but also while waiting for the busyend
IRQs from commands having the MMC_RSP_BUSY flag set. This leads to an
incorrect IRQ being raised to signal MCI_DATATIMEOUT error, which
simply breaks the behaviour.
Address this by updating the datatimer value before sending a command
having the MMC_RSP_BUSY flag set. To inform the mmc core about the
maximum supported busy timeout, which also depends on the current
clock rate, set ->max_busy_timeout (in ms).
Ben Dooks [Wed, 25 Sep 2019 11:42:31 +0000 (12:42 +0100)]
mmc: mmci: make unexported functions static
Fix the following sparse warnings by making any functions not used
outsde the mmci.c driver static.
drivers/mmc/host/mmci.c:422:6: warning: symbol 'mmci_dma_release' was not declared. Should it be static?
drivers/mmc/host/mmci.c:430:6: warning: symbol 'mmci_dma_setup' was not declared. Should it be static?
drivers/mmc/host/mmci.c:465:5: warning: symbol 'mmci_prep_data' was not declared. Should it be static?
drivers/mmc/host/mmci.c:481:6: warning: symbol 'mmci_unprep_data' was not declared. Should it be static?
drivers/mmc/host/mmci.c:490:6: warning: symbol 'mmci_get_next_data' was not declared. Should it be static?
drivers/mmc/host/mmci.c:498:5: warning: symbol 'mmci_dma_start' was not declared. Should it be static?
drivers/mmc/host/mmci.c:533:6: warning: symbol 'mmci_dma_finalize' was not declared. Should it be static?
drivers/mmc/host/mmci.c:542:6: warning: symbol 'mmci_dma_error' was not declared. Should it be static?
drivers/mmc/host/mmci.c:951:6: warning: symbol 'mmci_variant_init' was not declared. Should it be static?
drivers/mmc/host/mmci.c:956:6: warning: symbol 'ux500v2_variant_init' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Wolfram Sang [Tue, 17 Sep 2019 18:36:52 +0000 (20:36 +0200)]
mmc: tmio: remove workaround for NON_REMOVABLE
PM has been reworked, so eMMC gets now detected on R-Car H3 ES1.0 and
2.0 as well as M3-N without the workaround. Card detect and write
protect also still work. Remove the workaround.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Add SD/MMC driver for Actions Semi Owl SoCs. This driver currently
supports standard, high speed, SDR12, SDR25 and SDR50. DDR50 mode is
supported but it is untested. There is no SDIO support for now.
SD Host controller on Milbeaut consists of two controller parts.
One is core controller F_SDH30, this is similar to sdhci-fujitsu
controller.
Another is bridge controller.
This bridge controller is not compatible with sdhci-fujitsu controller.
This is special for Milbeaut series. This has some functions.
For example, reset control, clock enable/select for SDR50/25/12, set
property of SD physical pins, retuning control, set capabilityies.
This bridge controller requires special procedures at reset or clock
enablement or change for further tuning of clock.
AMD SDHC 0x7906 requires a hard reset to clear all internal state.
Otherwise it can get into a bad state where the DATA lines are always
read as zeros.
This change requires firmware that can transition the device into
D3Cold for it to work correctly. If the firmware does not support
transitioning to D3Cold then the power state transitions are a no-op.
Signed-off-by: Raul E Rangel <rrangel@chromium.org> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
In sdhci_do_reset we call the reset callback which is typically
sdhci_reset. sdhci_reset can wait for up to 100ms waiting for the
controller to reset. If SDHCI_RESET_ALL was passed as the flag, the
controller will clear the IRQ mask. If during that 100ms the card is
removed there is no notification to the MMC system that the card was
removed. So from the drivers point of view the card is always present.
By making sdhci_reinit compare the present state it can schedule a
rescan if the card was removed while a reset was in progress.
Signed-off-by: Raul E Rangel <rrangel@chromium.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Linus Torvalds [Sun, 10 Nov 2019 21:41:59 +0000 (13:41 -0800)]
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Olof Johansson:
"A set of fixes that have trickled in over the last couple of weeks:
- MAINTAINER update for Cavium/Marvell ThunderX2
- stm32 tweaks to pinmux for Joystick/Camera, and RAM allocation for
CAN interfaces
- i.MX fixes for voltage regulator GPIO mappings, fixes voltage
scaling issues
- More i.MX fixes for various issues on i.MX eval boards: interrupt
storm due to u-boot leaving pins in new states, fixing power button
config, a couple of compatible-string corrections.
- Powerdown and Suspend/Resume fixes for Allwinner A83-based tablets
- A few documentation tweaks and a fix of a memory leak in the reset
subsystem"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
MAINTAINERS: update Cavium ThunderX2 maintainers
ARM: dts: stm32: change joystick pinctrl definition on stm32mp157c-ev1
ARM: dts: stm32: remove OV5640 pinctrl definition on stm32mp157c-ev1
ARM: dts: stm32: Fix CAN RAM mapping on stm32mp157c
ARM: dts: stm32: relax qspi pins slew-rate for stm32mp157
arm64: dts: zii-ultra: fix ARM regulator GPIO handle
ARM: sunxi: Fix CPU powerdown on A83T
ARM: dts: sun8i-a83t-tbs-a711: Fix WiFi resume from suspend
arm64: dts: imx8mn: fix compatible string for sdma
arm64: dts: imx8mm: fix compatible string for sdma
reset: fix reset_control_ops kerneldoc comment
ARM: dts: imx6-logicpd: Re-enable SNVS power key
soc: imx: gpc: fix initialiser format
ARM: dts: imx6qdl-sabreauto: Fix storm of accelerometer interrupts
arm64: dts: ls1028a: fix a compatible issue
reset: fix reset_control_get_exclusive kerneldoc comment
reset: fix reset_control_lookup kerneldoc comment
reset: fix of_reset_control_get_count kerneldoc comment
reset: fix of_reset_simple_xlate kerneldoc comment
reset: Fix memory leak in reset_control_array_put()
Linus Torvalds [Sun, 10 Nov 2019 21:29:12 +0000 (13:29 -0800)]
Merge tag 'staging-5.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull IIO fixes and staging driver from Greg KH:
"Here is a mix of a number of IIO driver fixes for 5.4-rc7, and a whole
new staging driver.
The IIO fixes resolve some reported issues, all are tiny.
The staging driver addition is the vboxsf filesystem, which is the
VirtualBox guest shared folder code. Hans has been trying to get
filesystem reviewers to review the code for many months now, and
Christoph finally said to just merge it in staging now as it is
stand-alone and the filesystem people can review it easier over time
that way.
I know it's late for this big of an addition, but it is stand-alone.
The code has been in linux-next for a while, long enough to pick up a
few tiny fixes for it already so people are looking at it.
All of these have been in linux-next with no reported issues"
* tag 'staging-5.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: Fix error return code in vboxsf_fill_super()
staging: vboxsf: fix dereference of pointer dentry before it is null checked
staging: vboxsf: Remove unused including <linux/version.h>
staging: Add VirtualBox guest shared folder (vboxsf) support
iio: adc: stm32-adc: fix stopping dma
iio: imu: inv_mpu6050: fix no data on MPU6050
iio: srf04: fix wrong limitation in distance measuring
iio: imu: adis16480: make sure provided frequency is positive
Linus Torvalds [Sun, 10 Nov 2019 21:14:48 +0000 (13:14 -0800)]
Merge tag 'char-misc-5.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are a number of late-arrival driver fixes for issues reported for
some char/misc drivers for 5.4-rc7
These all come from the different subsystem/driver maintainers as
things that they had reports for and wanted to see fixed.
All of these have been in linux-next with no reported issues"
* tag 'char-misc-5.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
intel_th: pci: Add Jasper Lake PCH support
intel_th: pci: Add Comet Lake PCH support
intel_th: msu: Fix possible memory leak in mode_store()
intel_th: msu: Fix overflow in shift of an unsigned int
intel_th: msu: Fix missing allocation failure check on a kstrndup
intel_th: msu: Fix an uninitialized mutex
intel_th: gth: Fix the window switching sequence
soundwire: slave: fix scanf format
soundwire: intel: fix intel_register_dai PDI offsets and numbers
interconnect: Add locking in icc_set_tag()
interconnect: qcom: Fix icc_onecell_data allocation
soundwire: depend on ACPI || OF
soundwire: depend on ACPI
thunderbolt: Drop unnecessary read when writing LC command in Ice Lake
thunderbolt: Fix lockdep circular locking depedency warning
thunderbolt: Read DP IN adapter first two dwords in one go
Linus Torvalds [Sun, 10 Nov 2019 20:07:47 +0000 (12:07 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A small set of fixes for x86:
- Make the tsc=reliable/nowatchdog command line parameter work again.
It was broken with the introduction of the early TSC clocksource.
- Prevent the evaluation of exception stacks before they are set up.
This causes a crash in dumpstack because the stack walk termination
gets screwed up.
- Prevent a NULL pointer dereference in the rescource control file
system.
- Avoid bogus warnings about APIC id mismatch related to the LDR
which can happen when the LDR is not in use and therefore not
initialized. Only evaluate that when the APIC is in logical
destination mode"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsc: Respect tsc command line paraemeter for clocksource_tsc_early
x86/dumpstack/64: Don't evaluate exception stacks before setup
x86/apic/32: Avoid bogus LDR warnings
x86/resctrl: Prevent NULL pointer dereference when reading mondata
Linus Torvalds [Sun, 10 Nov 2019 20:03:58 +0000 (12:03 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"A small set of fixes for timekeepoing and clocksource drivers:
- VDSO data was updated conditional on the availability of a VDSO
capable clocksource. This causes the VDSO functions which do not
depend on a VDSO capable clocksource to operate on stale data.
Always update unconditionally.
- Prevent a double free in the mediatek driver
- Use the proper helper in the sh_mtu2 driver so it won't attempt to
initialize non-existing interrupts"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping/vsyscall: Update VDSO data unconditionally
clocksource/drivers/sh_mtu2: Do not loop using platform_get_irq_by_name()
clocksource/drivers/mediatek: Fix error handling
Linus Torvalds [Sun, 10 Nov 2019 20:00:47 +0000 (12:00 -0800)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"Two fixes for scheduler regressions:
- Plug a subtle race condition which was introduced with the rework
of the next task selection functionality. The change of task
properties became unprotected which can be observed inconsistently
causing state corruption.
- A trivial compile fix for CONFIG_CGROUPS=n"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix pick_next_task() vs 'change' pattern race
sched/core: Fix compilation error when cgroup not selected
Linus Torvalds [Sun, 10 Nov 2019 19:51:11 +0000 (11:51 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixlet from Thomas Gleixner:
"A trivial fix for a kernel doc regression where an argument change was
not reflected in the documentation"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irq/irqdomain: Update __irq_domain_alloc_fwnode() function documentation
Linus Torvalds [Sat, 9 Nov 2019 16:51:37 +0000 (08:51 -0800)]
Merge tag 'for-5.4-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few regressions and fixes for stable.
Regressions:
- fix a race leading to metadata space leak after task received a
signal
- un-deprecate 2 ioctls, marked as deprecated by mistake
Fixes:
- fix limit check for number of devices during chunk allocation
- fix a race due to double evaluation of i_size_read inside max()
macro, can cause a crash
- remove wrong device id check in tree-checker"
* tag 'for-5.4-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: un-deprecate ioctls START_SYNC and WAIT_SYNC
btrfs: save i_size to avoid double evaluation of i_size_read in compress_file_range
Btrfs: fix race leading to metadata space leak after task received signal
btrfs: tree-checker: Fix wrong check on max devid
btrfs: Consider system chunk array size for new SYSTEM chunks
Linus Torvalds [Sat, 9 Nov 2019 16:47:03 +0000 (08:47 -0800)]
Merge tag 'linux-watchdog-5.4-rc7' of git://www.linux-watchdog.org/linux-watchdog
Pull watchdog fixes from Wim Van Sebroeck:
- cpwd: fix build regression
- pm8916_wdt: fix pretimeout registration flow
- meson: Fix the wrong value of left time
- imx_sc_wdt: Pretimeout should follow SCU firmware format
- bd70528: Add MODULE_ALIAS to allow module auto loading
* tag 'linux-watchdog-5.4-rc7' of git://www.linux-watchdog.org/linux-watchdog:
watchdog: bd70528: Add MODULE_ALIAS to allow module auto loading
watchdog: imx_sc_wdt: Pretimeout should follow SCU firmware format
watchdog: meson: Fix the wrong value of left time
watchdog: pm8916_wdt: fix pretimeout registration flow
watchdog: cpwd: fix build regression
2) Fix powerpc bpf tail call implementation, from Eric Dumazet.
3) DCCP leaks jiffies on the wire, fix also from Eric Dumazet.
4) Fix crash in ebtables when using dnat target, from Florian Westphal.
5) Fix port disable handling whne removing bcm_sf2 driver, from Florian
Fainelli.
6) Fix kTLS sk_msg trim on fallback to copy mode, from Jakub Kicinski.
7) Various KCSAN fixes all over the networking, from Eric Dumazet.
8) Memory leaks in mlx5 driver, from Alex Vesker.
9) SMC interface refcounting fix, from Ursula Braun.
10) TSO descriptor handling fixes in stmmac driver, from Jose Abreu.
11) Add a TX lock to synchonize the kTLS TX path properly with crypto
operations. From Jakub Kicinski.
12) Sock refcount during shutdown fix in vsock/virtio code, from Stefano
Garzarella.
13) Infinite loop in Intel ice driver, from Colin Ian King.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (108 commits)
ixgbe: need_wakeup flag might not be set for Tx
i40e: need_wakeup flag might not be set for Tx
igb/igc: use ktime accessors for skb->tstamp
i40e: Fix for ethtool -m issue on X722 NIC
iavf: initialize ITRN registers with correct values
ice: fix potential infinite loop because loop counter being too small
qede: fix NULL pointer deref in __qede_remove()
net: fix data-race in neigh_event_send()
vsock/virtio: fix sock refcnt holding during the shutdown
net: ethernet: octeon_mgmt: Account for second possible VLAN header
mac80211: fix station inactive_time shortly after boot
net/fq_impl: Switch to kvmalloc() for memory allocation
mac80211: fix ieee80211_txq_setup_flows() failure path
ipv4: Fix table id reference in fib_sync_down_addr
ipv6: fixes rt6_probe() and fib6_nh->last_probe init
net: hns: Fix the stray netpoll locks causing deadlock in NAPI path
net: usb: qmi_wwan: add support for DW5821e with eSIM support
CDC-NCM: handle incomplete transfer of MTU
nfc: netlink: fix double device reference drop
NFC: st21nfca: fix double free
...
Linus Torvalds [Sat, 9 Nov 2019 02:15:55 +0000 (18:15 -0800)]
Merge tag 'for-linus-2019-11-08' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Two NVMe device removal crash fixes, and a compat fixup for for an
ioctl that was introduced in this release (Anton, Charles, Max - via
Keith)
- Missing error path mutex unlock for drbd (Dan)
- cgroup writeback fixup on dead memcg (Tejun)
- blkcg online stats print fix (Tejun)
* tag 'for-linus-2019-11-08' of git://git.kernel.dk/linux-block:
cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead
block: drbd: remove a stray unlock in __drbd_send_protocol()
blkcg: make blkcg_print_stat() print stats only for online blkgs
nvme: change nvme_passthru_cmd64 to explicitly mark rsvd
nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths
nvme-rdma: fix a segmentation fault during module unload
David S. Miller [Sat, 9 Nov 2019 00:50:14 +0000 (16:50 -0800)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Fixes 2019-11-08
This series contains fixes to igb, igc, ixgbe, i40e, iavf and ice
drivers.
Colin Ian King fixes a potentially wrap-around counter in a for-loop.
Nick fixes the default ITR values for the iavf driver to 50 usecs
interval.
Arkadiusz fixes 'ethtool -m' for X722 devices where the correct value
cannot be obtained from the firmware, so add X722 to the check to ensure
the wrong value is not returned.
Jake fixes igb and igc drivers in their implementation of launch time
support by declaring skb->tstamp value as ktime_t instead of s64.
Magnus fixes ixgbe and i40e where the need_wakeup flag for transmit may
not be set for AF_XDP sockets that are only used to send packets.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Magnus Karlsson [Fri, 8 Nov 2019 19:58:10 +0000 (20:58 +0100)]
ixgbe: need_wakeup flag might not be set for Tx
The need_wakeup flag for Tx might not be set for AF_XDP sockets that
are only used to send packets. This happens if there is at least one
outstanding packet that has not been completed by the hardware and we
get that corresponding completion (which will not generate an
interrupt since interrupts are disabled in the napi poll loop) between
the time we stopped processing the Tx completions and interrupts are
enabled again. In this case, the need_wakeup flag will have been
cleared at the end of the Tx completion processing as we believe we
will get an interrupt from the outstanding completion at a later point
in time. But if this completion interrupt occurs before interrupts
are enable, we lose it and should at that point really have set the
need_wakeup flag since there are no more outstanding completions that
can generate an interrupt to continue the processing. When this
happens, user space will see a Tx queue need_wakeup of 0 and skip
issuing a syscall, which means will never get into the Tx processing
again and we have a deadlock.
This patch introduces a quick fix for this issue by just setting the
need_wakeup flag for Tx to 1 all the time. I am working on a proper
fix for this that will toggle the flag appropriately, but it is more
challenging than I anticipated and I am afraid that this patch will
not be completed before the merge window closes, therefore this easier
fix for now. This fix has a negative performance impact in the range
of 0% to 4%. Towards the higher end of the scale if you have driver
and application on the same core and issue a lot of packets, and
towards no negative impact if you use two cores, lower transmission
speeds and/or a workload that also receives packets.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Magnus Karlsson [Fri, 8 Nov 2019 19:58:09 +0000 (20:58 +0100)]
i40e: need_wakeup flag might not be set for Tx
The need_wakeup flag for Tx might not be set for AF_XDP sockets that
are only used to send packets. This happens if there is at least one
outstanding packet that has not been completed by the hardware and we
get that corresponding completion (which will not generate an
interrupt since interrupts are disabled in the napi poll loop) between
the time we stopped processing the Tx completions and interrupts are
enabled again. In this case, the need_wakeup flag will have been
cleared at the end of the Tx completion processing as we believe we
will get an interrupt from the outstanding completion at a later point
in time. But if this completion interrupt occurs before interrupts
are enable, we lose it and should at that point really have set the
need_wakeup flag since there are no more outstanding completions that
can generate an interrupt to continue the processing. When this
happens, user space will see a Tx queue need_wakeup of 0 and skip
issuing a syscall, which means will never get into the Tx processing
again and we have a deadlock.
This patch introduces a quick fix for this issue by just setting the
need_wakeup flag for Tx to 1 all the time. I am working on a proper
fix for this that will toggle the flag appropriately, but it is more
challenging than I anticipated and I am afraid that this patch will
not be completed before the merge window closes, therefore this easier
fix for now. This fix has a negative performance impact in the range
of 0% to 4%. Towards the higher end of the scale if you have driver
and application on the same core and issue a lot of packets, and
towards no negative impact if you use two cores, lower transmission
speeds and/or a workload that also receives packets.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Wed, 6 Nov 2019 17:18:23 +0000 (09:18 -0800)]
igb/igc: use ktime accessors for skb->tstamp
When implementing launch time support in the igb and igc drivers, the
skb->tstamp value is assumed to be a s64, but it's declared as a ktime_t
value.
Although ktime_t is typedef'd to s64 it wasn't always, and the kernel
provides accessors for ktime_t values.
Use the ktime_to_timespec64 and ktime_set accessors instead of directly
assuming that the variable is always an s64.
This improves portability if the code is ever moved to another kernel
version, or if the definition of ktime_t ever changes again in the
future.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch contains fix for a problem with command:
'ethtool -m <dev>'
which breaks functionality of:
'ethtool <dev>'
when called on X722 NIC
Disallowed update of link phy_types on X722 NIC
Currently correct value cannot be obtained from FW
Previously wrong value returned by FW was used and was
a root cause for incorrect output of 'ethtool <dev>' command
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Nicholas Nunley [Tue, 5 Nov 2019 12:22:14 +0000 (04:22 -0800)]
iavf: initialize ITRN registers with correct values
Since commit 92418fb14750 ("i40e/i40evf: Use usec value instead of reg
value for ITR defines") the driver tracks the interrupt throttling
intervals in single usec units, although the actual ITRN registers are
programmed in 2 usec units. Most register programming flows in the driver
correctly handle the conversion, although it is currently not applied when
the registers are initialized to their default values. Most of the time
this doesn't present a problem since the default values are usually
immediately overwritten through the standard adaptive throttling mechanism,
or updated manually by the user, but if adaptive throttling is disabled and
the interval values are left alone then the incorrect value will persist.
Since the intended default interval of 50 usecs (vs. 100 usecs as
programmed) performs better for most traffic workloads, this can lead to
performance regressions.
This patch adds the correct conversion when writing the initial values to
the ITRN registers.
Signed-off-by: Nicholas Nunley <nicholas.d.nunley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Colin Ian King [Fri, 1 Nov 2019 14:00:17 +0000 (14:00 +0000)]
ice: fix potential infinite loop because loop counter being too small
Currently the for-loop counter i is a u8 however it is being checked
against a maximum value hw->num_tx_sched_layers which is a u16. Hence
there is a potential wrap-around of counter i back to zero if
hw->num_tx_sched_layers is greater than 255. Fix this by making i
a u16.
Addresses-Coverity: ("Infinite loop") Fixes: b36c598c999c ("ice: Updates to Tx scheduler code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Manish Chopra [Fri, 8 Nov 2019 10:42:30 +0000 (02:42 -0800)]
qede: fix NULL pointer deref in __qede_remove()
While rebooting the system with SR-IOV vfs enabled leads
to below crash due to recurrence of __qede_remove() on the VF
devices (first from .shutdown() flow of the VF itself and
another from PF's .shutdown() flow executing pci_disable_sriov())
This patch adds a safeguard in __qede_remove() flow to fix this,
so that driver doesn't attempt to remove "already removed" devices.
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 8 Nov 2019 21:42:40 +0000 (13:42 -0800)]
Merge tag 'pwm/for-5.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
Pull pwm fix from Thierry Reding:
"One more fix to keep a reference to the driver's module as long as
there are users of the PWM exposed by the driver"
* tag 'pwm/for-5.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
pwm: bcm-iproc: Prevent unloading the driver module while in use
Peter Zijlstra [Fri, 8 Nov 2019 10:11:52 +0000 (11:11 +0100)]
sched: Fix pick_next_task() vs 'change' pattern race
Commit 67692435c411 ("sched: Rework pick_next_task() slow-path")
inadvertly introduced a race because it changed a previously
unexplored dependency between dropping the rq->lock and
sched_class::put_prev_task().
The comments about dropping rq->lock, in for example
newidle_balance(), only mentions the task being current and ->on_cpu
being set. But when we look at the 'change' pattern (in for example
sched_setnuma()):
if (queued)
dequeue_task(...);
if (running)
put_prev_task(...);
/* change task properties */
if (queued)
enqueue_task(...);
if (running)
set_next_task(...);
It becomes obvious that if we do this after put_prev_task() has
already been called on @p, things go sideways. This is exactly what
the commit in question allows to happen when it does:
prev->sched_class->put_prev_task(rq, prev, rf);
if (!rq->nr_running)
newidle_balance(rq, rf);
The newidle_balance() call will drop rq->lock after we've called
put_prev_task() and that allows the above 'change' pattern to
interleave and mess up the state.
Furthermore, it turns out we lost the RT-pull when we put the last DL
task.
Fix both problems by extracting the balancing from put_prev_task() and
doing a multi-class balance() pass before put_prev_task().
Qais Yousef [Tue, 5 Nov 2019 11:22:12 +0000 (11:22 +0000)]
sched/core: Fix compilation error when cgroup not selected
When cgroup is disabled the following compilation error was hit
kernel/sched/core.c: In function ‘uclamp_update_active_tasks’:
kernel/sched/core.c:1081:23: error: storage size of ‘it’ isn’t known
struct css_task_iter it;
^~
kernel/sched/core.c:1084:2: error: implicit declaration of function ‘css_task_iter_start’; did you mean ‘__sg_page_iter_start’? [-Werror=implicit-function-declaration]
css_task_iter_start(css, 0, &it);
^~~~~~~~~~~~~~~~~~~
__sg_page_iter_start
kernel/sched/core.c:1085:14: error: implicit declaration of function ‘css_task_iter_next’; did you mean ‘__sg_page_iter_next’? [-Werror=implicit-function-declaration]
while ((p = css_task_iter_next(&it))) {
^~~~~~~~~~~~~~~~~~
__sg_page_iter_next
kernel/sched/core.c:1091:2: error: implicit declaration of function ‘css_task_iter_end’; did you mean ‘get_task_cred’? [-Werror=implicit-function-declaration]
css_task_iter_end(&it);
^~~~~~~~~~~~~~~~~
get_task_cred
kernel/sched/core.c:1081:23: warning: unused variable ‘it’ [-Wunused-variable]
struct css_task_iter it;
^~
cc1: some warnings being treated as errors
make[2]: *** [kernel/sched/core.o] Error 1
Fix by protetion uclamp_update_active_tasks() with
CONFIG_UCLAMP_TASK_GROUP
Fixes: babbe170e053 ("sched/uclamp: Update CPU's refcount on TG's clamp changes") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Patrick Bellasi <patrick.bellasi@matbug.net> Cc: Mel Gorman <mgorman@suse.de> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Ben Segall <bsegall@google.com> Link: https://lkml.kernel.org/r/20191105112212.596-1-qais.yousef@arm.com