Markus Pargmann [Fri, 29 Mar 2013 15:20:10 +0000 (16:20 +0100)]
ARM: imx35 Bugfix admux clock
The admux clock seems to be the audmux clock as tests show. audmux does
not work without this clock enabled. Currently imx35 does not register a
clock device for audmux. This patch adds this registration. imx-audmux
driver already handles a clock device, so no changes are necessary
there.
Markus Pargmann [Fri, 29 Mar 2013 15:20:09 +0000 (16:20 +0100)]
ARM: clk-imx35: Bugfix iomux clock
This patch enables iomuxc_gate clock. It is necessary to be able to
reconfigure iomux pads. Without this clock enabled, the
clk_disable_unused function will disable this clock and the iomux pads
are not configurable anymore. This happens at every boot. After a reboot
(watchdog system reset) the clock is not enabled again, so all iomux pad
reconfigurations in boot code are without effect.
The iomux pads should be always configurable, so this patch always
enables it.
drivers/gpio/devres.c: In function 'devm_gpio_request_one':
drivers/gpio/devres.c:90:2: error: implicit declaration of function 'gpio_request_one' [-Werror=implicit-function-declaration]
So provide a local gpio_request_one() function. Code largely borrowed from
blackfin's local gpio_request_one() function.
sizeof() when applied to a pointer typed expression gives the size of the
pointer, not that of the pointed data.
Introduced by commit 3ce5ef(netrom: fix info leak via msg_name in nr_recvmsg)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'remoteproc-3.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc
Pull remoteproc fixes from Ohad Ben-Cohen:
"Four small remoteproc fixes:
- Suman fixed an issue that crawled in with the move to the new
idr_alloc interface in 3.9
- Dmitry fixed an STE specific memory leak
- Sjur fixed an error path in the core
- Rob fixed a Kconfig typo"
* tag 'remoteproc-3.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc:
remoteproc: fix FW_CONFIG typo
remoteproc: fix error path of handle_vdev
remoteproc/ste: fix memory leak on shutdown
remoteproc: fix the error check for idr_alloc
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sfr/next-fixes
Pull powerpc bugfix from Stephen Rothwell:
"A single BUG_ON fix for a condition that could happen for machines
with certain hardware installed."
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sfr/next-fixes:
powerpc: pSeries_lpar_hpte_remove fails from Adjunct partition being performed before the ANDCOND test
ARC: Add implicit compiler barrier to raw_local_irq* functions
ARC irqsave/restore macros were missing the compiler barrier, causing a
stale load in irq-enabled region be used in irq-safe region, despite
being changed, because the register holding the value was still live.
The problem manifested as random crashes in timer code when stress
testing ARCLinux (3.9-rc3) on a !SMP && !PREEMPT_COUNT
Here's the exact sequence which caused this:
(0). tv1[x] <----> t1 <---> t2
(1). mod_timer(t1) interrupted after it calls timer_pending()
(2). mod_timer(t2) completes
(3). mod_timer(t1) resumes but messes up the list
(4). __runt_timers( ) uses bogus timer_list entry / crashes in
timer->function
Essentially mod_timer() was racing against itself and while the spinlock
serialized the tv1[] timer link list, timer_pending() called outside the
spinlock, cached timer link list element in a register.
With low register pressure (and a deep register file), lack of barrier
in raw_local_irqsave() as well as preempt_disable (!PREEMPT_COUNT
version), there was nothing to force gcc to reload across the spinlock,
causing a stale value in reg be used for link list manipulation - ensuing
a corruption.
ARcompact disassembly which shows the culprit generated code:
tst_s r3,r3 <--------------
# timer_pending( ) checks timer->entry.next
# r3 is NOT reloaded by gcc, using stale value
beq.d @.L169
mov.eq r0,0
##### detach_timer( ): __list_del( )
ld r4,[r13,4] # <variable>.entry.prev, D.31439
st r4,[r3,4] # <variable>.prev, D.31439
st r3,[r4] # <variable>.next, D.30246
We initially tried to fix this by adding barrier() to preempt_* macros
for !PREEMPT_COUNT but Linus clarified that it was anything but wrong.
http://www.spinics.net/lists/kernel/msg1512709.html
[vgupta: updated commitlog]
Reported-by/Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com> Cc: Christian Ruppert <christian.ruppert@abilis.com> Cc: Pierrick Hascoet <pierrick.hascoet@abilis.com>
Debugged-by/Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Merge tag 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
Pull libata fixes from Jeff Garzik:
"The HDIO_DRIVE_* fix is really the biggie.
1) Fix ATAPI regression, noticed mainly on tape drives, due to a
commit which mistakenly changed an 'int' return type to a 'bool'.
Broken by commit 4dce8ba94c75 ("libata: Use 'bool' return value for
ata_id_XXX")
2) Add Slimtype DVD A DS8A8SH ATAPI quirk
3) ata_piix: Intel Haswell platform quirk
4) Avoid DMA'ing to stack buffer, when obtaining DEVSLP timings. IMO
a mild regression, given that libata previously did not DMA to a
stack buffer. Broken by commit commit 803739d25c23 ("[libata]
replace sata_settings with devslp_timing")
5) Fix regression impacting SMART and smartd, broken by commit 84a9a8cd9d0a ("[libata] Set proper SK when CK_COND is set")"
* tag 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
[libata] Fix HDIO_DRIVE_* ioctl() Linux 3.9 regression
libata: fix DMA to stack in reading devslp_timing parameters
ata_piix: Fix DVD not dectected at some Haswell platforms
libata: Set max sector to 65535 for Slimtype DVD A DS8A8SH drive
libata: Use integer return value for atapi_command_packet_set
Merge tag 'trace-fixes-3.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"This includes three fixes. Two fix features added in 3.9 and one
fixes a long time minor bug.
The first patch fixes a race that can happen if the user switches from
the irqsoff tracer to another tracer. If a irqs off latency is
detected, it will try to use the snapshot buffer, but the new tracer
wont have it allocated. There's a nasty warning that gets printed and
the trace is ignored. Nothing crashes, just a nasty WARN_ON is shown.
The second patch fixes an issue where if the sysctl is used to disable
and enable function tracing, it can put the function tracing into an
unstable state.
The third patch fixes an issue with perf using the function tracer.
An update was done, where the stub function could be called during the
perf function tracing, and that stub function wont have the "control"
flag set and cause a nasty warning when running perf."
* tag 'trace-fixes-3.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Do not call stub functions in control loop
ftrace: Consistently restore trace function on sysctl enabling
tracing: Fix race with update_max_tr_single and changing tracers
Stefan Raspl [Sun, 7 Apr 2013 22:19:27 +0000 (22:19 +0000)]
qeth: fix qeth_wait_for_threads() deadlock for OSN devices
Any recovery thread will deadlock when calling qeth_wait_for_threads(), most
notably when triggering a recovery on an OSN device.
This patch will store the recovery thread's task pointer on recovery
invocation and check in qeth_wait_for_threads() respectively to avoid
deadlocks.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Reviewed-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
af_iucv: fix recvmsg by replacing skb_pull() function
When receiving data messages, the "BUG_ON(skb->len < skb->data_len)" in
the skb_pull() function triggers a kernel panic.
Replace the skb_pull logic by a per skb offset as advised by
Eric Dumazet.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Asias He [Wed, 3 Apr 2013 06:17:37 +0000 (14:17 +0800)]
tcm_vhost: Use vq->private_data to indicate if the endpoint is setup
Currently, vs->vs_endpoint is used indicate if the endpoint is setup or
not. It is set or cleared in vhost_scsi_set_endpoint() or
vhost_scsi_clear_endpoint() under the vs->dev.mutex lock. However, when
we check it in vhost_scsi_handle_vq(), we ignored the lock.
Instead of using the vs->vs_endpoint and the vs->dev.mutex lock to
indicate the status of the endpoint, we use per virtqueue
vq->private_data to indicate it. In this way, we can only take the
vq->mutex lock which is per queue and make the concurrent multiqueue
process having less lock contention. Further, in the read side of
vq->private_data, we can even do not take the lock if it is accessed in
the vhost worker thread, because it is protected by "vhost rcu".
(nab: Do s/VHOST_FEATURES/~VHOST_SCSI_FEATURES)
Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
bonding: fix bonding_masters race condition in bond unloading
While the bonding module is unloading, it is considered that after
rtnl_link_unregister all bond devices are destroyed but since no
synchronization mechanism exists, a new bond device can be created
via bonding_masters before unregister_pernet_subsys which would
lead to multiple problems (e.g. NULL pointer dereference, wrong RIP,
list corruption).
This patch fixes the issue by removing any bond devices left in the
netns after bonding_masters is removed from sysfs.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces a new bug which causes access to freed memory.
In bond_uninit: list_del(&bond->bond_list);
bond_list is linked in bond_net's dev_list which is freed by
unregister_pernet_subsys.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 8 Apr 2013 20:39:55 +0000 (16:39 -0400)]
Merge branch 'wireless'
John W. Linville says:
====================
For the cfg80211 fix, Johannes says:
"I have another straggler for 3.9, adding locking forgotten in a previous
fix."
On top of that:
Bing Zhao provides an mwifiex fix to properly order a scan completion.
Franky Lin gives us a brcmfmac fix to fail at the firmware loading
stage if the nvram cannot be downloaded.
Gabor Juhos brings what at first looks like a rather big rt2x00 patch.
I think it is OK because it is really just reorganizing some code
within the rt2x00 driver in order to fix a build failure.
Hante Meuleman offers a trio of brcmfmac fixes related to running in
AP mode.
Robert Shade sends an ath9k fix to reenable interrupts even if a
channel change fails.
Tim Gardner gives us an rt2x00 fix to cut-down on some log SPAM.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()
As commit 40dc166c (PM / Core: Introduce struct syscore_ops for core
subsystems PM) say, syscore_ops operations should be carried with one
CPU on-line and interrupts disabled. However, after commit f96972f2d
(kernel/sys.c: call disable_nonboot_cpus() in kernel_restart()),
syscore_shutdown() is called before disable_nonboot_cpus(), so break
the rules. We have a MIPS machine with a 8259A PIC, and there is an
external timer (HPET) linked at 8259A. Since 8259A has been shutdown
too early (by syscore_shutdown()), disable_nonboot_cpus() runs without
timer interrupt, so it hangs and reboot fails. This patch call
syscore_shutdown() a little later (after disable_nonboot_cpus()) to
avoid reboot failure, this is the same way as poweroff does.
For consistency, add disable_nonboot_cpus() to kernel_halt().
Signed-off-by: Huacai Chen <chenhc@lemote.com> Cc: <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpufreq / intel_pstate: Set timer timeout correctly
The current calculation of the delay time is wrong and a cut and
paste error from a previous experimental driver. This can result in
the timeout being set to jiffies + 1 which setup the driver to race
with itself if the APIC timer interrupt happens at just the right
time.
References: https://bugzilla.redhat.com/show_bug.cgi?id=920289 Reported-by: Adam Williamson <awilliam@redhat.com> Reported-and-tested-by: Parag Warudkar <parag.lkml@gmail.com> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
tty: mxser: fix cycle termination condition in mxser_probe() and mxser_module_init()
There is a bug in resources deallocation code in mxser_probe() and
mxser_module_init(). As soon as variable 'i' is unsigned int, cycle
termination condition i >= 0 is always true. The patch fixes the issue.
Checking for disabled resources board breaks detection pnp on another
board "AMI UEFI implementation (Version: 0406 Release Date: 06/06/2012)".
I'm working with the reporter of the original bug to write and test
a better fix.
ARM: dts: Add cpufreq controller node for Exynos5440 SoC
Add cpufreq controller device node for Exynos5440 SoC for passing
parameters like controller base address, interrupt and cpufreq
table. This node is added outside cpu0 as this driver is now a platform
driver and a new device structure is needed.
Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Amit Daniel Kachhap <amit.daniel@samsung.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
ARM: dts: Add SYSREG block node for S5P/Exynos4 SoC series
This patch adds device tree node for the SYSREG registers block
found in Samsung S5P/Exynos SoC series. The SYSREG module
generates control signals for the ARM CPU and various IP blocks
and buses. SYSREG block registers are exposed through APB bus
interface. A sysreg device tree node is to be associated with
mfd syscon driver and all SYSREG clients should use regmap
interface it provides. It allows to eliminate any possible races
and conflicts should different drivers attempt to concurrently
access same register.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Doug Anderson [Thu, 4 Apr 2013 06:04:58 +0000 (15:04 +0900)]
ARM: dts: add usb 2.0 clock references to exynos5250 device tree
This is a fixup to two device tree nodes that have already landed but
without clock nodes since the transition to common clock happened at
the same time.
Signed-off-by: Doug Anderson <dianders@chromium.org> Reviewed-by: Jingoo Han <jg1.han@samsung.com>
[gautam.vivek@samsung.com: tested on smdk5250] Tested-by: Vivek Gautam <gautam.vivek@samsung.com> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
ftrace: Do not call stub functions in control loop
The function tracing control loop used by perf spits out a warning
if the called function is not a control function. This is because
the control function references a per cpu allocated data structure
on struct ftrace_ops that is not allocated for other types of
functions.
commit 0a016409e42 "ftrace: Optimize the function tracer list loop"
Had an optimization done to all function tracing loops to optimize
for a single registered ops. Unfortunately, this allows for a slight
race when tracing starts or ends, where the stub function might be
called after the current registered ops is removed. In this case we
get the following dump:
To fix this a new ftrace_ops flag is added that denotes the ftrace_list_end
ftrace_ops stub as just that, a stub. This flag is now checked in the
control loop and the function is not called if the flag is set.
Thanks to Jovi for not just reporting the bug, but also pointing out
where the bug was in the code.
Jan Kiszka [Tue, 26 Mar 2013 16:53:03 +0000 (17:53 +0100)]
ftrace: Consistently restore trace function on sysctl enabling
If we reenable ftrace via syctl, we currently set ftrace_trace_function
based on the previous simplistic algorithm. This is inconsistent with
what update_ftrace_function does. So better call that helper instead.
tracing: Fix race with update_max_tr_single and changing tracers
The commit 34600f0e9 "tracing: Fix race with max_tr and changing tracers"
fixed the updating of the main buffers with the race of changing
tracers, but left out the fix to the updating of just a per cpu buffer.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Added S5M8767 PMIC DT nodes for Arndale board. Only the used
LDO's/BUCK are defined here. Also the nodes describe the default/reset
state LDO's and no power mangement tuning is implemented. The usage
desription can be found in s5m8767 device tree binding documentation.
Signed-off-by: Amit Daniel Kachhap <amit.daniel@samsung.com> Signed-off-by: Tushar Behera <tushar.behera@linaro.org> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
net: mvneta: enable features before registering the driver
It seems that the reason why the dev features were ignored was because
they were enabled after registeration.
Signed-off-by: Willy Tarreau <w@1wt.eu> Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Abraham [Thu, 4 Apr 2013 05:16:11 +0000 (14:16 +0900)]
ARM: dts: add pin state information in client nodes for Exynos5 platforms
Add default pin state information for all client nodes that require
pin configuration support using pinctrl interface.
Signed-off-by: Thomas Abraham <thomas.abraham@linaro.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Doug Anderson <dianders@chromium.org> Reviewed-by: Doug Anderson <dianders@chromium.org> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
In some cases, the VM_PKT_COMP message can arrive later than RNDIS completion
message, which will free the packet memory. This may cause panic due to access
to freed memory in netvsc_send_completion().
This patch fixes this problem by removing rndis_filter_send_request_completion()
from the code path. The function was a no-op.
Reported-by: Long Li <longli@microsoft.com> Tested-by: Long Li <longli@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
hyperv: Fix a kernel warning from netvsc_linkstatus_callback()
The warning about local_bh_enable inside IRQ happens when disconnecting a
virtual NIC.
The reason for the warning is -- netif_tx_disable() is called when the NIC
is disconnected. And it's called within irq context. netif_tx_disable() calls
local_bh_enable() which displays warning if in irq.
The fix is to remove the unnecessary netif_tx_disable & wake_queue() in the
netvsc_linkstatus_callback().
Reported-by: Richard Genoud <richard.genoud@gmail.com> Tested-by: Long Li <longli@microsoft.com> Tested-by: Richard Genoud <richard.genoud@gmail.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Abraham [Thu, 4 Apr 2013 05:07:04 +0000 (14:07 +0900)]
ARM: dts: add pin state information in client nodes for Exynos4 platforms
Add default pin state information for all client nodes that require
pin configuration support using pinctrl interface.
Signed-off-by: Thomas Abraham <thomas.abraham@linaro.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
net: ipv4: reset check_lifetime_work after changing lifetime
This will result in calling check_lifetime in nearest opportunity and
that function will adjust next time to call check_lifetime correctly.
Without this, check_lifetime is called in time computed by previous run,
not affecting modified lifetime.
Merge tag 'mxs-fixes-3.9-4' of git://git.linaro.org/people/shawnguo/linux-2.6 into fixes
From Shawn Guo <shawn.guo@linaro.org>:
The mxs fixes for 3.9, take 4:
- A couple mxs boards that run I2C at 400 kHz experience some unstable
issue occasionally. Slow down the clock speed to have I2C work
reliably.
* tag 'mxs-fixes-3.9-4' of git://git.linaro.org/people/shawnguo/linux-2.6:
ARM: mxs: Slow down the I2C clock speed
sched/cputime: Fix accounting on multi-threaded processes
Recent commit 6fac4829 ("cputime: Use accessors to read task
cputime stats") introduced a bug, where we account many times
the cputime of the first thread, instead of cputimes of all
the different threads.
Thomas Abraham [Fri, 5 Apr 2013 06:17:47 +0000 (15:17 +0900)]
ARM: EXYNOS: fix compilation error introduced due to common clock migration
The functions exynos4_clk_init and exynos4_clk_register_fixed_ext
are applicable only on Exynos4 non-dt platforms. But when building
Exynos5 platforms without including Exynos4 platforms, the following
errors show up.
arch/arm/mach-exynos/built-in.o: In function `exynos_init_time':
arch/arm/mach-exynos/common.c:446: undefined reference to `exynos4_clk_init'
arch/arm/mach-exynos/common.c:447: undefined reference to `exynos4_clk_register_fixed_ext'
Fix this compilation errors by marking these calls as Exynos4 specific.
Signed-off-by: Thomas Abraham <thomas.ab@samsung.com> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Tushar Behera [Mon, 8 Apr 2013 06:28:12 +0000 (15:28 +0900)]
clk: exynos5250: Fix divider values for sclk_mmc{0,1,2,3}
In legacy setup, sclk_mmc{0,1,2,3} used PRE_RATIO bit-field (8-bit wide)
instead of RATIO bit-field (4-bit wide) for dividing clock rate.
With current common clock setup, we are using RATIO bit-field which
is creating FIFO read errors while accessing eMMC. Changing over to
use PRE_RATIO bit-field fixes this issue.
dwmmc_exynos 12200000.dwmmc0: data FIFO error (status=00008020)
mmcblk0: error -5 transferring data, sector 1, nr 7, cmd response 0x900, card status 0x0
end_request: I/O error, dev mmcblk0, sector 1
Signed-off-by: Tushar Behera <tushar.behera@linaro.org> CC: Thomas Abraham <thomas.abraham@linaro.org> Acked-by: Mike Turquette <mturquette@linaro.org> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch adds clock indexes for ACLK_DIV0, ACLK_DIV1,
ACLK_400_MCUISP, ACLK_MCUISP_DIV0, ACLK_MCUISP_DIV1,
DIVACLK_400_MCUISP and DIVACLK_200 so these clocks are
available to the consumers (Exynos4x12 FIMC-IS subsystem).
While at it, indentation of the mux clocks table is
corrected.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Mike Turquette <mturquette@linaro.org> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Fixes the below compilation error during non-dt build.
drivers/clk/samsung/clk.c: In function 'samsung_clk_of_register_fixed_ext':
drivers/clk/samsung/clk.c:252:2: error: implicit declaration of function 'for_each_matching_node_and_match' [-Werror=implicit-function-declaration]
drivers/clk/samsung/clk.c:252:60: error: expected ';' before '{' token
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Thomas Gleixner [Sat, 6 Apr 2013 08:10:27 +0000 (10:10 +0200)]
sched_clock: Prevent 64bit inatomicity on 32bit systems
The sched_clock_remote() implementation has the following inatomicity
problem on 32bit systems when accessing the remote scd->clock, which
is a 64bit value.
While the update of scd->clock is using an atomic64 mechanism, the
readout on the remote cpu is not, which can cause completely bogus
readouts.
It is a quite rare problem, because it requires the update to hit the
narrow race window between the low/high readout and the update must go
across the 32bit boundary.
The resulting misbehaviour is, that CPU1 will see the sched_clock on
CPU1 ~4 seconds ahead of it's own and update CPU1s sched_clock value
to this bogus timestamp. This stays that way due to the clamping
implementation for about 4 seconds until the synchronization with
CLOCK_MONOTONIC undoes the problem.
The issue is hard to observe, because it might only result in a less
accurate SCHED_OTHER timeslicing behaviour. To create observable
damage on realtime scheduling classes, it is necessary that the bogus
update of CPU1 sched_clock happens in the context of an realtime
thread, which then gets charged 4 seconds of RT runtime, which results
in the RT throttler mechanism to trigger and prevent scheduling of RT
tasks for a little less than 4 seconds. So this is quite unlikely as
well.
The issue was quite hard to decode as the reproduction time is between
2 days and 3 weeks and intrusive tracing makes it less likely, but the
following trace recorded with trace_clock=global, which uses
sched_clock_local(), gave the final hint:
What happens is that CPU0 goes idle and invokes
sched_clock_idle_sleep_event() which invokes sched_clock_local() and
CPU1 runs a remote wakeup for CPU0 at the same time, which invokes
sched_remote_clock(). The time jump gets propagated to CPU0 via
sched_remote_clock() and stays stale on both cores for ~4 seconds.
There are only two other possibilities, which could cause a stale
sched clock:
1) ktime_get() which reads out CLOCK_MONOTONIC returns a sporadic
wrong value.
2) sched_clock() which reads the TSC returns a sporadic wrong value.
#1 can be excluded because sched_clock would continue to increase for
one jiffy and then go stale.
#2 can be excluded because it would not make the clock jump
forward. It would just result in a stale sched_clock for one jiffy.
After quite some brain twisting and finding the same pattern on other
traces, sched_clock_remote() remained the only place which could cause
such a problem and as explained above it's indeed racy on 32bit
systems.
So while on 64bit systems the readout is atomic, we need to verify the
remote readout on 32bit machines. We need to protect the local->clock
readout in sched_clock_remote() on 32bit as well because an NMI could
hit between the low and the high readout, call sched_clock_local() and
modify local->clock.
Thanks to Siegfried Wulsch for bearing with my debug requests and
going through the tedious tasks of running a bunch of reproducer
systems to generate the debug information which let me decode the
issue.
Reported-by: Siegfried Wulsch <Siegfried.Wulsch@rovema.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304051544160.21884@ionos Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
There are situations where the destruction path is called
with the bdev->bd_mutex already held, which then deadlocks in
loop_clr_fd(). The normal partition cleanup does a trylock()
on the mutex, but it'd be nice to have a more bullet proof
method in loop. So punt this more involved fix to the next
merge window, and just back out this buggy fix for now.
Michael Wolf [Fri, 5 Apr 2013 10:41:40 +0000 (10:41 +0000)]
powerpc: pSeries_lpar_hpte_remove fails from Adjunct partition being performed before the ANDCOND test
Some versions of pHyp will perform the adjunct partition test before the
ANDCOND test. The result of this is that H_RESOURCE can be returned and
cause the BUG_ON condition to occur. The HPTE is not removed. So add a
check for H_RESOURCE, it is ok if this HPTE is not removed as
pSeries_lpar_hpte_remove is looking for an HPTE to remove and not a
specific HPTE to remove. So it is ok to just move on to the next slot
and try again.
Cc: stable@vger.kernel.org Signed-off-by: Michael Wolf <mjw@linux.vnet.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Check KR2 recovery time at the beginning of the work-around function.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 7 Apr 2013 20:42:40 +0000 (16:42 -0400)]
Merge branch 'infoleaks'
Mathias Krause says:
====================
a few more info leak fixes in the recvmsg path. The error pattern here
is the protocol specific recvmsg function is missing the msg_namelen
assignment -- either completely or in early exit paths that do not
result in errors in __sys_recvmsg()/sys_recvfrom() and, in turn, make
them call move_addr_to_user(), leaking the then still uninitialized
sockaddr_storage stack variable to userland.
My audit was initiated by a rather coarse fix of the leak that can be
found in the grsecurity patch, putting a penalty on protocols complying
to the rules of recvmsg. So credits for finding the leak in the recvmsg
path in __sys_recvmsg() should go to Brad!
The buggy protocols/subsystems are rather obscure anyway. As a missing
assignment of msg_namelen coupled with a missing filling of msg_name
would only result in garbage -- the leak -- in case userland would care
about that information, i.e. would provide a msg_name pointer. But
obviously current userland does not.
While auditing the code for the above pattern I found a few more
'uninitialized members' kind of leaks related to the msg_name filling.
Those are fixed in this series, too.
I have to admit, I failed to test all of the patches due to missing
hardware, e.g. iucv depends on S390 -- hardware I've no access to :/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
VSOCK: Fix missing msg_namelen update in vsock_stream_recvmsg()
The code misses to update the msg_namelen member to 0 and therefore
makes net/socket.c leak the local, uninitialized sockaddr_storage
variable to userland -- 128 bytes of kernel stack memory.
Cc: Andy King <acking@vmware.com> Cc: Dmitry Torokhov <dtor@vmware.com> Cc: George Zhang <georgezhang@vmware.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
VSOCK: vmci - fix possible info leak in vmci_transport_dgram_dequeue()
In case we received no data on the call to skb_recv_datagram(), i.e.
skb->data is NULL, vmci_transport_dgram_dequeue() will return with 0
without updating msg_namelen leading to net/socket.c leaking the local,
uninitialized sockaddr_storage variable to userland -- 128 bytes of
kernel stack memory.
Fix this by moving the already existing msg_namelen assignment a few
lines above.
Cc: Andy King <acking@vmware.com> Cc: Dmitry Torokhov <dtor@vmware.com> Cc: George Zhang <georgezhang@vmware.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tipc: fix info leaks via msg_name in recv_msg/recv_stream
The code in set_orig_addr() does not initialize all of the members of
struct sockaddr_tipc when filling the sockaddr info -- namely the union
is only partly filled. This will make recv_msg() and recv_stream() --
the only users of this function -- leak kernel stack memory as the
msg_name member is a local variable in net/socket.c.
Additionally to that both recv_msg() and recv_stream() fail to update
the msg_namelen member to 0 while otherwise returning with 0, i.e.
"success". This is the case for, e.g., non-blocking sockets. This will
lead to a 128 byte kernel stack leak in net/socket.c.
Fix the first issue by initializing the memory of the union with
memset(0). Fix the second one by setting msg_namelen to 0 early as it
will be updated later if we're going to fill the msg_name member.
Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
rose: fix info leak via msg_name in rose_recvmsg()
The code in rose_recvmsg() does not initialize all of the members of
struct sockaddr_rose/full_sockaddr_rose when filling the sockaddr info.
Nor does it initialize the padding bytes of the structure inserted by
the compiler for alignment. This will lead to leaking uninitialized
kernel stack bytes in net/socket.c.
Fix the issue by initializing the memory used for sockaddr info with
memset(0).
Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
NFC: llcp: fix info leaks via msg_name in llcp_sock_recvmsg()
The code in llcp_sock_recvmsg() does not initialize all the members of
struct sockaddr_nfc_llcp when filling the sockaddr info. Nor does it
initialize the padding bytes of the structure inserted by the compiler
for alignment.
Also, if the socket is in state LLCP_CLOSED or is shutting down during
receive the msg_namelen member is not updated to 0 while otherwise
returning with 0, i.e. "success". The msg_namelen update is also
missing for stream and seqpacket sockets which don't fill the sockaddr
info.
Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.
Fix the first issue by initializing the memory used for sockaddr info
with memset(0). Fix the second one by setting msg_namelen to 0 early.
It will be updated later if we're going to fill the msg_name member.
Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org> Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org> Cc: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
netrom: fix info leak via msg_name in nr_recvmsg()
In case msg_name is set the sockaddr info gets filled out, as
requested, but the code fails to initialize the padding bytes of
struct sockaddr_ax25 inserted by the compiler for alignment. Also
the sax25_ndigis member does not get assigned, leaking four more
bytes.
Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.
Fix both issues by initializing the memory with memset(0).
Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
llc: Fix missing msg_namelen update in llc_ui_recvmsg()
For stream sockets the code misses to update the msg_namelen member
to 0 and therefore makes net/socket.c leak the local, uninitialized
sockaddr_storage variable to userland -- 128 bytes of kernel stack
memory. The msg_namelen update is also missing for datagram sockets
in case the socket is shutting down during receive.
Fix both issues by setting msg_namelen to 0 early. It will be
updated later if we're going to fill the msg_name member.
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The L2TP code for IPv6 fails to initialize the l2tp_conn_id member of
struct sockaddr_l2tpip6 and therefore leaks four bytes kernel stack
in l2tp_ip6_recvmsg() in case msg_name is set.
Initialize l2tp_conn_id with 0 to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
iucv: Fix missing msg_namelen update in iucv_sock_recvmsg()
The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.
Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about iucv_sock_recvmsg() not filling the msg_name in case it was set.
Cc: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
irda: Fix missing msg_namelen update in irda_recvmsg_dgram()
The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.
Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about irda_recvmsg_dgram() not filling the msg_name in case it was
set.
Cc: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
caif: Fix missing msg_namelen update in caif_seqpkt_recvmsg()
The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.
Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about caif_seqpkt_recvmsg() not filling the msg_name in case it was
set.
Cc: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bluetooth: SCO - Fix missing msg_namelen update in sco_sock_recvmsg()
If the socket is in state BT_CONNECT2 and BT_SK_DEFER_SETUP is set in
the flags, sco_sock_recvmsg() returns early with 0 without updating the
possibly set msg_namelen member. This, in turn, leads to a 128 byte
kernel stack leak in net/socket.c.
Fix this by updating msg_namelen in this case. For all other cases it
will be handled in bt_sock_recvmsg().
Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bluetooth: RFCOMM - Fix missing msg_namelen update in rfcomm_sock_recvmsg()
If RFCOMM_DEFER_SETUP is set in the flags, rfcomm_sock_recvmsg() returns
early with 0 without updating the possibly set msg_namelen member. This,
in turn, leads to a 128 byte kernel stack leak in net/socket.c.
Fix this by updating msg_namelen in this case. For all other cases it
will be handled in bt_sock_stream_recvmsg().
Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>