Saurav Girepunje [Thu, 24 Oct 2019 03:09:01 +0000 (08:39 +0530)]
scsi: lpfc: lpfc_nvmet: Fix Use plain integer as NULL pointer
Replace assignment of 0 to pointer with NULL assignment.
Link: https://lore.kernel.org/r/20191024030857.GA12097@saurav Signed-off-by: Saurav Girepunje <saurav.girepunje@gmail.com> Acked-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Saurav Girepunje [Thu, 24 Oct 2019 02:57:29 +0000 (08:27 +0530)]
scsi: lpfc: lpfc_attr: Fix Use plain integer as NULL pointer
Replace assignment of 0 to pointer with NULL assignment.
Link: https://lore.kernel.org/r/20191024025726.GA31421@saurav Signed-off-by: Saurav Girepunje <saurav.girepunje@gmail.com> Acked-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Fri, 18 Oct 2019 21:18:31 +0000 (14:18 -0700)]
scsi: lpfc: Add additional discovery log messages
When debugging a recent discovery customer problem it was very hard to tell
what was happening with the existing discovery log messages. To fully debug
the issue additional log messages were necessary.
Add or extend log messages so that sufficient information is present for
debugging.
James Smart [Fri, 18 Oct 2019 21:18:30 +0000 (14:18 -0700)]
scsi: lpfc: Add FC-AL support to lpe32000 models
In the past, the lpe32000 models, based their main support being for 32G,
and as FC-AL is not supported in the FC standards past 8G, did not support
FC-AL operation.
This patch adds private-loop FC-AL support for the LPE32000 adapters
when a link is 8G or below. To avoid conditions where link rate may
change, which would cause non-connectivity to the AL device, FC-AL
mode must become a persistent setting and the link kept at a speed
supporting FC-AL.
The patch:
- Adds a pls attribute indicating whether the adapter properly supports
FC-AL.
- Adds support for the adapter to indicate that topology should be fixed
and the topology types to be configured.
- Adds a pt attribute to report the persistent topology if present.
James Smart [Fri, 18 Oct 2019 21:18:27 +0000 (14:18 -0700)]
scsi: lpfc: Make FW logging dynamically configurable
Currently, the FW logging facility is a load/boot time parameter which
requires the driver to be unloaded/reloaded or the system rebooted in order
to change its configuration.
Convert the logging facility to allow dynamic enablement and configuration.
Specifically:
- Convert the feature so that it can be enabled dynamically via an
attribute. Additionally, the size of the buffer can be configured
dynamically.
- Add locks around states that now may be changing.
- Tie the feature into debugfs so that the logs can be read at any time.
James Smart [Fri, 18 Oct 2019 21:18:26 +0000 (14:18 -0700)]
scsi: lpfc: Revise interrupt coalescing for missing scenarios
The existing "auto eq delay" mechanism was sometimes skipping over an EQ,
not ramping the coalescing down under light load fast enough, and in other
cases never kicked in as cpu sharing by multiple vectors didn't quite add
up right.
Tweak the interrupt mechanism such that:
- Add a flag to the EQ to force checking for colaescing values when being
serviced in the interrupt handler. The flag will be set by any CQ bound
to the EQ whenever the number of CQ elements process in a single scan
meets or exceeds the hardware queue notify level. E.g. there's a
significant number of completions happening.
- In the heartbeat work item that checks coalescing:
- Replace the structure that was counting the number of EQs that
interrupted on a single cpu with a new structure that looks at the EQ
to see whether EQ currently has a coalescing value (thus it should be
re-evaluate) or was marked by the new flag indicating heavy
completions.
- When a cpu, which may be servicing multiple vectors, had at least 1 EQ
that should be checked, a new coalescing delay is calculated based on
the number of interrupts that occurred on the cpu.
- The new coalescing value is then applied to the EQs that had
interrupted on the cpu.
Lower IOps performance with write operations. Perf tool shows lock
contention in dma_pool_alloc and dma_pool_free related to the
txrdy_payload_pool.
The allocations are for dma buffers for XFER_RDY's, which actually are not
needed for the FCP_TRECEIVE command as the command contents are used by the
adapter to generate the IU.
Remove the allocations and the associated buffer pool. Rather than leaving
NULLs in buffer pointer locations, set command and sgl to indicate skipped
SGLE indexes.
James Smart [Fri, 18 Oct 2019 21:18:22 +0000 (14:18 -0700)]
scsi: lpfc: Fix hardlockup in lpfc_abort_handler
In lpfc_abort_handler, the lock acquire order is hbalock (irqsave),
buf_lock (irq) and ring_lock (irq). The issue is that in two places the
locks are released out of order - the buf_lock and the hbalock - resulting
in the cpu preemption/lock flags getting restored out of order and
deadlocking the cpu.
Fix the unlock order by fully releasing the hbalocks as well.
CC: Zhangguanghui <zhang.guanghui@h3c.com> Link: https://lore.kernel.org/r/20191018211832.7917-7-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Crash was caused by a bad ndlp address passed to I/O indicated by the XRI
aborted CQE. The address was not NULL so the routine deferenced the ndlp
ptr. The bad ndlp also caused the lpfc_sli4_io_xri_aborted to call an
erroneous io handler. Root cause for the bad ndlp was an lpfc_ncmd that
was aborted, put on the abort_io list, completed, taken off the abort_io
list, sent to lpfc_release_nvme_buf where it was put back on the abort_io
list because the lpfc_ncmd->flags setting LPFC_SBUF_XBUSY was not cleared
on the final completion.
Rework the exchange busy handling to ensure the flags are properly set for
both scsi and nvme.
Fixes: c490850a0947 ("scsi: lpfc: Adapt partitioned XRI lists to efficient sharing") Cc: <stable@vger.kernel.org> # v5.1+ Link: https://lore.kernel.org/r/20191018211832.7917-6-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Fri, 18 Oct 2019 21:18:20 +0000 (14:18 -0700)]
scsi: lpfc: Fix SLI3 hba in loop mode not discovering devices
When operating in private loop mode, PLOGI exchanges are racing and the
driver tries to abort it's PLOGI. But the PLOGI abort ends up terminating
the login with the other end causing the other end to abort its PLOGI as
well. Discovery never fully completes.
Fix by disabling the PLOGI abort when private loop and letting the state
machine play out.
James Smart [Fri, 18 Oct 2019 21:18:18 +0000 (14:18 -0700)]
scsi: lpfc: Fix reporting of read-only fw error errors
When the adapter FW is administratively set to RO mode, a FW update
triggered by the driver's sysfs attribute will fail. Currently, the
driver's logging mechanism does not properly parse the adapter return codes
and print a meaningful message. This oversight prevents quick diagnosis in
the field.
Parse the adapter return codes for Write_Object and write an appropriate
message to the system console.
James Smart [Fri, 18 Oct 2019 21:18:17 +0000 (14:18 -0700)]
scsi: lpfc: fix lpfc_nvmet_mrq to be bound by hdw queue count
Currently, lpfc_nvmet_mrq is always scaled back to the min(lpfc_nvmet_mrq,
lpfc_irq_chann). There's no reason to reduce it to the number of interrupt
vectors. Rather, it should be scaled down based on the number of hardware
queues for the system (if lower than max of 16).
Change scaling to use hardware queue count rather than interrupt vector
count.
David Disseldorp [Thu, 12 Sep 2019 09:55:46 +0000 (11:55 +0200)]
scsi: target: fix SendTargets=All string compares
strncmp is currently used for "SendTargets" key and "All" value matching
without checking for trailing garbage. This means that Text request PDUs
with garbage such as "SendTargetsPlease=All" and "SendTargets=Alle" are
processed successfully as if they were "SendTargets=All" requests.
David Disseldorp [Thu, 12 Sep 2019 09:55:45 +0000 (11:55 +0200)]
scsi: target: compare full CHAP_A Algorithm strings
RFC 2307 states:
For CHAP [RFC1994], in the first step, the initiator MUST send:
CHAP_A=<A1,A2...>
Where A1,A2... are proposed algorithms, in order of preference.
...
For the Algorithm, as stated in [RFC1994], one value is required to
be implemented:
5 (CHAP with MD5)
LIO currently checks for this value by only comparing a single byte in
the tokenized Algorithm string, which means that any value starting with
a '5' (e.g. "55") is interpreted as "CHAP with MD5". Fix this by
comparing the entire tokenized string.
Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: David Disseldorp <ddiss@suse.de> Link: https://lore.kernel.org/r/20190912095547.22427-2-ddiss@suse.de Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Balsundar P [Tue, 15 Oct 2019 06:22:03 +0000 (11:52 +0530)]
scsi: aacraid: send AIF request post IOP RESET
After IOP reset completion, AIF request command is not issued to the
controller. Driver schedules a worker thread to issue a AIF request command
after IOP reset completion.
Balsundar P [Tue, 15 Oct 2019 06:22:02 +0000 (11:52 +0530)]
scsi: aacraid: check adapter health
Currently driver waits for the command IOCTL from the firmware and if the
firmware enters nonresponsive state, the driver doesn't respond till the
firmware is responsive again.
Check that firmware is alive, otherwise return -EBUSY.
Balsundar P [Tue, 15 Oct 2019 06:22:00 +0000 (11:52 +0530)]
scsi: aacraid: fixed firmware assert issue
Before issuing IOP reset, INTX mode is selected. This is triggering MSGU
lockup and ended in basecode assert. Use DROP_IO command when IOP reset is
sent in preparation for interrupt mode switch.
Balsundar P [Tue, 15 Oct 2019 06:21:59 +0000 (11:51 +0530)]
scsi: aacraid: fixed IO reporting error
The problem is the driver detects FastResponse bit set and saves it to
Fib's flags to not check IO response status, but it never clears it for
next IO. Hence the next IO will pick up FastResponse bit to not check
the IO response status and fail to report any type IO error to kernel
drivers/scsi/megaraid/megaraid_sas_fp.c: In function MR_GetSpanBlock:
drivers/scsi/megaraid/megaraid_sas_fp.c:400:16: warning: variable debugBlk set but not used [-Wunused-but-set-variable]
drivers/scsi/megaraid/megaraid_sas_fp.c: In function mr_spanset_get_phy_params:
drivers/scsi/megaraid/megaraid_sas_fp.c:713:25: warning: variable fusion set but not used [-Wunused-but-set-variable]
drivers/scsi/megaraid/megaraid_sas_fp.c: In function MR_GetPhyParams:
drivers/scsi/megaraid/megaraid_sas_fp.c:815:25: warning: variable fusion set but not used [-Wunused-but-set-variable]
'debugBlk' is introduced by commit 9c915a8c99bc ("[SCSI] megaraid_sas:
Add 9565/9285 specific code"), but never used, so remove it
'fusion' is not used since commit c365178f3147 ("scsi: megaraid_sas:
use adapter_type for all gen controllers")
scsi: megaraid_sas: Unique names for MSI-X vectors
Currently, MSI-X vectors name appears in /proc/interrupts is "megasas"
which is same for all the vectors. This patch provides a unique name for
all megaraid_sas controllers and their associated MSI-X interrupts.
Link: https://lore.kernel.org/r/20191007051828.12294-1-chandrakanth.patil@broadcom.com Suggested-by: Konstantin Shalygin <k0ste@k0ste.ru> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kevin Barnett [Mon, 7 Oct 2019 22:32:10 +0000 (17:32 -0500)]
scsi: smartpqi: Align driver syntax with oob
Formatting changes, no functional changes.
Link: https://lore.kernel.org/r/157048753005.11757.2228541207280057256.stgit@brunhilda Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kevin Barnett [Mon, 7 Oct 2019 22:32:04 +0000 (17:32 -0500)]
scsi: smartpqi: remove unused manifest constants
Removed some unused manifest constants.
Link: https://lore.kernel.org/r/157048752420.11757.3464951542864727227.stgit@brunhilda Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kevin Barnett [Mon, 7 Oct 2019 22:31:58 +0000 (17:31 -0500)]
scsi: smartpqi: fix problem with unique ID for physical device
Obtain the unique IDs from the RLL and RPL instead of VPD page 83h.
Link: https://lore.kernel.org/r/157048751833.11757.11996314786914610803.stgit@brunhilda Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kevin Barnett [Mon, 7 Oct 2019 22:31:52 +0000 (17:31 -0500)]
scsi: smartpqi: correct syntax issue
Link: https://lore.kernel.org/r/157048751247.11757.1727592925624138646.stgit@brunhilda Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kevin Barnett [Mon, 7 Oct 2019 22:31:46 +0000 (17:31 -0500)]
scsi: smartpqi: change TMF timeout from 60 to 30 seconds
Link: https://lore.kernel.org/r/157048750649.11757.7811056360633694725.stgit@brunhilda Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Murthy Bhat [Mon, 7 Oct 2019 22:31:40 +0000 (17:31 -0500)]
scsi: smartpqi: fix LUN reset when fw bkgnd thread is hung
Add support for a timeout on LUN resets.
Link: https://lore.kernel.org/r/157048750055.11757.9689400788261610618.stgit@brunhilda Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Murthy Bhat <Murthy.Bhat@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
koshyaji [Mon, 7 Oct 2019 22:31:34 +0000 (17:31 -0500)]
scsi: smartpqi: add inquiry timeouts
Add timeout field in RAID IU.
Link: https://lore.kernel.org/r/157048749461.11757.10013040278241807855.stgit@brunhilda Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: koshyaji <ajish.koshy@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Murthy Bhat [Mon, 7 Oct 2019 22:31:28 +0000 (17:31 -0500)]
scsi: smartpqi: fix call trace in device discovery
Use sas_phy_delete rather than sas_phy_free which, according to
comments, should not be called for PHYs that have been set up
successfully.
Link: https://lore.kernel.org/r/157048748876.11757.17773443136670011786.stgit@brunhilda Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Murthy Bhat <Murthy.Bhat@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kevin Barnett [Mon, 7 Oct 2019 22:31:23 +0000 (17:31 -0500)]
scsi: smartpqi: fix controller lockup observed during force reboot
Link: https://lore.kernel.org/r/157048748297.11757.3872221216800537383.stgit@brunhilda Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
zhengbin [Fri, 4 Oct 2019 10:04:37 +0000 (18:04 +0800)]
scsi: lpfc: Make function lpfc_defer_pt2pt_acc static
Fix sparse warnings:
drivers/scsi/lpfc/lpfc_nportdisc.c:290:1: warning: symbol 'lpfc_defer_pt2pt_acc' was not declared. Should it be static?
Link: https://lore.kernel.org/r/1570183477-137273-1-git-send-email-zhengbin13@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Reviewed-by: Dick Kennedy <dick.kennedy@broadcom.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi: qla2xxx: Check for MB timeout while capturing ISP27/28xx FW dump
Add mailbox timeout checkout for ISP 27xx/28xx during FW dump procedure.
Without the timeout check, hardware lock can be held for long period. This
patch would shorten the dump procedure if a timeout condition is
encountered.
Link: https://lore.kernel.org/r/20190912180918.6436-12-hmadhani@marvell.com Reviewed-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During driver unload, the remove flag will be set for all
scsi_qla_host/NPIV. This allows each NPIV to see the flag instead of
reaching for base_vha to search for it.
scsi: qla2xxx: Add error handling for PLOGI ELS passthrough
Add error handling logic to ELS Passthrough relating to NVME devices.
Current code does not parse error code to take proper recovery action,
instead it re-logins with the same login parameters that encountered the
error. Ex: nport handle collision.
Some storage arrays advertise FCP LUNs and NVMe namespaces behind the same
WWN. The driver now offers a user option by way of NVRAM parameter to
allow users to choose, on a per port basis, the kind of FC-4 type they
would like to prioritize for login.
scsi: target: Remove tpg_list and se_portal_group.se_tpg_node
Maintaining tpg_list without ever iterating over it is not useful. Hence
remove tpg_list. This patch does not change the behavior of the SCSI target
code.
Cc: Mike Christie <mchristie@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Link: https://lore.kernel.org/r/20190930232224.58980-1-bvanassche@acm.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Steffen Maier [Tue, 1 Oct 2019 10:49:49 +0000 (12:49 +0200)]
scsi: zfcp: fix reaction on bit error threshold notification
On excessive bit errors for the FCP channel ingress fibre path, the channel
notifies us. Previously, we only emitted a kernel message and a trace
record. Since performance can become suboptimal with I/O timeouts due to
bit errors, we now stop using an FCP device by default on channel
notification so multipath on top can timely failover to other paths. A new
module parameter zfcp.ber_stop can be used to get zfcp old behavior.
User explanation of new kernel message:
* Description:
* The FCP channel reported that its bit error threshold has been exceeded.
* These errors might result from a problem with the physical components
* of the local fibre link into the FCP channel.
* The problem might be damage or malfunction of the cable or
* cable connection between the FCP channel and
* the adjacent fabric switch port or the point-to-point peer.
* Find details about the errors in the HBA trace for the FCP device.
* The zfcp device driver closed down the FCP device
* to limit the performance impact from possible I/O command timeouts.
* User action:
* Check for problems on the local fibre link, ensure that fibre optics are
* clean and functional, and all cables are properly plugged.
* After the repair action, you can manually recover the FCP device by
* writing "0" into its "failed" sysfs attribute.
* If recovery through sysfs is not possible, set the CHPID of the device
* offline and back online on the service element.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: <stable@vger.kernel.org> #2.6.30+ Link: https://lore.kernel.org/r/20191001104949.42810-1-maier@linux.ibm.com Reviewed-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Tue, 1 Oct 2019 07:48:39 +0000 (16:48 +0900)]
scsi: core: save/restore command resid for error handling
When a non-passthrough command is terminated with CHECK CONDITION, request
sense is executed by hijacking the command descriptor. Since
scsi_eh_prep_cmnd() and scsi_eh_restore_cmnd() do not save/restore the
original command resid, the value returned on failure of the original
command is lost and replaced with the value set by the execution of the
request sense command. This value may in many instances be unaligned to the
device sector size, causing sd_done() to print a warning message about the
incorrect unaligned resid before the command is retried.
Fix this problem by saving the original command residual in struct
scsi_eh_save using scsi_eh_prep_cmnd() and restoring it in
scsi_eh_restore_cmnd(). In addition, to make sure that the request sense
command is executed with a correctly initialized command structure, also
reset the residual to 0 in scsi_eh_prep_cmnd() after saving the original
command value in struct scsi_eh_save.
Daniel Wagner [Fri, 27 Sep 2019 07:30:31 +0000 (09:30 +0200)]
scsi: qla2xxx: Remove WARN_ON_ONCE in qla2x00_status_cont_entry()
Commit 88263208dd23 ("scsi: qla2xxx: Complain if sp->done() is not called
from the completion path") introduced the WARN_ON_ONCE in
qla2x00_status_cont_entry(). The assumption was that there is only one
status continuations element. According to the firmware documentation it is
possible that multiple status continuations are emitted by the firmware.
Fixes: 88263208dd23 ("scsi: qla2xxx: Complain if sp->done() is not called from the completion path") Link: https://lore.kernel.org/r/20190927073031.62296-1-dwagner@suse.de Cc: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Oliver Neukum [Tue, 3 Sep 2019 10:18:39 +0000 (12:18 +0200)]
scsi: sd: Ignore a failure to sync cache due to lack of authorization
I've got a report about a UAS drive enclosure reporting back Sense: Logical
unit access not authorized if the drive it holds is password protected.
While the drive is obviously unusable in that state as a mass storage
device, it still exists as a sd device and when the system is asked to
perform a suspend of the drive, it will be sent a SYNCHRONIZE CACHE. If
that fails due to password protection, the error must be ignored.
Milan P. Gandhi [Thu, 26 Sep 2019 05:25:02 +0000 (10:55 +0530)]
scsi: core: Log SCSI command age with errors
Couple of users had requested to print the SCSI command age along with
command failure errors. This is a small change, but allows users to get
more important information about the command that was failed, it would help
the users in debugging the command failures:
Link: https://lore.kernel.org/r/20190926052501.GA8352@machine1 Signed-off-by: Milan P. Gandhi <mgandhi@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Daniel Wagner [Tue, 24 Sep 2019 07:29:06 +0000 (09:29 +0200)]
scsi: qedf: Add port_id getter
Add qedf_get_host_port_id() to the transport template.
The fc_transport_template initializes the port_id member to the default
value of -1. The new getter ensures that the sysfs entry shows the current
value and not the default one, e.g by using 'lsscsi -H -t'
Link: https://lore.kernel.org/r/20190924072906.23737-1-dwagner@suse.de Signed-off-by: Daniel Wagner <dwagner@suse.de> Acked-by: Saurav Kashyap <skashyap@marvell.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Stanley Chu [Mon, 16 Sep 2019 15:56:49 +0000 (23:56 +0800)]
scsi: core: allow auto suspend override by low-level driver
Rework from previous work by:
Sujit Reddy Thumma <sthumma@codeaurora.org>
Until now the scsi mid-layer forbids runtime suspend till userspace enables
it. This is mainly to quarantine some disks with broken runtime power
management or have high latencies executing suspend resume callbacks. If
the userspace doesn't enable the runtime suspend the underlying hardware
will be always on even when it is not doing any useful work and thus
wasting power.
Some low-level drivers for the controllers can efficiently use runtime
power management to reduce power consumption and improve battery life.
Allow runtime suspend parameters override within the LLD itself instead of
waiting for userspace to control the power management.
Colin Ian King [Thu, 5 Sep 2019 13:50:17 +0000 (14:50 +0100)]
scsi: mvsas: remove redundant assignment to variable rc
The variable rc is being initialized with a value that is never read and is
being re-assigned a little later on. The assignment is redundant and hence
can be removed.
Colin Ian King [Thu, 5 Sep 2019 13:42:29 +0000 (14:42 +0100)]
scsi: qla2xxx: remove redundant assignment to pointer host
The pointer host is being initialized with a value that is never read and
is being re-assigned a little later on. The assignment is redundant and
hence can be removed.
YueHaibing [Sat, 31 Aug 2019 13:03:48 +0000 (13:03 +0000)]
scsi: smartpqi: remove set but not used variable 'ctrl_info'
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/smartpqi/smartpqi_init.c: In function 'pqi_driver_version_show':
drivers/scsi/smartpqi/smartpqi_init.c:6164:24: warning:
variable 'ctrl_info' set but not used [-Wunused-but-set-variable]
commit 6d90615f1346 ("scsi: smartpqi: add sysfs entries") added it but
it was never used. Also remove variable 'shost'.
Load driver with module parameter "max_msix_vectors". Value provided in
module parameter is not used by mpt3sas driver. Driver loads with max
controller supported MSI-X value.
In _base_alloc_irq_vectors use reply_queue_count which is determined using
user provided msix value insted of ioc->msix_vector_count which tells max
supported msix value of the controller.
scsi: mpt3sas: Reject NVMe Encap cmnds to unsupported HBA
If any faulty application issues an NVMe Encapsulated commands to HBA which
doesn't support NVMe protocol then driver should return the command as
invalid with the following message.
"HBA doesn't support NVMe. Rejecting NVMe Encapsulated request."
Otherwise below page fault kernel panic will be observed while building the
PRPs as there is no PRP pools allocated for the HBA which doesn't support
NVMe drives.
scsi: mpt3sas: Use Component img header to get Package ver
The firmware image layout has been changed for Aero controllers. All
compatible HBAs have to get Firmware Package version from Component Image
Header layout.
The Signature field in FW header is set to 0xEB000042 for products
compatible with Component Image Header.
For compatible controllers, driver fetches firmware package version from
ApplicationSpecific field of Component Image Header.
scsi: mpt3sas: Add app owned flag support for diag buffer
Added a new status flag named MPT3_DIAG_BUFFER_IS_APP_OWNED and it will set
whenever application registers the diag buffer & it will be cleared when
application unregisters the buffer.
When this flag is enabled, and if application issues diag buffer register
command without releasing the buffer, then register command will be failed
with -EINVAL status by saying that this buffer is already registered by the
application.
When user issues a trace buffer register command through sysfs parameter,
and if trace buffer is in released stated but not yet unregistered by the
application which was owning it, then driver will unregister the buffer by
itself and freshly register the 1MB sized trace buffer with the HBA
firmware.
scsi: mpt3sas: Reuse diag buffer allocated at load time
The diag buffer which is allocated during driver load time or through sysfs
parameter is marked as driver allocated diag buffer.
MPT3_DIAG_BUFFER_IS_DRIVER_ALLOCATED bit will be set for this buffer.
This buffer won't be de-allocated even when application issues unregister
command, driver just clears the registered status bit. Same buffer will be
reused while re-registering the same diag buffer type by any application.
While re-registering the same diag buffer type application has to register
with the same size that the buffer was allocated during driver load
time. This buffer size can be read by the application by issuing diag
'query' command.
This always makes sure that the memory is available for applications for
collecting the firmware logs. Only thing is that this won't allow the
application to re-register the diag buffer with different size, but the
buffer size which is allocated during driver load time will be enough for
most of the cases for collecting the firmware logs.
scsi: mpt3sas: clear release bit when buffer reregistered
Clear MPT3_DIAG_BUFFER_IS_RELEASED bit once diag buffer is re-registered
after reading the buffer, else driver won't release the buffer and return
the 'diag release' command with -EINVAL status saying that buffer is
already released.
scsi: mpt3sas: Maintain owner of buffer through UniqueID
Application A has registered a diag buffer and looking for particular event
to happen to release & read the trace buffer. Meanwhile application B has
unregistered the diag buffer and now Application A can't get the required
diag buffer. So proper diag buffer ownership is missing.
Each application has to maintain its own Unique ID. Now driver has to save
the Application's UniqueID for each diag buffer type when diag buffer is
registered. And driver has to allow 'release', 'read' & 'unregister' diag
commands only if application's UniqueID matches with saved UniqueID for the
corresponding diag buffer type.
When diag buffer is registered by the driver, then the UniqueID saved by
the driver is "BRCM" (i.e. 0x4252434D) for SAS3 and above generations HBA
devices. For SAS2 HBAs, driver keeps the legacy UniqueID 0x07075900 for
maintaining compatibility with the legacy SAS2 application and this
improvement won't be applicable for SAS2 HBA devices.
Any application can own the buffer registered by the driver by sending
diag register request to driver with same buffer type and size
(Application can get the buffer size by sending 'query' command). Then
driver changes the ownership of the buffer by saving application's
UniqueID for that corresponding buffer type.
Also, application can re-register the diag buffer with same size without
un-registering it, but diag buffer should be released before re-registering
it. By allowing this, driver no need to deallocate and allocate a new
buffer for re-register command, same buffer can be re-used.
scsi: mpt3sas: Free diag buffer without any status check
Memory leak can happen when diag buffer is released but not unregistered
(where buffer is deallocated) by the user. During module unload time driver
is not deallocating the buffer if the buffer is in released state.
Deallocate the diag buffer during module unload time without any diag
buffer status checks.
scsi: mpt3sas: Fix clear pending bit in ioctl status
When user issues diag register command from application with required size,
and if driver unable to allocate the memory, then it will fail the register
command. While failing the register command, driver is not currently
clearing MPT3_CMD_PENDING bit in ctl_cmds.status variable which was set
before trying to allocate the memory. As this bit is set, subsequent
register command will be failed with BUSY status even when user wants to
register the trace buffer will less memory.
Clear MPT3_CMD_PENDING bit in ctl_cmds.status before returning the diag
register command with no memory status.
scsi: mpt3sas: Register trace buffer based on NVDATA settings
Currently if user wishes to enable the host trace buffer during driver load
time, then user has to load the driver with module parameter
'diag_buffer_enable' set to one.
Alternatively now the user can enable host trace buffer by enabling the
following fields in manufacturing page11 in NVDATA (nvdata xml is used
while building HBA firmware image):
* HostTraceBufferMaxSizeKB - Maximum trace buffer size in KB that host can
allocate,
* HostTraceBufferMinSizeKB - Minimum trace buffer size in KB atleast host
should allocate,
* HostTraceBufferDecrementSizeKB - size by which host can reduce from
buffer size and retry the buffer allocation
when buffer allocation failed with previous
calculated buffer size.
The driver will register the trace buffer automatically without any module
parameter during boot time when above fields are enabled in manufacturing
page11 in HBA firmware.
Driver follows the following algorithm for enabling the host trace buffer
during driver load time:
* If user has loaded the driver with module parameter 'diag_buffer_enable'
set to one, then driver allocates 2MB buffer and registers this buffer
with HBA firmware for capturing the firmware trace logs.
* Else driver reads manufacture page11 data and checks whether
HostTraceBufferMaxSizeKB filed is zero or not?
- If HostTraceBufferMaxSizeKB is non-zero then driver tries to allocate
HostTraceBufferMaxSizeKB size of memory. If the buffer allocation is
successful, then it will register this buffer with HBA firmware, else
in a loop the driver will try again by reducing the current buffer size
with HostTraceBufferDecrementSizeKB size until memory allocation is
successful or buffer size falls below HostTraceBufferMinSizeKB. If the
memory allocation is successful, then the buffer will be registered
with the firmware. Else, if the buffer size falls below the
HostTraceBufferMinSizeKB, then driver won't register trace buffer with
HBA firmware.
- If HostTraceBufferMaxSizeKB is zero, then driver won't register trace
buffer with HBA firmware.
James Smart [Sun, 22 Sep 2019 03:59:04 +0000 (20:59 -0700)]
scsi: lpfc: Complete removal of FCoE T10 PI support on SLI-4 adapters
T10 PI support on SLI-4-based FCoE adapters is not supported. A prior
commit in the 12.4.0.0 stream added device recognition that would prevent
T10 PI enablement. However, it didn't contain a complete device list. Thus
some SLI-4 FCoE adapters still had T10 PI enabled.
Fix by expanding the device list that identifies FCoE devices.
James Smart [Sun, 22 Sep 2019 03:59:02 +0000 (20:59 -0700)]
scsi: lpfc: Fix list corruption detected in lpfc_put_sgl_per_hdwq
In lpfc_release_io_buf, an lpfc_io_buf is returned to the 'available' pool
before any associated sgl or cmd and rsp buffers are returned via their
respective 'put' routines. If xri rebalancing occurs and an lpfc_io_buf
structure is reused quickly, there may be a race condition between release
of old and association of new resources.
Re-ordered lpfc_release_io_buf to release sgl and cmd/rsp
buffer lists before releasing the lpfc_io_buf structure for re-use.
Fixes: d79c9e9d4b3d ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.") Link: https://lore.kernel.org/r/20190922035906.10977-17-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Sun, 22 Sep 2019 03:59:01 +0000 (20:59 -0700)]
scsi: lpfc: Fix hdwq sgl locks and irq handling
Many of the sgl-per-hdwq paths are locking with spin_lock_irq() and
spin_unlock_irq() and may unwittingly raising irq when it shouldn't. Hard
deadlocks were seen around lpfc_scsi_prep_cmnd().
Fix by converting the locks to irqsave/irqrestore.
Fixes: d79c9e9d4b3d ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.") Link: https://lore.kernel.org/r/20190922035906.10977-16-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Sun, 22 Sep 2019 03:58:59 +0000 (20:58 -0700)]
scsi: lpfc: Fix list corruption in lpfc_sli_get_iocbq
After study, it was determined there was a double free of a CT iocb during
execution of lpfc_offline_prep and lpfc_offline. The prep routine issued
an abort for some CT iocbs, but the aborts did not complete fast enough for
a subsequent routine that waits for completion. Thus the driver proceeded
to lpfc_offline, which releases any pending iocbs. Unfortunately, the
completions for the aborts were then received which re-released the ct
iocbs.
Turns out the issue for why the aborts didn't complete fast enough was not
their time on the wire/in the adapter. It was the lpfc_work_done routine,
which requires the adapter state to be UP before it calls
lpfc_sli_handle_slow_ring_event() to process the completions. The issue is
the prep routine takes the link down as part of it's processing.
To fix, the following was performed:
- Prevent the offline routine from releasing iocbs that have had aborts
issued on them. Defer to the abort completions. Also means the driver
fully waits for the completions. Given this change, the recognition of
"driver-generated" status which then releases the iocb is no longer
valid. As such, the change made in the commit 296012285c90 is reverted.
As recognition of "driver-generated" status is no longer valid, this
patch reverts the changes made in
commit 296012285c90 ("scsi: lpfc: Fix leak of ELS completions on adapter reset")
- Modify lpfc_work_done to allow slow path completions so that the abort
completions aren't ignored.
- Updated the fdmi path to recognize a CT request that fails due to the
port being unusable. This stops FDMI retries. FDMI will be restarted on
next link up.