]> asedeno.scripts.mit.edu Git - linux.git/log
linux.git
5 years agoiw_cxgb4: Make function read_tcb() static
Wei Yongjun [Mon, 18 Feb 2019 07:04:32 +0000 (07:04 +0000)]
iw_cxgb4: Make function read_tcb() static

Fixes the following sparse warning:

drivers/infiniband/hw/cxgb4/cm.c:658:6: warning:
 symbol 'read_tcb' was not declared. Should it be static?

Fixes: 11a27e2121a5 ("iw_cxgb4: complete the cached SRQ buffers")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Bugfix for set hem of SCC
Yangyang Li [Sat, 16 Feb 2019 12:10:25 +0000 (20:10 +0800)]
RDMA/hns: Bugfix for set hem of SCC

The method of set hem for scc context is different from other contexts. It
should notify the hardware with the detailed idx in bt0 for scc, while for
other contexts, it only need to notify the bt step and the hardware will
calculate the idx.

Here fixes the following error when unloading the hip08 driver:

[  123.570768] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[  123.579023] {1}[Hardware Error]: event severity: recoverable
[  123.584670] {1}[Hardware Error]:  Error 0, type: recoverable
[  123.590317] {1}[Hardware Error]:   section_type: PCIe error
[  123.595877] {1}[Hardware Error]:   version: 4.0
[  123.600395] {1}[Hardware Error]:   command: 0x0006, status: 0x0010
[  123.606562] {1}[Hardware Error]:   device_id: 0000:7d:00.0
[  123.612034] {1}[Hardware Error]:   slot: 0
[  123.616120] {1}[Hardware Error]:   secondary_bus: 0x00
[  123.621245] {1}[Hardware Error]:   vendor_id: 0x19e5, device_id: 0xa222
[  123.627847] {1}[Hardware Error]:   class_code: 000002
[  123.632977] hns3 0000:7d:00.0: aer_status: 0x00000000, aer_mask: 0x00000000
[  123.639928] hns3 0000:7d:00.0: aer_layer=Transaction Layer, aer_agent=Receiver ID
[  123.647400] hns3 0000:7d:00.0: aer_uncor_severity: 0x00000000
[  123.653136] hns3 0000:7d:00.0: PCI error detected, state(=1)!!
[  123.658959] hns3 0000:7d:00.0: ROCEE uncorrected RAS error identified
[  123.665395] hns3 0000:7d:00.0: ROCEE RAS AXI rresp error
[  123.670713] hns3 0000:7d:00.0: requesting reset due to PCI error
[  123.676715] hns3 0000:7d:00.0: received reset event , reset type is 5
[  123.683147] hns3 0000:7d:00.0: AER: Device recovery successful
[  123.688978] hns3 0000:7d:00.0: PF Reset requested
[  123.693684] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
[  123.700633] hns3 0000:7d:00.0: inform reset to vf(1) failded -5!

Fixes: 6a157f7d1b14 ("RDMA/hns: Add SCC context allocation support for hip08")
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Reviewed-by: Yixian Liu <liuyixian@huawei.com>
Reviewed-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Modify qp&cq&pd specification according to UM
Lijun Ou [Sat, 16 Feb 2019 12:10:24 +0000 (20:10 +0800)]
RDMA/hns: Modify qp&cq&pd specification according to UM

Accroding to hip08's limitation, qp&cq specification is 1M, mtpt
specification 1M in kernel space.

Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agolib/irq_poll: Support schedules in non-interrupt contexts
Steve Wise [Wed, 6 Feb 2019 21:11:49 +0000 (13:11 -0800)]
lib/irq_poll: Support schedules in non-interrupt contexts

Do not assume irq_poll_sched() is called from an interrupt context only.
So use raise_softirq_irqoff() instead of __raise_softirq_irqoff() so it
will kick the ksoftirqd if the schedule is from a non-interrupt context.

This is required for RDMA drivers, like soft iwarp, that generate cq
completion notifications in a workqueue or kthread context.  Without this
change, siw completion notifications to the ULP can take several hundred
usecs, depending on the system load.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agordma_rxe: Use netlink messages to add/delete links
Steve Wise [Fri, 15 Feb 2019 19:03:57 +0000 (11:03 -0800)]
rdma_rxe: Use netlink messages to add/delete links

Add support for the RDMA_NLDEV_CMD_NEWLINK/DELLINK messages which allow
dynamically adding new RXE links.  Deprecate the old module options for
now.

Cc: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yanjun Zhu <yanjun.zhu@oracle.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/core: Add RDMA_NLDEV_CMD_NEWLINK/DELLINK support
Steve Wise [Fri, 15 Feb 2019 19:03:53 +0000 (11:03 -0800)]
RDMA/core: Add RDMA_NLDEV_CMD_NEWLINK/DELLINK support

Add support for new LINK messages to allow adding and deleting rdma
interfaces.  This will be used initially for soft rdma drivers which
instantiate device instances dynamically by the admin specifying a netdev
device to use.  The rdma_rxe module will be the first user of these
messages.

The design is modeled after RTNL_NEWLINK/DELLINK: rdma drivers register
with the rdma core if they provide link add/delete functions.  Each driver
registers with a unique "type" string, that is used to dispatch messages
coming from user space.  A new RDMA_NLDEV_ATTR is defined for the "type"
string.  User mode will pass 3 attributes in a NEWLINK message:
RDMA_NLDEV_ATTR_DEV_NAME for the desired rdma device name to be created,
RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link being added, and
RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface to use for this
link.  The DELLINK message will contain the RDMA_NLDEV_ATTR_DEV_INDEX of
the device to delete.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/usnic: Fix deadlock
Parvi Kaustubhi [Sat, 9 Feb 2019 17:28:30 +0000 (09:28 -0800)]
IB/usnic: Fix deadlock

There is a dead lock in usnic ib_register and netdev_notify path.

usnic_ib_discover_pf()
| mutex_lock(&usnic_ib_ibdev_list_lock);
 | usnic_ib_device_add();
  | ib_register_device()
   | usnic_ib_query_port()
    | mutex_lock(&us_ibdev->usdev_lock);
     | ib_get_eth_speed()
      | rtnl_lock()

order of lock: &usnic_ib_ibdev_list_lock -> usdev_lock -> rtnl_lock

rtnl_lock()
 | usnic_ib_netdevice_event()
  | mutex_lock(&usnic_ib_ibdev_list_lock);

order of lock: rtnl_lock -> &usnic_ib_ibdev_list_lock

Solution is to use the core's lock-free ib_device_get_by_netdev() scheme
to lookup ib_dev while handling netdev & inet events.

Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com>
Reviewed-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Reviewed-by: Tanmay Inamdar <tinamdar@cisco.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/rxe: Close a race after ib_register_device
Jason Gunthorpe [Wed, 13 Feb 2019 04:12:56 +0000 (21:12 -0700)]
RDMA/rxe: Close a race after ib_register_device

Since rxe allows unregistration from other threads the rxe pointer can
become invalid any moment after ib_register_driver returns. This could
cause a user triggered use after free.

Add another driver callback to be called right after the device becomes
registered to complete any device setup required post-registration.  This
callback has enough core locking to prevent the device from becoming
unregistered.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/rxe: Add ib_device_get_by_name() and use it in rxe
Jason Gunthorpe [Wed, 13 Feb 2019 04:12:55 +0000 (21:12 -0700)]
RDMA/rxe: Add ib_device_get_by_name() and use it in rxe

rxe has an open coded version of this that is not as safe as the core
version. This lets us eliminate the internal device list entirely from
rxe.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/rxe: Use driver_unregister and new unregistration API
Jason Gunthorpe [Tue, 22 Jan 2019 23:27:24 +0000 (16:27 -0700)]
RDMA/rxe: Use driver_unregister and new unregistration API

rxe does not have correct locking for its registration/unregistration
paths, use the core code to handle it instead. In this mode
ib_unregister_device will also do the dealloc, so rxe is required to do
clean up from a callback.

The core code ensures that unregistration is done only once, and generally
takes care of locking and concurrency problems for rxe.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/device: Provide APIs from the core code to help unregistration
Jason Gunthorpe [Wed, 13 Feb 2019 04:12:53 +0000 (21:12 -0700)]
RDMA/device: Provide APIs from the core code to help unregistration

These APIs are intended to support drivers that exist outside the usual
driver core probe()/remove() callbacks. Normally the driver core will
prevent remove() from running concurrently with probe(), once this safety
is lost drivers need more support to get the locking and lifetimes right.

ib_unregister_driver() is intended to be used during module_exit of a
driver using these APIs. It unregisters all the associated ib_devices.

ib_unregister_device_and_put() is to be used by a driver-specific removal
function (ie removal by name, removal from a netdev notifier, removal from
netlink)

ib_unregister_queued() is to be used from netdev notifier chains where
RTNL is held.

The locking is tricky here since once things become async it is possible
to race unregister with registration. This is largely solved by relying on
the registration refcount, unregistration will only ever work on something
that has a positive registration refcount - and then an unregistration
mutex serializes all competing unregistrations of the same device.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/rxe: Use ib_device_get_by_netdev() instead of open coding
Jason Gunthorpe [Wed, 13 Feb 2019 04:12:52 +0000 (21:12 -0700)]
RDMA/rxe: Use ib_device_get_by_netdev() instead of open coding

The core API handles the locking correctly and is faster if there are
multiple devices.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/device: Add ib_device_get_by_netdev()
Jason Gunthorpe [Wed, 13 Feb 2019 04:12:51 +0000 (21:12 -0700)]
RDMA/device: Add ib_device_get_by_netdev()

Several drivers need to find the ib_device from a given netdev. rxe needs
this at speed in an unsleepable context, so choose to implement the
translation using a RCU safe hash table.

The hash table can have a many to one mapping. This is intended to support
some future case where multiple IB drivers (ie iWarp and RoCE) connect to
the same netdevs. driver_ids will need to be different to support this.

In the process this makes the struct ib_device and ib_port_data RCU safe
by deferring their kfrees.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/device: Add ib_device_set_netdev() as an alternative to get_netdev
Jason Gunthorpe [Wed, 13 Feb 2019 04:12:50 +0000 (21:12 -0700)]
RDMA/device: Add ib_device_set_netdev() as an alternative to get_netdev

The associated netdev should not actually be very dynamic, so for most
drivers there is no reason for a callback like this. Provide an API to
inform the core code about the net dev affiliation and use a core
maintained data structure instead.

This allows the core code to be more aware of the ndev relationship which
will allow some new APIs based around this.

This also uses locking that makes some kind of sense, many drivers had a
confusing RCU lock, or missing locking which isn't right.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/cache: Move the cache per-port data into the main ib_port_data
Jason Gunthorpe [Wed, 13 Feb 2019 04:12:49 +0000 (21:12 -0700)]
RDMA/cache: Move the cache per-port data into the main ib_port_data

Like the other cases there no real reason to have another array just for
the cache. This larger conversion gets its own patch.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/device: Consolidate ib_device per_port data into one place
Jason Gunthorpe [Wed, 13 Feb 2019 04:12:48 +0000 (21:12 -0700)]
RDMA/device: Consolidate ib_device per_port data into one place

There is no reason to have three allocations of per-port data. Combine
them together and make the lifetime for all the per-port data match the
struct ib_device.

Following patches will require more port-specific data, now there is a
good place to put it.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA: Add and use rdma_for_each_port
Jason Gunthorpe [Wed, 13 Feb 2019 04:12:47 +0000 (21:12 -0700)]
RDMA: Add and use rdma_for_each_port

We have many loops iterating over all of the end port numbers on a struct
ib_device, simplify them with a for_each helper.

Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/nldev: Don't expose number of not-visible entries
Leon Romanovsky [Mon, 18 Feb 2019 20:25:52 +0000 (22:25 +0200)]
RDMA/nldev: Don't expose number of not-visible entries

Netlink dumpit handshake exchanges the index from which kernel should
start to return its value, in current code, this index included
not-visible in this PID items too and indirectly revealed the number of
entries.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/nldev: Connect QP number to .doit callback
Leon Romanovsky [Mon, 18 Feb 2019 20:25:51 +0000 (22:25 +0200)]
RDMA/nldev: Connect QP number to .doit callback

This patch adds ability to query specific QP based on its LQPN (local
QPN), which is assigned by HW and needs special treatment while inserting
into restrack DB.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/nldev: Provide parent IDs for PD, MR and QP objects
Leon Romanovsky [Mon, 18 Feb 2019 20:25:50 +0000 (22:25 +0200)]
RDMA/nldev: Provide parent IDs for PD, MR and QP objects

PD, MR and QP objects have parents objects: contexts and PDs.  The exposed
parent IDs allow to correlate various objects and simplify debug
investigation.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/nldev: Share with user-space object IDs
Leon Romanovsky [Mon, 18 Feb 2019 20:25:49 +0000 (22:25 +0200)]
RDMA/nldev: Share with user-space object IDs

Give to the user space tools unique identifier for PD, MR, CQ and CM_ID
objects, so they can be able to query on them with .doit callbacks.

QP .doit is not supported yet, till all drivers will be updated to provide
their LQPN to be equal to their restrack ID.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/restrack: Prepare restrack_root to addition of extra fields per-type
Leon Romanovsky [Mon, 18 Feb 2019 20:25:48 +0000 (22:25 +0200)]
RDMA/restrack: Prepare restrack_root to addition of extra fields per-type

As a preparation to extension of rdma_restrack_root to provide software
IDs, which will be per-type too. We convert the rdma_restrack_root from
struct with arrays to array of structs.

Such conversion allows us to drop rwsem lock in favour of internal XArray
lock.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/restrack: Hide restrack DB from IB/core
Leon Romanovsky [Mon, 18 Feb 2019 20:25:47 +0000 (22:25 +0200)]
RDMA/restrack: Hide restrack DB from IB/core

There is no need to expose internals of restrack DB to IB/core.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/restrack: Reduce scope of synchronization lock while updating DB
Leon Romanovsky [Mon, 18 Feb 2019 20:25:46 +0000 (22:25 +0200)]
RDMA/restrack: Reduce scope of synchronization lock while updating DB

XArray uses internal lock for updates to XArray. This means that our
external RW lock is needed to ensure that entry is not deleted while we
are performing iteration over list.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/nldev: Add resource tracker doit callback
Leon Romanovsky [Mon, 18 Feb 2019 20:25:45 +0000 (22:25 +0200)]
RDMA/nldev: Add resource tracker doit callback

Implement doit callbacks and ensure that users won't provide port values
on resource entry allocated in per-device mode needed for .doit callback.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/restrack: Translate from ID to restrack object
Leon Romanovsky [Mon, 18 Feb 2019 20:25:44 +0000 (22:25 +0200)]
RDMA/restrack: Translate from ID to restrack object

Add new general helper to get restrack entry given by ID and their
respective type.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/restrack: Convert internal DB from hash to XArray
Leon Romanovsky [Mon, 18 Feb 2019 20:25:43 +0000 (22:25 +0200)]
RDMA/restrack: Convert internal DB from hash to XArray

The additions of .doit callbacks posses new access pattern to the resource
entries by some user visible index. Back then, the legacy DB was
implemented as hash because per-index access wasn't needed and XArray
wasn't accepted yet.

Acceptance of XArray together with per-index access requires the refresh
of DB implementation.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/core: Move device addition deletion to device.c
Parav Pandit [Wed, 13 Feb 2019 17:23:06 +0000 (19:23 +0200)]
RDMA/core: Move device addition deletion to device.c

Move core device addition and removal from sysfs.c to device.c as device.c
is more appropriate place for device management.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/core: Introduce and use ib_setup_port_attrs()
Parav Pandit [Wed, 13 Feb 2019 17:23:04 +0000 (19:23 +0200)]
RDMA/core: Introduce and use ib_setup_port_attrs()

Refactor code for device and port sysfs attributes for reuse.

While at it, rename counter part free function to ib_free_port_attrs.

Also attribute setup sequence is:
(a) port specific init.
(b) device stats alloc/init.

So for cleanup, follow reverse sequence:
(a) device stats dealloc
(b) port specific cleanup

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/core: Use simpler device_del() instead of device_unregister()
Parav Pandit [Wed, 13 Feb 2019 17:23:03 +0000 (19:23 +0200)]
RDMA/core: Use simpler device_del() instead of device_unregister()

Instead of holding extra reference using get_device() that
device_unregister() releases, simplify it as below.

device_add() balances with device_del().  device_initialize() balances
with put_device(), always via ib_dealloc_device().

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/cxgb4: Remove kref accounting for sync operation
Leon Romanovsky [Tue, 12 Feb 2019 18:39:15 +0000 (20:39 +0200)]
RDMA/cxgb4: Remove kref accounting for sync operation

Ucontext allocation and release aren't async events and don't need kref
accounting. The common layer of RDMA subsystem ensures that dealloc
ucontext will be called after all other objects are released.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/nes: Remove useless usecnt variable and redundant memset
Leon Romanovsky [Tue, 12 Feb 2019 18:39:14 +0000 (20:39 +0200)]
RDMA/nes: Remove useless usecnt variable and redundant memset

The internal design of RDMA/core ensures that there dealloc ucontext will
be called only if alloc_ucontext succeeded, hence there is no need to
manage internal variable to mark validity of ucontext.

As part of this change, remove redundant memeset too.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/hfi1: Fix a build warning for TID RDMA READ
Kaike Wan [Fri, 15 Feb 2019 21:45:30 +0000 (13:45 -0800)]
IB/hfi1: Fix a build warning for TID RDMA READ

The following build warning was produced for the TID RDMA READ
patch ("IB/hfi1: Enable TID RDMA READ protocol"):

drivers/infiniband/hw/hfi1/qp.c: In function 'hfi1_setup_wqe':
drivers/infiniband/hw/hfi1/qp.c:328:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
   hfi1_setup_tid_rdma_wqe(qp, wqe);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/hw/hfi1/qp.c:329:2: note: here
  case IB_QPT_UC:
  ^~~~

This patch will fix the issue by adding the "fall through" comment.

Fixes: f1ab4efa6d32 ("IB/hfi1: Enable TID RDMA READ protocol")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/uverbs: Fix an error flow in ib_uverbs_poll_cq
Jason Gunthorpe [Thu, 14 Feb 2019 20:13:31 +0000 (20:13 +0000)]
RDMA/uverbs: Fix an error flow in ib_uverbs_poll_cq

The new output_written block was wrongly placed before the ret=0, causing
the error code to be lost. uverbs_output_written is not expected to fail,
and even if it does fail it has no significant impact on the userspace
flow.

Reported-by: Bart Van Assche <bvanassche@acm.org>
Fixes: d6f4a21f309d ("RDMA/uverbs: Mark ioctl responses with UVERBS_ATTR_F_VALID_OUTPUT")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
5 years agoIB/{hw,sw}: Remove 'uobject->context' dependency in object creation APIs
Shamir Rabinovitch [Thu, 7 Feb 2019 16:44:49 +0000 (18:44 +0200)]
IB/{hw,sw}: Remove 'uobject->context' dependency in object creation APIs

Now when we have the udata passed to all the ib_xxx object creation APIs
and the additional macro 'rdma_udata_to_drv_context' to get the
ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally
start to remove the dependency of the drivers in the
ib_xxx->uobject->context.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/verbs: Add helper function rdma_udata_to_drv_context
Shamir Rabinovitch [Thu, 7 Feb 2019 16:44:48 +0000 (18:44 +0200)]
IB/verbs: Add helper function rdma_udata_to_drv_context

Helper function to get driver's context out of ib_udata wrapped in
uverbs_attr_bundle for user objects or NULL for kernel objects.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Add ib_ucontext to uverbs_attr_bundle sent from ioctl and cmd flows
Shamir Rabinovitch [Thu, 7 Feb 2019 16:44:47 +0000 (18:44 +0200)]
IB/uverbs: Add ib_ucontext to uverbs_attr_bundle sent from ioctl and cmd flows

Add ib_ucontext to the uverbs_attr_bundle sent down the iocl and cmd flows
as soon as the flow has ib_uobject.

In addition, remove rdma_get_ucontext helper function that is only used by
ib_umem_get.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/ipoib: Use __func__ instead of function's name
Erez Alfasi [Wed, 13 Feb 2019 17:12:02 +0000 (19:12 +0200)]
IB/ipoib: Use __func__ instead of function's name

Changed debug statements to use %s and __func__ instead
of hard-coded function's name.

Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/iwpm: Remove set but not used variable 'msg_seq'
YueHaibing [Thu, 14 Feb 2019 01:57:10 +0000 (01:57 +0000)]
RDMA/iwpm: Remove set but not used variable 'msg_seq'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/infiniband/core/iwpm_util.c: In function 'iwpm_send_hello':
drivers/infiniband/core/iwpm_util.c:811:6: warning:
 variable 'msg_seq' set but not used [-Wunused-but-set-variable]

It never used since introduction in commit b0bad9ad514f ("RDMA/IWPM:
Support no port mapping requirements")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Configure capacity of hns device
Lijun Ou [Sun, 3 Feb 2019 08:13:07 +0000 (16:13 +0800)]
RDMA/hns: Configure capacity of hns device

This patch adds new device capability for IB_DEVICE_MEM_MGT_EXTENSIONS to
indicate device support for the following features:

1. Fast register memory region.
2. send with remote invalidate by frmr
3. local invalidate memory regsion

As well as adds the max depth of frmr page list len.

Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Delete useful prints for aeq subtype event
Yixian Liu [Sun, 3 Feb 2019 08:13:06 +0000 (16:13 +0800)]
RDMA/hns: Delete useful prints for aeq subtype event

Current all messages printed for aeq subtype event are wrong.  Thus,
delete them and only the value of subtype event is printed.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Set allocated memory to zero for wrid
Yixian Liu [Sun, 3 Feb 2019 08:13:05 +0000 (16:13 +0800)]
RDMA/hns: Set allocated memory to zero for wrid

The memory allocated for wrid should be initialized to zero.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Fix the state of rereg mr
Yixian Liu [Sun, 3 Feb 2019 08:13:04 +0000 (16:13 +0800)]
RDMA/hns: Fix the state of rereg mr

The state of mr after reregister operation should be set to valid
state. Otherwise, it will keep the same as the state before reregistered.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Limit minimum ROCE CQ depth to 64
chenglang [Sun, 3 Feb 2019 08:13:03 +0000 (16:13 +0800)]
RDMA/hns: Limit minimum ROCE CQ depth to 64

This patch modifies the minimum CQ depth specification of hip08 and is
consistent with the processing of hip06.

Signed-off-by: chenglang <chenglang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/nes: Use for_each_sg_dma_page iterator for umem SGL
Shiraz, Saleem [Tue, 12 Feb 2019 17:01:14 +0000 (11:01 -0600)]
RDMA/nes: Use for_each_sg_dma_page iterator for umem SGL

Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped
SGL and get the page DMA address. This avoids the extra loop to iterate
pages in the SGE when for_each_sg iterator is used.

Additionally, purge umem->page_shift usage in the driver as its only
relevant for ODP MRs. Use system page size and shift instead.

Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/rdmavt: Adapt to handle non-uniform sizes on umem SGEs
Shiraz, Saleem [Tue, 12 Feb 2019 16:52:24 +0000 (10:52 -0600)]
RDMA/rdmavt: Adapt to handle non-uniform sizes on umem SGEs

rdmavt expects a uniform size on all umem SGEs which is currently at
PAGE_SIZE.

Adapt to a umem API change which could return non-uniform sized SGEs due
to combining contiguous PAGE_SIZE regions into an SGE. Use
for_each_sg_page variant to unfold the larger SGEs into a list of
PAGE_SIZE elements.

Additionally, purge umem->page_shift usage in the driver as its only
relevant for ODP MRs. Use system page size and shift instead.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA: Fix allocation failure on pointer pd
Colin Ian King [Tue, 12 Feb 2019 11:22:33 +0000 (11:22 +0000)]
RDMA: Fix allocation failure on pointer pd

The null check on an allocation failure on pd is currently checking
if pd is non-null rather than null. Fix this by adding the missing !
operator.

Fixes: 21a428a019c9 ("RDMA: Handle PD allocations by IB/core")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoMerge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma...
Doug Ledford [Wed, 13 Feb 2019 14:35:39 +0000 (09:35 -0500)]
Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma into for-next

I had merged the hfi1-tid code into my local copy of for-next, but was
waiting on 0day testing before pushing it (I pushed it to my wip
branch).  Having waited several days for 0day testing to show up, I'm
finally just going to push it out.  In the meantime, though, Jason
pushed other stuff to for-next, so I needed to merge up the branches
before pushing.

Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoRDMA/bnxt_re: fix or'ing of data into an uninitialized struct member
Colin Ian King [Mon, 11 Feb 2019 13:34:15 +0000 (13:34 +0000)]
RDMA/bnxt_re: fix or'ing of data into an uninitialized struct member

The struct member comp_mask has not been initialized however a bit
pattern is being bitwise or'd into the member and hence other bit
fields in comp_mask may contain any garbage from the stack. Fix this
by making the bitwise or into an assignment.

Fixes: 95b86d1c91ad ("RDMA/bnxt_re: Update kernel user abi to pass chip context")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/mlx5: Fix memory leak in case we fail to add an IB device
Mark Bloch [Mon, 11 Feb 2019 15:40:54 +0000 (17:40 +0200)]
RDMA/mlx5: Fix memory leak in case we fail to add an IB device

Make sure the IB device is freed on failure.

Fixes: b5ca15ad7e61 ("IB/mlx5: Add proper representors support")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: HÃ¥kon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Fix bad flow upon DEVX mkey creation
Yishai Hadas [Mon, 11 Feb 2019 15:40:53 +0000 (17:40 +0200)]
IB/mlx5: Fix bad flow upon DEVX mkey creation

Fix bad flow upon DEVX mkey creation to prevent deleting the indirect mkey
from the radix tree in case there was a previous failure to insert it.

Fixes: 534fd7aac56a ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/rxe: Use for_each_sg_page iterator on umem SGL
Shiraz, Saleem [Mon, 11 Feb 2019 15:25:07 +0000 (09:25 -0600)]
RDMA/rxe: Use for_each_sg_page iterator on umem SGL

The driver walks the umem SGL assuming a 1:1 mapping between SGE and
system page. Update to use the for_each_sg_page iterator to get individual
pages contained in the SGEs.  This is a pre-requisite before adding page
combining into SGEs while building the scatter table in IB core.

Additionally, purge umem->page_shift usage in the driver as its only
relevant for ODP MRs. Use system page size and shift instead.

Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/ocrdma: Use for_each_sg_dma_page iterator on umem SGL
Shiraz, Saleem [Mon, 11 Feb 2019 15:25:05 +0000 (09:25 -0600)]
RDMA/ocrdma: Use for_each_sg_dma_page iterator on umem SGL

Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped
SGL and get the page DMA address. This avoids the extra loop to iterate
pages in the SGE when for_each_sg iterator is used.

Additionally, purge umem->page_shift usage in the driver as its only
relevant for ODP MRs. Use system page size and shift instead.

Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/qedr: Use for_each_sg_dma_page iterator on umem SGL
Shiraz, Saleem [Mon, 11 Feb 2019 15:25:04 +0000 (09:25 -0600)]
RDMA/qedr: Use for_each_sg_dma_page iterator on umem SGL

Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped
SGL and get the page DMA address. This avoids the extra loop to iterate
pages in the SGE when for_each_sg iterator is used.

Additionally, purge umem->page_shift usage in the driver as its only
relevant for ODP MRs. Use system page size and shift instead.

Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/vmw_pvrdma: Use for_each_sg_dma_page iterator on umem SGL
Shiraz, Saleem [Mon, 11 Feb 2019 15:25:03 +0000 (09:25 -0600)]
RDMA/vmw_pvrdma: Use for_each_sg_dma_page iterator on umem SGL

Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped
SGL and get the page DMA address. This avoids the extra loop to iterate
pages in the SGE when for_each_sg iterator is used.

Additionally, purge umem->page_shift usage in the driver as its only
relevant for ODP MRs. Use system page size and shift instead.

Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/cxgb3: Use for_each_sg_dma_page iterator on umem SGL
Shiraz, Saleem [Mon, 11 Feb 2019 15:25:02 +0000 (09:25 -0600)]
RDMA/cxgb3: Use for_each_sg_dma_page iterator on umem SGL

Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped
SGL and get the page DMA address. This avoids the extra loop to iterate
pages in the SGE when for_each_sg iterator is used.

Additionally, purge umem->page_shift usage in the driver as its only
relevant for ODP MRs. Use system page size and shift instead.

Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/cxgb4: Use for_each_sg_dma_page iterator on umem SGL
Shiraz, Saleem [Mon, 11 Feb 2019 15:25:01 +0000 (09:25 -0600)]
RDMA/cxgb4: Use for_each_sg_dma_page iterator on umem SGL

Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped
SGL and get the page DMA address. This avoids the extra loop to iterate
pages in the SGE when for_each_sg iterator is used.

Additionally, purge umem->page_shift usage in the driver as its only
relevant for ODP MRs. Use system page size and shift instead.

Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Use for_each_sg_dma_page iterator on umem SGL
Shiraz, Saleem [Mon, 11 Feb 2019 15:25:00 +0000 (09:25 -0600)]
RDMA/hns: Use for_each_sg_dma_page iterator on umem SGL

Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped
SGL and get the page DMA address. This avoids the extra loop to iterate
pages in the SGE when for_each_sg iterator is used.

Additionally, purge umem->page_shift usage in the driver as its only
relevant for ODP MRs. Use system page size and shift instead.

Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/i40iw: Use for_each_sg_dma_page iterator on umem SGL
Shiraz, Saleem [Mon, 11 Feb 2019 15:24:59 +0000 (09:24 -0600)]
RDMA/i40iw: Use for_each_sg_dma_page iterator on umem SGL

Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped
SGL and get the page DMA address. This avoids the extra loop to iterate
pages in the SGE when for_each_sg iterator is used.

Additionally, purge umem->page_shift usage in the driver as its only
relevant for ODP MRs. Use system page size and shift instead.

Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/mthca: Use for_each_sg_dma_page iterator on umem SGL
Shiraz, Saleem [Mon, 11 Feb 2019 15:24:58 +0000 (09:24 -0600)]
RDMA/mthca: Use for_each_sg_dma_page iterator on umem SGL

Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped
SGL and get the page DMA address. This avoids the extra loop to iterate
pages in the SGE when for_each_sg iterator is used.

Additionally, purge umem->page_shift usage in the driver as its only
relevant for ODP MRs. Use system page size and shift instead.

Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/bnxt_re: Use for_each_sg_dma_page iterator on umem SGL
Shiraz, Saleem [Mon, 11 Feb 2019 15:24:57 +0000 (09:24 -0600)]
RDMA/bnxt_re: Use for_each_sg_dma_page iterator on umem SGL

Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped
SGL and get the page DMA address. This avoids the extra loop to iterate
pages in the SGE when for_each_sg iterator is used.

Additionally, purge umem->page_shift usage in the driver as its only
relevant for ODP MRs. Use system page size and shift instead.

Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agolib/scatterlist: Provide a DMA page iterator
Jason Gunthorpe [Fri, 4 Jan 2019 18:40:21 +0000 (11:40 -0700)]
lib/scatterlist: Provide a DMA page iterator

Commit 2db76d7c3c6d ("lib/scatterlist: sg_page_iter: support sg lists w/o
backing pages") introduced the sg_page_iter_dma_address() function without
providing a way to use it in the general case. If the sg_dma_len() is not
equal to the sg length callers cannot safely use the
for_each_sg_page/sg_page_iter_dma_address combination.

Resolve this API mistake by providing a DMA specific iterator,
for_each_sg_dma_page(), that uses the right length so
sg_page_iter_dma_address() works as expected with all sglists.

A new iterator type is introduced to provide compile-time safety against
wrongly mixing accessors and iterators.

Acked-by: Christoph Hellwig <hch@lst.de> (for scatterlist)
Acked-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> (ipu3-cio2)
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoMerge branch 'wip/dl-for-next' into for-next
Doug Ledford [Sat, 9 Feb 2019 17:54:04 +0000 (12:54 -0500)]
Merge branch 'wip/dl-for-next' into for-next

Due to concurrent work by myself and Jason, a normal fast forward merge
was not possible.  This brings in a number of hfi1 changes, mainly the
hfi1 TID RDMA support (roughly 10,000 LOC change), which was reviewed
and integrated over a period of days.

Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoMerge branch 'hfi1-tid' into wip/dl-for-next
Doug Ledford [Sat, 9 Feb 2019 17:50:02 +0000 (12:50 -0500)]
Merge branch 'hfi1-tid' into wip/dl-for-next

Omni-Path TID RDMA Feature

Intel Omni-Path (OPA) TID RDMA support is a feature that accelerates
data movement between two OPA nodes through the IB Verbs interface. It
improves RDMA READ/WRITE performance by delivering the data payload to a
user buffer directly without any software copying.

Architecture
=============
The TID RDMA protocol is implemented on the hfi1 driver level and is
therefore transparent to the ULPs. It is designed to facilitate the data
transactions for two specific RDMA requests:
  - RDMA READ;
  - RDMA WRITE.
Previously, when a verbs data packet is received at the destination
(requester side for RDMA READ and responder side for RDMA WRITE), the
data payload is copied to the user buffer by software, which slows down
the performance significantly for large requests.

Internally, hfi1 converts qualified RDMA READ/WRITE requests into TID
RDMA READ/WRITE requests when the requests are post sent to the hfi1
driver. Non-qualified RDMA requests are handled by normal RDMA protocol.

For TID RDMA requests, hardware resources (hardware flow and TID entries)
are allocated on the destination side (the requester side for TID RDMA
READ and the responder side for TID RDMA WRITE). The information for
these resources is conveyed to the data source side (the responder side
for TID RDMA READ and the requester side for TID RDMA WRITE) and embedded
in data packets. When data packets are received by the destination,
hardware will deliver the data payload to the destination buffer without
involving software and therefore improve the performance.

Details
=======
RDMA READ/WRITE requests are qualified by the following:
  - Total data length >= 256k;
  - Totoal data length is a multiple of 4K pages.

Additional qualifications are enforced for the destination buffers:
  For RDMA RAED:
    - Each destination sge buffer is 4K aligned;
    - Each destination sge buffer is a multiple of 4K pages.
  For RDMA WRITE:
    - The destination number is 4K aligned.

In addition, in an OPA fabric, some nodes may support TID RDMA while
others may not. As such, it is important for two transaction nodes to
exchange the information about the features they support. This discovery
mechanism is called OPA Feature Negotion (OPFN) and is described in
details in the patch series. Through OPFN, two nodes can find whether
they both support TID RDMA and subsequently convert RDMA requests into
TID RDMA requests.

* hfi1-tid: (46 commits)
  IB/hfi1: Prioritize the sending of ACK packets
  IB/hfi1: Add static trace for TID RDMA WRITE protocol
  IB/hfi1: Enable TID RDMA WRITE protocol
  IB/hfi1: Add interlock between TID RDMA WRITE and other requests
  IB/hfi1: Add TID RDMA WRITE functionality into RDMA verbs
  IB/hfi1: Add the dual leg code
  IB/hfi1: Add the TID second leg ACK packet builder
  IB/hfi1: Add the TID second leg send packet builder
  IB/hfi1: Resend the TID RDMA WRITE DATA packets
  IB/hfi1: Add a function to receive TID RDMA RESYNC packet
  IB/hfi1: Add a function to build TID RDMA RESYNC packet
  IB/hfi1: Add TID RDMA retry timer
  IB/hfi1: Add a function to receive TID RDMA ACK packet
  IB/hfi1: Add a function to build TID RDMA ACK packet
  IB/hfi1: Add a function to receive TID RDMA WRITE DATA packet
  IB/hfi1: Add a function to build TID RDMA WRITE DATA packet
  IB/hfi1: Add a function to receive TID RDMA WRITE response
  IB/hfi1: Add TID resource timer
  IB/hfi1: Add a function to build TID RDMA WRITE response
  IB/hfi1: Add functions to receive TID RDMA WRITE request
  ...

Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoiw_cxgb4: fix srqidx leak during connection abort
Raju Rangoju [Wed, 6 Feb 2019 17:24:44 +0000 (22:54 +0530)]
iw_cxgb4: fix srqidx leak during connection abort

When an application aborts the connection by moving QP from RTS to ERROR,
then iw_cxgb4's modify_rc_qp() RTS->ERROR logic sets the
*srqidxp to 0 via t4_set_wq_in_error(&qhp->wq, 0), and aborts the
connection by calling c4iw_ep_disconnect().

c4iw_ep_disconnect() does the following:
 1. sends up a close_complete_upcall(ep, -ECONNRESET) to libcxgb4.
 2. sends abort request CPL to hw.

But, since the close_complete_upcall() is sent before sending the
ABORT_REQ to hw, libcxgb4 would fail to release the srqidx if the
connection holds one. Because, the srqidx is passed up to libcxgb4 only
after corresponding ABORT_RPL is processed by kernel in abort_rpl().

This patch handle the corner-case by moving the call to
close_complete_upcall() from c4iw_ep_disconnect() to abort_rpl().  So that
libcxgb4 is notified about the -ECONNRESET only after abort_rpl(), and
libcxgb4 can relinquish the srqidx properly.

Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoiw_cxgb4: complete the cached SRQ buffers
Raju Rangoju [Wed, 6 Feb 2019 17:24:43 +0000 (22:54 +0530)]
iw_cxgb4: complete the cached SRQ buffers

If TP fetches an SRQ buffer but ends up not using it before the connection
is aborted, then it passes the index of that SRQ buffer to the host in
ABORT_REQ_RSS or ABORT_RPL CPL message.

But, if the srqidx field is zero in the received ABORT_RPL or
ABORT_REQ_RSS CPL, then we need to read the tcb.rq_start field to see if
it really did have an RQE cached. This works around a case where HW does
not include the srqidx in the ABORT_RPL/ABORT_REQ_RSS CPL.

The final value of rq_start is the one present in TCB with the
TF_RX_PDU_OUT bit cleared. So, we need to read the TCB, examine the
TF_RX_PDU_OUT (bit 49 of t_flags) in order to determine if there's a rx
PDU feedback event pending.

Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agocxgb4: add tcb flags and tcb rpl struct
Raju Rangoju [Wed, 6 Feb 2019 17:24:42 +0000 (22:54 +0530)]
cxgb4: add tcb flags and tcb rpl struct

This patch adds the tcb flags and structures needed for querying tcb
information.

Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/devices: Re-organize device.c locking
Jason Gunthorpe [Thu, 7 Feb 2019 05:41:54 +0000 (22:41 -0700)]
RDMA/devices: Re-organize device.c locking

The locking here started out with a single lock that covered everything
and then has lately veered into crazy town.

The fundamental problem is that several places need to iterate over a
linked list, but also need to drop their locks to avoid deadlock during
client callbacks.

xarray's restartable iteration offers a simple solution to the
problem. Once all the lists are xarrays we can drop locks in the places
that need that and rely on xarray to provide consistency and locking for
the data structure.

The resulting simplification is that each of the three lists has a
dedicated rwsem that must be held when working with the list it
covers. One data structure is no longer covered by multiple locks.

The sleeping semaphore is selected because the read side generally needs
to be held over something sleeping, and using RCU reader locking in those
cases is overkill.

In the process this simplifies the entire registration/unregistration flow
to be the expected list of setups and the reversed list of matching
teardowns, and the registration lock 'refcount' can now be revised to be
released after the ULPs are removed, providing a very sane semantic for
this feature.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/devices: Use xarray to store the client_data
Jason Gunthorpe [Thu, 7 Feb 2019 05:41:53 +0000 (22:41 -0700)]
RDMA/devices: Use xarray to store the client_data

Now that we have a small ID for each client we can use xarray instead of
linearly searching linked lists for client data. This will give much
faster and scalable client data lookup, and will lets us revise the
locking scheme.

Since xarray can store 'going_down' using a mark just entirely eliminate
the struct ib_client_data and directly store the client_data value in the
xarray. However this does require a special iterator as we must still
iterate over any NULL client_data values.

Also eliminate the client_data_lock in favour of internal xarray locking.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/devices: Use xarray to store the clients
Jason Gunthorpe [Thu, 7 Feb 2019 05:41:52 +0000 (22:41 -0700)]
RDMA/devices: Use xarray to store the clients

This gives each client a unique ID and will let us move client_data to use
xarray, and revise the locking scheme.

clients have to be add/removed in strict FIFO/LIFO order as they
interdepend. To support this the client_ids are assigned to increase in
FIFO order. The existing linked list is kept to support reverse iteration
until xarray can get a reverse iteration API.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
5 years agoRDMA/device: Use an ida instead of a free page in alloc_name
Jason Gunthorpe [Thu, 7 Feb 2019 05:41:51 +0000 (22:41 -0700)]
RDMA/device: Use an ida instead of a free page in alloc_name

ida is the proper data structure to hold list of clustered small integers
and then allocate an unused integer. Get rid of the convoluted and limited
open-coded bitmap.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/device: Get rid of reg_state
Jason Gunthorpe [Thu, 7 Feb 2019 05:41:50 +0000 (22:41 -0700)]
RDMA/device: Get rid of reg_state

This really has no purpose anymore, refcount can be used to tell if the
device is still registered. Keeping it around just invites mis-use.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
5 years agoRDMA/device: Call ib_cache_release_one() only from ib_device_release()
Jason Gunthorpe [Thu, 7 Feb 2019 05:41:49 +0000 (22:41 -0700)]
RDMA/device: Call ib_cache_release_one() only from ib_device_release()

Instead of complicated logic about when this memory is freed, always free
it during device release(). All the cache pointers start out as NULL, so
it is safe to call this before the cache is initialized.

This makes for a simpler error unwind flow, and a simpler understanding of
the lifetime of the memory allocations inside the struct ib_device.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/device: Ensure that security memory is always freed
Jason Gunthorpe [Thu, 7 Feb 2019 05:41:48 +0000 (22:41 -0700)]
RDMA/device: Ensure that security memory is always freed

Since this only frees memory it should be done during the release
callback. Otherwise there are possible error flows where it might not get
called if registration aborts.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/device: Check that the rename is nop under the lock
Jason Gunthorpe [Thu, 7 Feb 2019 05:41:47 +0000 (22:41 -0700)]
RDMA/device: Check that the rename is nop under the lock

Since another rename could be running in parallel it is safer to check
that the name is not changing inside the lock, where we already know the
device name will not change.

Fixes: d21943dd19b5 ("RDMA/core: Implement IB device rename function")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
5 years agoRDMA: Handle PD allocations by IB/core
Leon Romanovsky [Sun, 3 Feb 2019 12:55:51 +0000 (14:55 +0200)]
RDMA: Handle PD allocations by IB/core

The PD allocations in IB/core allows us to simplify drivers and their
error flows in their .alloc_pd() paths. The changes in .alloc_pd() go hand
in had with relevant update in .dealloc_pd().

We will use this opportunity and convert .dealloc_pd() to don't fail, as
it was suggested a long time ago, failures are not happening as we have
never seen a WARN_ON print.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/core: Share driver structure size with core
Leon Romanovsky [Sun, 3 Feb 2019 12:55:50 +0000 (14:55 +0200)]
RDMA/core: Share driver structure size with core

Add new macros to be used in drivers while registering ops structure and
IB/core while calling allocation routines, so drivers won't need to
perform kzalloc/kfree in their paths.

The change in allocation stage allows us to initialize common fields prior
to calling to drivers (e.g. restrack).

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/core: Don't register each MAD agent for LSM notifier
Daniel Jurgens [Sat, 2 Feb 2019 09:09:45 +0000 (11:09 +0200)]
IB/core: Don't register each MAD agent for LSM notifier

When creating many MAD agents in a short period of time, receive packet
processing can be delayed long enough to cause timeouts while new agents
are being added to the atomic notifier chain with IRQs disabled.  Notifier
chain registration and unregstration is an O(n) operation. With large
numbers of MAD agents being created and destroyed simultaneously the CPUs
spend too much time with interrupts disabled.

Instead of each MAD agent registering for it's own LSM notification,
maintain a list of agents internally and register once, this registration
already existed for handling the PKeys. This list is write mostly, so a
normal spin lock is used vs a read/write lock. All MAD agents must be
checked, so a single list is used instead of breaking them down per
device.

Notifier calls are done under rcu_read_lock, so there isn't a risk of
similar packet timeouts while checking the MAD agents security settings
when notified.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/core: Eliminate a hole in MAD agent struct
Daniel Jurgens [Sat, 2 Feb 2019 09:09:44 +0000 (11:09 +0200)]
IB/core: Eliminate a hole in MAD agent struct

Move the security related fields above the u8s to eliminate a hole in the
struct.

pahole before:
struct ib_mad_agent {
...
u32                        hi_tid;               /*    48     4 */
u32                        flags;                /*    52     4 */
u8                         port_num;             /*    56     1 */
u8                         rmpp_version;         /*    57     1 */

/* XXX 6 bytes hole, try to pack */

/* --- cacheline 1 boundary (64 bytes) --- */
void *                     security;             /*    64     8 */
bool                       smp_allowed;          /*    72     1 */
bool                       lsm_nb_reg;           /*    73     1 */

/* XXX 6 bytes hole, try to pack */

struct notifier_block      lsm_nb;               /*    80    24 */

/* XXX last struct has 4 bytes of padding */

/* size: 104, cachelines: 2, members: 14 */
...
};

pahole after:
struct ib_mad_agent {
...
u32                        hi_tid;               /*    48     4 */
u32                        flags;                /*    52     4 */
void *                     security;             /*    56     8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct notifier_block      lsm_nb;               /*    64    24 */

/* XXX last struct has 4 bytes of padding */

u8                         port_num;             /*    88     1 */
u8                         rmpp_version;         /*    89     1 */
bool                       smp_allowed;          /*    90     1 */
bool                       lsm_nb_reg;           /*    91     1 */

/* size: 96, cachelines: 2, members: 14 */
...
};

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/core: Fix potential memory leak while creating MAD agents
Daniel Jurgens [Sat, 2 Feb 2019 09:09:43 +0000 (11:09 +0200)]
IB/core: Fix potential memory leak while creating MAD agents

If the MAD agents isn't allowed to manage the subnet, or fails to register
for the LSM notifier, the security context is leaked. Free the context in
these cases.

Fixes: 47a2b338fe63 ("IB/core: Enforce security on management datagrams")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reported-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/core: Unregister notifier before freeing MAD security
Daniel Jurgens [Sat, 2 Feb 2019 09:09:42 +0000 (11:09 +0200)]
IB/core: Unregister notifier before freeing MAD security

If the notifier runs after the security context is freed an access of
freed memory can occur.

Fixes: 47a2b338fe63 ("IB/core: Enforce security on management datagrams")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/usnic: Fix locking when unregistering
Parvi Kaustubhi [Fri, 8 Feb 2019 21:53:43 +0000 (13:53 -0800)]
IB/usnic: Fix locking when unregistering

Move the call to usnic_ib_device_remove after usnic_ib_ibdev_list_lock has
been released.

Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoiw_cxgb4: use tos when finding ipv6 routes
Steve Wise [Fri, 1 Feb 2019 20:44:53 +0000 (12:44 -0800)]
iw_cxgb4: use tos when finding ipv6 routes

When IPv6 support was added, the correct tos was not passed to
cxgb_find_route6(). This potentially results in the wrong route entry.

Fixes: 830662f6f032 ("RDMA/cxgb4: Add support for active and passive open connection with IPv6 address")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoiw_cxgb4: use tos when importing the endpoint
Steve Wise [Fri, 1 Feb 2019 20:44:41 +0000 (12:44 -0800)]
iw_cxgb4: use tos when importing the endpoint

import_ep() is passed the correct tos, but doesn't use it correctly.

Fixes: ac8e4c69a021 ("cxgb4/iw_cxgb4: TOS support")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoiw_cxgb4: use listening ep tos when accepting new connections
Steve Wise [Fri, 1 Feb 2019 20:44:37 +0000 (12:44 -0800)]
iw_cxgb4: use listening ep tos when accepting new connections

If the parent listening endpoint has a service type set, then use that
when setting up the connection.  This allows server-side applications to
mandate the tos for passive side connections via rdma_set_service_type()
on the listening endpoints.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/iwcm: add tos_set bool to iw_cm struct
Steve Wise [Fri, 1 Feb 2019 20:44:32 +0000 (12:44 -0800)]
RDMA/iwcm: add tos_set bool to iw_cm struct

This allows drivers to know the tos was actively set by the application.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/cma: listening device cm_ids should inherit tos
Steve Wise [Fri, 1 Feb 2019 20:44:27 +0000 (12:44 -0800)]
RDMA/cma: listening device cm_ids should inherit tos

If a user binds to INADDR_ANY and sets the service id, then the
device-specific cm_ids should also use this tos.  This allows an app to
do:

rdma_bind_addr(INADDR_ANY)
set_service_type()
rdma_listen()

And connections setup via this listening endpoint will use the correct
tos.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/cma: Define option to set ack timeout and pack tos_set
Danit Goldberg [Thu, 24 Jan 2019 12:18:15 +0000 (14:18 +0200)]
IB/cma: Define option to set ack timeout and pack tos_set

Define new option in 'rdma_set_option' to override calculated QP timeout
when requested to provide QP attributes to modify a QP.

At the same time, pack tos_set to be bitfield.

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/bnxt_en: Enable RDMA driver support for 57500 chip
Devesh Sharma [Thu, 7 Feb 2019 06:31:28 +0000 (01:31 -0500)]
RDMA/bnxt_en: Enable RDMA driver support for 57500 chip

Re-enabling RDMA driver support on 57500 chips. Removing the forced error
code for 57500 chip.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/bnxt_re: Update kernel user abi to pass chip context
Devesh Sharma [Thu, 7 Feb 2019 06:31:27 +0000 (01:31 -0500)]
RDMA/bnxt_re: Update kernel user abi to pass chip context

User space verbs provider library would need chip context.  Changing the
ABI to add chip version details in structure.  Furthermore, changing the
kernel driver ucontext allocation code to initialize the abi structure
with appropriate values.

As suggested by community, appended the new fields at the bottom of the
ABI structure and retaining to older fields as those were in the older
versions.

Keeping the ABI version at 1 and adding a new field in the ucontext
response structure to hold the component mask.  The user space library
should check pre-defined flags to figure out if a certain feature is
supported on not.

Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/bnxt_re: Add extended psn structure for 57500 adapters
Devesh Sharma [Thu, 7 Feb 2019 06:31:26 +0000 (01:31 -0500)]
RDMA/bnxt_re: Add extended psn structure for 57500 adapters

The new 57500 series of adapter has bigger psn search structure.  The size
of new structure is 16B. Changing the control path memory allocation and
fast path code to accommodate the new psn structure while maintaining the
backward compatibility.

There are few additional changes listed below:
 - For 57500 chip max-sge are limited to 6 for now.
 - For 57500 chip max-receive-sge should be set to 6 for now.
 - Add driver/hardware interface structure for new chip.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/bnxt_re: Enable GSI QP support for 57500 series
Devesh Sharma [Thu, 7 Feb 2019 06:31:25 +0000 (01:31 -0500)]
RDMA/bnxt_re: Enable GSI QP support for 57500 series

In the new 57500 series of adapters the GSI qp is a UD type QP unlike the
previous generation where it was a Raw Eth QP. Changing the control and
data path to support the same. Listing all the significant diffs:

 - AH creation resolve network type unconditionally
 - Add check at relevant places to distinguish from Raw Eth
   processing flow.
 - bnxt_re_process_res_ud_wc report completion with GRH flag
   when qp is GSI.
 - Change length, cfa_meta and smac to match new driver/hardware
   interface.
 - Add new driver/hardware interface.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/bnxt_re: Skip backing store allocation for 57500 series
Devesh Sharma [Thu, 7 Feb 2019 06:31:24 +0000 (01:31 -0500)]
RDMA/bnxt_re: Skip backing store allocation for 57500 series

The backing store to keep HW context data structures is allocated and
initialized by L2 driver. For 57500 chip RoCE driver do not require to
allocate and initialize additional memory. Changing to skip duplicate
allocation and initialization for 57500 adapters. Driver continues as
before for older chips.

This patch also takes care of stats context memory alignment to 128
boundary, a requirement for 57500 series of chip. Older chips do not care
of alignment, thus the change is unconditional.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/bnxt_re: Add 64bit doorbells for 57500 series
Devesh Sharma [Thu, 7 Feb 2019 06:31:23 +0000 (01:31 -0500)]
RDMA/bnxt_re: Add 64bit doorbells for 57500 series

The new chip series has 64 bit doorbell for notification queues. Thus,
both control and data path event queues need new routines to write 64 bit
doorbell. Adding the same. There is new doorbell interface between the
chip and driver. Changing the chip specific data structure definitions.

Additional significant changes are listed below
- bnxt_re_net_ring_free/alloc takes a new argument
- bnxt_qplib_enable_nq and enable_rcfw uses new doorbell offset
  for new chip.
- DB mapping for NQ and CREQ now maps 8 bytes.
- DBR_DBR_* macros renames to DBC_DBC_*
- store nq_db_offset in a 32bit data type.
- got rid of __iowrite64_copy, used writeq instead.
- changed the DB header initialization to simpler scheme.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/bnxt_re: Add chip context to identify 57500 series
Devesh Sharma [Thu, 7 Feb 2019 06:31:22 +0000 (01:31 -0500)]
RDMA/bnxt_re: Add chip context to identify 57500 series

Adding setup and destroy routines for chip-context. The chip context would
be used frequently in control and data path to take execution flow
depending on the chip type.  chip context structure pointer is added to
the relevant data structures.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Simplify WQE count power of two check
Gal Pressman [Wed, 6 Feb 2019 13:45:35 +0000 (15:45 +0200)]
IB/mlx5: Simplify WQE count power of two check

Use is_power_of_2() instead of hard coding it in the driver. While at it,
fix the meaningless error print.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoDocumentation/infiniband: update from locked to pinned_vm
Davidlohr Bueso [Thu, 7 Feb 2019 01:31:55 +0000 (17:31 -0800)]
Documentation/infiniband: update from locked to pinned_vm

We are really talking about pinned_vm here.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agodrivers/IB,core: reduce scope of mmap_sem
Davidlohr Bueso [Wed, 6 Feb 2019 17:59:20 +0000 (09:59 -0800)]
drivers/IB,core: reduce scope of mmap_sem

ib_umem_get() uses gup_longterm() and relies on the lock to stabilze the
vma_list, so we cannot really get rid of mmap_sem altogether, but now that
the counter is atomic, we can get of some complexity that mmap_sem brings
with only pinned_vm.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agodrivers/IB,usnic: reduce scope of mmap_sem
Davidlohr Bueso [Wed, 6 Feb 2019 17:59:19 +0000 (09:59 -0800)]
drivers/IB,usnic: reduce scope of mmap_sem

usnic_uiom_get_pages() uses gup_longterm() so we cannot really get rid of
mmap_sem altogether in the driver, but we can get rid of some complexity
that mmap_sem brings with only pinned_vm.  We can get rid of the wq
altogether as we no longer need to defer work to unpin pages as the
counter is now atomic. We also share the lock.

Acked-by: Parvi Kaustubhi <pkaustub@cisco.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agodrivers/IB,hfi1: do not se mmap_sem
Davidlohr Bueso [Wed, 6 Feb 2019 17:59:18 +0000 (09:59 -0800)]
drivers/IB,hfi1: do not se mmap_sem

This driver already uses gup_fast() and thus we can just drop the mmap_sem
protection around the pinned_vm counter. Note that the window between when
hfi1_can_pin_pages() is called and the actual counter is incremented
remains the same as mmap_sem was _only_ used for when ->pinned_vm was
touched.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.det>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>