Conflict resolution of af_smc.c from Stephen Rothwell.
Signed-off-by: David S. Miller <davem@davemloft.net>
- 'rhl_for_each_entry_rcu'
- 'rhl_for_each_rcu'
- 'rht_for_each'
- - 'rht_for_each_continue'
+ - 'rht_for_each_from'
- 'rht_for_each_entry'
- - 'rht_for_each_entry_continue'
+ - 'rht_for_each_entry_from'
- 'rht_for_each_entry_rcu'
- - 'rht_for_each_entry_rcu_continue'
+ - 'rht_for_each_entry_rcu_from'
- 'rht_for_each_entry_safe'
- 'rht_for_each_rcu'
- - 'rht_for_each_rcu_continue'
+ - 'rht_for_each_rcu_from'
- '__rq_for_each_bio'
- 'rq_for_each_bvec'
- 'rq_for_each_segment'
Alan Cox <root@hraefn.swansea.linux.org.uk>
Aleksey Gorelov <aleksey_gorelov@phoenix.com>
Aleksandar Markovic <aleksandar.markovic@mips.com> <aleksandar.markovic@imgtec.com>
+Alexei Starovoitov <ast@kernel.org> <ast@plumgrid.com>
+Alexei Starovoitov <ast@kernel.org> <alexei.starovoitov@gmail.com>
+Alexei Starovoitov <ast@kernel.org> <ast@fb.com>
Al Viro <viro@ftp.linux.org.uk>
Al Viro <viro@zenIV.linux.org.uk>
Andi Shyti <andi@etezian.org> <andi.shyti@samsung.com>
Christophe Ricard <christophe.ricard@gmail.com>
Corey Minyard <minyard@acm.org>
Damian Hobson-Garcia <dhobsong@igel.co.jp>
+Daniel Borkmann <daniel@iogearbox.net> <dborkman@redhat.com>
+Daniel Borkmann <daniel@iogearbox.net> <dborkmann@redhat.com>
+Daniel Borkmann <daniel@iogearbox.net> <danborkmann@iogearbox.net>
+Daniel Borkmann <daniel@iogearbox.net> <daniel.borkmann@tik.ee.ethz.ch>
+Daniel Borkmann <daniel@iogearbox.net> <danborkmann@googlemail.com>
+Daniel Borkmann <daniel@iogearbox.net> <dxchgb@gmail.com>
David Brownell <david-b@pacbell.net>
David Woodhouse <dwmw2@shinybook.infradead.org>
Dengcheng Zhu <dzhu@wavecomp.com> <dengcheng.zhu@mips.com>
+This ABI is deprecated and will be removed after 2021. It is
+replaced with the batadv generic netlink family.
What: /sys/class/net/<iface>/batman-adv/elp_interval
Date: Feb 2014
+This ABI is deprecated and will be removed after 2021. It is
+replaced with the batadv generic netlink family.
What: /sys/class/net/<mesh_iface>/mesh/aggregated_ogms
Date: May 2010
#define BTF_KIND_RESTRICT 11 /* Restrict */
#define BTF_KIND_FUNC 12 /* Function */
#define BTF_KIND_FUNC_PROTO 13 /* Function Proto */
+ #define BTF_KIND_VAR 14 /* Variable */
+ #define BTF_KIND_DATASEC 15 /* Section */
Note that the type section encodes debug info, not just pure types.
``BTF_KIND_FUNC`` is not a type, and it represents a defined subprogram.
If the function has variable arguments, the last parameter is encoded with
``name_off = 0`` and ``type = 0``.
+2.2.14 BTF_KIND_VAR
+~~~~~~~~~~~~~~~~~~~
+
+``struct btf_type`` encoding requirement:
+ * ``name_off``: offset to a valid C identifier
+ * ``info.kind_flag``: 0
+ * ``info.kind``: BTF_KIND_VAR
+ * ``info.vlen``: 0
+ * ``type``: the type of the variable
+
+``btf_type`` is followed by a single ``struct btf_variable`` with the
+following data::
+
+ struct btf_var {
+ __u32 linkage;
+ };
+
+``struct btf_var`` encoding:
+ * ``linkage``: currently only static variable 0, or globally allocated
+ variable in ELF sections 1
+
+Not all type of global variables are supported by LLVM at this point.
+The following is currently available:
+
+ * static variables with or without section attributes
+ * global variables with section attributes
+
+The latter is for future extraction of map key/value type id's from a
+map definition.
+
+2.2.15 BTF_KIND_DATASEC
+~~~~~~~~~~~~~~~~~~~~~~~
+
+``struct btf_type`` encoding requirement:
+ * ``name_off``: offset to a valid name associated with a variable or
+ one of .data/.bss/.rodata
+ * ``info.kind_flag``: 0
+ * ``info.kind``: BTF_KIND_DATASEC
+ * ``info.vlen``: # of variables
+ * ``size``: total section size in bytes (0 at compilation time, patched
+ to actual size by BPF loaders such as libbpf)
+
+``btf_type`` is followed by ``info.vlen`` number of ``struct btf_var_secinfo``.::
+
+ struct btf_var_secinfo {
+ __u32 type;
+ __u32 offset;
+ __u32 size;
+ };
+
+``struct btf_var_secinfo`` encoding:
+ * ``type``: the type of the BTF_KIND_VAR variable
+ * ``offset``: the in-section offset of the variable
+ * ``size``: the size of the variable in bytes
+
3. BTF Kernel API
*****************
--- /dev/null
+Properties for the MDIO bus multiplexer/glue of Amlogic G12a SoC family.
+
+This is a special case of a MDIO bus multiplexer. It allows to choose between
+the internal mdio bus leading to the embedded 10/100 PHY or the external
+MDIO bus.
+
+Required properties in addition to the generic multiplexer properties:
+- compatible : amlogic,g12a-mdio-mux
+- reg: physical address and length of the multiplexer/glue registers
+- clocks: list of clock phandle, one for each entry clock-names.
+- clock-names: should contain the following:
+ * "pclk" : peripheral clock.
+ * "clkin0" : platform crytal
+ * "clkin1" : SoC 50MHz MPLL
+
+Example :
+
+mdio_mux: mdio-multiplexer@4c000 {
+ compatible = "amlogic,g12a-mdio-mux";
+ reg = <0x0 0x4c000 0x0 0xa4>;
+ clocks = <&clkc CLKID_ETH_PHY>,
+ <&xtal>,
+ <&clkc CLKID_MPLL_5OM>;
+ clock-names = "pclk", "clkin0", "clkin1";
+ mdio-parent-bus = <&mdio0>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ ext_mdio: mdio@0 {
+ reg = <0>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+
+ int_mdio: mdio@1 {
+ reg = <1>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ internal_ephy: ethernet-phy@8 {
+ compatible = "ethernet-phy-id0180.3301",
+ "ethernet-phy-ieee802.3-c22";
+ interrupts = <GIC_SPI 9 IRQ_TYPE_LEVEL_HIGH>;
+ reg = <8>;
+ max-speed = <100>;
+ };
+ };
+};
$ insmod batman-adv.ko
The module is now waiting for activation. You must add some interfaces on which
-batman can operate. After loading the module batman advanced will scan your
-systems interfaces to search for compatible interfaces. Once found, it will
-create subfolders in the ``/sys`` directories of each supported interface,
-e.g.::
-
- $ ls /sys/class/net/eth0/batman_adv/
- elp_interval iface_status mesh_iface throughput_override
-
-If an interface does not have the ``batman_adv`` subfolder, it probably is not
-supported. Not supported interfaces are: loopback, non-ethernet and batman's
-own interfaces.
-
-Note: After the module was loaded it will continuously watch for new
-interfaces to verify the compatibility. There is no need to reload the module
-if you plug your USB wifi adapter into your machine after batman advanced was
-initially loaded.
-
-The batman-adv soft-interface can be created using the iproute2 tool ``ip``::
+batman-adv can operate. The batman-adv soft-interface can be created using the
+iproute2 tool ``ip``::
$ ip link add name bat0 type batadv
$ ip link set dev eth0 master bat0
-Repeat this step for all interfaces you wish to add. Now batman starts
+Repeat this step for all interfaces you wish to add. Now batman-adv starts
using/broadcasting on this/these interface(s).
-By reading the "iface_status" file you can check its status::
-
- $ cat /sys/class/net/eth0/batman_adv/iface_status
- active
-
To deactivate an interface you have to detach it from the "bat0" interface::
$ ip link set dev eth0 nomaster
+The same can also be done using the batctl interface subcommand::
-All mesh wide settings can be found in batman's own interface folder::
+ batctl -m bat0 interface create
+ batctl -m bat0 interface add -M eth0
- $ ls /sys/class/net/bat0/mesh/
- aggregated_ogms fragmentation isolation_mark routing_algo
- ap_isolation gw_bandwidth log_level vlan0
- bonding gw_mode multicast_mode
- bridge_loop_avoidance gw_sel_class network_coding
- distributed_arp_table hop_penalty orig_interval
+To detach eth0 and destroy bat0::
-There is a special folder for debugging information::
+ batctl -m bat0 interface del -M eth0
+ batctl -m bat0 interface destroy
- $ ls /sys/kernel/debug/batman_adv/bat0/
- bla_backbone_table log neighbors transtable_local
- bla_claim_table mcast_flags originators
- dat_cache nc socket
- gateways nc_nodes transtable_global
+There are additional settings for each batadv mesh interface, vlan and hardif
+which can be modified using batctl. Detailed information about this can be found
+in its manual.
-Some of the files contain all sort of status information regarding the mesh
-network. For example, you can view the table of originators (mesh
-participants) with::
+For instance, you can check the current originator interval (value
+in milliseconds which determines how often batman-adv sends its broadcast
+packets)::
- $ cat /sys/kernel/debug/batman_adv/bat0/originators
-
-Other files allow to change batman's behaviour to better fit your requirements.
-For instance, you can check the current originator interval (value in
-milliseconds which determines how often batman sends its broadcast packets)::
-
- $ cat /sys/class/net/bat0/mesh/orig_interval
+ $ batctl -M bat0 orig_interval
1000
and also change its value::
- $ echo 3000 > /sys/class/net/bat0/mesh/orig_interval
+ $ batctl -M bat0 orig_interval 3000
In very mobile scenarios, you might want to adjust the originator interval to a
lower value. This will make the mesh more responsive to topology changes, but
will also increase the overhead.
+Information about the current state can be accessed via the batadv generic
+netlink family. batctl provides human readable version via its debug tables
+subcommands.
+
Usage
=====
menuconfig" and enable the option ``B.A.T.M.A.N. debugging``
(``CONFIG_BATMAN_ADV_DEBUG=y``).
-Those additional debug messages can be accessed using a special file in
-debugfs::
+Those additional debug messages can be accessed using the perf infrastructure::
- $ cat /sys/kernel/debug/batman_adv/bat0/log
+ $ trace-cmd stream -e batadv:batadv_dbg
The additional debug output is by default disabled. It can be enabled during
-run time. Following log_levels are defined:
-
-.. flat-table::
-
- * - 0
- - All debug output disabled
- * - 1
- - Enable messages related to routing / flooding / broadcasting
- * - 2
- - Enable messages related to route added / changed / deleted
- * - 4
- - Enable messages related to translation table operations
- * - 8
- - Enable messages related to bridge loop avoidance
- * - 16
- - Enable messages related to DAT, ARP snooping and parsing
- * - 32
- - Enable messages related to network coding
- * - 64
- - Enable messages related to multicast
- * - 128
- - Enable messages related to throughput meter
- * - 255
- - Enable all messages
-
-The debug output can be changed at runtime using the file
-``/sys/class/net/bat0/mesh/log_level``. e.g.::
-
- $ echo 6 > /sys/class/net/bat0/mesh/log_level
-
-will enable debug messages for when routes change.
+run time::
+
+ $ batctl -m bat0 loglevel routes tt
+
+will enable debug messages for when routes and translation table entries change.
Counters for different types of packets entering and leaving the batman-adv
module are available through ethtool::
Version of the software responsible for supporting/handling the
Network Controller Sideband Interface.
+
+fw.psid
+=======
+
+Unique identifier of the firmware parameter set.
+=============================================
Broadcom Starfighter 2 Ethernet switch driver
=============================================
The switch hardware block is typically interfaced using MMIO accesses and
contains a bunch of sub-blocks/registers:
-* SWITCH_CORE: common switch registers
-* SWITCH_REG: external interfaces switch register
-* SWITCH_MDIO: external MDIO bus controller (there is another one in SWITCH_CORE,
+- ``SWITCH_CORE``: common switch registers
+- ``SWITCH_REG``: external interfaces switch register
+- ``SWITCH_MDIO``: external MDIO bus controller (there is another one in SWITCH_CORE,
which is used for indirect PHY accesses)
-* SWITCH_INDIR_RW: 64-bits wide register helper block
-* SWITCH_INTRL2_0/1: Level-2 interrupt controllers
-* SWITCH_ACB: Admission control block
-* SWITCH_FCB: Fail-over control block
+- ``SWITCH_INDIR_RW``: 64-bits wide register helper block
+- ``SWITCH_INTRL2_0/1``: Level-2 interrupt controllers
+- ``SWITCH_ACB``: Admission control block
+- ``SWITCH_FCB``: Fail-over control block
Implementation details
======================
-The driver is located in drivers/net/dsa/bcm_sf2.c and is implemented as a DSA
-driver; see Documentation/networking/dsa/dsa.txt for details on the subsystem
+The driver is located in ``drivers/net/dsa/bcm_sf2.c`` and is implemented as a DSA
+driver; see ``Documentation/networking/dsa/dsa.rst`` for details on the subsystem
and what it provides.
The SF2 switch is configured to enable a Broadcom specific 4-bytes switch tag
which gets inserted by the switch for every packet forwarded to the CPU
interface, conversely, the CPU network interface should insert a similar tag for
packets entering the CPU port. The tag format is described in
-net/dsa/tag_brcm.c.
+``net/dsa/tag_brcm.c``.
Overall, the SF2 driver is a fairly regular DSA driver; there are a few
specifics covered below.
-------------------
The DSA platform device driver is probed using a specific compatible string
-provided in net/dsa/dsa.c. The reason for that is because the DSA subsystem gets
+provided in ``net/dsa/dsa.c``. The reason for that is because the DSA subsystem gets
registered as a platform device driver currently. DSA will provide the needed
device_node pointers which are then accessible by the switch driver setup
function to setup resources such as register ranges and interrupts. This
in order to properly configure them. By default, the SF2 pseudo-PHY address, and
an external switch pseudo-PHY address will both be snooping for incoming MDIO
transactions, since they are at the same address (30), resulting in some kind of
-"double" programming. Using DSA, and setting ds->phys_mii_mask accordingly, we
+"double" programming. Using DSA, and setting ``ds->phys_mii_mask`` accordingly, we
selectively divert reads and writes towards external Broadcom switches
pseudo-PHY addresses. Newer revisions of the SF2 hardware have introduced a
configurable pseudo-PHY address which circumvents the initial design limitation.
MoCA interface carrier state and properly report this to the networking stack.
The MoCA interfaces are supported using the PHY library's fixed PHY/emulated PHY
-device and the switch driver registers a fixed_link_update callback for such
+device and the switch driver registers a ``fixed_link_update`` callback for such
PHYs which reflects the link state obtained from the interrupt handler.
-Distributed Switch Architecture
-===============================
-
-Introduction
+============
+Architecture
============
-This document describes the Distributed Switch Architecture (DSA) subsystem
+This document describes the **Distributed Switch Architecture (DSA)** subsystem
design principles, limitations, interactions with other subsystems, and how to
develop drivers for this subsystem as well as a TODO for developers interested
in joining the effort.
DSA currently supports 5 different tagging protocols, and a tag-less mode as
well. The different protocols are implemented in:
-net/dsa/tag_trailer.c: Marvell's 4 trailer tag mode (legacy)
-net/dsa/tag_dsa.c: Marvell's original DSA tag
-net/dsa/tag_edsa.c: Marvell's enhanced DSA tag
-net/dsa/tag_brcm.c: Broadcom's 4 bytes tag
-net/dsa/tag_qca.c: Qualcomm's 2 bytes tag
+- ``net/dsa/tag_trailer.c``: Marvell's 4 trailer tag mode (legacy)
+- ``net/dsa/tag_dsa.c``: Marvell's original DSA tag
+- ``net/dsa/tag_edsa.c``: Marvell's enhanced DSA tag
+- ``net/dsa/tag_brcm.c``: Broadcom's 4 bytes tag
+- ``net/dsa/tag_qca.c``: Qualcomm's 2 bytes tag
The exact format of the tag protocol is vendor specific, but in general, they
all contain something which:
the CPU/management Ethernet interface. Such a driver might occasionally need to
know whether DSA is enabled (e.g.: to enable/disable specific offload features),
but the DSA subsystem has been proven to work with industry standard drivers:
-e1000e, mv643xx_eth etc. without having to introduce modifications to these
+``e1000e,`` ``mv643xx_eth`` etc. without having to introduce modifications to these
drivers. Such network devices are also often referred to as conduit network
devices since they act as a pipe between the host processor and the hardware
Ethernet switch.
When a master netdev is used with DSA, a small hook is placed in in the
networking stack is in order to have the DSA subsystem process the Ethernet
switch specific tagging protocol. DSA accomplishes this by registering a
-specific (and fake) Ethernet type (later becoming skb->protocol) with the
-networking stack, this is also known as a ptype or packet_type. A typical
+specific (and fake) Ethernet type (later becoming ``skb->protocol``) with the
+networking stack, this is also known as a ``ptype`` or ``packet_type``. A typical
Ethernet Frame receive sequence looks like this:
Master network device (e.g.: e1000e):
-Receive interrupt fires:
-- receive function is invoked
-- basic packet processing is done: getting length, status etc.
-- packet is prepared to be processed by the Ethernet layer by calling
- eth_type_trans
+1. Receive interrupt fires:
+
+ - receive function is invoked
+ - basic packet processing is done: getting length, status etc.
+ - packet is prepared to be processed by the Ethernet layer by calling
+ ``eth_type_trans``
+
+2. net/ethernet/eth.c::
+
+ eth_type_trans(skb, dev)
+ if (dev->dsa_ptr != NULL)
+ -> skb->protocol = ETH_P_XDSA
-net/ethernet/eth.c:
+3. drivers/net/ethernet/\*::
-eth_type_trans(skb, dev)
- if (dev->dsa_ptr != NULL)
- -> skb->protocol = ETH_P_XDSA
+ netif_receive_skb(skb)
+ -> iterate over registered packet_type
+ -> invoke handler for ETH_P_XDSA, calls dsa_switch_rcv()
-drivers/net/ethernet/*:
+4. net/dsa/dsa.c::
-netif_receive_skb(skb)
- -> iterate over registered packet_type
- -> invoke handler for ETH_P_XDSA, calls dsa_switch_rcv()
+ -> dsa_switch_rcv()
+ -> invoke switch tag specific protocol handler in 'net/dsa/tag_*.c'
-net/dsa/dsa.c:
- -> dsa_switch_rcv()
- -> invoke switch tag specific protocol handler in
- net/dsa/tag_*.c
+5. net/dsa/tag_*.c:
-net/dsa/tag_*.c:
- -> inspect and strip switch tag protocol to determine originating port
- -> locate per-port network device
- -> invoke eth_type_trans() with the DSA slave network device
- -> invoked netif_receive_skb()
+ - inspect and strip switch tag protocol to determine originating port
+ - locate per-port network device
+ - invoke ``eth_type_trans()`` with the DSA slave network device
+ - invoked ``netif_receive_skb()``
Past this point, the DSA slave network devices get delivered regular Ethernet
frames that can be processed by the networking stack.
switch tag in the Ethernet frames.
These frames are then queued for transmission using the master network device
-ndo_start_xmit() function, since they contain the appropriate switch tag, the
+``ndo_start_xmit()`` function, since they contain the appropriate switch tag, the
Ethernet switch will be able to process these incoming frames from the
management interface and delivers these frames to the physical switch port.
------------------------
Summarized, this is basically how DSA looks like from a network device
-perspective:
-
-
- |---------------------------
- | CPU network device (eth0)|
- ----------------------------
- | <tag added by switch |
- | |
- | |
- | tag added by CPU> |
- |--------------------------------------------|
- | Switch driver |
- |--------------------------------------------|
- || || ||
- |-------| |-------| |-------|
- | sw0p0 | | sw0p1 | | sw0p2 |
- |-------| |-------| |-------|
+perspective::
+
+
+ |---------------------------
+ | CPU network device (eth0)|
+ ----------------------------
+ | <tag added by switch |
+ | |
+ | |
+ | tag added by CPU> |
+ |--------------------------------------------|
+ | Switch driver |
+ |--------------------------------------------|
+ || || ||
+ |-------| |-------| |-------|
+ | sw0p0 | | sw0p1 | | sw0p2 |
+ |-------| |-------| |-------|
+
+
Slave MDIO bus
--------------
Data structures
---------------
-DSA data structures are defined in include/net/dsa.h as well as
-net/dsa/dsa_priv.h.
+DSA data structures are defined in ``include/net/dsa.h`` as well as
+``net/dsa/dsa_priv.h``:
-dsa_chip_data: platform data configuration for a given switch device, this
-structure describes a switch device's parent device, its address, as well as
-various properties of its ports: names/labels, and finally a routing table
-indication (when cascading switches)
+- ``dsa_chip_data``: platform data configuration for a given switch device,
+ this structure describes a switch device's parent device, its address, as
+ well as various properties of its ports: names/labels, and finally a routing
+ table indication (when cascading switches)
-dsa_platform_data: platform device configuration data which can reference a
-collection of dsa_chip_data structure if multiples switches are cascaded, the
-master network device this switch tree is attached to needs to be referenced
+- ``dsa_platform_data``: platform device configuration data which can reference
+ a collection of dsa_chip_data structure if multiples switches are cascaded,
+ the master network device this switch tree is attached to needs to be
+ referenced
-dsa_switch_tree: structure assigned to the master network device under
-"dsa_ptr", this structure references a dsa_platform_data structure as well as
-the tagging protocol supported by the switch tree, and which receive/transmit
-function hooks should be invoked, information about the directly attached switch
-is also provided: CPU port. Finally, a collection of dsa_switch are referenced
-to address individual switches in the tree.
+- ``dsa_switch_tree``: structure assigned to the master network device under
+ ``dsa_ptr``, this structure references a dsa_platform_data structure as well as
+ the tagging protocol supported by the switch tree, and which receive/transmit
+ function hooks should be invoked, information about the directly attached
+ switch is also provided: CPU port. Finally, a collection of dsa_switch are
+ referenced to address individual switches in the tree.
-dsa_switch: structure describing a switch device in the tree, referencing a
-dsa_switch_tree as a backpointer, slave network devices, master network device,
-and a reference to the backing dsa_switch_ops
+- ``dsa_switch``: structure describing a switch device in the tree, referencing
+ a ``dsa_switch_tree`` as a backpointer, slave network devices, master network
+ device, and a reference to the backing``dsa_switch_ops``
-dsa_switch_ops: structure referencing function pointers, see below for a full
-description.
+- ``dsa_switch_ops``: structure referencing function pointers, see below for a
+ full description.
Design limitations
==================
-----------------------------------------
DSA currently limits the number of maximum switches within a tree to 4
-(DSA_MAX_SWITCHES), and the number of ports per switch to 12 (DSA_MAX_PORTS).
+(``DSA_MAX_SWITCHES``), and the number of ports per switch to 12 (``DSA_MAX_PORTS``).
These limits could be extended to support larger configurations would this need
arise.
DSA currently leverages the following subsystems:
-- MDIO/PHY library: drivers/net/phy/phy.c, mdio_bus.c
-- Switchdev: net/switchdev/*
+- MDIO/PHY library: ``drivers/net/phy/phy.c``, ``mdio_bus.c``
+- Switchdev:``net/switchdev/*``
- Device Tree for various of_* functions
MDIO/PHY library
----------------
Slave network devices exposed by DSA may or may not be interfacing with PHY
-devices (struct phy_device as defined in include/linux/phy.h), but the DSA
+devices (``struct phy_device`` as defined in ``include/linux/phy.h)``, but the DSA
subsystem deals with all possible combinations:
- internal PHY devices, built into the Ethernet switch hardware
- special, non-autonegotiated or non MDIO-managed PHY devices: SFPs, MoCA; a.k.a
fixed PHYs
-The PHY configuration is done by the dsa_slave_phy_setup() function and the
+The PHY configuration is done by the ``dsa_slave_phy_setup()`` function and the
logic basically looks like this:
- if Device Tree is used, the PHY device is looked up using the standard
"phy-handle" property, if found, this PHY device is created and registered
- using of_phy_connect()
+ using ``of_phy_connect()``
- if Device Tree is used, and the PHY device is "fixed", that is, conforms to
the definition of a non-MDIO managed PHY as defined in
- Documentation/devicetree/bindings/net/fixed-link.txt, the PHY is registered
+ ``Documentation/devicetree/bindings/net/fixed-link.txt``, the PHY is registered
and connected transparently using the special fixed MDIO bus driver
- finally, if the PHY is built into the switch, as is very common with
-----------
DSA features a standardized binding which is documented in
-Documentation/devicetree/bindings/net/dsa/dsa.txt. PHY/MDIO library helper
-functions such as of_get_phy_mode(), of_phy_connect() are also used to query
+``Documentation/devicetree/bindings/net/dsa/dsa.txt``. PHY/MDIO library helper
+functions such as ``of_get_phy_mode()``, ``of_phy_connect()`` are also used to query
per-port PHY specific details: interface connection, MDIO bus location etc..
Driver development
DSA switch drivers need to implement a dsa_switch_ops structure which will
contain the various members described below.
-register_switch_driver() registers this dsa_switch_ops in its internal list
-of drivers to probe for. unregister_switch_driver() does the exact opposite.
+``register_switch_driver()`` registers this dsa_switch_ops in its internal list
+of drivers to probe for. ``unregister_switch_driver()`` does the exact opposite.
Unless requested differently by setting the priv_size member accordingly, DSA
does not allocate any driver private context space.
Switch configuration
--------------------
-- tag_protocol: this is to indicate what kind of tagging protocol is supported,
- should be a valid value from the dsa_tag_protocol enum
+- ``tag_protocol``: this is to indicate what kind of tagging protocol is supported,
+ should be a valid value from the ``dsa_tag_protocol`` enum
-- probe: probe routine which will be invoked by the DSA platform device upon
+- ``probe``: probe routine which will be invoked by the DSA platform device upon
registration to test for the presence/absence of a switch device. For MDIO
devices, it is recommended to issue a read towards internal registers using
the switch pseudo-PHY and return whether this is a supported device. For other
buses, return a non-NULL string
-- setup: setup function for the switch, this function is responsible for setting
- up the dsa_switch_ops private structure with all it needs: register maps,
+- ``setup``: setup function for the switch, this function is responsible for setting
+ up the ``dsa_switch_ops`` private structure with all it needs: register maps,
interrupts, mutexes, locks etc.. This function is also expected to properly
configure the switch to separate all network interfaces from each other, that
is, they should be isolated by the switch hardware itself, typically by creating
PHY devices and link management
-------------------------------
-- get_phy_flags: Some switches are interfaced to various kinds of Ethernet PHYs,
+- ``get_phy_flags``: Some switches are interfaced to various kinds of Ethernet PHYs,
if the PHY library PHY driver needs to know about information it cannot obtain
on its own (e.g.: coming from switch memory mapped registers), this function
should return a 32-bits bitmask of "flags", that is private between the switch
- driver and the Ethernet PHY driver in drivers/net/phy/*.
+ driver and the Ethernet PHY driver in ``drivers/net/phy/\*``.
-- phy_read: Function invoked by the DSA slave MDIO bus when attempting to read
+- ``phy_read``: Function invoked by the DSA slave MDIO bus when attempting to read
the switch port MDIO registers. If unavailable, return 0xffff for each read.
For builtin switch Ethernet PHYs, this function should allow reading the link
status, auto-negotiation results, link partner pages etc..
-- phy_write: Function invoked by the DSA slave MDIO bus when attempting to write
+- ``phy_write``: Function invoked by the DSA slave MDIO bus when attempting to write
to the switch port MDIO registers. If unavailable return a negative error
code.
-- adjust_link: Function invoked by the PHY library when a slave network device
+- ``adjust_link``: Function invoked by the PHY library when a slave network device
is attached to a PHY device. This function is responsible for appropriately
configuring the switch port link parameters: speed, duplex, pause based on
- what the phy_device is providing.
+ what the ``phy_device`` is providing.
-- fixed_link_update: Function invoked by the PHY library, and specifically by
+- ``fixed_link_update``: Function invoked by the PHY library, and specifically by
the fixed PHY driver asking the switch driver for link parameters that could
not be auto-negotiated, or obtained by reading the PHY registers through MDIO.
This is particularly useful for specific kinds of hardware such as QSGMII,
Ethtool operations
------------------
-- get_strings: ethtool function used to query the driver's strings, will
+- ``get_strings``: ethtool function used to query the driver's strings, will
typically return statistics strings, private flags strings etc.
-- get_ethtool_stats: ethtool function used to query per-port statistics and
+- ``get_ethtool_stats``: ethtool function used to query per-port statistics and
return their values. DSA overlays slave network devices general statistics:
RX/TX counters from the network device, with switch driver specific statistics
per port
-- get_sset_count: ethtool function used to query the number of statistics items
+- ``get_sset_count``: ethtool function used to query the number of statistics items
-- get_wol: ethtool function used to obtain Wake-on-LAN settings per-port, this
+- ``get_wol``: ethtool function used to obtain Wake-on-LAN settings per-port, this
function may, for certain implementations also query the master network device
Wake-on-LAN settings if this interface needs to participate in Wake-on-LAN
-- set_wol: ethtool function used to configure Wake-on-LAN settings per-port,
+- ``set_wol``: ethtool function used to configure Wake-on-LAN settings per-port,
direct counterpart to set_wol with similar restrictions
-- set_eee: ethtool function which is used to configure a switch port EEE (Green
+- ``set_eee``: ethtool function which is used to configure a switch port EEE (Green
Ethernet) settings, can optionally invoke the PHY library to enable EEE at the
PHY level if relevant. This function should enable EEE at the switch port MAC
controller and data-processing logic
-- get_eee: ethtool function which is used to query a switch port EEE settings,
+- ``get_eee``: ethtool function which is used to query a switch port EEE settings,
this function should return the EEE state of the switch port MAC controller
and data-processing logic as well as query the PHY for its currently configured
EEE settings
-- get_eeprom_len: ethtool function returning for a given switch the EEPROM
+- ``get_eeprom_len``: ethtool function returning for a given switch the EEPROM
length/size in bytes
-- get_eeprom: ethtool function returning for a given switch the EEPROM contents
+- ``get_eeprom``: ethtool function returning for a given switch the EEPROM contents
-- set_eeprom: ethtool function writing specified data to a given switch EEPROM
+- ``set_eeprom``: ethtool function writing specified data to a given switch EEPROM
-- get_regs_len: ethtool function returning the register length for a given
+- ``get_regs_len``: ethtool function returning the register length for a given
switch
-- get_regs: ethtool function returning the Ethernet switch internal register
+- ``get_regs``: ethtool function returning the Ethernet switch internal register
contents. This function might require user-land code in ethtool to
pretty-print register values and registers
Power management
----------------
-- suspend: function invoked by the DSA platform device when the system goes to
+- ``suspend``: function invoked by the DSA platform device when the system goes to
suspend, should quiesce all Ethernet switch activities, but keep ports
participating in Wake-on-LAN active as well as additional wake-up logic if
supported
-- resume: function invoked by the DSA platform device when the system resumes,
+- ``resume``: function invoked by the DSA platform device when the system resumes,
should resume all Ethernet switch activities and re-configure the switch to be
in a fully active state
-- port_enable: function invoked by the DSA slave network device ndo_open
+- ``port_enable``: function invoked by the DSA slave network device ndo_open
function when a port is administratively brought up, this function should be
fully enabling a given switch port. DSA takes care of marking the port with
- BR_STATE_BLOCKING if the port is a bridge member, or BR_STATE_FORWARDING if it
+ ``BR_STATE_BLOCKING`` if the port is a bridge member, or ``BR_STATE_FORWARDING`` if it
was not, and propagating these changes down to the hardware
-- port_disable: function invoked by the DSA slave network device ndo_close
+- ``port_disable``: function invoked by the DSA slave network device ndo_close
function when a port is administratively brought down, this function should be
fully disabling a given switch port. DSA takes care of marking the port with
- BR_STATE_DISABLED and propagating changes to the hardware if this port is
+ ``BR_STATE_DISABLED`` and propagating changes to the hardware if this port is
disabled while being a bridge member
Bridge layer
------------
-- port_bridge_join: bridge layer function invoked when a given switch port is
+- ``port_bridge_join``: bridge layer function invoked when a given switch port is
added to a bridge, this function should be doing the necessary at the switch
level to permit the joining port from being added to the relevant logical
domain for it to ingress/egress traffic with other members of the bridge.
-- port_bridge_leave: bridge layer function invoked when a given switch port is
+- ``port_bridge_leave``: bridge layer function invoked when a given switch port is
removed from a bridge, this function should be doing the necessary at the
switch level to deny the leaving port from ingress/egress traffic from the
remaining bridge members. When the port leaves the bridge, it should be aged
out at the switch hardware for the switch to (re) learn MAC addresses behind
this port.
-- port_stp_state_set: bridge layer function invoked when a given switch port STP
+- ``port_stp_state_set``: bridge layer function invoked when a given switch port STP
state is computed by the bridge layer and should be propagated to switch
hardware to forward/block/learn traffic. The switch driver is responsible for
computing a STP state change based on current and asked parameters and perform
Bridge VLAN filtering
---------------------
-- port_vlan_filtering: bridge layer function invoked when the bridge gets
+- ``port_vlan_filtering``: bridge layer function invoked when the bridge gets
configured for turning on or off VLAN filtering. If nothing specific needs to
be done at the hardware level, this callback does not need to be implemented.
When VLAN filtering is turned on, the hardware must be programmed with
accept any 802.1Q frames irrespective of their VLAN ID, and untagged frames are
allowed.
-- port_vlan_prepare: bridge layer function invoked when the bridge prepares the
+- ``port_vlan_prepare``: bridge layer function invoked when the bridge prepares the
configuration of a VLAN on the given port. If the operation is not supported
- by the hardware, this function should return -EOPNOTSUPP to inform the bridge
+ by the hardware, this function should return ``-EOPNOTSUPP`` to inform the bridge
code to fallback to a software implementation. No hardware setup must be done
in this function. See port_vlan_add for this and details.
-- port_vlan_add: bridge layer function invoked when a VLAN is configured
+- ``port_vlan_add``: bridge layer function invoked when a VLAN is configured
(tagged or untagged) for the given switch port
-- port_vlan_del: bridge layer function invoked when a VLAN is removed from the
+- ``port_vlan_del``: bridge layer function invoked when a VLAN is removed from the
given switch port
-- port_vlan_dump: bridge layer function invoked with a switchdev callback
+- ``port_vlan_dump``: bridge layer function invoked with a switchdev callback
function that the driver has to call for each VLAN the given port is a member
of. A switchdev object is used to carry the VID and bridge flags.
-- port_fdb_add: bridge layer function invoked when the bridge wants to install a
+- ``port_fdb_add``: bridge layer function invoked when the bridge wants to install a
Forwarding Database entry, the switch hardware should be programmed with the
specified address in the specified VLAN Id in the forwarding database
associated with this VLAN ID. If the operation is not supported, this
- function should return -EOPNOTSUPP to inform the bridge code to fallback to
+ function should return ``-EOPNOTSUPP`` to inform the bridge code to fallback to
a software implementation.
-Note: VLAN ID 0 corresponds to the port private database, which, in the context
-of DSA, would be the its port-based VLAN, used by the associated bridge device.
+.. note:: VLAN ID 0 corresponds to the port private database, which, in the context
+ of DSA, would be the its port-based VLAN, used by the associated bridge device.
-- port_fdb_del: bridge layer function invoked when the bridge wants to remove a
+- ``port_fdb_del``: bridge layer function invoked when the bridge wants to remove a
Forwarding Database entry, the switch hardware should be programmed to delete
the specified MAC address from the specified VLAN ID if it was mapped into
this port forwarding database
-- port_fdb_dump: bridge layer function invoked with a switchdev callback
+- ``port_fdb_dump``: bridge layer function invoked with a switchdev callback
function that the driver has to call for each MAC address known to be behind
the given port. A switchdev object is used to carry the VID and FDB info.
-- port_mdb_prepare: bridge layer function invoked when the bridge prepares the
+- ``port_mdb_prepare``: bridge layer function invoked when the bridge prepares the
installation of a multicast database entry. If the operation is not supported,
- this function should return -EOPNOTSUPP to inform the bridge code to fallback
+ this function should return ``-EOPNOTSUPP`` to inform the bridge code to fallback
to a software implementation. No hardware setup must be done in this function.
- See port_fdb_add for this and details.
+ See ``port_fdb_add`` for this and details.
-- port_mdb_add: bridge layer function invoked when the bridge wants to install
+- ``port_mdb_add``: bridge layer function invoked when the bridge wants to install
a multicast database entry, the switch hardware should be programmed with the
specified address in the specified VLAN ID in the forwarding database
associated with this VLAN ID.
-Note: VLAN ID 0 corresponds to the port private database, which, in the context
-of DSA, would be the its port-based VLAN, used by the associated bridge device.
+.. note:: VLAN ID 0 corresponds to the port private database, which, in the context
+ of DSA, would be the its port-based VLAN, used by the associated bridge device.
-- port_mdb_del: bridge layer function invoked when the bridge wants to remove a
+- ``port_mdb_del``: bridge layer function invoked when the bridge wants to remove a
multicast database entry, the switch hardware should be programmed to delete
the specified MAC address from the specified VLAN ID if it was mapped into
this port forwarding database.
-- port_mdb_dump: bridge layer function invoked with a switchdev callback
+- ``port_mdb_dump``: bridge layer function invoked with a switchdev callback
function that the driver has to call for each MAC address known to be behind
the given port. A switchdev object is used to carry the VID and MDB info.
Other hanging fruits
--------------------
-- making the number of ports fully dynamic and not dependent on DSA_MAX_PORTS
+- making the number of ports fully dynamic and not dependent on ``DSA_MAX_PORTS``
- allowing more than one CPU/management interface:
http://comments.gmane.org/gmane.linux.network/365657
- porting more drivers from other vendors:
--- /dev/null
+===============================
+Distributed Switch Architecture
+===============================
+
+.. toctree::
+ :maxdepth: 1
+
+ dsa
+ bcm_sf2
+ lan9303
+==============================
LAN9303 Ethernet switch driver
==============================
Driver details
==============
-The driver is implemented as a DSA driver, see
-Documentation/networking/dsa/dsa.txt.
+The driver is implemented as a DSA driver, see ``Documentation/networking/dsa/dsa.rst``.
-See Documentation/devicetree/bindings/net/dsa/lan9303.txt for device tree
+See ``Documentation/devicetree/bindings/net/dsa/lan9303.txt`` for device tree
binding.
The LAN9303 can be managed both via MDIO and I2C, both supported by this driver.
device_drivers/intel/i40e
device_drivers/intel/iavf
device_drivers/intel/ice
+ dsa/index
devlink-info-versions
ieee802154
kapi
0 - Layer 3
1 - Layer 4
+fib_sync_mem - UNSIGNED INTEGER
+ Amount of dirty memory from fib entries that can be backlogged before
+ synchronize_rcu is forced.
+ Default: 512kB Minimum: 64kB Maximum: 64MB
+
ip_forward_update_priority - INTEGER
Whether to update SKB priority from "TOS" field in IPv4 header after it
is forwarded. The new SKB priority is mapped from TOS field value
requests sent to it over the IPv6 protocol.
Default: 0
+echo_ignore_multicast - BOOLEAN
+ If set non-zero, then the kernel will ignore all ICMP ECHO
+ requests sent to it over the IPv6 protocol via multicast.
+ Default: 0
+
+echo_ignore_anycast - BOOLEAN
+ If set non-zero, then the kernel will ignore all ICMP ECHO
+ requests sent to it over the IPv6 protocol destined to anycast address.
+ Default: 0
+
xfrm6_gc_thresh - INTEGER
The threshold at which we will start garbage collecting for IPv6
destination cache entries. At twice this value the system will
M: Antonio Quartulli <a@unstable.cc>
L: b.a.t.m.a.n@lists.open-mesh.org (moderated for non-subscribers)
W: https://www.open-mesh.org/
+B: https://www.open-mesh.org/projects/batman-adv/issues
+C: irc://chat.freenode.net/batman
Q: https://patchwork.open-mesh.org/project/batman/list/
+T: git https://git.open-mesh.org/linux-merge.git
S: Maintained
-F: Documentation/ABI/testing/sysfs-class-net-batman-adv
-F: Documentation/ABI/testing/sysfs-class-net-mesh
+F: Documentation/ABI/obsolete/sysfs-class-net-batman-adv
+F: Documentation/ABI/obsolete/sysfs-class-net-mesh
F: Documentation/networking/batman-adv.rst
F: include/uapi/linux/batadv_packet.h
F: include/uapi/linux/batman_adv.h
F: drivers/net/ethernet/mellanox/mlx5/core/fpga/*
F: include/linux/mlx5/mlx5_ifc_fpga.h
-MELLANOX ETHERNET INNOVA IPSEC DRIVER
-R: Boris Pismenny <borisp@mellanox.com>
-L: netdev@vger.kernel.org
-S: Supported
-W: http://www.mellanox.com
-Q: http://patchwork.ozlabs.org/project/netdev/list/
-F: drivers/net/ethernet/mellanox/mlx5/core/en_ipsec/*
-F: drivers/net/ethernet/mellanox/mlx5/core/ipsec*
-
MELLANOX ETHERNET SWITCH DRIVERS
M: Jiri Pirko <jiri@mellanox.com>
M: Ido Schimmel <idosch@mellanox.com>
STRIP = $(CROSS_COMPILE)strip
OBJCOPY = $(CROSS_COMPILE)objcopy
OBJDUMP = $(CROSS_COMPILE)objdump
+PAHOLE = pahole
LEX = flex
YACC = bison
AWK = awk
GCC_PLUGINS_CFLAGS :=
export ARCH SRCARCH CONFIG_SHELL HOSTCC KBUILD_HOSTCFLAGS CROSS_COMPILE AS LD CC
-export CPP AR NM STRIP OBJCOPY OBJDUMP KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS
+export CPP AR NM STRIP OBJCOPY OBJDUMP PAHOLE KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS
export MAKE LEX YACC AWK INSTALLKERNEL PERL PYTHON PYTHON2 PYTHON3 UTS_MACHINE
export HOSTCXX KBUILD_HOSTCXXFLAGS LDFLAGS_MODULE CHECK CHECKFLAGS
static const struct genl_ops nbd_connect_genl_ops[] = {
{
.cmd = NBD_CMD_CONNECT,
- .policy = nbd_attr_policy,
.doit = nbd_genl_connect,
},
{
.cmd = NBD_CMD_DISCONNECT,
- .policy = nbd_attr_policy,
.doit = nbd_genl_disconnect,
},
{
.cmd = NBD_CMD_RECONFIGURE,
- .policy = nbd_attr_policy,
.doit = nbd_genl_reconfigure,
},
{
.cmd = NBD_CMD_STATUS,
- .policy = nbd_attr_policy,
.doit = nbd_genl_status,
},
};
.ops = nbd_connect_genl_ops,
.n_ops = ARRAY_SIZE(nbd_connect_genl_ops),
.maxattr = NBD_ATTR_MAX,
+ .policy = nbd_attr_policy,
.mcgrps = nbd_mcast_grps,
.n_mcgrps = ARRAY_SIZE(nbd_mcast_grps),
};
#include <net/neighbour.h>
#include <net/route.h>
#include <net/netevent.h>
-#include <net/addrconf.h>
+#include <net/ipv6_stubs.h>
#include <net/ip6_route.h>
#include <rdma/ib_addr.h>
#include <rdma/ib_sa.h>
if (family == AF_INET) {
rt = container_of(dst, struct rtable, dst);
- return rt->rt_uses_gateway;
+ return rt->rt_gw_family == AF_INET;
}
rt6 = container_of(dst, struct rt6_info, dst);
static u16 hfi1_vnic_select_queue(struct net_device *netdev,
struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
struct hfi1_vnic_vport_info *vinfo = opa_vnic_dev_priv(netdev);
struct opa_vnic_skb_mdata *mdata;
return ret;
}
- *addr = pci_resource_start(dev->pdev, 0) +
+ *addr = dev->bar_addr +
MLX5_GET64(alloc_memic_out, out, memic_start_addr);
return 0;
u64 start_page_idx;
int err;
- addr -= pci_resource_start(dev->pdev, 0);
+ addr -= dev->bar_addr;
start_page_idx = (addr - hw_start_addr) >> PAGE_SHIFT;
MLX5_SET(dealloc_memic_in, in, opcode, MLX5_CMD_OP_DEALLOC_MEMIC);
fw_uars_per_page = MLX5_CAP_GEN(dev->mdev, uar_4k) ? MLX5_UARS_IN_PAGE : 1;
- return (pci_resource_start(dev->mdev->pdev, 0) >> PAGE_SHIFT) + uar_idx / fw_uars_per_page;
+ return (dev->mdev->bar_addr >> PAGE_SHIFT) + uar_idx / fw_uars_per_page;
}
static int get_command(unsigned long offset)
page_idx + npages)
return -EINVAL;
- pfn = ((pci_resource_start(dev->mdev->pdev, 0) +
+ pfn = ((dev->mdev->bar_addr +
MLX5_CAP64_DEV_MEM(dev->mdev, memic_bar_start_addr)) >>
PAGE_SHIFT) +
page_idx;
goto err_free;
start_offset = memic_addr & ~PAGE_MASK;
- page_idx = (memic_addr - pci_resource_start(memic->dev->pdev, 0) -
+ page_idx = (memic_addr - memic->dev->bar_addr -
MLX5_CAP64_DEV_MEM(memic->dev, memic_bar_start_addr)) >>
PAGE_SHIFT;
if (ret)
return ret;
- page_idx = (dm->dev_addr - pci_resource_start(memic->dev->pdev, 0) -
+ page_idx = (dm->dev_addr - memic->dev->bar_addr -
MLX5_CAP64_DEV_MEM(memic->dev, memic_bar_start_addr)) >>
PAGE_SHIFT;
bitmap_clear(to_mucontext(ibdm->uobject->context)->dm_pages,
MLX5_SET64(mkc, mkc, len, length);
MLX5_SET(mkc, mkc, pd, to_mpd(pd)->pdn);
MLX5_SET(mkc, mkc, qpn, 0xffffff);
- MLX5_SET64(mkc, mkc, start_addr,
- memic_addr - pci_resource_start(dev->mdev->pdev, 0));
+ MLX5_SET64(mkc, mkc, start_addr, memic_addr - dev->mdev->bar_addr);
err = mlx5_core_create_mkey(mdev, &mr->mmkey, in, inlen);
if (err)
wmb();
/* currently we support only regular doorbells */
- mlx5_write64((__be32 *)ctrl, bf->bfreg->map + bf->offset, NULL);
+ mlx5_write64((__be32 *)ctrl, bf->bfreg->map + bf->offset);
/* Make sure doorbells don't leak out of SQ spinlock
* and reach the HCA out of order.
*/
if (neigh->nud_state & NUD_VALID) {
nes_debug(NES_DBG_CM, "Neighbor MAC address for 0x%08X"
" is %pM, Gateway is 0x%08X \n", dst_ip,
- neigh->ha, ntohl(rt->rt_gateway));
+ neigh->ha, ntohl(rt->rt_gw4));
if (arpindex >= 0) {
if (ether_addr_equal(nesadapter->arp_table[arpindex].mac_addr, neigh->ha)) {
}
static u16 opa_vnic_select_queue(struct net_device *netdev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
struct opa_vnic_adapter *adapter = opa_vnic_priv(netdev);
struct opa_vnic_skb_mdata *mdata;
mdata = skb_push(skb, sizeof(*mdata));
mdata->entropy = opa_vnic_calc_entropy(skb);
mdata->vl = opa_vnic_get_vl(adapter, skb);
- rc = adapter->rn_ops->ndo_select_queue(netdev, skb,
- sb_dev, fallback);
+ rc = adapter->rn_ops->ndo_select_queue(netdev, skb, sb_dev);
skb_pull(skb, sizeof(*mdata));
return rc;
}
return seq;
}
-struct sk_buff *isdn_ppp_mp_discard(ippp_bundle *mp,
- struct sk_buff *from, struct sk_buff *to)
+static struct sk_buff *isdn_ppp_mp_discard(ippp_bundle *mp,
+ struct sk_buff *from,
+ struct sk_buff *to)
{
if (from)
while (from != to) {
return from;
}
-void isdn_ppp_mp_reassembly(isdn_net_dev *net_dev, isdn_net_local *lp,
- struct sk_buff *from, struct sk_buff *to)
+static void isdn_ppp_mp_reassembly(isdn_net_dev *net_dev, isdn_net_local *lp,
+ struct sk_buff *from, struct sk_buff *to)
{
ippp_bundle *mp = net_dev->pb;
int proto;
config NETDEVSIM
tristate "Simulated networking device"
depends on DEBUG_FS
+ select NET_DEVLINK
help
This driver is a developer testing tool and software model that can
be used to test various control path networking APIs, especially
*/
static netdev_tx_t ipddp_xmit(struct sk_buff *skb, struct net_device *dev)
{
- __be32 paddr = skb_rtable(skb)->rt_gateway;
+ struct rtable *rtable = skb_rtable(skb);
+ __be32 paddr = 0;
struct ddpehdr *ddp;
struct ipddp_route *rt;
struct atalk_addr *our_addr;
+ if (rtable->rt_gw_family == AF_INET)
+ paddr = rtable->rt_gw4;
+
spin_lock(&ipddp_route_lock);
/*
static u16 bond_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
/* This helper function exists to help dev_pick_tx get the correct
* destination queue. Using a helper function skips a call to
if (ret)
goto out_mdio;
- pr_info("Starfighter 2 top: %x.%02x, core: %x.%02x base: 0x%p, IRQs: %d, %d\n",
- priv->hw_params.top_rev >> 8, priv->hw_params.top_rev & 0xff,
- priv->hw_params.core_rev >> 8, priv->hw_params.core_rev & 0xff,
- priv->core, priv->irq0, priv->irq1);
+ dev_info(&pdev->dev,
+ "Starfighter 2 top: %x.%02x, core: %x.%02x, IRQs: %d, %d\n",
+ priv->hw_params.top_rev >> 8, priv->hw_params.top_rev & 0xff,
+ priv->hw_params.core_rev >> 8, priv->hw_params.core_rev & 0xff,
+ priv->irq0, priv->irq1);
return 0;
interface = PHY_INTERFACE_MODE_GMII;
if (gbit)
break;
+ /* fall through */
case 0:
interface = PHY_INTERFACE_MODE_MII;
break;
return 0;
}
-static void mv88e6xxx_ports_cmode_init(struct mv88e6xxx_chip *chip)
-{
- int i;
-
- for (i = 0; i < mv88e6xxx_num_ports(chip); i++)
- chip->ports[i].cmode = MV88E6XXX_PORT_STS_CMODE_INVALID;
-}
-
static enum dsa_tag_protocol mv88e6xxx_get_tag_protocol(struct dsa_switch *ds,
int port)
{
if (err)
goto free;
- mv88e6xxx_ports_cmode_init(chip);
-
mutex_lock(&chip->reg_lock);
err = mv88e6xxx_switch_reset(chip);
mutex_unlock(&chip->reg_lock);
if (err)
goto out;
- mv88e6xxx_ports_cmode_init(chip);
mv88e6xxx_phy_init(chip);
if (chip->info->ops->get_eeprom) {
#define MV88E6185_PORT_STS_CMODE_1000BASE_X 0x0005
#define MV88E6185_PORT_STS_CMODE_PHY 0x0006
#define MV88E6185_PORT_STS_CMODE_DISABLED 0x0007
-#define MV88E6XXX_PORT_STS_CMODE_INVALID 0xff
/* Offset 0x01: MAC (or PCS or Physical) Control Register */
#define MV88E6XXX_PORT_MAC_CTL 0x01
#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
#include <linux/init.h>
#include <linux/moduleparam.h>
#include <linux/rtnetlink.h>
strlcpy(info->version, DRV_VERSION, sizeof(info->version));
}
-static int dummy_get_ts_info(struct net_device *dev,
- struct ethtool_ts_info *ts_info)
-{
- ts_info->so_timestamping = SOF_TIMESTAMPING_TX_SOFTWARE |
- SOF_TIMESTAMPING_RX_SOFTWARE |
- SOF_TIMESTAMPING_SOFTWARE;
-
- ts_info->phc_index = -1;
-
- return 0;
-};
-
static const struct ethtool_ops dummy_ethtool_ops = {
.get_drvinfo = dummy_get_drvinfo,
- .get_ts_info = dummy_get_ts_info,
+ .get_ts_info = ethtool_op_get_ts_info,
};
static void dummy_setup(struct net_device *dev)
}
}
- if (netif_xmit_stopped(txq) || !skb->xmit_more) {
+ if (netif_xmit_stopped(txq) || !netdev_xmit_more()) {
/* trigger the dma engine. ena_com_write_sq_doorbell()
* has a mb
*/
}
static u16 ena_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
u16 qid;
/* we suspect that this is good for in--kernel network services that
if (skb_rx_queue_recorded(skb))
qid = skb_get_rx_queue(skb);
else
- qid = fallback(dev, skb, NULL);
+ qid = netdev_pick_tx(dev, skb, NULL);
return qid;
}
smp_wmb();
ring->cur = cur_index + 1;
- if (!packet->skb->xmit_more ||
+ if (!netdev_xmit_more() ||
netif_xmit_stopped(netdev_get_tx_queue(pdata->netdev,
channel->queue_index)))
xgbe_tx_start_xmit(channel, ring);
config AQTION
tristate "aQuantia AQtion(tm) Support"
- depends on PCI && X86_64
+ depends on PCI
+ depends on X86_64 || ARM64 || COMPILE_TEST
---help---
This enables the support for the aQuantia AQtion(tm) Ethernet card.
#define AQ_CFG_TCS_DEF 1U
#define AQ_CFG_TXDS_DEF 4096U
-#define AQ_CFG_RXDS_DEF 1024U
+#define AQ_CFG_RXDS_DEF 2048U
#define AQ_CFG_IS_POLLING_DEF 0U
#define AQ_CFG_TCS_MAX 8U
#define AQ_CFG_TX_FRAME_MAX (16U * 1024U)
-#define AQ_CFG_RX_FRAME_MAX (4U * 1024U)
+#define AQ_CFG_RX_FRAME_MAX (2U * 1024U)
#define AQ_CFG_TX_CLEAN_BUDGET 256U
+#define AQ_CFG_RX_REFILL_THRES 32U
+
+#define AQ_CFG_RX_HDR_SIZE 256U
+
+#define AQ_CFG_RX_PAGEORDER 0U
+
/* LRO */
#define AQ_CFG_IS_LRO_DEF 1U
cfg->tx_itr = aq_itr_tx;
cfg->rx_itr = aq_itr_rx;
+ cfg->rxpageorder = AQ_CFG_RX_PAGEORDER;
cfg->is_rss = AQ_CFG_IS_RSS_DEF;
cfg->num_rss_queues = AQ_CFG_NUM_RSS_QUEUES_DEF;
cfg->aq_rss.base_cpu_number = AQ_CFG_RSS_BASE_CPU_NUM_DEF;
u32 itr;
u16 rx_itr;
u16 tx_itr;
+ u32 rxpageorder;
u32 num_rss_queues;
u32 mtu;
u32 flow_control;
#include "aq_ring.h"
#include "aq_nic.h"
#include "aq_hw.h"
+#include "aq_hw_utils.h"
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
+static inline void aq_free_rxpage(struct aq_rxpage *rxpage, struct device *dev)
+{
+ unsigned int len = PAGE_SIZE << rxpage->order;
+
+ dma_unmap_page(dev, rxpage->daddr, len, DMA_FROM_DEVICE);
+
+ /* Drop the ref for being in the ring. */
+ __free_pages(rxpage->page, rxpage->order);
+ rxpage->page = NULL;
+}
+
+static int aq_get_rxpage(struct aq_rxpage *rxpage, unsigned int order,
+ struct device *dev)
+{
+ struct page *page;
+ dma_addr_t daddr;
+ int ret = -ENOMEM;
+
+ page = dev_alloc_pages(order);
+ if (unlikely(!page))
+ goto err_exit;
+
+ daddr = dma_map_page(dev, page, 0, PAGE_SIZE << order,
+ DMA_FROM_DEVICE);
+
+ if (unlikely(dma_mapping_error(dev, daddr)))
+ goto free_page;
+
+ rxpage->page = page;
+ rxpage->daddr = daddr;
+ rxpage->order = order;
+ rxpage->pg_off = 0;
+
+ return 0;
+
+free_page:
+ __free_pages(page, order);
+
+err_exit:
+ return ret;
+}
+
+static int aq_get_rxpages(struct aq_ring_s *self, struct aq_ring_buff_s *rxbuf,
+ int order)
+{
+ int ret;
+
+ if (rxbuf->rxdata.page) {
+ /* One means ring is the only user and can reuse */
+ if (page_ref_count(rxbuf->rxdata.page) > 1) {
+ /* Try reuse buffer */
+ rxbuf->rxdata.pg_off += AQ_CFG_RX_FRAME_MAX;
+ if (rxbuf->rxdata.pg_off + AQ_CFG_RX_FRAME_MAX <=
+ (PAGE_SIZE << order)) {
+ self->stats.rx.pg_flips++;
+ } else {
+ /* Buffer exhausted. We have other users and
+ * should release this page and realloc
+ */
+ aq_free_rxpage(&rxbuf->rxdata,
+ aq_nic_get_dev(self->aq_nic));
+ self->stats.rx.pg_losts++;
+ }
+ } else {
+ rxbuf->rxdata.pg_off = 0;
+ self->stats.rx.pg_reuses++;
+ }
+ }
+
+ if (!rxbuf->rxdata.page) {
+ ret = aq_get_rxpage(&rxbuf->rxdata, order,
+ aq_nic_get_dev(self->aq_nic));
+ return ret;
+ }
+
+ return 0;
+}
+
static struct aq_ring_s *aq_ring_alloc(struct aq_ring_s *self,
struct aq_nic_s *aq_nic)
{
self->idx = idx;
self->size = aq_nic_cfg->rxds;
self->dx_size = aq_nic_cfg->aq_hw_caps->rxd_size;
+ self->page_order = fls(AQ_CFG_RX_FRAME_MAX / PAGE_SIZE +
+ (AQ_CFG_RX_FRAME_MAX % PAGE_SIZE ? 1 : 0)) - 1;
+
+ if (aq_nic_cfg->rxpageorder > self->page_order)
+ self->page_order = aq_nic_cfg->rxpageorder;
self = aq_ring_alloc(self, aq_nic);
if (!self) {
int budget)
{
struct net_device *ndev = aq_nic_get_ndev(self->aq_nic);
- int err = 0;
bool is_rsc_completed = true;
+ int err = 0;
for (; (self->sw_head != self->hw_head) && budget;
self->sw_head = aq_ring_next_dx(self, self->sw_head),
--budget, ++(*work_done)) {
struct aq_ring_buff_s *buff = &self->buff_ring[self->sw_head];
+ struct aq_ring_buff_s *buff_ = NULL;
struct sk_buff *skb = NULL;
unsigned int next_ = 0U;
unsigned int i = 0U;
- struct aq_ring_buff_s *buff_ = NULL;
+ u16 hdr_len;
- if (buff->is_error) {
- __free_pages(buff->page, 0);
+ if (buff->is_error)
continue;
- }
if (buff->is_cleaned)
continue;
}
}
+ dma_sync_single_range_for_cpu(aq_nic_get_dev(self->aq_nic),
+ buff->rxdata.daddr,
+ buff->rxdata.pg_off,
+ buff->len, DMA_FROM_DEVICE);
+
/* for single fragment packets use build_skb() */
if (buff->is_eop &&
buff->len <= AQ_CFG_RX_FRAME_MAX - AQ_SKB_ALIGN) {
- skb = build_skb(page_address(buff->page),
+ skb = build_skb(aq_buf_vaddr(&buff->rxdata),
AQ_CFG_RX_FRAME_MAX);
if (unlikely(!skb)) {
err = -ENOMEM;
goto err_exit;
}
-
skb_put(skb, buff->len);
+ page_ref_inc(buff->rxdata.page);
} else {
- skb = netdev_alloc_skb(ndev, ETH_HLEN);
+ skb = napi_alloc_skb(napi, AQ_CFG_RX_HDR_SIZE);
if (unlikely(!skb)) {
err = -ENOMEM;
goto err_exit;
}
- skb_put(skb, ETH_HLEN);
- memcpy(skb->data, page_address(buff->page), ETH_HLEN);
- skb_add_rx_frag(skb, 0, buff->page, ETH_HLEN,
- buff->len - ETH_HLEN,
- SKB_TRUESIZE(buff->len - ETH_HLEN));
+ hdr_len = buff->len;
+ if (hdr_len > AQ_CFG_RX_HDR_SIZE)
+ hdr_len = eth_get_headlen(aq_buf_vaddr(&buff->rxdata),
+ AQ_CFG_RX_HDR_SIZE);
+
+ memcpy(__skb_put(skb, hdr_len), aq_buf_vaddr(&buff->rxdata),
+ ALIGN(hdr_len, sizeof(long)));
+
+ if (buff->len - hdr_len > 0) {
+ skb_add_rx_frag(skb, 0, buff->rxdata.page,
+ buff->rxdata.pg_off + hdr_len,
+ buff->len - hdr_len,
+ AQ_CFG_RX_FRAME_MAX);
+ page_ref_inc(buff->rxdata.page);
+ }
if (!buff->is_eop) {
- for (i = 1U, next_ = buff->next,
- buff_ = &self->buff_ring[next_];
- true; next_ = buff_->next,
- buff_ = &self->buff_ring[next_], ++i) {
- skb_add_rx_frag(skb, i,
- buff_->page, 0,
+ buff_ = buff;
+ i = 1U;
+ do {
+ next_ = buff_->next,
+ buff_ = &self->buff_ring[next_];
+
+ dma_sync_single_range_for_cpu(
+ aq_nic_get_dev(self->aq_nic),
+ buff_->rxdata.daddr,
+ buff_->rxdata.pg_off,
buff_->len,
- SKB_TRUESIZE(buff->len -
- ETH_HLEN));
+ DMA_FROM_DEVICE);
+ skb_add_rx_frag(skb, i++,
+ buff_->rxdata.page,
+ buff_->rxdata.pg_off,
+ buff_->len,
+ AQ_CFG_RX_FRAME_MAX);
+ page_ref_inc(buff_->rxdata.page);
buff_->is_cleaned = 1;
-
- if (buff_->is_eop)
- break;
- }
+ } while (!buff_->is_eop);
}
}
int aq_ring_rx_fill(struct aq_ring_s *self)
{
- unsigned int pages_order = fls(AQ_CFG_RX_FRAME_MAX / PAGE_SIZE +
- (AQ_CFG_RX_FRAME_MAX % PAGE_SIZE ? 1 : 0)) - 1;
+ unsigned int page_order = self->page_order;
struct aq_ring_buff_s *buff = NULL;
int err = 0;
int i = 0;
+ if (aq_ring_avail_dx(self) < min_t(unsigned int, AQ_CFG_RX_REFILL_THRES,
+ self->size / 2))
+ return err;
+
for (i = aq_ring_avail_dx(self); i--;
self->sw_tail = aq_ring_next_dx(self, self->sw_tail)) {
buff = &self->buff_ring[self->sw_tail];
buff->flags = 0U;
buff->len = AQ_CFG_RX_FRAME_MAX;
- buff->page = alloc_pages(GFP_ATOMIC | __GFP_COMP, pages_order);
- if (!buff->page) {
- err = -ENOMEM;
+ err = aq_get_rxpages(self, buff, page_order);
+ if (err)
goto err_exit;
- }
-
- buff->pa = dma_map_page(aq_nic_get_dev(self->aq_nic),
- buff->page, 0,
- AQ_CFG_RX_FRAME_MAX, DMA_FROM_DEVICE);
-
- if (dma_mapping_error(aq_nic_get_dev(self->aq_nic), buff->pa)) {
- err = -ENOMEM;
- goto err_exit;
- }
+ buff->pa = aq_buf_daddr(&buff->rxdata);
buff = NULL;
}
err_exit:
- if (err < 0) {
- if (buff && buff->page)
- __free_pages(buff->page, 0);
- }
-
return err;
}
self->sw_head = aq_ring_next_dx(self, self->sw_head)) {
struct aq_ring_buff_s *buff = &self->buff_ring[self->sw_head];
- dma_unmap_page(aq_nic_get_dev(self->aq_nic), buff->pa,
- AQ_CFG_RX_FRAME_MAX, DMA_FROM_DEVICE);
-
- __free_pages(buff->page, 0);
+ aq_free_rxpage(&buff->rxdata, aq_nic_get_dev(self->aq_nic));
}
err_exit:;
struct page;
struct aq_nic_cfg_s;
+struct aq_rxpage {
+ struct page *page;
+ dma_addr_t daddr;
+ unsigned int order;
+ unsigned int pg_off;
+};
+
/* TxC SOP DX EOP
* +----------+----------+----------+-----------
* 8bytes|len l3,l4 | pa | pa | pa
*/
struct __packed aq_ring_buff_s {
union {
+ /* RX/TX */
+ dma_addr_t pa;
/* RX */
struct {
u32 rss_hash;
u16 next;
u8 is_hash_l4;
u8 rsvd1;
- struct page *page;
+ struct aq_rxpage rxdata;
};
/* EOP */
struct {
dma_addr_t pa_eop;
struct sk_buff *skb;
};
- /* DX */
- struct {
- dma_addr_t pa;
- };
- /* SOP */
- struct {
- dma_addr_t pa_sop;
- u32 len_pkt_sop;
- };
/* TxC */
struct {
u32 mss;
u64 bytes;
u64 lro_packets;
u64 jumbo_packets;
+ u64 pg_losts;
+ u64 pg_flips;
+ u64 pg_reuses;
};
struct aq_ring_stats_tx_s {
unsigned int size; /* descriptors number */
unsigned int dx_size; /* TX or RX descriptor size, */
/* stored here for fater math */
+ unsigned int page_order;
union aq_ring_stats_s stats;
dma_addr_t dx_ring_pa;
};
cpumask_t affinity_mask;
};
+static inline void *aq_buf_vaddr(struct aq_rxpage *rxpage)
+{
+ return page_to_virt(rxpage->page) + rxpage->pg_off;
+}
+
+static inline dma_addr_t aq_buf_daddr(struct aq_rxpage *rxpage)
+{
+ return rxpage->daddr + rxpage->pg_off;
+}
+
static inline unsigned int aq_ring_next_dx(struct aq_ring_s *self,
unsigned int dx)
{
stats_rx->errors += rx->errors;
stats_rx->jumbo_packets += rx->jumbo_packets;
stats_rx->lro_packets += rx->lro_packets;
+ stats_rx->pg_losts += rx->pg_losts;
+ stats_rx->pg_flips += rx->pg_flips;
+ stats_rx->pg_reuses += rx->pg_reuses;
stats_tx->packets += tx->packets;
stats_tx->bytes += tx->bytes;
static int hw_atl_a0_hw_ring_rx_receive(struct aq_hw_s *self,
struct aq_ring_s *ring)
{
- struct device *ndev = aq_nic_get_dev(ring->aq_nic);
-
for (; ring->hw_head != ring->sw_tail;
ring->hw_head = aq_ring_next_dx(ring, ring->hw_head)) {
struct aq_ring_buff_s *buff = NULL;
is_err &= ~0x18U;
is_err &= ~0x04U;
- dma_unmap_page(ndev, buff->pa, buff->len, DMA_FROM_DEVICE);
-
if (is_err || rxd_wb->type & 0x1000U) {
/* status error or DMA error */
buff->is_error = 1U;
hw_atl_rpo_lro_time_base_divider_set(self, 0x61AU);
hw_atl_rpo_lro_inactive_interval_set(self, 0);
- hw_atl_rpo_lro_max_coalescing_interval_set(self, 2);
+ /* the LRO timebase divider is 5 uS (0x61a),
+ * which is multiplied by 50(0x32)
+ * to get a maximum coalescing interval of 250 uS,
+ * which is the default value
+ */
+ hw_atl_rpo_lro_max_coalescing_interval_set(self, 50);
+
hw_atl_rpo_lro_qsessions_lim_set(self, 1U);
hw_atl_rpo_lro_en_set(self,
aq_nic_cfg->is_lro ? 0xFFFFFFFFU : 0U);
+ hw_atl_itr_rsc_en_set(self,
+ aq_nic_cfg->is_lro ? 0xFFFFFFFFU : 0U);
+
+ hw_atl_itr_rsc_delay_set(self, 1U);
}
return aq_hw_err_from_flags(self);
}
static int hw_atl_b0_hw_ring_rx_receive(struct aq_hw_s *self,
struct aq_ring_s *ring)
{
- struct device *ndev = aq_nic_get_dev(ring->aq_nic);
-
for (; ring->hw_head != ring->sw_tail;
ring->hw_head = aq_ring_next_dx(ring, ring->hw_head)) {
struct aq_ring_buff_s *buff = NULL;
buff->is_cso_err = 0U;
}
- dma_unmap_page(ndev, buff->pa, buff->len, DMA_FROM_DEVICE);
-
if ((rx_stat & BIT(0)) || rxd_wb->type & 0x1000U) {
/* MAC error or DMA error */
buff->is_error = 1U;
#define HW_ATL_B0_TC_MAX 1U
#define HW_ATL_B0_RSS_MAX 8U
-#define HW_ATL_B0_LRO_RXD_MAX 2U
+#define HW_ATL_B0_LRO_RXD_MAX 16U
#define HW_ATL_B0_RS_SLIP_ENABLED 0U
/* (256k -1(max pay_len) - 54(header)) */
HW_ATL_ITR_RES_SHIFT, res_irq);
}
+/* set RSC interrupt */
+void hw_atl_itr_rsc_en_set(struct aq_hw_s *aq_hw, u32 enable)
+{
+ aq_hw_write_reg(aq_hw, HW_ATL_ITR_RSC_EN_ADR, enable);
+}
+
+/* set RSC delay */
+void hw_atl_itr_rsc_delay_set(struct aq_hw_s *aq_hw, u32 delay)
+{
+ aq_hw_write_reg_bit(aq_hw, HW_ATL_ITR_RSC_DELAY_ADR,
+ HW_ATL_ITR_RSC_DELAY_MSK,
+ HW_ATL_ITR_RSC_DELAY_SHIFT,
+ delay);
+}
+
/* rdm */
void hw_atl_rdm_cpu_id_set(struct aq_hw_s *aq_hw, u32 cpuid, u32 dca)
{
/* set reset interrupt */
void hw_atl_itr_res_irq_set(struct aq_hw_s *aq_hw, u32 res_irq);
+/* set RSC interrupt */
+void hw_atl_itr_rsc_en_set(struct aq_hw_s *aq_hw, u32 enable);
+
+/* set RSC delay */
+void hw_atl_itr_rsc_delay_set(struct aq_hw_s *aq_hw, u32 delay);
+
/* rdm */
/* set cpu id */
#define HW_ATL_ITR_RES_MSK 0x80000000
/* lower bit position of bitfield itr_reset */
#define HW_ATL_ITR_RES_SHIFT 31
+
+/* register address for bitfield rsc_en */
+#define HW_ATL_ITR_RSC_EN_ADR 0x00002200
+
+/* register address for bitfield rsc_delay */
+#define HW_ATL_ITR_RSC_DELAY_ADR 0x00002204
+/* bitmask for bitfield rsc_delay */
+#define HW_ATL_ITR_RSC_DELAY_MSK 0x0000000f
+/* width of bitfield rsc_delay */
+#define HW_ATL_ITR_RSC_DELAY_WIDTH 4
+/* lower bit position of bitfield rsc_delay */
+#define HW_ATL_ITR_RSC_DELAY_SHIFT 0
+
/* register address for bitfield dca{d}_cpuid[7:0] */
#define HW_ATL_RDM_DCADCPUID_ADR(dca) (0x00006100 + (dca) * 0x4)
/* bitmask for bitfield dca{d}_cpuid[7:0] */
unsigned int dma_len;
unsigned int align;
unsigned int next;
+ bool xmit_more;
if (atomic_read(&priv->tx_free) <= NB8800_DESC_LOW) {
netif_stop_queue(dev);
return NETDEV_TX_OK;
}
+ xmit_more = netdev_xmit_more();
if (atomic_dec_return(&priv->tx_free) <= NB8800_DESC_LOW) {
netif_stop_queue(dev);
- skb->xmit_more = 0;
+ xmit_more = false;
}
next = priv->tx_next;
desc->n_addr = priv->tx_bufs[next].dma_desc;
desc->config = DESC_BTS(2) | DESC_DS | DESC_EOF | dma_len;
- if (!skb->xmit_more)
+ if (!xmit_more)
desc->config |= DESC_EOC;
txb->skb = skb;
priv->tx_next = next;
- if (!skb->xmit_more) {
+ if (!xmit_more) {
smp_wmb();
priv->tx_chain->ready = true;
priv->tx_chain = NULL;
depends on PCI
select FW_LOADER
select LIBCRC32C
+ select NET_DEVLINK
---help---
This driver supports Broadcom NetXtreme-C/E 10/25/40/50 gigabit
Ethernet cards. To compile this driver as a module, choose M here:
};
static u16 bcm_sysport_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
struct bcm_sysport_priv *priv = netdev_priv(dev);
u16 queue = skb_get_queue_mapping(skb);
unsigned int q, port;
if (!netdev_uses_dsa(dev))
- return fallback(dev, skb, NULL);
+ return netdev_pick_tx(dev, skb, NULL);
/* DSA tagging layer will have configured the correct queue */
q = BRCM_TAG_GET_QUEUE(queue);
tx_ring = priv->ring_map[q + port * priv->per_port_num_tx_queues];
if (unlikely(!tx_ring))
- return fallback(dev, skb, NULL);
+ return netdev_pick_tx(dev, skb, NULL);
return tx_ring->index;
}
priv->rev = topctrl_readl(priv, REV_CNTL) & REV_MASK;
dev_info(&pdev->dev,
- "Broadcom SYSTEMPORT%s" REV_FMT
- " at 0x%p (irqs: %d, %d, TXQs: %d, RXQs: %d)\n",
+ "Broadcom SYSTEMPORT%s " REV_FMT
+ " (irqs: %d, %d, TXQs: %d, RXQs: %d)\n",
priv->is_lite ? " Lite" : "",
(priv->rev >> 8) & 0xff, priv->rev & 0xff,
- priv->base, priv->irq0, priv->irq1, txq, rxq);
+ priv->irq0, priv->irq1, txq, rxq);
return 0;
}
u16 bnx2x_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
struct bnx2x *bp = netdev_priv(dev);
}
/* select a non-FCoE queue */
- return fallback(dev, skb, NULL) %
+ return netdev_pick_tx(dev, skb, NULL) %
(BNX2X_NUM_ETH_QUEUES(bp) * bp->max_cos);
}
/* select_queue callback */
u16 bnx2x_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback);
+ struct net_device *sb_dev);
static inline void bnx2x_update_rx_prod(struct bnx2x *bp,
struct bnx2x_fastpath *fp,
#define BCM_5710_FW_MAJOR_VERSION 7
#define BCM_5710_FW_MINOR_VERSION 13
-#define BCM_5710_FW_REVISION_VERSION 1
+#define BCM_5710_FW_REVISION_VERSION 11
#define BCM_5710_FW_ENGINEERING_VERSION 0
#define BCM_5710_FW_COMPILE_FLAGS 1
#define CLIENT_INIT_RX_DATA_TPA_EN_IPV6_SHIFT 1
#define CLIENT_INIT_RX_DATA_TPA_MODE (0x1<<2)
#define CLIENT_INIT_RX_DATA_TPA_MODE_SHIFT 2
-#define CLIENT_INIT_RX_DATA_RESERVED5 (0x1F<<3)
-#define CLIENT_INIT_RX_DATA_RESERVED5_SHIFT 3
+#define CLIENT_INIT_RX_DATA_TPA_OVER_VLAN_DISABLE (0x1<<3)
+#define CLIENT_INIT_RX_DATA_TPA_OVER_VLAN_DISABLE_SHIFT 3
+#define CLIENT_INIT_RX_DATA_RESERVED5 (0xF<<4)
+#define CLIENT_INIT_RX_DATA_RESERVED5_SHIFT 4
u8 vmqueue_mode_en_flg;
u8 extra_data_over_sgl_en_flg;
u8 cache_line_alignment_log_size;
*/
struct eth_classify_header {
u8 rule_cnt;
- u8 reserved0;
+ u8 warning_on_error;
__le16 reserved1;
__le32 echo;
};
__le32 sge_page_base_hi;
__le16 sge_pause_thr_low;
__le16 sge_pause_thr_high;
+ u8 tpa_over_vlan_disable;
+ u8 reserved[7];
};
u32 upper_bound;
u32 fair_threshold;
u32 fairness_timeout;
- u32 reserved0;
+ u32 size_thr;
};
/*
u8 sd_vlan_force_pri_val;
u8 c2s_pri_tt_valid;
u8 c2s_pri_default;
- u8 reserved2[6];
+ u8 tx_vlan_filtering_enable;
+ u8 tx_vlan_filtering_use_pvid;
+ u8 reserved2[4];
struct c2s_pri_trans_table_entry c2s_pri_trans_table;
};
u8 reserved1;
__le16 sd_vlan_tag;
__le16 sd_vlan_eth_type;
- __le16 reserved0;
+ u8 tx_vlan_filtering_pvid_change_flg;
+ u8 reserved0;
__le32 reserved2;
};
return 0;
}
+#define BNX2X_P2P_DETECT_PARAM_MASK 0x5F5
+#define BNX2X_P2P_DETECT_RULE_MASK 0x3DBB
+#define BNX2X_PTP_TX_ON_PARAM_MASK (BNX2X_P2P_DETECT_PARAM_MASK & 0x6AA)
+#define BNX2X_PTP_TX_ON_RULE_MASK (BNX2X_P2P_DETECT_RULE_MASK & 0x3EEE)
+#define BNX2X_PTP_V1_L4_PARAM_MASK (BNX2X_P2P_DETECT_PARAM_MASK & 0x7EE)
+#define BNX2X_PTP_V1_L4_RULE_MASK (BNX2X_P2P_DETECT_RULE_MASK & 0x3FFE)
+#define BNX2X_PTP_V2_L4_PARAM_MASK (BNX2X_P2P_DETECT_PARAM_MASK & 0x7EA)
+#define BNX2X_PTP_V2_L4_RULE_MASK (BNX2X_P2P_DETECT_RULE_MASK & 0x3FEE)
+#define BNX2X_PTP_V2_L2_PARAM_MASK (BNX2X_P2P_DETECT_PARAM_MASK & 0x6BF)
+#define BNX2X_PTP_V2_L2_RULE_MASK (BNX2X_P2P_DETECT_RULE_MASK & 0x3EFF)
+#define BNX2X_PTP_V2_PARAM_MASK (BNX2X_P2P_DETECT_PARAM_MASK & 0x6AA)
+#define BNX2X_PTP_V2_RULE_MASK (BNX2X_P2P_DETECT_RULE_MASK & 0x3EEE)
+
int bnx2x_configure_ptp_filters(struct bnx2x *bp)
{
int port = BP_PORT(bp);
+ u32 param, rule;
int rc;
if (!bp->hwtstamp_ioctl_called)
return 0;
+ param = port ? NIG_REG_P1_TLLH_PTP_PARAM_MASK :
+ NIG_REG_P0_TLLH_PTP_PARAM_MASK;
+ rule = port ? NIG_REG_P1_TLLH_PTP_RULE_MASK :
+ NIG_REG_P0_TLLH_PTP_RULE_MASK;
switch (bp->tx_type) {
case HWTSTAMP_TX_ON:
bp->flags |= TX_TIMESTAMPING_EN;
- REG_WR(bp, port ? NIG_REG_P1_TLLH_PTP_PARAM_MASK :
- NIG_REG_P0_TLLH_PTP_PARAM_MASK, 0x6AA);
- REG_WR(bp, port ? NIG_REG_P1_TLLH_PTP_RULE_MASK :
- NIG_REG_P0_TLLH_PTP_RULE_MASK, 0x3EEE);
+ REG_WR(bp, param, BNX2X_PTP_TX_ON_PARAM_MASK);
+ REG_WR(bp, rule, BNX2X_PTP_TX_ON_RULE_MASK);
break;
case HWTSTAMP_TX_ONESTEP_SYNC:
BNX2X_ERR("One-step timestamping is not supported\n");
return -ERANGE;
}
+ param = port ? NIG_REG_P1_LLH_PTP_PARAM_MASK :
+ NIG_REG_P0_LLH_PTP_PARAM_MASK;
+ rule = port ? NIG_REG_P1_LLH_PTP_RULE_MASK :
+ NIG_REG_P0_LLH_PTP_RULE_MASK;
switch (bp->rx_filter) {
case HWTSTAMP_FILTER_NONE:
break;
case HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ:
bp->rx_filter = HWTSTAMP_FILTER_PTP_V1_L4_EVENT;
/* Initialize PTP detection for UDP/IPv4 events */
- REG_WR(bp, port ? NIG_REG_P1_LLH_PTP_PARAM_MASK :
- NIG_REG_P0_LLH_PTP_PARAM_MASK, 0x7EE);
- REG_WR(bp, port ? NIG_REG_P1_LLH_PTP_RULE_MASK :
- NIG_REG_P0_LLH_PTP_RULE_MASK, 0x3FFE);
+ REG_WR(bp, param, BNX2X_PTP_V1_L4_PARAM_MASK);
+ REG_WR(bp, rule, BNX2X_PTP_V1_L4_RULE_MASK);
break;
case HWTSTAMP_FILTER_PTP_V2_L4_EVENT:
case HWTSTAMP_FILTER_PTP_V2_L4_SYNC:
case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ:
bp->rx_filter = HWTSTAMP_FILTER_PTP_V2_L4_EVENT;
/* Initialize PTP detection for UDP/IPv4 or UDP/IPv6 events */
- REG_WR(bp, port ? NIG_REG_P1_LLH_PTP_PARAM_MASK :
- NIG_REG_P0_LLH_PTP_PARAM_MASK, 0x7EA);
- REG_WR(bp, port ? NIG_REG_P1_LLH_PTP_RULE_MASK :
- NIG_REG_P0_LLH_PTP_RULE_MASK, 0x3FEE);
+ REG_WR(bp, param, BNX2X_PTP_V2_L4_PARAM_MASK);
+ REG_WR(bp, rule, BNX2X_PTP_V2_L4_RULE_MASK);
break;
case HWTSTAMP_FILTER_PTP_V2_L2_EVENT:
case HWTSTAMP_FILTER_PTP_V2_L2_SYNC:
case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ:
bp->rx_filter = HWTSTAMP_FILTER_PTP_V2_L2_EVENT;
/* Initialize PTP detection L2 events */
- REG_WR(bp, port ? NIG_REG_P1_LLH_PTP_PARAM_MASK :
- NIG_REG_P0_LLH_PTP_PARAM_MASK, 0x6BF);
- REG_WR(bp, port ? NIG_REG_P1_LLH_PTP_RULE_MASK :
- NIG_REG_P0_LLH_PTP_RULE_MASK, 0x3EFF);
+ REG_WR(bp, param, BNX2X_PTP_V2_L2_PARAM_MASK);
+ REG_WR(bp, rule, BNX2X_PTP_V2_L2_RULE_MASK);
break;
case HWTSTAMP_FILTER_PTP_V2_EVENT:
case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
bp->rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT;
/* Initialize PTP detection L2, UDP/IPv4 or UDP/IPv6 events */
- REG_WR(bp, port ? NIG_REG_P1_LLH_PTP_PARAM_MASK :
- NIG_REG_P0_LLH_PTP_PARAM_MASK, 0x6AA);
- REG_WR(bp, port ? NIG_REG_P1_LLH_PTP_RULE_MASK :
- NIG_REG_P0_LLH_PTP_RULE_MASK, 0x3EEE);
+ REG_WR(bp, param, BNX2X_PTP_V2_PARAM_MASK);
+ REG_WR(bp, rule, BNX2X_PTP_V2_RULE_MASK);
break;
}
prod = NEXT_TX(prod);
txr->tx_prod = prod;
- if (!skb->xmit_more || netif_xmit_stopped(txq))
+ if (!netdev_xmit_more() || netif_xmit_stopped(txq))
bnxt_db_write(bp, &txr->tx_db, prod);
tx_done:
mmiowb();
if (unlikely(bnxt_tx_avail(bp, txr) <= MAX_SKB_FRAGS + 1)) {
- if (skb->xmit_more && !tx_buf->is_push)
+ if (netdev_xmit_more() && !tx_buf->is_push)
bnxt_db_write(bp, &txr->tx_db, prod);
netif_tx_stop_queue(txq);
return rc;
}
-static int bnxt_get_phys_port_name(struct net_device *dev, char *buf,
- size_t len)
-{
- struct bnxt *bp = netdev_priv(dev);
- int rc;
-
- /* The PF and it's VF-reps only support the switchdev framework */
- if (!BNXT_PF(bp))
- return -EOPNOTSUPP;
-
- rc = snprintf(buf, len, "p%d", bp->pf.port_id);
-
- if (rc >= len)
- return -EOPNOTSUPP;
- return 0;
-}
-
int bnxt_get_port_parent_id(struct net_device *dev,
struct netdev_phys_item_id *ppid)
{
return 0;
}
+static struct devlink_port *bnxt_get_devlink_port(struct net_device *dev)
+{
+ struct bnxt *bp = netdev_priv(dev);
+
+ return &bp->dl_port;
+}
+
static const struct net_device_ops bnxt_netdev_ops = {
.ndo_open = bnxt_open,
.ndo_start_xmit = bnxt_start_xmit,
.ndo_bpf = bnxt_xdp,
.ndo_bridge_getlink = bnxt_bridge_getlink,
.ndo_bridge_setlink = bnxt_bridge_setlink,
- .ndo_get_port_parent_id = bnxt_get_port_parent_id,
- .ndo_get_phys_port_name = bnxt_get_phys_port_name
+ .ndo_get_devlink_port = bnxt_get_devlink_port,
};
static void bnxt_remove_one(struct pci_dev *pdev)
return rc;
}
+static int bnxt_pcie_dsn_get(struct bnxt *bp, u8 dsn[])
+{
+ struct pci_dev *pdev = bp->pdev;
+ int pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_DSN);
+ u32 dw;
+
+ if (!pos) {
+ netdev_info(bp->dev, "Unable do read adapter's DSN");
+ return -EOPNOTSUPP;
+ }
+
+ /* DSN (two dw) is at an offset of 4 from the cap pos */
+ pos += 4;
+ pci_read_config_dword(pdev, pos, &dw);
+ put_unaligned_le32(dw, &dsn[0]);
+ pci_read_config_dword(pdev, pos + 4, &dw);
+ put_unaligned_le32(dw, &dsn[4]);
+ return 0;
+}
+
static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
{
static int version_printed;
goto init_err_pci_clean;
}
+ /* Read the adapter's DSN to use as the eswitch switch_id */
+ rc = bnxt_pcie_dsn_get(bp, bp->switch_id);
+ if (rc)
+ goto init_err_pci_clean;
+
bnxt_hwrm_func_qcfg(bp);
bnxt_hwrm_vnic_qcaps(bp);
bnxt_hwrm_port_led_qcaps(bp);
#include <linux/pci.h>
#include <linux/netdevice.h>
+#include <net/devlink.h>
#include "bnxt_hsi.h"
#include "bnxt.h"
#include "bnxt_vfr.h"
goto err_dl_unreg;
}
+ devlink_port_attrs_set(&bp->dl_port, DEVLINK_PORT_FLAVOUR_PHYSICAL,
+ bp->pf.port_id, false, 0,
+ bp->switch_id, sizeof(bp->switch_id));
rc = devlink_port_register(dl, &bp->dl_port, bp->pf.port_id);
if (rc) {
netdev_err(bp->dev, "devlink_port_register failed");
dev->min_mtu = ETH_ZLEN;
}
-static int bnxt_pcie_dsn_get(struct bnxt *bp, u8 dsn[])
-{
- struct pci_dev *pdev = bp->pdev;
- int pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_DSN);
- u32 dw;
-
- if (!pos) {
- netdev_info(bp->dev, "Unable do read adapter's DSN");
- return -EOPNOTSUPP;
- }
-
- /* DSN (two dw) is at an offset of 4 from the cap pos */
- pos += 4;
- pci_read_config_dword(pdev, pos, &dw);
- put_unaligned_le32(dw, &dsn[0]);
- pci_read_config_dword(pdev, pos + 4, &dw);
- put_unaligned_le32(dw, &dsn[4]);
- return 0;
-}
-
static int bnxt_vf_reps_create(struct bnxt *bp)
{
u16 *cfa_code_map = NULL, num_vfs = pci_num_vf(bp->pdev);
}
}
- /* Read the adapter's DSN to use as the eswitch switch_id */
- rc = bnxt_pcie_dsn_get(bp, bp->switch_id);
- if (rc)
- goto err;
-
/* publish cfa_code_map only after all VF-reps have been initialized */
bp->cfa_code_map = cfa_code_map;
bp->eswitch_mode = DEVLINK_ESWITCH_MODE_SWITCHDEV;
if (ring->free_bds <= (MAX_SKB_FRAGS + 1))
netif_tx_stop_queue(txq);
- if (!skb->xmit_more || netif_xmit_stopped(txq))
+ if (!netdev_xmit_more() || netif_xmit_stopped(txq))
/* Packets are ready, update producer index */
bcmgenet_tdma_ring_writel(priv, ring->index,
ring->prod_index, TDMA_PROD_INDEX);
netif_tx_wake_queue(txq);
}
- if (!skb->xmit_more || netif_xmit_stopped(txq)) {
+ if (!netdev_xmit_more() || netif_xmit_stopped(txq)) {
/* Packets are ready, update Tx producer idx on card. */
tw32_tx_mbox(tnapi->prodmbox, entry);
mmiowb();
{
struct tg3 *tp = netdev_priv(dev);
- if (!netif_running(tp->dev))
- return -EAGAIN;
-
switch (state) {
case ETHTOOL_ID_ACTIVE:
return 1; /* cycle on/off once per second */
static int __maybe_unused macb_runtime_suspend(struct device *dev)
{
- struct platform_device *pdev = to_platform_device(dev);
- struct net_device *netdev = platform_get_drvdata(pdev);
+ struct net_device *netdev = dev_get_drvdata(dev);
struct macb *bp = netdev_priv(netdev);
if (!(device_may_wakeup(&bp->dev->dev))) {
static int __maybe_unused macb_runtime_resume(struct device *dev)
{
- struct platform_device *pdev = to_platform_device(dev);
- struct net_device *netdev = platform_get_drvdata(pdev);
+ struct net_device *netdev = dev_get_drvdata(dev);
struct macb *bp = netdev_priv(netdev);
if (!(device_may_wakeup(&bp->dev->dev))) {
imply PTP_1588_CLOCK
select FW_LOADER
select LIBCRC32C
+ select NET_DEVLINK
---help---
This driver supports Cavium LiquidIO Intelligent Server Adapters
based on CN66XX, CN68XX and CN23XX chips.
irh->vlan = skb_vlan_tag_get(skb) & 0xfff;
}
- xmit_more = skb->xmit_more;
+ xmit_more = netdev_xmit_more();
if (unlikely(cmdsetup.s.timestamp))
status = send_nic_timestamp_pkt(oct, &ndata, finfo, xmit_more);
irh->vlan = skb_vlan_tag_get(skb) & VLAN_VID_MASK;
}
- xmit_more = skb->xmit_more;
+ xmit_more = netdev_xmit_more();
if (unlikely(cmdsetup.s.timestamp))
status = send_nic_timestamp_pkt(oct, &ndata, finfo, xmit_more);
lmac->last_duplex = (an_result >> 1) & 0x1;
switch (speed) {
case 0:
- lmac->last_speed = 10;
+ lmac->last_speed = SPEED_10;
break;
case 1:
- lmac->last_speed = 100;
+ lmac->last_speed = SPEED_100;
break;
case 2:
- lmac->last_speed = 1000;
+ lmac->last_speed = SPEED_1000;
break;
default:
lmac->link_up = false;
!(smu_link & SMU_RX_CTL_STATUS)) {
lmac->link_up = 1;
if (lmac->lmac_type == BGX_MODE_XLAUI)
- lmac->last_speed = 40000;
+ lmac->last_speed = SPEED_40000;
else
- lmac->last_speed = 10000;
- lmac->last_duplex = 1;
+ lmac->last_speed = SPEED_10000;
+ lmac->last_duplex = DUPLEX_FULL;
} else {
lmac->link_up = 0;
lmac->last_speed = SPEED_UNKNOWN;
} else {
/* Default to below link speed and duplex */
lmac->link_up = true;
- lmac->last_speed = 1000;
- lmac->last_duplex = 1;
+ lmac->last_speed = SPEED_1000;
+ lmac->last_duplex = DUPLEX_FULL;
bgx_sgmii_change_link_state(lmac);
return 0;
}
struct l2t_data *t3_init_l2t(unsigned int l2t_capacity)
{
struct l2t_data *d;
- int i, size = sizeof(*d) + l2t_capacity * sizeof(struct l2t_entry);
+ int i;
- d = kvzalloc(size, GFP_KERNEL);
+ d = kvzalloc(struct_size(d, l2tab, l2t_capacity), GFP_KERNEL);
if (!d)
return NULL;
struct l2t_entry *rover; /* starting point for next allocation */
atomic_t nfree; /* number of free entries */
rwlock_t lock;
- struct l2t_entry l2tab[0];
struct rcu_head rcu_head; /* to handle rcu cleanup */
+ struct l2t_entry l2tab[];
};
typedef void (*arp_failure_handler_func)(struct t3cdev * dev,
int t4_wait_dev_ready(void __iomem *regs);
+fw_port_cap32_t t4_link_acaps(struct adapter *adapter, unsigned int port,
+ struct link_config *lc);
int t4_link_l1cfg_core(struct adapter *adap, unsigned int mbox,
unsigned int port, struct link_config *lc,
- bool sleep_ok, int timeout);
+ u8 sleep_ok, int timeout);
static inline int t4_link_l1cfg(struct adapter *adapter, unsigned int mbox,
unsigned int port, struct link_config *lc)
* Link Mode Mask.
*/
static void fw_caps_to_lmm(enum fw_port_type port_type,
- unsigned int fw_caps,
+ fw_port_cap32_t fw_caps,
unsigned long *link_mode_mask)
{
#define SET_LMM(__lmm_name) \
fw_caps_to_lmm(pi->port_type, pi->link_cfg.pcaps,
link_ksettings->link_modes.supported);
- fw_caps_to_lmm(pi->port_type, pi->link_cfg.acaps,
+ fw_caps_to_lmm(pi->port_type,
+ t4_link_acaps(pi->adapter,
+ pi->lport,
+ &pi->link_cfg),
link_ksettings->link_modes.advertising);
fw_caps_to_lmm(pi->port_type, pi->link_cfg.lpacaps,
link_ksettings->link_modes.lp_advertising);
: SPEED_UNKNOWN);
base->duplex = DUPLEX_FULL;
- if (pi->link_cfg.fc & PAUSE_RX) {
- if (pi->link_cfg.fc & PAUSE_TX) {
- ethtool_link_ksettings_add_link_mode(link_ksettings,
- advertising,
- Pause);
- } else {
- ethtool_link_ksettings_add_link_mode(link_ksettings,
- advertising,
- Asym_Pause);
- }
- } else if (pi->link_cfg.fc & PAUSE_TX) {
- ethtool_link_ksettings_add_link_mode(link_ksettings,
- advertising,
- Asym_Pause);
- }
-
base->autoneg = pi->link_cfg.autoneg;
if (pi->link_cfg.pcaps & FW_PORT_CAP32_ANEG)
ethtool_link_ksettings_add_link_mode(link_ksettings,
break;
default:
- dev_err(adap->pdev_dev, "%s: filter creation PROBLEM; status = %u\n",
- __func__, status);
+ if (status != CPL_ERR_TCAM_FULL)
+ dev_err(adap->pdev_dev, "%s: filter creation PROBLEM; status = %u\n",
+ __func__, status);
if (ctx) {
if (status == CPL_ERR_TCAM_FULL)
- ctx->result = -EAGAIN;
+ ctx->result = -ENOSPC;
else
ctx->result = -EINVAL;
}
}
static u16 cxgb_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
int txq;
return txq;
}
- return fallback(dev, skb, NULL) % dev->real_num_tx_queues;
+ return netdev_pick_tx(dev, skb, NULL) % dev->real_num_tx_queues;
}
static int closest_timer(const struct sge *s, int time)
ret = ctx.result;
/* Check if hw returned error for filter creation */
- if (ret) {
- netdev_err(dev, "%s: filter creation err %d\n",
- __func__, ret);
+ if (ret)
goto free_entry;
- }
ch_flower->tc_flower_cookie = cls->cookie;
ch_flower->filter_id = ctx.tid;
}
}
+/* The ADVERT_MASK is used to mask out all of the Advertised Firmware Port
+ * Capabilities which we control with separate controls -- see, for instance,
+ * Pause Frames and Forward Error Correction. In order to determine what the
+ * full set of Advertised Port Capabilities are, the base Advertised Port
+ * Capabilities (masked by ADVERT_MASK) must be combined with the Advertised
+ * Port Capabilities associated with those other controls. See
+ * t4_link_acaps() for how this is done.
+ */
#define ADVERT_MASK (FW_PORT_CAP32_SPEED_V(FW_PORT_CAP32_SPEED_M) | \
FW_PORT_CAP32_ANEG)
/* Translate Common Code Pause specification into Firmware Port Capabilities */
static inline fw_port_cap32_t cc_to_fwcap_pause(enum cc_pause cc_pause)
{
+ /* Translate orthogonal RX/TX Pause Controls for L1 Configure
+ * commands, etc.
+ */
fw_port_cap32_t fw_pause = 0;
if (cc_pause & PAUSE_RX)
if (!(cc_pause & PAUSE_AUTONEG))
fw_pause |= FW_PORT_CAP32_FORCE_PAUSE;
+ /* Translate orthogonal Pause controls into IEEE 802.3 Pause,
+ * Asymetrical Pause for use in reporting to upper layer OS code, etc.
+ * Note that these bits are ignored in L1 Configure commands.
+ */
+ if (cc_pause & PAUSE_RX) {
+ if (cc_pause & PAUSE_TX)
+ fw_pause |= FW_PORT_CAP32_802_3_PAUSE;
+ else
+ fw_pause |= FW_PORT_CAP32_802_3_ASM_DIR;
+ } else if (cc_pause & PAUSE_TX) {
+ fw_pause |= FW_PORT_CAP32_802_3_ASM_DIR;
+ }
+
return fw_pause;
}
}
/**
- * t4_link_l1cfg - apply link configuration to MAC/PHY
+ * t4_link_acaps - compute Link Advertised Port Capabilities
* @adapter: the adapter
- * @mbox: the Firmware Mailbox to use
* @port: the Port ID
* @lc: the Port's Link Configuration
- * @sleep_ok: if true we may sleep while awaiting command completion
- * @timeout: time to wait for command to finish before timing out
- * (negative implies @sleep_ok=false)
*
- * Set up a port's MAC and PHY according to a desired link configuration.
- * - If the PHY can auto-negotiate first decide what to advertise, then
- * enable/disable auto-negotiation as desired, and reset.
- * - If the PHY does not auto-negotiate just reset it.
- * - If auto-negotiation is off set the MAC to the proper speed/duplex/FC,
- * otherwise do it later based on the outcome of auto-negotiation.
+ * Synthesize the Advertised Port Capabilities we'll be using based on
+ * the base Advertised Port Capabilities (which have been filtered by
+ * ADVERT_MASK) plus the individual controls for things like Pause
+ * Frames, Forward Error Correction, MDI, etc.
*/
-int t4_link_l1cfg_core(struct adapter *adapter, unsigned int mbox,
- unsigned int port, struct link_config *lc,
- bool sleep_ok, int timeout)
+fw_port_cap32_t t4_link_acaps(struct adapter *adapter, unsigned int port,
+ struct link_config *lc)
{
- unsigned int fw_caps = adapter->params.fw_caps_support;
- fw_port_cap32_t fw_fc, cc_fec, fw_fec, rcap;
- struct fw_port_cmd cmd;
+ fw_port_cap32_t fw_fc, fw_fec, acaps;
unsigned int fw_mdi;
- int ret;
+ char cc_fec;
fw_mdi = (FW_PORT_CAP32_MDI_V(FW_PORT_CAP32_MDI_AUTO) & lc->pcaps);
* init_link_config().
*/
if (!(lc->pcaps & FW_PORT_CAP32_ANEG)) {
- if (lc->autoneg == AUTONEG_ENABLE)
- return -EINVAL;
-
- rcap = lc->acaps | fw_fc | fw_fec;
+ acaps = lc->acaps | fw_fc | fw_fec;
lc->fc = lc->requested_fc & ~PAUSE_AUTONEG;
lc->fec = cc_fec;
} else if (lc->autoneg == AUTONEG_DISABLE) {
- rcap = lc->speed_caps | fw_fc | fw_fec | fw_mdi;
+ acaps = lc->speed_caps | fw_fc | fw_fec | fw_mdi;
lc->fc = lc->requested_fc & ~PAUSE_AUTONEG;
lc->fec = cc_fec;
} else {
- rcap = lc->acaps | fw_fc | fw_fec | fw_mdi;
+ acaps = lc->acaps | fw_fc | fw_fec | fw_mdi;
}
/* Some Requested Port Capabilities are trivially wrong if they exceed
* we need to exclude this from this check in order to maintain
* compatibility ...
*/
- if ((rcap & ~lc->pcaps) & ~FW_PORT_CAP32_FORCE_PAUSE) {
- dev_err(adapter->pdev_dev,
- "Requested Port Capabilities %#x exceed Physical Port Capabilities %#x\n",
- rcap, lc->pcaps);
+ if ((acaps & ~lc->pcaps) & ~FW_PORT_CAP32_FORCE_PAUSE) {
+ dev_err(adapter->pdev_dev, "Requested Port Capabilities %#x exceed Physical Port Capabilities %#x\n",
+ acaps, lc->pcaps);
+ return -EINVAL;
+ }
+
+ return acaps;
+}
+
+/**
+ * t4_link_l1cfg_core - apply link configuration to MAC/PHY
+ * @adapter: the adapter
+ * @mbox: the Firmware Mailbox to use
+ * @port: the Port ID
+ * @lc: the Port's Link Configuration
+ * @sleep_ok: if true we may sleep while awaiting command completion
+ * @timeout: time to wait for command to finish before timing out
+ * (negative implies @sleep_ok=false)
+ *
+ * Set up a port's MAC and PHY according to a desired link configuration.
+ * - If the PHY can auto-negotiate first decide what to advertise, then
+ * enable/disable auto-negotiation as desired, and reset.
+ * - If the PHY does not auto-negotiate just reset it.
+ * - If auto-negotiation is off set the MAC to the proper speed/duplex/FC,
+ * otherwise do it later based on the outcome of auto-negotiation.
+ */
+int t4_link_l1cfg_core(struct adapter *adapter, unsigned int mbox,
+ unsigned int port, struct link_config *lc,
+ u8 sleep_ok, int timeout)
+{
+ unsigned int fw_caps = adapter->params.fw_caps_support;
+ struct fw_port_cmd cmd;
+ fw_port_cap32_t rcap;
+ int ret;
+
+ if (!(lc->pcaps & FW_PORT_CAP32_ANEG) &&
+ lc->autoneg == AUTONEG_ENABLE) {
return -EINVAL;
}
- /* And send that on to the Firmware ...
+ /* Compute our Requested Port Capabilities and send that on to the
+ * Firmware.
*/
+ rcap = t4_link_acaps(adapter, port, lc);
memset(&cmd, 0, sizeof(cmd));
cmd.op_to_portid = cpu_to_be32(FW_CMD_OP_V(FW_PORT_CMD) |
FW_CMD_REQUEST_F | FW_CMD_EXEC_F |
rcap, -ret);
return ret;
}
- return ret;
+ return 0;
}
/**
#define __T4FW_VERSION_H__
#define T4FW_VERSION_MAJOR 0x01
-#define T4FW_VERSION_MINOR 0x16
-#define T4FW_VERSION_MICRO 0x09
+#define T4FW_VERSION_MINOR 0x17
+#define T4FW_VERSION_MICRO 0x03
#define T4FW_VERSION_BUILD 0x00
#define T4FW_MIN_VERSION_MAJOR 0x01
#define T4FW_MIN_VERSION_MICRO 0x00
#define T5FW_VERSION_MAJOR 0x01
-#define T5FW_VERSION_MINOR 0x16
-#define T5FW_VERSION_MICRO 0x09
+#define T5FW_VERSION_MINOR 0x17
+#define T5FW_VERSION_MICRO 0x03
#define T5FW_VERSION_BUILD 0x00
#define T5FW_MIN_VERSION_MAJOR 0x00
#define T5FW_MIN_VERSION_MICRO 0x00
#define T6FW_VERSION_MAJOR 0x01
-#define T6FW_VERSION_MINOR 0x16
-#define T6FW_VERSION_MICRO 0x09
+#define T6FW_VERSION_MINOR 0x17
+#define T6FW_VERSION_MICRO 0x03
#define T6FW_VERSION_BUILD 0x00
#define T6FW_MIN_VERSION_MAJOR 0x00
base->duplex = DUPLEX_UNKNOWN;
}
- if (pi->link_cfg.fc & PAUSE_RX) {
- if (pi->link_cfg.fc & PAUSE_TX) {
- ethtool_link_ksettings_add_link_mode(link_ksettings,
- advertising,
- Pause);
- } else {
- ethtool_link_ksettings_add_link_mode(link_ksettings,
- advertising,
- Asym_Pause);
- }
- } else if (pi->link_cfg.fc & PAUSE_TX) {
- ethtool_link_ksettings_add_link_mode(link_ksettings,
- advertising,
- Asym_Pause);
- }
-
base->autoneg = pi->link_cfg.autoneg;
if (pi->link_cfg.pcaps & FW_PORT_CAP32_ANEG)
ethtool_link_ksettings_add_link_mode(link_ksettings,
return ret;
}
+/* In the Physical Function Driver Common Code, the ADVERT_MASK is used to
+ * mask out bits in the Advertised Port Capabilities which are managed via
+ * separate controls, like Pause Frames and Forward Error Correction. In the
+ * Virtual Function Common Code, since we never perform L1 Configuration on
+ * the Link, the only things we really need to filter out are things which
+ * we decode and report separately like Speed.
+ */
#define ADVERT_MASK (FW_PORT_CAP32_SPEED_V(FW_PORT_CAP32_SPEED_M) | \
+ FW_PORT_CAP32_802_3_PAUSE | \
+ FW_PORT_CAP32_802_3_ASM_DIR | \
+ FW_PORT_CAP32_FEC_V(FW_PORT_CAP32_FEC_M) | \
FW_PORT_CAP32_ANEG)
/**
if (vnic_wq_desc_avail(wq) < MAX_SKB_FRAGS + ENIC_DESC_MAX_SPLITS)
netif_tx_stop_queue(txq);
skb_tx_timestamp(skb);
- if (!skb->xmit_more || netif_xmit_stopped(txq))
+ if (!netdev_xmit_more() || netif_xmit_stopped(txq))
vnic_wq_doorbell(wq);
spin_unlock(&enic->wq_lock[txq_map]);
u16 q_idx = skb_get_queue_mapping(skb);
struct be_tx_obj *txo = &adapter->tx_obj[q_idx];
struct be_wrb_params wrb_params = { 0 };
- bool flush = !skb->xmit_more;
+ bool flush = !netdev_xmit_more();
u16 wrb_cnt;
skb = be_xmit_workarounds(adapter, skb, &wrb_params);
percpu_stats->rx_packets++;
percpu_stats->rx_bytes += dpaa2_fd_get_len(fd);
- napi_gro_receive(&ch->napi, skb);
+ list_add_tail(&skb->list, ch->rx_list);
return;
struct dpaa2_eth_fq *fq, *txc_fq = NULL;
struct netdev_queue *nq;
int store_cleaned, work_done;
+ struct list_head rx_list;
int err;
ch = container_of(napi, struct dpaa2_eth_channel, napi);
ch->xdp.res = 0;
priv = ch->priv;
+ INIT_LIST_HEAD(&rx_list);
+ ch->rx_list = &rx_list;
+
do {
err = pull_channel(ch);
if (unlikely(err))
work_done = max(rx_cleaned, 1);
out:
+ netif_receive_skb_list(ch->rx_list);
+
if (txc_fq && txc_fq->dq_frames) {
nq = netdev_get_tx_queue(priv->net_dev, txc_fq->flowid);
netdev_tx_completed_queue(nq, txc_fq->dq_frames,
.rxnfc_field = RXH_L2DA,
.cls_prot = NET_PROT_ETH,
.cls_field = NH_FLD_ETH_DA,
+ .id = DPAA2_ETH_DIST_ETHDST,
.size = 6,
}, {
.cls_prot = NET_PROT_ETH,
.cls_field = NH_FLD_ETH_SA,
+ .id = DPAA2_ETH_DIST_ETHSRC,
.size = 6,
}, {
/* This is the last ethertype field parsed:
*/
.cls_prot = NET_PROT_ETH,
.cls_field = NH_FLD_ETH_TYPE,
+ .id = DPAA2_ETH_DIST_ETHTYPE,
.size = 2,
}, {
/* VLAN header */
.rxnfc_field = RXH_VLAN,
.cls_prot = NET_PROT_VLAN,
.cls_field = NH_FLD_VLAN_TCI,
+ .id = DPAA2_ETH_DIST_VLAN,
.size = 2,
}, {
/* IP header */
.rxnfc_field = RXH_IP_SRC,
.cls_prot = NET_PROT_IP,
.cls_field = NH_FLD_IP_SRC,
+ .id = DPAA2_ETH_DIST_IPSRC,
.size = 4,
}, {
.rxnfc_field = RXH_IP_DST,
.cls_prot = NET_PROT_IP,
.cls_field = NH_FLD_IP_DST,
+ .id = DPAA2_ETH_DIST_IPDST,
.size = 4,
}, {
.rxnfc_field = RXH_L3_PROTO,
.cls_prot = NET_PROT_IP,
.cls_field = NH_FLD_IP_PROTO,
+ .id = DPAA2_ETH_DIST_IPPROTO,
.size = 1,
}, {
/* Using UDP ports, this is functionally equivalent to raw
.rxnfc_field = RXH_L4_B_0_1,
.cls_prot = NET_PROT_UDP,
.cls_field = NH_FLD_UDP_PORT_SRC,
+ .id = DPAA2_ETH_DIST_L4SRC,
.size = 2,
}, {
.rxnfc_field = RXH_L4_B_2_3,
.cls_prot = NET_PROT_UDP,
.cls_field = NH_FLD_UDP_PORT_DST,
+ .id = DPAA2_ETH_DIST_L4DST,
.size = 2,
},
};
}
/* Size of the Rx flow classification key */
-int dpaa2_eth_cls_key_size(void)
+int dpaa2_eth_cls_key_size(u64 fields)
{
int i, size = 0;
- for (i = 0; i < ARRAY_SIZE(dist_fields); i++)
+ for (i = 0; i < ARRAY_SIZE(dist_fields); i++) {
+ if (!(fields & dist_fields[i].id))
+ continue;
size += dist_fields[i].size;
+ }
return size;
}
return 0;
}
+/* Prune unused fields from the classification rule.
+ * Used when masking is not supported
+ */
+void dpaa2_eth_cls_trim_rule(void *key_mem, u64 fields)
+{
+ int off = 0, new_off = 0;
+ int i, size;
+
+ for (i = 0; i < ARRAY_SIZE(dist_fields); i++) {
+ size = dist_fields[i].size;
+ if (dist_fields[i].id & fields) {
+ memcpy(key_mem + new_off, key_mem + off, size);
+ new_off += size;
+ }
+ off += size;
+ }
+}
+
/* Set Rx distribution (hash or flow classification) key
* flags is a combination of RXH_ bits
*/
struct dpkg_extract *key =
&cls_cfg.extracts[cls_cfg.num_extracts];
- /* For Rx hashing key we set only the selected fields.
- * For Rx flow classification key we set all supported fields
+ /* For both Rx hashing and classification keys
+ * we set only the selected fields.
*/
- if (type == DPAA2_ETH_RX_DIST_HASH) {
- if (!(flags & dist_fields[i].rxnfc_field))
- continue;
+ if (!(flags & dist_fields[i].id))
+ continue;
+ if (type == DPAA2_ETH_RX_DIST_HASH)
rx_hash_fields |= dist_fields[i].rxnfc_field;
- }
if (cls_cfg.num_extracts >= DPKG_MAX_NUM_OF_EXTRACTS) {
dev_err(dev, "error adding key extraction rule, too many rules?\n");
int dpaa2_eth_set_hash(struct net_device *net_dev, u64 flags)
{
struct dpaa2_eth_priv *priv = netdev_priv(net_dev);
+ u64 key = 0;
+ int i;
if (!dpaa2_eth_hash_enabled(priv))
return -EOPNOTSUPP;
- return dpaa2_eth_set_dist_key(net_dev, DPAA2_ETH_RX_DIST_HASH, flags);
+ for (i = 0; i < ARRAY_SIZE(dist_fields); i++)
+ if (dist_fields[i].rxnfc_field & flags)
+ key |= dist_fields[i].id;
+
+ return dpaa2_eth_set_dist_key(net_dev, DPAA2_ETH_RX_DIST_HASH, key);
+}
+
+int dpaa2_eth_set_cls(struct net_device *net_dev, u64 flags)
+{
+ return dpaa2_eth_set_dist_key(net_dev, DPAA2_ETH_RX_DIST_CLS, flags);
}
-static int dpaa2_eth_set_cls(struct dpaa2_eth_priv *priv)
+static int dpaa2_eth_set_default_cls(struct dpaa2_eth_priv *priv)
{
struct device *dev = priv->net_dev->dev.parent;
+ int err;
/* Check if we actually support Rx flow classification */
if (dpaa2_eth_has_legacy_dist(priv)) {
return -EOPNOTSUPP;
}
- if (priv->dpni_attrs.options & DPNI_OPT_NO_FS ||
- !(priv->dpni_attrs.options & DPNI_OPT_HAS_KEY_MASKING)) {
+ if (!dpaa2_eth_fs_enabled(priv)) {
dev_dbg(dev, "Rx cls disabled in DPNI options\n");
return -EOPNOTSUPP;
}
return -EOPNOTSUPP;
}
+ /* If there is no support for masking in the classification table,
+ * we don't set a default key, as it will depend on the rules
+ * added by the user at runtime.
+ */
+ if (!dpaa2_eth_fs_mask_enabled(priv))
+ goto out;
+
+ err = dpaa2_eth_set_cls(priv->net_dev, DPAA2_ETH_DIST_ALL);
+ if (err)
+ return err;
+
+out:
priv->rx_cls_enabled = 1;
- return dpaa2_eth_set_dist_key(priv->net_dev, DPAA2_ETH_RX_DIST_CLS, 0);
+ return 0;
}
/* Bind the DPNI to its needed objects and resources: buffer pool, DPIOs,
/* Configure the flow classification key; it includes all
* supported header fields and cannot be modified at runtime
*/
- err = dpaa2_eth_set_cls(priv);
+ err = dpaa2_eth_set_default_cls(priv);
if (err && err != -EOPNOTSUPP)
dev_err(dev, "Failed to configure Rx classification key\n");
struct dpaa2_eth_ch_stats stats;
struct dpaa2_eth_ch_xdp xdp;
struct xdp_rxq_info xdp_rxq;
+ struct list_head *rx_list;
};
struct dpaa2_eth_dist_fields {
enum net_prot cls_prot;
int cls_field;
int size;
+ u64 id;
};
struct dpaa2_eth_cls_rule {
/* enabled ethtool hashing bits */
u64 rx_hash_fields;
+ u64 rx_cls_fields;
struct dpaa2_eth_cls_rule *cls_rules;
u8 rx_cls_enabled;
struct bpf_prog *xdp_prog;
(dpaa2_eth_cmp_dpni_ver((priv), DPNI_RX_DIST_KEY_VER_MAJOR, \
DPNI_RX_DIST_KEY_VER_MINOR) < 0)
+#define dpaa2_eth_fs_enabled(priv) \
+ (!((priv)->dpni_attrs.options & DPNI_OPT_NO_FS))
+
+#define dpaa2_eth_fs_mask_enabled(priv) \
+ ((priv)->dpni_attrs.options & DPNI_OPT_HAS_KEY_MASKING)
+
#define dpaa2_eth_fs_count(priv) \
((priv)->dpni_attrs.fs_entries)
DPAA2_ETH_RX_DIST_CLS
};
+/* Unique IDs for the supported Rx classification header fields */
+#define DPAA2_ETH_DIST_ETHDST BIT(0)
+#define DPAA2_ETH_DIST_ETHSRC BIT(1)
+#define DPAA2_ETH_DIST_ETHTYPE BIT(2)
+#define DPAA2_ETH_DIST_VLAN BIT(3)
+#define DPAA2_ETH_DIST_IPSRC BIT(4)
+#define DPAA2_ETH_DIST_IPDST BIT(5)
+#define DPAA2_ETH_DIST_IPPROTO BIT(6)
+#define DPAA2_ETH_DIST_L4SRC BIT(7)
+#define DPAA2_ETH_DIST_L4DST BIT(8)
+#define DPAA2_ETH_DIST_ALL (~0U)
+
static inline
unsigned int dpaa2_eth_needed_headroom(struct dpaa2_eth_priv *priv,
struct sk_buff *skb)
}
int dpaa2_eth_set_hash(struct net_device *net_dev, u64 flags);
-int dpaa2_eth_cls_key_size(void);
+int dpaa2_eth_set_cls(struct net_device *net_dev, u64 key);
+int dpaa2_eth_cls_key_size(u64 key);
int dpaa2_eth_cls_fld_off(int prot, int field);
+void dpaa2_eth_cls_trim_rule(void *key_mem, u64 fields);
#endif /* __DPAA2_H */
}
static int prep_eth_rule(struct ethhdr *eth_value, struct ethhdr *eth_mask,
- void *key, void *mask)
+ void *key, void *mask, u64 *fields)
{
int off;
off = dpaa2_eth_cls_fld_off(NET_PROT_ETH, NH_FLD_ETH_TYPE);
*(__be16 *)(key + off) = eth_value->h_proto;
*(__be16 *)(mask + off) = eth_mask->h_proto;
+ *fields |= DPAA2_ETH_DIST_ETHTYPE;
}
if (!is_zero_ether_addr(eth_mask->h_source)) {
off = dpaa2_eth_cls_fld_off(NET_PROT_ETH, NH_FLD_ETH_SA);
ether_addr_copy(key + off, eth_value->h_source);
ether_addr_copy(mask + off, eth_mask->h_source);
+ *fields |= DPAA2_ETH_DIST_ETHSRC;
}
if (!is_zero_ether_addr(eth_mask->h_dest)) {
off = dpaa2_eth_cls_fld_off(NET_PROT_ETH, NH_FLD_ETH_DA);
ether_addr_copy(key + off, eth_value->h_dest);
ether_addr_copy(mask + off, eth_mask->h_dest);
+ *fields |= DPAA2_ETH_DIST_ETHDST;
}
return 0;
static int prep_uip_rule(struct ethtool_usrip4_spec *uip_value,
struct ethtool_usrip4_spec *uip_mask,
- void *key, void *mask)
+ void *key, void *mask, u64 *fields)
{
int off;
u32 tmp_value, tmp_mask;
off = dpaa2_eth_cls_fld_off(NET_PROT_IP, NH_FLD_IP_SRC);
*(__be32 *)(key + off) = uip_value->ip4src;
*(__be32 *)(mask + off) = uip_mask->ip4src;
+ *fields |= DPAA2_ETH_DIST_IPSRC;
}
if (uip_mask->ip4dst) {
off = dpaa2_eth_cls_fld_off(NET_PROT_IP, NH_FLD_IP_DST);
*(__be32 *)(key + off) = uip_value->ip4dst;
*(__be32 *)(mask + off) = uip_mask->ip4dst;
+ *fields |= DPAA2_ETH_DIST_IPDST;
}
if (uip_mask->proto) {
off = dpaa2_eth_cls_fld_off(NET_PROT_IP, NH_FLD_IP_PROTO);
*(u8 *)(key + off) = uip_value->proto;
*(u8 *)(mask + off) = uip_mask->proto;
+ *fields |= DPAA2_ETH_DIST_IPPROTO;
}
if (uip_mask->l4_4_bytes) {
off = dpaa2_eth_cls_fld_off(NET_PROT_UDP, NH_FLD_UDP_PORT_SRC);
*(__be16 *)(key + off) = htons(tmp_value >> 16);
*(__be16 *)(mask + off) = htons(tmp_mask >> 16);
+ *fields |= DPAA2_ETH_DIST_L4SRC;
off = dpaa2_eth_cls_fld_off(NET_PROT_UDP, NH_FLD_UDP_PORT_DST);
*(__be16 *)(key + off) = htons(tmp_value & 0xFFFF);
*(__be16 *)(mask + off) = htons(tmp_mask & 0xFFFF);
+ *fields |= DPAA2_ETH_DIST_L4DST;
}
/* Only apply the rule for IPv4 frames */
off = dpaa2_eth_cls_fld_off(NET_PROT_ETH, NH_FLD_ETH_TYPE);
*(__be16 *)(key + off) = htons(ETH_P_IP);
*(__be16 *)(mask + off) = htons(0xFFFF);
+ *fields |= DPAA2_ETH_DIST_ETHTYPE;
return 0;
}
static int prep_l4_rule(struct ethtool_tcpip4_spec *l4_value,
struct ethtool_tcpip4_spec *l4_mask,
- void *key, void *mask, u8 l4_proto)
+ void *key, void *mask, u8 l4_proto, u64 *fields)
{
int off;
off = dpaa2_eth_cls_fld_off(NET_PROT_IP, NH_FLD_IP_SRC);
*(__be32 *)(key + off) = l4_value->ip4src;
*(__be32 *)(mask + off) = l4_mask->ip4src;
+ *fields |= DPAA2_ETH_DIST_IPSRC;
}
if (l4_mask->ip4dst) {
off = dpaa2_eth_cls_fld_off(NET_PROT_IP, NH_FLD_IP_DST);
*(__be32 *)(key + off) = l4_value->ip4dst;
*(__be32 *)(mask + off) = l4_mask->ip4dst;
+ *fields |= DPAA2_ETH_DIST_IPDST;
}
if (l4_mask->psrc) {
off = dpaa2_eth_cls_fld_off(NET_PROT_UDP, NH_FLD_UDP_PORT_SRC);
*(__be16 *)(key + off) = l4_value->psrc;
*(__be16 *)(mask + off) = l4_mask->psrc;
+ *fields |= DPAA2_ETH_DIST_L4SRC;
}
if (l4_mask->pdst) {
off = dpaa2_eth_cls_fld_off(NET_PROT_UDP, NH_FLD_UDP_PORT_DST);
*(__be16 *)(key + off) = l4_value->pdst;
*(__be16 *)(mask + off) = l4_mask->pdst;
+ *fields |= DPAA2_ETH_DIST_L4DST;
}
/* Only apply the rule for IPv4 frames with the specified L4 proto */
off = dpaa2_eth_cls_fld_off(NET_PROT_ETH, NH_FLD_ETH_TYPE);
*(__be16 *)(key + off) = htons(ETH_P_IP);
*(__be16 *)(mask + off) = htons(0xFFFF);
+ *fields |= DPAA2_ETH_DIST_ETHTYPE;
off = dpaa2_eth_cls_fld_off(NET_PROT_IP, NH_FLD_IP_PROTO);
*(u8 *)(key + off) = l4_proto;
*(u8 *)(mask + off) = 0xFF;
+ *fields |= DPAA2_ETH_DIST_IPPROTO;
return 0;
}
static int prep_ext_rule(struct ethtool_flow_ext *ext_value,
struct ethtool_flow_ext *ext_mask,
- void *key, void *mask)
+ void *key, void *mask, u64 *fields)
{
int off;
off = dpaa2_eth_cls_fld_off(NET_PROT_VLAN, NH_FLD_VLAN_TCI);
*(__be16 *)(key + off) = ext_value->vlan_tci;
*(__be16 *)(mask + off) = ext_mask->vlan_tci;
+ *fields |= DPAA2_ETH_DIST_VLAN;
}
return 0;
static int prep_mac_ext_rule(struct ethtool_flow_ext *ext_value,
struct ethtool_flow_ext *ext_mask,
- void *key, void *mask)
+ void *key, void *mask, u64 *fields)
{
int off;
off = dpaa2_eth_cls_fld_off(NET_PROT_ETH, NH_FLD_ETH_DA);
ether_addr_copy(key + off, ext_value->h_dest);
ether_addr_copy(mask + off, ext_mask->h_dest);
+ *fields |= DPAA2_ETH_DIST_ETHDST;
}
return 0;
}
-static int prep_cls_rule(struct ethtool_rx_flow_spec *fs, void *key, void *mask)
+static int prep_cls_rule(struct ethtool_rx_flow_spec *fs, void *key, void *mask,
+ u64 *fields)
{
int err;
switch (fs->flow_type & 0xFF) {
case ETHER_FLOW:
err = prep_eth_rule(&fs->h_u.ether_spec, &fs->m_u.ether_spec,
- key, mask);
+ key, mask, fields);
break;
case IP_USER_FLOW:
err = prep_uip_rule(&fs->h_u.usr_ip4_spec,
- &fs->m_u.usr_ip4_spec, key, mask);
+ &fs->m_u.usr_ip4_spec, key, mask, fields);
break;
case TCP_V4_FLOW:
err = prep_l4_rule(&fs->h_u.tcp_ip4_spec, &fs->m_u.tcp_ip4_spec,
- key, mask, IPPROTO_TCP);
+ key, mask, IPPROTO_TCP, fields);
break;
case UDP_V4_FLOW:
err = prep_l4_rule(&fs->h_u.udp_ip4_spec, &fs->m_u.udp_ip4_spec,
- key, mask, IPPROTO_UDP);
+ key, mask, IPPROTO_UDP, fields);
break;
case SCTP_V4_FLOW:
err = prep_l4_rule(&fs->h_u.sctp_ip4_spec,
&fs->m_u.sctp_ip4_spec, key, mask,
- IPPROTO_SCTP);
+ IPPROTO_SCTP, fields);
break;
default:
return -EOPNOTSUPP;
return err;
if (fs->flow_type & FLOW_EXT) {
- err = prep_ext_rule(&fs->h_ext, &fs->m_ext, key, mask);
+ err = prep_ext_rule(&fs->h_ext, &fs->m_ext, key, mask, fields);
if (err)
return err;
}
if (fs->flow_type & FLOW_MAC_EXT) {
- err = prep_mac_ext_rule(&fs->h_ext, &fs->m_ext, key, mask);
+ err = prep_mac_ext_rule(&fs->h_ext, &fs->m_ext, key, mask,
+ fields);
if (err)
return err;
}
struct dpni_rule_cfg rule_cfg = { 0 };
struct dpni_fs_action_cfg fs_act = { 0 };
dma_addr_t key_iova;
+ u64 fields = 0;
void *key_buf;
int err;
fs->ring_cookie >= dpaa2_eth_queue_count(priv))
return -EINVAL;
- rule_cfg.key_size = dpaa2_eth_cls_key_size();
+ rule_cfg.key_size = dpaa2_eth_cls_key_size(DPAA2_ETH_DIST_ALL);
/* allocate twice the key size, for the actual key and for mask */
key_buf = kzalloc(rule_cfg.key_size * 2, GFP_KERNEL);
return -ENOMEM;
/* Fill the key and mask memory areas */
- err = prep_cls_rule(fs, key_buf, key_buf + rule_cfg.key_size);
+ err = prep_cls_rule(fs, key_buf, key_buf + rule_cfg.key_size, &fields);
if (err)
goto free_mem;
+ if (!dpaa2_eth_fs_mask_enabled(priv)) {
+ /* Masking allows us to configure a maximal key during init and
+ * use it for all flow steering rules. Without it, we include
+ * in the key only the fields actually used, so we need to
+ * extract the others from the final key buffer.
+ *
+ * Program the FS key if needed, or return error if previously
+ * set key can't be used for the current rule. User needs to
+ * delete existing rules in this case to allow for the new one.
+ */
+ if (!priv->rx_cls_fields) {
+ err = dpaa2_eth_set_cls(net_dev, fields);
+ if (err)
+ goto free_mem;
+
+ priv->rx_cls_fields = fields;
+ } else if (priv->rx_cls_fields != fields) {
+ netdev_err(net_dev, "No support for multiple FS keys, need to delete existing rules\n");
+ err = -EOPNOTSUPP;
+ goto free_mem;
+ }
+
+ dpaa2_eth_cls_trim_rule(key_buf, fields);
+ rule_cfg.key_size = dpaa2_eth_cls_key_size(fields);
+ }
+
key_iova = dma_map_single(dev, key_buf, rule_cfg.key_size * 2,
DMA_TO_DEVICE);
if (dma_mapping_error(dev, key_iova)) {
}
rule_cfg.key_iova = key_iova;
- rule_cfg.mask_iova = key_iova + rule_cfg.key_size;
+ if (dpaa2_eth_fs_mask_enabled(priv))
+ rule_cfg.mask_iova = key_iova + rule_cfg.key_size;
if (add) {
if (fs->ring_cookie == RX_CLS_FLOW_DISC)
return err;
}
+static int num_rules(struct dpaa2_eth_priv *priv)
+{
+ int i, rules = 0;
+
+ for (i = 0; i < dpaa2_eth_fs_count(priv); i++)
+ if (priv->cls_rules[i].in_use)
+ rules++;
+
+ return rules;
+}
+
static int update_cls_rule(struct net_device *net_dev,
struct ethtool_rx_flow_spec *new_fs,
int location)
return err;
rule->in_use = 0;
+
+ if (!dpaa2_eth_fs_mask_enabled(priv) && !num_rules(priv))
+ priv->rx_cls_fields = 0;
}
/* If no new entry to add, return here */
break;
case ETHTOOL_GRXCLSRLCNT:
rxnfc->rule_cnt = 0;
- for (i = 0; i < max_rules; i++)
- if (priv->cls_rules[i].in_use)
- rxnfc->rule_cnt++;
+ rxnfc->rule_cnt = num_rules(priv);
rxnfc->data = max_rules;
break;
case ETHTOOL_GRXCLSRULE:
struct hns_mac_cb *mac_cb;
u8 addr[ETH_ALEN] = {0};
u8 port_num;
- u16 mskid;
+ int mskid;
/* promisc use vague table match with vlanid = 0 & macaddr = 0 */
hns_dsaf_set_mac_key(dsaf_dev, &mac_key, 0x00, port, addr);
static u16
hns_nic_select_queue(struct net_device *ndev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
struct ethhdr *eth_hdr = (struct ethhdr *)skb->data;
struct hns_nic_priv *priv = netdev_priv(ndev);
is_multicast_ether_addr(eth_hdr->h_dest))
return 0;
else
- return fallback(ndev, skb, NULL);
+ return netdev_pick_tx(ndev, skb, NULL);
}
static const struct net_device_ops hns_nic_netdev_ops = {
HCLGE_MBX_GET_QID_IN_PF, /* (VF -> PF) get queue id in pf */
HCLGE_MBX_LINK_STAT_MODE, /* (PF -> VF) link mode has changed */
HCLGE_MBX_GET_LINK_MODE, /* (VF -> PF) get the link mode of pf */
+ HLCGE_MBX_PUSH_VLAN_INFO, /* (PF -> VF) push port base vlan */
+ HCLGE_MBX_GET_MEDIA_TYPE, /* (VF -> PF) get media type */
HCLGE_MBX_GET_VF_FLR_STATUS = 200, /* (M7 -> PF) get vf reset status */
};
HCLGE_MBX_VLAN_FILTER = 0, /* set vlan filter */
HCLGE_MBX_VLAN_TX_OFF_CFG, /* set tx side vlan offload */
HCLGE_MBX_VLAN_RX_OFF_CFG, /* set rx side vlan offload */
+ HCLGE_MBX_PORT_BASE_VLAN_CFG, /* set port based vlan configuration */
+ HCLGE_MBX_GET_PORT_BASE_VLAN_STATE, /* get port based vlan state */
};
#define HCLGE_MBX_MAX_MSG_SIZE 16
return inited;
}
-static int hnae3_match_n_instantiate(struct hnae3_client *client,
- struct hnae3_ae_dev *ae_dev, bool is_reg)
+static int hnae3_init_client_instance(struct hnae3_client *client,
+ struct hnae3_ae_dev *ae_dev)
{
int ret;
return 0;
}
- /* now, (un-)instantiate client by calling lower layer */
- if (is_reg) {
- ret = ae_dev->ops->init_client_instance(client, ae_dev);
- if (ret)
- dev_err(&ae_dev->pdev->dev,
- "fail to instantiate client, ret = %d\n", ret);
+ ret = ae_dev->ops->init_client_instance(client, ae_dev);
+ if (ret)
+ dev_err(&ae_dev->pdev->dev,
+ "fail to instantiate client, ret = %d\n", ret);
- return ret;
- }
+ return ret;
+}
+
+static void hnae3_uninit_client_instance(struct hnae3_client *client,
+ struct hnae3_ae_dev *ae_dev)
+{
+ /* check if this client matches the type of ae_dev */
+ if (!(hnae3_client_match(client->type, ae_dev->dev_type) &&
+ hnae3_get_bit(ae_dev->flag, HNAE3_DEV_INITED_B)))
+ return;
if (hnae3_get_client_init_flag(client, ae_dev)) {
ae_dev->ops->uninit_client_instance(client, ae_dev);
hnae3_set_client_init_flag(client, ae_dev, 0);
}
-
- return 0;
}
int hnae3_register_client(struct hnae3_client *client)
/* if the client could not be initialized on current port, for
* any error reasons, move on to next available port
*/
- ret = hnae3_match_n_instantiate(client, ae_dev, true);
+ ret = hnae3_init_client_instance(client, ae_dev);
if (ret)
dev_err(&ae_dev->pdev->dev,
"match and instantiation failed for port, ret = %d\n",
mutex_lock(&hnae3_common_lock);
/* un-initialize the client on every matched port */
list_for_each_entry(ae_dev, &hnae3_ae_dev_list, node) {
- hnae3_match_n_instantiate(client, ae_dev, false);
+ hnae3_uninit_client_instance(client, ae_dev);
}
list_del(&client->node);
* initialize the figure out client instance
*/
list_for_each_entry(client, &hnae3_client_list, node) {
- ret = hnae3_match_n_instantiate(client, ae_dev, true);
+ ret = hnae3_init_client_instance(client, ae_dev);
if (ret)
dev_err(&ae_dev->pdev->dev,
"match and instantiation failed, ret = %d\n",
* un-initialize the figure out client instance
*/
list_for_each_entry(client, &hnae3_client_list, node)
- hnae3_match_n_instantiate(client, ae_dev, false);
+ hnae3_uninit_client_instance(client, ae_dev);
ae_algo->ops->uninit_ae_dev(ae_dev);
hnae3_set_bit(ae_dev->flag, HNAE3_DEV_INITED_B, 0);
* initialize the figure out client instance
*/
list_for_each_entry(client, &hnae3_client_list, node) {
- ret = hnae3_match_n_instantiate(client, ae_dev, true);
+ ret = hnae3_init_client_instance(client, ae_dev);
if (ret)
dev_err(&ae_dev->pdev->dev,
"match and instantiation failed, ret = %d\n",
continue;
list_for_each_entry(client, &hnae3_client_list, node)
- hnae3_match_n_instantiate(client, ae_dev, false);
+ hnae3_uninit_client_instance(client, ae_dev);
ae_algo->ops->uninit_ae_dev(ae_dev);
hnae3_set_bit(ae_dev->flag, HNAE3_DEV_INITED_B, 0);
HNAE3_FLR_DONE,
};
+enum hnae3_port_base_vlan_state {
+ HNAE3_PORT_BASE_VLAN_DISABLE,
+ HNAE3_PORT_BASE_VLAN_ENABLE,
+ HNAE3_PORT_BASE_VLAN_MODIFY,
+ HNAE3_PORT_BASE_VLAN_NOCHANGE,
+};
+
struct hnae3_vector_info {
u8 __iomem *io_addr;
int vector;
u32 numa_node_mask; /* for multi-chip support */
+ enum hnae3_port_base_vlan_state port_base_vlan_state;
+
u8 netdev_flags;
struct dentry *hnae3_dbgfs;
};
*/
static bool hns3_tunnel_csum_bug(struct sk_buff *skb)
{
-#define IANA_VXLAN_PORT 4789
union l4_hdr_info l4;
l4.hdr = skb_transport_header(skb);
- if (!(!skb->encapsulation && l4.udp->dest == htons(IANA_VXLAN_PORT)))
+ if (!(!skb->encapsulation &&
+ l4.udp->dest == htons(IANA_VXLAN_UDP_PORT)))
return false;
skb_checksum_help(skb);
{
#define HNS3_TX_VLAN_PRIO_SHIFT 13
+ struct hnae3_handle *handle = tx_ring->tqp->handle;
+
+ /* Since HW limitation, if port based insert VLAN enabled, only one VLAN
+ * header is allowed in skb, otherwise it will cause RAS error.
+ */
+ if (unlikely(skb_vlan_tagged_multi(skb) &&
+ handle->port_base_vlan_state ==
+ HNAE3_PORT_BASE_VLAN_ENABLE))
+ return -EINVAL;
+
if (skb->protocol == htons(ETH_P_8021Q) &&
!(tx_ring->tqp->handle->kinfo.netdev->features &
NETIF_F_HW_VLAN_CTAG_TX)) {
* and use inner_vtag in one tag case.
*/
if (skb->protocol == htons(ETH_P_8021Q)) {
- hns3_set_field(*out_vlan_flag, HNS3_TXD_OVLAN_B, 1);
- *out_vtag = vlan_tag;
+ if (handle->port_base_vlan_state ==
+ HNAE3_PORT_BASE_VLAN_DISABLE){
+ hns3_set_field(*out_vlan_flag,
+ HNS3_TXD_OVLAN_B, 1);
+ *out_vtag = vlan_tag;
+ } else {
+ hns3_set_field(*inner_vlan_flag,
+ HNS3_TXD_VLAN_B, 1);
+ *inner_vtag = vlan_tag;
+ }
} else {
hns3_set_field(*inner_vlan_flag, HNS3_TXD_VLAN_B, 1);
*inner_vtag = vlan_tag;
struct hns3_desc_cb *desc_cb = &ring->desc_cb[ring->next_to_use];
struct hns3_desc *desc = &ring->desc[ring->next_to_use];
struct device *dev = ring_to_dev(ring);
- u16 bdtp_fe_sc_vld_ra_ri = 0;
struct skb_frag_struct *frag;
unsigned int frag_buf_num;
int k, sizeoflast;
desc_cb->length = size;
+ if (likely(size <= HNS3_MAX_BD_SIZE)) {
+ u16 bdtp_fe_sc_vld_ra_ri = 0;
+
+ desc_cb->priv = priv;
+ desc_cb->dma = dma;
+ desc_cb->type = type;
+ desc->addr = cpu_to_le64(dma);
+ desc->tx.send_size = cpu_to_le16(size);
+ hns3_set_txbd_baseinfo(&bdtp_fe_sc_vld_ra_ri, frag_end);
+ desc->tx.bdtp_fe_sc_vld_ra_ri =
+ cpu_to_le16(bdtp_fe_sc_vld_ra_ri);
+
+ ring_ptr_move_fw(ring, next_to_use);
+ return 0;
+ }
+
frag_buf_num = hns3_tx_bd_count(size);
sizeoflast = size & HNS3_TX_LAST_SIZE_M;
sizeoflast = sizeoflast ? sizeoflast : HNS3_MAX_BD_SIZE;
/* When frag size is bigger than hardware limit, split this frag */
for (k = 0; k < frag_buf_num; k++) {
+ u16 bdtp_fe_sc_vld_ra_ri = 0;
+
/* The txbd's baseinfo of DESC_TYPE_PAGE & DESC_TYPE_SKB */
desc_cb->priv = priv;
desc_cb->dma = dma + HNS3_MAX_BD_SIZE * k;
struct hnae3_handle *h = hns3_get_handle(netdev);
int ret;
+ if (hns3_nic_resetting(netdev))
+ return -EBUSY;
+
if (!h->ae_algo->ops->set_mtu)
return -EOPNOTSUPP;
}
}
+static int hns3_gro_complete(struct sk_buff *skb)
+{
+ __be16 type = skb->protocol;
+ struct tcphdr *th;
+ int depth = 0;
+
+ while (type == htons(ETH_P_8021Q)) {
+ struct vlan_hdr *vh;
+
+ if ((depth + VLAN_HLEN) > skb_headlen(skb))
+ return -EFAULT;
+
+ vh = (struct vlan_hdr *)(skb->data + depth);
+ type = vh->h_vlan_encapsulated_proto;
+ depth += VLAN_HLEN;
+ }
+
+ if (type == htons(ETH_P_IP)) {
+ depth += sizeof(struct iphdr);
+ } else if (type == htons(ETH_P_IPV6)) {
+ depth += sizeof(struct ipv6hdr);
+ } else {
+ netdev_err(skb->dev,
+ "Error: FW GRO supports only IPv4/IPv6, not 0x%04x, depth: %d\n",
+ be16_to_cpu(type), depth);
+ return -EFAULT;
+ }
+
+ th = (struct tcphdr *)(skb->data + depth);
+ skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count;
+ if (th->cwr)
+ skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
+
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+ return 0;
+}
+
static void hns3_rx_checksum(struct hns3_enet_ring *ring, struct sk_buff *skb,
- struct hns3_desc *desc)
+ u32 l234info, u32 bd_base_info)
{
struct net_device *netdev = ring->tqp->handle->kinfo.netdev;
int l3_type, l4_type;
- u32 bd_base_info;
int ol4_type;
- u32 l234info;
-
- bd_base_info = le32_to_cpu(desc->rx.bd_base_info);
- l234info = le32_to_cpu(desc->rx.l234_info);
skb->ip_summed = CHECKSUM_NONE;
if (!(netdev->features & NETIF_F_RXCSUM))
return;
- /* We MUST enable hardware checksum before enabling hardware GRO */
- if (skb_shinfo(skb)->gso_size) {
- skb->ip_summed = CHECKSUM_UNNECESSARY;
- return;
- }
-
/* check if hardware has done checksum */
if (!(bd_base_info & BIT(HNS3_RXD_L3L4P_B)))
return;
struct hns3_desc *desc, u32 l234info,
u16 *vlan_tag)
{
+ struct hnae3_handle *handle = ring->tqp->handle;
struct pci_dev *pdev = ring->tqp->handle->pdev;
if (pdev->revision == 0x20) {
#define HNS3_STRP_OUTER_VLAN 0x1
#define HNS3_STRP_INNER_VLAN 0x2
+#define HNS3_STRP_BOTH 0x3
+ /* Hardware always insert VLAN tag into RX descriptor when
+ * remove the tag from packet, driver needs to determine
+ * reporting which tag to stack.
+ */
switch (hnae3_get_field(l234info, HNS3_RXD_STRP_TAGP_M,
HNS3_RXD_STRP_TAGP_S)) {
case HNS3_STRP_OUTER_VLAN:
+ if (handle->port_base_vlan_state !=
+ HNAE3_PORT_BASE_VLAN_DISABLE)
+ return false;
+
*vlan_tag = le16_to_cpu(desc->rx.ot_vlan_tag);
return true;
case HNS3_STRP_INNER_VLAN:
+ if (handle->port_base_vlan_state !=
+ HNAE3_PORT_BASE_VLAN_DISABLE)
+ return false;
+
*vlan_tag = le16_to_cpu(desc->rx.vlan_tag);
+ return true;
+ case HNS3_STRP_BOTH:
+ if (handle->port_base_vlan_state ==
+ HNAE3_PORT_BASE_VLAN_DISABLE)
+ *vlan_tag = le16_to_cpu(desc->rx.ot_vlan_tag);
+ else
+ *vlan_tag = le16_to_cpu(desc->rx.vlan_tag);
+
return true;
default:
return false;
return 0;
}
-static void hns3_set_gro_param(struct sk_buff *skb, u32 l234info,
- u32 bd_base_info)
+static int hns3_set_gro_and_checksum(struct hns3_enet_ring *ring,
+ struct sk_buff *skb, u32 l234info,
+ u32 bd_base_info)
{
u16 gro_count;
u32 l3_type;
gro_count = hnae3_get_field(l234info, HNS3_RXD_GRO_COUNT_M,
HNS3_RXD_GRO_COUNT_S);
/* if there is no HW GRO, do not set gro params */
- if (!gro_count)
- return;
+ if (!gro_count) {
+ hns3_rx_checksum(ring, skb, l234info, bd_base_info);
+ return 0;
+ }
- /* tcp_gro_complete() will copy NAPI_GRO_CB(skb)->count
- * to skb_shinfo(skb)->gso_segs
- */
NAPI_GRO_CB(skb)->count = gro_count;
l3_type = hnae3_get_field(l234info, HNS3_RXD_L3ID_M,
else if (l3_type == HNS3_L3_TYPE_IPV6)
skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
else
- return;
+ return -EFAULT;
skb_shinfo(skb)->gso_size = hnae3_get_field(bd_base_info,
HNS3_RXD_GRO_SIZE_M,
HNS3_RXD_GRO_SIZE_S);
- if (skb_shinfo(skb)->gso_size)
- tcp_gro_complete(skb);
+
+ return hns3_gro_complete(skb);
}
static void hns3_set_rx_skb_rss_type(struct hns3_enet_ring *ring,
skb_set_hash(skb, le32_to_cpu(desc->rx.rss_hash), rss_type);
}
-static int hns3_handle_rx_bd(struct hns3_enet_ring *ring,
- struct sk_buff **out_skb)
+static int hns3_handle_bdinfo(struct hns3_enet_ring *ring, struct sk_buff *skb,
+ struct hns3_desc *desc)
{
struct net_device *netdev = ring->tqp->handle->kinfo.netdev;
+ u32 bd_base_info = le32_to_cpu(desc->rx.bd_base_info);
+ u32 l234info = le32_to_cpu(desc->rx.l234_info);
enum hns3_pkt_l2t_type l2_frame_type;
+ unsigned int len;
+ int ret;
+
+ /* Based on hw strategy, the tag offloaded will be stored at
+ * ot_vlan_tag in two layer tag case, and stored at vlan_tag
+ * in one layer tag case.
+ */
+ if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX) {
+ u16 vlan_tag;
+
+ if (hns3_parse_vlan_tag(ring, desc, l234info, &vlan_tag))
+ __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
+ vlan_tag);
+ }
+
+ if (unlikely(!(bd_base_info & BIT(HNS3_RXD_VLD_B)))) {
+ u64_stats_update_begin(&ring->syncp);
+ ring->stats.non_vld_descs++;
+ u64_stats_update_end(&ring->syncp);
+
+ return -EINVAL;
+ }
+
+ if (unlikely(!desc->rx.pkt_len || (l234info & (BIT(HNS3_RXD_TRUNCAT_B) |
+ BIT(HNS3_RXD_L2E_B))))) {
+ u64_stats_update_begin(&ring->syncp);
+ if (l234info & BIT(HNS3_RXD_L2E_B))
+ ring->stats.l2_err++;
+ else
+ ring->stats.err_pkt_len++;
+ u64_stats_update_end(&ring->syncp);
+
+ return -EFAULT;
+ }
+
+ len = skb->len;
+
+ /* Do update ip stack process */
+ skb->protocol = eth_type_trans(skb, netdev);
+
+ /* This is needed in order to enable forwarding support */
+ ret = hns3_set_gro_and_checksum(ring, skb, l234info, bd_base_info);
+ if (unlikely(ret)) {
+ u64_stats_update_begin(&ring->syncp);
+ ring->stats.rx_err_cnt++;
+ u64_stats_update_end(&ring->syncp);
+ return ret;
+ }
+
+ l2_frame_type = hnae3_get_field(l234info, HNS3_RXD_DMAC_M,
+ HNS3_RXD_DMAC_S);
+
+ u64_stats_update_begin(&ring->syncp);
+ ring->stats.rx_pkts++;
+ ring->stats.rx_bytes += len;
+
+ if (l2_frame_type == HNS3_L2_TYPE_MULTICAST)
+ ring->stats.rx_multicast++;
+
+ u64_stats_update_end(&ring->syncp);
+
+ ring->tqp_vector->rx_group.total_bytes += len;
+ return 0;
+}
+
+static int hns3_handle_rx_bd(struct hns3_enet_ring *ring,
+ struct sk_buff **out_skb)
+{
struct sk_buff *skb = ring->skb;
struct hns3_desc_cb *desc_cb;
struct hns3_desc *desc;
u32 bd_base_info;
- u32 l234info;
int length;
int ret;
ALIGN(ring->pull_len, sizeof(long)));
}
- l234info = le32_to_cpu(desc->rx.l234_info);
- bd_base_info = le32_to_cpu(desc->rx.bd_base_info);
-
- /* Based on hw strategy, the tag offloaded will be stored at
- * ot_vlan_tag in two layer tag case, and stored at vlan_tag
- * in one layer tag case.
- */
- if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX) {
- u16 vlan_tag;
-
- if (hns3_parse_vlan_tag(ring, desc, l234info, &vlan_tag))
- __vlan_hwaccel_put_tag(skb,
- htons(ETH_P_8021Q),
- vlan_tag);
- }
-
- if (unlikely(!(bd_base_info & BIT(HNS3_RXD_VLD_B)))) {
- u64_stats_update_begin(&ring->syncp);
- ring->stats.non_vld_descs++;
- u64_stats_update_end(&ring->syncp);
-
+ ret = hns3_handle_bdinfo(ring, skb, desc);
+ if (unlikely(ret)) {
dev_kfree_skb_any(skb);
- return -EINVAL;
- }
-
- if (unlikely((!desc->rx.pkt_len) ||
- (l234info & (BIT(HNS3_RXD_TRUNCAT_B) |
- BIT(HNS3_RXD_L2E_B))))) {
- u64_stats_update_begin(&ring->syncp);
- if (l234info & BIT(HNS3_RXD_L2E_B))
- ring->stats.l2_err++;
- else
- ring->stats.err_pkt_len++;
- u64_stats_update_end(&ring->syncp);
-
- dev_kfree_skb_any(skb);
- return -EFAULT;
+ return ret;
}
-
- l2_frame_type = hnae3_get_field(l234info, HNS3_RXD_DMAC_M,
- HNS3_RXD_DMAC_S);
- u64_stats_update_begin(&ring->syncp);
- if (l2_frame_type == HNS3_L2_TYPE_MULTICAST)
- ring->stats.rx_multicast++;
-
- ring->stats.rx_pkts++;
- ring->stats.rx_bytes += skb->len;
- u64_stats_update_end(&ring->syncp);
-
- ring->tqp_vector->rx_group.total_bytes += skb->len;
-
- /* This is needed in order to enable forwarding support */
- hns3_set_gro_param(skb, l234info, bd_base_info);
-
- hns3_rx_checksum(ring, skb, desc);
*out_skb = skb;
hns3_set_rx_skb_rss_type(ring, skb);
void (*rx_fn)(struct hns3_enet_ring *, struct sk_buff *))
{
#define RCB_NOF_ALLOC_RX_BUFF_ONCE 16
- struct net_device *netdev = ring->tqp->handle->kinfo.netdev;
int recv_pkts, recv_bds, clean_count, err;
int unused_count = hns3_desc_unused(ring) - ring->pending_buf;
struct sk_buff *skb = ring->skb;
continue;
}
- /* Do update ip stack process */
- skb->protocol = eth_type_trans(skb, netdev);
rx_fn(ring, skb);
recv_bds += ring->pending_buf;
clean_count += ring->pending_buf;
struct hns3_enet_tqp_vector *tqp_vector =
container_of(napi, struct hns3_enet_tqp_vector, napi);
bool clean_complete = true;
- int rx_budget;
+ int rx_budget = budget;
if (unlikely(test_bit(HNS3_NIC_STATE_DOWN, &priv->state))) {
napi_complete(napi);
hns3_clean_tx_ring(ring);
/* make sure rx ring budget not smaller than 1 */
- rx_budget = max(budget / tqp_vector->num_tqps, 1);
+ if (tqp_vector->num_tqps > 1)
+ rx_budget = max(budget / tqp_vector->num_tqps, 1);
hns3_for_each_ring(ring, tqp_vector->rx_group) {
int rx_cleaned = hns3_clean_rx_ring(ring, rx_budget,
struct hns3_nic_priv *priv = netdev_priv(netdev);
int ret;
- hns3_client_stop(handle);
-
hns3_remove_hw_addr(netdev);
if (netdev->reg_state != NETREG_UNINITIALIZED)
unregister_netdev(netdev);
+ hns3_client_stop(handle);
+
if (!test_and_clear_bit(HNS3_NIC_STATE_INITED, &priv->state)) {
netdev_warn(netdev, "already uninitialized\n");
goto out_netdev_free;
struct netdev_hw_addr *ha, *tmp;
int ret = 0;
+ netif_addr_lock_bh(ndev);
/* go through and sync uc_addr entries to the device */
list = &ndev->uc;
list_for_each_entry_safe(ha, tmp, &list->list, list) {
ret = hns3_nic_uc_sync(ndev, ha->addr);
if (ret)
- return ret;
+ goto out;
}
/* go through and sync mc_addr entries to the device */
list_for_each_entry_safe(ha, tmp, &list->list, list) {
ret = hns3_nic_mc_sync(ndev, ha->addr);
if (ret)
- return ret;
+ goto out;
}
+out:
+ netif_addr_unlock_bh(ndev);
return ret;
}
hns3_nic_uc_unsync(netdev, netdev->dev_addr);
+ netif_addr_lock_bh(netdev);
/* go through and unsync uc_addr entries to the device */
list = &netdev->uc;
list_for_each_entry_safe(ha, tmp, &list->list, list)
list_for_each_entry_safe(ha, tmp, &list->list, list)
if (ha->refcount > 1)
hns3_nic_mc_unsync(netdev, ha->addr);
+
+ netif_addr_unlock_bh(netdev);
}
static void hns3_clear_tx_ring(struct hns3_enet_ring *ring)
ring_ptr_move_fw(ring, next_to_use);
}
+ /* Free the pending skb in rx ring */
+ if (ring->skb) {
+ dev_kfree_skb_any(ring->skb);
+ ring->skb = NULL;
+ ring->pending_buf = 0;
+ }
+
return 0;
}
if (ret)
goto err_uninit_vector;
+ ret = hns3_client_start(handle);
+ if (ret) {
+ dev_err(priv->dev, "hns3_client_start fail! ret=%d\n", ret);
+ goto err_uninit_ring;
+ }
+
set_bit(HNS3_NIC_STATE_INITED, &priv->state);
return ret;
+err_uninit_ring:
+ hns3_uninit_all_ring(priv);
err_uninit_vector:
hns3_nic_uninit_vector_data(priv);
priv->ring_data = NULL;
struct hns3_nic_priv *priv = netdev_priv(netdev);
int ret;
- if (!test_bit(HNS3_NIC_STATE_INITED, &priv->state)) {
+ if (!test_and_clear_bit(HNS3_NIC_STATE_INITED, &priv->state)) {
netdev_warn(netdev, "already uninitialized\n");
return 0;
}
hns3_put_ring_config(priv);
priv->ring_data = NULL;
- clear_bit(HNS3_NIC_STATE_INITED, &priv->state);
-
return ret;
}
unsigned char *hdr;
};
-/* the distance between [begin, end) in a ring buffer
- * note: there is a unuse slot between the begin and the end
- */
-static inline int ring_dist(struct hns3_enet_ring *ring, int begin, int end)
-{
- return (end - begin + ring->desc_num) % ring->desc_num;
-}
-
static inline int ring_space(struct hns3_enet_ring *ring)
{
- return ring->desc_num -
- ring_dist(ring, ring->next_to_clean, ring->next_to_use) - 1;
+ int begin = ring->next_to_clean;
+ int end = ring->next_to_use;
+
+ return ((end >= begin) ? (ring->desc_num - end + begin) :
+ (begin - end)) - 1;
}
static inline int is_ring_empty(struct hns3_enet_ring *ring)
struct hnae3_handle *h = hns3_get_handle(netdev);
u64 *p = data;
+ if (hns3_nic_resetting(netdev)) {
+ netdev_err(netdev, "dev resetting, could not get stats\n");
+ return;
+ }
+
if (!h->ae_algo->ops->get_stats || !h->ae_algo->ops->update_stats) {
netdev_err(netdev, "could not get any statistics\n");
return;
static int hns3_set_link_ksettings(struct net_device *netdev,
const struct ethtool_link_ksettings *cmd)
{
+ /* Chip doesn't support this mode. */
+ if (cmd->base.speed == SPEED_1000 && cmd->base.duplex == DUPLEX_HALF)
+ return -EINVAL;
+
/* Only support ksettings_set for netdev with phy attached for now */
if (netdev->phydev)
return phy_ethtool_ksettings_set(netdev->phydev, cmd);
int ret;
spin_lock_bh(&hdev->hw.cmq.csq.lock);
- spin_lock_bh(&hdev->hw.cmq.crq.lock);
+ spin_lock(&hdev->hw.cmq.crq.lock);
hdev->hw.cmq.csq.next_to_clean = 0;
hdev->hw.cmq.csq.next_to_use = 0;
hclge_cmd_init_regs(&hdev->hw);
- spin_unlock_bh(&hdev->hw.cmq.crq.lock);
+ spin_unlock(&hdev->hw.cmq.crq.lock);
spin_unlock_bh(&hdev->hw.cmq.csq.lock);
clear_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state);
* reset may happen when lower level reset is being processed.
*/
if ((hclge_is_reset_pending(hdev))) {
- set_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state);
- return -EBUSY;
+ ret = -EBUSY;
+ goto err_cmd_init;
}
ret = hclge_cmd_query_firmware_version(&hdev->hw, &version);
if (ret) {
dev_err(&hdev->pdev->dev,
"firmware version query failed %d\n", ret);
- return ret;
+ goto err_cmd_init;
}
hdev->fw_version = version;
dev_info(&hdev->pdev->dev, "The firmware version is %08x\n", version);
return 0;
+
+err_cmd_init:
+ set_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state);
+
+ return ret;
}
static void hclge_cmd_uninit_regs(struct hclge_hw *hw)
spin_unlock(&ring->lock);
}
-void hclge_destroy_cmd_queue(struct hclge_hw *hw)
+static void hclge_destroy_cmd_queue(struct hclge_hw *hw)
{
hclge_destroy_queue(&hw->cmq.csq);
hclge_destroy_queue(&hw->cmq.crq);
#include "hclge_err.h"
static const struct hclge_hw_error hclge_imp_tcm_ecc_int[] = {
- { .int_msk = BIT(1), .msg = "imp_itcm0_ecc_mbit_err" },
- { .int_msk = BIT(3), .msg = "imp_itcm1_ecc_mbit_err" },
- { .int_msk = BIT(5), .msg = "imp_itcm2_ecc_mbit_err" },
- { .int_msk = BIT(7), .msg = "imp_itcm3_ecc_mbit_err" },
- { .int_msk = BIT(9), .msg = "imp_dtcm0_mem0_ecc_mbit_err" },
- { .int_msk = BIT(11), .msg = "imp_dtcm0_mem1_ecc_mbit_err" },
- { .int_msk = BIT(13), .msg = "imp_dtcm1_mem0_ecc_mbit_err" },
- { .int_msk = BIT(15), .msg = "imp_dtcm1_mem1_ecc_mbit_err" },
- { .int_msk = BIT(17), .msg = "imp_itcm4_ecc_mbit_err" },
+ { .int_msk = BIT(1), .msg = "imp_itcm0_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(3), .msg = "imp_itcm1_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(5), .msg = "imp_itcm2_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(7), .msg = "imp_itcm3_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(9), .msg = "imp_dtcm0_mem0_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(11), .msg = "imp_dtcm0_mem1_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(13), .msg = "imp_dtcm1_mem0_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(15), .msg = "imp_dtcm1_mem1_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(17), .msg = "imp_itcm4_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
{ /* sentinel */ }
};
static const struct hclge_hw_error hclge_cmdq_nic_mem_ecc_int[] = {
- { .int_msk = BIT(1), .msg = "cmdq_nic_rx_depth_ecc_mbit_err" },
- { .int_msk = BIT(3), .msg = "cmdq_nic_tx_depth_ecc_mbit_err" },
- { .int_msk = BIT(5), .msg = "cmdq_nic_rx_tail_ecc_mbit_err" },
- { .int_msk = BIT(7), .msg = "cmdq_nic_tx_tail_ecc_mbit_err" },
- { .int_msk = BIT(9), .msg = "cmdq_nic_rx_head_ecc_mbit_err" },
- { .int_msk = BIT(11), .msg = "cmdq_nic_tx_head_ecc_mbit_err" },
- { .int_msk = BIT(13), .msg = "cmdq_nic_rx_addr_ecc_mbit_err" },
- { .int_msk = BIT(15), .msg = "cmdq_nic_tx_addr_ecc_mbit_err" },
- { .int_msk = BIT(17), .msg = "cmdq_rocee_rx_depth_ecc_mbit_err" },
- { .int_msk = BIT(19), .msg = "cmdq_rocee_tx_depth_ecc_mbit_err" },
- { .int_msk = BIT(21), .msg = "cmdq_rocee_rx_tail_ecc_mbit_err" },
- { .int_msk = BIT(23), .msg = "cmdq_rocee_tx_tail_ecc_mbit_err" },
- { .int_msk = BIT(25), .msg = "cmdq_rocee_rx_head_ecc_mbit_err" },
- { .int_msk = BIT(27), .msg = "cmdq_rocee_tx_head_ecc_mbit_err" },
- { .int_msk = BIT(29), .msg = "cmdq_rocee_rx_addr_ecc_mbit_err" },
- { .int_msk = BIT(31), .msg = "cmdq_rocee_tx_addr_ecc_mbit_err" },
+ { .int_msk = BIT(1), .msg = "cmdq_nic_rx_depth_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(3), .msg = "cmdq_nic_tx_depth_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(5), .msg = "cmdq_nic_rx_tail_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(7), .msg = "cmdq_nic_tx_tail_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(9), .msg = "cmdq_nic_rx_head_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(11), .msg = "cmdq_nic_tx_head_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(13), .msg = "cmdq_nic_rx_addr_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(15), .msg = "cmdq_nic_tx_addr_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(17), .msg = "cmdq_rocee_rx_depth_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(19), .msg = "cmdq_rocee_tx_depth_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(21), .msg = "cmdq_rocee_rx_tail_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(23), .msg = "cmdq_rocee_tx_tail_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(25), .msg = "cmdq_rocee_rx_head_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(27), .msg = "cmdq_rocee_tx_head_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(29), .msg = "cmdq_rocee_rx_addr_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(31), .msg = "cmdq_rocee_tx_addr_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
{ /* sentinel */ }
};
static const struct hclge_hw_error hclge_tqp_int_ecc_int[] = {
- { .int_msk = BIT(6), .msg = "tqp_int_cfg_even_ecc_mbit_err" },
- { .int_msk = BIT(7), .msg = "tqp_int_cfg_odd_ecc_mbit_err" },
- { .int_msk = BIT(8), .msg = "tqp_int_ctrl_even_ecc_mbit_err" },
- { .int_msk = BIT(9), .msg = "tqp_int_ctrl_odd_ecc_mbit_err" },
- { .int_msk = BIT(10), .msg = "tx_que_scan_int_ecc_mbit_err" },
- { .int_msk = BIT(11), .msg = "rx_que_scan_int_ecc_mbit_err" },
+ { .int_msk = BIT(6), .msg = "tqp_int_cfg_even_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(7), .msg = "tqp_int_cfg_odd_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(8), .msg = "tqp_int_ctrl_even_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(9), .msg = "tqp_int_ctrl_odd_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(10), .msg = "tx_que_scan_int_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(11), .msg = "rx_que_scan_int_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
{ /* sentinel */ }
};
static const struct hclge_hw_error hclge_msix_sram_ecc_int[] = {
- { .int_msk = BIT(1), .msg = "msix_nic_ecc_mbit_err" },
- { .int_msk = BIT(3), .msg = "msix_rocee_ecc_mbit_err" },
+ { .int_msk = BIT(1), .msg = "msix_nic_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(3), .msg = "msix_rocee_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
{ /* sentinel */ }
};
static const struct hclge_hw_error hclge_igu_int[] = {
- { .int_msk = BIT(0), .msg = "igu_rx_buf0_ecc_mbit_err" },
- { .int_msk = BIT(2), .msg = "igu_rx_buf1_ecc_mbit_err" },
+ { .int_msk = BIT(0), .msg = "igu_rx_buf0_ecc_mbit_err",
+ .reset_level = HNAE3_CORE_RESET },
+ { .int_msk = BIT(2), .msg = "igu_rx_buf1_ecc_mbit_err",
+ .reset_level = HNAE3_CORE_RESET },
{ /* sentinel */ }
};
static const struct hclge_hw_error hclge_igu_egu_tnl_int[] = {
- { .int_msk = BIT(0), .msg = "rx_buf_overflow" },
- { .int_msk = BIT(1), .msg = "rx_stp_fifo_overflow" },
- { .int_msk = BIT(2), .msg = "rx_stp_fifo_undeflow" },
- { .int_msk = BIT(3), .msg = "tx_buf_overflow" },
- { .int_msk = BIT(4), .msg = "tx_buf_underrun" },
- { .int_msk = BIT(5), .msg = "rx_stp_buf_overflow" },
+ { .int_msk = BIT(0), .msg = "rx_buf_overflow",
+ .reset_level = HNAE3_CORE_RESET },
+ { .int_msk = BIT(1), .msg = "rx_stp_fifo_overflow",
+ .reset_level = HNAE3_CORE_RESET },
+ { .int_msk = BIT(2), .msg = "rx_stp_fifo_undeflow",
+ .reset_level = HNAE3_CORE_RESET },
+ { .int_msk = BIT(3), .msg = "tx_buf_overflow",
+ .reset_level = HNAE3_CORE_RESET },
+ { .int_msk = BIT(4), .msg = "tx_buf_underrun",
+ .reset_level = HNAE3_CORE_RESET },
+ { .int_msk = BIT(5), .msg = "rx_stp_buf_overflow",
+ .reset_level = HNAE3_CORE_RESET },
{ /* sentinel */ }
};
static const struct hclge_hw_error hclge_ncsi_err_int[] = {
- { .int_msk = BIT(1), .msg = "ncsi_tx_ecc_mbit_err" },
+ { .int_msk = BIT(1), .msg = "ncsi_tx_ecc_mbit_err",
+ .reset_level = HNAE3_NONE_RESET },
{ /* sentinel */ }
};
static const struct hclge_hw_error hclge_ppp_mpf_abnormal_int_st1[] = {
- { .int_msk = BIT(0), .msg = "vf_vlan_ad_mem_ecc_mbit_err" },
- { .int_msk = BIT(1), .msg = "umv_mcast_group_mem_ecc_mbit_err" },
- { .int_msk = BIT(2), .msg = "umv_key_mem0_ecc_mbit_err" },
- { .int_msk = BIT(3), .msg = "umv_key_mem1_ecc_mbit_err" },
- { .int_msk = BIT(4), .msg = "umv_key_mem2_ecc_mbit_err" },
- { .int_msk = BIT(5), .msg = "umv_key_mem3_ecc_mbit_err" },
- { .int_msk = BIT(6), .msg = "umv_ad_mem_ecc_mbit_err" },
- { .int_msk = BIT(7), .msg = "rss_tc_mode_mem_ecc_mbit_err" },
- { .int_msk = BIT(8), .msg = "rss_idt_mem0_ecc_mbit_err" },
- { .int_msk = BIT(9), .msg = "rss_idt_mem1_ecc_mbit_err" },
- { .int_msk = BIT(10), .msg = "rss_idt_mem2_ecc_mbit_err" },
- { .int_msk = BIT(11), .msg = "rss_idt_mem3_ecc_mbit_err" },
- { .int_msk = BIT(12), .msg = "rss_idt_mem4_ecc_mbit_err" },
- { .int_msk = BIT(13), .msg = "rss_idt_mem5_ecc_mbit_err" },
- { .int_msk = BIT(14), .msg = "rss_idt_mem6_ecc_mbit_err" },
- { .int_msk = BIT(15), .msg = "rss_idt_mem7_ecc_mbit_err" },
- { .int_msk = BIT(16), .msg = "rss_idt_mem8_ecc_mbit_err" },
- { .int_msk = BIT(17), .msg = "rss_idt_mem9_ecc_mbit_err" },
- { .int_msk = BIT(18), .msg = "rss_idt_mem10_ecc_m1bit_err" },
- { .int_msk = BIT(19), .msg = "rss_idt_mem11_ecc_mbit_err" },
- { .int_msk = BIT(20), .msg = "rss_idt_mem12_ecc_mbit_err" },
- { .int_msk = BIT(21), .msg = "rss_idt_mem13_ecc_mbit_err" },
- { .int_msk = BIT(22), .msg = "rss_idt_mem14_ecc_mbit_err" },
- { .int_msk = BIT(23), .msg = "rss_idt_mem15_ecc_mbit_err" },
- { .int_msk = BIT(24), .msg = "port_vlan_mem_ecc_mbit_err" },
- { .int_msk = BIT(25), .msg = "mcast_linear_table_mem_ecc_mbit_err" },
- { .int_msk = BIT(26), .msg = "mcast_result_mem_ecc_mbit_err" },
- { .int_msk = BIT(27),
- .msg = "flow_director_ad_mem0_ecc_mbit_err" },
- { .int_msk = BIT(28),
- .msg = "flow_director_ad_mem1_ecc_mbit_err" },
- { .int_msk = BIT(29),
- .msg = "rx_vlan_tag_memory_ecc_mbit_err" },
- { .int_msk = BIT(30),
- .msg = "Tx_UP_mapping_config_mem_ecc_mbit_err" },
+ { .int_msk = BIT(0), .msg = "vf_vlan_ad_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(1), .msg = "umv_mcast_group_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(2), .msg = "umv_key_mem0_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(3), .msg = "umv_key_mem1_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(4), .msg = "umv_key_mem2_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(5), .msg = "umv_key_mem3_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(6), .msg = "umv_ad_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(7), .msg = "rss_tc_mode_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(8), .msg = "rss_idt_mem0_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(9), .msg = "rss_idt_mem1_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(10), .msg = "rss_idt_mem2_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(11), .msg = "rss_idt_mem3_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(12), .msg = "rss_idt_mem4_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(13), .msg = "rss_idt_mem5_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(14), .msg = "rss_idt_mem6_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(15), .msg = "rss_idt_mem7_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(16), .msg = "rss_idt_mem8_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(17), .msg = "rss_idt_mem9_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(18), .msg = "rss_idt_mem10_ecc_m1bit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(19), .msg = "rss_idt_mem11_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(20), .msg = "rss_idt_mem12_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(21), .msg = "rss_idt_mem13_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(22), .msg = "rss_idt_mem14_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(23), .msg = "rss_idt_mem15_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(24), .msg = "port_vlan_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(25), .msg = "mcast_linear_table_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(26), .msg = "mcast_result_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(27), .msg = "flow_director_ad_mem0_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(28), .msg = "flow_director_ad_mem1_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(29), .msg = "rx_vlan_tag_memory_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(30), .msg = "Tx_UP_mapping_config_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
{ /* sentinel */ }
};
static const struct hclge_hw_error hclge_ppp_pf_abnormal_int[] = {
- { .int_msk = BIT(0), .msg = "tx_vlan_tag_err" },
- { .int_msk = BIT(1), .msg = "rss_list_tc_unassigned_queue_err" },
+ { .int_msk = BIT(0), .msg = "tx_vlan_tag_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(1), .msg = "rss_list_tc_unassigned_queue_err",
+ .reset_level = HNAE3_NONE_RESET },
{ /* sentinel */ }
};
static const struct hclge_hw_error hclge_ppp_mpf_abnormal_int_st3[] = {
- { .int_msk = BIT(0), .msg = "hfs_fifo_mem_ecc_mbit_err" },
- { .int_msk = BIT(1), .msg = "rslt_descr_fifo_mem_ecc_mbit_err" },
- { .int_msk = BIT(2), .msg = "tx_vlan_tag_mem_ecc_mbit_err" },
- { .int_msk = BIT(3), .msg = "FD_CN0_memory_ecc_mbit_err" },
- { .int_msk = BIT(4), .msg = "FD_CN1_memory_ecc_mbit_err" },
- { .int_msk = BIT(5), .msg = "GRO_AD_memory_ecc_mbit_err" },
+ { .int_msk = BIT(0), .msg = "hfs_fifo_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(1), .msg = "rslt_descr_fifo_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(2), .msg = "tx_vlan_tag_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(3), .msg = "FD_CN0_memory_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(4), .msg = "FD_CN1_memory_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(5), .msg = "GRO_AD_memory_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
{ /* sentinel */ }
};
static const struct hclge_hw_error hclge_tm_sch_rint[] = {
- { .int_msk = BIT(1), .msg = "tm_sch_ecc_mbit_err" },
- { .int_msk = BIT(2), .msg = "tm_sch_port_shap_sub_fifo_wr_err" },
- { .int_msk = BIT(3), .msg = "tm_sch_port_shap_sub_fifo_rd_err" },
- { .int_msk = BIT(4), .msg = "tm_sch_pg_pshap_sub_fifo_wr_err" },
- { .int_msk = BIT(5), .msg = "tm_sch_pg_pshap_sub_fifo_rd_err" },
- { .int_msk = BIT(6), .msg = "tm_sch_pg_cshap_sub_fifo_wr_err" },
- { .int_msk = BIT(7), .msg = "tm_sch_pg_cshap_sub_fifo_rd_err" },
- { .int_msk = BIT(8), .msg = "tm_sch_pri_pshap_sub_fifo_wr_err" },
- { .int_msk = BIT(9), .msg = "tm_sch_pri_pshap_sub_fifo_rd_err" },
- { .int_msk = BIT(10), .msg = "tm_sch_pri_cshap_sub_fifo_wr_err" },
- { .int_msk = BIT(11), .msg = "tm_sch_pri_cshap_sub_fifo_rd_err" },
- { .int_msk = BIT(12),
- .msg = "tm_sch_port_shap_offset_fifo_wr_err" },
- { .int_msk = BIT(13),
- .msg = "tm_sch_port_shap_offset_fifo_rd_err" },
- { .int_msk = BIT(14),
- .msg = "tm_sch_pg_pshap_offset_fifo_wr_err" },
- { .int_msk = BIT(15),
- .msg = "tm_sch_pg_pshap_offset_fifo_rd_err" },
- { .int_msk = BIT(16),
- .msg = "tm_sch_pg_cshap_offset_fifo_wr_err" },
- { .int_msk = BIT(17),
- .msg = "tm_sch_pg_cshap_offset_fifo_rd_err" },
- { .int_msk = BIT(18),
- .msg = "tm_sch_pri_pshap_offset_fifo_wr_err" },
- { .int_msk = BIT(19),
- .msg = "tm_sch_pri_pshap_offset_fifo_rd_err" },
- { .int_msk = BIT(20),
- .msg = "tm_sch_pri_cshap_offset_fifo_wr_err" },
- { .int_msk = BIT(21),
- .msg = "tm_sch_pri_cshap_offset_fifo_rd_err" },
- { .int_msk = BIT(22), .msg = "tm_sch_rq_fifo_wr_err" },
- { .int_msk = BIT(23), .msg = "tm_sch_rq_fifo_rd_err" },
- { .int_msk = BIT(24), .msg = "tm_sch_nq_fifo_wr_err" },
- { .int_msk = BIT(25), .msg = "tm_sch_nq_fifo_rd_err" },
- { .int_msk = BIT(26), .msg = "tm_sch_roce_up_fifo_wr_err" },
- { .int_msk = BIT(27), .msg = "tm_sch_roce_up_fifo_rd_err" },
- { .int_msk = BIT(28), .msg = "tm_sch_rcb_byte_fifo_wr_err" },
- { .int_msk = BIT(29), .msg = "tm_sch_rcb_byte_fifo_rd_err" },
- { .int_msk = BIT(30), .msg = "tm_sch_ssu_byte_fifo_wr_err" },
- { .int_msk = BIT(31), .msg = "tm_sch_ssu_byte_fifo_rd_err" },
+ { .int_msk = BIT(1), .msg = "tm_sch_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(2), .msg = "tm_sch_port_shap_sub_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(3), .msg = "tm_sch_port_shap_sub_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(4), .msg = "tm_sch_pg_pshap_sub_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(5), .msg = "tm_sch_pg_pshap_sub_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(6), .msg = "tm_sch_pg_cshap_sub_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(7), .msg = "tm_sch_pg_cshap_sub_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(8), .msg = "tm_sch_pri_pshap_sub_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(9), .msg = "tm_sch_pri_pshap_sub_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(10), .msg = "tm_sch_pri_cshap_sub_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(11), .msg = "tm_sch_pri_cshap_sub_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(12), .msg = "tm_sch_port_shap_offset_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(13), .msg = "tm_sch_port_shap_offset_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(14), .msg = "tm_sch_pg_pshap_offset_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(15), .msg = "tm_sch_pg_pshap_offset_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(16), .msg = "tm_sch_pg_cshap_offset_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(17), .msg = "tm_sch_pg_cshap_offset_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(18), .msg = "tm_sch_pri_pshap_offset_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(19), .msg = "tm_sch_pri_pshap_offset_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(20), .msg = "tm_sch_pri_cshap_offset_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(21), .msg = "tm_sch_pri_cshap_offset_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(22), .msg = "tm_sch_rq_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(23), .msg = "tm_sch_rq_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(24), .msg = "tm_sch_nq_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(25), .msg = "tm_sch_nq_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(26), .msg = "tm_sch_roce_up_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(27), .msg = "tm_sch_roce_up_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(28), .msg = "tm_sch_rcb_byte_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(29), .msg = "tm_sch_rcb_byte_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(30), .msg = "tm_sch_ssu_byte_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(31), .msg = "tm_sch_ssu_byte_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
{ /* sentinel */ }
};
static const struct hclge_hw_error hclge_qcn_fifo_rint[] = {
- { .int_msk = BIT(0), .msg = "qcn_shap_gp0_sch_fifo_rd_err" },
- { .int_msk = BIT(1), .msg = "qcn_shap_gp0_sch_fifo_wr_err" },
- { .int_msk = BIT(2), .msg = "qcn_shap_gp1_sch_fifo_rd_err" },
- { .int_msk = BIT(3), .msg = "qcn_shap_gp1_sch_fifo_wr_err" },
- { .int_msk = BIT(4), .msg = "qcn_shap_gp2_sch_fifo_rd_err" },
- { .int_msk = BIT(5), .msg = "qcn_shap_gp2_sch_fifo_wr_err" },
- { .int_msk = BIT(6), .msg = "qcn_shap_gp3_sch_fifo_rd_err" },
- { .int_msk = BIT(7), .msg = "qcn_shap_gp3_sch_fifo_wr_err" },
- { .int_msk = BIT(8), .msg = "qcn_shap_gp0_offset_fifo_rd_err" },
- { .int_msk = BIT(9), .msg = "qcn_shap_gp0_offset_fifo_wr_err" },
- { .int_msk = BIT(10), .msg = "qcn_shap_gp1_offset_fifo_rd_err" },
- { .int_msk = BIT(11), .msg = "qcn_shap_gp1_offset_fifo_wr_err" },
- { .int_msk = BIT(12), .msg = "qcn_shap_gp2_offset_fifo_rd_err" },
- { .int_msk = BIT(13), .msg = "qcn_shap_gp2_offset_fifo_wr_err" },
- { .int_msk = BIT(14), .msg = "qcn_shap_gp3_offset_fifo_rd_err" },
- { .int_msk = BIT(15), .msg = "qcn_shap_gp3_offset_fifo_wr_err" },
- { .int_msk = BIT(16), .msg = "qcn_byte_info_fifo_rd_err" },
- { .int_msk = BIT(17), .msg = "qcn_byte_info_fifo_wr_err" },
+ { .int_msk = BIT(0), .msg = "qcn_shap_gp0_sch_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(1), .msg = "qcn_shap_gp0_sch_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(2), .msg = "qcn_shap_gp1_sch_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(3), .msg = "qcn_shap_gp1_sch_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(4), .msg = "qcn_shap_gp2_sch_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(5), .msg = "qcn_shap_gp2_sch_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(6), .msg = "qcn_shap_gp3_sch_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(7), .msg = "qcn_shap_gp3_sch_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(8), .msg = "qcn_shap_gp0_offset_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(9), .msg = "qcn_shap_gp0_offset_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(10), .msg = "qcn_shap_gp1_offset_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(11), .msg = "qcn_shap_gp1_offset_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(12), .msg = "qcn_shap_gp2_offset_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(13), .msg = "qcn_shap_gp2_offset_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(14), .msg = "qcn_shap_gp3_offset_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(15), .msg = "qcn_shap_gp3_offset_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(16), .msg = "qcn_byte_info_fifo_rd_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(17), .msg = "qcn_byte_info_fifo_wr_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
{ /* sentinel */ }
};
static const struct hclge_hw_error hclge_qcn_ecc_rint[] = {
- { .int_msk = BIT(1), .msg = "qcn_byte_mem_ecc_mbit_err" },
- { .int_msk = BIT(3), .msg = "qcn_time_mem_ecc_mbit_err" },
- { .int_msk = BIT(5), .msg = "qcn_fb_mem_ecc_mbit_err" },
- { .int_msk = BIT(7), .msg = "qcn_link_mem_ecc_mbit_err" },
- { .int_msk = BIT(9), .msg = "qcn_rate_mem_ecc_mbit_err" },
- { .int_msk = BIT(11), .msg = "qcn_tmplt_mem_ecc_mbit_err" },
- { .int_msk = BIT(13), .msg = "qcn_shap_cfg_mem_ecc_mbit_err" },
- { .int_msk = BIT(15), .msg = "qcn_gp0_barrel_mem_ecc_mbit_err" },
- { .int_msk = BIT(17), .msg = "qcn_gp1_barrel_mem_ecc_mbit_err" },
- { .int_msk = BIT(19), .msg = "qcn_gp2_barrel_mem_ecc_mbit_err" },
- { .int_msk = BIT(21), .msg = "qcn_gp3_barral_mem_ecc_mbit_err" },
+ { .int_msk = BIT(1), .msg = "qcn_byte_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(3), .msg = "qcn_time_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(5), .msg = "qcn_fb_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(7), .msg = "qcn_link_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(9), .msg = "qcn_rate_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(11), .msg = "qcn_tmplt_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(13), .msg = "qcn_shap_cfg_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(15), .msg = "qcn_gp0_barrel_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(17), .msg = "qcn_gp1_barrel_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(19), .msg = "qcn_gp2_barrel_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(21), .msg = "qcn_gp3_barral_mem_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
{ /* sentinel */ }
};
static const struct hclge_hw_error hclge_mac_afifo_tnl_int[] = {
- { .int_msk = BIT(0), .msg = "egu_cge_afifo_ecc_1bit_err" },
- { .int_msk = BIT(1), .msg = "egu_cge_afifo_ecc_mbit_err" },
- { .int_msk = BIT(2), .msg = "egu_lge_afifo_ecc_1bit_err" },
- { .int_msk = BIT(3), .msg = "egu_lge_afifo_ecc_mbit_err" },
- { .int_msk = BIT(4), .msg = "cge_igu_afifo_ecc_1bit_err" },
- { .int_msk = BIT(5), .msg = "cge_igu_afifo_ecc_mbit_err" },
- { .int_msk = BIT(6), .msg = "lge_igu_afifo_ecc_1bit_err" },
- { .int_msk = BIT(7), .msg = "lge_igu_afifo_ecc_mbit_err" },
- { .int_msk = BIT(8), .msg = "cge_igu_afifo_overflow_err" },
- { .int_msk = BIT(9), .msg = "lge_igu_afifo_overflow_err" },
- { .int_msk = BIT(10), .msg = "egu_cge_afifo_underrun_err" },
- { .int_msk = BIT(11), .msg = "egu_lge_afifo_underrun_err" },
- { .int_msk = BIT(12), .msg = "egu_ge_afifo_underrun_err" },
- { .int_msk = BIT(13), .msg = "ge_igu_afifo_overflow_err" },
+ { .int_msk = BIT(0), .msg = "egu_cge_afifo_ecc_1bit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(1), .msg = "egu_cge_afifo_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(2), .msg = "egu_lge_afifo_ecc_1bit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(3), .msg = "egu_lge_afifo_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(4), .msg = "cge_igu_afifo_ecc_1bit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(5), .msg = "cge_igu_afifo_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(6), .msg = "lge_igu_afifo_ecc_1bit_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(7), .msg = "lge_igu_afifo_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(8), .msg = "cge_igu_afifo_overflow_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(9), .msg = "lge_igu_afifo_overflow_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(10), .msg = "egu_cge_afifo_underrun_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(11), .msg = "egu_lge_afifo_underrun_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(12), .msg = "egu_ge_afifo_underrun_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(13), .msg = "ge_igu_afifo_overflow_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
{ /* sentinel */ }
};
static const struct hclge_hw_error hclge_ppu_mpf_abnormal_int_st2[] = {
- { .int_msk = BIT(13), .msg = "rpu_rx_pkt_bit32_ecc_mbit_err" },
- { .int_msk = BIT(14), .msg = "rpu_rx_pkt_bit33_ecc_mbit_err" },
- { .int_msk = BIT(15), .msg = "rpu_rx_pkt_bit34_ecc_mbit_err" },
- { .int_msk = BIT(16), .msg = "rpu_rx_pkt_bit35_ecc_mbit_err" },
- { .int_msk = BIT(17), .msg = "rcb_tx_ring_ecc_mbit_err" },
- { .int_msk = BIT(18), .msg = "rcb_rx_ring_ecc_mbit_err" },
- { .int_msk = BIT(19), .msg = "rcb_tx_fbd_ecc_mbit_err" },
- { .int_msk = BIT(20), .msg = "rcb_rx_ebd_ecc_mbit_err" },
- { .int_msk = BIT(21), .msg = "rcb_tso_info_ecc_mbit_err" },
- { .int_msk = BIT(22), .msg = "rcb_tx_int_info_ecc_mbit_err" },
- { .int_msk = BIT(23), .msg = "rcb_rx_int_info_ecc_mbit_err" },
- { .int_msk = BIT(24), .msg = "tpu_tx_pkt_0_ecc_mbit_err" },
- { .int_msk = BIT(25), .msg = "tpu_tx_pkt_1_ecc_mbit_err" },
- { .int_msk = BIT(26), .msg = "rd_bus_err" },
- { .int_msk = BIT(27), .msg = "wr_bus_err" },
- { .int_msk = BIT(28), .msg = "reg_search_miss" },
- { .int_msk = BIT(29), .msg = "rx_q_search_miss" },
- { .int_msk = BIT(30), .msg = "ooo_ecc_err_detect" },
- { .int_msk = BIT(31), .msg = "ooo_ecc_err_multpl" },
+ { .int_msk = BIT(13), .msg = "rpu_rx_pkt_bit32_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(14), .msg = "rpu_rx_pkt_bit33_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(15), .msg = "rpu_rx_pkt_bit34_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(16), .msg = "rpu_rx_pkt_bit35_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(17), .msg = "rcb_tx_ring_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(18), .msg = "rcb_rx_ring_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(19), .msg = "rcb_tx_fbd_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(20), .msg = "rcb_rx_ebd_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(21), .msg = "rcb_tso_info_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(22), .msg = "rcb_tx_int_info_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(23), .msg = "rcb_rx_int_info_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(24), .msg = "tpu_tx_pkt_0_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(25), .msg = "tpu_tx_pkt_1_ecc_mbit_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(26), .msg = "rd_bus_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(27), .msg = "wr_bus_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(28), .msg = "reg_search_miss",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(29), .msg = "rx_q_search_miss",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(30), .msg = "ooo_ecc_err_detect",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(31), .msg = "ooo_ecc_err_multpl",
+ .reset_level = HNAE3_GLOBAL_RESET },
{ /* sentinel */ }
};
static const struct hclge_hw_error hclge_ppu_mpf_abnormal_int_st3[] = {
- { .int_msk = BIT(4), .msg = "gro_bd_ecc_mbit_err" },
- { .int_msk = BIT(5), .msg = "gro_context_ecc_mbit_err" },
- { .int_msk = BIT(6), .msg = "rx_stash_cfg_ecc_mbit_err" },
- { .int_msk = BIT(7), .msg = "axi_rd_fbd_ecc_mbit_err" },
+ { .int_msk = BIT(4), .msg = "gro_bd_ecc_mbit_err",
+ .reset_level = HNAE3_CORE_RESET },
+ { .int_msk = BIT(5), .msg = "gro_context_ecc_mbit_err",
+ .reset_level = HNAE3_CORE_RESET },
+ { .int_msk = BIT(6), .msg = "rx_stash_cfg_ecc_mbit_err",
+ .reset_level = HNAE3_CORE_RESET },
+ { .int_msk = BIT(7), .msg = "axi_rd_fbd_ecc_mbit_err",
+ .reset_level = HNAE3_CORE_RESET },
{ /* sentinel */ }
};
static const struct hclge_hw_error hclge_ppu_pf_abnormal_int[] = {
- { .int_msk = BIT(0), .msg = "over_8bd_no_fe" },
- { .int_msk = BIT(1), .msg = "tso_mss_cmp_min_err" },
- { .int_msk = BIT(2), .msg = "tso_mss_cmp_max_err" },
- { .int_msk = BIT(3), .msg = "tx_rd_fbd_poison" },
- { .int_msk = BIT(4), .msg = "rx_rd_ebd_poison" },
- { .int_msk = BIT(5), .msg = "buf_wait_timeout" },
+ { .int_msk = BIT(0), .msg = "over_8bd_no_fe",
+ .reset_level = HNAE3_FUNC_RESET },
+ { .int_msk = BIT(1), .msg = "tso_mss_cmp_min_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(2), .msg = "tso_mss_cmp_max_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(3), .msg = "tx_rd_fbd_poison",
+ .reset_level = HNAE3_FUNC_RESET },
+ { .int_msk = BIT(4), .msg = "rx_rd_ebd_poison",
+ .reset_level = HNAE3_FUNC_RESET },
+ { .int_msk = BIT(5), .msg = "buf_wait_timeout",
+ .reset_level = HNAE3_NONE_RESET },
{ /* sentinel */ }
};
static const struct hclge_hw_error hclge_ssu_com_err_int[] = {
- { .int_msk = BIT(0), .msg = "buf_sum_err" },
- { .int_msk = BIT(1), .msg = "ppp_mb_num_err" },
- { .int_msk = BIT(2), .msg = "ppp_mbid_err" },
- { .int_msk = BIT(3), .msg = "ppp_rlt_mac_err" },
- { .int_msk = BIT(4), .msg = "ppp_rlt_host_err" },
- { .int_msk = BIT(5), .msg = "cks_edit_position_err" },
- { .int_msk = BIT(6), .msg = "cks_edit_condition_err" },
- { .int_msk = BIT(7), .msg = "vlan_edit_condition_err" },
- { .int_msk = BIT(8), .msg = "vlan_num_ot_err" },
- { .int_msk = BIT(9), .msg = "vlan_num_in_err" },
+ { .int_msk = BIT(0), .msg = "buf_sum_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(1), .msg = "ppp_mb_num_err",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(2), .msg = "ppp_mbid_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(3), .msg = "ppp_rlt_mac_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(4), .msg = "ppp_rlt_host_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(5), .msg = "cks_edit_position_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(6), .msg = "cks_edit_condition_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(7), .msg = "vlan_edit_condition_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(8), .msg = "vlan_num_ot_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(9), .msg = "vlan_num_in_err",
+ .reset_level = HNAE3_GLOBAL_RESET },
{ /* sentinel */ }
};
#define HCLGE_SSU_MEM_ECC_ERR(x) \
- { .int_msk = BIT(x), .msg = "ssu_mem" #x "_ecc_mbit_err" }
+ { .int_msk = BIT(x), .msg = "ssu_mem" #x "_ecc_mbit_err", \
+ .reset_level = HNAE3_GLOBAL_RESET }
static const struct hclge_hw_error hclge_ssu_mem_ecc_err_int[] = {
HCLGE_SSU_MEM_ECC_ERR(0),
};
static const struct hclge_hw_error hclge_ssu_port_based_err_int[] = {
- { .int_msk = BIT(0), .msg = "roc_pkt_without_key_port" },
- { .int_msk = BIT(1), .msg = "tpu_pkt_without_key_port" },
- { .int_msk = BIT(2), .msg = "igu_pkt_without_key_port" },
- { .int_msk = BIT(3), .msg = "roc_eof_mis_match_port" },
- { .int_msk = BIT(4), .msg = "tpu_eof_mis_match_port" },
- { .int_msk = BIT(5), .msg = "igu_eof_mis_match_port" },
- { .int_msk = BIT(6), .msg = "roc_sof_mis_match_port" },
- { .int_msk = BIT(7), .msg = "tpu_sof_mis_match_port" },
- { .int_msk = BIT(8), .msg = "igu_sof_mis_match_port" },
- { .int_msk = BIT(11), .msg = "ets_rd_int_rx_port" },
- { .int_msk = BIT(12), .msg = "ets_wr_int_rx_port" },
- { .int_msk = BIT(13), .msg = "ets_rd_int_tx_port" },
- { .int_msk = BIT(14), .msg = "ets_wr_int_tx_port" },
+ { .int_msk = BIT(0), .msg = "roc_pkt_without_key_port",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(1), .msg = "tpu_pkt_without_key_port",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(2), .msg = "igu_pkt_without_key_port",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(3), .msg = "roc_eof_mis_match_port",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(4), .msg = "tpu_eof_mis_match_port",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(5), .msg = "igu_eof_mis_match_port",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(6), .msg = "roc_sof_mis_match_port",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(7), .msg = "tpu_sof_mis_match_port",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(8), .msg = "igu_sof_mis_match_port",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(11), .msg = "ets_rd_int_rx_port",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(12), .msg = "ets_wr_int_rx_port",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(13), .msg = "ets_rd_int_tx_port",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(14), .msg = "ets_wr_int_tx_port",
+ .reset_level = HNAE3_GLOBAL_RESET },
{ /* sentinel */ }
};
static const struct hclge_hw_error hclge_ssu_fifo_overflow_int[] = {
- { .int_msk = BIT(0), .msg = "ig_mac_inf_int" },
- { .int_msk = BIT(1), .msg = "ig_host_inf_int" },
- { .int_msk = BIT(2), .msg = "ig_roc_buf_int" },
- { .int_msk = BIT(3), .msg = "ig_host_data_fifo_int" },
- { .int_msk = BIT(4), .msg = "ig_host_key_fifo_int" },
- { .int_msk = BIT(5), .msg = "tx_qcn_fifo_int" },
- { .int_msk = BIT(6), .msg = "rx_qcn_fifo_int" },
- { .int_msk = BIT(7), .msg = "tx_pf_rd_fifo_int" },
- { .int_msk = BIT(8), .msg = "rx_pf_rd_fifo_int" },
- { .int_msk = BIT(9), .msg = "qm_eof_fifo_int" },
- { .int_msk = BIT(10), .msg = "mb_rlt_fifo_int" },
- { .int_msk = BIT(11), .msg = "dup_uncopy_fifo_int" },
- { .int_msk = BIT(12), .msg = "dup_cnt_rd_fifo_int" },
- { .int_msk = BIT(13), .msg = "dup_cnt_drop_fifo_int" },
- { .int_msk = BIT(14), .msg = "dup_cnt_wrb_fifo_int" },
- { .int_msk = BIT(15), .msg = "host_cmd_fifo_int" },
- { .int_msk = BIT(16), .msg = "mac_cmd_fifo_int" },
- { .int_msk = BIT(17), .msg = "host_cmd_bitmap_empty_int" },
- { .int_msk = BIT(18), .msg = "mac_cmd_bitmap_empty_int" },
- { .int_msk = BIT(19), .msg = "dup_bitmap_empty_int" },
- { .int_msk = BIT(20), .msg = "out_queue_bitmap_empty_int" },
- { .int_msk = BIT(21), .msg = "bank2_bitmap_empty_int" },
- { .int_msk = BIT(22), .msg = "bank1_bitmap_empty_int" },
- { .int_msk = BIT(23), .msg = "bank0_bitmap_empty_int" },
+ { .int_msk = BIT(0), .msg = "ig_mac_inf_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(1), .msg = "ig_host_inf_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(2), .msg = "ig_roc_buf_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(3), .msg = "ig_host_data_fifo_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(4), .msg = "ig_host_key_fifo_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(5), .msg = "tx_qcn_fifo_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(6), .msg = "rx_qcn_fifo_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(7), .msg = "tx_pf_rd_fifo_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(8), .msg = "rx_pf_rd_fifo_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(9), .msg = "qm_eof_fifo_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(10), .msg = "mb_rlt_fifo_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(11), .msg = "dup_uncopy_fifo_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(12), .msg = "dup_cnt_rd_fifo_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(13), .msg = "dup_cnt_drop_fifo_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(14), .msg = "dup_cnt_wrb_fifo_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(15), .msg = "host_cmd_fifo_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(16), .msg = "mac_cmd_fifo_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(17), .msg = "host_cmd_bitmap_empty_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(18), .msg = "mac_cmd_bitmap_empty_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(19), .msg = "dup_bitmap_empty_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(20), .msg = "out_queue_bitmap_empty_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(21), .msg = "bank2_bitmap_empty_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(22), .msg = "bank1_bitmap_empty_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(23), .msg = "bank0_bitmap_empty_int",
+ .reset_level = HNAE3_GLOBAL_RESET },
{ /* sentinel */ }
};
static const struct hclge_hw_error hclge_ssu_ets_tcg_int[] = {
- { .int_msk = BIT(0), .msg = "ets_rd_int_rx_tcg" },
- { .int_msk = BIT(1), .msg = "ets_wr_int_rx_tcg" },
- { .int_msk = BIT(2), .msg = "ets_rd_int_tx_tcg" },
- { .int_msk = BIT(3), .msg = "ets_wr_int_tx_tcg" },
+ { .int_msk = BIT(0), .msg = "ets_rd_int_rx_tcg",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(1), .msg = "ets_wr_int_rx_tcg",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(2), .msg = "ets_rd_int_tx_tcg",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(3), .msg = "ets_wr_int_tx_tcg",
+ .reset_level = HNAE3_GLOBAL_RESET },
{ /* sentinel */ }
};
static const struct hclge_hw_error hclge_ssu_port_based_pf_int[] = {
- { .int_msk = BIT(0), .msg = "roc_pkt_without_key_port" },
- { .int_msk = BIT(9), .msg = "low_water_line_err_port" },
- { .int_msk = BIT(10), .msg = "hi_water_line_err_port" },
+ { .int_msk = BIT(0), .msg = "roc_pkt_without_key_port",
+ .reset_level = HNAE3_GLOBAL_RESET },
+ { .int_msk = BIT(9), .msg = "low_water_line_err_port",
+ .reset_level = HNAE3_NONE_RESET },
+ { .int_msk = BIT(10), .msg = "hi_water_line_err_port",
+ .reset_level = HNAE3_GLOBAL_RESET },
{ /* sentinel */ }
};
{ /* sentinel */ }
};
-static void hclge_log_error(struct device *dev, char *reg,
- const struct hclge_hw_error *err,
- u32 err_sts)
+static enum hnae3_reset_type hclge_log_error(struct device *dev, char *reg,
+ const struct hclge_hw_error *err,
+ u32 err_sts)
{
+ enum hnae3_reset_type reset_level = HNAE3_FUNC_RESET;
+ bool need_reset = false;
+
while (err->msg) {
- if (err->int_msk & err_sts)
+ if (err->int_msk & err_sts) {
dev_warn(dev, "%s %s found [error status=0x%x]\n",
reg, err->msg, err_sts);
+ if (err->reset_level != HNAE3_NONE_RESET &&
+ err->reset_level >= reset_level) {
+ reset_level = err->reset_level;
+ need_reset = true;
+ }
+ }
err++;
}
+ if (need_reset)
+ return reset_level;
+ else
+ return HNAE3_NONE_RESET;
}
/* hclge_cmd_query_error: read the error information
int num)
{
struct hnae3_ae_dev *ae_dev = hdev->ae_dev;
+ enum hnae3_reset_type reset_level;
struct device *dev = &hdev->pdev->dev;
__le32 *desc_data;
u32 status;
/* log HNS common errors */
status = le32_to_cpu(desc[0].data[0]);
if (status) {
- hclge_log_error(dev, "IMP_TCM_ECC_INT_STS",
- &hclge_imp_tcm_ecc_int[0], status);
- HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_GLOBAL_RESET);
+ reset_level = hclge_log_error(dev, "IMP_TCM_ECC_INT_STS",
+ &hclge_imp_tcm_ecc_int[0],
+ status);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level);
}
status = le32_to_cpu(desc[0].data[1]);
if (status) {
- hclge_log_error(dev, "CMDQ_MEM_ECC_INT_STS",
- &hclge_cmdq_nic_mem_ecc_int[0], status);
- HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_GLOBAL_RESET);
+ reset_level = hclge_log_error(dev, "CMDQ_MEM_ECC_INT_STS",
+ &hclge_cmdq_nic_mem_ecc_int[0],
+ status);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level);
}
if ((le32_to_cpu(desc[0].data[2])) & BIT(0)) {
dev_warn(dev, "imp_rd_data_poison_err found\n");
- HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_GLOBAL_RESET);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_NONE_RESET);
}
status = le32_to_cpu(desc[0].data[3]);
if (status) {
- hclge_log_error(dev, "TQP_INT_ECC_INT_STS",
- &hclge_tqp_int_ecc_int[0], status);
- HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_CORE_RESET);
+ reset_level = hclge_log_error(dev, "TQP_INT_ECC_INT_STS",
+ &hclge_tqp_int_ecc_int[0],
+ status);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level);
}
status = le32_to_cpu(desc[0].data[4]);
if (status) {
- hclge_log_error(dev, "MSIX_ECC_INT_STS",
- &hclge_msix_sram_ecc_int[0], status);
- HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_CORE_RESET);
+ reset_level = hclge_log_error(dev, "MSIX_ECC_INT_STS",
+ &hclge_msix_sram_ecc_int[0],
+ status);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level);
}
/* log SSU(Storage Switch Unit) errors */
desc_data = (__le32 *)&desc[2];
status = le32_to_cpu(*(desc_data + 2));
if (status) {
- hclge_log_error(dev, "SSU_ECC_MULTI_BIT_INT_0",
- &hclge_ssu_mem_ecc_err_int[0], status);
- HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_CORE_RESET);
+ reset_level = hclge_log_error(dev, "SSU_ECC_MULTI_BIT_INT_0",
+ &hclge_ssu_mem_ecc_err_int[0],
+ status);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level);
}
status = le32_to_cpu(*(desc_data + 3)) & BIT(0);
if (status) {
dev_warn(dev, "SSU_ECC_MULTI_BIT_INT_1 ssu_mem32_ecc_mbit_err found [error status=0x%x]\n",
status);
- HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_CORE_RESET);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_GLOBAL_RESET);
}
status = le32_to_cpu(*(desc_data + 4)) & HCLGE_SSU_COMMON_ERR_INT_MASK;
if (status) {
- hclge_log_error(dev, "SSU_COMMON_ERR_INT",
- &hclge_ssu_com_err_int[0], status);
- HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_GLOBAL_RESET);
+ reset_level = hclge_log_error(dev, "SSU_COMMON_ERR_INT",
+ &hclge_ssu_com_err_int[0],
+ status);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level);
}
/* log IGU(Ingress Unit) errors */
desc_data = (__le32 *)&desc[3];
status = le32_to_cpu(*desc_data) & HCLGE_IGU_INT_MASK;
- if (status)
- hclge_log_error(dev, "IGU_INT_STS",
- &hclge_igu_int[0], status);
+ if (status) {
+ reset_level = hclge_log_error(dev, "IGU_INT_STS",
+ &hclge_igu_int[0], status);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level);
+ }
/* log PPP(Programmable Packet Process) errors */
desc_data = (__le32 *)&desc[4];
status = le32_to_cpu(*(desc_data + 1));
- if (status)
- hclge_log_error(dev, "PPP_MPF_ABNORMAL_INT_ST1",
- &hclge_ppp_mpf_abnormal_int_st1[0], status);
+ if (status) {
+ reset_level =
+ hclge_log_error(dev, "PPP_MPF_ABNORMAL_INT_ST1",
+ &hclge_ppp_mpf_abnormal_int_st1[0],
+ status);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level);
+ }
status = le32_to_cpu(*(desc_data + 3)) & HCLGE_PPP_MPF_INT_ST3_MASK;
- if (status)
- hclge_log_error(dev, "PPP_MPF_ABNORMAL_INT_ST3",
- &hclge_ppp_mpf_abnormal_int_st3[0], status);
+ if (status) {
+ reset_level =
+ hclge_log_error(dev, "PPP_MPF_ABNORMAL_INT_ST3",
+ &hclge_ppp_mpf_abnormal_int_st3[0],
+ status);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level);
+ }
/* log PPU(RCB) errors */
desc_data = (__le32 *)&desc[5];
if (status) {
dev_warn(dev, "PPU_MPF_ABNORMAL_INT_ST1 %s found\n",
"rpu_rx_pkt_ecc_mbit_err");
- HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_CORE_RESET);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_GLOBAL_RESET);
}
status = le32_to_cpu(*(desc_data + 2));
if (status) {
- hclge_log_error(dev, "PPU_MPF_ABNORMAL_INT_ST2",
- &hclge_ppu_mpf_abnormal_int_st2[0], status);
- HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_CORE_RESET);
+ reset_level =
+ hclge_log_error(dev, "PPU_MPF_ABNORMAL_INT_ST2",
+ &hclge_ppu_mpf_abnormal_int_st2[0],
+ status);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level);
}
status = le32_to_cpu(*(desc_data + 3)) & HCLGE_PPU_MPF_INT_ST3_MASK;
if (status) {
- hclge_log_error(dev, "PPU_MPF_ABNORMAL_INT_ST3",
- &hclge_ppu_mpf_abnormal_int_st3[0], status);
- HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_CORE_RESET);
+ reset_level =
+ hclge_log_error(dev, "PPU_MPF_ABNORMAL_INT_ST3",
+ &hclge_ppu_mpf_abnormal_int_st3[0],
+ status);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level);
}
/* log TM(Traffic Manager) errors */
desc_data = (__le32 *)&desc[6];
status = le32_to_cpu(*desc_data);
if (status) {
- hclge_log_error(dev, "TM_SCH_RINT",
- &hclge_tm_sch_rint[0], status);
- HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_CORE_RESET);
+ reset_level = hclge_log_error(dev, "TM_SCH_RINT",
+ &hclge_tm_sch_rint[0], status);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level);
}
/* log QCN(Quantized Congestion Control) errors */
desc_data = (__le32 *)&desc[7];
status = le32_to_cpu(*desc_data) & HCLGE_QCN_FIFO_INT_MASK;
if (status) {
- hclge_log_error(dev, "QCN_FIFO_RINT",
- &hclge_qcn_fifo_rint[0], status);
- HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_CORE_RESET);
+ reset_level = hclge_log_error(dev, "QCN_FIFO_RINT",
+ &hclge_qcn_fifo_rint[0], status);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level);
}
status = le32_to_cpu(*(desc_data + 1)) & HCLGE_QCN_ECC_INT_MASK;
if (status) {
- hclge_log_error(dev, "QCN_ECC_RINT",
- &hclge_qcn_ecc_rint[0], status);
- HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_CORE_RESET);
+ reset_level = hclge_log_error(dev, "QCN_ECC_RINT",
+ &hclge_qcn_ecc_rint[0],
+ status);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level);
}
/* log NCSI errors */
desc_data = (__le32 *)&desc[9];
status = le32_to_cpu(*desc_data) & HCLGE_NCSI_ECC_INT_MASK;
if (status) {
- hclge_log_error(dev, "NCSI_ECC_INT_RPT",
- &hclge_ncsi_err_int[0], status);
- HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_CORE_RESET);
+ reset_level = hclge_log_error(dev, "NCSI_ECC_INT_RPT",
+ &hclge_ncsi_err_int[0], status);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level);
}
/* clear all main PF RAS errors */
{
struct hnae3_ae_dev *ae_dev = hdev->ae_dev;
struct device *dev = &hdev->pdev->dev;
+ enum hnae3_reset_type reset_level;
__le32 *desc_data;
u32 status;
int ret;
/* log SSU(Storage Switch Unit) errors */
status = le32_to_cpu(desc[0].data[0]);
if (status) {
- hclge_log_error(dev, "SSU_PORT_BASED_ERR_INT",
- &hclge_ssu_port_based_err_int[0], status);
- HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_GLOBAL_RESET);
+ reset_level = hclge_log_error(dev, "SSU_PORT_BASED_ERR_INT",
+ &hclge_ssu_port_based_err_int[0],
+ status);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level);
}
status = le32_to_cpu(desc[0].data[1]);
if (status) {
- hclge_log_error(dev, "SSU_FIFO_OVERFLOW_INT",
- &hclge_ssu_fifo_overflow_int[0], status);
- HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_GLOBAL_RESET);
+ reset_level = hclge_log_error(dev, "SSU_FIFO_OVERFLOW_INT",
+ &hclge_ssu_fifo_overflow_int[0],
+ status);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level);
}
status = le32_to_cpu(desc[0].data[2]);
if (status) {
- hclge_log_error(dev, "SSU_ETS_TCG_INT",
- &hclge_ssu_ets_tcg_int[0], status);
- HCLGE_SET_DEFAULT_RESET_REQUEST(HNAE3_GLOBAL_RESET);
+ reset_level = hclge_log_error(dev, "SSU_ETS_TCG_INT",
+ &hclge_ssu_ets_tcg_int[0],
+ status);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level);
}
/* log IGU(Ingress Unit) EGU(Egress Unit) TNL errors */
desc_data = (__le32 *)&desc[1];
status = le32_to_cpu(*desc_data) & HCLGE_IGU_EGU_TNL_INT_MASK;
- if (status)
- hclge_log_error(dev, "IGU_EGU_TNL_INT_STS",
- &hclge_igu_egu_tnl_int[0], status);
+ if (status) {
+ reset_level = hclge_log_error(dev, "IGU_EGU_TNL_INT_STS",
+ &hclge_igu_egu_tnl_int[0],
+ status);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level);
+ }
/* log PPU(RCB) errors */
desc_data = (__le32 *)&desc[3];
status = le32_to_cpu(*desc_data) & HCLGE_PPU_PF_INT_RAS_MASK;
- if (status)
- hclge_log_error(dev, "PPU_PF_ABNORMAL_INT_ST0",
- &hclge_ppu_pf_abnormal_int[0], status);
+ if (status) {
+ reset_level = hclge_log_error(dev, "PPU_PF_ABNORMAL_INT_ST0",
+ &hclge_ppu_pf_abnormal_int[0],
+ status);
+ HCLGE_SET_DEFAULT_RESET_REQUEST(reset_level);
+ }
/* clear all PF RAS errors */
hclge_cmd_reuse_desc(&desc[0], false);
{
struct device *dev = &hdev->pdev->dev;
u32 mpf_bd_num, pf_bd_num, bd_num;
+ enum hnae3_reset_type reset_level;
struct hclge_desc desc_bd;
struct hclge_desc *desc;
__le32 *desc_data;
- int ret = 0;
u32 status;
-
- /* set default handling */
- set_bit(HNAE3_FUNC_RESET, reset_requests);
+ int ret;
/* query the number of bds for the MSIx int status */
hclge_cmd_setup_basic_desc(&desc_bd, HCLGE_QUERY_MSIX_INT_STS_BD_NUM,
desc_data = (__le32 *)&desc[1];
status = le32_to_cpu(*desc_data);
if (status) {
- hclge_log_error(dev, "MAC_AFIFO_TNL_INT_R",
- &hclge_mac_afifo_tnl_int[0], status);
- set_bit(HNAE3_GLOBAL_RESET, reset_requests);
+ reset_level = hclge_log_error(dev, "MAC_AFIFO_TNL_INT_R",
+ &hclge_mac_afifo_tnl_int[0],
+ status);
+ set_bit(reset_level, reset_requests);
}
/* log PPU(RCB) MPF errors */
status = le32_to_cpu(*(desc_data + 2)) &
HCLGE_PPU_MPF_INT_ST2_MSIX_MASK;
if (status) {
- hclge_log_error(dev, "PPU_MPF_ABNORMAL_INT_ST2",
- &hclge_ppu_mpf_abnormal_int_st2[0], status);
- set_bit(HNAE3_CORE_RESET, reset_requests);
+ reset_level =
+ hclge_log_error(dev, "PPU_MPF_ABNORMAL_INT_ST2",
+ &hclge_ppu_mpf_abnormal_int_st2[0],
+ status);
+ set_bit(reset_level, reset_requests);
}
/* clear all main PF MSIx errors */
/* log SSU PF errors */
status = le32_to_cpu(desc[0].data[0]) & HCLGE_SSU_PORT_INT_MSIX_MASK;
if (status) {
- hclge_log_error(dev, "SSU_PORT_BASED_ERR_INT",
- &hclge_ssu_port_based_pf_int[0], status);
- set_bit(HNAE3_GLOBAL_RESET, reset_requests);
+ reset_level = hclge_log_error(dev, "SSU_PORT_BASED_ERR_INT",
+ &hclge_ssu_port_based_pf_int[0],
+ status);
+ set_bit(reset_level, reset_requests);
}
/* read and log PPP PF errors */
desc_data = (__le32 *)&desc[2];
status = le32_to_cpu(*desc_data);
- if (status)
- hclge_log_error(dev, "PPP_PF_ABNORMAL_INT_ST0",
- &hclge_ppp_pf_abnormal_int[0], status);
+ if (status) {
+ reset_level = hclge_log_error(dev, "PPP_PF_ABNORMAL_INT_ST0",
+ &hclge_ppp_pf_abnormal_int[0],
+ status);
+ set_bit(reset_level, reset_requests);
+ }
/* log PPU(RCB) PF errors */
desc_data = (__le32 *)&desc[3];
status = le32_to_cpu(*desc_data) & HCLGE_PPU_PF_INT_MSIX_MASK;
- if (status)
- hclge_log_error(dev, "PPU_PF_ABNORMAL_INT_ST",
- &hclge_ppu_pf_abnormal_int[0], status);
+ if (status) {
+ reset_level = hclge_log_error(dev, "PPU_PF_ABNORMAL_INT_ST",
+ &hclge_ppu_pf_abnormal_int[0],
+ status);
+ set_bit(reset_level, reset_requests);
+ }
/* clear all PF MSIx errors */
hclge_cmd_reuse_desc(&desc[0], false);
struct hclge_hw_error {
u32 int_msk;
const char *msg;
+ enum hnae3_reset_type reset_level;
};
int hclge_hw_error_set_state(struct hclge_dev *hdev, bool state);
#include <linux/pci.h>
#include <linux/platform_device.h>
#include <linux/if_vlan.h>
+#include <linux/crash_dump.h>
#include <net/rtnetlink.h>
#include "hclge_cmd.h"
#include "hclge_dcb.h"
static int hclge_set_mac_mtu(struct hclge_dev *hdev, int new_mps);
static int hclge_init_vlan_config(struct hclge_dev *hdev);
static int hclge_reset_ae_dev(struct hnae3_ae_dev *ae_dev);
+static bool hclge_get_hw_reset_stat(struct hnae3_handle *handle);
static int hclge_set_umv_space(struct hclge_dev *hdev, u16 space_size,
u16 *allocated_size, bool is_alloc);
return ret;
}
+static void hclge_init_kdump_kernel_config(struct hclge_dev *hdev)
+{
+#define HCLGE_MIN_TX_DESC 64
+#define HCLGE_MIN_RX_DESC 64
+
+ if (!is_kdump_kernel())
+ return;
+
+ dev_info(&hdev->pdev->dev,
+ "Running kdump kernel. Using minimal resources\n");
+
+ /* minimal queue pairs equals to the number of vports */
+ hdev->num_tqps = hdev->num_vmdq_vport + hdev->num_req_vfs + 1;
+ hdev->num_tx_desc = HCLGE_MIN_TX_DESC;
+ hdev->num_rx_desc = HCLGE_MIN_RX_DESC;
+}
+
static int hclge_configure(struct hclge_dev *hdev)
{
struct hclge_cfg cfg;
hdev->tx_sch_mode = HCLGE_FLAG_TC_BASE_SCH_MODE;
+ hclge_init_kdump_kernel_config(hdev);
+
return ret;
}
vport->back = hdev;
vport->vport_id = i;
vport->mps = HCLGE_MAC_DEFAULT_FRAME;
+ vport->port_base_vlan_cfg.state = HNAE3_PORT_BASE_VLAN_DISABLE;
+ vport->rxvlan_cfg.rx_vlan_offload_en = true;
INIT_LIST_HEAD(&vport->vlan_list);
INIT_LIST_HEAD(&vport->uc_mac_list);
INIT_LIST_HEAD(&vport->mc_mac_list);
return ret;
}
-static int hclge_get_tc_num(struct hclge_dev *hdev)
+static u32 hclge_get_tc_num(struct hclge_dev *hdev)
{
int i, cnt = 0;
return cnt;
}
-static int hclge_get_pfc_enalbe_num(struct hclge_dev *hdev)
-{
- int i, cnt = 0;
-
- for (i = 0; i < HCLGE_MAX_TC_NUM; i++)
- if (hdev->hw_tc_map & BIT(i) &&
- hdev->tm_info.hw_pfc_map & BIT(i))
- cnt++;
- return cnt;
-}
-
/* Get the number of pfc enabled TCs, which have private buffer */
static int hclge_get_pfc_priv_num(struct hclge_dev *hdev,
struct hclge_pkt_buf_alloc *buf_alloc)
struct hclge_pkt_buf_alloc *buf_alloc,
u32 rx_all)
{
- u32 shared_buf_min, shared_buf_tc, shared_std;
- int tc_num, pfc_enable_num;
+ u32 shared_buf_min, shared_buf_tc, shared_std, hi_thrd, lo_thrd;
+ u32 tc_num = hclge_get_tc_num(hdev);
u32 shared_buf, aligned_mps;
u32 rx_priv;
int i;
- tc_num = hclge_get_tc_num(hdev);
- pfc_enable_num = hclge_get_pfc_enalbe_num(hdev);
aligned_mps = roundup(hdev->mps, HCLGE_BUF_SIZE_UNIT);
if (hnae3_dev_dcb_supported(hdev))
shared_buf_min = aligned_mps + HCLGE_NON_DCB_ADDITIONAL_BUF
+ hdev->dv_buf_size;
- shared_buf_tc = pfc_enable_num * aligned_mps +
- (tc_num - pfc_enable_num) * aligned_mps / 2 +
- aligned_mps;
+ shared_buf_tc = tc_num * aligned_mps + aligned_mps;
shared_std = roundup(max_t(u32, shared_buf_min, shared_buf_tc),
HCLGE_BUF_SIZE_UNIT);
} else {
buf_alloc->s_buf.self.high = aligned_mps +
HCLGE_NON_DCB_ADDITIONAL_BUF;
- buf_alloc->s_buf.self.low =
- roundup(aligned_mps / 2, HCLGE_BUF_SIZE_UNIT);
+ buf_alloc->s_buf.self.low = aligned_mps;
+ }
+
+ if (hnae3_dev_dcb_supported(hdev)) {
+ if (tc_num)
+ hi_thrd = (shared_buf - hdev->dv_buf_size) / tc_num;
+ else
+ hi_thrd = shared_buf - hdev->dv_buf_size;
+
+ hi_thrd = max_t(u32, hi_thrd, 2 * aligned_mps);
+ hi_thrd = rounddown(hi_thrd, HCLGE_BUF_SIZE_UNIT);
+ lo_thrd = hi_thrd - aligned_mps / 2;
+ } else {
+ hi_thrd = aligned_mps + HCLGE_NON_DCB_ADDITIONAL_BUF;
+ lo_thrd = aligned_mps;
}
for (i = 0; i < HCLGE_MAX_TC_NUM; i++) {
- if ((hdev->hw_tc_map & BIT(i)) &&
- (hdev->tm_info.hw_pfc_map & BIT(i))) {
- buf_alloc->s_buf.tc_thrd[i].low = aligned_mps;
- buf_alloc->s_buf.tc_thrd[i].high = 2 * aligned_mps;
- } else {
- buf_alloc->s_buf.tc_thrd[i].low = 0;
- buf_alloc->s_buf.tc_thrd[i].high = aligned_mps;
- }
+ buf_alloc->s_buf.tc_thrd[i].low = lo_thrd;
+ buf_alloc->s_buf.tc_thrd[i].high = hi_thrd;
}
return true;
static void hclge_mbx_task_schedule(struct hclge_dev *hdev)
{
- if (!test_and_set_bit(HCLGE_STATE_MBX_SERVICE_SCHED, &hdev->state))
+ if (!test_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state) &&
+ !test_and_set_bit(HCLGE_STATE_MBX_SERVICE_SCHED, &hdev->state))
schedule_work(&hdev->mbx_service_task);
}
return ret;
}
- if (!reset)
+ if (!reset || !test_bit(HCLGE_VPORT_STATE_ALIVE, &vport->state))
continue;
/* Inform VF to process the reset.
static void hclge_do_reset(struct hclge_dev *hdev)
{
+ struct hnae3_handle *handle = &hdev->vport[0].nic;
struct pci_dev *pdev = hdev->pdev;
u32 val;
+ if (hclge_get_hw_reset_stat(handle)) {
+ dev_info(&pdev->dev, "Hardware reset not finish\n");
+ dev_info(&pdev->dev, "func_rst_reg:0x%x, global_rst_reg:0x%x\n",
+ hclge_read_dev(&hdev->hw, HCLGE_FUN_RST_ING),
+ hclge_read_dev(&hdev->hw, HCLGE_GLOBAL_RESET_REG));
+ return;
+ }
+
switch (hdev->reset_type) {
case HNAE3_GLOBAL_RESET:
val = hclge_read_dev(&hdev->hw, HCLGE_GLOBAL_RESET_REG);
clear_bit(HNAE3_FLR_RESET, addr);
}
+ if (hdev->reset_type != HNAE3_NONE_RESET &&
+ rst_level < hdev->reset_type)
+ return HNAE3_NONE_RESET;
+
return rst_level;
}
hdev->last_reset_time = jiffies;
hdev->reset_fail_cnt = 0;
ae_dev->reset_type = HNAE3_NONE_RESET;
+ del_timer(&hdev->reset_timer);
return;
}
/* check if we just hit the duplicate */
- if (!ret)
- ret = -EINVAL;
+ if (!ret) {
+ dev_warn(&hdev->pdev->dev, "VF %d mac(%pM) exists\n",
+ vport->vport_id, addr);
+ return 0;
+ }
dev_err(&hdev->pdev->dev,
"PF failed to add unicast entry(%pM) in the MAC table\n",
return -EINVAL;
}
- if (!is_first && hclge_rm_uc_addr(handle, hdev->hw.mac.mac_addr))
+ if ((!is_first || is_kdump_kernel()) &&
+ hclge_rm_uc_addr(handle, hdev->hw.mac.mac_addr))
dev_warn(&hdev->pdev->dev,
"remove old uc mac address fail.\n");
return ret;
}
-int hclge_set_vlan_filter(struct hnae3_handle *handle, __be16 proto,
- u16 vlan_id, bool is_kill)
-{
- struct hclge_vport *vport = hclge_get_vport(handle);
- struct hclge_dev *hdev = vport->back;
-
- return hclge_set_vlan_filter_hw(hdev, proto, vport->vport_id, vlan_id,
- 0, is_kill);
-}
-
-static int hclge_set_vf_vlan_filter(struct hnae3_handle *handle, int vfid,
- u16 vlan, u8 qos, __be16 proto)
-{
- struct hclge_vport *vport = hclge_get_vport(handle);
- struct hclge_dev *hdev = vport->back;
-
- if ((vfid >= hdev->num_alloc_vfs) || (vlan > 4095) || (qos > 7))
- return -EINVAL;
- if (proto != htons(ETH_P_8021Q))
- return -EPROTONOSUPPORT;
-
- return hclge_set_vlan_filter_hw(hdev, proto, vfid, vlan, qos, false);
-}
-
static int hclge_set_vlan_tx_offload_cfg(struct hclge_vport *vport)
{
struct hclge_tx_vtag_cfg *vcfg = &vport->txvlan_cfg;
return status;
}
+static int hclge_vlan_offload_cfg(struct hclge_vport *vport,
+ u16 port_base_vlan_state,
+ u16 vlan_tag)
+{
+ int ret;
+
+ if (port_base_vlan_state == HNAE3_PORT_BASE_VLAN_DISABLE) {
+ vport->txvlan_cfg.accept_tag1 = true;
+ vport->txvlan_cfg.insert_tag1_en = false;
+ vport->txvlan_cfg.default_tag1 = 0;
+ } else {
+ vport->txvlan_cfg.accept_tag1 = false;
+ vport->txvlan_cfg.insert_tag1_en = true;
+ vport->txvlan_cfg.default_tag1 = vlan_tag;
+ }
+
+ vport->txvlan_cfg.accept_untag1 = true;
+
+ /* accept_tag2 and accept_untag2 are not supported on
+ * pdev revision(0x20), new revision support them,
+ * this two fields can not be configured by user.
+ */
+ vport->txvlan_cfg.accept_tag2 = true;
+ vport->txvlan_cfg.accept_untag2 = true;
+ vport->txvlan_cfg.insert_tag2_en = false;
+ vport->txvlan_cfg.default_tag2 = 0;
+
+ if (port_base_vlan_state == HNAE3_PORT_BASE_VLAN_DISABLE) {
+ vport->rxvlan_cfg.strip_tag1_en = false;
+ vport->rxvlan_cfg.strip_tag2_en =
+ vport->rxvlan_cfg.rx_vlan_offload_en;
+ } else {
+ vport->rxvlan_cfg.strip_tag1_en =
+ vport->rxvlan_cfg.rx_vlan_offload_en;
+ vport->rxvlan_cfg.strip_tag2_en = true;
+ }
+ vport->rxvlan_cfg.vlan1_vlan_prionly = false;
+ vport->rxvlan_cfg.vlan2_vlan_prionly = false;
+
+ ret = hclge_set_vlan_tx_offload_cfg(vport);
+ if (ret)
+ return ret;
+
+ return hclge_set_vlan_rx_offload_cfg(vport);
+}
+
static int hclge_set_vlan_protocol_type(struct hclge_dev *hdev)
{
struct hclge_rx_vlan_type_cfg_cmd *rx_req;
return ret;
for (i = 0; i < hdev->num_alloc_vport; i++) {
- vport = &hdev->vport[i];
- vport->txvlan_cfg.accept_tag1 = true;
- vport->txvlan_cfg.accept_untag1 = true;
+ u16 vlan_tag;
- /* accept_tag2 and accept_untag2 are not supported on
- * pdev revision(0x20), new revision support them. The
- * value of this two fields will not return error when driver
- * send command to fireware in revision(0x20).
- * This two fields can not configured by user.
- */
- vport->txvlan_cfg.accept_tag2 = true;
- vport->txvlan_cfg.accept_untag2 = true;
-
- vport->txvlan_cfg.insert_tag1_en = false;
- vport->txvlan_cfg.insert_tag2_en = false;
- vport->txvlan_cfg.default_tag1 = 0;
- vport->txvlan_cfg.default_tag2 = 0;
-
- ret = hclge_set_vlan_tx_offload_cfg(vport);
- if (ret)
- return ret;
-
- vport->rxvlan_cfg.strip_tag1_en = false;
- vport->rxvlan_cfg.strip_tag2_en = true;
- vport->rxvlan_cfg.vlan1_vlan_prionly = false;
- vport->rxvlan_cfg.vlan2_vlan_prionly = false;
+ vport = &hdev->vport[i];
+ vlan_tag = vport->port_base_vlan_cfg.vlan_info.vlan_tag;
- ret = hclge_set_vlan_rx_offload_cfg(vport);
+ ret = hclge_vlan_offload_cfg(vport,
+ vport->port_base_vlan_cfg.state,
+ vlan_tag);
if (ret)
return ret;
}
return hclge_set_vlan_filter(handle, htons(ETH_P_8021Q), 0, false);
}
-void hclge_add_vport_vlan_table(struct hclge_vport *vport, u16 vlan_id)
+static void hclge_add_vport_vlan_table(struct hclge_vport *vport, u16 vlan_id,
+ bool writen_to_tbl)
{
struct hclge_vport_vlan_cfg *vlan;
if (!vlan)
return;
- vlan->hd_tbl_status = true;
+ vlan->hd_tbl_status = writen_to_tbl;
vlan->vlan_id = vlan_id;
list_add_tail(&vlan->node, &vport->vlan_list);
}
-void hclge_rm_vport_vlan_table(struct hclge_vport *vport, u16 vlan_id,
- bool is_write_tbl)
+static int hclge_add_vport_all_vlan_table(struct hclge_vport *vport)
+{
+ struct hclge_vport_vlan_cfg *vlan, *tmp;
+ struct hclge_dev *hdev = vport->back;
+ int ret;
+
+ list_for_each_entry_safe(vlan, tmp, &vport->vlan_list, node) {
+ if (!vlan->hd_tbl_status) {
+ ret = hclge_set_vlan_filter_hw(hdev, htons(ETH_P_8021Q),
+ vport->vport_id,
+ vlan->vlan_id, 0, false);
+ if (ret) {
+ dev_err(&hdev->pdev->dev,
+ "restore vport vlan list failed, ret=%d\n",
+ ret);
+ return ret;
+ }
+ }
+ vlan->hd_tbl_status = true;
+ }
+
+ return 0;
+}
+
+static void hclge_rm_vport_vlan_table(struct hclge_vport *vport, u16 vlan_id,
+ bool is_write_tbl)
{
struct hclge_vport_vlan_cfg *vlan, *tmp;
struct hclge_dev *hdev = vport->back;
{
struct hclge_vport *vport = hclge_get_vport(handle);
- vport->rxvlan_cfg.strip_tag1_en = false;
- vport->rxvlan_cfg.strip_tag2_en = enable;
+ if (vport->port_base_vlan_cfg.state == HNAE3_PORT_BASE_VLAN_DISABLE) {
+ vport->rxvlan_cfg.strip_tag1_en = false;
+ vport->rxvlan_cfg.strip_tag2_en = enable;
+ } else {
+ vport->rxvlan_cfg.strip_tag1_en = enable;
+ vport->rxvlan_cfg.strip_tag2_en = true;
+ }
vport->rxvlan_cfg.vlan1_vlan_prionly = false;
vport->rxvlan_cfg.vlan2_vlan_prionly = false;
+ vport->rxvlan_cfg.rx_vlan_offload_en = enable;
return hclge_set_vlan_rx_offload_cfg(vport);
}
+static int hclge_update_vlan_filter_entries(struct hclge_vport *vport,
+ u16 port_base_vlan_state,
+ struct hclge_vlan_info *new_info,
+ struct hclge_vlan_info *old_info)
+{
+ struct hclge_dev *hdev = vport->back;
+ int ret;
+
+ if (port_base_vlan_state == HNAE3_PORT_BASE_VLAN_ENABLE) {
+ hclge_rm_vport_all_vlan_table(vport, false);
+ return hclge_set_vlan_filter_hw(hdev,
+ htons(new_info->vlan_proto),
+ vport->vport_id,
+ new_info->vlan_tag,
+ new_info->qos, false);
+ }
+
+ ret = hclge_set_vlan_filter_hw(hdev, htons(old_info->vlan_proto),
+ vport->vport_id, old_info->vlan_tag,
+ old_info->qos, true);
+ if (ret)
+ return ret;
+
+ return hclge_add_vport_all_vlan_table(vport);
+}
+
+int hclge_update_port_base_vlan_cfg(struct hclge_vport *vport, u16 state,
+ struct hclge_vlan_info *vlan_info)
+{
+ struct hnae3_handle *nic = &vport->nic;
+ struct hclge_vlan_info *old_vlan_info;
+ struct hclge_dev *hdev = vport->back;
+ int ret;
+
+ old_vlan_info = &vport->port_base_vlan_cfg.vlan_info;
+
+ ret = hclge_vlan_offload_cfg(vport, state, vlan_info->vlan_tag);
+ if (ret)
+ return ret;
+
+ if (state == HNAE3_PORT_BASE_VLAN_MODIFY) {
+ /* add new VLAN tag */
+ ret = hclge_set_vlan_filter_hw(hdev,
+ htons(vlan_info->vlan_proto),
+ vport->vport_id,
+ vlan_info->vlan_tag,
+ vlan_info->qos, false);
+ if (ret)
+ return ret;
+
+ /* remove old VLAN tag */
+ ret = hclge_set_vlan_filter_hw(hdev,
+ htons(old_vlan_info->vlan_proto),
+ vport->vport_id,
+ old_vlan_info->vlan_tag,
+ old_vlan_info->qos, true);
+ if (ret)
+ return ret;
+
+ goto update;
+ }
+
+ ret = hclge_update_vlan_filter_entries(vport, state, vlan_info,
+ old_vlan_info);
+ if (ret)
+ return ret;
+
+ /* update state only when disable/enable port based VLAN */
+ vport->port_base_vlan_cfg.state = state;
+ if (state == HNAE3_PORT_BASE_VLAN_DISABLE)
+ nic->port_base_vlan_state = HNAE3_PORT_BASE_VLAN_DISABLE;
+ else
+ nic->port_base_vlan_state = HNAE3_PORT_BASE_VLAN_ENABLE;
+
+update:
+ vport->port_base_vlan_cfg.vlan_info.vlan_tag = vlan_info->vlan_tag;
+ vport->port_base_vlan_cfg.vlan_info.qos = vlan_info->qos;
+ vport->port_base_vlan_cfg.vlan_info.vlan_proto = vlan_info->vlan_proto;
+
+ return 0;
+}
+
+static u16 hclge_get_port_base_vlan_state(struct hclge_vport *vport,
+ enum hnae3_port_base_vlan_state state,
+ u16 vlan)
+{
+ if (state == HNAE3_PORT_BASE_VLAN_DISABLE) {
+ if (!vlan)
+ return HNAE3_PORT_BASE_VLAN_NOCHANGE;
+ else
+ return HNAE3_PORT_BASE_VLAN_ENABLE;
+ } else {
+ if (!vlan)
+ return HNAE3_PORT_BASE_VLAN_DISABLE;
+ else if (vport->port_base_vlan_cfg.vlan_info.vlan_tag == vlan)
+ return HNAE3_PORT_BASE_VLAN_NOCHANGE;
+ else
+ return HNAE3_PORT_BASE_VLAN_MODIFY;
+ }
+}
+
+static int hclge_set_vf_vlan_filter(struct hnae3_handle *handle, int vfid,
+ u16 vlan, u8 qos, __be16 proto)
+{
+ struct hclge_vport *vport = hclge_get_vport(handle);
+ struct hclge_dev *hdev = vport->back;
+ struct hclge_vlan_info vlan_info;
+ u16 state;
+ int ret;
+
+ if (hdev->pdev->revision == 0x20)
+ return -EOPNOTSUPP;
+
+ /* qos is a 3 bits value, so can not be bigger than 7 */
+ if (vfid >= hdev->num_alloc_vfs || vlan > VLAN_N_VID - 1 || qos > 7)
+ return -EINVAL;
+ if (proto != htons(ETH_P_8021Q))
+ return -EPROTONOSUPPORT;
+
+ vport = &hdev->vport[vfid];
+ state = hclge_get_port_base_vlan_state(vport,
+ vport->port_base_vlan_cfg.state,
+ vlan);
+ if (state == HNAE3_PORT_BASE_VLAN_NOCHANGE)
+ return 0;
+
+ vlan_info.vlan_tag = vlan;
+ vlan_info.qos = qos;
+ vlan_info.vlan_proto = ntohs(proto);
+
+ /* update port based VLAN for PF */
+ if (!vfid) {
+ hclge_notify_client(hdev, HNAE3_DOWN_CLIENT);
+ ret = hclge_update_port_base_vlan_cfg(vport, state, &vlan_info);
+ hclge_notify_client(hdev, HNAE3_UP_CLIENT);
+
+ return ret;
+ }
+
+ if (!test_bit(HCLGE_VPORT_STATE_ALIVE, &vport->state)) {
+ return hclge_update_port_base_vlan_cfg(vport, state,
+ &vlan_info);
+ } else {
+ ret = hclge_push_vf_port_base_vlan_info(&hdev->vport[0],
+ (u8)vfid, state,
+ vlan, qos,
+ ntohs(proto));
+ return ret;
+ }
+}
+
+int hclge_set_vlan_filter(struct hnae3_handle *handle, __be16 proto,
+ u16 vlan_id, bool is_kill)
+{
+ struct hclge_vport *vport = hclge_get_vport(handle);
+ struct hclge_dev *hdev = vport->back;
+ bool writen_to_tbl = false;
+ int ret = 0;
+
+ /* when port based VLAN enabled, we use port based VLAN as the VLAN
+ * filter entry. In this case, we don't update VLAN filter table
+ * when user add new VLAN or remove exist VLAN, just update the vport
+ * VLAN list. The VLAN id in VLAN list won't be writen in VLAN filter
+ * table until port based VLAN disabled
+ */
+ if (handle->port_base_vlan_state == HNAE3_PORT_BASE_VLAN_DISABLE) {
+ ret = hclge_set_vlan_filter_hw(hdev, proto, vport->vport_id,
+ vlan_id, 0, is_kill);
+ writen_to_tbl = true;
+ }
+
+ if (ret)
+ return ret;
+
+ if (is_kill)
+ hclge_rm_vport_vlan_table(vport, vlan_id, false);
+ else
+ hclge_add_vport_vlan_table(vport, vlan_id,
+ writen_to_tbl);
+
+ return 0;
+}
+
static int hclge_set_mac_mtu(struct hclge_dev *hdev, int new_mps)
{
struct hclge_config_max_frm_size_cmd *req;
int i;
for (i = 0; i < hdev->num_alloc_vport; i++) {
- hclge_vport_start(vport);
+ hclge_vport_stop(vport);
vport++;
}
}
/* VPort level vlan tag configuration for RX direction */
struct hclge_rx_vtag_cfg {
- bool strip_tag1_en; /* Whether strip inner vlan tag */
- bool strip_tag2_en; /* Whether strip outer vlan tag */
- bool vlan1_vlan_prionly;/* Inner VLAN Tag up to descriptor Enable */
- bool vlan2_vlan_prionly;/* Outer VLAN Tag up to descriptor Enable */
+ u8 rx_vlan_offload_en; /* Whether enable rx vlan offload */
+ u8 strip_tag1_en; /* Whether strip inner vlan tag */
+ u8 strip_tag2_en; /* Whether strip outer vlan tag */
+ u8 vlan1_vlan_prionly; /* Inner VLAN Tag up to descriptor Enable */
+ u8 vlan2_vlan_prionly; /* Outer VLAN Tag up to descriptor Enable */
};
struct hclge_rss_tuple_cfg {
HCLGE_VPORT_STATE_MAX
};
+struct hclge_vlan_info {
+ u16 vlan_proto; /* so far support 802.1Q only */
+ u16 qos;
+ u16 vlan_tag;
+};
+
+struct hclge_port_base_vlan_config {
+ u16 state;
+ struct hclge_vlan_info vlan_info;
+};
+
struct hclge_vport {
u16 alloc_tqps; /* Allocated Tx/Rx queues */
u16 alloc_rss_size;
u16 qs_offset;
- u16 bw_limit; /* VSI BW Limit (0 = disabled) */
+ u32 bw_limit; /* VSI BW Limit (0 = disabled) */
u8 dwrr;
+ struct hclge_port_base_vlan_config port_base_vlan_cfg;
struct hclge_tx_vtag_cfg txvlan_cfg;
struct hclge_rx_vtag_cfg rxvlan_cfg;
void hclge_rm_vport_all_mac_table(struct hclge_vport *vport, bool is_del_list,
enum HCLGE_MAC_ADDR_TYPE mac_type);
void hclge_uninit_vport_mac_table(struct hclge_dev *hdev);
-void hclge_add_vport_vlan_table(struct hclge_vport *vport, u16 vlan_id);
-void hclge_rm_vport_vlan_table(struct hclge_vport *vport, u16 vlan_id,
- bool is_write_tbl);
void hclge_rm_vport_all_vlan_table(struct hclge_vport *vport, bool is_del_list);
void hclge_uninit_vport_vlan_table(struct hclge_dev *hdev);
+int hclge_update_port_base_vlan_cfg(struct hclge_vport *vport, u16 state,
+ struct hclge_vlan_info *vlan_info);
+int hclge_push_vf_port_base_vlan_info(struct hclge_vport *vport, u8 vfid,
+ u16 state, u16 vlan_tag, u16 qos,
+ u16 vlan_proto);
#endif
return 0;
}
+int hclge_push_vf_port_base_vlan_info(struct hclge_vport *vport, u8 vfid,
+ u16 state, u16 vlan_tag, u16 qos,
+ u16 vlan_proto)
+{
+#define MSG_DATA_SIZE 8
+
+ u8 msg_data[MSG_DATA_SIZE];
+
+ memcpy(&msg_data[0], &state, sizeof(u16));
+ memcpy(&msg_data[2], &vlan_proto, sizeof(u16));
+ memcpy(&msg_data[4], &qos, sizeof(u16));
+ memcpy(&msg_data[6], &vlan_tag, sizeof(u16));
+
+ return hclge_send_mbx_msg(vport, msg_data, sizeof(msg_data),
+ HLCGE_MBX_PUSH_VLAN_INFO, vfid);
+}
+
static int hclge_set_vf_vlan_cfg(struct hclge_vport *vport,
- struct hclge_mbx_vf_to_pf_cmd *mbx_req,
- bool gen_resp)
+ struct hclge_mbx_vf_to_pf_cmd *mbx_req)
{
int status = 0;
memcpy(&proto, &mbx_req->msg[5], sizeof(proto));
status = hclge_set_vlan_filter(handle, cpu_to_be16(proto),
vlan, is_kill);
- if (!status)
- is_kill ? hclge_rm_vport_vlan_table(vport, vlan, false)
- : hclge_add_vport_vlan_table(vport, vlan);
} else if (mbx_req->msg[1] == HCLGE_MBX_VLAN_RX_OFF_CFG) {
struct hnae3_handle *handle = &vport->nic;
bool en = mbx_req->msg[2] ? true : false;
status = hclge_en_hw_strip_rxvtag(handle, en);
+ } else if (mbx_req->msg[1] == HCLGE_MBX_PORT_BASE_VLAN_CFG) {
+ struct hclge_vlan_info *vlan_info;
+ u16 *state;
+
+ state = (u16 *)&mbx_req->msg[2];
+ vlan_info = (struct hclge_vlan_info *)&mbx_req->msg[4];
+ status = hclge_update_port_base_vlan_cfg(vport, *state,
+ vlan_info);
+ } else if (mbx_req->msg[1] == HCLGE_MBX_GET_PORT_BASE_VLAN_STATE) {
+ u8 state;
+
+ state = vport->port_base_vlan_cfg.state;
+ status = hclge_gen_resp_to_vf(vport, mbx_req, 0, &state,
+ sizeof(u8));
}
- if (gen_resp)
- status = hclge_gen_resp_to_vf(vport, mbx_req, status, NULL, 0);
-
return status;
}
HCLGE_TQPS_DEPTH_INFO_LEN);
}
+static int hclge_get_vf_media_type(struct hclge_vport *vport,
+ struct hclge_mbx_vf_to_pf_cmd *mbx_req)
+{
+ struct hclge_dev *hdev = vport->back;
+ u8 resp_data;
+
+ resp_data = hdev->hw.mac.media_type;
+ return hclge_gen_resp_to_vf(vport, mbx_req, 0, &resp_data,
+ sizeof(resp_data));
+}
+
static int hclge_get_link_info(struct hclge_vport *vport,
struct hclge_mbx_vf_to_pf_cmd *mbx_req)
{
struct hclge_dev *hdev = vport->back;
u16 link_status;
- u8 msg_data[10];
- u16 media_type;
+ u8 msg_data[8];
u8 dest_vfid;
u16 duplex;
/* mac.link can only be 0 or 1 */
link_status = (u16)hdev->hw.mac.link;
duplex = hdev->hw.mac.duplex;
- media_type = hdev->hw.mac.media_type;
memcpy(&msg_data[0], &link_status, sizeof(u16));
memcpy(&msg_data[2], &hdev->hw.mac.speed, sizeof(u32));
memcpy(&msg_data[6], &duplex, sizeof(u16));
- memcpy(&msg_data[8], &media_type, sizeof(u16));
dest_vfid = mbx_req->mbx_src_vfid;
/* send this requested info to VF */
ret);
break;
case HCLGE_MBX_SET_VLAN:
- ret = hclge_set_vf_vlan_cfg(vport, req, false);
+ ret = hclge_set_vf_vlan_cfg(vport, req);
if (ret)
dev_err(&hdev->pdev->dev,
"PF failed(%d) to config VF's VLAN\n",
hclge_rm_vport_all_vlan_table(vport, true);
mutex_unlock(&hdev->vport_cfg_mutex);
break;
+ case HCLGE_MBX_GET_MEDIA_TYPE:
+ ret = hclge_get_vf_media_type(vport, req);
+ if (ret)
+ dev_err(&hdev->pdev->dev,
+ "PF fail(%d) to media type for VF\n",
+ ret);
+ break;
default:
dev_err(&hdev->pdev->dev,
"un-supported mailbox message, code = %d\n",
int hclge_mac_mdio_config(struct hclge_dev *hdev)
{
+#define PHY_INEXISTENT 255
+
struct hclge_mac *mac = &hdev->hw.mac;
struct phy_device *phydev;
struct mii_bus *mdio_bus;
int ret;
- if (hdev->hw.mac.phy_addr >= PHY_MAX_ADDR) {
+ if (hdev->hw.mac.phy_addr == PHY_INEXISTENT) {
+ dev_info(&hdev->pdev->dev,
+ "no phy device is connected to mdio bus\n");
+ return 0;
+ } else if (hdev->hw.mac.phy_addr >= PHY_MAX_ADDR) {
dev_err(&hdev->pdev->dev, "phy_addr(%d) is too large.\n",
hdev->hw.mac.phy_addr);
return -EINVAL;
return ring->desc_num - used - 1;
}
+static int hclgevf_is_valid_csq_clean_head(struct hclgevf_cmq_ring *ring,
+ int head)
+{
+ int ntu = ring->next_to_use;
+ int ntc = ring->next_to_clean;
+
+ if (ntu > ntc)
+ return head >= ntc && head <= ntu;
+
+ return head >= ntc || head <= ntu;
+}
+
static int hclgevf_cmd_csq_clean(struct hclgevf_hw *hw)
{
+ struct hclgevf_dev *hdev = container_of(hw, struct hclgevf_dev, hw);
struct hclgevf_cmq_ring *csq = &hw->cmq.csq;
- u16 ntc = csq->next_to_clean;
- struct hclgevf_desc *desc;
int clean = 0;
u32 head;
- desc = &csq->desc[ntc];
head = hclgevf_read_dev(hw, HCLGEVF_NIC_CSQ_HEAD_REG);
- while (head != ntc) {
- memset(desc, 0, sizeof(*desc));
- ntc++;
- if (ntc == csq->desc_num)
- ntc = 0;
- desc = &csq->desc[ntc];
- clean++;
+ rmb(); /* Make sure head is ready before touch any data */
+
+ if (!hclgevf_is_valid_csq_clean_head(csq, head)) {
+ dev_warn(&hdev->pdev->dev, "wrong cmd head (%d, %d-%d)\n", head,
+ csq->next_to_use, csq->next_to_clean);
+ dev_warn(&hdev->pdev->dev,
+ "Disabling any further commands to IMP firmware\n");
+ set_bit(HCLGEVF_STATE_CMD_DISABLE, &hdev->state);
+ return -EIO;
}
- csq->next_to_clean = ntc;
+ clean = (head - csq->next_to_clean + csq->desc_num) % csq->desc_num;
+ csq->next_to_clean = head;
return clean;
}
int ret;
spin_lock_bh(&hdev->hw.cmq.csq.lock);
- spin_lock_bh(&hdev->hw.cmq.crq.lock);
+ spin_lock(&hdev->hw.cmq.crq.lock);
/* initialize the pointers of async rx queue of mailbox */
hdev->arq.hdev = hdev;
hclgevf_cmd_init_regs(&hdev->hw);
- spin_unlock_bh(&hdev->hw.cmq.crq.lock);
+ spin_unlock(&hdev->hw.cmq.crq.lock);
spin_unlock_bh(&hdev->hw.cmq.csq.lock);
clear_bit(HCLGEVF_STATE_CMD_DISABLE, &hdev->state);
* reset may happen when lower level reset is being processed.
*/
if (hclgevf_is_reset_pending(hdev)) {
- set_bit(HCLGEVF_STATE_CMD_DISABLE, &hdev->state);
- return -EBUSY;
+ ret = -EBUSY;
+ goto err_cmd_init;
}
/* get firmware version */
if (ret) {
dev_err(&hdev->pdev->dev,
"failed(%d) to query firmware version\n", ret);
- return ret;
+ goto err_cmd_init;
}
hdev->fw_version = version;
dev_info(&hdev->pdev->dev, "The firmware version is %08x\n", version);
return 0;
+
+err_cmd_init:
+ set_bit(HCLGEVF_STATE_CMD_DISABLE, &hdev->state);
+
+ return ret;
}
static void hclgevf_cmd_uninit_regs(struct hclgevf_hw *hw)
return 0;
}
+static int hclgevf_get_port_base_vlan_filter_state(struct hclgevf_dev *hdev)
+{
+ struct hnae3_handle *nic = &hdev->nic;
+ u8 resp_msg;
+ int ret;
+
+ ret = hclgevf_send_mbx_msg(hdev, HCLGE_MBX_SET_VLAN,
+ HCLGE_MBX_GET_PORT_BASE_VLAN_STATE,
+ NULL, 0, true, &resp_msg, sizeof(u8));
+ if (ret) {
+ dev_err(&hdev->pdev->dev,
+ "VF request to get port based vlan state failed %d",
+ ret);
+ return ret;
+ }
+
+ nic->port_base_vlan_state = resp_msg;
+
+ return 0;
+}
+
static int hclgevf_get_queue_info(struct hclgevf_dev *hdev)
{
#define HCLGEVF_TQPS_RSS_INFO_LEN 6
return qid_in_pf;
}
+static int hclgevf_get_pf_media_type(struct hclgevf_dev *hdev)
+{
+ u8 resp_msg;
+ int ret;
+
+ ret = hclgevf_send_mbx_msg(hdev, HCLGE_MBX_GET_MEDIA_TYPE, 0, NULL, 0,
+ true, &resp_msg, sizeof(resp_msg));
+ if (ret) {
+ dev_err(&hdev->pdev->dev,
+ "VF request to get the pf port media type failed %d",
+ ret);
+ return ret;
+ }
+
+ hdev->hw.mac.media_type = resp_msg;
+
+ return 0;
+}
+
static int hclgevf_alloc_tqps(struct hclgevf_dev *hdev)
{
struct hclgevf_tqp *tqp;
}
}
-void hclgevf_update_link_mode(struct hclgevf_dev *hdev)
+static void hclgevf_update_link_mode(struct hclgevf_dev *hdev)
{
#define HCLGEVF_ADVERTISING 0
#define HCLGEVF_SUPPORTED 1
*/
hclgevf_cmd_init(hdev);
dev_err(&hdev->pdev->dev, "failed to reset VF\n");
+ if (hclgevf_is_reset_pending(hdev))
+ hclgevf_reset_task_schedule(hdev);
return ret;
}
void hclgevf_reset_task_schedule(struct hclgevf_dev *hdev)
{
- if (!test_bit(HCLGEVF_STATE_RST_SERVICE_SCHED, &hdev->state) &&
- !test_bit(HCLGEVF_STATE_RST_HANDLING, &hdev->state)) {
+ if (!test_bit(HCLGEVF_STATE_RST_SERVICE_SCHED, &hdev->state)) {
set_bit(HCLGEVF_STATE_RST_SERVICE_SCHED, &hdev->state);
schedule_work(&hdev->rst_service_task);
}
{
int ret;
+ /* get current port based vlan state from PF */
+ ret = hclgevf_get_port_base_vlan_filter_state(hdev);
+ if (ret)
+ return ret;
+
/* get queue configuration from PF */
ret = hclgevf_get_queue_info(hdev);
if (ret)
if (ret)
return ret;
+ ret = hclgevf_get_pf_media_type(hdev);
+ if (ret)
+ return ret;
+
/* get tc configuration from PF */
return hclgevf_get_tc_info(hdev);
}
static int hclgevf_client_start(struct hnae3_handle *handle)
{
struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+ int ret;
+
+ ret = hclgevf_set_alive(handle, true);
+ if (ret)
+ return ret;
mod_timer(&hdev->keep_alive_timer, jiffies + 2 * HZ);
- return hclgevf_set_alive(handle, true);
+
+ return 0;
}
static void hclgevf_client_stop(struct hnae3_handle *handle)
{
set_bit(HCLGEVF_STATE_DOWN, &hdev->state);
+ if (hdev->keep_alive_timer.function)
+ del_timer_sync(&hdev->keep_alive_timer);
+ if (hdev->keep_alive_task.func)
+ cancel_work_sync(&hdev->keep_alive_task);
if (hdev->service_timer.function)
del_timer_sync(&hdev->service_timer);
if (hdev->service_task.func)
}
}
+void hclgevf_update_port_base_vlan_info(struct hclgevf_dev *hdev, u16 state,
+ u8 *port_base_vlan_info, u8 data_size)
+{
+ struct hnae3_handle *nic = &hdev->nic;
+
+ rtnl_lock();
+ hclgevf_notify_client(hdev, HNAE3_DOWN_CLIENT);
+ rtnl_unlock();
+
+ /* send msg to PF and wait update port based vlan info */
+ hclgevf_send_mbx_msg(hdev, HCLGE_MBX_SET_VLAN,
+ HCLGE_MBX_PORT_BASE_VLAN_CFG,
+ port_base_vlan_info, data_size,
+ false, NULL, 0);
+
+ if (state == HNAE3_PORT_BASE_VLAN_DISABLE)
+ nic->port_base_vlan_state = HNAE3_PORT_BASE_VLAN_DISABLE;
+ else
+ nic->port_base_vlan_state = HNAE3_PORT_BASE_VLAN_ENABLE;
+
+ rtnl_lock();
+ hclgevf_notify_client(hdev, HNAE3_UP_CLIENT);
+ rtnl_unlock();
+}
+
static const struct hnae3_ae_ops hclgevf_ops = {
.init_ae_dev = hclgevf_init_ae_dev,
.uninit_ae_dev = hclgevf_uninit_ae_dev,
u8 duplex);
void hclgevf_reset_task_schedule(struct hclgevf_dev *hdev);
void hclgevf_mbx_task_schedule(struct hclgevf_dev *hdev);
+void hclgevf_update_port_base_vlan_info(struct hclgevf_dev *hdev, u16 state,
+ u8 *port_base_vlan_info, u8 data_size);
#endif
case HCLGE_MBX_LINK_STAT_CHANGE:
case HCLGE_MBX_ASSERTING_RESET:
case HCLGE_MBX_LINK_STAT_MODE:
+ case HLCGE_MBX_PUSH_VLAN_INFO:
/* set this mbx event as pending. This is required as we
* might loose interrupt event when mbx task is busy
* handling. This shall be cleared when mbx task just
void hclgevf_mbx_async_handler(struct hclgevf_dev *hdev)
{
enum hnae3_reset_type reset_type;
- u16 link_status;
- u16 *msg_q;
+ u16 link_status, state;
+ u16 *msg_q, *vlan_info;
u8 duplex;
u32 speed;
u32 tail;
link_status = le16_to_cpu(msg_q[1]);
memcpy(&speed, &msg_q[2], sizeof(speed));
duplex = (u8)le16_to_cpu(msg_q[4]);
- hdev->hw.mac.media_type = (u8)le16_to_cpu(msg_q[5]);
/* update upper layer with new link link status */
hclgevf_update_link_status(hdev, link_status);
hclgevf_reset_task_schedule(hdev);
break;
+ case HLCGE_MBX_PUSH_VLAN_INFO:
+ state = le16_to_cpu(msg_q[1]);
+ vlan_info = &msg_q[1];
+ hclgevf_update_port_base_vlan_info(hdev, state,
+ (u8 *)vlan_info, 8);
+ break;
default:
dev_err(&hdev->pdev->dev,
"fetched unsupported(%d) message from arq\n",
flush_skbs:
netdev_txq = netdev_get_tx_queue(netdev, q_id);
- if ((!skb->xmit_more) || (netif_xmit_stopped(netdev_txq)))
+ if ((!netdev_xmit_more()) || (netif_xmit_stopped(netdev_txq)))
hinic_sq_write_db(txq->sq, prod_idx, wqe_size, 0);
return err;
memset(pr, 0, sizeof(struct ehea_port_res));
- pr->tx_bytes = rx_bytes;
+ pr->tx_bytes = tx_bytes;
pr->tx_packets = tx_packets;
pr->rx_bytes = rx_bytes;
pr->rx_packets = rx_packets;
int nr_of_cqe, u64 eq_handle, u32 cq_token)
{
struct ehea_cq *cq;
- struct h_epa epa;
- u64 *cq_handle_ref, hret, rpage;
+ u64 hret, rpage;
u32 counter;
int ret;
void *vpage;
cq->adapter = adapter;
- cq_handle_ref = &cq->fw_handle;
-
hret = ehea_h_alloc_resource_cq(adapter->handle, &cq->attr,
&cq->fw_handle, &cq->epas);
if (hret != H_SUCCESS) {
}
hw_qeit_reset(&cq->hw_queue);
- epa = cq->epas.kernel;
ehea_reset_cq_ep(cq);
ehea_reset_cq_n1(cq);
#define IBMVETH_STAT_OFF(stat) offsetof(struct ibmveth_adapter, stat)
#define IBMVETH_GET_STAT(a, off) *((u64 *)(((unsigned long)(a)) + off))
-struct ibmveth_stat ibmveth_stats[] = {
+static struct ibmveth_stat ibmveth_stats[] = {
{ "replenish_task_cycles", IBMVETH_STAT_OFF(replenish_task_cycles) },
{ "replenish_no_mem", IBMVETH_STAT_OFF(replenish_no_mem) },
{ "replenish_add_buff_failure",
static void release_crq_queue(struct ibmvnic_adapter *);
static int __ibmvnic_set_mac(struct net_device *netdev, struct sockaddr *p);
static int init_crq_queue(struct ibmvnic_adapter *adapter);
+static int send_query_phys_parms(struct ibmvnic_adapter *adapter);
struct ibmvnic_stat {
char name[ETH_GSTRING_LEN];
{
struct ibmvnic_rwi *rwi;
struct ibmvnic_adapter *adapter;
- struct net_device *netdev;
bool we_lock_rtnl = false;
u32 reset_state;
int rc = 0;
adapter = container_of(work, struct ibmvnic_adapter, ibmvnic_reset);
- netdev = adapter->netdev;
/* netif_set_real_num_xx_queues needs to take rtnl lock here
* unless wait_for_reset is set, in which case the rtnl lock
static int ibmvnic_get_link_ksettings(struct net_device *netdev,
struct ethtool_link_ksettings *cmd)
{
- u32 supported, advertising;
+ struct ibmvnic_adapter *adapter = netdev_priv(netdev);
+ int rc;
- supported = (SUPPORTED_1000baseT_Full | SUPPORTED_Autoneg |
- SUPPORTED_FIBRE);
- advertising = (ADVERTISED_1000baseT_Full | ADVERTISED_Autoneg |
- ADVERTISED_FIBRE);
- cmd->base.speed = SPEED_1000;
- cmd->base.duplex = DUPLEX_FULL;
+ rc = send_query_phys_parms(adapter);
+ if (rc) {
+ adapter->speed = SPEED_UNKNOWN;
+ adapter->duplex = DUPLEX_UNKNOWN;
+ }
+ cmd->base.speed = adapter->speed;
+ cmd->base.duplex = adapter->duplex;
cmd->base.port = PORT_FIBRE;
cmd->base.phy_address = 0;
cmd->base.autoneg = AUTONEG_ENABLE;
- ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported,
- supported);
- ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.advertising,
- advertising);
-
return 0;
}
}
}
+static int send_query_phys_parms(struct ibmvnic_adapter *adapter)
+{
+ union ibmvnic_crq crq;
+ int rc;
+
+ memset(&crq, 0, sizeof(crq));
+ crq.query_phys_parms.first = IBMVNIC_CRQ_CMD;
+ crq.query_phys_parms.cmd = QUERY_PHYS_PARMS;
+ init_completion(&adapter->fw_done);
+ rc = ibmvnic_send_crq(adapter, &crq);
+ if (rc)
+ return rc;
+ wait_for_completion(&adapter->fw_done);
+ return adapter->fw_done_rc ? -EIO : 0;
+}
+
+static int handle_query_phys_parms_rsp(union ibmvnic_crq *crq,
+ struct ibmvnic_adapter *adapter)
+{
+ struct net_device *netdev = adapter->netdev;
+ int rc;
+
+ rc = crq->query_phys_parms_rsp.rc.code;
+ if (rc) {
+ netdev_err(netdev, "Error %d in QUERY_PHYS_PARMS\n", rc);
+ return rc;
+ }
+ switch (cpu_to_be32(crq->query_phys_parms_rsp.speed)) {
+ case IBMVNIC_10MBPS:
+ adapter->speed = SPEED_10;
+ break;
+ case IBMVNIC_100MBPS:
+ adapter->speed = SPEED_100;
+ break;
+ case IBMVNIC_1GBPS:
+ adapter->speed = SPEED_1000;
+ break;
+ case IBMVNIC_10GBP:
+ adapter->speed = SPEED_10000;
+ break;
+ case IBMVNIC_25GBPS:
+ adapter->speed = SPEED_25000;
+ break;
+ case IBMVNIC_40GBPS:
+ adapter->speed = SPEED_40000;
+ break;
+ case IBMVNIC_50GBPS:
+ adapter->speed = SPEED_50000;
+ break;
+ case IBMVNIC_100GBPS:
+ adapter->speed = SPEED_100000;
+ break;
+ default:
+ netdev_warn(netdev, "Unknown speed 0x%08x\n",
+ cpu_to_be32(crq->query_phys_parms_rsp.speed));
+ adapter->speed = SPEED_UNKNOWN;
+ }
+ if (crq->query_phys_parms_rsp.flags1 & IBMVNIC_FULL_DUPLEX)
+ adapter->duplex = DUPLEX_FULL;
+ else if (crq->query_phys_parms_rsp.flags1 & IBMVNIC_HALF_DUPLEX)
+ adapter->duplex = DUPLEX_HALF;
+ else
+ adapter->duplex = DUPLEX_UNKNOWN;
+
+ return rc;
+}
+
static void ibmvnic_handle_crq(union ibmvnic_crq *crq,
struct ibmvnic_adapter *adapter)
{
case GET_VPD_RSP:
handle_vpd_rsp(crq, adapter);
break;
+ case QUERY_PHYS_PARMS_RSP:
+ adapter->fw_done_rc = handle_query_phys_parms_rsp(crq, adapter);
+ complete(&adapter->fw_done);
+ break;
default:
netdev_err(netdev, "Got an invalid cmd type 0x%02x\n",
gen_crq->cmd);
u8 flags2;
#define IBMVNIC_LOGICAL_LNK_ACTIVE 0x80
__be32 speed;
-#define IBMVNIC_AUTONEG 0x80
-#define IBMVNIC_10MBPS 0x40
-#define IBMVNIC_100MBPS 0x20
-#define IBMVNIC_1GBPS 0x10
-#define IBMVNIC_10GBPS 0x08
+#define IBMVNIC_AUTONEG 0x80000000
+#define IBMVNIC_10MBPS 0x40000000
+#define IBMVNIC_100MBPS 0x20000000
+#define IBMVNIC_1GBPS 0x10000000
+#define IBMVNIC_10GBP 0x08000000
+#define IBMVNIC_40GBPS 0x04000000
+#define IBMVNIC_100GBPS 0x02000000
+#define IBMVNIC_25GBPS 0x01000000
+#define IBMVNIC_50GBPS 0x00800000
+#define IBMVNIC_200GBPS 0x00400000
__be32 mtu;
struct ibmvnic_rc rc;
} __packed __aligned(8);
int phys_link_state;
int logical_link_state;
+ u32 speed;
+ u8 duplex;
+
/* login data */
struct ibmvnic_login_buffer *login_buf;
dma_addr_t login_buf_token;
netdev->features = features;
e100_exec_cb(nic, NULL, e100_configure);
- return 0;
+ return 1;
}
static const struct net_device_ops e100_netdev_ops = {
else
e1000_reset(adapter);
- return 0;
+ return 1;
}
static const struct net_device_ops e1000_netdev_ops = {
/* Make sure there is space in the ring for the next send. */
e1000_maybe_stop_tx(netdev, tx_ring, desc_needed);
- if (!skb->xmit_more ||
+ if (!netdev_xmit_more() ||
netif_xmit_stopped(netdev_get_tx_queue(netdev, 0))) {
writel(tx_ring->next_to_use, hw->hw_addr + tx_ring->tdt);
/* we need this if more than one processor can write to
DIV_ROUND_UP(PAGE_SIZE,
adapter->tx_fifo_limit) + 2));
- if (!skb->xmit_more ||
+ if (!netdev_xmit_more() ||
netif_xmit_stopped(netdev_get_tx_queue(netdev, 0))) {
if (adapter->flags2 & FLAG2_PCIM2PCI_ARBITER_WA)
e1000e_update_tdt_wa(tx_ring,
else
e1000e_reset(adapter);
- return 0;
+ return 1;
}
static const struct net_device_ops e1000e_netdev_ops = {
dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP);
- if (pci_dev_run_wake(pdev))
+ if (pci_dev_run_wake(pdev) && hw->mac.type < e1000_pch_cnp)
pm_runtime_put_noidle(&pdev->dev);
return 0;
fm10k_maybe_stop_tx(tx_ring, DESC_NEEDED);
/* notify HW of packet */
- if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
+ if (netif_xmit_stopped(txring_txq(tx_ring)) || !netdev_xmit_more()) {
writel(i, tx_ring->tail);
/* we need this if more than one processor can write to our tail
i40e_diag.o \
i40e_txrx.o \
i40e_ptp.o \
+ i40e_ddp.o \
i40e_client.o \
i40e_virtchnl_pf.o \
i40e_xsk.o
u8 filter_index;
};
+#define I40_DDP_FLASH_REGION 100
+#define I40E_PROFILE_INFO_SIZE 48
+#define I40E_MAX_PROFILE_NUM 16
+#define I40E_PROFILE_LIST_SIZE \
+ (I40E_PROFILE_INFO_SIZE * I40E_MAX_PROFILE_NUM + 4)
+#define I40E_DDP_PROFILE_PATH "intel/i40e/ddp/"
+#define I40E_DDP_PROFILE_NAME_MAX 64
+
+int i40e_ddp_load(struct net_device *netdev, const u8 *data, size_t size,
+ bool is_add);
+int i40e_ddp_flash(struct net_device *netdev, struct ethtool_flash *flash);
+
+struct i40e_ddp_profile_list {
+ u32 p_count;
+ struct i40e_profile_info p_info[0];
+};
+
+struct i40e_ddp_old_profile_list {
+ struct list_head list;
+ size_t old_ddp_size;
+ u8 old_ddp_buf[0];
+};
+
/* macros related to FLX_PIT */
#define I40E_FLEX_SET_FSIZE(fsize) (((fsize) << \
I40E_PRTQF_FLX_PIT_FSIZE_SHIFT) & \
struct sk_buff *ptp_tx_skb;
unsigned long ptp_tx_start;
struct hwtstamp_config tstamp_config;
+ struct timespec64 ptp_prev_hw_time;
+ ktime_t ptp_reset_start;
struct mutex tmreg_lock; /* Used to protect the SYSTIME registers. */
u32 ptp_adj_mult;
u32 tx_hwtstamp_timeouts;
u16 override_q_count;
u16 last_sw_conf_flags;
u16 last_sw_conf_valid_flags;
+ /* List to keep previous DDP profiles to be rolled back in the future */
+ struct list_head ddp_old_prof;
};
/**
void i40e_ptp_set_increment(struct i40e_pf *pf);
int i40e_ptp_set_ts_config(struct i40e_pf *pf, struct ifreq *ifr);
int i40e_ptp_get_ts_config(struct i40e_pf *pf, struct ifreq *ifr);
+void i40e_ptp_save_hw_time(struct i40e_pf *pf);
+void i40e_ptp_restore_hw_time(struct i40e_pf *pf);
void i40e_ptp_init(struct i40e_pf *pf);
void i40e_ptp_stop(struct i40e_pf *pf);
int i40e_is_vsi_uplink_mode_veb(struct i40e_vsi *vsi);
if (val >= hw->aq.num_asq_entries) {
i40e_debug(hw, I40E_DEBUG_AQ_MESSAGE,
"AQTX: head overrun at %d\n", val);
- status = I40E_ERR_QUEUE_EMPTY;
+ status = I40E_ERR_ADMIN_QUEUE_FULL;
goto asq_send_command_error;
}
*/
#define I40E_FW_API_VERSION_MAJOR 0x0001
-#define I40E_FW_API_VERSION_MINOR_X722 0x0006
-#define I40E_FW_API_VERSION_MINOR_X710 0x0007
+#define I40E_FW_API_VERSION_MINOR_X722 0x0008
+#define I40E_FW_API_VERSION_MINOR_X710 0x0008
#define I40E_FW_MINOR_VERSION(_h) ((_h)->mac.type == I40E_MAC_XL710 ? \
I40E_FW_API_VERSION_MINOR_X710 : \
**/
u32 i40e_led_get(struct i40e_hw *hw)
{
- u32 current_mode = 0;
u32 mode = 0;
int i;
if (!gpio_val)
continue;
- /* ignore gpio LED src mode entries related to the activity
- * LEDs
- */
- current_mode = ((gpio_val & I40E_GLGEN_GPIO_CTL_LED_MODE_MASK)
- >> I40E_GLGEN_GPIO_CTL_LED_MODE_SHIFT);
- switch (current_mode) {
- case I40E_COMBINED_ACTIVITY:
- case I40E_FILTER_ACTIVITY:
- case I40E_MAC_ACTIVITY:
- case I40E_LINK_ACTIVITY:
- continue;
- default:
- break;
- }
-
mode = (gpio_val & I40E_GLGEN_GPIO_CTL_LED_MODE_MASK) >>
I40E_GLGEN_GPIO_CTL_LED_MODE_SHIFT;
break;
**/
void i40e_led_set(struct i40e_hw *hw, u32 mode, bool blink)
{
- u32 current_mode = 0;
int i;
if (mode & 0xfffffff0)
if (!gpio_val)
continue;
-
- /* ignore gpio LED src mode entries related to the activity
- * LEDs
- */
- current_mode = ((gpio_val & I40E_GLGEN_GPIO_CTL_LED_MODE_MASK)
- >> I40E_GLGEN_GPIO_CTL_LED_MODE_SHIFT);
- switch (current_mode) {
- case I40E_COMBINED_ACTIVITY:
- case I40E_FILTER_ACTIVITY:
- case I40E_MAC_ACTIVITY:
- case I40E_LINK_ACTIVITY:
- continue;
- default:
- break;
- }
-
gpio_val &= ~I40E_GLGEN_GPIO_CTL_LED_MODE_MASK;
/* this & is a bit of paranoia, but serves as a range check */
gpio_val |= ((mode << I40E_GLGEN_GPIO_CTL_LED_MODE_SHIFT) &
return NULL;
}
+/* Get section table in profile */
+#define I40E_SECTION_TABLE(profile, sec_tbl) \
+ do { \
+ struct i40e_profile_segment *p = (profile); \
+ u32 count; \
+ u32 *nvm; \
+ count = p->device_table_count; \
+ nvm = (u32 *)&p->device_table[count]; \
+ sec_tbl = (struct i40e_section_table *)&nvm[nvm[0] + 1]; \
+ } while (0)
+
+/* Get section header in profile */
+#define I40E_SECTION_HEADER(profile, offset) \
+ (struct i40e_profile_section_header *)((u8 *)(profile) + (offset))
+
+/**
+ * i40e_find_section_in_profile
+ * @section_type: the section type to search for (i.e., SECTION_TYPE_NOTE)
+ * @profile: pointer to the i40e segment header to be searched
+ *
+ * This function searches i40e segment for a particular section type. On
+ * success it returns a pointer to the section header, otherwise it will
+ * return NULL.
+ **/
+struct i40e_profile_section_header *
+i40e_find_section_in_profile(u32 section_type,
+ struct i40e_profile_segment *profile)
+{
+ struct i40e_profile_section_header *sec;
+ struct i40e_section_table *sec_tbl;
+ u32 sec_off;
+ u32 i;
+
+ if (profile->header.type != SEGMENT_TYPE_I40E)
+ return NULL;
+
+ I40E_SECTION_TABLE(profile, sec_tbl);
+
+ for (i = 0; i < sec_tbl->section_count; i++) {
+ sec_off = sec_tbl->section_offset[i];
+ sec = I40E_SECTION_HEADER(profile, sec_off);
+ if (sec->section.type == section_type)
+ return sec;
+ }
+
+ return NULL;
+}
+
+/**
+ * i40e_ddp_exec_aq_section - Execute generic AQ for DDP
+ * @hw: pointer to the hw struct
+ * @aq: command buffer containing all data to execute AQ
+ **/
+static enum
+i40e_status_code i40e_ddp_exec_aq_section(struct i40e_hw *hw,
+ struct i40e_profile_aq_section *aq)
+{
+ i40e_status status;
+ struct i40e_aq_desc desc;
+ u8 *msg = NULL;
+ u16 msglen;
+
+ i40e_fill_default_direct_cmd_desc(&desc, aq->opcode);
+ desc.flags |= cpu_to_le16(aq->flags);
+ memcpy(desc.params.raw, aq->param, sizeof(desc.params.raw));
+
+ msglen = aq->datalen;
+ if (msglen) {
+ desc.flags |= cpu_to_le16((u16)(I40E_AQ_FLAG_BUF |
+ I40E_AQ_FLAG_RD));
+ if (msglen > I40E_AQ_LARGE_BUF)
+ desc.flags |= cpu_to_le16((u16)I40E_AQ_FLAG_LB);
+ desc.datalen = cpu_to_le16(msglen);
+ msg = &aq->data[0];
+ }
+
+ status = i40e_asq_send_command(hw, &desc, msg, msglen, NULL);
+
+ if (status) {
+ i40e_debug(hw, I40E_DEBUG_PACKAGE,
+ "unable to exec DDP AQ opcode %u, error %d\n",
+ aq->opcode, status);
+ return status;
+ }
+
+ /* copy returned desc to aq_buf */
+ memcpy(aq->param, desc.params.raw, sizeof(desc.params.raw));
+
+ return 0;
+}
+
+/**
+ * i40e_validate_profile
+ * @hw: pointer to the hardware structure
+ * @profile: pointer to the profile segment of the package to be validated
+ * @track_id: package tracking id
+ * @rollback: flag if the profile is for rollback.
+ *
+ * Validates supported devices and profile's sections.
+ */
+static enum i40e_status_code
+i40e_validate_profile(struct i40e_hw *hw, struct i40e_profile_segment *profile,
+ u32 track_id, bool rollback)
+{
+ struct i40e_profile_section_header *sec = NULL;
+ i40e_status status = 0;
+ struct i40e_section_table *sec_tbl;
+ u32 vendor_dev_id;
+ u32 dev_cnt;
+ u32 sec_off;
+ u32 i;
+
+ if (track_id == I40E_DDP_TRACKID_INVALID) {
+ i40e_debug(hw, I40E_DEBUG_PACKAGE, "Invalid track_id\n");
+ return I40E_NOT_SUPPORTED;
+ }
+
+ dev_cnt = profile->device_table_count;
+ for (i = 0; i < dev_cnt; i++) {
+ vendor_dev_id = profile->device_table[i].vendor_dev_id;
+ if ((vendor_dev_id >> 16) == PCI_VENDOR_ID_INTEL &&
+ hw->device_id == (vendor_dev_id & 0xFFFF))
+ break;
+ }
+ if (dev_cnt && i == dev_cnt) {
+ i40e_debug(hw, I40E_DEBUG_PACKAGE,
+ "Device doesn't support DDP\n");
+ return I40E_ERR_DEVICE_NOT_SUPPORTED;
+ }
+
+ I40E_SECTION_TABLE(profile, sec_tbl);
+
+ /* Validate sections types */
+ for (i = 0; i < sec_tbl->section_count; i++) {
+ sec_off = sec_tbl->section_offset[i];
+ sec = I40E_SECTION_HEADER(profile, sec_off);
+ if (rollback) {
+ if (sec->section.type == SECTION_TYPE_MMIO ||
+ sec->section.type == SECTION_TYPE_AQ ||
+ sec->section.type == SECTION_TYPE_RB_AQ) {
+ i40e_debug(hw, I40E_DEBUG_PACKAGE,
+ "Not a roll-back package\n");
+ return I40E_NOT_SUPPORTED;
+ }
+ } else {
+ if (sec->section.type == SECTION_TYPE_RB_AQ ||
+ sec->section.type == SECTION_TYPE_RB_MMIO) {
+ i40e_debug(hw, I40E_DEBUG_PACKAGE,
+ "Not an original package\n");
+ return I40E_NOT_SUPPORTED;
+ }
+ }
+ }
+
+ return status;
+}
+
/**
* i40e_write_profile
* @hw: pointer to the hardware structure
i40e_status status = 0;
struct i40e_section_table *sec_tbl;
struct i40e_profile_section_header *sec = NULL;
- u32 dev_cnt;
- u32 vendor_dev_id;
- u32 *nvm;
+ struct i40e_profile_aq_section *ddp_aq;
u32 section_size = 0;
u32 offset = 0, info = 0;
+ u32 sec_off;
u32 i;
- dev_cnt = profile->device_table_count;
+ status = i40e_validate_profile(hw, profile, track_id, false);
+ if (status)
+ return status;
- for (i = 0; i < dev_cnt; i++) {
- vendor_dev_id = profile->device_table[i].vendor_dev_id;
- if ((vendor_dev_id >> 16) == PCI_VENDOR_ID_INTEL)
- if (hw->device_id == (vendor_dev_id & 0xFFFF))
+ I40E_SECTION_TABLE(profile, sec_tbl);
+
+ for (i = 0; i < sec_tbl->section_count; i++) {
+ sec_off = sec_tbl->section_offset[i];
+ sec = I40E_SECTION_HEADER(profile, sec_off);
+ /* Process generic admin command */
+ if (sec->section.type == SECTION_TYPE_AQ) {
+ ddp_aq = (struct i40e_profile_aq_section *)&sec[1];
+ status = i40e_ddp_exec_aq_section(hw, ddp_aq);
+ if (status) {
+ i40e_debug(hw, I40E_DEBUG_PACKAGE,
+ "Failed to execute aq: section %d, opcode %u\n",
+ i, ddp_aq->opcode);
break;
+ }
+ sec->section.type = SECTION_TYPE_RB_AQ;
+ }
+
+ /* Skip any non-mmio sections */
+ if (sec->section.type != SECTION_TYPE_MMIO)
+ continue;
+
+ section_size = sec->section.size +
+ sizeof(struct i40e_profile_section_header);
+
+ /* Write MMIO section */
+ status = i40e_aq_write_ddp(hw, (void *)sec, (u16)section_size,
+ track_id, &offset, &info, NULL);
+ if (status) {
+ i40e_debug(hw, I40E_DEBUG_PACKAGE,
+ "Failed to write profile: section %d, offset %d, info %d\n",
+ i, offset, info);
+ break;
+ }
}
- if (i == dev_cnt) {
- i40e_debug(hw, I40E_DEBUG_PACKAGE, "Device doesn't support DDP");
- return I40E_ERR_DEVICE_NOT_SUPPORTED;
- }
+ return status;
+}
+
+/**
+ * i40e_rollback_profile
+ * @hw: pointer to the hardware structure
+ * @profile: pointer to the profile segment of the package to be removed
+ * @track_id: package tracking id
+ *
+ * Rolls back previously loaded package.
+ */
+enum i40e_status_code
+i40e_rollback_profile(struct i40e_hw *hw, struct i40e_profile_segment *profile,
+ u32 track_id)
+{
+ struct i40e_profile_section_header *sec = NULL;
+ i40e_status status = 0;
+ struct i40e_section_table *sec_tbl;
+ u32 offset = 0, info = 0;
+ u32 section_size = 0;
+ u32 sec_off;
+ int i;
+
+ status = i40e_validate_profile(hw, profile, track_id, true);
+ if (status)
+ return status;
- nvm = (u32 *)&profile->device_table[dev_cnt];
- sec_tbl = (struct i40e_section_table *)&nvm[nvm[0] + 1];
+ I40E_SECTION_TABLE(profile, sec_tbl);
- for (i = 0; i < sec_tbl->section_count; i++) {
- sec = (struct i40e_profile_section_header *)((u8 *)profile +
- sec_tbl->section_offset[i]);
+ /* For rollback write sections in reverse */
+ for (i = sec_tbl->section_count - 1; i >= 0; i--) {
+ sec_off = sec_tbl->section_offset[i];
+ sec = I40E_SECTION_HEADER(profile, sec_off);
- /* Skip 'AQ', 'note' and 'name' sections */
- if (sec->section.type != SECTION_TYPE_MMIO)
+ /* Skip any non-rollback sections */
+ if (sec->section.type != SECTION_TYPE_RB_MMIO)
continue;
section_size = sec->section.size +
sizeof(struct i40e_profile_section_header);
- /* Write profile */
+ /* Write roll-back MMIO section */
status = i40e_aq_write_ddp(hw, (void *)sec, (u16)section_size,
track_id, &offset, &info, NULL);
if (status) {
i40e_debug(hw, I40E_DEBUG_PACKAGE,
- "Failed to write profile: offset %d, info %d",
- offset, info);
+ "Failed to write profile: section %d, offset %d, info %d\n",
+ i, offset, info);
break;
}
}
/**
* i40e_init_dcb
* @hw: pointer to the hw struct
+ * @enable_mib_change: enable mib change event
*
* Update DCB configuration from the Firmware
**/
-i40e_status i40e_init_dcb(struct i40e_hw *hw)
+i40e_status i40e_init_dcb(struct i40e_hw *hw, bool enable_mib_change)
{
i40e_status ret = 0;
struct i40e_lldp_variables lldp_cfg;
u8 adminstatus = 0;
if (!hw->func_caps.dcb)
- return ret;
+ return I40E_NOT_SUPPORTED;
/* Read LLDP NVM area */
ret = i40e_read_lldp_cfg(hw, &lldp_cfg);
if (ret)
- return ret;
+ return I40E_ERR_NOT_READY;
/* Get the LLDP AdminStatus for the current port */
adminstatus = lldp_cfg.adminstatus >> (hw->port * 4);
/* LLDP agent disabled */
if (!adminstatus) {
hw->dcbx_status = I40E_DCBX_STATUS_DISABLED;
- return ret;
+ return I40E_ERR_NOT_READY;
}
/* Get DCBX status */
return ret;
/* Check the DCBX Status */
- switch (hw->dcbx_status) {
- case I40E_DCBX_STATUS_DONE:
- case I40E_DCBX_STATUS_IN_PROGRESS:
+ if (hw->dcbx_status == I40E_DCBX_STATUS_DONE ||
+ hw->dcbx_status == I40E_DCBX_STATUS_IN_PROGRESS) {
/* Get current DCBX configuration */
ret = i40e_get_dcb_config(hw);
if (ret)
return ret;
- break;
- case I40E_DCBX_STATUS_DISABLED:
- return ret;
- case I40E_DCBX_STATUS_NOT_STARTED:
- case I40E_DCBX_STATUS_MULTIPLE_PEERS:
- default:
- break;
+ } else if (hw->dcbx_status == I40E_DCBX_STATUS_DISABLED) {
+ return I40E_ERR_NOT_READY;
}
/* Configure the LLDP MIB change event */
- ret = i40e_aq_cfg_lldp_mib_change_event(hw, true, NULL);
- if (ret)
- return ret;
+ if (enable_mib_change)
+ ret = i40e_aq_cfg_lldp_mib_change_event(hw, true, NULL);
return ret;
}
u8 bridgetype,
struct i40e_dcbx_config *dcbcfg);
i40e_status i40e_get_dcb_config(struct i40e_hw *hw);
-i40e_status i40e_init_dcb(struct i40e_hw *hw);
+i40e_status i40e_init_dcb(struct i40e_hw *hw, bool enable_mib_change);
#endif /* _I40E_DCB_H_ */
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
+
+#include "i40e.h"
+
+#include <linux/firmware.h>
+
+/**
+ * i40e_ddp_profiles_eq - checks if DDP profiles are the equivalent
+ * @a: new profile info
+ * @b: old profile info
+ *
+ * checks if DDP profiles are the equivalent.
+ * Returns true if profiles are the same.
+ **/
+static bool i40e_ddp_profiles_eq(struct i40e_profile_info *a,
+ struct i40e_profile_info *b)
+{
+ return a->track_id == b->track_id &&
+ !memcmp(&a->version, &b->version, sizeof(a->version)) &&
+ !memcmp(&a->name, &b->name, I40E_DDP_NAME_SIZE);
+}
+
+/**
+ * i40e_ddp_does_profile_exist - checks if DDP profile loaded already
+ * @hw: HW data structure
+ * @pinfo: DDP profile information structure
+ *
+ * checks if DDP profile loaded already.
+ * Returns >0 if the profile exists.
+ * Returns 0 if the profile is absent.
+ * Returns <0 if error.
+ **/
+static int i40e_ddp_does_profile_exist(struct i40e_hw *hw,
+ struct i40e_profile_info *pinfo)
+{
+ struct i40e_ddp_profile_list *profile_list;
+ u8 buff[I40E_PROFILE_LIST_SIZE];
+ i40e_status status;
+ int i;
+
+ status = i40e_aq_get_ddp_list(hw, buff, I40E_PROFILE_LIST_SIZE, 0,
+ NULL);
+ if (status)
+ return -1;
+
+ profile_list = (struct i40e_ddp_profile_list *)buff;
+ for (i = 0; i < profile_list->p_count; i++) {
+ if (i40e_ddp_profiles_eq(pinfo, &profile_list->p_info[i]))
+ return 1;
+ }
+ return 0;
+}
+
+/**
+ * i40e_ddp_profiles_overlap - checks if DDP profiles overlap.
+ * @new: new profile info
+ * @old: old profile info
+ *
+ * checks if DDP profiles overlap.
+ * Returns true if profiles are overlap.
+ **/
+static bool i40e_ddp_profiles_overlap(struct i40e_profile_info *new,
+ struct i40e_profile_info *old)
+{
+ unsigned int group_id_old = (u8)((old->track_id & 0x00FF0000) >> 16);
+ unsigned int group_id_new = (u8)((new->track_id & 0x00FF0000) >> 16);
+
+ /* 0x00 group must be only the first */
+ if (group_id_new == 0)
+ return true;
+ /* 0xFF group is compatible with anything else */
+ if (group_id_new == 0xFF || group_id_old == 0xFF)
+ return false;
+ /* otherwise only profiles from the same group are compatible*/
+ return group_id_old != group_id_new;
+}
+
+/**
+ * i40e_ddp_does_profiles_ - checks if DDP overlaps with existing one.
+ * @hw: HW data structure
+ * @pinfo: DDP profile information structure
+ *
+ * checks if DDP profile overlaps with existing one.
+ * Returns >0 if the profile overlaps.
+ * Returns 0 if the profile is ok.
+ * Returns <0 if error.
+ **/
+static int i40e_ddp_does_profile_overlap(struct i40e_hw *hw,
+ struct i40e_profile_info *pinfo)
+{
+ struct i40e_ddp_profile_list *profile_list;
+ u8 buff[I40E_PROFILE_LIST_SIZE];
+ i40e_status status;
+ int i;
+
+ status = i40e_aq_get_ddp_list(hw, buff, I40E_PROFILE_LIST_SIZE, 0,
+ NULL);
+ if (status)
+ return -EIO;
+
+ profile_list = (struct i40e_ddp_profile_list *)buff;
+ for (i = 0; i < profile_list->p_count; i++) {
+ if (i40e_ddp_profiles_overlap(pinfo,
+ &profile_list->p_info[i]))
+ return 1;
+ }
+ return 0;
+}
+
+/**
+ * i40e_add_pinfo
+ * @hw: pointer to the hardware structure
+ * @profile: pointer to the profile segment of the package
+ * @profile_info_sec: buffer for information section
+ * @track_id: package tracking id
+ *
+ * Register a profile to the list of loaded profiles.
+ */
+static enum i40e_status_code
+i40e_add_pinfo(struct i40e_hw *hw, struct i40e_profile_segment *profile,
+ u8 *profile_info_sec, u32 track_id)
+{
+ struct i40e_profile_section_header *sec;
+ struct i40e_profile_info *pinfo;
+ i40e_status status;
+ u32 offset = 0, info = 0;
+
+ sec = (struct i40e_profile_section_header *)profile_info_sec;
+ sec->tbl_size = 1;
+ sec->data_end = sizeof(struct i40e_profile_section_header) +
+ sizeof(struct i40e_profile_info);
+ sec->section.type = SECTION_TYPE_INFO;
+ sec->section.offset = sizeof(struct i40e_profile_section_header);
+ sec->section.size = sizeof(struct i40e_profile_info);
+ pinfo = (struct i40e_profile_info *)(profile_info_sec +
+ sec->section.offset);
+ pinfo->track_id = track_id;
+ pinfo->version = profile->version;
+ pinfo->op = I40E_DDP_ADD_TRACKID;
+
+ /* Clear reserved field */
+ memset(pinfo->reserved, 0, sizeof(pinfo->reserved));
+ memcpy(pinfo->name, profile->name, I40E_DDP_NAME_SIZE);
+
+ status = i40e_aq_write_ddp(hw, (void *)sec, sec->data_end,
+ track_id, &offset, &info, NULL);
+ return status;
+}
+
+/**
+ * i40e_del_pinfo - delete DDP profile info from NIC
+ * @hw: HW data structure
+ * @profile: DDP profile segment to be deleted
+ * @profile_info_sec: DDP profile section header
+ * @track_id: track ID of the profile for deletion
+ *
+ * Removes DDP profile from the NIC.
+ **/
+static enum i40e_status_code
+i40e_del_pinfo(struct i40e_hw *hw, struct i40e_profile_segment *profile,
+ u8 *profile_info_sec, u32 track_id)
+{
+ struct i40e_profile_section_header *sec;
+ struct i40e_profile_info *pinfo;
+ i40e_status status;
+ u32 offset = 0, info = 0;
+
+ sec = (struct i40e_profile_section_header *)profile_info_sec;
+ sec->tbl_size = 1;
+ sec->data_end = sizeof(struct i40e_profile_section_header) +
+ sizeof(struct i40e_profile_info);
+ sec->section.type = SECTION_TYPE_INFO;
+ sec->section.offset = sizeof(struct i40e_profile_section_header);
+ sec->section.size = sizeof(struct i40e_profile_info);
+ pinfo = (struct i40e_profile_info *)(profile_info_sec +
+ sec->section.offset);
+ pinfo->track_id = track_id;
+ pinfo->version = profile->version;
+ pinfo->op = I40E_DDP_REMOVE_TRACKID;
+
+ /* Clear reserved field */
+ memset(pinfo->reserved, 0, sizeof(pinfo->reserved));
+ memcpy(pinfo->name, profile->name, I40E_DDP_NAME_SIZE);
+
+ status = i40e_aq_write_ddp(hw, (void *)sec, sec->data_end,
+ track_id, &offset, &info, NULL);
+ return status;
+}
+
+/**
+ * i40e_ddp_is_pkg_hdr_valid - performs basic pkg header integrity checks
+ * @netdev: net device structure (for logging purposes)
+ * @pkg_hdr: pointer to package header
+ * @size_huge: size of the whole DDP profile package in size_t
+ *
+ * Checks correctness of pkg header: Version, size too big/small, and
+ * all segment offsets alignment and boundaries. This function lets
+ * reject non DDP profile file to be loaded by administrator mistake.
+ **/
+static bool i40e_ddp_is_pkg_hdr_valid(struct net_device *netdev,
+ struct i40e_package_header *pkg_hdr,
+ size_t size_huge)
+{
+ u32 size = 0xFFFFFFFFU & size_huge;
+ u32 pkg_hdr_size;
+ u32 segment;
+
+ if (!pkg_hdr)
+ return false;
+
+ if (pkg_hdr->version.major > 0) {
+ struct i40e_ddp_version ver = pkg_hdr->version;
+
+ netdev_err(netdev, "Unsupported DDP profile version %u.%u.%u.%u",
+ ver.major, ver.minor, ver.update, ver.draft);
+ return false;
+ }
+ if (size_huge > size) {
+ netdev_err(netdev, "Invalid DDP profile - size is bigger than 4G");
+ return false;
+ }
+ if (size < (sizeof(struct i40e_package_header) +
+ sizeof(struct i40e_metadata_segment) + sizeof(u32) * 2)) {
+ netdev_err(netdev, "Invalid DDP profile - size is too small.");
+ return false;
+ }
+
+ pkg_hdr_size = sizeof(u32) * (pkg_hdr->segment_count + 2U);
+ if (size < pkg_hdr_size) {
+ netdev_err(netdev, "Invalid DDP profile - too many segments");
+ return false;
+ }
+ for (segment = 0; segment < pkg_hdr->segment_count; ++segment) {
+ u32 offset = pkg_hdr->segment_offset[segment];
+
+ if (0xFU & offset) {
+ netdev_err(netdev,
+ "Invalid DDP profile %u segment alignment",
+ segment);
+ return false;
+ }
+ if (pkg_hdr_size > offset || offset >= size) {
+ netdev_err(netdev,
+ "Invalid DDP profile %u segment offset",
+ segment);
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/**
+ * i40e_ddp_load - performs DDP loading
+ * @netdev: net device structure
+ * @data: buffer containing recipe file
+ * @size: size of the buffer
+ * @is_add: true when loading profile, false when rolling back the previous one
+ *
+ * Checks correctness and loads DDP profile to the NIC. The function is
+ * also used for rolling back previously loaded profile.
+ **/
+int i40e_ddp_load(struct net_device *netdev, const u8 *data, size_t size,
+ bool is_add)
+{
+ u8 profile_info_sec[sizeof(struct i40e_profile_section_header) +
+ sizeof(struct i40e_profile_info)];
+ struct i40e_metadata_segment *metadata_hdr;
+ struct i40e_profile_segment *profile_hdr;
+ struct i40e_profile_info pinfo;
+ struct i40e_package_header *pkg_hdr;
+ i40e_status status;
+ struct i40e_netdev_priv *np = netdev_priv(netdev);
+ struct i40e_vsi *vsi = np->vsi;
+ struct i40e_pf *pf = vsi->back;
+ u32 track_id;
+ int istatus;
+
+ pkg_hdr = (struct i40e_package_header *)data;
+ if (!i40e_ddp_is_pkg_hdr_valid(netdev, pkg_hdr, size))
+ return -EINVAL;
+
+ if (size < (sizeof(struct i40e_package_header) +
+ sizeof(struct i40e_metadata_segment) + sizeof(u32) * 2)) {
+ netdev_err(netdev, "Invalid DDP recipe size.");
+ return -EINVAL;
+ }
+
+ /* Find beginning of segment data in buffer */
+ metadata_hdr = (struct i40e_metadata_segment *)
+ i40e_find_segment_in_package(SEGMENT_TYPE_METADATA, pkg_hdr);
+ if (!metadata_hdr) {
+ netdev_err(netdev, "Failed to find metadata segment in DDP recipe.");
+ return -EINVAL;
+ }
+
+ track_id = metadata_hdr->track_id;
+ profile_hdr = (struct i40e_profile_segment *)
+ i40e_find_segment_in_package(SEGMENT_TYPE_I40E, pkg_hdr);
+ if (!profile_hdr) {
+ netdev_err(netdev, "Failed to find profile segment in DDP recipe.");
+ return -EINVAL;
+ }
+
+ pinfo.track_id = track_id;
+ pinfo.version = profile_hdr->version;
+ if (is_add)
+ pinfo.op = I40E_DDP_ADD_TRACKID;
+ else
+ pinfo.op = I40E_DDP_REMOVE_TRACKID;
+
+ memcpy(pinfo.name, profile_hdr->name, I40E_DDP_NAME_SIZE);
+
+ /* Check if profile data already exists*/
+ istatus = i40e_ddp_does_profile_exist(&pf->hw, &pinfo);
+ if (istatus < 0) {
+ netdev_err(netdev, "Failed to fetch loaded profiles.");
+ return istatus;
+ }
+ if (is_add) {
+ if (istatus > 0) {
+ netdev_err(netdev, "DDP profile already loaded.");
+ return -EINVAL;
+ }
+ istatus = i40e_ddp_does_profile_overlap(&pf->hw, &pinfo);
+ if (istatus < 0) {
+ netdev_err(netdev, "Failed to fetch loaded profiles.");
+ return istatus;
+ }
+ if (istatus > 0) {
+ netdev_err(netdev, "DDP profile overlaps with existing one.");
+ return -EINVAL;
+ }
+ } else {
+ if (istatus == 0) {
+ netdev_err(netdev,
+ "DDP profile for deletion does not exist.");
+ return -EINVAL;
+ }
+ }
+
+ /* Load profile data */
+ if (is_add) {
+ status = i40e_write_profile(&pf->hw, profile_hdr, track_id);
+ if (status) {
+ if (status == I40E_ERR_DEVICE_NOT_SUPPORTED) {
+ netdev_err(netdev,
+ "Profile is not supported by the device.");
+ return -EPERM;
+ }
+ netdev_err(netdev, "Failed to write DDP profile.");
+ return -EIO;
+ }
+ } else {
+ status = i40e_rollback_profile(&pf->hw, profile_hdr, track_id);
+ if (status) {
+ netdev_err(netdev, "Failed to remove DDP profile.");
+ return -EIO;
+ }
+ }
+
+ /* Add/remove profile to/from profile list in FW */
+ if (is_add) {
+ status = i40e_add_pinfo(&pf->hw, profile_hdr, profile_info_sec,
+ track_id);
+ if (status) {
+ netdev_err(netdev, "Failed to add DDP profile info.");
+ return -EIO;
+ }
+ } else {
+ status = i40e_del_pinfo(&pf->hw, profile_hdr, profile_info_sec,
+ track_id);
+ if (status) {
+ netdev_err(netdev, "Failed to restore DDP profile info.");
+ return -EIO;
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * i40e_ddp_restore - restore previously loaded profile and remove from list
+ * @pf: PF data struct
+ *
+ * Restores previously loaded profile stored on the list in driver memory.
+ * After rolling back removes entry from the list.
+ **/
+static int i40e_ddp_restore(struct i40e_pf *pf)
+{
+ struct i40e_ddp_old_profile_list *entry;
+ struct net_device *netdev = pf->vsi[pf->lan_vsi]->netdev;
+ int status = 0;
+
+ if (!list_empty(&pf->ddp_old_prof)) {
+ entry = list_first_entry(&pf->ddp_old_prof,
+ struct i40e_ddp_old_profile_list,
+ list);
+ status = i40e_ddp_load(netdev, entry->old_ddp_buf,
+ entry->old_ddp_size, false);
+ list_del(&entry->list);
+ kfree(entry);
+ }
+ return status;
+}
+
+/**
+ * i40e_ddp_flash - callback function for ethtool flash feature
+ * @netdev: net device structure
+ * @flash: kernel flash structure
+ *
+ * Ethtool callback function used for loading and unloading DDP profiles.
+ **/
+int i40e_ddp_flash(struct net_device *netdev, struct ethtool_flash *flash)
+{
+ const struct firmware *ddp_config;
+ struct i40e_netdev_priv *np = netdev_priv(netdev);
+ struct i40e_vsi *vsi = np->vsi;
+ struct i40e_pf *pf = vsi->back;
+ int status = 0;
+
+ /* Check for valid region first */
+ if (flash->region != I40_DDP_FLASH_REGION) {
+ netdev_err(netdev, "Requested firmware region is not recognized by this driver.");
+ return -EINVAL;
+ }
+ if (pf->hw.bus.func != 0) {
+ netdev_err(netdev, "Any DDP operation is allowed only on Phy0 NIC interface");
+ return -EINVAL;
+ }
+
+ /* If the user supplied "-" instead of file name rollback previously
+ * stored profile.
+ */
+ if (strncmp(flash->data, "-", 2) != 0) {
+ struct i40e_ddp_old_profile_list *list_entry;
+ char profile_name[sizeof(I40E_DDP_PROFILE_PATH)
+ + I40E_DDP_PROFILE_NAME_MAX];
+
+ profile_name[sizeof(profile_name) - 1] = 0;
+ strncpy(profile_name, I40E_DDP_PROFILE_PATH,
+ sizeof(profile_name) - 1);
+ strncat(profile_name, flash->data, I40E_DDP_PROFILE_NAME_MAX);
+ /* Load DDP recipe. */
+ status = request_firmware(&ddp_config, profile_name,
+ &netdev->dev);
+ if (status) {
+ netdev_err(netdev, "DDP recipe file request failed.");
+ return status;
+ }
+
+ status = i40e_ddp_load(netdev, ddp_config->data,
+ ddp_config->size, true);
+
+ if (!status) {
+ list_entry =
+ kzalloc(sizeof(struct i40e_ddp_old_profile_list) +
+ ddp_config->size, GFP_KERNEL);
+ if (!list_entry) {
+ netdev_info(netdev, "Failed to allocate memory for previous DDP profile data.");
+ netdev_info(netdev, "New profile loaded but roll-back will be impossible.");
+ } else {
+ memcpy(list_entry->old_ddp_buf,
+ ddp_config->data, ddp_config->size);
+ list_entry->old_ddp_size = ddp_config->size;
+ list_add(&list_entry->list, &pf->ddp_old_prof);
+ }
+ }
+
+ release_firmware(ddp_config);
+ } else {
+ if (!list_empty(&pf->ddp_old_prof)) {
+ status = i40e_ddp_restore(pf);
+ } else {
+ netdev_warn(netdev, "There is no DDP profile to restore.");
+ status = -ENOENT;
+ }
+ }
+ return status;
+}
ethtool_link_ksettings_add_link_mode(ks, advertising,
1000baseT_Full);
}
- if (phy_types & I40E_CAP_PHY_TYPE_40GBASE_SR4)
+ if (phy_types & I40E_CAP_PHY_TYPE_40GBASE_SR4) {
ethtool_link_ksettings_add_link_mode(ks, supported,
40000baseSR4_Full);
+ ethtool_link_ksettings_add_link_mode(ks, advertising,
+ 40000baseSR4_Full);
+ }
if (phy_types & I40E_CAP_PHY_TYPE_40GBASE_LR4)
ethtool_link_ksettings_add_link_mode(ks, supported,
40000baseLR4_Full);
case I40E_PHY_TYPE_40GBASE_SR4:
ethtool_link_ksettings_add_link_mode(ks, supported,
40000baseSR4_Full);
+ ethtool_link_ksettings_add_link_mode(ks, advertising,
+ 40000baseSR4_Full);
break;
case I40E_PHY_TYPE_40GBASE_LR4:
ethtool_link_ksettings_add_link_mode(ks, supported,
.set_link_ksettings = i40e_set_link_ksettings,
.get_fecparam = i40e_get_fec_param,
.set_fecparam = i40e_set_fec_param,
+ .flash_device = i40e_ddp_flash,
};
void i40e_set_ethtool_ops(struct net_device *netdev)
fcnt = i40e_update_filter_state(num_add, list, add_head);
if (fcnt != num_add) {
- set_bit(__I40E_VSI_OVERFLOW_PROMISC, vsi->state);
- dev_warn(&vsi->back->pdev->dev,
- "Error %s adding RX filters on %s, promiscuous mode forced on\n",
- i40e_aq_str(hw, aq_err),
- vsi_name);
+ if (vsi->type == I40E_VSI_MAIN) {
+ set_bit(__I40E_VSI_OVERFLOW_PROMISC, vsi->state);
+ dev_warn(&vsi->back->pdev->dev,
+ "Error %s adding RX filters on %s, promiscuous mode forced on\n",
+ i40e_aq_str(hw, aq_err), vsi_name);
+ } else if (vsi->type == I40E_VSI_SRIOV ||
+ vsi->type == I40E_VSI_VMDQ1 ||
+ vsi->type == I40E_VSI_VMDQ2) {
+ dev_warn(&vsi->back->pdev->dev,
+ "Error %s adding RX filters on %s, please set promiscuous on manually for %s\n",
+ i40e_aq_str(hw, aq_err), vsi_name, vsi_name);
+ } else {
+ dev_warn(&vsi->back->pdev->dev,
+ "Error %s adding RX filters on %s, incorrect VSI type: %i.\n",
+ i40e_aq_str(hw, aq_err), vsi_name, vsi->type);
+ }
}
}
struct i40e_vsi_context ctxt;
i40e_status ret;
+ /* Don't modify stripping options if a port VLAN is active */
+ if (vsi->info.pvid)
+ return;
+
if ((vsi->info.valid_sections &
cpu_to_le16(I40E_AQ_VSI_PROP_VLAN_VALID)) &&
((vsi->info.port_vlan_flags & I40E_AQ_VSI_PVLAN_MODE_MASK) == 0))
struct i40e_vsi_context ctxt;
i40e_status ret;
+ /* Don't modify stripping options if a port VLAN is active */
+ if (vsi->info.pvid)
+ return;
+
if ((vsi->info.valid_sections &
cpu_to_le16(I40E_AQ_VSI_PROP_VLAN_VALID)) &&
((vsi->info.port_vlan_flags & I40E_AQ_VSI_PVLAN_EMOD_MASK) ==
goto out;
/* Get the initial DCB configuration */
- err = i40e_init_dcb(hw);
+ err = i40e_init_dcb(hw, true);
if (!err) {
/* Device/Function is not DCBX capable */
if ((!hw->func_caps.dcb) ||
struct i40e_pf *pf = vsi->back;
u8 enabled_tc = 0, num_tc, hw;
bool need_reset = false;
+ int old_queue_pairs;
int ret = -EINVAL;
u16 mode;
int i;
+ old_queue_pairs = vsi->num_queue_pairs;
num_tc = mqprio_qopt->qopt.num_tc;
hw = mqprio_qopt->qopt.hw;
mode = mqprio_qopt->mode;
}
ret = i40e_configure_queue_channels(vsi);
if (ret) {
+ vsi->num_queue_pairs = old_queue_pairs;
netdev_info(netdev,
"Failed configuring queue channels\n");
need_reset = true;
dev_warn(&pf->pdev->dev,
"shutdown_lan_hmc failed: %d\n", ret);
}
+
+ /* Save the current PTP time so that we can restore the time after the
+ * reset completes.
+ */
+ i40e_ptp_save_hw_time(pf);
}
/**
INIT_LIST_HEAD(&pf->l3_flex_pit_list);
INIT_LIST_HEAD(&pf->l4_flex_pit_list);
+ INIT_LIST_HEAD(&pf->ddp_old_prof);
/* set up the locks for the AQ, do this only once in probe
* and destroy them only once in remove
if (err) {
if (err == I40E_ERR_FIRMWARE_API_VERSION)
dev_info(&pdev->dev,
- "The driver for the device stopped because the NVM image is newer than expected. You must install the most recent version of the network driver.\n");
+ "The driver for the device stopped because the NVM image v%u.%u is newer than expected v%u.%u. You must install the most recent version of the network driver.\n",
+ hw->aq.api_maj_ver,
+ hw->aq.api_min_ver,
+ I40E_FW_API_VERSION_MAJOR,
+ I40E_FW_MINOR_VERSION(hw));
else
dev_info(&pdev->dev,
"The driver for the device stopped because the device firmware failed to init. Try updating your NVM image.\n");
if (hw->aq.api_maj_ver == I40E_FW_API_VERSION_MAJOR &&
hw->aq.api_min_ver > I40E_FW_MINOR_VERSION(hw))
dev_info(&pdev->dev,
- "The driver for the device detected a newer version of the NVM image than expected. Please install the most recent version of the network driver.\n");
+ "The driver for the device detected a newer version of the NVM image v%u.%u than expected v%u.%u. Please install the most recent version of the network driver.\n",
+ hw->aq.api_maj_ver,
+ hw->aq.api_min_ver,
+ I40E_FW_API_VERSION_MAJOR,
+ I40E_FW_MINOR_VERSION(hw));
else if (hw->aq.api_maj_ver == 1 && hw->aq.api_min_ver < 4)
dev_info(&pdev->dev,
- "The driver for the device detected an older version of the NVM image than expected. Please update the NVM image.\n");
+ "The driver for the device detected an older version of the NVM image v%u.%u than expected v%u.%u. Please update the NVM image.\n",
+ hw->aq.api_maj_ver,
+ hw->aq.api_min_ver,
+ I40E_FW_API_VERSION_MAJOR,
+ I40E_FW_MINOR_VERSION(hw));
i40e_verify_eeprom(pf);
struct i40e_generic_seg_header *
i40e_find_segment_in_package(u32 segment_type,
struct i40e_package_header *pkg_header);
+struct i40e_profile_section_header *
+i40e_find_section_in_profile(u32 section_type,
+ struct i40e_profile_segment *profile);
enum i40e_status_code
i40e_write_profile(struct i40e_hw *hw, struct i40e_profile_segment *i40e_seg,
u32 track_id);
enum i40e_status_code
+i40e_rollback_profile(struct i40e_hw *hw, struct i40e_profile_segment *i40e_seg,
+ u32 track_id);
+enum i40e_status_code
i40e_add_pinfo_to_list(struct i40e_hw *hw,
struct i40e_profile_segment *profile,
u8 *profile_info_sec, u32 track_id);
pf->tstamp_config.rx_filter = HWTSTAMP_FILTER_NONE;
pf->tstamp_config.tx_type = HWTSTAMP_TX_OFF;
+ /* Set the previous "reset" time to the current Kernel clock time */
+ pf->ptp_prev_hw_time = ktime_to_timespec64(ktime_get_real());
+ pf->ptp_reset_start = ktime_get();
+
return 0;
}
+/**
+ * i40e_ptp_save_hw_time - Save the current PTP time as ptp_prev_hw_time
+ * @pf: Board private structure
+ *
+ * Read the current PTP time and save it into pf->ptp_prev_hw_time. This should
+ * be called at the end of preparing to reset, just before hardware reset
+ * occurs, in order to preserve the PTP time as close as possible across
+ * resets.
+ */
+void i40e_ptp_save_hw_time(struct i40e_pf *pf)
+{
+ /* don't try to access the PTP clock if it's not enabled */
+ if (!(pf->flags & I40E_FLAG_PTP))
+ return;
+
+ i40e_ptp_gettimex(&pf->ptp_caps, &pf->ptp_prev_hw_time, NULL);
+ /* Get a monotonic starting time for this reset */
+ pf->ptp_reset_start = ktime_get();
+}
+
+/**
+ * i40e_ptp_restore_hw_time - Restore the ptp_prev_hw_time + delta to PTP regs
+ * @pf: Board private structure
+ *
+ * Restore the PTP hardware clock registers. We previously cached the PTP
+ * hardware time as pf->ptp_prev_hw_time. To be as accurate as possible,
+ * update this value based on the time delta since the time was saved, using
+ * CLOCK_MONOTONIC (via ktime_get()) to calculate the time difference.
+ *
+ * This ensures that the hardware clock is restored to nearly what it should
+ * have been if a reset had not occurred.
+ */
+void i40e_ptp_restore_hw_time(struct i40e_pf *pf)
+{
+ ktime_t delta = ktime_sub(ktime_get(), pf->ptp_reset_start);
+
+ /* Update the previous HW time with the ktime delta */
+ timespec64_add_ns(&pf->ptp_prev_hw_time, ktime_to_ns(delta));
+
+ /* Restore the hardware clock registers */
+ i40e_ptp_settime(&pf->ptp_caps, &pf->ptp_prev_hw_time);
+}
+
/**
* i40e_ptp_init - Initialize the 1588 support after device probe or reset
* @pf: Board private structure
* This function sets device up for 1588 support. The first time it is run, it
* will create a PHC clock device. It does not create a clock device if one
* already exists. It also reconfigures the device after a reset.
+ *
+ * The first time a clock is created, i40e_ptp_create_clock will set
+ * pf->ptp_prev_hw_time to the current system time. During resets, it is
+ * expected that this timespec will be set to the last known PTP clock time,
+ * in order to preserve the clock time as close as possible across a reset.
**/
void i40e_ptp_init(struct i40e_pf *pf)
{
dev_err(&pf->pdev->dev, "%s: ptp_clock_register failed\n",
__func__);
} else if (pf->ptp_clock) {
- struct timespec64 ts;
u32 regval;
if (pf->hw.debug_mask & I40E_DEBUG_LAN)
/* reset timestamping mode */
i40e_ptp_set_timestamp_mode(pf, &pf->tstamp_config);
- /* Set the clock value. */
- ts = ktime_to_timespec64(ktime_get_real());
- i40e_ptp_settime(&pf->ptp_caps, &ts);
+ /* Restore the clock time based on last known value */
+ i40e_ptp_restore_hw_time(pf);
}
}
first->next_to_watch = tx_desc;
/* notify HW of packet */
- if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
+ if (netif_xmit_stopped(txring_txq(tx_ring)) || !netdev_xmit_more()) {
writel(i, tx_ring->tail);
/* we need this if more than one processor can write to our tail
struct i40e_metadata_segment {
struct i40e_generic_seg_header header;
struct i40e_ddp_version version;
+#define I40E_DDP_TRACKID_RDONLY 0
+#define I40E_DDP_TRACKID_INVALID 0xFFFFFFFF
u32 track_id;
char name[I40E_DDP_NAME_SIZE];
};
struct {
#define SECTION_TYPE_INFO 0x00000010
#define SECTION_TYPE_MMIO 0x00000800
+#define SECTION_TYPE_RB_MMIO 0x00001800
#define SECTION_TYPE_AQ 0x00000801
+#define SECTION_TYPE_RB_AQ 0x00001801
#define SECTION_TYPE_NOTE 0x80000000
#define SECTION_TYPE_NAME 0x80000001
+#define SECTION_TYPE_PROTO 0x80000002
+#define SECTION_TYPE_PCTYPE 0x80000003
+#define SECTION_TYPE_PTYPE 0x80000004
u32 type;
u32 offset;
u32 size;
} section;
};
+struct i40e_profile_tlv_section_record {
+ u8 rtype;
+ u8 type;
+ u16 len;
+ u8 data[12];
+};
+
+/* Generic AQ section in proflie */
+struct i40e_profile_aq_section {
+ u16 opcode;
+ u16 flags;
+ u8 param[16];
+ u16 datalen;
+ u8 data[1];
+};
+
struct i40e_profile_info {
u32 track_id;
struct i40e_ddp_version version;
(u8 *)&stats, sizeof(stats));
}
-/* If the VF is not trusted restrict the number of MAC/VLAN it can program */
-#define I40E_VC_MAX_MAC_ADDR_PER_VF 12
+/* If the VF is not trusted restrict the number of MAC/VLAN it can program
+ * MAC filters: 16 for multicast, 1 for MAC, 1 for broadcast
+ */
+#define I40E_VC_MAX_MAC_ADDR_PER_VF (16 + 1 + 1)
#define I40E_VC_MAX_VLAN_PER_VF 8
/**
#define I40E_FW_API_VERSION_MAJOR 0x0001
#define I40E_FW_API_VERSION_MINOR_X722 0x0005
-#define I40E_FW_API_VERSION_MINOR_X710 0x0007
+#define I40E_FW_API_VERSION_MINOR_X710 0x0008
#define I40E_FW_MINOR_VERSION(_h) ((_h)->mac.type == I40E_MAC_XL710 ? \
I40E_FW_API_VERSION_MINOR_X710 : \
first->next_to_watch = tx_desc;
/* notify HW of packet */
- if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
+ if (netif_xmit_stopped(txring_txq(tx_ring)) || !netdev_xmit_more()) {
writel(i, tx_ring->tail);
/* we need this if more than one processor can write to our tail
extern const char ice_drv_ver[];
#define ICE_BAR0 0
-#define ICE_DFLT_NUM_DESC 128
#define ICE_REQ_DESC_MULTIPLE 32
#define ICE_MIN_NUM_DESC ICE_REQ_DESC_MULTIPLE
#define ICE_MAX_NUM_DESC 8160
+/* set default number of Rx/Tx descriptors to the minimum between
+ * ICE_MAX_NUM_DESC and the number of descriptors to fill up an entire page
+ */
+#define ICE_DFLT_NUM_RX_DESC min_t(u16, ICE_MAX_NUM_DESC, \
+ ALIGN(PAGE_SIZE / \
+ sizeof(union ice_32byte_rx_desc), \
+ ICE_REQ_DESC_MULTIPLE))
+#define ICE_DFLT_NUM_TX_DESC min_t(u16, ICE_MAX_NUM_DESC, \
+ ALIGN(PAGE_SIZE / \
+ sizeof(struct ice_tx_desc), \
+ ICE_REQ_DESC_MULTIPLE))
+
#define ICE_DFLT_TRAFFIC_CLASS BIT(0)
#define ICE_INT_NAME_STR_LEN (IFNAMSIZ + 16)
#define ICE_ETHTOOL_FWVER_LEN 32
#define ice_for_each_q_vector(vsi, i) \
for ((i) = 0; (i) < (vsi)->num_q_vectors; (i)++)
+#define ICE_UCAST_PROMISC_BITS (ICE_PROMISC_UCAST_TX | ICE_PROMISC_MCAST_TX | \
+ ICE_PROMISC_UCAST_RX | ICE_PROMISC_MCAST_RX)
+
+#define ICE_UCAST_VLAN_PROMISC_BITS (ICE_PROMISC_UCAST_TX | \
+ ICE_PROMISC_MCAST_TX | \
+ ICE_PROMISC_UCAST_RX | \
+ ICE_PROMISC_MCAST_RX | \
+ ICE_PROMISC_VLAN_TX | \
+ ICE_PROMISC_VLAN_RX)
+
+#define ICE_MCAST_PROMISC_BITS (ICE_PROMISC_MCAST_TX | ICE_PROMISC_MCAST_RX)
+
+#define ICE_MCAST_VLAN_PROMISC_BITS (ICE_PROMISC_MCAST_TX | \
+ ICE_PROMISC_MCAST_RX | \
+ ICE_PROMISC_VLAN_TX | \
+ ICE_PROMISC_VLAN_RX)
+
struct ice_tc_info {
u16 qoffset;
u16 qcount_tx;
u8 irqs_ready;
u8 current_isup; /* Sync 'link up' logging */
u8 stat_offsets_loaded;
+ u8 vlan_ena;
/* queue information */
u8 tx_mapping_mode; /* ICE_MAP_MODE_[CONTIG|SCATTER] */
u16 num_txq; /* Used Tx queues */
u16 alloc_rxq; /* Allocated Rx queues */
u16 num_rxq; /* Used Rx queues */
- u16 num_desc;
+ u16 num_rx_desc;
+ u16 num_tx_desc;
struct ice_tc_cfg tc_cfg;
} ____cacheline_internodealigned_in_smp;
/* struct that defines an interrupt vector */
struct ice_q_vector {
struct ice_vsi *vsi;
- cpumask_t affinity_mask;
- struct napi_struct napi;
- struct ice_ring_container rx;
- struct ice_ring_container tx;
- struct irq_affinity_notify affinity_notify;
+
u16 v_idx; /* index in the vsi->q_vector array. */
- u8 num_ring_tx; /* total number of Tx rings in vector */
u8 num_ring_rx; /* total number of Rx rings in vector */
- char name[ICE_INT_NAME_STR_LEN];
+ u8 num_ring_tx; /* total number of Tx rings in vector */
+ u8 itr_countdown; /* when 0 should adjust adaptive ITR */
/* in usecs, need to use ice_intrl_to_usecs_reg() before writing this
* value to the device
*/
u8 intrl;
+
+ struct napi_struct napi;
+
+ struct ice_ring_container rx;
+ struct ice_ring_container tx;
+
+ cpumask_t affinity_mask;
+ struct irq_affinity_notify affinity_notify;
+
+ char name[ICE_INT_NAME_STR_LEN];
} ____cacheline_internodealigned_in_smp;
enum ice_pf_flags {
* @vsi: pointer to vsi struct, can be NULL
* @q_vector: pointer to q_vector, can be NULL
*/
-static inline void ice_irq_dynamic_ena(struct ice_hw *hw, struct ice_vsi *vsi,
- struct ice_q_vector *q_vector)
+static inline void
+ice_irq_dynamic_ena(struct ice_hw *hw, struct ice_vsi *vsi,
+ struct ice_q_vector *q_vector)
{
u32 vector = (vsi && q_vector) ? vsi->hw_base_vector + q_vector->v_idx :
((struct ice_pf *)hw->back)->hw_oicr_idx;
__le64 phy_type_low; /* Use values from ICE_PHY_TYPE_LOW_* */
__le64 phy_type_high; /* Use values from ICE_PHY_TYPE_HIGH_* */
u8 caps;
-#define ICE_AQ_PHY_ENA_TX_PAUSE_ABILITY BIT(0)
-#define ICE_AQ_PHY_ENA_RX_PAUSE_ABILITY BIT(1)
+#define ICE_AQ_PHY_ENA_VALID_MASK ICE_M(0xef, 0)
+#define ICE_AQ_PHY_ENA_TX_PAUSE_ABILITY BIT(0)
+#define ICE_AQ_PHY_ENA_RX_PAUSE_ABILITY BIT(1)
#define ICE_AQ_PHY_ENA_LOW_POWER BIT(2)
#define ICE_AQ_PHY_ENA_LINK BIT(3)
#define ICE_AQ_PHY_ENA_AUTO_LINK_UPDT BIT(5)
*
* Get Link Status (0x607). Returns the link status of the adapter.
*/
-static enum ice_status
+enum ice_status
ice_aq_get_link_info(struct ice_port_info *pi, bool ena_lse,
struct ice_link_status *link, struct ice_sq_cd *cd)
{
/* flag cleared so calling functions don't call AQ again */
pi->phy.get_link_info = false;
- return status;
+ return 0;
}
/**
*/
case ICE_RXDID_FLEX_NIC:
case ICE_RXDID_FLEX_NIC_2:
- ICE_PROG_FLG_ENTRY(hw, prof_id, ICE_RXFLG_PKT_FRG,
- ICE_RXFLG_UDP_GRE, ICE_RXFLG_PKT_DSI,
- ICE_RXFLG_FIN, idx++);
+ ICE_PROG_FLG_ENTRY(hw, prof_id, ICE_FLG_PKT_FRG,
+ ICE_FLG_UDP_GRE, ICE_FLG_PKT_DSI,
+ ICE_FLG_FIN, idx++);
/* flex flag 1 is not used for flexi-flag programming, skipping
* these four FLG64 bits.
*/
- ICE_PROG_FLG_ENTRY(hw, prof_id, ICE_RXFLG_SYN, ICE_RXFLG_RST,
- ICE_RXFLG_PKT_DSI, ICE_RXFLG_PKT_DSI, idx++);
- ICE_PROG_FLG_ENTRY(hw, prof_id, ICE_RXFLG_PKT_DSI,
- ICE_RXFLG_PKT_DSI, ICE_RXFLG_EVLAN_x8100,
- ICE_RXFLG_EVLAN_x9100, idx++);
- ICE_PROG_FLG_ENTRY(hw, prof_id, ICE_RXFLG_VLAN_x8100,
- ICE_RXFLG_TNL_VLAN, ICE_RXFLG_TNL_MAC,
- ICE_RXFLG_TNL0, idx++);
- ICE_PROG_FLG_ENTRY(hw, prof_id, ICE_RXFLG_TNL1, ICE_RXFLG_TNL2,
- ICE_RXFLG_PKT_DSI, ICE_RXFLG_PKT_DSI, idx);
+ ICE_PROG_FLG_ENTRY(hw, prof_id, ICE_FLG_SYN, ICE_FLG_RST,
+ ICE_FLG_PKT_DSI, ICE_FLG_PKT_DSI, idx++);
+ ICE_PROG_FLG_ENTRY(hw, prof_id, ICE_FLG_PKT_DSI,
+ ICE_FLG_PKT_DSI, ICE_FLG_EVLAN_x8100,
+ ICE_FLG_EVLAN_x9100, idx++);
+ ICE_PROG_FLG_ENTRY(hw, prof_id, ICE_FLG_VLAN_x8100,
+ ICE_FLG_TNL_VLAN, ICE_FLG_TNL_MAC,
+ ICE_FLG_TNL0, idx++);
+ ICE_PROG_FLG_ENTRY(hw, prof_id, ICE_FLG_TNL1, ICE_FLG_TNL2,
+ ICE_FLG_PKT_DSI, ICE_FLG_PKT_DSI, idx);
break;
default:
*
* Dumps debug log about control command with descriptor contents.
*/
-void ice_debug_cq(struct ice_hw *hw, u32 __maybe_unused mask, void *desc,
- void *buf, u16 buf_len)
+void
+ice_debug_cq(struct ice_hw *hw, u32 __maybe_unused mask, void *desc, void *buf,
+ u16 buf_len)
{
struct ice_aq_desc *cq_desc = (struct ice_aq_desc *)desc;
u16 len;
}
/**
- * ice_get_guar_num_vsi - determine number of guar VSI for a PF
+ * ice_get_num_per_func - determine number of resources per PF
* @hw: pointer to the hw structure
+ * @max: value to be evenly split between each PF
*
* Determine the number of valid functions by going through the bitmap returned
- * from parsing capabilities and use this to calculate the number of VSI per PF.
+ * from parsing capabilities and use this to calculate the number of resources
+ * per PF based on the max value passed in.
*/
-static u32 ice_get_guar_num_vsi(struct ice_hw *hw)
+static u32 ice_get_num_per_func(struct ice_hw *hw, u32 max)
{
u8 funcs;
if (!funcs)
return 0;
- return ICE_MAX_VSI / funcs;
+ return max / funcs;
}
/**
"HW caps: Dev.VSI cnt = %d\n",
dev_p->num_vsi_allocd_to_host);
} else if (func_p) {
- func_p->guar_num_vsi = ice_get_guar_num_vsi(hw);
+ func_p->guar_num_vsi =
+ ice_get_num_per_func(hw, ICE_MAX_VSI);
ice_debug(hw, ICE_DBG_INIT,
"HW caps: Func.VSI cnt = %d\n",
number);
* @hw: pointer to the hardware structure
* @opc: capabilities type to discover - pass in the command opcode
*/
-static enum ice_status ice_discover_caps(struct ice_hw *hw,
- enum ice_adminq_opc opc)
+static enum ice_status
+ice_discover_caps(struct ice_hw *hw, enum ice_adminq_opc opc)
{
enum ice_status status;
u32 cap_count;
if (!cfg)
return ICE_ERR_PARAM;
+ /* Ensure that only valid bits of cfg->caps can be turned on. */
+ if (cfg->caps & ~ICE_AQ_PHY_ENA_VALID_MASK) {
+ ice_debug(hw, ICE_DBG_PHY,
+ "Invalid bit is set in ice_aqc_set_phy_cfg_data->caps : 0x%x\n",
+ cfg->caps);
+
+ cfg->caps &= ICE_AQ_PHY_ENA_VALID_MASK;
+ }
+
ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_set_phy_cfg);
desc.params.set_phy.lport_num = lport;
desc.flags |= cpu_to_le16(ICE_AQ_FLAG_RD);
/* clear the old pause settings */
cfg.caps = pcaps->caps & ~(ICE_AQC_PHY_EN_TX_LINK_PAUSE |
ICE_AQC_PHY_EN_RX_LINK_PAUSE);
+
/* set the new capabilities */
cfg.caps |= pause_mask;
+
/* If the capabilities have changed, then set the new config */
if (cfg.caps != pcaps->caps) {
int retry_count, retry_max = 10;
return ice_aq_send_cmd(pi->hw, &desc, NULL, 0, cd);
}
+/**
+ * ice_aq_set_event_mask
+ * @hw: pointer to the HW struct
+ * @port_num: port number of the physical function
+ * @mask: event mask to be set
+ * @cd: pointer to command details structure or NULL
+ *
+ * Set event mask (0x0613)
+ */
+enum ice_status
+ice_aq_set_event_mask(struct ice_hw *hw, u8 port_num, u16 mask,
+ struct ice_sq_cd *cd)
+{
+ struct ice_aqc_set_event_mask *cmd;
+ struct ice_aq_desc desc;
+
+ cmd = &desc.params.set_event_mask;
+
+ ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_set_event_mask);
+
+ cmd->lport_num = port_num;
+
+ cmd->event_mask = cpu_to_le16(mask);
+ return ice_aq_send_cmd(hw, &desc, NULL, 0, cd);
+}
+
/**
* ice_aq_set_port_id_led
* @pi: pointer to the port information
* @dest_ctx: the context to be written to
* @ce_info: a description of the struct to be filled
*/
-static void ice_write_byte(u8 *src_ctx, u8 *dest_ctx,
- const struct ice_ctx_ele *ce_info)
+static void
+ice_write_byte(u8 *src_ctx, u8 *dest_ctx, const struct ice_ctx_ele *ce_info)
{
u8 src_byte, dest_byte, mask;
u8 *from, *dest;
* @dest_ctx: the context to be written to
* @ce_info: a description of the struct to be filled
*/
-static void ice_write_word(u8 *src_ctx, u8 *dest_ctx,
- const struct ice_ctx_ele *ce_info)
+static void
+ice_write_word(u8 *src_ctx, u8 *dest_ctx, const struct ice_ctx_ele *ce_info)
{
u16 src_word, mask;
__le16 dest_word;
* @dest_ctx: the context to be written to
* @ce_info: a description of the struct to be filled
*/
-static void ice_write_dword(u8 *src_ctx, u8 *dest_ctx,
- const struct ice_ctx_ele *ce_info)
+static void
+ice_write_dword(u8 *src_ctx, u8 *dest_ctx, const struct ice_ctx_ele *ce_info)
{
u32 src_dword, mask;
__le32 dest_dword;
* @dest_ctx: the context to be written to
* @ce_info: a description of the struct to be filled
*/
-static void ice_write_qword(u8 *src_ctx, u8 *dest_ctx,
- const struct ice_ctx_ele *ce_info)
+static void
+ice_write_qword(u8 *src_ctx, u8 *dest_ctx, const struct ice_ctx_ele *ce_info)
{
u64 src_qword, mask;
__le64 dest_qword;
mutex_lock(&pi->sched_lock);
- for (i = 0; i < ICE_MAX_TRAFFIC_CLASS; i++) {
+ ice_for_each_traffic_class(i) {
/* configuration is possible only if TC node is present */
if (!ice_sched_get_tc_node(pi, i))
continue;
* @prev_stat: ptr to previous loaded stat value
* @cur_stat: ptr to current stat value
*/
-void ice_stat_update40(struct ice_hw *hw, u32 hireg, u32 loreg,
- bool prev_stat_loaded, u64 *prev_stat, u64 *cur_stat)
+void
+ice_stat_update40(struct ice_hw *hw, u32 hireg, u32 loreg,
+ bool prev_stat_loaded, u64 *prev_stat, u64 *cur_stat)
{
u64 new_data;
* @prev_stat: ptr to previous loaded stat value
* @cur_stat: ptr to current stat value
*/
-void ice_stat_update32(struct ice_hw *hw, u32 reg, bool prev_stat_loaded,
- u64 *prev_stat, u64 *cur_stat)
+void
+ice_stat_update32(struct ice_hw *hw, u32 reg, bool prev_stat_loaded,
+ u64 *prev_stat, u64 *cur_stat)
{
u32 new_data;
#include "ice_switch.h"
#include <linux/avf/virtchnl.h>
-void ice_debug_cq(struct ice_hw *hw, u32 mask, void *desc, void *buf,
- u16 buf_len);
+void
+ice_debug_cq(struct ice_hw *hw, u32 mask, void *desc, void *buf, u16 buf_len);
enum ice_status ice_init_hw(struct ice_hw *hw);
void ice_deinit_hw(struct ice_hw *hw);
enum ice_status ice_check_reset(struct ice_hw *hw);
enum ice_aq_res_access_type access, u32 timeout);
void ice_release_res(struct ice_hw *hw, enum ice_aq_res_ids res);
enum ice_status ice_init_nvm(struct ice_hw *hw);
-enum ice_status ice_read_sr_buf(struct ice_hw *hw, u16 offset, u16 *words,
- u16 *data);
+enum ice_status
+ice_read_sr_buf(struct ice_hw *hw, u16 offset, u16 *words, u16 *data);
enum ice_status
ice_sq_send_cmd(struct ice_hw *hw, struct ice_ctl_q_info *cq,
struct ice_aq_desc *desc, void *buf, u16 buf_size,
ice_aq_set_link_restart_an(struct ice_port_info *pi, bool ena_link,
struct ice_sq_cd *cd);
enum ice_status
+ice_aq_get_link_info(struct ice_port_info *pi, bool ena_lse,
+ struct ice_link_status *link, struct ice_sq_cd *cd);
+enum ice_status
+ice_aq_set_event_mask(struct ice_hw *hw, u8 port_num, u16 mask,
+ struct ice_sq_cd *cd);
+enum ice_status
ice_aq_set_port_id_led(struct ice_port_info *pi, bool is_orig_mode,
struct ice_sq_cd *cd);
enum ice_status ice_replay_vsi(struct ice_hw *hw, u16 vsi_handle);
void ice_replay_post(struct ice_hw *hw);
void ice_output_fw_log(struct ice_hw *hw, struct ice_aq_desc *desc, void *buf);
-void ice_stat_update40(struct ice_hw *hw, u32 hireg, u32 loreg,
- bool prev_stat_loaded, u64 *prev_stat, u64 *cur_stat);
-void ice_stat_update32(struct ice_hw *hw, u32 reg, bool prev_stat_loaded,
- u64 *prev_stat, u64 *cur_stat);
+void
+ice_stat_update40(struct ice_hw *hw, u32 hireg, u32 loreg,
+ bool prev_stat_loaded, u64 *prev_stat, u64 *cur_stat);
+void
+ice_stat_update32(struct ice_hw *hw, u32 reg, bool prev_stat_loaded,
+ u64 *prev_stat, u64 *cur_stat);
#endif /* _ICE_COMMON_H_ */
*
* Reports speed/duplex settings based on media_type
*/
-static int ice_get_link_ksettings(struct net_device *netdev,
- struct ethtool_link_ksettings *ks)
+static int
+ice_get_link_ksettings(struct net_device *netdev,
+ struct ethtool_link_ksettings *ks)
{
struct ice_netdev_priv *np = netdev_priv(netdev);
struct ice_link_status *hw_link_info;
return -EOPNOTSUPP;
/* Check if this is lan vsi */
- for (idx = 0 ; idx < pf->num_alloc_vsi ; idx++) {
+ ice_for_each_vsi(pf, idx)
if (pf->vsi[idx]->type == ICE_VSI_PF) {
if (np->vsi != pf->vsi[idx])
return -EOPNOTSUPP;
break;
}
- }
if (p->phy.media_type != ICE_MEDIA_BASET &&
p->phy.media_type != ICE_MEDIA_FIBER &&
*
* Returns Success if the command is supported.
*/
-static int ice_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
- u32 __always_unused *rule_locs)
+static int
+ice_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
+ u32 __always_unused *rule_locs)
{
struct ice_netdev_priv *np = netdev_priv(netdev);
struct ice_vsi *vsi = np->vsi;
* Returns -EINVAL if the table specifies an invalid queue id, otherwise
* returns 0 after programming the table.
*/
-static int ice_set_rxfh(struct net_device *netdev, const u32 *indir,
- const u8 *key, const u8 hfunc)
+static int
+ice_set_rxfh(struct net_device *netdev, const u32 *indir, const u8 *key,
+ const u8 hfunc)
{
struct ice_netdev_priv *np = netdev_priv(netdev);
struct ice_vsi *vsi = np->vsi;
return __ice_get_coalesce(netdev, ec, -1);
}
-static int ice_get_per_q_coalesce(struct net_device *netdev, u32 q_num,
- struct ethtool_coalesce *ec)
+static int
+ice_get_per_q_coalesce(struct net_device *netdev, u32 q_num,
+ struct ethtool_coalesce *ec)
{
return __ice_get_coalesce(netdev, ec, q_num);
}
return __ice_set_coalesce(netdev, ec, -1);
}
-static int ice_set_per_q_coalesce(struct net_device *netdev, u32 q_num,
- struct ethtool_coalesce *ec)
+static int
+ice_set_per_q_coalesce(struct net_device *netdev, u32 q_num,
+ struct ethtool_coalesce *ec)
{
return __ice_set_coalesce(netdev, ec, q_num);
}
#define VPGEN_VFRTRIG_VFSWR_M BIT(0)
#define PFHMC_ERRORDATA 0x00520500
#define PFHMC_ERRORINFO 0x00520400
+#define GLINT_CTL 0x0016CC54
+#define GLINT_CTL_DIS_AUTOMASK_M BIT(0)
+#define GLINT_CTL_ITR_GRAN_200_S 16
+#define GLINT_CTL_ITR_GRAN_200_M ICE_M(0xF, 16)
+#define GLINT_CTL_ITR_GRAN_100_S 20
+#define GLINT_CTL_ITR_GRAN_100_M ICE_M(0xF, 20)
+#define GLINT_CTL_ITR_GRAN_50_S 24
+#define GLINT_CTL_ITR_GRAN_50_M ICE_M(0xF, 24)
+#define GLINT_CTL_ITR_GRAN_25_S 28
+#define GLINT_CTL_ITR_GRAN_25_M ICE_M(0xF, 28)
#define GLINT_DYN_CTL(_INT) (0x00160000 + ((_INT) * 4))
#define GLINT_DYN_CTL_INTENA_M BIT(0)
#define GLINT_DYN_CTL_CLEARPBA_M BIT(1)
#define VPINT_ALLOC_PCI_LAST_S 12
#define VPINT_ALLOC_PCI_LAST_M ICE_M(0x7FF, 12)
#define VPINT_ALLOC_PCI_VALID_M BIT(31)
+#define VPINT_MBX_CTL(_VSI) (0x0016A000 + ((_VSI) * 4))
+#define VPINT_MBX_CTL_CAUSE_ENA_M BIT(30)
#define GLLAN_RCTL_0 0x002941F8
#define QRX_CONTEXT(_i, _QRX) (0x00280000 + ((_i) * 8192 + (_QRX) * 4))
#define QRX_CTRL(_QRX) (0x00120000 + ((_QRX) * 4))
ICE_RX_MDID_HASH_HIGH,
};
-/* Rx Flag64 packet flag bits */
-enum ice_rx_flg64_bits {
- ICE_RXFLG_PKT_DSI = 0,
- ICE_RXFLG_EVLAN_x8100 = 15,
- ICE_RXFLG_EVLAN_x9100,
- ICE_RXFLG_VLAN_x8100,
- ICE_RXFLG_TNL_MAC = 22,
- ICE_RXFLG_TNL_VLAN,
- ICE_RXFLG_PKT_FRG,
- ICE_RXFLG_FIN = 32,
- ICE_RXFLG_SYN,
- ICE_RXFLG_RST,
- ICE_RXFLG_TNL0 = 38,
- ICE_RXFLG_TNL1,
- ICE_RXFLG_TNL2,
- ICE_RXFLG_UDP_GRE,
- ICE_RXFLG_RSVD = 63
+/* RX/TX Flag64 packet flag bits */
+enum ice_flg64_bits {
+ ICE_FLG_PKT_DSI = 0,
+ ICE_FLG_EVLAN_x8100 = 15,
+ ICE_FLG_EVLAN_x9100,
+ ICE_FLG_VLAN_x8100,
+ ICE_FLG_TNL_MAC = 22,
+ ICE_FLG_TNL_VLAN,
+ ICE_FLG_PKT_FRG,
+ ICE_FLG_FIN = 32,
+ ICE_FLG_SYN,
+ ICE_FLG_RST,
+ ICE_FLG_TNL0 = 38,
+ ICE_FLG_TNL1,
+ ICE_FLG_TNL2,
+ ICE_FLG_UDP_GRE,
+ ICE_FLG_RSVD = 63
};
/* for ice_32byte_rx_flex_desc.ptype_flexi_flags0 member */
ICE_TX_DESC_CMD_EOP = 0x0001,
ICE_TX_DESC_CMD_RS = 0x0002,
ICE_TX_DESC_CMD_IL2TAG1 = 0x0008,
- ICE_TX_DESC_CMD_IIPT_IPV6 = 0x0020, /* 2 BITS */
- ICE_TX_DESC_CMD_IIPT_IPV4 = 0x0040, /* 2 BITS */
- ICE_TX_DESC_CMD_IIPT_IPV4_CSUM = 0x0060, /* 2 BITS */
- ICE_TX_DESC_CMD_L4T_EOFT_TCP = 0x0100, /* 2 BITS */
- ICE_TX_DESC_CMD_L4T_EOFT_SCTP = 0x0200, /* 2 BITS */
- ICE_TX_DESC_CMD_L4T_EOFT_UDP = 0x0300, /* 2 BITS */
+ ICE_TX_DESC_CMD_IIPT_IPV6 = 0x0020,
+ ICE_TX_DESC_CMD_IIPT_IPV4 = 0x0040,
+ ICE_TX_DESC_CMD_IIPT_IPV4_CSUM = 0x0060,
+ ICE_TX_DESC_CMD_L4T_EOFT_TCP = 0x0100,
+ ICE_TX_DESC_CMD_L4T_EOFT_SCTP = 0x0200,
+ ICE_TX_DESC_CMD_L4T_EOFT_UDP = 0x0300,
};
#define ICE_TXD_QW1_OFFSET_S 16
int i;
for (i = 0; i < ICE_Q_WAIT_MAX_RETRY; i++) {
- u32 rx_reg = rd32(&pf->hw, QRX_CTRL(pf_q));
-
- if (ena == !!(rx_reg & QRX_CTRL_QENA_STAT_M))
- break;
+ if (ena == !!(rd32(&pf->hw, QRX_CTRL(pf_q)) &
+ QRX_CTRL_QENA_STAT_M))
+ return 0;
usleep_range(20, 40);
}
- if (i >= ICE_Q_WAIT_MAX_RETRY)
- return -ETIMEDOUT;
- return 0;
+ return -ETIMEDOUT;
}
/**
}
/**
- * ice_vsi_set_num_qs - Set num queues, descriptors and vectors for a VSI
+ * ice_vsi_set_num_desc - Set number of descriptors for queues on this VSI
* @vsi: the VSI being configured
+ */
+static void ice_vsi_set_num_desc(struct ice_vsi *vsi)
+{
+ switch (vsi->type) {
+ case ICE_VSI_PF:
+ vsi->num_rx_desc = ICE_DFLT_NUM_RX_DESC;
+ vsi->num_tx_desc = ICE_DFLT_NUM_TX_DESC;
+ break;
+ default:
+ dev_dbg(&vsi->back->pdev->dev,
+ "Not setting number of Tx/Rx descriptors for VSI type %d\n",
+ vsi->type);
+ break;
+ }
+}
+
+/**
+ * ice_vsi_set_num_qs - Set number of queues, descriptors and vectors for a VSI
+ * @vsi: the VSI being configured
+ * @vf_id: Id of the VF being configured
*
* Return 0 on success and a negative value on error
*/
-static void ice_vsi_set_num_qs(struct ice_vsi *vsi)
+static void ice_vsi_set_num_qs(struct ice_vsi *vsi, u16 vf_id)
{
struct ice_pf *pf = vsi->back;
+ struct ice_vf *vf = NULL;
+
+ if (vsi->type == ICE_VSI_VF)
+ vsi->vf_id = vf_id;
+
switch (vsi->type) {
case ICE_VSI_PF:
vsi->alloc_txq = pf->num_lan_tx;
vsi->alloc_rxq = pf->num_lan_rx;
- vsi->num_desc = ALIGN(ICE_DFLT_NUM_DESC, ICE_REQ_DESC_MULTIPLE);
vsi->num_q_vectors = max_t(int, pf->num_lan_rx, pf->num_lan_tx);
break;
case ICE_VSI_VF:
- vsi->alloc_txq = pf->num_vf_qps;
- vsi->alloc_rxq = pf->num_vf_qps;
+ vf = &pf->vf[vsi->vf_id];
+ vsi->alloc_txq = vf->num_vf_qs;
+ vsi->alloc_rxq = vf->num_vf_qs;
/* pf->num_vf_msix includes (VF miscellaneous vector +
* data queue interrupts). Since vsi->num_q_vectors is number
* of queues vectors, subtract 1 from the original vector
vsi->type);
break;
}
+
+ ice_vsi_set_num_desc(vsi);
}
/**
* ice_vsi_alloc - Allocates the next available struct VSI in the PF
* @pf: board private structure
* @type: type of VSI
+ * @vf_id: Id of the VF being configured
*
* returns a pointer to a VSI on success, NULL on failure.
*/
-static struct ice_vsi *ice_vsi_alloc(struct ice_pf *pf, enum ice_vsi_type type)
+static struct ice_vsi *
+ice_vsi_alloc(struct ice_pf *pf, enum ice_vsi_type type, u16 vf_id)
{
struct ice_vsi *vsi = NULL;
vsi->idx = pf->next_vsi;
vsi->work_lmt = ICE_DFLT_IRQ_WORK;
- ice_vsi_set_num_qs(vsi);
+ if (type == ICE_VSI_VF)
+ ice_vsi_set_num_qs(vsi, vf_id);
+ else
+ ice_vsi_set_num_qs(vsi, ICE_INVAL_VFID);
switch (vsi->type) {
case ICE_VSI_PF:
/**
* __ice_vsi_get_qs - helper function for assigning queues from PF to VSI
- * @qs_cfg: gathered variables needed for PF->VSI queues assignment
+ * @qs_cfg: gathered variables needed for pf->vsi queues assignment
*
- * This is an internal function for assigning queues from the PF to VSI and
- * initially tries to find contiguous space. If it is not successful to find
- * contiguous space, then it tries with the scatter approach.
+ * This function first tries to find contiguous space. If it is not successful,
+ * it tries with the scatter approach.
*
* Return 0 on success and -ENOMEM in case of no left space in PF queue bitmap
*/
/* find the (rounded up) power-of-2 of qcount */
pow = order_base_2(qcount_rx);
- for (i = 0; i < ICE_MAX_TRAFFIC_CLASS; i++) {
+ ice_for_each_traffic_class(i) {
if (!(vsi->tc_cfg.ena_tc & BIT(i))) {
/* TC is not enabled */
vsi->tc_cfg.tc_info[i].qoffset = 0;
tx_count += tx_numq_tc;
ctxt->info.tc_mapping[i] = cpu_to_le16(qmap);
}
- vsi->num_rxq = offset;
+
+ /* if offset is non-zero, means it is calculated correctly based on
+ * enabled TCs for a given VSI otherwise qcount_rx will always
+ * be correct and non-zero because it is based off - VSI's
+ * allocated Rx queues which is at least 1 (hence qcount_tx will be
+ * at least 1)
+ */
+ if (offset)
+ vsi->num_rxq = offset;
+ else
+ vsi->num_rxq = qcount_rx;
+
vsi->num_txq = tx_count;
if (vsi->type == ICE_VSI_VF && vsi->num_txq != vsi->num_rxq) {
if (!ctxt)
return -ENOMEM;
+ ctxt->info = vsi->info;
switch (vsi->type) {
case ICE_VSI_PF:
ctxt->flags = ICE_AQ_VSI_TYPE_PF;
ctxt->info.sw_id = vsi->port_info->sw_id;
ice_vsi_setup_q_map(vsi, ctxt);
+ /* Enable MAC Antispoof with new VSI being initialized or updated */
+ if (vsi->type == ICE_VSI_VF && pf->vf[vsi->vf_id].spoofchk) {
+ ctxt->info.valid_sections |=
+ cpu_to_le16(ICE_AQ_VSI_PROP_SECURITY_VALID);
+ ctxt->info.sec_flags |=
+ ICE_AQ_VSI_SEC_FLAG_ENA_MAC_ANTI_SPOOF;
+ }
+
ret = ice_add_vsi(hw, vsi->idx, ctxt, NULL);
if (ret) {
dev_err(&pf->pdev->dev,
ring->ring_active = false;
ring->vsi = vsi;
ring->dev = &pf->pdev->dev;
- ring->count = vsi->num_desc;
+ ring->count = vsi->num_tx_desc;
vsi->tx_rings[i] = ring;
}
ring->vsi = vsi;
ring->netdev = vsi->netdev;
ring->dev = &pf->pdev->dev;
- ring->count = vsi->num_desc;
+ ring->count = vsi->num_rx_desc;
vsi->rx_rings[i] = ring;
}
num_q_grps = 1;
/* set up and configure the Tx queues for each enabled TC */
- for (tc = 0; tc < ICE_MAX_TRAFFIC_CLASS; tc++) {
+ ice_for_each_traffic_class(tc) {
if (!(vsi->tc_cfg.ena_tc & BIT(tc)))
break;
return 0;
}
+/**
+ * ice_cfg_itr_gran - set the ITR granularity to 2 usecs if not already set
+ * @hw: board specific structure
+ */
+static void ice_cfg_itr_gran(struct ice_hw *hw)
+{
+ u32 regval = rd32(hw, GLINT_CTL);
+
+ /* no need to update global register if ITR gran is already set */
+ if (!(regval & GLINT_CTL_DIS_AUTOMASK_M) &&
+ (((regval & GLINT_CTL_ITR_GRAN_200_M) >>
+ GLINT_CTL_ITR_GRAN_200_S) == ICE_ITR_GRAN_US) &&
+ (((regval & GLINT_CTL_ITR_GRAN_100_M) >>
+ GLINT_CTL_ITR_GRAN_100_S) == ICE_ITR_GRAN_US) &&
+ (((regval & GLINT_CTL_ITR_GRAN_50_M) >>
+ GLINT_CTL_ITR_GRAN_50_S) == ICE_ITR_GRAN_US) &&
+ (((regval & GLINT_CTL_ITR_GRAN_25_M) >>
+ GLINT_CTL_ITR_GRAN_25_S) == ICE_ITR_GRAN_US))
+ return;
+
+ regval = ((ICE_ITR_GRAN_US << GLINT_CTL_ITR_GRAN_200_S) &
+ GLINT_CTL_ITR_GRAN_200_M) |
+ ((ICE_ITR_GRAN_US << GLINT_CTL_ITR_GRAN_100_S) &
+ GLINT_CTL_ITR_GRAN_100_M) |
+ ((ICE_ITR_GRAN_US << GLINT_CTL_ITR_GRAN_50_S) &
+ GLINT_CTL_ITR_GRAN_50_M) |
+ ((ICE_ITR_GRAN_US << GLINT_CTL_ITR_GRAN_25_S) &
+ GLINT_CTL_ITR_GRAN_25_M);
+ wr32(hw, GLINT_CTL, regval);
+}
+
/**
* ice_cfg_itr - configure the initial interrupt throttle values
* @hw: pointer to the HW structure
static void
ice_cfg_itr(struct ice_hw *hw, struct ice_q_vector *q_vector, u16 vector)
{
+ ice_cfg_itr_gran(hw);
+
if (q_vector->num_ring_rx) {
struct ice_ring_container *rc = &q_vector->rx;
rc->target_itr = ITR_TO_REG(rc->itr_setting);
rc->next_update = jiffies + 1;
rc->current_itr = rc->target_itr;
- rc->latency_range = ICE_LOW_LATENCY;
wr32(hw, GLINT_ITR(rc->itr_idx, vector),
ITR_REG_ALIGN(rc->current_itr) >> ICE_ITR_GRAN_S);
}
rc->target_itr = ITR_TO_REG(rc->itr_setting);
rc->next_update = jiffies + 1;
rc->current_itr = rc->target_itr;
- rc->latency_range = ICE_LOW_LATENCY;
wr32(hw, GLINT_ITR(rc->itr_idx, vector),
ITR_REG_ALIGN(rc->current_itr) >> ICE_ITR_GRAN_S);
}
* @rst_src: reset source
* @rel_vmvf_num: Relative id of VF/VM
*/
-int ice_vsi_stop_lan_tx_rings(struct ice_vsi *vsi,
- enum ice_disq_rst_src rst_src, u16 rel_vmvf_num)
+int
+ice_vsi_stop_lan_tx_rings(struct ice_vsi *vsi, enum ice_disq_rst_src rst_src,
+ u16 rel_vmvf_num)
{
return ice_vsi_stop_tx_rings(vsi, rst_src, rel_vmvf_num, vsi->tx_rings,
0);
* ice_cfg_vlan_pruning - enable or disable VLAN pruning on the VSI
* @vsi: VSI to enable or disable VLAN pruning on
* @ena: set to true to enable VLAN pruning and false to disable it
+ * @vlan_promisc: enable valid security flags if not in VLAN promiscuous mode
*
* returns 0 if VSI is updated, negative otherwise
*/
-int ice_cfg_vlan_pruning(struct ice_vsi *vsi, bool ena)
+int ice_cfg_vlan_pruning(struct ice_vsi *vsi, bool ena, bool vlan_promisc)
{
struct ice_vsi_ctx *ctxt;
struct device *dev;
ctxt->info.sw_flags2 &= ~ICE_AQ_VSI_SW_FLAG_RX_VLAN_PRUNE_ENA;
}
- ctxt->info.valid_sections = cpu_to_le16(ICE_AQ_VSI_PROP_SECURITY_VALID |
- ICE_AQ_VSI_PROP_SW_VALID);
+ if (!vlan_promisc)
+ ctxt->info.valid_sections =
+ cpu_to_le16(ICE_AQ_VSI_PROP_SECURITY_VALID |
+ ICE_AQ_VSI_PROP_SW_VALID);
status = ice_update_vsi(&vsi->back->hw, vsi->idx, ctxt, NULL);
if (status) {
struct ice_vsi *vsi;
int ret, i;
- vsi = ice_vsi_alloc(pf, type);
+ if (type == ICE_VSI_VF)
+ vsi = ice_vsi_alloc(pf, type, vf_id);
+ else
+ vsi = ice_vsi_alloc(pf, type, ICE_INVAL_VFID);
+
if (!vsi) {
dev_err(dev, "could not allocate VSI\n");
return NULL;
int ice_vsi_rebuild(struct ice_vsi *vsi)
{
u16 max_txqs[ICE_MAX_TRAFFIC_CLASS] = { 0 };
+ struct ice_vf *vf = NULL;
struct ice_pf *pf;
int ret, i;
return -EINVAL;
pf = vsi->back;
+ if (vsi->type == ICE_VSI_VF)
+ vf = &pf->vf[vsi->vf_id];
+
ice_rm_vsi_lan_cfg(vsi->port_info, vsi->idx);
ice_vsi_free_q_vectors(vsi);
- ice_free_res(vsi->back->sw_irq_tracker, vsi->sw_base_vector, vsi->idx);
- ice_free_res(vsi->back->hw_irq_tracker, vsi->hw_base_vector, vsi->idx);
- vsi->sw_base_vector = 0;
+
+ if (vsi->type != ICE_VSI_VF) {
+ /* reclaim SW interrupts back to the common pool */
+ ice_free_res(pf->sw_irq_tracker, vsi->sw_base_vector, vsi->idx);
+ pf->num_avail_sw_msix += vsi->num_q_vectors;
+ vsi->sw_base_vector = 0;
+ /* reclaim HW interrupts back to the common pool */
+ ice_free_res(pf->hw_irq_tracker, vsi->hw_base_vector,
+ vsi->idx);
+ pf->num_avail_hw_msix += vsi->num_q_vectors;
+ } else {
+ /* Reclaim VF resources back to the common pool for reset and
+ * and rebuild, with vector reassignment
+ */
+ ice_free_res(pf->hw_irq_tracker, vf->first_vector_idx,
+ vsi->idx);
+ pf->num_avail_hw_msix += pf->num_vf_msix;
+ }
vsi->hw_base_vector = 0;
+
ice_vsi_clear_rings(vsi);
ice_vsi_free_arrays(vsi, false);
ice_dev_onetime_setup(&vsi->back->hw);
- ice_vsi_set_num_qs(vsi);
+ if (vsi->type == ICE_VSI_VF)
+ ice_vsi_set_num_qs(vsi, vf->vf_id);
+ else
+ ice_vsi_set_num_qs(vsi, ICE_INVAL_VFID);
ice_vsi_set_tc_cfg(vsi);
/* Initialize VSI struct elements and create VSI in FW */
ice_vsi_stop_lan_tx_rings(struct ice_vsi *vsi, enum ice_disq_rst_src rst_src,
u16 rel_vmvf_num);
-int ice_cfg_vlan_pruning(struct ice_vsi *vsi, bool ena);
+int ice_cfg_vlan_pruning(struct ice_vsi *vsi, bool ena, bool vlan_promisc);
void ice_vsi_delete(struct ice_vsi *vsi);
void ice_vsi_free_tx_rings(struct ice_vsi *vsi);
-int ice_vsi_cfg_tc(struct ice_vsi *vsi, u8 ena_tc);
-
int ice_vsi_manage_rss_lut(struct ice_vsi *vsi, bool ena);
#endif /* !_ICE_LIB_H_ */
test_bit(ICE_VSI_FLAG_VLAN_FLTR_CHANGED, vsi->flags);
}
+/**
+ * ice_cfg_promisc - Enable or disable promiscuous mode for a given PF
+ * @vsi: the VSI being configured
+ * @promisc_m: mask of promiscuous config bits
+ * @set_promisc: enable or disable promisc flag request
+ *
+ */
+static int ice_cfg_promisc(struct ice_vsi *vsi, u8 promisc_m, bool set_promisc)
+{
+ struct ice_hw *hw = &vsi->back->hw;
+ enum ice_status status = 0;
+
+ if (vsi->type != ICE_VSI_PF)
+ return 0;
+
+ if (vsi->vlan_ena) {
+ status = ice_set_vlan_vsi_promisc(hw, vsi->idx, promisc_m,
+ set_promisc);
+ } else {
+ if (set_promisc)
+ status = ice_set_vsi_promisc(hw, vsi->idx, promisc_m,
+ 0);
+ else
+ status = ice_clear_vsi_promisc(hw, vsi->idx, promisc_m,
+ 0);
+ }
+
+ if (status)
+ return -EIO;
+
+ return 0;
+}
+
/**
* ice_vsi_sync_fltr - Update the VSI filter list to the HW
* @vsi: ptr to the VSI
struct ice_hw *hw = &pf->hw;
enum ice_status status = 0;
u32 changed_flags = 0;
+ u8 promisc_m;
int err = 0;
if (!vsi->netdev)
/* Add mac addresses in the sync list */
status = ice_add_mac(hw, &vsi->tmp_sync_list);
ice_free_fltr_list(dev, &vsi->tmp_sync_list);
- if (status) {
+ /* If filter is added successfully or already exists, do not go into
+ * 'if' condition and report it as error. Instead continue processing
+ * rest of the function.
+ */
+ if (status && status != ICE_ERR_ALREADY_EXISTS) {
netdev_err(netdev, "Failed to add MAC filters\n");
/* If there is no more space for new umac filters, vsi
* should go into promiscuous mode. There should be some
}
}
/* check for changes in promiscuous modes */
- if (changed_flags & IFF_ALLMULTI)
- netdev_warn(netdev, "Unsupported configuration\n");
+ if (changed_flags & IFF_ALLMULTI) {
+ if (vsi->current_netdev_flags & IFF_ALLMULTI) {
+ if (vsi->vlan_ena)
+ promisc_m = ICE_MCAST_VLAN_PROMISC_BITS;
+ else
+ promisc_m = ICE_MCAST_PROMISC_BITS;
+
+ err = ice_cfg_promisc(vsi, promisc_m, true);
+ if (err) {
+ netdev_err(netdev, "Error setting Multicast promiscuous mode on VSI %i\n",
+ vsi->vsi_num);
+ vsi->current_netdev_flags &= ~IFF_ALLMULTI;
+ goto out_promisc;
+ }
+ } else if (!(vsi->current_netdev_flags & IFF_ALLMULTI)) {
+ if (vsi->vlan_ena)
+ promisc_m = ICE_MCAST_VLAN_PROMISC_BITS;
+ else
+ promisc_m = ICE_MCAST_PROMISC_BITS;
+
+ err = ice_cfg_promisc(vsi, promisc_m, false);
+ if (err) {
+ netdev_err(netdev, "Error clearing Multicast promiscuous mode on VSI %i\n",
+ vsi->vsi_num);
+ vsi->current_netdev_flags |= IFF_ALLMULTI;
+ goto out_promisc;
+ }
+ }
+ }
if (((changed_flags & IFF_PROMISC) || promisc_forced_on) ||
test_bit(ICE_VSI_FLAG_PROMISC_CHANGED, vsi->flags)) {
clear_bit(ICE_FLAG_FLTR_SYNC, pf->flags);
- for (v = 0; v < pf->num_alloc_vsi; v++)
+ ice_for_each_vsi(pf, v)
if (pf->vsi[v] && ice_vsi_fltr_changed(pf->vsi[v]) &&
ice_vsi_sync_fltr(pf->vsi[v])) {
/* come back and try again later */
{
struct ice_hw *hw = &pf->hw;
+ /* already prepared for reset */
+ if (test_bit(__ICE_PREPARED_FOR_RESET, pf->state))
+ return;
+
/* Notify VFs of impending reset */
if (ice_check_sq_alive(hw, &hw->mailboxq))
ice_vc_notify_reset(pf);
ice_rebuild(pf);
clear_bit(__ICE_PREPARED_FOR_RESET, pf->state);
clear_bit(__ICE_PFR_REQ, pf->state);
+ ice_reset_all_vfs(pf, true);
}
}
* for the reset now), poll for reset done, rebuild and return.
*/
if (test_bit(__ICE_RESET_OICR_RECV, pf->state)) {
- clear_bit(__ICE_GLOBR_RECV, pf->state);
- clear_bit(__ICE_CORER_RECV, pf->state);
- if (!test_bit(__ICE_PREPARED_FOR_RESET, pf->state))
- ice_prepare_for_reset(pf);
+ /* Perform the largest reset requested */
+ if (test_and_clear_bit(__ICE_CORER_RECV, pf->state))
+ reset_type = ICE_RESET_CORER;
+ if (test_and_clear_bit(__ICE_GLOBR_RECV, pf->state))
+ reset_type = ICE_RESET_GLOBR;
+ /* return if no valid reset type requested */
+ if (reset_type == ICE_RESET_INVAL)
+ return;
+ ice_prepare_for_reset(pf);
/* make sure we are ready to rebuild */
if (ice_check_reset(&pf->hw)) {
clear_bit(__ICE_PFR_REQ, pf->state);
clear_bit(__ICE_CORER_REQ, pf->state);
clear_bit(__ICE_GLOBR_REQ, pf->state);
+ ice_reset_all_vfs(pf, true);
}
return;
case ICE_FC_RX_PAUSE:
fc = "RX";
break;
+ case ICE_FC_NONE:
+ fc = "None";
+ break;
default:
fc = "Unknown";
break;
pf->serv_tmr_prev = jiffies;
- if (ice_link_event(pf, pf->hw.port_info))
- dev_dbg(&pf->pdev->dev, "ice_link_event failed\n");
-
/* Update the stats for active netdevs so the network stack
* can look at updated numbers whenever it cares to
*/
ice_update_pf_stats(pf);
- for (i = 0; i < pf->num_alloc_vsi; i++)
+ ice_for_each_vsi(pf, i)
if (pf->vsi[i] && pf->vsi[i]->netdev)
ice_update_vsi_stats(pf->vsi[i]);
}
+/**
+ * ice_init_link_events - enable/initialize link events
+ * @pi: pointer to the port_info instance
+ *
+ * Returns -EIO on failure, 0 on success
+ */
+static int ice_init_link_events(struct ice_port_info *pi)
+{
+ u16 mask;
+
+ mask = ~((u16)(ICE_AQ_LINK_EVENT_UPDOWN | ICE_AQ_LINK_EVENT_MEDIA_NA |
+ ICE_AQ_LINK_EVENT_MODULE_QUAL_FAIL));
+
+ if (ice_aq_set_event_mask(pi->hw, pi->lport, mask, NULL)) {
+ dev_dbg(ice_hw_to_dev(pi->hw),
+ "Failed to set link event mask for port %d\n",
+ pi->lport);
+ return -EIO;
+ }
+
+ if (ice_aq_get_link_info(pi, true, NULL, NULL)) {
+ dev_dbg(ice_hw_to_dev(pi->hw),
+ "Failed to enable link events for port %d\n",
+ pi->lport);
+ return -EIO;
+ }
+
+ return 0;
+}
+
+/**
+ * ice_handle_link_event - handle link event via ARQ
+ * @pf: pf that the link event is associated with
+ *
+ * Return -EINVAL if port_info is null
+ * Return status on success
+ */
+static int ice_handle_link_event(struct ice_pf *pf)
+{
+ struct ice_port_info *port_info;
+ int status;
+
+ port_info = pf->hw.port_info;
+ if (!port_info)
+ return -EINVAL;
+
+ status = ice_link_event(pf, port_info);
+ if (status)
+ dev_dbg(&pf->pdev->dev,
+ "Could not process link event, error %d\n", status);
+
+ return status;
+}
+
/**
* __ice_clean_ctrlq - helper function to clean controlq rings
* @pf: ptr to struct ice_pf
opcode = le16_to_cpu(event.desc.opcode);
switch (opcode) {
+ case ice_aqc_opc_get_link_status:
+ if (ice_handle_link_event(pf))
+ dev_err(&pf->pdev->dev,
+ "Could not handle link event\n");
+ break;
case ice_mbx_opc_send_msg_to_pf:
ice_vc_process_vf_msg(pf, &event);
break;
clear_bit(__ICE_SERVICE_SCHED, pf->state);
}
+/**
+ * ice_service_task_restart - restart service task and schedule works
+ * @pf: board private structure
+ *
+ * This function is needed for suspend and resume works (e.g WoL scenario)
+ */
+static void ice_service_task_restart(struct ice_pf *pf)
+{
+ clear_bit(__ICE_SERVICE_DIS, pf->state);
+ ice_service_task_schedule(pf);
+}
+
/**
* ice_service_timer - timer callback to schedule service task
* @t: pointer to timer_list
* This is a callback function used by the irq_set_affinity_notifier function
* so that we may register to receive changes to the irq affinity masks.
*/
-static void ice_irq_affinity_notify(struct irq_affinity_notify *notify,
- const cpumask_t *mask)
+static void
+ice_irq_affinity_notify(struct irq_affinity_notify *notify,
+ const cpumask_t *mask)
{
struct ice_q_vector *q_vector =
container_of(notify, struct ice_q_vector, affinity_notify);
/* skip this unused q_vector */
continue;
}
- err = devm_request_irq(&pf->pdev->dev,
- pf->msix_entries[base + vector].vector,
- vsi->irq_handler, 0, q_vector->name,
- q_vector);
+ err = devm_request_irq(&pf->pdev->dev, irq_num,
+ vsi->irq_handler, 0,
+ q_vector->name, q_vector);
if (err) {
netdev_err(vsi->netdev,
"MSIX request_irq failed, error: %d\n", err);
*
* net_device_ops implementation for adding vlan ids
*/
-static int ice_vlan_rx_add_vid(struct net_device *netdev,
- __always_unused __be16 proto, u16 vid)
+static int
+ice_vlan_rx_add_vid(struct net_device *netdev, __always_unused __be16 proto,
+ u16 vid)
{
struct ice_netdev_priv *np = netdev_priv(netdev);
struct ice_vsi *vsi = np->vsi;
+ int ret;
if (vid >= VLAN_N_VID) {
netdev_err(netdev, "VLAN id requested %d is out of range %d\n",
/* Enable VLAN pruning when VLAN 0 is added */
if (unlikely(!vid)) {
- int ret = ice_cfg_vlan_pruning(vsi, true);
-
+ ret = ice_cfg_vlan_pruning(vsi, true, false);
if (ret)
return ret;
}
* needed to continue allowing all untagged packets since VLAN prune
* list is applied to all packets by the switch
*/
- return ice_vsi_add_vlan(vsi, vid);
+ ret = ice_vsi_add_vlan(vsi, vid);
+ if (!ret) {
+ vsi->vlan_ena = true;
+ set_bit(ICE_VSI_FLAG_VLAN_FLTR_CHANGED, vsi->flags);
+ }
+
+ return ret;
}
/**
*
* net_device_ops implementation for removing vlan ids
*/
-static int ice_vlan_rx_kill_vid(struct net_device *netdev,
- __always_unused __be16 proto, u16 vid)
+static int
+ice_vlan_rx_kill_vid(struct net_device *netdev, __always_unused __be16 proto,
+ u16 vid)
{
struct ice_netdev_priv *np = netdev_priv(netdev);
struct ice_vsi *vsi = np->vsi;
- int status;
+ int ret;
if (vsi->info.pvid)
return -EINVAL;
/* Make sure ice_vsi_kill_vlan is successful before updating VLAN
* information
*/
- status = ice_vsi_kill_vlan(vsi, vid);
- if (status)
- return status;
+ ret = ice_vsi_kill_vlan(vsi, vid);
+ if (ret)
+ return ret;
/* Disable VLAN pruning when VLAN 0 is removed */
if (unlikely(!vid))
- status = ice_cfg_vlan_pruning(vsi, false);
+ ret = ice_cfg_vlan_pruning(vsi, false, false);
- return status;
+ vsi->vlan_ena = false;
+ set_bit(ICE_VSI_FLAG_VLAN_FLTR_CHANGED, vsi->flags);
+ return ret;
}
/**
return 0;
}
-/**
- * ice_verify_itr_gran - verify driver's assumption of ITR granularity
- * @pf: pointer to the PF structure
- *
- * There is no error returned here because the driver will be able to handle a
- * different ITR granularity, but interrupt moderation will not be accurate if
- * the driver's assumptions are not verified. This assumption is made so we can
- * use constants in the hot path instead of accessing structure members.
- */
-static void ice_verify_itr_gran(struct ice_pf *pf)
-{
- if (pf->hw.itr_gran != (ICE_ITR_GRAN_S << 1))
- dev_warn(&pf->pdev->dev,
- "%d ITR granularity assumption is invalid, actual ITR granularity is %d. Interrupt moderation will be inaccurate!\n",
- (ICE_ITR_GRAN_S << 1), pf->hw.itr_gran);
-}
-
/**
* ice_verify_cacheline_size - verify driver's assumption of 64 Byte cache lines
* @pf: pointer to the PF structure
*
* Returns 0 on success, negative on failure
*/
-static int ice_probe(struct pci_dev *pdev,
- const struct pci_device_id __always_unused *ent)
+static int
+ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
{
+ struct device *dev = &pdev->dev;
struct ice_pf *pf;
struct ice_hw *hw;
int err;
err = pcim_iomap_regions(pdev, BIT(ICE_BAR0), pci_name(pdev));
if (err) {
- dev_err(&pdev->dev, "BAR0 I/O map error %d\n", err);
+ dev_err(dev, "BAR0 I/O map error %d\n", err);
return err;
}
- pf = devm_kzalloc(&pdev->dev, sizeof(*pf), GFP_KERNEL);
+ pf = devm_kzalloc(dev, sizeof(*pf), GFP_KERNEL);
if (!pf)
return -ENOMEM;
/* set up for high or low dma */
- err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
+ err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
if (err)
- err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
+ err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
if (err) {
- dev_err(&pdev->dev, "DMA configuration failed: 0x%x\n", err);
+ dev_err(dev, "DMA configuration failed: 0x%x\n", err);
return err;
}
err = ice_init_hw(hw);
if (err) {
- dev_err(&pdev->dev, "ice_init_hw failed: %d\n", err);
+ dev_err(dev, "ice_init_hw failed: %d\n", err);
err = -EIO;
goto err_exit_unroll;
}
- dev_info(&pdev->dev, "firmware %d.%d.%05d api %d.%d\n",
+ dev_info(dev, "firmware %d.%d.%05d api %d.%d\n",
hw->fw_maj_ver, hw->fw_min_ver, hw->fw_build,
hw->api_maj_ver, hw->api_min_ver);
goto err_init_pf_unroll;
}
- pf->vsi = devm_kcalloc(&pdev->dev, pf->num_alloc_vsi,
- sizeof(*pf->vsi), GFP_KERNEL);
+ pf->vsi = devm_kcalloc(dev, pf->num_alloc_vsi, sizeof(*pf->vsi),
+ GFP_KERNEL);
if (!pf->vsi) {
err = -ENOMEM;
goto err_init_pf_unroll;
err = ice_init_interrupt_scheme(pf);
if (err) {
- dev_err(&pdev->dev,
- "ice_init_interrupt_scheme failed: %d\n", err);
+ dev_err(dev, "ice_init_interrupt_scheme failed: %d\n", err);
err = -EIO;
goto err_init_interrupt_unroll;
}
if (test_bit(ICE_FLAG_MSIX_ENA, pf->flags)) {
err = ice_req_irq_msix_misc(pf);
if (err) {
- dev_err(&pdev->dev,
- "setup of misc vector failed: %d\n", err);
+ dev_err(dev, "setup of misc vector failed: %d\n", err);
goto err_init_interrupt_unroll;
}
}
/* create switch struct for the switch element created by FW on boot */
- pf->first_sw = devm_kzalloc(&pdev->dev, sizeof(*pf->first_sw),
- GFP_KERNEL);
+ pf->first_sw = devm_kzalloc(dev, sizeof(*pf->first_sw), GFP_KERNEL);
if (!pf->first_sw) {
err = -ENOMEM;
goto err_msix_misc_unroll;
err = ice_setup_pf_sw(pf);
if (err) {
- dev_err(&pdev->dev,
- "probe failed due to setup pf switch:%d\n", err);
+ dev_err(dev, "probe failed due to setup pf switch:%d\n", err);
goto err_alloc_sw_unroll;
}
/* since everything is good, start the service timer */
mod_timer(&pf->serv_tmr, round_jiffies(jiffies + pf->serv_tmr_period));
+ err = ice_init_link_events(pf->hw.port_info);
+ if (err) {
+ dev_err(dev, "ice_init_link_events failed: %d\n", err);
+ goto err_alloc_sw_unroll;
+ }
+
ice_verify_cacheline_size(pf);
- ice_verify_itr_gran(pf);
return 0;
ice_free_irq_msix_misc(pf);
err_init_interrupt_unroll:
ice_clear_interrupt_scheme(pf);
- devm_kfree(&pdev->dev, pf->vsi);
+ devm_kfree(dev, pf->vsi);
err_init_pf_unroll:
ice_deinit_pf(pf);
ice_deinit_hw(hw);
pci_disable_pcie_error_reporting(pdev);
}
+/**
+ * ice_pci_err_detected - warning that PCI error has been detected
+ * @pdev: PCI device information struct
+ * @err: the type of PCI error
+ *
+ * Called to warn that something happened on the PCI bus and the error handling
+ * is in progress. Allows the driver to gracefully prepare/handle PCI errors.
+ */
+static pci_ers_result_t
+ice_pci_err_detected(struct pci_dev *pdev, enum pci_channel_state err)
+{
+ struct ice_pf *pf = pci_get_drvdata(pdev);
+
+ if (!pf) {
+ dev_err(&pdev->dev, "%s: unrecoverable device error %d\n",
+ __func__, err);
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
+
+ if (!test_bit(__ICE_SUSPENDED, pf->state)) {
+ ice_service_task_stop(pf);
+
+ if (!test_bit(__ICE_PREPARED_FOR_RESET, pf->state)) {
+ set_bit(__ICE_PFR_REQ, pf->state);
+ ice_prepare_for_reset(pf);
+ }
+ }
+
+ return PCI_ERS_RESULT_NEED_RESET;
+}
+
+/**
+ * ice_pci_err_slot_reset - a PCI slot reset has just happened
+ * @pdev: PCI device information struct
+ *
+ * Called to determine if the driver can recover from the PCI slot reset by
+ * using a register read to determine if the device is recoverable.
+ */
+static pci_ers_result_t ice_pci_err_slot_reset(struct pci_dev *pdev)
+{
+ struct ice_pf *pf = pci_get_drvdata(pdev);
+ pci_ers_result_t result;
+ int err;
+ u32 reg;
+
+ err = pci_enable_device_mem(pdev);
+ if (err) {
+ dev_err(&pdev->dev,
+ "Cannot re-enable PCI device after reset, error %d\n",
+ err);
+ result = PCI_ERS_RESULT_DISCONNECT;
+ } else {
+ pci_set_master(pdev);
+ pci_restore_state(pdev);
+ pci_save_state(pdev);
+ pci_wake_from_d3(pdev, false);
+
+ /* Check for life */
+ reg = rd32(&pf->hw, GLGEN_RTRIG);
+ if (!reg)
+ result = PCI_ERS_RESULT_RECOVERED;
+ else
+ result = PCI_ERS_RESULT_DISCONNECT;
+ }
+
+ err = pci_cleanup_aer_uncorrect_error_status(pdev);
+ if (err)
+ dev_dbg(&pdev->dev,
+ "pci_cleanup_aer_uncorrect_error_status failed, error %d\n",
+ err);
+ /* non-fatal, continue */
+
+ return result;
+}
+
+/**
+ * ice_pci_err_resume - restart operations after PCI error recovery
+ * @pdev: PCI device information struct
+ *
+ * Called to allow the driver to bring things back up after PCI error and/or
+ * reset recovery have finished
+ */
+static void ice_pci_err_resume(struct pci_dev *pdev)
+{
+ struct ice_pf *pf = pci_get_drvdata(pdev);
+
+ if (!pf) {
+ dev_err(&pdev->dev,
+ "%s failed, device is unrecoverable\n", __func__);
+ return;
+ }
+
+ if (test_bit(__ICE_SUSPENDED, pf->state)) {
+ dev_dbg(&pdev->dev, "%s failed to resume normal operations!\n",
+ __func__);
+ return;
+ }
+
+ ice_do_reset(pf, ICE_RESET_PFR);
+ ice_service_task_restart(pf);
+ mod_timer(&pf->serv_tmr, round_jiffies(jiffies + pf->serv_tmr_period));
+}
+
+/**
+ * ice_pci_err_reset_prepare - prepare device driver for PCI reset
+ * @pdev: PCI device information struct
+ */
+static void ice_pci_err_reset_prepare(struct pci_dev *pdev)
+{
+ struct ice_pf *pf = pci_get_drvdata(pdev);
+
+ if (!test_bit(__ICE_SUSPENDED, pf->state)) {
+ ice_service_task_stop(pf);
+
+ if (!test_bit(__ICE_PREPARED_FOR_RESET, pf->state)) {
+ set_bit(__ICE_PFR_REQ, pf->state);
+ ice_prepare_for_reset(pf);
+ }
+ }
+}
+
+/**
+ * ice_pci_err_reset_done - PCI reset done, device driver reset can begin
+ * @pdev: PCI device information struct
+ */
+static void ice_pci_err_reset_done(struct pci_dev *pdev)
+{
+ ice_pci_err_resume(pdev);
+}
+
/* ice_pci_tbl - PCI Device ID Table
*
* Wildcard entries (PCI_ANY_ID) should come last
};
MODULE_DEVICE_TABLE(pci, ice_pci_tbl);
+static const struct pci_error_handlers ice_pci_err_handler = {
+ .error_detected = ice_pci_err_detected,
+ .slot_reset = ice_pci_err_slot_reset,
+ .reset_prepare = ice_pci_err_reset_prepare,
+ .reset_done = ice_pci_err_reset_done,
+ .resume = ice_pci_err_resume
+};
+
static struct pci_driver ice_driver = {
.name = KBUILD_MODNAME,
.id_table = ice_pci_tbl,
.probe = ice_probe,
.remove = ice_remove,
.sriov_configure = ice_sriov_configure,
+ .err_handler = &ice_pci_err_handler
};
/**
* @addr: the MAC address entry being added
* @vid: VLAN id
*/
-static int ice_fdb_del(struct ndmsg *ndm, __always_unused struct nlattr *tb[],
- struct net_device *dev, const unsigned char *addr,
- __always_unused u16 vid)
+static int
+ice_fdb_del(struct ndmsg *ndm, __always_unused struct nlattr *tb[],
+ struct net_device *dev, const unsigned char *addr,
+ __always_unused u16 vid)
{
int err;
* @netdev: ptr to the netdev being adjusted
* @features: the feature set that the stack is suggesting
*/
-static int ice_set_features(struct net_device *netdev,
- netdev_features_t features)
+static int
+ice_set_features(struct net_device *netdev, netdev_features_t features)
{
struct ice_netdev_priv *np = netdev_priv(netdev);
struct ice_vsi *vsi = np->vsi;
ice_service_task_schedule(pf);
- return err;
+ return 0;
}
/**
* This function fetches stats from the ring considering the atomic operations
* that needs to be performed to read u64 values in 32 bit machine.
*/
-static void ice_fetch_u64_stats_per_ring(struct ice_ring *ring, u64 *pkts,
- u64 *bytes)
+static void
+ice_fetch_u64_stats_per_ring(struct ice_ring *ring, u64 *pkts, u64 *bytes)
{
unsigned int start;
*pkts = 0;
if (!pf->vsi)
return;
- for (i = 0; i < pf->num_alloc_vsi; i++) {
+ ice_for_each_vsi(pf, i) {
if (!pf->vsi[i])
continue;
int i;
/* loop through pf->vsi array and reinit the VSI if found */
- for (i = 0; i < pf->num_alloc_vsi; i++) {
+ ice_for_each_vsi(pf, i) {
int err;
if (!pf->vsi[i])
continue;
- /* VF VSI rebuild isn't supported yet */
- if (pf->vsi[i]->type == ICE_VSI_VF)
- continue;
-
err = ice_vsi_rebuild(pf->vsi[i]);
if (err) {
dev_err(&pf->pdev->dev,
int i;
/* loop through pf->vsi array and replay the VSI if found */
- for (i = 0; i < pf->num_alloc_vsi; i++) {
+ ice_for_each_vsi(pf, i) {
if (!pf->vsi[i])
continue;
goto err_vsi_rebuild;
}
- ice_reset_all_vfs(pf, true);
-
- for (i = 0; i < pf->num_alloc_vsi; i++) {
+ ice_for_each_vsi(pf, i) {
bool link_up;
if (!pf->vsi[i] || pf->vsi[i]->type != ICE_VSI_PF)
status = ice_aq_delete_sched_elems(hw, 1, buf, buf_size,
&num_groups_removed, NULL);
if (status || num_groups_removed != 1)
- ice_debug(hw, ICE_DBG_SCHED, "remove elements failed\n");
+ ice_debug(hw, ICE_DBG_SCHED, "remove node failed FW error %d\n",
+ hw->adminq.sq_last_status);
devm_kfree(ice_hw_to_dev(hw), buf);
return status;
node->info.data.elem_type != ICE_AQC_ELEM_TYPE_ROOT_PORT &&
node->info.data.elem_type != ICE_AQC_ELEM_TYPE_LEAF) {
u32 teid = le32_to_cpu(node->info.node_teid);
- enum ice_status status;
- status = ice_sched_remove_elems(hw, node->parent, 1, &teid);
- if (status)
- ice_debug(hw, ICE_DBG_SCHED,
- "remove element failed %d\n", status);
+ ice_sched_remove_elems(hw, node->parent, 1, &teid);
}
parent = node->parent;
/* root has no parent */
status = ice_aq_add_sched_elems(hw, 1, buf, buf_size,
&num_groups_added, NULL);
if (status || num_groups_added != 1) {
- ice_debug(hw, ICE_DBG_SCHED, "add elements failed\n");
+ ice_debug(hw, ICE_DBG_SCHED, "add node failed FW Error %d\n",
+ hw->adminq.sq_last_status);
devm_kfree(ice_hw_to_dev(hw), buf);
return ICE_ERR_CFG;
}
return 0;
}
-/**
- * ice_sched_rm_vsi_child_nodes - remove VSI child nodes from the tree
- * @pi: port information structure
- * @vsi_node: pointer to the VSI node
- * @num_nodes: pointer to the num nodes that needs to be removed per layer
- * @owner: node owner (lan or rdma)
- *
- * This function removes the VSI child nodes from the tree. It gets called for
- * lan and rdma separately.
- */
-static void
-ice_sched_rm_vsi_child_nodes(struct ice_port_info *pi,
- struct ice_sched_node *vsi_node, u16 *num_nodes,
- u8 owner)
-{
- struct ice_sched_node *node, *next;
- u8 i, qgl, vsil;
- u16 num;
-
- qgl = ice_sched_get_qgrp_layer(pi->hw);
- vsil = ice_sched_get_vsi_layer(pi->hw);
-
- for (i = qgl; i > vsil; i--) {
- num = num_nodes[i];
- node = ice_sched_get_first_node(pi->hw, vsi_node, i);
- while (node && num) {
- next = node->sibling;
- if (node->owner == owner && !node->num_children) {
- ice_free_sched_node(pi, node);
- num--;
- }
- node = next;
- }
- }
-}
-
/**
* ice_sched_calc_vsi_support_nodes - calculate number of VSI support nodes
* @hw: pointer to the hw struct
ice_sched_update_vsi_child_nodes(struct ice_port_info *pi, u16 vsi_handle,
u8 tc, u16 new_numqs, u8 owner)
{
- u16 prev_num_nodes[ICE_AQC_TOPO_MAX_LEVEL_NUM] = { 0 };
u16 new_num_nodes[ICE_AQC_TOPO_MAX_LEVEL_NUM] = { 0 };
struct ice_sched_node *vsi_node;
struct ice_sched_node *tc_node;
enum ice_status status = 0;
struct ice_hw *hw = pi->hw;
u16 prev_numqs;
- u8 i;
tc_node = ice_sched_get_tc_node(pi, tc);
if (!tc_node)
else
return ICE_ERR_PARAM;
- /* num queues are not changed */
- if (prev_numqs == new_numqs)
+ /* num queues are not changed or less than the previous number */
+ if (new_numqs <= prev_numqs)
return status;
-
- /* calculate number of nodes based on prev/new number of qs */
- if (prev_numqs)
- ice_sched_calc_vsi_child_nodes(hw, prev_numqs, prev_num_nodes);
-
if (new_numqs)
ice_sched_calc_vsi_child_nodes(hw, new_numqs, new_num_nodes);
-
- if (prev_numqs > new_numqs) {
- for (i = 0; i < ICE_AQC_TOPO_MAX_LEVEL_NUM; i++)
- new_num_nodes[i] = prev_num_nodes[i] - new_num_nodes[i];
-
- ice_sched_rm_vsi_child_nodes(pi, vsi_node, new_num_nodes,
- owner);
- } else {
- for (i = 0; i < ICE_AQC_TOPO_MAX_LEVEL_NUM; i++)
- new_num_nodes[i] -= prev_num_nodes[i];
-
- status = ice_sched_add_vsi_child_nodes(pi, vsi_handle, tc_node,
- new_num_nodes, owner);
- if (status)
- return status;
- }
-
+ /* Keep the max number of queue configuration all the time. Update the
+ * tree only if number of queues > previous number of queues. This may
+ * leave some extra nodes in the tree if number of queues < previous
+ * number but that wouldn't harm anything. Removing those extra nodes
+ * may complicate the code if those nodes are part of SRL or
+ * individually rate limited.
+ */
+ status = ice_sched_add_vsi_child_nodes(pi, vsi_handle, tc_node,
+ new_num_nodes, owner);
+ if (status)
+ return status;
vsi_ctx->sched.max_lanq[tc] = new_numqs;
- return status;
+ return 0;
}
/**
enum ice_status status = 0;
struct ice_hw *hw = pi->hw;
+ ice_debug(pi->hw, ICE_DBG_SCHED, "add/config VSI %d\n", vsi_handle);
tc_node = ice_sched_get_tc_node(pi, tc);
if (!tc_node)
return ICE_ERR_PARAM;
{
enum ice_status status = ICE_ERR_PARAM;
struct ice_vsi_ctx *vsi_ctx;
- u8 i, j = 0;
+ u8 i;
+ ice_debug(pi->hw, ICE_DBG_SCHED, "removing VSI %d\n", vsi_handle);
if (!ice_is_vsi_valid(pi->hw, vsi_handle))
return status;
mutex_lock(&pi->sched_lock);
if (!vsi_ctx)
goto exit_sched_rm_vsi_cfg;
- for (i = 0; i < ICE_MAX_TRAFFIC_CLASS; i++) {
+ ice_for_each_traffic_class(i) {
struct ice_sched_node *vsi_node, *tc_node;
+ u8 j = 0;
tc_node = ice_sched_get_tc_node(pi, i);
if (!tc_node)
*
* save the VSI context entry for a given VSI handle
*/
-static void ice_save_vsi_ctx(struct ice_hw *hw, u16 vsi_handle,
- struct ice_vsi_ctx *vsi)
+static void
+ice_save_vsi_ctx(struct ice_hw *hw, u16 vsi_handle, struct ice_vsi_ctx *vsi)
{
hw->vsi_ctx[vsi_handle] = vsi;
}
tmp_vsi_ctx->vsi_num = vsi_ctx->vsi_num;
}
- return status;
+ return 0;
}
/**
fi->fltr_act == ICE_FWD_TO_VSI_LIST ||
fi->fltr_act == ICE_FWD_TO_Q ||
fi->fltr_act == ICE_FWD_TO_QGRP)) {
- fi->lb_en = true;
- /* Do not set lan_en to TRUE if
+ /* Setting LB for prune actions will result in replicated
+ * packets to the internal switch that will be dropped.
+ */
+ if (fi->lkup_type != ICE_SW_LKUP_VLAN)
+ fi->lb_en = true;
+
+ /* Set lan_en to TRUE if
* 1. The switch is a VEB AND
* 2
- * 2.1 The lookup is MAC with unicast addr for MAC, OR
- * 2.2 The lookup is MAC_VLAN with unicast addr for MAC
+ * 2.1 The lookup is a directional lookup like ethertype,
+ * promiscuous, ethertype-mac, promiscuous-vlan
+ * and default-port OR
+ * 2.2 The lookup is VLAN, OR
+ * 2.3 The lookup is MAC with mcast or bcast addr for MAC, OR
+ * 2.4 The lookup is MAC_VLAN with mcast or bcast addr for MAC.
*
- * In all other cases, the LAN enable has to be set to true.
+ * OR
+ *
+ * The switch is a VEPA.
+ *
+ * In all other cases, the LAN enable has to be set to false.
*/
- if (!(hw->evb_veb &&
- ((fi->lkup_type == ICE_SW_LKUP_MAC &&
- is_unicast_ether_addr(fi->l_data.mac.mac_addr)) ||
- (fi->lkup_type == ICE_SW_LKUP_MAC_VLAN &&
- is_unicast_ether_addr(fi->l_data.mac_vlan.mac_addr)))))
+ if (hw->evb_veb) {
+ if (fi->lkup_type == ICE_SW_LKUP_ETHERTYPE ||
+ fi->lkup_type == ICE_SW_LKUP_PROMISC ||
+ fi->lkup_type == ICE_SW_LKUP_ETHERTYPE_MAC ||
+ fi->lkup_type == ICE_SW_LKUP_PROMISC_VLAN ||
+ fi->lkup_type == ICE_SW_LKUP_DFLT ||
+ fi->lkup_type == ICE_SW_LKUP_VLAN ||
+ (fi->lkup_type == ICE_SW_LKUP_MAC &&
+ !is_unicast_ether_addr(fi->l_data.mac.mac_addr)) ||
+ (fi->lkup_type == ICE_SW_LKUP_MAC_VLAN &&
+ !is_unicast_ether_addr(fi->l_data.mac.mac_addr)))
+ fi->lan_en = true;
+ } else {
fi->lan_en = true;
+ }
}
}
return status;
}
+/**
+ * ice_determine_promisc_mask
+ * @fi: filter info to parse
+ *
+ * Helper function to determine which ICE_PROMISC_ mask corresponds
+ * to given filter into.
+ */
+static u8 ice_determine_promisc_mask(struct ice_fltr_info *fi)
+{
+ u16 vid = fi->l_data.mac_vlan.vlan_id;
+ u8 *macaddr = fi->l_data.mac.mac_addr;
+ bool is_tx_fltr = false;
+ u8 promisc_mask = 0;
+
+ if (fi->flag == ICE_FLTR_TX)
+ is_tx_fltr = true;
+
+ if (is_broadcast_ether_addr(macaddr))
+ promisc_mask |= is_tx_fltr ?
+ ICE_PROMISC_BCAST_TX : ICE_PROMISC_BCAST_RX;
+ else if (is_multicast_ether_addr(macaddr))
+ promisc_mask |= is_tx_fltr ?
+ ICE_PROMISC_MCAST_TX : ICE_PROMISC_MCAST_RX;
+ else if (is_unicast_ether_addr(macaddr))
+ promisc_mask |= is_tx_fltr ?
+ ICE_PROMISC_UCAST_TX : ICE_PROMISC_UCAST_RX;
+ if (vid)
+ promisc_mask |= is_tx_fltr ?
+ ICE_PROMISC_VLAN_TX : ICE_PROMISC_VLAN_RX;
+
+ return promisc_mask;
+}
+
+/**
+ * ice_remove_promisc - Remove promisc based filter rules
+ * @hw: pointer to the hardware structure
+ * @recp_id: recipe id for which the rule needs to removed
+ * @v_list: list of promisc entries
+ */
+static enum ice_status
+ice_remove_promisc(struct ice_hw *hw, u8 recp_id,
+ struct list_head *v_list)
+{
+ struct ice_fltr_list_entry *v_list_itr, *tmp;
+
+ list_for_each_entry_safe(v_list_itr, tmp, v_list, list_entry) {
+ v_list_itr->status =
+ ice_remove_rule_internal(hw, recp_id, v_list_itr);
+ if (v_list_itr->status)
+ return v_list_itr->status;
+ }
+ return 0;
+}
+
+/**
+ * ice_clear_vsi_promisc - clear specified promiscuous mode(s) for given VSI
+ * @hw: pointer to the hardware structure
+ * @vsi_handle: VSI handle to clear mode
+ * @promisc_mask: mask of promiscuous config bits to clear
+ * @vid: VLAN ID to clear VLAN promiscuous
+ */
+enum ice_status
+ice_clear_vsi_promisc(struct ice_hw *hw, u16 vsi_handle, u8 promisc_mask,
+ u16 vid)
+{
+ struct ice_switch_info *sw = hw->switch_info;
+ struct ice_fltr_list_entry *fm_entry, *tmp;
+ struct list_head remove_list_head;
+ struct ice_fltr_mgmt_list_entry *itr;
+ struct list_head *rule_head;
+ struct mutex *rule_lock; /* Lock to protect filter rule list */
+ enum ice_status status = 0;
+ u8 recipe_id;
+
+ if (!ice_is_vsi_valid(hw, vsi_handle))
+ return ICE_ERR_PARAM;
+
+ if (vid)
+ recipe_id = ICE_SW_LKUP_PROMISC_VLAN;
+ else
+ recipe_id = ICE_SW_LKUP_PROMISC;
+
+ rule_head = &sw->recp_list[recipe_id].filt_rules;
+ rule_lock = &sw->recp_list[recipe_id].filt_rule_lock;
+
+ INIT_LIST_HEAD(&remove_list_head);
+
+ mutex_lock(rule_lock);
+ list_for_each_entry(itr, rule_head, list_entry) {
+ u8 fltr_promisc_mask = 0;
+
+ if (!ice_vsi_uses_fltr(itr, vsi_handle))
+ continue;
+
+ fltr_promisc_mask |=
+ ice_determine_promisc_mask(&itr->fltr_info);
+
+ /* Skip if filter is not completely specified by given mask */
+ if (fltr_promisc_mask & ~promisc_mask)
+ continue;
+
+ status = ice_add_entry_to_vsi_fltr_list(hw, vsi_handle,
+ &remove_list_head,
+ &itr->fltr_info);
+ if (status) {
+ mutex_unlock(rule_lock);
+ goto free_fltr_list;
+ }
+ }
+ mutex_unlock(rule_lock);
+
+ status = ice_remove_promisc(hw, recipe_id, &remove_list_head);
+
+free_fltr_list:
+ list_for_each_entry_safe(fm_entry, tmp, &remove_list_head, list_entry) {
+ list_del(&fm_entry->list_entry);
+ devm_kfree(ice_hw_to_dev(hw), fm_entry);
+ }
+
+ return status;
+}
+
+/**
+ * ice_set_vsi_promisc - set given VSI to given promiscuous mode(s)
+ * @hw: pointer to the hardware structure
+ * @vsi_handle: VSI handle to configure
+ * @promisc_mask: mask of promiscuous config bits
+ * @vid: VLAN ID to set VLAN promiscuous
+ */
+enum ice_status
+ice_set_vsi_promisc(struct ice_hw *hw, u16 vsi_handle, u8 promisc_mask, u16 vid)
+{
+ enum { UCAST_FLTR = 1, MCAST_FLTR, BCAST_FLTR };
+ struct ice_fltr_list_entry f_list_entry;
+ struct ice_fltr_info new_fltr;
+ enum ice_status status = 0;
+ bool is_tx_fltr;
+ u16 hw_vsi_id;
+ int pkt_type;
+ u8 recipe_id;
+
+ if (!ice_is_vsi_valid(hw, vsi_handle))
+ return ICE_ERR_PARAM;
+ hw_vsi_id = ice_get_hw_vsi_num(hw, vsi_handle);
+
+ memset(&new_fltr, 0, sizeof(new_fltr));
+
+ if (promisc_mask & (ICE_PROMISC_VLAN_RX | ICE_PROMISC_VLAN_TX)) {
+ new_fltr.lkup_type = ICE_SW_LKUP_PROMISC_VLAN;
+ new_fltr.l_data.mac_vlan.vlan_id = vid;
+ recipe_id = ICE_SW_LKUP_PROMISC_VLAN;
+ } else {
+ new_fltr.lkup_type = ICE_SW_LKUP_PROMISC;
+ recipe_id = ICE_SW_LKUP_PROMISC;
+ }
+
+ /* Separate filters must be set for each direction/packet type
+ * combination, so we will loop over the mask value, store the
+ * individual type, and clear it out in the input mask as it
+ * is found.
+ */
+ while (promisc_mask) {
+ u8 *mac_addr;
+
+ pkt_type = 0;
+ is_tx_fltr = false;
+
+ if (promisc_mask & ICE_PROMISC_UCAST_RX) {
+ promisc_mask &= ~ICE_PROMISC_UCAST_RX;
+ pkt_type = UCAST_FLTR;
+ } else if (promisc_mask & ICE_PROMISC_UCAST_TX) {
+ promisc_mask &= ~ICE_PROMISC_UCAST_TX;
+ pkt_type = UCAST_FLTR;
+ is_tx_fltr = true;
+ } else if (promisc_mask & ICE_PROMISC_MCAST_RX) {
+ promisc_mask &= ~ICE_PROMISC_MCAST_RX;
+ pkt_type = MCAST_FLTR;
+ } else if (promisc_mask & ICE_PROMISC_MCAST_TX) {
+ promisc_mask &= ~ICE_PROMISC_MCAST_TX;
+ pkt_type = MCAST_FLTR;
+ is_tx_fltr = true;
+ } else if (promisc_mask & ICE_PROMISC_BCAST_RX) {
+ promisc_mask &= ~ICE_PROMISC_BCAST_RX;
+ pkt_type = BCAST_FLTR;
+ } else if (promisc_mask & ICE_PROMISC_BCAST_TX) {
+ promisc_mask &= ~ICE_PROMISC_BCAST_TX;
+ pkt_type = BCAST_FLTR;
+ is_tx_fltr = true;
+ }
+
+ /* Check for VLAN promiscuous flag */
+ if (promisc_mask & ICE_PROMISC_VLAN_RX) {
+ promisc_mask &= ~ICE_PROMISC_VLAN_RX;
+ } else if (promisc_mask & ICE_PROMISC_VLAN_TX) {
+ promisc_mask &= ~ICE_PROMISC_VLAN_TX;
+ is_tx_fltr = true;
+ }
+
+ /* Set filter DA based on packet type */
+ mac_addr = new_fltr.l_data.mac.mac_addr;
+ if (pkt_type == BCAST_FLTR) {
+ eth_broadcast_addr(mac_addr);
+ } else if (pkt_type == MCAST_FLTR ||
+ pkt_type == UCAST_FLTR) {
+ /* Use the dummy ether header DA */
+ ether_addr_copy(mac_addr, dummy_eth_header);
+ if (pkt_type == MCAST_FLTR)
+ mac_addr[0] |= 0x1; /* Set multicast bit */
+ }
+
+ /* Need to reset this to zero for all iterations */
+ new_fltr.flag = 0;
+ if (is_tx_fltr) {
+ new_fltr.flag |= ICE_FLTR_TX;
+ new_fltr.src = hw_vsi_id;
+ } else {
+ new_fltr.flag |= ICE_FLTR_RX;
+ new_fltr.src = hw->port_info->lport;
+ }
+
+ new_fltr.fltr_act = ICE_FWD_TO_VSI;
+ new_fltr.vsi_handle = vsi_handle;
+ new_fltr.fwd_id.hw_vsi_id = hw_vsi_id;
+ f_list_entry.fltr_info = new_fltr;
+
+ status = ice_add_rule_internal(hw, recipe_id, &f_list_entry);
+ if (status)
+ goto set_promisc_exit;
+ }
+
+set_promisc_exit:
+ return status;
+}
+
+/**
+ * ice_set_vlan_vsi_promisc
+ * @hw: pointer to the hardware structure
+ * @vsi_handle: VSI handle to configure
+ * @promisc_mask: mask of promiscuous config bits
+ * @rm_vlan_promisc: Clear VLANs VSI promisc mode
+ *
+ * Configure VSI with all associated VLANs to given promiscuous mode(s)
+ */
+enum ice_status
+ice_set_vlan_vsi_promisc(struct ice_hw *hw, u16 vsi_handle, u8 promisc_mask,
+ bool rm_vlan_promisc)
+{
+ struct ice_switch_info *sw = hw->switch_info;
+ struct ice_fltr_list_entry *list_itr, *tmp;
+ struct list_head vsi_list_head;
+ struct list_head *vlan_head;
+ struct mutex *vlan_lock; /* Lock to protect filter rule list */
+ enum ice_status status;
+ u16 vlan_id;
+
+ INIT_LIST_HEAD(&vsi_list_head);
+ vlan_lock = &sw->recp_list[ICE_SW_LKUP_VLAN].filt_rule_lock;
+ vlan_head = &sw->recp_list[ICE_SW_LKUP_VLAN].filt_rules;
+ mutex_lock(vlan_lock);
+ status = ice_add_to_vsi_fltr_list(hw, vsi_handle, vlan_head,
+ &vsi_list_head);
+ mutex_unlock(vlan_lock);
+ if (status)
+ goto free_fltr_list;
+
+ list_for_each_entry(list_itr, &vsi_list_head, list_entry) {
+ vlan_id = list_itr->fltr_info.l_data.vlan.vlan_id;
+ if (rm_vlan_promisc)
+ status = ice_clear_vsi_promisc(hw, vsi_handle,
+ promisc_mask, vlan_id);
+ else
+ status = ice_set_vsi_promisc(hw, vsi_handle,
+ promisc_mask, vlan_id);
+ if (status)
+ break;
+ }
+
+free_fltr_list:
+ list_for_each_entry_safe(list_itr, tmp, &vsi_list_head, list_entry) {
+ list_del(&list_itr->list_entry);
+ devm_kfree(ice_hw_to_dev(hw), list_itr);
+ }
+ return status;
+}
+
/**
* ice_remove_vsi_lkup_fltr - Remove lookup type filters for a VSI
* @hw: pointer to the hardware structure
case ICE_SW_LKUP_VLAN:
ice_remove_vlan(hw, &remove_list_head);
break;
+ case ICE_SW_LKUP_PROMISC:
+ case ICE_SW_LKUP_PROMISC_VLAN:
+ ice_remove_promisc(hw, lkup, &remove_list_head);
+ break;
case ICE_SW_LKUP_MAC_VLAN:
case ICE_SW_LKUP_ETHERTYPE:
case ICE_SW_LKUP_ETHERTYPE_MAC:
- case ICE_SW_LKUP_PROMISC:
case ICE_SW_LKUP_DFLT:
- case ICE_SW_LKUP_PROMISC_VLAN:
case ICE_SW_LKUP_LAST:
default:
ice_debug(hw, ICE_DBG_SW, "Unsupported lookup type %d\n", lkup);
u8 counter_index;
};
+enum ice_promisc_flags {
+ ICE_PROMISC_UCAST_RX = 0x1,
+ ICE_PROMISC_UCAST_TX = 0x2,
+ ICE_PROMISC_MCAST_RX = 0x4,
+ ICE_PROMISC_MCAST_TX = 0x8,
+ ICE_PROMISC_BCAST_RX = 0x10,
+ ICE_PROMISC_BCAST_TX = 0x20,
+ ICE_PROMISC_VLAN_RX = 0x40,
+ ICE_PROMISC_VLAN_TX = 0x80,
+};
+
/* VSI related commands */
enum ice_status
ice_add_vsi(struct ice_hw *hw, u16 vsi_handle, struct ice_vsi_ctx *vsi_ctx,
enum ice_status ice_add_mac(struct ice_hw *hw, struct list_head *m_lst);
enum ice_status ice_remove_mac(struct ice_hw *hw, struct list_head *m_lst);
void ice_remove_vsi_fltr(struct ice_hw *hw, u16 vsi_handle);
-enum ice_status ice_add_vlan(struct ice_hw *hw, struct list_head *m_list);
+enum ice_status
+ice_add_vlan(struct ice_hw *hw, struct list_head *m_list);
enum ice_status ice_remove_vlan(struct ice_hw *hw, struct list_head *v_list);
+
+/* Promisc/defport setup for VSIs */
enum ice_status
ice_cfg_dflt_vsi(struct ice_hw *hw, u16 vsi_handle, bool set, u8 direction);
+enum ice_status
+ice_set_vsi_promisc(struct ice_hw *hw, u16 vsi_handle, u8 promisc_mask,
+ u16 vid);
+enum ice_status
+ice_clear_vsi_promisc(struct ice_hw *hw, u16 vsi_handle, u8 promisc_mask,
+ u16 vid);
+enum ice_status
+ice_set_vlan_vsi_promisc(struct ice_hw *hw, u16 vsi_handle, u8 promisc_mask,
+ bool rm_vlan_promisc);
enum ice_status ice_init_def_sw_recp(struct ice_hw *hw);
u16 ice_get_hw_vsi_num(struct ice_hw *hw, u16 vsi_handle);
*
* Returns true if there's any budget left (e.g. the clean is finished)
*/
-static bool ice_clean_tx_irq(struct ice_vsi *vsi, struct ice_ring *tx_ring,
- int napi_budget)
+static bool
+ice_clean_tx_irq(struct ice_vsi *vsi, struct ice_ring *tx_ring, int napi_budget)
{
unsigned int total_bytes = 0, total_pkts = 0;
unsigned int budget = vsi->work_lmt;
if (!tx_ring->tx_buf)
return -ENOMEM;
- /* round up to nearest 4K */
+ /* round up to nearest page */
tx_ring->size = ALIGN(tx_ring->count * sizeof(struct ice_tx_desc),
- 4096);
+ PAGE_SIZE);
tx_ring->desc = dmam_alloc_coherent(dev, tx_ring->size, &tx_ring->dma,
GFP_KERNEL);
if (!tx_ring->desc) {
if (!rx_buf->page)
continue;
- dma_unmap_page(dev, rx_buf->dma, PAGE_SIZE, DMA_FROM_DEVICE);
- __free_pages(rx_buf->page, 0);
+ /* Invalidate cache lines that may have been written to by
+ * device so that we avoid corrupting memory.
+ */
+ dma_sync_single_range_for_cpu(dev, rx_buf->dma,
+ rx_buf->page_offset,
+ ICE_RXBUF_2048, DMA_FROM_DEVICE);
+
+ /* free resources associated with mapping */
+ dma_unmap_page_attrs(dev, rx_buf->dma, PAGE_SIZE,
+ DMA_FROM_DEVICE, ICE_RX_DMA_ATTR);
+ __page_frag_cache_drain(rx_buf->page, rx_buf->pagecnt_bias);
rx_buf->page = NULL;
rx_buf->page_offset = 0;
if (!rx_ring->rx_buf)
return -ENOMEM;
- /* round up to nearest 4K */
- rx_ring->size = rx_ring->count * sizeof(union ice_32byte_rx_desc);
- rx_ring->size = ALIGN(rx_ring->size, 4096);
+ /* round up to nearest page */
+ rx_ring->size = ALIGN(rx_ring->count * sizeof(union ice_32byte_rx_desc),
+ PAGE_SIZE);
rx_ring->desc = dmam_alloc_coherent(dev, rx_ring->size, &rx_ring->dma,
GFP_KERNEL);
if (!rx_ring->desc) {
* Returns true if the page was successfully allocated or
* reused.
*/
-static bool ice_alloc_mapped_page(struct ice_ring *rx_ring,
- struct ice_rx_buf *bi)
+static bool
+ice_alloc_mapped_page(struct ice_ring *rx_ring, struct ice_rx_buf *bi)
{
struct page *page = bi->page;
dma_addr_t dma;
}
/* map page for use */
- dma = dma_map_page(rx_ring->dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
+ dma = dma_map_page_attrs(rx_ring->dev, page, 0, PAGE_SIZE,
+ DMA_FROM_DEVICE, ICE_RX_DMA_ATTR);
/* if mapping failed free memory back to system since
* there isn't much point in holding memory we can't use
bi->dma = dma;
bi->page = page;
bi->page_offset = 0;
+ page_ref_add(page, USHRT_MAX - 1);
+ bi->pagecnt_bias = USHRT_MAX;
return true;
}
if (!ice_alloc_mapped_page(rx_ring, bi))
goto no_bufs;
+ /* sync the buffer for use by the device */
+ dma_sync_single_range_for_device(rx_ring->dev, bi->dma,
+ bi->page_offset,
+ ICE_RXBUF_2048,
+ DMA_FROM_DEVICE);
+
/* Refresh the desc even if buffer_addrs didn't change
* because each write-back erases this info.
*/
}
/**
- * ice_add_rx_frag - Add contents of Rx buffer to sk_buff
- * @rx_buf: buffer containing page to add
- * @rx_desc: descriptor containing length of buffer written by hardware
- * @skb: sk_buf to place the data into
+ * ice_rx_buf_adjust_pg_offset - Prepare Rx buffer for reuse
+ * @rx_buf: Rx buffer to adjust
+ * @size: Size of adjustment
*
- * This function will add the data contained in rx_buf->page to the skb.
- * This is done either through a direct copy if the data in the buffer is
- * less than the skb header size, otherwise it will just attach the page as
- * a frag to the skb.
- *
- * The function will then update the page offset if necessary and return
- * true if the buffer can be reused by the adapter.
+ * Update the offset within page so that Rx buf will be ready to be reused.
+ * For systems with PAGE_SIZE < 8192 this function will flip the page offset
+ * so the second half of page assigned to Rx buffer will be used, otherwise
+ * the offset is moved by the @size bytes
*/
-static bool ice_add_rx_frag(struct ice_rx_buf *rx_buf,
- union ice_32b_rx_flex_desc *rx_desc,
- struct sk_buff *skb)
+static void
+ice_rx_buf_adjust_pg_offset(struct ice_rx_buf *rx_buf, unsigned int size)
{
#if (PAGE_SIZE < 8192)
- unsigned int truesize = ICE_RXBUF_2048;
+ /* flip page offset to other buffer */
+ rx_buf->page_offset ^= size;
#else
- unsigned int last_offset = PAGE_SIZE - ICE_RXBUF_2048;
- unsigned int truesize;
-#endif /* PAGE_SIZE < 8192) */
-
- struct page *page;
- unsigned int size;
-
- size = le16_to_cpu(rx_desc->wb.pkt_len) &
- ICE_RX_FLX_DESC_PKT_LEN_M;
-
- page = rx_buf->page;
+ /* move offset up to the next cache line */
+ rx_buf->page_offset += size;
+#endif
+}
+/**
+ * ice_can_reuse_rx_page - Determine if page can be reused for another Rx
+ * @rx_buf: buffer containing the page
+ *
+ * If page is reusable, we have a green light for calling ice_reuse_rx_page,
+ * which will assign the current buffer to the buffer that next_to_alloc is
+ * pointing to; otherwise, the DMA mapping needs to be destroyed and
+ * page freed
+ */
+static bool ice_can_reuse_rx_page(struct ice_rx_buf *rx_buf)
+{
#if (PAGE_SIZE >= 8192)
- truesize = ALIGN(size, L1_CACHE_BYTES);
-#endif /* PAGE_SIZE >= 8192) */
-
- /* will the data fit in the skb we allocated? if so, just
- * copy it as it is pretty small anyway
- */
- if (size <= ICE_RX_HDR_SIZE && !skb_is_nonlinear(skb)) {
- unsigned char *va = page_address(page) + rx_buf->page_offset;
-
- memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
-
- /* page is not reserved, we can reuse buffer as-is */
- if (likely(!ice_page_is_reserved(page)))
- return true;
-
- /* this page cannot be reused so discard it */
- __free_pages(page, 0);
- return false;
- }
-
- skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
- rx_buf->page_offset, size, truesize);
+ unsigned int last_offset = PAGE_SIZE - ICE_RXBUF_2048;
+#endif
+ unsigned int pagecnt_bias = rx_buf->pagecnt_bias;
+ struct page *page = rx_buf->page;
/* avoid re-using remote pages */
if (unlikely(ice_page_is_reserved(page)))
#if (PAGE_SIZE < 8192)
/* if we are only owner of page we can reuse it */
- if (unlikely(page_count(page) != 1))
+ if (unlikely((page_count(page) - pagecnt_bias) > 1))
return false;
-
- /* flip page offset to other buffer */
- rx_buf->page_offset ^= truesize;
#else
- /* move offset up to the next cache line */
- rx_buf->page_offset += truesize;
-
if (rx_buf->page_offset > last_offset)
return false;
#endif /* PAGE_SIZE < 8192) */
- /* Even if we own the page, we are not allowed to use atomic_set()
- * This would break get_page_unless_zero() users.
+ /* If we have drained the page fragment pool we need to update
+ * the pagecnt_bias and page count so that we fully restock the
+ * number of references the driver holds.
*/
- get_page(rx_buf->page);
+ if (unlikely(pagecnt_bias == 1)) {
+ page_ref_add(page, USHRT_MAX - 1);
+ rx_buf->pagecnt_bias = USHRT_MAX;
+ }
return true;
}
+/**
+ * ice_add_rx_frag - Add contents of Rx buffer to sk_buff as a frag
+ * @rx_buf: buffer containing page to add
+ * @skb: sk_buff to place the data into
+ * @size: packet length from rx_desc
+ *
+ * This function will add the data contained in rx_buf->page to the skb.
+ * It will just attach the page as a frag to the skb.
+ * The function will then update the page offset.
+ */
+static void
+ice_add_rx_frag(struct ice_rx_buf *rx_buf, struct sk_buff *skb,
+ unsigned int size)
+{
+#if (PAGE_SIZE >= 8192)
+ unsigned int truesize = SKB_DATA_ALIGN(size);
+#else
+ unsigned int truesize = ICE_RXBUF_2048;
+#endif
+
+ skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buf->page,
+ rx_buf->page_offset, size, truesize);
+
+ /* page is being used so we must update the page offset */
+ ice_rx_buf_adjust_pg_offset(rx_buf, truesize);
+}
+
/**
* ice_reuse_rx_page - page flip buffer and store it back on the ring
* @rx_ring: Rx descriptor ring to store buffers on
*
* Synchronizes page for reuse by the adapter
*/
-static void ice_reuse_rx_page(struct ice_ring *rx_ring,
- struct ice_rx_buf *old_buf)
+static void
+ice_reuse_rx_page(struct ice_ring *rx_ring, struct ice_rx_buf *old_buf)
{
u16 nta = rx_ring->next_to_alloc;
struct ice_rx_buf *new_buf;
nta++;
rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
- /* transfer page from old buffer to new buffer */
- *new_buf = *old_buf;
+ /* Transfer page from old buffer to new buffer.
+ * Move each member individually to avoid possible store
+ * forwarding stalls and unnecessary copy of skb.
+ */
+ new_buf->dma = old_buf->dma;
+ new_buf->page = old_buf->page;
+ new_buf->page_offset = old_buf->page_offset;
+ new_buf->pagecnt_bias = old_buf->pagecnt_bias;
}
/**
- * ice_fetch_rx_buf - Allocate skb and populate it
+ * ice_get_rx_buf - Fetch Rx buffer and synchronize data for use
* @rx_ring: Rx descriptor ring to transact packets on
- * @rx_desc: descriptor containing info written by hardware
+ * @skb: skb to be used
+ * @size: size of buffer to add to skb
*
- * This function allocates an skb on the fly, and populates it with the page
- * data from the current receive descriptor, taking care to set up the skb
- * correctly, as well as handling calling the page recycle function if
- * necessary.
+ * This function will pull an Rx buffer from the ring and synchronize it
+ * for use by the CPU.
*/
-static struct sk_buff *ice_fetch_rx_buf(struct ice_ring *rx_ring,
- union ice_32b_rx_flex_desc *rx_desc)
+static struct ice_rx_buf *
+ice_get_rx_buf(struct ice_ring *rx_ring, struct sk_buff **skb,
+ const unsigned int size)
{
struct ice_rx_buf *rx_buf;
- struct sk_buff *skb;
- struct page *page;
rx_buf = &rx_ring->rx_buf[rx_ring->next_to_clean];
- page = rx_buf->page;
- prefetchw(page);
+ prefetchw(rx_buf->page);
+ *skb = rx_buf->skb;
+
+ /* we are reusing so sync this buffer for CPU use */
+ dma_sync_single_range_for_cpu(rx_ring->dev, rx_buf->dma,
+ rx_buf->page_offset, size,
+ DMA_FROM_DEVICE);
- skb = rx_buf->skb;
+ /* We have pulled a buffer for use, so decrement pagecnt_bias */
+ rx_buf->pagecnt_bias--;
- if (likely(!skb)) {
- u8 *page_addr = page_address(page) + rx_buf->page_offset;
+ return rx_buf;
+}
- /* prefetch first cache line of first page */
- prefetch(page_addr);
+/**
+ * ice_construct_skb - Allocate skb and populate it
+ * @rx_ring: Rx descriptor ring to transact packets on
+ * @rx_buf: Rx buffer to pull data from
+ * @size: the length of the packet
+ *
+ * This function allocates an skb. It then populates it with the page
+ * data from the current receive descriptor, taking care to set up the
+ * skb correctly.
+ */
+static struct sk_buff *
+ice_construct_skb(struct ice_ring *rx_ring, struct ice_rx_buf *rx_buf,
+ unsigned int size)
+{
+ void *va = page_address(rx_buf->page) + rx_buf->page_offset;
+ unsigned int headlen;
+ struct sk_buff *skb;
+
+ /* prefetch first cache line of first page */
+ prefetch(va);
#if L1_CACHE_BYTES < 128
- prefetch((void *)(page_addr + L1_CACHE_BYTES));
+ prefetch((u8 *)va + L1_CACHE_BYTES);
#endif /* L1_CACHE_BYTES */
- /* allocate a skb to store the frags */
- skb = __napi_alloc_skb(&rx_ring->q_vector->napi,
- ICE_RX_HDR_SIZE,
- GFP_ATOMIC | __GFP_NOWARN);
- if (unlikely(!skb)) {
- rx_ring->rx_stats.alloc_buf_failed++;
- return NULL;
- }
-
- /* we will be copying header into skb->data in
- * pskb_may_pull so it is in our interest to prefetch
- * it now to avoid a possible cache miss
- */
- prefetchw(skb->data);
+ /* allocate a skb to store the frags */
+ skb = __napi_alloc_skb(&rx_ring->q_vector->napi, ICE_RX_HDR_SIZE,
+ GFP_ATOMIC | __GFP_NOWARN);
+ if (unlikely(!skb))
+ return NULL;
- skb_record_rx_queue(skb, rx_ring->q_index);
- } else {
- /* we are reusing so sync this buffer for CPU use */
- dma_sync_single_range_for_cpu(rx_ring->dev, rx_buf->dma,
- rx_buf->page_offset,
- ICE_RXBUF_2048,
- DMA_FROM_DEVICE);
+ skb_record_rx_queue(skb, rx_ring->q_index);
+ /* Determine available headroom for copy */
+ headlen = size;
+ if (headlen > ICE_RX_HDR_SIZE)
+ headlen = eth_get_headlen(va, ICE_RX_HDR_SIZE);
- rx_buf->skb = NULL;
- }
+ /* align pull length to size of long to optimize memcpy performance */
+ memcpy(__skb_put(skb, headlen), va, ALIGN(headlen, sizeof(long)));
- /* pull page into skb */
- if (ice_add_rx_frag(rx_buf, rx_desc, skb)) {
- /* hand second half of page back to the ring */
- ice_reuse_rx_page(rx_ring, rx_buf);
- rx_ring->rx_stats.page_reuse_count++;
+ /* if we exhaust the linear part then add what is left as a frag */
+ size -= headlen;
+ if (size) {
+#if (PAGE_SIZE >= 8192)
+ unsigned int truesize = SKB_DATA_ALIGN(size);
+#else
+ unsigned int truesize = ICE_RXBUF_2048;
+#endif
+ skb_add_rx_frag(skb, 0, rx_buf->page,
+ rx_buf->page_offset + headlen, size, truesize);
+ /* buffer is used by skb, update page_offset */
+ ice_rx_buf_adjust_pg_offset(rx_buf, truesize);
} else {
- /* we are not reusing the buffer so unmap it */
- dma_unmap_page(rx_ring->dev, rx_buf->dma, PAGE_SIZE,
- DMA_FROM_DEVICE);
+ /* buffer is unused, reset bias back to rx_buf; data was copied
+ * onto skb's linear part so there's no need for adjusting
+ * page offset and we can reuse this buffer as-is
+ */
+ rx_buf->pagecnt_bias++;
}
- /* clear contents of buffer_info */
- rx_buf->page = NULL;
-
return skb;
}
/**
- * ice_pull_tail - ice specific version of skb_pull_tail
- * @skb: pointer to current skb being adjusted
+ * ice_put_rx_buf - Clean up used buffer and either recycle or free
+ * @rx_ring: Rx descriptor ring to transact packets on
+ * @rx_buf: Rx buffer to pull data from
*
- * This function is an ice specific version of __pskb_pull_tail. The
- * main difference between this version and the original function is that
- * this function can make several assumptions about the state of things
- * that allow for significant optimizations versus the standard function.
- * As a result we can do things like drop a frag and maintain an accurate
- * truesize for the skb.
+ * This function will clean up the contents of the rx_buf. It will
+ * either recycle the buffer or unmap it and free the associated resources.
*/
-static void ice_pull_tail(struct sk_buff *skb)
+static void ice_put_rx_buf(struct ice_ring *rx_ring, struct ice_rx_buf *rx_buf)
{
- struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[0];
- unsigned int pull_len;
- unsigned char *va;
-
- /* it is valid to use page_address instead of kmap since we are
- * working with pages allocated out of the lomem pool per
- * alloc_page(GFP_ATOMIC)
- */
- va = skb_frag_address(frag);
-
- /* we need the header to contain the greater of either ETH_HLEN or
- * 60 bytes if the skb->len is less than 60 for skb_pad.
- */
- pull_len = eth_get_headlen(va, ICE_RX_HDR_SIZE);
-
- /* align pull length to size of long to optimize memcpy performance */
- skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
+ /* hand second half of page back to the ring */
+ if (ice_can_reuse_rx_page(rx_buf)) {
+ ice_reuse_rx_page(rx_ring, rx_buf);
+ rx_ring->rx_stats.page_reuse_count++;
+ } else {
+ /* we are not reusing the buffer so unmap it */
+ dma_unmap_page_attrs(rx_ring->dev, rx_buf->dma, PAGE_SIZE,
+ DMA_FROM_DEVICE, ICE_RX_DMA_ATTR);
+ __page_frag_cache_drain(rx_buf->page, rx_buf->pagecnt_bias);
+ }
- /* update all of the pointers */
- skb_frag_size_sub(frag, pull_len);
- frag->page_offset += pull_len;
- skb->data_len -= pull_len;
- skb->tail += pull_len;
+ /* clear contents of buffer_info */
+ rx_buf->page = NULL;
+ rx_buf->skb = NULL;
}
/**
*/
static bool ice_cleanup_headers(struct sk_buff *skb)
{
- /* place header in linear portion of buffer */
- if (skb_is_nonlinear(skb))
- ice_pull_tail(skb);
-
/* if eth_skb_pad returns an error the skb was freed */
if (eth_skb_pad(skb))
return true;
* The status_error_len doesn't need to be shifted because it begins
* at offset zero.
*/
-static bool ice_test_staterr(union ice_32b_rx_flex_desc *rx_desc,
- const u16 stat_err_bits)
+static bool
+ice_test_staterr(union ice_32b_rx_flex_desc *rx_desc, const u16 stat_err_bits)
{
return !!(rx_desc->wb.status_error0 &
cpu_to_le16(stat_err_bits));
* sk_buff in the next buffer to be chained and return true indicating
* that this is in fact a non-EOP buffer.
*/
-static bool ice_is_non_eop(struct ice_ring *rx_ring,
- union ice_32b_rx_flex_desc *rx_desc,
- struct sk_buff *skb)
+static bool
+ice_is_non_eop(struct ice_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc,
+ struct sk_buff *skb)
{
u32 ntc = rx_ring->next_to_clean + 1;
*
* skb->protocol must be set before this function is called
*/
-static void ice_rx_csum(struct ice_vsi *vsi, struct sk_buff *skb,
- union ice_32b_rx_flex_desc *rx_desc, u8 ptype)
+static void
+ice_rx_csum(struct ice_vsi *vsi, struct sk_buff *skb,
+ union ice_32b_rx_flex_desc *rx_desc, u8 ptype)
{
struct ice_rx_ptype_decoded decoded;
u32 rx_error, rx_status;
* order to populate the hash, checksum, VLAN, protocol, and
* other fields within the skb.
*/
-static void ice_process_skb_fields(struct ice_ring *rx_ring,
- union ice_32b_rx_flex_desc *rx_desc,
- struct sk_buff *skb, u8 ptype)
+static void
+ice_process_skb_fields(struct ice_ring *rx_ring,
+ union ice_32b_rx_flex_desc *rx_desc,
+ struct sk_buff *skb, u8 ptype)
{
ice_rx_hash(rx_ring, rx_desc, skb, ptype);
* This function sends the completed packet (via. skb) up the stack using
* gro receive functions (with/without vlan tag)
*/
-static void ice_receive_skb(struct ice_ring *rx_ring, struct sk_buff *skb,
- u16 vlan_tag)
+static void
+ice_receive_skb(struct ice_ring *rx_ring, struct sk_buff *skb, u16 vlan_tag)
{
if ((rx_ring->netdev->features & NETIF_F_HW_VLAN_CTAG_RX) &&
- (vlan_tag & VLAN_VID_MASK)) {
+ (vlan_tag & VLAN_VID_MASK))
__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlan_tag);
- }
napi_gro_receive(&rx_ring->q_vector->napi, skb);
}
/* start the loop to process RX packets bounded by 'budget' */
while (likely(total_rx_pkts < (unsigned int)budget)) {
union ice_32b_rx_flex_desc *rx_desc;
+ struct ice_rx_buf *rx_buf;
struct sk_buff *skb;
+ unsigned int size;
u16 stat_err_bits;
u16 vlan_tag = 0;
u8 rx_ptype;
*/
dma_rmb();
+ size = le16_to_cpu(rx_desc->wb.pkt_len) &
+ ICE_RX_FLX_DESC_PKT_LEN_M;
+
+ rx_buf = ice_get_rx_buf(rx_ring, &skb, size);
/* allocate (if needed) and populate skb */
- skb = ice_fetch_rx_buf(rx_ring, rx_desc);
- if (!skb)
+ if (skb)
+ ice_add_rx_frag(rx_buf, skb, size);
+ else
+ skb = ice_construct_skb(rx_ring, rx_buf, size);
+
+ /* exit if we failed to retrieve a buffer */
+ if (!skb) {
+ rx_ring->rx_stats.alloc_buf_failed++;
+ rx_buf->pagecnt_bias++;
break;
+ }
+ ice_put_rx_buf(rx_ring, rx_buf);
cleaned_count++;
/* skip if it is NOP desc */
return failure ? budget : (int)total_rx_pkts;
}
+static unsigned int ice_itr_divisor(struct ice_port_info *pi)
+{
+ switch (pi->phy.link_info.link_speed) {
+ case ICE_AQ_LINK_SPEED_40GB:
+ return ICE_ITR_ADAPTIVE_MIN_INC * 1024;
+ case ICE_AQ_LINK_SPEED_25GB:
+ case ICE_AQ_LINK_SPEED_20GB:
+ return ICE_ITR_ADAPTIVE_MIN_INC * 512;
+ case ICE_AQ_LINK_SPEED_100MB:
+ return ICE_ITR_ADAPTIVE_MIN_INC * 32;
+ default:
+ return ICE_ITR_ADAPTIVE_MIN_INC * 256;
+ }
+}
+
+/**
+ * ice_update_itr - update the adaptive ITR value based on statistics
+ * @q_vector: structure containing interrupt and ring information
+ * @rc: structure containing ring performance data
+ *
+ * Stores a new ITR value based on packets and byte
+ * counts during the last interrupt. The advantage of per interrupt
+ * computation is faster updates and more accurate ITR for the current
+ * traffic pattern. Constants in this function were computed
+ * based on theoretical maximum wire speed and thresholds were set based
+ * on testing data as well as attempting to minimize response time
+ * while increasing bulk throughput.
+ */
+static void
+ice_update_itr(struct ice_q_vector *q_vector, struct ice_ring_container *rc)
+{
+ unsigned int avg_wire_size, packets, bytes, itr;
+ unsigned long next_update = jiffies;
+ bool container_is_rx;
+
+ if (!rc->ring || !ITR_IS_DYNAMIC(rc->itr_setting))
+ return;
+
+ /* If itr_countdown is set it means we programmed an ITR within
+ * the last 4 interrupt cycles. This has a side effect of us
+ * potentially firing an early interrupt. In order to work around
+ * this we need to throw out any data received for a few
+ * interrupts following the update.
+ */
+ if (q_vector->itr_countdown) {
+ itr = rc->target_itr;
+ goto clear_counts;
+ }
+
+ container_is_rx = (&q_vector->rx == rc);
+ /* For Rx we want to push the delay up and default to low latency.
+ * for Tx we want to pull the delay down and default to high latency.
+ */
+ itr = container_is_rx ?
+ ICE_ITR_ADAPTIVE_MIN_USECS | ICE_ITR_ADAPTIVE_LATENCY :
+ ICE_ITR_ADAPTIVE_MAX_USECS | ICE_ITR_ADAPTIVE_LATENCY;
+
+ /* If we didn't update within up to 1 - 2 jiffies we can assume
+ * that either packets are coming in so slow there hasn't been
+ * any work, or that there is so much work that NAPI is dealing
+ * with interrupt moderation and we don't need to do anything.
+ */
+ if (time_after(next_update, rc->next_update))
+ goto clear_counts;
+
+ packets = rc->total_pkts;
+ bytes = rc->total_bytes;
+
+ if (container_is_rx) {
+ /* If Rx there are 1 to 4 packets and bytes are less than
+ * 9000 assume insufficient data to use bulk rate limiting
+ * approach unless Tx is already in bulk rate limiting. We
+ * are likely latency driven.
+ */
+ if (packets && packets < 4 && bytes < 9000 &&
+ (q_vector->tx.target_itr & ICE_ITR_ADAPTIVE_LATENCY)) {
+ itr = ICE_ITR_ADAPTIVE_LATENCY;
+ goto adjust_by_size;
+ }
+ } else if (packets < 4) {
+ /* If we have Tx and Rx ITR maxed and Tx ITR is running in
+ * bulk mode and we are receiving 4 or fewer packets just
+ * reset the ITR_ADAPTIVE_LATENCY bit for latency mode so
+ * that the Rx can relax.
+ */
+ if (rc->target_itr == ICE_ITR_ADAPTIVE_MAX_USECS &&
+ (q_vector->rx.target_itr & ICE_ITR_MASK) ==
+ ICE_ITR_ADAPTIVE_MAX_USECS)
+ goto clear_counts;
+ } else if (packets > 32) {
+ /* If we have processed over 32 packets in a single interrupt
+ * for Tx assume we need to switch over to "bulk" mode.
+ */
+ rc->target_itr &= ~ICE_ITR_ADAPTIVE_LATENCY;
+ }
+
+ /* We have no packets to actually measure against. This means
+ * either one of the other queues on this vector is active or
+ * we are a Tx queue doing TSO with too high of an interrupt rate.
+ *
+ * Between 4 and 56 we can assume that our current interrupt delay
+ * is only slightly too low. As such we should increase it by a small
+ * fixed amount.
+ */
+ if (packets < 56) {
+ itr = rc->target_itr + ICE_ITR_ADAPTIVE_MIN_INC;
+ if ((itr & ICE_ITR_MASK) > ICE_ITR_ADAPTIVE_MAX_USECS) {
+ itr &= ICE_ITR_ADAPTIVE_LATENCY;
+ itr += ICE_ITR_ADAPTIVE_MAX_USECS;
+ }
+ goto clear_counts;
+ }
+
+ if (packets <= 256) {
+ itr = min(q_vector->tx.current_itr, q_vector->rx.current_itr);
+ itr &= ICE_ITR_MASK;
+
+ /* Between 56 and 112 is our "goldilocks" zone where we are
+ * working out "just right". Just report that our current
+ * ITR is good for us.
+ */
+ if (packets <= 112)
+ goto clear_counts;
+
+ /* If packet count is 128 or greater we are likely looking
+ * at a slight overrun of the delay we want. Try halving
+ * our delay to see if that will cut the number of packets
+ * in half per interrupt.
+ */
+ itr >>= 1;
+ itr &= ICE_ITR_MASK;
+ if (itr < ICE_ITR_ADAPTIVE_MIN_USECS)
+ itr = ICE_ITR_ADAPTIVE_MIN_USECS;
+
+ goto clear_counts;
+ }
+
+ /* The paths below assume we are dealing with a bulk ITR since
+ * number of packets is greater than 256. We are just going to have
+ * to compute a value and try to bring the count under control,
+ * though for smaller packet sizes there isn't much we can do as
+ * NAPI polling will likely be kicking in sooner rather than later.
+ */
+ itr = ICE_ITR_ADAPTIVE_BULK;
+
+adjust_by_size:
+ /* If packet counts are 256 or greater we can assume we have a gross
+ * overestimation of what the rate should be. Instead of trying to fine
+ * tune it just use the formula below to try and dial in an exact value
+ * gives the current packet size of the frame.
+ */
+ avg_wire_size = bytes / packets;
+
+ /* The following is a crude approximation of:
+ * wmem_default / (size + overhead) = desired_pkts_per_int
+ * rate / bits_per_byte / (size + ethernet overhead) = pkt_rate
+ * (desired_pkt_rate / pkt_rate) * usecs_per_sec = ITR value
+ *
+ * Assuming wmem_default is 212992 and overhead is 640 bytes per
+ * packet, (256 skb, 64 headroom, 320 shared info), we can reduce the
+ * formula down to
+ *
+ * (170 * (size + 24)) / (size + 640) = ITR
+ *
+ * We first do some math on the packet size and then finally bitshift
+ * by 8 after rounding up. We also have to account for PCIe link speed
+ * difference as ITR scales based on this.
+ */
+ if (avg_wire_size <= 60) {
+ /* Start at 250k ints/sec */
+ avg_wire_size = 4096;
+ } else if (avg_wire_size <= 380) {
+ /* 250K ints/sec to 60K ints/sec */
+ avg_wire_size *= 40;
+ avg_wire_size += 1696;
+ } else if (avg_wire_size <= 1084) {
+ /* 60K ints/sec to 36K ints/sec */
+ avg_wire_size *= 15;
+ avg_wire_size += 11452;
+ } else if (avg_wire_size <= 1980) {
+ /* 36K ints/sec to 30K ints/sec */
+ avg_wire_size *= 5;
+ avg_wire_size += 22420;
+ } else {
+ /* plateau at a limit of 30K ints/sec */
+ avg_wire_size = 32256;
+ }
+
+ /* If we are in low latency mode halve our delay which doubles the
+ * rate to somewhere between 100K to 16K ints/sec
+ */
+ if (itr & ICE_ITR_ADAPTIVE_LATENCY)
+ avg_wire_size >>= 1;
+
+ /* Resultant value is 256 times larger than it needs to be. This
+ * gives us room to adjust the value as needed to either increase
+ * or decrease the value based on link speeds of 10G, 2.5G, 1G, etc.
+ *
+ * Use addition as we have already recorded the new latency flag
+ * for the ITR value.
+ */
+ itr += DIV_ROUND_UP(avg_wire_size,
+ ice_itr_divisor(q_vector->vsi->port_info)) *
+ ICE_ITR_ADAPTIVE_MIN_INC;
+
+ if ((itr & ICE_ITR_MASK) > ICE_ITR_ADAPTIVE_MAX_USECS) {
+ itr &= ICE_ITR_ADAPTIVE_LATENCY;
+ itr += ICE_ITR_ADAPTIVE_MAX_USECS;
+ }
+
+clear_counts:
+ /* write back value */
+ rc->target_itr = itr;
+
+ /* next update should occur within next jiffy */
+ rc->next_update = next_update + 1;
+
+ rc->total_bytes = 0;
+ rc->total_pkts = 0;
+}
+
/**
* ice_buildreg_itr - build value for writing to the GLINT_DYN_CTL register
* @itr_idx: interrupt throttling index
- * @reg_itr: interrupt throttling value adjusted based on ITR granularity
+ * @itr: interrupt throttling value in usecs
*/
-static u32 ice_buildreg_itr(int itr_idx, u16 reg_itr)
+static u32 ice_buildreg_itr(u16 itr_idx, u16 itr)
{
+ /* The itr value is reported in microseconds, and the register value is
+ * recorded in 2 microsecond units. For this reason we only need to
+ * shift by the GLINT_DYN_CTL_INTERVAL_S - ICE_ITR_GRAN_S to apply this
+ * granularity as a shift instead of division. The mask makes sure the
+ * ITR value is never odd so we don't accidentally write into the field
+ * prior to the ITR field.
+ */
+ itr &= ICE_ITR_MASK;
+
return GLINT_DYN_CTL_INTENA_M | GLINT_DYN_CTL_CLEARPBA_M |
(itr_idx << GLINT_DYN_CTL_ITR_INDX_S) |
- (reg_itr << GLINT_DYN_CTL_INTERVAL_S);
+ (itr << (GLINT_DYN_CTL_INTERVAL_S - ICE_ITR_GRAN_S));
}
+/* The act of updating the ITR will cause it to immediately trigger. In order
+ * to prevent this from throwing off adaptive update statistics we defer the
+ * update so that it can only happen so often. So after either Tx or Rx are
+ * updated we make the adaptive scheme wait until either the ITR completely
+ * expires via the next_update expiration or we have been through at least
+ * 3 interrupts.
+ */
+#define ITR_COUNTDOWN_START 3
+
/**
* ice_update_ena_itr - Update ITR and re-enable MSIX interrupt
* @vsi: the VSI associated with the q_vector
static void
ice_update_ena_itr(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
{
- struct ice_hw *hw = &vsi->back->hw;
- struct ice_ring_container *rc;
+ struct ice_ring_container *tx = &q_vector->tx;
+ struct ice_ring_container *rx = &q_vector->rx;
u32 itr_val;
+ /* This will do nothing if dynamic updates are not enabled */
+ ice_update_itr(q_vector, tx);
+ ice_update_itr(q_vector, rx);
+
/* This block of logic allows us to get away with only updating
* one ITR value with each interrupt. The idea is to perform a
* pseudo-lazy update with the following criteria.
* 2. If we must reduce an ITR that is given highest priority.
* 3. We then give priority to increasing ITR based on amount.
*/
- if (q_vector->rx.target_itr < q_vector->rx.current_itr) {
- rc = &q_vector->rx;
+ if (rx->target_itr < rx->current_itr) {
/* Rx ITR needs to be reduced, this is highest priority */
- itr_val = ice_buildreg_itr(rc->itr_idx, rc->target_itr);
- rc->current_itr = rc->target_itr;
- } else if ((q_vector->tx.target_itr < q_vector->tx.current_itr) ||
- ((q_vector->rx.target_itr - q_vector->rx.current_itr) <
- (q_vector->tx.target_itr - q_vector->tx.current_itr))) {
- rc = &q_vector->tx;
+ itr_val = ice_buildreg_itr(rx->itr_idx, rx->target_itr);
+ rx->current_itr = rx->target_itr;
+ q_vector->itr_countdown = ITR_COUNTDOWN_START;
+ } else if ((tx->target_itr < tx->current_itr) ||
+ ((rx->target_itr - rx->current_itr) <
+ (tx->target_itr - tx->current_itr))) {
/* Tx ITR needs to be reduced, this is second priority
* Tx ITR needs to be increased more than Rx, fourth priority
*/
- itr_val = ice_buildreg_itr(rc->itr_idx, rc->target_itr);
- rc->current_itr = rc->target_itr;
- } else if (q_vector->rx.current_itr != q_vector->rx.target_itr) {
- rc = &q_vector->rx;
+ itr_val = ice_buildreg_itr(tx->itr_idx, tx->target_itr);
+ tx->current_itr = tx->target_itr;
+ q_vector->itr_countdown = ITR_COUNTDOWN_START;
+ } else if (rx->current_itr != rx->target_itr) {
/* Rx ITR needs to be increased, third priority */
- itr_val = ice_buildreg_itr(rc->itr_idx, rc->target_itr);
- rc->current_itr = rc->target_itr;
+ itr_val = ice_buildreg_itr(rx->itr_idx, rx->target_itr);
+ rx->current_itr = rx->target_itr;
+ q_vector->itr_countdown = ITR_COUNTDOWN_START;
} else {
/* Still have to re-enable the interrupts */
itr_val = ice_buildreg_itr(ICE_ITR_NONE, 0);
+ if (q_vector->itr_countdown)
+ q_vector->itr_countdown--;
}
- if (!test_bit(__ICE_DOWN, vsi->state)) {
- int vector = vsi->hw_base_vector + q_vector->v_idx;
-
- wr32(hw, GLINT_DYN_CTL(vector), itr_val);
- }
+ if (!test_bit(__ICE_DOWN, vsi->state))
+ wr32(&vsi->back->hw,
+ GLINT_DYN_CTL(vsi->hw_base_vector + q_vector->v_idx),
+ itr_val);
}
/**
ice_maybe_stop_tx(tx_ring, DESC_NEEDED);
/* notify HW of packet */
- if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
+ if (netif_xmit_stopped(txring_txq(tx_ring)) || !netdev_xmit_more()) {
writel(i, tx_ring->tail);
/* we need this if more than one processor can write to our tail
#define ICE_TX_FLAGS_VLAN_M 0xffff0000
#define ICE_TX_FLAGS_VLAN_S 16
+#define ICE_RX_DMA_ATTR \
+ (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING)
+
struct ice_tx_buf {
struct ice_tx_desc *next_to_watch;
struct sk_buff *skb;
dma_addr_t dma;
struct page *page;
unsigned int page_offset;
+ u16 pagecnt_bias;
};
struct ice_q_stats {
#define ICE_ITR_DYNAMIC 0x8000 /* used as flag for itr_setting */
#define ITR_IS_DYNAMIC(setting) (!!((setting) & ICE_ITR_DYNAMIC))
#define ITR_TO_REG(setting) ((setting) & ~ICE_ITR_DYNAMIC)
-#define ICE_ITR_GRAN_S 1 /* Assume ITR granularity is 2us */
+#define ICE_ITR_GRAN_S 1 /* ITR granularity is always 2us */
+#define ICE_ITR_GRAN_US BIT(ICE_ITR_GRAN_S)
#define ICE_ITR_MASK 0x1FFE /* ITR register value alignment mask */
#define ITR_REG_ALIGN(setting) __ALIGN_MASK(setting, ~ICE_ITR_MASK)
+#define ICE_ITR_ADAPTIVE_MIN_INC 0x0002
+#define ICE_ITR_ADAPTIVE_MIN_USECS 0x0002
+#define ICE_ITR_ADAPTIVE_MAX_USECS 0x00FA
+#define ICE_ITR_ADAPTIVE_LATENCY 0x8000
+#define ICE_ITR_ADAPTIVE_BULK 0x0000
+
#define ICE_DFLT_INTRL 0
/* Legacy or Advanced Mode Queue */
u16 next_to_alloc;
} ____cacheline_internodealigned_in_smp;
-enum ice_latency_range {
- ICE_LOWEST_LATENCY = 0,
- ICE_LOW_LATENCY = 1,
- ICE_BULK_LATENCY = 2,
- ICE_ULTRA_LATENCY = 3,
-};
-
struct ice_ring_container {
/* head of linked-list of rings */
struct ice_ring *ring;
unsigned long next_update; /* jiffies value of next queue update */
unsigned int total_bytes; /* total bytes processed this int */
unsigned int total_pkts; /* total packets processed this int */
- enum ice_latency_range latency_range;
- int itr_idx; /* index in the interrupt vector */
+ u16 itr_idx; /* index in the interrupt vector */
u16 target_itr; /* value in usecs divided by the hw->itr_gran */
u16 current_itr; /* value in usecs divided by the hw->itr_gran */
/* high bit set means dynamic ITR, rest is used to store user
/* debug masks - set these bits in hw->debug_mask to control output */
#define ICE_DBG_INIT BIT_ULL(1)
#define ICE_DBG_LINK BIT_ULL(4)
+#define ICE_DBG_PHY BIT_ULL(5)
#define ICE_DBG_QCTX BIT_ULL(6)
#define ICE_DBG_NVM BIT_ULL(7)
#define ICE_DBG_LAN BIT_ULL(8)
#define ICE_MAX_TRAFFIC_CLASS 8
#define ICE_TXSCHED_MAX_BRANCHES ICE_MAX_TRAFFIC_CLASS
+#define ice_for_each_traffic_class(_i) \
+ for ((_i) = 0; (_i) < ICE_MAX_TRAFFIC_CLASS; (_i)++)
+
struct ice_sched_node {
struct ice_sched_node *parent;
struct ice_sched_node *sibling; /* next sibling in the same layer */
struct ice_sched_node *ag_node[ICE_MAX_TRAFFIC_CLASS];
struct list_head list_entry;
u16 max_lanq[ICE_MAX_TRAFFIC_CLASS];
- u16 vsi_id;
};
/* driver defines the policy */
#include "ice.h"
#include "ice_lib.h"
+/**
+ * ice_err_to_virt err - translate errors for VF return code
+ * @ice_err: error return code
+ */
+static enum virtchnl_status_code ice_err_to_virt_err(enum ice_status ice_err)
+{
+ switch (ice_err) {
+ case ICE_SUCCESS:
+ return VIRTCHNL_STATUS_SUCCESS;
+ case ICE_ERR_BAD_PTR:
+ case ICE_ERR_INVAL_SIZE:
+ case ICE_ERR_DEVICE_NOT_SUPPORTED:
+ case ICE_ERR_PARAM:
+ case ICE_ERR_CFG:
+ return VIRTCHNL_STATUS_ERR_PARAM;
+ case ICE_ERR_NO_MEMORY:
+ return VIRTCHNL_STATUS_ERR_NO_MEMORY;
+ case ICE_ERR_NOT_READY:
+ case ICE_ERR_RESET_FAILED:
+ case ICE_ERR_FW_API_VER:
+ case ICE_ERR_AQ_ERROR:
+ case ICE_ERR_AQ_TIMEOUT:
+ case ICE_ERR_AQ_FULL:
+ case ICE_ERR_AQ_NO_WORK:
+ case ICE_ERR_AQ_EMPTY:
+ return VIRTCHNL_STATUS_ERR_ADMIN_QUEUE_ERROR;
+ default:
+ return VIRTCHNL_STATUS_ERR_NOT_SUPPORTED;
+ }
+}
+
/**
* ice_vc_vf_broadcast - Broadcast a message to all VFs on PF
* @pf: pointer to the PF structure
*/
static void
ice_vc_vf_broadcast(struct ice_pf *pf, enum virtchnl_ops v_opcode,
- enum ice_status v_retval, u8 *msg, u16 msglen)
+ enum virtchnl_status_code v_retval, u8 *msg, u16 msglen)
{
struct ice_hw *hw = &pf->hw;
struct ice_vf *vf = pf->vf;
ice_set_pfe_link(vf, &pfe, ls->link_speed, ls->link_info &
ICE_AQ_LINK_UP);
- ice_aq_send_msg_to_vf(hw, vf->vf_id, VIRTCHNL_OP_EVENT, 0, (u8 *)&pfe,
+ ice_aq_send_msg_to_vf(hw, vf->vf_id, VIRTCHNL_OP_EVENT,
+ VIRTCHNL_STATUS_SUCCESS, (u8 *)&pfe,
sizeof(pfe), NULL);
}
}
/**
- * ice_vsi_set_pvid - Set port VLAN id for the VSI
- * @vsi: the VSI being changed
+ * ice_vsi_set_pvid_fill_ctxt - Set VSI ctxt for add pvid
+ * @ctxt: the vsi ctxt to fill
* @vid: the VLAN id to set as a PVID
*/
-static int ice_vsi_set_pvid(struct ice_vsi *vsi, u16 vid)
+static void ice_vsi_set_pvid_fill_ctxt(struct ice_vsi_ctx *ctxt, u16 vid)
+{
+ ctxt->info.vlan_flags = (ICE_AQ_VSI_VLAN_MODE_UNTAGGED |
+ ICE_AQ_VSI_PVLAN_INSERT_PVID |
+ ICE_AQ_VSI_VLAN_EMOD_STR);
+ ctxt->info.pvid = cpu_to_le16(vid);
+ ctxt->info.sw_flags2 |= ICE_AQ_VSI_SW_FLAG_RX_VLAN_PRUNE_ENA;
+ ctxt->info.valid_sections = cpu_to_le16(ICE_AQ_VSI_PROP_VLAN_VALID |
+ ICE_AQ_VSI_PROP_SW_VALID);
+}
+
+/**
+ * ice_vsi_kill_pvid_fill_ctxt - Set VSI ctx for remove pvid
+ * @ctxt: the VSI ctxt to fill
+ */
+static void ice_vsi_kill_pvid_fill_ctxt(struct ice_vsi_ctx *ctxt)
+{
+ ctxt->info.vlan_flags = ICE_AQ_VSI_VLAN_EMOD_NOTHING;
+ ctxt->info.vlan_flags |= ICE_AQ_VSI_VLAN_MODE_ALL;
+ ctxt->info.sw_flags2 &= ~ICE_AQ_VSI_SW_FLAG_RX_VLAN_PRUNE_ENA;
+ ctxt->info.valid_sections = cpu_to_le16(ICE_AQ_VSI_PROP_VLAN_VALID |
+ ICE_AQ_VSI_PROP_SW_VALID);
+}
+
+/**
+ * ice_vsi_manage_pvid - Enable or disable port VLAN for VSI
+ * @vsi: the VSI to update
+ * @vid: the VLAN id to set as a PVID
+ * @enable: true for enable pvid false for disable
+ */
+static int ice_vsi_manage_pvid(struct ice_vsi *vsi, u16 vid, bool enable)
{
struct device *dev = &vsi->back->pdev->dev;
struct ice_hw *hw = &vsi->back->hw;
if (!ctxt)
return -ENOMEM;
- ctxt->info.vlan_flags = (ICE_AQ_VSI_VLAN_MODE_UNTAGGED |
- ICE_AQ_VSI_PVLAN_INSERT_PVID |
- ICE_AQ_VSI_VLAN_EMOD_STR);
- ctxt->info.pvid = cpu_to_le16(vid);
- ctxt->info.valid_sections = cpu_to_le16(ICE_AQ_VSI_PROP_VLAN_VALID);
+ ctxt->info = vsi->info;
+ if (enable)
+ ice_vsi_set_pvid_fill_ctxt(ctxt, vid);
+ else
+ ice_vsi_kill_pvid_fill_ctxt(ctxt);
status = ice_update_vsi(hw, vsi->idx, ctxt, NULL);
if (status) {
- dev_info(dev, "update VSI for VLAN insert failed, err %d aq_err %d\n",
+ dev_info(dev, "update VSI for port VLAN failed, err %d aq_err %d\n",
status, hw->adminq.sq_last_status);
ret = -EIO;
goto out;
}
- vsi->info.pvid = ctxt->info.pvid;
- vsi->info.vlan_flags = ctxt->info.vlan_flags;
+ vsi->info = ctxt->info;
out:
devm_kfree(dev, ctxt);
return ret;
}
-/**
- * ice_vsi_kill_pvid - Remove port VLAN id from the VSI
- * @vsi: the VSI being changed
- */
-static int ice_vsi_kill_pvid(struct ice_vsi *vsi)
-{
- struct ice_pf *pf = vsi->back;
-
- if (ice_vsi_manage_vlan_stripping(vsi, false)) {
- dev_err(&pf->pdev->dev, "Error removing Port VLAN on VSI %i\n",
- vsi->vsi_num);
- return -ENODEV;
- }
-
- vsi->info.pvid = 0;
- return 0;
-}
-
/**
* ice_vf_vsi_setup - Set up a VF VSI
* @pf: board private structure
vsi->hw_base_vector += 1;
/* Check if port VLAN exist before, and restore it accordingly */
- if (vf->port_vlan_id)
- ice_vsi_set_pvid(vsi, vf->port_vlan_id);
+ if (vf->port_vlan_id) {
+ ice_vsi_manage_pvid(vsi, vf->port_vlan_id, true);
+ ice_vsi_add_vlan(vsi, vf->port_vlan_id & ICE_VLAN_M);
+ }
eth_broadcast_addr(broadcast);
*/
static int ice_alloc_vf_res(struct ice_vf *vf)
{
+ struct ice_pf *pf = vf->pf;
+ int tx_rx_queue_left;
int status;
/* setup VF VSI and necessary resources */
if (status)
goto ice_alloc_vf_res_exit;
+ /* Update number of VF queues, in case VF had requested for queue
+ * changes
+ */
+ tx_rx_queue_left = min_t(int, pf->q_left_tx, pf->q_left_rx);
+ tx_rx_queue_left += ICE_DFLT_QS_PER_VF;
+ if (vf->num_req_qs && vf->num_req_qs <= tx_rx_queue_left &&
+ vf->num_req_qs != vf->num_vf_qs)
+ vf->num_vf_qs = vf->num_req_qs;
+
if (vf->trusted)
set_bit(ICE_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps);
else
wr32(hw, GLINT_VECT2FUNC(v), reg);
}
+ /* Map mailbox interrupt. We put an explicit 0 here to remind us that
+ * VF admin queue interrupts will go to VF MSI-X vector 0.
+ */
+ wr32(hw, VPINT_MBX_CTL(abs_vf_id), VPINT_MBX_CTL_CAUSE_ENA_M | 0);
/* set regardless of mapping mode */
wr32(hw, VPLAN_TXQ_MAPENA(vf->vf_id), VPLAN_TXQ_MAPENA_TX_ENA_M);
wr32(hw, VFGEN_RSTAT(vf->vf_id), VIRTCHNL_VFR_VFACTIVE);
}
+/**
+ * ice_vf_set_vsi_promisc - set given VF VSI to given promiscuous mode(s)
+ * @vf: pointer to the VF info
+ * @vsi: the VSI being configured
+ * @promisc_m: mask of promiscuous config bits
+ * @rm_promisc: promisc flag request from the VF to remove or add filter
+ *
+ * This function configures VF VSI promiscuous mode, based on the VF requests,
+ * for Unicast, Multicast and VLAN
+ */
+static enum ice_status
+ice_vf_set_vsi_promisc(struct ice_vf *vf, struct ice_vsi *vsi, u8 promisc_m,
+ bool rm_promisc)
+{
+ struct ice_pf *pf = vf->pf;
+ enum ice_status status = 0;
+ struct ice_hw *hw;
+
+ hw = &pf->hw;
+ if (vf->num_vlan) {
+ status = ice_set_vlan_vsi_promisc(hw, vsi->idx, promisc_m,
+ rm_promisc);
+ } else if (vf->port_vlan_id) {
+ if (rm_promisc)
+ status = ice_clear_vsi_promisc(hw, vsi->idx, promisc_m,
+ vf->port_vlan_id);
+ else
+ status = ice_set_vsi_promisc(hw, vsi->idx, promisc_m,
+ vf->port_vlan_id);
+ } else {
+ if (rm_promisc)
+ status = ice_clear_vsi_promisc(hw, vsi->idx, promisc_m,
+ 0);
+ else
+ status = ice_set_vsi_promisc(hw, vsi->idx, promisc_m,
+ 0);
+ }
+
+ return status;
+}
+
/**
* ice_reset_all_vfs - reset all allocated VFs in one go
* @pf: pointer to the PF structure
bool ice_reset_all_vfs(struct ice_pf *pf, bool is_vflr)
{
struct ice_hw *hw = &pf->hw;
+ struct ice_vf *vf;
int v, i;
/* If we don't have any VFs, then there is nothing to reset */
for (v = 0; v < pf->num_alloc_vfs; v++)
ice_trigger_vf_reset(&pf->vf[v], is_vflr);
- /* Call Disable LAN Tx queue AQ call with VFR bit set and 0
- * queues to inform Firmware about VF reset.
- */
- for (v = 0; v < pf->num_alloc_vfs; v++)
- ice_dis_vsi_txq(pf->vsi[0]->port_info, 0, NULL, NULL,
- ICE_VF_RESET, v, NULL);
+ for (v = 0; v < pf->num_alloc_vfs; v++) {
+ struct ice_vsi *vsi;
+
+ vf = &pf->vf[v];
+ vsi = pf->vsi[vf->lan_vsi_idx];
+ if (test_bit(ICE_VF_STATE_ENA, vf->vf_states)) {
+ ice_vsi_stop_lan_tx_rings(vsi, ICE_VF_RESET, vf->vf_id);
+ ice_vsi_stop_rx_rings(vsi);
+ clear_bit(ICE_VF_STATE_ENA, vf->vf_states);
+ }
+ }
/* HW requires some time to make sure it can flush the FIFO for a VF
* when it resets it. Poll the VPGEN_VFRSTAT register for each VF in
/* Check each VF in sequence */
while (v < pf->num_alloc_vfs) {
- struct ice_vf *vf = &pf->vf[v];
u32 reg;
+ vf = &pf->vf[v];
reg = rd32(hw, VPGEN_VFRSTAT(vf->vf_id));
if (!(reg & VPGEN_VFRSTAT_VFRD_M))
break;
usleep_range(10000, 20000);
/* free VF resources to begin resetting the VSI state */
- for (v = 0; v < pf->num_alloc_vfs; v++)
- ice_free_vf_res(&pf->vf[v]);
+ for (v = 0; v < pf->num_alloc_vfs; v++) {
+ vf = &pf->vf[v];
+
+ ice_free_vf_res(vf);
+
+ /* Free VF queues as well, and reallocate later.
+ * If a given VF has different number of queues
+ * configured, the request for update will come
+ * via mailbox communication.
+ */
+ vf->num_vf_qs = 0;
+ }
if (ice_check_avail_res(pf)) {
dev_err(&pf->pdev->dev,
}
/* Finish the reset on each VF */
- for (v = 0; v < pf->num_alloc_vfs; v++)
- ice_cleanup_and_realloc_vf(&pf->vf[v]);
+ for (v = 0; v < pf->num_alloc_vfs; v++) {
+ vf = &pf->vf[v];
+
+ vf->num_vf_qs = pf->num_vf_qps;
+ dev_dbg(&pf->pdev->dev,
+ "VF-id %d has %d queues configured\n",
+ vf->vf_id, vf->num_vf_qs);
+ ice_cleanup_and_realloc_vf(vf);
+ }
ice_flush(hw);
clear_bit(__ICE_VF_DIS, pf->state);
static bool ice_reset_vf(struct ice_vf *vf, bool is_vflr)
{
struct ice_pf *pf = vf->pf;
- struct ice_hw *hw = &pf->hw;
struct ice_vsi *vsi;
+ struct ice_hw *hw;
bool rsd = false;
+ u8 promisc_m;
u32 reg;
int i;
vf->vf_id, NULL);
}
+ hw = &pf->hw;
/* poll VPGEN_VFRSTAT reg to make sure
* that reset is complete
*/
usleep_range(10000, 20000);
+ /* disable promiscuous modes in case they were enabled
+ * ignore any error if disabling process failed
+ */
+ if (test_bit(ICE_VF_STATE_UC_PROMISC, vf->vf_states) ||
+ test_bit(ICE_VF_STATE_MC_PROMISC, vf->vf_states)) {
+ if (vf->port_vlan_id || vf->num_vlan)
+ promisc_m = ICE_UCAST_VLAN_PROMISC_BITS;
+ else
+ promisc_m = ICE_UCAST_PROMISC_BITS;
+
+ vsi = pf->vsi[vf->lan_vsi_idx];
+ if (ice_vf_set_vsi_promisc(vf, vsi, promisc_m, true))
+ dev_err(&pf->pdev->dev, "disabling promiscuous mode failed\n");
+ }
+
/* free VF resources to begin resetting the VSI state */
ice_free_vf_res(vf);
pfe.event = VIRTCHNL_EVENT_RESET_IMPENDING;
pfe.severity = PF_EVENT_SEVERITY_CERTAIN_DOOM;
- ice_vc_vf_broadcast(pf, VIRTCHNL_OP_EVENT, ICE_SUCCESS,
+ ice_vc_vf_broadcast(pf, VIRTCHNL_OP_EVENT, VIRTCHNL_STATUS_SUCCESS,
(u8 *)&pfe, sizeof(struct virtchnl_pf_event));
}
pfe.event = VIRTCHNL_EVENT_RESET_IMPENDING;
pfe.severity = PF_EVENT_SEVERITY_CERTAIN_DOOM;
- ice_aq_send_msg_to_vf(&vf->pf->hw, vf->vf_id, VIRTCHNL_OP_EVENT, 0,
- (u8 *)&pfe, sizeof(pfe), NULL);
+ ice_aq_send_msg_to_vf(&vf->pf->hw, vf->vf_id, VIRTCHNL_OP_EVENT,
+ VIRTCHNL_STATUS_SUCCESS, (u8 *)&pfe, sizeof(pfe),
+ NULL);
}
/**
pf->num_alloc_vfs = num_alloc_vfs;
/* VF resources get allocated during reset */
- if (!ice_reset_all_vfs(pf, false))
+ if (!ice_reset_all_vfs(pf, true))
goto err_unroll_sriov;
goto err_unroll_intr;
*
* send msg to VF
*/
-static int ice_vc_send_msg_to_vf(struct ice_vf *vf, u32 v_opcode,
- enum ice_status v_retval, u8 *msg, u16 msglen)
+static int
+ice_vc_send_msg_to_vf(struct ice_vf *vf, u32 v_opcode,
+ enum virtchnl_status_code v_retval, u8 *msg, u16 msglen)
{
enum ice_status aq_ret;
struct ice_pf *pf;
if (VF_IS_V10(&vf->vf_ver))
info.minor = VIRTCHNL_VERSION_MINOR_NO_VF_CAPS;
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_VERSION, ICE_SUCCESS,
- (u8 *)&info,
+ return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_VERSION,
+ VIRTCHNL_STATUS_SUCCESS, (u8 *)&info,
sizeof(struct virtchnl_version_info));
}
*/
static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
{
+ enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
struct virtchnl_vf_resource *vfres = NULL;
- enum ice_status aq_ret = 0;
struct ice_pf *pf = vf->pf;
struct ice_vsi *vsi;
int len = 0;
int ret;
if (!test_bit(ICE_VF_STATE_INIT, vf->vf_states)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto err;
}
vfres = devm_kzalloc(&pf->pdev->dev, len, GFP_KERNEL);
if (!vfres) {
- aq_ret = ICE_ERR_NO_MEMORY;
+ v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
len = 0;
goto err;
}
vfres->vf_cap_flags = VIRTCHNL_VF_OFFLOAD_L2;
vsi = pf->vsi[vf->lan_vsi_idx];
+ if (!vsi) {
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+ goto err;
+ }
+
if (!vsi->info.pvid)
vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_VLAN;
err:
/* send the response back to the VF */
- ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_VF_RESOURCES, aq_ret,
+ ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_VF_RESOURCES, v_ret,
(u8 *)vfres, len);
devm_kfree(&pf->pdev->dev, vfres);
{
int i;
- for (i = 0; i < pf->num_alloc_vsi; i++)
+ ice_for_each_vsi(pf, i)
if (pf->vsi[i] && pf->vsi[i]->vsi_num == id)
return pf->vsi[i];
*/
static int ice_vc_config_rss_key(struct ice_vf *vf, u8 *msg)
{
+ enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
struct virtchnl_rss_key *vrk =
(struct virtchnl_rss_key *)msg;
+ struct ice_pf *pf = vf->pf;
struct ice_vsi *vsi = NULL;
- enum ice_status aq_ret;
- int ret;
if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
if (!ice_vc_isvalid_vsi_id(vf, vrk->vsi_id)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
- vsi = ice_find_vsi_from_id(vf->pf, vrk->vsi_id);
+ vsi = pf->vsi[vf->lan_vsi_idx];
if (!vsi) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
if (vrk->key_len != ICE_VSIQF_HKEY_ARRAY_SIZE) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
if (!test_bit(ICE_FLAG_RSS_ENA, vf->pf->flags)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
- ret = ice_set_rss(vsi, vrk->key, NULL, 0);
- aq_ret = ret ? ICE_ERR_PARAM : ICE_SUCCESS;
+ if (ice_set_rss(vsi, vrk->key, NULL, 0))
+ v_ret = VIRTCHNL_STATUS_ERR_ADMIN_QUEUE_ERROR;
error_param:
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_RSS_KEY, aq_ret,
+ return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_RSS_KEY, v_ret,
NULL, 0);
}
static int ice_vc_config_rss_lut(struct ice_vf *vf, u8 *msg)
{
struct virtchnl_rss_lut *vrl = (struct virtchnl_rss_lut *)msg;
+ enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+ struct ice_pf *pf = vf->pf;
struct ice_vsi *vsi = NULL;
- enum ice_status aq_ret;
- int ret;
if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
if (!ice_vc_isvalid_vsi_id(vf, vrl->vsi_id)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
- vsi = ice_find_vsi_from_id(vf->pf, vrl->vsi_id);
+ vsi = pf->vsi[vf->lan_vsi_idx];
if (!vsi) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
if (vrl->lut_entries != ICE_VSIQF_HLUT_ARRAY_SIZE) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
if (!test_bit(ICE_FLAG_RSS_ENA, vf->pf->flags)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
- ret = ice_set_rss(vsi, NULL, vrl->lut, ICE_VSIQF_HLUT_ARRAY_SIZE);
- aq_ret = ret ? ICE_ERR_PARAM : ICE_SUCCESS;
+ if (ice_set_rss(vsi, NULL, vrl->lut, ICE_VSIQF_HLUT_ARRAY_SIZE))
+ v_ret = VIRTCHNL_STATUS_ERR_ADMIN_QUEUE_ERROR;
error_param:
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_RSS_LUT, aq_ret,
+ return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_RSS_LUT, v_ret,
NULL, 0);
}
*/
static int ice_vc_get_stats_msg(struct ice_vf *vf, u8 *msg)
{
+ enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
struct virtchnl_queue_select *vqs =
(struct virtchnl_queue_select *)msg;
- enum ice_status aq_ret = 0;
+ struct ice_pf *pf = vf->pf;
struct ice_eth_stats stats;
struct ice_vsi *vsi;
if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
if (!ice_vc_isvalid_vsi_id(vf, vqs->vsi_id)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
- vsi = ice_find_vsi_from_id(vf->pf, vqs->vsi_id);
+ vsi = pf->vsi[vf->lan_vsi_idx];
if (!vsi) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
error_param:
/* send the response to the VF */
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_STATS, aq_ret,
+ return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_STATS, v_ret,
(u8 *)&stats, sizeof(stats));
}
*/
static int ice_vc_ena_qs_msg(struct ice_vf *vf, u8 *msg)
{
+ enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
struct virtchnl_queue_select *vqs =
(struct virtchnl_queue_select *)msg;
- enum ice_status aq_ret = 0;
+ struct ice_pf *pf = vf->pf;
struct ice_vsi *vsi;
if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
if (!ice_vc_isvalid_vsi_id(vf, vqs->vsi_id)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
if (!vqs->rx_queues && !vqs->tx_queues) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
- vsi = ice_find_vsi_from_id(vf->pf, vqs->vsi_id);
+ vsi = pf->vsi[vf->lan_vsi_idx];
if (!vsi) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
* programmed using ice_vsi_cfg_txqs
*/
if (ice_vsi_start_rx_rings(vsi))
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
/* Set flag to indicate that queues are enabled */
- if (!aq_ret)
+ if (v_ret == VIRTCHNL_STATUS_SUCCESS)
set_bit(ICE_VF_STATE_ENA, vf->vf_states);
error_param:
/* send the response to the VF */
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ENABLE_QUEUES, aq_ret,
+ return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ENABLE_QUEUES, v_ret,
NULL, 0);
}
*/
static int ice_vc_dis_qs_msg(struct ice_vf *vf, u8 *msg)
{
+ enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
struct virtchnl_queue_select *vqs =
(struct virtchnl_queue_select *)msg;
- enum ice_status aq_ret = 0;
+ struct ice_pf *pf = vf->pf;
struct ice_vsi *vsi;
if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states) &&
!test_bit(ICE_VF_STATE_ENA, vf->vf_states)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
if (!ice_vc_isvalid_vsi_id(vf, vqs->vsi_id)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
if (!vqs->rx_queues && !vqs->tx_queues) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
- vsi = ice_find_vsi_from_id(vf->pf, vqs->vsi_id);
+ vsi = pf->vsi[vf->lan_vsi_idx];
if (!vsi) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
dev_err(&vsi->back->pdev->dev,
"Failed to stop tx rings on VSI %d\n",
vsi->vsi_num);
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
}
if (ice_vsi_stop_rx_rings(vsi)) {
dev_err(&vsi->back->pdev->dev,
"Failed to stop rx rings on VSI %d\n",
vsi->vsi_num);
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
}
/* Clear enabled queues flag */
- if (!aq_ret)
+ if (v_ret == VIRTCHNL_STATUS_SUCCESS)
clear_bit(ICE_VF_STATE_ENA, vf->vf_states);
error_param:
/* send the response to the VF */
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DISABLE_QUEUES, aq_ret,
+ return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DISABLE_QUEUES, v_ret,
NULL, 0);
}
*/
static int ice_vc_cfg_irq_map_msg(struct ice_vf *vf, u8 *msg)
{
+ enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
struct virtchnl_irq_map_info *irqmap_info =
(struct virtchnl_irq_map_info *)msg;
u16 vsi_id, vsi_q_id, vector_id;
struct virtchnl_vector_map *map;
struct ice_vsi *vsi = NULL;
struct ice_pf *pf = vf->pf;
- enum ice_status aq_ret = 0;
unsigned long qmap;
int i;
if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
/* validate msg params */
if (!(vector_id < pf->hw.func_caps.common_cap
.num_msix_vectors) || !ice_vc_isvalid_vsi_id(vf, vsi_id)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
- vsi = ice_find_vsi_from_id(vf->pf, vsi_id);
+ vsi = pf->vsi[vf->lan_vsi_idx];
if (!vsi) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
struct ice_q_vector *q_vector;
if (!ice_vc_isvalid_q_id(vf, vsi_id, vsi_q_id)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
q_vector = vsi->q_vectors[i];
struct ice_q_vector *q_vector;
if (!ice_vc_isvalid_q_id(vf, vsi_id, vsi_q_id)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
q_vector = vsi->q_vectors[i];
ice_vsi_cfg_msix(vsi);
error_param:
/* send the response to the VF */
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_IRQ_MAP, aq_ret,
+ return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_IRQ_MAP, v_ret,
NULL, 0);
}
*/
static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
{
+ enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
struct virtchnl_vsi_queue_config_info *qci =
(struct virtchnl_vsi_queue_config_info *)msg;
struct virtchnl_queue_pair_info *qpi;
- enum ice_status aq_ret = 0;
+ struct ice_pf *pf = vf->pf;
struct ice_vsi *vsi;
int i;
if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
if (!ice_vc_isvalid_vsi_id(vf, qci->vsi_id)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
- vsi = ice_find_vsi_from_id(vf->pf, qci->vsi_id);
+ vsi = pf->vsi[vf->lan_vsi_idx];
if (!vsi) {
- aq_ret = ICE_ERR_PARAM;
+ goto error_param;
+ }
+
+ if (qci->num_queue_pairs > ICE_MAX_BASE_QS_PER_VF) {
+ dev_err(&pf->pdev->dev,
+ "VF-%d requesting more than supported number of queues: %d\n",
+ vf->vf_id, qci->num_queue_pairs);
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
qpi->rxq.vsi_id != qci->vsi_id ||
qpi->rxq.queue_id != qpi->txq.queue_id ||
!ice_vc_isvalid_q_id(vf, qci->vsi_id, qpi->txq.queue_id)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
/* copy Tx queue info from VF into VSI */
vsi->rx_rings[i]->dma = qpi->rxq.dma_ring_addr;
vsi->rx_rings[i]->count = qpi->rxq.ring_len;
if (qpi->rxq.databuffer_size > ((16 * 1024) - 128)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
vsi->rx_buf_len = qpi->rxq.databuffer_size;
if (qpi->rxq.max_pkt_size >= (16 * 1024) ||
qpi->rxq.max_pkt_size < 64) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
vsi->max_frame = qpi->rxq.max_pkt_size;
*/
vsi->num_txq = qci->num_queue_pairs;
vsi->num_rxq = qci->num_queue_pairs;
+ /* All queues of VF VSI are in TC 0 */
+ vsi->tc_cfg.tc_info[0].qcount_tx = qci->num_queue_pairs;
+ vsi->tc_cfg.tc_info[0].qcount_rx = qci->num_queue_pairs;
- if (!ice_vsi_cfg_lan_txqs(vsi) && !ice_vsi_cfg_rxqs(vsi))
- aq_ret = 0;
- else
- aq_ret = ICE_ERR_PARAM;
+ if (ice_vsi_cfg_lan_txqs(vsi) || ice_vsi_cfg_rxqs(vsi))
+ v_ret = VIRTCHNL_STATUS_ERR_ADMIN_QUEUE_ERROR;
error_param:
/* send the response to the VF */
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES, aq_ret,
+ return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES, v_ret,
NULL, 0);
}
static int
ice_vc_handle_mac_addr_msg(struct ice_vf *vf, u8 *msg, bool set)
{
+ enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
struct virtchnl_ether_addr_list *al =
(struct virtchnl_ether_addr_list *)msg;
struct ice_pf *pf = vf->pf;
enum virtchnl_ops vc_op;
- enum ice_status ret;
LIST_HEAD(mac_list);
struct ice_vsi *vsi;
int mac_count = 0;
if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states) ||
!ice_vc_isvalid_vsi_id(vf, al->vsi_id)) {
- ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto handle_mac_exit;
}
if (set && !ice_is_vf_trusted(vf) &&
(vf->num_mac + al->num_elements) > ICE_MAX_MACADDR_PER_VF) {
dev_err(&pf->pdev->dev,
- "Can't add more MAC addresses, because VF is not trusted, switch the VF to trusted mode in order to add more functionalities\n");
- ret = ICE_ERR_PARAM;
+ "Can't add more MAC addresses, because VF-%d is not trusted, switch the VF to trusted mode in order to add more functionalities\n",
+ vf->vf_id);
+ /* There is no need to let VF know about not being trusted
+ * to add more MAC addr, so we can just return success message.
+ */
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto handle_mac_exit;
}
vsi = pf->vsi[vf->lan_vsi_idx];
+ if (!vsi) {
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+ goto handle_mac_exit;
+ }
for (i = 0; i < al->num_elements; i++) {
u8 *maddr = al->list[i].addr;
* already added. Just continue.
*/
dev_info(&pf->pdev->dev,
- "mac %pM already set for VF %d\n",
+ "MAC %pM already set for VF %d\n",
maddr, vf->vf_id);
continue;
} else {
/* VF can't remove dflt_lan_addr/bcast mac */
dev_err(&pf->pdev->dev,
- "can't remove mac %pM for VF %d\n",
+ "VF can't remove default MAC address or MAC %pM programmed by PF for VF %d\n",
maddr, vf->vf_id);
- ret = ICE_ERR_PARAM;
- goto handle_mac_exit;
+ continue;
}
}
/* check for the invalid cases and bail if necessary */
if (is_zero_ether_addr(maddr)) {
dev_err(&pf->pdev->dev,
- "invalid mac %pM provided for VF %d\n",
+ "invalid MAC %pM provided for VF %d\n",
maddr, vf->vf_id);
- ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto handle_mac_exit;
}
if (is_unicast_ether_addr(maddr) &&
!ice_can_vf_change_mac(vf)) {
dev_err(&pf->pdev->dev,
- "can't change unicast mac for untrusted VF %d\n",
+ "can't change unicast MAC for untrusted VF %d\n",
vf->vf_id);
- ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto handle_mac_exit;
}
/* get here if maddr is multicast or if VF can change mac */
if (ice_add_mac_to_list(vsi, &mac_list, al->list[i].addr)) {
- ret = ICE_ERR_NO_MEMORY;
+ v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
goto handle_mac_exit;
}
mac_count++;
/* program the updated filter list */
if (set)
- ret = ice_add_mac(&pf->hw, &mac_list);
+ v_ret = ice_err_to_virt_err(ice_add_mac(&pf->hw, &mac_list));
else
- ret = ice_remove_mac(&pf->hw, &mac_list);
+ v_ret = ice_err_to_virt_err(ice_remove_mac(&pf->hw, &mac_list));
- if (ret) {
+ if (v_ret) {
dev_err(&pf->pdev->dev,
- "can't update mac filters for VF %d, error %d\n",
- vf->vf_id, ret);
+ "can't update MAC filters for VF %d, error %d\n",
+ vf->vf_id, v_ret);
} else {
if (set)
vf->num_mac += mac_count;
handle_mac_exit:
ice_free_fltr_list(&pf->pdev->dev, &mac_list);
/* send the response to the VF */
- return ice_vc_send_msg_to_vf(vf, vc_op, ret, NULL, 0);
+ return ice_vc_send_msg_to_vf(vf, vc_op, v_ret, NULL, 0);
}
/**
*/
static int ice_vc_request_qs_msg(struct ice_vf *vf, u8 *msg)
{
+ enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
struct virtchnl_vf_res_request *vfres =
(struct virtchnl_vf_res_request *)msg;
int req_queues = vfres->num_queue_pairs;
- enum ice_status aq_ret = 0;
struct ice_pf *pf = vf->pf;
+ int max_allowed_vf_queues;
int tx_rx_queue_left;
int cur_queues;
if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
- cur_queues = pf->num_vf_qps;
+ cur_queues = vf->num_vf_qs;
tx_rx_queue_left = min_t(int, pf->q_left_tx, pf->q_left_rx);
+ max_allowed_vf_queues = tx_rx_queue_left + cur_queues;
if (req_queues <= 0) {
dev_err(&pf->pdev->dev,
"VF %d tried to request %d queues. Ignoring.\n",
vf->vf_id, req_queues);
- } else if (req_queues > ICE_MAX_QS_PER_VF) {
+ } else if (req_queues > ICE_MAX_BASE_QS_PER_VF) {
dev_err(&pf->pdev->dev,
"VF %d tried to request more than %d queues.\n",
- vf->vf_id, ICE_MAX_QS_PER_VF);
- vfres->num_queue_pairs = ICE_MAX_QS_PER_VF;
+ vf->vf_id, ICE_MAX_BASE_QS_PER_VF);
+ vfres->num_queue_pairs = ICE_MAX_BASE_QS_PER_VF;
} else if (req_queues - cur_queues > tx_rx_queue_left) {
dev_warn(&pf->pdev->dev,
"VF %d requested %d more queues, but only %d left.\n",
vf->vf_id, req_queues - cur_queues, tx_rx_queue_left);
- vfres->num_queue_pairs = tx_rx_queue_left + cur_queues;
+ vfres->num_queue_pairs = min_t(int, max_allowed_vf_queues,
+ ICE_MAX_BASE_QS_PER_VF);
} else {
/* request is successful, then reset VF */
vf->num_req_qs = req_queues;
error_param:
/* send the response to the VF */
return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_REQUEST_QUEUES,
- aq_ret, (u8 *)vfres, sizeof(*vfres));
+ v_ret, (u8 *)vfres, sizeof(*vfres));
}
/**
VLAN_VID_MASK));
if (vlan_id || qos) {
- ret = ice_vsi_set_pvid(vsi, vlanprio);
+ ret = ice_vsi_manage_pvid(vsi, vlanprio, true);
if (ret)
goto error_set_pvid;
} else {
- ice_vsi_kill_pvid(vsi);
+ ice_vsi_manage_pvid(vsi, 0, false);
+ vsi->info.pvid = 0;
}
if (vlan_id) {
*/
static int ice_vc_process_vlan_msg(struct ice_vf *vf, u8 *msg, bool add_v)
{
+ enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
struct virtchnl_vlan_filter_list *vfl =
(struct virtchnl_vlan_filter_list *)msg;
- enum ice_status aq_ret = 0;
struct ice_pf *pf = vf->pf;
+ bool vlan_promisc = false;
struct ice_vsi *vsi;
+ struct ice_hw *hw;
+ int status = 0;
+ u8 promisc_m;
int i;
if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
if (!ice_vc_isvalid_vsi_id(vf, vfl->vsi_id)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
if (add_v && !ice_is_vf_trusted(vf) &&
vf->num_vlan >= ICE_MAX_VLAN_PER_VF) {
dev_info(&pf->pdev->dev,
- "VF is not trusted, switch the VF to trusted mode, in order to add more VLAN addresses\n");
- aq_ret = ICE_ERR_PARAM;
+ "VF-%d is not trusted, switch the VF to trusted mode, in order to add more VLAN addresses\n",
+ vf->vf_id);
+ /* There is no need to let VF know about being not trusted,
+ * so we can just return success message here
+ */
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
for (i = 0; i < vfl->num_elements; i++) {
if (vfl->vlan_id[i] > ICE_MAX_VLANID) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
dev_err(&pf->pdev->dev,
"invalid VF VLAN id %d\n", vfl->vlan_id[i]);
goto error_param;
}
}
- vsi = ice_find_vsi_from_id(vf->pf, vfl->vsi_id);
+ hw = &pf->hw;
+ vsi = pf->vsi[vf->lan_vsi_idx];
if (!vsi) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
if (vsi->info.pvid) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
dev_err(&pf->pdev->dev,
"%sable VLAN stripping failed for VSI %i\n",
add_v ? "en" : "dis", vsi->vsi_num);
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
+ if (test_bit(ICE_VF_STATE_UC_PROMISC, vf->vf_states) ||
+ test_bit(ICE_VF_STATE_MC_PROMISC, vf->vf_states))
+ vlan_promisc = true;
+
if (add_v) {
for (i = 0; i < vfl->num_elements; i++) {
u16 vid = vfl->vlan_id[i];
- if (!ice_vsi_add_vlan(vsi, vid)) {
- vf->num_vlan++;
+ if (ice_vsi_add_vlan(vsi, vid)) {
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+ goto error_param;
+ }
- /* Enable VLAN pruning when VLAN 0 is added */
- if (unlikely(!vid))
- if (ice_cfg_vlan_pruning(vsi, true))
- aq_ret = ICE_ERR_PARAM;
+ vf->num_vlan++;
+ /* Enable VLAN pruning when VLAN is added */
+ if (!vlan_promisc) {
+ status = ice_cfg_vlan_pruning(vsi, true, false);
+ if (status) {
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+ dev_err(&pf->pdev->dev,
+ "Enable VLAN pruning on VLAN ID: %d failed error-%d\n",
+ vid, status);
+ goto error_param;
+ }
} else {
- aq_ret = ICE_ERR_PARAM;
+ /* Enable Ucast/Mcast VLAN promiscuous mode */
+ promisc_m = ICE_PROMISC_VLAN_TX |
+ ICE_PROMISC_VLAN_RX;
+
+ status = ice_set_vsi_promisc(hw, vsi->idx,
+ promisc_m, vid);
+ if (status) {
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+ dev_err(&pf->pdev->dev,
+ "Enable Unicast/multicast promiscuous mode on VLAN ID:%d failed error-%d\n",
+ vid, status);
+ }
}
}
} else {
/* Make sure ice_vsi_kill_vlan is successful before
* updating VLAN information
*/
- if (!ice_vsi_kill_vlan(vsi, vid)) {
- vf->num_vlan--;
+ if (ice_vsi_kill_vlan(vsi, vid)) {
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+ goto error_param;
+ }
+
+ vf->num_vlan--;
+ /* Disable VLAN pruning when removing VLAN */
+ ice_cfg_vlan_pruning(vsi, false, false);
- /* Disable VLAN pruning when removing VLAN 0 */
- if (unlikely(!vid))
- ice_cfg_vlan_pruning(vsi, false);
+ /* Disable Unicast/Multicast VLAN promiscuous mode */
+ if (vlan_promisc) {
+ promisc_m = ICE_PROMISC_VLAN_TX |
+ ICE_PROMISC_VLAN_RX;
+
+ ice_clear_vsi_promisc(hw, vsi->idx,
+ promisc_m, vid);
}
}
}
error_param:
/* send the response to the VF */
if (add_v)
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ADD_VLAN, aq_ret,
+ return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ADD_VLAN, v_ret,
NULL, 0);
else
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DEL_VLAN, aq_ret,
+ return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DEL_VLAN, v_ret,
NULL, 0);
}
*/
static int ice_vc_ena_vlan_stripping(struct ice_vf *vf)
{
- enum ice_status aq_ret = 0;
+ enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
struct ice_pf *pf = vf->pf;
struct ice_vsi *vsi;
if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
vsi = pf->vsi[vf->lan_vsi_idx];
if (ice_vsi_manage_vlan_stripping(vsi, true))
- aq_ret = ICE_ERR_AQ_ERROR;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
error_param:
return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_STRIPPING,
- aq_ret, NULL, 0);
+ v_ret, NULL, 0);
}
/**
*/
static int ice_vc_dis_vlan_stripping(struct ice_vf *vf)
{
- enum ice_status aq_ret = 0;
+ enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
struct ice_pf *pf = vf->pf;
struct ice_vsi *vsi;
if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
- aq_ret = ICE_ERR_PARAM;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
}
vsi = pf->vsi[vf->lan_vsi_idx];
+ if (!vsi) {
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+ goto error_param;
+ }
+
if (ice_vsi_manage_vlan_stripping(vsi, false))
- aq_ret = ICE_ERR_AQ_ERROR;
+ v_ret = VIRTCHNL_STATUS_ERR_PARAM;
error_param:
return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_STRIPPING,
- aq_ret, NULL, 0);
+ v_ret, NULL, 0);
}
/**
/* Perform basic checks on the msg */
err = virtchnl_vc_validate_vf_msg(&vf->vf_ver, v_opcode, msg, msglen);
if (err) {
- if (err == VIRTCHNL_ERR_PARAM)
+ if (err == VIRTCHNL_STATUS_ERR_PARAM)
err = -EPERM;
else
err = -EINVAL;
error_handler:
if (err) {
- ice_vc_send_msg_to_vf(vf, v_opcode, ICE_ERR_PARAM, NULL, 0);
+ ice_vc_send_msg_to_vf(vf, v_opcode, VIRTCHNL_STATUS_ERR_PARAM,
+ NULL, 0);
dev_err(&pf->pdev->dev, "Invalid message from VF %d, opcode %d, len %d, error %d\n",
vf_id, v_opcode, msglen, err);
return;
default:
dev_err(&pf->pdev->dev, "Unsupported opcode %d from VF %d\n",
v_opcode, vf_id);
- err = ice_vc_send_msg_to_vf(vf, v_opcode, ICE_ERR_NOT_IMPL,
+ err = ice_vc_send_msg_to_vf(vf, v_opcode,
+ VIRTCHNL_STATUS_ERR_NOT_SUPPORTED,
NULL, 0);
break;
}
* as it is busy with pending work.
*/
dev_info(&pf->pdev->dev,
- "PF failed to honor VF %d, opcode %d\n, error %d\n",
+ "PF failed to honor VF %d, opcode %d, error %d\n",
vf_id, v_opcode, err);
}
}
*
* return VF configuration
*/
-int ice_get_vf_cfg(struct net_device *netdev, int vf_id,
- struct ifla_vf_info *ivi)
+int
+ice_get_vf_cfg(struct net_device *netdev, int vf_id, struct ifla_vf_info *ivi)
{
struct ice_netdev_priv *np = netdev_priv(netdev);
struct ice_vsi *vsi = np->vsi;
ether_addr_copy(vf->dflt_lan_addr.addr, mac);
vf->pf_set_mac = true;
netdev_info(netdev,
- "mac on VF %d set to %pM\n. VF driver will be reinitialized\n",
+ "MAC on VF %d set to %pM. VF driver will be reinitialized\n",
vf_id, mac);
ice_vc_dis_vf(vf);
ice_set_pfe_link(vf, &pfe, ls->link_speed, vf->link_up);
/* Notify the VF of its new link state */
- ice_aq_send_msg_to_vf(hw, vf->vf_id, VIRTCHNL_OP_EVENT, 0, (u8 *)&pfe,
+ ice_aq_send_msg_to_vf(hw, vf->vf_id, VIRTCHNL_OP_EVENT,
+ VIRTCHNL_STATUS_SUCCESS, (u8 *)&pfe,
sizeof(pfe), NULL);
return 0;
u8 spoofchk;
u16 num_mac;
u16 num_vlan;
+ u16 num_vf_qs; /* num of queue configured per VF */
u8 num_req_qs; /* num of queue pairs requested by VF */
};
void ice_process_vflr_event(struct ice_pf *pf);
int ice_sriov_configure(struct pci_dev *pdev, int num_vfs);
int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac);
-int ice_get_vf_cfg(struct net_device *netdev, int vf_id,
- struct ifla_vf_info *ivi);
+int
+ice_get_vf_cfg(struct net_device *netdev, int vf_id, struct ifla_vf_info *ivi);
void ice_free_vfs(struct ice_pf *pf);
void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event);
void ice_vc_notify_reset(struct ice_pf *pf);
bool ice_reset_all_vfs(struct ice_pf *pf, bool is_vflr);
-int ice_set_vf_port_vlan(struct net_device *netdev, int vf_id,
- u16 vlan_id, u8 qos, __be16 vlan_proto);
-
-int ice_set_vf_bw(struct net_device *netdev, int vf_id, int min_tx_rate,
- int max_tx_rate);
+int
+ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
+ __be16 vlan_proto);
int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted);
return -EOPNOTSUPP;
}
-static inline int
-ice_set_vf_bw(struct net_device __always_unused *netdev,
- int __always_unused vf_id, int __always_unused min_tx_rate,
- int __always_unused max_tx_rate)
-{
- return -EOPNOTSUPP;
-}
#endif /* CONFIG_PCI_IOV */
#endif /* _ICE_VIRTCHNL_PF_H_ */
} else if (!edata->eee_enabled) {
dev_err(&adapter->pdev->dev,
"Setting EEE options are not supported with EEE disabled\n");
- return -EINVAL;
- }
+ return -EINVAL;
+ }
adapter->eee_advert = ethtool_adv_to_mmd_eee_adv_t(edata->advertised);
if (hw->dev_spec._82575.eee_disable != !edata->eee_enabled) {
else
igb_reset(adapter);
- return 0;
+ return 1;
}
static int igb_ndo_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
break;
}
}
+
+ dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP);
+
pm_runtime_put_noidle(&pdev->dev);
return 0;
/* Make sure there is space in the ring for the next send. */
igb_maybe_stop_tx(tx_ring, DESC_NEEDED);
- if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
+ if (netif_xmit_stopped(txring_txq(tx_ring)) || !netdev_xmit_more()) {
writel(i, tx_ring->tail);
/* we need this if more than one processor can write to our tail
void igc_set_flag_queue_pairs(struct igc_adapter *adapter,
const u32 max_rss_queues);
int igc_reinit_queues(struct igc_adapter *adapter);
+void igc_write_rss_indir_tbl(struct igc_adapter *adapter);
bool igc_has_link(struct igc_adapter *adapter);
void igc_reset(struct igc_adapter *adapter);
int igc_set_spd_dplx(struct igc_adapter *adapter, u32 spd, u8 dplx);
+int igc_add_mac_steering_filter(struct igc_adapter *adapter,
+ const u8 *addr, u8 queue, u8 flags);
+int igc_del_mac_steering_filter(struct igc_adapter *adapter,
+ const u8 *addr, u8 queue, u8 flags);
+void igc_update_stats(struct igc_adapter *adapter);
extern char igc_driver_name[];
extern char igc_driver_version[];
#define IGC_FLAG_VLAN_PROMISC BIT(15)
#define IGC_FLAG_RX_LEGACY BIT(16)
+#define IGC_FLAG_RSS_FIELD_IPV4_UDP BIT(6)
+#define IGC_FLAG_RSS_FIELD_IPV6_UDP BIT(7)
+
+#define IGC_MRQC_ENABLE_RSS_MQ 0x00000002
+#define IGC_MRQC_RSS_FIELD_IPV4_UDP 0x00400000
+#define IGC_MRQC_RSS_FIELD_IPV6_UDP 0x00800000
+
#define IGC_START_ITR 648 /* ~6000 ints/sec */
#define IGC_4K_ITR 980
#define IGC_20K_ITR 196
struct igc_ring ring[0] ____cacheline_internodealigned_in_smp;
};
+#define MAX_ETYPE_FILTER (4 - 1)
+
+enum igc_filter_match_flags {
+ IGC_FILTER_FLAG_ETHER_TYPE = 0x1,
+ IGC_FILTER_FLAG_VLAN_TCI = 0x2,
+ IGC_FILTER_FLAG_SRC_MAC_ADDR = 0x4,
+ IGC_FILTER_FLAG_DST_MAC_ADDR = 0x8,
+};
+
+/* RX network flow classification data structure */
+struct igc_nfc_input {
+ /* Byte layout in order, all values with MSB first:
+ * match_flags - 1 byte
+ * etype - 2 bytes
+ * vlan_tci - 2 bytes
+ */
+ u8 match_flags;
+ __be16 etype;
+ __be16 vlan_tci;
+ u8 src_addr[ETH_ALEN];
+ u8 dst_addr[ETH_ALEN];
+};
+
+struct igc_nfc_filter {
+ struct hlist_node nfc_node;
+ struct igc_nfc_input filter;
+ unsigned long cookie;
+ u16 etype_reg_index;
+ u16 sw_idx;
+ u16 action;
+};
+
struct igc_mac_addr {
u8 addr[ETH_ALEN];
u8 queue;
u8 state; /* bitmask */
};
-#define IGC_MAC_STATE_DEFAULT 0x1
-#define IGC_MAC_STATE_MODIFIED 0x2
-#define IGC_MAC_STATE_IN_USE 0x4
+#define IGC_MAC_STATE_DEFAULT 0x1
+#define IGC_MAC_STATE_IN_USE 0x2
+#define IGC_MAC_STATE_SRC_ADDR 0x4
+#define IGC_MAC_STATE_QUEUE_STEERING 0x8
+
+#define IGC_MAX_RXNFC_FILTERS 16
/* Board specific private data structure */
struct igc_adapter {
u16 tx_ring_count;
u16 rx_ring_count;
+ u32 tx_hwtstamp_timeouts;
+ u32 tx_hwtstamp_skipped;
+ u32 rx_hwtstamp_cleared;
u32 *shadow_vfta;
u32 rss_queues;
+ u32 rss_indir_tbl_init;
+
+ /* RX network flow classification support */
+ struct hlist_head nfc_filter_list;
+ struct hlist_head cls_flower_list;
+ unsigned int nfc_filter_count;
/* lock for RX network flow classification filter */
spinlock_t nfc_lock;
+ bool etype_bitmap[MAX_ETYPE_FILTER];
struct igc_mac_addr *mac_table;
/* forward declaration */
void igc_reinit_locked(struct igc_adapter *);
+int igc_add_filter(struct igc_adapter *adapter,
+ struct igc_nfc_filter *input);
+int igc_erase_filter(struct igc_adapter *adapter,
+ struct igc_nfc_filter *input);
#define igc_rx_pg_size(_ring) (PAGE_SIZE << igc_rx_pg_order(_ring))
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2018 Intel Corporation */
-#ifndef _IGC_BASE_H
-#define _IGC_BASE_H
+#ifndef _IGC_BASE_H_
+#define _IGC_BASE_H_
/* forward declaration */
void igc_rx_fifo_flush_base(struct igc_hw *hw);
IGC_RXDEXT_STATERR_CXE | \
IGC_RXDEXT_STATERR_RXE)
+#define IGC_MRQC_RSS_FIELD_IPV4_TCP 0x00010000
+#define IGC_MRQC_RSS_FIELD_IPV4 0x00020000
+#define IGC_MRQC_RSS_FIELD_IPV6_TCP_EX 0x00040000
+#define IGC_MRQC_RSS_FIELD_IPV6 0x00100000
+#define IGC_MRQC_RSS_FIELD_IPV6_TCP 0x00200000
+
/* Header split receive */
#define IGC_RFCTL_IPV6_EX_DIS 0x00010000
#define IGC_RFCTL_LEF 0x00040000
#define I225_RXPBSIZE_DEFAULT 0x000000A2 /* RXPBSIZE default */
#define I225_TXPBSIZE_DEFAULT 0x04000014 /* TXPBSIZE default */
+/* Receive Checksum Control */
+#define IGC_RXCSUM_CRCOFL 0x00000800 /* CRC32 offload enable */
+#define IGC_RXCSUM_PCSD 0x00002000 /* packet checksum disabled */
+
/* GPY211 - I225 defines */
#define GPY_MMD_MASK 0xFFFF0000
#define GPY_MMD_SHIFT 16
#define IGC_N0_QUEUE -1
+#define IGC_MAX_MAC_HDR_LEN 127
+#define IGC_MAX_NETWORK_HDR_LEN 511
+
+#define IGC_VLAPQF_QUEUE_SEL(_n, q_idx) ((q_idx) << ((_n) * 4))
+#define IGC_VLAPQF_P_VALID(_n) (0x1 << (3 + (_n) * 4))
+#define IGC_VLAPQF_QUEUE_MASK 0x03
+
#endif /* _IGC_DEFINES_H_ */
/* Copyright (c) 2018 Intel Corporation */
/* ethtool support for igc */
+#include <linux/if_vlan.h>
#include <linux/pm_runtime.h>
#include "igc.h"
+/* forward declaration */
+struct igc_stats {
+ char stat_string[ETH_GSTRING_LEN];
+ int sizeof_stat;
+ int stat_offset;
+};
+
+#define IGC_STAT(_name, _stat) { \
+ .stat_string = _name, \
+ .sizeof_stat = FIELD_SIZEOF(struct igc_adapter, _stat), \
+ .stat_offset = offsetof(struct igc_adapter, _stat) \
+}
+
+static const struct igc_stats igc_gstrings_stats[] = {
+ IGC_STAT("rx_packets", stats.gprc),
+ IGC_STAT("tx_packets", stats.gptc),
+ IGC_STAT("rx_bytes", stats.gorc),
+ IGC_STAT("tx_bytes", stats.gotc),
+ IGC_STAT("rx_broadcast", stats.bprc),
+ IGC_STAT("tx_broadcast", stats.bptc),
+ IGC_STAT("rx_multicast", stats.mprc),
+ IGC_STAT("tx_multicast", stats.mptc),
+ IGC_STAT("multicast", stats.mprc),
+ IGC_STAT("collisions", stats.colc),
+ IGC_STAT("rx_crc_errors", stats.crcerrs),
+ IGC_STAT("rx_no_buffer_count", stats.rnbc),
+ IGC_STAT("rx_missed_errors", stats.mpc),
+ IGC_STAT("tx_aborted_errors", stats.ecol),
+ IGC_STAT("tx_carrier_errors", stats.tncrs),
+ IGC_STAT("tx_window_errors", stats.latecol),
+ IGC_STAT("tx_abort_late_coll", stats.latecol),
+ IGC_STAT("tx_deferred_ok", stats.dc),
+ IGC_STAT("tx_single_coll_ok", stats.scc),
+ IGC_STAT("tx_multi_coll_ok", stats.mcc),
+ IGC_STAT("tx_timeout_count", tx_timeout_count),
+ IGC_STAT("rx_long_length_errors", stats.roc),
+ IGC_STAT("rx_short_length_errors", stats.ruc),
+ IGC_STAT("rx_align_errors", stats.algnerrc),
+ IGC_STAT("tx_tcp_seg_good", stats.tsctc),
+ IGC_STAT("tx_tcp_seg_failed", stats.tsctfc),
+ IGC_STAT("rx_flow_control_xon", stats.xonrxc),
+ IGC_STAT("rx_flow_control_xoff", stats.xoffrxc),
+ IGC_STAT("tx_flow_control_xon", stats.xontxc),
+ IGC_STAT("tx_flow_control_xoff", stats.xofftxc),
+ IGC_STAT("rx_long_byte_count", stats.gorc),
+ IGC_STAT("tx_dma_out_of_sync", stats.doosync),
+ IGC_STAT("tx_smbus", stats.mgptc),
+ IGC_STAT("rx_smbus", stats.mgprc),
+ IGC_STAT("dropped_smbus", stats.mgpdc),
+ IGC_STAT("os2bmc_rx_by_bmc", stats.o2bgptc),
+ IGC_STAT("os2bmc_tx_by_bmc", stats.b2ospc),
+ IGC_STAT("os2bmc_tx_by_host", stats.o2bspc),
+ IGC_STAT("os2bmc_rx_by_host", stats.b2ogprc),
+ IGC_STAT("tx_hwtstamp_timeouts", tx_hwtstamp_timeouts),
+ IGC_STAT("tx_hwtstamp_skipped", tx_hwtstamp_skipped),
+ IGC_STAT("rx_hwtstamp_cleared", rx_hwtstamp_cleared),
+};
+
+#define IGC_NETDEV_STAT(_net_stat) { \
+ .stat_string = __stringify(_net_stat), \
+ .sizeof_stat = FIELD_SIZEOF(struct rtnl_link_stats64, _net_stat), \
+ .stat_offset = offsetof(struct rtnl_link_stats64, _net_stat) \
+}
+
+static const struct igc_stats igc_gstrings_net_stats[] = {
+ IGC_NETDEV_STAT(rx_errors),
+ IGC_NETDEV_STAT(tx_errors),
+ IGC_NETDEV_STAT(tx_dropped),
+ IGC_NETDEV_STAT(rx_length_errors),
+ IGC_NETDEV_STAT(rx_over_errors),
+ IGC_NETDEV_STAT(rx_frame_errors),
+ IGC_NETDEV_STAT(rx_fifo_errors),
+ IGC_NETDEV_STAT(tx_fifo_errors),
+ IGC_NETDEV_STAT(tx_heartbeat_errors)
+};
+
+enum igc_diagnostics_results {
+ TEST_REG = 0,
+ TEST_EEP,
+ TEST_IRQ,
+ TEST_LOOP,
+ TEST_LINK
+};
+
+static const char igc_gstrings_test[][ETH_GSTRING_LEN] = {
+ [TEST_REG] = "Register test (offline)",
+ [TEST_EEP] = "Eeprom test (offline)",
+ [TEST_IRQ] = "Interrupt test (offline)",
+ [TEST_LOOP] = "Loopback test (offline)",
+ [TEST_LINK] = "Link test (on/offline)"
+};
+
+#define IGC_TEST_LEN (sizeof(igc_gstrings_test) / ETH_GSTRING_LEN)
+
+#define IGC_GLOBAL_STATS_LEN \
+ (sizeof(igc_gstrings_stats) / sizeof(struct igc_stats))
+#define IGC_NETDEV_STATS_LEN \
+ (sizeof(igc_gstrings_net_stats) / sizeof(struct igc_stats))
+#define IGC_RX_QUEUE_STATS_LEN \
+ (sizeof(struct igc_rx_queue_stats) / sizeof(u64))
+#define IGC_TX_QUEUE_STATS_LEN 3 /* packets, bytes, restart_queue */
+#define IGC_QUEUE_STATS_LEN \
+ ((((struct igc_adapter *)netdev_priv(netdev))->num_rx_queues * \
+ IGC_RX_QUEUE_STATS_LEN) + \
+ (((struct igc_adapter *)netdev_priv(netdev))->num_tx_queues * \
+ IGC_TX_QUEUE_STATS_LEN))
+#define IGC_STATS_LEN \
+ (IGC_GLOBAL_STATS_LEN + IGC_NETDEV_STATS_LEN + IGC_QUEUE_STATS_LEN)
+
static const char igc_priv_flags_strings[][ETH_GSTRING_LEN] = {
#define IGC_PRIV_FLAGS_LEGACY_RX BIT(0)
"legacy-rx",
return retval;
}
+static void igc_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
+{
+ struct igc_adapter *adapter = netdev_priv(netdev);
+ u8 *p = data;
+ int i;
+
+ switch (stringset) {
+ case ETH_SS_TEST:
+ memcpy(data, *igc_gstrings_test,
+ IGC_TEST_LEN * ETH_GSTRING_LEN);
+ break;
+ case ETH_SS_STATS:
+ for (i = 0; i < IGC_GLOBAL_STATS_LEN; i++) {
+ memcpy(p, igc_gstrings_stats[i].stat_string,
+ ETH_GSTRING_LEN);
+ p += ETH_GSTRING_LEN;
+ }
+ for (i = 0; i < IGC_NETDEV_STATS_LEN; i++) {
+ memcpy(p, igc_gstrings_net_stats[i].stat_string,
+ ETH_GSTRING_LEN);
+ p += ETH_GSTRING_LEN;
+ }
+ for (i = 0; i < adapter->num_tx_queues; i++) {
+ sprintf(p, "tx_queue_%u_packets", i);
+ p += ETH_GSTRING_LEN;
+ sprintf(p, "tx_queue_%u_bytes", i);
+ p += ETH_GSTRING_LEN;
+ sprintf(p, "tx_queue_%u_restart", i);
+ p += ETH_GSTRING_LEN;
+ }
+ for (i = 0; i < adapter->num_rx_queues; i++) {
+ sprintf(p, "rx_queue_%u_packets", i);
+ p += ETH_GSTRING_LEN;
+ sprintf(p, "rx_queue_%u_bytes", i);
+ p += ETH_GSTRING_LEN;
+ sprintf(p, "rx_queue_%u_drops", i);
+ p += ETH_GSTRING_LEN;
+ sprintf(p, "rx_queue_%u_csum_err", i);
+ p += ETH_GSTRING_LEN;
+ sprintf(p, "rx_queue_%u_alloc_failed", i);
+ p += ETH_GSTRING_LEN;
+ }
+ /* BUG_ON(p - data != IGC_STATS_LEN * ETH_GSTRING_LEN); */
+ break;
+ case ETH_SS_PRIV_FLAGS:
+ memcpy(data, igc_priv_flags_strings,
+ IGC_PRIV_FLAGS_STR_LEN * ETH_GSTRING_LEN);
+ break;
+ }
+}
+
+static int igc_get_sset_count(struct net_device *netdev, int sset)
+{
+ switch (sset) {
+ case ETH_SS_STATS:
+ return IGC_STATS_LEN;
+ case ETH_SS_TEST:
+ return IGC_TEST_LEN;
+ case ETH_SS_PRIV_FLAGS:
+ return IGC_PRIV_FLAGS_STR_LEN;
+ default:
+ return -ENOTSUPP;
+ }
+}
+
+static void igc_get_ethtool_stats(struct net_device *netdev,
+ struct ethtool_stats *stats, u64 *data)
+{
+ struct igc_adapter *adapter = netdev_priv(netdev);
+ struct rtnl_link_stats64 *net_stats = &adapter->stats64;
+ unsigned int start;
+ struct igc_ring *ring;
+ int i, j;
+ char *p;
+
+ spin_lock(&adapter->stats64_lock);
+ igc_update_stats(adapter);
+
+ for (i = 0; i < IGC_GLOBAL_STATS_LEN; i++) {
+ p = (char *)adapter + igc_gstrings_stats[i].stat_offset;
+ data[i] = (igc_gstrings_stats[i].sizeof_stat ==
+ sizeof(u64)) ? *(u64 *)p : *(u32 *)p;
+ }
+ for (j = 0; j < IGC_NETDEV_STATS_LEN; j++, i++) {
+ p = (char *)net_stats + igc_gstrings_net_stats[j].stat_offset;
+ data[i] = (igc_gstrings_net_stats[j].sizeof_stat ==
+ sizeof(u64)) ? *(u64 *)p : *(u32 *)p;
+ }
+ for (j = 0; j < adapter->num_tx_queues; j++) {
+ u64 restart2;
+
+ ring = adapter->tx_ring[j];
+ do {
+ start = u64_stats_fetch_begin_irq(&ring->tx_syncp);
+ data[i] = ring->tx_stats.packets;
+ data[i + 1] = ring->tx_stats.bytes;
+ data[i + 2] = ring->tx_stats.restart_queue;
+ } while (u64_stats_fetch_retry_irq(&ring->tx_syncp, start));
+ do {
+ start = u64_stats_fetch_begin_irq(&ring->tx_syncp2);
+ restart2 = ring->tx_stats.restart_queue2;
+ } while (u64_stats_fetch_retry_irq(&ring->tx_syncp2, start));
+ data[i + 2] += restart2;
+
+ i += IGC_TX_QUEUE_STATS_LEN;
+ }
+ for (j = 0; j < adapter->num_rx_queues; j++) {
+ ring = adapter->rx_ring[j];
+ do {
+ start = u64_stats_fetch_begin_irq(&ring->rx_syncp);
+ data[i] = ring->rx_stats.packets;
+ data[i + 1] = ring->rx_stats.bytes;
+ data[i + 2] = ring->rx_stats.drops;
+ data[i + 3] = ring->rx_stats.csum_err;
+ data[i + 4] = ring->rx_stats.alloc_failed;
+ } while (u64_stats_fetch_retry_irq(&ring->rx_syncp, start));
+ i += IGC_RX_QUEUE_STATS_LEN;
+ }
+ spin_unlock(&adapter->stats64_lock);
+}
+
static int igc_get_coalesce(struct net_device *netdev,
struct ethtool_coalesce *ec)
{
return 0;
}
+#define ETHER_TYPE_FULL_MASK ((__force __be16)~0)
+static int igc_get_ethtool_nfc_entry(struct igc_adapter *adapter,
+ struct ethtool_rxnfc *cmd)
+{
+ struct ethtool_rx_flow_spec *fsp = &cmd->fs;
+ struct igc_nfc_filter *rule = NULL;
+
+ /* report total rule count */
+ cmd->data = IGC_MAX_RXNFC_FILTERS;
+
+ hlist_for_each_entry(rule, &adapter->nfc_filter_list, nfc_node) {
+ if (fsp->location <= rule->sw_idx)
+ break;
+ }
+
+ if (!rule || fsp->location != rule->sw_idx)
+ return -EINVAL;
+
+ if (rule->filter.match_flags) {
+ fsp->flow_type = ETHER_FLOW;
+ fsp->ring_cookie = rule->action;
+ if (rule->filter.match_flags & IGC_FILTER_FLAG_ETHER_TYPE) {
+ fsp->h_u.ether_spec.h_proto = rule->filter.etype;
+ fsp->m_u.ether_spec.h_proto = ETHER_TYPE_FULL_MASK;
+ }
+ if (rule->filter.match_flags & IGC_FILTER_FLAG_VLAN_TCI) {
+ fsp->flow_type |= FLOW_EXT;
+ fsp->h_ext.vlan_tci = rule->filter.vlan_tci;
+ fsp->m_ext.vlan_tci = htons(VLAN_PRIO_MASK);
+ }
+ if (rule->filter.match_flags & IGC_FILTER_FLAG_DST_MAC_ADDR) {
+ ether_addr_copy(fsp->h_u.ether_spec.h_dest,
+ rule->filter.dst_addr);
+ /* As we only support matching by the full
+ * mask, return the mask to userspace
+ */
+ eth_broadcast_addr(fsp->m_u.ether_spec.h_dest);
+ }
+ if (rule->filter.match_flags & IGC_FILTER_FLAG_SRC_MAC_ADDR) {
+ ether_addr_copy(fsp->h_u.ether_spec.h_source,
+ rule->filter.src_addr);
+ /* As we only support matching by the full
+ * mask, return the mask to userspace
+ */
+ eth_broadcast_addr(fsp->m_u.ether_spec.h_source);
+ }
+
+ return 0;
+ }
+ return -EINVAL;
+}
+
+static int igc_get_ethtool_nfc_all(struct igc_adapter *adapter,
+ struct ethtool_rxnfc *cmd,
+ u32 *rule_locs)
+{
+ struct igc_nfc_filter *rule;
+ int cnt = 0;
+
+ /* report total rule count */
+ cmd->data = IGC_MAX_RXNFC_FILTERS;
+
+ hlist_for_each_entry(rule, &adapter->nfc_filter_list, nfc_node) {
+ if (cnt == cmd->rule_cnt)
+ return -EMSGSIZE;
+ rule_locs[cnt] = rule->sw_idx;
+ cnt++;
+ }
+
+ cmd->rule_cnt = cnt;
+
+ return 0;
+}
+
+static int igc_get_rss_hash_opts(struct igc_adapter *adapter,
+ struct ethtool_rxnfc *cmd)
+{
+ cmd->data = 0;
+
+ /* Report default options for RSS on igc */
+ switch (cmd->flow_type) {
+ case TCP_V4_FLOW:
+ cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+ /* Fall through */
+ case UDP_V4_FLOW:
+ if (adapter->flags & IGC_FLAG_RSS_FIELD_IPV4_UDP)
+ cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+ /* Fall through */
+ case SCTP_V4_FLOW:
+ /* Fall through */
+ case AH_ESP_V4_FLOW:
+ /* Fall through */
+ case AH_V4_FLOW:
+ /* Fall through */
+ case ESP_V4_FLOW:
+ /* Fall through */
+ case IPV4_FLOW:
+ cmd->data |= RXH_IP_SRC | RXH_IP_DST;
+ break;
+ case TCP_V6_FLOW:
+ cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+ /* Fall through */
+ case UDP_V6_FLOW:
+ if (adapter->flags & IGC_FLAG_RSS_FIELD_IPV6_UDP)
+ cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+ /* Fall through */
+ case SCTP_V6_FLOW:
+ /* Fall through */
+ case AH_ESP_V6_FLOW:
+ /* Fall through */
+ case AH_V6_FLOW:
+ /* Fall through */
+ case ESP_V6_FLOW:
+ /* Fall through */
+ case IPV6_FLOW:
+ cmd->data |= RXH_IP_SRC | RXH_IP_DST;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int igc_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd,
+ u32 *rule_locs)
+{
+ struct igc_adapter *adapter = netdev_priv(dev);
+ int ret = -EOPNOTSUPP;
+
+ switch (cmd->cmd) {
+ case ETHTOOL_GRXRINGS:
+ cmd->data = adapter->num_rx_queues;
+ ret = 0;
+ break;
+ case ETHTOOL_GRXCLSRLCNT:
+ cmd->rule_cnt = adapter->nfc_filter_count;
+ ret = 0;
+ break;
+ case ETHTOOL_GRXCLSRULE:
+ ret = igc_get_ethtool_nfc_entry(adapter, cmd);
+ break;
+ case ETHTOOL_GRXCLSRLALL:
+ ret = igc_get_ethtool_nfc_all(adapter, cmd, rule_locs);
+ break;
+ case ETHTOOL_GRXFH:
+ ret = igc_get_rss_hash_opts(adapter, cmd);
+ break;
+ default:
+ break;
+ }
+
+ return ret;
+}
+
+#define UDP_RSS_FLAGS (IGC_FLAG_RSS_FIELD_IPV4_UDP | \
+ IGC_FLAG_RSS_FIELD_IPV6_UDP)
+static int igc_set_rss_hash_opt(struct igc_adapter *adapter,
+ struct ethtool_rxnfc *nfc)
+{
+ u32 flags = adapter->flags;
+
+ /* RSS does not support anything other than hashing
+ * to queues on src and dst IPs and ports
+ */
+ if (nfc->data & ~(RXH_IP_SRC | RXH_IP_DST |
+ RXH_L4_B_0_1 | RXH_L4_B_2_3))
+ return -EINVAL;
+
+ switch (nfc->flow_type) {
+ case TCP_V4_FLOW:
+ case TCP_V6_FLOW:
+ if (!(nfc->data & RXH_IP_SRC) ||
+ !(nfc->data & RXH_IP_DST) ||
+ !(nfc->data & RXH_L4_B_0_1) ||
+ !(nfc->data & RXH_L4_B_2_3))
+ return -EINVAL;
+ break;
+ case UDP_V4_FLOW:
+ if (!(nfc->data & RXH_IP_SRC) ||
+ !(nfc->data & RXH_IP_DST))
+ return -EINVAL;
+ switch (nfc->data & (RXH_L4_B_0_1 | RXH_L4_B_2_3)) {
+ case 0:
+ flags &= ~IGC_FLAG_RSS_FIELD_IPV4_UDP;
+ break;
+ case (RXH_L4_B_0_1 | RXH_L4_B_2_3):
+ flags |= IGC_FLAG_RSS_FIELD_IPV4_UDP;
+ break;
+ default:
+ return -EINVAL;
+ }
+ break;
+ case UDP_V6_FLOW:
+ if (!(nfc->data & RXH_IP_SRC) ||
+ !(nfc->data & RXH_IP_DST))
+ return -EINVAL;
+ switch (nfc->data & (RXH_L4_B_0_1 | RXH_L4_B_2_3)) {
+ case 0:
+ flags &= ~IGC_FLAG_RSS_FIELD_IPV6_UDP;
+ break;
+ case (RXH_L4_B_0_1 | RXH_L4_B_2_3):
+ flags |= IGC_FLAG_RSS_FIELD_IPV6_UDP;
+ break;
+ default:
+ return -EINVAL;
+ }
+ break;
+ case AH_ESP_V4_FLOW:
+ case AH_V4_FLOW:
+ case ESP_V4_FLOW:
+ case SCTP_V4_FLOW:
+ case AH_ESP_V6_FLOW:
+ case AH_V6_FLOW:
+ case ESP_V6_FLOW:
+ case SCTP_V6_FLOW:
+ if (!(nfc->data & RXH_IP_SRC) ||
+ !(nfc->data & RXH_IP_DST) ||
+ (nfc->data & RXH_L4_B_0_1) ||
+ (nfc->data & RXH_L4_B_2_3))
+ return -EINVAL;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ /* if we changed something we need to update flags */
+ if (flags != adapter->flags) {
+ struct igc_hw *hw = &adapter->hw;
+ u32 mrqc = rd32(IGC_MRQC);
+
+ if ((flags & UDP_RSS_FLAGS) &&
+ !(adapter->flags & UDP_RSS_FLAGS))
+ dev_err(&adapter->pdev->dev,
+ "enabling UDP RSS: fragmented packets may arrive out of order to the stack above\n");
+
+ adapter->flags = flags;
+
+ /* Perform hash on these packet types */
+ mrqc |= IGC_MRQC_RSS_FIELD_IPV4 |
+ IGC_MRQC_RSS_FIELD_IPV4_TCP |
+ IGC_MRQC_RSS_FIELD_IPV6 |
+ IGC_MRQC_RSS_FIELD_IPV6_TCP;
+
+ mrqc &= ~(IGC_MRQC_RSS_FIELD_IPV4_UDP |
+ IGC_MRQC_RSS_FIELD_IPV6_UDP);
+
+ if (flags & IGC_FLAG_RSS_FIELD_IPV4_UDP)
+ mrqc |= IGC_MRQC_RSS_FIELD_IPV4_UDP;
+
+ if (flags & IGC_FLAG_RSS_FIELD_IPV6_UDP)
+ mrqc |= IGC_MRQC_RSS_FIELD_IPV6_UDP;
+
+ wr32(IGC_MRQC, mrqc);
+ }
+
+ return 0;
+}
+
+static int igc_rxnfc_write_etype_filter(struct igc_adapter *adapter,
+ struct igc_nfc_filter *input)
+{
+ struct igc_hw *hw = &adapter->hw;
+ u8 i;
+ u32 etqf;
+ u16 etype;
+
+ /* find an empty etype filter register */
+ for (i = 0; i < MAX_ETYPE_FILTER; ++i) {
+ if (!adapter->etype_bitmap[i])
+ break;
+ }
+ if (i == MAX_ETYPE_FILTER) {
+ dev_err(&adapter->pdev->dev, "ethtool -N: etype filters are all used.\n");
+ return -EINVAL;
+ }
+
+ adapter->etype_bitmap[i] = true;
+
+ etqf = rd32(IGC_ETQF(i));
+ etype = ntohs(input->filter.etype & ETHER_TYPE_FULL_MASK);
+
+ etqf |= IGC_ETQF_FILTER_ENABLE;
+ etqf &= ~IGC_ETQF_ETYPE_MASK;
+ etqf |= (etype & IGC_ETQF_ETYPE_MASK);
+
+ etqf &= ~IGC_ETQF_QUEUE_MASK;
+ etqf |= ((input->action << IGC_ETQF_QUEUE_SHIFT)
+ & IGC_ETQF_QUEUE_MASK);
+ etqf |= IGC_ETQF_QUEUE_ENABLE;
+
+ wr32(IGC_ETQF(i), etqf);
+
+ input->etype_reg_index = i;
+
+ return 0;
+}
+
+static int igc_rxnfc_write_vlan_prio_filter(struct igc_adapter *adapter,
+ struct igc_nfc_filter *input)
+{
+ struct igc_hw *hw = &adapter->hw;
+ u8 vlan_priority;
+ u16 queue_index;
+ u32 vlapqf;
+
+ vlapqf = rd32(IGC_VLAPQF);
+ vlan_priority = (ntohs(input->filter.vlan_tci) & VLAN_PRIO_MASK)
+ >> VLAN_PRIO_SHIFT;
+ queue_index = (vlapqf >> (vlan_priority * 4)) & IGC_VLAPQF_QUEUE_MASK;
+
+ /* check whether this vlan prio is already set */
+ if (vlapqf & IGC_VLAPQF_P_VALID(vlan_priority) &&
+ queue_index != input->action) {
+ dev_err(&adapter->pdev->dev, "ethtool rxnfc set vlan prio filter failed.\n");
+ return -EEXIST;
+ }
+
+ vlapqf |= IGC_VLAPQF_P_VALID(vlan_priority);
+ vlapqf |= IGC_VLAPQF_QUEUE_SEL(vlan_priority, input->action);
+
+ wr32(IGC_VLAPQF, vlapqf);
+
+ return 0;
+}
+
+int igc_add_filter(struct igc_adapter *adapter, struct igc_nfc_filter *input)
+{
+ struct igc_hw *hw = &adapter->hw;
+ int err = -EINVAL;
+
+ if (hw->mac.type == igc_i225 &&
+ !(input->filter.match_flags & ~IGC_FILTER_FLAG_SRC_MAC_ADDR)) {
+ dev_err(&adapter->pdev->dev,
+ "i225 doesn't support flow classification rules specifying only source addresses.\n");
+ return -EOPNOTSUPP;
+ }
+
+ if (input->filter.match_flags & IGC_FILTER_FLAG_ETHER_TYPE) {
+ err = igc_rxnfc_write_etype_filter(adapter, input);
+ if (err)
+ return err;
+ }
+
+ if (input->filter.match_flags & IGC_FILTER_FLAG_DST_MAC_ADDR) {
+ err = igc_add_mac_steering_filter(adapter,
+ input->filter.dst_addr,
+ input->action, 0);
+ err = min_t(int, err, 0);
+ if (err)
+ return err;
+ }
+
+ if (input->filter.match_flags & IGC_FILTER_FLAG_SRC_MAC_ADDR) {
+ err = igc_add_mac_steering_filter(adapter,
+ input->filter.src_addr,
+ input->action,
+ IGC_MAC_STATE_SRC_ADDR);
+ err = min_t(int, err, 0);
+ if (err)
+ return err;
+ }
+
+ if (input->filter.match_flags & IGC_FILTER_FLAG_VLAN_TCI)
+ err = igc_rxnfc_write_vlan_prio_filter(adapter, input);
+
+ return err;
+}
+
+static void igc_clear_etype_filter_regs(struct igc_adapter *adapter,
+ u16 reg_index)
+{
+ struct igc_hw *hw = &adapter->hw;
+ u32 etqf = rd32(IGC_ETQF(reg_index));
+
+ etqf &= ~IGC_ETQF_QUEUE_ENABLE;
+ etqf &= ~IGC_ETQF_QUEUE_MASK;
+ etqf &= ~IGC_ETQF_FILTER_ENABLE;
+
+ wr32(IGC_ETQF(reg_index), etqf);
+
+ adapter->etype_bitmap[reg_index] = false;
+}
+
+static void igc_clear_vlan_prio_filter(struct igc_adapter *adapter,
+ u16 vlan_tci)
+{
+ struct igc_hw *hw = &adapter->hw;
+ u8 vlan_priority;
+ u32 vlapqf;
+
+ vlan_priority = (vlan_tci & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT;
+
+ vlapqf = rd32(IGC_VLAPQF);
+ vlapqf &= ~IGC_VLAPQF_P_VALID(vlan_priority);
+ vlapqf &= ~IGC_VLAPQF_QUEUE_SEL(vlan_priority,
+ IGC_VLAPQF_QUEUE_MASK);
+
+ wr32(IGC_VLAPQF, vlapqf);
+}
+
+int igc_erase_filter(struct igc_adapter *adapter, struct igc_nfc_filter *input)
+{
+ if (input->filter.match_flags & IGC_FILTER_FLAG_ETHER_TYPE)
+ igc_clear_etype_filter_regs(adapter,
+ input->etype_reg_index);
+
+ if (input->filter.match_flags & IGC_FILTER_FLAG_VLAN_TCI)
+ igc_clear_vlan_prio_filter(adapter,
+ ntohs(input->filter.vlan_tci));
+
+ if (input->filter.match_flags & IGC_FILTER_FLAG_SRC_MAC_ADDR)
+ igc_del_mac_steering_filter(adapter, input->filter.src_addr,
+ input->action,
+ IGC_MAC_STATE_SRC_ADDR);
+
+ if (input->filter.match_flags & IGC_FILTER_FLAG_DST_MAC_ADDR)
+ igc_del_mac_steering_filter(adapter, input->filter.dst_addr,
+ input->action, 0);
+
+ return 0;
+}
+
+static int igc_update_ethtool_nfc_entry(struct igc_adapter *adapter,
+ struct igc_nfc_filter *input,
+ u16 sw_idx)
+{
+ struct igc_nfc_filter *rule, *parent;
+ int err = -EINVAL;
+
+ parent = NULL;
+ rule = NULL;
+
+ hlist_for_each_entry(rule, &adapter->nfc_filter_list, nfc_node) {
+ /* hash found, or no matching entry */
+ if (rule->sw_idx >= sw_idx)
+ break;
+ parent = rule;
+ }
+
+ /* if there is an old rule occupying our place remove it */
+ if (rule && rule->sw_idx == sw_idx) {
+ if (!input)
+ err = igc_erase_filter(adapter, rule);
+
+ hlist_del(&rule->nfc_node);
+ kfree(rule);
+ adapter->nfc_filter_count--;
+ }
+
+ /* If no input this was a delete, err should be 0 if a rule was
+ * successfully found and removed from the list else -EINVAL
+ */
+ if (!input)
+ return err;
+
+ /* initialize node */
+ INIT_HLIST_NODE(&input->nfc_node);
+
+ /* add filter to the list */
+ if (parent)
+ hlist_add_behind(&input->nfc_node, &parent->nfc_node);
+ else
+ hlist_add_head(&input->nfc_node, &adapter->nfc_filter_list);
+
+ /* update counts */
+ adapter->nfc_filter_count++;
+
+ return 0;
+}
+
+static int igc_add_ethtool_nfc_entry(struct igc_adapter *adapter,
+ struct ethtool_rxnfc *cmd)
+{
+ struct net_device *netdev = adapter->netdev;
+ struct ethtool_rx_flow_spec *fsp =
+ (struct ethtool_rx_flow_spec *)&cmd->fs;
+ struct igc_nfc_filter *input, *rule;
+ int err = 0;
+
+ if (!(netdev->hw_features & NETIF_F_NTUPLE))
+ return -EOPNOTSUPP;
+
+ /* Don't allow programming if the action is a queue greater than
+ * the number of online Rx queues.
+ */
+ if (fsp->ring_cookie == RX_CLS_FLOW_DISC ||
+ fsp->ring_cookie >= adapter->num_rx_queues) {
+ dev_err(&adapter->pdev->dev, "ethtool -N: The specified action is invalid\n");
+ return -EINVAL;
+ }
+
+ /* Don't allow indexes to exist outside of available space */
+ if (fsp->location >= IGC_MAX_RXNFC_FILTERS) {
+ dev_err(&adapter->pdev->dev, "Location out of range\n");
+ return -EINVAL;
+ }
+
+ if ((fsp->flow_type & ~FLOW_EXT) != ETHER_FLOW)
+ return -EINVAL;
+
+ input = kzalloc(sizeof(*input), GFP_KERNEL);
+ if (!input)
+ return -ENOMEM;
+
+ if (fsp->m_u.ether_spec.h_proto == ETHER_TYPE_FULL_MASK) {
+ input->filter.etype = fsp->h_u.ether_spec.h_proto;
+ input->filter.match_flags = IGC_FILTER_FLAG_ETHER_TYPE;
+ }
+
+ /* Only support matching addresses by the full mask */
+ if (is_broadcast_ether_addr(fsp->m_u.ether_spec.h_source)) {
+ input->filter.match_flags |= IGC_FILTER_FLAG_SRC_MAC_ADDR;
+ ether_addr_copy(input->filter.src_addr,
+ fsp->h_u.ether_spec.h_source);
+ }
+
+ /* Only support matching addresses by the full mask */
+ if (is_broadcast_ether_addr(fsp->m_u.ether_spec.h_dest)) {
+ input->filter.match_flags |= IGC_FILTER_FLAG_DST_MAC_ADDR;
+ ether_addr_copy(input->filter.dst_addr,
+ fsp->h_u.ether_spec.h_dest);
+ }
+
+ if ((fsp->flow_type & FLOW_EXT) && fsp->m_ext.vlan_tci) {
+ if (fsp->m_ext.vlan_tci != htons(VLAN_PRIO_MASK)) {
+ err = -EINVAL;
+ goto err_out;
+ }
+ input->filter.vlan_tci = fsp->h_ext.vlan_tci;
+ input->filter.match_flags |= IGC_FILTER_FLAG_VLAN_TCI;
+ }
+
+ input->action = fsp->ring_cookie;
+ input->sw_idx = fsp->location;
+
+ spin_lock(&adapter->nfc_lock);
+
+ hlist_for_each_entry(rule, &adapter->nfc_filter_list, nfc_node) {
+ if (!memcmp(&input->filter, &rule->filter,
+ sizeof(input->filter))) {
+ err = -EEXIST;
+ dev_err(&adapter->pdev->dev,
+ "ethtool: this filter is already set\n");
+ goto err_out_w_lock;
+ }
+ }
+
+ err = igc_add_filter(adapter, input);
+ if (err)
+ goto err_out_w_lock;
+
+ igc_update_ethtool_nfc_entry(adapter, input, input->sw_idx);
+
+ spin_unlock(&adapter->nfc_lock);
+ return 0;
+
+err_out_w_lock:
+ spin_unlock(&adapter->nfc_lock);
+err_out:
+ kfree(input);
+ return err;
+}
+
+static int igc_del_ethtool_nfc_entry(struct igc_adapter *adapter,
+ struct ethtool_rxnfc *cmd)
+{
+ struct ethtool_rx_flow_spec *fsp =
+ (struct ethtool_rx_flow_spec *)&cmd->fs;
+ int err;
+
+ spin_lock(&adapter->nfc_lock);
+ err = igc_update_ethtool_nfc_entry(adapter, NULL, fsp->location);
+ spin_unlock(&adapter->nfc_lock);
+
+ return err;
+}
+
+static int igc_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
+{
+ struct igc_adapter *adapter = netdev_priv(dev);
+ int ret = -EOPNOTSUPP;
+
+ switch (cmd->cmd) {
+ case ETHTOOL_SRXFH:
+ ret = igc_set_rss_hash_opt(adapter, cmd);
+ break;
+ case ETHTOOL_SRXCLSRLINS:
+ ret = igc_add_ethtool_nfc_entry(adapter, cmd);
+ break;
+ case ETHTOOL_SRXCLSRLDEL:
+ ret = igc_del_ethtool_nfc_entry(adapter, cmd);
+ default:
+ break;
+ }
+
+ return ret;
+}
+
void igc_write_rss_indir_tbl(struct igc_adapter *adapter)
{
struct igc_hw *hw = &adapter->hw;
if (hw->mac.type == igc_i225 &&
(status & IGC_STATUS_SPEED_2500)) {
speed = SPEED_2500;
- hw_dbg("2500 Mbs, ");
} else {
speed = SPEED_1000;
- hw_dbg("1000 Mbs, ");
}
} else if (status & IGC_STATUS_SPEED_100) {
speed = SPEED_100;
- hw_dbg("100 Mbs, ");
} else {
speed = SPEED_10;
- hw_dbg("10 Mbs, ");
}
if ((status & IGC_STATUS_FD) ||
hw->phy.media_type != igc_media_type_copper)
.set_ringparam = igc_set_ringparam,
.get_pauseparam = igc_get_pauseparam,
.set_pauseparam = igc_set_pauseparam,
+ .get_strings = igc_get_strings,
+ .get_sset_count = igc_get_sset_count,
+ .get_ethtool_stats = igc_get_ethtool_stats,
.get_coalesce = igc_get_coalesce,
.set_coalesce = igc_set_coalesce,
+ .get_rxnfc = igc_get_rxnfc,
+ .set_rxnfc = igc_set_rxnfc,
.get_rxfh_indir_size = igc_get_rxfh_indir_size,
.get_rxfh = igc_get_rxfh,
.set_rxfh = igc_set_rxfh,
*/
static void igc_setup_mrqc(struct igc_adapter *adapter)
{
+ struct igc_hw *hw = &adapter->hw;
+ u32 j, num_rx_queues;
+ u32 mrqc, rxcsum;
+ u32 rss_key[10];
+
+ netdev_rss_key_fill(rss_key, sizeof(rss_key));
+ for (j = 0; j < 10; j++)
+ wr32(IGC_RSSRK(j), rss_key[j]);
+
+ num_rx_queues = adapter->rss_queues;
+
+ if (adapter->rss_indir_tbl_init != num_rx_queues) {
+ for (j = 0; j < IGC_RETA_SIZE; j++)
+ adapter->rss_indir_tbl[j] =
+ (j * num_rx_queues) / IGC_RETA_SIZE;
+ adapter->rss_indir_tbl_init = num_rx_queues;
+ }
+ igc_write_rss_indir_tbl(adapter);
+
+ /* Disable raw packet checksumming so that RSS hash is placed in
+ * descriptor on writeback. No need to enable TCP/UDP/IP checksum
+ * offloads as they are enabled by default
+ */
+ rxcsum = rd32(IGC_RXCSUM);
+ rxcsum |= IGC_RXCSUM_PCSD;
+
+ /* Enable Receive Checksum Offload for SCTP */
+ rxcsum |= IGC_RXCSUM_CRCOFL;
+
+ /* Don't need to set TUOFL or IPOFL, they default to 1 */
+ wr32(IGC_RXCSUM, rxcsum);
+
+ /* Generate RSS hash based on packet types, TCP/UDP
+ * port numbers and/or IPv4/v6 src and dst addresses
+ */
+ mrqc = IGC_MRQC_RSS_FIELD_IPV4 |
+ IGC_MRQC_RSS_FIELD_IPV4_TCP |
+ IGC_MRQC_RSS_FIELD_IPV6 |
+ IGC_MRQC_RSS_FIELD_IPV6_TCP |
+ IGC_MRQC_RSS_FIELD_IPV6_TCP_EX;
+
+ if (adapter->flags & IGC_FLAG_RSS_FIELD_IPV4_UDP)
+ mrqc |= IGC_MRQC_RSS_FIELD_IPV4_UDP;
+ if (adapter->flags & IGC_FLAG_RSS_FIELD_IPV6_UDP)
+ mrqc |= IGC_MRQC_RSS_FIELD_IPV6_UDP;
+
+ mrqc |= IGC_MRQC_ENABLE_RSS_MQ;
+
+ wr32(IGC_MRQC, mrqc);
}
/**
/* Make sure there is space in the ring for the next send. */
igc_maybe_stop_tx(tx_ring, DESC_NEEDED);
- if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
+ if (netif_xmit_stopped(txring_txq(tx_ring)) || !netdev_xmit_more()) {
writel(i, tx_ring->tail);
/* we need this if more than one processor can write to our tail
* igc_update_stats - Update the board statistics counters
* @adapter: board private structure
*/
-static void igc_update_stats(struct igc_adapter *adapter)
+void igc_update_stats(struct igc_adapter *adapter)
{
+ struct rtnl_link_stats64 *net_stats = &adapter->stats64;
+ struct pci_dev *pdev = adapter->pdev;
+ struct igc_hw *hw = &adapter->hw;
+ u64 _bytes, _packets;
+ u64 bytes, packets;
+ unsigned int start;
+ u32 mpc;
+ int i;
+
+ /* Prevent stats update while adapter is being reset, or if the pci
+ * connection is down.
+ */
+ if (adapter->link_speed == 0)
+ return;
+ if (pci_channel_offline(pdev))
+ return;
+
+ packets = 0;
+ bytes = 0;
+
+ rcu_read_lock();
+ for (i = 0; i < adapter->num_rx_queues; i++) {
+ struct igc_ring *ring = adapter->rx_ring[i];
+ u32 rqdpc = rd32(IGC_RQDPC(i));
+
+ if (hw->mac.type >= igc_i225)
+ wr32(IGC_RQDPC(i), 0);
+
+ if (rqdpc) {
+ ring->rx_stats.drops += rqdpc;
+ net_stats->rx_fifo_errors += rqdpc;
+ }
+
+ do {
+ start = u64_stats_fetch_begin_irq(&ring->rx_syncp);
+ _bytes = ring->rx_stats.bytes;
+ _packets = ring->rx_stats.packets;
+ } while (u64_stats_fetch_retry_irq(&ring->rx_syncp, start));
+ bytes += _bytes;
+ packets += _packets;
+ }
+
+ net_stats->rx_bytes = bytes;
+ net_stats->rx_packets = packets;
+
+ packets = 0;
+ bytes = 0;
+ for (i = 0; i < adapter->num_tx_queues; i++) {
+ struct igc_ring *ring = adapter->tx_ring[i];
+
+ do {
+ start = u64_stats_fetch_begin_irq(&ring->tx_syncp);
+ _bytes = ring->tx_stats.bytes;
+ _packets = ring->tx_stats.packets;
+ } while (u64_stats_fetch_retry_irq(&ring->tx_syncp, start));
+ bytes += _bytes;
+ packets += _packets;
+ }
+ net_stats->tx_bytes = bytes;
+ net_stats->tx_packets = packets;
+ rcu_read_unlock();
+
+ /* read stats registers */
+ adapter->stats.crcerrs += rd32(IGC_CRCERRS);
+ adapter->stats.gprc += rd32(IGC_GPRC);
+ adapter->stats.gorc += rd32(IGC_GORCL);
+ rd32(IGC_GORCH); /* clear GORCL */
+ adapter->stats.bprc += rd32(IGC_BPRC);
+ adapter->stats.mprc += rd32(IGC_MPRC);
+ adapter->stats.roc += rd32(IGC_ROC);
+
+ adapter->stats.prc64 += rd32(IGC_PRC64);
+ adapter->stats.prc127 += rd32(IGC_PRC127);
+ adapter->stats.prc255 += rd32(IGC_PRC255);
+ adapter->stats.prc511 += rd32(IGC_PRC511);
+ adapter->stats.prc1023 += rd32(IGC_PRC1023);
+ adapter->stats.prc1522 += rd32(IGC_PRC1522);
+ adapter->stats.symerrs += rd32(IGC_SYMERRS);
+ adapter->stats.sec += rd32(IGC_SEC);
+
+ mpc = rd32(IGC_MPC);
+ adapter->stats.mpc += mpc;
+ net_stats->rx_fifo_errors += mpc;
+ adapter->stats.scc += rd32(IGC_SCC);
+ adapter->stats.ecol += rd32(IGC_ECOL);
+ adapter->stats.mcc += rd32(IGC_MCC);
+ adapter->stats.latecol += rd32(IGC_LATECOL);
+ adapter->stats.dc += rd32(IGC_DC);
+ adapter->stats.rlec += rd32(IGC_RLEC);
+ adapter->stats.xonrxc += rd32(IGC_XONRXC);
+ adapter->stats.xontxc += rd32(IGC_XONTXC);
+ adapter->stats.xoffrxc += rd32(IGC_XOFFRXC);
+ adapter->stats.xofftxc += rd32(IGC_XOFFTXC);
+ adapter->stats.fcruc += rd32(IGC_FCRUC);
+ adapter->stats.gptc += rd32(IGC_GPTC);
+ adapter->stats.gotc += rd32(IGC_GOTCL);
+ rd32(IGC_GOTCH); /* clear GOTCL */
+ adapter->stats.rnbc += rd32(IGC_RNBC);
+ adapter->stats.ruc += rd32(IGC_RUC);
+ adapter->stats.rfc += rd32(IGC_RFC);
+ adapter->stats.rjc += rd32(IGC_RJC);
+ adapter->stats.tor += rd32(IGC_TORH);
+ adapter->stats.tot += rd32(IGC_TOTH);
+ adapter->stats.tpr += rd32(IGC_TPR);
+
+ adapter->stats.ptc64 += rd32(IGC_PTC64);
+ adapter->stats.ptc127 += rd32(IGC_PTC127);
+ adapter->stats.ptc255 += rd32(IGC_PTC255);
+ adapter->stats.ptc511 += rd32(IGC_PTC511);
+ adapter->stats.ptc1023 += rd32(IGC_PTC1023);
+ adapter->stats.ptc1522 += rd32(IGC_PTC1522);
+
+ adapter->stats.mptc += rd32(IGC_MPTC);
+ adapter->stats.bptc += rd32(IGC_BPTC);
+
+ adapter->stats.tpt += rd32(IGC_TPT);
+ adapter->stats.colc += rd32(IGC_COLC);
+
+ adapter->stats.algnerrc += rd32(IGC_ALGNERRC);
+
+ adapter->stats.tsctc += rd32(IGC_TSCTC);
+ adapter->stats.tsctfc += rd32(IGC_TSCTFC);
+
+ adapter->stats.iac += rd32(IGC_IAC);
+ adapter->stats.icrxoc += rd32(IGC_ICRXOC);
+ adapter->stats.icrxptc += rd32(IGC_ICRXPTC);
+ adapter->stats.icrxatc += rd32(IGC_ICRXATC);
+ adapter->stats.ictxptc += rd32(IGC_ICTXPTC);
+ adapter->stats.ictxatc += rd32(IGC_ICTXATC);
+ adapter->stats.ictxqec += rd32(IGC_ICTXQEC);
+ adapter->stats.ictxqmtc += rd32(IGC_ICTXQMTC);
+ adapter->stats.icrxdmtc += rd32(IGC_ICRXDMTC);
+
+ /* Fill out the OS statistics structure */
+ net_stats->multicast = adapter->stats.mprc;
+ net_stats->collisions = adapter->stats.colc;
+
+ /* Rx Errors */
+
+ /* RLEC on some newer hardware can be incorrect so build
+ * our own version based on RUC and ROC
+ */
+ net_stats->rx_errors = adapter->stats.rxerrc +
+ adapter->stats.crcerrs + adapter->stats.algnerrc +
+ adapter->stats.ruc + adapter->stats.roc +
+ adapter->stats.cexterr;
+ net_stats->rx_length_errors = adapter->stats.ruc +
+ adapter->stats.roc;
+ net_stats->rx_crc_errors = adapter->stats.crcerrs;
+ net_stats->rx_frame_errors = adapter->stats.algnerrc;
+ net_stats->rx_missed_errors = adapter->stats.mpc;
+
+ /* Tx Errors */
+ net_stats->tx_errors = adapter->stats.ecol +
+ adapter->stats.latecol;
+ net_stats->tx_aborted_errors = adapter->stats.ecol;
+ net_stats->tx_window_errors = adapter->stats.latecol;
+ net_stats->tx_carrier_errors = adapter->stats.tncrs;
+
+ /* Tx Dropped needs to be maintained elsewhere */
+
+ /* Management Stats */
+ adapter->stats.mgptc += rd32(IGC_MGTPTC);
+ adapter->stats.mgprc += rd32(IGC_MGTPRC);
+ adapter->stats.mgpdc += rd32(IGC_MGTPDC);
}
static void igc_nfc_filter_exit(struct igc_adapter *adapter)
{
+ struct igc_nfc_filter *rule;
+
+ spin_lock(&adapter->nfc_lock);
+
+ hlist_for_each_entry(rule, &adapter->nfc_filter_list, nfc_node)
+ igc_erase_filter(adapter, rule);
+
+ hlist_for_each_entry(rule, &adapter->cls_flower_list, nfc_node)
+ igc_erase_filter(adapter, rule);
+
+ spin_unlock(&adapter->nfc_lock);
+}
+
+static void igc_nfc_filter_restore(struct igc_adapter *adapter)
+{
+ struct igc_nfc_filter *rule;
+
+ spin_lock(&adapter->nfc_lock);
+
+ hlist_for_each_entry(rule, &adapter->nfc_filter_list, nfc_node)
+ igc_add_filter(adapter, rule);
+
+ spin_unlock(&adapter->nfc_lock);
}
/**
return &netdev->stats;
}
+static netdev_features_t igc_fix_features(struct net_device *netdev,
+ netdev_features_t features)
+{
+ /* Since there is no support for separate Rx/Tx vlan accel
+ * enable/disable make sure Tx flag is always in same state as Rx.
+ */
+ if (features & NETIF_F_HW_VLAN_CTAG_RX)
+ features |= NETIF_F_HW_VLAN_CTAG_TX;
+ else
+ features &= ~NETIF_F_HW_VLAN_CTAG_TX;
+
+ return features;
+}
+
+static int igc_set_features(struct net_device *netdev,
+ netdev_features_t features)
+{
+ netdev_features_t changed = netdev->features ^ features;
+ struct igc_adapter *adapter = netdev_priv(netdev);
+
+ /* Add VLAN support */
+ if (!(changed & (NETIF_F_RXALL | NETIF_F_NTUPLE)))
+ return 0;
+
+ if (!(features & NETIF_F_NTUPLE)) {
+ struct hlist_node *node2;
+ struct igc_nfc_filter *rule;
+
+ spin_lock(&adapter->nfc_lock);
+ hlist_for_each_entry_safe(rule, node2,
+ &adapter->nfc_filter_list, nfc_node) {
+ igc_erase_filter(adapter, rule);
+ hlist_del(&rule->nfc_node);
+ kfree(rule);
+ }
+ spin_unlock(&adapter->nfc_lock);
+ adapter->nfc_filter_count = 0;
+ }
+
+ netdev->features = features;
+
+ if (netif_running(netdev))
+ igc_reinit_locked(adapter);
+ else
+ igc_reset(adapter);
+
+ return 1;
+}
+
+static netdev_features_t
+igc_features_check(struct sk_buff *skb, struct net_device *dev,
+ netdev_features_t features)
+{
+ unsigned int network_hdr_len, mac_hdr_len;
+
+ /* Make certain the headers can be described by a context descriptor */
+ mac_hdr_len = skb_network_header(skb) - skb->data;
+ if (unlikely(mac_hdr_len > IGC_MAX_MAC_HDR_LEN))
+ return features & ~(NETIF_F_HW_CSUM |
+ NETIF_F_SCTP_CRC |
+ NETIF_F_HW_VLAN_CTAG_TX |
+ NETIF_F_TSO |
+ NETIF_F_TSO6);
+
+ network_hdr_len = skb_checksum_start(skb) - skb_network_header(skb);
+ if (unlikely(network_hdr_len > IGC_MAX_NETWORK_HDR_LEN))
+ return features & ~(NETIF_F_HW_CSUM |
+ NETIF_F_SCTP_CRC |
+ NETIF_F_TSO |
+ NETIF_F_TSO6);
+
+ /* We can only support IPv4 TSO in tunnels if we can mangle the
+ * inner IP ID field, so strip TSO if MANGLEID is not supported.
+ */
+ if (skb->encapsulation && !(features & NETIF_F_TSO_MANGLEID))
+ features &= ~NETIF_F_TSO;
+
+ return features;
+}
+
/**
* igc_configure - configure the hardware for RX and TX
* @adapter: private board structure
igc_setup_mrqc(adapter);
igc_setup_rctl(adapter);
+ igc_nfc_filter_restore(adapter);
igc_configure_tx(adapter);
igc_configure_rx(adapter);
igc_rar_set_index(adapter, 0);
}
+/* If the filter to be added and an already existing filter express
+ * the same address and address type, it should be possible to only
+ * override the other configurations, for example the queue to steer
+ * traffic.
+ */
+static bool igc_mac_entry_can_be_used(const struct igc_mac_addr *entry,
+ const u8 *addr, const u8 flags)
+{
+ if (!(entry->state & IGC_MAC_STATE_IN_USE))
+ return true;
+
+ if ((entry->state & IGC_MAC_STATE_SRC_ADDR) !=
+ (flags & IGC_MAC_STATE_SRC_ADDR))
+ return false;
+
+ if (!ether_addr_equal(addr, entry->addr))
+ return false;
+
+ return true;
+}
+
+/* Add a MAC filter for 'addr' directing matching traffic to 'queue',
+ * 'flags' is used to indicate what kind of match is made, match is by
+ * default for the destination address, if matching by source address
+ * is desired the flag IGC_MAC_STATE_SRC_ADDR can be used.
+ */
+static int igc_add_mac_filter_flags(struct igc_adapter *adapter,
+ const u8 *addr, const u8 queue,
+ const u8 flags)
+{
+ struct igc_hw *hw = &adapter->hw;
+ int rar_entries = hw->mac.rar_entry_count;
+ int i;
+
+ if (is_zero_ether_addr(addr))
+ return -EINVAL;
+
+ /* Search for the first empty entry in the MAC table.
+ * Do not touch entries at the end of the table reserved for the VF MAC
+ * addresses.
+ */
+ for (i = 0; i < rar_entries; i++) {
+ if (!igc_mac_entry_can_be_used(&adapter->mac_table[i],
+ addr, flags))
+ continue;
+
+ ether_addr_copy(adapter->mac_table[i].addr, addr);
+ adapter->mac_table[i].queue = queue;
+ adapter->mac_table[i].state |= IGC_MAC_STATE_IN_USE | flags;
+
+ igc_rar_set_index(adapter, i);
+ return i;
+ }
+
+ return -ENOSPC;
+}
+
+int igc_add_mac_steering_filter(struct igc_adapter *adapter,
+ const u8 *addr, u8 queue, u8 flags)
+{
+ return igc_add_mac_filter_flags(adapter, addr, queue,
+ IGC_MAC_STATE_QUEUE_STEERING | flags);
+}
+
+/* Remove a MAC filter for 'addr' directing matching traffic to
+ * 'queue', 'flags' is used to indicate what kind of match need to be
+ * removed, match is by default for the destination address, if
+ * matching by source address is to be removed the flag
+ * IGC_MAC_STATE_SRC_ADDR can be used.
+ */
+static int igc_del_mac_filter_flags(struct igc_adapter *adapter,
+ const u8 *addr, const u8 queue,
+ const u8 flags)
+{
+ struct igc_hw *hw = &adapter->hw;
+ int rar_entries = hw->mac.rar_entry_count;
+ int i;
+
+ if (is_zero_ether_addr(addr))
+ return -EINVAL;
+
+ /* Search for matching entry in the MAC table based on given address
+ * and queue. Do not touch entries at the end of the table reserved
+ * for the VF MAC addresses.
+ */
+ for (i = 0; i < rar_entries; i++) {
+ if (!(adapter->mac_table[i].state & IGC_MAC_STATE_IN_USE))
+ continue;
+ if ((adapter->mac_table[i].state & flags) != flags)
+ continue;
+ if (adapter->mac_table[i].queue != queue)
+ continue;
+ if (!ether_addr_equal(adapter->mac_table[i].addr, addr))
+ continue;
+
+ /* When a filter for the default address is "deleted",
+ * we return it to its initial configuration
+ */
+ if (adapter->mac_table[i].state & IGC_MAC_STATE_DEFAULT) {
+ adapter->mac_table[i].state =
+ IGC_MAC_STATE_DEFAULT | IGC_MAC_STATE_IN_USE;
+ } else {
+ adapter->mac_table[i].state = 0;
+ adapter->mac_table[i].queue = 0;
+ memset(adapter->mac_table[i].addr, 0, ETH_ALEN);
+ }
+
+ igc_rar_set_index(adapter, i);
+ return 0;
+ }
+
+ return -ENOENT;
+}
+
+int igc_del_mac_steering_filter(struct igc_adapter *adapter,
+ const u8 *addr, u8 queue, u8 flags)
+{
+ return igc_del_mac_filter_flags(adapter, addr, queue,
+ IGC_MAC_STATE_QUEUE_STEERING | flags);
+}
+
/**
* igc_set_rx_mode - Secondary Unicast, Multicast and Promiscuous mode set
* @netdev: network interface device structure
.ndo_set_mac_address = igc_set_mac,
.ndo_change_mtu = igc_change_mtu,
.ndo_get_stats = igc_get_stats,
+ .ndo_fix_features = igc_fix_features,
+ .ndo_set_features = igc_set_features,
+ .ndo_features_check = igc_features_check,
};
/* PCIe configuration access */
if (err)
goto err_sw_init;
+ /* copy netdev features into list of user selectable features */
+ netdev->hw_features |= NETIF_F_NTUPLE;
+
/* MTU range: 68 - 9216 */
netdev->min_mtu = ETH_MIN_MTU;
netdev->max_mtu = MAX_STD_JUMBO_FRAME_SIZE;
/* MSI-X Table Register Descriptions */
#define IGC_PBACL 0x05B68 /* MSIx PBA Clear - R/W 1 to clear */
+/* RSS registers */
+#define IGC_MRQC 0x05818 /* Multiple Receive Control - RW */
+
+/* Filtering Registers */
+#define IGC_ETQF(_n) (0x05CB0 + (4 * (_n))) /* EType Queue Fltr */
+
+/* ETQF register bit definitions */
+#define IGC_ETQF_FILTER_ENABLE BIT(26)
+#define IGC_ETQF_QUEUE_ENABLE BIT(31)
+#define IGC_ETQF_QUEUE_SHIFT 16
+#define IGC_ETQF_QUEUE_MASK 0x00070000
+#define IGC_ETQF_ETYPE_MASK 0x0000FFFF
+
/* Redirection Table - RW Array */
#define IGC_RETA(_i) (0x05C00 + ((_i) * 4))
+/* RSS Random Key - RW Array */
+#define IGC_RSSRK(_i) (0x05C80 + ((_i) * 4))
/* Receive Register Descriptions */
#define IGC_RCTL 0x00100 /* Rx Control - RW */
#define IGC_UTA 0x0A000 /* Unicast Table Array - RW */
#define IGC_RAL(_n) (0x05400 + ((_n) * 0x08))
#define IGC_RAH(_n) (0x05404 + ((_n) * 0x08))
+#define IGC_VLAPQF 0x055B0 /* VLAN Priority Queue Filter VLAPQF */
/* Transmit Register Descriptions */
#define IGC_TCTL 0x00400 /* Tx Control - RW */
ixgbe_maybe_stop_tx(tx_ring, DESC_NEEDED);
- if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
+ if (netif_xmit_stopped(txring_txq(tx_ring)) || !netdev_xmit_more()) {
writel(i, tx_ring->tail);
/* we need this if more than one processor can write to our tail
#ifdef IXGBE_FCOE
static u16 ixgbe_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
struct ixgbe_adapter *adapter;
struct ixgbe_ring_feature *f;
break;
/* fall through */
default:
- return fallback(dev, skb, sb_dev);
+ return netdev_pick_tx(dev, skb, sb_dev);
}
f = &adapter->ring_feature[RING_F_FCOE];
NETIF_F_HW_VLAN_CTAG_FILTER))
ixgbe_set_rx_mode(netdev);
- return 0;
+ return 1;
}
/**
if (txq->count >= txq->tx_stop_threshold)
netif_tx_stop_queue(nq);
- if (!skb->xmit_more || netif_xmit_stopped(nq) ||
+ if (!netdev_xmit_more() || netif_xmit_stopped(nq) ||
txq->pending + frags > MVNETA_TXQ_DEC_SENT_MASK)
mvneta_txq_pend_desc_add(pp, txq, frags);
else
phylink_set(mask, 1000baseX_Full);
}
if (pp->comphy || state->interface == PHY_INTERFACE_MODE_2500BASEX) {
+ phylink_set(mask, 2500baseT_Full);
phylink_set(mask, 2500baseX_Full);
}
#define MVPP2_CLS_FLOW_TBL1_REG 0x1828
#define MVPP2_CLS_FLOW_TBL1_N_FIELDS_MASK 0x7
#define MVPP2_CLS_FLOW_TBL1_N_FIELDS(x) (x)
+#define MVPP2_CLS_FLOW_TBL1_LU_TYPE(lu) (((lu) & 0x3f) << 3)
#define MVPP2_CLS_FLOW_TBL1_PRIO_MASK 0x3f
#define MVPP2_CLS_FLOW_TBL1_PRIO(x) ((x) << 9)
#define MVPP2_CLS_FLOW_TBL1_SEQ_MASK 0x7
#define MVPP22_CLS_C2_TCAM_DATA2 0x1b18
#define MVPP22_CLS_C2_TCAM_DATA3 0x1b1c
#define MVPP22_CLS_C2_TCAM_DATA4 0x1b20
+#define MVPP22_CLS_C2_LU_TYPE(lu) ((lu) & 0x3f)
#define MVPP22_CLS_C2_PORT_ID(port) ((port) << 8)
+#define MVPP22_CLS_C2_TCAM_INV 0x1b24
+#define MVPP22_CLS_C2_TCAM_INV_BIT BIT(31)
#define MVPP22_CLS_C2_HIT_CTR 0x1b50
#define MVPP22_CLS_C2_ACT 0x1b60
#define MVPP22_CLS_C2_ACT_RSS_EN(act) (((act) & 0x3) << 19)
#define MVPP2_BIT_TO_WORD(bit) ((bit) / 32)
#define MVPP2_BIT_IN_WORD(bit) ((bit) % 32)
+#define MVPP2_N_PRS_FLOWS 52
+
/* RSS constants */
#define MVPP22_RSS_TABLE_ENTRIES 32
#define MVPP2_DESC_DMA_MASK DMA_BIT_MASK(40)
/* Definitions */
+struct mvpp2_dbgfs_entries;
/* Shared Packet Processor resources */
struct mvpp2 {
/* Debugfs root entry */
struct dentry *dbgfs_dir;
+
+ /* Debugfs entries private data */
+ struct mvpp2_dbgfs_entries *dbgfs_entries;
};
struct mvpp2_pcpu_stats {
} \
}
-static struct mvpp2_cls_flow cls_flows[MVPP2_N_FLOWS] = {
+static const struct mvpp2_cls_flow cls_flows[MVPP2_N_PRS_FLOWS] = {
/* TCP over IPv4 flows, Not fragmented, no vlan tag */
MVPP2_DEF_FLOW(TCP_V4_FLOW, MVPP2_FL_IP4_TCP_NF_UNTAG,
MVPP22_CLS_HEK_IP4_5T,
fe->data[0] &= ~MVPP2_CLS_FLOW_TBL0_PORT_ID_SEL;
}
-static void mvpp2_cls_flow_seq_set(struct mvpp2_cls_flow_entry *fe, u32 seq)
-{
- fe->data[1] &= ~MVPP2_CLS_FLOW_TBL1_SEQ(MVPP2_CLS_FLOW_TBL1_SEQ_MASK);
- fe->data[1] |= MVPP2_CLS_FLOW_TBL1_SEQ(seq);
-}
-
static void mvpp2_cls_flow_last_set(struct mvpp2_cls_flow_entry *fe,
bool is_last)
{
fe->data[0] |= MVPP2_CLS_FLOW_TBL0_PORT_ID(port);
}
+static void mvpp2_cls_flow_lu_type_set(struct mvpp2_cls_flow_entry *fe,
+ u8 lu_type)
+{
+ fe->data[1] &= ~MVPP2_CLS_FLOW_TBL1_LU_TYPE(MVPP2_CLS_LU_TYPE_MASK);
+ fe->data[1] |= MVPP2_CLS_FLOW_TBL1_LU_TYPE(lu_type);
+}
+
/* Initialize the parser entry for the given flow */
static void mvpp2_cls_flow_prs_init(struct mvpp2 *priv,
- struct mvpp2_cls_flow *flow)
+ const struct mvpp2_cls_flow *flow)
{
mvpp2_prs_add_flow(priv, flow->flow_id, flow->prs_ri.ri,
flow->prs_ri.ri_mask);
/* Initialize the Lookup Id table entry for the given flow */
static void mvpp2_cls_flow_lkp_init(struct mvpp2 *priv,
- struct mvpp2_cls_flow *flow)
+ const struct mvpp2_cls_flow *flow)
{
struct mvpp2_cls_lookup_entry le;
/* We point on the first lookup in the sequence for the flow, that is
* the C2 lookup.
*/
- le.data |= MVPP2_CLS_LKP_FLOW_PTR(MVPP2_FLOW_C2_ENTRY(flow->flow_id));
+ le.data |= MVPP2_CLS_LKP_FLOW_PTR(MVPP2_CLS_FLT_FIRST(flow->flow_id));
/* CLS is always enabled, RSS is enabled/disabled in C2 lookup */
le.data |= MVPP2_CLS_LKP_TBL_LOOKUP_EN_MASK;
mvpp2_cls_lookup_write(priv, &le);
}
+static void mvpp2_cls_c2_write(struct mvpp2 *priv,
+ struct mvpp2_cls_c2_entry *c2)
+{
+ u32 val;
+ mvpp2_write(priv, MVPP22_CLS_C2_TCAM_IDX, c2->index);
+
+ val = mvpp2_read(priv, MVPP22_CLS_C2_TCAM_INV);
+ if (c2->valid)
+ val &= ~MVPP22_CLS_C2_TCAM_INV_BIT;
+ else
+ val |= MVPP22_CLS_C2_TCAM_INV_BIT;
+ mvpp2_write(priv, MVPP22_CLS_C2_TCAM_INV, val);
+
+ mvpp2_write(priv, MVPP22_CLS_C2_ACT, c2->act);
+
+ mvpp2_write(priv, MVPP22_CLS_C2_ATTR0, c2->attr[0]);
+ mvpp2_write(priv, MVPP22_CLS_C2_ATTR1, c2->attr[1]);
+ mvpp2_write(priv, MVPP22_CLS_C2_ATTR2, c2->attr[2]);
+ mvpp2_write(priv, MVPP22_CLS_C2_ATTR3, c2->attr[3]);
+
+ mvpp2_write(priv, MVPP22_CLS_C2_TCAM_DATA0, c2->tcam[0]);
+ mvpp2_write(priv, MVPP22_CLS_C2_TCAM_DATA1, c2->tcam[1]);
+ mvpp2_write(priv, MVPP22_CLS_C2_TCAM_DATA2, c2->tcam[2]);
+ mvpp2_write(priv, MVPP22_CLS_C2_TCAM_DATA3, c2->tcam[3]);
+ /* Writing TCAM_DATA4 flushes writes to TCAM_DATA0-4 and INV to HW */
+ mvpp2_write(priv, MVPP22_CLS_C2_TCAM_DATA4, c2->tcam[4]);
+}
+
+void mvpp2_cls_c2_read(struct mvpp2 *priv, int index,
+ struct mvpp2_cls_c2_entry *c2)
+{
+ u32 val;
+ mvpp2_write(priv, MVPP22_CLS_C2_TCAM_IDX, index);
+
+ c2->index = index;
+
+ c2->tcam[0] = mvpp2_read(priv, MVPP22_CLS_C2_TCAM_DATA0);
+ c2->tcam[1] = mvpp2_read(priv, MVPP22_CLS_C2_TCAM_DATA1);
+ c2->tcam[2] = mvpp2_read(priv, MVPP22_CLS_C2_TCAM_DATA2);
+ c2->tcam[3] = mvpp2_read(priv, MVPP22_CLS_C2_TCAM_DATA3);
+ c2->tcam[4] = mvpp2_read(priv, MVPP22_CLS_C2_TCAM_DATA4);
+
+ c2->act = mvpp2_read(priv, MVPP22_CLS_C2_ACT);
+
+ c2->attr[0] = mvpp2_read(priv, MVPP22_CLS_C2_ATTR0);
+ c2->attr[1] = mvpp2_read(priv, MVPP22_CLS_C2_ATTR1);
+ c2->attr[2] = mvpp2_read(priv, MVPP22_CLS_C2_ATTR2);
+ c2->attr[3] = mvpp2_read(priv, MVPP22_CLS_C2_ATTR3);
+
+ val = mvpp2_read(priv, MVPP22_CLS_C2_TCAM_INV);
+ c2->valid = !(val & MVPP22_CLS_C2_TCAM_INV_BIT);
+}
+
/* Initialize the flow table entries for the given flow */
-static void mvpp2_cls_flow_init(struct mvpp2 *priv, struct mvpp2_cls_flow *flow)
+static void mvpp2_cls_flow_init(struct mvpp2 *priv,
+ const struct mvpp2_cls_flow *flow)
{
struct mvpp2_cls_flow_entry fe;
- int i;
+ int i, pri = 0;
+
+ /* Assign default values to all entries in the flow */
+ for (i = MVPP2_CLS_FLT_FIRST(flow->flow_id);
+ i <= MVPP2_CLS_FLT_LAST(flow->flow_id); i++) {
+ memset(&fe, 0, sizeof(fe));
+ fe.index = i;
+ mvpp2_cls_flow_pri_set(&fe, pri++);
- /* C2 lookup */
- memset(&fe, 0, sizeof(fe));
- fe.index = MVPP2_FLOW_C2_ENTRY(flow->flow_id);
+ if (i == MVPP2_CLS_FLT_LAST(flow->flow_id))
+ mvpp2_cls_flow_last_set(&fe, 1);
+
+ mvpp2_cls_flow_write(priv, &fe);
+ }
+
+ /* RSS config C2 lookup */
+ mvpp2_cls_flow_read(priv, MVPP2_CLS_FLT_C2_RSS_ENTRY(flow->flow_id),
+ &fe);
mvpp2_cls_flow_eng_set(&fe, MVPP22_CLS_ENGINE_C2);
mvpp2_cls_flow_port_id_sel(&fe, true);
- mvpp2_cls_flow_last_set(&fe, 0);
- mvpp2_cls_flow_pri_set(&fe, 0);
- mvpp2_cls_flow_seq_set(&fe, MVPP2_CLS_FLOW_SEQ_FIRST1);
+ mvpp2_cls_flow_lu_type_set(&fe, MVPP2_CLS_LU_ALL);
/* Add all ports */
for (i = 0; i < MVPP2_MAX_PORTS; i++)
/* C3Hx lookups */
for (i = 0; i < MVPP2_MAX_PORTS; i++) {
- memset(&fe, 0, sizeof(fe));
- fe.index = MVPP2_PORT_FLOW_HASH_ENTRY(i, flow->flow_id);
+ mvpp2_cls_flow_read(priv,
+ MVPP2_CLS_FLT_HASH_ENTRY(i, flow->flow_id),
+ &fe);
+ /* Set a default engine. Will be overwritten when setting the
+ * real HEK parameters
+ */
+ mvpp2_cls_flow_eng_set(&fe, MVPP22_CLS_ENGINE_C3HA);
mvpp2_cls_flow_port_id_sel(&fe, true);
- mvpp2_cls_flow_pri_set(&fe, i + 1);
- mvpp2_cls_flow_seq_set(&fe, MVPP2_CLS_FLOW_SEQ_MIDDLE);
mvpp2_cls_flow_port_add(&fe, BIT(i));
mvpp2_cls_flow_write(priv, &fe);
}
-
- /* Update the last entry */
- mvpp2_cls_flow_last_set(&fe, 1);
- mvpp2_cls_flow_seq_set(&fe, MVPP2_CLS_FLOW_SEQ_LAST);
-
- mvpp2_cls_flow_write(priv, &fe);
}
/* Adds a field to the Header Extracted Key generation parameters*/
for_each_set_bit(i, &hash_opts, MVPP22_CLS_HEK_N_FIELDS) {
switch (BIT(i)) {
+ case MVPP22_CLS_HEK_OPT_MAC_DA:
+ field_id = MVPP22_CLS_FIELD_MAC_DA;
+ break;
case MVPP22_CLS_HEK_OPT_VLAN:
field_id = MVPP22_CLS_FIELD_VLAN;
break;
return 0;
}
-struct mvpp2_cls_flow *mvpp2_cls_flow_get(int flow)
+const struct mvpp2_cls_flow *mvpp2_cls_flow_get(int flow)
{
- if (flow >= MVPP2_N_FLOWS)
+ if (flow >= MVPP2_N_PRS_FLOWS)
return NULL;
return &cls_flows[flow];
static int mvpp2_port_rss_hash_opts_set(struct mvpp2_port *port, int flow_type,
u16 requested_opts)
{
+ const struct mvpp2_cls_flow *flow;
struct mvpp2_cls_flow_entry fe;
- struct mvpp2_cls_flow *flow;
int i, engine, flow_index;
u16 hash_opts;
- for (i = 0; i < MVPP2_N_FLOWS; i++) {
+ for_each_cls_flow_id_with_type(i, flow_type) {
flow = mvpp2_cls_flow_get(i);
if (!flow)
return -EINVAL;
- if (flow->flow_type != flow_type)
- continue;
-
- flow_index = MVPP2_PORT_FLOW_HASH_ENTRY(port->id,
- flow->flow_id);
+ flow_index = MVPP2_CLS_FLT_HASH_ENTRY(port->id, flow->flow_id);
mvpp2_cls_flow_read(port->priv, flow_index, &fe);
*/
static u16 mvpp2_port_rss_hash_opts_get(struct mvpp2_port *port, int flow_type)
{
+ const struct mvpp2_cls_flow *flow;
struct mvpp2_cls_flow_entry fe;
- struct mvpp2_cls_flow *flow;
int i, flow_index;
u16 hash_opts = 0;
- for (i = 0; i < MVPP2_N_FLOWS; i++) {
+ for_each_cls_flow_id_with_type(i, flow_type) {
flow = mvpp2_cls_flow_get(i);
if (!flow)
return 0;
- if (flow->flow_type != flow_type)
- continue;
-
- flow_index = MVPP2_PORT_FLOW_HASH_ENTRY(port->id,
- flow->flow_id);
+ flow_index = MVPP2_CLS_FLT_HASH_ENTRY(port->id, flow->flow_id);
mvpp2_cls_flow_read(port->priv, flow_index, &fe);
static void mvpp2_cls_port_init_flows(struct mvpp2 *priv)
{
- struct mvpp2_cls_flow *flow;
+ const struct mvpp2_cls_flow *flow;
int i;
- for (i = 0; i < MVPP2_N_FLOWS; i++) {
+ for (i = 0; i < MVPP2_N_PRS_FLOWS; i++) {
flow = mvpp2_cls_flow_get(i);
if (!flow)
break;
}
}
-static void mvpp2_cls_c2_write(struct mvpp2 *priv,
- struct mvpp2_cls_c2_entry *c2)
-{
- mvpp2_write(priv, MVPP22_CLS_C2_TCAM_IDX, c2->index);
-
- /* Write TCAM */
- mvpp2_write(priv, MVPP22_CLS_C2_TCAM_DATA0, c2->tcam[0]);
- mvpp2_write(priv, MVPP22_CLS_C2_TCAM_DATA1, c2->tcam[1]);
- mvpp2_write(priv, MVPP22_CLS_C2_TCAM_DATA2, c2->tcam[2]);
- mvpp2_write(priv, MVPP22_CLS_C2_TCAM_DATA3, c2->tcam[3]);
- mvpp2_write(priv, MVPP22_CLS_C2_TCAM_DATA4, c2->tcam[4]);
-
- mvpp2_write(priv, MVPP22_CLS_C2_ACT, c2->act);
-
- mvpp2_write(priv, MVPP22_CLS_C2_ATTR0, c2->attr[0]);
- mvpp2_write(priv, MVPP22_CLS_C2_ATTR1, c2->attr[1]);
- mvpp2_write(priv, MVPP22_CLS_C2_ATTR2, c2->attr[2]);
- mvpp2_write(priv, MVPP22_CLS_C2_ATTR3, c2->attr[3]);
-}
-
-void mvpp2_cls_c2_read(struct mvpp2 *priv, int index,
- struct mvpp2_cls_c2_entry *c2)
-{
- mvpp2_write(priv, MVPP22_CLS_C2_TCAM_IDX, index);
-
- c2->index = index;
-
- c2->tcam[0] = mvpp2_read(priv, MVPP22_CLS_C2_TCAM_DATA0);
- c2->tcam[1] = mvpp2_read(priv, MVPP22_CLS_C2_TCAM_DATA1);
- c2->tcam[2] = mvpp2_read(priv, MVPP22_CLS_C2_TCAM_DATA2);
- c2->tcam[3] = mvpp2_read(priv, MVPP22_CLS_C2_TCAM_DATA3);
- c2->tcam[4] = mvpp2_read(priv, MVPP22_CLS_C2_TCAM_DATA4);
-
- c2->act = mvpp2_read(priv, MVPP22_CLS_C2_ACT);
-
- c2->attr[0] = mvpp2_read(priv, MVPP22_CLS_C2_ATTR0);
- c2->attr[1] = mvpp2_read(priv, MVPP22_CLS_C2_ATTR1);
- c2->attr[2] = mvpp2_read(priv, MVPP22_CLS_C2_ATTR2);
- c2->attr[3] = mvpp2_read(priv, MVPP22_CLS_C2_ATTR3);
-}
-
static void mvpp2_port_c2_cls_init(struct mvpp2_port *port)
{
struct mvpp2_cls_c2_entry c2;
c2.tcam[4] = MVPP22_CLS_C2_PORT_ID(pmap);
c2.tcam[4] |= MVPP22_CLS_C2_TCAM_EN(MVPP22_CLS_C2_PORT_ID(pmap));
+ /* Match on Lookup Type */
+ c2.tcam[4] |= MVPP22_CLS_C2_TCAM_EN(MVPP22_CLS_C2_LU_TYPE(MVPP2_CLS_LU_TYPE_MASK));
+ c2.tcam[4] |= MVPP22_CLS_C2_LU_TYPE(MVPP2_CLS_LU_ALL);
+
/* Update RSS status after matching this entry */
c2.act = MVPP22_CLS_C2_ACT_RSS_EN(MVPP22_C2_UPD_LOCK);
c2.attr[0] = MVPP22_CLS_C2_ATTR0_QHIGH(qh) |
MVPP22_CLS_C2_ATTR0_QLOW(ql);
+ c2.valid = true;
+
mvpp2_cls_c2_write(port->priv, &c2);
}
{
struct mvpp2_cls_lookup_entry le;
struct mvpp2_cls_flow_entry fe;
+ struct mvpp2_cls_c2_entry c2;
int index;
/* Enable classifier */
mvpp2_cls_lookup_write(priv, &le);
}
+ /* Clear C2 TCAM engine table */
+ memset(&c2, 0, sizeof(c2));
+ c2.valid = false;
+ for (index = 0; index < MVPP22_CLS_C2_N_ENTRIES; index++) {
+ c2.index = index;
+ mvpp2_cls_c2_write(priv, &c2);
+ }
+
mvpp2_cls_port_init_flows(priv);
}
mvpp2_cls_c2_write(port->priv, &c2);
}
-void mvpp22_rss_enable(struct mvpp2_port *port)
+void mvpp22_port_rss_enable(struct mvpp2_port *port)
{
mvpp2_rss_port_c2_enable(port);
}
-void mvpp22_rss_disable(struct mvpp2_port *port)
+void mvpp22_port_rss_disable(struct mvpp2_port *port)
{
mvpp2_rss_port_c2_disable(port);
}
return 0;
}
-void mvpp22_rss_port_init(struct mvpp2_port *port)
+void mvpp22_port_rss_init(struct mvpp2_port *port)
{
struct mvpp2 *priv = port->priv;
int i;
MVPP22_CLS_FIELD_L4DIP = 0x1e,
};
-enum mvpp2_cls_flow_seq {
- MVPP2_CLS_FLOW_SEQ_NORMAL = 0,
- MVPP2_CLS_FLOW_SEQ_FIRST1,
- MVPP2_CLS_FLOW_SEQ_FIRST2,
- MVPP2_CLS_FLOW_SEQ_LAST,
- MVPP2_CLS_FLOW_SEQ_MIDDLE
-};
-
/* Classifier C2 engine constants */
#define MVPP22_CLS_C2_TCAM_EN(data) ((data) << 16)
struct mvpp2_cls_c2_entry {
u32 index;
+ /* TCAM lookup key */
u32 tcam[MVPP2_CLS_C2_TCAM_WORDS];
+ /* Actions to perform upon TCAM match */
u32 act;
+ /* Attributes relative to the actions to perform */
u32 attr[MVPP2_CLS_C2_ATTR_WORDS];
+ /* Entry validity */
+ u8 valid;
};
/* Classifier C2 engine entries */
-#define MVPP22_CLS_C2_RSS_ENTRY(port) (port)
-#define MVPP22_CLS_C2_N_ENTRIES MVPP2_MAX_PORTS
+#define MVPP22_CLS_C2_N_ENTRIES 256
-/* RSS flow entries in the flow table. We have 2 entries per port for RSS.
- *
- * The first performs a lookup using the C2 TCAM engine, to tag the
- * packet for software forwarding (needed for RSS), enable or disable RSS, and
- * assign the default rx queue.
- *
- * The second configures the hash generation, by specifying which fields of the
- * packet header are used to generate the hash, and specifies the relevant hash
- * engine to use.
- */
-#define MVPP22_RSS_FLOW_C2_OFFS 0
-#define MVPP22_RSS_FLOW_HASH_OFFS 1
-#define MVPP22_RSS_FLOW_SIZE (MVPP22_RSS_FLOW_HASH_OFFS + 1)
+/* Number of per-port dedicated entries in the C2 TCAM */
+#define MVPP22_CLS_C2_PORT_RANGE 8
-#define MVPP22_RSS_FLOW_C2(port) ((port) * MVPP22_RSS_FLOW_SIZE + \
- MVPP22_RSS_FLOW_C2_OFFS)
-#define MVPP22_RSS_FLOW_HASH(port) ((port) * MVPP22_RSS_FLOW_SIZE + \
- MVPP22_RSS_FLOW_HASH_OFFS)
-#define MVPP22_RSS_FLOW_FIRST(port) MVPP22_RSS_FLOW_C2(port)
+#define MVPP22_CLS_C2_PORT_FIRST(p) (MVPP22_CLS_C2_N_ENTRIES - \
+ ((p) * MVPP22_CLS_C2_PORT_RANGE))
+#define MVPP22_CLS_C2_RSS_ENTRY(p) (MVPP22_CLS_C2_PORT_FIRST(p) - 1)
/* Packet flow ID */
enum mvpp2_prs_flow {
MVPP2_FL_LAST,
};
+enum mvpp2_cls_lu_type {
+ MVPP2_CLS_LU_ALL = 0,
+};
+
+/* LU Type defined for all engines, and specified in the flow table */
+#define MVPP2_CLS_LU_TYPE_MASK 0x3f
+
+#define MVPP2_N_FLOWS (MVPP2_FL_LAST - MVPP2_FL_START)
+
struct mvpp2_cls_flow {
/* The L2-L4 traffic flow type */
int flow_type;
struct mvpp2_prs_result_info prs_ri;
};
-#define MVPP2_N_FLOWS 52
+#define MVPP2_CLS_FLT_ENTRIES_PER_FLOW (MVPP2_MAX_PORTS + 1)
+#define MVPP2_CLS_FLT_FIRST(id) (((id) - MVPP2_FL_START) * \
+ MVPP2_CLS_FLT_ENTRIES_PER_FLOW)
+#define MVPP2_CLS_FLT_C2_RSS_ENTRY(id) (MVPP2_CLS_FLT_FIRST(id))
+#define MVPP2_CLS_FLT_HASH_ENTRY(port, id) (MVPP2_CLS_FLT_C2_RSS_ENTRY(id) + (port) + 1)
+#define MVPP2_CLS_FLT_LAST(id) (MVPP2_CLS_FLT_FIRST(id) + \
+ MVPP2_CLS_FLT_ENTRIES_PER_FLOW - 1)
+
+/* Iterate on each classifier flow id. Sets 'i' to be the index of the first
+ * entry in the cls_flows table for each different flow_id.
+ * This relies on entries having the same flow_id in the cls_flows table being
+ * contiguous.
+ */
+#define for_each_cls_flow_id(i) \
+ for ((i) = 0; (i) < MVPP2_N_PRS_FLOWS; (i)++) \
+ if ((i) > 0 && \
+ cls_flows[(i)].flow_id == cls_flows[(i) - 1].flow_id) \
+ continue; \
+ else
+
+/* Iterate on each classifier flow that has a given flow_type. Sets 'i' to be
+ * the index of the first entry in the cls_flow table for each different flow_id
+ * that has the given flow_type. This allows to operate on all flows that
+ * matches a given ethtool flow type.
+ */
+#define for_each_cls_flow_id_with_type(i, type) \
+ for_each_cls_flow_id((i)) \
+ if (cls_flows[(i)].flow_type != (type)) \
+ continue; \
+ else
-#define MVPP2_ENTRIES_PER_FLOW (MVPP2_MAX_PORTS + 1)
-#define MVPP2_FLOW_C2_ENTRY(id) ((id) * MVPP2_ENTRIES_PER_FLOW)
-#define MVPP2_PORT_FLOW_HASH_ENTRY(port, id) ((id) * MVPP2_ENTRIES_PER_FLOW + \
- (port) + 1)
struct mvpp2_cls_flow_entry {
u32 index;
u32 data[MVPP2_CLS_FLOWS_TBL_DATA_WORDS];
};
void mvpp22_rss_fill_table(struct mvpp2_port *port, u32 table);
+void mvpp22_port_rss_init(struct mvpp2_port *port);
-void mvpp22_rss_port_init(struct mvpp2_port *port);
-
-void mvpp22_rss_enable(struct mvpp2_port *port);
-void mvpp22_rss_disable(struct mvpp2_port *port);
+void mvpp22_port_rss_enable(struct mvpp2_port *port);
+void mvpp22_port_rss_disable(struct mvpp2_port *port);
int mvpp2_ethtool_rxfh_get(struct mvpp2_port *port, struct ethtool_rxnfc *info);
int mvpp2_ethtool_rxfh_set(struct mvpp2_port *port, struct ethtool_rxnfc *info);
u16 mvpp2_flow_get_hek_fields(struct mvpp2_cls_flow_entry *fe);
-struct mvpp2_cls_flow *mvpp2_cls_flow_get(int flow);
+const struct mvpp2_cls_flow *mvpp2_cls_flow_get(int flow);
u32 mvpp2_cls_flow_hits(struct mvpp2 *priv, int index);
struct mvpp2 *priv;
};
+struct mvpp2_dbgfs_c2_entry {
+ int id;
+ struct mvpp2 *priv;
+};
+
struct mvpp2_dbgfs_flow_entry {
int flow;
struct mvpp2 *priv;
};
+struct mvpp2_dbgfs_flow_tbl_entry {
+ int id;
+ struct mvpp2 *priv;
+};
+
struct mvpp2_dbgfs_port_flow_entry {
struct mvpp2_port *port;
struct mvpp2_dbgfs_flow_entry *dbg_fe;
};
+struct mvpp2_dbgfs_entries {
+ /* Entries for Header Parser debug info */
+ struct mvpp2_dbgfs_prs_entry prs_entries[MVPP2_PRS_TCAM_SRAM_SIZE];
+
+ /* Entries for Classifier C2 engine debug info */
+ struct mvpp2_dbgfs_c2_entry c2_entries[MVPP22_CLS_C2_N_ENTRIES];
+
+ /* Entries for Classifier Flow Table debug info */
+ struct mvpp2_dbgfs_flow_tbl_entry flt_entries[MVPP2_CLS_FLOWS_TBL_SIZE];
+
+ /* Entries for Classifier flows debug info */
+ struct mvpp2_dbgfs_flow_entry flow_entries[MVPP2_N_PRS_FLOWS];
+
+ /* Entries for per-port flows debug info */
+ struct mvpp2_dbgfs_port_flow_entry port_flow_entries[MVPP2_MAX_PORTS];
+};
+
static int mvpp2_dbgfs_flow_flt_hits_show(struct seq_file *s, void *unused)
{
- struct mvpp2_dbgfs_flow_entry *entry = s->private;
- int id = MVPP2_FLOW_C2_ENTRY(entry->flow);
+ struct mvpp2_dbgfs_flow_tbl_entry *entry = s->private;
- u32 hits = mvpp2_cls_flow_hits(entry->priv, id);
+ u32 hits = mvpp2_cls_flow_hits(entry->priv, entry->id);
seq_printf(s, "%u\n", hits);
static int mvpp2_dbgfs_flow_type_show(struct seq_file *s, void *unused)
{
struct mvpp2_dbgfs_flow_entry *entry = s->private;
- struct mvpp2_cls_flow *f;
+ const struct mvpp2_cls_flow *f;
const char *flow_name;
f = mvpp2_cls_flow_get(entry->flow);
return 0;
}
-static int mvpp2_dbgfs_flow_type_open(struct inode *inode, struct file *file)
-{
- return single_open(file, mvpp2_dbgfs_flow_type_show, inode->i_private);
-}
-
-static int mvpp2_dbgfs_flow_type_release(struct inode *inode, struct file *file)
-{
- struct seq_file *seq = file->private_data;
- struct mvpp2_dbgfs_flow_entry *flow_entry = seq->private;
-
- kfree(flow_entry);
- return single_release(inode, file);
-}
-
-static const struct file_operations mvpp2_dbgfs_flow_type_fops = {
- .open = mvpp2_dbgfs_flow_type_open,
- .read = seq_read,
- .release = mvpp2_dbgfs_flow_type_release,
-};
+DEFINE_SHOW_ATTRIBUTE(mvpp2_dbgfs_flow_type);
static int mvpp2_dbgfs_flow_id_show(struct seq_file *s, void *unused)
{
- struct mvpp2_dbgfs_flow_entry *entry = s->private;
- struct mvpp2_cls_flow *f;
+ const struct mvpp2_dbgfs_flow_entry *entry = s->private;
+ const struct mvpp2_cls_flow *f;
f = mvpp2_cls_flow_get(entry->flow);
if (!f)
struct mvpp2_dbgfs_port_flow_entry *entry = s->private;
struct mvpp2_port *port = entry->port;
struct mvpp2_cls_flow_entry fe;
- struct mvpp2_cls_flow *f;
+ const struct mvpp2_cls_flow *f;
int flow_index;
u16 hash_opts;
if (!f)
return -EINVAL;
- flow_index = MVPP2_PORT_FLOW_HASH_ENTRY(entry->port->id, f->flow_id);
+ flow_index = MVPP2_CLS_FLT_HASH_ENTRY(entry->port->id, f->flow_id);
mvpp2_cls_flow_read(port->priv, flow_index, &fe);
return 0;
}
-static int mvpp2_dbgfs_port_flow_hash_opt_open(struct inode *inode,
- struct file *file)
-{
- return single_open(file, mvpp2_dbgfs_port_flow_hash_opt_show,
- inode->i_private);
-}
-
-static int mvpp2_dbgfs_port_flow_hash_opt_release(struct inode *inode,
- struct file *file)
-{
- struct seq_file *seq = file->private_data;
- struct mvpp2_dbgfs_port_flow_entry *flow_entry = seq->private;
-
- kfree(flow_entry);
- return single_release(inode, file);
-}
-
-static const struct file_operations mvpp2_dbgfs_port_flow_hash_opt_fops = {
- .open = mvpp2_dbgfs_port_flow_hash_opt_open,
- .read = seq_read,
- .release = mvpp2_dbgfs_port_flow_hash_opt_release,
-};
+DEFINE_SHOW_ATTRIBUTE(mvpp2_dbgfs_port_flow_hash_opt);
static int mvpp2_dbgfs_port_flow_engine_show(struct seq_file *s, void *unused)
{
struct mvpp2_dbgfs_port_flow_entry *entry = s->private;
struct mvpp2_port *port = entry->port;
struct mvpp2_cls_flow_entry fe;
- struct mvpp2_cls_flow *f;
+ const struct mvpp2_cls_flow *f;
int flow_index, engine;
f = mvpp2_cls_flow_get(entry->dbg_fe->flow);
if (!f)
return -EINVAL;
- flow_index = MVPP2_PORT_FLOW_HASH_ENTRY(entry->port->id, f->flow_id);
+ flow_index = MVPP2_CLS_FLT_HASH_ENTRY(entry->port->id, f->flow_id);
mvpp2_cls_flow_read(port->priv, flow_index, &fe);
static int mvpp2_dbgfs_flow_c2_hits_show(struct seq_file *s, void *unused)
{
- struct mvpp2_port *port = s->private;
+ struct mvpp2_dbgfs_c2_entry *entry = s->private;
u32 hits;
- hits = mvpp2_cls_c2_hit_count(port->priv,
- MVPP22_CLS_C2_RSS_ENTRY(port->id));
+ hits = mvpp2_cls_c2_hit_count(entry->priv, entry->id);
seq_printf(s, "%u\n", hits);
static int mvpp2_dbgfs_flow_c2_rxq_show(struct seq_file *s, void *unused)
{
- struct mvpp2_port *port = s->private;
+ struct mvpp2_dbgfs_c2_entry *entry = s->private;
struct mvpp2_cls_c2_entry c2;
u8 qh, ql;
- mvpp2_cls_c2_read(port->priv, MVPP22_CLS_C2_RSS_ENTRY(port->id), &c2);
+ mvpp2_cls_c2_read(entry->priv, entry->id, &c2);
qh = (c2.attr[0] >> MVPP22_CLS_C2_ATTR0_QHIGH_OFFS) &
MVPP22_CLS_C2_ATTR0_QHIGH_MASK;
static int mvpp2_dbgfs_flow_c2_enable_show(struct seq_file *s, void *unused)
{
- struct mvpp2_port *port = s->private;
+ struct mvpp2_dbgfs_c2_entry *entry = s->private;
struct mvpp2_cls_c2_entry c2;
int enabled;
- mvpp2_cls_c2_read(port->priv, MVPP22_CLS_C2_RSS_ENTRY(port->id), &c2);
+ mvpp2_cls_c2_read(entry->priv, entry->id, &c2);
enabled = !!(c2.attr[2] & MVPP22_CLS_C2_ATTR2_RSS_EN);
return 0;
}
-static int mvpp2_dbgfs_prs_valid_open(struct inode *inode, struct file *file)
-{
- return single_open(file, mvpp2_dbgfs_prs_valid_show, inode->i_private);
-}
-
-static int mvpp2_dbgfs_prs_valid_release(struct inode *inode, struct file *file)
-{
- struct seq_file *seq = file->private_data;
- struct mvpp2_dbgfs_prs_entry *entry = seq->private;
-
- kfree(entry);
- return single_release(inode, file);
-}
-
-static const struct file_operations mvpp2_dbgfs_prs_valid_fops = {
- .open = mvpp2_dbgfs_prs_valid_open,
- .read = seq_read,
- .release = mvpp2_dbgfs_prs_valid_release,
-};
+DEFINE_SHOW_ATTRIBUTE(mvpp2_dbgfs_prs_valid);
static int mvpp2_dbgfs_flow_port_init(struct dentry *parent,
struct mvpp2_port *port,
if (IS_ERR(port_dir))
return PTR_ERR(port_dir);
- /* This will be freed by 'hash_opts' release op */
- port_entry = kmalloc(sizeof(*port_entry), GFP_KERNEL);
- if (!port_entry)
- return -ENOMEM;
+ port_entry = &port->priv->dbgfs_entries->port_flow_entries[port->id];
port_entry->port = port;
port_entry->dbg_fe = entry;
if (!flow_entry_dir)
return -ENOMEM;
- /* This will be freed by 'type' release op */
- entry = kmalloc(sizeof(*entry), GFP_KERNEL);
- if (!entry)
- return -ENOMEM;
+ entry = &priv->dbgfs_entries->flow_entries[flow];
entry->flow = flow;
entry->priv = priv;
- debugfs_create_file("flow_hits", 0444, flow_entry_dir, entry,
- &mvpp2_dbgfs_flow_flt_hits_fops);
-
debugfs_create_file("dec_hits", 0444, flow_entry_dir, entry,
&mvpp2_dbgfs_flow_dec_hits_fops);
if (ret)
return ret;
}
+
return 0;
}
if (!flow_dir)
return -ENOMEM;
- for (i = 0; i < MVPP2_N_FLOWS; i++) {
+ for (i = 0; i < MVPP2_N_PRS_FLOWS; i++) {
ret = mvpp2_dbgfs_flow_entry_init(flow_dir, priv, i);
if (ret)
return ret;
if (!prs_entry_dir)
return -ENOMEM;
- /* The 'valid' entry's ops will free that */
- entry = kmalloc(sizeof(*entry), GFP_KERNEL);
- if (!entry)
- return -ENOMEM;
+ entry = &priv->dbgfs_entries->prs_entries[tid];
entry->tid = tid;
entry->priv = priv;
return 0;
}
+static int mvpp2_dbgfs_c2_entry_init(struct dentry *parent,
+ struct mvpp2 *priv, int id)
+{
+ struct mvpp2_dbgfs_c2_entry *entry;
+ struct dentry *c2_entry_dir;
+ char c2_entry_name[10];
+
+ if (id >= MVPP22_CLS_C2_N_ENTRIES)
+ return -EINVAL;
+
+ sprintf(c2_entry_name, "%03d", id);
+
+ c2_entry_dir = debugfs_create_dir(c2_entry_name, parent);
+ if (!c2_entry_dir)
+ return -ENOMEM;
+
+ entry = &priv->dbgfs_entries->c2_entries[id];
+
+ entry->id = id;
+ entry->priv = priv;
+
+ debugfs_create_file("hits", 0444, c2_entry_dir, entry,
+ &mvpp2_dbgfs_flow_c2_hits_fops);
+
+ debugfs_create_file("default_rxq", 0444, c2_entry_dir, entry,
+ &mvpp2_dbgfs_flow_c2_rxq_fops);
+
+ debugfs_create_file("rss_enable", 0444, c2_entry_dir, entry,
+ &mvpp2_dbgfs_flow_c2_enable_fops);
+
+ return 0;
+}
+
+static int mvpp2_dbgfs_flow_tbl_entry_init(struct dentry *parent,
+ struct mvpp2 *priv, int id)
+{
+ struct mvpp2_dbgfs_flow_tbl_entry *entry;
+ struct dentry *flow_tbl_entry_dir;
+ char flow_tbl_entry_name[10];
+
+ if (id >= MVPP2_CLS_FLOWS_TBL_SIZE)
+ return -EINVAL;
+
+ sprintf(flow_tbl_entry_name, "%03d", id);
+
+ flow_tbl_entry_dir = debugfs_create_dir(flow_tbl_entry_name, parent);
+ if (!flow_tbl_entry_dir)
+ return -ENOMEM;
+
+ entry = &priv->dbgfs_entries->flt_entries[id];
+
+ entry->id = id;
+ entry->priv = priv;
+
+ debugfs_create_file("hits", 0444, flow_tbl_entry_dir, entry,
+ &mvpp2_dbgfs_flow_flt_hits_fops);
+
+ return 0;
+}
+
+static int mvpp2_dbgfs_cls_init(struct dentry *parent, struct mvpp2 *priv)
+{
+ struct dentry *cls_dir, *c2_dir, *flow_tbl_dir;
+ int i, ret;
+
+ cls_dir = debugfs_create_dir("classifier", parent);
+ if (!cls_dir)
+ return -ENOMEM;
+
+ c2_dir = debugfs_create_dir("c2", cls_dir);
+ if (!c2_dir)
+ return -ENOMEM;
+
+ for (i = 0; i < MVPP22_CLS_C2_N_ENTRIES; i++) {
+ ret = mvpp2_dbgfs_c2_entry_init(c2_dir, priv, i);
+ if (ret)
+ return ret;
+ }
+
+ flow_tbl_dir = debugfs_create_dir("flow_table", cls_dir);
+ if (!flow_tbl_dir)
+ return -ENOMEM;
+
+ for (i = 0; i < MVPP2_CLS_FLOWS_TBL_SIZE; i++) {
+ ret = mvpp2_dbgfs_flow_tbl_entry_init(flow_tbl_dir, priv, i);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
static int mvpp2_dbgfs_port_init(struct dentry *parent,
struct mvpp2_port *port)
{
debugfs_create_file("vid_filter", 0444, port_dir, port,
&mvpp2_dbgfs_port_vid_fops);
- debugfs_create_file("c2_hits", 0444, port_dir, port,
- &mvpp2_dbgfs_flow_c2_hits_fops);
-
- debugfs_create_file("default_rxq", 0444, port_dir, port,
- &mvpp2_dbgfs_flow_c2_rxq_fops);
-
- debugfs_create_file("rss_enable", 0444, port_dir, port,
- &mvpp2_dbgfs_flow_c2_enable_fops);
-
return 0;
}
void mvpp2_dbgfs_cleanup(struct mvpp2 *priv)
{
debugfs_remove_recursive(priv->dbgfs_dir);
+
+ kfree(priv->dbgfs_entries);
}
void mvpp2_dbgfs_init(struct mvpp2 *priv, const char *name)
return;
priv->dbgfs_dir = mvpp2_dir;
+ priv->dbgfs_entries = kzalloc(sizeof(*priv->dbgfs_entries), GFP_KERNEL);
+ if (!priv->dbgfs_entries)
+ goto err;
ret = mvpp2_dbgfs_prs_init(mvpp2_dir, priv);
if (ret)
goto err;
+ ret = mvpp2_dbgfs_cls_init(mvpp2_dir, priv);
+ if (ret)
+ goto err;
+
for (i = 0; i < priv->port_count; i++) {
ret = mvpp2_dbgfs_port_init(mvpp2_dir, priv->port_list[i]);
if (ret)
if (changed & NETIF_F_RXHASH) {
if (features & NETIF_F_RXHASH)
- mvpp22_rss_enable(port);
+ mvpp22_port_rss_enable(port);
else
- mvpp22_rss_disable(port);
+ mvpp22_port_rss_disable(port);
}
return 0;
mvpp2_cls_port_config(port);
if (mvpp22_rss_is_supported())
- mvpp22_rss_port_init(port);
+ mvpp22_port_rss_init(port);
/* Provide an initial Rx packet size */
port->pkt_size = MVPP2_RX_PKT_SIZE(port->dev->mtu);
struct mvpp2_port *port;
struct mvpp2_port_pcpu *port_pcpu;
struct device_node *port_node = to_of_node(port_fwnode);
+ netdev_features_t features;
struct net_device *dev;
struct resource *res;
struct phylink *phylink;
unsigned long flags = 0;
bool has_tx_irqs;
u32 id;
- int features;
int phy_mode;
int err, i;
*/
wmb();
- if (netif_xmit_stopped(netdev_get_tx_queue(dev, 0)) || !skb->xmit_more)
+ if (netif_xmit_stopped(netdev_get_tx_queue(dev, 0)) ||
+ !netdev_xmit_more())
mtk_w32(eth, txd->txd2, MTK_QTX_CTX_PTR);
return 0;
config MLX4_CORE
tristate
depends on PCI
+ select NET_DEVLINK
default n
config MLX4_DEBUG
}
u16 mlx4_en_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
struct mlx4_en_priv *priv = netdev_priv(dev);
u16 rings_p_up = priv->num_tx_rings_p_up;
if (netdev_get_num_tc(dev))
- return fallback(dev, skb, NULL);
+ return netdev_pick_tx(dev, skb, NULL);
- return fallback(dev, skb, NULL) % rings_p_up;
+ return netdev_pick_tx(dev, skb, NULL) % rings_p_up;
}
static void mlx4_bf_copy(void __iomem *dst, const void *src,
send_doorbell = __netdev_tx_sent_queue(ring->tx_queue,
tx_info->nr_bytes,
- skb->xmit_more);
+ netdev_xmit_more());
real_size = (real_size / 16) & 0x3f;
void mlx4_en_tx_irq(struct mlx4_cq *mcq);
u16 mlx4_en_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback);
+ struct net_device *sb_dev);
netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev);
netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
struct mlx4_en_rx_alloc *frame,
config MLX5_CORE
tristate "Mellanox 5th generation network adapters (ConnectX series) core driver"
depends on PCI
+ select NET_DEVLINK
imply PTP_1588_CLOCK
imply VXLAN
default n
struct mlx5_cmd *cmd = &dev->cmd;
snprintf(cmd->wq_name, sizeof(cmd->wq_name), "mlx5_cmd_%s",
- dev_name(&dev->pdev->dev));
+ dev->priv.name);
}
static void clean_debug_files(struct mlx5_core_dev *dev)
memset(cmd, 0, sizeof(*cmd));
cmd_if_rev = cmdif_rev(dev);
if (cmd_if_rev != CMD_IF_REV) {
- dev_err(&dev->pdev->dev,
- "Driver cmdif rev(%d) differs from firmware's(%d)\n",
- CMD_IF_REV, cmd_if_rev);
+ mlx5_core_err(dev,
+ "Driver cmdif rev(%d) differs from firmware's(%d)\n",
+ CMD_IF_REV, cmd_if_rev);
return -EINVAL;
}
cmd->log_sz = cmd_l >> 4 & 0xf;
cmd->log_stride = cmd_l & 0xf;
if (1 << cmd->log_sz > MLX5_MAX_COMMANDS) {
- dev_err(&dev->pdev->dev, "firmware reports too many outstanding commands %d\n",
- 1 << cmd->log_sz);
+ mlx5_core_err(dev, "firmware reports too many outstanding commands %d\n",
+ 1 << cmd->log_sz);
err = -EINVAL;
goto err_free_page;
}
if (cmd->log_sz + cmd->log_stride > MLX5_ADAPTER_PAGE_SHIFT) {
- dev_err(&dev->pdev->dev, "command queue size overflow\n");
+ mlx5_core_err(dev, "command queue size overflow\n");
err = -EINVAL;
goto err_free_page;
}
cmd->cmdif_rev = ioread32be(&dev->iseg->cmdif_rev_fw_sub) >> 16;
if (cmd->cmdif_rev > CMD_IF_REV) {
- dev_err(&dev->pdev->dev, "driver does not support command interface version. driver %d, firmware %d\n",
- CMD_IF_REV, cmd->cmdif_rev);
+ mlx5_core_err(dev, "driver does not support command interface version. driver %d, firmware %d\n",
+ CMD_IF_REV, cmd->cmdif_rev);
err = -EOPNOTSUPP;
goto err_free_page;
}
cmd_h = (u32)((u64)(cmd->dma) >> 32);
cmd_l = (u32)(cmd->dma);
if (cmd_l & 0xfff) {
- dev_err(&dev->pdev->dev, "invalid command queue address\n");
+ mlx5_core_err(dev, "invalid command queue address\n");
err = -ENOMEM;
goto err_free_page;
}
set_wqname(dev);
cmd->wq = create_singlethread_workqueue(cmd->wq_name);
if (!cmd->wq) {
- dev_err(&dev->pdev->dev, "failed to create command workqueue\n");
+ mlx5_core_err(dev, "failed to create command workqueue\n");
err = -ENOMEM;
goto err_cache;
}
TP_ARGS(tracer, trace_timestamp, lost, event_id, msg),
TP_STRUCT__entry(
- __string(dev_name, dev_name(&tracer->dev->pdev->dev))
+ __string(dev_name, tracer->dev->priv.name)
__field(u64, trace_timestamp)
__field(bool, lost)
__field(u8, event_id)
),
TP_fast_assign(
- __assign_str(dev_name, dev_name(&tracer->dev->pdev->dev));
+ __assign_str(dev_name, tracer->dev->priv.name);
__entry->trace_timestamp = trace_timestamp;
__entry->lost = lost;
__entry->event_id = event_id;
struct net_dim_cq_moder rx_cq_moderation;
struct net_dim_cq_moder tx_cq_moderation;
bool lro_en;
- u32 lro_wqe_sz;
u8 tx_min_inline_mode;
bool vlan_strip_disable;
bool scatter_fcs_en;
void mlx5e_build_ptys2ethtool_map(void);
u16 mlx5e_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback);
+ struct net_device *sb_dev);
netdev_tx_t mlx5e_xmit(struct sk_buff *skb, struct net_device *dev);
netdev_tx_t mlx5e_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb,
- struct mlx5e_tx_wqe *wqe, u16 pi);
+ struct mlx5e_tx_wqe *wqe, u16 pi, bool xmit_more);
void mlx5e_completion_event(struct mlx5_core_cq *mcq);
void mlx5e_cq_error_event(struct mlx5_core_cq *mcq, enum mlx5_event event);
MLX5_CAP_FLOWTABLE_NIC_RX(mdev, ft_field_support.inner_ip_version));
}
+static inline bool mlx5_tx_swp_supported(struct mlx5_core_dev *mdev)
+{
+ return MLX5_CAP_ETH(mdev, swp) &&
+ MLX5_CAP_ETH(mdev, swp_csum) && MLX5_CAP_ETH(mdev, swp_lso);
+}
+
+struct mlx5e_swp_spec {
+ __be16 l3_proto;
+ u8 l4_proto;
+ u8 is_tun;
+ __be16 tun_l3_proto;
+ u8 tun_l4_proto;
+};
+
+static inline void
+mlx5e_set_eseg_swp(struct sk_buff *skb, struct mlx5_wqe_eth_seg *eseg,
+ struct mlx5e_swp_spec *swp_spec)
+{
+ /* SWP offsets are in 2-bytes words */
+ eseg->swp_outer_l3_offset = skb_network_offset(skb) / 2;
+ if (swp_spec->l3_proto == htons(ETH_P_IPV6))
+ eseg->swp_flags |= MLX5_ETH_WQE_SWP_OUTER_L3_IPV6;
+ if (swp_spec->l4_proto) {
+ eseg->swp_outer_l4_offset = skb_transport_offset(skb) / 2;
+ if (swp_spec->l4_proto == IPPROTO_UDP)
+ eseg->swp_flags |= MLX5_ETH_WQE_SWP_OUTER_L4_UDP;
+ }
+
+ if (swp_spec->is_tun) {
+ eseg->swp_inner_l3_offset = skb_inner_network_offset(skb) / 2;
+ if (swp_spec->tun_l3_proto == htons(ETH_P_IPV6))
+ eseg->swp_flags |= MLX5_ETH_WQE_SWP_INNER_L3_IPV6;
+ } else { /* typically for ipsec when xfrm mode != XFRM_MODE_TUNNEL */
+ eseg->swp_inner_l3_offset = skb_network_offset(skb) / 2;
+ if (swp_spec->l3_proto == htons(ETH_P_IPV6))
+ eseg->swp_flags |= MLX5_ETH_WQE_SWP_INNER_L3_IPV6;
+ }
+ switch (swp_spec->tun_l4_proto) {
+ case IPPROTO_UDP:
+ eseg->swp_flags |= MLX5_ETH_WQE_SWP_INNER_L4_UDP;
+ /* fall through */
+ case IPPROTO_TCP:
+ eseg->swp_inner_l4_offset = skb_inner_transport_offset(skb) / 2;
+ break;
+ }
+}
+
static inline void mlx5e_sq_fetch_wqe(struct mlx5e_txqsq *sq,
struct mlx5e_tx_wqe **wqe,
u16 *pi)
*/
wmb();
- mlx5_write64((__be32 *)ctrl, uar_map, NULL);
+ mlx5_write64((__be32 *)ctrl, uar_map);
}
static inline void mlx5e_cq_arm(struct mlx5e_cq *cq)
int mlx5e_attach_netdev(struct mlx5e_priv *priv);
void mlx5e_detach_netdev(struct mlx5e_priv *priv);
void mlx5e_destroy_netdev(struct mlx5e_priv *priv);
+void mlx5e_set_netdev_mtu_boundaries(struct mlx5e_priv *priv);
void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
struct mlx5e_rss_params *rss_params,
struct mlx5e_params *params,
}
/**
- * update_buffer_lossy()
- * max_mtu: netdev's max_mtu
- * pfc_en: <input> current pfc configuration
- * buffer: <input> current prio to buffer mapping
- * xoff: <input> xoff value
- * port_buffer: <output> port receive buffer configuration
- * change: <output>
+ * update_buffer_lossy - Update buffer configuration based on pfc
+ * @max_mtu: netdev's max_mtu
+ * @pfc_en: <input> current pfc configuration
+ * @buffer: <input> current prio to buffer mapping
+ * @xoff: <input> xoff value
+ * @port_buffer: <output> port receive buffer configuration
+ * @change: <output>
*
- * Update buffer configuration based on pfc configuraiton and priority
- * to buffer mapping.
- * Buffer's lossy bit is changed to:
- * lossless if there is at least one PFC enabled priority mapped to this buffer
- * lossy if all priorities mapped to this buffer are PFC disabled
+ * Update buffer configuration based on pfc configuraiton and
+ * priority to buffer mapping.
+ * Buffer's lossy bit is changed to:
+ * lossless if there is at least one PFC enabled priority
+ * mapped to this buffer lossy if all priorities mapped to
+ * this buffer are PFC disabled
*
- * Return:
- * Return 0 if no error.
- * Set change to true if buffer configuration is modified.
+ * @return: 0 if no error,
+ * sets change to true if buffer configuration was modified.
*/
static int update_buffer_lossy(unsigned int max_mtu,
u8 pfc_en, u8 *buffer, u32 xoff,
if (ret)
return ret;
- if (mlx5_lag_is_multipath(mdev) && !rt->rt_gateway)
+ if (mlx5_lag_is_multipath(mdev) && rt->rt_gw_family != AF_INET)
return -ENETUNREACH;
#else
return -EOPNOTSUPP;
if (dev->rtnl_link_ops)
return dev->rtnl_link_ops->kind;
else
- return "";
+ return "unknown";
}
static int mlx5e_route_lookup_ipv6(struct mlx5e_priv *priv,
headers_c, headers_v);
} else {
netdev_warn(priv->netdev,
- "decapsulation offload is not supported for %s net device (%d)\n",
- mlx5e_netdev_kind(filter_dev), tunnel_type);
+ "decapsulation offload is not supported for %s (kind: \"%s\")\n",
+ netdev_name(filter_dev),
+ mlx5e_netdev_kind(filter_dev));
+
return -EOPNOTSUPP;
}
return err;
#include "en_accel/tls_rxtx.h"
#include "en.h"
+#if IS_ENABLED(CONFIG_GENEVE)
+static inline bool mlx5_geneve_tx_allowed(struct mlx5_core_dev *mdev)
+{
+ return mlx5_tx_swp_supported(mdev);
+}
+
+static inline void
+mlx5e_tx_tunnel_accel(struct sk_buff *skb, struct mlx5_wqe_eth_seg *eseg)
+{
+ struct mlx5e_swp_spec swp_spec = {};
+ unsigned int offset = 0;
+ __be16 l3_proto;
+ u8 l4_proto;
+
+ l3_proto = vlan_get_protocol(skb);
+ switch (l3_proto) {
+ case htons(ETH_P_IP):
+ l4_proto = ip_hdr(skb)->protocol;
+ break;
+ case htons(ETH_P_IPV6):
+ l4_proto = ipv6_find_hdr(skb, &offset, -1, NULL, NULL);
+ break;
+ default:
+ return;
+ }
+
+ if (l4_proto != IPPROTO_UDP ||
+ udp_hdr(skb)->dest != cpu_to_be16(GENEVE_UDP_PORT))
+ return;
+ swp_spec.l3_proto = l3_proto;
+ swp_spec.l4_proto = l4_proto;
+ swp_spec.is_tun = true;
+ if (inner_ip_hdr(skb)->version == 6) {
+ swp_spec.tun_l3_proto = htons(ETH_P_IPV6);
+ swp_spec.tun_l4_proto = inner_ipv6_hdr(skb)->nexthdr;
+ } else {
+ swp_spec.tun_l3_proto = htons(ETH_P_IP);
+ swp_spec.tun_l4_proto = inner_ip_hdr(skb)->protocol;
+ }
+
+ mlx5e_set_eseg_swp(skb, eseg, &swp_spec);
+}
+
+#else
+static inline bool mlx5_geneve_tx_allowed(struct mlx5_core_dev *mdev)
+{
+ return false;
+}
+
+#endif /* CONFIG_GENEVE */
+
static inline void
mlx5e_udp_gso_handle_tx_skb(struct sk_buff *skb)
{
struct mlx5_wqe_eth_seg *eseg, u8 mode,
struct xfrm_offload *xo)
{
- u8 proto;
+ struct mlx5e_swp_spec swp_spec = {};
/* Tunnel Mode:
* SWP: OutL3 InL3 InL4
* SWP: OutL3 InL4
* InL3
* Pkt: MAC IP ESP L4
- *
- * Offsets are in 2-byte words, counting from start of frame
*/
- eseg->swp_outer_l3_offset = skb_network_offset(skb) / 2;
- if (skb->protocol == htons(ETH_P_IPV6))
- eseg->swp_flags |= MLX5_ETH_WQE_SWP_OUTER_L3_IPV6;
-
- if (mode == XFRM_MODE_TUNNEL) {
- eseg->swp_inner_l3_offset = skb_inner_network_offset(skb) / 2;
+ swp_spec.l3_proto = skb->protocol;
+ swp_spec.is_tun = mode == XFRM_MODE_TUNNEL;
+ if (swp_spec.is_tun) {
if (xo->proto == IPPROTO_IPV6) {
- eseg->swp_flags |= MLX5_ETH_WQE_SWP_INNER_L3_IPV6;
- proto = inner_ipv6_hdr(skb)->nexthdr;
+ swp_spec.tun_l3_proto = htons(ETH_P_IPV6);
+ swp_spec.tun_l4_proto = inner_ipv6_hdr(skb)->nexthdr;
} else {
- proto = inner_ip_hdr(skb)->protocol;
+ swp_spec.tun_l3_proto = htons(ETH_P_IP);
+ swp_spec.tun_l4_proto = inner_ip_hdr(skb)->protocol;
}
} else {
- eseg->swp_inner_l3_offset = skb_network_offset(skb) / 2;
- if (skb->protocol == htons(ETH_P_IPV6))
- eseg->swp_flags |= MLX5_ETH_WQE_SWP_INNER_L3_IPV6;
- proto = xo->proto;
- }
- switch (proto) {
- case IPPROTO_UDP:
- eseg->swp_flags |= MLX5_ETH_WQE_SWP_INNER_L4_UDP;
- /* Fall through */
- case IPPROTO_TCP:
- eseg->swp_inner_l4_offset = skb_inner_transport_offset(skb) / 2;
- break;
+ swp_spec.tun_l3_proto = skb->protocol;
+ swp_spec.tun_l4_proto = xo->proto;
}
+
+ mlx5e_set_eseg_swp(skb, eseg, &swp_spec);
}
void mlx5e_ipsec_set_iv_esn(struct sk_buff *skb, struct xfrm_state *x,
*/
nskb->ip_summed = CHECKSUM_PARTIAL;
- nskb->xmit_more = 1;
nskb->queue_mapping = skb->queue_mapping;
}
sq->stats->tls_resync_bytes += nskb->len;
mlx5e_tls_complete_sync_skb(skb, nskb, tcp_seq, headln,
cpu_to_be64(info.rcd_sn));
- mlx5e_sq_xmit(sq, nskb, *wqe, *pi);
+ mlx5e_sq_xmit(sq, nskb, *wqe, *pi, true);
mlx5e_sq_fetch_wqe(sq, wqe, pi);
return skb;
#include <net/pkt_cls.h>
#include <linux/mlx5/fs.h>
#include <net/vxlan.h>
+#include <net/geneve.h>
#include <linux/bpf.h>
#include <linux/if_bridge.h>
#include <net/page_pool.h>
#include "en_rep.h"
#include "en_accel/ipsec.h"
#include "en_accel/ipsec_rxtx.h"
+#include "en_accel/en_accel.h"
#include "en_accel/tls.h"
#include "accel/ipsec.h"
#include "accel/tls.h"
void mlx5e_init_rq_type_params(struct mlx5_core_dev *mdev,
struct mlx5e_params *params)
{
- params->lro_wqe_sz = MLX5E_PARAMS_DEFAULT_LRO_WQE_SZ;
params->log_rq_mtu_frames = is_kdump_kernel() ?
MLX5E_PARAMS_MINIMUM_LOG_RQ_SIZE :
MLX5E_PARAMS_DEFAULT_LOG_RQ_SIZE;
{
void *sqc = param->sqc;
void *wq = MLX5_ADDR_OF(sqc, sqc, wq);
+ bool allow_swp;
+ allow_swp = mlx5_geneve_tx_allowed(priv->mdev) ||
+ !!MLX5_IPSEC_DEV(priv->mdev);
mlx5e_build_sq_param_common(priv, param);
MLX5_SET(wq, wq, log_wq_sz, params->log_sq_size);
- MLX5_SET(sqc, sqc, allow_swp, !!MLX5_IPSEC_DEV(priv->mdev));
+ MLX5_SET(sqc, sqc, allow_swp, allow_swp);
}
static void mlx5e_build_common_cq_param(struct mlx5e_priv *priv,
MLX5_TIRC_LRO_ENABLE_MASK_IPV4_LRO |
MLX5_TIRC_LRO_ENABLE_MASK_IPV6_LRO);
MLX5_SET(tirc, tirc, lro_max_ip_payload_size,
- (params->lro_wqe_sz - ROUGH_MAX_L2_L3_HDR_SZ) >> 8);
+ (MLX5E_PARAMS_DEFAULT_LRO_WQE_SZ - ROUGH_MAX_L2_L3_HDR_SZ) >> 8);
MLX5_SET(tirc, tirc, lro_timeout_period_usecs, params->lro_timeout);
}
return 0;
}
+void mlx5e_set_netdev_mtu_boundaries(struct mlx5e_priv *priv)
+{
+ struct mlx5e_params *params = &priv->channels.params;
+ struct net_device *netdev = priv->netdev;
+ struct mlx5_core_dev *mdev = priv->mdev;
+ u16 max_mtu;
+
+ /* MTU range: 68 - hw-specific max */
+ netdev->min_mtu = ETH_MIN_MTU;
+
+ mlx5_query_port_max_mtu(mdev, &max_mtu, 1);
+ netdev->max_mtu = min_t(unsigned int, MLX5E_HW2SW_MTU(params, max_mtu),
+ ETH_MAX_MTU);
+}
+
static void mlx5e_netdev_set_tcs(struct net_device *netdev)
{
struct mlx5e_priv *priv = netdev_priv(netdev);
/* Verify if UDP port is being offloaded by HW */
if (mlx5_vxlan_lookup_port(priv->mdev->vxlan, port))
return features;
+
+#if IS_ENABLED(CONFIG_GENEVE)
+ /* Support Geneve offload for default UDP port */
+ if (port == GENEVE_UDP_PORT && mlx5_geneve_tx_allowed(priv->mdev))
+ return features;
+#endif
}
out:
netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER;
netdev->hw_features |= NETIF_F_HW_VLAN_STAG_TX;
- if (mlx5_vxlan_allowed(mdev->vxlan) || MLX5_CAP_ETH(mdev, tunnel_stateless_gre)) {
+ if (mlx5_vxlan_allowed(mdev->vxlan) || mlx5_geneve_tx_allowed(mdev) ||
+ MLX5_CAP_ETH(mdev, tunnel_stateless_gre)) {
netdev->hw_enc_features |= NETIF_F_IP_CSUM;
netdev->hw_enc_features |= NETIF_F_IPV6_CSUM;
netdev->hw_enc_features |= NETIF_F_TSO;
netdev->hw_enc_features |= NETIF_F_GSO_PARTIAL;
}
- if (mlx5_vxlan_allowed(mdev->vxlan)) {
+ if (mlx5_vxlan_allowed(mdev->vxlan) || mlx5_geneve_tx_allowed(mdev)) {
netdev->hw_features |= NETIF_F_GSO_UDP_TUNNEL |
NETIF_F_GSO_UDP_TUNNEL_CSUM;
netdev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL |
{
struct net_device *netdev = priv->netdev;
struct mlx5_core_dev *mdev = priv->mdev;
- u16 max_mtu;
mlx5e_init_l2_addr(priv);
if (!netif_running(netdev))
mlx5_set_port_admin_status(mdev, MLX5_PORT_DOWN);
- /* MTU range: 68 - hw-specific max */
- netdev->min_mtu = ETH_MIN_MTU;
- mlx5_query_port_max_mtu(priv->mdev, &max_mtu, 1);
- netdev->max_mtu = MLX5E_HW2SW_MTU(&priv->channels.params, max_mtu);
+ mlx5e_set_netdev_mtu_boundaries(priv);
mlx5e_set_dev_port_mtu(priv);
mlx5_lag_add(mdev, netdev);
struct mlx5e_priv *priv = netdev_priv(rpriv->netdev);
struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
- if (!mlx5e_tc_tun_device_to_offload(priv, netdev))
+ if (!mlx5e_tc_tun_device_to_offload(priv, netdev) &&
+ !is_vlan_dev(netdev))
return NOTIFY_OK;
switch (event) {
static void mlx5e_vf_rep_enable(struct mlx5e_priv *priv)
{
- struct net_device *netdev = priv->netdev;
- struct mlx5_core_dev *mdev = priv->mdev;
- u16 max_mtu;
-
- netdev->min_mtu = ETH_MIN_MTU;
- mlx5_query_port_max_mtu(mdev, &max_mtu, 1);
- netdev->max_mtu = MLX5E_HW2SW_MTU(&priv->channels.params, max_mtu);
+ mlx5e_set_netdev_mtu_boundaries(priv);
}
static int uplink_rep_async_event(struct notifier_block *nb, unsigned long event, void *data)
#include <net/tc_act/tc_pedit.h>
#include <net/tc_act/tc_csum.h>
#include <net/arp.h>
+#include <net/ipv6_stubs.h>
#include "en.h"
#include "en_rep.h"
#include "en_tc.h"
return 0;
}
+static void *get_match_headers_criteria(u32 flags,
+ struct mlx5_flow_spec *spec)
+{
+ return (flags & MLX5_FLOW_CONTEXT_ACTION_DECAP) ?
+ MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
+ inner_headers) :
+ MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
+ outer_headers);
+}
+
+static void *get_match_headers_value(u32 flags,
+ struct mlx5_flow_spec *spec)
+{
+ return (flags & MLX5_FLOW_CONTEXT_ACTION_DECAP) ?
+ MLX5_ADDR_OF(fte_match_param, spec->match_value,
+ inner_headers) :
+ MLX5_ADDR_OF(fte_match_param, spec->match_value,
+ outer_headers);
+}
+
static int __parse_cls_flower(struct mlx5e_priv *priv,
struct mlx5_flow_spec *spec,
struct tc_cls_flower_offload *f,
/* In decap flow, header pointers should point to the inner
* headers, outer header were already set by parse_tunnel_attr
*/
- headers_c = MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
- inner_headers);
- headers_v = MLX5_ADDR_OF(fte_match_param, spec->match_value,
- inner_headers);
+ headers_c = get_match_headers_criteria(MLX5_FLOW_CONTEXT_ACTION_DECAP,
+ spec);
+ headers_v = get_match_headers_value(MLX5_FLOW_CONTEXT_ACTION_DECAP,
+ spec);
}
if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_BASIC)) {
if (match.mask->n_proto)
*match_level = MLX5_MATCH_L2;
}
-
- if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_VLAN)) {
+ if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_VLAN) ||
+ is_vlan_dev(filter_dev)) {
+ struct flow_dissector_key_vlan filter_dev_mask;
+ struct flow_dissector_key_vlan filter_dev_key;
struct flow_match_vlan match;
- flow_rule_match_vlan(rule, &match);
+ if (is_vlan_dev(filter_dev)) {
+ match.key = &filter_dev_key;
+ match.key->vlan_id = vlan_dev_vlan_id(filter_dev);
+ match.key->vlan_tpid = vlan_dev_vlan_proto(filter_dev);
+ match.key->vlan_priority = 0;
+ match.mask = &filter_dev_mask;
+ memset(match.mask, 0xff, sizeof(*match.mask));
+ match.mask->vlan_priority = 0;
+ } else {
+ flow_rule_match_vlan(rule, &match);
+ }
if (match.mask->vlan_id ||
match.mask->vlan_priority ||
match.mask->vlan_tpid) {
struct pedit_headers {
struct ethhdr eth;
+ struct vlan_hdr vlan;
struct iphdr ip4;
struct ipv6hdr ip6;
struct tcphdr tcp;
u8 field;
u8 size;
u32 offset;
+ u32 match_offset;
};
-#define OFFLOAD(fw_field, size, field, off) \
- {MLX5_ACTION_IN_FIELD_OUT_ ## fw_field, size, offsetof(struct pedit_headers, field) + (off)}
+#define OFFLOAD(fw_field, size, field, off, match_field) \
+ {MLX5_ACTION_IN_FIELD_OUT_ ## fw_field, size, \
+ offsetof(struct pedit_headers, field) + (off), \
+ MLX5_BYTE_OFF(fte_match_set_lyr_2_4, match_field)}
+
+static bool cmp_val_mask(void *valp, void *maskp, void *matchvalp,
+ void *matchmaskp, int size)
+{
+ bool same = false;
+
+ switch (size) {
+ case sizeof(u8):
+ same = ((*(u8 *)valp) & (*(u8 *)maskp)) ==
+ ((*(u8 *)matchvalp) & (*(u8 *)matchmaskp));
+ break;
+ case sizeof(u16):
+ same = ((*(u16 *)valp) & (*(u16 *)maskp)) ==
+ ((*(u16 *)matchvalp) & (*(u16 *)matchmaskp));
+ break;
+ case sizeof(u32):
+ same = ((*(u32 *)valp) & (*(u32 *)maskp)) ==
+ ((*(u32 *)matchvalp) & (*(u32 *)matchmaskp));
+ break;
+ }
+
+ return same;
+}
static struct mlx5_fields fields[] = {
- OFFLOAD(DMAC_47_16, 4, eth.h_dest[0], 0),
- OFFLOAD(DMAC_15_0, 2, eth.h_dest[4], 0),
- OFFLOAD(SMAC_47_16, 4, eth.h_source[0], 0),
- OFFLOAD(SMAC_15_0, 2, eth.h_source[4], 0),
- OFFLOAD(ETHERTYPE, 2, eth.h_proto, 0),
-
- OFFLOAD(IP_TTL, 1, ip4.ttl, 0),
- OFFLOAD(SIPV4, 4, ip4.saddr, 0),
- OFFLOAD(DIPV4, 4, ip4.daddr, 0),
-
- OFFLOAD(SIPV6_127_96, 4, ip6.saddr.s6_addr32[0], 0),
- OFFLOAD(SIPV6_95_64, 4, ip6.saddr.s6_addr32[1], 0),
- OFFLOAD(SIPV6_63_32, 4, ip6.saddr.s6_addr32[2], 0),
- OFFLOAD(SIPV6_31_0, 4, ip6.saddr.s6_addr32[3], 0),
- OFFLOAD(DIPV6_127_96, 4, ip6.daddr.s6_addr32[0], 0),
- OFFLOAD(DIPV6_95_64, 4, ip6.daddr.s6_addr32[1], 0),
- OFFLOAD(DIPV6_63_32, 4, ip6.daddr.s6_addr32[2], 0),
- OFFLOAD(DIPV6_31_0, 4, ip6.daddr.s6_addr32[3], 0),
- OFFLOAD(IPV6_HOPLIMIT, 1, ip6.hop_limit, 0),
-
- OFFLOAD(TCP_SPORT, 2, tcp.source, 0),
- OFFLOAD(TCP_DPORT, 2, tcp.dest, 0),
- OFFLOAD(TCP_FLAGS, 1, tcp.ack_seq, 5),
-
- OFFLOAD(UDP_SPORT, 2, udp.source, 0),
- OFFLOAD(UDP_DPORT, 2, udp.dest, 0),
+ OFFLOAD(DMAC_47_16, 4, eth.h_dest[0], 0, dmac_47_16),
+ OFFLOAD(DMAC_15_0, 2, eth.h_dest[4], 0, dmac_15_0),
+ OFFLOAD(SMAC_47_16, 4, eth.h_source[0], 0, smac_47_16),
+ OFFLOAD(SMAC_15_0, 2, eth.h_source[4], 0, smac_15_0),
+ OFFLOAD(ETHERTYPE, 2, eth.h_proto, 0, ethertype),
+ OFFLOAD(FIRST_VID, 2, vlan.h_vlan_TCI, 0, first_vid),
+
+ OFFLOAD(IP_TTL, 1, ip4.ttl, 0, ttl_hoplimit),
+ OFFLOAD(SIPV4, 4, ip4.saddr, 0, src_ipv4_src_ipv6.ipv4_layout.ipv4),
+ OFFLOAD(DIPV4, 4, ip4.daddr, 0, dst_ipv4_dst_ipv6.ipv4_layout.ipv4),
+
+ OFFLOAD(SIPV6_127_96, 4, ip6.saddr.s6_addr32[0], 0,
+ src_ipv4_src_ipv6.ipv6_layout.ipv6[0]),
+ OFFLOAD(SIPV6_95_64, 4, ip6.saddr.s6_addr32[1], 0,
+ src_ipv4_src_ipv6.ipv6_layout.ipv6[4]),
+ OFFLOAD(SIPV6_63_32, 4, ip6.saddr.s6_addr32[2], 0,
+ src_ipv4_src_ipv6.ipv6_layout.ipv6[8]),
+ OFFLOAD(SIPV6_31_0, 4, ip6.saddr.s6_addr32[3], 0,
+ src_ipv4_src_ipv6.ipv6_layout.ipv6[12]),
+ OFFLOAD(DIPV6_127_96, 4, ip6.daddr.s6_addr32[0], 0,
+ dst_ipv4_dst_ipv6.ipv6_layout.ipv6[0]),
+ OFFLOAD(DIPV6_95_64, 4, ip6.daddr.s6_addr32[1], 0,
+ dst_ipv4_dst_ipv6.ipv6_layout.ipv6[4]),
+ OFFLOAD(DIPV6_63_32, 4, ip6.daddr.s6_addr32[2], 0,
+ dst_ipv4_dst_ipv6.ipv6_layout.ipv6[8]),
+ OFFLOAD(DIPV6_31_0, 4, ip6.daddr.s6_addr32[3], 0,
+ dst_ipv4_dst_ipv6.ipv6_layout.ipv6[12]),
+ OFFLOAD(IPV6_HOPLIMIT, 1, ip6.hop_limit, 0, ttl_hoplimit),
+
+ OFFLOAD(TCP_SPORT, 2, tcp.source, 0, tcp_sport),
+ OFFLOAD(TCP_DPORT, 2, tcp.dest, 0, tcp_dport),
+ OFFLOAD(TCP_FLAGS, 1, tcp.ack_seq, 5, tcp_flags),
+
+ OFFLOAD(UDP_SPORT, 2, udp.source, 0, udp_sport),
+ OFFLOAD(UDP_DPORT, 2, udp.dest, 0, udp_dport),
};
/* On input attr->max_mod_hdr_actions tells how many HW actions can be parsed at
*/
static int offload_pedit_fields(struct pedit_headers_action *hdrs,
struct mlx5e_tc_flow_parse_attr *parse_attr,
+ u32 *action_flags,
struct netlink_ext_ack *extack)
{
struct pedit_headers *set_masks, *add_masks, *set_vals, *add_vals;
+ void *headers_c = get_match_headers_criteria(*action_flags,
+ &parse_attr->spec);
+ void *headers_v = get_match_headers_value(*action_flags,
+ &parse_attr->spec);
int i, action_size, nactions, max_actions, first, last, next_z;
void *s_masks_p, *a_masks_p, *vals_p;
struct mlx5_fields *f;
nactions = parse_attr->num_mod_hdr_actions;
for (i = 0; i < ARRAY_SIZE(fields); i++) {
+ bool skip;
+
f = &fields[i];
/* avoid seeing bits set from previous iterations */
s_mask = 0;
return -EOPNOTSUPP;
}
+ skip = false;
if (s_mask) {
+ void *match_mask = headers_c + f->match_offset;
+ void *match_val = headers_v + f->match_offset;
+
cmd = MLX5_ACTION_TYPE_SET;
mask = s_mask;
vals_p = (void *)set_vals + f->offset;
+ /* don't rewrite if we have a match on the same value */
+ if (cmp_val_mask(vals_p, s_masks_p, match_val,
+ match_mask, f->size))
+ skip = true;
/* clear to denote we consumed this field */
memset(s_masks_p, 0, f->size);
} else {
+ u32 zero = 0;
+
cmd = MLX5_ACTION_TYPE_ADD;
mask = a_mask;
vals_p = (void *)add_vals + f->offset;
+ /* add 0 is no change */
+ if (!memcmp(vals_p, &zero, f->size))
+ skip = true;
/* clear to denote we consumed this field */
memset(a_masks_p, 0, f->size);
}
+ if (skip)
+ continue;
field_bsize = f->size * BITS_PER_BYTE;
return 0;
}
+static int mlx5e_flow_namespace_max_modify_action(struct mlx5_core_dev *mdev,
+ int namespace)
+{
+ if (namespace == MLX5_FLOW_NAMESPACE_FDB) /* FDB offloading */
+ return MLX5_CAP_ESW_FLOWTABLE_FDB(mdev, max_modify_header_actions);
+ else /* namespace is MLX5_FLOW_NAMESPACE_KERNEL - NIC offloading */
+ return MLX5_CAP_FLOWTABLE_NIC_RX(mdev, max_modify_header_actions);
+}
+
static int alloc_mod_hdr_actions(struct mlx5e_priv *priv,
struct pedit_headers_action *hdrs,
int namespace,
hdrs[TCA_PEDIT_KEY_EX_CMD_ADD].pedits;
action_size = MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto);
- if (namespace == MLX5_FLOW_NAMESPACE_FDB) /* FDB offloading */
- max_actions = MLX5_CAP_ESW_FLOWTABLE_FDB(priv->mdev, max_modify_header_actions);
- else /* namespace is MLX5_FLOW_NAMESPACE_KERNEL - NIC offloading */
- max_actions = MLX5_CAP_FLOWTABLE_NIC_RX(priv->mdev, max_modify_header_actions);
-
+ max_actions = mlx5e_flow_namespace_max_modify_action(priv->mdev, namespace);
/* can get up to crazingly 16 HW actions in 32 bits pedit SW key */
max_actions = min(max_actions, nkeys * 16);
goto out_err;
}
+ if (!mlx5e_flow_namespace_max_modify_action(priv->mdev, namespace)) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "The pedit offload action is not supported");
+ goto out_err;
+ }
+
mask = act->mangle.mask;
val = act->mangle.val;
offset = act->mangle.offset;
static int alloc_tc_pedit_action(struct mlx5e_priv *priv, int namespace,
struct mlx5e_tc_flow_parse_attr *parse_attr,
struct pedit_headers_action *hdrs,
+ u32 *action_flags,
struct netlink_ext_ack *extack)
{
struct pedit_headers *cmd_masks;
goto out_err;
}
- err = offload_pedit_fields(hdrs, parse_attr, extack);
+ err = offload_pedit_fields(hdrs, parse_attr, action_flags, extack);
if (err < 0)
goto out_dealloc_parsed_actions;
u8 ip_proto;
int i;
- if (actions & MLX5_FLOW_CONTEXT_ACTION_DECAP)
- headers_v = MLX5_ADDR_OF(fte_match_param, spec->match_value, inner_headers);
- else
- headers_v = MLX5_ADDR_OF(fte_match_param, spec->match_value, outer_headers);
-
+ headers_v = get_match_headers_value(actions, spec);
ethertype = MLX5_GET(fte_match_set_lyr_2_4, headers_v, ethertype);
/* for non-IP we only re-write MACs, so we're okay */
actions = flow->nic_attr->action;
if (flow->flags & MLX5E_TC_FLOW_EGRESS &&
- !(actions & MLX5_FLOW_CONTEXT_ACTION_DECAP))
+ !((actions & MLX5_FLOW_CONTEXT_ACTION_DECAP) ||
+ (actions & MLX5_FLOW_CONTEXT_ACTION_VLAN_POP)))
return false;
if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
return (fsystem_guid == psystem_guid);
}
+static int add_vlan_rewrite_action(struct mlx5e_priv *priv, int namespace,
+ const struct flow_action_entry *act,
+ struct mlx5e_tc_flow_parse_attr *parse_attr,
+ struct pedit_headers_action *hdrs,
+ u32 *action, struct netlink_ext_ack *extack)
+{
+ u16 mask16 = VLAN_VID_MASK;
+ u16 val16 = act->vlan.vid & VLAN_VID_MASK;
+ const struct flow_action_entry pedit_act = {
+ .id = FLOW_ACTION_MANGLE,
+ .mangle.htype = FLOW_ACT_MANGLE_HDR_TYPE_ETH,
+ .mangle.offset = offsetof(struct vlan_ethhdr, h_vlan_TCI),
+ .mangle.mask = ~(u32)be16_to_cpu(*(__be16 *)&mask16),
+ .mangle.val = (u32)be16_to_cpu(*(__be16 *)&val16),
+ };
+ u8 match_prio_mask, match_prio_val;
+ void *headers_c, *headers_v;
+ int err;
+
+ headers_c = get_match_headers_criteria(*action, &parse_attr->spec);
+ headers_v = get_match_headers_value(*action, &parse_attr->spec);
+
+ if (!(MLX5_GET(fte_match_set_lyr_2_4, headers_c, cvlan_tag) &&
+ MLX5_GET(fte_match_set_lyr_2_4, headers_v, cvlan_tag))) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "VLAN rewrite action must have VLAN protocol match");
+ return -EOPNOTSUPP;
+ }
+
+ match_prio_mask = MLX5_GET(fte_match_set_lyr_2_4, headers_c, first_prio);
+ match_prio_val = MLX5_GET(fte_match_set_lyr_2_4, headers_v, first_prio);
+ if (act->vlan.prio != (match_prio_val & match_prio_mask)) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Changing VLAN prio is not supported");
+ return -EOPNOTSUPP;
+ }
+
+ err = parse_tc_pedit_action(priv, &pedit_act, namespace, parse_attr,
+ hdrs, NULL);
+ *action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
+
+ return err;
+}
+
static int parse_tc_nic_actions(struct mlx5e_priv *priv,
struct flow_action *flow_action,
struct mlx5e_tc_flow_parse_attr *parse_attr,
action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR |
MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
break;
+ case FLOW_ACTION_VLAN_MANGLE:
+ err = add_vlan_rewrite_action(priv,
+ MLX5_FLOW_NAMESPACE_KERNEL,
+ act, parse_attr, hdrs,
+ &action, extack);
+ if (err)
+ return err;
+
+ break;
case FLOW_ACTION_CSUM:
if (csum_offload_supported(priv, action,
act->csum_flags,
}
break;
default:
- return -EINVAL;
+ NL_SET_ERR_MSG_MOD(extack, "The offload action is not supported");
+ return -EOPNOTSUPP;
}
}
if (hdrs[TCA_PEDIT_KEY_EX_CMD_SET].pedits ||
hdrs[TCA_PEDIT_KEY_EX_CMD_ADD].pedits) {
err = alloc_tc_pedit_action(priv, MLX5_FLOW_NAMESPACE_KERNEL,
- parse_attr, hdrs, extack);
+ parse_attr, hdrs, &action, extack);
if (err)
return err;
+ /* in case all pedit actions are skipped, remove the MOD_HDR
+ * flag.
+ */
+ if (parse_attr->num_mod_hdr_actions == 0)
+ action &= ~MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
}
attr->action = action;
}
break;
default:
- /* action is FLOW_ACT_VLAN_MANGLE */
- return -EOPNOTSUPP;
+ return -EINVAL;
}
attr->total_vlan = vlan_idx + 1;
return 0;
}
+static int add_vlan_push_action(struct mlx5e_priv *priv,
+ struct mlx5_esw_flow_attr *attr,
+ struct net_device **out_dev,
+ u32 *action)
+{
+ struct net_device *vlan_dev = *out_dev;
+ struct flow_action_entry vlan_act = {
+ .id = FLOW_ACTION_VLAN_PUSH,
+ .vlan.vid = vlan_dev_vlan_id(vlan_dev),
+ .vlan.proto = vlan_dev_vlan_proto(vlan_dev),
+ .vlan.prio = 0,
+ };
+ int err;
+
+ err = parse_tc_vlan_action(priv, &vlan_act, attr, action);
+ if (err)
+ return err;
+
+ *out_dev = dev_get_by_index_rcu(dev_net(vlan_dev),
+ dev_get_iflink(vlan_dev));
+ if (is_vlan_dev(*out_dev))
+ err = add_vlan_push_action(priv, attr, out_dev, action);
+
+ return err;
+}
+
+static int add_vlan_pop_action(struct mlx5e_priv *priv,
+ struct mlx5_esw_flow_attr *attr,
+ u32 *action)
+{
+ int nest_level = vlan_get_encap_level(attr->parse_attr->filter_dev);
+ struct flow_action_entry vlan_act = {
+ .id = FLOW_ACTION_VLAN_POP,
+ };
+ int err = 0;
+
+ while (nest_level--) {
+ err = parse_tc_vlan_action(priv, &vlan_act, attr, action);
+ if (err)
+ return err;
+ }
+
+ return err;
+}
+
static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
struct flow_action *flow_action,
- struct mlx5e_tc_flow_parse_attr *parse_attr,
struct mlx5e_tc_flow *flow,
struct netlink_ext_ack *extack)
{
struct pedit_headers_action hdrs[2] = {};
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
struct mlx5_esw_flow_attr *attr = flow->esw_attr;
+ struct mlx5e_tc_flow_parse_attr *parse_attr = attr->parse_attr;
struct mlx5e_rep_priv *rpriv = priv->ppriv;
const struct ip_tunnel_info *info = NULL;
const struct flow_action_entry *act;
uplink_upper == out_dev)
out_dev = uplink_dev;
+ if (is_vlan_dev(out_dev)) {
+ err = add_vlan_push_action(priv, attr,
+ &out_dev,
+ &action);
+ if (err)
+ return err;
+ }
+ if (is_vlan_dev(parse_attr->filter_dev)) {
+ err = add_vlan_pop_action(priv, attr,
+ &action);
+ if (err)
+ return err;
+ }
+
if (!mlx5e_eswitch_rep(out_dev))
return -EOPNOTSUPP;
out_dev->ifindex;
parse_attr->tun_info[attr->out_count] = *info;
encap = false;
- attr->parse_attr = parse_attr;
attr->dests[attr->out_count].flags |=
MLX5_ESW_DEST_ENCAP;
attr->out_count++;
break;
case FLOW_ACTION_VLAN_PUSH:
case FLOW_ACTION_VLAN_POP:
- err = parse_tc_vlan_action(priv, act, attr, &action);
+ if (act->id == FLOW_ACTION_VLAN_PUSH &&
+ (action & MLX5_FLOW_CONTEXT_ACTION_VLAN_POP)) {
+ /* Replace vlan pop+push with vlan modify */
+ action &= ~MLX5_FLOW_CONTEXT_ACTION_VLAN_POP;
+ err = add_vlan_rewrite_action(priv,
+ MLX5_FLOW_NAMESPACE_FDB,
+ act, parse_attr, hdrs,
+ &action, extack);
+ } else {
+ err = parse_tc_vlan_action(priv, act, attr, &action);
+ }
+ if (err)
+ return err;
+
+ attr->split_count = attr->out_count;
+ break;
+ case FLOW_ACTION_VLAN_MANGLE:
+ err = add_vlan_rewrite_action(priv,
+ MLX5_FLOW_NAMESPACE_FDB,
+ act, parse_attr, hdrs,
+ &action, extack);
if (err)
return err;
break;
}
default:
- return -EINVAL;
+ NL_SET_ERR_MSG_MOD(extack, "The offload action is not supported");
+ return -EOPNOTSUPP;
}
}
if (hdrs[TCA_PEDIT_KEY_EX_CMD_SET].pedits ||
hdrs[TCA_PEDIT_KEY_EX_CMD_ADD].pedits) {
err = alloc_tc_pedit_action(priv, MLX5_FLOW_NAMESPACE_FDB,
- parse_attr, hdrs, extack);
+ parse_attr, hdrs, &action, extack);
if (err)
return err;
+ /* in case all pedit actions are skipped, remove the MOD_HDR
+ * flag. we might have set split_count either by pedit or
+ * pop/push. if there is no pop/push either, reset it too.
+ */
+ if (parse_attr->num_mod_hdr_actions == 0) {
+ action &= ~MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
+ if (!((action & MLX5_FLOW_CONTEXT_ACTION_VLAN_POP) ||
+ (action & MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH)))
+ attr->split_count = 0;
+ }
}
attr->action = action;
if (err)
goto err_free;
- err = parse_tc_fdb_actions(priv, &rule->action, parse_attr, flow, extack);
+ err = parse_tc_fdb_actions(priv, &rule->action, flow, extack);
if (err)
goto err_free;
#include <linux/tcp.h>
#include <linux/if_vlan.h>
+#include <net/geneve.h>
#include <net/dsfield.h>
#include "en.h"
#include "ipoib/ipoib.h"
#endif
u16 mlx5e_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
+ int channel_ix = netdev_pick_tx(dev, skb, NULL);
struct mlx5e_priv *priv = netdev_priv(dev);
- int channel_ix = fallback(dev, skb, NULL);
u16 num_channels;
int up = 0;
static inline void
mlx5e_txwqe_complete(struct mlx5e_txqsq *sq, struct sk_buff *skb,
u8 opcode, u16 ds_cnt, u8 num_wqebbs, u32 num_bytes, u8 num_dma,
- struct mlx5e_tx_wqe_info *wi, struct mlx5_wqe_ctrl_seg *cseg)
+ struct mlx5e_tx_wqe_info *wi, struct mlx5_wqe_ctrl_seg *cseg,
+ bool xmit_more)
{
struct mlx5_wq_cyc *wq = &sq->wq;
sq->stats->stopped++;
}
- if (!skb->xmit_more || netif_xmit_stopped(sq->txq))
+ if (!xmit_more || netif_xmit_stopped(sq->txq))
mlx5e_notify_hw(wq, sq->pc, sq->uar_map, cseg);
}
#define INL_HDR_START_SZ (sizeof(((struct mlx5_wqe_eth_seg *)NULL)->inline_hdr.start))
netdev_tx_t mlx5e_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb,
- struct mlx5e_tx_wqe *wqe, u16 pi)
+ struct mlx5e_tx_wqe *wqe, u16 pi, bool xmit_more)
{
struct mlx5_wq_cyc *wq = &sq->wq;
struct mlx5_wqe_ctrl_seg *cseg;
}
stats->bytes += num_bytes;
- stats->xmit_more += skb->xmit_more;
+ stats->xmit_more += netdev_xmit_more();
headlen = skb->len - ihs - skb->data_len;
ds_cnt += !!headlen;
eseg = &wqe->eth;
dseg = wqe->data;
+#if IS_ENABLED(CONFIG_GENEVE)
+ if (skb->encapsulation)
+ mlx5e_tx_tunnel_accel(skb, eseg);
+#endif
mlx5e_txwqe_build_eseg_csum(sq, skb, eseg);
eseg->mss = mss;
goto err_drop;
mlx5e_txwqe_complete(sq, skb, opcode, ds_cnt, num_wqebbs, num_bytes,
- num_dma, wi, cseg);
+ num_dma, wi, cseg, xmit_more);
return NETDEV_TX_OK;
if (unlikely(!skb))
return NETDEV_TX_OK;
- return mlx5e_sq_xmit(sq, skb, wqe, pi);
+ return mlx5e_sq_xmit(sq, skb, wqe, pi, netdev_xmit_more());
}
static void mlx5e_dump_error_cqe(struct mlx5e_txqsq *sq,
}
stats->bytes += num_bytes;
- stats->xmit_more += skb->xmit_more;
+ stats->xmit_more += netdev_xmit_more();
headlen = skb->len - ihs - skb->data_len;
ds_cnt += !!headlen;
goto err_drop;
mlx5e_txwqe_complete(sq, skb, opcode, ds_cnt, num_wqebbs, num_bytes,
- num_dma, wi, cseg);
+ num_dma, wi, cseg, false);
return NETDEV_TX_OK;
__raw_writel((__force u32)cpu_to_be32(val), addr);
/* We still want ordering, just not swabbing, so add a barrier */
- mb();
+ wmb();
}
EXPORT_SYMBOL(mlx5_eq_update_ci);
}
EXPORT_SYMBOL(mlx5_comp_irq_get_affinity_mask);
+#ifdef CONFIG_RFS_ACCEL
struct cpu_rmap *mlx5_eq_table_get_rmap(struct mlx5_core_dev *dev)
{
-#ifdef CONFIG_RFS_ACCEL
return dev->priv.eq_table->rmap;
-#else
- return NULL;
-#endif
}
+#endif
struct mlx5_eq_comp *mlx5_eqn2comp_eq(struct mlx5_core_dev *dev, int eqn)
{
return index;
}
+/* TODO: This mlx5e_tc function shouldn't be called by eswitch */
+void mlx5e_tc_clean_fdb_peer_flows(struct mlx5_eswitch *esw);
+
#else /* CONFIG_MLX5_ESWITCH */
/* eswitch API stubs */
static inline int mlx5_eswitch_init(struct mlx5_core_dev *dev) { return 0; }
int esw_offloads_init_reps(struct mlx5_eswitch *esw)
{
- int total_vfs = MLX5_TOTAL_VPORTS(esw->dev);
+ int total_vports = MLX5_TOTAL_VPORTS(esw->dev);
struct mlx5_core_dev *dev = esw->dev;
struct mlx5_eswitch_rep *rep;
u8 hw_id[ETH_ALEN], rep_type;
int vport;
- esw->offloads.vport_reps = kcalloc(total_vfs,
+ esw->offloads.vport_reps = kcalloc(total_vports,
sizeof(struct mlx5_eswitch_rep),
GFP_KERNEL);
if (!esw->offloads.vport_reps)
return 0;
}
-void mlx5e_tc_clean_fdb_peer_flows(struct mlx5_eswitch *esw);
-
static void mlx5_esw_offloads_unpair(struct mlx5_eswitch *esw)
{
mlx5e_tc_clean_fdb_peer_flows(esw);
{
int err;
- mutex_init(&esw->fdb_table.offloads.fdb_prio_lock);
-
err = esw_offloads_steering_init(esw, total_nvports);
if (err)
return err;
static int any_notifier(struct notifier_block *, unsigned long, void *);
static int temp_warn(struct notifier_block *, unsigned long, void *);
static int port_module(struct notifier_block *, unsigned long, void *);
+static int pcie_core(struct notifier_block *, unsigned long, void *);
/* handler which forwards the event to events->nh, driver notifiers */
static int forward_event(struct notifier_block *, unsigned long, void *);
{.nb.notifier_call = any_notifier, .event_type = MLX5_EVENT_TYPE_NOTIFY_ANY },
{.nb.notifier_call = temp_warn, .event_type = MLX5_EVENT_TYPE_TEMP_WARN_EVENT },
{.nb.notifier_call = port_module, .event_type = MLX5_EVENT_TYPE_PORT_MODULE_EVENT },
+ {.nb.notifier_call = pcie_core, .event_type = MLX5_EVENT_TYPE_GENERAL_EVENT },
/* Events to be forwarded (as is) to mlx5 core interfaces (mlx5e/mlx5_ib) */
{.nb.notifier_call = forward_event, .event_type = MLX5_EVENT_TYPE_PORT_CHANGE },
struct mlx5_events {
struct mlx5_core_dev *dev;
+ struct workqueue_struct *wq;
struct mlx5_event_nb notifiers[ARRAY_SIZE(events_nbs_ref)];
/* driver notifier chain */
struct atomic_notifier_head nh;
/* port module events stats */
struct mlx5_pme_stats pme_stats;
+ /*pcie_core*/
+ struct work_struct pcie_core_work;
};
static const char *eqe_type_str(u8 type)
return NOTIFY_OK;
}
+enum {
+ MLX5_PCI_POWER_COULD_NOT_BE_READ = 0x0,
+ MLX5_PCI_POWER_SUFFICIENT_REPORTED = 0x1,
+ MLX5_PCI_POWER_INSUFFICIENT_REPORTED = 0x2,
+};
+
+static void mlx5_pcie_event(struct work_struct *work)
+{
+ u32 out[MLX5_ST_SZ_DW(mpein_reg)] = {0};
+ u32 in[MLX5_ST_SZ_DW(mpein_reg)] = {0};
+ struct mlx5_events *events;
+ struct mlx5_core_dev *dev;
+ u8 power_status;
+ u16 pci_power;
+
+ events = container_of(work, struct mlx5_events, pcie_core_work);
+ dev = events->dev;
+
+ if (!MLX5_CAP_MCAM_FEATURE(dev, pci_status_and_power))
+ return;
+
+ mlx5_core_access_reg(dev, in, sizeof(in), out, sizeof(out),
+ MLX5_REG_MPEIN, 0, 0);
+ power_status = MLX5_GET(mpein_reg, out, pwr_status);
+ pci_power = MLX5_GET(mpein_reg, out, pci_power);
+
+ switch (power_status) {
+ case MLX5_PCI_POWER_COULD_NOT_BE_READ:
+ mlx5_core_info_rl(dev,
+ "PCIe slot power capability was not advertised.\n");
+ break;
+ case MLX5_PCI_POWER_INSUFFICIENT_REPORTED:
+ mlx5_core_warn_rl(dev,
+ "Detected insufficient power on the PCIe slot (%uW).\n",
+ pci_power);
+ break;
+ case MLX5_PCI_POWER_SUFFICIENT_REPORTED:
+ mlx5_core_info_rl(dev,
+ "PCIe slot advertised sufficient power (%uW).\n",
+ pci_power);
+ break;
+ }
+}
+
+static int pcie_core(struct notifier_block *nb, unsigned long type, void *data)
+{
+ struct mlx5_event_nb *event_nb = mlx5_nb_cof(nb,
+ struct mlx5_event_nb,
+ nb);
+ struct mlx5_events *events = event_nb->ctx;
+ struct mlx5_eqe *eqe = data;
+
+ switch (eqe->sub_type) {
+ case MLX5_GENERAL_SUBTYPE_PCI_POWER_CHANGE_EVENT:
+ queue_work(events->wq, &events->pcie_core_work);
+ break;
+ default:
+ return NOTIFY_DONE;
+ }
+
+ return NOTIFY_OK;
+}
+
void mlx5_get_pme_stats(struct mlx5_core_dev *dev, struct mlx5_pme_stats *stats)
{
*stats = dev->priv.events->pme_stats;
ATOMIC_INIT_NOTIFIER_HEAD(&events->nh);
events->dev = dev;
dev->priv.events = events;
+ events->wq = create_singlethread_workqueue("mlx5_events");
+ if (!events->wq)
+ return -ENOMEM;
+ INIT_WORK(&events->pcie_core_work, mlx5_pcie_event);
+
return 0;
}
void mlx5_events_cleanup(struct mlx5_core_dev *dev)
{
+ destroy_workqueue(dev->priv.events->wq);
kvfree(dev->priv.events);
}
for (i = ARRAY_SIZE(events_nbs_ref) - 1; i >= 0 ; i--)
mlx5_eq_notifier_unregister(dev, &events->notifiers[i].nb);
+ flush_workqueue(events->wq);
}
int mlx5_notifier_register(struct mlx5_core_dev *dev, struct notifier_block *nb)
*conn->qp.wq.sq.db = cpu_to_be32(conn->qp.sq.pc);
/* Make sure that doorbell record is visible before ringing */
wmb();
- mlx5_write64(wqe, conn->fdev->conn_res.uar->map + MLX5_BF_OFFSET, NULL);
+ mlx5_write64(wqe, conn->fdev->conn_res.uar->map + MLX5_BF_OFFSET);
}
static void mlx5_fpga_conn_post_send(struct mlx5_fpga_conn *conn,
#include <linux/mlx5/eq.h>
+#include "mlx5_core.h"
#include "lib/eq.h"
#include "fpga/cmd.h"
};
#define mlx5_fpga_dbg(__adev, format, ...) \
- dev_dbg(&(__adev)->mdev->pdev->dev, "FPGA: %s:%d:(pid %d): " format, \
- __func__, __LINE__, current->pid, ##__VA_ARGS__)
+ mlx5_core_dbg((__adev)->mdev, "FPGA: %s:%d:(pid %d): " format, \
+ __func__, __LINE__, current->pid, ##__VA_ARGS__)
#define mlx5_fpga_err(__adev, format, ...) \
- dev_err(&(__adev)->mdev->pdev->dev, "FPGA: %s:%d:(pid %d): " format, \
- __func__, __LINE__, current->pid, ##__VA_ARGS__)
+ mlx5_core_err((__adev)->mdev, "FPGA: %s:%d:(pid %d): " format, \
+ __func__, __LINE__, current->pid, ##__VA_ARGS__)
#define mlx5_fpga_warn(__adev, format, ...) \
- dev_warn(&(__adev)->mdev->pdev->dev, "FPGA: %s:%d:(pid %d): " format, \
- __func__, __LINE__, current->pid, ##__VA_ARGS__)
+ mlx5_core_warn((__adev)->mdev, "FPGA: %s:%d:(pid %d): " format, \
+ __func__, __LINE__, current->pid, ##__VA_ARGS__)
#define mlx5_fpga_warn_ratelimited(__adev, format, ...) \
- dev_warn_ratelimited(&(__adev)->mdev->pdev->dev, "FPGA: %s:%d: " \
- format, __func__, __LINE__, ##__VA_ARGS__)
+ mlx5_core_err_rl((__adev)->mdev, "FPGA: %s:%d: " \
+ format, __func__, __LINE__, ##__VA_ARGS__)
#define mlx5_fpga_notice(__adev, format, ...) \
- dev_notice(&(__adev)->mdev->pdev->dev, "FPGA: " format, ##__VA_ARGS__)
+ mlx5_core_info((__adev)->mdev, "FPGA: " format, ##__VA_ARGS__)
#define mlx5_fpga_info(__adev, format, ...) \
- dev_info(&(__adev)->mdev->pdev->dev, "FPGA: " format, ##__VA_ARGS__)
+ mlx5_core_info((__adev)->mdev, "FPGA: " format, ##__VA_ARGS__)
int mlx5_fpga_init(struct mlx5_core_dev *mdev);
void mlx5_fpga_cleanup(struct mlx5_core_dev *mdev);
struct mlx5_flow_root_namespace *root = find_root(&prio->node);
struct mlx5_ft_underlay_qp *uqp;
int min_level = INT_MAX;
- int err;
+ int err = 0;
u32 qpn;
if (root->root_ft)
nic_state = mlx5_get_nic_state(dev);
if (nic_state == MLX5_NIC_IFC_INVALID) {
- dev_err(&dev->pdev->dev, "health recovery flow aborted since the nic state is invalid\n");
+ mlx5_core_err(dev, "health recovery flow aborted since the nic state is invalid\n");
return;
}
- dev_err(&dev->pdev->dev, "starting health recovery flow\n");
+ mlx5_core_err(dev, "starting health recovery flow\n");
mlx5_recover_device(dev);
}
if (!test_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags))
schedule_delayed_work(&health->recover_work, recover_delay);
else
- dev_err(&dev->pdev->dev,
- "new health works are not permitted at this stage\n");
+ mlx5_core_err(dev,
+ "new health works are not permitted at this stage\n");
spin_unlock_irqrestore(&health->wq_lock, flags);
}
return;
for (i = 0; i < ARRAY_SIZE(h->assert_var); i++)
- dev_err(&dev->pdev->dev, "assert_var[%d] 0x%08x\n", i, ioread32be(h->assert_var + i));
+ mlx5_core_err(dev, "assert_var[%d] 0x%08x\n", i,
+ ioread32be(h->assert_var + i));
- dev_err(&dev->pdev->dev, "assert_exit_ptr 0x%08x\n", ioread32be(&h->assert_exit_ptr));
- dev_err(&dev->pdev->dev, "assert_callra 0x%08x\n", ioread32be(&h->assert_callra));
+ mlx5_core_err(dev, "assert_exit_ptr 0x%08x\n",
+ ioread32be(&h->assert_exit_ptr));
+ mlx5_core_err(dev, "assert_callra 0x%08x\n",
+ ioread32be(&h->assert_callra));
sprintf(fw_str, "%d.%d.%d", fw_rev_maj(dev), fw_rev_min(dev), fw_rev_sub(dev));
- dev_err(&dev->pdev->dev, "fw_ver %s\n", fw_str);
- dev_err(&dev->pdev->dev, "hw_id 0x%08x\n", ioread32be(&h->hw_id));
- dev_err(&dev->pdev->dev, "irisc_index %d\n", ioread8(&h->irisc_index));
- dev_err(&dev->pdev->dev, "synd 0x%x: %s\n", ioread8(&h->synd), hsynd_str(ioread8(&h->synd)));
- dev_err(&dev->pdev->dev, "ext_synd 0x%04x\n", ioread16be(&h->ext_synd));
+ mlx5_core_err(dev, "fw_ver %s\n", fw_str);
+ mlx5_core_err(dev, "hw_id 0x%08x\n", ioread32be(&h->hw_id));
+ mlx5_core_err(dev, "irisc_index %d\n", ioread8(&h->irisc_index));
+ mlx5_core_err(dev, "synd 0x%x: %s\n", ioread8(&h->synd),
+ hsynd_str(ioread8(&h->synd)));
+ mlx5_core_err(dev, "ext_synd 0x%04x\n", ioread16be(&h->ext_synd));
fw = ioread32be(&h->fw_ver);
- dev_err(&dev->pdev->dev, "raw fw_ver 0x%08x\n", fw);
+ mlx5_core_err(dev, "raw fw_ver 0x%08x\n", fw);
}
static unsigned long get_next_poll_jiffies(void)
if (!test_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags))
queue_work(health->wq, &health->work);
else
- dev_err(&dev->pdev->dev,
- "new health works are not permitted at this stage\n");
+ mlx5_core_err(dev, "new health works are not permitted at this stage\n");
spin_unlock_irqrestore(&health->wq_lock, flags);
}
health->prev = count;
if (health->miss_counter == MAX_MISSES) {
- dev_err(&dev->pdev->dev, "device's health compromised - reached miss count\n");
+ mlx5_core_err(dev, "device's health compromised - reached miss count\n");
print_health_info(dev);
}
cancel_delayed_work_sync(&dev->priv.health.recover_work);
}
+void mlx5_health_flush(struct mlx5_core_dev *dev)
+{
+ struct mlx5_core_health *health = &dev->priv.health;
+
+ flush_workqueue(health->wq);
+}
+
void mlx5_health_cleanup(struct mlx5_core_dev *dev)
{
struct mlx5_core_health *health = &dev->priv.health;
return -ENOMEM;
strcpy(name, "mlx5_health");
- strcat(name, dev_name(&dev->pdev->dev));
+ strcat(name, dev->priv.name);
health->wq = create_singlethread_workqueue(name);
kfree(name);
if (!health->wq)
void *ppriv)
{
struct mlx5e_priv *priv = mlx5i_epriv(netdev);
- u16 max_mtu;
int err;
err = mlx5e_netdev_init(netdev, priv, mdev, profile, ppriv);
if (err)
return err;
- mlx5_query_port_max_mtu(mdev, &max_mtu, 1);
- netdev->mtu = max_mtu;
+ mlx5e_set_netdev_mtu_boundaries(priv);
+ netdev->mtu = netdev->max_mtu;
mlx5e_build_nic_params(mdev, &priv->rss_params, &priv->channels.params,
mlx5e_get_netdev_max_channels(netdev),
/* Handle add/replace event */
if (fi->fib_nhs == 1) {
if (__mlx5_lag_is_active(ldev)) {
- struct net_device *nh_dev = fi->fib_nh[0].nh_dev;
+ struct net_device *nh_dev = fi->fib_nh[0].fib_nh_dev;
int i = mlx5_lag_dev_get_netdev_idx(ldev, nh_dev);
mlx5_lag_set_port_affinity(ldev, ++i);
return;
/* Verify next hops are ports of the same hca */
- if (!(fi->fib_nh[0].nh_dev == ldev->pf[0].netdev &&
- fi->fib_nh[1].nh_dev == ldev->pf[1].netdev) &&
- !(fi->fib_nh[0].nh_dev == ldev->pf[1].netdev &&
- fi->fib_nh[1].nh_dev == ldev->pf[0].netdev)) {
+ if (!(fi->fib_nh[0].fib_nh_dev == ldev->pf[0].netdev &&
+ fi->fib_nh[1].fib_nh_dev == ldev->pf[1].netdev) &&
+ !(fi->fib_nh[0].fib_nh_dev == ldev->pf[1].netdev &&
+ fi->fib_nh[1].fib_nh_dev == ldev->pf[0].netdev)) {
mlx5_core_warn(ldev->pf[0].dev, "Multipath offload require two ports of the same HCA\n");
return;
}
/* nh added/removed */
if (event == FIB_EVENT_NH_DEL) {
- int i = mlx5_lag_dev_get_netdev_idx(ldev, fib_nh->nh_dev);
+ int i = mlx5_lag_dev_get_netdev_idx(ldev, fib_nh->fib_nh_dev);
if (i >= 0) {
i = (i + 1) % 2 + 1; /* peer port */
mlx5_query_port_tun_entropy(mdev, &entropy_flags);
tun_entropy->num_enabling_entries = 0;
tun_entropy->num_disabling_entries = 0;
- tun_entropy->enabled = entropy_flags.calc_enabled;
- tun_entropy->enabled =
- (entropy_flags.calc_supported) ?
- entropy_flags.calc_enabled : true;
+ tun_entropy->enabled = entropy_flags.calc_supported ?
+ entropy_flags.calc_enabled : true;
}
static int mlx5_set_entropy(struct mlx5_tun_entropy *tun_entropy,
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/mlx5/driver.h>
+#include <net/vxlan.h>
#include "mlx5_core.h"
#include "vxlan.h"
spin_lock_init(&vxlan->lock);
hash_init(vxlan->htable);
- /* Hardware adds 4789 by default */
- mlx5_vxlan_add_port(vxlan, 4789);
+ /* Hardware adds 4789 (IANA_VXLAN_UDP_PORT) by default */
+ mlx5_vxlan_add_port(vxlan, IANA_VXLAN_UDP_PORT);
return vxlan;
}
static int set_hca_cap(struct mlx5_core_dev *dev)
{
- struct pci_dev *pdev = dev->pdev;
int err;
err = handle_hca_cap(dev);
if (err) {
- dev_err(&pdev->dev, "handle_hca_cap failed\n");
+ mlx5_core_err(dev, "handle_hca_cap failed\n");
goto out;
}
err = handle_hca_cap_atomic(dev);
if (err) {
- dev_err(&pdev->dev, "handle_hca_cap_atomic failed\n");
+ mlx5_core_err(dev, "handle_hca_cap_atomic failed\n");
goto out;
}
err = handle_hca_cap_odp(dev);
if (err) {
- dev_err(&pdev->dev, "handle_hca_cap_odp failed\n");
+ mlx5_core_err(dev, "handle_hca_cap_odp failed\n");
goto out;
}
return -EOPNOTSUPP;
}
-static int mlx5_pci_init(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
+static int mlx5_pci_init(struct mlx5_core_dev *dev, struct pci_dev *pdev,
+ const struct pci_device_id *id)
{
- struct pci_dev *pdev = dev->pdev;
+ struct mlx5_priv *priv = &dev->priv;
int err = 0;
- pci_set_drvdata(dev->pdev, dev);
- strncpy(priv->name, dev_name(&pdev->dev), MLX5_MAX_NAME_LEN);
- priv->name[MLX5_MAX_NAME_LEN - 1] = 0;
-
- mutex_init(&priv->pgdir_mutex);
- INIT_LIST_HEAD(&priv->pgdir_list);
- spin_lock_init(&priv->mkey_lock);
+ dev->pdev = pdev;
+ priv->pci_dev_data = id->driver_data;
- mutex_init(&priv->alloc_mutex);
+ pci_set_drvdata(dev->pdev, dev);
+ dev->bar_addr = pci_resource_start(pdev, 0);
priv->numa_node = dev_to_node(&dev->pdev->dev);
- if (mlx5_debugfs_root)
- priv->dbg_root =
- debugfs_create_dir(pci_name(pdev), mlx5_debugfs_root);
-
err = mlx5_pci_enable_device(dev);
if (err) {
- dev_err(&pdev->dev, "Cannot enable PCI device, aborting\n");
- goto err_dbg;
+ mlx5_core_err(dev, "Cannot enable PCI device, aborting\n");
+ return err;
}
err = request_bar(pdev);
if (err) {
- dev_err(&pdev->dev, "error requesting BARs, aborting\n");
+ mlx5_core_err(dev, "error requesting BARs, aborting\n");
goto err_disable;
}
err = set_dma_caps(pdev);
if (err) {
- dev_err(&pdev->dev, "Failed setting DMA capabilities mask, aborting\n");
+ mlx5_core_err(dev, "Failed setting DMA capabilities mask, aborting\n");
goto err_clr_master;
}
pci_enable_atomic_ops_to_root(pdev, PCI_EXP_DEVCAP2_ATOMIC_COMP128))
mlx5_core_dbg(dev, "Enabling pci atomics failed\n");
- dev->iseg_base = pci_resource_start(dev->pdev, 0);
+ dev->iseg_base = dev->bar_addr;
dev->iseg = ioremap(dev->iseg_base, sizeof(*dev->iseg));
if (!dev->iseg) {
err = -ENOMEM;
- dev_err(&pdev->dev, "Failed mapping initialization segment, aborting\n");
+ mlx5_core_err(dev, "Failed mapping initialization segment, aborting\n");
goto err_clr_master;
}
release_bar(dev->pdev);
err_disable:
mlx5_pci_disable_device(dev);
-
-err_dbg:
- debugfs_remove(priv->dbg_root);
return err;
}
-static void mlx5_pci_close(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
+static void mlx5_pci_close(struct mlx5_core_dev *dev)
{
iounmap(dev->iseg);
pci_clear_master(dev->pdev);
release_bar(dev->pdev);
mlx5_pci_disable_device(dev);
- debugfs_remove_recursive(priv->dbg_root);
}
-static int mlx5_init_once(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
+static int mlx5_init_once(struct mlx5_core_dev *dev)
{
- struct pci_dev *pdev = dev->pdev;
int err;
- priv->devcom = mlx5_devcom_register_device(dev);
- if (IS_ERR(priv->devcom))
- dev_err(&pdev->dev, "failed to register with devcom (0x%p)\n",
- priv->devcom);
+ dev->priv.devcom = mlx5_devcom_register_device(dev);
+ if (IS_ERR(dev->priv.devcom))
+ mlx5_core_err(dev, "failed to register with devcom (0x%p)\n",
+ dev->priv.devcom);
err = mlx5_query_board_id(dev);
if (err) {
- dev_err(&pdev->dev, "query board id failed\n");
+ mlx5_core_err(dev, "query board id failed\n");
goto err_devcom;
}
err = mlx5_eq_table_init(dev);
if (err) {
- dev_err(&pdev->dev, "failed to initialize eq\n");
+ mlx5_core_err(dev, "failed to initialize eq\n");
goto err_devcom;
}
err = mlx5_events_init(dev);
if (err) {
- dev_err(&pdev->dev, "failed to initialize events\n");
+ mlx5_core_err(dev, "failed to initialize events\n");
goto err_eq_cleanup;
}
err = mlx5_cq_debugfs_init(dev);
if (err) {
- dev_err(&pdev->dev, "failed to initialize cq debugfs\n");
+ mlx5_core_err(dev, "failed to initialize cq debugfs\n");
goto err_events_cleanup;
}
err = mlx5_init_rl_table(dev);
if (err) {
- dev_err(&pdev->dev, "Failed to init rate limiting\n");
+ mlx5_core_err(dev, "Failed to init rate limiting\n");
goto err_tables_cleanup;
}
err = mlx5_mpfs_init(dev);
if (err) {
- dev_err(&pdev->dev, "Failed to init l2 table %d\n", err);
+ mlx5_core_err(dev, "Failed to init l2 table %d\n", err);
goto err_rl_cleanup;
}
err = mlx5_eswitch_init(dev);
if (err) {
- dev_err(&pdev->dev, "Failed to init eswitch %d\n", err);
+ mlx5_core_err(dev, "Failed to init eswitch %d\n", err);
goto err_mpfs_cleanup;
}
err = mlx5_sriov_init(dev);
if (err) {
- dev_err(&pdev->dev, "Failed to init sriov %d\n", err);
+ mlx5_core_err(dev, "Failed to init sriov %d\n", err);
goto err_eswitch_cleanup;
}
err = mlx5_fpga_init(dev);
if (err) {
- dev_err(&pdev->dev, "Failed to init fpga device %d\n", err);
+ mlx5_core_err(dev, "Failed to init fpga device %d\n", err);
goto err_sriov_cleanup;
}
mlx5_devcom_unregister_device(dev->priv.devcom);
}
-static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
- bool boot)
+static int mlx5_function_setup(struct mlx5_core_dev *dev, bool boot)
{
- struct pci_dev *pdev = dev->pdev;
int err;
- dev->caps.embedded_cpu = mlx5_read_embedded_cpu(dev);
- mutex_lock(&dev->intf_state_mutex);
- if (test_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state)) {
- dev_warn(&dev->pdev->dev, "%s: interface is up, NOP\n",
- __func__);
- goto out;
- }
-
- dev_info(&pdev->dev, "firmware version: %d.%d.%d\n", fw_rev_maj(dev),
- fw_rev_min(dev), fw_rev_sub(dev));
+ mlx5_core_info(dev, "firmware version: %d.%d.%d\n", fw_rev_maj(dev),
+ fw_rev_min(dev), fw_rev_sub(dev));
/* Only PFs hold the relevant PCIe information for this query */
if (mlx5_core_is_pf(dev))
pcie_print_link_status(dev->pdev);
- /* on load removing any previous indication of internal error, device is
- * up
- */
- dev->state = MLX5_DEVICE_STATE_UP;
-
/* wait for firmware to accept initialization segments configurations
*/
err = wait_fw_init(dev, FW_PRE_INIT_TIMEOUT_MILI);
if (err) {
- dev_err(&dev->pdev->dev, "Firmware over %d MS in pre-initializing state, aborting\n",
- FW_PRE_INIT_TIMEOUT_MILI);
- goto out_err;
+ mlx5_core_err(dev, "Firmware over %d MS in pre-initializing state, aborting\n",
+ FW_PRE_INIT_TIMEOUT_MILI);
+ return err;
}
err = mlx5_cmd_init(dev);
if (err) {
- dev_err(&pdev->dev, "Failed initializing command interface, aborting\n");
- goto out_err;
+ mlx5_core_err(dev, "Failed initializing command interface, aborting\n");
+ return err;
}
err = wait_fw_init(dev, FW_INIT_TIMEOUT_MILI);
if (err) {
- dev_err(&dev->pdev->dev, "Firmware over %d MS in initializing state, aborting\n",
- FW_INIT_TIMEOUT_MILI);
+ mlx5_core_err(dev, "Firmware over %d MS in initializing state, aborting\n",
+ FW_INIT_TIMEOUT_MILI);
goto err_cmd_cleanup;
}
err = mlx5_core_enable_hca(dev, 0);
if (err) {
- dev_err(&pdev->dev, "enable hca failed\n");
+ mlx5_core_err(dev, "enable hca failed\n");
goto err_cmd_cleanup;
}
err = mlx5_core_set_issi(dev);
if (err) {
- dev_err(&pdev->dev, "failed to set issi\n");
+ mlx5_core_err(dev, "failed to set issi\n");
goto err_disable_hca;
}
err = mlx5_satisfy_startup_pages(dev, 1);
if (err) {
- dev_err(&pdev->dev, "failed to allocate boot pages\n");
+ mlx5_core_err(dev, "failed to allocate boot pages\n");
goto err_disable_hca;
}
err = set_hca_ctrl(dev);
if (err) {
- dev_err(&pdev->dev, "set_hca_ctrl failed\n");
+ mlx5_core_err(dev, "set_hca_ctrl failed\n");
goto reclaim_boot_pages;
}
err = set_hca_cap(dev);
if (err) {
- dev_err(&pdev->dev, "set_hca_cap failed\n");
+ mlx5_core_err(dev, "set_hca_cap failed\n");
goto reclaim_boot_pages;
}
err = mlx5_satisfy_startup_pages(dev, 0);
if (err) {
- dev_err(&pdev->dev, "failed to allocate init pages\n");
+ mlx5_core_err(dev, "failed to allocate init pages\n");
goto reclaim_boot_pages;
}
err = mlx5_cmd_init_hca(dev, sw_owner_id);
if (err) {
- dev_err(&pdev->dev, "init hca failed\n");
+ mlx5_core_err(dev, "init hca failed\n");
goto reclaim_boot_pages;
}
err = mlx5_query_hca_caps(dev);
if (err) {
- dev_err(&pdev->dev, "query hca failed\n");
- goto err_stop_poll;
+ mlx5_core_err(dev, "query hca failed\n");
+ goto stop_health;
}
- if (boot) {
- err = mlx5_init_once(dev, priv);
- if (err) {
- dev_err(&pdev->dev, "sw objs init failed\n");
- goto err_stop_poll;
- }
+ return 0;
+
+stop_health:
+ mlx5_stop_health_poll(dev, boot);
+reclaim_boot_pages:
+ mlx5_reclaim_startup_pages(dev);
+err_disable_hca:
+ mlx5_core_disable_hca(dev, 0);
+err_cmd_cleanup:
+ mlx5_cmd_cleanup(dev);
+
+ return err;
+}
+
+static int mlx5_function_teardown(struct mlx5_core_dev *dev, bool boot)
+{
+ int err;
+
+ mlx5_stop_health_poll(dev, boot);
+ err = mlx5_cmd_teardown_hca(dev);
+ if (err) {
+ mlx5_core_err(dev, "tear_down_hca failed, skip cleanup\n");
+ return err;
}
+ mlx5_reclaim_startup_pages(dev);
+ mlx5_core_disable_hca(dev, 0);
+ mlx5_cmd_cleanup(dev);
+
+ return 0;
+}
+
+static int mlx5_load(struct mlx5_core_dev *dev)
+{
+ int err;
dev->priv.uar = mlx5_get_uars_page(dev);
if (IS_ERR(dev->priv.uar)) {
- dev_err(&pdev->dev, "Failed allocating uar, aborting\n");
+ mlx5_core_err(dev, "Failed allocating uar, aborting\n");
err = PTR_ERR(dev->priv.uar);
- goto err_get_uars;
+ return err;
}
mlx5_events_start(dev);
err = mlx5_eq_table_create(dev);
if (err) {
- dev_err(&pdev->dev, "Failed to create EQs\n");
+ mlx5_core_err(dev, "Failed to create EQs\n");
goto err_eq_table;
}
err = mlx5_fw_tracer_init(dev->tracer);
if (err) {
- dev_err(&pdev->dev, "Failed to init FW tracer\n");
+ mlx5_core_err(dev, "Failed to init FW tracer\n");
goto err_fw_tracer;
}
err = mlx5_fpga_device_start(dev);
if (err) {
- dev_err(&pdev->dev, "fpga device start failed %d\n", err);
+ mlx5_core_err(dev, "fpga device start failed %d\n", err);
goto err_fpga_start;
}
err = mlx5_accel_ipsec_init(dev);
if (err) {
- dev_err(&pdev->dev, "IPSec device start failed %d\n", err);
+ mlx5_core_err(dev, "IPSec device start failed %d\n", err);
goto err_ipsec_start;
}
err = mlx5_accel_tls_init(dev);
if (err) {
- dev_err(&pdev->dev, "TLS device start failed %d\n", err);
+ mlx5_core_err(dev, "TLS device start failed %d\n", err);
goto err_tls_start;
}
err = mlx5_init_fs(dev);
if (err) {
- dev_err(&pdev->dev, "Failed to init flow steering\n");
+ mlx5_core_err(dev, "Failed to init flow steering\n");
goto err_fs;
}
err = mlx5_core_set_hca_defaults(dev);
if (err) {
- dev_err(&pdev->dev, "Failed to set hca defaults\n");
+ mlx5_core_err(dev, "Failed to set hca defaults\n");
goto err_fs;
}
err = mlx5_sriov_attach(dev);
if (err) {
- dev_err(&pdev->dev, "sriov init failed %d\n", err);
+ mlx5_core_err(dev, "sriov init failed %d\n", err);
goto err_sriov;
}
err = mlx5_ec_init(dev);
if (err) {
- dev_err(&pdev->dev, "Failed to init embedded CPU\n");
+ mlx5_core_err(dev, "Failed to init embedded CPU\n");
goto err_ec;
}
- if (mlx5_device_registered(dev)) {
- mlx5_attach_device(dev);
- } else {
- err = mlx5_register_device(dev);
- if (err) {
- dev_err(&pdev->dev, "mlx5_register_device failed %d\n", err);
- goto err_reg_dev;
- }
- }
-
- set_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state);
-out:
- mutex_unlock(&dev->intf_state_mutex);
-
return 0;
-err_reg_dev:
- mlx5_ec_cleanup(dev);
-
err_ec:
mlx5_sriov_detach(dev);
-
err_sriov:
mlx5_cleanup_fs(dev);
-
err_fs:
mlx5_accel_tls_cleanup(dev);
-
err_tls_start:
mlx5_accel_ipsec_cleanup(dev);
-
err_ipsec_start:
mlx5_fpga_device_stop(dev);
-
err_fpga_start:
mlx5_fw_tracer_cleanup(dev->tracer);
-
err_fw_tracer:
mlx5_eq_table_destroy(dev);
-
err_eq_table:
mlx5_pagealloc_stop(dev);
mlx5_events_stop(dev);
- mlx5_put_uars_page(dev, priv->uar);
+ mlx5_put_uars_page(dev, dev->priv.uar);
+ return err;
+}
-err_get_uars:
- if (boot)
- mlx5_cleanup_once(dev);
+static void mlx5_unload(struct mlx5_core_dev *dev)
+{
+ mlx5_ec_cleanup(dev);
+ mlx5_sriov_detach(dev);
+ mlx5_cleanup_fs(dev);
+ mlx5_accel_ipsec_cleanup(dev);
+ mlx5_accel_tls_cleanup(dev);
+ mlx5_fpga_device_stop(dev);
+ mlx5_fw_tracer_cleanup(dev->tracer);
+ mlx5_eq_table_destroy(dev);
+ mlx5_pagealloc_stop(dev);
+ mlx5_events_stop(dev);
+ mlx5_put_uars_page(dev, dev->priv.uar);
+}
-err_stop_poll:
- mlx5_stop_health_poll(dev, boot);
- if (mlx5_cmd_teardown_hca(dev)) {
- dev_err(&dev->pdev->dev, "tear_down_hca failed, skip cleanup\n");
- goto out_err;
+static int mlx5_load_one(struct mlx5_core_dev *dev, bool boot)
+{
+ int err = 0;
+
+ dev->caps.embedded_cpu = mlx5_read_embedded_cpu(dev);
+ mutex_lock(&dev->intf_state_mutex);
+ if (test_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state)) {
+ mlx5_core_warn(dev, "interface is up, NOP\n");
+ goto out;
}
+ /* remove any previous indication of internal error */
+ dev->state = MLX5_DEVICE_STATE_UP;
-reclaim_boot_pages:
- mlx5_reclaim_startup_pages(dev);
+ err = mlx5_function_setup(dev, boot);
+ if (err)
+ goto out;
-err_disable_hca:
- mlx5_core_disable_hca(dev, 0);
+ if (boot) {
+ err = mlx5_init_once(dev);
+ if (err) {
+ mlx5_core_err(dev, "sw objs init failed\n");
+ goto function_teardown;
+ }
+ }
-err_cmd_cleanup:
- mlx5_cmd_cleanup(dev);
+ err = mlx5_load(dev);
+ if (err)
+ goto err_load;
-out_err:
+ if (mlx5_device_registered(dev)) {
+ mlx5_attach_device(dev);
+ } else {
+ err = mlx5_register_device(dev);
+ if (err) {
+ mlx5_core_err(dev, "register device failed %d\n", err);
+ goto err_reg_dev;
+ }
+ }
+
+ set_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state);
+out:
+ mutex_unlock(&dev->intf_state_mutex);
+
+ return err;
+
+err_reg_dev:
+ mlx5_unload(dev);
+err_load:
+ if (boot)
+ mlx5_cleanup_once(dev);
+function_teardown:
+ mlx5_function_teardown(dev, boot);
dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
mutex_unlock(&dev->intf_state_mutex);
return err;
}
-static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
- bool cleanup)
+static int mlx5_unload_one(struct mlx5_core_dev *dev, bool cleanup)
{
int err = 0;
mutex_lock(&dev->intf_state_mutex);
if (!test_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state)) {
- dev_warn(&dev->pdev->dev, "%s: interface is down, NOP\n",
- __func__);
+ mlx5_core_warn(dev, "%s: interface is down, NOP\n",
+ __func__);
if (cleanup)
mlx5_cleanup_once(dev);
goto out;
if (mlx5_device_registered(dev))
mlx5_detach_device(dev);
- mlx5_ec_cleanup(dev);
- mlx5_sriov_detach(dev);
- mlx5_cleanup_fs(dev);
- mlx5_accel_ipsec_cleanup(dev);
- mlx5_accel_tls_cleanup(dev);
- mlx5_fpga_device_stop(dev);
- mlx5_fw_tracer_cleanup(dev->tracer);
- mlx5_eq_table_destroy(dev);
- mlx5_pagealloc_stop(dev);
- mlx5_events_stop(dev);
- mlx5_put_uars_page(dev, priv->uar);
+ mlx5_unload(dev);
+
if (cleanup)
mlx5_cleanup_once(dev);
- mlx5_stop_health_poll(dev, cleanup);
-
- err = mlx5_cmd_teardown_hca(dev);
- if (err) {
- dev_err(&dev->pdev->dev, "tear_down_hca failed, skip cleanup\n");
- goto out;
- }
- mlx5_reclaim_startup_pages(dev);
- mlx5_core_disable_hca(dev, 0);
- mlx5_cmd_cleanup(dev);
+ mlx5_function_teardown(dev, cleanup);
out:
mutex_unlock(&dev->intf_state_mutex);
return err;
#endif
};
-#define MLX5_IB_MOD "mlx5_ib"
-static int init_one(struct pci_dev *pdev,
- const struct pci_device_id *id)
+static int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx, const char *name)
{
- struct mlx5_core_dev *dev;
- struct devlink *devlink;
- struct mlx5_priv *priv;
+ struct mlx5_priv *priv = &dev->priv;
int err;
- devlink = devlink_alloc(&mlx5_devlink_ops, sizeof(*dev));
- if (!devlink) {
- dev_err(&pdev->dev, "kzalloc failed\n");
- return -ENOMEM;
- }
-
- dev = devlink_priv(devlink);
- priv = &dev->priv;
- priv->pci_dev_data = id->driver_data;
-
- pci_set_drvdata(pdev, dev);
+ strncpy(priv->name, name, MLX5_MAX_NAME_LEN);
+ priv->name[MLX5_MAX_NAME_LEN - 1] = 0;
- dev->pdev = pdev;
- dev->profile = &profile[prof_sel];
+ dev->profile = &profile[profile_idx];
INIT_LIST_HEAD(&priv->ctx_list);
spin_lock_init(&priv->ctx_lock);
INIT_LIST_HEAD(&priv->bfregs.reg_head.list);
INIT_LIST_HEAD(&priv->bfregs.wc_head.list);
- err = mlx5_pci_init(dev, priv);
- if (err) {
- dev_err(&pdev->dev, "mlx5_pci_init failed with error code %d\n", err);
- goto clean_dev;
+ mutex_init(&priv->alloc_mutex);
+ mutex_init(&priv->pgdir_mutex);
+ INIT_LIST_HEAD(&priv->pgdir_list);
+ spin_lock_init(&priv->mkey_lock);
+
+ priv->dbg_root = debugfs_create_dir(name, mlx5_debugfs_root);
+ if (!priv->dbg_root) {
+ pr_err("mlx5_core: %s error, Cannot create debugfs dir, aborting\n", name);
+ return -ENOMEM;
}
err = mlx5_health_init(dev);
- if (err) {
- dev_err(&pdev->dev, "mlx5_health_init failed with error code %d\n", err);
- goto close_pci;
- }
+ if (err)
+ goto err_health_init;
err = mlx5_pagealloc_init(dev);
if (err)
goto err_pagealloc_init;
- err = mlx5_load_one(dev, priv, true);
+ return 0;
+
+err_pagealloc_init:
+ mlx5_health_cleanup(dev);
+err_health_init:
+ debugfs_remove(dev->priv.dbg_root);
+
+ return err;
+}
+
+static void mlx5_mdev_uninit(struct mlx5_core_dev *dev)
+{
+ mlx5_pagealloc_cleanup(dev);
+ mlx5_health_cleanup(dev);
+ debugfs_remove_recursive(dev->priv.dbg_root);
+}
+
+#define MLX5_IB_MOD "mlx5_ib"
+static int init_one(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+ struct mlx5_core_dev *dev;
+ struct devlink *devlink;
+ int err;
+
+ devlink = devlink_alloc(&mlx5_devlink_ops, sizeof(*dev));
+ if (!devlink) {
+ dev_err(&pdev->dev, "kzalloc failed\n");
+ return -ENOMEM;
+ }
+
+ dev = devlink_priv(devlink);
+
+ err = mlx5_mdev_init(dev, prof_sel, dev_name(&pdev->dev));
+ if (err)
+ goto mdev_init_err;
+
+ err = mlx5_pci_init(dev, pdev, id);
+ if (err) {
+ mlx5_core_err(dev, "mlx5_pci_init failed with error code %d\n",
+ err);
+ goto pci_init_err;
+ }
+
+ err = mlx5_load_one(dev, true);
if (err) {
- dev_err(&pdev->dev, "mlx5_load_one failed with error code %d\n", err);
+ mlx5_core_err(dev, "mlx5_load_one failed with error code %d\n",
+ err);
goto err_load_one;
}
return 0;
clean_load:
- mlx5_unload_one(dev, priv, true);
+ mlx5_unload_one(dev, true);
+
err_load_one:
- mlx5_pagealloc_cleanup(dev);
-err_pagealloc_init:
- mlx5_health_cleanup(dev);
-close_pci:
- mlx5_pci_close(dev, priv);
-clean_dev:
+ mlx5_pci_close(dev);
+pci_init_err:
+ mlx5_mdev_uninit(dev);
+mdev_init_err:
devlink_free(devlink);
return err;
{
struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
struct devlink *devlink = priv_to_devlink(dev);
- struct mlx5_priv *priv = &dev->priv;
devlink_unregister(devlink);
mlx5_unregister_device(dev);
- if (mlx5_unload_one(dev, priv, true)) {
- dev_err(&dev->pdev->dev, "mlx5_unload_one failed\n");
- mlx5_health_cleanup(dev);
+ if (mlx5_unload_one(dev, true)) {
+ mlx5_core_err(dev, "mlx5_unload_one failed\n");
+ mlx5_health_flush(dev);
return;
}
- mlx5_pagealloc_cleanup(dev);
- mlx5_health_cleanup(dev);
- mlx5_pci_close(dev, priv);
+ mlx5_pci_close(dev);
+ mlx5_mdev_uninit(dev);
devlink_free(devlink);
}
pci_channel_state_t state)
{
struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
- struct mlx5_priv *priv = &dev->priv;
- dev_info(&pdev->dev, "%s was called\n", __func__);
+ mlx5_core_info(dev, "%s was called\n", __func__);
mlx5_enter_error_state(dev, false);
- mlx5_unload_one(dev, priv, false);
+ mlx5_unload_one(dev, false);
/* In case of kernel call drain the health wq */
if (state) {
mlx5_drain_health_wq(dev);
count = ioread32be(health->health_counter);
if (count && count != 0xffffffff) {
if (last_count && last_count != count) {
- dev_info(&pdev->dev, "Counter value 0x%x after %d iterations\n", count, i);
+ mlx5_core_info(dev,
+ "wait vital counter value 0x%x after %d iterations\n",
+ count, i);
return 0;
}
last_count = count;
struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
int err;
- dev_info(&pdev->dev, "%s was called\n", __func__);
+ mlx5_core_info(dev, "%s was called\n", __func__);
err = mlx5_pci_enable_device(dev);
if (err) {
- dev_err(&pdev->dev, "%s: mlx5_pci_enable_device failed with error code: %d\n"
- , __func__, err);
+ mlx5_core_err(dev, "%s: mlx5_pci_enable_device failed with error code: %d\n",
+ __func__, err);
return PCI_ERS_RESULT_DISCONNECT;
}
pci_save_state(pdev);
if (wait_vital(pdev)) {
- dev_err(&pdev->dev, "%s: wait_vital timed out\n", __func__);
+ mlx5_core_err(dev, "%s: wait_vital timed out\n", __func__);
return PCI_ERS_RESULT_DISCONNECT;
}
static void mlx5_pci_resume(struct pci_dev *pdev)
{
struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
- struct mlx5_priv *priv = &dev->priv;
int err;
- dev_info(&pdev->dev, "%s was called\n", __func__);
+ mlx5_core_info(dev, "%s was called\n", __func__);
- err = mlx5_load_one(dev, priv, false);
+ err = mlx5_load_one(dev, false);
if (err)
- dev_err(&pdev->dev, "%s: mlx5_load_one failed with error code: %d\n"
- , __func__, err);
+ mlx5_core_err(dev, "%s: mlx5_load_one failed with error code: %d\n",
+ __func__, err);
else
- dev_info(&pdev->dev, "%s: device recovered\n", __func__);
+ mlx5_core_info(dev, "%s: device recovered\n", __func__);
}
static const struct pci_error_handlers mlx5_err_handler = {
static void shutdown(struct pci_dev *pdev)
{
struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
- struct mlx5_priv *priv = &dev->priv;
int err;
- dev_info(&pdev->dev, "Shutdown was called\n");
+ mlx5_core_info(dev, "Shutdown was called\n");
err = mlx5_try_fast_unload(dev);
if (err)
- mlx5_unload_one(dev, priv, false);
+ mlx5_unload_one(dev, false);
mlx5_pci_disable_device(dev);
}
extern uint mlx5_core_debug_mask;
#define mlx5_core_dbg(__dev, format, ...) \
- dev_dbg(&(__dev)->pdev->dev, "%s:%d:(pid %d): " format, \
+ pr_debug("%s:%s:%d:(pid %d): " format, (__dev)->priv.name, \
__func__, __LINE__, current->pid, \
##__VA_ARGS__)
#define mlx5_core_dbg_once(__dev, format, ...) \
- dev_dbg_once(&(__dev)->pdev->dev, "%s:%d:(pid %d): " format, \
+ pr_debug_once("%s:%s:%d:(pid %d): " format, (__dev)->priv.name, \
__func__, __LINE__, current->pid, \
##__VA_ARGS__)
} while (0)
#define mlx5_core_err(__dev, format, ...) \
- dev_err(&(__dev)->pdev->dev, "%s:%d:(pid %d): " format, \
+ pr_err("%s:%s:%d:(pid %d): " format, (__dev)->priv.name, \
__func__, __LINE__, current->pid, \
##__VA_ARGS__)
-#define mlx5_core_err_rl(__dev, format, ...) \
- dev_err_ratelimited(&(__dev)->pdev->dev, \
- "%s:%d:(pid %d): " format, \
- __func__, __LINE__, current->pid, \
+#define mlx5_core_err_rl(__dev, format, ...) \
+ pr_err_ratelimited("%s:%s:%d:(pid %d): " format, (__dev)->priv.name, \
+ __func__, __LINE__, current->pid, \
##__VA_ARGS__)
#define mlx5_core_warn(__dev, format, ...) \
- dev_warn(&(__dev)->pdev->dev, "%s:%d:(pid %d): " format, \
+ pr_warn("%s:%s:%d:(pid %d): " format, (__dev)->priv.name, \
__func__, __LINE__, current->pid, \
##__VA_ARGS__)
#define mlx5_core_warn_once(__dev, format, ...) \
- dev_warn_once(&(__dev)->pdev->dev, "%s:%d:(pid %d): " format, \
+ pr_warn_once("%s:%s:%d:(pid %d): " format, (__dev)->priv.name, \
__func__, __LINE__, current->pid, \
##__VA_ARGS__)
+#define mlx5_core_warn_rl(__dev, format, ...) \
+ pr_warn_ratelimited("%s:%s:%d:(pid %d): " format, (__dev)->priv.name, \
+ __func__, __LINE__, current->pid, \
+ ##__VA_ARGS__)
+
#define mlx5_core_info(__dev, format, ...) \
- dev_info(&(__dev)->pdev->dev, format, ##__VA_ARGS__)
+ pr_info("%s " format, (__dev)->priv.name, ##__VA_ARGS__)
+
+#define mlx5_core_info_rl(__dev, format, ...) \
+ pr_info_ratelimited("%s:%s:%d:(pid %d): " format, (__dev)->priv.name, \
+ __func__, __LINE__, current->pid, \
+ ##__VA_ARGS__)
enum {
MLX5_CMD_DATA, /* print command payload only */
int mlx5_sriov_attach(struct mlx5_core_dev *dev);
void mlx5_sriov_detach(struct mlx5_core_dev *dev);
int mlx5_core_sriov_configure(struct pci_dev *dev, int num_vfs);
-bool mlx5_sriov_is_enabled(struct mlx5_core_dev *dev);
int mlx5_core_enable_hca(struct mlx5_core_dev *dev, u16 func_id);
int mlx5_core_disable_hca(struct mlx5_core_dev *dev, u16 func_id);
int mlx5_create_scheduling_element_cmd(struct mlx5_core_dev *dev, u8 hierarchy,
void mlx5e_init(void);
void mlx5e_cleanup(void);
+static inline bool mlx5_sriov_is_enabled(struct mlx5_core_dev *dev)
+{
+ return pci_num_vf(dev->pdev) ? true : false;
+}
+
static inline int mlx5_lag_is_lacp_owner(struct mlx5_core_dev *dev)
{
/* LACP owner conditions:
#include "mlx5_core.h"
#include "eswitch.h"
-bool mlx5_sriov_is_enabled(struct mlx5_core_dev *dev)
-{
- struct mlx5_core_sriov *sriov = &dev->priv.sriov;
-
- return !!sriov->num_vfs;
-}
-
static int sriov_restore_guids(struct mlx5_core_dev *dev, int vf)
{
struct mlx5_core_sriov *sriov = &dev->priv.sriov;
mlx5_core_warn(dev, "timeout reclaiming VFs pages\n");
}
-static int mlx5_pci_enable_sriov(struct pci_dev *pdev, int num_vfs)
-{
- struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
- int err = 0;
-
- if (pci_num_vf(pdev)) {
- mlx5_core_warn(dev, "Unable to enable pci sriov, already enabled\n");
- return -EBUSY;
- }
-
- err = pci_enable_sriov(pdev, num_vfs);
- if (err)
- mlx5_core_warn(dev, "pci_enable_sriov failed : %d\n", err);
-
- return err;
-}
-
-static void mlx5_pci_disable_sriov(struct pci_dev *pdev)
-{
- pci_disable_sriov(pdev);
-}
-
static int mlx5_sriov_enable(struct pci_dev *pdev, int num_vfs)
{
struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
- struct mlx5_core_sriov *sriov = &dev->priv.sriov;
- int err = 0;
+ int err;
err = mlx5_device_enable_sriov(dev, num_vfs);
if (err) {
return err;
}
- err = mlx5_pci_enable_sriov(pdev, num_vfs);
+ err = pci_enable_sriov(pdev, num_vfs);
if (err) {
- mlx5_core_warn(dev, "mlx5_pci_enable_sriov failed : %d\n", err);
+ mlx5_core_warn(dev, "pci_enable_sriov failed : %d\n", err);
mlx5_device_disable_sriov(dev);
- return err;
}
-
- sriov->num_vfs = num_vfs;
-
- return 0;
+ return err;
}
static void mlx5_sriov_disable(struct pci_dev *pdev)
{
struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
- struct mlx5_core_sriov *sriov = &dev->priv.sriov;
- mlx5_pci_disable_sriov(pdev);
+ pci_disable_sriov(pdev);
mlx5_device_disable_sriov(dev);
- sriov->num_vfs = 0;
}
int mlx5_core_sriov_configure(struct pci_dev *pdev, int num_vfs)
{
struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
+ struct mlx5_core_sriov *sriov = &dev->priv.sriov;
int err = 0;
mlx5_core_dbg(dev, "requested num_vfs %d\n", num_vfs);
- if (!mlx5_core_is_pf(dev))
- return -EPERM;
if (num_vfs)
err = mlx5_sriov_enable(pdev, num_vfs);
else
mlx5_sriov_disable(pdev);
+ if (!err)
+ sriov->num_vfs = num_vfs;
return err ? err : num_vfs;
}
else
system_page_index = index;
- return (pci_resource_start(mdev->pdev, 0) >> PAGE_SHIFT) + system_page_index;
+ return (mdev->bar_addr >> PAGE_SHIFT) + system_page_index;
}
static void up_rel_func(struct kref *kref)
config MLXSW_CORE
tristate "Mellanox Technologies Switch ASICs support"
+ select NET_DEVLINK
---help---
This driver supports Mellanox Technologies Switch ASICs family.
pool_type, p_cur, p_max);
}
+static int
+mlxsw_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
+ struct netlink_ext_ack *extack)
+{
+ struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
+ char fw_info_psid[MLXSW_REG_MGIR_FW_INFO_PSID_SIZE];
+ u32 hw_rev, fw_major, fw_minor, fw_sub_minor;
+ char mgir_pl[MLXSW_REG_MGIR_LEN];
+ char buf[32];
+ int err;
+
+ err = devlink_info_driver_name_put(req,
+ mlxsw_core->bus_info->device_kind);
+ if (err)
+ return err;
+
+ mlxsw_reg_mgir_pack(mgir_pl);
+ err = mlxsw_reg_query(mlxsw_core, MLXSW_REG(mgir), mgir_pl);
+ if (err)
+ return err;
+ mlxsw_reg_mgir_unpack(mgir_pl, &hw_rev, fw_info_psid, &fw_major,
+ &fw_minor, &fw_sub_minor);
+
+ sprintf(buf, "%X", hw_rev);
+ err = devlink_info_version_fixed_put(req, "hw.revision", buf);
+ if (err)
+ return err;
+
+ err = devlink_info_version_fixed_put(req, "fw.psid", fw_info_psid);
+ if (err)
+ return err;
+
+ sprintf(buf, "%d.%d.%d", fw_major, fw_minor, fw_sub_minor);
+ err = devlink_info_version_running_put(req, "fw.version", buf);
+ if (err)
+ return err;
+
+ return 0;
+}
+
static int mlxsw_devlink_core_bus_device_reload(struct devlink *devlink,
struct netlink_ext_ack *extack)
{
.sb_occ_max_clear = mlxsw_devlink_sb_occ_max_clear,
.sb_occ_port_pool_get = mlxsw_devlink_sb_occ_port_pool_get,
.sb_occ_tc_port_bind_get = mlxsw_devlink_sb_occ_tc_port_bind_get,
+ .info_get = mlxsw_devlink_info_get,
};
static int
}
EXPORT_SYMBOL(mlxsw_core_res_get);
-int mlxsw_core_port_init(struct mlxsw_core *mlxsw_core, u8 local_port)
+int mlxsw_core_port_init(struct mlxsw_core *mlxsw_core, u8 local_port,
+ u32 port_number, bool split,
+ u32 split_port_subnumber,
+ const unsigned char *switch_id,
+ unsigned char switch_id_len)
{
struct devlink *devlink = priv_to_devlink(mlxsw_core);
struct mlxsw_core_port *mlxsw_core_port =
int err;
mlxsw_core_port->local_port = local_port;
+ devlink_port_attrs_set(devlink_port, DEVLINK_PORT_FLAVOUR_PHYSICAL,
+ port_number, split, split_port_subnumber,
+ switch_id, switch_id_len);
err = devlink_port_register(devlink, devlink_port, local_port);
if (err)
memset(mlxsw_core_port, 0, sizeof(*mlxsw_core_port));
EXPORT_SYMBOL(mlxsw_core_port_fini);
void mlxsw_core_port_eth_set(struct mlxsw_core *mlxsw_core, u8 local_port,
- void *port_driver_priv, struct net_device *dev,
- u32 port_number, bool split,
- u32 split_port_subnumber)
+ void *port_driver_priv, struct net_device *dev)
{
struct mlxsw_core_port *mlxsw_core_port =
&mlxsw_core->ports[local_port];
struct devlink_port *devlink_port = &mlxsw_core_port->devlink_port;
mlxsw_core_port->port_driver_priv = port_driver_priv;
- devlink_port_attrs_set(devlink_port, DEVLINK_PORT_FLAVOUR_PHYSICAL,
- port_number, split, split_port_subnumber);
devlink_port_type_eth_set(devlink_port, dev);
}
EXPORT_SYMBOL(mlxsw_core_port_eth_set);
}
EXPORT_SYMBOL(mlxsw_core_port_type_get);
-int mlxsw_core_port_get_phys_port_name(struct mlxsw_core *mlxsw_core,
- u8 local_port, char *name, size_t len)
+
+struct devlink_port *
+mlxsw_core_port_devlink_port_get(struct mlxsw_core *mlxsw_core,
+ u8 local_port)
{
struct mlxsw_core_port *mlxsw_core_port =
&mlxsw_core->ports[local_port];
struct devlink_port *devlink_port = &mlxsw_core_port->devlink_port;
- return devlink_port_get_phys_port_name(devlink_port, name, len);
+ return devlink_port;
}
-EXPORT_SYMBOL(mlxsw_core_port_get_phys_port_name);
+EXPORT_SYMBOL(mlxsw_core_port_devlink_port_get);
static void mlxsw_core_buf_dump_dbg(struct mlxsw_core *mlxsw_core,
const char *buf, size_t size)
u16 lag_id, u8 local_port);
void *mlxsw_core_port_driver_priv(struct mlxsw_core_port *mlxsw_core_port);
-int mlxsw_core_port_init(struct mlxsw_core *mlxsw_core, u8 local_port);
+int mlxsw_core_port_init(struct mlxsw_core *mlxsw_core, u8 local_port,
+ u32 port_number, bool split,
+ u32 split_port_subnumber,
+ const unsigned char *switch_id,
+ unsigned char switch_id_len);
void mlxsw_core_port_fini(struct mlxsw_core *mlxsw_core, u8 local_port);
void mlxsw_core_port_eth_set(struct mlxsw_core *mlxsw_core, u8 local_port,
- void *port_driver_priv, struct net_device *dev,
- u32 port_number, bool split,
- u32 split_port_subnumber);
+ void *port_driver_priv, struct net_device *dev);
void mlxsw_core_port_ib_set(struct mlxsw_core *mlxsw_core, u8 local_port,
void *port_driver_priv);
void mlxsw_core_port_clear(struct mlxsw_core *mlxsw_core, u8 local_port,
void *port_driver_priv);
enum devlink_port_type mlxsw_core_port_type_get(struct mlxsw_core *mlxsw_core,
u8 local_port);
-int mlxsw_core_port_get_phys_port_name(struct mlxsw_core *mlxsw_core,
- u8 local_port, char *name, size_t len);
+struct devlink_port *
+mlxsw_core_port_devlink_port_get(struct mlxsw_core *mlxsw_core,
+ u8 local_port);
int mlxsw_core_schedule_dw(struct delayed_work *dwork, unsigned long delay);
bool mlxsw_core_schedule_work(struct work_struct *work);
return 0;
}
-static int
-mlxsw_m_port_get_phys_port_name(struct net_device *dev, char *name, size_t len)
-{
- struct mlxsw_m_port *mlxsw_m_port = netdev_priv(dev);
- struct mlxsw_core *core = mlxsw_m_port->mlxsw_m->core;
- u8 local_port = mlxsw_m_port->local_port;
-
- return mlxsw_core_port_get_phys_port_name(core, local_port, name, len);
-}
-
-static int mlxsw_m_port_get_port_parent_id(struct net_device *dev,
- struct netdev_phys_item_id *ppid)
+static struct devlink_port *
+mlxsw_m_port_get_devlink_port(struct net_device *dev)
{
struct mlxsw_m_port *mlxsw_m_port = netdev_priv(dev);
struct mlxsw_m *mlxsw_m = mlxsw_m_port->mlxsw_m;
- ppid->id_len = sizeof(mlxsw_m->base_mac);
- memcpy(&ppid->id, &mlxsw_m->base_mac, ppid->id_len);
-
- return 0;
+ return mlxsw_core_port_devlink_port_get(mlxsw_m->core,
+ mlxsw_m_port->local_port);
}
static const struct net_device_ops mlxsw_m_port_netdev_ops = {
.ndo_open = mlxsw_m_port_dummy_open_stop,
.ndo_stop = mlxsw_m_port_dummy_open_stop,
- .ndo_get_phys_port_name = mlxsw_m_port_get_phys_port_name,
- .ndo_get_port_parent_id = mlxsw_m_port_get_port_parent_id,
+ .ndo_get_devlink_port = mlxsw_m_port_get_devlink_port,
};
static int mlxsw_m_get_module_info(struct net_device *netdev,
struct net_device *dev;
int err;
- err = mlxsw_core_port_init(mlxsw_m->core, local_port);
+ err = mlxsw_core_port_init(mlxsw_m->core, local_port,
+ module + 1, false, 0,
+ mlxsw_m->base_mac,
+ sizeof(mlxsw_m->base_mac));
if (err) {
dev_err(mlxsw_m->bus_info->dev, "Port %d: Failed to init core port\n",
local_port);
}
mlxsw_core_port_eth_set(mlxsw_m->core, mlxsw_m_port->local_port,
- mlxsw_m_port, dev, module + 1, false, 0);
+ mlxsw_m_port, dev);
return 0;
mlxsw_reg_mpar_pa_id_set(payload, pa_id);
}
+/* MGIR - Management General Information Register
+ * ----------------------------------------------
+ * MGIR register allows software to query the hardware and firmware general
+ * information.
+ */
+#define MLXSW_REG_MGIR_ID 0x9020
+#define MLXSW_REG_MGIR_LEN 0x9C
+
+MLXSW_REG_DEFINE(mgir, MLXSW_REG_MGIR_ID, MLXSW_REG_MGIR_LEN);
+
+/* reg_mgir_hw_info_device_hw_revision
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, mgir, hw_info_device_hw_revision, 0x0, 16, 16);
+
+#define MLXSW_REG_MGIR_FW_INFO_PSID_SIZE 16
+
+/* reg_mgir_fw_info_psid
+ * PSID (ASCII string).
+ * Access: RO
+ */
+MLXSW_ITEM_BUF(reg, mgir, fw_info_psid, 0x30, MLXSW_REG_MGIR_FW_INFO_PSID_SIZE);
+
+/* reg_mgir_fw_info_extended_major
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, mgir, fw_info_extended_major, 0x44, 0, 32);
+
+/* reg_mgir_fw_info_extended_minor
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, mgir, fw_info_extended_minor, 0x48, 0, 32);
+
+/* reg_mgir_fw_info_extended_sub_minor
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, mgir, fw_info_extended_sub_minor, 0x4C, 0, 32);
+
+static inline void mlxsw_reg_mgir_pack(char *payload)
+{
+ MLXSW_REG_ZERO(mgir, payload);
+}
+
+static inline void
+mlxsw_reg_mgir_unpack(char *payload, u32 *hw_rev, char *fw_info_psid,
+ u32 *fw_major, u32 *fw_minor, u32 *fw_sub_minor)
+{
+ *hw_rev = mlxsw_reg_mgir_hw_info_device_hw_revision_get(payload);
+ mlxsw_reg_mgir_fw_info_psid_memcpy_from(payload, fw_info_psid);
+ *fw_major = mlxsw_reg_mgir_fw_info_extended_major_get(payload);
+ *fw_minor = mlxsw_reg_mgir_fw_info_extended_minor_get(payload);
+ *fw_sub_minor = mlxsw_reg_mgir_fw_info_extended_sub_minor_get(payload);
+}
+
/* MRSR - Management Reset and Shutdown Register
* ---------------------------------------------
* MRSR register is used to reset or shutdown the switch or
MLXSW_REG(mcia),
MLXSW_REG(mpat),
MLXSW_REG(mpar),
+ MLXSW_REG(mgir),
MLXSW_REG(mrsr),
MLXSW_REG(mlcr),
MLXSW_REG(mpsc),
return 0;
}
-static int mlxsw_sp_port_get_phys_port_name(struct net_device *dev, char *name,
- size_t len)
-{
- struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
-
- return mlxsw_core_port_get_phys_port_name(mlxsw_sp_port->mlxsw_sp->core,
- mlxsw_sp_port->local_port,
- name, len);
-}
-
static struct mlxsw_sp_port_mall_tc_entry *
mlxsw_sp_port_mall_tc_entry_find(struct mlxsw_sp_port *port,
unsigned long cookie) {
mlxsw_sp_feature_hw_tc);
}
-static int mlxsw_sp_port_get_port_parent_id(struct net_device *dev,
- struct netdev_phys_item_id *ppid)
+static struct devlink_port *
+mlxsw_sp_port_get_devlink_port(struct net_device *dev)
{
struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
- ppid->id_len = sizeof(mlxsw_sp->base_mac);
- memcpy(&ppid->id, &mlxsw_sp->base_mac, ppid->id_len);
-
- return 0;
+ return mlxsw_core_port_devlink_port_get(mlxsw_sp->core,
+ mlxsw_sp_port->local_port);
}
static const struct net_device_ops mlxsw_sp_port_netdev_ops = {
.ndo_get_offload_stats = mlxsw_sp_port_get_offload_stats,
.ndo_vlan_rx_add_vid = mlxsw_sp_port_add_vid,
.ndo_vlan_rx_kill_vid = mlxsw_sp_port_kill_vid,
- .ndo_get_phys_port_name = mlxsw_sp_port_get_phys_port_name,
.ndo_set_features = mlxsw_sp_set_features,
- .ndo_get_port_parent_id = mlxsw_sp_port_get_port_parent_id,
+ .ndo_get_devlink_port = mlxsw_sp_port_get_devlink_port,
};
static void mlxsw_sp_port_get_drvinfo(struct net_device *dev,
struct net_device *dev;
int err;
- err = mlxsw_core_port_init(mlxsw_sp->core, local_port);
+ err = mlxsw_core_port_init(mlxsw_sp->core, local_port,
+ module + 1, split, lane / width,
+ mlxsw_sp->base_mac,
+ sizeof(mlxsw_sp->base_mac));
if (err) {
dev_err(mlxsw_sp->bus_info->dev, "Port %d: Failed to init core port\n",
local_port);
}
mlxsw_core_port_eth_set(mlxsw_sp->core, mlxsw_sp_port->local_port,
- mlxsw_sp_port, dev, module + 1,
- mlxsw_sp_port->split, lane / width);
+ mlxsw_sp_port, dev);
mlxsw_core_schedule_dw(&mlxsw_sp_port->periodic_hw_stats.update_dw, 0);
return 0;
struct mlxsw_sp_acl_tcam_rehash_ctx ctx;
} rehash;
struct mlxsw_sp *mlxsw_sp;
- bool failed_rollback; /* Indicates failed rollback during migration */
unsigned int ref_count;
};
struct mlxsw_sp_acl_tcam_chunk *new_chunk;
new_chunk = mlxsw_sp_acl_tcam_chunk_create(mlxsw_sp, vchunk, region);
- if (IS_ERR(new_chunk)) {
- if (ctx->this_is_rollback)
- vchunk->vregion->failed_rollback = true;
+ if (IS_ERR(new_chunk))
return PTR_ERR(new_chunk);
- }
vchunk->chunk2 = vchunk->chunk;
vchunk->chunk = new_chunk;
ctx->current_vchunk = vchunk;
err = mlxsw_sp_acl_tcam_ventry_migrate(mlxsw_sp, ventry,
vchunk->chunk, credits);
if (err) {
- if (ctx->this_is_rollback)
+ if (ctx->this_is_rollback) {
+ /* Save the ventry which we ended with and try
+ * to continue later on.
+ */
+ ctx->start_ventry = ventry;
return err;
+ }
/* Swap the chunk and chunk2 pointers so the follow-up
* rollback call will see the original chunk pointer
* in vchunk->chunk.
ctx->this_is_rollback = true;
err2 = mlxsw_sp_acl_tcam_vchunk_migrate_all(mlxsw_sp, vregion,
ctx, credits);
- if (err2)
- vregion->failed_rollback = true;
+ if (err2) {
+ trace_mlxsw_sp_acl_tcam_vregion_rehash_rollback_failed(mlxsw_sp,
+ vregion);
+ dev_err(mlxsw_sp->bus_info->dev, "Failed to rollback during vregion migration fail\n");
+ /* Let the rollback to be continued later on. */
+ }
}
mutex_unlock(&vregion->lock);
trace_mlxsw_sp_acl_tcam_vregion_migrate_end(mlxsw_sp, vregion);
int err;
trace_mlxsw_sp_acl_tcam_vregion_rehash(mlxsw_sp, vregion);
- if (vregion->failed_rollback)
- return -EBUSY;
hints_priv = ops->region_rehash_hints_get(vregion->region->priv);
if (IS_ERR(hints_priv))
struct mlxsw_sp_acl_tcam_region *unused_region = vregion->region2;
const struct mlxsw_sp_acl_tcam_ops *ops = mlxsw_sp->acl_tcam_ops;
- if (!vregion->failed_rollback) {
- vregion->region2 = NULL;
- mlxsw_sp_acl_tcam_group_region_detach(mlxsw_sp, unused_region);
- mlxsw_sp_acl_tcam_region_destroy(mlxsw_sp, unused_region);
- }
+ vregion->region2 = NULL;
+ mlxsw_sp_acl_tcam_group_region_detach(mlxsw_sp, unused_region);
+ mlxsw_sp_acl_tcam_region_destroy(mlxsw_sp, unused_region);
ops->region_rehash_hints_put(ctx->hints_priv);
ctx->hints_priv = NULL;
}
ctx, credits);
if (err) {
dev_err(mlxsw_sp->bus_info->dev, "Failed to migrate vregion\n");
- if (vregion->failed_rollback) {
- trace_mlxsw_sp_acl_tcam_vregion_rehash_dis(mlxsw_sp,
- vregion);
- dev_err(mlxsw_sp->bus_info->dev, "Failed to rollback during vregion migration fail\n");
- }
}
if (*credits >= 0)
MLXSW_REG_RAUHT_OP_WRITE_DELETE;
}
-static void
+static int
mlxsw_sp_router_neigh_entry_op4(struct mlxsw_sp *mlxsw_sp,
struct mlxsw_sp_neigh_entry *neigh_entry,
enum mlxsw_reg_rauht_op op)
if (neigh_entry->counter_valid)
mlxsw_reg_rauht_pack_counter(rauht_pl,
neigh_entry->counter_index);
- mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rauht), rauht_pl);
+ return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rauht), rauht_pl);
}
-static void
+static int
mlxsw_sp_router_neigh_entry_op6(struct mlxsw_sp *mlxsw_sp,
struct mlxsw_sp_neigh_entry *neigh_entry,
enum mlxsw_reg_rauht_op op)
if (neigh_entry->counter_valid)
mlxsw_reg_rauht_pack_counter(rauht_pl,
neigh_entry->counter_index);
- mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rauht), rauht_pl);
+ return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rauht), rauht_pl);
}
bool mlxsw_sp_neigh_ipv6_ignore(struct mlxsw_sp_neigh_entry *neigh_entry)
struct mlxsw_sp_neigh_entry *neigh_entry,
bool adding)
{
+ enum mlxsw_reg_rauht_op op = mlxsw_sp_rauht_op(adding);
+ int err;
+
if (!adding && !neigh_entry->connected)
return;
neigh_entry->connected = adding;
if (neigh_entry->key.n->tbl->family == AF_INET) {
- mlxsw_sp_router_neigh_entry_op4(mlxsw_sp, neigh_entry,
- mlxsw_sp_rauht_op(adding));
+ err = mlxsw_sp_router_neigh_entry_op4(mlxsw_sp, neigh_entry,
+ op);
+ if (err)
+ return;
} else if (neigh_entry->key.n->tbl->family == AF_INET6) {
if (mlxsw_sp_neigh_ipv6_ignore(neigh_entry))
return;
- mlxsw_sp_router_neigh_entry_op6(mlxsw_sp, neigh_entry,
- mlxsw_sp_rauht_op(adding));
+ err = mlxsw_sp_router_neigh_entry_op6(mlxsw_sp, neigh_entry,
+ op);
+ if (err)
+ return;
} else {
WARN_ON_ONCE(1);
+ return;
}
+
+ if (adding)
+ neigh_entry->key.n->flags |= NTF_OFFLOADED;
+ else
+ neigh_entry->key.n->flags &= ~NTF_OFFLOADED;
}
void
return false;
list_for_each_entry(mlxsw_sp_rt6, &fib6_entry->rt6_list, list) {
+ struct fib6_nh *fib6_nh = &mlxsw_sp_rt6->rt->fib6_nh;
struct in6_addr *gw;
int ifindex, weight;
- ifindex = mlxsw_sp_rt6->rt->fib6_nh.nh_dev->ifindex;
- weight = mlxsw_sp_rt6->rt->fib6_nh.nh_weight;
- gw = &mlxsw_sp_rt6->rt->fib6_nh.nh_gw;
+ ifindex = fib6_nh->fib_nh_dev->ifindex;
+ weight = fib6_nh->fib_nh_weight;
+ gw = &fib6_nh->fib_nh_gw6;
if (!mlxsw_sp_nexthop6_group_has_nexthop(nh_grp, gw, ifindex,
weight))
return false;
struct net_device *dev;
list_for_each_entry(mlxsw_sp_rt6, &fib6_entry->rt6_list, list) {
- dev = mlxsw_sp_rt6->rt->fib6_nh.nh_dev;
+ dev = mlxsw_sp_rt6->rt->fib6_nh.fib_nh_dev;
val ^= dev->ifindex;
}
const struct fib_nh *fib_nh,
enum mlxsw_sp_ipip_type *p_ipipt)
{
- struct net_device *dev = fib_nh->nh_dev;
+ struct net_device *dev = fib_nh->fib_nh_dev;
return dev &&
fib_nh->nh_parent->fib_type == RTN_UNICAST &&
struct fib_nh *fib_nh)
{
const struct mlxsw_sp_ipip_ops *ipip_ops;
- struct net_device *dev = fib_nh->nh_dev;
+ struct net_device *dev = fib_nh->fib_nh_dev;
struct mlxsw_sp_ipip_entry *ipip_entry;
struct mlxsw_sp_rif *rif;
int err;
struct mlxsw_sp_nexthop *nh,
struct fib_nh *fib_nh)
{
- struct net_device *dev = fib_nh->nh_dev;
+ struct net_device *dev = fib_nh->fib_nh_dev;
struct in_device *in_dev;
int err;
nh->nh_grp = nh_grp;
nh->key.fib_nh = fib_nh;
#ifdef CONFIG_IP_ROUTE_MULTIPATH
- nh->nh_weight = fib_nh->nh_weight;
+ nh->nh_weight = fib_nh->fib_nh_weight;
#else
nh->nh_weight = 1;
#endif
- memcpy(&nh->gw_addr, &fib_nh->nh_gw, sizeof(fib_nh->nh_gw));
+ memcpy(&nh->gw_addr, &fib_nh->fib_nh_gw4, sizeof(fib_nh->fib_nh_gw4));
err = mlxsw_sp_nexthop_insert(mlxsw_sp, nh);
if (err)
return err;
in_dev = __in_dev_get_rtnl(dev);
if (in_dev && IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
- fib_nh->nh_flags & RTNH_F_LINKDOWN)
+ fib_nh->fib_nh_flags & RTNH_F_LINKDOWN)
return 0;
err = mlxsw_sp_nexthop4_type_init(mlxsw_sp, nh, fib_nh);
static bool mlxsw_sp_fi_is_gateway(const struct mlxsw_sp *mlxsw_sp,
const struct fib_info *fi)
{
- return fi->fib_nh->nh_scope == RT_SCOPE_LINK ||
+ return fi->fib_nh->fib_nh_scope == RT_SCOPE_LINK ||
mlxsw_sp_nexthop4_ipip_type(mlxsw_sp, fi->fib_nh, NULL);
}
struct mlxsw_sp_nexthop *nh = &nh_grp->nexthops[i];
struct fib6_info *rt = mlxsw_sp_rt6->rt;
- if (nh->rif && nh->rif->dev == rt->fib6_nh.nh_dev &&
+ if (nh->rif && nh->rif->dev == rt->fib6_nh.fib_nh_dev &&
ipv6_addr_equal((const struct in6_addr *) &nh->gw_addr,
- &rt->fib6_nh.nh_gw))
+ &rt->fib6_nh.fib_nh_gw6))
return nh;
continue;
}
fib_entry->type == MLXSW_SP_FIB_ENTRY_TYPE_BLACKHOLE ||
fib_entry->type == MLXSW_SP_FIB_ENTRY_TYPE_IPIP_DECAP ||
fib_entry->type == MLXSW_SP_FIB_ENTRY_TYPE_NVE_DECAP) {
- nh_grp->nexthops->key.fib_nh->nh_flags |= RTNH_F_OFFLOAD;
+ nh_grp->nexthops->key.fib_nh->fib_nh_flags |= RTNH_F_OFFLOAD;
return;
}
struct mlxsw_sp_nexthop *nh = &nh_grp->nexthops[i];
if (nh->offloaded)
- nh->key.fib_nh->nh_flags |= RTNH_F_OFFLOAD;
+ nh->key.fib_nh->fib_nh_flags |= RTNH_F_OFFLOAD;
else
- nh->key.fib_nh->nh_flags &= ~RTNH_F_OFFLOAD;
+ nh->key.fib_nh->fib_nh_flags &= ~RTNH_F_OFFLOAD;
}
}
for (i = 0; i < nh_grp->count; i++) {
struct mlxsw_sp_nexthop *nh = &nh_grp->nexthops[i];
- nh->key.fib_nh->nh_flags &= ~RTNH_F_OFFLOAD;
+ nh->key.fib_nh->fib_nh_flags &= ~RTNH_F_OFFLOAD;
}
}
if (fib_entry->type == MLXSW_SP_FIB_ENTRY_TYPE_LOCAL ||
fib_entry->type == MLXSW_SP_FIB_ENTRY_TYPE_BLACKHOLE) {
list_first_entry(&fib6_entry->rt6_list, struct mlxsw_sp_rt6,
- list)->rt->fib6_nh.nh_flags |= RTNH_F_OFFLOAD;
+ list)->rt->fib6_nh.fib_nh_flags |= RTNH_F_OFFLOAD;
return;
}
list_for_each_entry(mlxsw_sp_rt6, &fib6_entry->rt6_list, list) {
struct mlxsw_sp_nexthop_group *nh_grp = fib_entry->nh_group;
+ struct fib6_nh *fib6_nh = &mlxsw_sp_rt6->rt->fib6_nh;
struct mlxsw_sp_nexthop *nh;
nh = mlxsw_sp_rt6_nexthop(nh_grp, mlxsw_sp_rt6);
if (nh && nh->offloaded)
- mlxsw_sp_rt6->rt->fib6_nh.nh_flags |= RTNH_F_OFFLOAD;
+ fib6_nh->fib_nh_flags |= RTNH_F_OFFLOAD;
else
- mlxsw_sp_rt6->rt->fib6_nh.nh_flags &= ~RTNH_F_OFFLOAD;
+ fib6_nh->fib_nh_flags &= ~RTNH_F_OFFLOAD;
}
}
list_for_each_entry(mlxsw_sp_rt6, &fib6_entry->rt6_list, list) {
struct fib6_info *rt = mlxsw_sp_rt6->rt;
- rt->fib6_nh.nh_flags &= ~RTNH_F_OFFLOAD;
+ rt->fib6_nh.fib_nh_flags &= ~RTNH_F_OFFLOAD;
}
}
static bool mlxsw_sp_fib6_rt_can_mp(const struct fib6_info *rt)
{
/* RTF_CACHE routes are ignored */
- return (rt->fib6_flags & (RTF_GATEWAY | RTF_ADDRCONF)) == RTF_GATEWAY;
+ return !(rt->fib6_flags & RTF_ADDRCONF) && rt->fib6_nh.fib_nh_gw_family;
}
static struct fib6_info *
const struct fib6_info *rt,
enum mlxsw_sp_ipip_type *ret)
{
- return rt->fib6_nh.nh_dev &&
- mlxsw_sp_netdev_ipip_type(mlxsw_sp, rt->fib6_nh.nh_dev, ret);
+ return rt->fib6_nh.fib_nh_dev &&
+ mlxsw_sp_netdev_ipip_type(mlxsw_sp, rt->fib6_nh.fib_nh_dev, ret);
}
static int mlxsw_sp_nexthop6_type_init(struct mlxsw_sp *mlxsw_sp,
{
const struct mlxsw_sp_ipip_ops *ipip_ops;
struct mlxsw_sp_ipip_entry *ipip_entry;
- struct net_device *dev = rt->fib6_nh.nh_dev;
+ struct net_device *dev = rt->fib6_nh.fib_nh_dev;
struct mlxsw_sp_rif *rif;
int err;
struct mlxsw_sp_nexthop *nh,
const struct fib6_info *rt)
{
- struct net_device *dev = rt->fib6_nh.nh_dev;
+ struct net_device *dev = rt->fib6_nh.fib_nh_dev;
nh->nh_grp = nh_grp;
- nh->nh_weight = rt->fib6_nh.nh_weight;
- memcpy(&nh->gw_addr, &rt->fib6_nh.nh_gw, sizeof(nh->gw_addr));
+ nh->nh_weight = rt->fib6_nh.fib_nh_weight;
+ memcpy(&nh->gw_addr, &rt->fib6_nh.fib_nh_gw6, sizeof(nh->gw_addr));
mlxsw_sp_nexthop_counter_alloc(mlxsw_sp, nh);
list_add_tail(&nh->router_list_node, &mlxsw_sp->router->nexthop_list);
static bool mlxsw_sp_rt6_is_gateway(const struct mlxsw_sp *mlxsw_sp,
const struct fib6_info *rt)
{
- return rt->fib6_flags & RTF_GATEWAY ||
+ return rt->fib6_nh.fib_nh_gw_family ||
mlxsw_sp_nexthop6_ipip_type(mlxsw_sp, rt, NULL);
}
NL_SET_ERR_MSG_MOD(info->extack, "FIB offload was aborted. Not configuring route");
return notifier_from_errno(-EINVAL);
}
+ if (info->family == AF_INET) {
+ struct fib_entry_notifier_info *fen_info = ptr;
+
+ if (fen_info->fi->fib_nh_is_v6) {
+ NL_SET_ERR_MSG_MOD(info->extack, "IPv6 gateway with IPv4 route is not supported");
+ return notifier_from_errno(-EINVAL);
+ }
+ }
break;
}
dev = rt->dst.dev;
*saddrp = fl4.saddr;
- *daddrp = rt->rt_gateway;
+ if (rt->rt_gw_family == AF_INET)
+ *daddrp = rt->rt_gw4;
+ /* can not offload if route has an IPv6 gateway */
+ else if (rt->rt_gw_family == AF_INET6)
+ dev = NULL;
out:
ip_rt_put(rt);
struct mlxsw_sib_port **ports;
struct mlxsw_core *core;
const struct mlxsw_bus_info *bus_info;
+ u8 hw_id[ETH_ALEN];
};
struct mlxsw_sib_port {
mlxsw_tx_v1_hdr_type_set(txhdr, MLXSW_TXHDR_TYPE_CONTROL);
}
+static int mlxsw_sib_hw_id_get(struct mlxsw_sib *mlxsw_sib)
+{
+ char spad_pl[MLXSW_REG_SPAD_LEN] = {0};
+ int err;
+
+ err = mlxsw_reg_query(mlxsw_sib->core, MLXSW_REG(spad), spad_pl);
+ if (err)
+ return err;
+ mlxsw_reg_spad_base_mac_memcpy_from(spad_pl, mlxsw_sib->hw_id);
+ return 0;
+}
+
static int
mlxsw_sib_port_admin_status_set(struct mlxsw_sib_port *mlxsw_sib_port,
bool is_up)
{
int err;
- err = mlxsw_core_port_init(mlxsw_sib->core, local_port);
+ err = mlxsw_core_port_init(mlxsw_sib->core, local_port,
+ module + 1, false, 0,
+ mlxsw_sib->hw_id, sizeof(mlxsw_sib->hw_id));
if (err) {
dev_err(mlxsw_sib->bus_info->dev, "Port %d: Failed to init core port\n",
local_port);
mlxsw_sib->core = mlxsw_core;
mlxsw_sib->bus_info = mlxsw_bus_info;
+ err = mlxsw_sib_hw_id_get(mlxsw_sib);
+ if (err) {
+ dev_err(mlxsw_sib->bus_info->dev, "Failed to get switch HW ID\n");
+ return err;
+ }
+
err = mlxsw_sib_ports_create(mlxsw_sib);
if (err) {
dev_err(mlxsw_sib->bus_info->dev, "Failed to create ports\n");
stats->tx_dropped = tx_dropped;
}
-static int mlxsw_sx_port_get_phys_port_name(struct net_device *dev, char *name,
- size_t len)
-{
- struct mlxsw_sx_port *mlxsw_sx_port = netdev_priv(dev);
-
- return mlxsw_core_port_get_phys_port_name(mlxsw_sx_port->mlxsw_sx->core,
- mlxsw_sx_port->local_port,
- name, len);
-}
-
-static int mlxsw_sx_port_get_port_parent_id(struct net_device *dev,
- struct netdev_phys_item_id *ppid)
+static struct devlink_port *
+mlxsw_sx_port_get_devlink_port(struct net_device *dev)
{
struct mlxsw_sx_port *mlxsw_sx_port = netdev_priv(dev);
struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
- ppid->id_len = sizeof(mlxsw_sx->hw_id);
- memcpy(&ppid->id, &mlxsw_sx->hw_id, ppid->id_len);
-
- return 0;
+ return mlxsw_core_port_devlink_port_get(mlxsw_sx->core,
+ mlxsw_sx_port->local_port);
}
static const struct net_device_ops mlxsw_sx_port_netdev_ops = {
.ndo_start_xmit = mlxsw_sx_port_xmit,
.ndo_change_mtu = mlxsw_sx_port_change_mtu,
.ndo_get_stats64 = mlxsw_sx_port_get_stats64,
- .ndo_get_phys_port_name = mlxsw_sx_port_get_phys_port_name,
- .ndo_get_port_parent_id = mlxsw_sx_port_get_port_parent_id,
+ .ndo_get_devlink_port = mlxsw_sx_port_get_devlink_port,
};
static void mlxsw_sx_port_get_drvinfo(struct net_device *dev,
}
mlxsw_core_port_eth_set(mlxsw_sx->core, mlxsw_sx_port->local_port,
- mlxsw_sx_port, dev, module + 1, false, 0);
+ mlxsw_sx_port, dev);
mlxsw_sx->ports[local_port] = mlxsw_sx_port;
return 0;
{
int err;
- err = mlxsw_core_port_init(mlxsw_sx->core, local_port);
+ err = mlxsw_core_port_init(mlxsw_sx->core, local_port,
+ module + 1, false, 0,
+ mlxsw_sx->hw_id, sizeof(mlxsw_sx->hw_id));
if (err) {
dev_err(mlxsw_sx->bus_info->dev, "Port %d: Failed to init core port\n",
local_port);
+// SPDX-License-Identifier: GPL-2.0+
/*
* Microchip ENC28J60 ethernet driver (MAC + PHY)
*
* Author: Claudio Lanconelli <lanconelli.claudio@eptar.com>
* based on enc28j60.c written by David Anders for 2.4 kernel version
*
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
* $Id: enc28j60.c,v 1.22 2007/12/20 10:47:01 claudio Exp $
*/
#include <linux/types.h>
#include <linux/fcntl.h>
#include <linux/interrupt.h>
+#include <linux/property.h>
#include <linux/string.h>
#include <linux/errno.h>
-#include <linux/init.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/ethtool.h>
#include <linux/skbuff.h>
#include <linux/delay.h>
#include <linux/spi/spi.h>
-#include <linux/of_net.h>
#include "enc28j60_hw.h"
(NETIF_MSG_PROBE | NETIF_MSG_IFUP | NETIF_MSG_IFDOWN | NETIF_MSG_LINK)
/* Buffer size required for the largest SPI transfer (i.e., reading a
- * frame). */
+ * frame).
+ */
#define SPI_TRANSFER_BUF_LEN (4 + MAX_FRAMELEN)
-#define TX_TIMEOUT (4 * HZ)
+#define TX_TIMEOUT (4 * HZ)
/* Max TX retries in case of collision as suggested by errata datasheet */
#define MAX_TX_RETRYCOUNT 16
/*
* SPI read buffer
- * wait for the SPI transfer and copy received data to destination
+ * Wait for the SPI transfer and copy received data to destination.
*/
static int
spi_read_buf(struct enc28j60_net *priv, int len, u8 *data)
{
+ struct device *dev = &priv->spi->dev;
u8 *rx_buf = priv->spi_transfer_buf + 4;
u8 *tx_buf = priv->spi_transfer_buf;
struct spi_transfer tx = {
ret = msg.status;
}
if (ret && netif_msg_drv(priv))
- printk(KERN_DEBUG DRV_NAME ": %s() failed: ret = %d\n",
- __func__, ret);
+ dev_printk(KERN_DEBUG, dev, "%s() failed: ret = %d\n",
+ __func__, ret);
return ret;
}
/*
* SPI write buffer
*/
-static int spi_write_buf(struct enc28j60_net *priv, int len,
- const u8 *data)
+static int spi_write_buf(struct enc28j60_net *priv, int len, const u8 *data)
{
+ struct device *dev = &priv->spi->dev;
int ret;
if (len > SPI_TRANSFER_BUF_LEN - 1 || len <= 0)
memcpy(&priv->spi_transfer_buf[1], data, len);
ret = spi_write(priv->spi, priv->spi_transfer_buf, len + 1);
if (ret && netif_msg_drv(priv))
- printk(KERN_DEBUG DRV_NAME ": %s() failed: ret = %d\n",
- __func__, ret);
+ dev_printk(KERN_DEBUG, dev, "%s() failed: ret = %d\n",
+ __func__, ret);
}
return ret;
}
/*
* basic SPI read operation
*/
-static u8 spi_read_op(struct enc28j60_net *priv, u8 op,
- u8 addr)
+static u8 spi_read_op(struct enc28j60_net *priv, u8 op, u8 addr)
{
+ struct device *dev = &priv->spi->dev;
u8 tx_buf[2];
u8 rx_buf[4];
u8 val = 0;
tx_buf[0] = op | (addr & ADDR_MASK);
ret = spi_write_then_read(priv->spi, tx_buf, 1, rx_buf, slen);
if (ret)
- printk(KERN_DEBUG DRV_NAME ": %s() failed: ret = %d\n",
- __func__, ret);
+ dev_printk(KERN_DEBUG, dev, "%s() failed: ret = %d\n",
+ __func__, ret);
else
val = rx_buf[slen - 1];
/*
* basic SPI write operation
*/
-static int spi_write_op(struct enc28j60_net *priv, u8 op,
- u8 addr, u8 val)
+static int spi_write_op(struct enc28j60_net *priv, u8 op, u8 addr, u8 val)
{
+ struct device *dev = &priv->spi->dev;
int ret;
priv->spi_transfer_buf[0] = op | (addr & ADDR_MASK);
priv->spi_transfer_buf[1] = val;
ret = spi_write(priv->spi, priv->spi_transfer_buf, 2);
if (ret && netif_msg_drv(priv))
- printk(KERN_DEBUG DRV_NAME ": %s() failed: ret = %d\n",
- __func__, ret);
+ dev_printk(KERN_DEBUG, dev, "%s() failed: ret = %d\n",
+ __func__, ret);
return ret;
}
static void enc28j60_soft_reset(struct enc28j60_net *priv)
{
- if (netif_msg_hw(priv))
- printk(KERN_DEBUG DRV_NAME ": %s() enter\n", __func__);
-
spi_write_op(priv, ENC28J60_SOFT_RESET, 0, ENC28J60_SOFT_RESET);
/* Errata workaround #1, CLKRDY check is unreliable,
- * delay at least 1 mS instead */
+ * delay at least 1 ms instead */
udelay(2000);
}
u8 b = (addr & BANK_MASK) >> 5;
/* These registers (EIE, EIR, ESTAT, ECON2, ECON1)
- * are present in all banks, no need to switch bank
+ * are present in all banks, no need to switch bank.
*/
if (addr >= EIE && addr <= ECON1)
return;
/*
* Register bit field Set
*/
-static void nolock_reg_bfset(struct enc28j60_net *priv,
- u8 addr, u8 mask)
+static void nolock_reg_bfset(struct enc28j60_net *priv, u8 addr, u8 mask)
{
enc28j60_set_bank(priv, addr);
spi_write_op(priv, ENC28J60_BIT_FIELD_SET, addr, mask);
}
-static void locked_reg_bfset(struct enc28j60_net *priv,
- u8 addr, u8 mask)
+static void locked_reg_bfset(struct enc28j60_net *priv, u8 addr, u8 mask)
{
mutex_lock(&priv->lock);
nolock_reg_bfset(priv, addr, mask);
/*
* Register bit field Clear
*/
-static void nolock_reg_bfclr(struct enc28j60_net *priv,
- u8 addr, u8 mask)
+static void nolock_reg_bfclr(struct enc28j60_net *priv, u8 addr, u8 mask)
{
enc28j60_set_bank(priv, addr);
spi_write_op(priv, ENC28J60_BIT_FIELD_CLR, addr, mask);
}
-static void locked_reg_bfclr(struct enc28j60_net *priv,
- u8 addr, u8 mask)
+static void locked_reg_bfclr(struct enc28j60_net *priv, u8 addr, u8 mask)
{
mutex_lock(&priv->lock);
nolock_reg_bfclr(priv, addr, mask);
/*
* Register byte read
*/
-static int nolock_regb_read(struct enc28j60_net *priv,
- u8 address)
+static int nolock_regb_read(struct enc28j60_net *priv, u8 address)
{
enc28j60_set_bank(priv, address);
return spi_read_op(priv, ENC28J60_READ_CTRL_REG, address);
}
-static int locked_regb_read(struct enc28j60_net *priv,
- u8 address)
+static int locked_regb_read(struct enc28j60_net *priv, u8 address)
{
int ret;
/*
* Register word read
*/
-static int nolock_regw_read(struct enc28j60_net *priv,
- u8 address)
+static int nolock_regw_read(struct enc28j60_net *priv, u8 address)
{
int rl, rh;
return (rh << 8) | rl;
}
-static int locked_regw_read(struct enc28j60_net *priv,
- u8 address)
+static int locked_regw_read(struct enc28j60_net *priv, u8 address)
{
int ret;
/*
* Register byte write
*/
-static void nolock_regb_write(struct enc28j60_net *priv,
- u8 address, u8 data)
+static void nolock_regb_write(struct enc28j60_net *priv, u8 address, u8 data)
{
enc28j60_set_bank(priv, address);
spi_write_op(priv, ENC28J60_WRITE_CTRL_REG, address, data);
}
-static void locked_regb_write(struct enc28j60_net *priv,
- u8 address, u8 data)
+static void locked_regb_write(struct enc28j60_net *priv, u8 address, u8 data)
{
mutex_lock(&priv->lock);
nolock_regb_write(priv, address, data);
/*
* Register word write
*/
-static void nolock_regw_write(struct enc28j60_net *priv,
- u8 address, u16 data)
+static void nolock_regw_write(struct enc28j60_net *priv, u8 address, u16 data)
{
enc28j60_set_bank(priv, address);
spi_write_op(priv, ENC28J60_WRITE_CTRL_REG, address, (u8) data);
(u8) (data >> 8));
}
-static void locked_regw_write(struct enc28j60_net *priv,
- u8 address, u16 data)
+static void locked_regw_write(struct enc28j60_net *priv, u8 address, u16 data)
{
mutex_lock(&priv->lock);
nolock_regw_write(priv, address, data);
/*
* Buffer memory read
- * Select the starting address and execute a SPI buffer read
+ * Select the starting address and execute a SPI buffer read.
*/
-static void enc28j60_mem_read(struct enc28j60_net *priv,
- u16 addr, int len, u8 *data)
+static void enc28j60_mem_read(struct enc28j60_net *priv, u16 addr, int len,
+ u8 *data)
{
mutex_lock(&priv->lock);
nolock_regw_write(priv, ERDPTL, addr);
#ifdef CONFIG_ENC28J60_WRITEVERIFY
if (netif_msg_drv(priv)) {
+ struct device *dev = &priv->spi->dev;
u16 reg;
+
reg = nolock_regw_read(priv, ERDPTL);
if (reg != addr)
- printk(KERN_DEBUG DRV_NAME ": %s() error writing ERDPT "
- "(0x%04x - 0x%04x)\n", __func__, reg, addr);
+ dev_printk(KERN_DEBUG, dev,
+ "%s() error writing ERDPT (0x%04x - 0x%04x)\n",
+ __func__, reg, addr);
}
#endif
spi_read_buf(priv, len, data);
static void
enc28j60_packet_write(struct enc28j60_net *priv, int len, const u8 *data)
{
+ struct device *dev = &priv->spi->dev;
+
mutex_lock(&priv->lock);
/* Set the write pointer to start of transmit buffer area */
nolock_regw_write(priv, EWRPTL, TXSTART_INIT);
u16 reg;
reg = nolock_regw_read(priv, EWRPTL);
if (reg != TXSTART_INIT)
- printk(KERN_DEBUG DRV_NAME
- ": %s() ERWPT:0x%04x != 0x%04x\n",
- __func__, reg, TXSTART_INIT);
+ dev_printk(KERN_DEBUG, dev,
+ "%s() ERWPT:0x%04x != 0x%04x\n",
+ __func__, reg, TXSTART_INIT);
}
#endif
/* Set the TXND pointer to correspond to the packet size given */
/* write per-packet control byte */
spi_write_op(priv, ENC28J60_WRITE_BUF_MEM, 0, 0x00);
if (netif_msg_hw(priv))
- printk(KERN_DEBUG DRV_NAME
- ": %s() after control byte ERWPT:0x%04x\n",
- __func__, nolock_regw_read(priv, EWRPTL));
+ dev_printk(KERN_DEBUG, dev,
+ "%s() after control byte ERWPT:0x%04x\n",
+ __func__, nolock_regw_read(priv, EWRPTL));
/* copy the packet into the transmit buffer */
spi_write_buf(priv, len, data);
if (netif_msg_hw(priv))
- printk(KERN_DEBUG DRV_NAME
- ": %s() after write packet ERWPT:0x%04x, len=%d\n",
- __func__, nolock_regw_read(priv, EWRPTL), len);
+ dev_printk(KERN_DEBUG, dev,
+ "%s() after write packet ERWPT:0x%04x, len=%d\n",
+ __func__, nolock_regw_read(priv, EWRPTL), len);
mutex_unlock(&priv->lock);
}
-static unsigned long msec20_to_jiffies;
-
static int poll_ready(struct enc28j60_net *priv, u8 reg, u8 mask, u8 val)
{
- unsigned long timeout = jiffies + msec20_to_jiffies;
+ struct device *dev = &priv->spi->dev;
+ unsigned long timeout = jiffies + msecs_to_jiffies(20);
/* 20 msec timeout read */
while ((nolock_regb_read(priv, reg) & mask) != val) {
if (time_after(jiffies, timeout)) {
if (netif_msg_drv(priv))
- dev_dbg(&priv->spi->dev,
- "reg %02x ready timeout!\n", reg);
+ dev_dbg(dev, "reg %02x ready timeout!\n", reg);
return -ETIMEDOUT;
}
cpu_relax();
/*
* PHY register read
- * PHY registers are not accessed directly, but through the MII
+ * PHY registers are not accessed directly, but through the MII.
*/
static u16 enc28j60_phy_read(struct enc28j60_net *priv, u8 address)
{
/* quit reading */
nolock_regb_write(priv, MICMD, 0x00);
/* return the data */
- ret = nolock_regw_read(priv, MIRDL);
+ ret = nolock_regw_read(priv, MIRDL);
mutex_unlock(&priv->lock);
return ret;
{
int ret;
struct enc28j60_net *priv = netdev_priv(ndev);
+ struct device *dev = &priv->spi->dev;
mutex_lock(&priv->lock);
if (!priv->hw_enable) {
if (netif_msg_drv(priv))
- printk(KERN_INFO DRV_NAME
- ": %s: Setting MAC address to %pM\n",
- ndev->name, ndev->dev_addr);
+ dev_info(dev, "%s: Setting MAC address to %pM\n",
+ ndev->name, ndev->dev_addr);
/* NOTE: MAC address in ENC28J60 is byte-backward */
nolock_regb_write(priv, MAADR5, ndev->dev_addr[0]);
nolock_regb_write(priv, MAADR4, ndev->dev_addr[1]);
ret = 0;
} else {
if (netif_msg_drv(priv))
- printk(KERN_DEBUG DRV_NAME
- ": %s() Hardware must be disabled to set "
- "Mac address\n", __func__);
+ dev_printk(KERN_DEBUG, dev,
+ "%s() Hardware must be disabled to set Mac address\n",
+ __func__);
ret = -EBUSY;
}
mutex_unlock(&priv->lock);
if (!is_valid_ether_addr(address->sa_data))
return -EADDRNOTAVAIL;
- memcpy(dev->dev_addr, address->sa_data, dev->addr_len);
+ ether_addr_copy(dev->dev_addr, address->sa_data);
return enc28j60_set_hw_macaddr(dev);
}
*/
static void enc28j60_dump_regs(struct enc28j60_net *priv, const char *msg)
{
+ struct device *dev = &priv->spi->dev;
+
mutex_lock(&priv->lock);
- printk(KERN_DEBUG DRV_NAME " %s\n"
- "HwRevID: 0x%02x\n"
- "Cntrl: ECON1 ECON2 ESTAT EIR EIE\n"
- " 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x\n"
- "MAC : MACON1 MACON3 MACON4\n"
- " 0x%02x 0x%02x 0x%02x\n"
- "Rx : ERXST ERXND ERXWRPT ERXRDPT ERXFCON EPKTCNT MAMXFL\n"
- " 0x%04x 0x%04x 0x%04x 0x%04x "
- "0x%02x 0x%02x 0x%04x\n"
- "Tx : ETXST ETXND MACLCON1 MACLCON2 MAPHSUP\n"
- " 0x%04x 0x%04x 0x%02x 0x%02x 0x%02x\n",
- msg, nolock_regb_read(priv, EREVID),
- nolock_regb_read(priv, ECON1), nolock_regb_read(priv, ECON2),
- nolock_regb_read(priv, ESTAT), nolock_regb_read(priv, EIR),
- nolock_regb_read(priv, EIE), nolock_regb_read(priv, MACON1),
- nolock_regb_read(priv, MACON3), nolock_regb_read(priv, MACON4),
- nolock_regw_read(priv, ERXSTL), nolock_regw_read(priv, ERXNDL),
- nolock_regw_read(priv, ERXWRPTL),
- nolock_regw_read(priv, ERXRDPTL),
- nolock_regb_read(priv, ERXFCON),
- nolock_regb_read(priv, EPKTCNT),
- nolock_regw_read(priv, MAMXFLL), nolock_regw_read(priv, ETXSTL),
- nolock_regw_read(priv, ETXNDL),
- nolock_regb_read(priv, MACLCON1),
- nolock_regb_read(priv, MACLCON2),
- nolock_regb_read(priv, MAPHSUP));
+ dev_printk(KERN_DEBUG, dev,
+ " %s\n"
+ "HwRevID: 0x%02x\n"
+ "Cntrl: ECON1 ECON2 ESTAT EIR EIE\n"
+ " 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x\n"
+ "MAC : MACON1 MACON3 MACON4\n"
+ " 0x%02x 0x%02x 0x%02x\n"
+ "Rx : ERXST ERXND ERXWRPT ERXRDPT ERXFCON EPKTCNT MAMXFL\n"
+ " 0x%04x 0x%04x 0x%04x 0x%04x "
+ "0x%02x 0x%02x 0x%04x\n"
+ "Tx : ETXST ETXND MACLCON1 MACLCON2 MAPHSUP\n"
+ " 0x%04x 0x%04x 0x%02x 0x%02x 0x%02x\n",
+ msg, nolock_regb_read(priv, EREVID),
+ nolock_regb_read(priv, ECON1), nolock_regb_read(priv, ECON2),
+ nolock_regb_read(priv, ESTAT), nolock_regb_read(priv, EIR),
+ nolock_regb_read(priv, EIE), nolock_regb_read(priv, MACON1),
+ nolock_regb_read(priv, MACON3), nolock_regb_read(priv, MACON4),
+ nolock_regw_read(priv, ERXSTL), nolock_regw_read(priv, ERXNDL),
+ nolock_regw_read(priv, ERXWRPTL),
+ nolock_regw_read(priv, ERXRDPTL),
+ nolock_regb_read(priv, ERXFCON),
+ nolock_regb_read(priv, EPKTCNT),
+ nolock_regw_read(priv, MAMXFLL), nolock_regw_read(priv, ETXSTL),
+ nolock_regw_read(priv, ETXNDL),
+ nolock_regb_read(priv, MACLCON1),
+ nolock_regb_read(priv, MACLCON2),
+ nolock_regb_read(priv, MAPHSUP));
mutex_unlock(&priv->lock);
}
static void nolock_rxfifo_init(struct enc28j60_net *priv, u16 start, u16 end)
{
+ struct device *dev = &priv->spi->dev;
u16 erxrdpt;
if (start > 0x1FFF || end > 0x1FFF || start > end) {
if (netif_msg_drv(priv))
- printk(KERN_ERR DRV_NAME ": %s(%d, %d) RXFIFO "
- "bad parameters!\n", __func__, start, end);
+ dev_err(dev, "%s(%d, %d) RXFIFO bad parameters!\n",
+ __func__, start, end);
return;
}
/* set receive buffer start + end */
static void nolock_txfifo_init(struct enc28j60_net *priv, u16 start, u16 end)
{
+ struct device *dev = &priv->spi->dev;
+
if (start > 0x1FFF || end > 0x1FFF || start > end) {
if (netif_msg_drv(priv))
- printk(KERN_ERR DRV_NAME ": %s(%d, %d) TXFIFO "
- "bad parameters!\n", __func__, start, end);
+ dev_err(dev, "%s(%d, %d) TXFIFO bad parameters!\n",
+ __func__, start, end);
return;
}
/* set transmit buffer start + end */
/*
* Low power mode shrinks power consumption about 100x, so we'd like
- * the chip to be in that mode whenever it's inactive. (However, we
- * can't stay in lowpower mode during suspend with WOL active.)
+ * the chip to be in that mode whenever it's inactive. (However, we
+ * can't stay in low power mode during suspend with WOL active.)
*/
static void enc28j60_lowpower(struct enc28j60_net *priv, bool is_low)
{
+ struct device *dev = &priv->spi->dev;
+
if (netif_msg_drv(priv))
- dev_dbg(&priv->spi->dev, "%s power...\n",
- is_low ? "low" : "high");
+ dev_dbg(dev, "%s power...\n", is_low ? "low" : "high");
mutex_lock(&priv->lock);
if (is_low) {
static int enc28j60_hw_init(struct enc28j60_net *priv)
{
+ struct device *dev = &priv->spi->dev;
u8 reg;
if (netif_msg_drv(priv))
- printk(KERN_DEBUG DRV_NAME ": %s() - %s\n", __func__,
- priv->full_duplex ? "FullDuplex" : "HalfDuplex");
+ dev_printk(KERN_DEBUG, dev, "%s() - %s\n", __func__,
+ priv->full_duplex ? "FullDuplex" : "HalfDuplex");
mutex_lock(&priv->lock);
/* first reset the chip */
/*
* Check the RevID.
* If it's 0x00 or 0xFF probably the enc28j60 is not mounted or
- * damaged
+ * damaged.
*/
reg = locked_regb_read(priv, EREVID);
if (netif_msg_drv(priv))
- printk(KERN_INFO DRV_NAME ": chip RevID: 0x%02x\n", reg);
+ dev_info(dev, "chip RevID: 0x%02x\n", reg);
if (reg == 0x00 || reg == 0xff) {
if (netif_msg_drv(priv))
- printk(KERN_DEBUG DRV_NAME ": %s() Invalid RevId %d\n",
- __func__, reg);
+ dev_printk(KERN_DEBUG, dev, "%s() Invalid RevId %d\n",
+ __func__, reg);
return 0;
}
/*
* MACLCON1 (default)
* MACLCON2 (default)
- * Set the maximum packet size which the controller will accept
+ * Set the maximum packet size which the controller will accept.
*/
locked_regw_write(priv, MAMXFLL, MAX_FRAMELEN);
static void enc28j60_hw_enable(struct enc28j60_net *priv)
{
+ struct device *dev = &priv->spi->dev;
+
/* enable interrupts */
if (netif_msg_hw(priv))
- printk(KERN_DEBUG DRV_NAME ": %s() enabling interrupts.\n",
- __func__);
+ dev_printk(KERN_DEBUG, dev, "%s() enabling interrupts.\n",
+ __func__);
enc28j60_phy_write(priv, PHIE, PHIE_PGEIE | PHIE_PLNKIE);
static void enc28j60_hw_disable(struct enc28j60_net *priv)
{
mutex_lock(&priv->lock);
- /* disable interrutps and packet reception */
+ /* disable interrupts and packet reception */
nolock_regb_write(priv, EIE, 0x00);
nolock_reg_bfclr(priv, ECON1, ECON1_RXEN);
priv->hw_enable = false;
priv->full_duplex = (duplex == DUPLEX_FULL);
else {
if (netif_msg_link(priv))
- dev_warn(&ndev->dev,
- "unsupported link setting\n");
+ netdev_warn(ndev, "unsupported link setting\n");
ret = -EOPNOTSUPP;
}
} else {
if (netif_msg_link(priv))
- dev_warn(&ndev->dev, "Warning: hw must be disabled "
- "to set link mode\n");
+ netdev_warn(ndev, "Warning: hw must be disabled to set link mode\n");
ret = -EBUSY;
}
return ret;
*/
static void enc28j60_read_tsv(struct enc28j60_net *priv, u8 tsv[TSV_SIZE])
{
+ struct device *dev = &priv->spi->dev;
int endptr;
endptr = locked_regw_read(priv, ETXNDL);
if (netif_msg_hw(priv))
- printk(KERN_DEBUG DRV_NAME ": reading TSV at addr:0x%04x\n",
- endptr + 1);
+ dev_printk(KERN_DEBUG, dev, "reading TSV at addr:0x%04x\n",
+ endptr + 1);
enc28j60_mem_read(priv, endptr + 1, TSV_SIZE, tsv);
}
static void enc28j60_dump_tsv(struct enc28j60_net *priv, const char *msg,
- u8 tsv[TSV_SIZE])
+ u8 tsv[TSV_SIZE])
{
+ struct device *dev = &priv->spi->dev;
u16 tmp1, tmp2;
- printk(KERN_DEBUG DRV_NAME ": %s - TSV:\n", msg);
+ dev_printk(KERN_DEBUG, dev, "%s - TSV:\n", msg);
tmp1 = tsv[1];
tmp1 <<= 8;
tmp1 |= tsv[0];
tmp2 <<= 8;
tmp2 |= tsv[4];
- printk(KERN_DEBUG DRV_NAME ": ByteCount: %d, CollisionCount: %d,"
- " TotByteOnWire: %d\n", tmp1, tsv[2] & 0x0f, tmp2);
- printk(KERN_DEBUG DRV_NAME ": TxDone: %d, CRCErr:%d, LenChkErr: %d,"
- " LenOutOfRange: %d\n", TSV_GETBIT(tsv, TSV_TXDONE),
- TSV_GETBIT(tsv, TSV_TXCRCERROR),
- TSV_GETBIT(tsv, TSV_TXLENCHKERROR),
- TSV_GETBIT(tsv, TSV_TXLENOUTOFRANGE));
- printk(KERN_DEBUG DRV_NAME ": Multicast: %d, Broadcast: %d, "
- "PacketDefer: %d, ExDefer: %d\n",
- TSV_GETBIT(tsv, TSV_TXMULTICAST),
- TSV_GETBIT(tsv, TSV_TXBROADCAST),
- TSV_GETBIT(tsv, TSV_TXPACKETDEFER),
- TSV_GETBIT(tsv, TSV_TXEXDEFER));
- printk(KERN_DEBUG DRV_NAME ": ExCollision: %d, LateCollision: %d, "
- "Giant: %d, Underrun: %d\n",
- TSV_GETBIT(tsv, TSV_TXEXCOLLISION),
- TSV_GETBIT(tsv, TSV_TXLATECOLLISION),
- TSV_GETBIT(tsv, TSV_TXGIANT), TSV_GETBIT(tsv, TSV_TXUNDERRUN));
- printk(KERN_DEBUG DRV_NAME ": ControlFrame: %d, PauseFrame: %d, "
- "BackPressApp: %d, VLanTagFrame: %d\n",
- TSV_GETBIT(tsv, TSV_TXCONTROLFRAME),
- TSV_GETBIT(tsv, TSV_TXPAUSEFRAME),
- TSV_GETBIT(tsv, TSV_BACKPRESSUREAPP),
- TSV_GETBIT(tsv, TSV_TXVLANTAGFRAME));
+ dev_printk(KERN_DEBUG, dev,
+ "ByteCount: %d, CollisionCount: %d, TotByteOnWire: %d\n",
+ tmp1, tsv[2] & 0x0f, tmp2);
+ dev_printk(KERN_DEBUG, dev,
+ "TxDone: %d, CRCErr:%d, LenChkErr: %d, LenOutOfRange: %d\n",
+ TSV_GETBIT(tsv, TSV_TXDONE),
+ TSV_GETBIT(tsv, TSV_TXCRCERROR),
+ TSV_GETBIT(tsv, TSV_TXLENCHKERROR),
+ TSV_GETBIT(tsv, TSV_TXLENOUTOFRANGE));
+ dev_printk(KERN_DEBUG, dev,
+ "Multicast: %d, Broadcast: %d, PacketDefer: %d, ExDefer: %d\n",
+ TSV_GETBIT(tsv, TSV_TXMULTICAST),
+ TSV_GETBIT(tsv, TSV_TXBROADCAST),
+ TSV_GETBIT(tsv, TSV_TXPACKETDEFER),
+ TSV_GETBIT(tsv, TSV_TXEXDEFER));
+ dev_printk(KERN_DEBUG, dev,
+ "ExCollision: %d, LateCollision: %d, Giant: %d, Underrun: %d\n",
+ TSV_GETBIT(tsv, TSV_TXEXCOLLISION),
+ TSV_GETBIT(tsv, TSV_TXLATECOLLISION),
+ TSV_GETBIT(tsv, TSV_TXGIANT), TSV_GETBIT(tsv, TSV_TXUNDERRUN));
+ dev_printk(KERN_DEBUG, dev,
+ "ControlFrame: %d, PauseFrame: %d, BackPressApp: %d, VLanTagFrame: %d\n",
+ TSV_GETBIT(tsv, TSV_TXCONTROLFRAME),
+ TSV_GETBIT(tsv, TSV_TXPAUSEFRAME),
+ TSV_GETBIT(tsv, TSV_BACKPRESSUREAPP),
+ TSV_GETBIT(tsv, TSV_TXVLANTAGFRAME));
}
/*
static void enc28j60_dump_rsv(struct enc28j60_net *priv, const char *msg,
u16 pk_ptr, int len, u16 sts)
{
- printk(KERN_DEBUG DRV_NAME ": %s - NextPk: 0x%04x - RSV:\n",
- msg, pk_ptr);
- printk(KERN_DEBUG DRV_NAME ": ByteCount: %d, DribbleNibble: %d\n", len,
- RSV_GETBIT(sts, RSV_DRIBBLENIBBLE));
- printk(KERN_DEBUG DRV_NAME ": RxOK: %d, CRCErr:%d, LenChkErr: %d,"
- " LenOutOfRange: %d\n", RSV_GETBIT(sts, RSV_RXOK),
- RSV_GETBIT(sts, RSV_CRCERROR),
- RSV_GETBIT(sts, RSV_LENCHECKERR),
- RSV_GETBIT(sts, RSV_LENOUTOFRANGE));
- printk(KERN_DEBUG DRV_NAME ": Multicast: %d, Broadcast: %d, "
- "LongDropEvent: %d, CarrierEvent: %d\n",
- RSV_GETBIT(sts, RSV_RXMULTICAST),
- RSV_GETBIT(sts, RSV_RXBROADCAST),
- RSV_GETBIT(sts, RSV_RXLONGEVDROPEV),
- RSV_GETBIT(sts, RSV_CARRIEREV));
- printk(KERN_DEBUG DRV_NAME ": ControlFrame: %d, PauseFrame: %d,"
- " UnknownOp: %d, VLanTagFrame: %d\n",
- RSV_GETBIT(sts, RSV_RXCONTROLFRAME),
- RSV_GETBIT(sts, RSV_RXPAUSEFRAME),
- RSV_GETBIT(sts, RSV_RXUNKNOWNOPCODE),
- RSV_GETBIT(sts, RSV_RXTYPEVLAN));
+ struct device *dev = &priv->spi->dev;
+
+ dev_printk(KERN_DEBUG, dev, "%s - NextPk: 0x%04x - RSV:\n", msg, pk_ptr);
+ dev_printk(KERN_DEBUG, dev, "ByteCount: %d, DribbleNibble: %d\n",
+ len, RSV_GETBIT(sts, RSV_DRIBBLENIBBLE));
+ dev_printk(KERN_DEBUG, dev,
+ "RxOK: %d, CRCErr:%d, LenChkErr: %d, LenOutOfRange: %d\n",
+ RSV_GETBIT(sts, RSV_RXOK),
+ RSV_GETBIT(sts, RSV_CRCERROR),
+ RSV_GETBIT(sts, RSV_LENCHECKERR),
+ RSV_GETBIT(sts, RSV_LENOUTOFRANGE));
+ dev_printk(KERN_DEBUG, dev,
+ "Multicast: %d, Broadcast: %d, LongDropEvent: %d, CarrierEvent: %d\n",
+ RSV_GETBIT(sts, RSV_RXMULTICAST),
+ RSV_GETBIT(sts, RSV_RXBROADCAST),
+ RSV_GETBIT(sts, RSV_RXLONGEVDROPEV),
+ RSV_GETBIT(sts, RSV_CARRIEREV));
+ dev_printk(KERN_DEBUG, dev,
+ "ControlFrame: %d, PauseFrame: %d, UnknownOp: %d, VLanTagFrame: %d\n",
+ RSV_GETBIT(sts, RSV_RXCONTROLFRAME),
+ RSV_GETBIT(sts, RSV_RXPAUSEFRAME),
+ RSV_GETBIT(sts, RSV_RXUNKNOWNOPCODE),
+ RSV_GETBIT(sts, RSV_RXTYPEVLAN));
}
static void dump_packet(const char *msg, int len, const char *data)
static void enc28j60_hw_rx(struct net_device *ndev)
{
struct enc28j60_net *priv = netdev_priv(ndev);
+ struct device *dev = &priv->spi->dev;
struct sk_buff *skb = NULL;
u16 erxrdpt, next_packet, rxstat;
u8 rsv[RSV_SIZE];
int len;
if (netif_msg_rx_status(priv))
- printk(KERN_DEBUG DRV_NAME ": RX pk_addr:0x%04x\n",
- priv->next_pk_ptr);
+ netdev_printk(KERN_DEBUG, ndev, "RX pk_addr:0x%04x\n",
+ priv->next_pk_ptr);
if (unlikely(priv->next_pk_ptr > RXEND_INIT)) {
if (netif_msg_rx_err(priv))
- dev_err(&ndev->dev,
- "%s() Invalid packet address!! 0x%04x\n",
- __func__, priv->next_pk_ptr);
+ netdev_err(ndev, "%s() Invalid packet address!! 0x%04x\n",
+ __func__, priv->next_pk_ptr);
/* packet address corrupted: reset RX logic */
mutex_lock(&priv->lock);
nolock_reg_bfclr(priv, ECON1, ECON1_RXEN);
if (!RSV_GETBIT(rxstat, RSV_RXOK) || len > MAX_FRAMELEN) {
if (netif_msg_rx_err(priv))
- dev_err(&ndev->dev, "Rx Error (%04x)\n", rxstat);
+ netdev_err(ndev, "Rx Error (%04x)\n", rxstat);
ndev->stats.rx_errors++;
if (RSV_GETBIT(rxstat, RSV_CRCERROR))
ndev->stats.rx_crc_errors++;
skb = netdev_alloc_skb(ndev, len + NET_IP_ALIGN);
if (!skb) {
if (netif_msg_rx_err(priv))
- dev_err(&ndev->dev,
- "out of memory for Rx'd frame\n");
+ netdev_err(ndev, "out of memory for Rx'd frame\n");
ndev->stats.rx_dropped++;
} else {
skb_reserve(skb, NET_IP_ALIGN);
/*
* Move the RX read pointer to the start of the next
* received packet.
- * This frees the memory we just read out
+ * This frees the memory we just read out.
*/
erxrdpt = erxrdpt_workaround(next_packet, RXSTART_INIT, RXEND_INIT);
if (netif_msg_hw(priv))
- printk(KERN_DEBUG DRV_NAME ": %s() ERXRDPT:0x%04x\n",
- __func__, erxrdpt);
+ dev_printk(KERN_DEBUG, dev, "%s() ERXRDPT:0x%04x\n",
+ __func__, erxrdpt);
mutex_lock(&priv->lock);
nolock_regw_write(priv, ERXRDPTL, erxrdpt);
u16 reg;
reg = nolock_regw_read(priv, ERXRDPTL);
if (reg != erxrdpt)
- printk(KERN_DEBUG DRV_NAME ": %s() ERXRDPT verify "
- "error (0x%04x - 0x%04x)\n", __func__,
- reg, erxrdpt);
+ dev_printk(KERN_DEBUG, dev,
+ "%s() ERXRDPT verify error (0x%04x - 0x%04x)\n",
+ __func__, reg, erxrdpt);
}
#endif
priv->next_pk_ptr = next_packet;
*/
static int enc28j60_get_free_rxfifo(struct enc28j60_net *priv)
{
+ struct net_device *ndev = priv->netdev;
int epkcnt, erxst, erxnd, erxwr, erxrd;
int free_space;
}
mutex_unlock(&priv->lock);
if (netif_msg_rx_status(priv))
- printk(KERN_DEBUG DRV_NAME ": %s() free_space = %d\n",
- __func__, free_space);
+ netdev_printk(KERN_DEBUG, ndev, "%s() free_space = %d\n",
+ __func__, free_space);
return free_space;
}
static void enc28j60_check_link_status(struct net_device *ndev)
{
struct enc28j60_net *priv = netdev_priv(ndev);
+ struct device *dev = &priv->spi->dev;
u16 reg;
int duplex;
reg = enc28j60_phy_read(priv, PHSTAT2);
if (netif_msg_hw(priv))
- printk(KERN_DEBUG DRV_NAME ": %s() PHSTAT1: %04x, "
- "PHSTAT2: %04x\n", __func__,
- enc28j60_phy_read(priv, PHSTAT1), reg);
+ dev_printk(KERN_DEBUG, dev,
+ "%s() PHSTAT1: %04x, PHSTAT2: %04x\n", __func__,
+ enc28j60_phy_read(priv, PHSTAT1), reg);
duplex = reg & PHSTAT2_DPXSTAT;
if (reg & PHSTAT2_LSTAT) {
netif_carrier_on(ndev);
if (netif_msg_ifup(priv))
- dev_info(&ndev->dev, "link up - %s\n",
- duplex ? "Full duplex" : "Half duplex");
+ netdev_info(ndev, "link up - %s\n",
+ duplex ? "Full duplex" : "Half duplex");
} else {
if (netif_msg_ifdown(priv))
- dev_info(&ndev->dev, "link down\n");
+ netdev_info(ndev, "link down\n");
netif_carrier_off(ndev);
}
}
/*
* RX handler
- * ignore PKTIF because is unreliable! (look at the errata datasheet)
- * check EPKTCNT is the suggested workaround.
+ * Ignore PKTIF because is unreliable! (Look at the errata datasheet)
+ * Check EPKTCNT is the suggested workaround.
* We don't need to clear interrupt flag, automatically done when
* enc28j60_hw_rx() decrements the packet counter.
* Returns how many packet processed.
pk_counter = locked_regb_read(priv, EPKTCNT);
if (pk_counter && netif_msg_intr(priv))
- printk(KERN_DEBUG DRV_NAME ": intRX, pk_cnt: %d\n", pk_counter);
+ netdev_printk(KERN_DEBUG, ndev, "intRX, pk_cnt: %d\n",
+ pk_counter);
if (pk_counter > priv->max_pk_counter) {
/* update statistics */
priv->max_pk_counter = pk_counter;
if (netif_msg_rx_status(priv) && priv->max_pk_counter > 1)
- printk(KERN_DEBUG DRV_NAME ": RX max_pk_cnt: %d\n",
- priv->max_pk_counter);
+ netdev_printk(KERN_DEBUG, ndev, "RX max_pk_cnt: %d\n",
+ priv->max_pk_counter);
}
ret = pk_counter;
while (pk_counter-- > 0)
struct net_device *ndev = priv->netdev;
int intflags, loop;
- if (netif_msg_intr(priv))
- printk(KERN_DEBUG DRV_NAME ": %s() enter\n", __func__);
/* disable further interrupts */
locked_reg_bfclr(priv, EIE, EIE_INTIE);
if ((intflags & EIR_DMAIF) != 0) {
loop++;
if (netif_msg_intr(priv))
- printk(KERN_DEBUG DRV_NAME
- ": intDMA(%d)\n", loop);
+ netdev_printk(KERN_DEBUG, ndev, "intDMA(%d)\n",
+ loop);
locked_reg_bfclr(priv, EIR, EIR_DMAIF);
}
/* LINK changed handler */
if ((intflags & EIR_LINKIF) != 0) {
loop++;
if (netif_msg_intr(priv))
- printk(KERN_DEBUG DRV_NAME
- ": intLINK(%d)\n", loop);
+ netdev_printk(KERN_DEBUG, ndev, "intLINK(%d)\n",
+ loop);
enc28j60_check_link_status(ndev);
/* read PHIR to clear the flag */
enc28j60_phy_read(priv, PHIR);
bool err = false;
loop++;
if (netif_msg_intr(priv))
- printk(KERN_DEBUG DRV_NAME
- ": intTX(%d)\n", loop);
+ netdev_printk(KERN_DEBUG, ndev, "intTX(%d)\n",
+ loop);
priv->tx_retry_count = 0;
if (locked_regb_read(priv, ESTAT) & ESTAT_TXABRT) {
if (netif_msg_tx_err(priv))
- dev_err(&ndev->dev,
- "Tx Error (aborted)\n");
+ netdev_err(ndev, "Tx Error (aborted)\n");
err = true;
}
if (netif_msg_tx_done(priv)) {
loop++;
if (netif_msg_intr(priv))
- printk(KERN_DEBUG DRV_NAME
- ": intTXErr(%d)\n", loop);
+ netdev_printk(KERN_DEBUG, ndev, "intTXErr(%d)\n",
+ loop);
locked_reg_bfclr(priv, ECON1, ECON1_TXRTS);
enc28j60_read_tsv(priv, tsv);
if (netif_msg_tx_err(priv))
/* Transmit Late collision check for retransmit */
if (TSV_GETBIT(tsv, TSV_TXLATECOLLISION)) {
if (netif_msg_tx_err(priv))
- printk(KERN_DEBUG DRV_NAME
- ": LateCollision TXErr (%d)\n",
- priv->tx_retry_count);
+ netdev_printk(KERN_DEBUG, ndev,
+ "LateCollision TXErr (%d)\n",
+ priv->tx_retry_count);
if (priv->tx_retry_count++ < MAX_TX_RETRYCOUNT)
locked_reg_bfset(priv, ECON1,
ECON1_TXRTS);
if ((intflags & EIR_RXERIF) != 0) {
loop++;
if (netif_msg_intr(priv))
- printk(KERN_DEBUG DRV_NAME
- ": intRXErr(%d)\n", loop);
+ netdev_printk(KERN_DEBUG, ndev, "intRXErr(%d)\n",
+ loop);
/* Check free FIFO space to flag RX overrun */
if (enc28j60_get_free_rxfifo(priv) <= 0) {
if (netif_msg_rx_err(priv))
- printk(KERN_DEBUG DRV_NAME
- ": RX Overrun\n");
+ netdev_printk(KERN_DEBUG, ndev, "RX Overrun\n");
ndev->stats.rx_dropped++;
}
locked_reg_bfclr(priv, EIR, EIR_RXERIF);
/* re-enable interrupts */
locked_reg_bfset(priv, EIE, EIE_INTIE);
- if (netif_msg_intr(priv))
- printk(KERN_DEBUG DRV_NAME ": %s() exit\n", __func__);
}
/*
*/
static void enc28j60_hw_tx(struct enc28j60_net *priv)
{
+ struct net_device *ndev = priv->netdev;
+
BUG_ON(!priv->tx_skb);
if (netif_msg_tx_queued(priv))
- printk(KERN_DEBUG DRV_NAME
- ": Tx Packet Len:%d\n", priv->tx_skb->len);
+ netdev_printk(KERN_DEBUG, ndev, "Tx Packet Len:%d\n",
+ priv->tx_skb->len);
if (netif_msg_pktdata(priv))
dump_packet(__func__,
#ifdef CONFIG_ENC28J60_WRITEVERIFY
/* readback and verify written data */
if (netif_msg_drv(priv)) {
+ struct device *dev = &priv->spi->dev;
int test_len, k;
u8 test_buf[64]; /* limit the test to the first 64 bytes */
int okflag;
okflag = 1;
for (k = 0; k < test_len; k++) {
if (priv->tx_skb->data[k] != test_buf[k]) {
- printk(KERN_DEBUG DRV_NAME
- ": Error, %d location differ: "
- "0x%02x-0x%02x\n", k,
- priv->tx_skb->data[k], test_buf[k]);
+ dev_printk(KERN_DEBUG, dev,
+ "Error, %d location differ: 0x%02x-0x%02x\n",
+ k, priv->tx_skb->data[k], test_buf[k]);
okflag = 0;
}
}
if (!okflag)
- printk(KERN_DEBUG DRV_NAME ": Tx write buffer, "
- "verify ERROR!\n");
+ dev_printk(KERN_DEBUG, dev, "Tx write buffer, verify ERROR!\n");
}
#endif
/* set TX request flag */
{
struct enc28j60_net *priv = netdev_priv(dev);
- if (netif_msg_tx_queued(priv))
- printk(KERN_DEBUG DRV_NAME ": %s() enter\n", __func__);
-
/* If some error occurs while trying to transmit this
* packet, you should return '1' from this function.
* In such a case you _may not_ do anything to the
* SKB, it is still owned by the network queueing
- * layer when an error is returned. This means you
+ * layer when an error is returned. This means you
* may not modify any SKB fields, you may not free
* the SKB, etc.
*/
struct enc28j60_net *priv = netdev_priv(ndev);
if (netif_msg_timer(priv))
- dev_err(&ndev->dev, DRV_NAME " tx timeout\n");
+ netdev_err(ndev, "tx timeout\n");
ndev->stats.tx_errors++;
/* can't restart safely under softirq */
{
struct enc28j60_net *priv = netdev_priv(dev);
- if (netif_msg_drv(priv))
- printk(KERN_DEBUG DRV_NAME ": %s() enter\n", __func__);
-
if (!is_valid_ether_addr(dev->dev_addr)) {
if (netif_msg_ifup(priv))
- dev_err(&dev->dev, "invalid MAC address %pM\n",
- dev->dev_addr);
+ netdev_err(dev, "invalid MAC address %pM\n", dev->dev_addr);
return -EADDRNOTAVAIL;
}
/* Reset the hardware here (and take it out of low power mode) */
enc28j60_hw_disable(priv);
if (!enc28j60_hw_init(priv)) {
if (netif_msg_ifup(priv))
- dev_err(&dev->dev, "hw_reset() failed\n");
+ netdev_err(dev, "hw_reset() failed\n");
return -EINVAL;
}
/* Update the MAC address (in case user has changed it) */
{
struct enc28j60_net *priv = netdev_priv(dev);
- if (netif_msg_drv(priv))
- printk(KERN_DEBUG DRV_NAME ": %s() enter\n", __func__);
-
enc28j60_hw_disable(priv);
enc28j60_lowpower(priv, true);
netif_stop_queue(dev);
if (dev->flags & IFF_PROMISC) {
if (netif_msg_link(priv))
- dev_info(&dev->dev, "promiscuous mode\n");
+ netdev_info(dev, "promiscuous mode\n");
priv->rxfilter = RXFILTER_PROMISC;
} else if ((dev->flags & IFF_ALLMULTI) || !netdev_mc_empty(dev)) {
if (netif_msg_link(priv))
- dev_info(&dev->dev, "%smulticast mode\n",
- (dev->flags & IFF_ALLMULTI) ? "all-" : "");
+ netdev_info(dev, "%smulticast mode\n",
+ (dev->flags & IFF_ALLMULTI) ? "all-" : "");
priv->rxfilter = RXFILTER_MULTI;
} else {
if (netif_msg_link(priv))
- dev_info(&dev->dev, "normal mode\n");
+ netdev_info(dev, "normal mode\n");
priv->rxfilter = RXFILTER_NORMAL;
}
{
struct enc28j60_net *priv =
container_of(work, struct enc28j60_net, setrx_work);
+ struct device *dev = &priv->spi->dev;
if (priv->rxfilter == RXFILTER_PROMISC) {
if (netif_msg_drv(priv))
- printk(KERN_DEBUG DRV_NAME ": promiscuous mode\n");
+ dev_printk(KERN_DEBUG, dev, "promiscuous mode\n");
locked_regb_write(priv, ERXFCON, 0x00);
} else if (priv->rxfilter == RXFILTER_MULTI) {
if (netif_msg_drv(priv))
- printk(KERN_DEBUG DRV_NAME ": multicast mode\n");
+ dev_printk(KERN_DEBUG, dev, "multicast mode\n");
locked_regb_write(priv, ERXFCON,
ERXFCON_UCEN | ERXFCON_CRCEN |
ERXFCON_BCEN | ERXFCON_MCEN);
} else {
if (netif_msg_drv(priv))
- printk(KERN_DEBUG DRV_NAME ": normal mode\n");
+ dev_printk(KERN_DEBUG, dev, "normal mode\n");
locked_regb_write(priv, ERXFCON,
ERXFCON_UCEN | ERXFCON_CRCEN |
ERXFCON_BCEN);
enc28j60_net_close(ndev);
ret = enc28j60_net_open(ndev);
if (unlikely(ret)) {
- dev_info(&ndev->dev, " could not restart %d\n", ret);
+ netdev_info(ndev, "could not restart %d\n", ret);
dev_close(ndev);
}
}
static int enc28j60_probe(struct spi_device *spi)
{
+ unsigned char macaddr[ETH_ALEN];
struct net_device *dev;
struct enc28j60_net *priv;
- const void *macaddr;
int ret = 0;
if (netif_msg_drv(&debug))
- dev_info(&spi->dev, DRV_NAME " Ethernet driver %s loaded\n",
- DRV_VERSION);
+ dev_info(&spi->dev, "Ethernet driver %s loaded\n", DRV_VERSION);
dev = alloc_etherdev(sizeof(struct enc28j60_net));
if (!dev) {
priv->netdev = dev; /* priv to netdev reference */
priv->spi = spi; /* priv to spi reference */
- priv->msg_enable = netif_msg_init(debug.msg_enable,
- ENC28J60_MSG_DEFAULT);
+ priv->msg_enable = netif_msg_init(debug.msg_enable, ENC28J60_MSG_DEFAULT);
mutex_init(&priv->lock);
INIT_WORK(&priv->tx_work, enc28j60_tx_work_handler);
INIT_WORK(&priv->setrx_work, enc28j60_setrx_work_handler);
if (!enc28j60_chipset_init(dev)) {
if (netif_msg_probe(priv))
- dev_info(&spi->dev, DRV_NAME " chip not found\n");
+ dev_info(&spi->dev, "chip not found\n");
ret = -EIO;
goto error_irq;
}
- macaddr = of_get_mac_address(spi->dev.of_node);
- if (macaddr)
+ if (device_get_mac_address(&spi->dev, macaddr, sizeof(macaddr)))
ether_addr_copy(dev->dev_addr, macaddr);
else
eth_hw_addr_random(dev);
ret = request_irq(spi->irq, enc28j60_irq, 0, DRV_NAME, priv);
if (ret < 0) {
if (netif_msg_probe(priv))
- dev_err(&spi->dev, DRV_NAME ": request irq %d failed "
- "(ret = %d)\n", spi->irq, ret);
+ dev_err(&spi->dev, "request irq %d failed (ret = %d)\n",
+ spi->irq, ret);
goto error_irq;
}
ret = register_netdev(dev);
if (ret) {
if (netif_msg_probe(priv))
- dev_err(&spi->dev, "register netdev " DRV_NAME
- " failed (ret = %d)\n", ret);
+ dev_err(&spi->dev, "register netdev failed (ret = %d)\n",
+ ret);
goto error_register;
}
- dev_info(&dev->dev, DRV_NAME " driver registered\n");
return 0;
{
struct enc28j60_net *priv = spi_get_drvdata(spi);
- if (netif_msg_drv(priv))
- printk(KERN_DEBUG DRV_NAME ": remove\n");
-
unregister_netdev(priv->netdev);
free_irq(spi->irq, priv);
free_netdev(priv->netdev);
.probe = enc28j60_probe,
.remove = enc28j60_remove,
};
-
-static int __init enc28j60_init(void)
-{
- msec20_to_jiffies = msecs_to_jiffies(20);
-
- return spi_register_driver(&enc28j60_driver);
-}
-
-module_init(enc28j60_init);
-
-static void __exit enc28j60_exit(void)
-{
- spi_unregister_driver(&enc28j60_driver);
-}
-
-module_exit(enc28j60_exit);
+module_spi_driver(enc28j60_driver);
MODULE_DESCRIPTION(DRV_NAME " ethernet driver");
MODULE_AUTHOR("Claudio Lanconelli <lanconelli.claudio@eptar.com>");
tristate "Netronome(R) NFP4000/NFP6000 NIC driver"
depends on PCI && PCI_MSI
depends on VXLAN || VXLAN=n
+ select NET_DEVLINK
---help---
This driver supports the Netronome(R) NFP4000/NFP6000 based
cards working as a advanced Ethernet NIC. It works with both
nfpcore/nfp_resource.o \
nfpcore/nfp_rtsym.o \
nfpcore/nfp_target.o \
+ ccm.o \
nfp_asm.o \
nfp_app.o \
nfp_app_nic.o \
int nfp_abm_ctrl_prio_map_update(struct nfp_abm_link *alink, u32 *packed)
{
+ const u32 cmd = NFP_NET_CFG_MBOX_CMD_PCI_DSCP_PRIOMAP_SET;
struct nfp_net *nn = alink->vnic;
unsigned int i;
int err;
+ err = nfp_net_mbox_lock(nn, alink->abm->prio_map_len);
+ if (err)
+ return err;
+
/* Write data_len and wipe reserved */
nn_writeq(nn, nn->tlv_caps.mbox_off + NFP_NET_ABM_MBOX_DATALEN,
alink->abm->prio_map_len);
nn_writel(nn, nn->tlv_caps.mbox_off + NFP_NET_ABM_MBOX_DATA + i,
packed[i / sizeof(u32)]);
- err = nfp_net_reconfig_mbox(nn,
- NFP_NET_CFG_MBOX_CMD_PCI_DSCP_PRIOMAP_SET);
+ err = nfp_net_mbox_reconfig_and_unlock(nn, cmd);
if (err)
nfp_err(alink->abm->app->cpp,
"setting DSCP -> VQ map failed with error %d\n", err);
}
}
-static struct net_device *nfp_abm_repr_get(struct nfp_app *app, u32 port_id)
+static struct net_device *
+nfp_abm_repr_get(struct nfp_app *app, u32 port_id, bool *redir_egress)
{
enum nfp_repr_type rtype;
struct nfp_reprs *reprs;
.eswitch_mode_get = nfp_abm_eswitch_mode_get,
.eswitch_mode_set = nfp_abm_eswitch_mode_set,
- .repr_get = nfp_abm_repr_get,
+ .dev_get = nfp_abm_repr_get,
};
#include <linux/bug.h>
#include <linux/jiffies.h>
#include <linux/skbuff.h>
-#include <linux/wait.h>
+#include "../ccm.h"
#include "../nfp_app.h"
#include "../nfp_net.h"
#include "fw.h"
#include "main.h"
-#define NFP_BPF_TAG_ALLOC_SPAN (U16_MAX / 4)
-
-static bool nfp_bpf_all_tags_busy(struct nfp_app_bpf *bpf)
-{
- u16 used_tags;
-
- used_tags = bpf->tag_alloc_next - bpf->tag_alloc_last;
-
- return used_tags > NFP_BPF_TAG_ALLOC_SPAN;
-}
-
-static int nfp_bpf_alloc_tag(struct nfp_app_bpf *bpf)
-{
- /* All FW communication for BPF is request-reply. To make sure we
- * don't reuse the message ID too early after timeout - limit the
- * number of requests in flight.
- */
- if (nfp_bpf_all_tags_busy(bpf)) {
- cmsg_warn(bpf, "all FW request contexts busy!\n");
- return -EAGAIN;
- }
-
- WARN_ON(__test_and_set_bit(bpf->tag_alloc_next, bpf->tag_allocator));
- return bpf->tag_alloc_next++;
-}
-
-static void nfp_bpf_free_tag(struct nfp_app_bpf *bpf, u16 tag)
-{
- WARN_ON(!__test_and_clear_bit(tag, bpf->tag_allocator));
-
- while (!test_bit(bpf->tag_alloc_last, bpf->tag_allocator) &&
- bpf->tag_alloc_last != bpf->tag_alloc_next)
- bpf->tag_alloc_last++;
-}
-
static struct sk_buff *
nfp_bpf_cmsg_alloc(struct nfp_app_bpf *bpf, unsigned int size)
{
return size;
}
-static u8 nfp_bpf_cmsg_get_type(struct sk_buff *skb)
-{
- struct cmsg_hdr *hdr;
-
- hdr = (struct cmsg_hdr *)skb->data;
-
- return hdr->type;
-}
-
-static unsigned int nfp_bpf_cmsg_get_tag(struct sk_buff *skb)
-{
- struct cmsg_hdr *hdr;
-
- hdr = (struct cmsg_hdr *)skb->data;
-
- return be16_to_cpu(hdr->tag);
-}
-
-static struct sk_buff *__nfp_bpf_reply(struct nfp_app_bpf *bpf, u16 tag)
-{
- unsigned int msg_tag;
- struct sk_buff *skb;
-
- skb_queue_walk(&bpf->cmsg_replies, skb) {
- msg_tag = nfp_bpf_cmsg_get_tag(skb);
- if (msg_tag == tag) {
- nfp_bpf_free_tag(bpf, tag);
- __skb_unlink(skb, &bpf->cmsg_replies);
- return skb;
- }
- }
-
- return NULL;
-}
-
-static struct sk_buff *nfp_bpf_reply(struct nfp_app_bpf *bpf, u16 tag)
-{
- struct sk_buff *skb;
-
- nfp_ctrl_lock(bpf->app->ctrl);
- skb = __nfp_bpf_reply(bpf, tag);
- nfp_ctrl_unlock(bpf->app->ctrl);
-
- return skb;
-}
-
-static struct sk_buff *nfp_bpf_reply_drop_tag(struct nfp_app_bpf *bpf, u16 tag)
-{
- struct sk_buff *skb;
-
- nfp_ctrl_lock(bpf->app->ctrl);
- skb = __nfp_bpf_reply(bpf, tag);
- if (!skb)
- nfp_bpf_free_tag(bpf, tag);
- nfp_ctrl_unlock(bpf->app->ctrl);
-
- return skb;
-}
-
-static struct sk_buff *
-nfp_bpf_cmsg_wait_reply(struct nfp_app_bpf *bpf, enum nfp_bpf_cmsg_type type,
- int tag)
-{
- struct sk_buff *skb;
- int i, err;
-
- for (i = 0; i < 50; i++) {
- udelay(4);
- skb = nfp_bpf_reply(bpf, tag);
- if (skb)
- return skb;
- }
-
- err = wait_event_interruptible_timeout(bpf->cmsg_wq,
- skb = nfp_bpf_reply(bpf, tag),
- msecs_to_jiffies(5000));
- /* We didn't get a response - try last time and atomically drop
- * the tag even if no response is matched.
- */
- if (!skb)
- skb = nfp_bpf_reply_drop_tag(bpf, tag);
- if (err < 0) {
- cmsg_warn(bpf, "%s waiting for response to 0x%02x: %d\n",
- err == ERESTARTSYS ? "interrupted" : "error",
- type, err);
- return ERR_PTR(err);
- }
- if (!skb) {
- cmsg_warn(bpf, "timeout waiting for response to 0x%02x\n",
- type);
- return ERR_PTR(-ETIMEDOUT);
- }
-
- return skb;
-}
-
-static struct sk_buff *
-nfp_bpf_cmsg_communicate(struct nfp_app_bpf *bpf, struct sk_buff *skb,
- enum nfp_bpf_cmsg_type type, unsigned int reply_size)
-{
- struct cmsg_hdr *hdr;
- int tag;
-
- nfp_ctrl_lock(bpf->app->ctrl);
- tag = nfp_bpf_alloc_tag(bpf);
- if (tag < 0) {
- nfp_ctrl_unlock(bpf->app->ctrl);
- dev_kfree_skb_any(skb);
- return ERR_PTR(tag);
- }
-
- hdr = (void *)skb->data;
- hdr->ver = CMSG_MAP_ABI_VERSION;
- hdr->type = type;
- hdr->tag = cpu_to_be16(tag);
-
- __nfp_app_ctrl_tx(bpf->app, skb);
-
- nfp_ctrl_unlock(bpf->app->ctrl);
-
- skb = nfp_bpf_cmsg_wait_reply(bpf, type, tag);
- if (IS_ERR(skb))
- return skb;
-
- hdr = (struct cmsg_hdr *)skb->data;
- if (hdr->type != __CMSG_REPLY(type)) {
- cmsg_warn(bpf, "cmsg drop - wrong type 0x%02x != 0x%02lx!\n",
- hdr->type, __CMSG_REPLY(type));
- goto err_free;
- }
- /* 0 reply_size means caller will do the validation */
- if (reply_size && skb->len != reply_size) {
- cmsg_warn(bpf, "cmsg drop - type 0x%02x wrong size %d != %d!\n",
- type, skb->len, reply_size);
- goto err_free;
- }
-
- return skb;
-err_free:
- dev_kfree_skb_any(skb);
- return ERR_PTR(-EIO);
-}
-
static int
nfp_bpf_ctrl_rc_to_errno(struct nfp_app_bpf *bpf,
struct cmsg_reply_map_simple *reply)
req->map_type = cpu_to_be32(map->map_type);
req->map_flags = 0;
- skb = nfp_bpf_cmsg_communicate(bpf, skb, CMSG_TYPE_MAP_ALLOC,
- sizeof(*reply));
+ skb = nfp_ccm_communicate(&bpf->ccm, skb, NFP_CCM_TYPE_BPF_MAP_ALLOC,
+ sizeof(*reply));
if (IS_ERR(skb))
return PTR_ERR(skb);
req = (void *)skb->data;
req->tid = cpu_to_be32(nfp_map->tid);
- skb = nfp_bpf_cmsg_communicate(bpf, skb, CMSG_TYPE_MAP_FREE,
- sizeof(*reply));
+ skb = nfp_ccm_communicate(&bpf->ccm, skb, NFP_CCM_TYPE_BPF_MAP_FREE,
+ sizeof(*reply));
if (IS_ERR(skb)) {
cmsg_warn(bpf, "leaking map - I/O error\n");
return;
}
static int
-nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap,
- enum nfp_bpf_cmsg_type op,
+nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap, enum nfp_ccm_type op,
u8 *key, u8 *value, u64 flags, u8 *out_key, u8 *out_value)
{
struct nfp_bpf_map *nfp_map = offmap->dev_priv;
memcpy(nfp_bpf_ctrl_req_val(bpf, req, 0), value,
map->value_size);
- skb = nfp_bpf_cmsg_communicate(bpf, skb, op,
- nfp_bpf_cmsg_map_reply_size(bpf, 1));
+ skb = nfp_ccm_communicate(&bpf->ccm, skb, op,
+ nfp_bpf_cmsg_map_reply_size(bpf, 1));
if (IS_ERR(skb))
return PTR_ERR(skb);
int nfp_bpf_ctrl_update_entry(struct bpf_offloaded_map *offmap,
void *key, void *value, u64 flags)
{
- return nfp_bpf_ctrl_entry_op(offmap, CMSG_TYPE_MAP_UPDATE,
+ return nfp_bpf_ctrl_entry_op(offmap, NFP_CCM_TYPE_BPF_MAP_UPDATE,
key, value, flags, NULL, NULL);
}
int nfp_bpf_ctrl_del_entry(struct bpf_offloaded_map *offmap, void *key)
{
- return nfp_bpf_ctrl_entry_op(offmap, CMSG_TYPE_MAP_DELETE,
+ return nfp_bpf_ctrl_entry_op(offmap, NFP_CCM_TYPE_BPF_MAP_DELETE,
key, NULL, 0, NULL, NULL);
}
int nfp_bpf_ctrl_lookup_entry(struct bpf_offloaded_map *offmap,
void *key, void *value)
{
- return nfp_bpf_ctrl_entry_op(offmap, CMSG_TYPE_MAP_LOOKUP,
+ return nfp_bpf_ctrl_entry_op(offmap, NFP_CCM_TYPE_BPF_MAP_LOOKUP,
key, NULL, 0, NULL, value);
}
int nfp_bpf_ctrl_getfirst_entry(struct bpf_offloaded_map *offmap,
void *next_key)
{
- return nfp_bpf_ctrl_entry_op(offmap, CMSG_TYPE_MAP_GETFIRST,
+ return nfp_bpf_ctrl_entry_op(offmap, NFP_CCM_TYPE_BPF_MAP_GETFIRST,
NULL, NULL, 0, next_key, NULL);
}
int nfp_bpf_ctrl_getnext_entry(struct bpf_offloaded_map *offmap,
void *key, void *next_key)
{
- return nfp_bpf_ctrl_entry_op(offmap, CMSG_TYPE_MAP_GETNEXT,
+ return nfp_bpf_ctrl_entry_op(offmap, NFP_CCM_TYPE_BPF_MAP_GETNEXT,
key, NULL, 0, next_key, NULL);
}
void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb)
{
struct nfp_app_bpf *bpf = app->priv;
- unsigned int tag;
if (unlikely(skb->len < sizeof(struct cmsg_reply_map_simple))) {
cmsg_warn(bpf, "cmsg drop - too short %d!\n", skb->len);
- goto err_free;
+ dev_kfree_skb_any(skb);
+ return;
}
- if (nfp_bpf_cmsg_get_type(skb) == CMSG_TYPE_BPF_EVENT) {
+ if (nfp_ccm_get_type(skb) == NFP_CCM_TYPE_BPF_BPF_EVENT) {
if (!nfp_bpf_event_output(bpf, skb->data, skb->len))
dev_consume_skb_any(skb);
else
dev_kfree_skb_any(skb);
- return;
}
- nfp_ctrl_lock(bpf->app->ctrl);
-
- tag = nfp_bpf_cmsg_get_tag(skb);
- if (unlikely(!test_bit(tag, bpf->tag_allocator))) {
- cmsg_warn(bpf, "cmsg drop - no one is waiting for tag %u!\n",
- tag);
- goto err_unlock;
- }
-
- __skb_queue_tail(&bpf->cmsg_replies, skb);
- wake_up_interruptible_all(&bpf->cmsg_wq);
-
- nfp_ctrl_unlock(bpf->app->ctrl);
-
- return;
-err_unlock:
- nfp_ctrl_unlock(bpf->app->ctrl);
-err_free:
- dev_kfree_skb_any(skb);
+ nfp_ccm_rx(&bpf->ccm, skb);
}
void
nfp_bpf_ctrl_msg_rx_raw(struct nfp_app *app, const void *data, unsigned int len)
{
+ const struct nfp_ccm_hdr *hdr = data;
struct nfp_app_bpf *bpf = app->priv;
- const struct cmsg_hdr *hdr = data;
if (unlikely(len < sizeof(struct cmsg_reply_map_simple))) {
cmsg_warn(bpf, "cmsg drop - too short %d!\n", len);
return;
}
- if (hdr->type == CMSG_TYPE_BPF_EVENT)
+ if (hdr->type == NFP_CCM_TYPE_BPF_BPF_EVENT)
nfp_bpf_event_output(bpf, data, len);
else
cmsg_warn(bpf, "cmsg drop - msg type %d with raw buffer!\n",
#include <linux/bitops.h>
#include <linux/types.h>
+#include "../ccm.h"
/* Kernel's enum bpf_reg_type is not uABI so people may change it breaking
* our FW ABI. In that case we will do translation in the driver.
/*
* Types defined for map related control messages
*/
-#define CMSG_MAP_ABI_VERSION 1
-
-enum nfp_bpf_cmsg_type {
- CMSG_TYPE_MAP_ALLOC = 1,
- CMSG_TYPE_MAP_FREE = 2,
- CMSG_TYPE_MAP_LOOKUP = 3,
- CMSG_TYPE_MAP_UPDATE = 4,
- CMSG_TYPE_MAP_DELETE = 5,
- CMSG_TYPE_MAP_GETNEXT = 6,
- CMSG_TYPE_MAP_GETFIRST = 7,
- CMSG_TYPE_BPF_EVENT = 8,
- __CMSG_TYPE_MAP_MAX,
-};
-
-#define CMSG_TYPE_MAP_REPLY_BIT 7
-#define __CMSG_REPLY(req) (BIT(CMSG_TYPE_MAP_REPLY_BIT) | (req))
/* BPF ABIv2 fixed-length control message fields */
#define CMSG_MAP_KEY_LW 16
CMSG_RC_ERR_MAP_E2BIG = 7,
};
-struct cmsg_hdr {
- u8 type;
- u8 ver;
- __be16 tag;
-};
-
struct cmsg_reply_map_simple {
- struct cmsg_hdr hdr;
+ struct nfp_ccm_hdr hdr;
__be32 rc;
};
struct cmsg_req_map_alloc_tbl {
- struct cmsg_hdr hdr;
+ struct nfp_ccm_hdr hdr;
__be32 key_size; /* in bytes */
__be32 value_size; /* in bytes */
__be32 max_entries;
};
struct cmsg_req_map_free_tbl {
- struct cmsg_hdr hdr;
+ struct nfp_ccm_hdr hdr;
__be32 tid;
};
};
struct cmsg_req_map_op {
- struct cmsg_hdr hdr;
+ struct nfp_ccm_hdr hdr;
__be32 tid;
__be32 count;
__be32 flags;
};
struct cmsg_bpf_event {
- struct cmsg_hdr hdr;
+ struct nfp_ccm_hdr hdr;
__be32 cpu_id;
__be64 map_ptr;
__be32 data_size;
bpf->app = app;
app->priv = bpf;
- skb_queue_head_init(&bpf->cmsg_replies);
- init_waitqueue_head(&bpf->cmsg_wq);
INIT_LIST_HEAD(&bpf->map_list);
- err = rhashtable_init(&bpf->maps_neutral, &nfp_bpf_maps_neutral_params);
+ err = nfp_ccm_init(&bpf->ccm, app);
if (err)
goto err_free_bpf;
+ err = rhashtable_init(&bpf->maps_neutral, &nfp_bpf_maps_neutral_params);
+ if (err)
+ goto err_clean_ccm;
+
nfp_bpf_init_capabilities(bpf);
err = nfp_bpf_parse_capabilities(app);
err_free_neutral_maps:
rhashtable_destroy(&bpf->maps_neutral);
+err_clean_ccm:
+ nfp_ccm_clean(&bpf->ccm);
err_free_bpf:
kfree(bpf);
return err;
struct nfp_app_bpf *bpf = app->priv;
bpf_offload_dev_destroy(bpf->bpf_dev);
- WARN_ON(!skb_queue_empty(&bpf->cmsg_replies));
+ nfp_ccm_clean(&bpf->ccm);
WARN_ON(!list_empty(&bpf->map_list));
WARN_ON(bpf->maps_in_use || bpf->map_elems_in_use);
rhashtable_free_and_destroy(&bpf->maps_neutral,
#include <linux/types.h>
#include <linux/wait.h>
+#include "../ccm.h"
#include "../nfp_asm.h"
#include "fw.h"
/**
* struct nfp_app_bpf - bpf app priv structure
* @app: backpointer to the app
+ * @ccm: common control message handler data
*
* @bpf_dev: BPF offload device handle
*
- * @tag_allocator: bitmap of control message tags in use
- * @tag_alloc_next: next tag bit to allocate
- * @tag_alloc_last: next tag bit to be freed
- *
- * @cmsg_replies: received cmsg replies waiting to be consumed
- * @cmsg_wq: work queue for waiting for cmsg replies
- *
* @cmsg_key_sz: size of key in cmsg element array
* @cmsg_val_sz: size of value in cmsg element array
*
*/
struct nfp_app_bpf {
struct nfp_app *app;
+ struct nfp_ccm ccm;
struct bpf_offload_dev *bpf_dev;
- DECLARE_BITMAP(tag_allocator, U16_MAX + 1);
- u16 tag_alloc_next;
- u16 tag_alloc_last;
-
- struct sk_buff_head cmsg_replies;
- struct wait_queue_head cmsg_wq;
-
unsigned int cmsg_key_sz;
unsigned int cmsg_val_sz;
#include <net/tc_act/tc_mirred.h>
#include "main.h"
+#include "../ccm.h"
#include "../nfp_app.h"
#include "../nfp_net_ctrl.h"
#include "../nfp_net.h"
if (len < sizeof(struct cmsg_bpf_event) + pkt_size + data_size)
return -EINVAL;
- if (cbe->hdr.ver != CMSG_MAP_ABI_VERSION)
+ if (cbe->hdr.ver != NFP_CCM_ABI_VERSION)
return -EINVAL;
rcu_read_lock();
--- /dev/null
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+/* Copyright (C) 2016-2019 Netronome Systems, Inc. */
+
+#include <linux/bitops.h>
+
+#include "ccm.h"
+#include "nfp_app.h"
+#include "nfp_net.h"
+
+#define NFP_CCM_TYPE_REPLY_BIT 7
+#define __NFP_CCM_REPLY(req) (BIT(NFP_CCM_TYPE_REPLY_BIT) | (req))
+
+#define ccm_warn(app, msg...) nn_dp_warn(&(app)->ctrl->dp, msg)
+
+#define NFP_CCM_TAG_ALLOC_SPAN (U16_MAX / 4)
+
+static bool nfp_ccm_all_tags_busy(struct nfp_ccm *ccm)
+{
+ u16 used_tags;
+
+ used_tags = ccm->tag_alloc_next - ccm->tag_alloc_last;
+
+ return used_tags > NFP_CCM_TAG_ALLOC_SPAN;
+}
+
+static int nfp_ccm_alloc_tag(struct nfp_ccm *ccm)
+{
+ /* CCM is for FW communication which is request-reply. To make sure
+ * we don't reuse the message ID too early after timeout - limit the
+ * number of requests in flight.
+ */
+ if (unlikely(nfp_ccm_all_tags_busy(ccm))) {
+ ccm_warn(ccm->app, "all FW request contexts busy!\n");
+ return -EAGAIN;
+ }
+
+ WARN_ON(__test_and_set_bit(ccm->tag_alloc_next, ccm->tag_allocator));
+ return ccm->tag_alloc_next++;
+}
+
+static void nfp_ccm_free_tag(struct nfp_ccm *ccm, u16 tag)
+{
+ WARN_ON(!__test_and_clear_bit(tag, ccm->tag_allocator));
+
+ while (!test_bit(ccm->tag_alloc_last, ccm->tag_allocator) &&
+ ccm->tag_alloc_last != ccm->tag_alloc_next)
+ ccm->tag_alloc_last++;
+}
+
+static struct sk_buff *__nfp_ccm_reply(struct nfp_ccm *ccm, u16 tag)
+{
+ unsigned int msg_tag;
+ struct sk_buff *skb;
+
+ skb_queue_walk(&ccm->replies, skb) {
+ msg_tag = nfp_ccm_get_tag(skb);
+ if (msg_tag == tag) {
+ nfp_ccm_free_tag(ccm, tag);
+ __skb_unlink(skb, &ccm->replies);
+ return skb;
+ }
+ }
+
+ return NULL;
+}
+
+static struct sk_buff *
+nfp_ccm_reply(struct nfp_ccm *ccm, struct nfp_app *app, u16 tag)
+{
+ struct sk_buff *skb;
+
+ nfp_ctrl_lock(app->ctrl);
+ skb = __nfp_ccm_reply(ccm, tag);
+ nfp_ctrl_unlock(app->ctrl);
+
+ return skb;
+}
+
+static struct sk_buff *
+nfp_ccm_reply_drop_tag(struct nfp_ccm *ccm, struct nfp_app *app, u16 tag)
+{
+ struct sk_buff *skb;
+
+ nfp_ctrl_lock(app->ctrl);
+ skb = __nfp_ccm_reply(ccm, tag);
+ if (!skb)
+ nfp_ccm_free_tag(ccm, tag);
+ nfp_ctrl_unlock(app->ctrl);
+
+ return skb;
+}
+
+static struct sk_buff *
+nfp_ccm_wait_reply(struct nfp_ccm *ccm, struct nfp_app *app,
+ enum nfp_ccm_type type, int tag)
+{
+ struct sk_buff *skb;
+ int i, err;
+
+ for (i = 0; i < 50; i++) {
+ udelay(4);
+ skb = nfp_ccm_reply(ccm, app, tag);
+ if (skb)
+ return skb;
+ }
+
+ err = wait_event_interruptible_timeout(ccm->wq,
+ skb = nfp_ccm_reply(ccm, app,
+ tag),
+ msecs_to_jiffies(5000));
+ /* We didn't get a response - try last time and atomically drop
+ * the tag even if no response is matched.
+ */
+ if (!skb)
+ skb = nfp_ccm_reply_drop_tag(ccm, app, tag);
+ if (err < 0) {
+ ccm_warn(app, "%s waiting for response to 0x%02x: %d\n",
+ err == ERESTARTSYS ? "interrupted" : "error",
+ type, err);
+ return ERR_PTR(err);
+ }
+ if (!skb) {
+ ccm_warn(app, "timeout waiting for response to 0x%02x\n", type);
+ return ERR_PTR(-ETIMEDOUT);
+ }
+
+ return skb;
+}
+
+struct sk_buff *
+nfp_ccm_communicate(struct nfp_ccm *ccm, struct sk_buff *skb,
+ enum nfp_ccm_type type, unsigned int reply_size)
+{
+ struct nfp_app *app = ccm->app;
+ struct nfp_ccm_hdr *hdr;
+ int reply_type, tag;
+
+ nfp_ctrl_lock(app->ctrl);
+ tag = nfp_ccm_alloc_tag(ccm);
+ if (tag < 0) {
+ nfp_ctrl_unlock(app->ctrl);
+ dev_kfree_skb_any(skb);
+ return ERR_PTR(tag);
+ }
+
+ hdr = (void *)skb->data;
+ hdr->ver = NFP_CCM_ABI_VERSION;
+ hdr->type = type;
+ hdr->tag = cpu_to_be16(tag);
+
+ __nfp_app_ctrl_tx(app, skb);
+
+ nfp_ctrl_unlock(app->ctrl);
+
+ skb = nfp_ccm_wait_reply(ccm, app, type, tag);
+ if (IS_ERR(skb))
+ return skb;
+
+ reply_type = nfp_ccm_get_type(skb);
+ if (reply_type != __NFP_CCM_REPLY(type)) {
+ ccm_warn(app, "cmsg drop - wrong type 0x%02x != 0x%02lx!\n",
+ reply_type, __NFP_CCM_REPLY(type));
+ goto err_free;
+ }
+ /* 0 reply_size means caller will do the validation */
+ if (reply_size && skb->len != reply_size) {
+ ccm_warn(app, "cmsg drop - type 0x%02x wrong size %d != %d!\n",
+ type, skb->len, reply_size);
+ goto err_free;
+ }
+
+ return skb;
+err_free:
+ dev_kfree_skb_any(skb);
+ return ERR_PTR(-EIO);
+}
+
+void nfp_ccm_rx(struct nfp_ccm *ccm, struct sk_buff *skb)
+{
+ struct nfp_app *app = ccm->app;
+ unsigned int tag;
+
+ if (unlikely(skb->len < sizeof(struct nfp_ccm_hdr))) {
+ ccm_warn(app, "cmsg drop - too short %d!\n", skb->len);
+ goto err_free;
+ }
+
+ nfp_ctrl_lock(app->ctrl);
+
+ tag = nfp_ccm_get_tag(skb);
+ if (unlikely(!test_bit(tag, ccm->tag_allocator))) {
+ ccm_warn(app, "cmsg drop - no one is waiting for tag %u!\n",
+ tag);
+ goto err_unlock;
+ }
+
+ __skb_queue_tail(&ccm->replies, skb);
+ wake_up_interruptible_all(&ccm->wq);
+
+ nfp_ctrl_unlock(app->ctrl);
+ return;
+
+err_unlock:
+ nfp_ctrl_unlock(app->ctrl);
+err_free:
+ dev_kfree_skb_any(skb);
+}
+
+int nfp_ccm_init(struct nfp_ccm *ccm, struct nfp_app *app)
+{
+ ccm->app = app;
+ skb_queue_head_init(&ccm->replies);
+ init_waitqueue_head(&ccm->wq);
+ return 0;
+}
+
+void nfp_ccm_clean(struct nfp_ccm *ccm)
+{
+ WARN_ON(!skb_queue_empty(&ccm->replies));
+}
--- /dev/null
+/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) */
+/* Copyright (C) 2016-2019 Netronome Systems, Inc. */
+
+#ifndef NFP_CCM_H
+#define NFP_CCM_H 1
+
+#include <linux/bitmap.h>
+#include <linux/skbuff.h>
+#include <linux/wait.h>
+
+struct nfp_app;
+
+/* Firmware ABI */
+
+enum nfp_ccm_type {
+ NFP_CCM_TYPE_BPF_MAP_ALLOC = 1,
+ NFP_CCM_TYPE_BPF_MAP_FREE = 2,
+ NFP_CCM_TYPE_BPF_MAP_LOOKUP = 3,
+ NFP_CCM_TYPE_BPF_MAP_UPDATE = 4,
+ NFP_CCM_TYPE_BPF_MAP_DELETE = 5,
+ NFP_CCM_TYPE_BPF_MAP_GETNEXT = 6,
+ NFP_CCM_TYPE_BPF_MAP_GETFIRST = 7,
+ NFP_CCM_TYPE_BPF_BPF_EVENT = 8,
+ __NFP_CCM_TYPE_MAX,
+};
+
+#define NFP_CCM_ABI_VERSION 1
+
+struct nfp_ccm_hdr {
+ u8 type;
+ u8 ver;
+ __be16 tag;
+};
+
+static inline u8 nfp_ccm_get_type(struct sk_buff *skb)
+{
+ struct nfp_ccm_hdr *hdr;
+
+ hdr = (struct nfp_ccm_hdr *)skb->data;
+
+ return hdr->type;
+}
+
+static inline unsigned int nfp_ccm_get_tag(struct sk_buff *skb)
+{
+ struct nfp_ccm_hdr *hdr;
+
+ hdr = (struct nfp_ccm_hdr *)skb->data;
+
+ return be16_to_cpu(hdr->tag);
+}
+
+/* Implementation */
+
+/**
+ * struct nfp_ccm - common control message handling
+ * @tag_allocator: bitmap of control message tags in use
+ * @tag_alloc_next: next tag bit to allocate
+ * @tag_alloc_last: next tag bit to be freed
+ *
+ * @replies: received cmsg replies waiting to be consumed
+ * @wq: work queue for waiting for cmsg replies
+ */
+struct nfp_ccm {
+ struct nfp_app *app;
+
+ DECLARE_BITMAP(tag_allocator, U16_MAX + 1);
+ u16 tag_alloc_next;
+ u16 tag_alloc_last;
+
+ struct sk_buff_head replies;
+ struct wait_queue_head wq;
+};
+
+int nfp_ccm_init(struct nfp_ccm *ccm, struct nfp_app *app);
+void nfp_ccm_clean(struct nfp_ccm *ccm);
+void nfp_ccm_rx(struct nfp_ccm *ccm, struct sk_buff *skb);
+struct sk_buff *
+nfp_ccm_communicate(struct nfp_ccm *ccm, struct sk_buff *skb,
+ enum nfp_ccm_type type, unsigned int reply_size);
+#endif
struct nfp_flower_priv *priv = app->priv;
switch (tun->key.tp_dst) {
- case htons(NFP_FL_VXLAN_PORT):
+ case htons(IANA_VXLAN_UDP_PORT):
return NFP_FL_TUNNEL_VXLAN;
- case htons(NFP_FL_GENEVE_PORT):
+ case htons(GENEVE_UDP_PORT):
if (priv->flower_ext_feats & NFP_FL_FEATS_GENEVE)
return NFP_FL_TUNNEL_GENEVE;
/* FALLTHROUGH */
}
}
-static int
-nfp_fl_pedit(const struct flow_action_entry *act,
- struct tc_cls_flower_offload *flow,
- char *nfp_action, int *a_len, u32 *csum_updated)
-{
- struct flow_rule *rule = tc_cls_flower_offload_flow_rule(flow);
+struct nfp_flower_pedit_acts {
struct nfp_fl_set_ipv6_addr set_ip6_dst, set_ip6_src;
struct nfp_fl_set_ipv6_tc_hl_fl set_ip6_tc_hl_fl;
struct nfp_fl_set_ip4_ttl_tos set_ip_ttl_tos;
struct nfp_fl_set_ip4_addrs set_ip_addr;
- enum flow_action_mangle_base htype;
struct nfp_fl_set_tport set_tport;
struct nfp_fl_set_eth set_eth;
+};
+
+static int
+nfp_fl_commit_mangle(struct tc_cls_flower_offload *flow, char *nfp_action,
+ int *a_len, struct nfp_flower_pedit_acts *set_act,
+ u32 *csum_updated)
+{
+ struct flow_rule *rule = tc_cls_flower_offload_flow_rule(flow);
size_t act_size = 0;
u8 ip_proto = 0;
- u32 offset;
- int err;
-
- memset(&set_ip6_tc_hl_fl, 0, sizeof(set_ip6_tc_hl_fl));
- memset(&set_ip_ttl_tos, 0, sizeof(set_ip_ttl_tos));
- memset(&set_ip6_dst, 0, sizeof(set_ip6_dst));
- memset(&set_ip6_src, 0, sizeof(set_ip6_src));
- memset(&set_ip_addr, 0, sizeof(set_ip_addr));
- memset(&set_tport, 0, sizeof(set_tport));
- memset(&set_eth, 0, sizeof(set_eth));
-
- htype = act->mangle.htype;
- offset = act->mangle.offset;
-
- switch (htype) {
- case TCA_PEDIT_KEY_EX_HDR_TYPE_ETH:
- err = nfp_fl_set_eth(act, offset, &set_eth);
- break;
- case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4:
- err = nfp_fl_set_ip4(act, offset, &set_ip_addr,
- &set_ip_ttl_tos);
- break;
- case TCA_PEDIT_KEY_EX_HDR_TYPE_IP6:
- err = nfp_fl_set_ip6(act, offset, &set_ip6_dst,
- &set_ip6_src, &set_ip6_tc_hl_fl);
- break;
- case TCA_PEDIT_KEY_EX_HDR_TYPE_TCP:
- err = nfp_fl_set_tport(act, offset, &set_tport,
- NFP_FL_ACTION_OPCODE_SET_TCP);
- break;
- case TCA_PEDIT_KEY_EX_HDR_TYPE_UDP:
- err = nfp_fl_set_tport(act, offset, &set_tport,
- NFP_FL_ACTION_OPCODE_SET_UDP);
- break;
- default:
- return -EOPNOTSUPP;
- }
- if (err)
- return err;
if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_BASIC)) {
struct flow_match_basic match;
ip_proto = match.key->ip_proto;
}
- if (set_eth.head.len_lw) {
- act_size = sizeof(set_eth);
- memcpy(nfp_action, &set_eth, act_size);
+ if (set_act->set_eth.head.len_lw) {
+ act_size = sizeof(set_act->set_eth);
+ memcpy(nfp_action, &set_act->set_eth, act_size);
*a_len += act_size;
}
- if (set_ip_ttl_tos.head.len_lw) {
+
+ if (set_act->set_ip_ttl_tos.head.len_lw) {
nfp_action += act_size;
- act_size = sizeof(set_ip_ttl_tos);
- memcpy(nfp_action, &set_ip_ttl_tos, act_size);
+ act_size = sizeof(set_act->set_ip_ttl_tos);
+ memcpy(nfp_action, &set_act->set_ip_ttl_tos, act_size);
*a_len += act_size;
/* Hardware will automatically fix IPv4 and TCP/UDP checksum. */
*csum_updated |= TCA_CSUM_UPDATE_FLAG_IPV4HDR |
nfp_fl_csum_l4_to_flag(ip_proto);
}
- if (set_ip_addr.head.len_lw) {
+
+ if (set_act->set_ip_addr.head.len_lw) {
nfp_action += act_size;
- act_size = sizeof(set_ip_addr);
- memcpy(nfp_action, &set_ip_addr, act_size);
+ act_size = sizeof(set_act->set_ip_addr);
+ memcpy(nfp_action, &set_act->set_ip_addr, act_size);
*a_len += act_size;
/* Hardware will automatically fix IPv4 and TCP/UDP checksum. */
*csum_updated |= TCA_CSUM_UPDATE_FLAG_IPV4HDR |
nfp_fl_csum_l4_to_flag(ip_proto);
}
- if (set_ip6_tc_hl_fl.head.len_lw) {
+
+ if (set_act->set_ip6_tc_hl_fl.head.len_lw) {
nfp_action += act_size;
- act_size = sizeof(set_ip6_tc_hl_fl);
- memcpy(nfp_action, &set_ip6_tc_hl_fl, act_size);
+ act_size = sizeof(set_act->set_ip6_tc_hl_fl);
+ memcpy(nfp_action, &set_act->set_ip6_tc_hl_fl, act_size);
*a_len += act_size;
/* Hardware will automatically fix TCP/UDP checksum. */
*csum_updated |= nfp_fl_csum_l4_to_flag(ip_proto);
}
- if (set_ip6_dst.head.len_lw && set_ip6_src.head.len_lw) {
+
+ if (set_act->set_ip6_dst.head.len_lw &&
+ set_act->set_ip6_src.head.len_lw) {
/* TC compiles set src and dst IPv6 address as a single action,
* the hardware requires this to be 2 separate actions.
*/
nfp_action += act_size;
- act_size = sizeof(set_ip6_src);
- memcpy(nfp_action, &set_ip6_src, act_size);
+ act_size = sizeof(set_act->set_ip6_src);
+ memcpy(nfp_action, &set_act->set_ip6_src, act_size);
*a_len += act_size;
- act_size = sizeof(set_ip6_dst);
- memcpy(&nfp_action[sizeof(set_ip6_src)], &set_ip6_dst,
- act_size);
+ act_size = sizeof(set_act->set_ip6_dst);
+ memcpy(&nfp_action[sizeof(set_act->set_ip6_src)],
+ &set_act->set_ip6_dst, act_size);
*a_len += act_size;
/* Hardware will automatically fix TCP/UDP checksum. */
*csum_updated |= nfp_fl_csum_l4_to_flag(ip_proto);
- } else if (set_ip6_dst.head.len_lw) {
+ } else if (set_act->set_ip6_dst.head.len_lw) {
nfp_action += act_size;
- act_size = sizeof(set_ip6_dst);
- memcpy(nfp_action, &set_ip6_dst, act_size);
+ act_size = sizeof(set_act->set_ip6_dst);
+ memcpy(nfp_action, &set_act->set_ip6_dst, act_size);
*a_len += act_size;
/* Hardware will automatically fix TCP/UDP checksum. */
*csum_updated |= nfp_fl_csum_l4_to_flag(ip_proto);
- } else if (set_ip6_src.head.len_lw) {
+ } else if (set_act->set_ip6_src.head.len_lw) {
nfp_action += act_size;
- act_size = sizeof(set_ip6_src);
- memcpy(nfp_action, &set_ip6_src, act_size);
+ act_size = sizeof(set_act->set_ip6_src);
+ memcpy(nfp_action, &set_act->set_ip6_src, act_size);
*a_len += act_size;
/* Hardware will automatically fix TCP/UDP checksum. */
*csum_updated |= nfp_fl_csum_l4_to_flag(ip_proto);
}
- if (set_tport.head.len_lw) {
+ if (set_act->set_tport.head.len_lw) {
nfp_action += act_size;
- act_size = sizeof(set_tport);
- memcpy(nfp_action, &set_tport, act_size);
+ act_size = sizeof(set_act->set_tport);
+ memcpy(nfp_action, &set_act->set_tport, act_size);
*a_len += act_size;
/* Hardware will automatically fix TCP/UDP checksum. */
}
static int
-nfp_flower_output_action(struct nfp_app *app, const struct flow_action_entry *act,
+nfp_fl_pedit(const struct flow_action_entry *act,
+ struct tc_cls_flower_offload *flow, char *nfp_action, int *a_len,
+ u32 *csum_updated, struct nfp_flower_pedit_acts *set_act)
+{
+ enum flow_action_mangle_base htype;
+ u32 offset;
+
+ htype = act->mangle.htype;
+ offset = act->mangle.offset;
+
+ switch (htype) {
+ case TCA_PEDIT_KEY_EX_HDR_TYPE_ETH:
+ return nfp_fl_set_eth(act, offset, &set_act->set_eth);
+ case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4:
+ return nfp_fl_set_ip4(act, offset, &set_act->set_ip_addr,
+ &set_act->set_ip_ttl_tos);
+ case TCA_PEDIT_KEY_EX_HDR_TYPE_IP6:
+ return nfp_fl_set_ip6(act, offset, &set_act->set_ip6_dst,
+ &set_act->set_ip6_src,
+ &set_act->set_ip6_tc_hl_fl);
+ case TCA_PEDIT_KEY_EX_HDR_TYPE_TCP:
+ return nfp_fl_set_tport(act, offset, &set_act->set_tport,
+ NFP_FL_ACTION_OPCODE_SET_TCP);
+ case TCA_PEDIT_KEY_EX_HDR_TYPE_UDP:
+ return nfp_fl_set_tport(act, offset, &set_act->set_tport,
+ NFP_FL_ACTION_OPCODE_SET_UDP);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static int
+nfp_flower_output_action(struct nfp_app *app,
+ const struct flow_action_entry *act,
struct nfp_fl_payload *nfp_fl, int *a_len,
struct net_device *netdev, bool last,
enum nfp_flower_tun_type *tun_type, int *tun_out_cnt,
struct nfp_fl_payload *nfp_fl, int *a_len,
struct net_device *netdev,
enum nfp_flower_tun_type *tun_type, int *tun_out_cnt,
- int *out_cnt, u32 *csum_updated)
+ int *out_cnt, u32 *csum_updated,
+ struct nfp_flower_pedit_acts *set_act)
{
struct nfp_fl_set_ipv4_udp_tun *set_tun;
struct nfp_fl_pre_tunnel *pre_tun;
return 0;
case FLOW_ACTION_MANGLE:
if (nfp_fl_pedit(act, flow, &nfp_fl->action_data[*a_len],
- a_len, csum_updated))
+ a_len, csum_updated, set_act))
return -EOPNOTSUPP;
break;
case FLOW_ACTION_CSUM:
return 0;
}
+static bool nfp_fl_check_mangle_start(struct flow_action *flow_act,
+ int current_act_idx)
+{
+ struct flow_action_entry current_act;
+ struct flow_action_entry prev_act;
+
+ current_act = flow_act->entries[current_act_idx];
+ if (current_act.id != FLOW_ACTION_MANGLE)
+ return false;
+
+ if (current_act_idx == 0)
+ return true;
+
+ prev_act = flow_act->entries[current_act_idx - 1];
+
+ return prev_act.id != FLOW_ACTION_MANGLE;
+}
+
+static bool nfp_fl_check_mangle_end(struct flow_action *flow_act,
+ int current_act_idx)
+{
+ struct flow_action_entry current_act;
+ struct flow_action_entry next_act;
+
+ current_act = flow_act->entries[current_act_idx];
+ if (current_act.id != FLOW_ACTION_MANGLE)
+ return false;
+
+ if (current_act_idx == flow_act->num_entries)
+ return true;
+
+ next_act = flow_act->entries[current_act_idx + 1];
+
+ return next_act.id != FLOW_ACTION_MANGLE;
+}
+
int nfp_flower_compile_action(struct nfp_app *app,
struct tc_cls_flower_offload *flow,
struct net_device *netdev,
struct nfp_fl_payload *nfp_flow)
{
int act_len, act_cnt, err, tun_out_cnt, out_cnt, i;
+ struct nfp_flower_pedit_acts set_act;
enum nfp_flower_tun_type tun_type;
struct flow_action_entry *act;
u32 csum_updated = 0;
out_cnt = 0;
flow_action_for_each(i, act, &flow->rule->action) {
+ if (nfp_fl_check_mangle_start(&flow->rule->action, i))
+ memset(&set_act, 0, sizeof(set_act));
err = nfp_flower_loop_action(app, act, flow, nfp_flow, &act_len,
netdev, &tun_type, &tun_out_cnt,
- &out_cnt, &csum_updated);
+ &out_cnt, &csum_updated, &set_act);
if (err)
return err;
act_cnt++;
+ if (nfp_fl_check_mangle_end(&flow->rule->action, i))
+ nfp_fl_commit_mangle(flow,
+ &nfp_flow->action_data[act_len],
+ &act_len, &set_act, &csum_updated);
}
/* We optimise when the action list is small, this can unfortunately
rtnl_lock();
rcu_read_lock();
- netdev = nfp_app_repr_get(app, be32_to_cpu(msg->portnum));
+ netdev = nfp_app_dev_get(app, be32_to_cpu(msg->portnum), NULL);
rcu_read_unlock();
if (!netdev) {
nfp_flower_cmsg_warn(app, "ctrl msg for unknown port 0x%08x\n",
msg = nfp_flower_cmsg_get_data(skb);
rcu_read_lock();
- exists = !!nfp_app_repr_get(app, be32_to_cpu(msg->portnum));
+ exists = !!nfp_app_dev_get(app, be32_to_cpu(msg->portnum), NULL);
rcu_read_unlock();
if (!exists) {
nfp_flower_cmsg_warn(app, "ctrl msg for unknown port 0x%08x\n",
wake_up(&priv->reify_wait_queue);
}
+static void
+nfp_flower_cmsg_merge_hint_rx(struct nfp_app *app, struct sk_buff *skb)
+{
+ unsigned int msg_len = nfp_flower_cmsg_get_data_len(skb);
+ struct nfp_flower_cmsg_merge_hint *msg;
+ struct nfp_fl_payload *sub_flows[2];
+ int err, i, flow_cnt;
+
+ msg = nfp_flower_cmsg_get_data(skb);
+ /* msg->count starts at 0 and always assumes at least 1 entry. */
+ flow_cnt = msg->count + 1;
+
+ if (msg_len < struct_size(msg, flow, flow_cnt)) {
+ nfp_flower_cmsg_warn(app, "Merge hint ctrl msg too short - %d bytes but expect %ld\n",
+ msg_len, struct_size(msg, flow, flow_cnt));
+ return;
+ }
+
+ if (flow_cnt != 2) {
+ nfp_flower_cmsg_warn(app, "Merge hint contains %d flows - two are expected\n",
+ flow_cnt);
+ return;
+ }
+
+ rtnl_lock();
+ for (i = 0; i < flow_cnt; i++) {
+ u32 ctx = be32_to_cpu(msg->flow[i].host_ctx);
+
+ sub_flows[i] = nfp_flower_get_fl_payload_from_ctx(app, ctx);
+ if (!sub_flows[i]) {
+ nfp_flower_cmsg_warn(app, "Invalid flow in merge hint\n");
+ goto err_rtnl_unlock;
+ }
+ }
+
+ err = nfp_flower_merge_offloaded_flows(app, sub_flows[0], sub_flows[1]);
+ /* Only warn on memory fail. Hint veto will not break functionality. */
+ if (err == -ENOMEM)
+ nfp_flower_cmsg_warn(app, "Flow merge memory fail.\n");
+
+err_rtnl_unlock:
+ rtnl_unlock();
+}
+
static void
nfp_flower_cmsg_process_one_rx(struct nfp_app *app, struct sk_buff *skb)
{
case NFP_FLOWER_CMSG_TYPE_PORT_MOD:
nfp_flower_cmsg_portmod_rx(app, skb);
break;
+ case NFP_FLOWER_CMSG_TYPE_MERGE_HINT:
+ if (app_priv->flower_ext_feats & NFP_FL_FEATS_FLOW_MERGE) {
+ nfp_flower_cmsg_merge_hint_rx(app, skb);
+ break;
+ }
+ goto err_default;
case NFP_FLOWER_CMSG_TYPE_NO_NEIGH:
nfp_tunnel_request_route(app, skb);
break;
}
/* fall through */
default:
+err_default:
nfp_flower_cmsg_warn(app, "Cannot handle invalid repr control type %u\n",
type);
goto out;
/* Types defined for port related control messages */
enum nfp_flower_cmsg_type_port {
NFP_FLOWER_CMSG_TYPE_FLOW_ADD = 0,
+ NFP_FLOWER_CMSG_TYPE_FLOW_MOD = 1,
NFP_FLOWER_CMSG_TYPE_FLOW_DEL = 2,
NFP_FLOWER_CMSG_TYPE_LAG_CONFIG = 4,
NFP_FLOWER_CMSG_TYPE_PORT_REIFY = 6,
NFP_FLOWER_CMSG_TYPE_MAC_REPR = 7,
NFP_FLOWER_CMSG_TYPE_PORT_MOD = 8,
+ NFP_FLOWER_CMSG_TYPE_MERGE_HINT = 9,
NFP_FLOWER_CMSG_TYPE_NO_NEIGH = 10,
NFP_FLOWER_CMSG_TYPE_TUN_MAC = 11,
NFP_FLOWER_CMSG_TYPE_ACTIVE_TUNS = 12,
#define NFP_FLOWER_CMSG_PORTREIFY_INFO_EXIST BIT(0)
+/* NFP_FLOWER_CMSG_TYPE_FLOW_MERGE_HINT */
+struct nfp_flower_cmsg_merge_hint {
+ u8 reserved[3];
+ u8 count;
+ struct {
+ __be32 host_ctx;
+ __be64 host_cookie;
+ } __packed flow[0];
+};
+
enum nfp_flower_cmsg_port_type {
NFP_FLOWER_CMSG_PORT_TYPE_UNSPEC = 0x0,
NFP_FLOWER_CMSG_PORT_TYPE_PHYS_PORT = 0x1,
#define NFP_FLOWER_CMSG_PORT_PCIE_Q GENMASK(5, 0)
#define NFP_FLOWER_CMSG_PORT_PHYS_PORT_NUM GENMASK(7, 0)
+static inline u32 nfp_flower_internal_port_get_port_id(u8 internal_port)
+{
+ return FIELD_PREP(NFP_FLOWER_CMSG_PORT_PHYS_PORT_NUM, internal_port) |
+ FIELD_PREP(NFP_FLOWER_CMSG_PORT_TYPE,
+ NFP_FLOWER_CMSG_PORT_TYPE_OTHER_PORT);
+}
+
static inline u32 nfp_flower_cmsg_phys_port(u8 phys_port)
{
return FIELD_PREP(NFP_FLOWER_CMSG_PORT_PHYS_PORT_NUM, phys_port) |
#define NFP_FLOWER_ALLOWED_VER 0x0001000000010000UL
+#define NFP_MIN_INT_PORT_ID 1
+#define NFP_MAX_INT_PORT_ID 256
+
static const char *nfp_flower_extra_cap(struct nfp_app *app, struct nfp_net *nn)
{
return "FLOWER";
return DEVLINK_ESWITCH_MODE_SWITCHDEV;
}
+static int
+nfp_flower_lookup_internal_port_id(struct nfp_flower_priv *priv,
+ struct net_device *netdev)
+{
+ struct net_device *entry;
+ int i, id = 0;
+
+ rcu_read_lock();
+ idr_for_each_entry(&priv->internal_ports.port_ids, entry, i)
+ if (entry == netdev) {
+ id = i;
+ break;
+ }
+ rcu_read_unlock();
+
+ return id;
+}
+
+static int
+nfp_flower_get_internal_port_id(struct nfp_app *app, struct net_device *netdev)
+{
+ struct nfp_flower_priv *priv = app->priv;
+ int id;
+
+ id = nfp_flower_lookup_internal_port_id(priv, netdev);
+ if (id > 0)
+ return id;
+
+ idr_preload(GFP_ATOMIC);
+ spin_lock_bh(&priv->internal_ports.lock);
+ id = idr_alloc(&priv->internal_ports.port_ids, netdev,
+ NFP_MIN_INT_PORT_ID, NFP_MAX_INT_PORT_ID, GFP_ATOMIC);
+ spin_unlock_bh(&priv->internal_ports.lock);
+ idr_preload_end();
+
+ return id;
+}
+
+u32 nfp_flower_get_port_id_from_netdev(struct nfp_app *app,
+ struct net_device *netdev)
+{
+ int ext_port;
+
+ if (nfp_netdev_is_nfp_repr(netdev)) {
+ return nfp_repr_get_port_id(netdev);
+ } else if (nfp_flower_internal_port_can_offload(app, netdev)) {
+ ext_port = nfp_flower_get_internal_port_id(app, netdev);
+ if (ext_port < 0)
+ return 0;
+
+ return nfp_flower_internal_port_get_port_id(ext_port);
+ }
+
+ return 0;
+}
+
+static struct net_device *
+nfp_flower_get_netdev_from_internal_port_id(struct nfp_app *app, int port_id)
+{
+ struct nfp_flower_priv *priv = app->priv;
+ struct net_device *netdev;
+
+ rcu_read_lock();
+ netdev = idr_find(&priv->internal_ports.port_ids, port_id);
+ rcu_read_unlock();
+
+ return netdev;
+}
+
+static void
+nfp_flower_free_internal_port_id(struct nfp_app *app, struct net_device *netdev)
+{
+ struct nfp_flower_priv *priv = app->priv;
+ int id;
+
+ id = nfp_flower_lookup_internal_port_id(priv, netdev);
+ if (!id)
+ return;
+
+ spin_lock_bh(&priv->internal_ports.lock);
+ idr_remove(&priv->internal_ports.port_ids, id);
+ spin_unlock_bh(&priv->internal_ports.lock);
+}
+
+static int
+nfp_flower_internal_port_event_handler(struct nfp_app *app,
+ struct net_device *netdev,
+ unsigned long event)
+{
+ if (event == NETDEV_UNREGISTER &&
+ nfp_flower_internal_port_can_offload(app, netdev))
+ nfp_flower_free_internal_port_id(app, netdev);
+
+ return NOTIFY_OK;
+}
+
+static void nfp_flower_internal_port_init(struct nfp_flower_priv *priv)
+{
+ spin_lock_init(&priv->internal_ports.lock);
+ idr_init(&priv->internal_ports.port_ids);
+}
+
+static void nfp_flower_internal_port_cleanup(struct nfp_flower_priv *priv)
+{
+ idr_destroy(&priv->internal_ports.port_ids);
+}
+
static struct nfp_flower_non_repr_priv *
nfp_flower_non_repr_priv_lookup(struct nfp_app *app, struct net_device *netdev)
{
}
static struct net_device *
-nfp_flower_repr_get(struct nfp_app *app, u32 port_id)
+nfp_flower_dev_get(struct nfp_app *app, u32 port_id, bool *redir_egress)
{
enum nfp_repr_type repr_type;
struct nfp_reprs *reprs;
u8 port = 0;
+ /* Check if the port is internal. */
+ if (FIELD_GET(NFP_FLOWER_CMSG_PORT_TYPE, port_id) ==
+ NFP_FLOWER_CMSG_PORT_TYPE_OTHER_PORT) {
+ if (redir_egress)
+ *redir_egress = true;
+ port = FIELD_GET(NFP_FLOWER_CMSG_PORT_PHYS_PORT_NUM, port_id);
+ return nfp_flower_get_netdev_from_internal_port_id(app, port);
+ }
+
repr_type = nfp_flower_repr_get_type_and_port(app, port_id, &port);
if (repr_type > NFP_REPR_TYPE_MAX)
return NULL;
goto err_cleanup_metadata;
}
+ if (app_priv->flower_ext_feats & NFP_FL_FEATS_FLOW_MOD) {
+ /* Tell the firmware that the driver supports flow merging. */
+ err = nfp_rtsym_write_le(app->pf->rtbl,
+ "_abi_flower_merge_hint_enable", 1);
+ if (!err) {
+ app_priv->flower_ext_feats |= NFP_FL_FEATS_FLOW_MERGE;
+ nfp_flower_internal_port_init(app_priv);
+ } else if (err == -ENOENT) {
+ nfp_warn(app->cpp, "Flow merge not supported by FW.\n");
+ } else {
+ goto err_lag_clean;
+ }
+ } else {
+ nfp_warn(app->cpp, "Flow mod/merge not supported by FW.\n");
+ }
+
INIT_LIST_HEAD(&app_priv->indr_block_cb_priv);
INIT_LIST_HEAD(&app_priv->non_repr_priv);
return 0;
+err_lag_clean:
+ if (app_priv->flower_ext_feats & NFP_FL_FEATS_LAG)
+ nfp_flower_lag_cleanup(&app_priv->nfp_lag);
err_cleanup_metadata:
nfp_flower_metadata_cleanup(app);
err_free_app_priv:
if (app_priv->flower_ext_feats & NFP_FL_FEATS_LAG)
nfp_flower_lag_cleanup(&app_priv->nfp_lag);
+ if (app_priv->flower_ext_feats & NFP_FL_FEATS_FLOW_MERGE)
+ nfp_flower_internal_port_cleanup(app_priv);
+
nfp_flower_metadata_cleanup(app);
vfree(app->priv);
app->priv = NULL;
if (ret & NOTIFY_STOP_MASK)
return ret;
+ ret = nfp_flower_internal_port_event_handler(app, netdev, event);
+ if (ret & NOTIFY_STOP_MASK)
+ return ret;
+
return nfp_tunnel_mac_event_handler(app, netdev, event, ptr);
}
.sriov_disable = nfp_flower_sriov_disable,
.eswitch_mode_get = eswitch_mode_get,
- .repr_get = nfp_flower_repr_get,
+ .dev_get = nfp_flower_dev_get,
.setup_tc = nfp_flower_setup_tc,
};
#define NFP_FL_MASK_REUSE_TIME_NS 40000
#define NFP_FL_MASK_ID_LOCATION 1
-#define NFP_FL_VXLAN_PORT 4789
-#define NFP_FL_GENEVE_PORT 6081
-
/* Extra features bitmap. */
#define NFP_FL_FEATS_GENEVE BIT(0)
#define NFP_FL_NBI_MTU_SETTING BIT(1)
#define NFP_FL_FEATS_GENEVE_OPT BIT(2)
#define NFP_FL_FEATS_VLAN_PCP BIT(3)
+#define NFP_FL_FEATS_FLOW_MOD BIT(5)
+#define NFP_FL_FEATS_FLOW_MERGE BIT(30)
#define NFP_FL_FEATS_LAG BIT(31)
struct nfp_fl_mask_id {
struct sk_buff_head retrans_skbs;
};
+/**
+ * struct nfp_fl_internal_ports - Flower APP priv data for additional ports
+ * @port_ids: Assignment of ids to any additional ports
+ * @lock: Lock for extra ports list
+ */
+struct nfp_fl_internal_ports {
+ struct idr port_ids;
+ spinlock_t lock;
+};
+
/**
* struct nfp_flower_priv - Flower APP per-vNIC priv data
* @app: Back pointer to app
* @flow_table: Hash table used to store flower rules
* @stats: Stored stats updates for flower rules
* @stats_lock: Lock for flower rule stats updates
+ * @stats_ctx_table: Hash table to map stats contexts to its flow rule
* @cmsg_work: Workqueue for control messages processing
* @cmsg_skbs_high: List of higher priority skbs for control message
* processing
* @non_repr_priv: List of offloaded non-repr ports and their priv data
* @active_mem_unit: Current active memory unit for flower rules
* @total_mem_units: Total number of available memory units for flower rules
+ * @internal_ports: Internal port ids used in offloaded rules
*/
struct nfp_flower_priv {
struct nfp_app *app;
struct rhashtable flow_table;
struct nfp_fl_stats *stats;
spinlock_t stats_lock; /* lock stats */
+ struct rhashtable stats_ctx_table;
struct work_struct cmsg_work;
struct sk_buff_head cmsg_skbs_high;
struct sk_buff_head cmsg_skbs_low;
struct list_head non_repr_priv;
unsigned int active_mem_unit;
unsigned int total_mem_units;
+ struct nfp_fl_internal_ports internal_ports;
};
/**
char *unmasked_data;
char *mask_data;
char *action_data;
+ struct list_head linked_flows;
+ bool in_hw;
+};
+
+struct nfp_fl_payload_link {
+ /* A link contains a pointer to a merge flow and an associated sub_flow.
+ * Each merge flow will feature in 2 links to its underlying sub_flows.
+ * A sub_flow will have at least 1 link to a merge flow or more if it
+ * has been used to create multiple merge flows.
+ *
+ * For a merge flow, 'linked_flows' in its nfp_fl_payload struct lists
+ * all links to sub_flows (sub_flow.flow) via merge.list.
+ * For a sub_flow, 'linked_flows' gives all links to merge flows it has
+ * formed (merge_flow.flow) via sub_flow.list.
+ */
+ struct {
+ struct list_head list;
+ struct nfp_fl_payload *flow;
+ } merge_flow, sub_flow;
};
extern const struct rhashtable_params nfp_flower_table_params;
__be64 stats_cookie;
};
+static inline bool
+nfp_flower_internal_port_can_offload(struct nfp_app *app,
+ struct net_device *netdev)
+{
+ struct nfp_flower_priv *app_priv = app->priv;
+
+ if (!(app_priv->flower_ext_feats & NFP_FL_FEATS_FLOW_MERGE))
+ return false;
+ if (!netdev->rtnl_link_ops)
+ return false;
+ if (!strcmp(netdev->rtnl_link_ops->kind, "openvswitch"))
+ return true;
+
+ return false;
+}
+
+/* The address of the merged flow acts as its cookie.
+ * Cookies supplied to us by TC flower are also addresses to allocated
+ * memory and thus this scheme should not generate any collisions.
+ */
+static inline bool nfp_flower_is_merge_flow(struct nfp_fl_payload *flow_pay)
+{
+ return flow_pay->tc_flower_cookie == (unsigned long)flow_pay;
+}
+
int nfp_flower_metadata_init(struct nfp_app *app, u64 host_ctx_count,
unsigned int host_ctx_split);
void nfp_flower_metadata_cleanup(struct nfp_app *app);
int nfp_flower_setup_tc(struct nfp_app *app, struct net_device *netdev,
enum tc_setup_type type, void *type_data);
+int nfp_flower_merge_offloaded_flows(struct nfp_app *app,
+ struct nfp_fl_payload *sub_flow1,
+ struct nfp_fl_payload *sub_flow2);
int nfp_flower_compile_flow_match(struct nfp_app *app,
struct tc_cls_flower_offload *flow,
struct nfp_fl_key_ls *key_ls,
struct tc_cls_flower_offload *flow,
struct nfp_fl_payload *nfp_flow,
struct net_device *netdev);
+void __nfp_modify_flow_metadata(struct nfp_flower_priv *priv,
+ struct nfp_fl_payload *nfp_flow);
int nfp_modify_flow_metadata(struct nfp_app *app,
struct nfp_fl_payload *nfp_flow);
nfp_flower_search_fl_table(struct nfp_app *app, unsigned long tc_flower_cookie,
struct net_device *netdev);
struct nfp_fl_payload *
+nfp_flower_get_fl_payload_from_ctx(struct nfp_app *app, u32 ctx_id);
+struct nfp_fl_payload *
nfp_flower_remove_fl_table(struct nfp_app *app, unsigned long tc_flower_cookie);
void nfp_flower_rx_flow_stats(struct nfp_app *app, struct sk_buff *skb);
__nfp_flower_non_repr_priv_put(struct nfp_flower_non_repr_priv *non_repr_priv);
void
nfp_flower_non_repr_priv_put(struct nfp_app *app, struct net_device *netdev);
+u32 nfp_flower_get_port_id_from_netdev(struct nfp_app *app,
+ struct net_device *netdev);
#endif
struct nfp_fl_payload *nfp_flow,
enum nfp_flower_tun_type tun_type)
{
- u32 cmsg_port = 0;
+ u32 port_id;
int err;
u8 *ext;
u8 *msk;
- if (nfp_netdev_is_nfp_repr(netdev))
- cmsg_port = nfp_repr_get_port_id(netdev);
+ port_id = nfp_flower_get_port_id_from_netdev(app, netdev);
memset(nfp_flow->unmasked_data, 0, key_ls->key_size);
memset(nfp_flow->mask_data, 0, key_ls->key_size);
/* Populate Exact Port data. */
err = nfp_flower_compile_port((struct nfp_flower_in_port *)ext,
- cmsg_port, false, tun_type);
+ port_id, false, tun_type);
if (err)
return err;
/* Populate Mask Port Data. */
err = nfp_flower_compile_port((struct nfp_flower_in_port *)msk,
- cmsg_port, true, tun_type);
+ port_id, true, tun_type);
if (err)
return err;
unsigned long cookie;
};
+struct nfp_fl_stats_ctx_to_flow {
+ struct rhash_head ht_node;
+ u32 stats_cxt;
+ struct nfp_fl_payload *flow;
+};
+
+static const struct rhashtable_params stats_ctx_table_params = {
+ .key_offset = offsetof(struct nfp_fl_stats_ctx_to_flow, stats_cxt),
+ .head_offset = offsetof(struct nfp_fl_stats_ctx_to_flow, ht_node),
+ .key_len = sizeof(u32),
+};
+
static int nfp_release_stats_entry(struct nfp_app *app, u32 stats_context_id)
{
struct nfp_flower_priv *priv = app->priv;
if (!mask_entry)
return false;
- if (meta_flags)
- *meta_flags &= ~NFP_FL_META_FLAG_MANAGE_MASK;
-
*mask_id = mask_entry->mask_id;
mask_entry->ref_cnt--;
if (!mask_entry->ref_cnt) {
struct nfp_fl_payload *nfp_flow,
struct net_device *netdev)
{
+ struct nfp_fl_stats_ctx_to_flow *ctx_entry;
struct nfp_flower_priv *priv = app->priv;
struct nfp_fl_payload *check_entry;
u8 new_mask_id;
u32 stats_cxt;
+ int err;
- if (nfp_get_stats_entry(app, &stats_cxt))
- return -ENOENT;
+ err = nfp_get_stats_entry(app, &stats_cxt);
+ if (err)
+ return err;
nfp_flow->meta.host_ctx_id = cpu_to_be32(stats_cxt);
nfp_flow->meta.host_cookie = cpu_to_be64(flow->cookie);
nfp_flow->ingress_dev = netdev;
+ ctx_entry = kzalloc(sizeof(*ctx_entry), GFP_KERNEL);
+ if (!ctx_entry) {
+ err = -ENOMEM;
+ goto err_release_stats;
+ }
+
+ ctx_entry->stats_cxt = stats_cxt;
+ ctx_entry->flow = nfp_flow;
+
+ if (rhashtable_insert_fast(&priv->stats_ctx_table, &ctx_entry->ht_node,
+ stats_ctx_table_params)) {
+ err = -ENOMEM;
+ goto err_free_ctx_entry;
+ }
+
new_mask_id = 0;
if (!nfp_check_mask_add(app, nfp_flow->mask_data,
nfp_flow->meta.mask_len,
&nfp_flow->meta.flags, &new_mask_id)) {
- if (nfp_release_stats_entry(app, stats_cxt))
- return -EINVAL;
- return -ENOENT;
+ err = -ENOENT;
+ goto err_remove_rhash;
}
nfp_flow->meta.flow_version = cpu_to_be64(priv->flower_version);
check_entry = nfp_flower_search_fl_table(app, flow->cookie, netdev);
if (check_entry) {
- if (nfp_release_stats_entry(app, stats_cxt))
- return -EINVAL;
-
- if (!nfp_check_mask_remove(app, nfp_flow->mask_data,
- nfp_flow->meta.mask_len,
- NULL, &new_mask_id))
- return -EINVAL;
-
- return -EEXIST;
+ err = -EEXIST;
+ goto err_remove_mask;
}
return 0;
+
+err_remove_mask:
+ nfp_check_mask_remove(app, nfp_flow->mask_data, nfp_flow->meta.mask_len,
+ NULL, &new_mask_id);
+err_remove_rhash:
+ WARN_ON_ONCE(rhashtable_remove_fast(&priv->stats_ctx_table,
+ &ctx_entry->ht_node,
+ stats_ctx_table_params));
+err_free_ctx_entry:
+ kfree(ctx_entry);
+err_release_stats:
+ nfp_release_stats_entry(app, stats_cxt);
+
+ return err;
+}
+
+void __nfp_modify_flow_metadata(struct nfp_flower_priv *priv,
+ struct nfp_fl_payload *nfp_flow)
+{
+ nfp_flow->meta.flags &= ~NFP_FL_META_FLAG_MANAGE_MASK;
+ nfp_flow->meta.flow_version = cpu_to_be64(priv->flower_version);
+ priv->flower_version++;
}
int nfp_modify_flow_metadata(struct nfp_app *app,
struct nfp_fl_payload *nfp_flow)
{
+ struct nfp_fl_stats_ctx_to_flow *ctx_entry;
struct nfp_flower_priv *priv = app->priv;
u8 new_mask_id = 0;
u32 temp_ctx_id;
+ __nfp_modify_flow_metadata(priv, nfp_flow);
+
nfp_check_mask_remove(app, nfp_flow->mask_data,
nfp_flow->meta.mask_len, &nfp_flow->meta.flags,
&new_mask_id);
- nfp_flow->meta.flow_version = cpu_to_be64(priv->flower_version);
- priv->flower_version++;
-
/* Update flow payload with mask ids. */
nfp_flow->unmasked_data[NFP_FL_MASK_ID_LOCATION] = new_mask_id;
- /* Release the stats ctx id. */
+ /* Release the stats ctx id and ctx to flow table entry. */
temp_ctx_id = be32_to_cpu(nfp_flow->meta.host_ctx_id);
+ ctx_entry = rhashtable_lookup_fast(&priv->stats_ctx_table, &temp_ctx_id,
+ stats_ctx_table_params);
+ if (!ctx_entry)
+ return -ENOENT;
+
+ WARN_ON_ONCE(rhashtable_remove_fast(&priv->stats_ctx_table,
+ &ctx_entry->ht_node,
+ stats_ctx_table_params));
+ kfree(ctx_entry);
+
return nfp_release_stats_entry(app, temp_ctx_id);
}
+struct nfp_fl_payload *
+nfp_flower_get_fl_payload_from_ctx(struct nfp_app *app, u32 ctx_id)
+{
+ struct nfp_fl_stats_ctx_to_flow *ctx_entry;
+ struct nfp_flower_priv *priv = app->priv;
+
+ ctx_entry = rhashtable_lookup_fast(&priv->stats_ctx_table, &ctx_id,
+ stats_ctx_table_params);
+ if (!ctx_entry)
+ return NULL;
+
+ return ctx_entry->flow;
+}
+
static int nfp_fl_obj_cmpfn(struct rhashtable_compare_arg *arg,
const void *obj)
{
if (err)
return err;
+ err = rhashtable_init(&priv->stats_ctx_table, &stats_ctx_table_params);
+ if (err)
+ goto err_free_flow_table;
+
get_random_bytes(&priv->mask_id_seed, sizeof(priv->mask_id_seed));
/* Init ring buffer and unallocated mask_ids. */
kmalloc_array(NFP_FLOWER_MASK_ENTRY_RS,
NFP_FLOWER_MASK_ELEMENT_RS, GFP_KERNEL);
if (!priv->mask_ids.mask_id_free_list.buf)
- goto err_free_flow_table;
+ goto err_free_stats_ctx_table;
priv->mask_ids.init_unallocated = NFP_FLOWER_MASK_ENTRY_RS - 1;
kfree(priv->mask_ids.last_used);
err_free_mask_id:
kfree(priv->mask_ids.mask_id_free_list.buf);
+err_free_stats_ctx_table:
+ rhashtable_destroy(&priv->stats_ctx_table);
err_free_flow_table:
rhashtable_destroy(&priv->flow_table);
return -ENOMEM;
rhashtable_free_and_destroy(&priv->flow_table,
nfp_check_rhashtable_empty, NULL);
+ rhashtable_free_and_destroy(&priv->stats_ctx_table,
+ nfp_check_rhashtable_empty, NULL);
kvfree(priv->stats);
kfree(priv->mask_ids.mask_id_free_list.buf);
kfree(priv->mask_ids.last_used);
BIT(FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) | \
BIT(FLOW_DISSECTOR_KEY_ENC_PORTS))
+#define NFP_FLOWER_MERGE_FIELDS \
+ (NFP_FLOWER_LAYER_PORT | \
+ NFP_FLOWER_LAYER_MAC | \
+ NFP_FLOWER_LAYER_TP | \
+ NFP_FLOWER_LAYER_IPV4 | \
+ NFP_FLOWER_LAYER_IPV6)
+
+struct nfp_flower_merge_check {
+ union {
+ struct {
+ __be16 tci;
+ struct nfp_flower_mac_mpls l2;
+ struct nfp_flower_tp_ports l4;
+ union {
+ struct nfp_flower_ipv4 ipv4;
+ struct nfp_flower_ipv6 ipv6;
+ };
+ };
+ unsigned long vals[8];
+ };
+};
+
static int
nfp_flower_xmit_flow(struct nfp_app *app, struct nfp_fl_payload *nfp_flow,
u8 mtype)
flow_rule_match_enc_opts(rule, &enc_op);
switch (enc_ports.key->dst) {
- case htons(NFP_FL_VXLAN_PORT):
+ case htons(IANA_VXLAN_UDP_PORT):
*tun_type = NFP_FL_TUNNEL_VXLAN;
key_layer |= NFP_FLOWER_LAYER_VXLAN;
key_size += sizeof(struct nfp_flower_ipv4_udp_tun);
if (enc_op.key)
return -EOPNOTSUPP;
break;
- case htons(NFP_FL_GENEVE_PORT):
+ case htons(GENEVE_UDP_PORT):
if (!(priv->flower_ext_feats & NFP_FL_FEATS_GENEVE))
return -EOPNOTSUPP;
*tun_type = NFP_FL_TUNNEL_GENEVE;
break;
case cpu_to_be16(ETH_P_IPV6):
- key_layer |= NFP_FLOWER_LAYER_IPV6;
+ key_layer |= NFP_FLOWER_LAYER_IPV6;
key_size += sizeof(struct nfp_flower_ipv6);
break;
flow_pay->nfp_tun_ipv4_addr = 0;
flow_pay->meta.flags = 0;
+ INIT_LIST_HEAD(&flow_pay->linked_flows);
+ flow_pay->in_hw = false;
return flow_pay;
return NULL;
}
+static int
+nfp_flower_update_merge_with_actions(struct nfp_fl_payload *flow,
+ struct nfp_flower_merge_check *merge,
+ u8 *last_act_id, int *act_out)
+{
+ struct nfp_fl_set_ipv6_tc_hl_fl *ipv6_tc_hl_fl;
+ struct nfp_fl_set_ip4_ttl_tos *ipv4_ttl_tos;
+ struct nfp_fl_set_ip4_addrs *ipv4_add;
+ struct nfp_fl_set_ipv6_addr *ipv6_add;
+ struct nfp_fl_push_vlan *push_vlan;
+ struct nfp_fl_set_tport *tport;
+ struct nfp_fl_set_eth *eth;
+ struct nfp_fl_act_head *a;
+ unsigned int act_off = 0;
+ u8 act_id = 0;
+ u8 *ports;
+ int i;
+
+ while (act_off < flow->meta.act_len) {
+ a = (struct nfp_fl_act_head *)&flow->action_data[act_off];
+ act_id = a->jump_id;
+
+ switch (act_id) {
+ case NFP_FL_ACTION_OPCODE_OUTPUT:
+ if (act_out)
+ (*act_out)++;
+ break;
+ case NFP_FL_ACTION_OPCODE_PUSH_VLAN:
+ push_vlan = (struct nfp_fl_push_vlan *)a;
+ if (push_vlan->vlan_tci)
+ merge->tci = cpu_to_be16(0xffff);
+ break;
+ case NFP_FL_ACTION_OPCODE_POP_VLAN:
+ merge->tci = cpu_to_be16(0);
+ break;
+ case NFP_FL_ACTION_OPCODE_SET_IPV4_TUNNEL:
+ /* New tunnel header means l2 to l4 can be matched. */
+ eth_broadcast_addr(&merge->l2.mac_dst[0]);
+ eth_broadcast_addr(&merge->l2.mac_src[0]);
+ memset(&merge->l4, 0xff,
+ sizeof(struct nfp_flower_tp_ports));
+ memset(&merge->ipv4, 0xff,
+ sizeof(struct nfp_flower_ipv4));
+ break;
+ case NFP_FL_ACTION_OPCODE_SET_ETHERNET:
+ eth = (struct nfp_fl_set_eth *)a;
+ for (i = 0; i < ETH_ALEN; i++)
+ merge->l2.mac_dst[i] |= eth->eth_addr_mask[i];
+ for (i = 0; i < ETH_ALEN; i++)
+ merge->l2.mac_src[i] |=
+ eth->eth_addr_mask[ETH_ALEN + i];
+ break;
+ case NFP_FL_ACTION_OPCODE_SET_IPV4_ADDRS:
+ ipv4_add = (struct nfp_fl_set_ip4_addrs *)a;
+ merge->ipv4.ipv4_src |= ipv4_add->ipv4_src_mask;
+ merge->ipv4.ipv4_dst |= ipv4_add->ipv4_dst_mask;
+ break;
+ case NFP_FL_ACTION_OPCODE_SET_IPV4_TTL_TOS:
+ ipv4_ttl_tos = (struct nfp_fl_set_ip4_ttl_tos *)a;
+ merge->ipv4.ip_ext.ttl |= ipv4_ttl_tos->ipv4_ttl_mask;
+ merge->ipv4.ip_ext.tos |= ipv4_ttl_tos->ipv4_tos_mask;
+ break;
+ case NFP_FL_ACTION_OPCODE_SET_IPV6_SRC:
+ ipv6_add = (struct nfp_fl_set_ipv6_addr *)a;
+ for (i = 0; i < 4; i++)
+ merge->ipv6.ipv6_src.in6_u.u6_addr32[i] |=
+ ipv6_add->ipv6[i].mask;
+ break;
+ case NFP_FL_ACTION_OPCODE_SET_IPV6_DST:
+ ipv6_add = (struct nfp_fl_set_ipv6_addr *)a;
+ for (i = 0; i < 4; i++)
+ merge->ipv6.ipv6_dst.in6_u.u6_addr32[i] |=
+ ipv6_add->ipv6[i].mask;
+ break;
+ case NFP_FL_ACTION_OPCODE_SET_IPV6_TC_HL_FL:
+ ipv6_tc_hl_fl = (struct nfp_fl_set_ipv6_tc_hl_fl *)a;
+ merge->ipv6.ip_ext.ttl |=
+ ipv6_tc_hl_fl->ipv6_hop_limit_mask;
+ merge->ipv6.ip_ext.tos |= ipv6_tc_hl_fl->ipv6_tc_mask;
+ merge->ipv6.ipv6_flow_label_exthdr |=
+ ipv6_tc_hl_fl->ipv6_label_mask;
+ break;
+ case NFP_FL_ACTION_OPCODE_SET_UDP:
+ case NFP_FL_ACTION_OPCODE_SET_TCP:
+ tport = (struct nfp_fl_set_tport *)a;
+ ports = (u8 *)&merge->l4.port_src;
+ for (i = 0; i < 4; i++)
+ ports[i] |= tport->tp_port_mask[i];
+ break;
+ case NFP_FL_ACTION_OPCODE_PRE_TUNNEL:
+ case NFP_FL_ACTION_OPCODE_PRE_LAG:
+ case NFP_FL_ACTION_OPCODE_PUSH_GENEVE:
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ act_off += a->len_lw << NFP_FL_LW_SIZ;
+ }
+
+ if (last_act_id)
+ *last_act_id = act_id;
+
+ return 0;
+}
+
+static int
+nfp_flower_populate_merge_match(struct nfp_fl_payload *flow,
+ struct nfp_flower_merge_check *merge,
+ bool extra_fields)
+{
+ struct nfp_flower_meta_tci *meta_tci;
+ u8 *mask = flow->mask_data;
+ u8 key_layer, match_size;
+
+ memset(merge, 0, sizeof(struct nfp_flower_merge_check));
+
+ meta_tci = (struct nfp_flower_meta_tci *)mask;
+ key_layer = meta_tci->nfp_flow_key_layer;
+
+ if (key_layer & ~NFP_FLOWER_MERGE_FIELDS && !extra_fields)
+ return -EOPNOTSUPP;
+
+ merge->tci = meta_tci->tci;
+ mask += sizeof(struct nfp_flower_meta_tci);
+
+ if (key_layer & NFP_FLOWER_LAYER_EXT_META)
+ mask += sizeof(struct nfp_flower_ext_meta);
+
+ mask += sizeof(struct nfp_flower_in_port);
+
+ if (key_layer & NFP_FLOWER_LAYER_MAC) {
+ match_size = sizeof(struct nfp_flower_mac_mpls);
+ memcpy(&merge->l2, mask, match_size);
+ mask += match_size;
+ }
+
+ if (key_layer & NFP_FLOWER_LAYER_TP) {
+ match_size = sizeof(struct nfp_flower_tp_ports);
+ memcpy(&merge->l4, mask, match_size);
+ mask += match_size;
+ }
+
+ if (key_layer & NFP_FLOWER_LAYER_IPV4) {
+ match_size = sizeof(struct nfp_flower_ipv4);
+ memcpy(&merge->ipv4, mask, match_size);
+ }
+
+ if (key_layer & NFP_FLOWER_LAYER_IPV6) {
+ match_size = sizeof(struct nfp_flower_ipv6);
+ memcpy(&merge->ipv6, mask, match_size);
+ }
+
+ return 0;
+}
+
+static int
+nfp_flower_can_merge(struct nfp_fl_payload *sub_flow1,
+ struct nfp_fl_payload *sub_flow2)
+{
+ /* Two flows can be merged if sub_flow2 only matches on bits that are
+ * either matched by sub_flow1 or set by a sub_flow1 action. This
+ * ensures that every packet that hits sub_flow1 and recirculates is
+ * guaranteed to hit sub_flow2.
+ */
+ struct nfp_flower_merge_check sub_flow1_merge, sub_flow2_merge;
+ int err, act_out = 0;
+ u8 last_act_id = 0;
+
+ err = nfp_flower_populate_merge_match(sub_flow1, &sub_flow1_merge,
+ true);
+ if (err)
+ return err;
+
+ err = nfp_flower_populate_merge_match(sub_flow2, &sub_flow2_merge,
+ false);
+ if (err)
+ return err;
+
+ err = nfp_flower_update_merge_with_actions(sub_flow1, &sub_flow1_merge,
+ &last_act_id, &act_out);
+ if (err)
+ return err;
+
+ /* Must only be 1 output action and it must be the last in sequence. */
+ if (act_out != 1 || last_act_id != NFP_FL_ACTION_OPCODE_OUTPUT)
+ return -EOPNOTSUPP;
+
+ /* Reject merge if sub_flow2 matches on something that is not matched
+ * on or set in an action by sub_flow1.
+ */
+ err = bitmap_andnot(sub_flow2_merge.vals, sub_flow2_merge.vals,
+ sub_flow1_merge.vals,
+ sizeof(struct nfp_flower_merge_check) * 8);
+ if (err)
+ return -EINVAL;
+
+ return 0;
+}
+
+static unsigned int
+nfp_flower_copy_pre_actions(char *act_dst, char *act_src, int len,
+ bool *tunnel_act)
+{
+ unsigned int act_off = 0, act_len;
+ struct nfp_fl_act_head *a;
+ u8 act_id = 0;
+
+ while (act_off < len) {
+ a = (struct nfp_fl_act_head *)&act_src[act_off];
+ act_len = a->len_lw << NFP_FL_LW_SIZ;
+ act_id = a->jump_id;
+
+ switch (act_id) {
+ case NFP_FL_ACTION_OPCODE_PRE_TUNNEL:
+ if (tunnel_act)
+ *tunnel_act = true;
+ /* fall through */
+ case NFP_FL_ACTION_OPCODE_PRE_LAG:
+ memcpy(act_dst + act_off, act_src + act_off, act_len);
+ break;
+ default:
+ return act_off;
+ }
+
+ act_off += act_len;
+ }
+
+ return act_off;
+}
+
+static int nfp_fl_verify_post_tun_acts(char *acts, int len)
+{
+ struct nfp_fl_act_head *a;
+ unsigned int act_off = 0;
+
+ while (act_off < len) {
+ a = (struct nfp_fl_act_head *)&acts[act_off];
+ if (a->jump_id != NFP_FL_ACTION_OPCODE_OUTPUT)
+ return -EOPNOTSUPP;
+
+ act_off += a->len_lw << NFP_FL_LW_SIZ;
+ }
+
+ return 0;
+}
+
+static int
+nfp_flower_merge_action(struct nfp_fl_payload *sub_flow1,
+ struct nfp_fl_payload *sub_flow2,
+ struct nfp_fl_payload *merge_flow)
+{
+ unsigned int sub1_act_len, sub2_act_len, pre_off1, pre_off2;
+ bool tunnel_act = false;
+ char *merge_act;
+ int err;
+
+ /* The last action of sub_flow1 must be output - do not merge this. */
+ sub1_act_len = sub_flow1->meta.act_len - sizeof(struct nfp_fl_output);
+ sub2_act_len = sub_flow2->meta.act_len;
+
+ if (!sub2_act_len)
+ return -EINVAL;
+
+ if (sub1_act_len + sub2_act_len > NFP_FL_MAX_A_SIZ)
+ return -EINVAL;
+
+ /* A shortcut can only be applied if there is a single action. */
+ if (sub1_act_len)
+ merge_flow->meta.shortcut = cpu_to_be32(NFP_FL_SC_ACT_NULL);
+ else
+ merge_flow->meta.shortcut = sub_flow2->meta.shortcut;
+
+ merge_flow->meta.act_len = sub1_act_len + sub2_act_len;
+ merge_act = merge_flow->action_data;
+
+ /* Copy any pre-actions to the start of merge flow action list. */
+ pre_off1 = nfp_flower_copy_pre_actions(merge_act,
+ sub_flow1->action_data,
+ sub1_act_len, &tunnel_act);
+ merge_act += pre_off1;
+ sub1_act_len -= pre_off1;
+ pre_off2 = nfp_flower_copy_pre_actions(merge_act,
+ sub_flow2->action_data,
+ sub2_act_len, NULL);
+ merge_act += pre_off2;
+ sub2_act_len -= pre_off2;
+
+ /* FW does a tunnel push when egressing, therefore, if sub_flow 1 pushes
+ * a tunnel, sub_flow 2 can only have output actions for a valid merge.
+ */
+ if (tunnel_act) {
+ char *post_tun_acts = &sub_flow2->action_data[pre_off2];
+
+ err = nfp_fl_verify_post_tun_acts(post_tun_acts, sub2_act_len);
+ if (err)
+ return err;
+ }
+
+ /* Copy remaining actions from sub_flows 1 and 2. */
+ memcpy(merge_act, sub_flow1->action_data + pre_off1, sub1_act_len);
+ merge_act += sub1_act_len;
+ memcpy(merge_act, sub_flow2->action_data + pre_off2, sub2_act_len);
+
+ return 0;
+}
+
+/* Flow link code should only be accessed under RTNL. */
+static void nfp_flower_unlink_flow(struct nfp_fl_payload_link *link)
+{
+ list_del(&link->merge_flow.list);
+ list_del(&link->sub_flow.list);
+ kfree(link);
+}
+
+static void nfp_flower_unlink_flows(struct nfp_fl_payload *merge_flow,
+ struct nfp_fl_payload *sub_flow)
+{
+ struct nfp_fl_payload_link *link;
+
+ list_for_each_entry(link, &merge_flow->linked_flows, merge_flow.list)
+ if (link->sub_flow.flow == sub_flow) {
+ nfp_flower_unlink_flow(link);
+ return;
+ }
+}
+
+static int nfp_flower_link_flows(struct nfp_fl_payload *merge_flow,
+ struct nfp_fl_payload *sub_flow)
+{
+ struct nfp_fl_payload_link *link;
+
+ link = kmalloc(sizeof(*link), GFP_KERNEL);
+ if (!link)
+ return -ENOMEM;
+
+ link->merge_flow.flow = merge_flow;
+ list_add_tail(&link->merge_flow.list, &merge_flow->linked_flows);
+ link->sub_flow.flow = sub_flow;
+ list_add_tail(&link->sub_flow.list, &sub_flow->linked_flows);
+
+ return 0;
+}
+
+/**
+ * nfp_flower_merge_offloaded_flows() - Merge 2 existing flows to single flow.
+ * @app: Pointer to the APP handle
+ * @sub_flow1: Initial flow matched to produce merge hint
+ * @sub_flow2: Post recirculation flow matched in merge hint
+ *
+ * Combines 2 flows (if valid) to a single flow, removing the initial from hw
+ * and offloading the new, merged flow.
+ *
+ * Return: negative value on error, 0 in success.
+ */
+int nfp_flower_merge_offloaded_flows(struct nfp_app *app,
+ struct nfp_fl_payload *sub_flow1,
+ struct nfp_fl_payload *sub_flow2)
+{
+ struct tc_cls_flower_offload merge_tc_off;
+ struct nfp_flower_priv *priv = app->priv;
+ struct nfp_fl_payload *merge_flow;
+ struct nfp_fl_key_ls merge_key_ls;
+ int err;
+
+ ASSERT_RTNL();
+
+ if (sub_flow1 == sub_flow2 ||
+ nfp_flower_is_merge_flow(sub_flow1) ||
+ nfp_flower_is_merge_flow(sub_flow2))
+ return -EINVAL;
+
+ err = nfp_flower_can_merge(sub_flow1, sub_flow2);
+ if (err)
+ return err;
+
+ merge_key_ls.key_size = sub_flow1->meta.key_len;
+
+ merge_flow = nfp_flower_allocate_new(&merge_key_ls);
+ if (!merge_flow)
+ return -ENOMEM;
+
+ merge_flow->tc_flower_cookie = (unsigned long)merge_flow;
+ merge_flow->ingress_dev = sub_flow1->ingress_dev;
+
+ memcpy(merge_flow->unmasked_data, sub_flow1->unmasked_data,
+ sub_flow1->meta.key_len);
+ memcpy(merge_flow->mask_data, sub_flow1->mask_data,
+ sub_flow1->meta.mask_len);
+
+ err = nfp_flower_merge_action(sub_flow1, sub_flow2, merge_flow);
+ if (err)
+ goto err_destroy_merge_flow;
+
+ err = nfp_flower_link_flows(merge_flow, sub_flow1);
+ if (err)
+ goto err_destroy_merge_flow;
+
+ err = nfp_flower_link_flows(merge_flow, sub_flow2);
+ if (err)
+ goto err_unlink_sub_flow1;
+
+ merge_tc_off.cookie = merge_flow->tc_flower_cookie;
+ err = nfp_compile_flow_metadata(app, &merge_tc_off, merge_flow,
+ merge_flow->ingress_dev);
+ if (err)
+ goto err_unlink_sub_flow2;
+
+ err = rhashtable_insert_fast(&priv->flow_table, &merge_flow->fl_node,
+ nfp_flower_table_params);
+ if (err)
+ goto err_release_metadata;
+
+ err = nfp_flower_xmit_flow(app, merge_flow,
+ NFP_FLOWER_CMSG_TYPE_FLOW_MOD);
+ if (err)
+ goto err_remove_rhash;
+
+ merge_flow->in_hw = true;
+ sub_flow1->in_hw = false;
+
+ return 0;
+
+err_remove_rhash:
+ WARN_ON_ONCE(rhashtable_remove_fast(&priv->flow_table,
+ &merge_flow->fl_node,
+ nfp_flower_table_params));
+err_release_metadata:
+ nfp_modify_flow_metadata(app, merge_flow);
+err_unlink_sub_flow2:
+ nfp_flower_unlink_flows(merge_flow, sub_flow2);
+err_unlink_sub_flow1:
+ nfp_flower_unlink_flows(merge_flow, sub_flow1);
+err_destroy_merge_flow:
+ kfree(merge_flow->action_data);
+ kfree(merge_flow->mask_data);
+ kfree(merge_flow->unmasked_data);
+ kfree(merge_flow);
+ return err;
+}
+
/**
* nfp_flower_add_offload() - Adds a new flow to hardware.
* @app: Pointer to the APP handle
if (port)
port->tc_offload_cnt++;
+ flow_pay->in_hw = true;
+
/* Deallocate flow payload when flower rule has been destroyed. */
kfree(key_layer);
return err;
}
+static void
+nfp_flower_remove_merge_flow(struct nfp_app *app,
+ struct nfp_fl_payload *del_sub_flow,
+ struct nfp_fl_payload *merge_flow)
+{
+ struct nfp_flower_priv *priv = app->priv;
+ struct nfp_fl_payload_link *link, *temp;
+ struct nfp_fl_payload *origin;
+ bool mod = false;
+ int err;
+
+ link = list_first_entry(&merge_flow->linked_flows,
+ struct nfp_fl_payload_link, merge_flow.list);
+ origin = link->sub_flow.flow;
+
+ /* Re-add rule the merge had overwritten if it has not been deleted. */
+ if (origin != del_sub_flow)
+ mod = true;
+
+ err = nfp_modify_flow_metadata(app, merge_flow);
+ if (err) {
+ nfp_flower_cmsg_warn(app, "Metadata fail for merge flow delete.\n");
+ goto err_free_links;
+ }
+
+ if (!mod) {
+ err = nfp_flower_xmit_flow(app, merge_flow,
+ NFP_FLOWER_CMSG_TYPE_FLOW_DEL);
+ if (err) {
+ nfp_flower_cmsg_warn(app, "Failed to delete merged flow.\n");
+ goto err_free_links;
+ }
+ } else {
+ __nfp_modify_flow_metadata(priv, origin);
+ err = nfp_flower_xmit_flow(app, origin,
+ NFP_FLOWER_CMSG_TYPE_FLOW_MOD);
+ if (err)
+ nfp_flower_cmsg_warn(app, "Failed to revert merge flow.\n");
+ origin->in_hw = true;
+ }
+
+err_free_links:
+ /* Clean any links connected with the merged flow. */
+ list_for_each_entry_safe(link, temp, &merge_flow->linked_flows,
+ merge_flow.list)
+ nfp_flower_unlink_flow(link);
+
+ kfree(merge_flow->action_data);
+ kfree(merge_flow->mask_data);
+ kfree(merge_flow->unmasked_data);
+ WARN_ON_ONCE(rhashtable_remove_fast(&priv->flow_table,
+ &merge_flow->fl_node,
+ nfp_flower_table_params));
+ kfree_rcu(merge_flow, rcu);
+}
+
+static void
+nfp_flower_del_linked_merge_flows(struct nfp_app *app,
+ struct nfp_fl_payload *sub_flow)
+{
+ struct nfp_fl_payload_link *link, *temp;
+
+ /* Remove any merge flow formed from the deleted sub_flow. */
+ list_for_each_entry_safe(link, temp, &sub_flow->linked_flows,
+ sub_flow.list)
+ nfp_flower_remove_merge_flow(app, sub_flow,
+ link->merge_flow.flow);
+}
+
/**
* nfp_flower_del_offload() - Removes a flow from hardware.
* @app: Pointer to the APP handle
* @flow: TC flower classifier offload structure
*
* Removes a flow from the repeated hash structure and clears the
- * action payload.
+ * action payload. Any flows merged from this are also deleted.
*
* Return: negative value on error, 0 if removed successfully.
*/
err = nfp_modify_flow_metadata(app, nfp_flow);
if (err)
- goto err_free_flow;
+ goto err_free_merge_flow;
if (nfp_flow->nfp_tun_ipv4_addr)
nfp_tunnel_del_ipv4_off(app, nfp_flow->nfp_tun_ipv4_addr);
+ if (!nfp_flow->in_hw) {
+ err = 0;
+ goto err_free_merge_flow;
+ }
+
err = nfp_flower_xmit_flow(app, nfp_flow,
NFP_FLOWER_CMSG_TYPE_FLOW_DEL);
- if (err)
- goto err_free_flow;
+ /* Fall through on error. */
-err_free_flow:
+err_free_merge_flow:
+ nfp_flower_del_linked_merge_flows(app, nfp_flow);
if (port)
port->tc_offload_cnt--;
kfree(nfp_flow->action_data);
return err;
}
+static void
+__nfp_flower_update_merge_stats(struct nfp_app *app,
+ struct nfp_fl_payload *merge_flow)
+{
+ struct nfp_flower_priv *priv = app->priv;
+ struct nfp_fl_payload_link *link;
+ struct nfp_fl_payload *sub_flow;
+ u64 pkts, bytes, used;
+ u32 ctx_id;
+
+ ctx_id = be32_to_cpu(merge_flow->meta.host_ctx_id);
+ pkts = priv->stats[ctx_id].pkts;
+ /* Do not cycle subflows if no stats to distribute. */
+ if (!pkts)
+ return;
+ bytes = priv->stats[ctx_id].bytes;
+ used = priv->stats[ctx_id].used;
+
+ /* Reset stats for the merge flow. */
+ priv->stats[ctx_id].pkts = 0;
+ priv->stats[ctx_id].bytes = 0;
+
+ /* The merge flow has received stats updates from firmware.
+ * Distribute these stats to all subflows that form the merge.
+ * The stats will collected from TC via the subflows.
+ */
+ list_for_each_entry(link, &merge_flow->linked_flows, merge_flow.list) {
+ sub_flow = link->sub_flow.flow;
+ ctx_id = be32_to_cpu(sub_flow->meta.host_ctx_id);
+ priv->stats[ctx_id].pkts += pkts;
+ priv->stats[ctx_id].bytes += bytes;
+ max_t(u64, priv->stats[ctx_id].used, used);
+ }
+}
+
+static void
+nfp_flower_update_merge_stats(struct nfp_app *app,
+ struct nfp_fl_payload *sub_flow)
+{
+ struct nfp_fl_payload_link *link;
+
+ /* Get merge flows that the subflow forms to distribute their stats. */
+ list_for_each_entry(link, &sub_flow->linked_flows, sub_flow.list)
+ __nfp_flower_update_merge_stats(app, link->merge_flow.flow);
+}
+
/**
* nfp_flower_get_stats() - Populates flow stats obtained from hardware.
* @app: Pointer to the APP handle
ctx_id = be32_to_cpu(nfp_flow->meta.host_ctx_id);
spin_lock_bh(&priv->stats_lock);
+ /* If request is for a sub_flow, update stats from merged flows. */
+ if (!list_empty(&nfp_flow->linked_flows))
+ nfp_flower_update_merge_stats(app, nfp_flow);
+
flow_stats_update(&flow->stats, priv->stats[ctx_id].bytes,
priv->stats[ctx_id].pkts, priv->stats[ctx_id].used);
struct nfp_flower_priv *priv = app->priv;
int err;
- if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+ if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS &&
+ !(f->binder_type == TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS &&
+ nfp_flower_internal_port_can_offload(app, netdev)))
return -EOPNOTSUPP;
switch (f->command) {
for (i = 0; i < count; i++) {
ipv4_addr = payload->tun_info[i].ipv4;
port = be32_to_cpu(payload->tun_info[i].egress_port);
- netdev = nfp_app_repr_get(app, port);
+ netdev = nfp_app_dev_get(app, port, NULL);
if (!netdev)
continue;
struct flowi4 *flow, struct neighbour *neigh, gfp_t flag)
{
struct nfp_tun_neigh payload;
+ u32 port_id;
- /* Only offload representor IPv4s for now. */
- if (!nfp_netdev_is_nfp_repr(netdev))
+ port_id = nfp_flower_get_port_id_from_netdev(app, netdev);
+ if (!port_id)
return;
memset(&payload, 0, sizeof(struct nfp_tun_neigh));
payload.src_ipv4 = flow->saddr;
ether_addr_copy(payload.src_addr, netdev->dev_addr);
neigh_ha_snapshot(payload.dst_addr, neigh, netdev);
- payload.port_id = cpu_to_be32(nfp_repr_get_port_id(netdev));
+ payload.port_id = cpu_to_be32(port_id);
/* Add destination of new route to NFP cache. */
nfp_tun_add_route_to_cache(app, payload.dst_ipv4);
payload = nfp_flower_cmsg_get_data(skb);
- netdev = nfp_app_repr_get(app, be32_to_cpu(payload->ingress_port));
+ netdev = nfp_app_dev_get(app, be32_to_cpu(payload->ingress_port), NULL);
if (!netdev)
goto route_fail_warning;
* @eswitch_mode_set: set SR-IOV eswitch mode (under pf->lock)
* @sriov_enable: app-specific sriov initialisation
* @sriov_disable: app-specific sriov clean-up
- * @repr_get: get representor netdev
+ * @dev_get: get representor or internal port representing netdev
*/
struct nfp_app_type {
enum nfp_app_id id;
enum devlink_eswitch_mode (*eswitch_mode_get)(struct nfp_app *app);
int (*eswitch_mode_set)(struct nfp_app *app, u16 mode);
- struct net_device *(*repr_get)(struct nfp_app *app, u32 id);
+ struct net_device *(*dev_get)(struct nfp_app *app, u32 id,
+ bool *redir_egress);
};
/**
app->type->sriov_disable(app);
}
-static inline struct net_device *nfp_app_repr_get(struct nfp_app *app, u32 id)
+static inline
+struct net_device *nfp_app_dev_get(struct nfp_app *app, u32 id,
+ bool *redir_egress)
{
- if (unlikely(!app || !app->type->repr_get))
+ if (unlikely(!app || !app->type->dev_get))
return NULL;
- return app->type->repr_get(app, id);
+ return app->type->dev_get(app, id, redir_egress);
}
struct nfp_app *nfp_app_from_netdev(struct net_device *netdev);
int nfp_app_nic_vnic_init_phy_port(struct nfp_pf *pf, struct nfp_app *app,
struct nfp_net *nn, unsigned int id);
-struct devlink *nfp_devlink_get_devlink(struct net_device *netdev);
+struct devlink_port *nfp_devlink_get_devlink_port(struct net_device *netdev);
#endif
{
struct nfp_eth_table_port eth_port;
struct devlink *devlink;
+ const u8 *serial;
+ int serial_len;
int ret;
rtnl_lock();
if (ret)
return ret;
- devlink_port_type_eth_set(&port->dl_port, port->netdev);
+ serial_len = nfp_cpp_serial(port->app->cpp, &serial);
devlink_port_attrs_set(&port->dl_port, DEVLINK_PORT_FLAVOUR_PHYSICAL,
eth_port.label_port, eth_port.is_split,
- eth_port.label_subport);
+ eth_port.label_subport, serial, serial_len);
devlink = priv_to_devlink(app->pf);
devlink_port_unregister(&port->dl_port);
}
-struct devlink *nfp_devlink_get_devlink(struct net_device *netdev)
+void nfp_devlink_port_type_eth_set(struct nfp_port *port)
+{
+ devlink_port_type_eth_set(&port->dl_port, port->netdev);
+}
+
+void nfp_devlink_port_type_clear(struct nfp_port *port)
{
- struct nfp_app *app;
+ devlink_port_type_clear(&port->dl_port);
+}
+
+struct devlink_port *nfp_devlink_get_devlink_port(struct net_device *netdev)
+{
+ struct nfp_port *port;
- app = nfp_app_from_netdev(netdev);
- if (!app)
+ port = nfp_port_from_netdev(netdev);
+ if (!port)
return NULL;
- return priv_to_devlink(app->pf);
+ return &port->dl_port;
}
* @shared_handler: Handler for shared interrupts
* @shared_name: Name for shared interrupt
* @me_freq_mhz: ME clock_freq (MHz)
- * @reconfig_lock: Protects HW reconfiguration request regs/machinery
+ * @reconfig_lock: Protects @reconfig_posted, @reconfig_timer_active,
+ * @reconfig_sync_present and HW reconfiguration request
+ * regs/machinery from async requests (sync must take
+ * @bar_lock)
* @reconfig_posted: Pending reconfig bits coming from async sources
* @reconfig_timer_active: Timer for reading reconfiguration results is pending
* @reconfig_sync_present: Some thread is performing synchronous reconfig
* @reconfig_timer: Timer for async reading of reconfig results
* @reconfig_in_progress_update: Update FW is processing now (debug only)
+ * @bar_lock: vNIC config BAR access lock, protects: update,
+ * mailbox area
* @link_up: Is the link up?
* @link_status_lock: Protects @link_* and ensures atomicity with BAR reading
* @rx_coalesce_usecs: RX interrupt moderation usecs delay parameter
struct timer_list reconfig_timer;
u32 reconfig_in_progress_update;
+ struct mutex bar_lock;
+
u32 rx_coalesce_usecs;
u32 rx_coalesce_max_frames;
u32 tx_coalesce_usecs;
spin_unlock_bh(&nn->r_vecs[0].lock);
}
+static inline void nn_ctrl_bar_lock(struct nfp_net *nn)
+{
+ mutex_lock(&nn->bar_lock);
+}
+
+static inline void nn_ctrl_bar_unlock(struct nfp_net *nn)
+{
+ mutex_unlock(&nn->bar_lock);
+}
+
/* Globals */
extern const char nfp_driver_version[];
void nfp_net_rss_write_itbl(struct nfp_net *nn);
void nfp_net_rss_write_key(struct nfp_net *nn);
void nfp_net_coalesce_write_cfg(struct nfp_net *nn);
-int nfp_net_reconfig_mbox(struct nfp_net *nn, u32 mbox_cmd);
+int nfp_net_mbox_lock(struct nfp_net *nn, unsigned int data_size);
+int nfp_net_mbox_reconfig(struct nfp_net *nn, u32 mbox_cmd);
+int nfp_net_mbox_reconfig_and_unlock(struct nfp_net *nn, u32 mbox_cmd);
unsigned int
nfp_net_irqs_alloc(struct pci_dev *pdev, struct msix_entry *irq_entries,
#include <linux/interrupt.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
+#include <linux/lockdep.h>
#include <linux/mm.h>
#include <linux/overflow.h>
#include <linux/page_ref.h>
return false;
}
-static int nfp_net_reconfig_wait(struct nfp_net *nn, unsigned long deadline)
+static bool __nfp_net_reconfig_wait(struct nfp_net *nn, unsigned long deadline)
{
bool timed_out = false;
+ int i;
+
+ /* Poll update field, waiting for NFP to ack the config.
+ * Do an opportunistic wait-busy loop, afterward sleep.
+ */
+ for (i = 0; i < 50; i++) {
+ if (nfp_net_reconfig_check_done(nn, false))
+ return false;
+ udelay(4);
+ }
- /* Poll update field, waiting for NFP to ack the config */
while (!nfp_net_reconfig_check_done(nn, timed_out)) {
- msleep(1);
+ usleep_range(250, 500);
timed_out = time_is_before_eq_jiffies(deadline);
}
+ return timed_out;
+}
+
+static int nfp_net_reconfig_wait(struct nfp_net *nn, unsigned long deadline)
+{
+ if (__nfp_net_reconfig_wait(nn, deadline))
+ return -EIO;
+
if (nn_readl(nn, NFP_NET_CFG_UPDATE) & NFP_NET_CFG_UPDATE_ERR)
return -EIO;
- return timed_out ? -EIO : 0;
+ return 0;
}
static void nfp_net_reconfig_timer(struct timer_list *t)
}
/**
- * nfp_net_reconfig() - Reconfigure the firmware
+ * __nfp_net_reconfig() - Reconfigure the firmware
* @nn: NFP Net device to reconfigure
* @update: The value for the update field in the BAR config
*
*
* Return: Negative errno on error, 0 on success
*/
-int nfp_net_reconfig(struct nfp_net *nn, u32 update)
+static int __nfp_net_reconfig(struct nfp_net *nn, u32 update)
{
int ret;
+ lockdep_assert_held(&nn->bar_lock);
+
nfp_net_reconfig_sync_enter(nn);
nfp_net_reconfig_start(nn, update);
return ret;
}
+int nfp_net_reconfig(struct nfp_net *nn, u32 update)
+{
+ int ret;
+
+ nn_ctrl_bar_lock(nn);
+ ret = __nfp_net_reconfig(nn, update);
+ nn_ctrl_bar_unlock(nn);
+
+ return ret;
+}
+
+int nfp_net_mbox_lock(struct nfp_net *nn, unsigned int data_size)
+{
+ if (nn->tlv_caps.mbox_len < NFP_NET_CFG_MBOX_SIMPLE_VAL + data_size) {
+ nn_err(nn, "mailbox too small for %u of data (%u)\n",
+ data_size, nn->tlv_caps.mbox_len);
+ return -EIO;
+ }
+
+ nn_ctrl_bar_lock(nn);
+ return 0;
+}
+
/**
- * nfp_net_reconfig_mbox() - Reconfigure the firmware via the mailbox
+ * nfp_net_mbox_reconfig() - Reconfigure the firmware via the mailbox
* @nn: NFP Net device to reconfigure
* @mbox_cmd: The value for the mailbox command
*
*
* Return: Negative errno on error, 0 on success
*/
-int nfp_net_reconfig_mbox(struct nfp_net *nn, u32 mbox_cmd)
+int nfp_net_mbox_reconfig(struct nfp_net *nn, u32 mbox_cmd)
{
u32 mbox = nn->tlv_caps.mbox_off;
int ret;
- if (!nfp_net_has_mbox(&nn->tlv_caps)) {
- nn_err(nn, "no mailbox present, command: %u\n", mbox_cmd);
- return -EIO;
- }
-
+ lockdep_assert_held(&nn->bar_lock);
nn_writeq(nn, mbox + NFP_NET_CFG_MBOX_SIMPLE_CMD, mbox_cmd);
- ret = nfp_net_reconfig(nn, NFP_NET_CFG_UPDATE_MBOX);
+ ret = __nfp_net_reconfig(nn, NFP_NET_CFG_UPDATE_MBOX);
if (ret) {
nn_err(nn, "Mailbox update error\n");
return ret;
return -nn_readl(nn, mbox + NFP_NET_CFG_MBOX_SIMPLE_RET);
}
+int nfp_net_mbox_reconfig_and_unlock(struct nfp_net *nn, u32 mbox_cmd)
+{
+ int ret;
+
+ ret = nfp_net_mbox_reconfig(nn, mbox_cmd);
+ nn_ctrl_bar_unlock(nn);
+ return ret;
+}
+
/* Interrupt configuration and handling
*/
nfp_net_tx_ring_stop(nd_q, tx_ring);
tx_ring->wr_ptr_add += nr_frags + 1;
- if (__netdev_tx_sent_queue(nd_q, txbuf->real_len, skb->xmit_more))
+ if (__netdev_tx_sent_queue(nd_q, txbuf->real_len, netdev_xmit_more()))
nfp_net_tx_xmit_more_flush(tx_ring);
return NETDEV_TX_OK;
struct nfp_net_rx_buf *rxbuf;
struct nfp_net_rx_desc *rxd;
struct nfp_meta_parsed meta;
+ bool redir_egress = false;
struct net_device *netdev;
dma_addr_t new_dma_addr;
u32 meta_len_xdp = 0;
struct nfp_net *nn;
nn = netdev_priv(dp->netdev);
- netdev = nfp_app_repr_get(nn->app, meta.portid);
+ netdev = nfp_app_dev_get(nn->app, meta.portid,
+ &redir_egress);
if (unlikely(!netdev)) {
nfp_net_rx_drop(dp, r_vec, rx_ring, rxbuf,
NULL);
continue;
}
- nfp_repr_inc_rx_stats(netdev, pkt_len);
+
+ if (nfp_netdev_is_nfp_repr(netdev))
+ nfp_repr_inc_rx_stats(netdev, pkt_len);
}
skb = build_skb(rxbuf->frag, true_bufsz);
if (meta_len_xdp)
skb_metadata_set(skb, meta_len_xdp);
- napi_gro_receive(&rx_ring->r_vec->napi, skb);
+ if (likely(!redir_egress)) {
+ napi_gro_receive(&rx_ring->r_vec->napi, skb);
+ } else {
+ skb->dev = netdev;
+ __skb_push(skb, ETH_HLEN);
+ dev_queue_xmit(skb);
+ }
}
if (xdp_prog) {
static int
nfp_net_vlan_rx_add_vid(struct net_device *netdev, __be16 proto, u16 vid)
{
+ const u32 cmd = NFP_NET_CFG_MBOX_CMD_CTAG_FILTER_ADD;
struct nfp_net *nn = netdev_priv(netdev);
+ int err;
/* Priority tagged packets with vlan id 0 are processed by the
* NFP as untagged packets
if (!vid)
return 0;
+ err = nfp_net_mbox_lock(nn, NFP_NET_CFG_VLAN_FILTER_SZ);
+ if (err)
+ return err;
+
nn_writew(nn, nn->tlv_caps.mbox_off + NFP_NET_CFG_VLAN_FILTER_VID, vid);
nn_writew(nn, nn->tlv_caps.mbox_off + NFP_NET_CFG_VLAN_FILTER_PROTO,
ETH_P_8021Q);
- return nfp_net_reconfig_mbox(nn, NFP_NET_CFG_MBOX_CMD_CTAG_FILTER_ADD);
+ return nfp_net_mbox_reconfig_and_unlock(nn, cmd);
}
static int
nfp_net_vlan_rx_kill_vid(struct net_device *netdev, __be16 proto, u16 vid)
{
+ const u32 cmd = NFP_NET_CFG_MBOX_CMD_CTAG_FILTER_KILL;
struct nfp_net *nn = netdev_priv(netdev);
+ int err;
/* Priority tagged packets with vlan id 0 are processed by the
* NFP as untagged packets
if (!vid)
return 0;
+ err = nfp_net_mbox_lock(nn, NFP_NET_CFG_VLAN_FILTER_SZ);
+ if (err)
+ return err;
+
nn_writew(nn, nn->tlv_caps.mbox_off + NFP_NET_CFG_VLAN_FILTER_VID, vid);
nn_writew(nn, nn->tlv_caps.mbox_off + NFP_NET_CFG_VLAN_FILTER_PROTO,
ETH_P_8021Q);
- return nfp_net_reconfig_mbox(nn, NFP_NET_CFG_MBOX_CMD_CTAG_FILTER_KILL);
+ return nfp_net_mbox_reconfig_and_unlock(nn, cmd);
}
static void nfp_net_stat64(struct net_device *netdev,
struct nfp_net *nn = netdev_priv(netdev);
int n;
+ /* If port is defined, devlink_port is registered and devlink core
+ * is taking care of name formatting.
+ */
if (nn->port)
- return nfp_port_get_phys_port_name(netdev, name, len);
+ return -EOPNOTSUPP;
if (nn->dp.is_vf || nn->vnic_no_name)
return -EOPNOTSUPP;
.ndo_udp_tunnel_add = nfp_net_add_vxlan_port,
.ndo_udp_tunnel_del = nfp_net_del_vxlan_port,
.ndo_bpf = nfp_net_xdp,
- .ndo_get_port_parent_id = nfp_port_get_port_parent_id,
- .ndo_get_devlink = nfp_devlink_get_devlink,
+ .ndo_get_devlink_port = nfp_devlink_get_devlink_port,
};
/**
nn->fw_ver.resv, nn->fw_ver.class,
nn->fw_ver.major, nn->fw_ver.minor,
nn->max_mtu);
- nn_info(nn, "CAP: %#x %s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
+ nn_info(nn, "CAP: %#x %s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
nn->cap,
nn->cap & NFP_NET_CFG_CTRL_PROMISC ? "PROMISC " : "",
nn->cap & NFP_NET_CFG_CTRL_L2BC ? "L2BCFILT " : "",
nn->cap & NFP_NET_CFG_CTRL_RSS ? "RSS1 " : "",
nn->cap & NFP_NET_CFG_CTRL_RSS2 ? "RSS2 " : "",
nn->cap & NFP_NET_CFG_CTRL_CTAG_FILTER ? "CTAG_FILTER " : "",
- nn->cap & NFP_NET_CFG_CTRL_L2SWITCH ? "L2SWITCH " : "",
nn->cap & NFP_NET_CFG_CTRL_MSIXAUTO ? "AUTOMASK " : "",
nn->cap & NFP_NET_CFG_CTRL_IRQMOD ? "IRQMOD " : "",
nn->cap & NFP_NET_CFG_CTRL_VXLAN ? "VXLAN " : "",
nn->dp.txd_cnt = NFP_NET_TX_DESCS_DEFAULT;
nn->dp.rxd_cnt = NFP_NET_RX_DESCS_DEFAULT;
+ mutex_init(&nn->bar_lock);
+
spin_lock_init(&nn->reconfig_lock);
spin_lock_init(&nn->link_status_lock);
void nfp_net_free(struct nfp_net *nn)
{
WARN_ON(timer_pending(&nn->reconfig_timer) || nn->reconfig_posted);
+
+ mutex_destroy(&nn->bar_lock);
+
if (nn->dp.netdev)
free_netdev(nn->dp.netdev);
else
nn->dp.ctrl |= NFP_NET_CFG_CTRL_IRQMOD;
}
- if (nn->dp.netdev)
- nfp_net_netdev_init(nn);
-
/* Stash the re-configuration queue away. First odd queue in TX Bar */
nn->qcp_cfg = nn->tx_bar + NFP_QCP_QUEUE_ADDR_SZ;
if (err)
return err;
+ if (nn->dp.netdev)
+ nfp_net_netdev_init(nn);
+
nfp_net_vecs_init(nn);
if (!nn->dp.netdev)
#define NFP_NET_CFG_CTRL_RINGPRIO (0x1 << 19) /* Ring priorities */
#define NFP_NET_CFG_CTRL_MSIXAUTO (0x1 << 20) /* MSI-X auto-masking */
#define NFP_NET_CFG_CTRL_TXRWB (0x1 << 21) /* Write-back of TX ring*/
-#define NFP_NET_CFG_CTRL_L2SWITCH (0x1 << 22) /* L2 Switch */
-#define NFP_NET_CFG_CTRL_L2SWITCH_LOCAL (0x1 << 23) /* Switch to local */
#define NFP_NET_CFG_CTRL_VXLAN (0x1 << 24) /* VXLAN tunnel support */
#define NFP_NET_CFG_CTRL_NVGRE (0x1 << 25) /* NVGRE tunnel support */
#define NFP_NET_CFG_CTRL_BPF (0x1 << 27) /* BPF offload capable */
#define NFP_NET_CFG_UPDATE_TXRPRIO (0x1 << 3) /* TX Ring prio change */
#define NFP_NET_CFG_UPDATE_RXRPRIO (0x1 << 4) /* RX Ring prio change */
#define NFP_NET_CFG_UPDATE_MSIX (0x1 << 5) /* MSI-X change */
-#define NFP_NET_CFG_UPDATE_L2SWITCH (0x1 << 6) /* Switch changes */
#define NFP_NET_CFG_UPDATE_RESET (0x1 << 7) /* Update due to FLR */
#define NFP_NET_CFG_UPDATE_IRQMOD (0x1 << 8) /* IRQ mod change */
#define NFP_NET_CFG_UPDATE_VXLAN (0x1 << 9) /* VXLAN port change */
#define NFP_NET_CFG_MBOX_SIMPLE_CMD 0x0
#define NFP_NET_CFG_MBOX_SIMPLE_RET 0x4
#define NFP_NET_CFG_MBOX_SIMPLE_VAL 0x8
-#define NFP_NET_CFG_MBOX_SIMPLE_LEN 12
#define NFP_NET_CFG_MBOX_CMD_CTAG_FILTER_ADD 1
#define NFP_NET_CFG_MBOX_CMD_CTAG_FILTER_KILL 2
int nfp_net_tlv_caps_parse(struct device *dev, u8 __iomem *ctrl_mem,
struct nfp_net_tlv_caps *caps);
-
-static inline bool nfp_net_has_mbox(struct nfp_net_tlv_caps *caps)
-{
- return caps->mbox_len >= NFP_NET_CFG_MBOX_SIMPLE_LEN;
-}
-
#endif /* _NFP_NET_CTRL_H_ */
#include <linux/pci.h>
#include <linux/ethtool.h>
#include <linux/firmware.h>
+#include <linux/sfp.h>
#include "nfpcore/nfp.h"
#include "nfpcore/nfp_nsp.h"
#define NN_RVEC_GATHER_STATS 9
#define NN_RVEC_PER_Q_STATS 3
+#define SFP_SFF_REV_COMPLIANCE 1
+
static void nfp_net_get_nspinfo(struct nfp_app *app, char *version)
{
struct nfp_nsp *nsp;
buffer);
}
+static int
+nfp_port_get_module_info(struct net_device *netdev,
+ struct ethtool_modinfo *modinfo)
+{
+ struct nfp_eth_table_port *eth_port;
+ struct nfp_port *port;
+ unsigned int read_len;
+ struct nfp_nsp *nsp;
+ int err = 0;
+ u8 data;
+
+ port = nfp_port_from_netdev(netdev);
+ eth_port = nfp_port_get_eth_port(port);
+ if (!eth_port)
+ return -EOPNOTSUPP;
+
+ nsp = nfp_nsp_open(port->app->cpp);
+ if (IS_ERR(nsp)) {
+ err = PTR_ERR(nsp);
+ netdev_err(netdev, "Failed to access the NSP: %d\n", err);
+ return err;
+ }
+
+ if (!nfp_nsp_has_read_module_eeprom(nsp)) {
+ netdev_info(netdev, "reading module EEPROM not supported. Please update flash\n");
+ err = -EOPNOTSUPP;
+ goto exit_close_nsp;
+ }
+
+ switch (eth_port->interface) {
+ case NFP_INTERFACE_SFP:
+ case NFP_INTERFACE_SFP28:
+ err = nfp_nsp_read_module_eeprom(nsp, eth_port->eth_index,
+ SFP_SFF8472_COMPLIANCE, &data,
+ 1, &read_len);
+ if (err < 0)
+ goto exit_close_nsp;
+
+ if (!data) {
+ modinfo->type = ETH_MODULE_SFF_8079;
+ modinfo->eeprom_len = ETH_MODULE_SFF_8079_LEN;
+ } else {
+ modinfo->type = ETH_MODULE_SFF_8472;
+ modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN;
+ }
+ break;
+ case NFP_INTERFACE_QSFP:
+ err = nfp_nsp_read_module_eeprom(nsp, eth_port->eth_index,
+ SFP_SFF_REV_COMPLIANCE, &data,
+ 1, &read_len);
+ if (err < 0)
+ goto exit_close_nsp;
+
+ if (data < 0x3) {
+ modinfo->type = ETH_MODULE_SFF_8436;
+ modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;
+ } else {
+ modinfo->type = ETH_MODULE_SFF_8636;
+ modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
+ }
+ break;
+ case NFP_INTERFACE_QSFP28:
+ modinfo->type = ETH_MODULE_SFF_8636;
+ modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
+ break;
+ default:
+ netdev_err(netdev, "Unsupported module 0x%x detected\n",
+ eth_port->interface);
+ err = -EINVAL;
+ }
+
+exit_close_nsp:
+ nfp_nsp_close(nsp);
+ return err;
+}
+
+static int
+nfp_port_get_module_eeprom(struct net_device *netdev,
+ struct ethtool_eeprom *eeprom, u8 *data)
+{
+ struct nfp_eth_table_port *eth_port;
+ struct nfp_port *port;
+ struct nfp_nsp *nsp;
+ int err;
+
+ port = nfp_port_from_netdev(netdev);
+ eth_port = __nfp_port_get_eth_port(port);
+ if (!eth_port)
+ return -EOPNOTSUPP;
+
+ nsp = nfp_nsp_open(port->app->cpp);
+ if (IS_ERR(nsp)) {
+ err = PTR_ERR(nsp);
+ netdev_err(netdev, "Failed to access the NSP: %d\n", err);
+ return err;
+ }
+
+ if (!nfp_nsp_has_read_module_eeprom(nsp)) {
+ netdev_info(netdev, "reading module EEPROM not supported. Please update flash\n");
+ err = -EOPNOTSUPP;
+ goto exit_close_nsp;
+ }
+
+ err = nfp_nsp_read_module_eeprom(nsp, eth_port->eth_index,
+ eeprom->offset, data, eeprom->len,
+ &eeprom->len);
+ if (err < 0) {
+ if (eeprom->len) {
+ netdev_warn(netdev,
+ "Incomplete read from module EEPROM: %d\n",
+ err);
+ err = 0;
+ } else {
+ netdev_err(netdev,
+ "Reading from module EEPROM failed: %d\n",
+ err);
+ }
+ }
+
+exit_close_nsp:
+ nfp_nsp_close(nsp);
+ return err;
+}
+
static int nfp_net_set_coalesce(struct net_device *netdev,
struct ethtool_coalesce *ec)
{
.set_dump = nfp_app_set_dump,
.get_dump_flag = nfp_app_get_dump_flag,
.get_dump_data = nfp_app_get_dump_data,
+ .get_module_info = nfp_port_get_module_info,
+ .get_module_eeprom = nfp_port_get_module_eeprom,
.get_coalesce = nfp_net_get_coalesce,
.set_coalesce = nfp_net_set_coalesce,
.get_channels = nfp_net_get_channels,
.set_dump = nfp_app_set_dump,
.get_dump_flag = nfp_app_get_dump_flag,
.get_dump_data = nfp_app_get_dump_data,
+ .get_module_info = nfp_port_get_module_info,
+ .get_module_eeprom = nfp_port_get_module_eeprom,
.get_link_ksettings = nfp_net_get_link_ksettings,
.set_link_ksettings = nfp_net_set_link_ksettings,
.get_fecparam = nfp_port_get_fecparam,
nn->id = id;
+ if (nn->port) {
+ err = nfp_devlink_port_register(pf->app, nn->port);
+ if (err)
+ return err;
+ }
+
err = nfp_net_init(nn);
if (err)
- return err;
+ goto err_devlink_port_clean;
nfp_net_debugfs_vnic_add(nn, pf->ddir);
- if (nn->port) {
- err = nfp_devlink_port_register(pf->app, nn->port);
- if (err)
- goto err_dfs_clean;
- }
+ if (nn->port)
+ nfp_devlink_port_type_eth_set(nn->port);
nfp_net_info(nn);
if (nfp_net_is_data_vnic(nn)) {
err = nfp_app_vnic_init(pf->app, nn);
if (err)
- goto err_devlink_port_clean;
+ goto err_devlink_port_type_clean;
}
return 0;
-err_devlink_port_clean:
+err_devlink_port_type_clean:
if (nn->port)
- nfp_devlink_port_unregister(nn->port);
-err_dfs_clean:
+ nfp_devlink_port_type_clear(nn->port);
nfp_net_debugfs_dir_clean(&nn->debugfs_dir);
nfp_net_clean(nn);
+err_devlink_port_clean:
+ if (nn->port)
+ nfp_devlink_port_unregister(nn->port);
return err;
}
if (nfp_net_is_data_vnic(nn))
nfp_app_vnic_clean(pf->app, nn);
if (nn->port)
- nfp_devlink_port_unregister(nn->port);
+ nfp_devlink_port_type_clear(nn->port);
nfp_net_debugfs_dir_clean(&nn->debugfs_dir);
nfp_net_clean(nn);
+ if (nn->port)
+ nfp_devlink_port_unregister(nn->port);
}
static int nfp_net_pf_alloc_irqs(struct nfp_pf *pf)
.ndo_fix_features = nfp_repr_fix_features,
.ndo_set_features = nfp_port_set_features,
.ndo_set_mac_address = eth_mac_addr,
- .ndo_get_port_parent_id = nfp_port_get_port_parent_id,
- .ndo_get_devlink = nfp_devlink_get_devlink,
+ .ndo_get_devlink_port = nfp_devlink_get_devlink_port,
};
void
return NULL;
}
-int nfp_port_get_port_parent_id(struct net_device *netdev,
- struct netdev_phys_item_id *ppid)
-{
- struct nfp_port *port;
- const u8 *serial;
-
- port = nfp_port_from_netdev(netdev);
- if (!port)
- return -EOPNOTSUPP;
-
- ppid->id_len = nfp_cpp_serial(port->app->cpp, &serial);
- memcpy(&ppid->id, serial, ppid->id_len);
-
- return 0;
-}
-
int nfp_port_setup_tc(struct net_device *netdev, enum tc_setup_type type,
void *type_data)
{
int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port);
void nfp_devlink_port_unregister(struct nfp_port *port);
+void nfp_devlink_port_type_eth_set(struct nfp_port *port);
+void nfp_devlink_port_type_clear(struct nfp_port *port);
/**
* Mac stats (0x0000 - 0x0200)
#define NFP_VERSIONS_NCSI_OFF 22
#define NFP_VERSIONS_CFGR_OFF 26
+#define NSP_SFF_EEPROM_BLOCK_LEN 8
+
enum nfp_nsp_cmd {
SPCODE_NOOP = 0, /* No operation */
SPCODE_SOFT_RESET = 1, /* Soft reset the NFP */
SPCODE_FW_STORED = 16, /* If no FW loaded, load flash app FW */
SPCODE_HWINFO_LOOKUP = 17, /* Lookup HWinfo with overwrites etc. */
SPCODE_VERSIONS = 21, /* Report FW versions */
+ SPCODE_READ_SFF_EEPROM = 22, /* Read module EEPROM */
};
struct nfp_nsp_dma_buf {
return (const char *)&buf[buf_off];
}
+
+static int
+__nfp_nsp_module_eeprom(struct nfp_nsp *state, void *buf, unsigned int size)
+{
+ struct nfp_nsp_command_buf_arg module_eeprom = {
+ {
+ .code = SPCODE_READ_SFF_EEPROM,
+ .option = size,
+ },
+ .in_buf = buf,
+ .in_size = size,
+ .out_buf = buf,
+ .out_size = size,
+ };
+
+ return nfp_nsp_command_buf(state, &module_eeprom);
+}
+
+int nfp_nsp_read_module_eeprom(struct nfp_nsp *state, int eth_index,
+ unsigned int offset, void *data,
+ unsigned int len, unsigned int *read_len)
+{
+ struct eeprom_buf {
+ u8 metalen;
+ __le16 length;
+ __le16 offset;
+ __le16 readlen;
+ u8 eth_index;
+ u8 data[0];
+ } __packed *buf;
+ int bufsz, ret;
+
+ BUILD_BUG_ON(offsetof(struct eeprom_buf, data) % 8);
+
+ /* Buffer must be large enough and rounded to the next block size. */
+ bufsz = struct_size(buf, data, round_up(len, NSP_SFF_EEPROM_BLOCK_LEN));
+ buf = kzalloc(bufsz, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ buf->metalen =
+ offsetof(struct eeprom_buf, data) / NSP_SFF_EEPROM_BLOCK_LEN;
+ buf->length = cpu_to_le16(len);
+ buf->offset = cpu_to_le16(offset);
+ buf->eth_index = eth_index;
+
+ ret = __nfp_nsp_module_eeprom(state, buf, bufsz);
+
+ *read_len = min_t(unsigned int, len, le16_to_cpu(buf->readlen));
+ if (*read_len)
+ memcpy(data, buf->data, *read_len);
+
+ if (!ret && *read_len < len)
+ ret = -EIO;
+
+ kfree(buf);
+
+ return ret;
+}
int nfp_nsp_mac_reinit(struct nfp_nsp *state);
int nfp_nsp_load_stored_fw(struct nfp_nsp *state);
int nfp_nsp_hwinfo_lookup(struct nfp_nsp *state, void *buf, unsigned int size);
+int nfp_nsp_read_module_eeprom(struct nfp_nsp *state, int eth_index,
+ unsigned int offset, void *data,
+ unsigned int len, unsigned int *read_len);
static inline bool nfp_nsp_has_mac_reinit(struct nfp_nsp *state)
{
return nfp_nsp_get_abi_ver_minor(state) > 27;
}
+static inline bool nfp_nsp_has_read_module_eeprom(struct nfp_nsp *state)
+{
+ return nfp_nsp_get_abi_ver_minor(state) > 28;
+}
+
enum nfp_eth_interface {
NFP_INTERFACE_NONE = 0,
NFP_INTERFACE_SFP = 1,
const int nh_off = skb_network_offset(skb);
const int nh_len = skb_network_header_len(skb);
const int nfrags = skb_shinfo(skb)->nr_frags;
- int cs_size, i, fill, hdr, cpyhdr, evt;
+ int cs_size, i, fill, hdr, evt;
dma_addr_t csdma;
fund = XCT_FUN_ST | XCT_FUN_RR_8BRES |
fill++;
/* Copy the result into the TCP packet */
- cpyhdr = fill;
CS_DESC(csring, fill++) = XCT_FUN_O | XCT_FUN_FUN(csring->fun) |
XCT_FUN_LLEN(2) | XCT_FUN_SE;
CS_DESC(csring, fill++) = XCT_PTR_LEN(2) | XCT_PTR_ADDR(cs_dest) | XCT_PTR_T;
pci_unregister_driver(&pasemi_mac_driver);
}
-int pasemi_mac_init_module(void)
+static int pasemi_mac_init_module(void)
{
int err;
/* Allow DSCP to TC mapping */
QED_MF_DSCP_TO_TC_MAP,
+
+ /* Do not insert a vlan tag with id 0 */
+ QED_MF_DONT_ADD_VLAN0_TAG,
};
enum qed_ufp_mode {
else
p_data->arr[type].update = DONT_UPDATE_DCB_DSCP;
- /* Do not add vlan tag 0 when DCB is enabled and port in UFP/OV mode */
- if ((test_bit(QED_MF_8021Q_TAGGING, &p_hwfn->cdev->mf_bits) ||
- test_bit(QED_MF_8021AD_TAGGING, &p_hwfn->cdev->mf_bits)))
+ if (test_bit(QED_MF_DONT_ADD_VLAN0_TAG, &p_hwfn->cdev->mf_bits))
p_data->arr[type].dont_add_vlan0 = true;
/* QM reconf data */
cdev->mf_bits = BIT(QED_MF_OVLAN_CLSS) |
BIT(QED_MF_LLH_PROTO_CLSS) |
BIT(QED_MF_UFP_SPECIFIC) |
- BIT(QED_MF_8021Q_TAGGING);
+ BIT(QED_MF_8021Q_TAGGING) |
+ BIT(QED_MF_DONT_ADD_VLAN0_TAG);
break;
case NVM_CFG1_GLOB_MF_MODE_BD:
cdev->mf_bits = BIT(QED_MF_OVLAN_CLSS) |
BIT(QED_MF_LLH_PROTO_CLSS) |
- BIT(QED_MF_8021AD_TAGGING);
+ BIT(QED_MF_8021AD_TAGGING) |
+ BIT(QED_MF_DONT_ADD_VLAN0_TAG);
break;
case NVM_CFG1_GLOB_MF_MODE_NPAR1_0:
cdev->mf_bits = BIT(QED_MF_LLH_MAC_CLSS) |
/* Datapath functions definition */
netdev_tx_t qede_start_xmit(struct sk_buff *skb, struct net_device *ndev);
u16 qede_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback);
+ struct net_device *sb_dev);
netdev_features_t qede_features_check(struct sk_buff *skb,
struct net_device *dev,
netdev_features_t features);
{
char mfw[ETHTOOL_FWVERS_LEN], storm[ETHTOOL_FWVERS_LEN];
struct qede_dev *edev = netdev_priv(ndev);
+ char mbi[ETHTOOL_FWVERS_LEN];
strlcpy(info->driver, "qede", sizeof(info->driver));
- strlcpy(info->version, DRV_MODULE_VERSION, sizeof(info->version));
snprintf(storm, ETHTOOL_FWVERS_LEN, "%d.%d.%d.%d",
edev->dev_info.common.fw_major,
(edev->dev_info.common.mfw_rev >> 8) & 0xFF,
edev->dev_info.common.mfw_rev & 0xFF);
- if ((strlen(storm) + strlen(mfw) + strlen("mfw storm ")) <
- sizeof(info->fw_version)) {
+ if ((strlen(storm) + strlen(DRV_MODULE_VERSION) + strlen("[storm] ")) <
+ sizeof(info->version))
+ snprintf(info->version, sizeof(info->version),
+ "%s [storm %s]", DRV_MODULE_VERSION, storm);
+ else
+ snprintf(info->version, sizeof(info->version),
+ "%s %s", DRV_MODULE_VERSION, storm);
+
+ if (edev->dev_info.common.mbi_version) {
+ snprintf(mbi, ETHTOOL_FWVERS_LEN, "%d.%d.%d",
+ (edev->dev_info.common.mbi_version &
+ QED_MBI_VERSION_2_MASK) >> QED_MBI_VERSION_2_OFFSET,
+ (edev->dev_info.common.mbi_version &
+ QED_MBI_VERSION_1_MASK) >> QED_MBI_VERSION_1_OFFSET,
+ (edev->dev_info.common.mbi_version &
+ QED_MBI_VERSION_0_MASK) >> QED_MBI_VERSION_0_OFFSET);
snprintf(info->fw_version, sizeof(info->fw_version),
- "mfw %s storm %s", mfw, storm);
+ "mbi %s [mfw %s]", mbi, mfw);
} else {
snprintf(info->fw_version, sizeof(info->fw_version),
- "%s %s", mfw, storm);
+ "mfw %s", mfw);
}
strlcpy(info->bus_info, pci_name(edev->pdev), sizeof(info->bus_info));
txq->tx_db.data.bd_prod =
cpu_to_le16(qed_chain_get_prod_idx(&txq->tx_pbl));
- if (!skb->xmit_more || netif_xmit_stopped(netdev_txq))
+ if (!netdev_xmit_more() || netif_xmit_stopped(netdev_txq))
qede_update_tx_producer(txq);
if (unlikely(qed_chain_get_elem_left(&txq->tx_pbl)
< (MAX_SKB_FRAGS + 1))) {
- if (skb->xmit_more)
+ if (netdev_xmit_more())
qede_update_tx_producer(txq);
netif_tx_stop_queue(netdev_txq);
}
u16 qede_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
struct qede_dev *edev = netdev_priv(dev);
int total_txq;
total_txq = QEDE_TSS_COUNT(edev) * edev->dev_info.num_tc;
return QEDE_TSS_COUNT(edev) ?
- fallback(dev, skb, NULL) % total_txq : 0;
+ netdev_pick_tx(dev, skb, NULL) % total_txq : 0;
}
/* 8B udp header + 8B base tunnel header + 32B option length */
skb_tx_timestamp(skb);
/* Trigger the MAC to check the TX descriptor */
- if (!skb->xmit_more || netif_queue_stopped(dev))
+ if (!netdev_xmit_more() || netif_queue_stopped(dev))
iowrite16(TM2TX, ioaddr + MTPR);
lp->tx_insert_ptr = descptr->vndescp;
PCIDAC = (1 << 4),
PCIMulRW = (1 << 3),
#define INTT_MASK GENMASK(1, 0)
- INTT_0 = 0x0000, // 8168
- INTT_1 = 0x0001, // 8168
- INTT_2 = 0x0002, // 8168
- INTT_3 = 0x0003, // 8168
/* rtl8169_PHYstatus */
TBI_Enable = 0x80,
u32 ocp_base;
};
+typedef void (*rtl_generic_fct)(struct rtl8169_private *tp);
+
MODULE_AUTHOR("Realtek and the Linux r8169 crew <netdev@vger.kernel.org>");
MODULE_DESCRIPTION("RealTek RTL-8169 Gigabit Ethernet driver");
module_param_named(debug, debug.msg_enable, int, 0);
static void rtl_hw_phy_config(struct net_device *dev)
{
+ static const rtl_generic_fct phy_configs[] = {
+ /* PCI devices. */
+ [RTL_GIGA_MAC_VER_01] = NULL,
+ [RTL_GIGA_MAC_VER_02] = rtl8169s_hw_phy_config,
+ [RTL_GIGA_MAC_VER_03] = rtl8169s_hw_phy_config,
+ [RTL_GIGA_MAC_VER_04] = rtl8169sb_hw_phy_config,
+ [RTL_GIGA_MAC_VER_05] = rtl8169scd_hw_phy_config,
+ [RTL_GIGA_MAC_VER_06] = rtl8169sce_hw_phy_config,
+ /* PCI-E devices. */
+ [RTL_GIGA_MAC_VER_07] = rtl8102e_hw_phy_config,
+ [RTL_GIGA_MAC_VER_08] = rtl8102e_hw_phy_config,
+ [RTL_GIGA_MAC_VER_09] = rtl8102e_hw_phy_config,
+ [RTL_GIGA_MAC_VER_10] = NULL,
+ [RTL_GIGA_MAC_VER_11] = rtl8168bb_hw_phy_config,
+ [RTL_GIGA_MAC_VER_12] = rtl8168bef_hw_phy_config,
+ [RTL_GIGA_MAC_VER_13] = NULL,
+ [RTL_GIGA_MAC_VER_14] = NULL,
+ [RTL_GIGA_MAC_VER_15] = NULL,
+ [RTL_GIGA_MAC_VER_16] = NULL,
+ [RTL_GIGA_MAC_VER_17] = rtl8168bef_hw_phy_config,
+ [RTL_GIGA_MAC_VER_18] = rtl8168cp_1_hw_phy_config,
+ [RTL_GIGA_MAC_VER_19] = rtl8168c_1_hw_phy_config,
+ [RTL_GIGA_MAC_VER_20] = rtl8168c_2_hw_phy_config,
+ [RTL_GIGA_MAC_VER_21] = rtl8168c_3_hw_phy_config,
+ [RTL_GIGA_MAC_VER_22] = rtl8168c_4_hw_phy_config,
+ [RTL_GIGA_MAC_VER_23] = rtl8168cp_2_hw_phy_config,
+ [RTL_GIGA_MAC_VER_24] = rtl8168cp_2_hw_phy_config,
+ [RTL_GIGA_MAC_VER_25] = rtl8168d_1_hw_phy_config,
+ [RTL_GIGA_MAC_VER_26] = rtl8168d_2_hw_phy_config,
+ [RTL_GIGA_MAC_VER_27] = rtl8168d_3_hw_phy_config,
+ [RTL_GIGA_MAC_VER_28] = rtl8168d_4_hw_phy_config,
+ [RTL_GIGA_MAC_VER_29] = rtl8105e_hw_phy_config,
+ [RTL_GIGA_MAC_VER_30] = rtl8105e_hw_phy_config,
+ [RTL_GIGA_MAC_VER_31] = NULL,
+ [RTL_GIGA_MAC_VER_32] = rtl8168e_1_hw_phy_config,
+ [RTL_GIGA_MAC_VER_33] = rtl8168e_1_hw_phy_config,
+ [RTL_GIGA_MAC_VER_34] = rtl8168e_2_hw_phy_config,
+ [RTL_GIGA_MAC_VER_35] = rtl8168f_1_hw_phy_config,
+ [RTL_GIGA_MAC_VER_36] = rtl8168f_2_hw_phy_config,
+ [RTL_GIGA_MAC_VER_37] = rtl8402_hw_phy_config,
+ [RTL_GIGA_MAC_VER_38] = rtl8411_hw_phy_config,
+ [RTL_GIGA_MAC_VER_39] = rtl8106e_hw_phy_config,
+ [RTL_GIGA_MAC_VER_40] = rtl8168g_1_hw_phy_config,
+ [RTL_GIGA_MAC_VER_41] = NULL,
+ [RTL_GIGA_MAC_VER_42] = rtl8168g_2_hw_phy_config,
+ [RTL_GIGA_MAC_VER_43] = rtl8168g_2_hw_phy_config,
+ [RTL_GIGA_MAC_VER_44] = rtl8168g_2_hw_phy_config,
+ [RTL_GIGA_MAC_VER_45] = rtl8168h_1_hw_phy_config,
+ [RTL_GIGA_MAC_VER_46] = rtl8168h_2_hw_phy_config,
+ [RTL_GIGA_MAC_VER_47] = rtl8168h_1_hw_phy_config,
+ [RTL_GIGA_MAC_VER_48] = rtl8168h_2_hw_phy_config,
+ [RTL_GIGA_MAC_VER_49] = rtl8168ep_1_hw_phy_config,
+ [RTL_GIGA_MAC_VER_50] = rtl8168ep_2_hw_phy_config,
+ [RTL_GIGA_MAC_VER_51] = rtl8168ep_2_hw_phy_config,
+ };
struct rtl8169_private *tp = netdev_priv(dev);
- switch (tp->mac_version) {
- case RTL_GIGA_MAC_VER_01:
- break;
- case RTL_GIGA_MAC_VER_02:
- case RTL_GIGA_MAC_VER_03:
- rtl8169s_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_04:
- rtl8169sb_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_05:
- rtl8169scd_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_06:
- rtl8169sce_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_07:
- case RTL_GIGA_MAC_VER_08:
- case RTL_GIGA_MAC_VER_09:
- rtl8102e_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_11:
- rtl8168bb_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_12:
- rtl8168bef_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_17:
- rtl8168bef_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_18:
- rtl8168cp_1_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_19:
- rtl8168c_1_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_20:
- rtl8168c_2_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_21:
- rtl8168c_3_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_22:
- rtl8168c_4_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_23:
- case RTL_GIGA_MAC_VER_24:
- rtl8168cp_2_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_25:
- rtl8168d_1_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_26:
- rtl8168d_2_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_27:
- rtl8168d_3_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_28:
- rtl8168d_4_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_29:
- case RTL_GIGA_MAC_VER_30:
- rtl8105e_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_31:
- /* None. */
- break;
- case RTL_GIGA_MAC_VER_32:
- case RTL_GIGA_MAC_VER_33:
- rtl8168e_1_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_34:
- rtl8168e_2_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_35:
- rtl8168f_1_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_36:
- rtl8168f_2_hw_phy_config(tp);
- break;
-
- case RTL_GIGA_MAC_VER_37:
- rtl8402_hw_phy_config(tp);
- break;
-
- case RTL_GIGA_MAC_VER_38:
- rtl8411_hw_phy_config(tp);
- break;
-
- case RTL_GIGA_MAC_VER_39:
- rtl8106e_hw_phy_config(tp);
- break;
-
- case RTL_GIGA_MAC_VER_40:
- rtl8168g_1_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_42:
- case RTL_GIGA_MAC_VER_43:
- case RTL_GIGA_MAC_VER_44:
- rtl8168g_2_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_45:
- case RTL_GIGA_MAC_VER_47:
- rtl8168h_1_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_46:
- case RTL_GIGA_MAC_VER_48:
- rtl8168h_2_hw_phy_config(tp);
- break;
-
- case RTL_GIGA_MAC_VER_49:
- rtl8168ep_1_hw_phy_config(tp);
- break;
- case RTL_GIGA_MAC_VER_50:
- case RTL_GIGA_MAC_VER_51:
- rtl8168ep_2_hw_phy_config(tp);
- break;
-
- case RTL_GIGA_MAC_VER_41:
- default:
- break;
- }
+ if (phy_configs[tp->mac_version])
+ phy_configs[tp->mac_version](tp);
}
static void rtl_schedule_task(struct rtl8169_private *tp, enum rtl_flag flag)
rtl_set_rx_tx_desc_registers(tp);
rtl_lock_config_regs(tp);
+ /* disable interrupt coalescing */
+ RTL_W16(tp, IntrMitigate, 0x0000);
/* Initially a 10 us delay. Turned it into a PCI commit. - FR */
RTL_R8(tp, IntrMask);
RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
rtl8169_set_magic_reg(tp, tp->mac_version);
- /*
- * Undocumented corner. Supposedly:
- * (TxTimer << 12) | (TxPackets << 8) | (RxTimer << 4) | RxPackets
- */
- RTL_W16(tp, IntrMitigate, 0x0000);
-
RTL_W32(tp, RxMissed, 0);
}
rtl_hw_aspm_clkreq_enable(tp, true);
}
-static void rtl_hw_start_8168(struct rtl8169_private *tp)
-{
- RTL_W8(tp, MaxTxPacketSize, TxPacketMax);
-
- tp->cp_cmd &= ~INTT_MASK;
- tp->cp_cmd |= PktCntrDisable | INTT_1;
- RTL_W16(tp, CPlusCmd, tp->cp_cmd);
-
- RTL_W16(tp, IntrMitigate, 0x5100);
-
- /* Work around for RxFIFO overflow. */
- if (tp->mac_version == RTL_GIGA_MAC_VER_11) {
- tp->irq_mask |= RxFIFOOver;
- tp->irq_mask &= ~RxOverflow;
- }
-
- switch (tp->mac_version) {
- case RTL_GIGA_MAC_VER_11:
- rtl_hw_start_8168bb(tp);
- break;
-
- case RTL_GIGA_MAC_VER_12:
- case RTL_GIGA_MAC_VER_17:
- rtl_hw_start_8168bef(tp);
- break;
-
- case RTL_GIGA_MAC_VER_18:
- rtl_hw_start_8168cp_1(tp);
- break;
-
- case RTL_GIGA_MAC_VER_19:
- rtl_hw_start_8168c_1(tp);
- break;
-
- case RTL_GIGA_MAC_VER_20:
- rtl_hw_start_8168c_2(tp);
- break;
-
- case RTL_GIGA_MAC_VER_21:
- rtl_hw_start_8168c_3(tp);
- break;
-
- case RTL_GIGA_MAC_VER_22:
- rtl_hw_start_8168c_4(tp);
- break;
-
- case RTL_GIGA_MAC_VER_23:
- rtl_hw_start_8168cp_2(tp);
- break;
-
- case RTL_GIGA_MAC_VER_24:
- rtl_hw_start_8168cp_3(tp);
- break;
-
- case RTL_GIGA_MAC_VER_25:
- case RTL_GIGA_MAC_VER_26:
- case RTL_GIGA_MAC_VER_27:
- rtl_hw_start_8168d(tp);
- break;
-
- case RTL_GIGA_MAC_VER_28:
- rtl_hw_start_8168d_4(tp);
- break;
-
- case RTL_GIGA_MAC_VER_31:
- rtl_hw_start_8168dp(tp);
- break;
-
- case RTL_GIGA_MAC_VER_32:
- case RTL_GIGA_MAC_VER_33:
- rtl_hw_start_8168e_1(tp);
- break;
- case RTL_GIGA_MAC_VER_34:
- rtl_hw_start_8168e_2(tp);
- break;
-
- case RTL_GIGA_MAC_VER_35:
- case RTL_GIGA_MAC_VER_36:
- rtl_hw_start_8168f_1(tp);
- break;
-
- case RTL_GIGA_MAC_VER_38:
- rtl_hw_start_8411(tp);
- break;
-
- case RTL_GIGA_MAC_VER_40:
- case RTL_GIGA_MAC_VER_41:
- rtl_hw_start_8168g_1(tp);
- break;
- case RTL_GIGA_MAC_VER_42:
- rtl_hw_start_8168g_2(tp);
- break;
-
- case RTL_GIGA_MAC_VER_44:
- rtl_hw_start_8411_2(tp);
- break;
-
- case RTL_GIGA_MAC_VER_45:
- case RTL_GIGA_MAC_VER_46:
- rtl_hw_start_8168h_1(tp);
- break;
-
- case RTL_GIGA_MAC_VER_49:
- rtl_hw_start_8168ep_1(tp);
- break;
-
- case RTL_GIGA_MAC_VER_50:
- rtl_hw_start_8168ep_2(tp);
- break;
-
- case RTL_GIGA_MAC_VER_51:
- rtl_hw_start_8168ep_3(tp);
- break;
-
- default:
- netif_err(tp, drv, tp->dev,
- "unknown chipset (mac_version = %d)\n",
- tp->mac_version);
- break;
- }
-}
-
static void rtl_hw_start_8102e_1(struct rtl8169_private *tp)
{
static const struct ephy_info e_info_8102e_1[] = {
rtl_hw_aspm_clkreq_enable(tp, true);
}
+static void rtl_hw_config(struct rtl8169_private *tp)
+{
+ static const rtl_generic_fct hw_configs[] = {
+ [RTL_GIGA_MAC_VER_07] = rtl_hw_start_8102e_1,
+ [RTL_GIGA_MAC_VER_08] = rtl_hw_start_8102e_3,
+ [RTL_GIGA_MAC_VER_09] = rtl_hw_start_8102e_2,
+ [RTL_GIGA_MAC_VER_10] = NULL,
+ [RTL_GIGA_MAC_VER_11] = rtl_hw_start_8168bb,
+ [RTL_GIGA_MAC_VER_12] = rtl_hw_start_8168bef,
+ [RTL_GIGA_MAC_VER_13] = NULL,
+ [RTL_GIGA_MAC_VER_14] = NULL,
+ [RTL_GIGA_MAC_VER_15] = NULL,
+ [RTL_GIGA_MAC_VER_16] = NULL,
+ [RTL_GIGA_MAC_VER_17] = rtl_hw_start_8168bef,
+ [RTL_GIGA_MAC_VER_18] = rtl_hw_start_8168cp_1,
+ [RTL_GIGA_MAC_VER_19] = rtl_hw_start_8168c_1,
+ [RTL_GIGA_MAC_VER_20] = rtl_hw_start_8168c_2,
+ [RTL_GIGA_MAC_VER_21] = rtl_hw_start_8168c_3,
+ [RTL_GIGA_MAC_VER_22] = rtl_hw_start_8168c_4,
+ [RTL_GIGA_MAC_VER_23] = rtl_hw_start_8168cp_2,
+ [RTL_GIGA_MAC_VER_24] = rtl_hw_start_8168cp_3,
+ [RTL_GIGA_MAC_VER_25] = rtl_hw_start_8168d,
+ [RTL_GIGA_MAC_VER_26] = rtl_hw_start_8168d,
+ [RTL_GIGA_MAC_VER_27] = rtl_hw_start_8168d,
+ [RTL_GIGA_MAC_VER_28] = rtl_hw_start_8168d_4,
+ [RTL_GIGA_MAC_VER_29] = rtl_hw_start_8105e_1,
+ [RTL_GIGA_MAC_VER_30] = rtl_hw_start_8105e_2,
+ [RTL_GIGA_MAC_VER_31] = rtl_hw_start_8168dp,
+ [RTL_GIGA_MAC_VER_32] = rtl_hw_start_8168e_1,
+ [RTL_GIGA_MAC_VER_33] = rtl_hw_start_8168e_1,
+ [RTL_GIGA_MAC_VER_34] = rtl_hw_start_8168e_2,
+ [RTL_GIGA_MAC_VER_35] = rtl_hw_start_8168f_1,
+ [RTL_GIGA_MAC_VER_36] = rtl_hw_start_8168f_1,
+ [RTL_GIGA_MAC_VER_37] = rtl_hw_start_8402,
+ [RTL_GIGA_MAC_VER_38] = rtl_hw_start_8411,
+ [RTL_GIGA_MAC_VER_39] = rtl_hw_start_8106,
+ [RTL_GIGA_MAC_VER_40] = rtl_hw_start_8168g_1,
+ [RTL_GIGA_MAC_VER_41] = rtl_hw_start_8168g_1,
+ [RTL_GIGA_MAC_VER_42] = rtl_hw_start_8168g_2,
+ [RTL_GIGA_MAC_VER_43] = rtl_hw_start_8168g_2,
+ [RTL_GIGA_MAC_VER_44] = rtl_hw_start_8411_2,
+ [RTL_GIGA_MAC_VER_45] = rtl_hw_start_8168h_1,
+ [RTL_GIGA_MAC_VER_46] = rtl_hw_start_8168h_1,
+ [RTL_GIGA_MAC_VER_47] = rtl_hw_start_8168h_1,
+ [RTL_GIGA_MAC_VER_48] = rtl_hw_start_8168h_1,
+ [RTL_GIGA_MAC_VER_49] = rtl_hw_start_8168ep_1,
+ [RTL_GIGA_MAC_VER_50] = rtl_hw_start_8168ep_2,
+ [RTL_GIGA_MAC_VER_51] = rtl_hw_start_8168ep_3,
+ };
+
+ if (hw_configs[tp->mac_version])
+ hw_configs[tp->mac_version](tp);
+}
+
+static void rtl_hw_start_8168(struct rtl8169_private *tp)
+{
+ RTL_W8(tp, MaxTxPacketSize, TxPacketMax);
+
+ /* Workaround for RxFIFO overflow. */
+ if (tp->mac_version == RTL_GIGA_MAC_VER_11) {
+ tp->irq_mask |= RxFIFOOver;
+ tp->irq_mask &= ~RxOverflow;
+ }
+
+ rtl_hw_config(tp);
+}
+
static void rtl_hw_start_8101(struct rtl8169_private *tp)
{
if (tp->mac_version >= RTL_GIGA_MAC_VER_30)
tp->cp_cmd &= CPCMD_QUIRK_MASK;
RTL_W16(tp, CPlusCmd, tp->cp_cmd);
- switch (tp->mac_version) {
- case RTL_GIGA_MAC_VER_07:
- rtl_hw_start_8102e_1(tp);
- break;
-
- case RTL_GIGA_MAC_VER_08:
- rtl_hw_start_8102e_3(tp);
- break;
-
- case RTL_GIGA_MAC_VER_09:
- rtl_hw_start_8102e_2(tp);
- break;
-
- case RTL_GIGA_MAC_VER_29:
- rtl_hw_start_8105e_1(tp);
- break;
- case RTL_GIGA_MAC_VER_30:
- rtl_hw_start_8105e_2(tp);
- break;
-
- case RTL_GIGA_MAC_VER_37:
- rtl_hw_start_8402(tp);
- break;
-
- case RTL_GIGA_MAC_VER_39:
- rtl_hw_start_8106(tp);
- break;
- case RTL_GIGA_MAC_VER_43:
- rtl_hw_start_8168g_2(tp);
- break;
- case RTL_GIGA_MAC_VER_47:
- case RTL_GIGA_MAC_VER_48:
- rtl_hw_start_8168h_1(tp);
- break;
- }
-
- RTL_W16(tp, IntrMitigate, 0x0000);
+ rtl_hw_config(tp);
}
static int rtl8169_change_mtu(struct net_device *dev, int new_mtu)
*/
smp_mb();
if (rtl_tx_slots_avail(tp, MAX_SKB_FRAGS))
- netif_wake_queue(dev);
+ netif_start_queue(dev);
}
return NETDEV_TX_OK;
set_bit(RTL_FLAG_TASK_RESET_PENDING, tp->wk.flags);
}
- if (status & (RTL_EVENT_NAPI | LinkChg)) {
- rtl_irq_disable(tp);
- napi_schedule_irqoff(&tp->napi);
- }
+ rtl_irq_disable(tp);
+ napi_schedule_irqoff(&tp->napi);
out:
rtl_ack_events(tp, status);
}
static u16 ravb_select_queue(struct net_device *ndev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
/* If skb needs TX timestamp, it is handled in network control queue */
return (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) ? RAVB_NC :
switch (event) {
case FIB_EVENT_ENTRY_ADD: /* fall through */
case FIB_EVENT_ENTRY_DEL:
+ if (info->family == AF_INET) {
+ struct fib_entry_notifier_info *fen_info = ptr;
+
+ if (fen_info->fi->fib_nh_is_v6) {
+ NL_SET_ERR_MSG_MOD(info->extack, "IPv6 gateway with IPv4 route is not supported");
+ return notifier_from_errno(-EINVAL);
+ }
+ }
+
memcpy(&fib_work->fen_info, ptr, sizeof(fib_work->fen_info));
/* Take referece on fib_info to prevent it from being
* freed while work is queued. Release it afterwards.
nh = fi->fib_nh;
nh_on_port = (fi->fib_dev == ofdpa_port->dev);
- has_gw = !!nh->nh_gw;
+ has_gw = !!nh->fib_nh_gw4;
if (has_gw && nh_on_port) {
err = ofdpa_port_ipv4_nh(ofdpa_port, flags,
- nh->nh_gw, &index);
+ nh->fib_nh_gw4, &index);
if (err)
return err;
fen_info->tb_id, 0);
if (err)
return err;
- fen_info->fi->fib_nh->nh_flags |= RTNH_F_OFFLOAD;
+ fen_info->fi->fib_nh->fib_nh_flags |= RTNH_F_OFFLOAD;
return 0;
}
ofdpa_port = ofdpa_port_dev_lower_find(fen_info->fi->fib_dev, rocker);
if (!ofdpa_port)
return 0;
- fen_info->fi->fib_nh->nh_flags &= ~RTNH_F_OFFLOAD;
+ fen_info->fi->fib_nh->fib_nh_flags &= ~RTNH_F_OFFLOAD;
return ofdpa_port_fib_ipv4(ofdpa_port, htonl(fen_info->dst),
fen_info->dst_len, fen_info->fi,
fen_info->tb_id, OFDPA_OP_FLAG_REMOVE);
rocker);
if (!ofdpa_port)
continue;
- flow_entry->fi->fib_nh->nh_flags &= ~RTNH_F_OFFLOAD;
+ flow_entry->fi->fib_nh->fib_nh_flags &= ~RTNH_F_OFFLOAD;
ofdpa_flow_tbl_del(ofdpa_port, OFDPA_OP_FLAG_REMOVE,
flow_entry);
}
netdev_tx_sent_queue(tx_queue->core_txq, skb_len);
/* Pass off to hardware */
- if (!skb->xmit_more || netif_xmit_stopped(tx_queue->core_txq)) {
+ if (!netdev_xmit_more() || netif_xmit_stopped(tx_queue->core_txq)) {
struct ef4_tx_queue *txq2 = ef4_tx_queue_partner(tx_queue);
/* There could be packets left on the partner queue if those
ef4_nic_push_buffers(tx_queue);
} else {
- tx_queue->xmit_more_available = skb->xmit_more;
+ tx_queue->xmit_more_available = netdev_xmit_more();
}
tx_queue->tx_packets++;
next = skb->next;
skb->next = NULL;
- if (next)
- skb->xmit_more = true;
efx_enqueue_skb(tx_queue, skb);
skb = next;
}
netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
{
unsigned int old_insert_count = tx_queue->insert_count;
- bool xmit_more = skb->xmit_more;
+ bool xmit_more = netdev_xmit_more();
bool data_mapped = false;
unsigned int segments;
unsigned int skb_len;
if (rc)
goto err;
#ifdef EFX_USE_PIO
- } else if (skb_len <= efx_piobuf_size && !skb->xmit_more &&
+ } else if (skb_len <= efx_piobuf_size && !xmit_more &&
efx_nic_may_tx_pio(tx_queue)) {
/* Use PIO for short packets with an empty queue. */
if (efx_enqueue_skb_pio(tx_queue, skb))
if (__netdev_tx_sent_queue(tx_queue->core_txq, skb_len, xmit_more)) {
struct efx_tx_queue *txq2 = efx_tx_queue_partner(tx_queue);
- /* There could be packets left on the partner queue if those
- * SKBs had skb->xmit_more set. If we do not push those they
+ /* There could be packets left on the partner queue if
+ * xmit_more was set. If we do not push those they
* could be left for a long time and cause a netdev watchdog.
*/
if (txq2->xmit_more_available)
efx_nic_push_buffers(tx_queue);
} else {
- tx_queue->xmit_more_available = skb->xmit_more;
+ tx_queue->xmit_more_available = xmit_more;
}
if (segments) {
#define XGMAC_RSF BIT(5)
#define XGMAC_RTC GENMASK(1, 0)
#define XGMAC_RTC_SHIFT 0
+#define XGMAC_MTL_RXQ_FLOW_CONTROL(x) (0x00001150 + (0x80 * (x)))
+#define XGMAC_RFD GENMASK(31, 17)
+#define XGMAC_RFD_SHIFT 17
+#define XGMAC_RFA GENMASK(15, 1)
+#define XGMAC_RFA_SHIFT 1
#define XGMAC_MTL_QINTEN(x) (0x00001170 + (0x80 * (x)))
#define XGMAC_RXOIE BIT(16)
#define XGMAC_MTL_QINT_STATUS(x) (0x00001174 + (0x80 * (x)))
value &= ~XGMAC_RQS;
value |= (rqs << XGMAC_RQS_SHIFT) & XGMAC_RQS;
+ if ((fifosz >= 4096) && (qmode != MTL_QUEUE_AVB)) {
+ u32 flow = readl(ioaddr + XGMAC_MTL_RXQ_FLOW_CONTROL(channel));
+ unsigned int rfd, rfa;
+
+ value |= XGMAC_EHFC;
+
+ /* Set Threshold for Activating Flow Control to min 2 frames,
+ * i.e. 1500 * 2 = 3000 bytes.
+ *
+ * Set Threshold for Deactivating Flow Control to min 1 frame,
+ * i.e. 1500 bytes.
+ */
+ switch (fifosz) {
+ case 4096:
+ /* This violates the above formula because of FIFO size
+ * limit therefore overflow may occur in spite of this.
+ */
+ rfd = 0x03; /* Full-2.5K */
+ rfa = 0x01; /* Full-1.5K */
+ break;
+
+ case 8192:
+ rfd = 0x06; /* Full-4K */
+ rfa = 0x0a; /* Full-6K */
+ break;
+
+ case 16384:
+ rfd = 0x06; /* Full-4K */
+ rfa = 0x12; /* Full-10K */
+ break;
+
+ default:
+ rfd = 0x06; /* Full-4K */
+ rfa = 0x1e; /* Full-16K */
+ break;
+ }
+
+ flow &= ~XGMAC_RFD;
+ flow |= rfd << XGMAC_RFD_SHIFT;
+
+ flow &= ~XGMAC_RFA;
+ flow |= rfa << XGMAC_RFA_SHIFT;
+
+ writel(flow, ioaddr + XGMAC_MTL_RXQ_FLOW_CONTROL(channel));
+ }
+
writel(value, ioaddr + XGMAC_MTL_RXQ_OPMODE(channel));
/* Enable MTL RX overflow */
#define STMMAC_TX_THRESH (DMA_TX_SIZE / 4)
#define STMMAC_RX_THRESH (DMA_RX_SIZE / 4)
-static int flow_ctrl = FLOW_OFF;
+static int flow_ctrl = FLOW_AUTO;
module_param(flow_ctrl, int, 0644);
MODULE_PARM_DESC(flow_ctrl, "Flow control ability [on/off]");
}
static u16 vsw_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
struct vnet_port *port = netdev_priv(dev);
}
static u16 vnet_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
struct vnet *vp = netdev_priv(dev);
struct vnet_port *port = __tx_port_find(vp, skb);
smp_wmb();
ring->cur = cur_index + 1;
- if (!pkt_info->skb->xmit_more ||
+ if (!netdev_xmit_more() ||
netif_xmit_stopped(netdev_get_tx_queue(pdata->netdev,
channel->queue_index)))
xlgmac_tx_start_xmit(channel, ring);
static void davinci_mdio_enable(struct davinci_mdio_data *data)
{
/* set enable and clock divider */
- __raw_writel(data->clk_div | CONTROL_ENABLE, &data->regs->control);
+ writel(data->clk_div | CONTROL_ENABLE, &data->regs->control);
}
static int davinci_mdio_reset(struct mii_bus *bus)
msleep(PHY_MAX_ADDR * data->access_time);
/* dump hardware version info */
- ver = __raw_readl(&data->regs->version);
+ ver = readl(&data->regs->version);
dev_info(data->dev,
"davinci mdio revision %d.%d, bus freq %ld\n",
(ver >> 8) & 0xff, ver & 0xff,
goto done;
/* get phy mask from the alive register */
- phy_mask = __raw_readl(&data->regs->alive);
+ phy_mask = readl(&data->regs->alive);
if (phy_mask) {
/* restrict mdio bus to live phys only */
dev_info(data->dev, "detected phy mask %x\n", ~phy_mask);
u32 reg;
while (time_after(timeout, jiffies)) {
- reg = __raw_readl(®s->user[0].access);
+ reg = readl(®s->user[0].access);
if ((reg & USERACCESS_GO) == 0)
return 0;
- reg = __raw_readl(®s->control);
+ reg = readl(®s->control);
if ((reg & CONTROL_IDLE) == 0) {
usleep_range(100, 200);
continue;
return -EAGAIN;
}
- reg = __raw_readl(®s->user[0].access);
+ reg = readl(®s->user[0].access);
if ((reg & USERACCESS_GO) == 0)
return 0;
if (ret < 0)
break;
- __raw_writel(reg, &data->regs->user[0].access);
+ writel(reg, &data->regs->user[0].access);
ret = wait_for_user_access(data);
if (ret == -EAGAIN)
if (ret < 0)
break;
- reg = __raw_readl(&data->regs->user[0].access);
+ reg = readl(&data->regs->user[0].access);
ret = (reg & USERACCESS_ACK) ? (reg & USERACCESS_DATA) : -EIO;
break;
}
if (ret < 0)
break;
- __raw_writel(reg, &data->regs->user[0].access);
+ writel(reg, &data->regs->user[0].access);
ret = wait_for_user_access(data);
if (ret == -EAGAIN)
u32 ctrl;
/* shutdown the scan state machine */
- ctrl = __raw_readl(&data->regs->control);
+ ctrl = readl(&data->regs->control);
ctrl &= ~CONTROL_ENABLE;
- __raw_writel(ctrl, &data->regs->control);
+ writel(ctrl, &data->regs->control);
wait_for_idle(data);
return 0;
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/skbuff.h>
+#include <linux/ethtool.h>
#include <linux/io.h>
#include <linux/slab.h>
#include <linux/of_address.h>
return (bool)*p;
}
+/**
+ * xemaclite_ethtools_get_drvinfo - Get various Axi Emac Lite driver info
+ * @ndev: Pointer to net_device structure
+ * @ed: Pointer to ethtool_drvinfo structure
+ *
+ * This implements ethtool command for getting the driver information.
+ * Issue "ethtool -i ethX" under linux prompt to execute this function.
+ */
+static void xemaclite_ethtools_get_drvinfo(struct net_device *ndev,
+ struct ethtool_drvinfo *ed)
+{
+ strlcpy(ed->driver, DRIVER_NAME, sizeof(ed->driver));
+}
+
+static const struct ethtool_ops xemaclite_ethtool_ops = {
+ .get_drvinfo = xemaclite_ethtools_get_drvinfo,
+ .get_link = ethtool_op_get_link,
+ .get_link_ksettings = phy_ethtool_get_link_ksettings,
+ .set_link_ksettings = phy_ethtool_set_link_ksettings,
+};
+
static const struct net_device_ops xemaclite_netdev_ops;
/**
dev_info(dev, "MAC address is now %pM\n", ndev->dev_addr);
ndev->netdev_ops = &xemaclite_netdev_ops;
+ ndev->ethtool_ops = &xemaclite_ethtool_ops;
ndev->flags &= ~IFF_MULTICAST;
ndev->watchdog_timeo = TX_TIMEOUT;
}
#endif
+/* Ioctl MII Interface */
+static int xemaclite_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
+{
+ if (!dev->phydev || !netif_running(dev))
+ return -EINVAL;
+
+ switch (cmd) {
+ case SIOCGMIIPHY:
+ case SIOCGMIIREG:
+ case SIOCSMIIREG:
+ return phy_mii_ioctl(dev->phydev, rq, cmd);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
static const struct net_device_ops xemaclite_netdev_ops = {
.ndo_open = xemaclite_open,
.ndo_stop = xemaclite_close,
.ndo_start_xmit = xemaclite_send,
.ndo_set_mac_address = xemaclite_set_mac_address,
.ndo_tx_timeout = xemaclite_tx_timeout,
+ .ndo_do_ioctl = xemaclite_ioctl,
#ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller = xemaclite_poll_controller,
#endif
#include <linux/module.h>
#include <linux/etherdevice.h>
#include <linux/hash.h>
+#include <net/ipv6_stubs.h>
#include <net/dst_metadata.h>
#include <net/gro_cells.h>
#include <net/rtnetlink.h>
#define GENEVE_NETDEV_VER "0.6"
-#define GENEVE_UDP_PORT 6081
-
#define GENEVE_N_VID (1u << 24)
#define GENEVE_VID_MASK (GENEVE_N_VID - 1)
{
.cmd = GTP_CMD_NEWPDP,
.doit = gtp_genl_new_pdp,
- .policy = gtp_genl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = GTP_CMD_DELPDP,
.doit = gtp_genl_del_pdp,
- .policy = gtp_genl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = GTP_CMD_GETPDP,
.doit = gtp_genl_get_pdp,
.dumpit = gtp_genl_dump_pdp,
- .policy = gtp_genl_policy,
.flags = GENL_ADMIN_PERM,
},
};
.version = 0,
.hdrsize = 0,
.maxattr = GTPA_MAX,
+ .policy = gtp_genl_policy,
.netnsok = true,
.module = THIS_MODULE,
.ops = gtp_genl_ops,
/* Keep aggregating only if stack says more data is coming
* and not doing mixed modes send and not flow blocked
*/
- xmit_more = skb->xmit_more &&
+ xmit_more = netdev_xmit_more() &&
!packet->cp_partial &&
!netif_xmit_stopped(netdev_get_tx_queue(ndev, packet->q_idx));
* If a valid queue has already been assigned, then use that.
* Otherwise compute tx queue based on hash and the send table.
*
- * This is basically similar to default (__netdev_pick_tx) with the added step
+ * This is basically similar to default (netdev_pick_tx) with the added step
* of using the host send_table when no other queue has been assigned.
*
* TODO support XPS - but get_xps_queue not exported
}
static u16 netvsc_select_queue(struct net_device *ndev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
struct net_device_context *ndc = netdev_priv(ndev);
struct net_device *vf_netdev;
const struct net_device_ops *vf_ops = vf_netdev->netdev_ops;
if (vf_ops->ndo_select_queue)
- txq = vf_ops->ndo_select_queue(vf_netdev, skb,
- sb_dev, fallback);
+ txq = vf_ops->ndo_select_queue(vf_netdev, skb, sb_dev);
else
- txq = fallback(vf_netdev, skb, NULL);
+ txq = netdev_pick_tx(vf_netdev, skb, NULL);
/* Record the queue selected by VF so that it can be
* used for common case where VF has more queues than
static const struct genl_ops hwsim_nl_ops[] = {
{
.cmd = MAC802154_HWSIM_CMD_NEW_RADIO,
- .policy = hwsim_genl_policy,
.doit = hwsim_new_radio_nl,
.flags = GENL_UNS_ADMIN_PERM,
},
{
.cmd = MAC802154_HWSIM_CMD_DEL_RADIO,
- .policy = hwsim_genl_policy,
.doit = hwsim_del_radio_nl,
.flags = GENL_UNS_ADMIN_PERM,
},
{
.cmd = MAC802154_HWSIM_CMD_GET_RADIO,
- .policy = hwsim_genl_policy,
.doit = hwsim_get_radio_nl,
.dumpit = hwsim_dump_radio_nl,
},
{
.cmd = MAC802154_HWSIM_CMD_NEW_EDGE,
- .policy = hwsim_genl_policy,
.doit = hwsim_new_edge_nl,
.flags = GENL_UNS_ADMIN_PERM,
},
{
.cmd = MAC802154_HWSIM_CMD_DEL_EDGE,
- .policy = hwsim_genl_policy,
.doit = hwsim_del_edge_nl,
.flags = GENL_UNS_ADMIN_PERM,
},
{
.cmd = MAC802154_HWSIM_CMD_SET_EDGE,
- .policy = hwsim_genl_policy,
.doit = hwsim_set_edge_lqi,
.flags = GENL_UNS_ADMIN_PERM,
},
.name = "MAC802154_HWSIM",
.version = 1,
.maxattr = MAC802154_HWSIM_ATTR_MAX,
+ .policy = hwsim_genl_policy,
.module = THIS_MODULE,
.ops = hwsim_nl_ops,
.n_ops = ARRAY_SIZE(hwsim_nl_ops),
return 1;
}
-static int loopback_get_ts_info(struct net_device *netdev,
- struct ethtool_ts_info *ts_info)
-{
- ts_info->so_timestamping = SOF_TIMESTAMPING_TX_SOFTWARE |
- SOF_TIMESTAMPING_RX_SOFTWARE |
- SOF_TIMESTAMPING_SOFTWARE;
-
- ts_info->phc_index = -1;
-
- return 0;
-};
-
static const struct ethtool_ops loopback_ethtool_ops = {
.get_link = always_on,
- .get_ts_info = loopback_get_ts_info,
+ .get_ts_info = ethtool_op_get_ts_info,
};
static int loopback_dev_init(struct net_device *dev)
return 0;
}
-static int copy_rx_sa_stats(struct sk_buff *skb,
- struct macsec_rx_sa_stats __percpu *pstats)
+static noinline_for_stack int
+copy_rx_sa_stats(struct sk_buff *skb,
+ struct macsec_rx_sa_stats __percpu *pstats)
{
struct macsec_rx_sa_stats sum = {0, };
int cpu;
return 0;
}
-static int copy_rx_sc_stats(struct sk_buff *skb,
- struct pcpu_rx_sc_stats __percpu *pstats)
+static noinline_for_stack int
+copy_rx_sc_stats(struct sk_buff *skb, struct pcpu_rx_sc_stats __percpu *pstats)
{
struct macsec_rx_sc_stats sum = {0, };
int cpu;
return 0;
}
-static int copy_tx_sc_stats(struct sk_buff *skb,
- struct pcpu_tx_sc_stats __percpu *pstats)
+static noinline_for_stack int
+copy_tx_sc_stats(struct sk_buff *skb, struct pcpu_tx_sc_stats __percpu *pstats)
{
struct macsec_tx_sc_stats sum = {0, };
int cpu;
return 0;
}
-static int copy_secy_stats(struct sk_buff *skb,
- struct pcpu_secy_stats __percpu *pstats)
+static noinline_for_stack int
+copy_secy_stats(struct sk_buff *skb, struct pcpu_secy_stats __percpu *pstats)
{
struct macsec_dev_stats sum = {0, };
int cpu;
return 1;
}
-static int dump_secy(struct macsec_secy *secy, struct net_device *dev,
- struct sk_buff *skb, struct netlink_callback *cb)
+static noinline_for_stack int
+dump_secy(struct macsec_secy *secy, struct net_device *dev,
+ struct sk_buff *skb, struct netlink_callback *cb)
{
struct macsec_rx_sc *rx_sc;
struct macsec_tx_sc *tx_sc = &secy->tx_sc;
{
.cmd = MACSEC_CMD_GET_TXSC,
.dumpit = macsec_dump_txsc,
- .policy = macsec_genl_policy,
},
{
.cmd = MACSEC_CMD_ADD_RXSC,
.doit = macsec_add_rxsc,
- .policy = macsec_genl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = MACSEC_CMD_DEL_RXSC,
.doit = macsec_del_rxsc,
- .policy = macsec_genl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = MACSEC_CMD_UPD_RXSC,
.doit = macsec_upd_rxsc,
- .policy = macsec_genl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = MACSEC_CMD_ADD_TXSA,
.doit = macsec_add_txsa,
- .policy = macsec_genl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = MACSEC_CMD_DEL_TXSA,
.doit = macsec_del_txsa,
- .policy = macsec_genl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = MACSEC_CMD_UPD_TXSA,
.doit = macsec_upd_txsa,
- .policy = macsec_genl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = MACSEC_CMD_ADD_RXSA,
.doit = macsec_add_rxsa,
- .policy = macsec_genl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = MACSEC_CMD_DEL_RXSA,
.doit = macsec_del_rxsa,
- .policy = macsec_genl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = MACSEC_CMD_UPD_RXSA,
.doit = macsec_upd_rxsa,
- .policy = macsec_genl_policy,
.flags = GENL_ADMIN_PERM,
},
};
.hdrsize = 0,
.version = MACSEC_GENL_VERSION,
.maxattr = MACSEC_ATTR_MAX,
+ .policy = macsec_genl_policy,
.netnsok = true,
.module = THIS_MODULE,
.ops = macsec_genl_ops,
#include <linux/notifier.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
+#include <linux/net_tstamp.h>
#include <linux/ethtool.h>
#include <linux/if_arp.h>
#include <linux/if_vlan.h>
#include <net/rtnetlink.h>
#include <net/xfrm.h>
#include <linux/netpoll.h>
+#include <linux/phy.h>
#define MACVLAN_HASH_BITS 8
#define MACVLAN_HASH_SIZE (1<<MACVLAN_HASH_BITS)
return 0;
}
+static int macvlan_do_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
+{
+ struct net_device *real_dev = macvlan_dev_real_dev(dev);
+ const struct net_device_ops *ops = real_dev->netdev_ops;
+ struct ifreq ifrr;
+ int err = -EOPNOTSUPP;
+
+ strncpy(ifrr.ifr_name, real_dev->name, IFNAMSIZ);
+ ifrr.ifr_ifru = ifr->ifr_ifru;
+
+ switch (cmd) {
+ case SIOCSHWTSTAMP:
+ case SIOCGHWTSTAMP:
+ if (netif_device_present(real_dev) && ops->ndo_do_ioctl)
+ err = ops->ndo_do_ioctl(real_dev, &ifrr, cmd);
+ break;
+ }
+
+ if (!err)
+ ifr->ifr_ifru = ifrr.ifr_ifru;
+
+ return err;
+}
+
/*
* macvlan network devices have devices nesting below it and are a special
* "super class" of normal network devices; split their locks off into a
return __ethtool_get_link_ksettings(vlan->lowerdev, cmd);
}
+static int macvlan_ethtool_get_ts_info(struct net_device *dev,
+ struct ethtool_ts_info *info)
+{
+ struct net_device *real_dev = macvlan_dev_real_dev(dev);
+ const struct ethtool_ops *ops = real_dev->ethtool_ops;
+ struct phy_device *phydev = real_dev->phydev;
+
+ if (phydev && phydev->drv && phydev->drv->ts_info) {
+ return phydev->drv->ts_info(phydev, info);
+ } else if (ops->get_ts_info) {
+ return ops->get_ts_info(real_dev, info);
+ } else {
+ info->so_timestamping = SOF_TIMESTAMPING_RX_SOFTWARE |
+ SOF_TIMESTAMPING_SOFTWARE;
+ info->phc_index = -1;
+ }
+
+ return 0;
+}
+
static netdev_features_t macvlan_fix_features(struct net_device *dev,
netdev_features_t features)
{
.get_link = ethtool_op_get_link,
.get_link_ksettings = macvlan_ethtool_get_link_ksettings,
.get_drvinfo = macvlan_ethtool_get_drvinfo,
+ .get_ts_info = macvlan_ethtool_get_ts_info,
};
static const struct net_device_ops macvlan_netdev_ops = {
.ndo_stop = macvlan_stop,
.ndo_start_xmit = macvlan_start_xmit,
.ndo_change_mtu = macvlan_change_mtu,
+ .ndo_do_ioctl = macvlan_do_ioctl,
.ndo_fix_features = macvlan_fix_features,
.ndo_change_rx_flags = macvlan_change_rx_flags,
.ndo_set_mac_address = macvlan_set_mac_address,
static u16 net_failover_select_queue(struct net_device *dev,
struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
struct net_failover_info *nfo_info = netdev_priv(dev);
struct net_device *primary_dev;
const struct net_device_ops *ops = primary_dev->netdev_ops;
if (ops->ndo_select_queue)
- txq = ops->ndo_select_queue(primary_dev, skb,
- sb_dev, fallback);
+ txq = ops->ndo_select_queue(primary_dev, skb, sb_dev);
else
- txq = fallback(primary_dev, skb, NULL);
+ txq = netdev_pick_tx(primary_dev, skb, NULL);
qdisc_skb_cb(skb)->slave_dev_queue_mapping = skb->queue_mapping;
obj-$(CONFIG_NETDEVSIM) += netdevsim.o
netdevsim-objs := \
- netdev.o \
+ netdev.o devlink.o fib.o sdev.o \
ifeq ($(CONFIG_BPF_SYSCALL),y)
netdevsim-objs += \
bpf.o
endif
-ifneq ($(CONFIG_NET_DEVLINK),)
-netdevsim-objs += devlink.o fib.o
-endif
-
ifneq ($(CONFIG_XFRM_OFFLOAD),)
netdevsim-objs += ipsec.o
endif
bpf_verifier_log_write(env, "[netdevsim] " fmt, ##__VA_ARGS__)
struct nsim_bpf_bound_prog {
- struct netdevsim *ns;
+ struct netdevsim_shared_dev *sdev;
struct bpf_prog *prog;
struct dentry *ddir;
const char *state;
struct nsim_bpf_bound_prog *state;
state = env->prog->aux->offload->dev_priv;
- if (state->ns->bpf_bind_verifier_delay && !insn_idx)
- msleep(state->ns->bpf_bind_verifier_delay);
+ if (state->sdev->bpf_bind_verifier_delay && !insn_idx)
+ msleep(state->sdev->bpf_bind_verifier_delay);
if (insn_idx == env->prog->len - 1)
pr_vlog(env, "Hello from netdevsim!\n");
return 0;
}
-static int nsim_bpf_create_prog(struct netdevsim *ns, struct bpf_prog *prog)
+static int nsim_bpf_create_prog(struct netdevsim_shared_dev *sdev,
+ struct bpf_prog *prog)
{
struct nsim_bpf_bound_prog *state;
char name[16];
if (!state)
return -ENOMEM;
- state->ns = ns;
+ state->sdev = sdev;
state->prog = prog;
state->state = "verify";
/* Program id is not populated yet when we create the state. */
- sprintf(name, "%u", ns->sdev->prog_id_gen++);
- state->ddir = debugfs_create_dir(name, ns->sdev->ddir_bpf_bound_progs);
+ sprintf(name, "%u", sdev->prog_id_gen++);
+ state->ddir = debugfs_create_dir(name, sdev->ddir_bpf_bound_progs);
if (IS_ERR_OR_NULL(state->ddir)) {
kfree(state);
return -ENOMEM;
&state->state, &nsim_bpf_string_fops);
debugfs_create_bool("loaded", 0400, state->ddir, &state->is_loaded);
- list_add_tail(&state->l, &ns->sdev->bpf_bound_progs);
+ list_add_tail(&state->l, &sdev->bpf_bound_progs);
prog->aux->offload->dev_priv = state;
static int nsim_bpf_verifier_prep(struct bpf_prog *prog)
{
- struct netdevsim *ns = bpf_offload_dev_priv(prog->aux->offload->offdev);
+ struct netdevsim_shared_dev *sdev =
+ bpf_offload_dev_priv(prog->aux->offload->offdev);
- if (!ns->bpf_bind_accept)
+ if (!sdev->bpf_bind_accept)
return -EOPNOTSUPP;
- return nsim_bpf_create_prog(ns, prog);
+ return nsim_bpf_create_prog(sdev, prog);
}
static int nsim_bpf_translate(struct bpf_prog *prog)
}
}
-int nsim_bpf_init(struct netdevsim *ns)
+static int nsim_bpf_sdev_init(struct netdevsim_shared_dev *sdev)
{
int err;
- if (ns->sdev->refcnt == 1) {
- INIT_LIST_HEAD(&ns->sdev->bpf_bound_progs);
- INIT_LIST_HEAD(&ns->sdev->bpf_bound_maps);
+ INIT_LIST_HEAD(&sdev->bpf_bound_progs);
+ INIT_LIST_HEAD(&sdev->bpf_bound_maps);
- ns->sdev->ddir_bpf_bound_progs =
- debugfs_create_dir("bpf_bound_progs", ns->sdev->ddir);
- if (IS_ERR_OR_NULL(ns->sdev->ddir_bpf_bound_progs))
- return -ENOMEM;
+ sdev->ddir_bpf_bound_progs =
+ debugfs_create_dir("bpf_bound_progs", sdev->ddir);
+ if (IS_ERR_OR_NULL(sdev->ddir_bpf_bound_progs))
+ return -ENOMEM;
+
+ sdev->bpf_dev = bpf_offload_dev_create(&nsim_bpf_dev_ops, sdev);
+ err = PTR_ERR_OR_ZERO(sdev->bpf_dev);
+ if (err)
+ return err;
- ns->sdev->bpf_dev = bpf_offload_dev_create(&nsim_bpf_dev_ops,
- ns);
- err = PTR_ERR_OR_ZERO(ns->sdev->bpf_dev);
+ sdev->bpf_bind_accept = true;
+ debugfs_create_bool("bpf_bind_accept", 0600, sdev->ddir,
+ &sdev->bpf_bind_accept);
+ debugfs_create_u32("bpf_bind_verifier_delay", 0600, sdev->ddir,
+ &sdev->bpf_bind_verifier_delay);
+ return 0;
+}
+
+static void nsim_bpf_sdev_uninit(struct netdevsim_shared_dev *sdev)
+{
+ WARN_ON(!list_empty(&sdev->bpf_bound_progs));
+ WARN_ON(!list_empty(&sdev->bpf_bound_maps));
+ bpf_offload_dev_destroy(sdev->bpf_dev);
+}
+
+int nsim_bpf_init(struct netdevsim *ns)
+{
+ int err;
+
+ if (ns->sdev->refcnt == 1) {
+ err = nsim_bpf_sdev_init(ns->sdev);
if (err)
return err;
}
err = bpf_offload_dev_netdev_register(ns->sdev->bpf_dev, ns->netdev);
if (err)
- goto err_destroy_bdev;
+ goto err_bpf_sdev_uninit;
debugfs_create_u32("bpf_offloaded_id", 0400, ns->ddir,
&ns->bpf_offloaded_id);
- ns->bpf_bind_accept = true;
- debugfs_create_bool("bpf_bind_accept", 0600, ns->ddir,
- &ns->bpf_bind_accept);
- debugfs_create_u32("bpf_bind_verifier_delay", 0600, ns->ddir,
- &ns->bpf_bind_verifier_delay);
-
ns->bpf_tc_accept = true;
debugfs_create_bool("bpf_tc_accept", 0600, ns->ddir,
&ns->bpf_tc_accept);
return 0;
-err_destroy_bdev:
+err_bpf_sdev_uninit:
if (ns->sdev->refcnt == 1)
- bpf_offload_dev_destroy(ns->sdev->bpf_dev);
+ nsim_bpf_sdev_uninit(ns->sdev);
return err;
}
WARN_ON(ns->bpf_offloaded);
bpf_offload_dev_netdev_unregister(ns->sdev->bpf_dev, ns->netdev);
- if (ns->sdev->refcnt == 1) {
- WARN_ON(!list_empty(&ns->sdev->bpf_bound_progs));
- WARN_ON(!list_empty(&ns->sdev->bpf_bound_maps));
- bpf_offload_dev_destroy(ns->sdev->bpf_dev);
- }
+ if (ns->sdev->refcnt == 1)
+ nsim_bpf_sdev_uninit(ns->sdev);
}
#include "netdevsim.h"
+static u32 nsim_dev_id;
+
struct nsim_vf_config {
int link_state;
u16 min_tx_rate;
bool rss_query_enabled;
};
-static u32 nsim_dev_id;
-
static struct dentry *nsim_ddir;
-static struct dentry *nsim_sdev_ddir;
static int nsim_num_vf(struct device *dev)
{
struct netdevsim *ns = to_nsim(dev);
nsim_vfs_disable(ns);
- free_netdev(ns->netdev);
}
static struct device_type nsim_dev_type = {
static int nsim_init(struct net_device *dev)
{
- char sdev_ddir_name[10], sdev_link_name[32];
struct netdevsim *ns = netdev_priv(dev);
+ char sdev_link_name[32];
int err;
ns->netdev = dev;
if (IS_ERR_OR_NULL(ns->ddir))
return -ENOMEM;
- if (!ns->sdev) {
- ns->sdev = kzalloc(sizeof(*ns->sdev), GFP_KERNEL);
- if (!ns->sdev) {
- err = -ENOMEM;
- goto err_debugfs_destroy;
- }
- ns->sdev->refcnt = 1;
- ns->sdev->switch_id = nsim_dev_id;
- sprintf(sdev_ddir_name, "%u", ns->sdev->switch_id);
- ns->sdev->ddir = debugfs_create_dir(sdev_ddir_name,
- nsim_sdev_ddir);
- if (IS_ERR_OR_NULL(ns->sdev->ddir)) {
- err = PTR_ERR_OR_ZERO(ns->sdev->ddir) ?: -EINVAL;
- goto err_sdev_free;
- }
- } else {
- sprintf(sdev_ddir_name, "%u", ns->sdev->switch_id);
- ns->sdev->refcnt++;
- }
-
- sprintf(sdev_link_name, "../../" DRV_NAME "_sdev/%s", sdev_ddir_name);
+ sprintf(sdev_link_name, "../../" DRV_NAME "_sdev/%u",
+ ns->sdev->switch_id);
debugfs_create_symlink("sdev", ns->ddir, sdev_link_name);
err = nsim_bpf_init(ns);
if (err)
- goto err_sdev_destroy;
+ goto err_debugfs_destroy;
ns->dev.id = nsim_dev_id++;
ns->dev.bus = &nsim_bus;
device_unregister(&ns->dev);
err_bpf_uninit:
nsim_bpf_uninit(ns);
-err_sdev_destroy:
- if (!--ns->sdev->refcnt) {
- debugfs_remove_recursive(ns->sdev->ddir);
-err_sdev_free:
- kfree(ns->sdev);
- }
err_debugfs_destroy:
debugfs_remove_recursive(ns->ddir);
return err;
nsim_devlink_teardown(ns);
debugfs_remove_recursive(ns->ddir);
nsim_bpf_uninit(ns);
- if (!--ns->sdev->refcnt) {
- debugfs_remove_recursive(ns->sdev->ddir);
- kfree(ns->sdev);
- }
}
static void nsim_free(struct net_device *dev)
device_unregister(&ns->dev);
/* netdev and vf state will be freed out of device_release() */
+ nsim_sdev_put(ns->sdev);
}
static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
eth_hw_addr_random(dev);
dev->netdev_ops = &nsim_netdev_ops;
+ dev->needs_free_netdev = true;
dev->priv_destructor = nsim_free;
dev->tx_queue_len = 0;
struct netlink_ext_ack *extack)
{
struct netdevsim *ns = netdev_priv(dev);
+ struct netdevsim *joinns = NULL;
+ int err;
if (tb[IFLA_LINK]) {
struct net_device *joindev;
- struct netdevsim *joinns;
joindev = __dev_get_by_index(src_net,
nla_get_u32(tb[IFLA_LINK]));
return -EINVAL;
joinns = netdev_priv(joindev);
- if (!joinns->sdev || !joinns->sdev->refcnt)
- return -EINVAL;
- ns->sdev = joinns->sdev;
}
- return register_netdevice(dev);
-}
+ ns->sdev = nsim_sdev_get(joinns);
+ if (IS_ERR(ns->sdev))
+ return PTR_ERR(ns->sdev);
-static void nsim_dellink(struct net_device *dev, struct list_head *head)
-{
- unregister_netdevice_queue(dev, head);
+ err = register_netdevice(dev);
+ if (err)
+ goto err_sdev_put;
+ return 0;
+
+err_sdev_put:
+ nsim_sdev_put(ns->sdev);
+ return err;
}
static struct rtnl_link_ops nsim_link_ops __read_mostly = {
.setup = nsim_setup,
.validate = nsim_validate,
.newlink = nsim_newlink,
- .dellink = nsim_dellink,
};
static int __init nsim_module_init(void)
if (IS_ERR_OR_NULL(nsim_ddir))
return -ENOMEM;
- nsim_sdev_ddir = debugfs_create_dir(DRV_NAME "_sdev", NULL);
- if (IS_ERR_OR_NULL(nsim_sdev_ddir)) {
- err = -ENOMEM;
+ err = nsim_sdev_init();
+ if (err)
goto err_debugfs_destroy;
- }
err = bus_register(&nsim_bus);
if (err)
- goto err_sdir_destroy;
+ goto err_sdev_exit;
err = nsim_devlink_init();
if (err)
nsim_devlink_exit();
err_unreg_bus:
bus_unregister(&nsim_bus);
-err_sdir_destroy:
- debugfs_remove_recursive(nsim_sdev_ddir);
+err_sdev_exit:
+ nsim_sdev_exit();
err_debugfs_destroy:
debugfs_remove_recursive(nsim_ddir);
return err;
rtnl_link_unregister(&nsim_link_ops);
nsim_devlink_exit();
bus_unregister(&nsim_bus);
- debugfs_remove_recursive(nsim_sdev_ddir);
+ nsim_sdev_exit();
debugfs_remove_recursive(nsim_ddir);
}
struct bpf_offload_dev *bpf_dev;
+ bool bpf_bind_accept;
+ u32 bpf_bind_verifier_delay;
+
struct dentry *ddir_bpf_bound_progs;
u32 prog_id_gen;
struct list_head bpf_bound_maps;
};
+struct netdevsim;
+
+struct netdevsim_shared_dev *nsim_sdev_get(struct netdevsim *joinns);
+void nsim_sdev_put(struct netdevsim_shared_dev *sdev);
+int nsim_sdev_init(void);
+void nsim_sdev_exit(void);
+
#define NSIM_IPSEC_MAX_SA_COUNT 33
#define NSIM_IPSEC_VALID BIT(31)
struct xdp_attachment_info xdp;
struct xdp_attachment_info xdp_hw;
- bool bpf_bind_accept;
- u32 bpf_bind_verifier_delay;
-
bool bpf_tc_accept;
bool bpf_tc_non_bound_accept;
bool bpf_xdpdrv_accept;
bool bpf_xdpoffload_accept;
bool bpf_map_accept;
-#if IS_ENABLED(CONFIG_NET_DEVLINK)
struct devlink *devlink;
-#endif
struct nsim_ipsec ipsec;
};
}
#endif
-#if IS_ENABLED(CONFIG_NET_DEVLINK)
enum nsim_resource_id {
NSIM_RESOURCE_NONE, /* DEVLINK_RESOURCE_ID_PARENT_TOP */
NSIM_RESOURCE_IPV4,
u64 nsim_fib_get_val(struct net *net, enum nsim_resource_id res_id, bool max);
int nsim_fib_set_max(struct net *net, enum nsim_resource_id res_id, u64 val,
struct netlink_ext_ack *extack);
-#else
-static inline int nsim_devlink_setup(struct netdevsim *ns)
-{
- return 0;
-}
-
-static inline void nsim_devlink_teardown(struct netdevsim *ns)
-{
-}
-
-static inline int nsim_devlink_init(void)
-{
- return 0;
-}
-
-static inline void nsim_devlink_exit(void)
-{
-}
-#endif
#if IS_ENABLED(CONFIG_XFRM_OFFLOAD)
void nsim_ipsec_init(struct netdevsim *ns);
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2019 Mellanox Technologies. All rights reserved */
+
+#include <linux/debugfs.h>
+#include <linux/err.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+
+#include "netdevsim.h"
+
+static struct dentry *nsim_sdev_ddir;
+
+static u32 nsim_sdev_id;
+
+struct netdevsim_shared_dev *nsim_sdev_get(struct netdevsim *joinns)
+{
+ struct netdevsim_shared_dev *sdev;
+ char sdev_ddir_name[10];
+ int err;
+
+ if (joinns) {
+ if (WARN_ON(!joinns->sdev))
+ return ERR_PTR(-EINVAL);
+ sdev = joinns->sdev;
+ sdev->refcnt++;
+ return sdev;
+ }
+
+ sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
+ if (!sdev)
+ return ERR_PTR(-ENOMEM);
+ sdev->refcnt = 1;
+ sdev->switch_id = nsim_sdev_id++;
+
+ sprintf(sdev_ddir_name, "%u", sdev->switch_id);
+ sdev->ddir = debugfs_create_dir(sdev_ddir_name, nsim_sdev_ddir);
+ if (IS_ERR_OR_NULL(sdev->ddir)) {
+ err = PTR_ERR_OR_ZERO(sdev->ddir) ?: -EINVAL;
+ goto err_sdev_free;
+ }
+
+ return sdev;
+
+err_sdev_free:
+ nsim_sdev_id--;
+ kfree(sdev);
+ return ERR_PTR(err);
+}
+
+void nsim_sdev_put(struct netdevsim_shared_dev *sdev)
+{
+ if (--sdev->refcnt)
+ return;
+ debugfs_remove_recursive(sdev->ddir);
+ kfree(sdev);
+}
+
+int nsim_sdev_init(void)
+{
+ nsim_sdev_ddir = debugfs_create_dir(DRV_NAME "_sdev", NULL);
+ if (IS_ERR_OR_NULL(nsim_sdev_ddir))
+ return -ENOMEM;
+ return 0;
+}
+
+void nsim_sdev_exit(void)
+{
+ debugfs_remove_recursive(nsim_sdev_ddir);
+}
several child MDIO busses to a parent bus. Child bus
selection is under the control of GPIO lines.
+config MDIO_BUS_MUX_MESON_G12A
+ tristate "Amlogic G12a based MDIO bus multiplexer"
+ depends on ARCH_MESON || COMPILE_TEST
+ depends on OF_MDIO && HAS_IOMEM && COMMON_CLK
+ select MDIO_BUS_MUX
+ default m if ARCH_MESON
+ help
+ This module provides a driver for the MDIO multiplexer/glue of
+ the amlogic g12a SoC. The multiplexers connects either the external
+ or the internal MDIO bus to the parent bus.
+
config MDIO_BUS_MUX_MMIOREG
tristate "MMIO device-controlled MDIO bus multiplexers"
depends on OF_MDIO && HAS_IOMEM
Currently supports the BCM8706 and BCM8727 10G Ethernet PHYs.
config BCM_CYGNUS_PHY
- tristate "Broadcom Cygnus SoC internal PHY"
- depends on ARCH_BCM_CYGNUS || COMPILE_TEST
+ tristate "Broadcom Cygnus/Omega SoC internal PHY"
+ depends on ARCH_BCM_IPROC || COMPILE_TEST
depends on MDIO_BCM_IPROC
select BCM_NET_PHYLIB
---help---
This PHY driver is for the 1G internal PHYs of the Broadcom
- Cygnus Family SoC.
+ Cygnus and Omega Family SoC.
Currently supports internal PHY's used in the BCM11300,
BCM11320, BCM11350, BCM11360, BCM58300, BCM58302,
obj-$(CONFIG_MDIO_BUS_MUX) += mdio-mux.o
obj-$(CONFIG_MDIO_BUS_MUX_BCM_IPROC) += mdio-mux-bcm-iproc.o
obj-$(CONFIG_MDIO_BUS_MUX_GPIO) += mdio-mux-gpio.o
+obj-$(CONFIG_MDIO_BUS_MUX_MESON_G12A) += mdio-mux-meson-g12a.o
obj-$(CONFIG_MDIO_BUS_MUX_MMIOREG) += mdio-mux-mmioreg.o
obj-$(CONFIG_MDIO_BUS_MUX_MULTIPLEXER) += mdio-mux-multiplexer.o
obj-$(CONFIG_MDIO_CAVIUM) += mdio-cavium.o
.phy_id = PHY_ID_AM79C874,
.name = "AM79C874",
.phy_id_mask = 0xfffffff0,
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = am79c_config_init,
.ack_interrupt = am79c_ack_interrupt,
.config_intr = am79c_config_intr,
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/delay.h>
+#include <linux/bitfield.h>
#include <linux/phy.h>
#include "aquantia.h"
#define PHY_ID_AQCS109 0x03a1b5c2
#define PHY_ID_AQR405 0x03a1b4b0
+#define MDIO_PHYXS_VEND_IF_STATUS 0xe812
+#define MDIO_PHYXS_VEND_IF_STATUS_TYPE_MASK GENMASK(7, 3)
+#define MDIO_PHYXS_VEND_IF_STATUS_TYPE_KR 0
+#define MDIO_PHYXS_VEND_IF_STATUS_TYPE_XFI 2
+#define MDIO_PHYXS_VEND_IF_STATUS_TYPE_SGMII 6
+#define MDIO_PHYXS_VEND_IF_STATUS_TYPE_OCSGMII 10
+
#define MDIO_AN_VEND_PROV 0xc400
#define MDIO_AN_VEND_PROV_1000BASET_FULL BIT(15)
#define MDIO_AN_VEND_PROV_1000BASET_HALF BIT(14)
+#define MDIO_AN_VEND_PROV_DOWNSHIFT_EN BIT(4)
+#define MDIO_AN_VEND_PROV_DOWNSHIFT_MASK GENMASK(3, 0)
+#define MDIO_AN_VEND_PROV_DOWNSHIFT_DFLT 4
#define MDIO_AN_TX_VEND_STATUS1 0xc800
-#define MDIO_AN_TX_VEND_STATUS1_10BASET (0x0 << 1)
-#define MDIO_AN_TX_VEND_STATUS1_100BASETX (0x1 << 1)
-#define MDIO_AN_TX_VEND_STATUS1_1000BASET (0x2 << 1)
-#define MDIO_AN_TX_VEND_STATUS1_10GBASET (0x3 << 1)
-#define MDIO_AN_TX_VEND_STATUS1_2500BASET (0x4 << 1)
-#define MDIO_AN_TX_VEND_STATUS1_5000BASET (0x5 << 1)
-#define MDIO_AN_TX_VEND_STATUS1_RATE_MASK (0x7 << 1)
+#define MDIO_AN_TX_VEND_STATUS1_RATE_MASK GENMASK(3, 1)
+#define MDIO_AN_TX_VEND_STATUS1_10BASET 0
+#define MDIO_AN_TX_VEND_STATUS1_100BASETX 1
+#define MDIO_AN_TX_VEND_STATUS1_1000BASET 2
+#define MDIO_AN_TX_VEND_STATUS1_10GBASET 3
+#define MDIO_AN_TX_VEND_STATUS1_2500BASET 4
+#define MDIO_AN_TX_VEND_STATUS1_5000BASET 5
#define MDIO_AN_TX_VEND_STATUS1_FULL_DUPLEX BIT(0)
+#define MDIO_AN_TX_VEND_INT_STATUS1 0xcc00
+#define MDIO_AN_TX_VEND_INT_STATUS1_DOWNSHIFT BIT(1)
+
#define MDIO_AN_TX_VEND_INT_STATUS2 0xcc01
#define MDIO_AN_TX_VEND_INT_MASK2 0xd401
#define MDIO_AN_RX_LP_STAT1 0xe820
#define MDIO_AN_RX_LP_STAT1_1000BASET_FULL BIT(15)
#define MDIO_AN_RX_LP_STAT1_1000BASET_HALF BIT(14)
+#define MDIO_AN_RX_LP_STAT1_SHORT_REACH BIT(13)
+#define MDIO_AN_RX_LP_STAT1_AQRATE_DOWNSHIFT BIT(12)
+#define MDIO_AN_RX_LP_STAT1_AQ_PHY BIT(2)
+
+#define MDIO_AN_RX_LP_STAT4 0xe823
+#define MDIO_AN_RX_LP_STAT4_FW_MAJOR GENMASK(15, 8)
+#define MDIO_AN_RX_LP_STAT4_FW_MINOR GENMASK(7, 0)
+
+#define MDIO_AN_RX_VEND_STAT3 0xe832
+#define MDIO_AN_RX_VEND_STAT3_AFR BIT(0)
+
+/* MDIO_MMD_C22EXT */
+#define MDIO_C22EXT_STAT_SGMII_RX_GOOD_FRAMES 0xd292
+#define MDIO_C22EXT_STAT_SGMII_RX_BAD_FRAMES 0xd294
+#define MDIO_C22EXT_STAT_SGMII_RX_FALSE_CARRIER 0xd297
+#define MDIO_C22EXT_STAT_SGMII_TX_GOOD_FRAMES 0xd313
+#define MDIO_C22EXT_STAT_SGMII_TX_BAD_FRAMES 0xd315
+#define MDIO_C22EXT_STAT_SGMII_TX_FALSE_CARRIER 0xd317
+#define MDIO_C22EXT_STAT_SGMII_TX_COLLISIONS 0xd318
+#define MDIO_C22EXT_STAT_SGMII_TX_LINE_COLLISIONS 0xd319
+#define MDIO_C22EXT_STAT_SGMII_TX_FRAME_ALIGN_ERR 0xd31a
+#define MDIO_C22EXT_STAT_SGMII_TX_RUNT_FRAMES 0xd31b
/* Vendor specific 1, MDIO_MMD_VEND1 */
+#define VEND1_GLOBAL_FW_ID 0x0020
+#define VEND1_GLOBAL_FW_ID_MAJOR GENMASK(15, 8)
+#define VEND1_GLOBAL_FW_ID_MINOR GENMASK(7, 0)
+
+#define VEND1_GLOBAL_RSVD_STAT1 0xc885
+#define VEND1_GLOBAL_RSVD_STAT1_FW_BUILD_ID GENMASK(7, 4)
+#define VEND1_GLOBAL_RSVD_STAT1_PROV_ID GENMASK(3, 0)
+
+#define VEND1_GLOBAL_RSVD_STAT9 0xc88d
+#define VEND1_GLOBAL_RSVD_STAT9_MODE GENMASK(7, 0)
+#define VEND1_GLOBAL_RSVD_STAT9_1000BT2 0x23
+
#define VEND1_GLOBAL_INT_STD_STATUS 0xfc00
#define VEND1_GLOBAL_INT_VEND_STATUS 0xfc01
#define VEND1_GLOBAL_INT_VEND_MASK_GLOBAL2 BIT(1)
#define VEND1_GLOBAL_INT_VEND_MASK_GLOBAL3 BIT(0)
+struct aqr107_hw_stat {
+ const char *name;
+ int reg;
+ int size;
+};
+
+#define SGMII_STAT(n, r, s) { n, MDIO_C22EXT_STAT_SGMII_ ## r, s }
+static const struct aqr107_hw_stat aqr107_hw_stats[] = {
+ SGMII_STAT("sgmii_rx_good_frames", RX_GOOD_FRAMES, 26),
+ SGMII_STAT("sgmii_rx_bad_frames", RX_BAD_FRAMES, 26),
+ SGMII_STAT("sgmii_rx_false_carrier_events", RX_FALSE_CARRIER, 8),
+ SGMII_STAT("sgmii_tx_good_frames", TX_GOOD_FRAMES, 26),
+ SGMII_STAT("sgmii_tx_bad_frames", TX_BAD_FRAMES, 26),
+ SGMII_STAT("sgmii_tx_false_carrier_events", TX_FALSE_CARRIER, 8),
+ SGMII_STAT("sgmii_tx_collisions", TX_COLLISIONS, 8),
+ SGMII_STAT("sgmii_tx_line_collisions", TX_LINE_COLLISIONS, 8),
+ SGMII_STAT("sgmii_tx_frame_alignment_err", TX_FRAME_ALIGN_ERR, 16),
+ SGMII_STAT("sgmii_tx_runt_frames", TX_RUNT_FRAMES, 22),
+};
+#define AQR107_SGMII_STAT_SZ ARRAY_SIZE(aqr107_hw_stats)
+
+struct aqr107_priv {
+ u64 sgmii_stats[AQR107_SGMII_STAT_SZ];
+};
+
+static int aqr107_get_sset_count(struct phy_device *phydev)
+{
+ return AQR107_SGMII_STAT_SZ;
+}
+
+static void aqr107_get_strings(struct phy_device *phydev, u8 *data)
+{
+ int i;
+
+ for (i = 0; i < AQR107_SGMII_STAT_SZ; i++)
+ strscpy(data + i * ETH_GSTRING_LEN, aqr107_hw_stats[i].name,
+ ETH_GSTRING_LEN);
+}
+
+static u64 aqr107_get_stat(struct phy_device *phydev, int index)
+{
+ const struct aqr107_hw_stat *stat = aqr107_hw_stats + index;
+ int len_l = min(stat->size, 16);
+ int len_h = stat->size - len_l;
+ u64 ret;
+ int val;
+
+ val = phy_read_mmd(phydev, MDIO_MMD_C22EXT, stat->reg);
+ if (val < 0)
+ return U64_MAX;
+
+ ret = val & GENMASK(len_l - 1, 0);
+ if (len_h) {
+ val = phy_read_mmd(phydev, MDIO_MMD_C22EXT, stat->reg + 1);
+ if (val < 0)
+ return U64_MAX;
+
+ ret += (val & GENMASK(len_h - 1, 0)) << 16;
+ }
+
+ return ret;
+}
+
+static void aqr107_get_stats(struct phy_device *phydev,
+ struct ethtool_stats *stats, u64 *data)
+{
+ struct aqr107_priv *priv = phydev->priv;
+ u64 val;
+ int i;
+
+ for (i = 0; i < AQR107_SGMII_STAT_SZ; i++) {
+ val = aqr107_get_stat(phydev, i);
+ if (val == U64_MAX)
+ phydev_err(phydev, "Reading HW Statistics failed for %s\n",
+ aqr107_hw_stats[i].name);
+ else
+ priv->sgmii_stats[i] += val;
+
+ data[i] = priv->sgmii_stats[i];
+ }
+}
+
static int aqr_config_aneg(struct phy_device *phydev)
{
bool changed = false;
static int aqr_config_intr(struct phy_device *phydev)
{
+ bool en = phydev->interrupts == PHY_INTERRUPT_ENABLED;
int err;
- if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
- err = phy_write_mmd(phydev, MDIO_MMD_AN,
- MDIO_AN_TX_VEND_INT_MASK2,
- MDIO_AN_TX_VEND_INT_MASK2_LINK);
- if (err < 0)
- return err;
-
- err = phy_write_mmd(phydev, MDIO_MMD_VEND1,
- VEND1_GLOBAL_INT_STD_MASK,
- VEND1_GLOBAL_INT_STD_MASK_ALL);
- if (err < 0)
- return err;
-
- err = phy_write_mmd(phydev, MDIO_MMD_VEND1,
- VEND1_GLOBAL_INT_VEND_MASK,
- VEND1_GLOBAL_INT_VEND_MASK_GLOBAL3 |
- VEND1_GLOBAL_INT_VEND_MASK_AN);
- } else {
- err = phy_write_mmd(phydev, MDIO_MMD_AN,
- MDIO_AN_TX_VEND_INT_MASK2, 0);
- if (err < 0)
- return err;
-
- err = phy_write_mmd(phydev, MDIO_MMD_VEND1,
- VEND1_GLOBAL_INT_STD_MASK, 0);
- if (err < 0)
- return err;
-
- err = phy_write_mmd(phydev, MDIO_MMD_VEND1,
- VEND1_GLOBAL_INT_VEND_MASK, 0);
- }
+ err = phy_write_mmd(phydev, MDIO_MMD_AN, MDIO_AN_TX_VEND_INT_MASK2,
+ en ? MDIO_AN_TX_VEND_INT_MASK2_LINK : 0);
+ if (err < 0)
+ return err;
+
+ err = phy_write_mmd(phydev, MDIO_MMD_VEND1, VEND1_GLOBAL_INT_STD_MASK,
+ en ? VEND1_GLOBAL_INT_STD_MASK_ALL : 0);
+ if (err < 0)
+ return err;
- return err;
+ return phy_write_mmd(phydev, MDIO_MMD_VEND1, VEND1_GLOBAL_INT_VEND_MASK,
+ en ? VEND1_GLOBAL_INT_VEND_MASK_GLOBAL3 |
+ VEND1_GLOBAL_INT_VEND_MASK_AN : 0);
}
static int aqr_ack_interrupt(struct phy_device *phydev)
return genphy_c45_read_status(phydev);
}
+static int aqr107_read_downshift_event(struct phy_device *phydev)
+{
+ int val;
+
+ val = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_AN_TX_VEND_INT_STATUS1);
+ if (val < 0)
+ return val;
+
+ return !!(val & MDIO_AN_TX_VEND_INT_STATUS1_DOWNSHIFT);
+}
+
+static int aqr107_read_rate(struct phy_device *phydev)
+{
+ int val;
+
+ val = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_AN_TX_VEND_STATUS1);
+ if (val < 0)
+ return val;
+
+ switch (FIELD_GET(MDIO_AN_TX_VEND_STATUS1_RATE_MASK, val)) {
+ case MDIO_AN_TX_VEND_STATUS1_10BASET:
+ phydev->speed = SPEED_10;
+ break;
+ case MDIO_AN_TX_VEND_STATUS1_100BASETX:
+ phydev->speed = SPEED_100;
+ break;
+ case MDIO_AN_TX_VEND_STATUS1_1000BASET:
+ phydev->speed = SPEED_1000;
+ break;
+ case MDIO_AN_TX_VEND_STATUS1_2500BASET:
+ phydev->speed = SPEED_2500;
+ break;
+ case MDIO_AN_TX_VEND_STATUS1_5000BASET:
+ phydev->speed = SPEED_5000;
+ break;
+ case MDIO_AN_TX_VEND_STATUS1_10GBASET:
+ phydev->speed = SPEED_10000;
+ break;
+ default:
+ phydev->speed = SPEED_UNKNOWN;
+ break;
+ }
+
+ if (val & MDIO_AN_TX_VEND_STATUS1_FULL_DUPLEX)
+ phydev->duplex = DUPLEX_FULL;
+ else
+ phydev->duplex = DUPLEX_HALF;
+
+ return 0;
+}
+
+static int aqr107_read_status(struct phy_device *phydev)
+{
+ int val, ret;
+
+ ret = aqr_read_status(phydev);
+ if (ret)
+ return ret;
+
+ if (!phydev->link || phydev->autoneg == AUTONEG_DISABLE)
+ return 0;
+
+ val = phy_read_mmd(phydev, MDIO_MMD_PHYXS, MDIO_PHYXS_VEND_IF_STATUS);
+ if (val < 0)
+ return val;
+
+ switch (FIELD_GET(MDIO_PHYXS_VEND_IF_STATUS_TYPE_MASK, val)) {
+ case MDIO_PHYXS_VEND_IF_STATUS_TYPE_KR:
+ case MDIO_PHYXS_VEND_IF_STATUS_TYPE_XFI:
+ phydev->interface = PHY_INTERFACE_MODE_10GKR;
+ break;
+ case MDIO_PHYXS_VEND_IF_STATUS_TYPE_SGMII:
+ phydev->interface = PHY_INTERFACE_MODE_SGMII;
+ break;
+ case MDIO_PHYXS_VEND_IF_STATUS_TYPE_OCSGMII:
+ phydev->interface = PHY_INTERFACE_MODE_2500BASEX;
+ break;
+ default:
+ phydev->interface = PHY_INTERFACE_MODE_NA;
+ break;
+ }
+
+ val = aqr107_read_downshift_event(phydev);
+ if (val <= 0)
+ return val;
+
+ phydev_warn(phydev, "Downshift occurred! Cabling may be defective.\n");
+
+ /* Read downshifted rate from vendor register */
+ return aqr107_read_rate(phydev);
+}
+
+static int aqr107_get_downshift(struct phy_device *phydev, u8 *data)
+{
+ int val, cnt, enable;
+
+ val = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_AN_VEND_PROV);
+ if (val < 0)
+ return val;
+
+ enable = FIELD_GET(MDIO_AN_VEND_PROV_DOWNSHIFT_EN, val);
+ cnt = FIELD_GET(MDIO_AN_VEND_PROV_DOWNSHIFT_MASK, val);
+
+ *data = enable && cnt ? cnt : DOWNSHIFT_DEV_DISABLE;
+
+ return 0;
+}
+
+static int aqr107_set_downshift(struct phy_device *phydev, u8 cnt)
+{
+ int val = 0;
+
+ if (!FIELD_FIT(MDIO_AN_VEND_PROV_DOWNSHIFT_MASK, cnt))
+ return -E2BIG;
+
+ if (cnt != DOWNSHIFT_DEV_DISABLE) {
+ val = MDIO_AN_VEND_PROV_DOWNSHIFT_EN;
+ val |= FIELD_PREP(MDIO_AN_VEND_PROV_DOWNSHIFT_MASK, cnt);
+ }
+
+ return phy_modify_mmd(phydev, MDIO_MMD_AN, MDIO_AN_VEND_PROV,
+ MDIO_AN_VEND_PROV_DOWNSHIFT_EN |
+ MDIO_AN_VEND_PROV_DOWNSHIFT_MASK, val);
+}
+
+static int aqr107_get_tunable(struct phy_device *phydev,
+ struct ethtool_tunable *tuna, void *data)
+{
+ switch (tuna->id) {
+ case ETHTOOL_PHY_DOWNSHIFT:
+ return aqr107_get_downshift(phydev, data);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static int aqr107_set_tunable(struct phy_device *phydev,
+ struct ethtool_tunable *tuna, const void *data)
+{
+ switch (tuna->id) {
+ case ETHTOOL_PHY_DOWNSHIFT:
+ return aqr107_set_downshift(phydev, *(const u8 *)data);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+/* If we configure settings whilst firmware is still initializing the chip,
+ * then these settings may be overwritten. Therefore make sure chip
+ * initialization has completed. Use presence of the firmware ID as
+ * indicator for initialization having completed.
+ * The chip also provides a "reset completed" bit, but it's cleared after
+ * read. Therefore function would time out if called again.
+ */
+static int aqr107_wait_reset_complete(struct phy_device *phydev)
+{
+ int val, retries = 100;
+
+ do {
+ val = phy_read_mmd(phydev, MDIO_MMD_VEND1, VEND1_GLOBAL_FW_ID);
+ if (val < 0)
+ return val;
+ msleep(20);
+ } while (!val && --retries);
+
+ return val ? 0 : -ETIMEDOUT;
+}
+
+static void aqr107_chip_info(struct phy_device *phydev)
+{
+ u8 fw_major, fw_minor, build_id, prov_id;
+ int val;
+
+ val = phy_read_mmd(phydev, MDIO_MMD_VEND1, VEND1_GLOBAL_FW_ID);
+ if (val < 0)
+ return;
+
+ fw_major = FIELD_GET(VEND1_GLOBAL_FW_ID_MAJOR, val);
+ fw_minor = FIELD_GET(VEND1_GLOBAL_FW_ID_MINOR, val);
+
+ val = phy_read_mmd(phydev, MDIO_MMD_VEND1, VEND1_GLOBAL_RSVD_STAT1);
+ if (val < 0)
+ return;
+
+ build_id = FIELD_GET(VEND1_GLOBAL_RSVD_STAT1_FW_BUILD_ID, val);
+ prov_id = FIELD_GET(VEND1_GLOBAL_RSVD_STAT1_PROV_ID, val);
+
+ phydev_dbg(phydev, "FW %u.%u, Build %u, Provisioning %u\n",
+ fw_major, fw_minor, build_id, prov_id);
+}
+
+static int aqr107_config_init(struct phy_device *phydev)
+{
+ int ret;
+
+ /* Check that the PHY interface type is compatible */
+ if (phydev->interface != PHY_INTERFACE_MODE_SGMII &&
+ phydev->interface != PHY_INTERFACE_MODE_2500BASEX &&
+ phydev->interface != PHY_INTERFACE_MODE_10GKR)
+ return -ENODEV;
+
+ ret = aqr107_wait_reset_complete(phydev);
+ if (!ret)
+ aqr107_chip_info(phydev);
+
+ /* ensure that a latched downshift event is cleared */
+ aqr107_read_downshift_event(phydev);
+
+ return aqr107_set_downshift(phydev, MDIO_AN_VEND_PROV_DOWNSHIFT_DFLT);
+}
+
static int aqcs109_config_init(struct phy_device *phydev)
{
+ int ret;
+
+ /* Check that the PHY interface type is compatible */
+ if (phydev->interface != PHY_INTERFACE_MODE_SGMII &&
+ phydev->interface != PHY_INTERFACE_MODE_2500BASEX)
+ return -ENODEV;
+
+ ret = aqr107_wait_reset_complete(phydev);
+ if (!ret)
+ aqr107_chip_info(phydev);
+
/* AQCS109 belongs to a chip family partially supporting 10G and 5G.
* PMA speed ability bits are the same for all members of the family,
* AQCS109 however supports speeds up to 2.5G only.
*/
- return phy_set_max_speed(phydev, SPEED_2500);
+ ret = phy_set_max_speed(phydev, SPEED_2500);
+ if (ret)
+ return ret;
+
+ /* ensure that a latched downshift event is cleared */
+ aqr107_read_downshift_event(phydev);
+
+ return aqr107_set_downshift(phydev, MDIO_AN_VEND_PROV_DOWNSHIFT_DFLT);
+}
+
+static void aqr107_link_change_notify(struct phy_device *phydev)
+{
+ u8 fw_major, fw_minor;
+ bool downshift, short_reach, afr;
+ int mode, val;
+
+ if (phydev->state != PHY_RUNNING || phydev->autoneg == AUTONEG_DISABLE)
+ return;
+
+ val = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_AN_RX_LP_STAT1);
+ /* call failed or link partner is no Aquantia PHY */
+ if (val < 0 || !(val & MDIO_AN_RX_LP_STAT1_AQ_PHY))
+ return;
+
+ short_reach = val & MDIO_AN_RX_LP_STAT1_SHORT_REACH;
+ downshift = val & MDIO_AN_RX_LP_STAT1_AQRATE_DOWNSHIFT;
+
+ val = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_AN_RX_LP_STAT4);
+ if (val < 0)
+ return;
+
+ fw_major = FIELD_GET(MDIO_AN_RX_LP_STAT4_FW_MAJOR, val);
+ fw_minor = FIELD_GET(MDIO_AN_RX_LP_STAT4_FW_MINOR, val);
+
+ val = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_AN_RX_VEND_STAT3);
+ if (val < 0)
+ return;
+
+ afr = val & MDIO_AN_RX_VEND_STAT3_AFR;
+
+ phydev_dbg(phydev, "Link partner is Aquantia PHY, FW %u.%u%s%s%s\n",
+ fw_major, fw_minor,
+ short_reach ? ", short reach mode" : "",
+ downshift ? ", fast-retrain downshift advertised" : "",
+ afr ? ", fast reframe advertised" : "");
+
+ val = phy_read_mmd(phydev, MDIO_MMD_VEND1, VEND1_GLOBAL_RSVD_STAT9);
+ if (val < 0)
+ return;
+
+ mode = FIELD_GET(VEND1_GLOBAL_RSVD_STAT9_MODE, val);
+ if (mode == VEND1_GLOBAL_RSVD_STAT9_1000BT2)
+ phydev_info(phydev, "Aquantia 1000Base-T2 mode active\n");
+}
+
+static int aqr107_suspend(struct phy_device *phydev)
+{
+ return phy_set_bits_mmd(phydev, MDIO_MMD_VEND1, MDIO_CTRL1,
+ MDIO_CTRL1_LPOWER);
+}
+
+static int aqr107_resume(struct phy_device *phydev)
+{
+ return phy_clear_bits_mmd(phydev, MDIO_MMD_VEND1, MDIO_CTRL1,
+ MDIO_CTRL1_LPOWER);
+}
+
+static int aqr107_probe(struct phy_device *phydev)
+{
+ phydev->priv = devm_kzalloc(&phydev->mdio.dev,
+ sizeof(struct aqr107_priv), GFP_KERNEL);
+ if (!phydev->priv)
+ return -ENOMEM;
+
+ return aqr_hwmon_probe(phydev);
}
static struct phy_driver aqr_driver[] = {
{
PHY_ID_MATCH_MODEL(PHY_ID_AQ1202),
.name = "Aquantia AQ1202",
- .aneg_done = genphy_c45_aneg_done,
- .get_features = genphy_c45_pma_read_abilities,
.config_aneg = aqr_config_aneg,
.config_intr = aqr_config_intr,
.ack_interrupt = aqr_ack_interrupt,
{
PHY_ID_MATCH_MODEL(PHY_ID_AQ2104),
.name = "Aquantia AQ2104",
- .aneg_done = genphy_c45_aneg_done,
- .get_features = genphy_c45_pma_read_abilities,
.config_aneg = aqr_config_aneg,
.config_intr = aqr_config_intr,
.ack_interrupt = aqr_ack_interrupt,
{
PHY_ID_MATCH_MODEL(PHY_ID_AQR105),
.name = "Aquantia AQR105",
- .aneg_done = genphy_c45_aneg_done,
- .get_features = genphy_c45_pma_read_abilities,
.config_aneg = aqr_config_aneg,
.config_intr = aqr_config_intr,
.ack_interrupt = aqr_ack_interrupt,
{
PHY_ID_MATCH_MODEL(PHY_ID_AQR106),
.name = "Aquantia AQR106",
- .aneg_done = genphy_c45_aneg_done,
- .get_features = genphy_c45_pma_read_abilities,
.config_aneg = aqr_config_aneg,
.config_intr = aqr_config_intr,
.ack_interrupt = aqr_ack_interrupt,
{
PHY_ID_MATCH_MODEL(PHY_ID_AQR107),
.name = "Aquantia AQR107",
- .aneg_done = genphy_c45_aneg_done,
- .get_features = genphy_c45_pma_read_abilities,
- .probe = aqr_hwmon_probe,
+ .probe = aqr107_probe,
+ .config_init = aqr107_config_init,
.config_aneg = aqr_config_aneg,
.config_intr = aqr_config_intr,
.ack_interrupt = aqr_ack_interrupt,
- .read_status = aqr_read_status,
+ .read_status = aqr107_read_status,
+ .get_tunable = aqr107_get_tunable,
+ .set_tunable = aqr107_set_tunable,
+ .suspend = aqr107_suspend,
+ .resume = aqr107_resume,
+ .get_sset_count = aqr107_get_sset_count,
+ .get_strings = aqr107_get_strings,
+ .get_stats = aqr107_get_stats,
+ .link_change_notify = aqr107_link_change_notify,
},
{
PHY_ID_MATCH_MODEL(PHY_ID_AQCS109),
.name = "Aquantia AQCS109",
- .aneg_done = genphy_c45_aneg_done,
- .get_features = genphy_c45_pma_read_abilities,
- .probe = aqr_hwmon_probe,
+ .probe = aqr107_probe,
.config_init = aqcs109_config_init,
.config_aneg = aqr_config_aneg,
.config_intr = aqr_config_intr,
.ack_interrupt = aqr_ack_interrupt,
- .read_status = aqr_read_status,
+ .read_status = aqr107_read_status,
+ .get_tunable = aqr107_get_tunable,
+ .set_tunable = aqr107_set_tunable,
+ .suspend = aqr107_suspend,
+ .resume = aqr107_resume,
+ .get_sset_count = aqr107_get_sset_count,
+ .get_strings = aqr107_get_strings,
+ .get_stats = aqr107_get_stats,
+ .link_change_notify = aqr107_link_change_notify,
},
{
PHY_ID_MATCH_MODEL(PHY_ID_AQR405),
.name = "Aquantia AQR405",
- .aneg_done = genphy_c45_aneg_done,
- .get_features = genphy_c45_pma_read_abilities,
.config_aneg = aqr_config_aneg,
.config_intr = aqr_config_intr,
.ack_interrupt = aqr_ack_interrupt,
.phy_id = PHY_ID_ASIX_AX88796B,
.name = "Asix Electronics AX88796B",
.phy_id_mask = 0xfffffff0,
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.soft_reset = asix_soft_reset,
} };
static void at803x_link_change_notify(struct phy_device *phydev)
{
- struct at803x_priv *priv = phydev->priv;
-
/*
* Conduct a hardware reset for AT8030 every time a link loss is
* signalled. This is necessary to circumvent a hardware bug that
* in the FIFO. In such cases, the FIFO enters an error mode it
* cannot recover from by software.
*/
- if (phydev->state == PHY_NOLINK) {
- if (phydev->mdio.reset && !priv->phy_reset) {
- struct at803x_context context;
+ if (phydev->state == PHY_NOLINK && phydev->mdio.reset) {
+ struct at803x_context context;
- at803x_context_save(phydev, &context);
+ at803x_context_save(phydev, &context);
- phy_device_reset(phydev, 1);
- msleep(1);
- phy_device_reset(phydev, 0);
- msleep(1);
+ phy_device_reset(phydev, 1);
+ msleep(1);
+ phy_device_reset(phydev, 0);
+ msleep(1);
- at803x_context_restore(phydev, &context);
+ at803x_context_restore(phydev, &context);
- phydev_dbg(phydev, "%s(): phy was reset\n",
- __func__);
- priv->phy_reset = true;
- }
- } else {
- priv->phy_reset = false;
+ phydev_dbg(phydev, "%s(): phy was reset\n", __func__);
}
}
.get_wol = at803x_get_wol,
.suspend = at803x_suspend,
.resume = at803x_resume,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.ack_interrupt = at803x_ack_interrupt,
.config_intr = at803x_config_intr,
}, {
.get_wol = at803x_get_wol,
.suspend = at803x_suspend,
.resume = at803x_resume,
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.ack_interrupt = at803x_ack_interrupt,
.config_intr = at803x_config_intr,
}, {
.get_wol = at803x_get_wol,
.suspend = at803x_suspend,
.resume = at803x_resume,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.aneg_done = at803x_aneg_done,
.ack_interrupt = &at803x_ack_interrupt,
.config_intr = &at803x_config_intr,
#include <linux/netdevice.h>
#include <linux/phy.h>
+struct bcm_omega_phy_priv {
+ u64 *stats;
+};
+
/* Broadcom Cygnus Phy specific registers */
#define MII_BCM_CYGNUS_AFE_VDAC_ICTRL_0 0x91E5 /* VDAL Control register */
return genphy_config_aneg(phydev);
}
+static int bcm_omega_config_init(struct phy_device *phydev)
+{
+ u8 count, rev;
+ int ret = 0;
+
+ rev = phydev->phy_id & ~phydev->drv->phy_id_mask;
+
+ pr_info_once("%s: %s PHY revision: 0x%02x\n",
+ phydev_name(phydev), phydev->drv->name, rev);
+
+ /* Dummy read to a register to workaround an issue upon reset where the
+ * internal inverter may not allow the first MDIO transaction to pass
+ * the MDIO management controller and make us return 0xffff for such
+ * reads.
+ */
+ phy_read(phydev, MII_BMSR);
+
+ switch (rev) {
+ case 0x00:
+ ret = bcm_phy_28nm_a0b0_afe_config_init(phydev);
+ break;
+ default:
+ break;
+ }
+
+ if (ret)
+ return ret;
+
+ ret = bcm_phy_downshift_get(phydev, &count);
+ if (ret)
+ return ret;
+
+ /* Only enable EEE if Wirespeed/downshift is disabled */
+ ret = bcm_phy_set_eee(phydev, count == DOWNSHIFT_DEV_DISABLE);
+ if (ret)
+ return ret;
+
+ return bcm_phy_enable_apd(phydev, true);
+}
+
+static int bcm_omega_resume(struct phy_device *phydev)
+{
+ int ret;
+
+ /* Re-apply workarounds coming out suspend/resume */
+ ret = bcm_omega_config_init(phydev);
+ if (ret)
+ return ret;
+
+ /* 28nm Gigabit PHYs come out of reset without any half-duplex
+ * or "hub" compliant advertised mode, fix that. This does not
+ * cause any problems with the PHY library since genphy_config_aneg()
+ * gracefully handles auto-negotiated and forced modes.
+ */
+ return genphy_config_aneg(phydev);
+}
+
+static int bcm_omega_get_tunable(struct phy_device *phydev,
+ struct ethtool_tunable *tuna, void *data)
+{
+ switch (tuna->id) {
+ case ETHTOOL_PHY_DOWNSHIFT:
+ return bcm_phy_downshift_get(phydev, (u8 *)data);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static int bcm_omega_set_tunable(struct phy_device *phydev,
+ struct ethtool_tunable *tuna,
+ const void *data)
+{
+ u8 count = *(u8 *)data;
+ int ret;
+
+ switch (tuna->id) {
+ case ETHTOOL_PHY_DOWNSHIFT:
+ ret = bcm_phy_downshift_set(phydev, count);
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ if (ret)
+ return ret;
+
+ /* Disable EEE advertisement since this prevents the PHY
+ * from successfully linking up, trigger auto-negotiation restart
+ * to let the MAC decide what to do.
+ */
+ ret = bcm_phy_set_eee(phydev, count == DOWNSHIFT_DEV_DISABLE);
+ if (ret)
+ return ret;
+
+ return genphy_restart_aneg(phydev);
+}
+
+static void bcm_omega_get_phy_stats(struct phy_device *phydev,
+ struct ethtool_stats *stats, u64 *data)
+{
+ struct bcm_omega_phy_priv *priv = phydev->priv;
+
+ bcm_phy_get_stats(phydev, priv->stats, stats, data);
+}
+
+static int bcm_omega_probe(struct phy_device *phydev)
+{
+ struct bcm_omega_phy_priv *priv;
+
+ priv = devm_kzalloc(&phydev->mdio.dev, sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ phydev->priv = priv;
+
+ priv->stats = devm_kcalloc(&phydev->mdio.dev,
+ bcm_phy_get_sset_count(phydev), sizeof(u64),
+ GFP_KERNEL);
+ if (!priv->stats)
+ return -ENOMEM;
+
+ return 0;
+}
+
static struct phy_driver bcm_cygnus_phy_driver[] = {
{
.phy_id = PHY_ID_BCM_CYGNUS,
.phy_id_mask = 0xfffffff0,
.name = "Broadcom Cygnus PHY",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = bcm_cygnus_config_init,
.ack_interrupt = bcm_phy_ack_intr,
.config_intr = bcm_phy_config_intr,
.suspend = genphy_suspend,
.resume = bcm_cygnus_resume,
-} };
+}, {
+ .phy_id = PHY_ID_BCM_OMEGA,
+ .phy_id_mask = 0xfffffff0,
+ .name = "Broadcom Omega Combo GPHY",
+ /* PHY_GBIT_FEATURES */
+ .flags = PHY_IS_INTERNAL,
+ .config_init = bcm_omega_config_init,
+ .suspend = genphy_suspend,
+ .resume = bcm_omega_resume,
+ .get_tunable = bcm_omega_get_tunable,
+ .set_tunable = bcm_omega_set_tunable,
+ .get_sset_count = bcm_phy_get_sset_count,
+ .get_strings = bcm_phy_get_strings,
+ .get_stats = bcm_omega_get_phy_stats,
+ .probe = bcm_omega_probe,
+}
+};
static struct mdio_device_id __maybe_unused bcm_cygnus_phy_tbl[] = {
{ PHY_ID_BCM_CYGNUS, 0xfffffff0, },
+ { PHY_ID_BCM_OMEGA, 0xfffffff0, },
{ }
};
MODULE_DEVICE_TABLE(mdio, bcm_cygnus_phy_tbl);
}
EXPORT_SYMBOL_GPL(bcm_phy_get_stats);
+void bcm_phy_r_rc_cal_reset(struct phy_device *phydev)
+{
+ /* Reset R_CAL/RC_CAL Engine */
+ bcm_phy_write_exp_sel(phydev, 0x00b0, 0x0010);
+
+ /* Disable Reset R_AL/RC_CAL Engine */
+ bcm_phy_write_exp_sel(phydev, 0x00b0, 0x0000);
+}
+EXPORT_SYMBOL_GPL(bcm_phy_r_rc_cal_reset);
+
+int bcm_phy_28nm_a0b0_afe_config_init(struct phy_device *phydev)
+{
+ /* Increase VCO range to prevent unlocking problem of PLL at low
+ * temp
+ */
+ bcm_phy_write_misc(phydev, PLL_PLLCTRL_1, 0x0048);
+
+ /* Change Ki to 011 */
+ bcm_phy_write_misc(phydev, PLL_PLLCTRL_2, 0x021b);
+
+ /* Disable loading of TVCO buffer to bandgap, set bandgap trim
+ * to 111
+ */
+ bcm_phy_write_misc(phydev, PLL_PLLCTRL_4, 0x0e20);
+
+ /* Adjust bias current trim by -3 */
+ bcm_phy_write_misc(phydev, DSP_TAP10, 0x690b);
+
+ /* Switch to CORE_BASE1E */
+ phy_write(phydev, MII_BRCM_CORE_BASE1E, 0xd);
+
+ bcm_phy_r_rc_cal_reset(phydev);
+
+ /* write AFE_RXCONFIG_0 */
+ bcm_phy_write_misc(phydev, AFE_RXCONFIG_0, 0xeb19);
+
+ /* write AFE_RXCONFIG_1 */
+ bcm_phy_write_misc(phydev, AFE_RXCONFIG_1, 0x9a3f);
+
+ /* write AFE_RX_LP_COUNTER */
+ bcm_phy_write_misc(phydev, AFE_RX_LP_COUNTER, 0x7fc0);
+
+ /* write AFE_HPF_TRIM_OTHERS */
+ bcm_phy_write_misc(phydev, AFE_HPF_TRIM_OTHERS, 0x000b);
+
+ /* write AFTE_TX_CONFIG */
+ bcm_phy_write_misc(phydev, AFE_TX_CONFIG, 0x0800);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(bcm_phy_28nm_a0b0_afe_config_init);
+
MODULE_DESCRIPTION("Broadcom PHY Library");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Broadcom Corporation");
#include <linux/brcmphy.h>
#include <linux/phy.h>
+/* 28nm only register definitions */
+#define MISC_ADDR(base, channel) base, channel
+
+#define DSP_TAP10 MISC_ADDR(0x0a, 0)
+#define PLL_PLLCTRL_1 MISC_ADDR(0x32, 1)
+#define PLL_PLLCTRL_2 MISC_ADDR(0x32, 2)
+#define PLL_PLLCTRL_4 MISC_ADDR(0x33, 0)
+
+#define AFE_RXCONFIG_0 MISC_ADDR(0x38, 0)
+#define AFE_RXCONFIG_1 MISC_ADDR(0x38, 1)
+#define AFE_RXCONFIG_2 MISC_ADDR(0x38, 2)
+#define AFE_RX_LP_COUNTER MISC_ADDR(0x38, 3)
+#define AFE_TX_CONFIG MISC_ADDR(0x39, 0)
+#define AFE_VDCA_ICTRL_0 MISC_ADDR(0x39, 1)
+#define AFE_VDAC_OTHERS_0 MISC_ADDR(0x39, 3)
+#define AFE_HPF_TRIM_OTHERS MISC_ADDR(0x3a, 0)
+
+
int bcm_phy_write_exp(struct phy_device *phydev, u16 reg, u16 val);
int bcm_phy_read_exp(struct phy_device *phydev, u16 reg);
void bcm_phy_get_strings(struct phy_device *phydev, u8 *data);
void bcm_phy_get_stats(struct phy_device *phydev, u64 *shadow,
struct ethtool_stats *stats, u64 *data);
+void bcm_phy_r_rc_cal_reset(struct phy_device *phydev);
+int bcm_phy_28nm_a0b0_afe_config_init(struct phy_device *phydev);
#endif /* _LINUX_BCM_PHY_LIB_H */
.phy_id = 0x00406000,
.phy_id_mask = 0xfffffc00,
.name = "Broadcom BCM63XX (1)",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.flags = PHY_IS_INTERNAL,
.config_init = bcm63xx_config_init,
.ack_interrupt = bcm_phy_ack_intr,
/* same phy as above, with just a different OUI */
.phy_id = 0x002bdc00,
.phy_id_mask = 0xfffffc00,
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.flags = PHY_IS_INTERNAL,
.config_init = bcm63xx_config_init,
.ack_interrupt = bcm_phy_ack_intr,
#define MII_BCM7XXX_SHD_3_TL4 0x23
#define MII_BCM7XXX_TL4_RST_MSK (BIT(2) | BIT(1))
-/* 28nm only register definitions */
-#define MISC_ADDR(base, channel) base, channel
-
-#define DSP_TAP10 MISC_ADDR(0x0a, 0)
-#define PLL_PLLCTRL_1 MISC_ADDR(0x32, 1)
-#define PLL_PLLCTRL_2 MISC_ADDR(0x32, 2)
-#define PLL_PLLCTRL_4 MISC_ADDR(0x33, 0)
-
-#define AFE_RXCONFIG_0 MISC_ADDR(0x38, 0)
-#define AFE_RXCONFIG_1 MISC_ADDR(0x38, 1)
-#define AFE_RXCONFIG_2 MISC_ADDR(0x38, 2)
-#define AFE_RX_LP_COUNTER MISC_ADDR(0x38, 3)
-#define AFE_TX_CONFIG MISC_ADDR(0x39, 0)
-#define AFE_VDCA_ICTRL_0 MISC_ADDR(0x39, 1)
-#define AFE_VDAC_OTHERS_0 MISC_ADDR(0x39, 3)
-#define AFE_HPF_TRIM_OTHERS MISC_ADDR(0x3a, 0)
-
struct bcm7xxx_phy_priv {
u64 *stats;
};
-static void r_rc_cal_reset(struct phy_device *phydev)
-{
- /* Reset R_CAL/RC_CAL Engine */
- bcm_phy_write_exp_sel(phydev, 0x00b0, 0x0010);
-
- /* Disable Reset R_AL/RC_CAL Engine */
- bcm_phy_write_exp_sel(phydev, 0x00b0, 0x0000);
-}
-
-static int bcm7xxx_28nm_b0_afe_config_init(struct phy_device *phydev)
-{
- /* Increase VCO range to prevent unlocking problem of PLL at low
- * temp
- */
- bcm_phy_write_misc(phydev, PLL_PLLCTRL_1, 0x0048);
-
- /* Change Ki to 011 */
- bcm_phy_write_misc(phydev, PLL_PLLCTRL_2, 0x021b);
-
- /* Disable loading of TVCO buffer to bandgap, set bandgap trim
- * to 111
- */
- bcm_phy_write_misc(phydev, PLL_PLLCTRL_4, 0x0e20);
-
- /* Adjust bias current trim by -3 */
- bcm_phy_write_misc(phydev, DSP_TAP10, 0x690b);
-
- /* Switch to CORE_BASE1E */
- phy_write(phydev, MII_BRCM_CORE_BASE1E, 0xd);
-
- r_rc_cal_reset(phydev);
-
- /* write AFE_RXCONFIG_0 */
- bcm_phy_write_misc(phydev, AFE_RXCONFIG_0, 0xeb19);
-
- /* write AFE_RXCONFIG_1 */
- bcm_phy_write_misc(phydev, AFE_RXCONFIG_1, 0x9a3f);
-
- /* write AFE_RX_LP_COUNTER */
- bcm_phy_write_misc(phydev, AFE_RX_LP_COUNTER, 0x7fc0);
-
- /* write AFE_HPF_TRIM_OTHERS */
- bcm_phy_write_misc(phydev, AFE_HPF_TRIM_OTHERS, 0x000b);
-
- /* write AFTE_TX_CONFIG */
- bcm_phy_write_misc(phydev, AFE_TX_CONFIG, 0x0800);
-
- return 0;
-}
-
static int bcm7xxx_28nm_d0_afe_config_init(struct phy_device *phydev)
{
/* AFE_RXCONFIG_0 */
bcm_phy_write_misc(phydev, DSP_TAP10, 0x011b);
/* Reset R_CAL/RC_CAL engine */
- r_rc_cal_reset(phydev);
+ bcm_phy_r_rc_cal_reset(phydev);
return 0;
}
bcm_phy_write_misc(phydev, DSP_TAP10, 0x011b);
/* Reset R_CAL/RC_CAL engine */
- r_rc_cal_reset(phydev);
+ bcm_phy_r_rc_cal_reset(phydev);
return 0;
}
/* Enable ffe zero detection for Vitesse interoperability */
bcm_phy_write_misc(phydev, 0x26, 0x2, 0x0015);
- r_rc_cal_reset(phydev);
+ bcm_phy_r_rc_cal_reset(phydev);
return 0;
}
switch (rev) {
case 0xa0:
case 0xb0:
- ret = bcm7xxx_28nm_b0_afe_config_init(phydev);
+ ret = bcm_phy_28nm_a0b0_afe_config_init(phydev);
break;
case 0xd0:
ret = bcm7xxx_28nm_d0_afe_config_init(phydev);
.phy_id = (_oui), \
.phy_id_mask = 0xfffffff0, \
.name = _name, \
- .features = PHY_GBIT_FEATURES, \
+ /* PHY_GBIT_FEATURES */ \
.flags = PHY_IS_INTERNAL, \
.config_init = bcm7xxx_28nm_config_init, \
.resume = bcm7xxx_28nm_resume, \
.phy_id = (_oui), \
.phy_id_mask = 0xfffffff0, \
.name = _name, \
- .features = PHY_BASIC_FEATURES, \
+ /* PHY_BASIC_FEATURES */ \
.flags = PHY_IS_INTERNAL, \
.config_init = bcm7xxx_28nm_ephy_config_init, \
.resume = bcm7xxx_28nm_ephy_resume, \
.phy_id = (_oui), \
.phy_id_mask = 0xfffffff0, \
.name = _name, \
- .features = PHY_BASIC_FEATURES, \
+ /* PHY_BASIC_FEATURES */ \
.flags = PHY_IS_INTERNAL, \
.config_init = bcm7xxx_config_init, \
.suspend = bcm7xxx_suspend, \
BCM7XXX_28NM_GPHY(PHY_ID_BCM7439, "Broadcom BCM7439"),
BCM7XXX_28NM_GPHY(PHY_ID_BCM7439_2, "Broadcom BCM7439 (2)"),
BCM7XXX_28NM_GPHY(PHY_ID_BCM7445, "Broadcom BCM7445"),
- BCM7XXX_28NM_GPHY(PHY_ID_BCM_OMEGA, "Broadcom Omega Combo GPHY"),
BCM7XXX_40NM_EPHY(PHY_ID_BCM7346, "Broadcom BCM7346"),
BCM7XXX_40NM_EPHY(PHY_ID_BCM7362, "Broadcom BCM7362"),
BCM7XXX_40NM_EPHY(PHY_ID_BCM7425, "Broadcom BCM7425"),
.phy_id = PHY_ID_BCM5411,
.phy_id_mask = 0xfffffff0,
.name = "Broadcom BCM5411",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = bcm54xx_config_init,
.ack_interrupt = bcm_phy_ack_intr,
.config_intr = bcm_phy_config_intr,
.phy_id = PHY_ID_BCM5421,
.phy_id_mask = 0xfffffff0,
.name = "Broadcom BCM5421",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = bcm54xx_config_init,
.ack_interrupt = bcm_phy_ack_intr,
.config_intr = bcm_phy_config_intr,
.phy_id = PHY_ID_BCM54210E,
.phy_id_mask = 0xfffffff0,
.name = "Broadcom BCM54210E",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = bcm54xx_config_init,
.ack_interrupt = bcm_phy_ack_intr,
.config_intr = bcm_phy_config_intr,
.phy_id = PHY_ID_BCM5461,
.phy_id_mask = 0xfffffff0,
.name = "Broadcom BCM5461",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = bcm54xx_config_init,
.ack_interrupt = bcm_phy_ack_intr,
.config_intr = bcm_phy_config_intr,
.phy_id = PHY_ID_BCM54612E,
.phy_id_mask = 0xfffffff0,
.name = "Broadcom BCM54612E",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = bcm54xx_config_init,
.ack_interrupt = bcm_phy_ack_intr,
.config_intr = bcm_phy_config_intr,
.phy_id = PHY_ID_BCM54616S,
.phy_id_mask = 0xfffffff0,
.name = "Broadcom BCM54616S",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = bcm54xx_config_init,
.config_aneg = bcm54616s_config_aneg,
.ack_interrupt = bcm_phy_ack_intr,
.phy_id = PHY_ID_BCM5464,
.phy_id_mask = 0xfffffff0,
.name = "Broadcom BCM5464",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = bcm54xx_config_init,
.ack_interrupt = bcm_phy_ack_intr,
.config_intr = bcm_phy_config_intr,
.phy_id = PHY_ID_BCM5481,
.phy_id_mask = 0xfffffff0,
.name = "Broadcom BCM5481",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = bcm54xx_config_init,
.config_aneg = bcm5481_config_aneg,
.ack_interrupt = bcm_phy_ack_intr,
.phy_id = PHY_ID_BCM54810,
.phy_id_mask = 0xfffffff0,
.name = "Broadcom BCM54810",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = bcm54xx_config_init,
.config_aneg = bcm5481_config_aneg,
.ack_interrupt = bcm_phy_ack_intr,
.phy_id = PHY_ID_BCM5482,
.phy_id_mask = 0xfffffff0,
.name = "Broadcom BCM5482",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = bcm5482_config_init,
.read_status = bcm5482_read_status,
.ack_interrupt = bcm_phy_ack_intr,
.phy_id = PHY_ID_BCM50610,
.phy_id_mask = 0xfffffff0,
.name = "Broadcom BCM50610",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = bcm54xx_config_init,
.ack_interrupt = bcm_phy_ack_intr,
.config_intr = bcm_phy_config_intr,
.phy_id = PHY_ID_BCM50610M,
.phy_id_mask = 0xfffffff0,
.name = "Broadcom BCM50610M",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = bcm54xx_config_init,
.ack_interrupt = bcm_phy_ack_intr,
.config_intr = bcm_phy_config_intr,
.phy_id = PHY_ID_BCM57780,
.phy_id_mask = 0xfffffff0,
.name = "Broadcom BCM57780",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = bcm54xx_config_init,
.ack_interrupt = bcm_phy_ack_intr,
.config_intr = bcm_phy_config_intr,
.phy_id = PHY_ID_BCMAC131,
.phy_id_mask = 0xfffffff0,
.name = "Broadcom BCMAC131",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = brcm_fet_config_init,
.ack_interrupt = brcm_fet_ack_interrupt,
.config_intr = brcm_fet_config_intr,
.phy_id = PHY_ID_BCM5241,
.phy_id_mask = 0xfffffff0,
.name = "Broadcom BCM5241",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = brcm_fet_config_init,
.ack_interrupt = brcm_fet_ack_interrupt,
.config_intr = brcm_fet_config_intr,
.phy_id_mask = 0xfffffff0,
.name = "Broadcom BCM5395",
.flags = PHY_IS_INTERNAL,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.get_sset_count = bcm_phy_get_sset_count,
.get_strings = bcm_phy_get_strings,
.get_stats = bcm53xx_phy_get_stats,
.phy_id = PHY_ID_BCM89610,
.phy_id_mask = 0xfffffff0,
.name = "Broadcom BCM89610",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = bcm54xx_config_init,
.ack_interrupt = bcm_phy_ack_intr,
.config_intr = bcm_phy_config_intr,
.phy_id = 0x000fc410,
.name = "Cicada Cis8201",
.phy_id_mask = 0x000ffff0,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = &cis820x_config_init,
.ack_interrupt = &cis820x_ack_interrupt,
.config_intr = &cis820x_config_intr,
.phy_id = 0x000fc440,
.name = "Cicada Cis8204",
.phy_id_mask = 0x000fffc0,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = &cis820x_config_init,
.ack_interrupt = &cis820x_ack_interrupt,
.config_intr = &cis820x_config_intr,
.phy_id = 0x0181b880,
.name = "Davicom DM9161E",
.phy_id_mask = 0x0ffffff0,
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = dm9161_config_init,
.config_aneg = dm9161_config_aneg,
.ack_interrupt = dm9161_ack_interrupt,
.phy_id = 0x0181b8b0,
.name = "Davicom DM9161B/C",
.phy_id_mask = 0x0ffffff0,
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = dm9161_config_init,
.config_aneg = dm9161_config_aneg,
.ack_interrupt = dm9161_ack_interrupt,
.phy_id = 0x0181b8a0,
.name = "Davicom DM9161A",
.phy_id_mask = 0x0ffffff0,
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = dm9161_config_init,
.config_aneg = dm9161_config_aneg,
.ack_interrupt = dm9161_ack_interrupt,
.phy_id = 0x00181b80,
.name = "Davicom DM9131",
.phy_id_mask = 0x0ffffff0,
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.ack_interrupt = dm9161_ack_interrupt,
.config_intr = dm9161_config_intr,
} };
.phy_id = DP83640_PHY_ID,
.phy_id_mask = 0xfffffff0,
.name = "NatSemi DP83640",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.probe = dp83640_probe,
.remove = dp83640_remove,
.soft_reset = dp83640_soft_reset,
{ \
PHY_ID_MATCH_MODEL(_id), \
.name = (_name), \
- .features = PHY_BASIC_FEATURES, \
+ /* PHY_BASIC_FEATURES */ \
.soft_reset = dp83822_phy_reset, \
.config_init = dp83822_config_init, \
.get_wol = dp83822_get_wol, \
.phy_id = _id, \
.phy_id_mask = 0xfffffff0, \
.name = _name, \
- .features = PHY_BASIC_FEATURES, \
+ /* PHY_BASIC_FEATURES */ \
\
.soft_reset = genphy_soft_reset, \
.config_init = _config_init, \
.phy_id = DP83867_PHY_ID,
.phy_id_mask = 0xfffffff0,
.name = "TI DP83867",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = dp83867_config_init,
.soft_reset = dp83867_phy_reset,
.phy_id = DP83TC811_PHY_ID,
.phy_id_mask = 0xfffffff0,
.name = "TI DP83TC811",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = dp83811_config_init,
.config_aneg = dp83811_config_aneg,
.soft_reset = dp83811_phy_reset,
.phy_id = 0x0282f014,
.name = "ET1011C",
.phy_id_mask = 0xfffffff0,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_aneg = et1011c_config_aneg,
.read_status = et1011c_read_status,
} };
.phy_id = 0x02430d80,
.name = "ICPlus IP175C",
.phy_id_mask = 0x0ffffff0,
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = &ip175c_config_init,
.config_aneg = &ip175c_config_aneg,
.read_status = &ip175c_read_status,
.phy_id = 0x02430d90,
.name = "ICPlus IP1001",
.phy_id_mask = 0x0ffffff0,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = &ip1001_config_init,
.suspend = genphy_suspend,
.resume = genphy_resume,
.phy_id = 0x02430c54,
.name = "ICPlus IP101A/G",
.phy_id_mask = 0x0ffffff0,
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.probe = ip101a_g_probe,
.config_intr = ip101a_g_config_intr,
.did_interrupt = ip101a_g_did_interrupt,
.phy_id = PHY_ID_PHY11G_1_3,
.phy_id_mask = 0xffffffff,
.name = "Intel XWAY PHY11G (PEF 7071/PEF 7072) v1.3",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = xway_gphy_config_init,
.config_aneg = xway_gphy14_config_aneg,
.ack_interrupt = xway_gphy_ack_interrupt,
.phy_id = PHY_ID_PHY22F_1_3,
.phy_id_mask = 0xffffffff,
.name = "Intel XWAY PHY22F (PEF 7061) v1.3",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = xway_gphy_config_init,
.config_aneg = xway_gphy14_config_aneg,
.ack_interrupt = xway_gphy_ack_interrupt,
.phy_id = PHY_ID_PHY11G_1_4,
.phy_id_mask = 0xffffffff,
.name = "Intel XWAY PHY11G (PEF 7071/PEF 7072) v1.4",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = xway_gphy_config_init,
.config_aneg = xway_gphy14_config_aneg,
.ack_interrupt = xway_gphy_ack_interrupt,
.phy_id = PHY_ID_PHY22F_1_4,
.phy_id_mask = 0xffffffff,
.name = "Intel XWAY PHY22F (PEF 7061) v1.4",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = xway_gphy_config_init,
.config_aneg = xway_gphy14_config_aneg,
.ack_interrupt = xway_gphy_ack_interrupt,
.phy_id = PHY_ID_PHY11G_1_5,
.phy_id_mask = 0xffffffff,
.name = "Intel XWAY PHY11G (PEF 7071/PEF 7072) v1.5 / v1.6",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = xway_gphy_config_init,
.ack_interrupt = xway_gphy_ack_interrupt,
.did_interrupt = xway_gphy_did_interrupt,
.phy_id = PHY_ID_PHY22F_1_5,
.phy_id_mask = 0xffffffff,
.name = "Intel XWAY PHY22F (PEF 7061) v1.5 / v1.6",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = xway_gphy_config_init,
.ack_interrupt = xway_gphy_ack_interrupt,
.did_interrupt = xway_gphy_did_interrupt,
.phy_id = PHY_ID_PHY11G_VR9_1_1,
.phy_id_mask = 0xffffffff,
.name = "Intel XWAY PHY11G (xRX v1.1 integrated)",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = xway_gphy_config_init,
.ack_interrupt = xway_gphy_ack_interrupt,
.did_interrupt = xway_gphy_did_interrupt,
.phy_id = PHY_ID_PHY22F_VR9_1_1,
.phy_id_mask = 0xffffffff,
.name = "Intel XWAY PHY22F (xRX v1.1 integrated)",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = xway_gphy_config_init,
.ack_interrupt = xway_gphy_ack_interrupt,
.did_interrupt = xway_gphy_did_interrupt,
.phy_id = PHY_ID_PHY11G_VR9_1_2,
.phy_id_mask = 0xffffffff,
.name = "Intel XWAY PHY11G (xRX v1.2 integrated)",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = xway_gphy_config_init,
.ack_interrupt = xway_gphy_ack_interrupt,
.did_interrupt = xway_gphy_did_interrupt,
.phy_id = PHY_ID_PHY22F_VR9_1_2,
.phy_id_mask = 0xffffffff,
.name = "Intel XWAY PHY22F (xRX v1.2 integrated)",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = xway_gphy_config_init,
.ack_interrupt = xway_gphy_ack_interrupt,
.did_interrupt = xway_gphy_did_interrupt,
.phy_id = 0x78100000,
.name = "LXT970",
.phy_id_mask = 0xfffffff0,
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = lxt970_config_init,
.ack_interrupt = lxt970_ack_interrupt,
.config_intr = lxt970_config_intr,
.phy_id = 0x001378e0,
.name = "LXT971",
.phy_id_mask = 0xfffffff0,
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.ack_interrupt = lxt971_ack_interrupt,
.config_intr = lxt971_config_intr,
}, {
.phy_id = 0x00137a10,
.name = "LXT973-A2",
.phy_id_mask = 0xffffffff,
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.flags = 0,
.probe = lxt973_probe,
.config_aneg = lxt973_config_aneg,
.phy_id = 0x00137a10,
.name = "LXT973",
.phy_id_mask = 0xfffffff0,
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.flags = 0,
.probe = lxt973_probe,
.config_aneg = lxt973_config_aneg,
#include <linux/ethtool.h>
#include <linux/phy.h>
#include <linux/marvell_phy.h>
+#include <linux/bitfield.h>
#include <linux/of.h>
#include <linux/io.h>
#define MII_88E1510_TEMP_SENSOR 0x1b
#define MII_88E1510_TEMP_SENSOR_MASK 0xff
+#define MII_88E1540_COPPER_CTRL3 0x1a
+#define MII_88E1540_COPPER_CTRL3_LINK_DOWN_DELAY_MASK GENMASK(11, 10)
+#define MII_88E1540_COPPER_CTRL3_LINK_DOWN_DELAY_00MS 0
+#define MII_88E1540_COPPER_CTRL3_LINK_DOWN_DELAY_10MS 1
+#define MII_88E1540_COPPER_CTRL3_LINK_DOWN_DELAY_20MS 2
+#define MII_88E1540_COPPER_CTRL3_LINK_DOWN_DELAY_40MS 3
+#define MII_88E1540_COPPER_CTRL3_FAST_LINK_DOWN BIT(9)
+
#define MII_88E6390_MISC_TEST 0x1b
#define MII_88E6390_MISC_TEST_SAMPLE_1S 0
#define MII_88E6390_MISC_TEST_SAMPLE_10MS BIT(14)
return 0;
}
+static int m88e1540_get_fld(struct phy_device *phydev, u8 *msecs)
+{
+ int val;
+
+ val = phy_read(phydev, MII_88E1540_COPPER_CTRL3);
+ if (val < 0)
+ return val;
+
+ if (!(val & MII_88E1540_COPPER_CTRL3_FAST_LINK_DOWN)) {
+ *msecs = ETHTOOL_PHY_FAST_LINK_DOWN_OFF;
+ return 0;
+ }
+
+ val = FIELD_GET(MII_88E1540_COPPER_CTRL3_LINK_DOWN_DELAY_MASK, val);
+
+ switch (val) {
+ case MII_88E1540_COPPER_CTRL3_LINK_DOWN_DELAY_00MS:
+ *msecs = 0;
+ break;
+ case MII_88E1540_COPPER_CTRL3_LINK_DOWN_DELAY_10MS:
+ *msecs = 10;
+ break;
+ case MII_88E1540_COPPER_CTRL3_LINK_DOWN_DELAY_20MS:
+ *msecs = 20;
+ break;
+ case MII_88E1540_COPPER_CTRL3_LINK_DOWN_DELAY_40MS:
+ *msecs = 40;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int m88e1540_set_fld(struct phy_device *phydev, const u8 *msecs)
+{
+ struct ethtool_eee eee;
+ int val, ret;
+
+ if (*msecs == ETHTOOL_PHY_FAST_LINK_DOWN_OFF)
+ return phy_clear_bits(phydev, MII_88E1540_COPPER_CTRL3,
+ MII_88E1540_COPPER_CTRL3_FAST_LINK_DOWN);
+
+ /* According to the Marvell data sheet EEE must be disabled for
+ * Fast Link Down detection to work properly
+ */
+ ret = phy_ethtool_get_eee(phydev, &eee);
+ if (!ret && eee.eee_enabled) {
+ phydev_warn(phydev, "Fast Link Down detection requires EEE to be disabled!\n");
+ return -EBUSY;
+ }
+
+ if (*msecs <= 5)
+ val = MII_88E1540_COPPER_CTRL3_LINK_DOWN_DELAY_00MS;
+ else if (*msecs <= 15)
+ val = MII_88E1540_COPPER_CTRL3_LINK_DOWN_DELAY_10MS;
+ else if (*msecs <= 30)
+ val = MII_88E1540_COPPER_CTRL3_LINK_DOWN_DELAY_20MS;
+ else
+ val = MII_88E1540_COPPER_CTRL3_LINK_DOWN_DELAY_40MS;
+
+ val = FIELD_PREP(MII_88E1540_COPPER_CTRL3_LINK_DOWN_DELAY_MASK, val);
+
+ ret = phy_modify(phydev, MII_88E1540_COPPER_CTRL3,
+ MII_88E1540_COPPER_CTRL3_LINK_DOWN_DELAY_MASK, val);
+ if (ret)
+ return ret;
+
+ return phy_set_bits(phydev, MII_88E1540_COPPER_CTRL3,
+ MII_88E1540_COPPER_CTRL3_FAST_LINK_DOWN);
+}
+
+static int m88e1540_get_tunable(struct phy_device *phydev,
+ struct ethtool_tunable *tuna, void *data)
+{
+ switch (tuna->id) {
+ case ETHTOOL_PHY_FAST_LINK_DOWN:
+ return m88e1540_get_fld(phydev, data);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static int m88e1540_set_tunable(struct phy_device *phydev,
+ struct ethtool_tunable *tuna, const void *data)
+{
+ switch (tuna->id) {
+ case ETHTOOL_PHY_FAST_LINK_DOWN:
+ return m88e1540_set_fld(phydev, data);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
/* The VOD can be out of specification on link up. Poke an
* undocumented register, in an undocumented page, with a magic value
* to fix this.
.phy_id = MARVELL_PHY_ID_88E1101,
.phy_id_mask = MARVELL_PHY_ID_MASK,
.name = "Marvell 88E1101",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.probe = marvell_probe,
.config_init = &marvell_config_init,
.config_aneg = &m88e1101_config_aneg,
.phy_id = MARVELL_PHY_ID_88E1112,
.phy_id_mask = MARVELL_PHY_ID_MASK,
.name = "Marvell 88E1112",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.probe = marvell_probe,
.config_init = &m88e1111_config_init,
.config_aneg = &marvell_config_aneg,
.phy_id = MARVELL_PHY_ID_88E1111,
.phy_id_mask = MARVELL_PHY_ID_MASK,
.name = "Marvell 88E1111",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.probe = marvell_probe,
.config_init = &m88e1111_config_init,
.config_aneg = &marvell_config_aneg,
.phy_id = MARVELL_PHY_ID_88E1118,
.phy_id_mask = MARVELL_PHY_ID_MASK,
.name = "Marvell 88E1118",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.probe = marvell_probe,
.config_init = &m88e1118_config_init,
.config_aneg = &m88e1118_config_aneg,
.phy_id = MARVELL_PHY_ID_88E1121R,
.phy_id_mask = MARVELL_PHY_ID_MASK,
.name = "Marvell 88E1121R",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.probe = &m88e1121_probe,
.config_init = &marvell_config_init,
.config_aneg = &m88e1121_config_aneg,
.phy_id = MARVELL_PHY_ID_88E1318S,
.phy_id_mask = MARVELL_PHY_ID_MASK,
.name = "Marvell 88E1318S",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.probe = marvell_probe,
.config_init = &m88e1318_config_init,
.config_aneg = &m88e1318_config_aneg,
.phy_id = MARVELL_PHY_ID_88E1145,
.phy_id_mask = MARVELL_PHY_ID_MASK,
.name = "Marvell 88E1145",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.probe = marvell_probe,
.config_init = &m88e1145_config_init,
.config_aneg = &m88e1101_config_aneg,
.phy_id = MARVELL_PHY_ID_88E1149R,
.phy_id_mask = MARVELL_PHY_ID_MASK,
.name = "Marvell 88E1149R",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.probe = marvell_probe,
.config_init = &m88e1149_config_init,
.config_aneg = &m88e1118_config_aneg,
.phy_id = MARVELL_PHY_ID_88E1240,
.phy_id_mask = MARVELL_PHY_ID_MASK,
.name = "Marvell 88E1240",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.probe = marvell_probe,
.config_init = &m88e1111_config_init,
.config_aneg = &marvell_config_aneg,
.phy_id = MARVELL_PHY_ID_88E1116R,
.phy_id_mask = MARVELL_PHY_ID_MASK,
.name = "Marvell 88E1116R",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.probe = marvell_probe,
.config_init = &m88e1116r_config_init,
.ack_interrupt = &marvell_ack_interrupt,
.phy_id = MARVELL_PHY_ID_88E1540,
.phy_id_mask = MARVELL_PHY_ID_MASK,
.name = "Marvell 88E1540",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.probe = m88e1510_probe,
.config_init = &marvell_config_init,
.config_aneg = &m88e1510_config_aneg,
.get_sset_count = marvell_get_sset_count,
.get_strings = marvell_get_strings,
.get_stats = marvell_get_stats,
+ .get_tunable = m88e1540_get_tunable,
+ .set_tunable = m88e1540_set_tunable,
},
{
.phy_id = MARVELL_PHY_ID_88E1545,
.phy_id_mask = MARVELL_PHY_ID_MASK,
.name = "Marvell 88E1545",
.probe = m88e1510_probe,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = &marvell_config_init,
.config_aneg = &m88e1510_config_aneg,
.read_status = &marvell_read_status,
.phy_id = MARVELL_PHY_ID_88E3016,
.phy_id_mask = MARVELL_PHY_ID_MASK,
.name = "Marvell 88E3016",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.probe = marvell_probe,
.config_init = &m88e3016_config_init,
.aneg_done = &marvell_aneg_done,
.phy_id = MARVELL_PHY_ID_88E6390,
.phy_id_mask = MARVELL_PHY_ID_MASK,
.name = "Marvell 88E6390",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.probe = m88e6390_probe,
.config_init = &marvell_config_init,
.config_aneg = &m88e6390_config_aneg,
.get_sset_count = marvell_get_sset_count,
.get_strings = marvell_get_strings,
.get_stats = marvell_get_stats,
+ .get_tunable = m88e1540_get_tunable,
+ .set_tunable = m88e1540_set_tunable,
},
};
MV_AN_STAT1000 = 0x8001, /* 1000base-T status register */
/* Vendor2 MMD registers */
+ MV_V2_PORT_CTRL = 0xf001,
+ MV_V2_PORT_CTRL_PWRDOWN = 0x0800,
MV_V2_TEMP_CTRL = 0xf08a,
MV_V2_TEMP_CTRL_MASK = 0xc000,
MV_V2_TEMP_CTRL_SAMPLE = 0x0000,
static int mv3310_suspend(struct phy_device *phydev)
{
- return 0;
+ return phy_set_bits_mmd(phydev, MDIO_MMD_VEND2, MV_V2_PORT_CTRL,
+ MV_V2_PORT_CTRL_PWRDOWN);
}
static int mv3310_resume(struct phy_device *phydev)
{
+ int ret;
+
+ ret = phy_clear_bits_mmd(phydev, MDIO_MMD_VEND2, MV_V2_PORT_CTRL,
+ MV_V2_PORT_CTRL_PWRDOWN);
+ if (ret)
+ return ret;
+
return mv3310_hwmon_config(phydev, true);
}
.phy_id = MARVELL_PHY_ID_88E2110,
.phy_id_mask = MARVELL_PHY_ID_MASK,
.name = "mv88x2110",
- .get_features = genphy_c45_pma_read_abilities,
.probe = mv3310_probe,
+ .suspend = mv3310_suspend,
+ .resume = mv3310_resume,
.soft_reset = genphy_no_soft_reset,
.config_init = mv3310_config_init,
.config_aneg = mv3310_config_aneg,
usleep_range(1000, 2000);
} while (--timeout);
- if (!timeout)
- return -ETIMEDOUT;
-
- return 0;
+ return -ETIMEDOUT;
}
static int unimac_mdio_read(struct mii_bus *bus, int phy_id, int reg)
platform_set_drvdata(pdev, priv);
- dev_info(&pdev->dev, "Broadcom UniMAC MDIO bus at 0x%p\n", priv->base);
+ dev_info(&pdev->dev, "Broadcom UniMAC MDIO bus\n");
return 0;
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2019 Baylibre, SAS.
+ * Author: Jerome Brunet <jbrunet@baylibre.com>
+ */
+
+#include <linux/bitfield.h>
+#include <linux/clk.h>
+#include <linux/clk-provider.h>
+#include <linux/device.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/mdio-mux.h>
+#include <linux/module.h>
+#include <linux/phy.h>
+#include <linux/platform_device.h>
+
+#define ETH_PLL_STS 0x40
+#define ETH_PLL_CTL0 0x44
+#define PLL_CTL0_LOCK_DIG BIT(30)
+#define PLL_CTL0_RST BIT(29)
+#define PLL_CTL0_EN BIT(28)
+#define PLL_CTL0_SEL BIT(23)
+#define PLL_CTL0_N GENMASK(14, 10)
+#define PLL_CTL0_M GENMASK(8, 0)
+#define PLL_LOCK_TIMEOUT 1000000
+#define PLL_MUX_NUM_PARENT 2
+#define ETH_PLL_CTL1 0x48
+#define ETH_PLL_CTL2 0x4c
+#define ETH_PLL_CTL3 0x50
+#define ETH_PLL_CTL4 0x54
+#define ETH_PLL_CTL5 0x58
+#define ETH_PLL_CTL6 0x5c
+#define ETH_PLL_CTL7 0x60
+
+#define ETH_PHY_CNTL0 0x80
+#define EPHY_G12A_ID 0x33000180
+#define ETH_PHY_CNTL1 0x84
+#define PHY_CNTL1_ST_MODE GENMASK(2, 0)
+#define PHY_CNTL1_ST_PHYADD GENMASK(7, 3)
+#define EPHY_DFLT_ADD 8
+#define PHY_CNTL1_MII_MODE GENMASK(15, 14)
+#define EPHY_MODE_RMII 0x1
+#define PHY_CNTL1_CLK_EN BIT(16)
+#define PHY_CNTL1_CLKFREQ BIT(17)
+#define PHY_CNTL1_PHY_ENB BIT(18)
+#define ETH_PHY_CNTL2 0x88
+#define PHY_CNTL2_USE_INTERNAL BIT(5)
+#define PHY_CNTL2_SMI_SRC_MAC BIT(6)
+#define PHY_CNTL2_RX_CLK_EPHY BIT(9)
+
+#define MESON_G12A_MDIO_EXTERNAL_ID 0
+#define MESON_G12A_MDIO_INTERNAL_ID 1
+
+struct g12a_mdio_mux {
+ bool pll_is_enabled;
+ void __iomem *regs;
+ void *mux_handle;
+ struct clk *pclk;
+ struct clk *pll;
+};
+
+struct g12a_ephy_pll {
+ void __iomem *base;
+ struct clk_hw hw;
+};
+
+#define g12a_ephy_pll_to_dev(_hw) \
+ container_of(_hw, struct g12a_ephy_pll, hw)
+
+static unsigned long g12a_ephy_pll_recalc_rate(struct clk_hw *hw,
+ unsigned long parent_rate)
+{
+ struct g12a_ephy_pll *pll = g12a_ephy_pll_to_dev(hw);
+ u32 val, m, n;
+
+ val = readl(pll->base + ETH_PLL_CTL0);
+ m = FIELD_GET(PLL_CTL0_M, val);
+ n = FIELD_GET(PLL_CTL0_N, val);
+
+ return parent_rate * m / n;
+}
+
+static int g12a_ephy_pll_enable(struct clk_hw *hw)
+{
+ struct g12a_ephy_pll *pll = g12a_ephy_pll_to_dev(hw);
+ u32 val = readl(pll->base + ETH_PLL_CTL0);
+
+ /* Apply both enable an reset */
+ val |= PLL_CTL0_RST | PLL_CTL0_EN;
+ writel(val, pll->base + ETH_PLL_CTL0);
+
+ /* Clear the reset to let PLL lock */
+ val &= ~PLL_CTL0_RST;
+ writel(val, pll->base + ETH_PLL_CTL0);
+
+ /* Poll on the digital lock instead of the usual analog lock
+ * This is done because bit 31 is unreliable on some SoC. Bit
+ * 31 may indicate that the PLL is not lock eventhough the clock
+ * is actually running
+ */
+ return readl_poll_timeout(pll->base + ETH_PLL_CTL0, val,
+ val & PLL_CTL0_LOCK_DIG, 0, PLL_LOCK_TIMEOUT);
+}
+
+static void g12a_ephy_pll_disable(struct clk_hw *hw)
+{
+ struct g12a_ephy_pll *pll = g12a_ephy_pll_to_dev(hw);
+ u32 val;
+
+ val = readl(pll->base + ETH_PLL_CTL0);
+ val &= ~PLL_CTL0_EN;
+ val |= PLL_CTL0_RST;
+ writel(val, pll->base + ETH_PLL_CTL0);
+}
+
+static int g12a_ephy_pll_is_enabled(struct clk_hw *hw)
+{
+ struct g12a_ephy_pll *pll = g12a_ephy_pll_to_dev(hw);
+ unsigned int val;
+
+ val = readl(pll->base + ETH_PLL_CTL0);
+
+ return (val & PLL_CTL0_LOCK_DIG) ? 1 : 0;
+}
+
+static void g12a_ephy_pll_init(struct clk_hw *hw)
+{
+ struct g12a_ephy_pll *pll = g12a_ephy_pll_to_dev(hw);
+
+ /* Apply PLL HW settings */
+ writel(0x29c0040a, pll->base + ETH_PLL_CTL0);
+ writel(0x927e0000, pll->base + ETH_PLL_CTL1);
+ writel(0xac5f49e5, pll->base + ETH_PLL_CTL2);
+ writel(0x00000000, pll->base + ETH_PLL_CTL3);
+ writel(0x00000000, pll->base + ETH_PLL_CTL4);
+ writel(0x20200000, pll->base + ETH_PLL_CTL5);
+ writel(0x0000c002, pll->base + ETH_PLL_CTL6);
+ writel(0x00000023, pll->base + ETH_PLL_CTL7);
+}
+
+static const struct clk_ops g12a_ephy_pll_ops = {
+ .recalc_rate = g12a_ephy_pll_recalc_rate,
+ .is_enabled = g12a_ephy_pll_is_enabled,
+ .enable = g12a_ephy_pll_enable,
+ .disable = g12a_ephy_pll_disable,
+ .init = g12a_ephy_pll_init,
+};
+
+static int g12a_enable_internal_mdio(struct g12a_mdio_mux *priv)
+{
+ int ret;
+
+ /* Enable the phy clock */
+ if (!priv->pll_is_enabled) {
+ ret = clk_prepare_enable(priv->pll);
+ if (ret)
+ return ret;
+ }
+
+ priv->pll_is_enabled = true;
+
+ /* Initialize ephy control */
+ writel(EPHY_G12A_ID, priv->regs + ETH_PHY_CNTL0);
+ writel(FIELD_PREP(PHY_CNTL1_ST_MODE, 3) |
+ FIELD_PREP(PHY_CNTL1_ST_PHYADD, EPHY_DFLT_ADD) |
+ FIELD_PREP(PHY_CNTL1_MII_MODE, EPHY_MODE_RMII) |
+ PHY_CNTL1_CLK_EN |
+ PHY_CNTL1_CLKFREQ |
+ PHY_CNTL1_PHY_ENB,
+ priv->regs + ETH_PHY_CNTL1);
+ writel(PHY_CNTL2_USE_INTERNAL |
+ PHY_CNTL2_SMI_SRC_MAC |
+ PHY_CNTL2_RX_CLK_EPHY,
+ priv->regs + ETH_PHY_CNTL2);
+
+ return 0;
+}
+
+static int g12a_enable_external_mdio(struct g12a_mdio_mux *priv)
+{
+ /* Reset the mdio bus mux */
+ writel_relaxed(0x0, priv->regs + ETH_PHY_CNTL2);
+
+ /* Disable the phy clock if enabled */
+ if (priv->pll_is_enabled) {
+ clk_disable_unprepare(priv->pll);
+ priv->pll_is_enabled = false;
+ }
+
+ return 0;
+}
+
+static int g12a_mdio_switch_fn(int current_child, int desired_child,
+ void *data)
+{
+ struct g12a_mdio_mux *priv = dev_get_drvdata(data);
+
+ if (current_child == desired_child)
+ return 0;
+
+ switch (desired_child) {
+ case MESON_G12A_MDIO_EXTERNAL_ID:
+ return g12a_enable_external_mdio(priv);
+ case MESON_G12A_MDIO_INTERNAL_ID:
+ return g12a_enable_internal_mdio(priv);
+ default:
+ return -EINVAL;
+ }
+}
+
+static const struct of_device_id g12a_mdio_mux_match[] = {
+ { .compatible = "amlogic,g12a-mdio-mux", },
+ {},
+};
+MODULE_DEVICE_TABLE(of, g12a_mdio_mux_match);
+
+static int g12a_ephy_glue_clk_register(struct device *dev)
+{
+ struct g12a_mdio_mux *priv = dev_get_drvdata(dev);
+ const char *parent_names[PLL_MUX_NUM_PARENT];
+ struct clk_init_data init;
+ struct g12a_ephy_pll *pll;
+ struct clk_mux *mux;
+ struct clk *clk;
+ char *name;
+ int i;
+
+ /* get the mux parents */
+ for (i = 0; i < PLL_MUX_NUM_PARENT; i++) {
+ char in_name[8];
+
+ snprintf(in_name, sizeof(in_name), "clkin%d", i);
+ clk = devm_clk_get(dev, in_name);
+ if (IS_ERR(clk)) {
+ if (PTR_ERR(clk) != -EPROBE_DEFER)
+ dev_err(dev, "Missing clock %s\n", in_name);
+ return PTR_ERR(clk);
+ }
+
+ parent_names[i] = __clk_get_name(clk);
+ }
+
+ /* create the input mux */
+ mux = devm_kzalloc(dev, sizeof(*mux), GFP_KERNEL);
+ if (!mux)
+ return -ENOMEM;
+
+ name = kasprintf(GFP_KERNEL, "%s#mux", dev_name(dev));
+ if (!name)
+ return -ENOMEM;
+
+ init.name = name;
+ init.ops = &clk_mux_ro_ops;
+ init.flags = 0;
+ init.parent_names = parent_names;
+ init.num_parents = PLL_MUX_NUM_PARENT;
+
+ mux->reg = priv->regs + ETH_PLL_CTL0;
+ mux->shift = __ffs(PLL_CTL0_SEL);
+ mux->mask = PLL_CTL0_SEL >> mux->shift;
+ mux->hw.init = &init;
+
+ clk = devm_clk_register(dev, &mux->hw);
+ kfree(name);
+ if (IS_ERR(clk)) {
+ dev_err(dev, "failed to register input mux\n");
+ return PTR_ERR(clk);
+ }
+
+ /* create the pll */
+ pll = devm_kzalloc(dev, sizeof(*pll), GFP_KERNEL);
+ if (!pll)
+ return -ENOMEM;
+
+ name = kasprintf(GFP_KERNEL, "%s#pll", dev_name(dev));
+ if (!name)
+ return -ENOMEM;
+
+ init.name = name;
+ init.ops = &g12a_ephy_pll_ops;
+ init.flags = 0;
+ parent_names[0] = __clk_get_name(clk);
+ init.parent_names = parent_names;
+ init.num_parents = 1;
+
+ pll->base = priv->regs;
+ pll->hw.init = &init;
+
+ clk = devm_clk_register(dev, &pll->hw);
+ kfree(name);
+ if (IS_ERR(clk)) {
+ dev_err(dev, "failed to register input mux\n");
+ return PTR_ERR(clk);
+ }
+
+ priv->pll = clk;
+
+ return 0;
+}
+
+static int g12a_mdio_mux_probe(struct platform_device *pdev)
+{
+ struct device *dev = &pdev->dev;
+ struct g12a_mdio_mux *priv;
+ struct resource *res;
+ int ret;
+
+ priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ platform_set_drvdata(pdev, priv);
+
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ priv->regs = devm_ioremap_resource(dev, res);
+ if (IS_ERR(priv->regs))
+ return PTR_ERR(priv->regs);
+
+ priv->pclk = devm_clk_get(dev, "pclk");
+ if (IS_ERR(priv->pclk)) {
+ ret = PTR_ERR(priv->pclk);
+ if (ret != -EPROBE_DEFER)
+ dev_err(dev, "failed to get peripheral clock\n");
+ return ret;
+ }
+
+ /* Make sure the device registers are clocked */
+ ret = clk_prepare_enable(priv->pclk);
+ if (ret) {
+ dev_err(dev, "failed to enable peripheral clock");
+ return ret;
+ }
+
+ /* Register PLL in CCF */
+ ret = g12a_ephy_glue_clk_register(dev);
+ if (ret)
+ goto err;
+
+ ret = mdio_mux_init(dev, dev->of_node, g12a_mdio_switch_fn,
+ &priv->mux_handle, dev, NULL);
+ if (ret) {
+ if (ret != -EPROBE_DEFER)
+ dev_err(dev, "mdio multiplexer init failed: %d", ret);
+ goto err;
+ }
+
+ return 0;
+
+err:
+ clk_disable_unprepare(priv->pclk);
+ return ret;
+}
+
+static int g12a_mdio_mux_remove(struct platform_device *pdev)
+{
+ struct g12a_mdio_mux *priv = platform_get_drvdata(pdev);
+
+ mdio_mux_uninit(priv->mux_handle);
+
+ if (priv->pll_is_enabled)
+ clk_disable_unprepare(priv->pll);
+
+ clk_disable_unprepare(priv->pclk);
+
+ return 0;
+}
+
+static struct platform_driver g12a_mdio_mux_driver = {
+ .probe = g12a_mdio_mux_probe,
+ .remove = g12a_mdio_mux_remove,
+ .driver = {
+ .name = "g12a-mdio_mux",
+ .of_match_table = g12a_mdio_mux_match,
+ },
+};
+module_platform_driver(g12a_mdio_mux_driver);
+
+MODULE_DESCRIPTION("Amlogic G12a MDIO multiplexer driver");
+MODULE_AUTHOR("Jerome Brunet <jbrunet@baylibre.com>");
+MODULE_LICENSE("GPL v2");
static struct phy_driver meson_gxl_phy[] = {
{
- .phy_id = 0x01814400,
- .phy_id_mask = 0xfffffff0,
+ PHY_ID_MATCH_EXACT(0x01814400),
.name = "Meson GXL Internal PHY",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.flags = PHY_IS_INTERNAL,
.soft_reset = genphy_soft_reset,
.config_init = meson_gxl_config_init,
- .aneg_done = genphy_aneg_done,
.read_status = meson_gxl_read_status,
.ack_interrupt = meson_gxl_ack_interrupt,
.config_intr = meson_gxl_config_intr,
.suspend = genphy_suspend,
.resume = genphy_resume,
+ }, {
+ PHY_ID_MATCH_EXACT(0x01803301),
+ .name = "Meson G12A Internal PHY",
+ /* PHY_BASIC_FEATURES */
+ .flags = PHY_IS_INTERNAL,
+ .soft_reset = genphy_soft_reset,
+ .ack_interrupt = meson_gxl_ack_interrupt,
+ .config_intr = meson_gxl_config_intr,
+ .suspend = genphy_suspend,
+ .resume = genphy_resume,
},
};
static struct mdio_device_id __maybe_unused meson_gxl_tbl[] = {
- { 0x01814400, 0xfffffff0 },
+ { PHY_ID_MATCH_VENDOR(0x01814400) },
+ { PHY_ID_MATCH_VENDOR(0x01803301) },
{ }
};
return 0;
}
+static int ksz9031_get_features(struct phy_device *phydev)
+{
+ int ret;
+
+ ret = genphy_read_abilities(phydev);
+ if (ret < 0)
+ return ret;
+
+ /* Silicon Errata Sheet (DS80000691D or DS80000692D):
+ * Whenever the device's Asymmetric Pause capability is set to 1,
+ * link-up may fail after a link-up to link-down transition.
+ *
+ * Workaround:
+ * Do not enable the Asymmetric Pause capability bit.
+ */
+ linkmode_clear_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, phydev->supported);
+
+ /* We force setting the Pause capability as the core will force the
+ * Asymmetric Pause capability to 1 otherwise.
+ */
+ linkmode_set_bit(ETHTOOL_LINK_MODE_Pause_BIT, phydev->supported);
+
+ return 0;
+}
+
static int ksz9031_read_status(struct phy_device *phydev)
{
int err;
.phy_id = PHY_ID_KS8737,
.phy_id_mask = MICREL_PHY_ID_MASK,
.name = "Micrel KS8737",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.driver_data = &ks8737_type,
.config_init = kszphy_config_init,
.ack_interrupt = kszphy_ack_interrupt,
.phy_id = PHY_ID_KSZ8021,
.phy_id_mask = 0x00ffffff,
.name = "Micrel KSZ8021 or KSZ8031",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.driver_data = &ksz8021_type,
.probe = kszphy_probe,
.config_init = kszphy_config_init,
.phy_id = PHY_ID_KSZ8031,
.phy_id_mask = 0x00ffffff,
.name = "Micrel KSZ8031",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.driver_data = &ksz8021_type,
.probe = kszphy_probe,
.config_init = kszphy_config_init,
.phy_id = PHY_ID_KSZ8041,
.phy_id_mask = MICREL_PHY_ID_MASK,
.name = "Micrel KSZ8041",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.driver_data = &ksz8041_type,
.probe = kszphy_probe,
.config_init = ksz8041_config_init,
.phy_id = PHY_ID_KSZ8041RNLI,
.phy_id_mask = MICREL_PHY_ID_MASK,
.name = "Micrel KSZ8041RNLI",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.driver_data = &ksz8041_type,
.probe = kszphy_probe,
.config_init = kszphy_config_init,
.phy_id = PHY_ID_KSZ8051,
.phy_id_mask = MICREL_PHY_ID_MASK,
.name = "Micrel KSZ8051",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.driver_data = &ksz8051_type,
.probe = kszphy_probe,
.config_init = kszphy_config_init,
.phy_id = PHY_ID_KSZ8001,
.name = "Micrel KSZ8001 or KS8721",
.phy_id_mask = 0x00fffffc,
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.driver_data = &ksz8041_type,
.probe = kszphy_probe,
.config_init = kszphy_config_init,
.phy_id = PHY_ID_KSZ8081,
.name = "Micrel KSZ8081 or KSZ8091",
.phy_id_mask = MICREL_PHY_ID_MASK,
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.driver_data = &ksz8081_type,
.probe = kszphy_probe,
.config_init = kszphy_config_init,
.phy_id = PHY_ID_KSZ8061,
.name = "Micrel KSZ8061",
.phy_id_mask = MICREL_PHY_ID_MASK,
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = ksz8061_config_init,
.ack_interrupt = kszphy_ack_interrupt,
.config_intr = kszphy_config_intr,
.phy_id = PHY_ID_KSZ9021,
.phy_id_mask = 0x000ffffe,
.name = "Micrel KSZ9021 Gigabit PHY",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.driver_data = &ksz9021_type,
.probe = kszphy_probe,
.config_init = ksz9021_config_init,
.phy_id = PHY_ID_KSZ9031,
.phy_id_mask = MICREL_PHY_ID_MASK,
.name = "Micrel KSZ9031 Gigabit PHY",
- .features = PHY_GBIT_FEATURES,
.driver_data = &ksz9021_type,
.probe = kszphy_probe,
+ .get_features = ksz9031_get_features,
.config_init = ksz9031_config_init,
.soft_reset = genphy_soft_reset,
.read_status = ksz9031_read_status,
.phy_id = PHY_ID_KSZ9131,
.phy_id_mask = MICREL_PHY_ID_MASK,
.name = "Microchip KSZ9131 Gigabit PHY",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.driver_data = &ksz9021_type,
.probe = kszphy_probe,
.config_init = ksz9131_config_init,
.phy_id = PHY_ID_KSZ8873MLL,
.phy_id_mask = MICREL_PHY_ID_MASK,
.name = "Micrel KSZ8873MLL Switch",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = kszphy_config_init,
.config_aneg = ksz8873mll_config_aneg,
.read_status = ksz8873mll_read_status,
.phy_id = PHY_ID_KSZ886X,
.phy_id_mask = MICREL_PHY_ID_MASK,
.name = "Micrel KSZ886X Switch",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = kszphy_config_init,
.suspend = genphy_suspend,
.resume = genphy_resume,
.phy_id = PHY_ID_KSZ8795,
.phy_id_mask = MICREL_PHY_ID_MASK,
.name = "Micrel KSZ8795",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = kszphy_config_init,
.config_aneg = ksz8873mll_config_aneg,
.read_status = ksz8873mll_read_status,
.phy_id = PHY_ID_KSZ9477,
.phy_id_mask = MICREL_PHY_ID_MASK,
.name = "Microchip KSZ9477",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = kszphy_config_init,
.suspend = genphy_suspend,
.resume = genphy_resume,
.phy_id_mask = 0xfffffff0,
.name = "Microchip LAN88xx",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.probe = lan88xx_probe,
.remove = lan88xx_remove,
.phy_id = PHY_ID_VSC8530,
.name = "Microsemi FE VSC8530",
.phy_id_mask = 0xfffffff0,
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.soft_reset = &genphy_soft_reset,
.config_init = &vsc85xx_config_init,
.config_aneg = &vsc85xx_config_aneg,
.phy_id = PHY_ID_VSC8531,
.name = "Microsemi VSC8531",
.phy_id_mask = 0xfffffff0,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.soft_reset = &genphy_soft_reset,
.config_init = &vsc85xx_config_init,
.config_aneg = &vsc85xx_config_aneg,
.phy_id = PHY_ID_VSC8540,
.name = "Microsemi FE VSC8540 SyncE",
.phy_id_mask = 0xfffffff0,
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.soft_reset = &genphy_soft_reset,
.config_init = &vsc85xx_config_init,
.config_aneg = &vsc85xx_config_aneg,
.phy_id = PHY_ID_VSC8541,
.name = "Microsemi VSC8541 SyncE",
.phy_id_mask = 0xfffffff0,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.soft_reset = &genphy_soft_reset,
.config_init = &vsc85xx_config_init,
.config_aneg = &vsc85xx_config_aneg,
.phy_id = PHY_ID_VSC8574,
.name = "Microsemi GE VSC8574 SyncE",
.phy_id_mask = 0xfffffff0,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.soft_reset = &genphy_soft_reset,
.config_init = &vsc8584_config_init,
.config_aneg = &vsc85xx_config_aneg,
.phy_id = PHY_ID_VSC8584,
.name = "Microsemi GE VSC8584 SyncE",
.phy_id_mask = 0xfffffff0,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.soft_reset = &genphy_soft_reset,
.config_init = &vsc8584_config_init,
.config_aneg = &vsc85xx_config_aneg,
.phy_id = DP83865_PHY_ID,
.phy_id_mask = 0xfffffff0,
.name = "NatSemi DP83865",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = ns_config_init,
.ack_interrupt = ns_ack_interrupt,
.config_intr = ns_config_intr,
{
int val;
+ val = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_STAT1);
+ if (val < 0)
+ return val;
+
+ if (!(val & MDIO_AN_STAT1_COMPLETE)) {
+ linkmode_clear_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
+ phydev->lp_advertising);
+ mii_10gbt_stat_mod_linkmode_lpa_t(phydev->lp_advertising, 0);
+ mii_adv_mod_linkmode_adv_t(phydev->lp_advertising, 0);
+ phydev->pause = 0;
+ phydev->asym_pause = 0;
+
+ return 0;
+ }
+
+ linkmode_mod_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, phydev->lp_advertising,
+ val & MDIO_AN_STAT1_LPABLE);
+
/* Read the link partner's base page advertisement */
val = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_AN_LPA);
if (val < 0)
return val;
- mii_lpa_mod_linkmode_lpa_t(phydev->lp_advertising, val);
+ mii_adv_mod_linkmode_adv_t(phydev->lp_advertising, val);
phydev->pause = val & LPA_PAUSE_CAP ? 1 : 0;
phydev->asym_pause = val & LPA_PAUSE_ASYM ? 1 : 0;
}
EXPORT_SYMBOL_GPL(gen10g_config_aneg);
-static int gen10g_read_status(struct phy_device *phydev)
-{
- /* For now just lie and say it's 10G all the time */
- phydev->speed = SPEED_10000;
- phydev->duplex = DUPLEX_FULL;
-
- return genphy_c45_read_link(phydev);
-}
-
-struct phy_driver genphy_10g_driver = {
+struct phy_driver genphy_c45_driver = {
.phy_id = 0xffffffff,
.phy_id_mask = 0xffffffff,
- .name = "Generic 10G PHY",
+ .name = "Generic Clause 45 PHY",
.soft_reset = genphy_no_soft_reset,
- .features = PHY_10GBIT_FEATURES,
- .config_aneg = gen10g_config_aneg,
- .read_status = gen10g_read_status,
+ .read_status = genphy_c45_read_status,
};
const char *phy_speed_to_str(int speed)
{
+ BUILD_BUG_ON_MSG(__ETHTOOL_LINK_MODE_MASK_NBITS != 67,
+ "Enum ethtool_link_mode_bit_indices and phylib are out of sync. "
+ "If a speed or mode has been added please update phy_speed_to_str "
+ "and the PHY settings array.\n");
+
switch (speed) {
case SPEED_10:
return "10Mbps";
return "56Gbps";
case SPEED_100000:
return "100Gbps";
+ case SPEED_200000:
+ return "200Gbps";
case SPEED_UNKNOWN:
return "Unknown";
default:
/* A mapping of all SUPPORTED settings to speed/duplex. This table
* must be grouped by speed and sorted in descending match priority
* - iow, descending speed. */
+
+#define PHY_SETTING(s, d, b) { .speed = SPEED_ ## s, .duplex = DUPLEX_ ## d, \
+ .bit = ETHTOOL_LINK_MODE_ ## b ## _BIT}
+
static const struct phy_setting settings[] = {
+ /* 200G */
+ PHY_SETTING( 200000, FULL, 200000baseCR4_Full ),
+ PHY_SETTING( 200000, FULL, 200000baseKR4_Full ),
+ PHY_SETTING( 200000, FULL, 200000baseLR4_ER4_FR4_Full ),
+ PHY_SETTING( 200000, FULL, 200000baseDR4_Full ),
+ PHY_SETTING( 200000, FULL, 200000baseSR4_Full ),
/* 100G */
- {
- .speed = SPEED_100000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_100000baseCR4_Full_BIT,
- },
- {
- .speed = SPEED_100000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT,
- },
- {
- .speed = SPEED_100000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_100000baseLR4_ER4_Full_BIT,
- },
- {
- .speed = SPEED_100000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_100000baseSR4_Full_BIT,
- },
+ PHY_SETTING( 100000, FULL, 100000baseCR4_Full ),
+ PHY_SETTING( 100000, FULL, 100000baseKR4_Full ),
+ PHY_SETTING( 100000, FULL, 100000baseLR4_ER4_Full ),
+ PHY_SETTING( 100000, FULL, 100000baseSR4_Full ),
+ PHY_SETTING( 100000, FULL, 100000baseCR2_Full ),
+ PHY_SETTING( 100000, FULL, 100000baseKR2_Full ),
+ PHY_SETTING( 100000, FULL, 100000baseLR2_ER2_FR2_Full ),
+ PHY_SETTING( 100000, FULL, 100000baseDR2_Full ),
+ PHY_SETTING( 100000, FULL, 100000baseSR2_Full ),
/* 56G */
- {
- .speed = SPEED_56000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_56000baseCR4_Full_BIT,
- },
- {
- .speed = SPEED_56000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_56000baseKR4_Full_BIT,
- },
- {
- .speed = SPEED_56000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_56000baseLR4_Full_BIT,
- },
- {
- .speed = SPEED_56000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_56000baseSR4_Full_BIT,
- },
+ PHY_SETTING( 56000, FULL, 56000baseCR4_Full ),
+ PHY_SETTING( 56000, FULL, 56000baseKR4_Full ),
+ PHY_SETTING( 56000, FULL, 56000baseLR4_Full ),
+ PHY_SETTING( 56000, FULL, 56000baseSR4_Full ),
/* 50G */
- {
- .speed = SPEED_50000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT,
- },
- {
- .speed = SPEED_50000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_50000baseKR2_Full_BIT,
- },
- {
- .speed = SPEED_50000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_50000baseSR2_Full_BIT,
- },
+ PHY_SETTING( 50000, FULL, 50000baseCR2_Full ),
+ PHY_SETTING( 50000, FULL, 50000baseKR2_Full ),
+ PHY_SETTING( 50000, FULL, 50000baseSR2_Full ),
+ PHY_SETTING( 50000, FULL, 50000baseCR_Full ),
+ PHY_SETTING( 50000, FULL, 50000baseKR_Full ),
+ PHY_SETTING( 50000, FULL, 50000baseLR_ER_FR_Full ),
+ PHY_SETTING( 50000, FULL, 50000baseDR_Full ),
+ PHY_SETTING( 50000, FULL, 50000baseSR_Full ),
/* 40G */
- {
- .speed = SPEED_40000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_40000baseCR4_Full_BIT,
- },
- {
- .speed = SPEED_40000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_40000baseKR4_Full_BIT,
- },
- {
- .speed = SPEED_40000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_40000baseLR4_Full_BIT,
- },
- {
- .speed = SPEED_40000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_40000baseSR4_Full_BIT,
- },
+ PHY_SETTING( 40000, FULL, 40000baseCR4_Full ),
+ PHY_SETTING( 40000, FULL, 40000baseKR4_Full ),
+ PHY_SETTING( 40000, FULL, 40000baseLR4_Full ),
+ PHY_SETTING( 40000, FULL, 40000baseSR4_Full ),
/* 25G */
- {
- .speed = SPEED_25000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_25000baseCR_Full_BIT,
- },
- {
- .speed = SPEED_25000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_25000baseKR_Full_BIT,
- },
- {
- .speed = SPEED_25000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_25000baseSR_Full_BIT,
- },
-
+ PHY_SETTING( 25000, FULL, 25000baseCR_Full ),
+ PHY_SETTING( 25000, FULL, 25000baseKR_Full ),
+ PHY_SETTING( 25000, FULL, 25000baseSR_Full ),
/* 20G */
- {
- .speed = SPEED_20000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_20000baseKR2_Full_BIT,
- },
- {
- .speed = SPEED_20000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_20000baseMLD2_Full_BIT,
- },
+ PHY_SETTING( 20000, FULL, 20000baseKR2_Full ),
+ PHY_SETTING( 20000, FULL, 20000baseMLD2_Full ),
/* 10G */
- {
- .speed = SPEED_10000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_10000baseCR_Full_BIT,
- },
- {
- .speed = SPEED_10000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_10000baseER_Full_BIT,
- },
- {
- .speed = SPEED_10000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_10000baseKR_Full_BIT,
- },
- {
- .speed = SPEED_10000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_10000baseKX4_Full_BIT,
- },
- {
- .speed = SPEED_10000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_10000baseLR_Full_BIT,
- },
- {
- .speed = SPEED_10000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_10000baseLRM_Full_BIT,
- },
- {
- .speed = SPEED_10000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_10000baseR_FEC_BIT,
- },
- {
- .speed = SPEED_10000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_10000baseSR_Full_BIT,
- },
- {
- .speed = SPEED_10000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_10000baseT_Full_BIT,
- },
+ PHY_SETTING( 10000, FULL, 10000baseCR_Full ),
+ PHY_SETTING( 10000, FULL, 10000baseER_Full ),
+ PHY_SETTING( 10000, FULL, 10000baseKR_Full ),
+ PHY_SETTING( 10000, FULL, 10000baseKX4_Full ),
+ PHY_SETTING( 10000, FULL, 10000baseLR_Full ),
+ PHY_SETTING( 10000, FULL, 10000baseLRM_Full ),
+ PHY_SETTING( 10000, FULL, 10000baseR_FEC ),
+ PHY_SETTING( 10000, FULL, 10000baseSR_Full ),
+ PHY_SETTING( 10000, FULL, 10000baseT_Full ),
/* 5G */
- {
- .speed = SPEED_5000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_5000baseT_Full_BIT,
- },
-
+ PHY_SETTING( 5000, FULL, 5000baseT_Full ),
/* 2.5G */
- {
- .speed = SPEED_2500,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_2500baseT_Full_BIT,
- },
- {
- .speed = SPEED_2500,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_2500baseX_Full_BIT,
- },
+ PHY_SETTING( 2500, FULL, 2500baseT_Full ),
+ PHY_SETTING( 2500, FULL, 2500baseX_Full ),
/* 1G */
- {
- .speed = SPEED_1000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_1000baseKX_Full_BIT,
- },
- {
- .speed = SPEED_1000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_1000baseT_Full_BIT,
- },
- {
- .speed = SPEED_1000,
- .duplex = DUPLEX_HALF,
- .bit = ETHTOOL_LINK_MODE_1000baseT_Half_BIT,
- },
- {
- .speed = SPEED_1000,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_1000baseX_Full_BIT,
- },
+ PHY_SETTING( 1000, FULL, 1000baseKX_Full ),
+ PHY_SETTING( 1000, FULL, 1000baseT_Full ),
+ PHY_SETTING( 1000, HALF, 1000baseT_Half ),
+ PHY_SETTING( 1000, FULL, 1000baseX_Full ),
/* 100M */
- {
- .speed = SPEED_100,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_100baseT_Full_BIT,
- },
- {
- .speed = SPEED_100,
- .duplex = DUPLEX_HALF,
- .bit = ETHTOOL_LINK_MODE_100baseT_Half_BIT,
- },
+ PHY_SETTING( 100, FULL, 100baseT_Full ),
+ PHY_SETTING( 100, HALF, 100baseT_Half ),
/* 10M */
- {
- .speed = SPEED_10,
- .duplex = DUPLEX_FULL,
- .bit = ETHTOOL_LINK_MODE_10baseT_Full_BIT,
- },
- {
- .speed = SPEED_10,
- .duplex = DUPLEX_HALF,
- .bit = ETHTOOL_LINK_MODE_10baseT_Half_BIT,
- },
+ PHY_SETTING( 10, FULL, 10baseT_Full ),
+ PHY_SETTING( 10, HALF, 10baseT_Half ),
};
+#undef PHY_SETTING
/**
* phy_lookup_setting - lookup a PHY setting
old_state = phydev->state;
- if (phydev->drv && phydev->drv->link_change_notify)
- phydev->drv->link_change_notify(phydev);
-
switch (phydev->state) {
case PHY_DOWN:
case PHY_READY:
if (err < 0)
phy_error(phydev);
- if (old_state != phydev->state)
+ if (old_state != phydev->state) {
phydev_dbg(phydev, "PHY state change %s -> %s\n",
phy_state_to_str(old_state),
phy_state_to_str(phydev->state));
+ if (phydev->drv && phydev->drv->link_change_notify)
+ phydev->drv->link_change_notify(phydev);
+ }
/* Only re-schedule a PHY state machine change if we are polling the
* PHY, if PHY_IGNORE_INTERRUPT is set, then we will be moving
}
static struct phy_driver genphy_driver;
-extern struct phy_driver genphy_10g_driver;
+extern struct phy_driver genphy_c45_driver;
static LIST_HEAD(phy_fixup_list);
static DEFINE_MUTEX(phy_fixup_lock);
*/
if (!d->driver) {
if (phydev->is_c45)
- d->driver = &genphy_10g_driver.mdiodrv.driver;
+ d->driver = &genphy_c45_driver.mdiodrv.driver;
else
d->driver = &genphy_driver.mdiodrv.driver;
bool phy_driver_is_genphy_10g(struct phy_device *phydev)
{
return phy_driver_is_genphy_kind(phydev,
- &genphy_10g_driver.mdiodrv.driver);
+ &genphy_c45_driver.mdiodrv.driver);
}
EXPORT_SYMBOL_GPL(phy_driver_is_genphy_10g);
*/
if (!phy_polling_mode(phydev)) {
status = phy_read(phydev, MII_BMSR);
- if (status < 0) {
+ if (status < 0)
return status;
- } else if (status & BMSR_LSTATUS) {
- phydev->link = 1;
- return 0;
- }
+ else if (status & BMSR_LSTATUS)
+ goto done;
}
/* Read link and autonegotiation status */
status = phy_read(phydev, MII_BMSR);
if (status < 0)
return status;
-
- if ((status & BMSR_LSTATUS) == 0)
- phydev->link = 0;
- else
- phydev->link = 1;
+done:
+ phydev->link = status & BMSR_LSTATUS ? 1 : 0;
+ phydev->autoneg_complete = status & BMSR_ANEGCOMPLETE ? 1 : 0;
return 0;
}
*/
int genphy_read_status(struct phy_device *phydev)
{
- int adv;
- int err;
- int lpa;
- int lpagb = 0;
+ int adv, lpa, lpagb, err;
/* Update the link, but return if there was an error */
err = genphy_update_link(phydev);
if (err)
return err;
+ phydev->speed = SPEED_UNKNOWN;
+ phydev->duplex = DUPLEX_UNKNOWN;
+ phydev->pause = 0;
+ phydev->asym_pause = 0;
+
linkmode_zero(phydev->lp_advertising);
- if (AUTONEG_ENABLE == phydev->autoneg) {
- if (linkmode_test_bit(ETHTOOL_LINK_MODE_1000baseT_Half_BIT,
- phydev->supported) ||
- linkmode_test_bit(ETHTOOL_LINK_MODE_1000baseT_Full_BIT,
- phydev->supported)) {
+ if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
+ if (phydev->is_gigabit_capable) {
lpagb = phy_read(phydev, MII_STAT1000);
if (lpagb < 0)
return lpagb;
return lpa;
mii_lpa_mod_linkmode_lpa_t(phydev->lp_advertising, lpa);
-
- phydev->speed = SPEED_UNKNOWN;
- phydev->duplex = DUPLEX_UNKNOWN;
- phydev->pause = 0;
- phydev->asym_pause = 0;
-
phy_resolve_aneg_linkmode(phydev);
- } else {
+ } else if (phydev->autoneg == AUTONEG_DISABLE) {
int bmcr = phy_read(phydev, MII_BMCR);
if (bmcr < 0)
phydev->speed = SPEED_100;
else
phydev->speed = SPEED_10;
-
- phydev->pause = 0;
- phydev->asym_pause = 0;
}
return 0;
}
EXPORT_SYMBOL(genphy_config_init);
+/**
+ * genphy_read_abilities - read PHY abilities from Clause 22 registers
+ * @phydev: target phy_device struct
+ *
+ * Description: Reads the PHY's abilities and populates
+ * phydev->supported accordingly.
+ *
+ * Returns: 0 on success, < 0 on failure
+ */
+int genphy_read_abilities(struct phy_device *phydev)
+{
+ int val;
+
+ linkmode_set_bit_array(phy_basic_ports_array,
+ ARRAY_SIZE(phy_basic_ports_array),
+ phydev->supported);
+
+ val = phy_read(phydev, MII_BMSR);
+ if (val < 0)
+ return val;
+
+ linkmode_mod_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, phydev->supported,
+ val & BMSR_ANEGCAPABLE);
+
+ linkmode_mod_bit(ETHTOOL_LINK_MODE_100baseT_Full_BIT, phydev->supported,
+ val & BMSR_100FULL);
+ linkmode_mod_bit(ETHTOOL_LINK_MODE_100baseT_Half_BIT, phydev->supported,
+ val & BMSR_100HALF);
+ linkmode_mod_bit(ETHTOOL_LINK_MODE_10baseT_Full_BIT, phydev->supported,
+ val & BMSR_10FULL);
+ linkmode_mod_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT, phydev->supported,
+ val & BMSR_10HALF);
+
+ if (val & BMSR_ESTATEN) {
+ val = phy_read(phydev, MII_ESTATUS);
+ if (val < 0)
+ return val;
+
+ linkmode_mod_bit(ETHTOOL_LINK_MODE_1000baseT_Full_BIT,
+ phydev->supported, val & ESTATUS_1000_TFULL);
+ linkmode_mod_bit(ETHTOOL_LINK_MODE_1000baseT_Half_BIT,
+ phydev->supported, val & ESTATUS_1000_THALF);
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL(genphy_read_abilities);
+
/* This is used for the phy device which doesn't support the MMD extended
* register access, but it does have side effect when we are trying to access
* the MMD register via indirect method.
*/
if (phydrv->features) {
linkmode_copy(phydev->supported, phydrv->features);
- } else {
+ } else if (phydrv->get_features) {
err = phydrv->get_features(phydev);
- if (err)
- goto out;
+ } else if (phydev->is_c45) {
+ err = genphy_c45_pma_read_abilities(phydev);
+ } else {
+ err = genphy_read_abilities(phydev);
}
+ if (err)
+ goto out;
+
+ if (linkmode_test_bit(ETHTOOL_LINK_MODE_1000baseT_Half_BIT,
+ phydev->supported))
+ phydev->is_gigabit_capable = 1;
+ if (linkmode_test_bit(ETHTOOL_LINK_MODE_1000baseT_Full_BIT,
+ phydev->supported))
+ phydev->is_gigabit_capable = 1;
+
of_set_phy_supported(phydev);
linkmode_copy(phydev->advertising, phydev->supported);
int retval;
/* Either the features are hard coded, or dynamically
- * determine. It cannot be both or neither
+ * determined. It cannot be both.
*/
- if (WARN_ON((!new_driver->features && !new_driver->get_features) ||
- (new_driver->features && new_driver->get_features))) {
- pr_err("%s: Driver features are missing\n", new_driver->name);
+ if (WARN_ON(new_driver->features && new_driver->get_features)) {
+ pr_err("%s: features and get_features must not both be set\n",
+ new_driver->name);
return -EINVAL;
}
.phy_id_mask = 0xffffffff,
.name = "Generic PHY",
.soft_reset = genphy_no_soft_reset,
- .config_init = genphy_config_init,
- .features = PHY_GBIT_ALL_PORTS_FEATURES,
+ .get_features = genphy_read_abilities,
.aneg_done = genphy_aneg_done,
.suspend = genphy_suspend,
.resume = genphy_resume,
features_init();
- rc = phy_driver_register(&genphy_10g_driver, THIS_MODULE);
+ rc = phy_driver_register(&genphy_c45_driver, THIS_MODULE);
if (rc)
- goto err_10g;
+ goto err_c45;
rc = phy_driver_register(&genphy_driver, THIS_MODULE);
if (rc) {
- phy_driver_unregister(&genphy_10g_driver);
-err_10g:
+ phy_driver_unregister(&genphy_c45_driver);
+err_c45:
mdio_bus_exit();
}
static void __exit phy_exit(void)
{
- phy_driver_unregister(&genphy_10g_driver);
+ phy_driver_unregister(&genphy_c45_driver);
phy_driver_unregister(&genphy_driver);
mdio_bus_exit();
}
.phy_id = 0x00181440,
.name = "QS6612",
.phy_id_mask = 0xfffffff0,
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = qs6612_config_init,
.ack_interrupt = qs6612_ack_interrupt,
.config_intr = qs6612_config_intr,
static int rtl8211c_config_init(struct phy_device *phydev)
{
/* RTL8211C has an issue when operating in Gigabit slave mode */
- phy_set_bits(phydev, MII_CTRL1000,
- CTL1000_ENABLE_MASTER | CTL1000_AS_MASTER);
-
- return genphy_config_init(phydev);
+ return phy_set_bits(phydev, MII_CTRL1000,
+ CTL1000_ENABLE_MASTER | CTL1000_AS_MASTER);
}
static int rtl8211f_config_init(struct phy_device *phydev)
{
- int ret;
u16 val = 0;
- ret = genphy_config_init(phydev);
- if (ret < 0)
- return ret;
-
/* enable TX-delay for rgmii-id and rgmii-txid, otherwise disable it */
if (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID ||
phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID)
{
int ret;
- ret = genphy_config_init(phydev);
- if (ret < 0)
- return ret;
-
ret = phy_set_bits(phydev, RTL8366RB_POWER_SAVE,
RTL8366RB_POWER_SAVE_ON);
if (ret) {
{
PHY_ID_MATCH_EXACT(0x00008201),
.name = "RTL8201CP Ethernet",
- .features = PHY_BASIC_FEATURES,
}, {
PHY_ID_MATCH_EXACT(0x001cc816),
.name = "RTL8201F Fast Ethernet",
- .features = PHY_BASIC_FEATURES,
.ack_interrupt = &rtl8201_ack_interrupt,
.config_intr = &rtl8201_config_intr,
.suspend = genphy_suspend,
}, {
PHY_ID_MATCH_EXACT(0x001cc910),
.name = "RTL8211 Gigabit Ethernet",
- .features = PHY_GBIT_FEATURES,
.config_aneg = rtl8211_config_aneg,
.read_mmd = &genphy_read_mmd_unsupported,
.write_mmd = &genphy_write_mmd_unsupported,
}, {
PHY_ID_MATCH_EXACT(0x001cc912),
.name = "RTL8211B Gigabit Ethernet",
- .features = PHY_GBIT_FEATURES,
.ack_interrupt = &rtl821x_ack_interrupt,
.config_intr = &rtl8211b_config_intr,
.read_mmd = &genphy_read_mmd_unsupported,
}, {
PHY_ID_MATCH_EXACT(0x001cc913),
.name = "RTL8211C Gigabit Ethernet",
- .features = PHY_GBIT_FEATURES,
.config_init = rtl8211c_config_init,
.read_mmd = &genphy_read_mmd_unsupported,
.write_mmd = &genphy_write_mmd_unsupported,
}, {
PHY_ID_MATCH_EXACT(0x001cc914),
.name = "RTL8211DN Gigabit Ethernet",
- .features = PHY_GBIT_FEATURES,
.ack_interrupt = rtl821x_ack_interrupt,
.config_intr = rtl8211e_config_intr,
.suspend = genphy_suspend,
}, {
PHY_ID_MATCH_EXACT(0x001cc915),
.name = "RTL8211E Gigabit Ethernet",
- .features = PHY_GBIT_FEATURES,
.ack_interrupt = &rtl821x_ack_interrupt,
.config_intr = &rtl8211e_config_intr,
.suspend = genphy_suspend,
}, {
PHY_ID_MATCH_EXACT(0x001cc916),
.name = "RTL8211F Gigabit Ethernet",
- .features = PHY_GBIT_FEATURES,
.config_init = &rtl8211f_config_init,
.ack_interrupt = &rtl8211f_ack_interrupt,
.config_intr = &rtl8211f_config_intr,
}, {
PHY_ID_MATCH_EXACT(0x001cc800),
.name = "Generic Realtek PHY",
- .features = PHY_GBIT_FEATURES,
- .config_init = genphy_config_init,
.suspend = genphy_suspend,
.resume = genphy_resume,
.read_page = rtl821x_read_page,
}, {
PHY_ID_MATCH_EXACT(0x001cc961),
.name = "RTL8366RB Gigabit Ethernet",
- .features = PHY_GBIT_FEATURES,
.config_init = &rtl8366rb_config_init,
/* These interrupts are handled by the irq controller
* embedded inside the RTL8366RB, they get unmasked when the
static void rockchip_link_change_notify(struct phy_device *phydev)
{
- int speed = SPEED_10;
-
- if (phydev->autoneg == AUTONEG_ENABLE) {
- int reg = phy_read(phydev, MII_SPECIAL_CONTROL_STATUS);
-
- if (reg < 0) {
- phydev_err(phydev, "phy_read err: %d.\n", reg);
- return;
- }
-
- if (reg & MII_SPEED_100)
- speed = SPEED_100;
- else if (reg & MII_SPEED_10)
- speed = SPEED_10;
- } else {
- int bmcr = phy_read(phydev, MII_BMCR);
-
- if (bmcr < 0) {
- phydev_err(phydev, "phy_read err: %d.\n", bmcr);
- return;
- }
-
- if (bmcr & BMCR_SPEED100)
- speed = SPEED_100;
- else
- speed = SPEED_10;
- }
-
/*
* If mode switch happens from 10BT to 100BT, all DSP/AFE
* registers are set to default values. So any AFE/DSP
* registers have to be re-initialized in this case.
*/
- if ((phydev->speed == SPEED_10) && (speed == SPEED_100)) {
+ if (phydev->state == PHY_RUNNING && phydev->speed == SPEED_100) {
int ret = rockchip_integrated_phy_analog_init(phydev);
+
if (ret)
phydev_err(phydev, "rockchip_integrated_phy_analog_init err: %d.\n",
ret);
.phy_id = INTERNAL_EPHY_ID,
.phy_id_mask = 0xfffffff0,
.name = "Rockchip integrated EPHY",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.flags = 0,
.link_change_notify = rockchip_link_change_notify,
.soft_reset = genphy_soft_reset,
.phy_id_mask = 0xfffffff0,
.name = "SMSC LAN83C185",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.probe = smsc_phy_probe,
.phy_id_mask = 0xfffffff0,
.name = "SMSC LAN8187",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.probe = smsc_phy_probe,
.phy_id_mask = 0xfffffff0,
.name = "SMSC LAN8700",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.probe = smsc_phy_probe,
.phy_id_mask = 0xfffffff0,
.name = "SMSC LAN911x Internal PHY",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.probe = smsc_phy_probe,
.phy_id_mask = 0xfffffff0,
.name = "SMSC LAN8710/LAN8720",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.flags = PHY_RST_AFTER_CLK_EN,
.probe = smsc_phy_probe,
.phy_id_mask = 0xfffffff0,
.name = "SMSC LAN8740",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.probe = smsc_phy_probe,
.phy_id = STE101P_PHY_ID,
.phy_id_mask = 0xfffffff0,
.name = "STe101p",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = ste10Xp_config_init,
.ack_interrupt = ste10Xp_ack_interrupt,
.config_intr = ste10Xp_config_intr,
.phy_id = STE100P_PHY_ID,
.phy_id_mask = 0xffffffff,
.name = "STe100p",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.config_init = ste10Xp_config_init,
.ack_interrupt = ste10Xp_ack_interrupt,
.config_intr = ste10Xp_config_intr,
.phy_id = UPD60620_PHY_ID,
.phy_id_mask = 0xfffffffe,
.name = "Renesas uPD60620",
- .features = PHY_BASIC_FEATURES,
+ /* PHY_BASIC_FEATURES */
.flags = 0,
.config_init = upd60620_config_init,
.read_status = upd60620_read_status,
.phy_id = PHY_ID_VSC8234,
.name = "Vitesse VSC8234",
.phy_id_mask = 0x000ffff0,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = &vsc824x_config_init,
.config_aneg = &vsc82x4_config_aneg,
.ack_interrupt = &vsc824x_ack_interrupt,
.phy_id = PHY_ID_VSC8244,
.name = "Vitesse VSC8244",
.phy_id_mask = 0x000fffc0,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = &vsc824x_config_init,
.config_aneg = &vsc82x4_config_aneg,
.ack_interrupt = &vsc824x_ack_interrupt,
.phy_id = PHY_ID_VSC8572,
.name = "Vitesse VSC8572",
.phy_id_mask = 0x000ffff0,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = &vsc824x_config_init,
.config_aneg = &vsc82x4_config_aneg,
.ack_interrupt = &vsc824x_ack_interrupt,
.phy_id = PHY_ID_VSC8601,
.name = "Vitesse VSC8601",
.phy_id_mask = 0x000ffff0,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = &vsc8601_config_init,
.ack_interrupt = &vsc824x_ack_interrupt,
.config_intr = &vsc82xx_config_intr,
.phy_id = PHY_ID_VSC7385,
.name = "Vitesse VSC7385",
.phy_id_mask = 0x000ffff0,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = vsc738x_config_init,
.config_aneg = vsc73xx_config_aneg,
.read_page = vsc73xx_read_page,
.phy_id = PHY_ID_VSC7388,
.name = "Vitesse VSC7388",
.phy_id_mask = 0x000ffff0,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = vsc738x_config_init,
.config_aneg = vsc73xx_config_aneg,
.read_page = vsc73xx_read_page,
.phy_id = PHY_ID_VSC7395,
.name = "Vitesse VSC7395",
.phy_id_mask = 0x000ffff0,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = vsc739x_config_init,
.config_aneg = vsc73xx_config_aneg,
.read_page = vsc73xx_read_page,
.phy_id = PHY_ID_VSC7398,
.name = "Vitesse VSC7398",
.phy_id_mask = 0x000ffff0,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = vsc739x_config_init,
.config_aneg = vsc73xx_config_aneg,
.read_page = vsc73xx_read_page,
.phy_id = PHY_ID_VSC8662,
.name = "Vitesse VSC8662",
.phy_id_mask = 0x000ffff0,
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = &vsc824x_config_init,
.config_aneg = &vsc82x4_config_aneg,
.ack_interrupt = &vsc824x_ack_interrupt,
.phy_id = PHY_ID_VSC8221,
.phy_id_mask = 0x000ffff0,
.name = "Vitesse VSC8221",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = &vsc8221_config_init,
.ack_interrupt = &vsc824x_ack_interrupt,
.config_intr = &vsc82xx_config_intr,
.phy_id = PHY_ID_VSC8211,
.phy_id_mask = 0x000ffff0,
.name = "Vitesse VSC8211",
- .features = PHY_GBIT_FEATURES,
+ /* PHY_GBIT_FEATURES */
.config_init = &vsc8221_config_init,
.ack_interrupt = &vsc824x_ack_interrupt,
.config_intr = &vsc82xx_config_intr,
* Helpers
**********/
-#define team_port_exists(dev) (dev->priv_flags & IFF_TEAM_PORT)
-
static struct team_port *team_port_get_rtnl(const struct net_device *dev)
{
struct team_port *port = rtnl_dereference(dev->rx_handler_data);
- return team_port_exists(dev) ? port : NULL;
+ return netif_is_team_port(dev) ? port : NULL;
}
/*
return -EINVAL;
}
- if (team_port_exists(port_dev)) {
+ if (netif_is_team_port(port_dev)) {
NL_SET_ERR_MSG(extack, "Device is already a port of a team device");
netdev_err(dev, "Device %s is already a port "
"of a team device\n", portname);
}
static u16 team_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
/*
* This helper function exists to help dev_pick_tx get the correct
{
.cmd = TEAM_CMD_NOOP,
.doit = team_nl_cmd_noop,
- .policy = team_nl_policy,
},
{
.cmd = TEAM_CMD_OPTIONS_SET,
.doit = team_nl_cmd_options_set,
- .policy = team_nl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = TEAM_CMD_OPTIONS_GET,
.doit = team_nl_cmd_options_get,
- .policy = team_nl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = TEAM_CMD_PORT_LIST_GET,
.doit = team_nl_cmd_port_list_get,
- .policy = team_nl_policy,
.flags = GENL_ADMIN_PERM,
},
};
.name = TEAM_GENL_NAME,
.version = TEAM_GENL_VERSION,
.maxattr = TEAM_ATTR_MAX,
+ .policy = team_nl_policy,
.netnsok = true,
.module = THIS_MODULE,
.ops = team_nl_ops,
}
static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
struct tun_struct *tun = netdev_priv(dev);
u16 ret;
static void tun_automq_xmit(struct tun_struct *tun, struct sk_buff *skb)
{
#ifdef CONFIG_RPS
- if (tun->numqueues == 1 && static_key_false(&rps_needed)) {
+ if (tun->numqueues == 1 && static_branch_unlikely(&rps_needed)) {
/* Select queue was not called for the skbuff, so we extract the
* RPS hash and save it into the flow_table here.
*/
return err;
}
-static void tun_get_iff(struct net *net, struct tun_struct *tun,
- struct ifreq *ifr)
+static void tun_get_iff(struct tun_struct *tun, struct ifreq *ifr)
{
tun_debug(KERN_INFO, tun, "tun_get_iff\n");
tun_debug(KERN_INFO, tun, "tun_chr_ioctl cmd %u\n", cmd);
+ net = dev_net(tun->dev);
ret = 0;
switch (cmd) {
case TUNGETIFF:
- tun_get_iff(current->nsproxy->net_ns, tun, &ifr);
+ tun_get_iff(tun, &ifr);
if (tfile->detached)
ifr.ifr_flags |= IFF_DETACH_QUEUE;
ret = tun_net_change_carrier(tun->dev, (bool)carrier);
break;
+ case TUNGETDEVNETNS:
+ ret = -EPERM;
+ if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+ goto unlock;
+ ret = open_related_ns(&net->ns, get_net_ns);
+ break;
+
default:
ret = -EINVAL;
break;
rtnl_lock();
tun = tun_get(tfile);
if (tun)
- tun_get_iff(current->nsproxy->net_ns, tun, &ifr);
+ tun_get_iff(tun, &ifr);
rtnl_unlock();
if (tun)
#include <linux/usb/cdc_ncm.h>
#include <net/ipv6.h>
#include <net/addrconf.h>
+#include <net/ipv6_stubs.h>
/* alternative VLAN for IP session 0 if not untagged */
#define MBIM_IPS0_VID 4094
enum qmi_wwan_quirks {
QMI_WWAN_QUIRK_DTR = 1 << 0, /* needs "set DTR" request */
+ QMI_WWAN_QUIRK_QUECTEL_DYNCFG = 1 << 1, /* check num. endpoints */
};
struct qmimux_hdr {
.data = QMI_WWAN_QUIRK_DTR,
};
+static const struct driver_info qmi_wwan_info_quirk_quectel_dyncfg = {
+ .description = "WWAN/QMI device",
+ .flags = FLAG_WWAN | FLAG_SEND_ZLP,
+ .bind = qmi_wwan_bind,
+ .unbind = qmi_wwan_unbind,
+ .manage_power = qmi_wwan_manage_power,
+ .rx_fixup = qmi_wwan_rx_fixup,
+ .data = QMI_WWAN_QUIRK_DTR | QMI_WWAN_QUIRK_QUECTEL_DYNCFG,
+};
+
#define HUAWEI_VENDOR_ID 0x12D1
/* map QMI/wwan function by a fixed interface number */
#define QMI_GOBI_DEVICE(vend, prod) \
QMI_FIXED_INTF(vend, prod, 0)
+/* Quectel does not use fixed interface numbers on at least some of their
+ * devices. We need to check the number of endpoints to ensure that we bind to
+ * the correct interface.
+ */
+#define QMI_QUIRK_QUECTEL_DYNCFG(vend, prod) \
+ USB_DEVICE_AND_INTERFACE_INFO(vend, prod, USB_CLASS_VENDOR_SPEC, \
+ USB_SUBCLASS_VENDOR_SPEC, 0xff), \
+ .driver_info = (unsigned long)&qmi_wwan_info_quirk_quectel_dyncfg
+
static const struct usb_device_id products[] = {
/* 1. CDC ECM like devices match on the control interface */
{ /* Huawei E392, E398 and possibly others sharing both device id and more... */
USB_DEVICE_AND_INTERFACE_INFO(0x03f0, 0x581d, USB_CLASS_VENDOR_SPEC, 1, 7),
.driver_info = (unsigned long)&qmi_wwan_info,
},
- { /* Quectel EP06/EG06/EM06 */
- USB_DEVICE_AND_INTERFACE_INFO(0x2c7c, 0x0306,
- USB_CLASS_VENDOR_SPEC,
- USB_SUBCLASS_VENDOR_SPEC,
- 0xff),
- .driver_info = (unsigned long)&qmi_wwan_info_quirk_dtr,
- },
- { /* Quectel EG12/EM12 */
- USB_DEVICE_AND_INTERFACE_INFO(0x2c7c, 0x0512,
- USB_CLASS_VENDOR_SPEC,
- USB_SUBCLASS_VENDOR_SPEC,
- 0xff),
- .driver_info = (unsigned long)&qmi_wwan_info_quirk_dtr,
- },
+ {QMI_QUIRK_QUECTEL_DYNCFG(0x2c7c, 0x0125)}, /* Quectel EC25, EC20 R2.0 Mini PCIe */
+ {QMI_QUIRK_QUECTEL_DYNCFG(0x2c7c, 0x0306)}, /* Quectel EP06/EG06/EM06 */
+ {QMI_QUIRK_QUECTEL_DYNCFG(0x2c7c, 0x0512)}, /* Quectel EG12/EM12 */
/* 3. Combined interface devices matching on interface number */
{QMI_FIXED_INTF(0x0408, 0xea42, 4)}, /* Yota / Megafon M100-1 */
{QMI_FIXED_INTF(0x03f0, 0x9d1d, 1)}, /* HP lt4120 Snapdragon X5 LTE */
{QMI_FIXED_INTF(0x22de, 0x9061, 3)}, /* WeTelecom WPD-600N */
{QMI_QUIRK_SET_DTR(0x1e0e, 0x9001, 5)}, /* SIMCom 7100E, 7230E, 7600E ++ */
- {QMI_QUIRK_SET_DTR(0x2c7c, 0x0125, 4)}, /* Quectel EC25, EC20 R2.0 Mini PCIe */
{QMI_QUIRK_SET_DTR(0x2c7c, 0x0121, 4)}, /* Quectel EC21 Mini PCIe */
{QMI_QUIRK_SET_DTR(0x2c7c, 0x0191, 4)}, /* Quectel EG91 */
{QMI_FIXED_INTF(0x2c7c, 0x0296, 4)}, /* Quectel BG96 */
return false;
}
-static bool quectel_diag_detected(struct usb_interface *intf)
-{
- struct usb_device *dev = interface_to_usbdev(intf);
- struct usb_interface_descriptor intf_desc = intf->cur_altsetting->desc;
- u16 id_vendor = le16_to_cpu(dev->descriptor.idVendor);
- u16 id_product = le16_to_cpu(dev->descriptor.idProduct);
-
- if (id_vendor != 0x2c7c || intf_desc.bNumEndpoints != 2)
- return false;
-
- if (id_product == 0x0306 || id_product == 0x0512)
- return true;
- else
- return false;
-}
-
static int qmi_wwan_probe(struct usb_interface *intf,
const struct usb_device_id *prod)
{
struct usb_device_id *id = (struct usb_device_id *)prod;
struct usb_interface_descriptor *desc = &intf->cur_altsetting->desc;
+ const struct driver_info *info;
/* Workaround to enable dynamic IDs. This disables usbnet
* blacklisting functionality. Which, if required, can be
* we need to match on class/subclass/protocol. These values are
* identical for the diagnostic- and QMI-interface, but bNumEndpoints is
* different. Ignore the current interface if the number of endpoints
- * the number for the diag interface (two).
+ * equals the number for the diag interface (two).
*/
- if (quectel_diag_detected(intf))
- return -ENODEV;
+ info = (void *)&id->driver_info;
+
+ if (info->data & QMI_WWAN_QUIRK_QUECTEL_DYNCFG) {
+ if (desc->bNumEndpoints == 2)
+ return -ENODEV;
+ }
return usbnet_probe(intf, id);
}
goto amacout;
}
memcpy(sa->sa_data, buf, 6);
- ether_addr_copy(tp->netdev->dev_addr, sa->sa_data);
netif_info(tp, probe, tp->netdev,
"Using pass-thru MAC addr %pM\n", sa->sa_data);
return ret;
}
-static int set_ethernet_addr(struct r8152 *tp)
+static int determine_ethernet_addr(struct r8152 *tp, struct sockaddr *sa)
{
struct net_device *dev = tp->netdev;
- struct sockaddr sa;
int ret;
if (tp->version == RTL_VER_01) {
- ret = pla_ocp_read(tp, PLA_IDR, 8, sa.sa_data);
+ ret = pla_ocp_read(tp, PLA_IDR, 8, sa->sa_data);
} else {
/* if device doesn't support MAC pass through this will
* be expected to be non-zero
*/
- ret = vendor_mac_passthru_addr_read(tp, &sa);
+ ret = vendor_mac_passthru_addr_read(tp, sa);
if (ret < 0)
- ret = pla_ocp_read(tp, PLA_BACKUP, 8, sa.sa_data);
+ ret = pla_ocp_read(tp, PLA_BACKUP, 8, sa->sa_data);
}
if (ret < 0) {
netif_err(tp, probe, dev, "Get ether addr fail\n");
- } else if (!is_valid_ether_addr(sa.sa_data)) {
+ } else if (!is_valid_ether_addr(sa->sa_data)) {
netif_err(tp, probe, dev, "Invalid ether addr %pM\n",
- sa.sa_data);
+ sa->sa_data);
eth_hw_addr_random(dev);
- ether_addr_copy(sa.sa_data, dev->dev_addr);
- ret = rtl8152_set_mac_address(dev, &sa);
+ ether_addr_copy(sa->sa_data, dev->dev_addr);
netif_info(tp, probe, dev, "Random ether addr %pM\n",
- sa.sa_data);
- } else {
- if (tp->version == RTL_VER_01)
- ether_addr_copy(dev->dev_addr, sa.sa_data);
- else
- ret = rtl8152_set_mac_address(dev, &sa);
+ sa->sa_data);
+ return 0;
}
return ret;
}
+static int set_ethernet_addr(struct r8152 *tp)
+{
+ struct net_device *dev = tp->netdev;
+ struct sockaddr sa;
+ int ret;
+
+ ret = determine_ethernet_addr(tp, &sa);
+ if (ret < 0)
+ return ret;
+
+ if (tp->version == RTL_VER_01)
+ ether_addr_copy(dev->dev_addr, sa.sa_data);
+ else
+ ret = rtl8152_set_mac_address(dev, &sa);
+
+ return ret;
+}
+
static void read_bulk_callback(struct urb *urb)
{
struct net_device *netdev;
{
struct r8152 *tp = usb_get_intfdata(intf);
struct net_device *netdev;
+ struct sockaddr sa;
if (!tp)
return 0;
+ /* reset the MAC adddress in case of policy change */
+ if (determine_ethernet_addr(tp, &sa) >= 0) {
+ rtnl_lock();
+ dev_set_mac_address (tp->netdev, &sa, NULL);
+ rtnl_unlock();
+ }
+
netdev = tp->netdev;
if (!netif_running(netdev))
return 0;
}
}
-static int veth_get_ts_info(struct net_device *dev,
- struct ethtool_ts_info *info)
-{
- info->so_timestamping =
- SOF_TIMESTAMPING_TX_SOFTWARE |
- SOF_TIMESTAMPING_RX_SOFTWARE |
- SOF_TIMESTAMPING_SOFTWARE;
- info->phc_index = -1;
-
- return 0;
-}
-
static const struct ethtool_ops veth_ethtool_ops = {
.get_drvinfo = veth_get_drvinfo,
.get_link = ethtool_op_get_link,
.get_sset_count = veth_get_sset_count,
.get_ethtool_stats = veth_get_ethtool_stats,
.get_link_ksettings = veth_get_link_ksettings,
- .get_ts_info = veth_get_ts_info,
+ .get_ts_info = ethtool_op_get_ts_info,
};
/* general routines */
#include <linux/average.h>
#include <linux/filter.h>
#include <linux/kernel.h>
-#include <linux/pci.h>
#include <net/route.h>
#include <net/xdp.h>
#include <net/net_failover.h>
struct send_queue *sq = &vi->sq[qnum];
int err;
struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
- bool kick = !skb->xmit_more;
+ bool kick = !netdev_xmit_more();
bool use_napi = sq->napi.weight;
/* Free up any pending old buffers before queueing new ones. */
dev->stats.tx_fifo_errors++;
if (net_ratelimit())
dev_warn(&dev->dev,
- "Unexpected TXQ (%d) queue failure: %d\n", qnum, err);
+ "Unexpected TXQ (%d) queue failure: %d\n",
+ qnum, err);
dev->stats.tx_dropped++;
dev_kfree_skb_any(skb);
return NETDEV_TX_OK;
return 0;
}
-static void virtnet_clean_affinity(struct virtnet_info *vi, long hcpu)
+static void virtnet_clean_affinity(struct virtnet_info *vi)
{
int i;
int stride;
if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
- virtnet_clean_affinity(vi, -1);
+ virtnet_clean_affinity(vi);
return;
}
struct virtnet_info *vi = hlist_entry_safe(node, struct virtnet_info,
node);
- virtnet_clean_affinity(vi, cpu);
+ virtnet_clean_affinity(vi);
return 0;
}
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_GUEST_OFFLOADS,
VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET, &sg)) {
- dev_warn(&vi->dev->dev, "Fail to set guest offload. \n");
+ dev_warn(&vi->dev->dev, "Fail to set guest offload.\n");
return -EINVAL;
}
{
struct virtio_device *vdev = vi->vdev;
- virtnet_clean_affinity(vi, -1);
+ virtnet_clean_affinity(vi);
vdev->config->del_vqs(vdev);
/* Should never trigger: MTU was previously validated
* in virtnet_validate.
*/
- dev_err(&vdev->dev, "device MTU appears to have changed "
- "it is now %d < %d", mtu, dev->min_mtu);
+ dev_err(&vdev->dev,
+ "device MTU appears to have changed it is now %d < %d",
+ mtu, dev->min_mtu);
goto free;
}
neigh = __neigh_create(&nd_tbl, nexthop, dst->dev, false);
if (!IS_ERR(neigh)) {
sock_confirm_neigh(skb, neigh);
- ret = neigh_output(neigh, skb);
+ ret = neigh_output(neigh, skb, false);
rcu_read_unlock_bh();
return ret;
}
struct net_device *dev = dst->dev;
unsigned int hh_len = LL_RESERVED_SPACE(dev);
struct neighbour *neigh;
- u32 nexthop;
+ bool is_v6gw = false;
int ret = -EINVAL;
nf_reset(skb);
rcu_read_lock_bh();
- nexthop = (__force u32)rt_nexthop(rt, ip_hdr(skb)->daddr);
- neigh = __ipv4_neigh_lookup_noref(dev, nexthop);
- if (unlikely(!neigh))
- neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
+ neigh = ip_neigh_for_gw(rt, skb, &is_v6gw);
if (!IS_ERR(neigh)) {
sock_confirm_neigh(skb, neigh);
- ret = neigh_output(neigh, skb);
+ /* if crossing protocols, can not use the cached header */
+ ret = neigh_output(neigh, skb, is_v6gw);
rcu_read_unlock_bh();
return ret;
}
#include <linux/ethtool.h>
#include <net/arp.h>
#include <net/ndisc.h>
+#include <net/ipv6_stubs.h>
#include <net/ip.h>
#include <net/icmp.h>
#include <net/rtnetlink.h>
static const struct genl_ops hwsim_ops[] = {
{
.cmd = HWSIM_CMD_REGISTER,
- .policy = hwsim_genl_policy,
.doit = hwsim_register_received_nl,
.flags = GENL_UNS_ADMIN_PERM,
},
{
.cmd = HWSIM_CMD_FRAME,
- .policy = hwsim_genl_policy,
.doit = hwsim_cloned_frame_received_nl,
},
{
.cmd = HWSIM_CMD_TX_INFO_FRAME,
- .policy = hwsim_genl_policy,
.doit = hwsim_tx_info_frame_received_nl,
},
{
.cmd = HWSIM_CMD_NEW_RADIO,
- .policy = hwsim_genl_policy,
.doit = hwsim_new_radio_nl,
.flags = GENL_UNS_ADMIN_PERM,
},
{
.cmd = HWSIM_CMD_DEL_RADIO,
- .policy = hwsim_genl_policy,
.doit = hwsim_del_radio_nl,
.flags = GENL_UNS_ADMIN_PERM,
},
{
.cmd = HWSIM_CMD_GET_RADIO,
- .policy = hwsim_genl_policy,
.doit = hwsim_get_radio_nl,
.dumpit = hwsim_dump_radio_nl,
},
.name = "MAC80211_HWSIM",
.version = 1,
.maxattr = HWSIM_ATTR_MAX,
+ .policy = hwsim_genl_policy,
.netnsok = true,
.module = THIS_MODULE,
.ops = hwsim_ops,
static u16
mwifiex_netdev_select_wmm_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
skb->priority = cfg80211_classify8021d(skb, NULL);
return mwifiex_1d_to_wmm_queue[skb->priority];
struct xenvif_hash_cache cache;
};
+struct backend_info {
+ struct xenbus_device *dev;
+ struct xenvif *vif;
+
+ /* This is the state that will be reflected in xenstore when any
+ * active hotplug script completes.
+ */
+ enum xenbus_state state;
+
+ enum xenbus_state frontend_state;
+ struct xenbus_watch hotplug_status_watch;
+ u8 have_hotplug_status_watch:1;
+
+ const char *hotplug_script;
+};
+
struct xenvif {
/* Unique identifier for this interface. */
domid_t domid;
struct xenbus_watch credit_watch;
struct xenbus_watch mcast_ctrl_watch;
+ struct backend_info *be;
+
spinlock_t lock;
#ifdef CONFIG_DEBUG_FS
}
static u16 xenvif_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
struct xenvif *vif = netdev_priv(dev);
unsigned int size = vif->hash.size;
return 0;
if (vif->hash.alg == XEN_NETIF_CTRL_HASH_ALGORITHM_NONE)
- return fallback(dev, skb, NULL) % dev->real_num_tx_queues;
+ return netdev_pick_tx(dev, skb, NULL) %
+ dev->real_num_tx_queues;
xenvif_set_skb_hash(vif, skb);
#include <linux/vmalloc.h>
#include <linux/rtnetlink.h>
-struct backend_info {
- struct xenbus_device *dev;
- struct xenvif *vif;
-
- /* This is the state that will be reflected in xenstore when any
- * active hotplug script completes.
- */
- enum xenbus_state state;
-
- enum xenbus_state frontend_state;
- struct xenbus_watch hotplug_status_watch;
- u8 have_hotplug_status_watch:1;
-
- const char *hotplug_script;
-};
-
static int connect_data_rings(struct backend_info *be,
struct xenvif_queue *queue);
static void connect(struct backend_info *be);
return err;
}
be->vif = vif;
+ vif->be = be;
kobject_uevent(&dev->dev.kobj, KOBJ_ONLINE);
return 0;
}
static u16 xennet_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
unsigned int num_queues = dev->real_num_tx_queues;
u32 hash;
case XenbusStateClosed:
if (dev->state == XenbusStateClosed)
break;
- /* Missed the backend's CLOSING state -- fallthrough */
+ /* Fall through - Missed the backend's CLOSING state. */
case XenbusStateClosing:
xenbus_frontend_closed(dev);
break;
#ifndef __QETH_CORE_H__
#define __QETH_CORE_H__
+#include <linux/completion.h>
#include <linux/if.h>
#include <linux/if_arp.h>
#include <linux/etherdevice.h>
#include <linux/hashtable.h>
#include <linux/ip.h>
#include <linux/refcount.h>
+#include <linux/wait.h>
#include <linux/workqueue.h>
#include <net/ipv6.h>
/* QDIO queue and buffer handling */
/*****************************************************************************/
#define QETH_MAX_QUEUES 4
+#define QETH_IQD_MIN_TXQ 2 /* One for ucast, one for mcast. */
+#define QETH_IQD_MCAST_TXQ 0
+#define QETH_IQD_MIN_UCAST_TXQ 1
#define QETH_IN_BUF_SIZE_DEFAULT 65536
#define QETH_IN_BUF_COUNT_DEFAULT 64
#define QETH_IN_BUF_COUNT_HSDEFAULT 128
u64 rx_errors;
u64 rx_dropped;
u64 rx_multicast;
- u64 tx_errors;
};
struct qeth_out_q_stats {
u64 skbs_linearized_fail;
u64 tso_bytes;
u64 packing_mode_switch;
+ u64 stopped;
/* rtnl_link_stats64 */
u64 tx_packets;
atomic_t set_pci_flags_count;
};
+static inline bool qeth_out_queue_is_full(struct qeth_qdio_out_q *queue)
+{
+ return atomic_read(&queue->used_buffers) >= QDIO_MAX_BUFFERS_PER_Q;
+}
+
struct qeth_qdio_info {
atomic_t state;
/* input */
enum qeth_channel_states {
CH_STATE_UP,
CH_STATE_DOWN,
- CH_STATE_ACTIVATING,
CH_STATE_HALTED,
CH_STATE_STOPPED,
CH_STATE_RCD,
enum qeth_cmd_buffer_state state;
struct qeth_channel *channel;
struct qeth_reply *reply;
+ long timeout;
unsigned char *data;
+ void (*finalize)(struct qeth_card *card, struct qeth_cmd_buffer *iob,
+ unsigned int length);
void (*callback)(struct qeth_card *card, struct qeth_channel *channel,
struct qeth_cmd_buffer *iob);
};
int io_buf_no;
};
+static inline bool qeth_trylock_channel(struct qeth_channel *channel)
+{
+ return atomic_cmpxchg(&channel->irq_pending, 0, 1) == 0;
+}
+
/**
* OSA card related definitions
*/
struct qeth_reply {
struct list_head list;
- wait_queue_head_t wait_q;
+ struct completion received;
int (*callback)(struct qeth_card *, struct qeth_reply *,
unsigned long);
u32 seqno;
unsigned long offset;
- atomic_t received;
int rc;
void *param;
refcount_t refcnt;
struct qeth_card_options options;
struct workqueue_struct *event_wq;
+ struct workqueue_struct *cmd_wq;
wait_queue_head_t wait_q;
- spinlock_t mclock;
unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
DECLARE_HASHTABLE(mac_htable, 4);
DECLARE_HASHTABLE(ip_htable, 4);
+ struct mutex ip_lock;
DECLARE_HASHTABLE(ip_mc_htable, 4);
+ struct work_struct rx_mode_work;
struct work_struct kernel_thread_starter;
spinlock_t thread_mask_lock;
unsigned long thread_start_mask;
unsigned long thread_allowed_mask;
unsigned long thread_running_mask;
- spinlock_t ip_lock;
struct qeth_ipato ipato;
struct list_head cmd_waiter_list;
/* QDIO buffer handling */
return dev->netdev_ops != NULL;
}
+static inline u16 qeth_iqd_translate_txq(struct net_device *dev, u16 txq)
+{
+ if (txq == QETH_IQD_MCAST_TXQ)
+ return dev->num_tx_queues - 1;
+ if (txq == dev->num_tx_queues - 1)
+ return QETH_IQD_MCAST_TXQ;
+ return txq;
+}
+
static inline void qeth_scrub_qdio_buffer(struct qdio_buffer *buf,
unsigned int elements)
{
data, QETH_PROT_IPV6);
}
-int qeth_get_priority_queue(struct qeth_card *card, struct sk_buff *skb,
- int ipv);
-static inline struct qeth_qdio_out_q *qeth_get_tx_queue(struct qeth_card *card,
- struct sk_buff *skb,
- int ipv, int cast_type)
-{
- if (IS_IQD(card) && cast_type != RTN_UNICAST)
- return card->qdio.out_qs[card->qdio.no_out_queues - 1];
- if (!card->qdio.do_prio_queueing)
- return card->qdio.out_qs[card->qdio.default_out_queue];
- return card->qdio.out_qs[qeth_get_priority_queue(card, skb, ipv)];
-}
+int qeth_get_priority_queue(struct qeth_card *card, struct sk_buff *skb);
extern struct qeth_discipline qeth_l2_discipline;
extern struct qeth_discipline qeth_l3_discipline;
int qeth_qdio_clear_card(struct qeth_card *, int);
void qeth_clear_working_pool_list(struct qeth_card *);
void qeth_clear_cmd_buffers(struct qeth_channel *);
-void qeth_clear_qdio_buffers(struct qeth_card *);
+void qeth_drain_output_queues(struct qeth_card *card);
void qeth_setadp_promisc_mode(struct qeth_card *);
int qeth_setadpparms_change_macaddr(struct qeth_card *);
void qeth_tx_timeout(struct net_device *);
-void qeth_prepare_control_data(struct qeth_card *, int,
- struct qeth_cmd_buffer *);
void qeth_release_buffer(struct qeth_channel *, struct qeth_cmd_buffer *);
void qeth_prepare_ipa_cmd(struct qeth_card *card, struct qeth_cmd_buffer *iob,
u16 cmd_length);
struct net_device *dev,
netdev_features_t features);
void qeth_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats);
+u16 qeth_iqd_select_queue(struct net_device *dev, struct sk_buff *skb,
+ u8 cast_type, struct net_device *sb_dev);
int qeth_open(struct net_device *dev);
int qeth_stop(struct net_device *dev);
static struct device *qeth_core_root_dev;
static struct lock_class_key qdio_out_skb_queue_key;
-static void qeth_send_control_data_cb(struct qeth_card *card,
- struct qeth_channel *channel,
- struct qeth_cmd_buffer *iob);
+static void qeth_issue_next_read_cb(struct qeth_card *card,
+ struct qeth_channel *channel,
+ struct qeth_cmd_buffer *iob);
static struct qeth_cmd_buffer *qeth_get_buffer(struct qeth_channel *);
static void qeth_free_buffer_pool(struct qeth_card *);
static int qeth_qdio_establish(struct qeth_card *);
-static void qeth_free_qdio_buffers(struct qeth_card *);
+static void qeth_free_qdio_queues(struct qeth_card *card);
static void qeth_notify_skbs(struct qeth_qdio_out_q *queue,
struct qeth_qdio_out_buffer *buf,
enum iucv_tx_notify notification);
CARD_DEVID(card));
return -ENOMEM;
}
+
qeth_setup_ccw(channel->ccw, CCW_CMD_READ, QETH_BUFSIZE, iob->data);
+ iob->callback = qeth_issue_next_read_cb;
QETH_CARD_TEXT(card, 6, "noirqpnd");
rc = ccw_device_start(channel->ccwdev, channel->ccw,
(addr_t) iob, 0, 0);
{
struct qeth_reply *reply;
- reply = kzalloc(sizeof(struct qeth_reply), GFP_ATOMIC);
+ reply = kzalloc(sizeof(*reply), GFP_KERNEL);
if (reply) {
refcount_set(&reply->refcnt, 1);
- atomic_set(&reply->received, 0);
- init_waitqueue_head(&reply->wait_q);
+ init_completion(&reply->received);
}
return reply;
}
spin_unlock_irq(&card->lock);
}
-static void qeth_notify_reply(struct qeth_reply *reply)
+static void qeth_notify_reply(struct qeth_reply *reply, int reason)
{
- atomic_inc(&reply->received);
- wake_up(&reply->wait_q);
+ reply->rc = reason;
+ complete(&reply->received);
}
static void qeth_issue_ipa_msg(struct qeth_ipa_cmd *cmd, int rc,
QETH_CARD_TEXT(card, 4, "clipalst");
spin_lock_irqsave(&card->lock, flags);
- list_for_each_entry(reply, &card->cmd_waiter_list, list) {
- reply->rc = -EIO;
- qeth_notify_reply(reply);
- }
+ list_for_each_entry(reply, &card->cmd_waiter_list, list)
+ qeth_notify_reply(reply, -EIO);
spin_unlock_irqrestore(&card->lock, flags);
}
EXPORT_SYMBOL_GPL(qeth_clear_ipacmd_list);
static int qeth_check_idx_response(struct qeth_card *card,
unsigned char *buffer)
{
- if (!buffer)
- return 0;
-
QETH_DBF_HEX(CTRL, 2, buffer, QETH_DBF_CTRL_LEN);
if ((buffer[2] & 0xc0) == 0xc0) {
QETH_DBF_MESSAGE(2, "received an IDX TERMINATE with cause code %#04x\n",
do {
if (channel->iob[index].state == BUF_STATE_FREE) {
channel->iob[index].state = BUF_STATE_LOCKED;
+ channel->iob[index].timeout = QETH_TIMEOUT;
channel->io_buf_no = (channel->io_buf_no + 1) %
QETH_CMD_BUFFER_NO;
memset(channel->iob[index].data, 0, QETH_BUFSIZE);
spin_lock_irqsave(&channel->iob_lock, flags);
iob->state = BUF_STATE_FREE;
- iob->callback = qeth_send_control_data_cb;
+ iob->callback = NULL;
if (iob->reply) {
qeth_put_reply(iob->reply);
iob->reply = NULL;
{
struct qeth_reply *reply = iob->reply;
- if (reply) {
- reply->rc = rc;
- qeth_notify_reply(reply);
- }
+ if (reply)
+ qeth_notify_reply(reply, rc);
qeth_release_buffer(iob->channel, iob);
}
}
EXPORT_SYMBOL_GPL(qeth_clear_cmd_buffers);
-static void qeth_send_control_data_cb(struct qeth_card *card,
- struct qeth_channel *channel,
- struct qeth_cmd_buffer *iob)
+static void qeth_issue_next_read_cb(struct qeth_card *card,
+ struct qeth_channel *channel,
+ struct qeth_cmd_buffer *iob)
{
struct qeth_ipa_cmd *cmd = NULL;
struct qeth_reply *reply = NULL;
}
}
- if (rc <= 0) {
- reply->rc = rc;
- qeth_notify_reply(reply);
- }
-
+ if (rc <= 0)
+ qeth_notify_reply(reply, rc);
qeth_put_reply(reply);
out:
atomic_set(&buf->state, QETH_QDIO_BUF_EMPTY);
}
-static void qeth_clear_outq_buffers(struct qeth_qdio_out_q *q, int free)
+static void qeth_drain_output_queue(struct qeth_qdio_out_q *q, bool free)
{
int j;
}
}
-void qeth_clear_qdio_buffers(struct qeth_card *card)
+void qeth_drain_output_queues(struct qeth_card *card)
{
int i;
QETH_CARD_TEXT(card, 2, "clearqdbf");
/* clear outbound buffers to free skbs */
for (i = 0; i < card->qdio.no_out_queues; ++i) {
- if (card->qdio.out_qs[i]) {
- qeth_clear_outq_buffers(card->qdio.out_qs[i], 0);
- }
+ if (card->qdio.out_qs[i])
+ qeth_drain_output_queue(card->qdio.out_qs[i], false);
}
}
-EXPORT_SYMBOL_GPL(qeth_clear_qdio_buffers);
+EXPORT_SYMBOL_GPL(qeth_drain_output_queues);
static void qeth_free_buffer_pool(struct qeth_card *card)
{
break;
channel->iob[cnt].state = BUF_STATE_FREE;
channel->iob[cnt].channel = channel;
- channel->iob[cnt].callback = qeth_send_control_data_cb;
}
if (cnt < QETH_CMD_BUFFER_NO) {
qeth_clean_channel(channel);
return 0;
}
-static void qeth_set_single_write_queues(struct qeth_card *card)
+static void qeth_osa_set_output_queues(struct qeth_card *card, bool single)
{
- if ((atomic_read(&card->qdio.state) != QETH_QDIO_UNINITIALIZED) &&
- (card->qdio.no_out_queues == 4))
- qeth_free_qdio_buffers(card);
+ unsigned int count = single ? 1 : card->dev->num_tx_queues;
- card->qdio.no_out_queues = 1;
- if (card->qdio.default_out_queue != 0)
- dev_info(&card->gdev->dev, "Priority Queueing not supported\n");
+ rtnl_lock();
+ netif_set_real_num_tx_queues(card->dev, count);
+ rtnl_unlock();
- card->qdio.default_out_queue = 0;
-}
+ if (card->qdio.no_out_queues == count)
+ return;
-static void qeth_set_multiple_write_queues(struct qeth_card *card)
-{
- if ((atomic_read(&card->qdio.state) != QETH_QDIO_UNINITIALIZED) &&
- (card->qdio.no_out_queues == 1)) {
- qeth_free_qdio_buffers(card);
- card->qdio.default_out_queue = 2;
- }
- card->qdio.no_out_queues = 4;
+ if (atomic_read(&card->qdio.state) != QETH_QDIO_UNINITIALIZED)
+ qeth_free_qdio_queues(card);
+
+ if (count == 1)
+ dev_info(&card->gdev->dev, "Priority Queueing not supported\n");
+
+ card->qdio.default_out_queue = single ? 0 : QETH_DEFAULT_QUEUE;
+ card->qdio.no_out_queues = count;
}
-static void qeth_update_from_chp_desc(struct qeth_card *card)
+static int qeth_update_from_chp_desc(struct qeth_card *card)
{
struct ccw_device *ccwdev;
struct channel_path_desc_fmt0 *chp_dsc;
ccwdev = card->data.ccwdev;
chp_dsc = ccw_device_get_chp_desc(ccwdev, 0);
if (!chp_dsc)
- goto out;
+ return -ENOMEM;
card->info.func_level = 0x4100 + chp_dsc->desc;
- if (card->info.type == QETH_CARD_TYPE_IQD)
- goto out;
- /* CHPP field bit 6 == 1 -> single queue */
- if ((chp_dsc->chpp & 0x02) == 0x02)
- qeth_set_single_write_queues(card);
- else
- qeth_set_multiple_write_queues(card);
-out:
+ if (IS_OSD(card) || IS_OSX(card))
+ /* CHPP field bit 6 == 1 -> single queue */
+ qeth_osa_set_output_queues(card, chp_dsc->chpp & 0x02);
+
kfree(chp_dsc);
QETH_DBF_TEXT_(SETUP, 2, "nr:%x", card->qdio.no_out_queues);
QETH_DBF_TEXT_(SETUP, 2, "lvl:%02x", card->info.func_level);
+ return 0;
}
static void qeth_init_qdio_info(struct qeth_card *card)
atomic_set(&card->qdio.state, QETH_QDIO_UNINITIALIZED);
card->qdio.do_prio_queueing = QETH_PRIOQ_DEFAULT;
card->qdio.default_out_queue = QETH_DEFAULT_QUEUE;
- card->qdio.no_out_queues = QETH_MAX_QUEUES;
/* inbound */
card->qdio.no_in_queues = 1;
card->info.type = CARD_RDEV(card)->id.driver_info;
card->state = CARD_STATE_DOWN;
- spin_lock_init(&card->mclock);
spin_lock_init(&card->lock);
- spin_lock_init(&card->ip_lock);
spin_lock_init(&card->thread_mask_lock);
mutex_init(&card->conf_mutex);
mutex_init(&card->discipline_mutex);
CARD_WDEV(card) = gdev->cdev[1];
CARD_DDEV(card) = gdev->cdev[2];
- card->event_wq = alloc_ordered_workqueue("%s", 0, dev_name(&gdev->dev));
+ card->event_wq = alloc_ordered_workqueue("%s_event", 0,
+ dev_name(&gdev->dev));
if (!card->event_wq)
goto out_wq;
if (qeth_setup_channel(&card->read, true))
}
}
-static int qeth_idx_activate_get_answer(struct qeth_card *card,
- struct qeth_channel *channel,
- void (*reply_cb)(struct qeth_card *,
- struct qeth_channel *,
- struct qeth_cmd_buffer *))
-{
- struct qeth_cmd_buffer *iob;
- int rc;
-
- QETH_DBF_TEXT(SETUP, 2, "idxanswr");
- iob = qeth_get_buffer(channel);
- if (!iob)
- return -ENOMEM;
- iob->callback = reply_cb;
- qeth_setup_ccw(channel->ccw, CCW_CMD_READ, QETH_BUFSIZE, iob->data);
-
- wait_event(card->wait_q,
- atomic_cmpxchg(&channel->irq_pending, 0, 1) == 0);
- QETH_DBF_TEXT(SETUP, 6, "noirqpnd");
- spin_lock_irq(get_ccwdev_lock(channel->ccwdev));
- rc = ccw_device_start_timeout(channel->ccwdev, channel->ccw,
- (addr_t) iob, 0, 0, QETH_TIMEOUT);
- spin_unlock_irq(get_ccwdev_lock(channel->ccwdev));
-
- if (rc) {
- QETH_DBF_MESSAGE(2, "Error2 in activating channel rc=%d\n", rc);
- QETH_DBF_TEXT_(SETUP, 2, "2err%d", rc);
- atomic_set(&channel->irq_pending, 0);
- qeth_release_buffer(channel, iob);
- wake_up(&card->wait_q);
- return rc;
- }
- rc = wait_event_interruptible_timeout(card->wait_q,
- channel->state == CH_STATE_UP, QETH_TIMEOUT);
- if (rc == -ERESTARTSYS)
- return rc;
- if (channel->state != CH_STATE_UP) {
- rc = -ETIME;
- QETH_DBF_TEXT_(SETUP, 2, "3err%d", rc);
- } else
- rc = 0;
- return rc;
-}
-
-static int qeth_idx_activate_channel(struct qeth_card *card,
- struct qeth_channel *channel,
- void (*reply_cb)(struct qeth_card *,
- struct qeth_channel *,
- struct qeth_cmd_buffer *))
+static void qeth_idx_finalize_cmd(struct qeth_card *card,
+ struct qeth_cmd_buffer *iob,
+ unsigned int length)
{
- struct qeth_cmd_buffer *iob;
- __u16 temp;
- __u8 tmp;
- int rc;
- struct ccw_dev_id temp_devid;
-
- QETH_DBF_TEXT(SETUP, 2, "idxactch");
+ qeth_setup_ccw(iob->channel->ccw, CCW_CMD_WRITE, length, iob->data);
- iob = qeth_get_buffer(channel);
- if (!iob)
- return -ENOMEM;
- iob->callback = reply_cb;
- qeth_setup_ccw(channel->ccw, CCW_CMD_WRITE, IDX_ACTIVATE_SIZE,
- iob->data);
- if (channel == &card->write) {
- memcpy(iob->data, IDX_ACTIVATE_WRITE, IDX_ACTIVATE_SIZE);
- memcpy(QETH_TRANSPORT_HEADER_SEQ_NO(iob->data),
- &card->seqno.trans_hdr, QETH_SEQ_NO_LENGTH);
+ memcpy(QETH_TRANSPORT_HEADER_SEQ_NO(iob->data), &card->seqno.trans_hdr,
+ QETH_SEQ_NO_LENGTH);
+ if (iob->channel == &card->write)
card->seqno.trans_hdr++;
- } else {
- memcpy(iob->data, IDX_ACTIVATE_READ, IDX_ACTIVATE_SIZE);
- memcpy(QETH_TRANSPORT_HEADER_SEQ_NO(iob->data),
- &card->seqno.trans_hdr, QETH_SEQ_NO_LENGTH);
- }
- tmp = ((u8)card->dev->dev_port) | 0x80;
- memcpy(QETH_IDX_ACT_PNO(iob->data), &tmp, 1);
- memcpy(QETH_IDX_ACT_ISSUER_RM_TOKEN(iob->data),
- &card->token.issuer_rm_w, QETH_MPC_TOKEN_LENGTH);
- memcpy(QETH_IDX_ACT_FUNC_LEVEL(iob->data),
- &card->info.func_level, sizeof(__u16));
- ccw_device_get_id(CARD_DDEV(card), &temp_devid);
- memcpy(QETH_IDX_ACT_QDIO_DEV_CUA(iob->data), &temp_devid.devno, 2);
- temp = (card->info.cula << 8) + card->info.unit_addr2;
- memcpy(QETH_IDX_ACT_QDIO_DEV_REALADDR(iob->data), &temp, 2);
-
- wait_event(card->wait_q,
- atomic_cmpxchg(&channel->irq_pending, 0, 1) == 0);
- QETH_DBF_TEXT(SETUP, 6, "noirqpnd");
- spin_lock_irq(get_ccwdev_lock(channel->ccwdev));
- rc = ccw_device_start_timeout(channel->ccwdev, channel->ccw,
- (addr_t) iob, 0, 0, QETH_TIMEOUT);
- spin_unlock_irq(get_ccwdev_lock(channel->ccwdev));
-
- if (rc) {
- QETH_DBF_MESSAGE(2, "Error1 in activating channel. rc=%d\n",
- rc);
- QETH_DBF_TEXT_(SETUP, 2, "1err%d", rc);
- atomic_set(&channel->irq_pending, 0);
- qeth_release_buffer(channel, iob);
- wake_up(&card->wait_q);
- return rc;
- }
- rc = wait_event_interruptible_timeout(card->wait_q,
- channel->state == CH_STATE_ACTIVATING, QETH_TIMEOUT);
- if (rc == -ERESTARTSYS)
- return rc;
- if (channel->state != CH_STATE_ACTIVATING) {
- dev_warn(&channel->ccwdev->dev, "The qeth device driver"
- " failed to recover an error on the device\n");
- QETH_DBF_MESSAGE(2, "IDX activate timed out on channel %x\n",
- CCW_DEVID(channel->ccwdev));
- QETH_DBF_TEXT_(SETUP, 2, "2err%d", -ETIME);
- return -ETIME;
- }
- return qeth_idx_activate_get_answer(card, channel, reply_cb);
}
static int qeth_peer_func_level(int level)
return level;
}
-static void qeth_idx_write_cb(struct qeth_card *card,
- struct qeth_channel *channel,
- struct qeth_cmd_buffer *iob)
-{
- __u16 temp;
-
- QETH_DBF_TEXT(SETUP , 2, "idxwrcb");
-
- if (channel->state == CH_STATE_DOWN) {
- channel->state = CH_STATE_ACTIVATING;
- goto out;
- }
-
- if (!(QETH_IS_IDX_ACT_POS_REPLY(iob->data))) {
- if (QETH_IDX_ACT_CAUSE_CODE(iob->data) == QETH_IDX_ACT_ERR_EXCL)
- dev_err(&channel->ccwdev->dev,
- "The adapter is used exclusively by another "
- "host\n");
- else
- QETH_DBF_MESSAGE(2, "IDX_ACTIVATE on channel %x: negative reply\n",
- CCW_DEVID(channel->ccwdev));
- goto out;
- }
- memcpy(&temp, QETH_IDX_ACT_FUNC_LEVEL(iob->data), 2);
- if ((temp & ~0x0100) != qeth_peer_func_level(card->info.func_level)) {
- QETH_DBF_MESSAGE(2, "IDX_ACTIVATE on channel %x: function level mismatch (sent: %#x, received: %#x)\n",
- CCW_DEVID(channel->ccwdev),
- card->info.func_level, temp);
- goto out;
- }
- channel->state = CH_STATE_UP;
-out:
- qeth_release_buffer(channel, iob);
-}
-
-static void qeth_idx_read_cb(struct qeth_card *card,
- struct qeth_channel *channel,
- struct qeth_cmd_buffer *iob)
-{
- __u16 temp;
-
- QETH_DBF_TEXT(SETUP , 2, "idxrdcb");
- if (channel->state == CH_STATE_DOWN) {
- channel->state = CH_STATE_ACTIVATING;
- goto out;
- }
-
- if (qeth_check_idx_response(card, iob->data))
- goto out;
-
- if (!(QETH_IS_IDX_ACT_POS_REPLY(iob->data))) {
- switch (QETH_IDX_ACT_CAUSE_CODE(iob->data)) {
- case QETH_IDX_ACT_ERR_EXCL:
- dev_err(&channel->ccwdev->dev,
- "The adapter is used exclusively by another "
- "host\n");
- break;
- case QETH_IDX_ACT_ERR_AUTH:
- case QETH_IDX_ACT_ERR_AUTH_USER:
- dev_err(&channel->ccwdev->dev,
- "Setting the device online failed because of "
- "insufficient authorization\n");
- break;
- default:
- QETH_DBF_MESSAGE(2, "IDX_ACTIVATE on channel %x: negative reply\n",
- CCW_DEVID(channel->ccwdev));
- }
- QETH_CARD_TEXT_(card, 2, "idxread%c",
- QETH_IDX_ACT_CAUSE_CODE(iob->data));
- goto out;
- }
-
- memcpy(&temp, QETH_IDX_ACT_FUNC_LEVEL(iob->data), 2);
- if (temp != qeth_peer_func_level(card->info.func_level)) {
- QETH_DBF_MESSAGE(2, "IDX_ACTIVATE on channel %x: function level mismatch (sent: %#x, received: %#x)\n",
- CCW_DEVID(channel->ccwdev),
- card->info.func_level, temp);
- goto out;
- }
- memcpy(&card->token.issuer_rm_r,
- QETH_IDX_ACT_ISSUER_RM_TOKEN(iob->data),
- QETH_MPC_TOKEN_LENGTH);
- memcpy(&card->info.mcl_level[0],
- QETH_IDX_REPLY_LEVEL(iob->data), QETH_MCL_LENGTH);
- channel->state = CH_STATE_UP;
-out:
- qeth_release_buffer(channel, iob);
-}
-
-void qeth_prepare_control_data(struct qeth_card *card, int len,
- struct qeth_cmd_buffer *iob)
+static void qeth_mpc_finalize_cmd(struct qeth_card *card,
+ struct qeth_cmd_buffer *iob,
+ unsigned int length)
{
- qeth_setup_ccw(iob->channel->ccw, CCW_CMD_WRITE, len, iob->data);
- iob->callback = qeth_release_buffer_cb;
+ qeth_idx_finalize_cmd(card, iob, length);
- memcpy(QETH_TRANSPORT_HEADER_SEQ_NO(iob->data),
- &card->seqno.trans_hdr, QETH_SEQ_NO_LENGTH);
- card->seqno.trans_hdr++;
memcpy(QETH_PDU_HEADER_SEQ_NO(iob->data),
&card->seqno.pdu_hdr, QETH_SEQ_NO_LENGTH);
card->seqno.pdu_hdr++;
memcpy(QETH_PDU_HEADER_ACK_SEQ_NO(iob->data),
&card->seqno.pdu_hdr_ack, QETH_SEQ_NO_LENGTH);
- QETH_DBF_HEX(CTRL, 2, iob->data, min(len, QETH_DBF_CTRL_LEN));
+
+ iob->reply->seqno = QETH_IDX_COMMAND_SEQNO;
+ iob->callback = qeth_release_buffer_cb;
}
-EXPORT_SYMBOL_GPL(qeth_prepare_control_data);
/**
* qeth_send_control_data() - send control command to the card
void *reply_param)
{
struct qeth_channel *channel = iob->channel;
+ long timeout = iob->timeout;
int rc;
struct qeth_reply *reply = NULL;
- unsigned long timeout, event_timeout;
- struct qeth_ipa_cmd *cmd = NULL;
QETH_CARD_TEXT(card, 2, "sendctl");
- if (card->read_or_write_problem) {
- qeth_release_buffer(channel, iob);
- return -EIO;
- }
reply = qeth_alloc_reply(card);
if (!reply) {
qeth_release_buffer(channel, iob);
qeth_get_reply(reply);
iob->reply = reply;
- while (atomic_cmpxchg(&channel->irq_pending, 0, 1)) ;
-
- if (IS_IPA(iob->data)) {
- cmd = __ipa_cmd(iob);
- cmd->hdr.seqno = card->seqno.ipa++;
- reply->seqno = cmd->hdr.seqno;
- event_timeout = QETH_IPA_TIMEOUT;
- } else {
- reply->seqno = QETH_IDX_COMMAND_SEQNO;
- event_timeout = QETH_TIMEOUT;
+ timeout = wait_event_interruptible_timeout(card->wait_q,
+ qeth_trylock_channel(channel),
+ timeout);
+ if (timeout <= 0) {
+ qeth_put_reply(reply);
+ qeth_release_buffer(channel, iob);
+ return (timeout == -ERESTARTSYS) ? -EINTR : -ETIME;
}
- qeth_prepare_control_data(card, len, iob);
- qeth_enqueue_reply(card, reply);
+ iob->finalize(card, iob, len);
+ QETH_DBF_HEX(CTRL, 2, iob->data, min(len, QETH_DBF_CTRL_LEN));
- timeout = jiffies + event_timeout;
+ qeth_enqueue_reply(card, reply);
QETH_CARD_TEXT(card, 6, "noirqpnd");
spin_lock_irq(get_ccwdev_lock(channel->ccwdev));
rc = ccw_device_start_timeout(channel->ccwdev, channel->ccw,
- (addr_t) iob, 0, 0, event_timeout);
+ (addr_t) iob, 0, 0, timeout);
spin_unlock_irq(get_ccwdev_lock(channel->ccwdev));
if (rc) {
QETH_DBF_MESSAGE(2, "qeth_send_control_data on device %x: ccw_device_start rc = %i\n",
return rc;
}
- /* we have only one long running ipassist, since we can ensure
- process context of this command we can sleep */
- if (cmd && cmd->hdr.command == IPA_CMD_SETIP &&
- cmd->hdr.prot_version == QETH_PROT_IPV4) {
- if (!wait_event_timeout(reply->wait_q,
- atomic_read(&reply->received), event_timeout))
- goto time_err;
- } else {
- while (!atomic_read(&reply->received)) {
- if (time_after(jiffies, timeout))
- goto time_err;
- cpu_relax();
- }
- }
+ timeout = wait_for_completion_interruptible_timeout(&reply->received,
+ timeout);
+ if (timeout <= 0)
+ rc = (timeout == -ERESTARTSYS) ? -EINTR : -ETIME;
qeth_dequeue_reply(card, reply);
- rc = reply->rc;
+ if (!rc)
+ rc = reply->rc;
qeth_put_reply(reply);
return rc;
+}
-time_err:
- qeth_dequeue_reply(card, reply);
- qeth_put_reply(reply);
- return -ETIME;
+static int qeth_idx_check_activate_response(struct qeth_card *card,
+ struct qeth_channel *channel,
+ struct qeth_cmd_buffer *iob)
+{
+ int rc;
+
+ rc = qeth_check_idx_response(card, iob->data);
+ if (rc)
+ return rc;
+
+ if (QETH_IS_IDX_ACT_POS_REPLY(iob->data))
+ return 0;
+
+ /* negative reply: */
+ QETH_DBF_TEXT_(SETUP, 2, "idxneg%c",
+ QETH_IDX_ACT_CAUSE_CODE(iob->data));
+
+ switch (QETH_IDX_ACT_CAUSE_CODE(iob->data)) {
+ case QETH_IDX_ACT_ERR_EXCL:
+ dev_err(&channel->ccwdev->dev,
+ "The adapter is used exclusively by another host\n");
+ return -EBUSY;
+ case QETH_IDX_ACT_ERR_AUTH:
+ case QETH_IDX_ACT_ERR_AUTH_USER:
+ dev_err(&channel->ccwdev->dev,
+ "Setting the device online failed because of insufficient authorization\n");
+ return -EPERM;
+ default:
+ QETH_DBF_MESSAGE(2, "IDX_ACTIVATE on channel %x: negative reply\n",
+ CCW_DEVID(channel->ccwdev));
+ return -EIO;
+ }
+}
+
+static void qeth_idx_query_read_cb(struct qeth_card *card,
+ struct qeth_channel *channel,
+ struct qeth_cmd_buffer *iob)
+{
+ u16 peer_level;
+ int rc;
+
+ QETH_DBF_TEXT(SETUP, 2, "idxrdcb");
+
+ rc = qeth_idx_check_activate_response(card, channel, iob);
+ if (rc)
+ goto out;
+
+ memcpy(&peer_level, QETH_IDX_ACT_FUNC_LEVEL(iob->data), 2);
+ if (peer_level != qeth_peer_func_level(card->info.func_level)) {
+ QETH_DBF_MESSAGE(2, "IDX_ACTIVATE on channel %x: function level mismatch (sent: %#x, received: %#x)\n",
+ CCW_DEVID(channel->ccwdev),
+ card->info.func_level, peer_level);
+ rc = -EINVAL;
+ goto out;
+ }
+
+ memcpy(&card->token.issuer_rm_r,
+ QETH_IDX_ACT_ISSUER_RM_TOKEN(iob->data),
+ QETH_MPC_TOKEN_LENGTH);
+ memcpy(&card->info.mcl_level[0],
+ QETH_IDX_REPLY_LEVEL(iob->data), QETH_MCL_LENGTH);
+
+out:
+ qeth_notify_reply(iob->reply, rc);
+ qeth_release_buffer(channel, iob);
+}
+
+static void qeth_idx_query_write_cb(struct qeth_card *card,
+ struct qeth_channel *channel,
+ struct qeth_cmd_buffer *iob)
+{
+ u16 peer_level;
+ int rc;
+
+ QETH_DBF_TEXT(SETUP, 2, "idxwrcb");
+
+ rc = qeth_idx_check_activate_response(card, channel, iob);
+ if (rc)
+ goto out;
+
+ memcpy(&peer_level, QETH_IDX_ACT_FUNC_LEVEL(iob->data), 2);
+ if ((peer_level & ~0x0100) !=
+ qeth_peer_func_level(card->info.func_level)) {
+ QETH_DBF_MESSAGE(2, "IDX_ACTIVATE on channel %x: function level mismatch (sent: %#x, received: %#x)\n",
+ CCW_DEVID(channel->ccwdev),
+ card->info.func_level, peer_level);
+ rc = -EINVAL;
+ }
+
+out:
+ qeth_notify_reply(iob->reply, rc);
+ qeth_release_buffer(channel, iob);
+}
+
+static void qeth_idx_finalize_query_cmd(struct qeth_card *card,
+ struct qeth_cmd_buffer *iob,
+ unsigned int length)
+{
+ qeth_setup_ccw(iob->channel->ccw, CCW_CMD_READ, length, iob->data);
+}
+
+static void qeth_idx_activate_cb(struct qeth_card *card,
+ struct qeth_channel *channel,
+ struct qeth_cmd_buffer *iob)
+{
+ qeth_notify_reply(iob->reply, 0);
+ qeth_release_buffer(channel, iob);
+}
+
+static void qeth_idx_setup_activate_cmd(struct qeth_card *card,
+ struct qeth_cmd_buffer *iob)
+{
+ u16 addr = (card->info.cula << 8) + card->info.unit_addr2;
+ u8 port = ((u8)card->dev->dev_port) | 0x80;
+ struct ccw_dev_id dev_id;
+
+ ccw_device_get_id(CARD_DDEV(card), &dev_id);
+ iob->finalize = qeth_idx_finalize_cmd;
+ iob->callback = qeth_idx_activate_cb;
+
+ memcpy(QETH_IDX_ACT_PNO(iob->data), &port, 1);
+ memcpy(QETH_IDX_ACT_ISSUER_RM_TOKEN(iob->data),
+ &card->token.issuer_rm_w, QETH_MPC_TOKEN_LENGTH);
+ memcpy(QETH_IDX_ACT_FUNC_LEVEL(iob->data),
+ &card->info.func_level, 2);
+ memcpy(QETH_IDX_ACT_QDIO_DEV_CUA(iob->data), &dev_id.devno, 2);
+ memcpy(QETH_IDX_ACT_QDIO_DEV_REALADDR(iob->data), &addr, 2);
+}
+
+static int qeth_idx_activate_read_channel(struct qeth_card *card)
+{
+ struct qeth_channel *channel = &card->read;
+ struct qeth_cmd_buffer *iob;
+ int rc;
+
+ QETH_DBF_TEXT(SETUP, 2, "idxread");
+
+ iob = qeth_get_buffer(channel);
+ if (!iob)
+ return -ENOMEM;
+
+ memcpy(iob->data, IDX_ACTIVATE_READ, IDX_ACTIVATE_SIZE);
+ qeth_idx_setup_activate_cmd(card, iob);
+
+ rc = qeth_send_control_data(card, IDX_ACTIVATE_SIZE, iob, NULL, NULL);
+ if (rc)
+ return rc;
+
+ iob = qeth_get_buffer(channel);
+ if (!iob)
+ return -ENOMEM;
+
+ iob->finalize = qeth_idx_finalize_query_cmd;
+ iob->callback = qeth_idx_query_read_cb;
+ rc = qeth_send_control_data(card, QETH_BUFSIZE, iob, NULL, NULL);
+ if (rc)
+ return rc;
+
+ channel->state = CH_STATE_UP;
+ return 0;
+}
+
+static int qeth_idx_activate_write_channel(struct qeth_card *card)
+{
+ struct qeth_channel *channel = &card->write;
+ struct qeth_cmd_buffer *iob;
+ int rc;
+
+ QETH_DBF_TEXT(SETUP, 2, "idxwrite");
+
+ iob = qeth_get_buffer(channel);
+ if (!iob)
+ return -ENOMEM;
+
+ memcpy(iob->data, IDX_ACTIVATE_WRITE, IDX_ACTIVATE_SIZE);
+ qeth_idx_setup_activate_cmd(card, iob);
+
+ rc = qeth_send_control_data(card, IDX_ACTIVATE_SIZE, iob, NULL, NULL);
+ if (rc)
+ return rc;
+
+ iob = qeth_get_buffer(channel);
+ if (!iob)
+ return -ENOMEM;
+
+ iob->finalize = qeth_idx_finalize_query_cmd;
+ iob->callback = qeth_idx_query_write_cb;
+ rc = qeth_send_control_data(card, QETH_BUFSIZE, iob, NULL, NULL);
+ if (rc)
+ return rc;
+
+ channel->state = CH_STATE_UP;
+ return 0;
}
static int qeth_cm_enable_cb(struct qeth_card *card, struct qeth_reply *reply,
QETH_DBF_TEXT(SETUP, 2, "cmenable");
iob = qeth_wait_for_buffer(&card->write);
+ iob->finalize = qeth_mpc_finalize_cmd;
memcpy(iob->data, CM_ENABLE, CM_ENABLE_SIZE);
+
memcpy(QETH_CM_ENABLE_ISSUER_RM_TOKEN(iob->data),
&card->token.issuer_rm_r, QETH_MPC_TOKEN_LENGTH);
memcpy(QETH_CM_ENABLE_FILTER_TOKEN(iob->data),
QETH_DBF_TEXT(SETUP, 2, "cmsetup");
iob = qeth_wait_for_buffer(&card->write);
+ iob->finalize = qeth_mpc_finalize_cmd;
memcpy(iob->data, CM_SETUP, CM_SETUP_SIZE);
+
memcpy(QETH_CM_SETUP_DEST_ADDR(iob->data),
&card->token.issuer_rm_r, QETH_MPC_TOKEN_LENGTH);
memcpy(QETH_CM_SETUP_CONNECTION_TOKEN(iob->data),
/* adjust RX buffer size to new max MTU: */
card->qdio.in_buf_size = max_mtu + 2 * PAGE_SIZE;
if (dev->max_mtu && dev->max_mtu != max_mtu)
- qeth_free_qdio_buffers(card);
+ qeth_free_qdio_queues(card);
} else {
if (dev->mtu)
new_mtu = dev->mtu;
QETH_DBF_TEXT(SETUP, 2, "ulpenabl");
iob = qeth_wait_for_buffer(&card->write);
+ iob->finalize = qeth_mpc_finalize_cmd;
memcpy(iob->data, ULP_ENABLE, ULP_ENABLE_SIZE);
*(QETH_ULP_ENABLE_LINKNUM(iob->data)) = (u8) card->dev->dev_port;
QETH_DBF_TEXT(SETUP, 2, "ulpsetup");
iob = qeth_wait_for_buffer(&card->write);
+ iob->finalize = qeth_mpc_finalize_cmd;
memcpy(iob->data, ULP_SETUP, ULP_SETUP_SIZE);
memcpy(QETH_ULP_SETUP_DEST_ADDR(iob->data),
if (!q)
return;
- qeth_clear_outq_buffers(q, 1);
+ qeth_drain_output_queue(q, true);
qdio_free_buffers(q->qdio_bufs, QDIO_MAX_BUFFERS_PER_Q);
kfree(q);
}
-static struct qeth_qdio_out_q *qeth_alloc_qdio_out_buf(void)
+static struct qeth_qdio_out_q *qeth_alloc_output_queue(void)
{
struct qeth_qdio_out_q *q = kzalloc(sizeof(*q), GFP_KERNEL);
return q;
}
-static int qeth_alloc_qdio_buffers(struct qeth_card *card)
+static int qeth_alloc_qdio_queues(struct qeth_card *card)
{
int i, j;
/* outbound */
for (i = 0; i < card->qdio.no_out_queues; ++i) {
- card->qdio.out_qs[i] = qeth_alloc_qdio_out_buf();
+ card->qdio.out_qs[i] = qeth_alloc_output_queue();
if (!card->qdio.out_qs[i])
goto out_freeoutq;
QETH_DBF_TEXT_(SETUP, 2, "outq %i", i);
return -ENOMEM;
}
-static void qeth_free_qdio_buffers(struct qeth_card *card)
+static void qeth_free_qdio_queues(struct qeth_card *card)
{
int i, j;
QETH_DBF_TEXT(SETUP, 2, "dmact");
iob = qeth_wait_for_buffer(&card->write);
+ iob->finalize = qeth_mpc_finalize_cmd;
memcpy(iob->data, DM_ACT, DM_ACT_SIZE);
memcpy(QETH_DM_ACT_DEST_ADDR(iob->data),
QETH_DBF_TEXT_(SETUP, 2, "5err%d", rc);
goto out_qdio;
}
- rc = qeth_alloc_qdio_buffers(card);
+ rc = qeth_alloc_qdio_queues(card);
if (rc) {
QETH_DBF_TEXT_(SETUP, 2, "5err%d", rc);
goto out_qdio;
rc = qeth_qdio_establish(card);
if (rc) {
QETH_DBF_TEXT_(SETUP, 2, "6err%d", rc);
- qeth_free_qdio_buffers(card);
+ qeth_free_qdio_queues(card);
goto out_qdio;
}
rc = qeth_qdio_activate(card);
cmd->hdr.prot_version = prot;
}
+static void qeth_ipa_finalize_cmd(struct qeth_card *card,
+ struct qeth_cmd_buffer *iob,
+ unsigned int length)
+{
+ qeth_mpc_finalize_cmd(card, iob, length);
+
+ /* override with IPA-specific values: */
+ __ipa_cmd(iob)->hdr.seqno = card->seqno.ipa;
+ iob->reply->seqno = card->seqno.ipa++;
+}
+
void qeth_prepare_ipa_cmd(struct qeth_card *card, struct qeth_cmd_buffer *iob,
u16 cmd_length)
{
u16 total_length = IPA_PDU_HEADER_SIZE + cmd_length;
u8 prot_type = qeth_mpc_select_prot_type(card);
+ iob->finalize = qeth_ipa_finalize_cmd;
+ iob->timeout = QETH_IPA_TIMEOUT;
+
memcpy(iob->data, IPA_PDU_HEADER, IPA_PDU_HEADER_SIZE);
memcpy(QETH_IPA_PDU_LEN_TOTAL(iob->data), &total_length, 2);
memcpy(QETH_IPA_CMD_PROT_TYPE(iob->data), &prot_type, 1);
QETH_CARD_TEXT(card, 4, "sendipa");
+ if (card->read_or_write_problem) {
+ qeth_release_buffer(iob->channel, iob);
+ return -EIO;
+ }
+
if (reply_cb == NULL)
reply_cb = qeth_send_ipa_cmd_cb;
memcpy(&length, QETH_IPA_PDU_LEN_TOTAL(iob->data), 2);
}
QETH_TXQ_STAT_ADD(queue, bufs, count);
- netif_trans_update(queue->card->dev);
qdio_flags = QDIO_FLAG_SYNC_OUTPUT;
if (atomic_read(&queue->set_pci_flags_count))
qdio_flags |= QDIO_FLAG_PCI_OUT;
- atomic_add(count, &queue->used_buffers);
rc = do_QDIO(CARD_DDEV(queue->card), qdio_flags,
queue->queue_no, index, count);
if (rc) {
* do_send_packet. So, we check if there is a
* packing buffer to be flushed here.
*/
- netif_stop_queue(queue->card->dev);
index = queue->next_buf_to_fill;
q_was_packing = queue->do_pack;
/* queue->do_pack may change */
goto out;
}
- qeth_free_qdio_buffers(card);
+ qeth_free_qdio_queues(card);
card->options.cq = cq;
rc = 0;
}
QETH_CARD_TEXT_(card, 5, "qcqherr%d", qdio_err);
if (qdio_err) {
- netif_stop_queue(card->dev);
+ netif_tx_stop_all_queues(card->dev);
qeth_schedule_recovery(card);
return;
}
struct qeth_card *card = (struct qeth_card *) card_ptr;
struct qeth_qdio_out_q *queue = card->qdio.out_qs[__queue];
struct qeth_qdio_out_buffer *buffer;
+ struct net_device *dev = card->dev;
+ struct netdev_queue *txq;
int i;
QETH_CARD_TEXT(card, 6, "qdouhdl");
if (qdio_error & QDIO_ERROR_FATAL) {
QETH_CARD_TEXT(card, 2, "achkcond");
- netif_stop_queue(card->dev);
+ netif_tx_stop_all_queues(dev);
qeth_schedule_recovery(card);
return;
}
if (card->info.type != QETH_CARD_TYPE_IQD)
qeth_check_outbound_queue(queue);
- netif_wake_queue(queue->card->dev);
-}
-
-/* We cannot use outbound queue 3 for unicast packets on HiperSockets */
-static inline int qeth_cut_iqd_prio(struct qeth_card *card, int queue_num)
-{
- if ((card->info.type == QETH_CARD_TYPE_IQD) && (queue_num == 3))
- return 2;
- return queue_num;
+ if (IS_IQD(card))
+ __queue = qeth_iqd_translate_txq(dev, __queue);
+ txq = netdev_get_tx_queue(dev, __queue);
+ /* xmit may have observed the full-condition, but not yet stopped the
+ * txq. In which case the code below won't trigger. So before returning,
+ * xmit will re-check the txq's fill level and wake it up if needed.
+ */
+ if (netif_tx_queue_stopped(txq) && !qeth_out_queue_is_full(queue))
+ netif_tx_wake_queue(txq);
}
/**
* Note: Function assumes that we have 4 outbound queues.
*/
-int qeth_get_priority_queue(struct qeth_card *card, struct sk_buff *skb,
- int ipv)
+int qeth_get_priority_queue(struct qeth_card *card, struct sk_buff *skb)
{
- __be16 *tci;
+ struct vlan_ethhdr *veth = vlan_eth_hdr(skb);
u8 tos;
switch (card->qdio.do_prio_queueing) {
case QETH_PRIO_Q_ING_TOS:
case QETH_PRIO_Q_ING_PREC:
- switch (ipv) {
+ switch (qeth_get_ip_version(skb)) {
case 4:
tos = ipv4_get_dsfield(ip_hdr(skb));
break;
return card->qdio.default_out_queue;
}
if (card->qdio.do_prio_queueing == QETH_PRIO_Q_ING_PREC)
- return qeth_cut_iqd_prio(card, ~tos >> 6 & 3);
+ return ~tos >> 6 & 3;
if (tos & IPTOS_MINCOST)
- return qeth_cut_iqd_prio(card, 3);
+ return 3;
if (tos & IPTOS_RELIABILITY)
return 2;
if (tos & IPTOS_THROUGHPUT)
case QETH_PRIO_Q_ING_SKB:
if (skb->priority > 5)
return 0;
- return qeth_cut_iqd_prio(card, ~skb->priority >> 1 & 3);
+ return ~skb->priority >> 1 & 3;
case QETH_PRIO_Q_ING_VLAN:
- tci = &((struct ethhdr *)skb->data)->h_proto;
- if (be16_to_cpu(*tci) == ETH_P_8021Q)
- return qeth_cut_iqd_prio(card,
- ~be16_to_cpu(*(tci + 1)) >> (VLAN_PRIO_SHIFT + 1) & 3);
+ if (veth->h_vlan_proto == htons(ETH_P_8021Q))
+ return ~ntohs(veth->h_vlan_TCI) >>
+ (VLAN_PRIO_SHIFT + 1) & 3;
break;
default:
break;
* from qeth_core_header_cache.
* @offset: when mapping the skb, start at skb->data + offset
* @hd_len: if > 0, build a dedicated header element of this size
+ * flush: Prepare the buffer to be flushed, regardless of its fill level.
*/
static int qeth_fill_buffer(struct qeth_qdio_out_q *queue,
struct qeth_qdio_out_buffer *buf,
struct sk_buff *skb, struct qeth_hdr *hdr,
- unsigned int offset, unsigned int hd_len)
+ unsigned int offset, unsigned int hd_len,
+ bool flush)
{
struct qdio_buffer *buffer = buf->buffer;
bool is_first_elem = true;
QETH_TXQ_STAT_INC(queue, skbs_pack);
/* If the buffer still has free elements, keep using it. */
- if (buf->next_element_to_fill <
- QETH_MAX_BUFFER_ELEMENTS(queue->card))
+ if (!flush && buf->next_element_to_fill <
+ QETH_MAX_BUFFER_ELEMENTS(queue->card))
return 0;
}
{
int index = queue->next_buf_to_fill;
struct qeth_qdio_out_buffer *buffer = queue->bufs[index];
+ struct netdev_queue *txq;
+ bool stopped = false;
- /*
- * check if buffer is empty to make sure that we do not 'overtake'
- * ourselves and try to fill a buffer that is already primed
+ /* Just a sanity check, the wake/stop logic should ensure that we always
+ * get a free buffer.
*/
if (atomic_read(&buffer->state) != QETH_QDIO_BUF_EMPTY)
return -EBUSY;
- qeth_fill_buffer(queue, buffer, skb, hdr, offset, hd_len);
+
+ txq = netdev_get_tx_queue(queue->card->dev, skb_get_queue_mapping(skb));
+
+ if (atomic_inc_return(&queue->used_buffers) >= QDIO_MAX_BUFFERS_PER_Q) {
+ /* If a TX completion happens right _here_ and misses to wake
+ * the txq, then our re-check below will catch the race.
+ */
+ QETH_TXQ_STAT_INC(queue, stopped);
+ netif_tx_stop_queue(txq);
+ stopped = true;
+ }
+
+ qeth_fill_buffer(queue, buffer, skb, hdr, offset, hd_len, stopped);
qeth_flush_buffers(queue, index, 1);
+
+ if (stopped && !qeth_out_queue_is_full(queue))
+ netif_tx_start_queue(txq);
return 0;
}
int elements_needed)
{
struct qeth_qdio_out_buffer *buffer;
+ struct netdev_queue *txq;
+ bool stopped = false;
int start_index;
int flush_count = 0;
int do_pack = 0;
QETH_OUT_Q_LOCKED) != QETH_OUT_Q_UNLOCKED);
start_index = queue->next_buf_to_fill;
buffer = queue->bufs[queue->next_buf_to_fill];
- /*
- * check if buffer is empty to make sure that we do not 'overtake'
- * ourselves and try to fill a buffer that is already primed
+
+ /* Just a sanity check, the wake/stop logic should ensure that we always
+ * get a free buffer.
*/
if (atomic_read(&buffer->state) != QETH_QDIO_BUF_EMPTY) {
atomic_set(&queue->state, QETH_OUT_Q_UNLOCKED);
return -EBUSY;
}
+
+ txq = netdev_get_tx_queue(card->dev, skb_get_queue_mapping(skb));
+
/* check if we need to switch packing state of this queue */
qeth_switch_to_packing_if_needed(queue);
if (queue->do_pack) {
(queue->next_buf_to_fill + 1) %
QDIO_MAX_BUFFERS_PER_Q;
buffer = queue->bufs[queue->next_buf_to_fill];
- /* we did a step forward, so check buffer state
- * again */
+
+ /* We stepped forward, so sanity-check again: */
if (atomic_read(&buffer->state) !=
QETH_QDIO_BUF_EMPTY) {
qeth_flush_buffers(queue, start_index,
}
}
- flush_count += qeth_fill_buffer(queue, buffer, skb, hdr, offset,
- hd_len);
+ if (buffer->next_element_to_fill == 0 &&
+ atomic_inc_return(&queue->used_buffers) >= QDIO_MAX_BUFFERS_PER_Q) {
+ /* If a TX completion happens right _here_ and misses to wake
+ * the txq, then our re-check below will catch the race.
+ */
+ QETH_TXQ_STAT_INC(queue, stopped);
+ netif_tx_stop_queue(txq);
+ stopped = true;
+ }
+
+ flush_count += qeth_fill_buffer(queue, buffer, skb, hdr, offset, hd_len,
+ stopped);
if (flush_count)
qeth_flush_buffers(queue, start_index, flush_count);
else if (!atomic_read(&queue->set_pci_flags_count))
if (do_pack)
QETH_TXQ_STAT_ADD(queue, bufs_pack, flush_count);
+ if (stopped && !qeth_out_queue_is_full(queue))
+ netif_tx_start_queue(txq);
return rc;
}
EXPORT_SYMBOL_GPL(qeth_do_send_packet);
} else {
if (!push_len)
kmem_cache_free(qeth_core_header_cache, hdr);
- if (rc == -EBUSY)
- /* roll back to ETH header */
- skb_pull(skb, push_len);
}
return rc;
}
card = dev->ml_priv;
QETH_CARD_TEXT(card, 4, "txtimeo");
- QETH_CARD_STAT_INC(card, tx_errors);
qeth_schedule_recovery(card);
}
EXPORT_SYMBOL_GPL(qeth_tx_timeout);
qeth_clean_channel(&card->write);
qeth_clean_channel(&card->data);
destroy_workqueue(card->event_wq);
- qeth_free_qdio_buffers(card);
+ qeth_free_qdio_queues(card);
unregister_service_level(&card->qeth_service_level);
dev_set_drvdata(&card->gdev->dev, NULL);
kfree(card);
QETH_DBF_TEXT(SETUP, 2, "hrdsetup");
atomic_set(&card->force_alloc_skb, 0);
- qeth_update_from_chp_desc(card);
+ rc = qeth_update_from_chp_desc(card);
+ if (rc)
+ return rc;
retry:
if (retries < 3)
QETH_DBF_MESSAGE(2, "Retrying to do IDX activates on device %x.\n",
qeth_determine_capabilities(card);
qeth_init_tokens(card);
qeth_init_func_level(card);
- rc = qeth_idx_activate_channel(card, &card->read, qeth_idx_read_cb);
- if (rc == -ERESTARTSYS) {
+
+ rc = qeth_idx_activate_read_channel(card);
+ if (rc == -EINTR) {
QETH_DBF_TEXT(SETUP, 2, "break2");
return rc;
} else if (rc) {
else
goto retry;
}
- rc = qeth_idx_activate_channel(card, &card->write, qeth_idx_write_cb);
- if (rc == -ERESTARTSYS) {
+
+ rc = qeth_idx_activate_write_channel(card);
+ if (rc == -EINTR) {
QETH_DBF_TEXT(SETUP, 2, "break3");
return rc;
} else if (rc) {
switch (card->info.type) {
case QETH_CARD_TYPE_IQD:
- dev = alloc_netdev(0, "hsi%d", NET_NAME_UNKNOWN, ether_setup);
+ dev = alloc_netdev_mqs(0, "hsi%d", NET_NAME_UNKNOWN,
+ ether_setup, QETH_MAX_QUEUES, 1);
+ break;
+ case QETH_CARD_TYPE_OSM:
+ dev = alloc_etherdev(0);
break;
case QETH_CARD_TYPE_OSN:
dev = alloc_netdev(0, "osn%d", NET_NAME_UNKNOWN, ether_setup);
break;
default:
- dev = alloc_etherdev(0);
+ dev = alloc_etherdev_mqs(0, QETH_MAX_QUEUES, 1);
}
if (!dev)
dev->priv_flags &= ~IFF_TX_SKB_SHARING;
dev->hw_features |= NETIF_F_SG;
dev->vlan_features |= NETIF_F_SG;
- if (IS_IQD(card))
+ if (IS_IQD(card)) {
+ netif_set_real_num_tx_queues(dev, QETH_IQD_MIN_TXQ);
dev->features |= NETIF_F_SG;
+ }
}
return dev;
}
qeth_setup_card(card);
- qeth_update_from_chp_desc(card);
-
card->dev = qeth_alloc_netdev(card);
if (!card->dev) {
rc = -ENOMEM;
goto err_card;
}
+ card->qdio.no_out_queues = card->dev->num_tx_queues;
+ rc = qeth_update_from_chp_desc(card);
+ if (rc)
+ goto err_chp_desc;
qeth_determine_capabilities(card);
enforced_disc = qeth_enforce_discipline(card);
switch (enforced_disc) {
err_disc:
qeth_core_free_discipline(card);
err_load:
+err_chp_desc:
free_netdev(card->dev);
err_card:
qeth_core_free_card(card);
if ((gdev->state == CCWGROUP_ONLINE) && card->info.hwtrap)
qeth_hw_trap(card, QETH_DIAGS_TRAP_DISARM);
qeth_qdio_clear_card(card, 0);
- qeth_clear_qdio_buffers(card);
+ qeth_drain_output_queues(card);
qdio_free(CARD_DDEV(card));
}
stats->rx_errors = card->stats.rx_errors;
stats->rx_dropped = card->stats.rx_dropped;
stats->multicast = card->stats.rx_multicast;
- stats->tx_errors = card->stats.tx_errors;
for (i = 0; i < card->qdio.no_out_queues; i++) {
queue = card->qdio.out_qs[i];
}
EXPORT_SYMBOL_GPL(qeth_get_stats64);
+u16 qeth_iqd_select_queue(struct net_device *dev, struct sk_buff *skb,
+ u8 cast_type, struct net_device *sb_dev)
+{
+ if (cast_type != RTN_UNICAST)
+ return QETH_IQD_MCAST_TXQ;
+ return QETH_IQD_MIN_UCAST_TXQ;
+}
+EXPORT_SYMBOL_GPL(qeth_iqd_select_queue);
+
int qeth_open(struct net_device *dev)
{
struct qeth_card *card = dev->ml_priv;
return -EIO;
card->data.state = CH_STATE_UP;
- netif_start_queue(dev);
+ netif_tx_start_all_queues(dev);
napi_enable(&card->napi);
local_bh_disable();
if (!card)
return -EINVAL;
+ if (IS_IQD(card))
+ return -EOPNOTSUPP;
+
mutex_lock(&card->conf_mutex);
if (card->state != CARD_STATE_DOWN) {
rc = -EPERM;
card->qdio.do_prio_queueing = QETH_NO_PRIO_QUEUEING;
card->qdio.default_out_queue = 2;
} else if (sysfs_streq(buf, "no_prio_queueing:3")) {
- if (card->info.type == QETH_CARD_TYPE_IQD) {
- rc = -EPERM;
- goto out;
- }
card->qdio.do_prio_queueing = QETH_NO_PRIO_QUEUEING;
card->qdio.default_out_queue = 3;
} else if (sysfs_streq(buf, "no_prio_queueing")) {
QETH_TXQ_STAT("linearized+error skbs", skbs_linearized_fail),
QETH_TXQ_STAT("TSO bytes", tso_bytes),
QETH_TXQ_STAT("Packing mode switches", packing_mode_switch),
+ QETH_TXQ_STAT("Queue stopped", stopped),
};
static const struct qeth_stats card_stats[] = {
CARD_RDEV_ID(card), CARD_WDEV_ID(card), CARD_DDEV_ID(card));
}
+static void qeth_get_channels(struct net_device *dev,
+ struct ethtool_channels *channels)
+{
+ struct qeth_card *card = dev->ml_priv;
+
+ channels->max_rx = dev->num_rx_queues;
+ channels->max_tx = card->qdio.no_out_queues;
+ channels->max_other = 0;
+ channels->max_combined = 0;
+ channels->rx_count = dev->real_num_rx_queues;
+ channels->tx_count = dev->real_num_tx_queues;
+ channels->other_count = 0;
+ channels->combined_count = 0;
+}
+
/* Helper function to fill 'advertising' and 'supported' which are the same. */
/* Autoneg and full-duplex are supported and advertised unconditionally. */
/* Always advertise and support all speeds up to specified, and only one */
.get_ethtool_stats = qeth_get_ethtool_stats,
.get_sset_count = qeth_get_sset_count,
.get_drvinfo = qeth_get_drvinfo,
+ .get_channels = qeth_get_channels,
.get_link_ksettings = qeth_get_link_ksettings,
};
return rc;
}
-static void qeth_l2_del_all_macs(struct qeth_card *card)
+static void qeth_l2_drain_rx_mode_cache(struct qeth_card *card)
{
struct qeth_mac *mac;
struct hlist_node *tmp;
int i;
- spin_lock_bh(&card->mclock);
hash_for_each_safe(card->mac_htable, i, tmp, mac, hnode) {
hash_del(&mac->hnode);
kfree(mac);
}
- spin_unlock_bh(&card->mclock);
}
-static int qeth_l2_get_cast_type(struct qeth_card *card, struct sk_buff *skb)
+static int qeth_l2_get_cast_type(struct sk_buff *skb)
{
- if (card->info.type == QETH_CARD_TYPE_OSN)
- return RTN_UNICAST;
if (is_broadcast_ether_addr(skb->data))
return RTN_BROADCAST;
if (is_multicast_ether_addr(skb->data))
qeth_set_allowed_threads(card, 0, 1);
+ cancel_work_sync(&card->rx_mode_work);
+ qeth_l2_drain_rx_mode_cache(card);
+
if (card->state == CARD_STATE_SOFTSETUP) {
- qeth_l2_del_all_macs(card);
qeth_clear_ipacmd_list(card);
card->state = CARD_STATE_HARDSETUP;
}
if (card->state == CARD_STATE_HARDSETUP) {
qeth_qdio_clear_card(card, 0);
- qeth_clear_qdio_buffers(card);
+ qeth_drain_output_queues(card);
qeth_clear_working_pool_list(card);
card->state = CARD_STATE_DOWN;
}
hash_add(card->mac_htable, &mac->hnode, mac_hash);
}
-static void qeth_l2_set_rx_mode(struct net_device *dev)
+static void qeth_l2_rx_mode_work(struct work_struct *work)
{
- struct qeth_card *card = dev->ml_priv;
+ struct qeth_card *card = container_of(work, struct qeth_card,
+ rx_mode_work);
+ struct net_device *dev = card->dev;
struct netdev_hw_addr *ha;
struct qeth_mac *mac;
struct hlist_node *tmp;
QETH_CARD_TEXT(card, 3, "setmulti");
- spin_lock_bh(&card->mclock);
-
+ netif_addr_lock_bh(dev);
netdev_for_each_mc_addr(ha, dev)
qeth_l2_add_mac(card, ha);
netdev_for_each_uc_addr(ha, dev)
qeth_l2_add_mac(card, ha);
+ netif_addr_unlock_bh(dev);
hash_for_each_safe(card->mac_htable, i, tmp, mac, hnode) {
switch (mac->disp_flag) {
}
}
- spin_unlock_bh(&card->mclock);
-
if (qeth_adp_supported(card, IPA_SETADP_SET_PROMISC_MODE))
qeth_setadp_promisc_mode(card);
else
struct net_device *dev)
{
struct qeth_card *card = dev->ml_priv;
- int cast_type = qeth_l2_get_cast_type(card, skb);
- int ipv = qeth_get_ip_version(skb);
+ u16 txq = skb_get_queue_mapping(skb);
struct qeth_qdio_out_q *queue;
int tx_bytes = skb->len;
int rc;
- queue = qeth_get_tx_queue(card, skb, ipv, cast_type);
-
- netif_stop_queue(dev);
+ if (IS_IQD(card))
+ txq = qeth_iqd_translate_txq(dev, txq);
+ queue = card->qdio.out_qs[txq];
if (IS_OSN(card))
rc = qeth_l2_xmit_osn(card, skb, queue);
else
- rc = qeth_xmit(card, skb, queue, ipv, cast_type,
- qeth_l2_fill_header);
+ rc = qeth_xmit(card, skb, queue, qeth_get_ip_version(skb),
+ qeth_l2_get_cast_type(skb), qeth_l2_fill_header);
if (!rc) {
QETH_TXQ_STAT_INC(queue, tx_packets);
QETH_TXQ_STAT_ADD(queue, tx_bytes, tx_bytes);
- netif_wake_queue(dev);
return NETDEV_TX_OK;
- } else if (rc == -EBUSY) {
- return NETDEV_TX_BUSY;
- } /* else fall through */
+ }
QETH_TXQ_STAT_INC(queue, tx_dropped);
kfree_skb(skb);
- netif_wake_queue(dev);
return NETDEV_TX_OK;
}
+static u16 qeth_l2_select_queue(struct net_device *dev, struct sk_buff *skb,
+ struct net_device *sb_dev)
+{
+ struct qeth_card *card = dev->ml_priv;
+
+ if (IS_IQD(card))
+ return qeth_iqd_select_queue(dev, skb,
+ qeth_l2_get_cast_type(skb),
+ sb_dev);
+ return qeth_get_priority_queue(card, skb);
+}
+
static const struct device_type qeth_l2_devtype = {
.name = "qeth_layer2",
.groups = qeth_l2_attr_groups,
}
hash_init(card->mac_htable);
+ INIT_WORK(&card->rx_mode_work, qeth_l2_rx_mode_work);
return 0;
}
unregister_netdev(card->dev);
}
+static void qeth_l2_set_rx_mode(struct net_device *dev)
+{
+ struct qeth_card *card = dev->ml_priv;
+
+ schedule_work(&card->rx_mode_work);
+}
+
static const struct net_device_ops qeth_l2_netdev_ops = {
.ndo_open = qeth_open,
.ndo_stop = qeth_stop,
.ndo_get_stats64 = qeth_get_stats64,
.ndo_start_xmit = qeth_l2_hard_start_xmit,
.ndo_features_check = qeth_features_check,
+ .ndo_select_queue = qeth_l2_select_queue,
.ndo_validate_addr = qeth_l2_validate_addr,
.ndo_set_rx_mode = qeth_l2_set_rx_mode,
.ndo_do_ioctl = qeth_do_ioctl,
QETH_CARD_TEXT(card, 5, "osndctrd");
- wait_event(card->wait_q,
- atomic_cmpxchg(&channel->irq_pending, 0, 1) == 0);
- qeth_prepare_control_data(card, len, iob);
+ wait_event(card->wait_q, qeth_trylock_channel(channel));
+ iob->finalize(card, iob, len);
+ QETH_DBF_HEX(CTRL, 2, iob->data, min(len, QETH_DBF_CTRL_LEN));
QETH_CARD_TEXT(card, 6, "osnoirqp");
spin_lock_irq(get_ccwdev_lock(channel->ccwdev));
rc = ccw_device_start_timeout(channel->ccwdev, channel->ccw,
- (addr_t) iob, 0, 0, QETH_IPA_TIMEOUT);
+ (addr_t) iob, 0, 0, iob->timeout);
spin_unlock_irq(get_ccwdev_lock(channel->ccwdev));
if (rc) {
QETH_DBF_MESSAGE(2, "qeth_osn_send_control_data: "
*/
if (addr->proto == QETH_PROT_IPV4) {
addr->in_progress = 1;
- spin_unlock_bh(&card->ip_lock);
+ mutex_unlock(&card->ip_lock);
rc = qeth_l3_register_addr_entry(card, addr);
- spin_lock_bh(&card->ip_lock);
+ mutex_lock(&card->ip_lock);
addr->in_progress = 0;
} else
rc = qeth_l3_register_addr_entry(card, addr);
return rc;
}
+static int qeth_l3_modify_ip(struct qeth_card *card, struct qeth_ipaddr *addr,
+ bool add)
+{
+ int rc;
+
+ mutex_lock(&card->ip_lock);
+ rc = add ? qeth_l3_add_ip(card, addr) : qeth_l3_delete_ip(card, addr);
+ mutex_unlock(&card->ip_lock);
+
+ return rc;
+}
+
+static void qeth_l3_drain_rx_mode_cache(struct qeth_card *card)
+{
+ struct qeth_ipaddr *addr;
+ struct hlist_node *tmp;
+ int i;
+
+ hash_for_each_safe(card->ip_mc_htable, i, tmp, addr, hnode) {
+ hash_del(&addr->hnode);
+ kfree(addr);
+ }
+}
+
static void qeth_l3_clear_ip_htable(struct qeth_card *card, int recover)
{
struct qeth_ipaddr *addr;
QETH_CARD_TEXT(card, 4, "clearip");
- spin_lock_bh(&card->ip_lock);
+ mutex_lock(&card->ip_lock);
hash_for_each_safe(card->ip_htable, i, tmp, addr, hnode) {
if (!recover) {
addr->disp_flag = QETH_DISP_ADDR_ADD;
}
- spin_unlock_bh(&card->ip_lock);
-
- spin_lock_bh(&card->mclock);
-
- hash_for_each_safe(card->ip_mc_htable, i, tmp, addr, hnode) {
- hash_del(&addr->hnode);
- kfree(addr);
- }
-
- spin_unlock_bh(&card->mclock);
-
-
+ mutex_unlock(&card->ip_lock);
}
+
static void qeth_l3_recover_ip(struct qeth_card *card)
{
struct qeth_ipaddr *addr;
QETH_CARD_TEXT(card, 4, "recovrip");
- spin_lock_bh(&card->ip_lock);
+ mutex_lock(&card->ip_lock);
hash_for_each_safe(card->ip_htable, i, tmp, addr, hnode) {
if (addr->disp_flag == QETH_DISP_ADDR_ADD) {
if (addr->proto == QETH_PROT_IPV4) {
addr->in_progress = 1;
- spin_unlock_bh(&card->ip_lock);
+ mutex_unlock(&card->ip_lock);
rc = qeth_l3_register_addr_entry(card, addr);
- spin_lock_bh(&card->ip_lock);
+ mutex_lock(&card->ip_lock);
addr->in_progress = 0;
} else
rc = qeth_l3_register_addr_entry(card, addr);
}
}
- spin_unlock_bh(&card->ip_lock);
-
+ mutex_unlock(&card->ip_lock);
}
static int qeth_l3_setdelip_cb(struct qeth_card *card, struct qeth_reply *reply,
{
struct qeth_ipato_entry *ipatoe, *tmp;
- spin_lock_bh(&card->ip_lock);
+ mutex_lock(&card->ip_lock);
list_for_each_entry_safe(ipatoe, tmp, &card->ipato.entries, entry) {
list_del(&ipatoe->entry);
}
qeth_l3_update_ipato(card);
- spin_unlock_bh(&card->ip_lock);
+ mutex_unlock(&card->ip_lock);
}
int qeth_l3_add_ipato_entry(struct qeth_card *card,
QETH_CARD_TEXT(card, 2, "addipato");
- spin_lock_bh(&card->ip_lock);
+ mutex_lock(&card->ip_lock);
list_for_each_entry(ipatoe, &card->ipato.entries, entry) {
if (ipatoe->proto != new->proto)
qeth_l3_update_ipato(card);
}
- spin_unlock_bh(&card->ip_lock);
+ mutex_unlock(&card->ip_lock);
return rc;
}
QETH_CARD_TEXT(card, 2, "delipato");
- spin_lock_bh(&card->ip_lock);
+ mutex_lock(&card->ip_lock);
list_for_each_entry_safe(ipatoe, tmp, &card->ipato.entries, entry) {
if (ipatoe->proto != proto)
}
}
- spin_unlock_bh(&card->ip_lock);
+ mutex_unlock(&card->ip_lock);
return rc;
}
enum qeth_prot_versions proto)
{
struct qeth_ipaddr addr;
- int rc;
qeth_l3_init_ipaddr(&addr, type, proto);
if (proto == QETH_PROT_IPV4)
else
memcpy(&addr.u.a6.addr, ip, 16);
- spin_lock_bh(&card->ip_lock);
- rc = add ? qeth_l3_add_ip(card, &addr) : qeth_l3_delete_ip(card, &addr);
- spin_unlock_bh(&card->ip_lock);
- return rc;
+ return qeth_l3_modify_ip(card, &addr, add);
}
int qeth_l3_modify_hsuid(struct qeth_card *card, bool add)
{
struct qeth_ipaddr addr;
- int rc, i;
+ unsigned int i;
qeth_l3_init_ipaddr(&addr, QETH_IP_TYPE_NORMAL, QETH_PROT_IPV6);
addr.u.a6.addr.s6_addr[0] = 0xfe;
for (i = 0; i < 8; i++)
addr.u.a6.addr.s6_addr[8+i] = card->options.hsuid[i];
- spin_lock_bh(&card->ip_lock);
- rc = add ? qeth_l3_add_ip(card, &addr) : qeth_l3_delete_ip(card, &addr);
- spin_unlock_bh(&card->ip_lock);
- return rc;
+ return qeth_l3_modify_ip(card, &addr, add);
}
static int qeth_l3_register_addr_entry(struct qeth_card *card,
qeth_set_allowed_threads(card, 0, 1);
+ cancel_work_sync(&card->rx_mode_work);
+ qeth_l3_drain_rx_mode_cache(card);
+
if (card->options.sniffer &&
(card->info.promisc_mode == SET_PROMISC_MODE_ON))
qeth_diags_trace(card, QETH_DIAGS_CMD_TRACE_DISABLE);
}
if (card->state == CARD_STATE_HARDSETUP) {
qeth_qdio_clear_card(card, 0);
- qeth_clear_qdio_buffers(card);
+ qeth_drain_output_queues(card);
qeth_clear_working_pool_list(card);
card->state = CARD_STATE_DOWN;
}
}
}
-static void qeth_l3_set_rx_mode(struct net_device *dev)
+static void qeth_l3_rx_mode_work(struct work_struct *work)
{
- struct qeth_card *card = dev->ml_priv;
+ struct qeth_card *card = container_of(work, struct qeth_card,
+ rx_mode_work);
struct qeth_ipaddr *addr;
struct hlist_node *tmp;
int i, rc;
QETH_CARD_TEXT(card, 3, "setmulti");
if (!card->options.sniffer) {
- spin_lock_bh(&card->mclock);
-
qeth_l3_add_multicast_ipv4(card);
qeth_l3_add_multicast_ipv6(card);
}
}
- spin_unlock_bh(&card->mclock);
-
if (!qeth_adp_supported(card, IPA_SETADP_SET_PROMISC_MODE))
return;
}
static int qeth_l3_xmit(struct qeth_card *card, struct sk_buff *skb,
struct qeth_qdio_out_q *queue, int ipv, int cast_type)
{
- unsigned char eth_hdr[ETH_HLEN];
unsigned int hw_hdr_len;
int rc;
rc = skb_cow_head(skb, hw_hdr_len - ETH_HLEN);
if (rc)
return rc;
- skb_copy_from_linear_data(skb, eth_hdr, ETH_HLEN);
skb_pull(skb, ETH_HLEN);
qeth_l3_fixup_headers(skb);
- rc = qeth_xmit(card, skb, queue, ipv, cast_type, qeth_l3_fill_header);
- if (rc == -EBUSY) {
- /* roll back to ETH header */
- skb_push(skb, ETH_HLEN);
- skb_copy_to_linear_data(skb, eth_hdr, ETH_HLEN);
- }
- return rc;
+ return qeth_xmit(card, skb, queue, ipv, cast_type, qeth_l3_fill_header);
}
static netdev_tx_t qeth_l3_hard_start_xmit(struct sk_buff *skb,
struct net_device *dev)
{
- int cast_type = qeth_l3_get_cast_type(skb);
struct qeth_card *card = dev->ml_priv;
+ u16 txq = skb_get_queue_mapping(skb);
int ipv = qeth_get_ip_version(skb);
struct qeth_qdio_out_q *queue;
int tx_bytes = skb->len;
- int rc;
-
- queue = qeth_get_tx_queue(card, skb, ipv, cast_type);
+ int cast_type, rc;
if (IS_IQD(card)) {
+ queue = card->qdio.out_qs[qeth_iqd_translate_txq(dev, txq)];
+
if (card->options.sniffer)
goto tx_drop;
if ((card->options.cq != QETH_CQ_ENABLED && !ipv) ||
(card->options.cq == QETH_CQ_ENABLED &&
skb->protocol != htons(ETH_P_AF_IUCV)))
goto tx_drop;
+
+ if (txq == QETH_IQD_MCAST_TXQ)
+ cast_type = qeth_l3_get_cast_type(skb);
+ else
+ cast_type = RTN_UNICAST;
+ } else {
+ queue = card->qdio.out_qs[txq];
+ cast_type = qeth_l3_get_cast_type(skb);
}
if (cast_type == RTN_BROADCAST && !card->info.broadcast_capable)
goto tx_drop;
- netif_stop_queue(dev);
-
if (ipv == 4 || IS_IQD(card))
rc = qeth_l3_xmit(card, skb, queue, ipv, cast_type);
else
if (!rc) {
QETH_TXQ_STAT_INC(queue, tx_packets);
QETH_TXQ_STAT_ADD(queue, tx_bytes, tx_bytes);
- netif_wake_queue(dev);
return NETDEV_TX_OK;
- } else if (rc == -EBUSY) {
- return NETDEV_TX_BUSY;
- } /* else fall through */
+ }
tx_drop:
QETH_TXQ_STAT_INC(queue, tx_dropped);
kfree_skb(skb);
- netif_wake_queue(dev);
return NETDEV_TX_OK;
}
+static void qeth_l3_set_rx_mode(struct net_device *dev)
+{
+ struct qeth_card *card = dev->ml_priv;
+
+ schedule_work(&card->rx_mode_work);
+}
+
/*
* we need NOARP for IPv4 but we want neighbor solicitation for IPv6. Setting
* NOARP on the netdevice is no option because it also turns off neighbor
return qeth_features_check(skb, dev, features);
}
+static u16 qeth_l3_iqd_select_queue(struct net_device *dev, struct sk_buff *skb,
+ struct net_device *sb_dev)
+{
+ return qeth_iqd_select_queue(dev, skb, qeth_l3_get_cast_type(skb),
+ sb_dev);
+}
+
+static u16 qeth_l3_osa_select_queue(struct net_device *dev, struct sk_buff *skb,
+ struct net_device *sb_dev)
+{
+ struct qeth_card *card = dev->ml_priv;
+
+ return qeth_get_priority_queue(card, skb);
+}
+
static const struct net_device_ops qeth_l3_netdev_ops = {
.ndo_open = qeth_open,
.ndo_stop = qeth_stop,
.ndo_get_stats64 = qeth_get_stats64,
.ndo_start_xmit = qeth_l3_hard_start_xmit,
+ .ndo_select_queue = qeth_l3_iqd_select_queue,
.ndo_validate_addr = eth_validate_addr,
.ndo_set_rx_mode = qeth_l3_set_rx_mode,
.ndo_do_ioctl = qeth_do_ioctl,
.ndo_get_stats64 = qeth_get_stats64,
.ndo_start_xmit = qeth_l3_hard_start_xmit,
.ndo_features_check = qeth_l3_osa_features_check,
+ .ndo_select_queue = qeth_l3_osa_select_queue,
.ndo_validate_addr = eth_validate_addr,
.ndo_set_rx_mode = qeth_l3_set_rx_mode,
.ndo_do_ioctl = qeth_do_ioctl,
int rc;
hash_init(card->ip_htable);
+ mutex_init(&card->ip_lock);
+ card->cmd_wq = alloc_ordered_workqueue("%s_cmd", 0,
+ dev_name(&gdev->dev));
+ if (!card->cmd_wq)
+ return -ENOMEM;
if (gdev->dev.type == &qeth_generic_devtype) {
rc = qeth_l3_create_device_attributes(&gdev->dev);
- if (rc)
+ if (rc) {
+ destroy_workqueue(card->cmd_wq);
return rc;
+ }
}
hash_init(card->ip_mc_htable);
+ INIT_WORK(&card->rx_mode_work, qeth_l3_rx_mode_work);
return 0;
}
cancel_work_sync(&card->close_dev_work);
if (qeth_netdev_is_registered(card->dev))
unregister_netdev(card->dev);
+
+ flush_workqueue(card->cmd_wq);
+ destroy_workqueue(card->cmd_wq);
qeth_l3_clear_ip_htable(card, 0);
qeth_l3_clear_ipato_list(card);
}
{
switch (event) {
case NETDEV_UP:
- spin_lock_bh(&card->ip_lock);
- qeth_l3_add_ip(card, addr);
- spin_unlock_bh(&card->ip_lock);
+ qeth_l3_modify_ip(card, addr, true);
return NOTIFY_OK;
case NETDEV_DOWN:
- spin_lock_bh(&card->ip_lock);
- qeth_l3_delete_ip(card, addr);
- spin_unlock_bh(&card->ip_lock);
+ qeth_l3_modify_ip(card, addr, false);
return NOTIFY_OK;
default:
return NOTIFY_DONE;
}
}
+struct qeth_l3_ip_event_work {
+ struct work_struct work;
+ struct qeth_card *card;
+ struct qeth_ipaddr addr;
+};
+
+#define to_ip_work(w) container_of((w), struct qeth_l3_ip_event_work, work)
+
+static void qeth_l3_add_ip_worker(struct work_struct *work)
+{
+ struct qeth_l3_ip_event_work *ip_work = to_ip_work(work);
+
+ qeth_l3_modify_ip(ip_work->card, &ip_work->addr, true);
+ kfree(work);
+}
+
+static void qeth_l3_delete_ip_worker(struct work_struct *work)
+{
+ struct qeth_l3_ip_event_work *ip_work = to_ip_work(work);
+
+ qeth_l3_modify_ip(ip_work->card, &ip_work->addr, false);
+ kfree(work);
+}
+
static struct qeth_card *qeth_l3_get_card_from_dev(struct net_device *dev)
{
if (is_vlan_dev(dev))
{
struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr;
struct net_device *dev = ifa->idev->dev;
- struct qeth_ipaddr addr;
+ struct qeth_l3_ip_event_work *ip_work;
struct qeth_card *card;
+ if (event != NETDEV_UP && event != NETDEV_DOWN)
+ return NOTIFY_DONE;
+
card = qeth_l3_get_card_from_dev(dev);
if (!card)
return NOTIFY_DONE;
if (!qeth_is_supported(card, IPA_IPV6))
return NOTIFY_DONE;
- qeth_l3_init_ipaddr(&addr, QETH_IP_TYPE_NORMAL, QETH_PROT_IPV6);
- addr.u.a6.addr = ifa->addr;
- addr.u.a6.pfxlen = ifa->prefix_len;
+ ip_work = kmalloc(sizeof(*ip_work), GFP_ATOMIC);
+ if (!ip_work)
+ return NOTIFY_DONE;
- return qeth_l3_handle_ip_event(card, &addr, event);
+ if (event == NETDEV_UP)
+ INIT_WORK(&ip_work->work, qeth_l3_add_ip_worker);
+ else
+ INIT_WORK(&ip_work->work, qeth_l3_delete_ip_worker);
+
+ ip_work->card = card;
+ qeth_l3_init_ipaddr(&ip_work->addr, QETH_IP_TYPE_NORMAL,
+ QETH_PROT_IPV6);
+ ip_work->addr.u.a6.addr = ifa->addr;
+ ip_work->addr.u.a6.pfxlen = ifa->prefix_len;
+
+ queue_work(card->cmd_wq, &ip_work->work);
+ return NOTIFY_OK;
}
static struct notifier_block qeth_l3_ip6_notifier = {
if (card->ipato.enabled != enable) {
card->ipato.enabled = enable;
- spin_lock_bh(&card->ip_lock);
+ mutex_lock(&card->ip_lock);
qeth_l3_update_ipato(card);
- spin_unlock_bh(&card->ip_lock);
+ mutex_unlock(&card->ip_lock);
}
out:
mutex_unlock(&card->conf_mutex);
if (card->ipato.invert4 != invert) {
card->ipato.invert4 = invert;
- spin_lock_bh(&card->ip_lock);
+ mutex_lock(&card->ip_lock);
qeth_l3_update_ipato(card);
- spin_unlock_bh(&card->ip_lock);
+ mutex_unlock(&card->ip_lock);
}
out:
mutex_unlock(&card->conf_mutex);
entry_len = (proto == QETH_PROT_IPV4)? 12 : 40;
/* add strlen for "/<mask>\n" */
entry_len += (proto == QETH_PROT_IPV4)? 5 : 6;
- spin_lock_bh(&card->ip_lock);
+ mutex_lock(&card->ip_lock);
list_for_each_entry(ipatoe, &card->ipato.entries, entry) {
if (ipatoe->proto != proto)
continue;
i += snprintf(buf + i, PAGE_SIZE - i,
"%s/%i\n", addr_str, ipatoe->mask_bits);
}
- spin_unlock_bh(&card->ip_lock);
+ mutex_unlock(&card->ip_lock);
i += snprintf(buf + i, PAGE_SIZE - i, "\n");
return i;
if (card->ipato.invert6 != invert) {
card->ipato.invert6 = invert;
- spin_lock_bh(&card->ip_lock);
+ mutex_lock(&card->ip_lock);
qeth_l3_update_ipato(card);
- spin_unlock_bh(&card->ip_lock);
+ mutex_unlock(&card->ip_lock);
}
out:
mutex_unlock(&card->conf_mutex);
entry_len = (proto == QETH_PROT_IPV4)? 12 : 40;
entry_len += 2; /* \n + terminator */
- spin_lock_bh(&card->ip_lock);
+ mutex_lock(&card->ip_lock);
hash_for_each(card->ip_htable, i, ipaddr, hnode) {
if (ipaddr->proto != proto || ipaddr->type != type)
continue;
str_len += snprintf(buf + str_len, PAGE_SIZE - str_len, "%s\n",
addr_str);
}
- spin_unlock_bh(&card->ip_lock);
+ mutex_unlock(&card->ip_lock);
str_len += snprintf(buf + str_len, PAGE_SIZE - str_len, "\n");
return str_len;
}
static u16 rtw_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
struct adapter *padapter = rtw_netdev_priv(dev);
struct mlme_priv *pmlmepriv = &padapter->mlmepriv;
static u16 rtw_select_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
struct adapter *padapter = rtw_netdev_priv(dev);
struct mlme_priv *pmlmepriv = &padapter->mlmepriv;
{
.cmd = TCMU_CMD_SET_FEATURES,
.flags = GENL_ADMIN_PERM,
- .policy = tcmu_attr_policy,
.doit = tcmu_genl_set_features,
},
{
.cmd = TCMU_CMD_ADDED_DEVICE_DONE,
.flags = GENL_ADMIN_PERM,
- .policy = tcmu_attr_policy,
.doit = tcmu_genl_add_dev_done,
},
{
.cmd = TCMU_CMD_REMOVED_DEVICE_DONE,
.flags = GENL_ADMIN_PERM,
- .policy = tcmu_attr_policy,
.doit = tcmu_genl_rm_dev_done,
},
{
.cmd = TCMU_CMD_RECONFIG_DEVICE_DONE,
.flags = GENL_ADMIN_PERM,
- .policy = tcmu_attr_policy,
.doit = tcmu_genl_reconfig_dev_done,
},
};
.name = "TCM-USER",
.version = 2,
.maxattr = TCMU_ATTR_MAX,
+ .policy = tcmu_attr_policy,
.mcgrps = tcmu_mcgrps,
.n_mcgrps = ARRAY_SIZE(tcmu_mcgrps),
.netnsok = true,
const struct btf *btf,
const struct btf_type *key_type,
const struct btf_type *value_type);
+
+ /* Direct value access helpers. */
+ int (*map_direct_value_addr)(const struct bpf_map *map,
+ u64 *imm, u32 off);
+ int (*map_direct_value_meta)(const struct bpf_map *map,
+ u64 imm, u32 *off);
};
struct bpf_map {
struct btf *btf;
u32 pages;
bool unpriv_array;
- /* 51 bytes hole */
+ bool frozen; /* write-once */
+ /* 48 bytes hole */
/* The 3rd and 4th cacheline with misc members to avoid false sharing
* particularly with refcounting.
RET_PTR_TO_MAP_VALUE_OR_NULL, /* returns a pointer to map elem value or NULL */
RET_PTR_TO_SOCKET_OR_NULL, /* returns a pointer to a socket or NULL */
RET_PTR_TO_TCP_SOCK_OR_NULL, /* returns a pointer to a tcp_sock or NULL */
+ RET_PTR_TO_SOCK_COMMON_OR_NULL, /* returns a pointer to a sock_common or NULL */
};
/* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs
};
};
+#define BPF_COMPLEXITY_LIMIT_INSNS 1000000 /* yes. 1M insns */
#define MAX_TAIL_CALL_CNT 32
+#define BPF_F_ACCESS_MASK (BPF_F_RDONLY | \
+ BPF_F_RDONLY_PROG | \
+ BPF_F_WRONLY | \
+ BPF_F_WRONLY_PROG)
+
+#define BPF_MAP_CAN_READ BIT(0)
+#define BPF_MAP_CAN_WRITE BIT(1)
+
+static inline u32 bpf_map_flags_to_cap(struct bpf_map *map)
+{
+ u32 access_flags = map->map_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG);
+
+ /* Combination of BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG is
+ * not possible.
+ */
+ if (access_flags & BPF_F_RDONLY_PROG)
+ return BPF_MAP_CAN_READ;
+ else if (access_flags & BPF_F_WRONLY_PROG)
+ return BPF_MAP_CAN_WRITE;
+ else
+ return BPF_MAP_CAN_READ | BPF_MAP_CAN_WRITE;
+}
+
+static inline bool bpf_map_flags_access_ok(u32 access_flags)
+{
+ return (access_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG)) !=
+ (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG);
+}
+
struct bpf_event_entry {
struct perf_event *event;
struct file *perf_file;
u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy);
-int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
- union bpf_attr __user *uattr);
-int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
- union bpf_attr __user *uattr);
-int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
- const union bpf_attr *kattr,
- union bpf_attr __user *uattr);
-
/* an array of programs to be executed under rcu_lock.
*
* Typical usage:
struct bpf_prog *bpf_prog_get_type_path(const char *name, enum bpf_prog_type type);
int array_map_alloc_check(union bpf_attr *attr);
+int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
+ union bpf_attr __user *uattr);
+int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
+ union bpf_attr __user *uattr);
+int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
+ const union bpf_attr *kattr,
+ union bpf_attr __user *uattr);
#else /* !CONFIG_BPF_SYSCALL */
static inline struct bpf_prog *bpf_prog_get(u32 ufd)
{
{
return ERR_PTR(-EOPNOTSUPP);
}
+
+static inline int bpf_prog_test_run_xdp(struct bpf_prog *prog,
+ const union bpf_attr *kattr,
+ union bpf_attr __user *uattr)
+{
+ return -ENOTSUPP;
+}
+
+static inline int bpf_prog_test_run_skb(struct bpf_prog *prog,
+ const union bpf_attr *kattr,
+ union bpf_attr __user *uattr)
+{
+ return -ENOTSUPP;
+}
+
+static inline int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
+ const union bpf_attr *kattr,
+ union bpf_attr __user *uattr)
+{
+ return -ENOTSUPP;
+}
#endif /* CONFIG_BPF_SYSCALL */
static inline struct bpf_prog *bpf_prog_get_type(u32 ufd,
struct bpf_verifier_state_list {
struct bpf_verifier_state state;
struct bpf_verifier_state_list *next;
+ int miss_cnt, hit_cnt;
};
/* Possible states for alu_state member. */
unsigned long map_state; /* pointer/poison value for maps */
s32 call_imm; /* saved imm field of call insn */
u32 alu_limit; /* limit for add/sub register with pointer */
+ struct {
+ u32 map_index; /* index into used_maps[] */
+ u32 map_off; /* offset from value base address */
+ };
};
int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
int sanitize_stack_off; /* stack slot to be cleared */
return log->len_used >= log->len_total - 1;
}
+#define BPF_LOG_LEVEL1 1
+#define BPF_LOG_LEVEL2 2
+#define BPF_LOG_STATS 4
+#define BPF_LOG_LEVEL (BPF_LOG_LEVEL1 | BPF_LOG_LEVEL2)
+#define BPF_LOG_MASK (BPF_LOG_LEVEL | BPF_LOG_STATS)
+
static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
{
return log->level && log->ubuf && !bpf_verifier_log_full(log);
bool strict_alignment; /* perform strict pointer alignment checks */
struct bpf_verifier_state *cur_state; /* current verifier state */
struct bpf_verifier_state_list **explored_states; /* search pruning optimization */
+ struct bpf_verifier_state_list *free_list;
struct bpf_map *used_maps[MAX_USED_MAPS]; /* array of map's used by eBPF program */
u32 used_map_cnt; /* number of used maps */
u32 id_gen; /* used to generate unique reg IDs */
struct bpf_verifier_log log;
struct bpf_subprog_info subprog_info[BPF_MAX_SUBPROGS + 1];
u32 subprog_cnt;
+ /* number of instructions analyzed by the verifier */
+ u32 insn_processed;
+ /* total verification time */
+ u64 verification_time;
+ /* maximum number of verifier states kept in 'branching' instructions */
+ u32 max_states_per_insn;
+ /* total number of allocated verifier states */
+ u32 total_states;
+ /* some states are freed during program analysis.
+ * this is peak number of states. this number dominates kernel
+ * memory consumption during verification
+ */
+ u32 peak_states;
+ /* longest register parentage chain walked for liveness marking */
+ u32 longest_mark_read_walk;
};
__printf(2, 0) void bpf_verifier_vlog(struct bpf_verifier_log *log,
const struct btf_member *m,
u32 expected_offset, u32 expected_size);
int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
+bool btf_type_is_void(const struct btf_type *t);
#ifdef CONFIG_BPF_SYSCALL
const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
{ \
handler \
.cmd = op_name, \
- .policy = CONCAT_(GENL_MAGIC_FAMILY, _tla_nl_policy), \
},
#define ZZZ_genl_ops CONCAT_(GENL_MAGIC_FAMILY, _genl_ops)
#ifdef GENL_MAGIC_FAMILY_HDRSZ
.hdrsize = NLA_ALIGN(GENL_MAGIC_FAMILY_HDRSZ),
#endif
- .maxattr = ARRAY_SIZE(drbd_tla_nl_policy)-1,
+ .maxattr = ARRAY_SIZE(CONCAT_(GENL_MAGIC_FAMILY, _tla_nl_policy))-1,
+ .policy = CONCAT_(GENL_MAGIC_FAMILY, _tla_nl_policy),
.ops = ZZZ_genl_ops,
.n_ops = ARRAY_SIZE(ZZZ_genl_ops),
.mcgrps = ZZZ_genl_mcgrps,
extern void brioctl_set(int (*ioctl_hook)(struct net *, unsigned int, void __user *));
-typedef int br_should_route_hook_t(struct sk_buff *skb);
-extern br_should_route_hook_t __rcu *br_should_route_hook;
-
#if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_IGMP_SNOOPING)
int br_multicast_list_adjacent(struct net_device *dev,
struct list_head *br_ip_list);
return rtnl_dereference(dev->ip_ptr);
}
+/* called with rcu_read_lock or rtnl held */
+static inline bool ip_ignore_linkdown(const struct net_device *dev)
+{
+ struct in_device *in_dev;
+ bool rc = false;
+
+ in_dev = rcu_dereference_rtnl(dev->ip_ptr);
+ if (in_dev &&
+ IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev))
+ rc = true;
+
+ return rc;
+}
+
static inline struct neigh_parms *__in_dev_arp_parms_get_rcu(const struct net_device *dev)
{
struct in_device *in_dev = __in_dev_get_rcu(dev);
}
extern u64 jiffies64_to_nsecs(u64 j);
+extern u64 jiffies64_to_msecs(u64 j);
extern unsigned long __msecs_to_jiffies(const unsigned int m);
#if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ)
doorbell[0] = cpu_to_be32(sn << 28 | cmd | ci);
doorbell[1] = cpu_to_be32(cq->cqn);
- mlx5_write64(doorbell, uar_page + MLX5_CQ_DOORBELL, NULL);
+ mlx5_write64(doorbell, uar_page + MLX5_CQ_DOORBELL);
}
static inline void mlx5_cq_hold(struct mlx5_core_cq *cq)
enum {
MLX5_GENERAL_SUBTYPE_DELAY_DROP_TIMEOUT = 0x1,
+ MLX5_GENERAL_SUBTYPE_PCI_POWER_CHANGE_EVENT = 0x5,
};
enum {
#define MLX5_BF_OFFSET 0x800
#define MLX5_CQ_DOORBELL 0x20
-#if BITS_PER_LONG == 64
/* Assume that we can just write a 64-bit doorbell atomically. s390
* actually doesn't have writeq() but S/390 systems don't even have
* PCI so we won't worry about it.
+ *
+ * Note that the write is not atomic on 32-bit systems! In contrast to 64-bit
+ * ones, it requires proper locking. mlx5_write64 doesn't do any locking, so use
+ * it at your own discretion, protected by some kind of lock on 32 bits.
+ *
+ * TODO: use write{q,l}_relaxed()
*/
-#define MLX5_DECLARE_DOORBELL_LOCK(name)
-#define MLX5_INIT_DOORBELL_LOCK(ptr) do { } while (0)
-#define MLX5_GET_DOORBELL_LOCK(ptr) (NULL)
-
-static inline void mlx5_write64(__be32 val[2], void __iomem *dest,
- spinlock_t *doorbell_lock)
+static inline void mlx5_write64(__be32 val[2], void __iomem *dest)
{
+#if BITS_PER_LONG == 64
__raw_writeq(*(u64 *)val, dest);
-}
-
#else
-
-/* Just fall back to a spinlock to protect the doorbell if
- * BITS_PER_LONG is 32 -- there's no portable way to do atomic 64-bit
- * MMIO writes.
- */
-
-#define MLX5_DECLARE_DOORBELL_LOCK(name) spinlock_t name;
-#define MLX5_INIT_DOORBELL_LOCK(ptr) spin_lock_init(ptr)
-#define MLX5_GET_DOORBELL_LOCK(ptr) (ptr)
-
-static inline void mlx5_write64(__be32 val[2], void __iomem *dest,
- spinlock_t *doorbell_lock)
-{
- unsigned long flags;
-
- if (doorbell_lock)
- spin_lock_irqsave(doorbell_lock, flags);
__raw_writel((__force u32) val[0], dest);
__raw_writel((__force u32) val[1], dest + 4);
- if (doorbell_lock)
- spin_unlock_irqrestore(doorbell_lock, flags);
-}
-
#endif
+}
#endif /* MLX5_DOORBELL_H */
MLX5_REG_MTRC_CONF = 0x9041,
MLX5_REG_MTRC_STDB = 0x9042,
MLX5_REG_MTRC_CTRL = 0x9043,
+ MLX5_REG_MPEIN = 0x9050,
MLX5_REG_MPCNT = 0x9051,
MLX5_REG_MTPPS = 0x9053,
MLX5_REG_MTPPSE = 0x9054,
u64 sys_image_guid;
phys_addr_t iseg_base;
struct mlx5_init_seg __iomem *iseg;
+ phys_addr_t bar_addr;
enum mlx5_device_state state;
/* sync interface state */
struct mutex intf_state_mutex;
int mlx5_core_get_caps(struct mlx5_core_dev *dev, enum mlx5_cap_type cap_type);
int mlx5_cmd_alloc_uar(struct mlx5_core_dev *dev, u32 *uarn);
int mlx5_cmd_free_uar(struct mlx5_core_dev *dev, u32 uarn);
+void mlx5_health_flush(struct mlx5_core_dev *dev);
void mlx5_health_cleanup(struct mlx5_core_dev *dev);
int mlx5_health_init(struct mlx5_core_dev *dev);
void mlx5_start_health_poll(struct mlx5_core_dev *dev);
MLX5_ACTION_IN_FIELD_OUT_DIPV6_31_0 = 0x14,
MLX5_ACTION_IN_FIELD_OUT_SIPV4 = 0x15,
MLX5_ACTION_IN_FIELD_OUT_DIPV4 = 0x16,
+ MLX5_ACTION_IN_FIELD_OUT_FIRST_VID = 0x17,
MLX5_ACTION_IN_FIELD_OUT_IPV6_HOPLIMIT = 0x47,
};
union mlx5_ifc_eth_cntrs_grp_data_layout_auto_bits counter_set;
};
+struct mlx5_ifc_mpein_reg_bits {
+ u8 reserved_at_0[0x2];
+ u8 depth[0x6];
+ u8 pcie_index[0x8];
+ u8 node[0x8];
+ u8 reserved_at_18[0x8];
+
+ u8 capability_mask[0x20];
+
+ u8 reserved_at_40[0x8];
+ u8 link_width_enabled[0x8];
+ u8 link_speed_enabled[0x10];
+
+ u8 lane0_physical_position[0x8];
+ u8 link_width_active[0x8];
+ u8 link_speed_active[0x10];
+
+ u8 num_of_pfs[0x10];
+ u8 num_of_vfs[0x10];
+
+ u8 bdf0[0x10];
+ u8 reserved_at_b0[0x10];
+
+ u8 max_read_request_size[0x4];
+ u8 max_payload_size[0x4];
+ u8 reserved_at_c8[0x5];
+ u8 pwr_status[0x3];
+ u8 port_type[0x4];
+ u8 reserved_at_d4[0xb];
+ u8 lane_reversal[0x1];
+
+ u8 reserved_at_e0[0x14];
+ u8 pci_power[0xc];
+
+ u8 reserved_at_100[0x20];
+
+ u8 device_status[0x10];
+ u8 port_state[0x8];
+ u8 reserved_at_138[0x8];
+
+ u8 reserved_at_140[0x10];
+ u8 receiver_detect_result[0x10];
+
+ u8 reserved_at_160[0x20];
+};
+
struct mlx5_ifc_mpcnt_reg_bits {
u8 reserved_at_0[0x8];
u8 pcie_index[0x8];
};
struct mlx5_ifc_mcam_enhanced_features_bits {
- u8 reserved_at_0[0x74];
+ u8 reserved_at_0[0x6e];
+ u8 pci_status_and_power[0x1];
+ u8 reserved_at_6f[0x5];
u8 mark_tx_action_cnp[0x1];
u8 mark_tx_action_cqe[0x1];
u8 dynamic_tx_overflow[0x1];
struct mlx5_ifc_pmtu_reg_bits pmtu_reg;
struct mlx5_ifc_ppad_reg_bits ppad_reg;
struct mlx5_ifc_ppcnt_reg_bits ppcnt_reg;
+ struct mlx5_ifc_mpein_reg_bits mpein_reg;
struct mlx5_ifc_mpcnt_reg_bits mpcnt_reg;
struct mlx5_ifc_pplm_reg_bits pplm_reg;
struct mlx5_ifc_pplr_reg_bits pplr_reg;
#ifdef CONFIG_RPS
#include <linux/static_key.h>
-extern struct static_key rps_needed;
-extern struct static_key rfs_needed;
+extern struct static_key_false rps_needed;
+extern struct static_key_false rfs_needed;
#endif
struct neighbour;
* those the driver believes to be appropriate.
*
* u16 (*ndo_select_queue)(struct net_device *dev, struct sk_buff *skb,
- * struct net_device *sb_dev,
- * select_queue_fallback_t fallback);
+ * struct net_device *sb_dev);
* Called to decide which queue to use when device supports multiple
* transmit queues.
*
* that got dropped are freed/returned via xdp_return_frame().
* Returns negative number, means general error invoking ndo, meaning
* no frames were xmit'ed and core-caller will free all frames.
- * struct devlink *(*ndo_get_devlink)(struct net_device *dev);
- * Get devlink instance associated with a given netdev.
+ * struct devlink_port *(*ndo_get_devlink_port)(struct net_device *dev);
+ * Get devlink port instance associated with a given netdev.
* Called with a reference on the netdevice and devlink locks only,
* rtnl_lock is not held.
*/
netdev_features_t features);
u16 (*ndo_select_queue)(struct net_device *dev,
struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback);
+ struct net_device *sb_dev);
void (*ndo_change_rx_flags)(struct net_device *dev,
int flags);
void (*ndo_set_rx_mode)(struct net_device *dev);
u32 flags);
int (*ndo_xsk_async_xmit)(struct net_device *dev,
u32 queue_id);
- struct devlink * (*ndo_get_devlink)(struct net_device *dev);
+ struct devlink_port * (*ndo_get_devlink_port)(struct net_device *dev);
};
/**
&qdisc_xmit_lock_key); \
}
-struct netdev_queue *netdev_pick_tx(struct net_device *dev,
- struct sk_buff *skb,
- struct net_device *sb_dev);
+u16 netdev_pick_tx(struct net_device *dev, struct sk_buff *skb,
+ struct net_device *sb_dev);
+struct netdev_queue *netdev_core_pick_tx(struct net_device *dev,
+ struct sk_buff *skb,
+ struct net_device *sb_dev);
/* returns the headroom that the master device needs to take in account
* when forwarding to this dev
void dev_disable_lro(struct net_device *dev);
int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *newskb);
u16 dev_pick_tx_zero(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback);
+ struct net_device *sb_dev);
u16 dev_pick_tx_cpu_id(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback);
+ struct net_device *sb_dev);
int dev_queue_xmit(struct sk_buff *skb);
int dev_queue_xmit_accel(struct sk_buff *skb, struct net_device *sb_dev);
int dev_direct_xmit(struct sk_buff *skb, u16 queue_id);
void synchronize_net(void);
int init_dummy_netdev(struct net_device *dev);
-DECLARE_PER_CPU(int, xmit_recursion);
-#define XMIT_RECURSION_LIMIT 10
-
-static inline int dev_recursion_level(void)
-{
- return this_cpu_read(xmit_recursion);
-}
-
struct net_device *dev_get_by_index(struct net *net, int ifindex);
struct net_device *__dev_get_by_index(struct net *net, int ifindex);
struct net_device *dev_get_by_index_rcu(struct net *net, int ifindex);
#ifdef CONFIG_XFRM_OFFLOAD
struct sk_buff_head xfrm_backlog;
#endif
+ /* written and read only by owning cpu: */
+ struct {
+ u16 recursion;
+ u8 more;
+ } xmit;
#ifdef CONFIG_RPS
/* input_queue_head should be written by cpu owning this struct,
* and only read by other cpus. Worth using a cache line.
DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
+static inline int dev_recursion_level(void)
+{
+ return this_cpu_read(softnet_data.xmit.recursion);
+}
+
+#define XMIT_RECURSION_LIMIT 10
+static inline bool dev_xmit_recursion(void)
+{
+ return unlikely(__this_cpu_read(softnet_data.xmit.recursion) >
+ XMIT_RECURSION_LIMIT);
+}
+
+static inline void dev_xmit_recursion_inc(void)
+{
+ __this_cpu_inc(softnet_data.xmit.recursion);
+}
+
+static inline void dev_xmit_recursion_dec(void)
+{
+ __this_cpu_dec(softnet_data.xmit.recursion);
+}
+
void __netif_schedule(struct Qdisc *q);
void netif_schedule_queue(struct netdev_queue *txq);
struct sk_buff *skb, struct net_device *dev,
bool more)
{
- skb->xmit_more = more ? 1 : 0;
+ __this_cpu_write(softnet_data.xmit.more, more);
return ops->ndo_start_xmit(skb, dev);
}
+static inline bool netdev_xmit_more(void)
+{
+ return __this_cpu_read(softnet_data.xmit.more);
+}
+
static inline netdev_tx_t netdev_start_xmit(struct sk_buff *skb, struct net_device *dev,
struct netdev_queue *txq, bool more)
{
static inline int nf_inet_addr_cmp(const union nf_inet_addr *a1,
const union nf_inet_addr *a2)
{
+#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
+ const unsigned long *ul1 = (const unsigned long *)a1;
+ const unsigned long *ul2 = (const unsigned long *)a2;
+
+ return ((ul1[0] ^ ul2[0]) | (ul1[1] ^ ul2[1])) == 0UL;
+#else
return a1->all[0] == a2->all[0] &&
a1->all[1] == a2->all[1] &&
a1->all[2] == a2->all[2] &&
a1->all[3] == a2->all[3];
+#endif
}
static inline void nf_inet_addr_mask(const union nf_inet_addr *a1,
static inline void
nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl, u_int8_t family)
{
-#ifdef CONFIG_NF_NAT_NEEDED
+#if IS_ENABLED(CONFIG_NF_NAT)
struct nf_nat_hook *nat_hook;
rcu_read_lock();
struct nf_osf_user_finger finger;
};
+struct nf_osf_data {
+ const char *genre;
+ const char *version;
+};
+
bool nf_osf_match(const struct sk_buff *skb, u_int8_t family,
int hooknum, struct net_device *in, struct net_device *out,
const struct nf_osf_info *info, struct net *net,
const struct list_head *nf_osf_fingers);
-const char *nf_osf_find(const struct sk_buff *skb,
- const struct list_head *nf_osf_fingers,
- const int ttl_check);
+bool nf_osf_find(const struct sk_buff *skb,
+ const struct list_head *nf_osf_fingers,
+ const int ttl_check, struct nf_osf_data *data);
#endif /* _NFOSF_H */
int *error);
struct xt_match *xt_find_match(u8 af, const char *name, u8 revision);
-struct xt_target *xt_find_target(u8 af, const char *name, u8 revision);
struct xt_match *xt_request_find_match(u8 af, const char *name, u8 revision);
struct xt_target *xt_request_find_target(u8 af, const char *name, u8 revision);
int xt_find_revision(u8 af, const char *name, u8 revision, int target,
}
int ip6_route_me_harder(struct net *net, struct sk_buff *skb);
+
+static inline int nf_ip6_route_me_harder(struct net *net, struct sk_buff *skb)
+{
+#if IS_MODULE(CONFIG_IPV6)
+ const struct nf_ipv6_ops *v6_ops = nf_get_ipv6_ops();
+
+ if (!v6_ops)
+ return -EHOSTUNREACH;
+
+ return v6_ops->route_me_harder(net, skb);
+#else
+ return ip6_route_me_harder(net, skb);
+#endif
+}
+
__sum16 nf_ip6_checksum(struct sk_buff *skb, unsigned int hook,
unsigned int dataoff, u_int8_t protocol);
* is_c45: Set to true if this phy uses clause 45 addressing.
* is_internal: Set to true if this phy is internal to a MAC.
* is_pseudo_fixed_link: Set to true if this phy is an Ethernet switch, etc.
+ * is_gigabit_capable: Set to true if PHY supports 1000Mbps
* has_fixups: Set to true if this phy has fixups/quirks.
* suspended: Set to true if this phy has been suspended successfully.
* sysfs_links: Internal boolean tracking sysfs symbolic links setup/removal.
unsigned is_c45:1;
unsigned is_internal:1;
unsigned is_pseudo_fixed_link:1;
+ unsigned is_gigabit_capable:1;
unsigned has_fixups:1;
unsigned suspended:1;
unsigned sysfs_links:1;
unsigned autoneg:1;
/* The most recently read link state */
unsigned link:1;
+ unsigned autoneg_complete:1;
/* Interrupts are enabled */
unsigned interrupts:1;
/* Clause 22 PHY */
int genphy_config_init(struct phy_device *phydev);
+int genphy_read_abilities(struct phy_device *phydev);
int genphy_setup_forced(struct phy_device *phydev);
int genphy_restart_aneg(struct phy_device *phydev);
int genphy_config_eee_advert(struct phy_device *phydev);
* @head_offset: Offset of rhash_head in struct to be hashed
* @max_size: Maximum size while expanding
* @min_size: Minimum size while shrinking
- * @locks_mul: Number of bucket locks to allocate per cpu (default: 32)
* @automatic_shrinking: Enable automatic shrinking of tables
* @hashfn: Hash function (default: jhash2 if !(key_len % 4), or jhash)
* @obj_hashfn: Function to hash object
unsigned int max_size;
u16 min_size;
bool automatic_shrinking;
- u8 locks_mul;
rht_hashfn_t hashfn;
rht_obj_hashfn_t obj_hashfn;
rht_obj_cmpfn_t obj_cmpfn;
#include <linux/list_nulls.h>
#include <linux/workqueue.h>
#include <linux/rculist.h>
+#include <linux/bit_spinlock.h>
#include <linux/rhashtable-types.h>
/*
+ * Objects in an rhashtable have an embedded struct rhash_head
+ * which is linked into as hash chain from the hash table - or one
+ * of two or more hash tables when the rhashtable is being resized.
* The end of the chain is marked with a special nulls marks which has
- * the least significant bit set.
+ * the least significant bit set but otherwise stores the address of
+ * the hash bucket. This allows us to be be sure we've found the end
+ * of the right list.
+ * The value stored in the hash bucket has BIT(0) used as a lock bit.
+ * This bit must be atomically set before any changes are made to
+ * the chain. To avoid dereferencing this pointer without clearing
+ * the bit first, we use an opaque 'struct rhash_lock_head *' for the
+ * pointer stored in the bucket. This struct needs to be defined so
+ * that rcu_dereference() works on it, but it has no content so a
+ * cast is needed for it to be useful. This ensures it isn't
+ * used by mistake with clearing the lock bit first.
*/
+struct rhash_lock_head {};
/* Maximum chain length before rehash
*
* @nest: Number of bits of first-level nested table.
* @rehash: Current bucket being rehashed
* @hash_rnd: Random seed to fold into hash
- * @locks_mask: Mask to apply before accessing locks[]
- * @locks: Array of spinlocks protecting individual buckets
* @walkers: List of active walkers
* @rcu: RCU structure for freeing the table
* @future_tbl: Table under construction during rehashing
struct bucket_table {
unsigned int size;
unsigned int nest;
- unsigned int rehash;
u32 hash_rnd;
- unsigned int locks_mask;
- spinlock_t *locks;
struct list_head walkers;
struct rcu_head rcu;
struct bucket_table __rcu *future_tbl;
- struct rhash_head __rcu *buckets[] ____cacheline_aligned_in_smp;
+ struct lockdep_map dep_map;
+
+ struct rhash_lock_head __rcu *buckets[] ____cacheline_aligned_in_smp;
};
/*
* NULLS_MARKER() expects a hash value with the low
* bits mostly likely to be significant, and it discards
* the msb.
- * We git it an address, in which the bottom 2 bits are
+ * We give it an address, in which the bottom bit is
* always 0, and the msb might be significant.
* So we shift the address down one bit to align with
* expectations and avoid losing a significant bit.
+ *
+ * We never store the NULLS_MARKER in the hash table
+ * itself as we need the lsb for locking.
+ * Instead we store a NULL
*/
#define RHT_NULLS_MARKER(ptr) \
((void *)NULLS_MARKER(((unsigned long) (ptr)) >> 1))
#define INIT_RHT_NULLS_HEAD(ptr) \
- ((ptr) = RHT_NULLS_MARKER(&(ptr)))
+ ((ptr) = NULL)
static inline bool rht_is_a_nulls(const struct rhash_head *ptr)
{
return atomic_read(&ht->nelems) >= ht->max_elems;
}
-/* The bucket lock is selected based on the hash and protects mutations
- * on a group of hash buckets.
- *
- * A maximum of tbl->size/2 bucket locks is allocated. This ensures that
- * a single lock always covers both buckets which may both contains
- * entries which link to the same bucket of the old table during resizing.
- * This allows to simplify the locking as locking the bucket in both
- * tables during resize always guarantee protection.
- *
- * IMPORTANT: When holding the bucket lock of both the old and new table
- * during expansions and shrinking, the old bucket lock must always be
- * acquired first.
- */
-static inline spinlock_t *rht_bucket_lock(const struct bucket_table *tbl,
- unsigned int hash)
-{
- return &tbl->locks[hash & tbl->locks_mask];
-}
-
#ifdef CONFIG_PROVE_LOCKING
int lockdep_rht_mutex_is_held(struct rhashtable *ht);
int lockdep_rht_bucket_is_held(const struct bucket_table *tbl, u32 hash);
void *arg);
void rhashtable_destroy(struct rhashtable *ht);
-struct rhash_head __rcu **rht_bucket_nested(const struct bucket_table *tbl,
- unsigned int hash);
-struct rhash_head __rcu **rht_bucket_nested_insert(struct rhashtable *ht,
- struct bucket_table *tbl,
+struct rhash_lock_head __rcu **rht_bucket_nested(const struct bucket_table *tbl,
+ unsigned int hash);
+struct rhash_lock_head __rcu **__rht_bucket_nested(const struct bucket_table *tbl,
unsigned int hash);
+struct rhash_lock_head __rcu **rht_bucket_nested_insert(struct rhashtable *ht,
+ struct bucket_table *tbl,
+ unsigned int hash);
#define rht_dereference(p, ht) \
rcu_dereference_protected(p, lockdep_rht_mutex_is_held(ht))
#define rht_entry(tpos, pos, member) \
({ tpos = container_of(pos, typeof(*tpos), member); 1; })
-static inline struct rhash_head __rcu *const *rht_bucket(
+static inline struct rhash_lock_head __rcu *const *rht_bucket(
const struct bucket_table *tbl, unsigned int hash)
{
return unlikely(tbl->nest) ? rht_bucket_nested(tbl, hash) :
&tbl->buckets[hash];
}
-static inline struct rhash_head __rcu **rht_bucket_var(
+static inline struct rhash_lock_head __rcu **rht_bucket_var(
struct bucket_table *tbl, unsigned int hash)
{
- return unlikely(tbl->nest) ? rht_bucket_nested(tbl, hash) :
+ return unlikely(tbl->nest) ? __rht_bucket_nested(tbl, hash) :
&tbl->buckets[hash];
}
-static inline struct rhash_head __rcu **rht_bucket_insert(
+static inline struct rhash_lock_head __rcu **rht_bucket_insert(
struct rhashtable *ht, struct bucket_table *tbl, unsigned int hash)
{
return unlikely(tbl->nest) ? rht_bucket_nested_insert(ht, tbl, hash) :
&tbl->buckets[hash];
}
+/*
+ * We lock a bucket by setting BIT(0) in the pointer - this is always
+ * zero in real pointers. The NULLS mark is never stored in the bucket,
+ * rather we store NULL if the bucket is empty.
+ * bit_spin_locks do not handle contention well, but the whole point
+ * of the hashtable design is to achieve minimum per-bucket contention.
+ * A nested hash table might not have a bucket pointer. In that case
+ * we cannot get a lock. For remove and replace the bucket cannot be
+ * interesting and doesn't need locking.
+ * For insert we allocate the bucket if this is the last bucket_table,
+ * and then take the lock.
+ * Sometimes we unlock a bucket by writing a new pointer there. In that
+ * case we don't need to unlock, but we do need to reset state such as
+ * local_bh. For that we have rht_assign_unlock(). As rcu_assign_pointer()
+ * provides the same release semantics that bit_spin_unlock() provides,
+ * this is safe.
+ * When we write to a bucket without unlocking, we use rht_assign_locked().
+ */
+
+static inline void rht_lock(struct bucket_table *tbl,
+ struct rhash_lock_head **bkt)
+{
+ local_bh_disable();
+ bit_spin_lock(0, (unsigned long *)bkt);
+ lock_map_acquire(&tbl->dep_map);
+}
+
+static inline void rht_lock_nested(struct bucket_table *tbl,
+ struct rhash_lock_head **bucket,
+ unsigned int subclass)
+{
+ local_bh_disable();
+ bit_spin_lock(0, (unsigned long *)bucket);
+ lock_acquire_exclusive(&tbl->dep_map, subclass, 0, NULL, _THIS_IP_);
+}
+
+static inline void rht_unlock(struct bucket_table *tbl,
+ struct rhash_lock_head **bkt)
+{
+ lock_map_release(&tbl->dep_map);
+ bit_spin_unlock(0, (unsigned long *)bkt);
+ local_bh_enable();
+}
+
+/*
+ * Where 'bkt' is a bucket and might be locked:
+ * rht_ptr() dereferences that pointer and clears the lock bit.
+ * rht_ptr_exclusive() dereferences in a context where exclusive
+ * access is guaranteed, such as when destroying the table.
+ */
+static inline struct rhash_head *rht_ptr(
+ struct rhash_lock_head __rcu * const *bkt,
+ struct bucket_table *tbl,
+ unsigned int hash)
+{
+ const struct rhash_lock_head *p =
+ rht_dereference_bucket_rcu(*bkt, tbl, hash);
+
+ if ((((unsigned long)p) & ~BIT(0)) == 0)
+ return RHT_NULLS_MARKER(bkt);
+ return (void *)(((unsigned long)p) & ~BIT(0));
+}
+
+static inline struct rhash_head *rht_ptr_exclusive(
+ struct rhash_lock_head __rcu * const *bkt)
+{
+ const struct rhash_lock_head *p =
+ rcu_dereference_protected(*bkt, 1);
+
+ if (!p)
+ return RHT_NULLS_MARKER(bkt);
+ return (void *)(((unsigned long)p) & ~BIT(0));
+}
+
+static inline void rht_assign_locked(struct rhash_lock_head __rcu **bkt,
+ struct rhash_head *obj)
+{
+ struct rhash_head __rcu **p = (struct rhash_head __rcu **)bkt;
+
+ if (rht_is_a_nulls(obj))
+ obj = NULL;
+ rcu_assign_pointer(*p, (void *)((unsigned long)obj | BIT(0)));
+}
+
+static inline void rht_assign_unlock(struct bucket_table *tbl,
+ struct rhash_lock_head __rcu **bkt,
+ struct rhash_head *obj)
+{
+ struct rhash_head __rcu **p = (struct rhash_head __rcu **)bkt;
+
+ if (rht_is_a_nulls(obj))
+ obj = NULL;
+ lock_map_release(&tbl->dep_map);
+ rcu_assign_pointer(*p, obj);
+ preempt_enable();
+ __release(bitlock);
+ local_bh_enable();
+}
+
/**
- * rht_for_each_continue - continue iterating over hash chain
+ * rht_for_each_from - iterate over hash chain from given head
* @pos: the &struct rhash_head to use as a loop cursor.
- * @head: the previous &struct rhash_head to continue from
+ * @head: the &struct rhash_head to start from
* @tbl: the &struct bucket_table
* @hash: the hash value / bucket index
*/
-#define rht_for_each_continue(pos, head, tbl, hash) \
- for (pos = rht_dereference_bucket(head, tbl, hash); \
- !rht_is_a_nulls(pos); \
+#define rht_for_each_from(pos, head, tbl, hash) \
+ for (pos = head; \
+ !rht_is_a_nulls(pos); \
pos = rht_dereference_bucket((pos)->next, tbl, hash))
/**
* @hash: the hash value / bucket index
*/
#define rht_for_each(pos, tbl, hash) \
- rht_for_each_continue(pos, *rht_bucket(tbl, hash), tbl, hash)
+ rht_for_each_from(pos, rht_ptr(rht_bucket(tbl, hash), tbl, hash), \
+ tbl, hash)
/**
- * rht_for_each_entry_continue - continue iterating over hash chain
+ * rht_for_each_entry_from - iterate over hash chain from given head
* @tpos: the type * to use as a loop cursor.
* @pos: the &struct rhash_head to use as a loop cursor.
- * @head: the previous &struct rhash_head to continue from
+ * @head: the &struct rhash_head to start from
* @tbl: the &struct bucket_table
* @hash: the hash value / bucket index
* @member: name of the &struct rhash_head within the hashable struct.
*/
-#define rht_for_each_entry_continue(tpos, pos, head, tbl, hash, member) \
- for (pos = rht_dereference_bucket(head, tbl, hash); \
+#define rht_for_each_entry_from(tpos, pos, head, tbl, hash, member) \
+ for (pos = head; \
(!rht_is_a_nulls(pos)) && rht_entry(tpos, pos, member); \
pos = rht_dereference_bucket((pos)->next, tbl, hash))
* @member: name of the &struct rhash_head within the hashable struct.
*/
#define rht_for_each_entry(tpos, pos, tbl, hash, member) \
- rht_for_each_entry_continue(tpos, pos, *rht_bucket(tbl, hash), \
- tbl, hash, member)
+ rht_for_each_entry_from(tpos, pos, \
+ rht_ptr(rht_bucket(tbl, hash), tbl, hash), \
+ tbl, hash, member)
/**
* rht_for_each_entry_safe - safely iterate over hash chain of given type
* remove the loop cursor from the list.
*/
#define rht_for_each_entry_safe(tpos, pos, next, tbl, hash, member) \
- for (pos = rht_dereference_bucket(*rht_bucket(tbl, hash), tbl, hash), \
+ for (pos = rht_ptr(rht_bucket(tbl, hash), tbl, hash), \
next = !rht_is_a_nulls(pos) ? \
rht_dereference_bucket(pos->next, tbl, hash) : NULL; \
(!rht_is_a_nulls(pos)) && rht_entry(tpos, pos, member); \
rht_dereference_bucket(pos->next, tbl, hash) : NULL)
/**
- * rht_for_each_rcu_continue - continue iterating over rcu hash chain
+ * rht_for_each_rcu_from - iterate over rcu hash chain from given head
* @pos: the &struct rhash_head to use as a loop cursor.
- * @head: the previous &struct rhash_head to continue from
+ * @head: the &struct rhash_head to start from
* @tbl: the &struct bucket_table
* @hash: the hash value / bucket index
*
* the _rcu mutation primitives such as rhashtable_insert() as long as the
* traversal is guarded by rcu_read_lock().
*/
-#define rht_for_each_rcu_continue(pos, head, tbl, hash) \
+#define rht_for_each_rcu_from(pos, head, tbl, hash) \
for (({barrier(); }), \
- pos = rht_dereference_bucket_rcu(head, tbl, hash); \
+ pos = head; \
!rht_is_a_nulls(pos); \
pos = rcu_dereference_raw(pos->next))
* the _rcu mutation primitives such as rhashtable_insert() as long as the
* traversal is guarded by rcu_read_lock().
*/
-#define rht_for_each_rcu(pos, tbl, hash) \
- rht_for_each_rcu_continue(pos, *rht_bucket(tbl, hash), tbl, hash)
+#define rht_for_each_rcu(pos, tbl, hash) \
+ for (({barrier(); }), \
+ pos = rht_ptr(rht_bucket(tbl, hash), tbl, hash); \
+ !rht_is_a_nulls(pos); \
+ pos = rcu_dereference_raw(pos->next))
/**
- * rht_for_each_entry_rcu_continue - continue iterating over rcu hash chain
+ * rht_for_each_entry_rcu_from - iterated over rcu hash chain from given head
* @tpos: the type * to use as a loop cursor.
* @pos: the &struct rhash_head to use as a loop cursor.
- * @head: the previous &struct rhash_head to continue from
+ * @head: the &struct rhash_head to start from
* @tbl: the &struct bucket_table
* @hash: the hash value / bucket index
* @member: name of the &struct rhash_head within the hashable struct.
* the _rcu mutation primitives such as rhashtable_insert() as long as the
* traversal is guarded by rcu_read_lock().
*/
-#define rht_for_each_entry_rcu_continue(tpos, pos, head, tbl, hash, member) \
+#define rht_for_each_entry_rcu_from(tpos, pos, head, tbl, hash, member) \
for (({barrier(); }), \
- pos = rht_dereference_bucket_rcu(head, tbl, hash); \
+ pos = head; \
(!rht_is_a_nulls(pos)) && rht_entry(tpos, pos, member); \
pos = rht_dereference_bucket_rcu(pos->next, tbl, hash))
* traversal is guarded by rcu_read_lock().
*/
#define rht_for_each_entry_rcu(tpos, pos, tbl, hash, member) \
- rht_for_each_entry_rcu_continue(tpos, pos, *rht_bucket(tbl, hash), \
- tbl, hash, member)
+ rht_for_each_entry_rcu_from(tpos, pos, \
+ rht_ptr(rht_bucket(tbl, hash), \
+ tbl, hash), \
+ tbl, hash, member)
/**
* rhl_for_each_rcu - iterate over rcu hash table list
.ht = ht,
.key = key,
};
- struct rhash_head __rcu * const *head;
+ struct rhash_lock_head __rcu * const *bkt;
struct bucket_table *tbl;
struct rhash_head *he;
unsigned int hash;
tbl = rht_dereference_rcu(ht->tbl, ht);
restart:
hash = rht_key_hashfn(ht, tbl, key, params);
- head = rht_bucket(tbl, hash);
+ bkt = rht_bucket(tbl, hash);
do {
- rht_for_each_rcu_continue(he, *head, tbl, hash) {
+ rht_for_each_rcu_from(he, rht_ptr(bkt, tbl, hash), tbl, hash) {
if (params.obj_cmpfn ?
params.obj_cmpfn(&arg, rht_obj(ht, he)) :
rhashtable_compare(&arg, rht_obj(ht, he)))
/* An object might have been moved to a different hash chain,
* while we walk along it - better check and retry.
*/
- } while (he != RHT_NULLS_MARKER(head));
+ } while (he != RHT_NULLS_MARKER(bkt));
/* Ensure we see any new tables. */
smp_rmb();
.ht = ht,
.key = key,
};
+ struct rhash_lock_head __rcu **bkt;
struct rhash_head __rcu **pprev;
struct bucket_table *tbl;
struct rhash_head *head;
- spinlock_t *lock;
unsigned int hash;
int elasticity;
void *data;
tbl = rht_dereference_rcu(ht->tbl, ht);
hash = rht_head_hashfn(ht, tbl, obj, params);
- lock = rht_bucket_lock(tbl, hash);
- spin_lock_bh(lock);
+ elasticity = RHT_ELASTICITY;
+ bkt = rht_bucket_insert(ht, tbl, hash);
+ data = ERR_PTR(-ENOMEM);
+ if (!bkt)
+ goto out;
+ pprev = NULL;
+ rht_lock(tbl, bkt);
if (unlikely(rcu_access_pointer(tbl->future_tbl))) {
slow_path:
- spin_unlock_bh(lock);
+ rht_unlock(tbl, bkt);
rcu_read_unlock();
return rhashtable_insert_slow(ht, key, obj);
}
- elasticity = RHT_ELASTICITY;
- pprev = rht_bucket_insert(ht, tbl, hash);
- data = ERR_PTR(-ENOMEM);
- if (!pprev)
- goto out;
-
- rht_for_each_continue(head, *pprev, tbl, hash) {
+ rht_for_each_from(head, rht_ptr(bkt, tbl, hash), tbl, hash) {
struct rhlist_head *plist;
struct rhlist_head *list;
data = rht_obj(ht, head);
if (!rhlist)
- goto out;
+ goto out_unlock;
list = container_of(obj, struct rhlist_head, rhead);
RCU_INIT_POINTER(list->next, plist);
head = rht_dereference_bucket(head->next, tbl, hash);
RCU_INIT_POINTER(list->rhead.next, head);
- rcu_assign_pointer(*pprev, obj);
-
- goto good;
+ if (pprev) {
+ rcu_assign_pointer(*pprev, obj);
+ rht_unlock(tbl, bkt);
+ } else
+ rht_assign_unlock(tbl, bkt, obj);
+ data = NULL;
+ goto out;
}
if (elasticity <= 0)
data = ERR_PTR(-E2BIG);
if (unlikely(rht_grow_above_max(ht, tbl)))
- goto out;
+ goto out_unlock;
if (unlikely(rht_grow_above_100(ht, tbl)))
goto slow_path;
- head = rht_dereference_bucket(*pprev, tbl, hash);
+ /* Inserting at head of list makes unlocking free. */
+ head = rht_ptr(bkt, tbl, hash);
RCU_INIT_POINTER(obj->next, head);
if (rhlist) {
RCU_INIT_POINTER(list->next, NULL);
}
- rcu_assign_pointer(*pprev, obj);
-
atomic_inc(&ht->nelems);
+ rht_assign_unlock(tbl, bkt, obj);
+
if (rht_grow_above_75(ht, tbl))
schedule_work(&ht->run_work);
-good:
data = NULL;
-
out:
- spin_unlock_bh(lock);
rcu_read_unlock();
return data;
+
+out_unlock:
+ rht_unlock(tbl, bkt);
+ goto out;
}
/**
* @obj: pointer to hash head inside object
* @params: hash table parameters
*
- * Will take a per bucket spinlock to protect against mutual mutations
+ * Will take the per bucket bitlock to protect against mutual mutations
* on the same bucket. Multiple insertions may occur in parallel unless
- * they map to the same bucket lock.
+ * they map to the same bucket.
*
* It is safe to call this function from atomic context.
*
* @list: pointer to hash list head inside object
* @params: hash table parameters
*
- * Will take a per bucket spinlock to protect against mutual mutations
+ * Will take the per bucket bitlock to protect against mutual mutations
* on the same bucket. Multiple insertions may occur in parallel unless
- * they map to the same bucket lock.
+ * they map to the same bucket.
*
* It is safe to call this function from atomic context.
*
* @list: pointer to hash list head inside object
* @params: hash table parameters
*
- * Will take a per bucket spinlock to protect against mutual mutations
+ * Will take the per bucket bitlock to protect against mutual mutations
* on the same bucket. Multiple insertions may occur in parallel unless
- * they map to the same bucket lock.
+ * they map to the same bucket.
*
* It is safe to call this function from atomic context.
*
* @obj: pointer to hash head inside object
* @params: hash table parameters
*
- * Locks down the bucket chain in both the old and new table if a resize
- * is in progress to ensure that writers can't remove from the old table
- * and can't insert to the new table during the atomic operation of search
- * and insertion. Searches for duplicates in both the old and new table if
- * a resize is in progress.
- *
* This lookup function may only be used for fixed key hash table (key_len
* parameter set). It will BUG() if used inappropriately.
*
* @obj: pointer to hash head inside object
* @params: hash table parameters
*
- * Locks down the bucket chain in both the old and new table if a resize
- * is in progress to ensure that writers can't remove from the old table
- * and can't insert to the new table during the atomic operation of search
- * and insertion. Searches for duplicates in both the old and new table if
- * a resize is in progress.
- *
* Lookups may occur in parallel with hashtable mutations and resizing.
*
* Will trigger an automatic deferred table resizing if residency in the
struct rhash_head *obj, const struct rhashtable_params params,
bool rhlist)
{
+ struct rhash_lock_head __rcu **bkt;
struct rhash_head __rcu **pprev;
struct rhash_head *he;
- spinlock_t * lock;
unsigned int hash;
int err = -ENOENT;
hash = rht_head_hashfn(ht, tbl, obj, params);
- lock = rht_bucket_lock(tbl, hash);
-
- spin_lock_bh(lock);
+ bkt = rht_bucket_var(tbl, hash);
+ if (!bkt)
+ return -ENOENT;
+ pprev = NULL;
+ rht_lock(tbl, bkt);
- pprev = rht_bucket_var(tbl, hash);
- rht_for_each_continue(he, *pprev, tbl, hash) {
+ rht_for_each_from(he, rht_ptr(bkt, tbl, hash), tbl, hash) {
struct rhlist_head *list;
list = container_of(he, struct rhlist_head, rhead);
}
}
- rcu_assign_pointer(*pprev, obj);
- break;
+ if (pprev) {
+ rcu_assign_pointer(*pprev, obj);
+ rht_unlock(tbl, bkt);
+ } else {
+ rht_assign_unlock(tbl, bkt, obj);
+ }
+ goto unlocked;
}
- spin_unlock_bh(lock);
-
+ rht_unlock(tbl, bkt);
+unlocked:
if (err > 0) {
atomic_dec(&ht->nelems);
if (unlikely(ht->p.automatic_shrinking &&
struct rhash_head *obj_old, struct rhash_head *obj_new,
const struct rhashtable_params params)
{
+ struct rhash_lock_head __rcu **bkt;
struct rhash_head __rcu **pprev;
struct rhash_head *he;
- spinlock_t *lock;
unsigned int hash;
int err = -ENOENT;
if (hash != rht_head_hashfn(ht, tbl, obj_new, params))
return -EINVAL;
- lock = rht_bucket_lock(tbl, hash);
+ bkt = rht_bucket_var(tbl, hash);
+ if (!bkt)
+ return -ENOENT;
- spin_lock_bh(lock);
+ pprev = NULL;
+ rht_lock(tbl, bkt);
- pprev = rht_bucket_var(tbl, hash);
- rht_for_each_continue(he, *pprev, tbl, hash) {
+ rht_for_each_from(he, rht_ptr(bkt, tbl, hash), tbl, hash) {
if (he != obj_old) {
pprev = &he->next;
continue;
}
rcu_assign_pointer(obj_new->next, obj_old->next);
- rcu_assign_pointer(*pprev, obj_new);
+ if (pprev) {
+ rcu_assign_pointer(*pprev, obj_new);
+ rht_unlock(tbl, bkt);
+ } else {
+ rht_assign_unlock(tbl, bkt, obj_new);
+ }
err = 0;
- break;
+ goto unlocked;
}
- spin_unlock_bh(lock);
+ rht_unlock(tbl, bkt);
+unlocked:
return err;
}
u64 key[2];
} siphash_key_t;
+static inline bool siphash_key_is_zero(const siphash_key_t *key)
+{
+ return !(key->key[0] | key->key[1]);
+}
+
u64 __siphash_aligned(const void *data, size_t len, const siphash_key_t *key);
#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
u64 __siphash_unaligned(const void *data, size_t len, const siphash_key_t *key);
* @tc_index: Traffic control index
* @hash: the packet hash
* @queue_mapping: Queue mapping for multiqueue devices
- * @xmit_more: More SKBs are pending for this queue
* @pfmemalloc: skbuff was allocated from PFMEMALLOC reserves
* @active_extensions: active extensions (skb_ext_id types)
* @ndisc_nodetype: router type (from link layer)
fclone:2,
peeked:1,
head_frag:1,
- xmit_more:1,
pfmemalloc:1;
#ifdef CONFIG_SKB_EXTENSIONS
__u8 active_extensions;
unsigned int flags,
void (*destructor)(struct sock *sk,
struct sk_buff *skb),
- int *peeked, int *off, int *err,
+ int *off, int *err,
struct sk_buff **last);
struct sk_buff *__skb_try_recv_datagram(struct sock *sk, unsigned flags,
void (*destructor)(struct sock *sk,
struct sk_buff *skb),
- int *peeked, int *off, int *err,
+ int *off, int *err,
struct sk_buff **last);
struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags,
void (*destructor)(struct sock *sk,
struct sk_buff *skb),
- int *peeked, int *off, int *err);
+ int *off, int *err);
struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned flags, int noblock,
int *err);
__poll_t datagram_poll(struct file *file, struct socket *sock,
void ipv6_mc_dad_complete(struct inet6_dev *idev);
-/* A stub used by vxlan module. This is ugly, ideally these
- * symbols should be built into the core kernel.
- */
-struct ipv6_stub {
- int (*ipv6_sock_mc_join)(struct sock *sk, int ifindex,
- const struct in6_addr *addr);
- int (*ipv6_sock_mc_drop)(struct sock *sk, int ifindex,
- const struct in6_addr *addr);
- int (*ipv6_dst_lookup)(struct net *net, struct sock *sk,
- struct dst_entry **dst, struct flowi6 *fl6);
- int (*ipv6_route_input)(struct sk_buff *skb);
-
- struct fib6_table *(*fib6_get_table)(struct net *net, u32 id);
- struct fib6_info *(*fib6_lookup)(struct net *net, int oif,
- struct flowi6 *fl6, int flags);
- struct fib6_info *(*fib6_table_lookup)(struct net *net,
- struct fib6_table *table,
- int oif, struct flowi6 *fl6,
- int flags);
- struct fib6_info *(*fib6_multipath_select)(const struct net *net,
- struct fib6_info *f6i,
- struct flowi6 *fl6, int oif,
- const struct sk_buff *skb,
- int strict);
- u32 (*ip6_mtu_from_fib6)(struct fib6_info *f6i, struct in6_addr *daddr,
- struct in6_addr *saddr);
-
- void (*udpv6_encap_enable)(void);
- void (*ndisc_send_na)(struct net_device *dev, const struct in6_addr *daddr,
- const struct in6_addr *solicited_addr,
- bool router, bool solicited, bool override, bool inc_opt);
- struct neigh_table *nd_tbl;
-};
-extern const struct ipv6_stub *ipv6_stub __read_mostly;
-
-/* A stub used by bpf helpers. Similarly ugly as ipv6_stub */
-struct ipv6_bpf_stub {
- int (*inet6_bind)(struct sock *sk, struct sockaddr *uaddr, int addr_len,
- bool force_bind_address_no_port, bool with_lock);
- struct sock *(*udp6_lib_lookup)(struct net *net,
- const struct in6_addr *saddr, __be16 sport,
- const struct in6_addr *daddr, __be16 dport,
- int dif, int sdif, struct udp_table *tbl,
- struct sk_buff *skb);
-};
-extern const struct ipv6_bpf_stub *ipv6_bpf_stub __read_mostly;
-
/*
* identify MLD packets for MLD filter exceptions
*/
refcount_inc(&idev->refcnt);
}
+/* called with rcu_read_lock held */
+static inline bool ip6_ignore_linkdown(const struct net_device *dev)
+{
+ const struct inet6_dev *idev = __in6_dev_get(dev);
+
+ return !!idev->cnf.ignore_routes_with_linkdown;
+}
+
void inet6_ifa_finish_destroy(struct inet6_ifaddr *ifp);
static inline void in6_ifa_put(struct inet6_ifaddr *ifp)
#include <linux/gfp.h>
#include <linux/list.h>
#include <linux/netdevice.h>
+#include <linux/spinlock.h>
#include <net/net_namespace.h>
#include <uapi/linux/devlink.h>
};
struct devlink_port_attrs {
- bool set;
+ u8 set:1,
+ split:1,
+ switch_port:1;
enum devlink_port_flavour flavour;
u32 port_number; /* same value as "split group" */
- bool split;
u32 split_subport_number;
+ struct netdev_phys_item_id switch_id;
};
struct devlink_port {
struct devlink *devlink;
unsigned index;
bool registered;
+ spinlock_t type_lock; /* Protects type and type_dev
+ * pointer consistency.
+ */
enum devlink_port_type type;
enum devlink_port_type desired_type;
void *type_dev;
return container_of(priv, struct devlink, priv);
}
+static inline struct devlink_port *
+netdev_to_devlink_port(struct net_device *dev)
+{
+ if (dev->netdev_ops->ndo_get_devlink_port)
+ return dev->netdev_ops->ndo_get_devlink_port(dev);
+ return NULL;
+}
+
static inline struct devlink *netdev_to_devlink(struct net_device *dev)
{
-#if IS_ENABLED(CONFIG_NET_DEVLINK)
- if (dev->netdev_ops->ndo_get_devlink)
- return dev->netdev_ops->ndo_get_devlink(dev);
-#endif
+ struct devlink_port *devlink_port = netdev_to_devlink_port(dev);
+
+ if (devlink_port)
+ return devlink_port->devlink;
return NULL;
}
struct ib_device;
-#if IS_ENABLED(CONFIG_NET_DEVLINK)
-
struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size);
int devlink_register(struct devlink *devlink, struct device *dev);
void devlink_unregister(struct devlink *devlink);
void devlink_port_attrs_set(struct devlink_port *devlink_port,
enum devlink_port_flavour flavour,
u32 port_number, bool split,
- u32 split_subport_number);
-int devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
- char *name, size_t len);
+ u32 split_subport_number,
+ const unsigned char *switch_id,
+ unsigned char switch_id_len);
int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
u32 size, u16 ingress_pools_count,
u16 egress_pools_count, u16 ingress_tc_count,
devlink_health_reporter_state_update(struct devlink_health_reporter *reporter,
enum devlink_health_reporter_state state);
+#if IS_ENABLED(CONFIG_NET_DEVLINK)
+
void devlink_compat_running_version(struct net_device *dev,
char *buf, size_t len);
int devlink_compat_flash_update(struct net_device *dev, const char *file_name);
+int devlink_compat_phys_port_name_get(struct net_device *dev,
+ char *name, size_t len);
+int devlink_compat_switch_id_get(struct net_device *dev,
+ struct netdev_phys_item_id *ppid);
#else
-static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
- size_t priv_size)
-{
- return kzalloc(sizeof(struct devlink) + priv_size, GFP_KERNEL);
-}
-
-static inline int devlink_register(struct devlink *devlink, struct device *dev)
-{
- return 0;
-}
-
-static inline void devlink_unregister(struct devlink *devlink)
-{
-}
-
-static inline void devlink_params_publish(struct devlink *devlink)
-{
-}
-
-static inline void devlink_params_unpublish(struct devlink *devlink)
-{
-}
-
-static inline void devlink_free(struct devlink *devlink)
-{
- kfree(devlink);
-}
-
-static inline int devlink_port_register(struct devlink *devlink,
- struct devlink_port *devlink_port,
- unsigned int port_index)
-{
- return 0;
-}
-
-static inline void devlink_port_unregister(struct devlink_port *devlink_port)
-{
-}
-
-static inline void devlink_port_type_eth_set(struct devlink_port *devlink_port,
- struct net_device *netdev)
-{
-}
-
-static inline void devlink_port_type_ib_set(struct devlink_port *devlink_port,
- struct ib_device *ibdev)
-{
-}
-
-static inline void devlink_port_type_clear(struct devlink_port *devlink_port)
-{
-}
-
-static inline void devlink_port_attrs_set(struct devlink_port *devlink_port,
- enum devlink_port_flavour flavour,
- u32 port_number, bool split,
- u32 split_subport_number)
-{
-}
-
-static inline int
-devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
- char *name, size_t len)
-{
- return -EOPNOTSUPP;
-}
-
-static inline int devlink_sb_register(struct devlink *devlink,
- unsigned int sb_index, u32 size,
- u16 ingress_pools_count,
- u16 egress_pools_count,
- u16 ingress_tc_count,
- u16 egress_tc_count)
-{
- return 0;
-}
-
-static inline void devlink_sb_unregister(struct devlink *devlink,
- unsigned int sb_index)
-{
-}
-
-static inline int
-devlink_dpipe_table_register(struct devlink *devlink,
- const char *table_name,
- struct devlink_dpipe_table_ops *table_ops,
- void *priv, bool counter_control_extern)
-{
- return 0;
-}
-
-static inline void devlink_dpipe_table_unregister(struct devlink *devlink,
- const char *table_name)
-{
-}
-
-static inline int devlink_dpipe_headers_register(struct devlink *devlink,
- struct devlink_dpipe_headers *
- dpipe_headers)
-{
- return 0;
-}
-
-static inline void devlink_dpipe_headers_unregister(struct devlink *devlink)
-{
-}
-
-static inline bool devlink_dpipe_table_counter_enabled(struct devlink *devlink,
- const char *table_name)
-{
- return false;
-}
-
-static inline int
-devlink_dpipe_entry_ctx_prepare(struct devlink_dpipe_dump_ctx *dump_ctx)
-{
- return 0;
-}
-
-static inline int
-devlink_dpipe_entry_ctx_append(struct devlink_dpipe_dump_ctx *dump_ctx,
- struct devlink_dpipe_entry *entry)
-{
- return 0;
-}
-
-static inline int
-devlink_dpipe_entry_ctx_close(struct devlink_dpipe_dump_ctx *dump_ctx)
-{
- return 0;
-}
-
-static inline void
-devlink_dpipe_entry_clear(struct devlink_dpipe_entry *entry)
-{
-}
-
-static inline int
-devlink_dpipe_action_put(struct sk_buff *skb,
- struct devlink_dpipe_action *action)
-{
- return 0;
-}
-
-static inline int
-devlink_dpipe_match_put(struct sk_buff *skb,
- struct devlink_dpipe_match *match)
-{
- return 0;
-}
-
-static inline int
-devlink_resource_register(struct devlink *devlink,
- const char *resource_name,
- u64 resource_size,
- u64 resource_id,
- u64 parent_resource_id,
- const struct devlink_resource_size_params *size_params)
-{
- return 0;
-}
-
static inline void
-devlink_resources_unregister(struct devlink *devlink,
- struct devlink_resource *resource)
-{
-}
-
-static inline int
-devlink_resource_size_get(struct devlink *devlink, u64 resource_id,
- u64 *p_resource_size)
-{
- return -EOPNOTSUPP;
-}
-
-static inline int
-devlink_dpipe_table_resource_set(struct devlink *devlink,
- const char *table_name, u64 resource_id,
- u64 resource_units)
-{
- return -EOPNOTSUPP;
-}
-
-static inline void
-devlink_resource_occ_get_register(struct devlink *devlink,
- u64 resource_id,
- devlink_resource_occ_get_t *occ_get,
- void *occ_get_priv)
-{
-}
-
-static inline void
-devlink_resource_occ_get_unregister(struct devlink *devlink,
- u64 resource_id)
-{
-}
-
-static inline int
-devlink_params_register(struct devlink *devlink,
- const struct devlink_param *params,
- size_t params_count)
-{
- return 0;
-}
-
-static inline void
-devlink_params_unregister(struct devlink *devlink,
- const struct devlink_param *params,
- size_t params_count)
-{
-
-}
-
-static inline int
-devlink_port_params_register(struct devlink_port *devlink_port,
- const struct devlink_param *params,
- size_t params_count)
-{
- return 0;
-}
-
-static inline void
-devlink_port_params_unregister(struct devlink_port *devlink_port,
- const struct devlink_param *params,
- size_t params_count)
-{
-}
-
-static inline int
-devlink_param_driverinit_value_get(struct devlink *devlink, u32 param_id,
- union devlink_param_value *init_val)
+devlink_compat_running_version(struct net_device *dev, char *buf, size_t len)
{
- return -EOPNOTSUPP;
}
static inline int
-devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id,
- union devlink_param_value init_val)
+devlink_compat_flash_update(struct net_device *dev, const char *file_name)
{
return -EOPNOTSUPP;
}
static inline int
-devlink_port_param_driverinit_value_get(struct devlink_port *devlink_port,
- u32 param_id,
- union devlink_param_value *init_val)
+devlink_compat_phys_port_name_get(struct net_device *dev,
+ char *name, size_t len)
{
return -EOPNOTSUPP;
}
static inline int
-devlink_port_param_driverinit_value_set(struct devlink_port *devlink_port,
- u32 param_id,
- union devlink_param_value init_val)
+devlink_compat_switch_id_get(struct net_device *dev,
+ struct netdev_phys_item_id *ppid)
{
return -EOPNOTSUPP;
}
-static inline void
-devlink_param_value_changed(struct devlink *devlink, u32 param_id)
-{
-}
-
-static inline void
-devlink_port_param_value_changed(struct devlink_port *devlink_port,
- u32 param_id)
-{
-}
-
-static inline void
-devlink_param_value_str_fill(union devlink_param_value *dst_val,
- const char *src)
-{
-}
-
-static inline struct devlink_region *
-devlink_region_create(struct devlink *devlink,
- const char *region_name,
- u32 region_max_snapshots,
- u64 region_size)
-{
- return NULL;
-}
-
-static inline void
-devlink_region_destroy(struct devlink_region *region)
-{
-}
-
-static inline u32
-devlink_region_shapshot_id_get(struct devlink *devlink)
-{
- return 0;
-}
-
-static inline int
-devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
- u8 *data, u32 snapshot_id,
- devlink_snapshot_data_dest_t *data_destructor)
-{
- return 0;
-}
-
-static inline int
-devlink_info_driver_name_put(struct devlink_info_req *req, const char *name)
-{
- return 0;
-}
-
-static inline int
-devlink_info_serial_number_put(struct devlink_info_req *req, const char *sn)
-{
- return 0;
-}
-
-static inline int
-devlink_info_version_fixed_put(struct devlink_info_req *req,
- const char *version_name,
- const char *version_value)
-{
- return 0;
-}
-
-static inline int
-devlink_info_version_stored_put(struct devlink_info_req *req,
- const char *version_name,
- const char *version_value)
-{
- return 0;
-}
-
-static inline int
-devlink_info_version_running_put(struct devlink_info_req *req,
- const char *version_name,
- const char *version_value)
-{
- return 0;
-}
-
-static inline int
-devlink_fmsg_obj_nest_start(struct devlink_fmsg *fmsg)
-{
- return 0;
-}
-
-static inline int
-devlink_fmsg_obj_nest_end(struct devlink_fmsg *fmsg)
-{
- return 0;
-}
-
-static inline int
-devlink_fmsg_pair_nest_start(struct devlink_fmsg *fmsg, const char *name)
-{
- return 0;
-}
-
-static inline int
-devlink_fmsg_pair_nest_end(struct devlink_fmsg *fmsg)
-{
- return 0;
-}
-
-static inline int
-devlink_fmsg_arr_pair_nest_start(struct devlink_fmsg *fmsg,
- const char *name)
-{
- return 0;
-}
-
-static inline int
-devlink_fmsg_arr_pair_nest_end(struct devlink_fmsg *fmsg)
-{
- return 0;
-}
-
-static inline int
-devlink_fmsg_bool_put(struct devlink_fmsg *fmsg, bool value)
-{
- return 0;
-}
-
-static inline int
-devlink_fmsg_u8_put(struct devlink_fmsg *fmsg, u8 value)
-{
- return 0;
-}
-
-static inline int
-devlink_fmsg_u32_put(struct devlink_fmsg *fmsg, u32 value)
-{
- return 0;
-}
-
-static inline int
-devlink_fmsg_u64_put(struct devlink_fmsg *fmsg, u64 value)
-{
- return 0;
-}
-
-static inline int
-devlink_fmsg_string_put(struct devlink_fmsg *fmsg, const char *value)
-{
- return 0;
-}
-
-static inline int
-devlink_fmsg_binary_put(struct devlink_fmsg *fmsg, const void *value,
- u16 value_len)
-{
- return 0;
-}
-
-static inline int
-devlink_fmsg_bool_pair_put(struct devlink_fmsg *fmsg, const char *name,
- bool value)
-{
- return 0;
-}
-
-static inline int
-devlink_fmsg_u8_pair_put(struct devlink_fmsg *fmsg, const char *name,
- u8 value)
-{
- return 0;
-}
-
-static inline int
-devlink_fmsg_u32_pair_put(struct devlink_fmsg *fmsg, const char *name,
- u32 value)
-{
- return 0;
-}
-
-static inline int
-devlink_fmsg_u64_pair_put(struct devlink_fmsg *fmsg, const char *name,
- u64 value)
-{
- return 0;
-}
-
-static inline int
-devlink_fmsg_string_pair_put(struct devlink_fmsg *fmsg, const char *name,
- const char *value)
-{
- return 0;
-}
-
-static inline int
-devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name,
- const void *value, u16 value_len)
-{
- return 0;
-}
-
-static inline struct devlink_health_reporter *
-devlink_health_reporter_create(struct devlink *devlink,
- const struct devlink_health_reporter_ops *ops,
- u64 graceful_period, bool auto_recover,
- void *priv)
-{
- return NULL;
-}
-
-static inline void
-devlink_health_reporter_destroy(struct devlink_health_reporter *reporter)
-{
-}
-
-static inline void *
-devlink_health_reporter_priv(struct devlink_health_reporter *reporter)
-{
- return NULL;
-}
-
-static inline int
-devlink_health_report(struct devlink_health_reporter *reporter,
- const char *msg, void *priv_ctx)
-{
- return 0;
-}
-
-static inline void
-devlink_health_reporter_state_update(struct devlink_health_reporter *reporter,
- enum devlink_health_reporter_state state)
-{
-}
-
-static inline void
-devlink_compat_running_version(struct net_device *dev, char *buf, size_t len)
-{
-}
-
-static inline int
-devlink_compat_flash_update(struct net_device *dev, const char *file_name)
-{
- return -EOPNOTSUPP;
-}
#endif
#endif /* _NET_DEVLINK_H_ */
unsigned int index;
const char *name;
const struct dsa_port *cpu_dp;
+ const char *mac;
struct device_node *dn;
unsigned int ageing_time;
u8 stp_state;
#include <net/neighbour.h>
#include <asm/processor.h>
-#define DST_GC_MIN (HZ/10)
-#define DST_GC_INC (HZ/2)
-#define DST_GC_MAX (120*HZ)
-
-/* Each dst_entry has reference count and sits in some parent list(s).
- * When it is removed from parent list, it is "freed" (dst_free).
- * After this it enters dead state (dst->obsolete > 0) and if its refcnt
- * is zero, it can be destroyed immediately, otherwise it is added
- * to gc list and garbage collector periodically checks the refcnt.
- */
-
struct sk_buff;
struct dst_entry {
* @name: name of family
* @version: protocol version
* @maxattr: maximum number of attributes supported
+ * @policy: netlink policy
* @netnsok: set to true if the family can handle network
* namespaces and should be presented in all of them
* @parallel_ops: operations can be called in parallel and aren't
unsigned int maxattr;
bool netnsok;
bool parallel_ops;
+ const struct nla_policy *policy;
int (*pre_doit)(const struct genl_ops *ops,
struct sk_buff *skb,
struct genl_info *info);
* @cmd: command identifier
* @internal_flags: flags used by the family
* @flags: flags
- * @policy: attribute validation policy
* @doit: standard command callback
* @start: start callback for dumps
* @dumpit: callback for dumpers
* @done: completion callback for dumps
*/
struct genl_ops {
- const struct nla_policy *policy;
int (*doit)(struct sk_buff *skb,
struct genl_info *info);
int (*start)(struct netlink_callback *cb);
#include <net/udp_tunnel.h>
+#define GENEVE_UDP_PORT 6081
+
/* Geneve Header:
* +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
* |Ver| Opt Len |O|C| Rsvd. | Protocol Type |
#define IPV4_MAX_PMTU 65535U /* RFC 2675, Section 5.1 */
#define IPV4_MIN_MTU 68 /* RFC 791 */
+extern unsigned int sysctl_fib_sync_mem;
+extern unsigned int sysctl_fib_sync_mem_min;
+extern unsigned int sysctl_fib_sync_mem_max;
+
struct sock;
struct inet_skb_parm {
#include <linux/notifier.h>
#include <net/dst.h>
#include <net/flow.h>
+#include <net/ip_fib.h>
#include <net/netlink.h>
#include <net/inetpeer.h>
#include <net/fib_notifier.h>
u32 fc_protocol;
u16 fc_type; /* only 8 bits are used */
u16 fc_delete_all_nh : 1,
- __unused : 15;
+ fc_ignore_dev_down:1,
+ __unused : 14;
struct in6_addr fc_dst;
struct in6_addr fc_src;
#define FIB6_MAX_DEPTH 5
struct fib6_nh {
- struct in6_addr nh_gw;
- struct net_device *nh_dev;
- struct lwtunnel_state *nh_lwtstate;
+ struct fib_nh_common nh_common;
- unsigned int nh_flags;
- atomic_t nh_upper_bound;
- int nh_weight;
+#ifdef CONFIG_IPV6_ROUTER_PREF
+ unsigned long last_probe;
+#endif
};
struct fib6_info {
struct rt6_info * __percpu *rt6i_pcpu;
struct rt6_exception_bucket __rcu *rt6i_exception_bucket;
-#ifdef CONFIG_IPV6_ROUTER_PREF
- unsigned long last_probe;
-#endif
-
u32 fib6_metric;
u8 fib6_protocol;
u8 fib6_type;
static inline struct net_device *fib6_info_nh_dev(const struct fib6_info *f6i)
{
- return f6i->fib6_nh.nh_dev;
+ return f6i->fib6_nh.fib_nh_dev;
}
+int fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh,
+ struct fib6_config *cfg, gfp_t gfp_flags,
+ struct netlink_ext_ack *extack);
+void fib6_nh_release(struct fib6_nh *fib6_nh);
+
static inline
struct lwtunnel_state *fib6_info_nh_lwt(const struct fib6_info *f6i)
{
- return f6i->fib6_nh.nh_lwtstate;
+ return f6i->fib6_nh.fib_nh_lws;
}
void inet6_rt_notify(int event, struct fib6_info *rt, struct nl_info *info,
static inline bool rt6_qualify_for_ecmp(const struct fib6_info *f6i)
{
- return (f6i->fib6_flags & (RTF_GATEWAY|RTF_ADDRCONF|RTF_DYNAMIC)) ==
- RTF_GATEWAY;
+ return !(f6i->fib6_flags & (RTF_ADDRCONF|RTF_DYNAMIC)) &&
+ f6i->fib6_nh.fib_nh_gw_family;
}
void ip6_route_input(struct sk_buff *skb);
static inline bool rt6_duplicate_nexthop(struct fib6_info *a, struct fib6_info *b)
{
- return a->fib6_nh.nh_dev == b->fib6_nh.nh_dev &&
- ipv6_addr_equal(&a->fib6_nh.nh_gw, &b->fib6_nh.nh_gw) &&
- !lwtunnel_cmp_encap(a->fib6_nh.nh_lwtstate, b->fib6_nh.nh_lwtstate);
+ struct fib6_nh *nha = &a->fib6_nh, *nhb = &b->fib6_nh;
+
+ return nha->fib_nh_dev == nhb->fib_nh_dev &&
+ ipv6_addr_equal(&nha->fib_nh_gw6, &nhb->fib_nh_gw6) &&
+ !lwtunnel_cmp_encap(nha->fib_nh_lws, nhb->fib_nh_lws);
}
static inline unsigned int ip6_dst_mtu_forward(const struct dst_entry *dst)
u8 fc_protocol;
u8 fc_scope;
u8 fc_type;
- /* 3 bytes unused */
+ u8 fc_gw_family;
+ /* 2 bytes unused */
u32 fc_table;
__be32 fc_dst;
- __be32 fc_gw;
+ union {
+ __be32 fc_gw4;
+ struct in6_addr fc_gw6;
+ };
int fc_oif;
u32 fc_flags;
u32 fc_priority;
#define FNHE_HASH_SIZE (1 << FNHE_HASH_SHIFT)
#define FNHE_RECLAIM_DEPTH 5
+struct fib_nh_common {
+ struct net_device *nhc_dev;
+ int nhc_oif;
+ unsigned int nhc_flags;
+ struct lwtunnel_state *nhc_lwtstate;
+ unsigned char nhc_scope;
+ u8 nhc_family;
+ u8 nhc_gw_family;
+
+ union {
+ __be32 ipv4;
+ struct in6_addr ipv6;
+ } nhc_gw;
+
+ int nhc_weight;
+ atomic_t nhc_upper_bound;
+};
+
struct fib_nh {
- struct net_device *nh_dev;
+ struct fib_nh_common nh_common;
struct hlist_node nh_hash;
struct fib_info *nh_parent;
- unsigned int nh_flags;
- unsigned char nh_scope;
-#ifdef CONFIG_IP_ROUTE_MULTIPATH
- int nh_weight;
- atomic_t nh_upper_bound;
-#endif
#ifdef CONFIG_IP_ROUTE_CLASSID
__u32 nh_tclassid;
#endif
- int nh_oif;
- __be32 nh_gw;
__be32 nh_saddr;
int nh_saddr_genid;
struct rtable __rcu * __percpu *nh_pcpu_rth_output;
struct rtable __rcu *nh_rth_input;
struct fnhe_hash_bucket __rcu *nh_exceptions;
- struct lwtunnel_state *nh_lwtstate;
+#define fib_nh_family nh_common.nhc_family
+#define fib_nh_dev nh_common.nhc_dev
+#define fib_nh_oif nh_common.nhc_oif
+#define fib_nh_flags nh_common.nhc_flags
+#define fib_nh_lws nh_common.nhc_lwtstate
+#define fib_nh_scope nh_common.nhc_scope
+#define fib_nh_gw_family nh_common.nhc_gw_family
+#define fib_nh_gw4 nh_common.nhc_gw.ipv4
+#define fib_nh_gw6 nh_common.nhc_gw.ipv6
+#define fib_nh_weight nh_common.nhc_weight
+#define fib_nh_upper_bound nh_common.nhc_upper_bound
};
/*
#define fib_rtt fib_metrics->metrics[RTAX_RTT-1]
#define fib_advmss fib_metrics->metrics[RTAX_ADVMSS-1]
int fib_nhs;
+ bool fib_nh_is_v6;
struct rcu_head rcu;
struct fib_nh fib_nh[0];
-#define fib_dev fib_nh[0].nh_dev
+#define fib_dev fib_nh[0].fib_nh_dev
};
struct fib_table;
struct fib_result {
- __be32 prefix;
- unsigned char prefixlen;
- unsigned char nh_sel;
- unsigned char type;
- unsigned char scope;
- u32 tclassid;
- struct fib_info *fi;
- struct fib_table *table;
- struct hlist_head *fa_head;
+ __be32 prefix;
+ unsigned char prefixlen;
+ unsigned char nh_sel;
+ unsigned char type;
+ unsigned char scope;
+ u32 tclassid;
+ struct fib_nh_common *nhc;
+ struct fib_info *fi;
+ struct fib_table *table;
+ struct hlist_head *fa_head;
};
struct fib_result_nl {
int err;
};
-#ifdef CONFIG_IP_ROUTE_MULTIPATH
-#define FIB_RES_NH(res) ((res).fi->fib_nh[(res).nh_sel])
-#else /* CONFIG_IP_ROUTE_MULTIPATH */
-#define FIB_RES_NH(res) ((res).fi->fib_nh[0])
-#endif /* CONFIG_IP_ROUTE_MULTIPATH */
+static inline struct fib_nh_common *fib_info_nhc(struct fib_info *fi, int nhsel)
+{
+ return &fi->fib_nh[nhsel].nh_common;
+}
#ifdef CONFIG_IP_MULTIPLE_TABLES
#define FIB_TABLE_HASHSZ 256
#endif
__be32 fib_info_update_nh_saddr(struct net *net, struct fib_nh *nh);
+__be32 fib_result_prefsrc(struct net *net, struct fib_result *res);
-#define FIB_RES_SADDR(net, res) \
- ((FIB_RES_NH(res).nh_saddr_genid == \
- atomic_read(&(net)->ipv4.dev_addr_genid)) ? \
- FIB_RES_NH(res).nh_saddr : \
- fib_info_update_nh_saddr((net), &FIB_RES_NH(res)))
-#define FIB_RES_GW(res) (FIB_RES_NH(res).nh_gw)
-#define FIB_RES_DEV(res) (FIB_RES_NH(res).nh_dev)
-#define FIB_RES_OIF(res) (FIB_RES_NH(res).nh_oif)
-
-#define FIB_RES_PREFSRC(net, res) ((res).fi->fib_prefsrc ? : \
- FIB_RES_SADDR(net, res))
+#define FIB_RES_NHC(res) ((res).nhc)
+#define FIB_RES_DEV(res) (FIB_RES_NHC(res)->nhc_dev)
+#define FIB_RES_OIF(res) (FIB_RES_NHC(res)->nhc_oif)
struct fib_entry_notifier_info {
struct fib_notifier_info info; /* must be first */
/* Exported by fib_frontend.c */
extern const struct nla_policy rtm_ipv4_policy[];
void ip_fib_init(void);
+int fib_gw_from_via(struct fib_config *cfg, struct nlattr *nla,
+ struct netlink_ext_ack *extack);
__be32 fib_compute_spec_dst(struct sk_buff *skb);
bool fib_info_nh_uses_dev(struct fib_info *fi, const struct net_device *dev);
int fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
void fib_select_path(struct net *net, struct fib_result *res,
struct flowi4 *fl4, const struct sk_buff *skb);
+int fib_nh_init(struct net *net, struct fib_nh *fib_nh,
+ struct fib_config *cfg, int nh_weight,
+ struct netlink_ext_ack *extack);
+void fib_nh_release(struct net *net, struct fib_nh *fib_nh);
+int fib_nh_common_init(struct fib_nh_common *nhc, struct nlattr *fc_encap,
+ u16 fc_encap_type, void *cfg, gfp_t gfp_flags,
+ struct netlink_ext_ack *extack);
+void fib_nh_common_release(struct fib_nh_common *nhc);
+
/* Exported by fib_trie.c */
void fib_trie_init(void);
struct fib_table *fib_trie_table(u32 id, struct fib_table *alias);
static inline void fib_combine_itag(u32 *itag, const struct fib_result *res)
{
#ifdef CONFIG_IP_ROUTE_CLASSID
+ struct fib_nh_common *nhc = res->nhc;
+ struct fib_nh *nh = container_of(nhc, struct fib_nh, nh_common);
#ifdef CONFIG_IP_MULTIPLE_TABLES
u32 rtag;
#endif
- *itag = FIB_RES_NH(*res).nh_tclassid<<16;
+ *itag = nh->nh_tclassid << 16;
#ifdef CONFIG_IP_MULTIPLE_TABLES
rtag = res->tclassid;
if (*itag == 0)
int ip_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh,
struct fib_dump_filter *filter,
struct netlink_callback *cb);
+
+int fib_nexthop_info(struct sk_buff *skb, const struct fib_nh_common *nh,
+ unsigned int *flags, bool skip_oif);
+int fib_add_nexthop(struct sk_buff *skb, const struct fib_nh_common *nh,
+ int nh_weight);
#endif /* _NET_FIB_H */
/* Address family of addr */
u16 af;
+
+ u16 tun_type; /* tunnel type */
+ __be16 tun_port; /* tunnel port */
};
atomic_t conn_flags; /* flags to copy to conn */
atomic_t weight; /* server weight */
atomic_t last_weight; /* server latest weight */
+ __u16 tun_type; /* tunnel type */
+ __be16 tun_port; /* tunnel port */
refcount_t refcnt; /* reference counter */
struct ip_vs_stats stats; /* statistics */
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _IPV6_STUBS_H
+#define _IPV6_STUBS_H
+
+#include <linux/in6.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <net/dst.h>
+#include <net/flow.h>
+#include <net/neighbour.h>
+#include <net/sock.h>
+
+/* structs from net/ip6_fib.h */
+struct fib6_info;
+struct fib6_nh;
+struct fib6_config;
+
+/* This is ugly, ideally these symbols should be built
+ * into the core kernel.
+ */
+struct ipv6_stub {
+ int (*ipv6_sock_mc_join)(struct sock *sk, int ifindex,
+ const struct in6_addr *addr);
+ int (*ipv6_sock_mc_drop)(struct sock *sk, int ifindex,
+ const struct in6_addr *addr);
+ int (*ipv6_dst_lookup)(struct net *net, struct sock *sk,
+ struct dst_entry **dst, struct flowi6 *fl6);
+ int (*ipv6_route_input)(struct sk_buff *skb);
+
+ struct fib6_table *(*fib6_get_table)(struct net *net, u32 id);
+ struct fib6_info *(*fib6_lookup)(struct net *net, int oif,
+ struct flowi6 *fl6, int flags);
+ struct fib6_info *(*fib6_table_lookup)(struct net *net,
+ struct fib6_table *table,
+ int oif, struct flowi6 *fl6,
+ int flags);
+ struct fib6_info *(*fib6_multipath_select)(const struct net *net,
+ struct fib6_info *f6i,
+ struct flowi6 *fl6, int oif,
+ const struct sk_buff *skb,
+ int strict);
+ u32 (*ip6_mtu_from_fib6)(struct fib6_info *f6i, struct in6_addr *daddr,
+ struct in6_addr *saddr);
+
+ int (*fib6_nh_init)(struct net *net, struct fib6_nh *fib6_nh,
+ struct fib6_config *cfg, gfp_t gfp_flags,
+ struct netlink_ext_ack *extack);
+ void (*fib6_nh_release)(struct fib6_nh *fib6_nh);
+ void (*udpv6_encap_enable)(void);
+ void (*ndisc_send_na)(struct net_device *dev, const struct in6_addr *daddr,
+ const struct in6_addr *solicited_addr,
+ bool router, bool solicited, bool override, bool inc_opt);
+ struct neigh_table *nd_tbl;
+};
+extern const struct ipv6_stub *ipv6_stub __read_mostly;
+
+/* A stub used by bpf helpers. Similarly ugly as ipv6_stub */
+struct ipv6_bpf_stub {
+ int (*inet6_bind)(struct sock *sk, struct sockaddr *uaddr, int addr_len,
+ bool force_bind_address_no_port, bool with_lock);
+ struct sock *(*udp6_lib_lookup)(struct net *net,
+ const struct in6_addr *saddr, __be16 sport,
+ const struct in6_addr *daddr, __be16 dport,
+ int dif, int sdif, struct udp_table *tbl,
+ struct sk_buff *skb);
+};
+extern const struct ipv6_bpf_stub *ipv6_bpf_stub __read_mostly;
+
+#endif
#ifndef _NDISC_H
#define _NDISC_H
+#include <net/ipv6_stubs.h>
+
/*
* ICMP codes for neighbour discovery messages
*/
return ___neigh_lookup_noref(&nd_tbl, neigh_key_eq128, ndisc_hashfn, pkey, dev);
}
+static inline
+struct neighbour *__ipv6_neigh_lookup_noref_stub(struct net_device *dev,
+ const void *pkey)
+{
+ return ___neigh_lookup_noref(ipv6_stub->nd_tbl, neigh_key_eq128,
+ ndisc_hashfn, pkey, dev);
+}
+
static inline struct neighbour *__ipv6_neigh_lookup(struct net_device *dev, const void *pkey)
{
struct neighbour *n;
rcu_read_unlock_bh();
}
+static inline void __ipv6_confirm_neigh_stub(struct net_device *dev,
+ const void *pkey)
+{
+ struct neighbour *n;
+
+ rcu_read_lock_bh();
+ n = __ipv6_neigh_lookup_noref_stub(dev, pkey);
+ if (n) {
+ unsigned long now = jiffies;
+
+ /* avoid dirtying neighbour */
+ if (n->confirmed != now)
+ n->confirmed = now;
+ }
+ rcu_read_unlock_bh();
+}
+
+/* uses ipv6_stub and is meant for use outside of IPv6 core */
+static inline struct neighbour *ip_neigh_gw6(struct net_device *dev,
+ const void *addr)
+{
+ struct neighbour *neigh;
+
+ neigh = __ipv6_neigh_lookup_noref_stub(dev, addr);
+ if (unlikely(!neigh))
+ neigh = __neigh_create(ipv6_stub->nd_tbl, addr, dev, false);
+
+ return neigh;
+}
+
int ndisc_init(void);
int ndisc_late_init(void);
return dev_queue_xmit(skb);
}
-static inline int neigh_output(struct neighbour *n, struct sk_buff *skb)
+static inline int neigh_output(struct neighbour *n, struct sk_buff *skb,
+ bool skip_cache)
{
const struct hh_cache *hh = &n->hh;
- if ((n->nud_state & NUD_CONNECTED) && hh->hh_len)
+ if ((n->nud_state & NUD_CONNECTED) && hh->hh_len && !skip_cache)
return neigh_hh_output(hh, skb);
else
return n->output(n, skb);
+++ /dev/null
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _NF_NAT_MASQUERADE_IPV4_H_
-#define _NF_NAT_MASQUERADE_IPV4_H_
-
-#include <net/netfilter/nf_nat.h>
-
-unsigned int
-nf_nat_masquerade_ipv4(struct sk_buff *skb, unsigned int hooknum,
- const struct nf_nat_range2 *range,
- const struct net_device *out);
-
-int nf_nat_masquerade_ipv4_register_notifier(void);
-void nf_nat_masquerade_ipv4_unregister_notifier(void);
-
-#endif /*_NF_NAT_MASQUERADE_IPV4_H_ */
+++ /dev/null
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _NF_NAT_MASQUERADE_IPV6_H_
-#define _NF_NAT_MASQUERADE_IPV6_H_
-
-unsigned int
-nf_nat_masquerade_ipv6(struct sk_buff *skb, const struct nf_nat_range2 *range,
- const struct net_device *out);
-int nf_nat_masquerade_ipv6_register_notifier(void);
-void nf_nat_masquerade_ipv6_unregister_notifier(void);
-
-#endif /* _NF_NAT_MASQUERADE_IPV6_H_ */
/* Expectation class */
unsigned int class;
-#ifdef CONFIG_NF_NAT_NEEDED
+#if IS_ENABLED(CONFIG_NF_NAT)
union nf_inet_addr saved_addr;
/* This is the original per-proto part, used to map the
* expected connection the way the recipient expects. */
int nf_conntrack_timeout_init(void);
void nf_conntrack_timeout_fini(void);
void nf_ct_untimeout(struct net *net, struct nf_ct_timeout *timeout);
+int nf_ct_set_timeout(struct net *net, struct nf_conn *ct, u8 l3num, u8 l4num,
+ const char *timeout_name);
+void nf_ct_destroy_timeout(struct nf_conn *ct);
#else
static inline int nf_conntrack_timeout_init(void)
{
{
return;
}
+
+static inline int nf_ct_set_timeout(struct net *net, struct nf_conn *ct,
+ u8 l3num, u8 l4num,
+ const char *timeout_name)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void nf_ct_destroy_timeout(struct nf_conn *ct)
+{
+ return;
+}
#endif /* CONFIG_NF_CONNTRACK_TIMEOUT */
#ifdef CONFIG_NF_CONNTRACK_TIMEOUT
#endif
}
-int nf_nat_register_fn(struct net *net, const struct nf_hook_ops *ops,
+int nf_nat_register_fn(struct net *net, u8 pf, const struct nf_hook_ops *ops,
const struct nf_hook_ops *nat_ops, unsigned int ops_count);
-void nf_nat_unregister_fn(struct net *net, const struct nf_hook_ops *ops,
+void nf_nat_unregister_fn(struct net *net, u8 pf, const struct nf_hook_ops *ops,
unsigned int ops_count);
unsigned int nf_nat_packet(struct nf_conn *ct, enum ip_conntrack_info ctinfo,
int nf_nat_ipv6_register_fn(struct net *net, const struct nf_hook_ops *ops);
void nf_nat_ipv6_unregister_fn(struct net *net, const struct nf_hook_ops *ops);
+int nf_nat_inet_register_fn(struct net *net, const struct nf_hook_ops *ops);
+void nf_nat_inet_unregister_fn(struct net *net, const struct nf_hook_ops *ops);
+
unsigned int
nf_nat_inet_fn(void *priv, struct sk_buff *skb,
const struct nf_hook_state *state);
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _NF_NAT_MASQUERADE_H_
+#define _NF_NAT_MASQUERADE_H_
+
+#include <net/netfilter/nf_nat.h>
+
+unsigned int
+nf_nat_masquerade_ipv4(struct sk_buff *skb, unsigned int hooknum,
+ const struct nf_nat_range2 *range,
+ const struct net_device *out);
+
+int nf_nat_masquerade_inet_register_notifiers(void);
+void nf_nat_masquerade_inet_unregister_notifiers(void);
+
+unsigned int
+nf_nat_masquerade_ipv6(struct sk_buff *skb, const struct nf_nat_range2 *range,
+ const struct net_device *out);
+
+#endif /*_NF_NAT_MASQUERADE_H_ */
return queue;
}
+int nf_queue(struct sk_buff *skb, struct nf_hook_state *state,
+ const struct nf_hook_entries *entries, unsigned int index,
+ unsigned int verdict);
#endif /* _NF_QUEUE_H */
enum nft_trans_phase phase);
int nf_tables_bind_set(const struct nft_ctx *ctx, struct nft_set *set,
struct nft_set_binding *binding);
-void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
- struct nft_set_binding *binding, bool commit);
void nf_tables_destroy_set(const struct nft_ctx *ctx, struct nft_set *set);
/**
int __init nft_chain_filter_init(void);
void nft_chain_filter_fini(void);
+void __init nft_chain_route_init(void);
+void nft_chain_route_fini(void);
#endif /* _NET_NF_TABLES_H */
#include <linux/uidgid.h>
#include <net/inet_frag.h>
#include <linux/rcupdate.h>
+#include <linux/siphash.h>
struct tcpm_hash_bucket;
struct ctl_table_header;
unsigned int ipmr_seq; /* protected by rtnl_mutex */
atomic_t rt_genid;
+ siphash_key_t ip_id_key;
};
#endif
int auto_flowlabels;
int icmpv6_time;
int icmpv6_echo_ignore_all;
+ int icmpv6_echo_ignore_multicast;
+ int icmpv6_echo_ignore_anycast;
int anycast_src_echo_reply;
int ip_nonlocal_bind;
int fwmark_reflect;
return req;
}
-static inline void reqsk_free(struct request_sock *req)
+static inline void __reqsk_free(struct request_sock *req)
{
- WARN_ON_ONCE(refcount_read(&req->rsk_refcnt) != 0);
-
req->rsk_ops->destructor(req);
if (req->rsk_listener)
sock_put(req->rsk_listener);
kmem_cache_free(req->rsk_ops->slab, req);
}
+static inline void reqsk_free(struct request_sock *req)
+{
+ WARN_ON_ONCE(refcount_read(&req->rsk_refcnt) != 0);
+ __reqsk_free(req);
+}
+
static inline void reqsk_put(struct request_sock *req)
{
if (refcount_dec_and_test(&req->rsk_refcnt))
#include <net/flow.h>
#include <net/inet_sock.h>
#include <net/ip_fib.h>
+#include <net/arp.h>
+#include <net/ndisc.h>
#include <linux/in_route.h>
#include <linux/rtnetlink.h>
#include <linux/rcupdate.h>
unsigned int rt_flags;
__u16 rt_type;
__u8 rt_is_input;
- __u8 rt_uses_gateway;
+ u8 rt_gw_family;
int rt_iif;
/* Info on neighbour */
- __be32 rt_gateway;
+ union {
+ __be32 rt_gw4;
+ struct in6_addr rt_gw6;
+ };
/* Miscellaneous cached information */
u32 rt_mtu_locked:1,
static inline __be32 rt_nexthop(const struct rtable *rt, __be32 daddr)
{
- if (rt->rt_gateway)
- return rt->rt_gateway;
+ if (rt->rt_gw_family == AF_INET)
+ return rt->rt_gw4;
return daddr;
}
return hoplimit;
}
+static inline struct neighbour *ip_neigh_gw4(struct net_device *dev,
+ __be32 daddr)
+{
+ struct neighbour *neigh;
+
+ neigh = __ipv4_neigh_lookup_noref(dev, daddr);
+ if (unlikely(!neigh))
+ neigh = __neigh_create(&arp_tbl, &daddr, dev, false);
+
+ return neigh;
+}
+
+static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt,
+ struct sk_buff *skb,
+ bool *is_v6gw)
+{
+ struct net_device *dev = rt->dst.dev;
+ struct neighbour *neigh;
+
+ if (likely(rt->rt_gw_family == AF_INET)) {
+ neigh = ip_neigh_gw4(dev, rt->rt_gw4);
+ } else if (rt->rt_gw_family == AF_INET6) {
+ neigh = ip_neigh_gw6(dev, &rt->rt_gw6);
+ *is_v6gw = true;
+ } else {
+ neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr);
+ }
+ return neigh;
+}
+
#endif /* _ROUTE_H */
struct qdisc_skb_head {
struct sk_buff *head;
struct sk_buff *tail;
- union {
- u32 qlen;
- atomic_t atomic_qlen;
- };
+ __u32 qlen;
spinlock_t lock;
};
spinlock_t busylock ____cacheline_aligned_in_smp;
spinlock_t seqlock;
+
+ /* for NOLOCK qdisc, true if there are no enqueued skbs */
+ bool empty;
struct rcu_head rcu;
};
return (raw_read_seqcount(&qdisc->running) & 1) ? true : false;
}
+static inline bool qdisc_is_percpu_stats(const struct Qdisc *q)
+{
+ return q->flags & TCQ_F_CPUSTATS;
+}
+
+static inline bool qdisc_is_empty(const struct Qdisc *qdisc)
+{
+ if (qdisc_is_percpu_stats(qdisc))
+ return qdisc->empty;
+ return !qdisc->q.qlen;
+}
+
static inline bool qdisc_run_begin(struct Qdisc *qdisc)
{
if (qdisc->flags & TCQ_F_NOLOCK) {
if (!spin_trylock(&qdisc->seqlock))
return false;
+ qdisc->empty = false;
} else if (qdisc_is_running(qdisc)) {
return false;
}
BUILD_BUG_ON(sizeof(qcb->data) < sz);
}
+static inline int qdisc_qlen_cpu(const struct Qdisc *q)
+{
+ return this_cpu_ptr(q->cpu_qstats)->qlen;
+}
+
static inline int qdisc_qlen(const struct Qdisc *q)
{
return q->q.qlen;
}
-static inline u32 qdisc_qlen_sum(const struct Qdisc *q)
+static inline int qdisc_qlen_sum(const struct Qdisc *q)
{
- u32 qlen = q->qstats.qlen;
+ __u32 qlen = q->qstats.qlen;
+ int i;
- if (q->flags & TCQ_F_NOLOCK)
- qlen += atomic_read(&q->q.atomic_qlen);
- else
+ if (qdisc_is_percpu_stats(q)) {
+ for_each_possible_cpu(i)
+ qlen += per_cpu_ptr(q->cpu_qstats, i)->qlen;
+ } else {
qlen += q->q.qlen;
+ }
return qlen;
}
struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
const struct Qdisc *q = rcu_dereference(txq->qdisc);
- if (q->q.qlen) {
+ if (!qdisc_is_empty(q)) {
rcu_read_unlock();
return false;
}
return sch->enqueue(skb, sch, to_free);
}
-static inline bool qdisc_is_percpu_stats(const struct Qdisc *q)
-{
- return q->flags & TCQ_F_CPUSTATS;
-}
-
static inline void _bstats_update(struct gnet_stats_basic_packed *bstats,
__u64 bytes, __u32 packets)
{
this_cpu_add(sch->cpu_qstats->backlog, qdisc_pkt_len(skb));
}
-static inline void qdisc_qstats_atomic_qlen_inc(struct Qdisc *sch)
+static inline void qdisc_qstats_cpu_qlen_inc(struct Qdisc *sch)
{
- atomic_inc(&sch->q.atomic_qlen);
+ this_cpu_inc(sch->cpu_qstats->qlen);
}
-static inline void qdisc_qstats_atomic_qlen_dec(struct Qdisc *sch)
+static inline void qdisc_qstats_cpu_qlen_dec(struct Qdisc *sch)
{
- atomic_dec(&sch->q.atomic_qlen);
+ this_cpu_dec(sch->cpu_qstats->qlen);
}
static inline void qdisc_qstats_cpu_requeues_inc(struct Qdisc *sch)
return skb;
}
+static inline void qdisc_update_stats_at_dequeue(struct Qdisc *sch,
+ struct sk_buff *skb)
+{
+ if (qdisc_is_percpu_stats(sch)) {
+ qdisc_qstats_cpu_backlog_dec(sch, skb);
+ qdisc_bstats_cpu_update(sch, skb);
+ qdisc_qstats_cpu_qlen_dec(sch);
+ } else {
+ qdisc_qstats_backlog_dec(sch, skb);
+ qdisc_bstats_update(sch, skb);
+ sch->q.qlen--;
+ }
+}
+
+static inline void qdisc_update_stats_at_enqueue(struct Qdisc *sch,
+ unsigned int pkt_len)
+{
+ if (qdisc_is_percpu_stats(sch)) {
+ qdisc_qstats_cpu_qlen_inc(sch);
+ this_cpu_add(sch->cpu_qstats->backlog, pkt_len);
+ } else {
+ sch->qstats.backlog += pkt_len;
+ sch->q.qlen++;
+ }
+}
+
/* use instead of qdisc->dequeue() for all qdiscs queried with ->peek() */
static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
{
if (skb) {
skb = __skb_dequeue(&sch->gso_skb);
- qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ if (qdisc_is_percpu_stats(sch)) {
+ qdisc_qstats_cpu_backlog_dec(sch, skb);
+ qdisc_qstats_cpu_qlen_dec(sch);
+ } else {
+ qdisc_qstats_backlog_dec(sch, skb);
+ sch->q.qlen--;
+ }
} else {
skb = sch->dequeue(sch);
}
/*
* This mimics the behavior of skb_set_owner_r
*/
- sk->sk_forward_alloc -= event->rmem_len;
+ sk_mem_charge(sk, event->rmem_len);
}
/* Tests if the list has one and only one entry. */
int sctp_ulpq_tail_data(struct sctp_ulpq *, struct sctp_chunk *, gfp_t);
/* Add a new event for propagation to the ULP. */
-int sctp_ulpq_tail_event(struct sctp_ulpq *, struct sctp_ulpevent *ev);
+int sctp_ulpq_tail_event(struct sctp_ulpq *, struct sk_buff_head *skb_list);
/* Renege previously received chunks. */
void sctp_ulpq_renege(struct sctp_ulpq *, struct sctp_chunk *, gfp_t);
atomic_t sk_drops;
int sk_rcvlowat;
struct sk_buff_head sk_error_queue;
+ struct sk_buff *sk_rx_skb_cache;
struct sk_buff_head sk_receive_queue;
/*
* The backlog queue is special, it is always used with
struct sk_buff *sk_send_head;
struct rb_root tcp_rtx_queue;
};
+ struct sk_buff *sk_tx_skb_cache;
struct sk_buff_head sk_write_queue;
__s32 sk_peek_off;
int sk_write_pending;
static inline void sock_rps_record_flow(const struct sock *sk)
{
#ifdef CONFIG_RPS
- if (static_key_false(&rfs_needed)) {
+ if (static_branch_unlikely(&rfs_needed)) {
/* Reading sk->sk_rxhash might incur an expensive cache line
* miss.
*
sock_set_flag(sk, SOCK_QUEUE_SHRUNK);
sk->sk_wmem_queued -= skb->truesize;
sk_mem_uncharge(sk, skb->truesize);
+ if (!sk->sk_tx_skb_cache) {
+ skb_zcopy_clear(skb, true);
+ sk->sk_tx_skb_cache = skb;
+ return;
+ }
__kfree_skb(skb);
}
static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb)
{
__skb_unlink(skb, &sk->sk_receive_queue);
+ if (
+#ifdef CONFIG_RPS
+ !static_branch_unlikely(&rps_needed) &&
+#endif
+ !sk->sk_rx_skb_cache) {
+ sk->sk_rx_skb_cache = skb;
+ skb_orphan(skb);
+ return;
+ }
__kfree_skb(skb);
}
#define TLS_AAD_SPACE_SIZE 13
#define TLS_DEVICE_NAME_MAX 32
+#define MAX_IV_SIZE 16
+
+/* For AES-CCM, the full 16-bytes of IV is made of '4' fields of given sizes.
+ *
+ * IV[16] = b0[1] || implicit nonce[4] || explicit nonce[8] || length[3]
+ *
+ * The field 'length' is encoded in field 'b0' as '(length width - 1)'.
+ * Hence b0 contains (3 - 1) = 2.
+ */
+#define TLS_AES_CCM_IV_B0_BYTE 2
+
/*
* This structure defines the routines for Inline TLS driver.
* The following routines are optional and filled with a
struct scatterlist sg_content_type;
char aad_space[TLS_AAD_SPACE_SIZE];
- u8 iv_data[TLS_CIPHER_AES_GCM_128_IV_SIZE +
- TLS_CIPHER_AES_GCM_128_SALT_SIZE];
+ u8 iv_data[MAX_IV_SIZE];
struct aead_request aead_req;
u8 aead_req_ctx[];
};
u16 tag_size;
u16 overhead_size;
u16 iv_size;
+ u16 salt_size;
u16 rec_seq_size;
u16 aad_size;
u16 tail_size;
int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb);
void udp_skb_destructor(struct sock *sk, struct sk_buff *skb);
struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags,
- int noblock, int *peeked, int *off, int *err);
+ int noblock, int *off, int *err);
static inline struct sk_buff *skb_recv_udp(struct sock *sk, unsigned int flags,
int noblock, int *err)
{
- int peeked, off = 0;
+ int off = 0;
- return __skb_recv_udp(sk, flags, noblock, &peeked, &off, err);
+ return __skb_recv_udp(sk, flags, noblock, &off, err);
}
int udp_v4_early_demux(struct sk_buff *skb);
#if IS_ENABLED(CONFIG_IPV6)
#include <net/ipv6.h>
-#include <net/addrconf.h>
+#include <net/ipv6_stubs.h>
#endif
struct udp_port_cfg {
#include <net/rtnetlink.h>
#include <net/switchdev.h>
+#define IANA_VXLAN_UDP_PORT 4789
+
/* VXLAN protocol (RFC 7348) header:
* +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
* |R|R|R|R|I|R|R|R| Reserved |
TRACE_EVENT(fib_table_lookup,
TP_PROTO(u32 tb_id, const struct flowi4 *flp,
- const struct fib_nh *nh, int err),
+ const struct fib_nh_common *nhc, int err),
- TP_ARGS(tb_id, flp, nh, err),
+ TP_ARGS(tb_id, flp, nhc, err),
TP_STRUCT__entry(
__field( u32, tb_id )
__field( __u8, flags )
__array( __u8, src, 4 )
__array( __u8, dst, 4 )
- __array( __u8, gw, 4 )
- __array( __u8, saddr, 4 )
+ __array( __u8, gw4, 4 )
+ __array( __u8, gw6, 16 )
__field( u16, sport )
__field( u16, dport )
__dynamic_array(char, name, IFNAMSIZ )
),
TP_fast_assign(
+ struct in6_addr in6_zero = {};
+ struct net_device *dev;
+ struct in6_addr *in6;
__be32 *p32;
__entry->tb_id = tb_id;
__entry->dport = 0;
}
- if (nh) {
- p32 = (__be32 *) __entry->saddr;
- *p32 = nh->nh_saddr;
+ dev = nhc ? nhc->nhc_dev : NULL;
+ __assign_str(name, dev ? dev->name : "-");
- p32 = (__be32 *) __entry->gw;
- *p32 = nh->nh_gw;
+ if (nhc) {
+ if (nhc->nhc_gw_family == AF_INET) {
+ p32 = (__be32 *) __entry->gw4;
+ *p32 = nhc->nhc_gw.ipv4;
- __assign_str(name, nh->nh_dev ? nh->nh_dev->name : "-");
- } else {
- p32 = (__be32 *) __entry->saddr;
- *p32 = 0;
+ in6 = (struct in6_addr *)__entry->gw6;
+ *in6 = in6_zero;
+ } else if (nhc->nhc_gw_family == AF_INET6) {
+ p32 = (__be32 *) __entry->gw4;
+ *p32 = 0;
- p32 = (__be32 *) __entry->gw;
+ in6 = (struct in6_addr *)__entry->gw6;
+ *in6 = nhc->nhc_gw.ipv6;
+ }
+ } else {
+ p32 = (__be32 *) __entry->gw4;
*p32 = 0;
- __assign_str(name, "-");
+ in6 = (struct in6_addr *)__entry->gw6;
+ *in6 = in6_zero;
}
),
- TP_printk("table %u oif %d iif %d proto %u %pI4/%u -> %pI4/%u tos %d scope %d flags %x ==> dev %s gw %pI4 src %pI4 err %d",
+ TP_printk("table %u oif %d iif %d proto %u %pI4/%u -> %pI4/%u tos %d scope %d flags %x ==> dev %s gw %pI4/%pI6c err %d",
__entry->tb_id, __entry->oif, __entry->iif, __entry->proto,
__entry->src, __entry->sport, __entry->dst, __entry->dport,
__entry->tos, __entry->scope, __entry->flags,
- __get_str(name), __entry->gw, __entry->saddr, __entry->err)
+ __get_str(name), __entry->gw4, __entry->gw6, __entry->err)
);
#endif /* _TRACE_FIB_H */
__entry->dport = 0;
}
- if (f6i->fib6_nh.nh_dev) {
- __assign_str(name, f6i->fib6_nh.nh_dev);
+ if (f6i->fib6_nh.fib_nh_dev) {
+ __assign_str(name, f6i->fib6_nh.fib_nh_dev);
} else {
__assign_str(name, "-");
}
} else if (f6i) {
in6 = (struct in6_addr *)__entry->gw;
- *in6 = f6i->fib6_nh.nh_gw;
+ *in6 = f6i->fib6_nh.fib_nh_gw6;
}
),
__entry->mlxsw_sp, __entry->vregion)
);
-TRACE_EVENT(mlxsw_sp_acl_tcam_vregion_rehash_dis,
+TRACE_EVENT(mlxsw_sp_acl_tcam_vregion_rehash_rollback_failed,
TP_PROTO(const struct mlxsw_sp *mlxsw_sp,
const struct mlxsw_sp_acl_tcam_vregion *vregion),
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _UAPI_LINUX_BATADV_PACKET_H_
/* Copyright (C) 2016-2019 B.A.T.M.A.N. contributors:
*
* Matthias Schiffer
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
- * DEALINGS IN THE SOFTWARE.
*/
#ifndef _UAPI_LINUX_BATMAN_ADV_H_
*/
BATADV_ATTR_THROUGHPUT_OVERRIDE,
+ /**
+ * @BATADV_ATTR_MULTICAST_FANOUT: defines the maximum number of packet
+ * copies that may be generated for a multicast-to-unicast conversion.
+ * Once this limit is exceeded distribution will fall back to broadcast.
+ */
+ BATADV_ATTR_MULTICAST_FANOUT,
+
/* add attributes above here, update the policy in netlink.c */
/**
BPF_BTF_GET_FD_BY_ID,
BPF_TASK_FD_QUERY,
BPF_MAP_LOOKUP_AND_DELETE_ELEM,
+ BPF_MAP_FREEZE,
};
enum bpf_map_type {
*/
#define BPF_F_ANY_ALIGNMENT (1U << 1)
-/* when bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_FD, bpf_ldimm64->imm == fd */
+/* When BPF ldimm64's insn[0].src_reg != 0 then this can have
+ * two extensions:
+ *
+ * insn[0].src_reg: BPF_PSEUDO_MAP_FD BPF_PSEUDO_MAP_VALUE
+ * insn[0].imm: map fd map fd
+ * insn[1].imm: 0 offset into value
+ * insn[0].off: 0 0
+ * insn[1].off: 0 0
+ * ldimm64 rewrite: address of map address of map[0]+offset
+ * verifier type: CONST_PTR_TO_MAP PTR_TO_MAP_VALUE
+ */
#define BPF_PSEUDO_MAP_FD 1
+#define BPF_PSEUDO_MAP_VALUE 2
/* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
* offset to another bpf function
#define BPF_OBJ_NAME_LEN 16U
-/* Flags for accessing BPF object */
+/* Flags for accessing BPF object from syscall side. */
#define BPF_F_RDONLY (1U << 3)
#define BPF_F_WRONLY (1U << 4)
/* Zero-initialize hash function seed. This should only be used for testing. */
#define BPF_F_ZERO_SEED (1U << 6)
+/* Flags for accessing BPF object from program side. */
+#define BPF_F_RDONLY_PROG (1U << 7)
+#define BPF_F_WRONLY_PROG (1U << 8)
+
/* flags for BPF_PROG_QUERY */
#define BPF_F_QUERY_EFFECTIVE (1U << 0)
__aligned_u64 data_out;
__u32 repeat;
__u32 duration;
+ __u32 ctx_size_in; /* input: len of ctx_in */
+ __u32 ctx_size_out; /* input/output: len of ctx_out
+ * returns ENOSPC if ctx_out
+ * is too small.
+ */
+ __aligned_u64 ctx_in;
+ __aligned_u64 ctx_out;
} test;
struct { /* anonymous struct used by BPF_*_GET_*_ID */
* Grow or shrink the room for data in the packet associated to
* *skb* by *len_diff*, and according to the selected *mode*.
*
- * There is a single supported mode at this time:
+ * There are two supported modes at this time:
+ *
+ * * **BPF_ADJ_ROOM_MAC**: Adjust room at the mac layer
+ * (room space is added or removed below the layer 2 header).
*
* * **BPF_ADJ_ROOM_NET**: Adjust room at the network layer
* (room space is added or removed below the layer 3 header).
*
- * All values for *flags* are reserved for future usage, and must
- * be left at zero.
+ * The following flags are supported at this time:
+ *
+ * * **BPF_F_ADJ_ROOM_FIXED_GSO**: Do not adjust gso_size.
+ * Adjusting mss in this way is not allowed for datagrams.
+ *
+ * * **BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 **:
+ * * **BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 **:
+ * Any new space is reserved to hold a tunnel header.
+ * Configure skb offsets and other fields accordingly.
+ *
+ * * **BPF_F_ADJ_ROOM_ENCAP_L4_GRE **:
+ * * **BPF_F_ADJ_ROOM_ENCAP_L4_UDP **:
+ * Use with ENCAP_L3 flags to further specify the tunnel type.
+ *
+ * * **BPF_F_ADJ_ROOM_ENCAP_L2(len) **:
+ * Use with ENCAP_L3/L4 flags to further specify the tunnel
+ * type; **len** is the length of the inner MAC header.
*
* A call to this helper is susceptible to change the underlaying
* packet buffer. Therefore, at load time, all checks on pointers
* Return
* A **struct bpf_sock** pointer on success, or **NULL** in
* case of failure.
+ *
+ * struct bpf_sock *bpf_skc_lookup_tcp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u64 netns, u64 flags)
+ * Description
+ * Look for TCP socket matching *tuple*, optionally in a child
+ * network namespace *netns*. The return value must be checked,
+ * and if non-**NULL**, released via **bpf_sk_release**\ ().
+ *
+ * This function is identical to bpf_sk_lookup_tcp, except that it
+ * also returns timewait or request sockets. Use bpf_sk_fullsock
+ * or bpf_tcp_socket to access the full structure.
+ *
+ * This helper is available only if the kernel was compiled with
+ * **CONFIG_NET** configuration option.
+ * Return
+ * Pointer to **struct bpf_sock**, or **NULL** in case of failure.
+ * For sockets with reuseport option, the **struct bpf_sock**
+ * result is from **reuse->socks**\ [] using the hash of the tuple.
+ *
+ * int bpf_tcp_check_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
+ * Description
+ * Check whether iph and th contain a valid SYN cookie ACK for
+ * the listening socket in sk.
+ *
+ * iph points to the start of the IPv4 or IPv6 header, while
+ * iph_len contains sizeof(struct iphdr) or sizeof(struct ip6hdr).
+ *
+ * th points to the start of the TCP header, while th_len contains
+ * sizeof(struct tcphdr).
+ *
+ * Return
+ * 0 if iph and th are a valid SYN cookie ACK, or a negative error
+ * otherwise.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
FN(sk_fullsock), \
FN(tcp_sock), \
FN(skb_ecn_set_ce), \
- FN(get_listener_sock),
+ FN(get_listener_sock), \
+ FN(skc_lookup_tcp), \
+ FN(tcp_check_syncookie),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
/* Current network namespace */
#define BPF_F_CURRENT_NETNS (-1L)
+/* BPF_FUNC_skb_adjust_room flags. */
+#define BPF_F_ADJ_ROOM_FIXED_GSO (1ULL << 0)
+
+#define BPF_ADJ_ROOM_ENCAP_L2_MASK 0xff
+#define BPF_ADJ_ROOM_ENCAP_L2_SHIFT 56
+
+#define BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 (1ULL << 1)
+#define BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 (1ULL << 2)
+#define BPF_F_ADJ_ROOM_ENCAP_L4_GRE (1ULL << 3)
+#define BPF_F_ADJ_ROOM_ENCAP_L4_UDP (1ULL << 4)
+#define BPF_F_ADJ_ROOM_ENCAP_L2(len) (((__u64)len & \
+ BPF_ADJ_ROOM_ENCAP_L2_MASK) \
+ << BPF_ADJ_ROOM_ENCAP_L2_SHIFT)
+
/* Mode for BPF_FUNC_skb_adjust_room helper. */
enum bpf_adj_room_mode {
BPF_ADJ_ROOM_NET,
+ BPF_ADJ_ROOM_MAC,
};
/* Mode for BPF_FUNC_skb_load_bytes_relative helper. */
* struct, union and fwd
*/
__u32 info;
- /* "size" is used by INT, ENUM, STRUCT and UNION.
+ /* "size" is used by INT, ENUM, STRUCT, UNION and DATASEC.
* "size" tells the size of the type it is describing.
*
* "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
- * FUNC and FUNC_PROTO.
+ * FUNC, FUNC_PROTO and VAR.
* "type" is a type_id referring to another type.
*/
union {
#define BTF_KIND_RESTRICT 11 /* Restrict */
#define BTF_KIND_FUNC 12 /* Function */
#define BTF_KIND_FUNC_PROTO 13 /* Function Proto */
-#define BTF_KIND_MAX 13
-#define NR_BTF_KINDS 14
+#define BTF_KIND_VAR 14 /* Variable */
+#define BTF_KIND_DATASEC 15 /* Section */
+#define BTF_KIND_MAX BTF_KIND_DATASEC
+#define NR_BTF_KINDS (BTF_KIND_MAX + 1)
/* For some specific BTF_KIND, "struct btf_type" is immediately
* followed by extra data.
__u32 type;
};
+enum {
+ BTF_VAR_STATIC = 0,
+ BTF_VAR_GLOBAL_ALLOCATED,
+};
+
+/* BTF_KIND_VAR is followed by a single "struct btf_var" to describe
+ * additional information related to the variable such as its linkage.
+ */
+struct btf_var {
+ __u32 linkage;
+};
+
+/* BTF_KIND_DATASEC is followed by multiple "struct btf_var_secinfo"
+ * to describe all BTF_KIND_VAR types it contains along with it's
+ * in-section offset as well as size.
+ */
+struct btf_var_secinfo {
+ __u32 type;
+ __u32 offset;
+ __u32 size;
+};
+
#endif /* _UAPI__LINUX_BTF_H__ */
#define DOWNSHIFT_DEV_DEFAULT_COUNT 0xff
#define DOWNSHIFT_DEV_DISABLE 0
+/* Time in msecs after which link is reported as down
+ * 0 = lowest time supported by the PHY
+ * 0xff = off, link down detection according to standard
+ */
+#define ETHTOOL_PHY_FAST_LINK_DOWN_ON 0
+#define ETHTOOL_PHY_FAST_LINK_DOWN_OFF 0xff
+
enum phy_tunable_id {
ETHTOOL_PHY_ID_UNSPEC,
ETHTOOL_PHY_DOWNSHIFT,
+ ETHTOOL_PHY_FAST_LINK_DOWN,
/*
* Add your fresh new phy tunable attribute above and remember to update
* phy_tunable_strings[] in net/core/ethtool.c
FOU_ATTR_IPPROTO, /* u8 */
FOU_ATTR_TYPE, /* u8 */
FOU_ATTR_REMCSUM_NOPARTIAL, /* flag */
+ FOU_ATTR_LOCAL_V4, /* u32 */
+ FOU_ATTR_LOCAL_V6, /* in6_addr */
+ FOU_ATTR_PEER_V4, /* u32 */
+ FOU_ATTR_PEER_V6, /* in6_addr */
+ FOU_ATTR_PEER_PORT, /* u16 */
+ FOU_ATTR_IFINDEX, /* s32 */
__FOU_ATTR_MAX,
};
#define TUNSETSTEERINGEBPF _IOR('T', 224, int)
#define TUNSETFILTEREBPF _IOR('T', 225, int)
#define TUNSETCARRIER _IOW('T', 226, int)
+#define TUNGETDEVNETNS _IO('T', 227)
/* TUNSETIFF ifr flags */
#define IFF_TUN 0x0001
#define IP_VS_PEDATA_MAXLEN 255
+/* Tunnel types */
+enum {
+ IP_VS_CONN_F_TUNNEL_TYPE_IPIP = 0, /* IPIP */
+ IP_VS_CONN_F_TUNNEL_TYPE_GUE, /* GUE */
+ IP_VS_CONN_F_TUNNEL_TYPE_MAX,
+};
+
/*
* The struct ip_vs_service_user and struct ip_vs_dest_user are
* used to set IPVS rules through setsockopt.
IPVS_DEST_ATTR_STATS64, /* nested attribute for dest stats */
+ IPVS_DEST_ATTR_TUN_TYPE, /* tunnel type */
+
+ IPVS_DEST_ATTR_TUN_PORT, /* tunnel port */
+
__IPVS_DEST_ATTR_MAX,
};
*
* @NFTA_OSF_DREG: destination register (NLA_U32: nft_registers)
* @NFTA_OSF_TTL: Value of the TTL osf option (NLA_U8)
+ * @NFTA_OSF_FLAGS: flags (NLA_U32)
*/
enum nft_osf_attributes {
NFTA_OSF_UNSPEC,
NFTA_OSF_DREG,
NFTA_OSF_TTL,
+ NFTA_OSF_FLAGS,
__NFTA_OSF_MAX,
};
#define NFTA_OSF_MAX (__NFTA_OSF_MAX - 1)
+enum nft_osf_flags {
+ NFT_OSF_F_VERSION = (1 << 0),
+};
+
/**
* enum nft_device_attributes - nf_tables device netlink attributes
*
OVS_TUNNEL_KEY_ATTR_IPV6_DST, /* struct in6_addr dst IPv6 address. */
OVS_TUNNEL_KEY_ATTR_PAD,
OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS, /* struct erspan_metadata */
+ OVS_TUNNEL_KEY_ATTR_IPV4_INFO_BRIDGE, /* No argument. IPV4_INFO_BRIDGE mode.*/
__OVS_TUNNEL_KEY_ATTR_MAX
};
* be received on NFNLGRP_CONNTRACK_NEW and NFNLGRP_CONNTRACK_DESTROY groups,
* respectively. Remaining bits control the changes for which an event is
* delivered on the NFNLGRP_CONNTRACK_UPDATE group.
+ * @OVS_CT_ATTR_TIMEOUT: Variable length string defining conntrack timeout.
*/
enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
OVS_CT_ATTR_NAT, /* Nested OVS_NAT_ATTR_* */
OVS_CT_ATTR_FORCE_COMMIT, /* No argument */
OVS_CT_ATTR_EVENTMASK, /* u32 mask of IPCT_* events. */
+ OVS_CT_ATTR_TIMEOUT, /* Associate timeout with this connection for
+ * fine-grain timeout tuning. */
__OVS_CT_ATTR_MAX
};
struct ovs_key_ethernet addresses;
};
+/*
+ * enum ovs_check_pkt_len_attr - Attributes for %OVS_ACTION_ATTR_CHECK_PKT_LEN.
+ *
+ * @OVS_CHECK_PKT_LEN_ATTR_PKT_LEN: u16 Packet length to check for.
+ * @OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER: Nested OVS_ACTION_ATTR_*
+ * actions to apply if the packer length is greater than the specified
+ * length in the attr - OVS_CHECK_PKT_LEN_ATTR_PKT_LEN.
+ * @OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL - Nested OVS_ACTION_ATTR_*
+ * actions to apply if the packer length is lesser or equal to the specified
+ * length in the attr - OVS_CHECK_PKT_LEN_ATTR_PKT_LEN.
+ */
+enum ovs_check_pkt_len_attr {
+ OVS_CHECK_PKT_LEN_ATTR_UNSPEC,
+ OVS_CHECK_PKT_LEN_ATTR_PKT_LEN,
+ OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER,
+ OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL,
+ __OVS_CHECK_PKT_LEN_ATTR_MAX,
+
+#ifdef __KERNEL__
+ OVS_CHECK_PKT_LEN_ATTR_ARG /* struct check_pkt_len_arg */
+#endif
+};
+
+#define OVS_CHECK_PKT_LEN_ATTR_MAX (__OVS_CHECK_PKT_LEN_ATTR_MAX - 1)
+
+#ifdef __KERNEL__
+struct check_pkt_len_arg {
+ u16 pkt_len; /* Same value as OVS_CHECK_PKT_LEN_ATTR_PKT_LEN'. */
+ bool exec_for_greater; /* When true, actions in IF_GREATER will
+ * not change flow keys. False otherwise.
+ */
+ bool exec_for_lesser_equal; /* When true, actions in IF_LESS_EQUAL
+ * will not change flow keys. False
+ * otherwise.
+ */
+};
+#endif
+
/**
* enum ovs_action_attr - Action types.
*
* packet, or modify the packet (e.g., change the DSCP field).
* @OVS_ACTION_ATTR_CLONE: make a copy of the packet and execute a list of
* actions without affecting the original packet and key.
+ * @OVS_ACTION_ATTR_CHECK_PKT_LEN: Check the packet length and execute a set
+ * of actions if greater than the specified packet length, else execute
+ * another set of actions.
*
* Only a single header can be set with a single %OVS_ACTION_ATTR_SET. Not all
* fields within a header are modifiable, e.g. the IPv4 protocol and fragment
OVS_ACTION_ATTR_POP_NSH, /* No argument. */
OVS_ACTION_ATTR_METER, /* u32 meter ID. */
OVS_ACTION_ATTR_CLONE, /* Nested OVS_CLONE_ATTR_*. */
+ OVS_ACTION_ATTR_CHECK_PKT_LEN, /* Nested OVS_CHECK_PKT_LEN_ATTR_*. */
__OVS_ACTION_ATTR_MAX, /* Nothing past this will be accepted
* from userspace. */
#define TCPI_OPT_ECN_SEEN 16 /* we received at least one packet with ECT */
#define TCPI_OPT_SYN_DATA 32 /* SYN-ACK acked data in SYN sent or rcvd */
+/*
+ * Sender's congestion state indicating normal or abnormal situations
+ * in the last round of packets sent. The state is driven by the ACK
+ * information and timer events.
+ */
enum tcp_ca_state {
+ /*
+ * Nothing bad has been observed recently.
+ * No apparent reordering, packet loss, or ECN marks.
+ */
TCP_CA_Open = 0,
#define TCPF_CA_Open (1<<TCP_CA_Open)
+ /*
+ * The sender enters disordered state when it has received DUPACKs or
+ * SACKs in the last round of packets sent. This could be due to packet
+ * loss or reordering but needs further information to confirm packets
+ * have been lost.
+ */
TCP_CA_Disorder = 1,
#define TCPF_CA_Disorder (1<<TCP_CA_Disorder)
+ /*
+ * The sender enters Congestion Window Reduction (CWR) state when it
+ * has received ACKs with ECN-ECE marks, or has experienced congestion
+ * or packet discard on the sender host (e.g. qdisc).
+ */
TCP_CA_CWR = 2,
#define TCPF_CA_CWR (1<<TCP_CA_CWR)
+ /*
+ * The sender is in fast recovery and retransmitting lost packets,
+ * typically triggered by ACK events.
+ */
TCP_CA_Recovery = 3,
#define TCPF_CA_Recovery (1<<TCP_CA_Recovery)
+ /*
+ * The sender is in loss recovery triggered by retransmission timeout.
+ */
TCP_CA_Loss = 4
#define TCPF_CA_Loss (1<<TCP_CA_Loss)
};
TIPC_NLA_PROP_TOL, /* u32 */
TIPC_NLA_PROP_WIN, /* u32 */
TIPC_NLA_PROP_MTU, /* u32 */
+ TIPC_NLA_PROP_BROADCAST, /* u32 */
+ TIPC_NLA_PROP_BROADCAST_RATIO, /* u32 */
__TIPC_NLA_PROP_MAX,
TIPC_NLA_PROP_MAX = __TIPC_NLA_PROP_MAX - 1
#define TLS_CIPHER_AES_GCM_256_TAG_SIZE 16
#define TLS_CIPHER_AES_GCM_256_REC_SEQ_SIZE 8
+#define TLS_CIPHER_AES_CCM_128 53
+#define TLS_CIPHER_AES_CCM_128_IV_SIZE 8
+#define TLS_CIPHER_AES_CCM_128_KEY_SIZE 16
+#define TLS_CIPHER_AES_CCM_128_SALT_SIZE 4
+#define TLS_CIPHER_AES_CCM_128_TAG_SIZE 16
+#define TLS_CIPHER_AES_CCM_128_REC_SEQ_SIZE 8
+
#define TLS_SET_RECORD_TYPE 1
#define TLS_GET_RECORD_TYPE 2
unsigned char rec_seq[TLS_CIPHER_AES_GCM_256_REC_SEQ_SIZE];
};
+struct tls12_crypto_info_aes_ccm_128 {
+ struct tls_crypto_info info;
+ unsigned char iv[TLS_CIPHER_AES_CCM_128_IV_SIZE];
+ unsigned char key[TLS_CIPHER_AES_CCM_128_KEY_SIZE];
+ unsigned char salt[TLS_CIPHER_AES_CCM_128_SALT_SIZE];
+ unsigned char rec_seq[TLS_CIPHER_AES_CCM_128_REC_SEQ_SIZE];
+};
+
#endif /* _UAPI_LINUX_TLS_H */
.head_offset = offsetof(struct kern_ipc_perm, khtnode),
.key_offset = offsetof(struct kern_ipc_perm, key),
.key_len = FIELD_SIZEOF(struct kern_ipc_perm, key),
- .locks_mul = 1,
.automatic_shrinking = true,
};
#include "map_in_map.h"
#define ARRAY_CREATE_FLAG_MASK \
- (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+ (BPF_F_NUMA_NODE | BPF_F_ACCESS_MASK)
static void bpf_array_free_percpu(struct bpf_array *array)
{
if (attr->max_entries == 0 || attr->key_size != 4 ||
attr->value_size == 0 ||
attr->map_flags & ~ARRAY_CREATE_FLAG_MASK ||
+ !bpf_map_flags_access_ok(attr->map_flags) ||
(percpu && numa_node != NUMA_NO_NODE))
return -EINVAL;
return array->value + array->elem_size * (index & array->index_mask);
}
+static int array_map_direct_value_addr(const struct bpf_map *map, u64 *imm,
+ u32 off)
+{
+ struct bpf_array *array = container_of(map, struct bpf_array, map);
+
+ if (map->max_entries != 1)
+ return -ENOTSUPP;
+ if (off >= map->value_size)
+ return -EINVAL;
+
+ *imm = (unsigned long)array->value;
+ return 0;
+}
+
+static int array_map_direct_value_meta(const struct bpf_map *map, u64 imm,
+ u32 *off)
+{
+ struct bpf_array *array = container_of(map, struct bpf_array, map);
+ u64 base = (unsigned long)array->value;
+ u64 range = array->elem_size;
+
+ if (map->max_entries != 1)
+ return -ENOTSUPP;
+ if (imm < base || imm >= base + range)
+ return -ENOENT;
+
+ *off = imm - base;
+ return 0;
+}
+
/* emit BPF instructions equivalent to C code of array_map_lookup_elem() */
static u32 array_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
{
return;
}
- seq_printf(m, "%u: ", *(u32 *)key);
+ if (map->btf_key_type_id)
+ seq_printf(m, "%u: ", *(u32 *)key);
btf_type_seq_show(map->btf, map->btf_value_type_id, value, m);
seq_puts(m, "\n");
{
u32 int_data;
+ /* One exception for keyless BTF: .bss/.data/.rodata map */
+ if (btf_type_is_void(key_type)) {
+ if (map->map_type != BPF_MAP_TYPE_ARRAY ||
+ map->max_entries != 1)
+ return -EINVAL;
+
+ if (BTF_INFO_KIND(value_type->info) != BTF_KIND_DATASEC)
+ return -EINVAL;
+
+ return 0;
+ }
+
if (BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
return -EINVAL;
.map_update_elem = array_map_update_elem,
.map_delete_elem = array_map_delete_elem,
.map_gen_lookup = array_map_gen_lookup,
+ .map_direct_value_addr = array_map_direct_value_addr,
+ .map_direct_value_meta = array_map_direct_value_meta,
.map_seq_show_elem = array_map_seq_show_elem,
.map_check_btf = array_map_check_btf,
};
/* only file descriptors can be stored in this type of map */
if (attr->value_size != sizeof(u32))
return -EINVAL;
+ /* Program read-only/write-only not supported for special maps yet. */
+ if (attr->map_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG))
+ return -EINVAL;
return array_map_alloc_check(attr);
}
i < btf_type_vlen(struct_type); \
i++, member++)
+#define for_each_vsi(i, struct_type, member) \
+ for (i = 0, member = btf_type_var_secinfo(struct_type); \
+ i < btf_type_vlen(struct_type); \
+ i++, member++)
+
+#define for_each_vsi_from(i, from, struct_type, member) \
+ for (i = from, member = btf_type_var_secinfo(struct_type) + from; \
+ i < btf_type_vlen(struct_type); \
+ i++, member++)
+
static DEFINE_IDR(btf_idr);
static DEFINE_SPINLOCK(btf_idr_lock);
[BTF_KIND_RESTRICT] = "RESTRICT",
[BTF_KIND_FUNC] = "FUNC",
[BTF_KIND_FUNC_PROTO] = "FUNC_PROTO",
+ [BTF_KIND_VAR] = "VAR",
+ [BTF_KIND_DATASEC] = "DATASEC",
};
struct btf_kind_operations {
return false;
}
-static bool btf_type_is_void(const struct btf_type *t)
+bool btf_type_is_void(const struct btf_type *t)
{
return t == &btf_void;
}
return BTF_INFO_KIND(t->info) == BTF_KIND_INT;
}
+static bool btf_type_is_var(const struct btf_type *t)
+{
+ return BTF_INFO_KIND(t->info) == BTF_KIND_VAR;
+}
+
+static bool btf_type_is_datasec(const struct btf_type *t)
+{
+ return BTF_INFO_KIND(t->info) == BTF_KIND_DATASEC;
+}
+
+/* Types that act only as a source, not sink or intermediate
+ * type when resolving.
+ */
+static bool btf_type_is_resolve_source_only(const struct btf_type *t)
+{
+ return btf_type_is_var(t) ||
+ btf_type_is_datasec(t);
+}
+
/* What types need to be resolved?
*
* btf_type_is_modifier() is an obvious one.
*
* btf_type_is_struct() because its member refers to
* another type (through member->type).
-
+ *
+ * btf_type_is_var() because the variable refers to
+ * another type. btf_type_is_datasec() holds multiple
+ * btf_type_is_var() types that need resolving.
+ *
* btf_type_is_array() because its element (array->type)
* refers to another type. Array can be thought of a
* special case of struct while array just has the same
static bool btf_type_needs_resolve(const struct btf_type *t)
{
return btf_type_is_modifier(t) ||
- btf_type_is_ptr(t) ||
- btf_type_is_struct(t) ||
- btf_type_is_array(t);
+ btf_type_is_ptr(t) ||
+ btf_type_is_struct(t) ||
+ btf_type_is_array(t) ||
+ btf_type_is_var(t) ||
+ btf_type_is_datasec(t);
}
/* t->size can be used */
case BTF_KIND_STRUCT:
case BTF_KIND_UNION:
case BTF_KIND_ENUM:
+ case BTF_KIND_DATASEC:
return true;
}
return (const struct btf_enum *)(t + 1);
}
+static const struct btf_var *btf_type_var(const struct btf_type *t)
+{
+ return (const struct btf_var *)(t + 1);
+}
+
+static const struct btf_var_secinfo *btf_type_var_secinfo(const struct btf_type *t)
+{
+ return (const struct btf_var_secinfo *)(t + 1);
+}
+
static const struct btf_kind_operations *btf_type_ops(const struct btf_type *t)
{
return kind_ops[BTF_INFO_KIND(t->info)];
offset < btf->hdr.str_len;
}
-/* Only C-style identifier is permitted. This can be relaxed if
- * necessary.
- */
-static bool btf_name_valid_identifier(const struct btf *btf, u32 offset)
+static bool __btf_name_char_ok(char c, bool first, bool dot_ok)
+{
+ if ((first ? !isalpha(c) :
+ !isalnum(c)) &&
+ c != '_' &&
+ ((c == '.' && !dot_ok) ||
+ c != '.'))
+ return false;
+ return true;
+}
+
+static bool __btf_name_valid(const struct btf *btf, u32 offset, bool dot_ok)
{
/* offset must be valid */
const char *src = &btf->strings[offset];
const char *src_limit;
- if (!isalpha(*src) && *src != '_')
+ if (!__btf_name_char_ok(*src, true, dot_ok))
return false;
/* set a limit on identifier length */
src_limit = src + KSYM_NAME_LEN;
src++;
while (*src && src < src_limit) {
- if (!isalnum(*src) && *src != '_')
+ if (!__btf_name_char_ok(*src, false, dot_ok))
return false;
src++;
}
return !*src;
}
+/* Only C-style identifier is permitted. This can be relaxed if
+ * necessary.
+ */
+static bool btf_name_valid_identifier(const struct btf *btf, u32 offset)
+{
+ return __btf_name_valid(btf, offset, false);
+}
+
+static bool btf_name_valid_section(const struct btf *btf, u32 offset)
+{
+ return __btf_name_valid(btf, offset, true);
+}
+
static const char *__btf_name_by_offset(const struct btf *btf, u32 offset)
{
if (!offset)
__btf_verifier_log(log, "\n");
}
+__printf(4, 5)
+static void btf_verifier_log_vsi(struct btf_verifier_env *env,
+ const struct btf_type *datasec_type,
+ const struct btf_var_secinfo *vsi,
+ const char *fmt, ...)
+{
+ struct bpf_verifier_log *log = &env->log;
+ va_list args;
+
+ if (!bpf_verifier_log_needed(log))
+ return;
+ if (env->phase != CHECK_META)
+ btf_verifier_log_type(env, datasec_type, NULL);
+
+ __btf_verifier_log(log, "\t type_id=%u offset=%u size=%u",
+ vsi->type, vsi->offset, vsi->size);
+ if (fmt && *fmt) {
+ __btf_verifier_log(log, " ");
+ va_start(args, fmt);
+ bpf_verifier_vlog(log, fmt, args);
+ va_end(args);
+ }
+
+ __btf_verifier_log(log, "\n");
+}
+
static void btf_verifier_log_hdr(struct btf_verifier_env *env,
u32 btf_data_size)
{
} else if (btf_type_is_ptr(size_type)) {
size = sizeof(void *);
} else {
- if (WARN_ON_ONCE(!btf_type_is_modifier(size_type)))
+ if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
+ !btf_type_is_var(size_type)))
return NULL;
size = btf->resolved_sizes[size_type_id];
u32 next_type_size = 0;
next_type = btf_type_by_id(btf, next_type_id);
- if (!next_type) {
+ if (!next_type || btf_type_is_resolve_source_only(next_type)) {
btf_verifier_log_type(env, v->t, "Invalid type_id");
return -EINVAL;
}
return 0;
}
+static int btf_var_resolve(struct btf_verifier_env *env,
+ const struct resolve_vertex *v)
+{
+ const struct btf_type *next_type;
+ const struct btf_type *t = v->t;
+ u32 next_type_id = t->type;
+ struct btf *btf = env->btf;
+ u32 next_type_size;
+
+ next_type = btf_type_by_id(btf, next_type_id);
+ if (!next_type || btf_type_is_resolve_source_only(next_type)) {
+ btf_verifier_log_type(env, v->t, "Invalid type_id");
+ return -EINVAL;
+ }
+
+ if (!env_type_is_resolve_sink(env, next_type) &&
+ !env_type_is_resolved(env, next_type_id))
+ return env_stack_push(env, next_type, next_type_id);
+
+ if (btf_type_is_modifier(next_type)) {
+ const struct btf_type *resolved_type;
+ u32 resolved_type_id;
+
+ resolved_type_id = next_type_id;
+ resolved_type = btf_type_id_resolve(btf, &resolved_type_id);
+
+ if (btf_type_is_ptr(resolved_type) &&
+ !env_type_is_resolve_sink(env, resolved_type) &&
+ !env_type_is_resolved(env, resolved_type_id))
+ return env_stack_push(env, resolved_type,
+ resolved_type_id);
+ }
+
+ /* We must resolve to something concrete at this point, no
+ * forward types or similar that would resolve to size of
+ * zero is allowed.
+ */
+ if (!btf_type_id_size(btf, &next_type_id, &next_type_size)) {
+ btf_verifier_log_type(env, v->t, "Invalid type_id");
+ return -EINVAL;
+ }
+
+ env_stack_pop_resolved(env, next_type_id, next_type_size);
+
+ return 0;
+}
+
static int btf_ptr_resolve(struct btf_verifier_env *env,
const struct resolve_vertex *v)
{
struct btf *btf = env->btf;
next_type = btf_type_by_id(btf, next_type_id);
- if (!next_type) {
+ if (!next_type || btf_type_is_resolve_source_only(next_type)) {
btf_verifier_log_type(env, v->t, "Invalid type_id");
return -EINVAL;
}
btf_type_ops(t)->seq_show(btf, t, type_id, data, bits_offset, m);
}
+static void btf_var_seq_show(const struct btf *btf, const struct btf_type *t,
+ u32 type_id, void *data, u8 bits_offset,
+ struct seq_file *m)
+{
+ t = btf_type_id_resolve(btf, &type_id);
+
+ btf_type_ops(t)->seq_show(btf, t, type_id, data, bits_offset, m);
+}
+
static void btf_ptr_seq_show(const struct btf *btf, const struct btf_type *t,
u32 type_id, void *data, u8 bits_offset,
struct seq_file *m)
/* Check array->index_type */
index_type_id = array->index_type;
index_type = btf_type_by_id(btf, index_type_id);
- if (btf_type_nosize_or_null(index_type)) {
+ if (btf_type_is_resolve_source_only(index_type) ||
+ btf_type_nosize_or_null(index_type)) {
btf_verifier_log_type(env, v->t, "Invalid index");
return -EINVAL;
}
/* Check array->type */
elem_type_id = array->type;
elem_type = btf_type_by_id(btf, elem_type_id);
- if (btf_type_nosize_or_null(elem_type)) {
+ if (btf_type_is_resolve_source_only(elem_type) ||
+ btf_type_nosize_or_null(elem_type)) {
btf_verifier_log_type(env, v->t,
"Invalid elem");
return -EINVAL;
const struct btf_type *member_type = btf_type_by_id(env->btf,
member_type_id);
- if (btf_type_nosize_or_null(member_type)) {
+ if (btf_type_is_resolve_source_only(member_type) ||
+ btf_type_nosize_or_null(member_type)) {
btf_verifier_log_member(env, v->t, member,
"Invalid member");
return -EINVAL;
.seq_show = btf_df_seq_show,
};
+static s32 btf_var_check_meta(struct btf_verifier_env *env,
+ const struct btf_type *t,
+ u32 meta_left)
+{
+ const struct btf_var *var;
+ u32 meta_needed = sizeof(*var);
+
+ if (meta_left < meta_needed) {
+ btf_verifier_log_basic(env, t,
+ "meta_left:%u meta_needed:%u",
+ meta_left, meta_needed);
+ return -EINVAL;
+ }
+
+ if (btf_type_vlen(t)) {
+ btf_verifier_log_type(env, t, "vlen != 0");
+ return -EINVAL;
+ }
+
+ if (btf_type_kflag(t)) {
+ btf_verifier_log_type(env, t, "Invalid btf_info kind_flag");
+ return -EINVAL;
+ }
+
+ if (!t->name_off ||
+ !__btf_name_valid(env->btf, t->name_off, true)) {
+ btf_verifier_log_type(env, t, "Invalid name");
+ return -EINVAL;
+ }
+
+ /* A var cannot be in type void */
+ if (!t->type || !BTF_TYPE_ID_VALID(t->type)) {
+ btf_verifier_log_type(env, t, "Invalid type_id");
+ return -EINVAL;
+ }
+
+ var = btf_type_var(t);
+ if (var->linkage != BTF_VAR_STATIC &&
+ var->linkage != BTF_VAR_GLOBAL_ALLOCATED) {
+ btf_verifier_log_type(env, t, "Linkage not supported");
+ return -EINVAL;
+ }
+
+ btf_verifier_log_type(env, t, NULL);
+
+ return meta_needed;
+}
+
+static void btf_var_log(struct btf_verifier_env *env, const struct btf_type *t)
+{
+ const struct btf_var *var = btf_type_var(t);
+
+ btf_verifier_log(env, "type_id=%u linkage=%u", t->type, var->linkage);
+}
+
+static const struct btf_kind_operations var_ops = {
+ .check_meta = btf_var_check_meta,
+ .resolve = btf_var_resolve,
+ .check_member = btf_df_check_member,
+ .check_kflag_member = btf_df_check_kflag_member,
+ .log_details = btf_var_log,
+ .seq_show = btf_var_seq_show,
+};
+
+static s32 btf_datasec_check_meta(struct btf_verifier_env *env,
+ const struct btf_type *t,
+ u32 meta_left)
+{
+ const struct btf_var_secinfo *vsi;
+ u64 last_vsi_end_off = 0, sum = 0;
+ u32 i, meta_needed;
+
+ meta_needed = btf_type_vlen(t) * sizeof(*vsi);
+ if (meta_left < meta_needed) {
+ btf_verifier_log_basic(env, t,
+ "meta_left:%u meta_needed:%u",
+ meta_left, meta_needed);
+ return -EINVAL;
+ }
+
+ if (!btf_type_vlen(t)) {
+ btf_verifier_log_type(env, t, "vlen == 0");
+ return -EINVAL;
+ }
+
+ if (!t->size) {
+ btf_verifier_log_type(env, t, "size == 0");
+ return -EINVAL;
+ }
+
+ if (btf_type_kflag(t)) {
+ btf_verifier_log_type(env, t, "Invalid btf_info kind_flag");
+ return -EINVAL;
+ }
+
+ if (!t->name_off ||
+ !btf_name_valid_section(env->btf, t->name_off)) {
+ btf_verifier_log_type(env, t, "Invalid name");
+ return -EINVAL;
+ }
+
+ btf_verifier_log_type(env, t, NULL);
+
+ for_each_vsi(i, t, vsi) {
+ /* A var cannot be in type void */
+ if (!vsi->type || !BTF_TYPE_ID_VALID(vsi->type)) {
+ btf_verifier_log_vsi(env, t, vsi,
+ "Invalid type_id");
+ return -EINVAL;
+ }
+
+ if (vsi->offset < last_vsi_end_off || vsi->offset >= t->size) {
+ btf_verifier_log_vsi(env, t, vsi,
+ "Invalid offset");
+ return -EINVAL;
+ }
+
+ if (!vsi->size || vsi->size > t->size) {
+ btf_verifier_log_vsi(env, t, vsi,
+ "Invalid size");
+ return -EINVAL;
+ }
+
+ last_vsi_end_off = vsi->offset + vsi->size;
+ if (last_vsi_end_off > t->size) {
+ btf_verifier_log_vsi(env, t, vsi,
+ "Invalid offset+size");
+ return -EINVAL;
+ }
+
+ btf_verifier_log_vsi(env, t, vsi, NULL);
+ sum += vsi->size;
+ }
+
+ if (t->size < sum) {
+ btf_verifier_log_type(env, t, "Invalid btf_info size");
+ return -EINVAL;
+ }
+
+ return meta_needed;
+}
+
+static int btf_datasec_resolve(struct btf_verifier_env *env,
+ const struct resolve_vertex *v)
+{
+ const struct btf_var_secinfo *vsi;
+ struct btf *btf = env->btf;
+ u16 i;
+
+ for_each_vsi_from(i, v->next_member, v->t, vsi) {
+ u32 var_type_id = vsi->type, type_id, type_size = 0;
+ const struct btf_type *var_type = btf_type_by_id(env->btf,
+ var_type_id);
+ if (!var_type || !btf_type_is_var(var_type)) {
+ btf_verifier_log_vsi(env, v->t, vsi,
+ "Not a VAR kind member");
+ return -EINVAL;
+ }
+
+ if (!env_type_is_resolve_sink(env, var_type) &&
+ !env_type_is_resolved(env, var_type_id)) {
+ env_stack_set_next_member(env, i + 1);
+ return env_stack_push(env, var_type, var_type_id);
+ }
+
+ type_id = var_type->type;
+ if (!btf_type_id_size(btf, &type_id, &type_size)) {
+ btf_verifier_log_vsi(env, v->t, vsi, "Invalid type");
+ return -EINVAL;
+ }
+
+ if (vsi->size < type_size) {
+ btf_verifier_log_vsi(env, v->t, vsi, "Invalid size");
+ return -EINVAL;
+ }
+ }
+
+ env_stack_pop_resolved(env, 0, 0);
+ return 0;
+}
+
+static void btf_datasec_log(struct btf_verifier_env *env,
+ const struct btf_type *t)
+{
+ btf_verifier_log(env, "size=%u vlen=%u", t->size, btf_type_vlen(t));
+}
+
+static void btf_datasec_seq_show(const struct btf *btf,
+ const struct btf_type *t, u32 type_id,
+ void *data, u8 bits_offset,
+ struct seq_file *m)
+{
+ const struct btf_var_secinfo *vsi;
+ const struct btf_type *var;
+ u32 i;
+
+ seq_printf(m, "section (\"%s\") = {", __btf_name_by_offset(btf, t->name_off));
+ for_each_vsi(i, t, vsi) {
+ var = btf_type_by_id(btf, vsi->type);
+ if (i)
+ seq_puts(m, ",");
+ btf_type_ops(var)->seq_show(btf, var, vsi->type,
+ data + vsi->offset, bits_offset, m);
+ }
+ seq_puts(m, "}");
+}
+
+static const struct btf_kind_operations datasec_ops = {
+ .check_meta = btf_datasec_check_meta,
+ .resolve = btf_datasec_resolve,
+ .check_member = btf_df_check_member,
+ .check_kflag_member = btf_df_check_kflag_member,
+ .log_details = btf_datasec_log,
+ .seq_show = btf_datasec_seq_show,
+};
+
static int btf_func_proto_check(struct btf_verifier_env *env,
const struct btf_type *t)
{
[BTF_KIND_RESTRICT] = &modifier_ops,
[BTF_KIND_FUNC] = &func_ops,
[BTF_KIND_FUNC_PROTO] = &func_proto_ops,
+ [BTF_KIND_VAR] = &var_ops,
+ [BTF_KIND_DATASEC] = &datasec_ops,
};
static s32 btf_check_meta(struct btf_verifier_env *env,
if (!env_type_is_resolved(env, type_id))
return false;
- if (btf_type_is_struct(t))
+ if (btf_type_is_struct(t) || btf_type_is_datasec(t))
return !btf->resolved_ids[type_id] &&
- !btf->resolved_sizes[type_id];
+ !btf->resolved_sizes[type_id];
- if (btf_type_is_modifier(t) || btf_type_is_ptr(t)) {
+ if (btf_type_is_modifier(t) || btf_type_is_ptr(t) ||
+ btf_type_is_var(t)) {
t = btf_type_id_resolve(btf, &type_id);
- return t && !btf_type_is_modifier(t);
+ return t &&
+ !btf_type_is_modifier(t) &&
+ !btf_type_is_var(t) &&
+ !btf_type_is_datasec(t);
}
if (btf_type_is_array(t)) {
dst[i] = fp->insnsi[i];
if (!was_ld_map &&
dst[i].code == (BPF_LD | BPF_IMM | BPF_DW) &&
- dst[i].src_reg == BPF_PSEUDO_MAP_FD) {
+ (dst[i].src_reg == BPF_PSEUDO_MAP_FD ||
+ dst[i].src_reg == BPF_PSEUDO_MAP_VALUE)) {
was_ld_map = true;
dst[i].imm = 0;
} else if (was_ld_map &&
u32 insn_adj_cnt, insn_rest, insn_delta = len - 1;
const u32 cnt_max = S16_MAX;
struct bpf_prog *prog_adj;
+ int err;
/* Since our patchlet doesn't expand the image, we're done. */
if (insn_delta == 0) {
* we afterwards may not fail anymore.
*/
if (insn_adj_cnt > cnt_max &&
- bpf_adj_branches(prog, off, off + 1, off + len, true))
- return NULL;
+ (err = bpf_adj_branches(prog, off, off + 1, off + len, true)))
+ return ERR_PTR(err);
/* Several new instructions need to be inserted. Make room
* for them. Likely, there's no need for a new allocation as
prog_adj = bpf_prog_realloc(prog, bpf_prog_size(insn_adj_cnt),
GFP_USER);
if (!prog_adj)
- return NULL;
+ return ERR_PTR(-ENOMEM);
prog_adj->len = insn_adj_cnt;
continue;
tmp = bpf_patch_insn_single(clone, i, insn_buff, rewritten);
- if (!tmp) {
+ if (IS_ERR(tmp)) {
/* Patching may have repointed aux->prog during
* realloc from the original one, so we need to
* fix it up here on error.
*/
bpf_jit_prog_release_other(prog, clone);
- return ERR_PTR(-ENOMEM);
+ return tmp;
}
clone = tmp;
* part of the ldimm64 insn is accessible.
*/
u64 imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm;
- bool map_ptr = insn->src_reg == BPF_PSEUDO_MAP_FD;
+ bool is_ptr = insn->src_reg == BPF_PSEUDO_MAP_FD ||
+ insn->src_reg == BPF_PSEUDO_MAP_VALUE;
char tmp[64];
- if (map_ptr && !allow_ptr_leaks)
+ if (is_ptr && !allow_ptr_leaks)
imm = 0;
verbose(cbs->private_data, "(%02x) r%d = %s\n",
#define HTAB_CREATE_FLAG_MASK \
(BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE | \
- BPF_F_RDONLY | BPF_F_WRONLY | BPF_F_ZERO_SEED)
+ BPF_F_ACCESS_MASK | BPF_F_ZERO_SEED)
struct bucket {
struct hlist_nulls_head head;
/* Guard against local DoS, and discourage production use. */
return -EPERM;
- if (attr->map_flags & ~HTAB_CREATE_FLAG_MASK)
- /* reserved bits should not be used */
+ if (attr->map_flags & ~HTAB_CREATE_FLAG_MASK ||
+ !bpf_map_flags_access_ok(attr->map_flags))
return -EINVAL;
if (!lru && percpu_lru)
#ifdef CONFIG_CGROUP_BPF
#define LOCAL_STORAGE_CREATE_FLAG_MASK \
- (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+ (BPF_F_NUMA_NODE | BPF_F_ACCESS_MASK)
struct bpf_cgroup_storage_map {
struct bpf_map map;
if (attr->value_size > PAGE_SIZE)
return ERR_PTR(-E2BIG);
- if (attr->map_flags & ~LOCAL_STORAGE_CREATE_FLAG_MASK)
- /* reserved bits should not be used */
+ if (attr->map_flags & ~LOCAL_STORAGE_CREATE_FLAG_MASK ||
+ !bpf_map_flags_access_ok(attr->map_flags))
return ERR_PTR(-EINVAL);
if (attr->max_entries)
#define LPM_KEY_SIZE_MIN LPM_KEY_SIZE(LPM_DATA_SIZE_MIN)
#define LPM_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_NUMA_NODE | \
- BPF_F_RDONLY | BPF_F_WRONLY)
+ BPF_F_ACCESS_MASK)
static struct bpf_map *trie_alloc(union bpf_attr *attr)
{
if (attr->max_entries == 0 ||
!(attr->map_flags & BPF_F_NO_PREALLOC) ||
attr->map_flags & ~LPM_CREATE_FLAG_MASK ||
+ !bpf_map_flags_access_ok(attr->map_flags) ||
attr->key_size < LPM_KEY_SIZE_MIN ||
attr->key_size > LPM_KEY_SIZE_MAX ||
attr->value_size < LPM_VAL_SIZE_MIN ||
#include "percpu_freelist.h"
#define QUEUE_STACK_CREATE_FLAG_MASK \
- (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
-
+ (BPF_F_NUMA_NODE | BPF_F_ACCESS_MASK)
struct bpf_queue_stack {
struct bpf_map map;
/* check sanity of attributes */
if (attr->max_entries == 0 || attr->key_size != 0 ||
attr->value_size == 0 ||
- attr->map_flags & ~QUEUE_STACK_CREATE_FLAG_MASK)
+ attr->map_flags & ~QUEUE_STACK_CREATE_FLAG_MASK ||
+ !bpf_map_flags_access_ok(attr->map_flags))
return -EINVAL;
if (attr->value_size > KMALLOC_MAX_SIZE)
kvfree(area);
}
+static u32 bpf_map_flags_retain_permanent(u32 flags)
+{
+ /* Some map creation flags are not tied to the map object but
+ * rather to the map fd instead, so they have no meaning upon
+ * map object inspection since multiple file descriptors with
+ * different (access) properties can exist here. Thus, given
+ * this has zero meaning for the map itself, lets clear these
+ * from here.
+ */
+ return flags & ~(BPF_F_RDONLY | BPF_F_WRONLY);
+}
+
void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr)
{
map->map_type = attr->map_type;
map->key_size = attr->key_size;
map->value_size = attr->value_size;
map->max_entries = attr->max_entries;
- map->map_flags = attr->map_flags;
+ map->map_flags = bpf_map_flags_retain_permanent(attr->map_flags);
map->numa_node = bpf_map_attr_numa_node(attr);
}
return 0;
}
+static fmode_t map_get_sys_perms(struct bpf_map *map, struct fd f)
+{
+ fmode_t mode = f.file->f_mode;
+
+ /* Our file permissions may have been overridden by global
+ * map permissions facing syscall side.
+ */
+ if (READ_ONCE(map->frozen))
+ mode &= ~FMODE_CAN_WRITE;
+ return mode;
+}
+
#ifdef CONFIG_PROC_FS
static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
{
"max_entries:\t%u\n"
"map_flags:\t%#x\n"
"memlock:\t%llu\n"
- "map_id:\t%u\n",
+ "map_id:\t%u\n"
+ "frozen:\t%u\n",
map->map_type,
map->key_size,
map->value_size,
map->max_entries,
map->map_flags,
map->pages * 1ULL << PAGE_SHIFT,
- map->id);
+ map->id,
+ READ_ONCE(map->frozen));
if (owner_prog_type) {
seq_printf(m, "owner_prog_type:\t%u\n",
const char *end = src + BPF_OBJ_NAME_LEN;
memset(dst, 0, BPF_OBJ_NAME_LEN);
-
- /* Copy all isalnum() and '_' char */
+ /* Copy all isalnum(), '_' and '.' chars. */
while (src < end && *src) {
- if (!isalnum(*src) && *src != '_')
+ if (!isalnum(*src) &&
+ *src != '_' && *src != '.')
return -EINVAL;
*dst++ = *src++;
}
u32 key_size, value_size;
int ret = 0;
- key_type = btf_type_id_size(btf, &btf_key_id, &key_size);
- if (!key_type || key_size != map->key_size)
- return -EINVAL;
+ /* Some maps allow key to be unspecified. */
+ if (btf_key_id) {
+ key_type = btf_type_id_size(btf, &btf_key_id, &key_size);
+ if (!key_type || key_size != map->key_size)
+ return -EINVAL;
+ } else {
+ key_type = btf_type_by_id(btf, 0);
+ if (!map->ops->map_check_btf)
+ return -EINVAL;
+ }
value_type = btf_type_id_size(btf, &btf_value_id, &value_size);
if (!value_type || value_size != map->value_size)
map->spin_lock_off = btf_find_spin_lock(btf, value_type);
if (map_value_has_spin_lock(map)) {
+ if (map->map_flags & BPF_F_RDONLY_PROG)
+ return -EACCES;
if (map->map_type != BPF_MAP_TYPE_HASH &&
map->map_type != BPF_MAP_TYPE_ARRAY &&
map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE)
if (attr->btf_key_type_id || attr->btf_value_type_id) {
struct btf *btf;
- if (!attr->btf_key_type_id || !attr->btf_value_type_id) {
+ if (!attr->btf_value_type_id) {
err = -EINVAL;
goto free_map_nouncharge;
}
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
-
- if (!(f.file->f_mode & FMODE_CAN_READ)) {
+ if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ)) {
err = -EPERM;
goto err_put;
}
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
-
- if (!(f.file->f_mode & FMODE_CAN_WRITE)) {
+ if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
err = -EPERM;
goto err_put;
}
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
-
- if (!(f.file->f_mode & FMODE_CAN_WRITE)) {
+ if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
err = -EPERM;
goto err_put;
}
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
-
- if (!(f.file->f_mode & FMODE_CAN_READ)) {
+ if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ)) {
err = -EPERM;
goto err_put;
}
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
-
- if (!(f.file->f_mode & FMODE_CAN_WRITE)) {
+ if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
err = -EPERM;
goto err_put;
}
return err;
}
+#define BPF_MAP_FREEZE_LAST_FIELD map_fd
+
+static int map_freeze(const union bpf_attr *attr)
+{
+ int err = 0, ufd = attr->map_fd;
+ struct bpf_map *map;
+ struct fd f;
+
+ if (CHECK_ATTR(BPF_MAP_FREEZE))
+ return -EINVAL;
+
+ f = fdget(ufd);
+ map = __bpf_map_get(f);
+ if (IS_ERR(map))
+ return PTR_ERR(map);
+ if (READ_ONCE(map->frozen)) {
+ err = -EBUSY;
+ goto err_put;
+ }
+ if (!capable(CAP_SYS_ADMIN)) {
+ err = -EPERM;
+ goto err_put;
+ }
+
+ WRITE_ONCE(map->frozen, true);
+err_put:
+ fdput(f);
+ return err;
+}
+
static const struct bpf_prog_ops * const bpf_prog_types[] = {
#define BPF_PROG_TYPE(_id, _name) \
[_id] = & _name ## _prog_ops,
/* eBPF programs must be GPL compatible to use GPL-ed functions */
is_gpl = license_is_gpl_compatible(license);
- if (attr->insn_cnt == 0 || attr->insn_cnt > BPF_MAXINSNS)
+ if (attr->insn_cnt == 0 ||
+ attr->insn_cnt > (capable(CAP_SYS_ADMIN) ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
return -E2BIG;
if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
type != BPF_PROG_TYPE_CGROUP_SKB &&
return cgroup_bpf_prog_query(attr, uattr);
}
-#define BPF_PROG_TEST_RUN_LAST_FIELD test.duration
+#define BPF_PROG_TEST_RUN_LAST_FIELD test.ctx_out
static int bpf_prog_test_run(const union bpf_attr *attr,
union bpf_attr __user *uattr)
if (CHECK_ATTR(BPF_PROG_TEST_RUN))
return -EINVAL;
+ if ((attr->test.ctx_size_in && !attr->test.ctx_in) ||
+ (!attr->test.ctx_size_in && attr->test.ctx_in))
+ return -EINVAL;
+
+ if ((attr->test.ctx_size_out && !attr->test.ctx_out) ||
+ (!attr->test.ctx_size_out && attr->test.ctx_out))
+ return -EINVAL;
+
prog = bpf_prog_get(attr->test.prog_fd);
if (IS_ERR(prog))
return PTR_ERR(prog);
}
static const struct bpf_map *bpf_map_from_imm(const struct bpf_prog *prog,
- unsigned long addr)
+ unsigned long addr, u32 *off,
+ u32 *type)
{
+ const struct bpf_map *map;
int i;
- for (i = 0; i < prog->aux->used_map_cnt; i++)
- if (prog->aux->used_maps[i] == (void *)addr)
- return prog->aux->used_maps[i];
+ for (i = 0, *off = 0; i < prog->aux->used_map_cnt; i++) {
+ map = prog->aux->used_maps[i];
+ if (map == (void *)addr) {
+ *type = BPF_PSEUDO_MAP_FD;
+ return map;
+ }
+ if (!map->ops->map_direct_value_meta)
+ continue;
+ if (!map->ops->map_direct_value_meta(map, addr, off)) {
+ *type = BPF_PSEUDO_MAP_VALUE;
+ return map;
+ }
+ }
+
return NULL;
}
{
const struct bpf_map *map;
struct bpf_insn *insns;
+ u32 off, type;
u64 imm;
int i;
continue;
imm = ((u64)insns[i + 1].imm << 32) | (u32)insns[i].imm;
- map = bpf_map_from_imm(prog, imm);
+ map = bpf_map_from_imm(prog, imm, &off, &type);
if (map) {
- insns[i].src_reg = BPF_PSEUDO_MAP_FD;
+ insns[i].src_reg = type;
insns[i].imm = map->id;
- insns[i + 1].imm = 0;
+ insns[i + 1].imm = off;
continue;
}
}
case BPF_MAP_GET_NEXT_KEY:
err = map_get_next_key(&attr);
break;
+ case BPF_MAP_FREEZE:
+ err = map_freeze(&attr);
+ break;
case BPF_PROG_LOAD:
err = bpf_prog_load(&attr, uattr);
break;
struct bpf_verifier_stack_elem *next;
};
-#define BPF_COMPLEXITY_LIMIT_INSNS 131072
#define BPF_COMPLEXITY_LIMIT_STACK 1024
#define BPF_COMPLEXITY_LIMIT_STATES 64
static bool is_acquire_function(enum bpf_func_id func_id)
{
return func_id == BPF_FUNC_sk_lookup_tcp ||
- func_id == BPF_FUNC_sk_lookup_udp;
+ func_id == BPF_FUNC_sk_lookup_udp ||
+ func_id == BPF_FUNC_skc_lookup_tcp;
}
static bool is_ptr_cast_function(enum bpf_func_id func_id)
*/
subprog[env->subprog_cnt].start = insn_cnt;
- if (env->log.level > 1)
+ if (env->log.level & BPF_LOG_LEVEL2)
for (i = 0; i < env->subprog_cnt; i++)
verbose(env, "func#%d @%d\n", i, subprog[i].start);
struct bpf_reg_state *parent)
{
bool writes = parent == state->parent; /* Observe write marks */
+ int cnt = 0;
while (parent) {
/* if read wasn't screened by an earlier write ... */
parent->var_off.value, parent->off);
return -EFAULT;
}
+ if (parent->live & REG_LIVE_READ)
+ /* The parentage chain never changes and
+ * this parent was already marked as LIVE_READ.
+ * There is no need to keep walking the chain again and
+ * keep re-marking all parents as LIVE_READ.
+ * This case happens when the same register is read
+ * multiple times without writes into it in-between.
+ */
+ break;
/* ... then we depend on parent's value */
parent->live |= REG_LIVE_READ;
state = parent;
parent = state->parent;
writes = true;
+ cnt++;
}
+
+ if (env->longest_mark_read_walk < cnt)
+ env->longest_mark_read_walk = cnt;
return 0;
}
char tn_buf[48];
tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
- verbose(env, "variable stack access var_off=%s off=%d size=%d",
+ verbose(env, "variable stack access var_off=%s off=%d size=%d\n",
tn_buf, off, size);
return -EACCES;
}
return 0;
}
+static int check_map_access_type(struct bpf_verifier_env *env, u32 regno,
+ int off, int size, enum bpf_access_type type)
+{
+ struct bpf_reg_state *regs = cur_regs(env);
+ struct bpf_map *map = regs[regno].map_ptr;
+ u32 cap = bpf_map_flags_to_cap(map);
+
+ if (type == BPF_WRITE && !(cap & BPF_MAP_CAN_WRITE)) {
+ verbose(env, "write into map forbidden, value_size=%d off=%d size=%d\n",
+ map->value_size, off, size);
+ return -EACCES;
+ }
+
+ if (type == BPF_READ && !(cap & BPF_MAP_CAN_READ)) {
+ verbose(env, "read from map forbidden, value_size=%d off=%d size=%d\n",
+ map->value_size, off, size);
+ return -EACCES;
+ }
+
+ return 0;
+}
+
/* check read/write into map element returned by bpf_map_lookup_elem() */
static int __check_map_access(struct bpf_verifier_env *env, u32 regno, int off,
int size, bool zero_size_allowed)
* need to try adding each of min_value and max_value to off
* to make sure our theoretical access will be safe.
*/
- if (env->log.level)
+ if (env->log.level & BPF_LOG_LEVEL)
print_verifier_state(env, state);
/* The minimum value is only important with signed
verbose(env, "R%d leaks addr into map\n", value_regno);
return -EACCES;
}
-
+ err = check_map_access_type(env, regno, off, size, t);
+ if (err)
+ return err;
err = check_map_access(env, regno, off, size, false);
if (!err && t == BPF_READ && value_regno >= 0)
mark_reg_unknown(env, regs, value_regno);
BPF_SIZE(insn->code), BPF_WRITE, -1, true);
}
+static int __check_stack_boundary(struct bpf_verifier_env *env, u32 regno,
+ int off, int access_size,
+ bool zero_size_allowed)
+{
+ struct bpf_reg_state *reg = reg_state(env, regno);
+
+ if (off >= 0 || off < -MAX_BPF_STACK || off + access_size > 0 ||
+ access_size < 0 || (access_size == 0 && !zero_size_allowed)) {
+ if (tnum_is_const(reg->var_off)) {
+ verbose(env, "invalid stack type R%d off=%d access_size=%d\n",
+ regno, off, access_size);
+ } else {
+ char tn_buf[48];
+
+ tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
+ verbose(env, "invalid stack type R%d var_off=%s access_size=%d\n",
+ regno, tn_buf, access_size);
+ }
+ return -EACCES;
+ }
+ return 0;
+}
+
/* when register 'regno' is passed into function that will read 'access_size'
* bytes from that pointer, make sure that it's within stack boundary
* and all elements of stack are initialized.
{
struct bpf_reg_state *reg = reg_state(env, regno);
struct bpf_func_state *state = func(env, reg);
- int off, i, slot, spi;
+ int err, min_off, max_off, i, slot, spi;
if (reg->type != PTR_TO_STACK) {
/* Allow zero-byte read from NULL, regardless of pointer type */
return -EACCES;
}
- /* Only allow fixed-offset stack reads */
- if (!tnum_is_const(reg->var_off)) {
- char tn_buf[48];
+ if (tnum_is_const(reg->var_off)) {
+ min_off = max_off = reg->var_off.value + reg->off;
+ err = __check_stack_boundary(env, regno, min_off, access_size,
+ zero_size_allowed);
+ if (err)
+ return err;
+ } else {
+ /* Variable offset is prohibited for unprivileged mode for
+ * simplicity since it requires corresponding support in
+ * Spectre masking for stack ALU.
+ * See also retrieve_ptr_limit().
+ */
+ if (!env->allow_ptr_leaks) {
+ char tn_buf[48];
- tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
- verbose(env, "invalid variable stack read R%d var_off=%s\n",
- regno, tn_buf);
- return -EACCES;
- }
- off = reg->off + reg->var_off.value;
- if (off >= 0 || off < -MAX_BPF_STACK || off + access_size > 0 ||
- access_size < 0 || (access_size == 0 && !zero_size_allowed)) {
- verbose(env, "invalid stack type R%d off=%d access_size=%d\n",
- regno, off, access_size);
- return -EACCES;
+ tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
+ verbose(env, "R%d indirect variable offset stack access prohibited for !root, var_off=%s\n",
+ regno, tn_buf);
+ return -EACCES;
+ }
+ /* Only initialized buffer on stack is allowed to be accessed
+ * with variable offset. With uninitialized buffer it's hard to
+ * guarantee that whole memory is marked as initialized on
+ * helper return since specific bounds are unknown what may
+ * cause uninitialized stack leaking.
+ */
+ if (meta && meta->raw_mode)
+ meta = NULL;
+
+ if (reg->smax_value >= BPF_MAX_VAR_OFF ||
+ reg->smax_value <= -BPF_MAX_VAR_OFF) {
+ verbose(env, "R%d unbounded indirect variable offset stack access\n",
+ regno);
+ return -EACCES;
+ }
+ min_off = reg->smin_value + reg->off;
+ max_off = reg->smax_value + reg->off;
+ err = __check_stack_boundary(env, regno, min_off, access_size,
+ zero_size_allowed);
+ if (err) {
+ verbose(env, "R%d min value is outside of stack bound\n",
+ regno);
+ return err;
+ }
+ err = __check_stack_boundary(env, regno, max_off, access_size,
+ zero_size_allowed);
+ if (err) {
+ verbose(env, "R%d max value is outside of stack bound\n",
+ regno);
+ return err;
+ }
}
if (meta && meta->raw_mode) {
return 0;
}
- for (i = 0; i < access_size; i++) {
+ for (i = min_off; i < max_off + access_size; i++) {
u8 *stype;
- slot = -(off + i) - 1;
+ slot = -i - 1;
spi = slot / BPF_REG_SIZE;
if (state->allocated_stack <= slot)
goto err;
goto mark;
}
err:
- verbose(env, "invalid indirect read from stack off %d+%d size %d\n",
- off, i, access_size);
+ if (tnum_is_const(reg->var_off)) {
+ verbose(env, "invalid indirect read from stack off %d+%d size %d\n",
+ min_off, i - min_off, access_size);
+ } else {
+ char tn_buf[48];
+
+ tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
+ verbose(env, "invalid indirect read from stack var_off %s+%d size %d\n",
+ tn_buf, i - min_off, access_size);
+ }
return -EACCES;
mark:
/* reading any byte out of 8-byte 'spill_slot' will cause
mark_reg_read(env, &state->stack[spi].spilled_ptr,
state->stack[spi].spilled_ptr.parent);
}
- return update_stack_depth(env, state, off);
+ return update_stack_depth(env, state, min_off);
}
static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
return check_packet_access(env, regno, reg->off, access_size,
zero_size_allowed);
case PTR_TO_MAP_VALUE:
+ if (check_map_access_type(env, regno, reg->off, access_size,
+ meta && meta->raw_mode ? BPF_WRITE :
+ BPF_READ))
+ return -EACCES;
return check_map_access(env, regno, reg->off, access_size,
zero_size_allowed);
default: /* scalar_value|ptr_to_stack or invalid ptr */
/* and go analyze first insn of the callee */
*insn_idx = target_insn;
- if (env->log.level) {
+ if (env->log.level & BPF_LOG_LEVEL) {
verbose(env, "caller:\n");
print_verifier_state(env, caller);
verbose(env, "callee:\n");
return err;
*insn_idx = callee->callsite + 1;
- if (env->log.level) {
+ if (env->log.level & BPF_LOG_LEVEL) {
verbose(env, "returning from callee:\n");
print_verifier_state(env, callee);
verbose(env, "to caller at %d:\n", *insn_idx);
int func_id, int insn_idx)
{
struct bpf_insn_aux_data *aux = &env->insn_aux_data[insn_idx];
+ struct bpf_map *map = meta->map_ptr;
if (func_id != BPF_FUNC_tail_call &&
func_id != BPF_FUNC_map_lookup_elem &&
func_id != BPF_FUNC_map_peek_elem)
return 0;
- if (meta->map_ptr == NULL) {
+ if (map == NULL) {
verbose(env, "kernel subsystem misconfigured verifier\n");
return -EINVAL;
}
+ /* In case of read-only, some additional restrictions
+ * need to be applied in order to prevent altering the
+ * state of the map from program side.
+ */
+ if ((map->map_flags & BPF_F_RDONLY_PROG) &&
+ (func_id == BPF_FUNC_map_delete_elem ||
+ func_id == BPF_FUNC_map_update_elem ||
+ func_id == BPF_FUNC_map_push_elem ||
+ func_id == BPF_FUNC_map_pop_elem)) {
+ verbose(env, "write into map forbidden\n");
+ return -EACCES;
+ }
+
if (!BPF_MAP_PTR(aux->map_state))
bpf_map_ptr_store(aux, meta->map_ptr,
meta->map_ptr->unpriv_array);
} else if (fn->ret_type == RET_PTR_TO_SOCKET_OR_NULL) {
mark_reg_known_zero(env, regs, BPF_REG_0);
regs[BPF_REG_0].type = PTR_TO_SOCKET_OR_NULL;
- if (is_acquire_function(func_id)) {
- int id = acquire_reference_state(env, insn_idx);
-
- if (id < 0)
- return id;
- /* For mark_ptr_or_null_reg() */
- regs[BPF_REG_0].id = id;
- /* For release_reference() */
- regs[BPF_REG_0].ref_obj_id = id;
- } else {
- /* For mark_ptr_or_null_reg() */
- regs[BPF_REG_0].id = ++env->id_gen;
- }
+ regs[BPF_REG_0].id = ++env->id_gen;
+ } else if (fn->ret_type == RET_PTR_TO_SOCK_COMMON_OR_NULL) {
+ mark_reg_known_zero(env, regs, BPF_REG_0);
+ regs[BPF_REG_0].type = PTR_TO_SOCK_COMMON_OR_NULL;
+ regs[BPF_REG_0].id = ++env->id_gen;
} else if (fn->ret_type == RET_PTR_TO_TCP_SOCK_OR_NULL) {
mark_reg_known_zero(env, regs, BPF_REG_0);
regs[BPF_REG_0].type = PTR_TO_TCP_SOCK_OR_NULL;
return -EINVAL;
}
- if (is_ptr_cast_function(func_id))
+ if (is_ptr_cast_function(func_id)) {
/* For release_reference() */
regs[BPF_REG_0].ref_obj_id = meta.ref_obj_id;
+ } else if (is_acquire_function(func_id)) {
+ int id = acquire_reference_state(env, insn_idx);
+
+ if (id < 0)
+ return id;
+ /* For mark_ptr_or_null_reg() */
+ regs[BPF_REG_0].id = id;
+ /* For release_reference() */
+ regs[BPF_REG_0].ref_obj_id = id;
+ }
do_refine_retval_range(regs, fn->ret_type, func_id, &meta);
switch (ptr_reg->type) {
case PTR_TO_STACK:
+ /* Indirect variable offset stack access is prohibited in
+ * unprivileged mode so it's not handled here.
+ */
off = ptr_reg->off + ptr_reg->var_off.value;
if (mask_to_left)
*ptr_limit = MAX_BPF_STACK + off;
insn->dst_reg);
return -EACCES;
}
- if (env->log.level)
+ if (env->log.level & BPF_LOG_LEVEL)
print_verifier_state(env, this_branch->frame[this_branch->curframe]);
return 0;
}
-/* return the map pointer stored inside BPF_LD_IMM64 instruction */
-static struct bpf_map *ld_imm64_to_map_ptr(struct bpf_insn *insn)
-{
- u64 imm64 = ((u64) (u32) insn[0].imm) | ((u64) (u32) insn[1].imm) << 32;
-
- return (struct bpf_map *) (unsigned long) imm64;
-}
-
/* verify BPF_LD_IMM64 instruction */
static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
{
+ struct bpf_insn_aux_data *aux = cur_aux(env);
struct bpf_reg_state *regs = cur_regs(env);
+ struct bpf_map *map;
int err;
if (BPF_SIZE(insn->code) != BPF_DW) {
return 0;
}
- /* replace_map_fd_with_map_ptr() should have caught bad ld_imm64 */
- BUG_ON(insn->src_reg != BPF_PSEUDO_MAP_FD);
+ map = env->used_maps[aux->map_index];
+ mark_reg_known_zero(env, regs, insn->dst_reg);
+ regs[insn->dst_reg].map_ptr = map;
+
+ if (insn->src_reg == BPF_PSEUDO_MAP_VALUE) {
+ regs[insn->dst_reg].type = PTR_TO_MAP_VALUE;
+ regs[insn->dst_reg].off = aux->map_off;
+ if (map_value_has_spin_lock(map))
+ regs[insn->dst_reg].id = ++env->id_gen;
+ } else if (insn->src_reg == BPF_PSEUDO_MAP_FD) {
+ regs[insn->dst_reg].type = CONST_PTR_TO_MAP;
+ } else {
+ verbose(env, "bpf verifier is misconfigured\n");
+ return -EINVAL;
+ }
- regs[insn->dst_reg].type = CONST_PTR_TO_MAP;
- regs[insn->dst_reg].map_ptr = ld_imm64_to_map_ptr(insn);
return 0;
}
int ret = 0;
int i, t;
- insn_state = kcalloc(insn_cnt, sizeof(int), GFP_KERNEL);
+ insn_state = kvcalloc(insn_cnt, sizeof(int), GFP_KERNEL);
if (!insn_state)
return -ENOMEM;
- insn_stack = kcalloc(insn_cnt, sizeof(int), GFP_KERNEL);
+ insn_stack = kvcalloc(insn_cnt, sizeof(int), GFP_KERNEL);
if (!insn_stack) {
- kfree(insn_state);
+ kvfree(insn_state);
return -ENOMEM;
}
ret = 0; /* cfg looks good */
err_free:
- kfree(insn_state);
- kfree(insn_stack);
+ kvfree(insn_state);
+ kvfree(insn_stack);
return ret;
}
static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
{
struct bpf_verifier_state_list *new_sl;
- struct bpf_verifier_state_list *sl;
+ struct bpf_verifier_state_list *sl, **pprev;
struct bpf_verifier_state *cur = env->cur_state, *new;
int i, j, err, states_cnt = 0;
- sl = env->explored_states[insn_idx];
+ pprev = &env->explored_states[insn_idx];
+ sl = *pprev;
+
if (!sl)
/* this 'insn_idx' instruction wasn't marked, so we will not
* be doing state search here
while (sl != STATE_LIST_MARK) {
if (states_equal(env, &sl->state, cur)) {
+ sl->hit_cnt++;
/* reached equivalent register/stack state,
* prune the search.
* Registers read by the continuation are read by us.
return err;
return 1;
}
- sl = sl->next;
states_cnt++;
+ sl->miss_cnt++;
+ /* heuristic to determine whether this state is beneficial
+ * to keep checking from state equivalence point of view.
+ * Higher numbers increase max_states_per_insn and verification time,
+ * but do not meaningfully decrease insn_processed.
+ */
+ if (sl->miss_cnt > sl->hit_cnt * 3 + 3) {
+ /* the state is unlikely to be useful. Remove it to
+ * speed up verification
+ */
+ *pprev = sl->next;
+ if (sl->state.frame[0]->regs[0].live & REG_LIVE_DONE) {
+ free_verifier_state(&sl->state, false);
+ kfree(sl);
+ env->peak_states--;
+ } else {
+ /* cannot free this state, since parentage chain may
+ * walk it later. Add it for free_list instead to
+ * be freed at the end of verification
+ */
+ sl->next = env->free_list;
+ env->free_list = sl;
+ }
+ sl = *pprev;
+ continue;
+ }
+ pprev = &sl->next;
+ sl = *pprev;
}
+ if (env->max_states_per_insn < states_cnt)
+ env->max_states_per_insn = states_cnt;
+
if (!env->allow_ptr_leaks && states_cnt > BPF_COMPLEXITY_LIMIT_STATES)
return 0;
new_sl = kzalloc(sizeof(struct bpf_verifier_state_list), GFP_KERNEL);
if (!new_sl)
return -ENOMEM;
+ env->total_states++;
+ env->peak_states++;
/* add new state to the head of linked list */
new = &new_sl->state;
struct bpf_verifier_state *state;
struct bpf_insn *insns = env->prog->insnsi;
struct bpf_reg_state *regs;
- int insn_cnt = env->prog->len, i;
- int insn_processed = 0;
+ int insn_cnt = env->prog->len;
bool do_print_state = false;
env->prev_linfo = NULL;
insn = &insns[env->insn_idx];
class = BPF_CLASS(insn->code);
- if (++insn_processed > BPF_COMPLEXITY_LIMIT_INSNS) {
+ if (++env->insn_processed > BPF_COMPLEXITY_LIMIT_INSNS) {
verbose(env,
"BPF program is too large. Processed %d insn\n",
- insn_processed);
+ env->insn_processed);
return -E2BIG;
}
return err;
if (err == 1) {
/* found equivalent state, can prune the search */
- if (env->log.level) {
+ if (env->log.level & BPF_LOG_LEVEL) {
if (do_print_state)
verbose(env, "\nfrom %d to %d%s: safe\n",
env->prev_insn_idx, env->insn_idx,
if (need_resched())
cond_resched();
- if (env->log.level > 1 || (env->log.level && do_print_state)) {
- if (env->log.level > 1)
+ if (env->log.level & BPF_LOG_LEVEL2 ||
+ (env->log.level & BPF_LOG_LEVEL && do_print_state)) {
+ if (env->log.level & BPF_LOG_LEVEL2)
verbose(env, "%d:", env->insn_idx);
else
verbose(env, "\nfrom %d to %d%s:",
do_print_state = false;
}
- if (env->log.level) {
+ if (env->log.level & BPF_LOG_LEVEL) {
const struct bpf_insn_cbs cbs = {
.cb_print = verbose,
.private_data = env,
env->insn_idx++;
}
- verbose(env, "processed %d insns (limit %d), stack depth ",
- insn_processed, BPF_COMPLEXITY_LIMIT_INSNS);
- for (i = 0; i < env->subprog_cnt; i++) {
- u32 depth = env->subprog_info[i].stack_depth;
-
- verbose(env, "%d", depth);
- if (i + 1 < env->subprog_cnt)
- verbose(env, "+");
- }
- verbose(env, "\n");
env->prog->aux->stack_depth = env->subprog_info[0].stack_depth;
return 0;
}
}
if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW)) {
+ struct bpf_insn_aux_data *aux;
struct bpf_map *map;
struct fd f;
+ u64 addr;
if (i == insn_cnt - 1 || insn[1].code != 0 ||
insn[1].dst_reg != 0 || insn[1].src_reg != 0 ||
return -EINVAL;
}
- if (insn->src_reg == 0)
+ if (insn[0].src_reg == 0)
/* valid generic load 64-bit imm */
goto next_insn;
- if (insn[0].src_reg != BPF_PSEUDO_MAP_FD ||
- insn[1].imm != 0) {
- verbose(env, "unrecognized bpf_ld_imm64 insn\n");
+ /* In final convert_pseudo_ld_imm64() step, this is
+ * converted into regular 64-bit imm load insn.
+ */
+ if ((insn[0].src_reg != BPF_PSEUDO_MAP_FD &&
+ insn[0].src_reg != BPF_PSEUDO_MAP_VALUE) ||
+ (insn[0].src_reg == BPF_PSEUDO_MAP_FD &&
+ insn[1].imm != 0)) {
+ verbose(env,
+ "unrecognized bpf_ld_imm64 insn\n");
return -EINVAL;
}
return err;
}
- /* store map pointer inside BPF_LD_IMM64 instruction */
- insn[0].imm = (u32) (unsigned long) map;
- insn[1].imm = ((u64) (unsigned long) map) >> 32;
+ aux = &env->insn_aux_data[i];
+ if (insn->src_reg == BPF_PSEUDO_MAP_FD) {
+ addr = (unsigned long)map;
+ } else {
+ u32 off = insn[1].imm;
+
+ if (off >= BPF_MAX_VAR_OFF) {
+ verbose(env, "direct value offset of %u is not allowed\n", off);
+ fdput(f);
+ return -EINVAL;
+ }
+
+ if (!map->ops->map_direct_value_addr) {
+ verbose(env, "no direct value access support for this map type\n");
+ fdput(f);
+ return -EINVAL;
+ }
+
+ err = map->ops->map_direct_value_addr(map, &addr, off);
+ if (err) {
+ verbose(env, "invalid access to map value pointer, value_size=%u off=%u\n",
+ map->value_size, off);
+ fdput(f);
+ return err;
+ }
+
+ aux->map_off = off;
+ addr += off;
+ }
+
+ insn[0].imm = (u32)addr;
+ insn[1].imm = addr >> 32;
/* check whether we recorded this map already */
- for (j = 0; j < env->used_map_cnt; j++)
+ for (j = 0; j < env->used_map_cnt; j++) {
if (env->used_maps[j] == map) {
+ aux->map_index = j;
fdput(f);
goto next_insn;
}
+ }
if (env->used_map_cnt >= MAX_USED_MAPS) {
fdput(f);
fdput(f);
return PTR_ERR(map);
}
+
+ aux->map_index = env->used_map_cnt;
env->used_maps[env->used_map_cnt++] = map;
if (bpf_map_is_cgroup_storage(map) &&
struct bpf_prog *new_prog;
new_prog = bpf_patch_insn_single(env->prog, off, patch, len);
- if (!new_prog)
+ if (IS_ERR(new_prog)) {
+ if (PTR_ERR(new_prog) == -ERANGE)
+ verbose(env,
+ "insn %d cannot be patched due to 16-bit range\n",
+ env->insn_aux_data[off].orig_idx);
return NULL;
+ }
if (adjust_insn_aux_data(env, new_prog->len, off, len))
return NULL;
adjust_subprog_starts(env, off, len);
struct bpf_verifier_state_list *sl, *sln;
int i;
+ sl = env->free_list;
+ while (sl) {
+ sln = sl->next;
+ free_verifier_state(&sl->state, false);
+ kfree(sl);
+ sl = sln;
+ }
+
if (!env->explored_states)
return;
}
}
- kfree(env->explored_states);
+ kvfree(env->explored_states);
+}
+
+static void print_verification_stats(struct bpf_verifier_env *env)
+{
+ int i;
+
+ if (env->log.level & BPF_LOG_STATS) {
+ verbose(env, "verification time %lld usec\n",
+ div_u64(env->verification_time, 1000));
+ verbose(env, "stack depth ");
+ for (i = 0; i < env->subprog_cnt; i++) {
+ u32 depth = env->subprog_info[i].stack_depth;
+
+ verbose(env, "%d", depth);
+ if (i + 1 < env->subprog_cnt)
+ verbose(env, "+");
+ }
+ verbose(env, "\n");
+ }
+ verbose(env, "processed %d insns (limit %d) max_states_per_insn %d "
+ "total_states %d peak_states %d mark_read %d\n",
+ env->insn_processed, BPF_COMPLEXITY_LIMIT_INSNS,
+ env->max_states_per_insn, env->total_states,
+ env->peak_states, env->longest_mark_read_walk);
}
int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
union bpf_attr __user *uattr)
{
+ u64 start_time = ktime_get_ns();
struct bpf_verifier_env *env;
struct bpf_verifier_log *log;
int i, len, ret = -EINVAL;
ret = -EINVAL;
/* log attributes have to be sane */
- if (log->len_total < 128 || log->len_total > UINT_MAX >> 8 ||
- !log->level || !log->ubuf)
+ if (log->len_total < 128 || log->len_total > UINT_MAX >> 2 ||
+ !log->level || !log->ubuf || log->level & ~BPF_LOG_MASK)
goto err_unlock;
}
goto skip_full_check;
}
- env->explored_states = kcalloc(env->prog->len,
+ env->explored_states = kvcalloc(env->prog->len,
sizeof(struct bpf_verifier_state_list *),
GFP_USER);
ret = -ENOMEM;
if (ret == 0)
ret = fixup_call_args(env);
+ env->verification_time = ktime_get_ns() - start_time;
+ print_verification_stats(env);
+
if (log->level && bpf_verifier_log_full(log))
ret = -ENOSPC;
if (log->level && !log->ubuf) {
{
.cmd = TASKSTATS_CMD_GET,
.doit = taskstats_user_cmd,
- .policy = taskstats_cmd_get_policy,
- .flags = GENL_ADMIN_PERM,
+ /* policy enforced later */
+ .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_HASPOL,
},
{
.cmd = CGROUPSTATS_CMD_GET,
.doit = cgroupstats_user_cmd,
- .policy = cgroupstats_cmd_get_policy,
+ /* policy enforced later */
+ .flags = GENL_CMD_CAP_HASPOL,
},
};
+static int taskstats_pre_doit(const struct genl_ops *ops, struct sk_buff *skb,
+ struct genl_info *info)
+{
+ const struct nla_policy *policy = NULL;
+
+ switch (ops->cmd) {
+ case TASKSTATS_CMD_GET:
+ policy = taskstats_cmd_get_policy;
+ break;
+ case CGROUPSTATS_CMD_GET:
+ policy = cgroupstats_cmd_get_policy;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return nlmsg_validate(info->nlhdr, GENL_HDRLEN, TASKSTATS_CMD_ATTR_MAX,
+ policy, info->extack);
+}
+
static struct genl_family family __ro_after_init = {
.name = TASKSTATS_GENL_NAME,
.version = TASKSTATS_GENL_VERSION,
.module = THIS_MODULE,
.ops = taskstats_ops,
.n_ops = ARRAY_SIZE(taskstats_ops),
+ .pre_doit = taskstats_pre_doit,
};
/* Needed early in initialization */
}
EXPORT_SYMBOL(jiffies64_to_nsecs);
+u64 jiffies64_to_msecs(const u64 j)
+{
+#if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ)
+ return (MSEC_PER_SEC / HZ) * j;
+#else
+ return div_u64(j * HZ_TO_MSEC_NUM, HZ_TO_MSEC_DEN);
+#endif
+}
+EXPORT_SYMBOL(jiffies64_to_msecs);
+
/**
* nsecs_to_jiffies64 - Convert nsecs in u64 to jiffies64
*
But it significantly improves the success of resolving
variables in gdb on optimized code.
+config DEBUG_INFO_BTF
+ bool "Generate BTF typeinfo"
+ depends on DEBUG_INFO
+ help
+ Generate deduplicated BTF type information from DWARF debug info.
+ Turning this on expects presence of pahole tool, which will convert
+ DWARF type info into equivalent deduplicated BTF type info.
+
config GDB_SCRIPTS
bool "Provide GDB scripts for kernel debugging"
depends on DEBUG_INFO
#define HASH_DEFAULT_SIZE 64UL
#define HASH_MIN_SIZE 4U
-#define BUCKET_LOCKS_PER_CPU 32UL
union nested_table {
union nested_table __rcu *table;
- struct rhash_head __rcu *bucket;
+ struct rhash_lock_head __rcu *bucket;
};
static u32 head_hashfn(struct rhashtable *ht,
int lockdep_rht_bucket_is_held(const struct bucket_table *tbl, u32 hash)
{
- spinlock_t *lock = rht_bucket_lock(tbl, hash);
-
- return (debug_locks) ? lockdep_is_held(lock) : 1;
+ if (!debug_locks)
+ return 1;
+ if (unlikely(tbl->nest))
+ return 1;
+ return bit_spin_is_locked(0, (unsigned long *)&tbl->buckets[hash]);
}
EXPORT_SYMBOL_GPL(lockdep_rht_bucket_is_held);
#else
if (tbl->nest)
nested_bucket_table_free(tbl);
- free_bucket_spinlocks(tbl->locks);
kvfree(tbl);
}
INIT_RHT_NULLS_HEAD(ntbl[i].bucket);
}
- rcu_assign_pointer(*prev, ntbl);
-
- return ntbl;
+ if (cmpxchg(prev, NULL, ntbl) == NULL)
+ return ntbl;
+ /* Raced with another thread. */
+ kfree(ntbl);
+ return rcu_dereference(*prev);
}
static struct bucket_table *nested_bucket_table_alloc(struct rhashtable *ht,
gfp_t gfp)
{
struct bucket_table *tbl = NULL;
- size_t size, max_locks;
+ size_t size;
int i;
+ static struct lock_class_key __key;
- size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
- tbl = kvzalloc(size, gfp);
+ tbl = kvzalloc(struct_size(tbl, buckets, nbuckets), gfp);
size = nbuckets;
if (tbl == NULL)
return NULL;
- tbl->size = size;
-
- max_locks = size >> 1;
- if (tbl->nest)
- max_locks = min_t(size_t, max_locks, 1U << tbl->nest);
+ lockdep_init_map(&tbl->dep_map, "rhashtable_bucket", &__key, 0);
- if (alloc_bucket_spinlocks(&tbl->locks, &tbl->locks_mask, max_locks,
- ht->p.locks_mul, gfp) < 0) {
- bucket_table_free(tbl);
- return NULL;
- }
+ tbl->size = size;
+ rcu_head_init(&tbl->rcu);
INIT_LIST_HEAD(&tbl->walkers);
tbl->hash_rnd = get_random_u32();
return new_tbl;
}
-static int rhashtable_rehash_one(struct rhashtable *ht, unsigned int old_hash)
+static int rhashtable_rehash_one(struct rhashtable *ht,
+ struct rhash_lock_head __rcu **bkt,
+ unsigned int old_hash)
{
struct bucket_table *old_tbl = rht_dereference(ht->tbl, ht);
struct bucket_table *new_tbl = rhashtable_last_table(ht, old_tbl);
- struct rhash_head __rcu **pprev = rht_bucket_var(old_tbl, old_hash);
int err = -EAGAIN;
struct rhash_head *head, *next, *entry;
- spinlock_t *new_bucket_lock;
+ struct rhash_head __rcu **pprev = NULL;
unsigned int new_hash;
if (new_tbl->nest)
err = -ENOENT;
- rht_for_each(entry, old_tbl, old_hash) {
+ rht_for_each_from(entry, rht_ptr(bkt, old_tbl, old_hash),
+ old_tbl, old_hash) {
err = 0;
next = rht_dereference_bucket(entry->next, old_tbl, old_hash);
new_hash = head_hashfn(ht, new_tbl, entry);
- new_bucket_lock = rht_bucket_lock(new_tbl, new_hash);
+ rht_lock_nested(new_tbl, &new_tbl->buckets[new_hash], SINGLE_DEPTH_NESTING);
- spin_lock_nested(new_bucket_lock, SINGLE_DEPTH_NESTING);
- head = rht_dereference_bucket(new_tbl->buckets[new_hash],
- new_tbl, new_hash);
+ head = rht_ptr(new_tbl->buckets + new_hash, new_tbl, new_hash);
RCU_INIT_POINTER(entry->next, head);
- rcu_assign_pointer(new_tbl->buckets[new_hash], entry);
- spin_unlock(new_bucket_lock);
+ rht_assign_unlock(new_tbl, &new_tbl->buckets[new_hash], entry);
- rcu_assign_pointer(*pprev, next);
+ if (pprev)
+ rcu_assign_pointer(*pprev, next);
+ else
+ /* Need to preserved the bit lock. */
+ rht_assign_locked(bkt, next);
out:
return err;
unsigned int old_hash)
{
struct bucket_table *old_tbl = rht_dereference(ht->tbl, ht);
- spinlock_t *old_bucket_lock;
+ struct rhash_lock_head __rcu **bkt = rht_bucket_var(old_tbl, old_hash);
int err;
- old_bucket_lock = rht_bucket_lock(old_tbl, old_hash);
+ if (!bkt)
+ return 0;
+ rht_lock(old_tbl, bkt);
- spin_lock_bh(old_bucket_lock);
- while (!(err = rhashtable_rehash_one(ht, old_hash)))
+ while (!(err = rhashtable_rehash_one(ht, bkt, old_hash)))
;
- if (err == -ENOENT) {
- old_tbl->rehash++;
+ if (err == -ENOENT)
err = 0;
- }
- spin_unlock_bh(old_bucket_lock);
+ rht_unlock(old_tbl, bkt);
return err;
}
spin_lock(&ht->lock);
list_for_each_entry(walker, &old_tbl->walkers, list)
walker->tbl = NULL;
- spin_unlock(&ht->lock);
/* Wait for readers. All new readers will see the new
* table, and thus no references to the old table will
* remain.
+ * We do this inside the locked region so that
+ * rhashtable_walk_stop() can use rcu_head_after_call_rcu()
+ * to check if it should not re-link the table.
*/
call_rcu(&old_tbl->rcu, bucket_table_free_rcu);
+ spin_unlock(&ht->lock);
return rht_dereference(new_tbl->future_tbl, ht) ? -EAGAIN : 0;
}
}
static void *rhashtable_lookup_one(struct rhashtable *ht,
+ struct rhash_lock_head __rcu **bkt,
struct bucket_table *tbl, unsigned int hash,
const void *key, struct rhash_head *obj)
{
.ht = ht,
.key = key,
};
- struct rhash_head __rcu **pprev;
+ struct rhash_head __rcu **pprev = NULL;
struct rhash_head *head;
int elasticity;
elasticity = RHT_ELASTICITY;
- pprev = rht_bucket_var(tbl, hash);
- rht_for_each_continue(head, *pprev, tbl, hash) {
+ rht_for_each_from(head, rht_ptr(bkt, tbl, hash), tbl, hash) {
struct rhlist_head *list;
struct rhlist_head *plist;
RCU_INIT_POINTER(list->next, plist);
head = rht_dereference_bucket(head->next, tbl, hash);
RCU_INIT_POINTER(list->rhead.next, head);
- rcu_assign_pointer(*pprev, obj);
+ if (pprev)
+ rcu_assign_pointer(*pprev, obj);
+ else
+ /* Need to preserve the bit lock */
+ rht_assign_locked(bkt, obj);
return NULL;
}
}
static struct bucket_table *rhashtable_insert_one(struct rhashtable *ht,
+ struct rhash_lock_head __rcu **bkt,
struct bucket_table *tbl,
unsigned int hash,
struct rhash_head *obj,
void *data)
{
- struct rhash_head __rcu **pprev;
struct bucket_table *new_tbl;
struct rhash_head *head;
if (unlikely(rht_grow_above_100(ht, tbl)))
return ERR_PTR(-EAGAIN);
- pprev = rht_bucket_insert(ht, tbl, hash);
- if (!pprev)
- return ERR_PTR(-ENOMEM);
-
- head = rht_dereference_bucket(*pprev, tbl, hash);
+ head = rht_ptr(bkt, tbl, hash);
RCU_INIT_POINTER(obj->next, head);
if (ht->rhlist) {
RCU_INIT_POINTER(list->next, NULL);
}
- rcu_assign_pointer(*pprev, obj);
+ /* bkt is always the head of the list, so it holds
+ * the lock, which we need to preserve
+ */
+ rht_assign_locked(bkt, obj);
atomic_inc(&ht->nelems);
if (rht_grow_above_75(ht, tbl))
{
struct bucket_table *new_tbl;
struct bucket_table *tbl;
+ struct rhash_lock_head __rcu **bkt;
unsigned int hash;
- spinlock_t *lock;
void *data;
- tbl = rcu_dereference(ht->tbl);
-
- /* All insertions must grab the oldest table containing
- * the hashed bucket that is yet to be rehashed.
- */
- for (;;) {
- hash = rht_head_hashfn(ht, tbl, obj, ht->p);
- lock = rht_bucket_lock(tbl, hash);
- spin_lock_bh(lock);
-
- if (tbl->rehash <= hash)
- break;
-
- spin_unlock_bh(lock);
- tbl = rht_dereference_rcu(tbl->future_tbl, ht);
- }
-
- data = rhashtable_lookup_one(ht, tbl, hash, key, obj);
- new_tbl = rhashtable_insert_one(ht, tbl, hash, obj, data);
- if (PTR_ERR(new_tbl) != -EEXIST)
- data = ERR_CAST(new_tbl);
+ new_tbl = rcu_dereference(ht->tbl);
- while (!IS_ERR_OR_NULL(new_tbl)) {
+ do {
tbl = new_tbl;
hash = rht_head_hashfn(ht, tbl, obj, ht->p);
- spin_lock_nested(rht_bucket_lock(tbl, hash),
- SINGLE_DEPTH_NESTING);
-
- data = rhashtable_lookup_one(ht, tbl, hash, key, obj);
- new_tbl = rhashtable_insert_one(ht, tbl, hash, obj, data);
- if (PTR_ERR(new_tbl) != -EEXIST)
- data = ERR_CAST(new_tbl);
-
- spin_unlock(rht_bucket_lock(tbl, hash));
- }
-
- spin_unlock_bh(lock);
+ if (rcu_access_pointer(tbl->future_tbl))
+ /* Failure is OK */
+ bkt = rht_bucket_var(tbl, hash);
+ else
+ bkt = rht_bucket_insert(ht, tbl, hash);
+ if (bkt == NULL) {
+ new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
+ data = ERR_PTR(-EAGAIN);
+ } else {
+ rht_lock(tbl, bkt);
+ data = rhashtable_lookup_one(ht, bkt, tbl,
+ hash, key, obj);
+ new_tbl = rhashtable_insert_one(ht, bkt, tbl,
+ hash, obj, data);
+ if (PTR_ERR(new_tbl) != -EEXIST)
+ data = ERR_CAST(new_tbl);
+
+ rht_unlock(tbl, bkt);
+ }
+ } while (!IS_ERR_OR_NULL(new_tbl));
if (PTR_ERR(data) == -EAGAIN)
data = ERR_PTR(rhashtable_insert_rehash(ht, tbl) ?:
ht = iter->ht;
spin_lock(&ht->lock);
- if (tbl->rehash < tbl->size)
- list_add(&iter->walker.list, &tbl->walkers);
- else
+ if (rcu_head_after_call_rcu(&tbl->rcu, bucket_table_free_rcu))
+ /* This bucket table is being freed, don't re-link it. */
iter->walker.tbl = NULL;
+ else
+ list_add(&iter->walker.list, &tbl->walkers);
spin_unlock(&ht->lock);
out:
size = rounded_hashtable_size(&ht->p);
- if (params->locks_mul)
- ht->p.locks_mul = roundup_pow_of_two(params->locks_mul);
- else
- ht->p.locks_mul = BUCKET_LOCKS_PER_CPU;
-
ht->key_len = ht->p.key_len;
if (!params->hashfn) {
ht->p.hashfn = jhash;
struct rhash_head *pos, *next;
cond_resched();
- for (pos = rht_dereference(*rht_bucket(tbl, i), ht),
+ for (pos = rht_ptr_exclusive(rht_bucket(tbl, i)),
next = !rht_is_a_nulls(pos) ?
rht_dereference(pos->next, ht) : NULL;
!rht_is_a_nulls(pos);
}
EXPORT_SYMBOL_GPL(rhashtable_destroy);
-struct rhash_head __rcu **rht_bucket_nested(const struct bucket_table *tbl,
- unsigned int hash)
+struct rhash_lock_head __rcu **__rht_bucket_nested(const struct bucket_table *tbl,
+ unsigned int hash)
{
const unsigned int shift = PAGE_SHIFT - ilog2(sizeof(void *));
- static struct rhash_head __rcu *rhnull;
unsigned int index = hash & ((1 << tbl->nest) - 1);
unsigned int size = tbl->size >> tbl->nest;
unsigned int subhash = hash;
subhash >>= shift;
}
- if (!ntbl) {
- if (!rhnull)
- INIT_RHT_NULLS_HEAD(rhnull);
- return &rhnull;
- }
+ if (!ntbl)
+ return NULL;
return &ntbl[subhash].bucket;
}
+EXPORT_SYMBOL_GPL(__rht_bucket_nested);
+
+struct rhash_lock_head __rcu **rht_bucket_nested(const struct bucket_table *tbl,
+ unsigned int hash)
+{
+ static struct rhash_lock_head __rcu *rhnull;
+
+ if (!rhnull)
+ INIT_RHT_NULLS_HEAD(rhnull);
+ return __rht_bucket_nested(tbl, hash) ?: &rhnull;
+}
EXPORT_SYMBOL_GPL(rht_bucket_nested);
-struct rhash_head __rcu **rht_bucket_nested_insert(struct rhashtable *ht,
- struct bucket_table *tbl,
- unsigned int hash)
+struct rhash_lock_head __rcu **rht_bucket_nested_insert(struct rhashtable *ht,
+ struct bucket_table *tbl,
+ unsigned int hash)
{
const unsigned int shift = PAGE_SHIFT - ilog2(sizeof(void *));
unsigned int index = hash & ((1 << tbl->nest) - 1);
struct rhash_head *pos, *next;
struct test_obj_rhl *p;
- pos = rht_dereference(tbl->buckets[i], ht);
+ pos = rht_ptr_exclusive(tbl->buckets + i);
next = !rht_is_a_nulls(pos) ? rht_dereference(pos->next, ht) : NULL;
if (!rht_is_a_nulls(pos)) {
with the help of BPF programs.
config NET_DEVLINK
- bool "Network physical/parent device Netlink interface"
- help
- Network physical/parent device Netlink interface provides
- infrastructure to support access to physical chip-wide config and
- monitoring.
+ bool
+ default n
config PAGE_POOL
bool
return NETDEV_TX_OK;
}
rt = (struct rtable *) dst;
- if (rt->rt_gateway)
- daddr = &rt->rt_gateway;
+ if (rt->rt_gw_family == AF_INET)
+ daddr = &rt->rt_gw4;
else
daddr = &ip_hdr(skb)->daddr;
n = dst_neigh_lookup(dst, daddr);
# Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
#
# Marek Lindner, Simon Wunderlich
-#
-# This program is free software; you can redistribute it and/or
-# modify it under the terms of version 2 of the GNU General Public
-# License as published by the Free Software Foundation.
-#
-# This program is distributed in the hope that it will be useful, but
-# WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-# General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with this program; if not, see <http://www.gnu.org/licenses/>.
#
# B.A.T.M.A.N meshing protocol
buffer. The output is controlled via the batadv netdev specific
log_level setting.
+config BATMAN_ADV_SYSFS
+ bool "batman-adv sysfs entries"
+ depends on BATMAN_ADV
+ default y
+ help
+ Say Y here if you want to enable batman-adv device configuration and
+ status interface through sysfs attributes. It is replaced by the
+ batadv generic netlink family but still used by various userspace
+ tools and scripts.
+
+ If unsure, say Y.
+
config BATMAN_ADV_TRACING
bool "B.A.T.M.A.N. tracing support"
depends on BATMAN_ADV
# Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
#
# Marek Lindner, Simon Wunderlich
-#
-# This program is free software; you can redistribute it and/or
-# modify it under the terms of version 2 of the GNU General Public
-# License as published by the Free Software Foundation.
-#
-# This program is distributed in the hope that it will be useful, but
-# WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-# General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with this program; if not, see <http://www.gnu.org/licenses/>.
-#
obj-$(CONFIG_BATMAN_ADV) += batman-adv.o
batman-adv-y += bat_algo.o
batman-adv-y += routing.o
batman-adv-y += send.o
batman-adv-y += soft-interface.o
-batman-adv-y += sysfs.o
+batman-adv-$(CONFIG_BATMAN_ADV_SYSFS) += sysfs.o
batman-adv-$(CONFIG_BATMAN_ADV_TRACING) += trace.o
batman-adv-y += tp_meter.o
batman-adv-y += translation-table.o
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "main.h"
/* Copyright (C) 2011-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Linus Lüssing
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_BAT_ALGO_H_
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "bat_iv_ogm.h"
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_BAT_IV_OGM_H_
/* Copyright (C) 2013-2019 B.A.T.M.A.N. contributors:
*
* Linus Lüssing, Marek Lindner
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "bat_v.h"
/* Copyright (C) 2011-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Linus Lüssing
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_BAT_V_H_
/* Copyright (C) 2011-2019 B.A.T.M.A.N. contributors:
*
* Linus Lüssing, Marek Lindner
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "bat_v_elp.h"
/* Copyright (C) 2013-2019 B.A.T.M.A.N. contributors:
*
* Linus Lüssing, Marek Lindner
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_BAT_V_ELP_H_
/* Copyright (C) 2013-2019 B.A.T.M.A.N. contributors:
*
* Antonio Quartulli
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "bat_v_ogm.h"
/* Copyright (C) 2013-2019 B.A.T.M.A.N. contributors:
*
* Antonio Quartulli
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_BAT_V_OGM_H_
/* Copyright (C) 2006-2019 B.A.T.M.A.N. contributors:
*
* Simon Wunderlich, Marek Lindner
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "bitarray.h"
/* Copyright (C) 2006-2019 B.A.T.M.A.N. contributors:
*
* Simon Wunderlich, Marek Lindner
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_BITARRAY_H_
/* Copyright (C) 2011-2019 B.A.T.M.A.N. contributors:
*
* Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "bridge_loop_avoidance.h"
#include "netlink.h"
#include "originator.h"
#include "soft-interface.h"
-#include "sysfs.h"
#include "translation-table.h"
static const u8 batadv_announce_mac[4] = {0x43, 0x05, 0x43, 0x05};
/* Copyright (C) 2011-2019 B.A.T.M.A.N. contributors:
*
* Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_BLA_H_
/* Copyright (C) 2010-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "debugfs.h"
/* Copyright (C) 2010-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_DEBUGFS_H_
/* Copyright (C) 2011-2019 B.A.T.M.A.N. contributors:
*
* Antonio Quartulli
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "distributed-arp-table.h"
}
/**
- * batadv_dat_send_data() - send a payload to the selected candidates
+ * batadv_dat_forward_data() - copy and send payload to the selected candidates
* @bat_priv: the bat priv with all the soft interface information
* @skb: payload to send
* @ip: the DHT key
* Return: true if the packet is sent to at least one candidate, false
* otherwise.
*/
-static bool batadv_dat_send_data(struct batadv_priv *bat_priv,
- struct sk_buff *skb, __be32 ip,
- unsigned short vid, int packet_subtype)
+static bool batadv_dat_forward_data(struct batadv_priv *bat_priv,
+ struct sk_buff *skb, __be32 ip,
+ unsigned short vid, int packet_subtype)
{
int i;
bool ret = false;
ret = true;
} else {
/* Send the request to the DHT */
- ret = batadv_dat_send_data(bat_priv, skb, ip_dst, vid,
- BATADV_P_DAT_DHT_GET);
+ ret = batadv_dat_forward_data(bat_priv, skb, ip_dst, vid,
+ BATADV_P_DAT_DHT_GET);
}
out:
if (dat_entry)
/* Send the ARP reply to the candidates for both the IP addresses that
* the node obtained from the ARP reply
*/
- batadv_dat_send_data(bat_priv, skb, ip_src, vid, BATADV_P_DAT_DHT_PUT);
- batadv_dat_send_data(bat_priv, skb, ip_dst, vid, BATADV_P_DAT_DHT_PUT);
+ batadv_dat_forward_data(bat_priv, skb, ip_src, vid,
+ BATADV_P_DAT_DHT_PUT);
+ batadv_dat_forward_data(bat_priv, skb, ip_dst, vid,
+ BATADV_P_DAT_DHT_PUT);
}
/**
hw_src, &ip_src, hw_dst, &ip_dst,
dat_entry->mac_addr, &dat_entry->ip);
dropped = true;
- goto out;
}
/* Update our internal cache with both the IP addresses the node got
batadv_dat_entry_add(bat_priv, ip_src, hw_src, vid);
batadv_dat_entry_add(bat_priv, ip_dst, hw_dst, vid);
+ if (dropped)
+ goto out;
+
/* If BLA is enabled, only forward ARP replies if we have claimed the
* source of the ARP reply or if no one else of the same backbone has
* already claimed that client. This prevents that different gateways
batadv_dat_entry_add(bat_priv, yiaddr, chaddr, vid);
batadv_dat_entry_add(bat_priv, ip_dst, hw_dst, vid);
- batadv_dat_send_data(bat_priv, skb, yiaddr, vid, BATADV_P_DAT_DHT_PUT);
- batadv_dat_send_data(bat_priv, skb, ip_dst, vid, BATADV_P_DAT_DHT_PUT);
+ batadv_dat_forward_data(bat_priv, skb, yiaddr, vid,
+ BATADV_P_DAT_DHT_PUT);
+ batadv_dat_forward_data(bat_priv, skb, ip_dst, vid,
+ BATADV_P_DAT_DHT_PUT);
consume_skb(skb);
/* Copyright (C) 2011-2019 B.A.T.M.A.N. contributors:
*
* Antonio Quartulli
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_DISTRIBUTED_ARP_TABLE_H_
/* Copyright (C) 2013-2019 B.A.T.M.A.N. contributors:
*
* Martin Hundebøll <martin@hundeboll.net>
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "fragmentation.h"
/* Copyright (C) 2013-2019 B.A.T.M.A.N. contributors:
*
* Martin Hundebøll <martin@hundeboll.net>
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_FRAGMENTATION_H_
/* Copyright (C) 2009-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "gateway_client.h"
#include "originator.h"
#include "routing.h"
#include "soft-interface.h"
-#include "sysfs.h"
#include "translation-table.h"
/* These are the offsets of the "hw type" and "hw address length" in the dhcp
/* Copyright (C) 2009-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_GATEWAY_CLIENT_H_
/* Copyright (C) 2009-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "gateway_common.h"
/* Copyright (C) 2009-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_GATEWAY_COMMON_H_
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "hard-interface.h"
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_HARD_INTERFACE_H_
/* Copyright (C) 2006-2019 B.A.T.M.A.N. contributors:
*
* Simon Wunderlich, Marek Lindner
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "hash.h"
/* Copyright (C) 2006-2019 B.A.T.M.A.N. contributors:
*
* Simon Wunderlich, Marek Lindner
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_HASH_H_
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "icmp_socket.h"
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_ICMP_SOCKET_H_
/* Copyright (C) 2010-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "log.h"
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_LOG_H_
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "main.h"
#include <linux/build_bug.h>
#include <linux/byteorder/generic.h>
#include <linux/crc32c.h>
+#include <linux/device.h>
#include <linux/errno.h>
#include <linux/genetlink.h>
#include <linux/gfp.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <linux/kernel.h>
+#include <linux/kobject.h>
#include <linux/kref.h>
#include <linux/list.h>
#include <linux/module.h>
#include <linux/rcupdate.h>
#include <linux/seq_file.h>
#include <linux/skbuff.h>
+#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/stddef.h>
#include <linux/string.h>
static void batadv_recv_handler_init(void);
+#define BATADV_UEV_TYPE_VAR "BATTYPE="
+#define BATADV_UEV_ACTION_VAR "BATACTION="
+#define BATADV_UEV_DATA_VAR "BATDATA="
+
+static char *batadv_uev_action_str[] = {
+ "add",
+ "del",
+ "change",
+ "loopdetect",
+};
+
+static char *batadv_uev_type_str[] = {
+ "gw",
+ "bla",
+};
+
static int __init batadv_init(void)
{
int ret;
return ap_isolation_enabled;
}
+/**
+ * batadv_throw_uevent() - Send an uevent with batman-adv specific env data
+ * @bat_priv: the bat priv with all the soft interface information
+ * @type: subsystem type of event. Stored in uevent's BATTYPE
+ * @action: action type of event. Stored in uevent's BATACTION
+ * @data: string with additional information to the event (ignored for
+ * BATADV_UEV_DEL). Stored in uevent's BATDATA
+ *
+ * Return: 0 on success or negative error number in case of failure
+ */
+int batadv_throw_uevent(struct batadv_priv *bat_priv, enum batadv_uev_type type,
+ enum batadv_uev_action action, const char *data)
+{
+ int ret = -ENOMEM;
+ struct kobject *bat_kobj;
+ char *uevent_env[4] = { NULL, NULL, NULL, NULL };
+
+ bat_kobj = &bat_priv->soft_iface->dev.kobj;
+
+ uevent_env[0] = kasprintf(GFP_ATOMIC,
+ "%s%s", BATADV_UEV_TYPE_VAR,
+ batadv_uev_type_str[type]);
+ if (!uevent_env[0])
+ goto out;
+
+ uevent_env[1] = kasprintf(GFP_ATOMIC,
+ "%s%s", BATADV_UEV_ACTION_VAR,
+ batadv_uev_action_str[action]);
+ if (!uevent_env[1])
+ goto out;
+
+ /* If the event is DEL, ignore the data field */
+ if (action != BATADV_UEV_DEL) {
+ uevent_env[2] = kasprintf(GFP_ATOMIC,
+ "%s%s", BATADV_UEV_DATA_VAR, data);
+ if (!uevent_env[2])
+ goto out;
+ }
+
+ ret = kobject_uevent_env(bat_kobj, KOBJ_CHANGE, uevent_env);
+out:
+ kfree(uevent_env[0]);
+ kfree(uevent_env[1]);
+ kfree(uevent_env[2]);
+
+ if (ret)
+ batadv_dbg(BATADV_DBG_BATMAN, bat_priv,
+ "Impossible to send uevent for (%s,%s,%s) event (err: %d)\n",
+ batadv_uev_type_str[type],
+ batadv_uev_action_str[action],
+ (action == BATADV_UEV_DEL ? "NULL" : data), ret);
+ return ret;
+}
+
module_init(batadv_init);
module_exit(batadv_exit);
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_MAIN_H_
unsigned short batadv_get_vid(struct sk_buff *skb, size_t header_len);
bool batadv_vlan_ap_isola_get(struct batadv_priv *bat_priv, unsigned short vid);
+int batadv_throw_uevent(struct batadv_priv *bat_priv, enum batadv_uev_type type,
+ enum batadv_uev_action action, const char *data);
#endif /* _NET_BATMAN_ADV_MAIN_H_ */
/* Copyright (C) 2014-2019 B.A.T.M.A.N. contributors:
*
* Linus Lüssing
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "multicast.h"
#include "hash.h"
#include "log.h"
#include "netlink.h"
+#include "send.h"
#include "soft-interface.h"
#include "translation-table.h"
#include "tvlv.h"
{
int ret, tt_count, ip_count, unsnoop_count, total_count;
bool is_unsnoopable = false;
+ unsigned int mcast_fanout;
struct ethhdr *ethhdr;
ret = batadv_mcast_forw_mode_check(bat_priv, skb, &is_unsnoopable);
case 0:
return BATADV_FORW_NONE;
default:
- return BATADV_FORW_ALL;
+ mcast_fanout = atomic_read(&bat_priv->multicast_fanout);
+
+ if (!unsnoop_count && total_count <= mcast_fanout)
+ return BATADV_FORW_SOME;
}
+
+ return BATADV_FORW_ALL;
+}
+
+/**
+ * batadv_mcast_forw_tt() - forwards a packet to multicast listeners
+ * @bat_priv: the bat priv with all the soft interface information
+ * @skb: the multicast packet to transmit
+ * @vid: the vlan identifier
+ *
+ * Sends copies of a frame with multicast destination to any multicast
+ * listener registered in the translation table. A transmission is performed
+ * via a batman-adv unicast packet for each such destination node.
+ *
+ * Return: NET_XMIT_DROP on memory allocation failure, NET_XMIT_SUCCESS
+ * otherwise.
+ */
+static int
+batadv_mcast_forw_tt(struct batadv_priv *bat_priv, struct sk_buff *skb,
+ unsigned short vid)
+{
+ int ret = NET_XMIT_SUCCESS;
+ struct sk_buff *newskb;
+
+ struct batadv_tt_orig_list_entry *orig_entry;
+
+ struct batadv_tt_global_entry *tt_global;
+ const u8 *addr = eth_hdr(skb)->h_dest;
+
+ tt_global = batadv_tt_global_hash_find(bat_priv, addr, vid);
+ if (!tt_global)
+ goto out;
+
+ rcu_read_lock();
+ hlist_for_each_entry_rcu(orig_entry, &tt_global->orig_list, list) {
+ newskb = skb_copy(skb, GFP_ATOMIC);
+ if (!newskb) {
+ ret = NET_XMIT_DROP;
+ break;
+ }
+
+ batadv_send_skb_unicast(bat_priv, newskb, BATADV_UNICAST, 0,
+ orig_entry->orig_node, vid);
+ }
+ rcu_read_unlock();
+
+ batadv_tt_global_entry_put(tt_global);
+
+out:
+ return ret;
+}
+
+/**
+ * batadv_mcast_forw_want_all_ipv4() - forward to nodes with want-all-ipv4
+ * @bat_priv: the bat priv with all the soft interface information
+ * @skb: the multicast packet to transmit
+ * @vid: the vlan identifier
+ *
+ * Sends copies of a frame with multicast destination to any node with a
+ * BATADV_MCAST_WANT_ALL_IPV4 flag set. A transmission is performed via a
+ * batman-adv unicast packet for each such destination node.
+ *
+ * Return: NET_XMIT_DROP on memory allocation failure, NET_XMIT_SUCCESS
+ * otherwise.
+ */
+static int
+batadv_mcast_forw_want_all_ipv4(struct batadv_priv *bat_priv,
+ struct sk_buff *skb, unsigned short vid)
+{
+ struct batadv_orig_node *orig_node;
+ int ret = NET_XMIT_SUCCESS;
+ struct sk_buff *newskb;
+
+ rcu_read_lock();
+ hlist_for_each_entry_rcu(orig_node,
+ &bat_priv->mcast.want_all_ipv4_list,
+ mcast_want_all_ipv4_node) {
+ newskb = skb_copy(skb, GFP_ATOMIC);
+ if (!newskb) {
+ ret = NET_XMIT_DROP;
+ break;
+ }
+
+ batadv_send_skb_unicast(bat_priv, newskb, BATADV_UNICAST, 0,
+ orig_node, vid);
+ }
+ rcu_read_unlock();
+ return ret;
+}
+
+/**
+ * batadv_mcast_forw_want_all_ipv6() - forward to nodes with want-all-ipv6
+ * @bat_priv: the bat priv with all the soft interface information
+ * @skb: The multicast packet to transmit
+ * @vid: the vlan identifier
+ *
+ * Sends copies of a frame with multicast destination to any node with a
+ * BATADV_MCAST_WANT_ALL_IPV6 flag set. A transmission is performed via a
+ * batman-adv unicast packet for each such destination node.
+ *
+ * Return: NET_XMIT_DROP on memory allocation failure, NET_XMIT_SUCCESS
+ * otherwise.
+ */
+static int
+batadv_mcast_forw_want_all_ipv6(struct batadv_priv *bat_priv,
+ struct sk_buff *skb, unsigned short vid)
+{
+ struct batadv_orig_node *orig_node;
+ int ret = NET_XMIT_SUCCESS;
+ struct sk_buff *newskb;
+
+ rcu_read_lock();
+ hlist_for_each_entry_rcu(orig_node,
+ &bat_priv->mcast.want_all_ipv6_list,
+ mcast_want_all_ipv6_node) {
+ newskb = skb_copy(skb, GFP_ATOMIC);
+ if (!newskb) {
+ ret = NET_XMIT_DROP;
+ break;
+ }
+
+ batadv_send_skb_unicast(bat_priv, newskb, BATADV_UNICAST, 0,
+ orig_node, vid);
+ }
+ rcu_read_unlock();
+ return ret;
+}
+
+/**
+ * batadv_mcast_forw_want_all() - forward packet to nodes in a want-all list
+ * @bat_priv: the bat priv with all the soft interface information
+ * @skb: the multicast packet to transmit
+ * @vid: the vlan identifier
+ *
+ * Sends copies of a frame with multicast destination to any node with a
+ * BATADV_MCAST_WANT_ALL_IPV4 or BATADV_MCAST_WANT_ALL_IPV6 flag set. A
+ * transmission is performed via a batman-adv unicast packet for each such
+ * destination node.
+ *
+ * Return: NET_XMIT_DROP on memory allocation failure or if the protocol family
+ * is neither IPv4 nor IPv6. NET_XMIT_SUCCESS otherwise.
+ */
+static int
+batadv_mcast_forw_want_all(struct batadv_priv *bat_priv,
+ struct sk_buff *skb, unsigned short vid)
+{
+ switch (ntohs(eth_hdr(skb)->h_proto)) {
+ case ETH_P_IP:
+ return batadv_mcast_forw_want_all_ipv4(bat_priv, skb, vid);
+ case ETH_P_IPV6:
+ return batadv_mcast_forw_want_all_ipv6(bat_priv, skb, vid);
+ default:
+ /* we shouldn't be here... */
+ return NET_XMIT_DROP;
+ }
+}
+
+/**
+ * batadv_mcast_forw_send() - send packet to any detected multicast recpient
+ * @bat_priv: the bat priv with all the soft interface information
+ * @skb: the multicast packet to transmit
+ * @vid: the vlan identifier
+ *
+ * Sends copies of a frame with multicast destination to any node that signaled
+ * interest in it, that is either via the translation table or the according
+ * want-all flags. A transmission is performed via a batman-adv unicast packet
+ * for each such destination node.
+ *
+ * The given skb is consumed/freed.
+ *
+ * Return: NET_XMIT_DROP on memory allocation failure or if the protocol family
+ * is neither IPv4 nor IPv6. NET_XMIT_SUCCESS otherwise.
+ */
+int batadv_mcast_forw_send(struct batadv_priv *bat_priv, struct sk_buff *skb,
+ unsigned short vid)
+{
+ int ret;
+
+ ret = batadv_mcast_forw_tt(bat_priv, skb, vid);
+ if (ret != NET_XMIT_SUCCESS) {
+ kfree_skb(skb);
+ return ret;
+ }
+
+ ret = batadv_mcast_forw_want_all(bat_priv, skb, vid);
+ if (ret != NET_XMIT_SUCCESS) {
+ kfree_skb(skb);
+ return ret;
+ }
+
+ consume_skb(skb);
+ return ret;
}
/**
/* Copyright (C) 2014-2019 B.A.T.M.A.N. contributors:
*
* Linus Lüssing
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_MULTICAST_H_
*/
BATADV_FORW_ALL,
+ /**
+ * @BATADV_FORW_SOME: forward the packet to some nodes (currently via
+ * a multicast-to-unicast conversion and the BATMAN unicast routing
+ * protocol)
+ */
+ BATADV_FORW_SOME,
+
/**
* @BATADV_FORW_SINGLE: forward the packet to a single node (currently
* via the BATMAN unicast routing protocol)
batadv_mcast_forw_mode(struct batadv_priv *bat_priv, struct sk_buff *skb,
struct batadv_orig_node **mcast_single_orig);
+int batadv_mcast_forw_send(struct batadv_priv *bat_priv, struct sk_buff *skb,
+ unsigned short vid);
+
void batadv_mcast_init(struct batadv_priv *bat_priv);
int batadv_mcast_flags_seq_print_text(struct seq_file *seq, void *offset);
return BATADV_FORW_ALL;
}
+static inline int
+batadv_mcast_forw_send(struct batadv_priv *bat_priv, struct sk_buff *skb,
+ unsigned short vid)
+{
+ kfree_skb(skb);
+ return NET_XMIT_DROP;
+}
+
static inline int batadv_mcast_init(struct batadv_priv *bat_priv)
{
return 0;
/* Copyright (C) 2016-2019 B.A.T.M.A.N. contributors:
*
* Matthias Schiffer
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "netlink.h"
[BATADV_ATTR_HOP_PENALTY] = { .type = NLA_U8 },
[BATADV_ATTR_LOG_LEVEL] = { .type = NLA_U32 },
[BATADV_ATTR_MULTICAST_FORCEFLOOD_ENABLED] = { .type = NLA_U8 },
+ [BATADV_ATTR_MULTICAST_FANOUT] = { .type = NLA_U32 },
[BATADV_ATTR_NETWORK_CODING_ENABLED] = { .type = NLA_U8 },
[BATADV_ATTR_ORIG_INTERVAL] = { .type = NLA_U32 },
[BATADV_ATTR_ELP_INTERVAL] = { .type = NLA_U32 },
if (nla_put_u8(msg, BATADV_ATTR_MULTICAST_FORCEFLOOD_ENABLED,
!atomic_read(&bat_priv->multicast_mode)))
goto nla_put_failure;
+
+ if (nla_put_u32(msg, BATADV_ATTR_MULTICAST_FANOUT,
+ atomic_read(&bat_priv->multicast_fanout)))
+ goto nla_put_failure;
#endif /* CONFIG_BATMAN_ADV_MCAST */
#ifdef CONFIG_BATMAN_ADV_NC
atomic_set(&bat_priv->multicast_mode, !nla_get_u8(attr));
}
+
+ if (info->attrs[BATADV_ATTR_MULTICAST_FANOUT]) {
+ attr = info->attrs[BATADV_ATTR_MULTICAST_FANOUT];
+
+ atomic_set(&bat_priv->multicast_fanout, nla_get_u32(attr));
+ }
#endif /* CONFIG_BATMAN_ADV_MCAST */
#ifdef CONFIG_BATMAN_ADV_NC
{
.cmd = BATADV_CMD_GET_MESH,
/* can be retrieved by unprivileged users */
- .policy = batadv_netlink_policy,
.doit = batadv_netlink_get_mesh,
.internal_flags = BATADV_FLAG_NEED_MESH,
},
{
.cmd = BATADV_CMD_TP_METER,
.flags = GENL_ADMIN_PERM,
- .policy = batadv_netlink_policy,
.doit = batadv_netlink_tp_meter_start,
.internal_flags = BATADV_FLAG_NEED_MESH,
},
{
.cmd = BATADV_CMD_TP_METER_CANCEL,
.flags = GENL_ADMIN_PERM,
- .policy = batadv_netlink_policy,
.doit = batadv_netlink_tp_meter_cancel,
.internal_flags = BATADV_FLAG_NEED_MESH,
},
{
.cmd = BATADV_CMD_GET_ROUTING_ALGOS,
.flags = GENL_ADMIN_PERM,
- .policy = batadv_netlink_policy,
.dumpit = batadv_algo_dump,
},
{
.cmd = BATADV_CMD_GET_HARDIF,
/* can be retrieved by unprivileged users */
- .policy = batadv_netlink_policy,
.dumpit = batadv_netlink_dump_hardif,
.doit = batadv_netlink_get_hardif,
.internal_flags = BATADV_FLAG_NEED_MESH |
{
.cmd = BATADV_CMD_GET_TRANSTABLE_LOCAL,
.flags = GENL_ADMIN_PERM,
- .policy = batadv_netlink_policy,
.dumpit = batadv_tt_local_dump,
},
{
.cmd = BATADV_CMD_GET_TRANSTABLE_GLOBAL,
.flags = GENL_ADMIN_PERM,
- .policy = batadv_netlink_policy,
.dumpit = batadv_tt_global_dump,
},
{
.cmd = BATADV_CMD_GET_ORIGINATORS,
.flags = GENL_ADMIN_PERM,
- .policy = batadv_netlink_policy,
.dumpit = batadv_orig_dump,
},
{
.cmd = BATADV_CMD_GET_NEIGHBORS,
.flags = GENL_ADMIN_PERM,
- .policy = batadv_netlink_policy,
.dumpit = batadv_hardif_neigh_dump,
},
{
.cmd = BATADV_CMD_GET_GATEWAYS,
.flags = GENL_ADMIN_PERM,
- .policy = batadv_netlink_policy,
.dumpit = batadv_gw_dump,
},
{
.cmd = BATADV_CMD_GET_BLA_CLAIM,
.flags = GENL_ADMIN_PERM,
- .policy = batadv_netlink_policy,
.dumpit = batadv_bla_claim_dump,
},
{
.cmd = BATADV_CMD_GET_BLA_BACKBONE,
.flags = GENL_ADMIN_PERM,
- .policy = batadv_netlink_policy,
.dumpit = batadv_bla_backbone_dump,
},
{
.cmd = BATADV_CMD_GET_DAT_CACHE,
.flags = GENL_ADMIN_PERM,
- .policy = batadv_netlink_policy,
.dumpit = batadv_dat_cache_dump,
},
{
.cmd = BATADV_CMD_GET_MCAST_FLAGS,
.flags = GENL_ADMIN_PERM,
- .policy = batadv_netlink_policy,
.dumpit = batadv_mcast_flags_dump,
},
{
.cmd = BATADV_CMD_SET_MESH,
.flags = GENL_ADMIN_PERM,
- .policy = batadv_netlink_policy,
.doit = batadv_netlink_set_mesh,
.internal_flags = BATADV_FLAG_NEED_MESH,
},
{
.cmd = BATADV_CMD_SET_HARDIF,
.flags = GENL_ADMIN_PERM,
- .policy = batadv_netlink_policy,
.doit = batadv_netlink_set_hardif,
.internal_flags = BATADV_FLAG_NEED_MESH |
BATADV_FLAG_NEED_HARDIF,
{
.cmd = BATADV_CMD_GET_VLAN,
/* can be retrieved by unprivileged users */
- .policy = batadv_netlink_policy,
.doit = batadv_netlink_get_vlan,
.internal_flags = BATADV_FLAG_NEED_MESH |
BATADV_FLAG_NEED_VLAN,
{
.cmd = BATADV_CMD_SET_VLAN,
.flags = GENL_ADMIN_PERM,
- .policy = batadv_netlink_policy,
.doit = batadv_netlink_set_vlan,
.internal_flags = BATADV_FLAG_NEED_MESH |
BATADV_FLAG_NEED_VLAN,
.name = BATADV_NL_NAME,
.version = 1,
.maxattr = BATADV_ATTR_MAX,
+ .policy = batadv_netlink_policy,
.netnsok = true,
.pre_doit = batadv_pre_doit,
.post_doit = batadv_post_doit,
/* Copyright (C) 2016-2019 B.A.T.M.A.N. contributors:
*
* Matthias Schiffer
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_NETLINK_H_
/* Copyright (C) 2012-2019 B.A.T.M.A.N. contributors:
*
* Martin Hundebøll, Jeppe Ledet-Pedersen
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "network-coding.h"
/* Copyright (C) 2012-2019 B.A.T.M.A.N. contributors:
*
* Martin Hundebøll, Jeppe Ledet-Pedersen
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_NETWORK_CODING_H_
/* Copyright (C) 2009-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "originator.h"
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_ORIGINATOR_H_
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "routing.h"
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_ROUTING_H_
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "send.h"
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_SEND_H_
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "soft-interface.h"
unsigned short vid;
u32 seqno;
int gw_mode;
- enum batadv_forw_mode forw_mode;
+ enum batadv_forw_mode forw_mode = BATADV_FORW_SINGLE;
struct batadv_orig_node *mcast_single_orig = NULL;
int network_offset = ETH_HLEN;
__be16 proto;
if (forw_mode == BATADV_FORW_NONE)
goto dropped;
- if (forw_mode == BATADV_FORW_SINGLE)
+ if (forw_mode == BATADV_FORW_SINGLE ||
+ forw_mode == BATADV_FORW_SOME)
do_bcast = false;
}
}
ret = batadv_send_skb_unicast(bat_priv, skb,
BATADV_UNICAST, 0,
mcast_single_orig, vid);
+ } else if (forw_mode == BATADV_FORW_SOME) {
+ ret = batadv_mcast_forw_send(bat_priv, skb, vid);
} else {
if (batadv_dat_snoop_outgoing_arp_request(bat_priv,
skb))
bat_priv->mcast.querier_ipv6.shadowing = false;
bat_priv->mcast.flags = BATADV_NO_FLAGS;
atomic_set(&bat_priv->multicast_mode, 1);
+ atomic_set(&bat_priv->multicast_fanout, 16);
atomic_set(&bat_priv->mcast.num_want_all_unsnoopables, 0);
atomic_set(&bat_priv->mcast.num_want_all_ipv4, 0);
atomic_set(&bat_priv->mcast.num_want_all_ipv6, 0);
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_SOFT_INTERFACE_H_
/* Copyright (C) 2010-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "sysfs.h"
#include "main.h"
+#include <asm/current.h>
#include <linux/atomic.h>
#include <linux/compiler.h>
#include <linux/device.h>
#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/rtnetlink.h>
+#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/stddef.h>
#include <linux/string.h>
#include "network-coding.h"
#include "soft-interface.h"
+/**
+ * batadv_sysfs_deprecated() - Log use of deprecated batadv sysfs access
+ * @attr: attribute which was accessed
+ */
+static void batadv_sysfs_deprecated(struct attribute *attr)
+{
+ pr_warn_ratelimited(DEPRECATED "%s (pid %d) Use of sysfs file \"%s\".\nUse batadv genl family instead",
+ current->comm, task_pid_nr(current), attr->name);
+}
+
static struct net_device *batadv_kobj_to_netdev(struct kobject *obj)
{
struct device *dev = container_of(obj->parent, struct device, kobj);
return vlan;
}
-#define BATADV_UEV_TYPE_VAR "BATTYPE="
-#define BATADV_UEV_ACTION_VAR "BATACTION="
-#define BATADV_UEV_DATA_VAR "BATDATA="
-
-static char *batadv_uev_action_str[] = {
- "add",
- "del",
- "change",
- "loopdetect",
-};
-
-static char *batadv_uev_type_str[] = {
- "gw",
- "bla",
-};
-
/* Use this, if you have customized show and store functions for vlan attrs */
#define BATADV_ATTR_VLAN(_name, _mode, _show, _store) \
struct batadv_attribute batadv_attr_vlan_##_name = { \
struct batadv_priv *bat_priv = netdev_priv(net_dev); \
ssize_t length; \
\
+ batadv_sysfs_deprecated(attr); \
length = __batadv_store_bool_attr(buff, count, _post_func, attr,\
&bat_priv->_name, net_dev); \
\
{ \
struct batadv_priv *bat_priv = batadv_kobj_to_batpriv(kobj); \
\
+ batadv_sysfs_deprecated(attr); \
return sprintf(buff, "%s\n", \
atomic_read(&bat_priv->_name) == 0 ? \
"disabled" : "enabled"); \
struct batadv_priv *bat_priv = netdev_priv(net_dev); \
ssize_t length; \
\
+ batadv_sysfs_deprecated(attr); \
length = __batadv_store_uint_attr(buff, count, _min, _max, \
_post_func, attr, \
&bat_priv->_var, net_dev, \
{ \
struct batadv_priv *bat_priv = batadv_kobj_to_batpriv(kobj); \
\
+ batadv_sysfs_deprecated(attr); \
return sprintf(buff, "%i\n", atomic_read(&bat_priv->_var)); \
} \
attr, &vlan->_name, \
bat_priv->soft_iface); \
\
+ batadv_sysfs_deprecated(attr); \
if (vlan->vid) \
batadv_netlink_notify_vlan(bat_priv, vlan); \
else \
atomic_read(&vlan->_name) == 0 ? \
"disabled" : "enabled"); \
\
+ batadv_sysfs_deprecated(attr); \
batadv_softif_vlan_put(vlan); \
return res; \
}
struct batadv_priv *bat_priv; \
ssize_t length; \
\
+ batadv_sysfs_deprecated(attr); \
hard_iface = batadv_hardif_get_by_netdev(net_dev); \
if (!hard_iface) \
return 0; \
struct batadv_hard_iface *hard_iface; \
ssize_t length; \
\
+ batadv_sysfs_deprecated(attr); \
hard_iface = batadv_hardif_get_by_netdev(net_dev); \
if (!hard_iface) \
return 0; \
{
struct batadv_priv *bat_priv = batadv_kobj_to_batpriv(kobj);
+ batadv_sysfs_deprecated(attr);
return sprintf(buff, "%s\n", bat_priv->algo_ops->name);
}
struct batadv_priv *bat_priv = batadv_kobj_to_batpriv(kobj);
int bytes_written;
+ batadv_sysfs_deprecated(attr);
+
/* GW mode is not available if the routing algorithm in use does not
* implement the GW API
*/
char *curr_gw_mode_str;
int gw_mode_tmp = -1;
+ batadv_sysfs_deprecated(attr);
+
/* toggling GW mode is allowed only if the routing algorithm in use
* provides the GW API
*/
{
struct batadv_priv *bat_priv = batadv_kobj_to_batpriv(kobj);
+ batadv_sysfs_deprecated(attr);
+
/* GW selection class is not available if the routing algorithm in use
* does not implement the GW API
*/
struct batadv_priv *bat_priv = batadv_kobj_to_batpriv(kobj);
ssize_t length;
+ batadv_sysfs_deprecated(attr);
+
/* setting the GW selection class is allowed only if the routing
* algorithm in use implements the GW API
*/
struct batadv_priv *bat_priv = batadv_kobj_to_batpriv(kobj);
u32 down, up;
+ batadv_sysfs_deprecated(attr);
+
down = atomic_read(&bat_priv->gw.bandwidth_down);
up = atomic_read(&bat_priv->gw.bandwidth_up);
struct net_device *net_dev = batadv_kobj_to_netdev(kobj);
ssize_t length;
+ batadv_sysfs_deprecated(attr);
+
if (buff[count - 1] == '\n')
buff[count - 1] = '\0';
{
struct batadv_priv *bat_priv = batadv_kobj_to_batpriv(kobj);
+ batadv_sysfs_deprecated(attr);
return sprintf(buff, "%#.8x/%#.8x\n", bat_priv->isolation_mark,
bat_priv->isolation_mark_mask);
}
u32 mark, mask;
char *mask_ptr;
+ batadv_sysfs_deprecated(attr);
+
/* parse the mask if it has been specified, otherwise assume the mask is
* the biggest possible
*/
ssize_t length;
const char *ifname;
+ batadv_sysfs_deprecated(attr);
+
hard_iface = batadv_hardif_get_by_netdev(net_dev);
if (!hard_iface)
return 0;
struct net_device *net_dev = batadv_kobj_to_netdev(kobj);
struct batadv_store_mesh_work *store_work;
+ batadv_sysfs_deprecated(attr);
+
if (buff[count - 1] == '\n')
buff[count - 1] = '\0';
struct batadv_hard_iface *hard_iface;
ssize_t length;
+ batadv_sysfs_deprecated(attr);
+
hard_iface = batadv_hardif_get_by_netdev(net_dev);
if (!hard_iface)
return 0;
u32 old_tp_override;
bool ret;
+ batadv_sysfs_deprecated(attr);
+
hard_iface = batadv_hardif_get_by_netdev(net_dev);
if (!hard_iface)
return -EINVAL;
struct batadv_hard_iface *hard_iface;
u32 tp_override;
+ batadv_sysfs_deprecated(attr);
+
hard_iface = batadv_hardif_get_by_netdev(net_dev);
if (!hard_iface)
return -EINVAL;
kobject_put(*hardif_obj);
*hardif_obj = NULL;
}
-
-/**
- * batadv_throw_uevent() - Send an uevent with batman-adv specific env data
- * @bat_priv: the bat priv with all the soft interface information
- * @type: subsystem type of event. Stored in uevent's BATTYPE
- * @action: action type of event. Stored in uevent's BATACTION
- * @data: string with additional information to the event (ignored for
- * BATADV_UEV_DEL). Stored in uevent's BATDATA
- *
- * Return: 0 on success or negative error number in case of failure
- */
-int batadv_throw_uevent(struct batadv_priv *bat_priv, enum batadv_uev_type type,
- enum batadv_uev_action action, const char *data)
-{
- int ret = -ENOMEM;
- struct kobject *bat_kobj;
- char *uevent_env[4] = { NULL, NULL, NULL, NULL };
-
- bat_kobj = &bat_priv->soft_iface->dev.kobj;
-
- uevent_env[0] = kasprintf(GFP_ATOMIC,
- "%s%s", BATADV_UEV_TYPE_VAR,
- batadv_uev_type_str[type]);
- if (!uevent_env[0])
- goto out;
-
- uevent_env[1] = kasprintf(GFP_ATOMIC,
- "%s%s", BATADV_UEV_ACTION_VAR,
- batadv_uev_action_str[action]);
- if (!uevent_env[1])
- goto out;
-
- /* If the event is DEL, ignore the data field */
- if (action != BATADV_UEV_DEL) {
- uevent_env[2] = kasprintf(GFP_ATOMIC,
- "%s%s", BATADV_UEV_DATA_VAR, data);
- if (!uevent_env[2])
- goto out;
- }
-
- ret = kobject_uevent_env(bat_kobj, KOBJ_CHANGE, uevent_env);
-out:
- kfree(uevent_env[0]);
- kfree(uevent_env[1]);
- kfree(uevent_env[2]);
-
- if (ret)
- batadv_dbg(BATADV_DBG_BATMAN, bat_priv,
- "Impossible to send uevent for (%s,%s,%s) event (err: %d)\n",
- batadv_uev_type_str[type],
- batadv_uev_action_str[action],
- (action == BATADV_UEV_DEL ? "NULL" : data), ret);
- return ret;
-}
/* Copyright (C) 2010-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_SYSFS_H_
char *buf, size_t count);
};
+#ifdef CONFIG_BATMAN_ADV_SYSFS
+
int batadv_sysfs_add_meshif(struct net_device *dev);
void batadv_sysfs_del_meshif(struct net_device *dev);
int batadv_sysfs_add_hardif(struct kobject **hardif_obj,
struct batadv_softif_vlan *vlan);
void batadv_sysfs_del_vlan(struct batadv_priv *bat_priv,
struct batadv_softif_vlan *vlan);
-int batadv_throw_uevent(struct batadv_priv *bat_priv, enum batadv_uev_type type,
- enum batadv_uev_action action, const char *data);
+
+#else
+
+static inline int batadv_sysfs_add_meshif(struct net_device *dev)
+{
+ return 0;
+}
+
+static inline void batadv_sysfs_del_meshif(struct net_device *dev)
+{
+}
+
+static inline int batadv_sysfs_add_hardif(struct kobject **hardif_obj,
+ struct net_device *dev)
+{
+ return 0;
+}
+
+static inline void batadv_sysfs_del_hardif(struct kobject **hardif_obj)
+{
+}
+
+static inline int batadv_sysfs_add_vlan(struct net_device *dev,
+ struct batadv_softif_vlan *vlan)
+{
+ return 0;
+}
+
+static inline void batadv_sysfs_del_vlan(struct batadv_priv *bat_priv,
+ struct batadv_softif_vlan *vlan)
+{
+}
+
+#endif
#endif /* _NET_BATMAN_ADV_SYSFS_H_ */
/* Copyright (C) 2012-2019 B.A.T.M.A.N. contributors:
*
* Edo Monticelli, Antonio Quartulli
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "tp_meter.h"
/* Copyright (C) 2012-2019 B.A.T.M.A.N. contributors:
*
* Edo Monticelli, Antonio Quartulli
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_TP_METER_H_
/* Copyright (C) 2010-2019 B.A.T.M.A.N. contributors:
*
* Sven Eckelmann
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#define CREATE_TRACE_POINTS
/* Copyright (C) 2010-2019 B.A.T.M.A.N. contributors:
*
* Sven Eckelmann
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#if !defined(_NET_BATMAN_ADV_TRACE_H_) || defined(TRACE_HEADER_MULTI_READ)
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich, Antonio Quartulli
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "translation-table.h"
* Return: a pointer to the corresponding tt_global_entry struct if the client
* is found, NULL otherwise.
*/
-static struct batadv_tt_global_entry *
+struct batadv_tt_global_entry *
batadv_tt_global_hash_find(struct batadv_priv *bat_priv, const u8 *addr,
unsigned short vid)
{
* possibly release it
* @tt_global_entry: tt_global_entry to be free'd
*/
-static void
-batadv_tt_global_entry_put(struct batadv_tt_global_entry *tt_global_entry)
+void batadv_tt_global_entry_put(struct batadv_tt_global_entry *tt_global_entry)
{
kref_put(&tt_global_entry->common.refcount,
batadv_tt_global_entry_release);
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich, Antonio Quartulli
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_TRANSLATION_TABLE_H_
void batadv_tt_global_del_orig(struct batadv_priv *bat_priv,
struct batadv_orig_node *orig_node,
s32 match_vid, const char *message);
+struct batadv_tt_global_entry *
+batadv_tt_global_hash_find(struct batadv_priv *bat_priv, const u8 *addr,
+ unsigned short vid);
+void batadv_tt_global_entry_put(struct batadv_tt_global_entry *tt_global_entry);
int batadv_tt_global_hash_count(struct batadv_priv *bat_priv,
const u8 *addr, unsigned short vid);
struct batadv_orig_node *batadv_transtable_search(struct batadv_priv *bat_priv,
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "main.h"
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_TVLV_H_
/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors:
*
* Marek Lindner, Simon Wunderlich
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _NET_BATMAN_ADV_TYPES_H_
* node's sender/originating side
*/
atomic_t multicast_mode;
+
+ /**
+ * @multicast_fanout: Maximum number of packet copies to generate for a
+ * multicast-to-unicast conversion
+ */
+ atomic_t multicast_fanout;
#endif
/** @orig_interval: OGM broadcast interval in milliseconds */
-obj-y := test_run.o
+obj-$(CONFIG_BPF_SYSCALL) := test_run.o
return data;
}
+static void *bpf_ctx_init(const union bpf_attr *kattr, u32 max_size)
+{
+ void __user *data_in = u64_to_user_ptr(kattr->test.ctx_in);
+ void __user *data_out = u64_to_user_ptr(kattr->test.ctx_out);
+ u32 size = kattr->test.ctx_size_in;
+ void *data;
+ int err;
+
+ if (!data_in && !data_out)
+ return NULL;
+
+ data = kzalloc(max_size, GFP_USER);
+ if (!data)
+ return ERR_PTR(-ENOMEM);
+
+ if (data_in) {
+ err = bpf_check_uarg_tail_zero(data_in, max_size, size);
+ if (err) {
+ kfree(data);
+ return ERR_PTR(err);
+ }
+
+ size = min_t(u32, max_size, size);
+ if (copy_from_user(data, data_in, size)) {
+ kfree(data);
+ return ERR_PTR(-EFAULT);
+ }
+ }
+ return data;
+}
+
+static int bpf_ctx_finish(const union bpf_attr *kattr,
+ union bpf_attr __user *uattr, const void *data,
+ u32 size)
+{
+ void __user *data_out = u64_to_user_ptr(kattr->test.ctx_out);
+ int err = -EFAULT;
+ u32 copy_size = size;
+
+ if (!data || !data_out)
+ return 0;
+
+ if (copy_size > kattr->test.ctx_size_out) {
+ copy_size = kattr->test.ctx_size_out;
+ err = -ENOSPC;
+ }
+
+ if (copy_to_user(data_out, data, copy_size))
+ goto out;
+ if (copy_to_user(&uattr->test.ctx_size_out, &size, sizeof(size)))
+ goto out;
+ if (err != -ENOSPC)
+ err = 0;
+out:
+ return err;
+}
+
+/**
+ * range_is_zero - test whether buffer is initialized
+ * @buf: buffer to check
+ * @from: check from this position
+ * @to: check up until (excluding) this position
+ *
+ * This function returns true if the there is a non-zero byte
+ * in the buf in the range [from,to).
+ */
+static inline bool range_is_zero(void *buf, size_t from, size_t to)
+{
+ return !memchr_inv((u8 *)buf + from, 0, to - from);
+}
+
+static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb)
+{
+ struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
+
+ if (!__skb)
+ return 0;
+
+ /* make sure the fields we don't use are zeroed */
+ if (!range_is_zero(__skb, 0, offsetof(struct __sk_buff, priority)))
+ return -EINVAL;
+
+ /* priority is allowed */
+
+ if (!range_is_zero(__skb, offsetof(struct __sk_buff, priority) +
+ FIELD_SIZEOF(struct __sk_buff, priority),
+ offsetof(struct __sk_buff, cb)))
+ return -EINVAL;
+
+ /* cb is allowed */
+
+ if (!range_is_zero(__skb, offsetof(struct __sk_buff, cb) +
+ FIELD_SIZEOF(struct __sk_buff, cb),
+ sizeof(struct __sk_buff)))
+ return -EINVAL;
+
+ skb->priority = __skb->priority;
+ memcpy(&cb->data, __skb->cb, QDISC_CB_PRIV_LEN);
+
+ return 0;
+}
+
+static void convert_skb_to___skb(struct sk_buff *skb, struct __sk_buff *__skb)
+{
+ struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
+
+ if (!__skb)
+ return;
+
+ __skb->priority = skb->priority;
+ memcpy(__skb->cb, &cb->data, QDISC_CB_PRIV_LEN);
+}
+
int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
union bpf_attr __user *uattr)
{
bool is_l2 = false, is_direct_pkt_access = false;
u32 size = kattr->test.data_size_in;
u32 repeat = kattr->test.repeat;
+ struct __sk_buff *ctx = NULL;
u32 retval, duration;
int hh_len = ETH_HLEN;
struct sk_buff *skb;
if (IS_ERR(data))
return PTR_ERR(data);
+ ctx = bpf_ctx_init(kattr, sizeof(struct __sk_buff));
+ if (IS_ERR(ctx)) {
+ kfree(data);
+ return PTR_ERR(ctx);
+ }
+
switch (prog->type) {
case BPF_PROG_TYPE_SCHED_CLS:
case BPF_PROG_TYPE_SCHED_ACT:
sk = kzalloc(sizeof(struct sock), GFP_USER);
if (!sk) {
kfree(data);
+ kfree(ctx);
return -ENOMEM;
}
sock_net_set(sk, current->nsproxy->net_ns);
skb = build_skb(data, 0);
if (!skb) {
kfree(data);
+ kfree(ctx);
kfree(sk);
return -ENOMEM;
}
__skb_push(skb, hh_len);
if (is_direct_pkt_access)
bpf_compute_data_pointers(skb);
+ ret = convert___skb_to_skb(skb, ctx);
+ if (ret)
+ goto out;
ret = bpf_test_run(prog, skb, repeat, &retval, &duration);
- if (ret) {
- kfree_skb(skb);
- kfree(sk);
- return ret;
- }
+ if (ret)
+ goto out;
if (!is_l2) {
if (skb_headroom(skb) < hh_len) {
int nhead = HH_DATA_ALIGN(hh_len - skb_headroom(skb));
if (pskb_expand_head(skb, nhead, 0, GFP_USER)) {
- kfree_skb(skb);
- kfree(sk);
- return -ENOMEM;
+ ret = -ENOMEM;
+ goto out;
}
}
memset(__skb_push(skb, hh_len), 0, hh_len);
}
+ convert_skb_to___skb(skb, ctx);
size = skb->len;
/* bpf program can never convert linear skb to non-linear */
if (WARN_ON_ONCE(skb_is_nonlinear(skb)))
size = skb_headlen(skb);
ret = bpf_test_finish(kattr, uattr, skb->data, size, retval, duration);
+ if (!ret)
+ ret = bpf_ctx_finish(kattr, uattr, ctx,
+ sizeof(struct __sk_buff));
+out:
kfree_skb(skb);
kfree(sk);
+ kfree(ctx);
return ret;
}
void *data;
int ret;
+ if (kattr->test.ctx_in || kattr->test.ctx_out)
+ return -EINVAL;
+
data = bpf_test_init(kattr, size, XDP_PACKET_HEADROOM + NET_IP_ALIGN, 0);
if (IS_ERR(data))
return PTR_ERR(data);
if (prog->type != BPF_PROG_TYPE_FLOW_DISSECTOR)
return -EINVAL;
+ if (kattr->test.ctx_in || kattr->test.ctx_out)
+ return -EINVAL;
+
data = bpf_test_init(kattr, size, NET_SKB_PAD + NET_IP_ALIGN,
SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
if (IS_ERR(data))
#include <linux/if_vlan.h>
#include <linux/inetdevice.h>
#include <net/addrconf.h>
+#include <net/ipv6_stubs.h>
#if IS_ENABLED(CONFIG_IPV6)
#include <net/ip6_checksum.h>
#endif
u8 *arpptr, *sha;
__be32 sip, tip;
- BR_INPUT_SKB_CB(skb)->proxyarp_replied = false;
+ BR_INPUT_SKB_CB(skb)->proxyarp_replied = 0;
if ((dev->flags & IFF_NOARP) ||
!pskb_may_pull(skb, arp_hdr_len(dev)))
return;
if (ipv4_is_zeronet(sip) || sip == tip) {
/* prevent flooding to neigh suppress ports */
- BR_INPUT_SKB_CB(skb)->proxyarp_replied = true;
+ BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1;
return;
}
}
/* its our local ip, so don't proxy reply
* and don't forward to neigh suppress ports
*/
- BR_INPUT_SKB_CB(skb)->proxyarp_replied = true;
+ BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1;
return;
}
*/
if (replied ||
br_opt_get(br, BROPT_NEIGH_SUPPRESS_ENABLED))
- BR_INPUT_SKB_CB(skb)->proxyarp_replied = true;
+ BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1;
}
neigh_release(n);
struct ipv6hdr *iphdr;
struct neighbour *n;
- BR_INPUT_SKB_CB(skb)->proxyarp_replied = false;
+ BR_INPUT_SKB_CB(skb)->proxyarp_replied = 0;
if (p && (p->flags & BR_NEIGH_SUPPRESS))
return;
if (msg->icmph.icmp6_type == NDISC_NEIGHBOUR_ADVERTISEMENT &&
!msg->icmph.icmp6_solicited) {
/* prevent flooding to neigh suppress ports */
- BR_INPUT_SKB_CB(skb)->proxyarp_replied = true;
+ BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1;
return;
}
if (ipv6_addr_any(saddr) || !ipv6_addr_cmp(saddr, daddr)) {
/* prevent flooding to neigh suppress ports */
- BR_INPUT_SKB_CB(skb)->proxyarp_replied = true;
+ BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1;
return;
}
/* its our own ip, so don't proxy reply
* and don't forward to arp suppress ports
*/
- BR_INPUT_SKB_CB(skb)->proxyarp_replied = true;
+ BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1;
return;
}
*/
if (replied ||
br_opt_get(br, BROPT_NEIGH_SUPPRESS_ENABLED))
- BR_INPUT_SKB_CB(skb)->proxyarp_replied = true;
+ BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1;
}
neigh_release(n);
}
.key_offset = offsetof(struct net_bridge_fdb_entry, key),
.key_len = sizeof(struct net_bridge_fdb_key),
.automatic_shrinking = true,
- .locks_mul = 1,
};
static struct kmem_cache *br_fdb_cache __read_mostly;
struct net_bridge_port *prev, struct net_bridge_port *p,
struct sk_buff *skb, bool local_orig)
{
+ u8 igmp_type = br_multicast_igmp_type(skb);
int err;
if (!should_deliver(p, skb))
err = deliver_clone(prev, skb, local_orig);
if (err)
return ERR_PTR(err);
-
out:
+ br_multicast_count(p->br, p, skb, igmp_type, BR_MCAST_DIR_TX);
+
return p;
}
void br_flood(struct net_bridge *br, struct sk_buff *skb,
enum br_pkt_type pkt_type, bool local_rcv, bool local_orig)
{
- u8 igmp_type = br_multicast_igmp_type(skb);
struct net_bridge_port *prev = NULL;
struct net_bridge_port *p;
prev = maybe_deliver(prev, p, skb, local_orig);
if (IS_ERR(prev))
goto out;
- if (prev == p)
- br_multicast_count(p->br, p, skb, igmp_type,
- BR_MCAST_DIR_TX);
}
if (!prev)
bool local_rcv, bool local_orig)
{
struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev;
- u8 igmp_type = br_multicast_igmp_type(skb);
struct net_bridge *br = netdev_priv(dev);
struct net_bridge_port *prev = NULL;
struct net_bridge_port_group *p;
}
prev = maybe_deliver(prev, port, skb, local_orig);
-delivered:
if (IS_ERR(prev))
goto out;
- if (prev == port)
- br_multicast_count(port->br, port, skb, igmp_type,
- BR_MCAST_DIR_TX);
-
+delivered:
if ((unsigned long)lport >= (unsigned long)port)
p = rcu_dereference(p->next);
if ((unsigned long)rport >= (unsigned long)port)
ASSERT_RTNL();
if (backup_dev) {
- if (!br_port_exists(backup_dev))
+ if (!netif_is_bridge_port(backup_dev))
return -ENOENT;
backup_p = br_port_get_rtnl(backup_dev);
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/netfilter_bridge.h>
+#ifdef CONFIG_NETFILTER_FAMILY_BRIDGE
+#include <net/netfilter/nf_queue.h>
+#endif
#include <linux/neighbour.h>
#include <net/arp.h>
#include <linux/export.h>
#include "br_private.h"
#include "br_private_tunnel.h"
-/* Hook for brouter */
-br_should_route_hook_t __rcu *br_should_route_hook __read_mostly;
-EXPORT_SYMBOL(br_should_route_hook);
-
static int
br_netif_receive_skb(struct net *net, struct sock *sk, struct sk_buff *skb)
{
return 1;
}
+static int nf_hook_bridge_pre(struct sk_buff *skb, struct sk_buff **pskb)
+{
+#ifdef CONFIG_NETFILTER_FAMILY_BRIDGE
+ struct nf_hook_entries *e = NULL;
+ struct nf_hook_state state;
+ unsigned int verdict, i;
+ struct net *net;
+ int ret;
+
+ net = dev_net(skb->dev);
+#ifdef HAVE_JUMP_LABEL
+ if (!static_key_false(&nf_hooks_needed[NFPROTO_BRIDGE][NF_BR_PRE_ROUTING]))
+ goto frame_finish;
+#endif
+
+ e = rcu_dereference(net->nf.hooks_bridge[NF_BR_PRE_ROUTING]);
+ if (!e)
+ goto frame_finish;
+
+ nf_hook_state_init(&state, NF_BR_PRE_ROUTING,
+ NFPROTO_BRIDGE, skb->dev, NULL, NULL,
+ net, br_handle_frame_finish);
+
+ for (i = 0; i < e->num_hook_entries; i++) {
+ verdict = nf_hook_entry_hookfn(&e->hooks[i], skb, &state);
+ switch (verdict & NF_VERDICT_MASK) {
+ case NF_ACCEPT:
+ if (BR_INPUT_SKB_CB(skb)->br_netfilter_broute) {
+ *pskb = skb;
+ return RX_HANDLER_PASS;
+ }
+ break;
+ case NF_DROP:
+ kfree_skb(skb);
+ return RX_HANDLER_CONSUMED;
+ case NF_QUEUE:
+ ret = nf_queue(skb, &state, e, i, verdict);
+ if (ret == 1)
+ continue;
+ return RX_HANDLER_CONSUMED;
+ default: /* STOLEN */
+ return RX_HANDLER_CONSUMED;
+ }
+ }
+frame_finish:
+ net = dev_net(skb->dev);
+ br_handle_frame_finish(net, NULL, skb);
+#else
+ br_handle_frame_finish(dev_net(skb->dev), NULL, skb);
+#endif
+ return RX_HANDLER_CONSUMED;
+}
+
/*
* Return NULL if skb is handled
* note: already called with rcu_read_lock
struct net_bridge_port *p;
struct sk_buff *skb = *pskb;
const unsigned char *dest = eth_hdr(skb)->h_dest;
- br_should_route_hook_t *rhook;
if (unlikely(skb->pkt_type == PACKET_LOOPBACK))
return RX_HANDLER_PASS;
if (!skb)
return RX_HANDLER_CONSUMED;
+ memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
+
p = br_port_get_rcu(skb->dev);
if (p->flags & BR_VLAN_TUNNEL) {
if (br_handle_ingress_vlan_tunnel(skb, p,
forward:
switch (p->state) {
case BR_STATE_FORWARDING:
- rhook = rcu_dereference(br_should_route_hook);
- if (rhook) {
- if ((*rhook)(skb)) {
- *pskb = skb;
- return RX_HANDLER_PASS;
- }
- dest = eth_hdr(skb)->h_dest;
- }
- /* fall through */
case BR_STATE_LEARNING:
if (ether_addr_equal(p->br->dev->dev_addr, dest))
skb->pkt_type = PACKET_HOST;
- NF_HOOK(NFPROTO_BRIDGE, NF_BR_PRE_ROUTING,
- dev_net(skb->dev), NULL, skb, skb->dev, NULL,
- br_handle_frame_finish);
- break;
+ return nf_hook_bridge_pre(skb, pskb);
default:
drop:
kfree_skb(skb);
.key_offset = offsetof(struct net_bridge_mdb_entry, addr),
.key_len = sizeof(struct br_ip),
.automatic_shrinking = true,
- .locks_mul = 1,
};
static void br_multicast_start_querier(struct net_bridge *br,
__u16 vid, const unsigned char *src);
#endif
-static inline int br_ip_equal(const struct br_ip *a, const struct br_ip *b)
-{
- if (a->proto != b->proto)
- return 0;
- if (a->vid != b->vid)
- return 0;
- switch (a->proto) {
- case htons(ETH_P_IP):
- return a->u.ip4 == b->u.ip4;
-#if IS_ENABLED(CONFIG_IPV6)
- case htons(ETH_P_IPV6):
- return ipv6_addr_equal(&a->u.ip6, &b->u.ip6);
-#endif
- }
- return 0;
-}
-
static struct net_bridge_mdb_entry *br_mdb_ip_get_rcu(struct net_bridge *br,
struct br_ip *dst)
{
if (src)
memcpy(p->eth_addr, src, ETH_ALEN);
else
- memset(p->eth_addr, 0xff, ETH_ALEN);
+ eth_broadcast_addr(p->eth_addr);
return p;
}
int count = 0;
rcu_read_lock();
- if (!br_ip_list || !br_port_exists(dev))
+ if (!br_ip_list || !netif_is_bridge_port(dev))
goto unlock;
port = br_port_get_rcu(dev);
bool ret = false;
rcu_read_lock();
- if (!br_port_exists(dev))
+ if (!netif_is_bridge_port(dev))
goto unlock;
port = br_port_get_rcu(dev);
bool ret = false;
rcu_read_lock();
- if (!br_port_exists(dev))
+ if (!netif_is_bridge_port(dev))
goto unlock;
port = br_port_get_rcu(dev);
size_t vinfo_sz = 0;
rcu_read_lock();
- if (br_port_exists(dev)) {
+ if (netif_is_bridge_port(dev)) {
p = br_port_get_rcu(dev);
vg = nbp_vlan_group_rcu(p);
} else if (dev->priv_flags & IFF_EBRIDGE) {
#define br_auto_port(p) ((p)->flags & BR_AUTO_MASK)
#define br_promisc_port(p) ((p)->flags & BR_PROMISC)
-#define br_port_exists(dev) (dev->priv_flags & IFF_BRIDGE_PORT)
-
static inline struct net_bridge_port *br_port_get_rcu(const struct net_device *dev)
{
return rcu_dereference(dev->rx_handler_data);
static inline struct net_bridge_port *br_port_get_rtnl(const struct net_device *dev)
{
- return br_port_exists(dev) ?
+ return netif_is_bridge_port(dev) ?
rtnl_dereference(dev->rx_handler_data) : NULL;
}
static inline struct net_bridge_port *br_port_get_rtnl_rcu(const struct net_device *dev)
{
- return br_port_exists(dev) ?
+ return netif_is_bridge_port(dev) ?
rcu_dereference_rtnl(dev->rx_handler_data) : NULL;
}
struct net_device *brdev;
#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
- int igmp;
- int mrouters_only;
+ u8 igmp;
+ u8 mrouters_only:1;
#endif
-
- bool proxyarp_replied;
- bool src_port_isolated;
-
+ u8 proxyarp_replied:1;
+ u8 src_port_isolated:1;
#ifdef CONFIG_BRIDGE_VLAN_FILTERING
- bool vlan_filtered;
+ u8 vlan_filtered:1;
+#endif
+#ifdef CONFIG_NETFILTER_FAMILY_BRIDGE
+ u8 br_netfilter_broute:1;
#endif
#ifdef CONFIG_NET_SWITCHDEV
del_timer(&p->forward_delay_timer);
del_timer(&p->hold_timer);
- br_fdb_delete_by_port(br, p, 0, 0);
+ if (!rcu_access_pointer(p->backup_port))
+ br_fdb_delete_by_port(br, p, 0, 0);
br_multicast_disable_port(p);
br_configuration_update(br);
.key_offset = offsetof(struct net_bridge_vlan, vid),
.key_len = sizeof(u16),
.nelem_hint = 3,
- .locks_mul = 1,
.max_size = VLAN_N_VID,
.obj_cmpfn = br_vlan_cmp,
.automatic_shrinking = true,
.key_offset = offsetof(struct net_bridge_vlan, tinfo.tunnel_id),
.key_len = sizeof(__be64),
.nelem_hint = 3,
- .locks_mul = 1,
.obj_cmpfn = br_vlan_tunid_cmp,
.automatic_shrinking = true,
};
#include <linux/module.h>
#include <linux/if_bridge.h>
+#include "../br_private.h"
+
/* EBT_ACCEPT means the frame will be bridged
* EBT_DROP means the frame will be routed
*/
.me = THIS_MODULE,
};
-static int ebt_broute(struct sk_buff *skb)
+static unsigned int ebt_broute(void *priv, struct sk_buff *skb,
+ const struct nf_hook_state *s)
{
+ struct net_bridge_port *p = br_port_get_rcu(skb->dev);
struct nf_hook_state state;
+ unsigned char *dest;
int ret;
+ if (!p || p->state != BR_STATE_FORWARDING)
+ return NF_ACCEPT;
+
nf_hook_state_init(&state, NF_BR_BROUTING,
- NFPROTO_BRIDGE, skb->dev, NULL, NULL,
- dev_net(skb->dev), NULL);
+ NFPROTO_BRIDGE, s->in, NULL, NULL,
+ s->net, NULL);
ret = ebt_do_table(skb, &state, state.net->xt.broute_table);
- if (ret == NF_DROP)
- return 1; /* route it */
- return 0; /* bridge it */
+
+ if (ret != NF_DROP)
+ return ret;
+
+ /* DROP in ebtables -t broute means that the
+ * skb should be routed, not bridged.
+ * This is awkward, but can't be changed for compatibility
+ * reasons.
+ *
+ * We map DROP to ACCEPT and set the ->br_netfilter_broute flag.
+ */
+ BR_INPUT_SKB_CB(skb)->br_netfilter_broute = 1;
+
+ /* undo PACKET_HOST mangling done in br_input in case the dst
+ * address matches the logical bridge but not the port.
+ */
+ dest = eth_hdr(skb)->h_dest;
+ if (skb->pkt_type == PACKET_HOST &&
+ !ether_addr_equal(skb->dev->dev_addr, dest) &&
+ ether_addr_equal(p->br->dev->dev_addr, dest))
+ skb->pkt_type = PACKET_OTHERHOST;
+
+ return NF_ACCEPT;
}
+static const struct nf_hook_ops ebt_ops_broute = {
+ .hook = ebt_broute,
+ .pf = NFPROTO_BRIDGE,
+ .hooknum = NF_BR_PRE_ROUTING,
+ .priority = NF_BR_PRI_FIRST,
+};
+
static int __net_init broute_net_init(struct net *net)
{
- return ebt_register_table(net, &broute_table, NULL,
+ return ebt_register_table(net, &broute_table, &ebt_ops_broute,
&net->xt.broute_table);
}
static void __net_exit broute_net_exit(struct net *net)
{
- ebt_unregister_table(net, net->xt.broute_table, NULL);
+ ebt_unregister_table(net, net->xt.broute_table, &ebt_ops_broute);
}
static struct pernet_operations broute_net_ops = {
static int __init ebtable_broute_init(void)
{
- int ret;
-
- ret = register_pernet_subsys(&broute_net_ops);
- if (ret < 0)
- return ret;
- /* see br_input.c */
- RCU_INIT_POINTER(br_should_route_hook,
- (br_should_route_hook_t *)ebt_broute);
- return 0;
+ return register_pernet_subsys(&broute_net_ops);
}
static void __exit ebtable_broute_fini(void)
{
- RCU_INIT_POINTER(br_should_route_hook, NULL);
- synchronize_net();
unregister_pernet_subsys(&broute_net_ops);
}
mutex_unlock(&ebt_mutex);
WRITE_ONCE(*res, table);
-
- if (!ops)
- return 0;
-
ret = nf_register_net_hooks(net, ops, hweight32(table->valid_hooks));
if (ret) {
__ebt_unregister_table(net, table);
void ebt_unregister_table(struct net *net, struct ebt_table *table,
const struct nf_hook_ops *ops)
{
- if (ops)
- nf_unregister_net_hooks(net, ops, hweight32(table->valid_hooks));
+ nf_unregister_net_hooks(net, ops, hweight32(table->valid_hooks));
__ebt_unregister_table(net, table);
}
goto noxoff;
if (likely(!netif_queue_stopped(caifd->netdev))) {
+ struct Qdisc *sch;
+
/* If we run with a TX queue, check if the queue is too long*/
txq = netdev_get_tx_queue(skb->dev, 0);
- qlen = qdisc_qlen(rcu_dereference_bh(txq->qdisc));
-
- if (likely(qlen == 0))
+ sch = rcu_dereference_bh(txq->qdisc);
+ if (likely(qdisc_is_empty(sch)))
goto noxoff;
+ /* can check for explicit qdisc len value only !NOLOCK,
+ * always set flow off otherwise
+ */
high = (caifd->netdev->tx_queue_len * q_high) / 100;
- if (likely(qlen < high))
+ if (!(sch->flags & TCQ_F_NOLOCK) && likely(sch->q.qlen < high))
goto noxoff;
}
#include <trace/events/skb.h>
#include <net/busy_poll.h>
+#include "datagram.h"
+
/*
* Is a socket 'connection oriented' ?
*/
unsigned int flags,
void (*destructor)(struct sock *sk,
struct sk_buff *skb),
- int *peeked, int *off, int *err,
+ int *off, int *err,
struct sk_buff **last)
{
bool peek_at_off = false;
return NULL;
}
}
- *peeked = 1;
refcount_inc(&skb->users);
} else {
__skb_unlink(skb, queue);
* @sk: socket
* @flags: MSG\_ flags
* @destructor: invoked under the receive lock on successful dequeue
- * @peeked: returns non-zero if this packet has been seen before
* @off: an offset in bytes to peek skb from. Returns an offset
* within an skb where data actually starts
* @err: error code returned
struct sk_buff *__skb_try_recv_datagram(struct sock *sk, unsigned int flags,
void (*destructor)(struct sock *sk,
struct sk_buff *skb),
- int *peeked, int *off, int *err,
+ int *off, int *err,
struct sk_buff **last)
{
struct sk_buff_head *queue = &sk->sk_receive_queue;
if (error)
goto no_packet;
- *peeked = 0;
do {
/* Again only user level code calls this function, so nothing
* interrupt level will suddenly eat the receive_queue.
*/
spin_lock_irqsave(&queue->lock, cpu_flags);
skb = __skb_try_recv_from_queue(sk, queue, flags, destructor,
- peeked, off, &error, last);
+ off, &error, last);
spin_unlock_irqrestore(&queue->lock, cpu_flags);
if (error)
goto no_packet;
struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
void (*destructor)(struct sock *sk,
struct sk_buff *skb),
- int *peeked, int *off, int *err)
+ int *off, int *err)
{
struct sk_buff *skb, *last;
long timeo;
timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
do {
- skb = __skb_try_recv_datagram(sk, flags, destructor, peeked,
- off, err, &last);
+ skb = __skb_try_recv_datagram(sk, flags, destructor, off, err,
+ &last);
if (skb)
return skb;
struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned int flags,
int noblock, int *err)
{
- int peeked, off = 0;
+ int off = 0;
return __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0),
- NULL, &peeked, &off, err);
+ NULL, &off, err);
}
EXPORT_SYMBOL(skb_recv_datagram);
}
EXPORT_SYMBOL(skb_kill_datagram);
-int __skb_datagram_iter(const struct sk_buff *skb, int offset,
- struct iov_iter *to, int len, bool fault_short,
- size_t (*cb)(const void *, size_t, void *, struct iov_iter *),
- void *data)
+static int __skb_datagram_iter(const struct sk_buff *skb, int offset,
+ struct iov_iter *to, int len, bool fault_short,
+ size_t (*cb)(const void *, size_t, void *,
+ struct iov_iter *), void *data)
{
int start = skb_headlen(skb);
int i, copy = start - offset, start_off = offset, n;
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _NET_CORE_DATAGRAM_H_
+#define _NET_CORE_DATAGRAM_H_
+
+#include <linux/types.h>
+
+struct sock;
+struct sk_buff;
+struct iov_iter;
+
+int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb,
+ struct iov_iter *from, size_t length);
+
+#endif /* _NET_CORE_DATAGRAM_H_ */
#include <trace/events/napi.h>
#include <trace/events/net.h>
#include <trace/events/skb.h>
-#include <linux/pci.h>
#include <linux/inetdevice.h>
#include <linux/cpu_rmap.h>
#include <linux/static_key.h>
#include <net/udp_tunnel.h>
#include <linux/net_namespace.h>
#include <linux/indirect_call_wrapper.h>
+#include <net/devlink.h>
#include "net-sysfs.h"
if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
__qdisc_drop(skb, &to_free);
rc = NET_XMIT_DROP;
+ } else if ((q->flags & TCQ_F_CAN_BYPASS) && q->empty &&
+ qdisc_run_begin(q)) {
+ qdisc_bstats_cpu_update(q, skb);
+
+ if (sch_direct_xmit(skb, q, dev, txq, NULL, true))
+ __qdisc_run(q);
+
+ qdisc_run_end(q);
+ rc = NET_XMIT_SUCCESS;
} else {
rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
qdisc_run(q);
#define skb_update_prio(skb)
#endif
-DEFINE_PER_CPU(int, xmit_recursion);
-EXPORT_SYMBOL(xmit_recursion);
-
/**
* dev_loopback_xmit - loop back @skb
* @net: network namespace this loopback is happening in
}
u16 dev_pick_tx_zero(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
return 0;
}
EXPORT_SYMBOL(dev_pick_tx_zero);
u16 dev_pick_tx_cpu_id(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
return (u16)raw_smp_processor_id() % dev->real_num_tx_queues;
}
EXPORT_SYMBOL(dev_pick_tx_cpu_id);
-static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev)
+u16 netdev_pick_tx(struct net_device *dev, struct sk_buff *skb,
+ struct net_device *sb_dev)
{
struct sock *sk = skb->sk;
int queue_index = sk_tx_queue_get(sk);
return queue_index;
}
+EXPORT_SYMBOL(netdev_pick_tx);
-struct netdev_queue *netdev_pick_tx(struct net_device *dev,
- struct sk_buff *skb,
- struct net_device *sb_dev)
+struct netdev_queue *netdev_core_pick_tx(struct net_device *dev,
+ struct sk_buff *skb,
+ struct net_device *sb_dev)
{
int queue_index = 0;
const struct net_device_ops *ops = dev->netdev_ops;
if (ops->ndo_select_queue)
- queue_index = ops->ndo_select_queue(dev, skb, sb_dev,
- __netdev_pick_tx);
+ queue_index = ops->ndo_select_queue(dev, skb, sb_dev);
else
- queue_index = __netdev_pick_tx(dev, skb, sb_dev);
+ queue_index = netdev_pick_tx(dev, skb, sb_dev);
queue_index = netdev_cap_txqueue(dev, queue_index);
}
else
skb_dst_force(skb);
- txq = netdev_pick_tx(dev, skb, sb_dev);
+ txq = netdev_core_pick_tx(dev, skb, sb_dev);
q = rcu_dereference_bh(txq->qdisc);
trace_net_dev_queue(skb);
int cpu = smp_processor_id(); /* ok because BHs are off */
if (txq->xmit_lock_owner != cpu) {
- if (unlikely(__this_cpu_read(xmit_recursion) >
- XMIT_RECURSION_LIMIT))
+ if (dev_xmit_recursion())
goto recursion_alert;
skb = validate_xmit_skb(skb, dev, &again);
HARD_TX_LOCK(dev, txq, cpu);
if (!netif_xmit_stopped(txq)) {
- __this_cpu_inc(xmit_recursion);
+ dev_xmit_recursion_inc();
skb = dev_hard_start_xmit(skb, dev, txq, &rc);
- __this_cpu_dec(xmit_recursion);
+ dev_xmit_recursion_dec();
if (dev_xmit_complete(rc)) {
HARD_TX_UNLOCK(dev, txq);
goto out;
u32 rps_cpu_mask __read_mostly;
EXPORT_SYMBOL(rps_cpu_mask);
-struct static_key rps_needed __read_mostly;
+struct static_key_false rps_needed __read_mostly;
EXPORT_SYMBOL(rps_needed);
-struct static_key rfs_needed __read_mostly;
+struct static_key_false rfs_needed __read_mostly;
EXPORT_SYMBOL(rfs_needed);
static struct rps_dev_flow *
bool free_skb = true;
int cpu, rc;
- txq = netdev_pick_tx(dev, skb, NULL);
+ txq = netdev_core_pick_tx(dev, skb, NULL);
cpu = smp_processor_id();
HARD_TX_LOCK(dev, txq, cpu);
if (!netif_xmit_stopped(txq)) {
}
#ifdef CONFIG_RPS
- if (static_key_false(&rps_needed)) {
+ if (static_branch_unlikely(&rps_needed)) {
struct rps_dev_flow voidflow, *rflow = &voidflow;
int cpu;
rcu_read_lock();
#ifdef CONFIG_RPS
- if (static_key_false(&rps_needed)) {
+ if (static_branch_unlikely(&rps_needed)) {
struct rps_dev_flow voidflow, *rflow = &voidflow;
int cpu = get_rps_cpu(skb->dev, skb, &rflow);
rcu_read_lock();
#ifdef CONFIG_RPS
- if (static_key_false(&rps_needed)) {
+ if (static_branch_unlikely(&rps_needed)) {
list_for_each_entry_safe(skb, next, head, list) {
struct rps_dev_flow voidflow, *rflow = &voidflow;
int cpu = get_rps_cpu(skb->dev, skb, &rflow);
char *name, size_t len)
{
const struct net_device_ops *ops = dev->netdev_ops;
+ int err;
- if (!ops->ndo_get_phys_port_name)
- return -EOPNOTSUPP;
- return ops->ndo_get_phys_port_name(dev, name, len);
+ if (ops->ndo_get_phys_port_name) {
+ err = ops->ndo_get_phys_port_name(dev, name, len);
+ if (err != -EOPNOTSUPP)
+ return err;
+ }
+ return devlink_compat_phys_port_name_get(dev, name, len);
}
EXPORT_SYMBOL(dev_get_phys_port_name);
struct netdev_phys_item_id first = { };
struct net_device *lower_dev;
struct list_head *iter;
- int err = -EOPNOTSUPP;
+ int err;
- if (ops->ndo_get_port_parent_id)
- return ops->ndo_get_port_parent_id(dev, ppid);
+ if (ops->ndo_get_port_parent_id) {
+ err = ops->ndo_get_port_parent_id(dev, ppid);
+ if (err != -EOPNOTSUPP)
+ return err;
+ }
- if (!recurse)
+ err = devlink_compat_switch_id_get(dev, ppid);
+ if (!err || err != -EOPNOTSUPP)
return err;
+ if (!recurse)
+ return -EOPNOTSUPP;
+
netdev_for_each_lower_dev(dev, lower_dev, iter) {
err = dev_get_port_parent_id(lower_dev, ppid, recurse);
if (err)
* dev_ioctl - network device ioctl
* @net: the applicable net namespace
* @cmd: command to issue
- * @arg: pointer to a struct ifreq in user space
+ * @ifr: pointer to a struct ifreq in user space
+ * @need_copyout: whether or not copy_to_user() should be called
*
* Issue ioctl functions to devices. This is normally called by the
* user space syscall interfaces but can sometimes be useful for
#include <linux/device.h>
#include <linux/list.h>
#include <linux/netdevice.h>
+#include <linux/spinlock.h>
#include <rdma/ib_verbs.h>
#include <net/netlink.h>
#include <net/genetlink.h>
goto nla_put_failure;
if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, devlink_port->index))
goto nla_put_failure;
+
+ spin_lock(&devlink_port->type_lock);
if (nla_put_u16(msg, DEVLINK_ATTR_PORT_TYPE, devlink_port->type))
- goto nla_put_failure;
+ goto nla_put_failure_type_locked;
if (devlink_port->desired_type != DEVLINK_PORT_TYPE_NOTSET &&
nla_put_u16(msg, DEVLINK_ATTR_PORT_DESIRED_TYPE,
devlink_port->desired_type))
- goto nla_put_failure;
+ goto nla_put_failure_type_locked;
if (devlink_port->type == DEVLINK_PORT_TYPE_ETH) {
struct net_device *netdev = devlink_port->type_dev;
netdev->ifindex) ||
nla_put_string(msg, DEVLINK_ATTR_PORT_NETDEV_NAME,
netdev->name)))
- goto nla_put_failure;
+ goto nla_put_failure_type_locked;
}
if (devlink_port->type == DEVLINK_PORT_TYPE_IB) {
struct ib_device *ibdev = devlink_port->type_dev;
if (ibdev &&
nla_put_string(msg, DEVLINK_ATTR_PORT_IBDEV_NAME,
ibdev->name))
- goto nla_put_failure;
+ goto nla_put_failure_type_locked;
}
+ spin_unlock(&devlink_port->type_lock);
if (devlink_nl_port_attrs_put(msg, devlink_port))
goto nla_put_failure;
genlmsg_end(msg, hdr);
return 0;
+nla_put_failure_type_locked:
+ spin_unlock(&devlink_port->type_lock);
nla_put_failure:
genlmsg_cancel(msg, hdr);
return -EMSGSIZE;
struct netlink_callback *cb)
{
u64 ret_offset, start_offset, end_offset = 0;
- const struct genl_ops *ops = cb->data;
struct devlink_region *region;
struct nlattr *chunks_attr;
const char *region_name;
return -ENOMEM;
err = nlmsg_parse(cb->nlh, GENL_HDRLEN + devlink_nl_family.hdrsize,
- attrs, DEVLINK_ATTR_MAX, ops->policy, cb->extack);
+ attrs, DEVLINK_ATTR_MAX, devlink_nl_family.policy,
+ cb->extack);
if (err)
goto out_free;
{
mutex_lock(&reporter->devlink->lock);
list_del(&reporter->list);
+ mutex_destroy(&reporter->dump_lock);
mutex_unlock(&reporter->devlink->lock);
if (reporter->dump_fmsg)
devlink_fmsg_free(reporter->dump_fmsg);
.cmd = DEVLINK_CMD_GET,
.doit = devlink_nl_cmd_get_doit,
.dumpit = devlink_nl_cmd_get_dumpit,
- .policy = devlink_nl_policy,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
/* can be retrieved by unprivileged users */
},
.cmd = DEVLINK_CMD_PORT_GET,
.doit = devlink_nl_cmd_port_get_doit,
.dumpit = devlink_nl_cmd_port_get_dumpit,
- .policy = devlink_nl_policy,
.internal_flags = DEVLINK_NL_FLAG_NEED_PORT,
/* can be retrieved by unprivileged users */
},
{
.cmd = DEVLINK_CMD_PORT_SET,
.doit = devlink_nl_cmd_port_set_doit,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_PORT,
},
{
.cmd = DEVLINK_CMD_PORT_SPLIT,
.doit = devlink_nl_cmd_port_split_doit,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
DEVLINK_NL_FLAG_NO_LOCK,
{
.cmd = DEVLINK_CMD_PORT_UNSPLIT,
.doit = devlink_nl_cmd_port_unsplit_doit,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
DEVLINK_NL_FLAG_NO_LOCK,
.cmd = DEVLINK_CMD_SB_GET,
.doit = devlink_nl_cmd_sb_get_doit,
.dumpit = devlink_nl_cmd_sb_get_dumpit,
- .policy = devlink_nl_policy,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
DEVLINK_NL_FLAG_NEED_SB,
/* can be retrieved by unprivileged users */
.cmd = DEVLINK_CMD_SB_POOL_GET,
.doit = devlink_nl_cmd_sb_pool_get_doit,
.dumpit = devlink_nl_cmd_sb_pool_get_dumpit,
- .policy = devlink_nl_policy,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
DEVLINK_NL_FLAG_NEED_SB,
/* can be retrieved by unprivileged users */
{
.cmd = DEVLINK_CMD_SB_POOL_SET,
.doit = devlink_nl_cmd_sb_pool_set_doit,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
DEVLINK_NL_FLAG_NEED_SB,
.cmd = DEVLINK_CMD_SB_PORT_POOL_GET,
.doit = devlink_nl_cmd_sb_port_pool_get_doit,
.dumpit = devlink_nl_cmd_sb_port_pool_get_dumpit,
- .policy = devlink_nl_policy,
.internal_flags = DEVLINK_NL_FLAG_NEED_PORT |
DEVLINK_NL_FLAG_NEED_SB,
/* can be retrieved by unprivileged users */
{
.cmd = DEVLINK_CMD_SB_PORT_POOL_SET,
.doit = devlink_nl_cmd_sb_port_pool_set_doit,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_PORT |
DEVLINK_NL_FLAG_NEED_SB,
.cmd = DEVLINK_CMD_SB_TC_POOL_BIND_GET,
.doit = devlink_nl_cmd_sb_tc_pool_bind_get_doit,
.dumpit = devlink_nl_cmd_sb_tc_pool_bind_get_dumpit,
- .policy = devlink_nl_policy,
.internal_flags = DEVLINK_NL_FLAG_NEED_PORT |
DEVLINK_NL_FLAG_NEED_SB,
/* can be retrieved by unprivileged users */
{
.cmd = DEVLINK_CMD_SB_TC_POOL_BIND_SET,
.doit = devlink_nl_cmd_sb_tc_pool_bind_set_doit,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_PORT |
DEVLINK_NL_FLAG_NEED_SB,
{
.cmd = DEVLINK_CMD_SB_OCC_SNAPSHOT,
.doit = devlink_nl_cmd_sb_occ_snapshot_doit,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
DEVLINK_NL_FLAG_NEED_SB,
{
.cmd = DEVLINK_CMD_SB_OCC_MAX_CLEAR,
.doit = devlink_nl_cmd_sb_occ_max_clear_doit,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
DEVLINK_NL_FLAG_NEED_SB,
{
.cmd = DEVLINK_CMD_ESWITCH_GET,
.doit = devlink_nl_cmd_eswitch_get_doit,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
{
.cmd = DEVLINK_CMD_ESWITCH_SET,
.doit = devlink_nl_cmd_eswitch_set_doit,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
DEVLINK_NL_FLAG_NO_LOCK,
{
.cmd = DEVLINK_CMD_DPIPE_TABLE_GET,
.doit = devlink_nl_cmd_dpipe_table_get,
- .policy = devlink_nl_policy,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
/* can be retrieved by unprivileged users */
},
{
.cmd = DEVLINK_CMD_DPIPE_ENTRIES_GET,
.doit = devlink_nl_cmd_dpipe_entries_get,
- .policy = devlink_nl_policy,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
/* can be retrieved by unprivileged users */
},
{
.cmd = DEVLINK_CMD_DPIPE_HEADERS_GET,
.doit = devlink_nl_cmd_dpipe_headers_get,
- .policy = devlink_nl_policy,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
/* can be retrieved by unprivileged users */
},
{
.cmd = DEVLINK_CMD_DPIPE_TABLE_COUNTERS_SET,
.doit = devlink_nl_cmd_dpipe_table_counters_set,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
{
.cmd = DEVLINK_CMD_RESOURCE_SET,
.doit = devlink_nl_cmd_resource_set,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
{
.cmd = DEVLINK_CMD_RESOURCE_DUMP,
.doit = devlink_nl_cmd_resource_dump,
- .policy = devlink_nl_policy,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
/* can be retrieved by unprivileged users */
},
{
.cmd = DEVLINK_CMD_RELOAD,
.doit = devlink_nl_cmd_reload,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
DEVLINK_NL_FLAG_NO_LOCK,
.cmd = DEVLINK_CMD_PARAM_GET,
.doit = devlink_nl_cmd_param_get_doit,
.dumpit = devlink_nl_cmd_param_get_dumpit,
- .policy = devlink_nl_policy,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
/* can be retrieved by unprivileged users */
},
{
.cmd = DEVLINK_CMD_PARAM_SET,
.doit = devlink_nl_cmd_param_set_doit,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
.cmd = DEVLINK_CMD_PORT_PARAM_GET,
.doit = devlink_nl_cmd_port_param_get_doit,
.dumpit = devlink_nl_cmd_port_param_get_dumpit,
- .policy = devlink_nl_policy,
.internal_flags = DEVLINK_NL_FLAG_NEED_PORT,
/* can be retrieved by unprivileged users */
},
{
.cmd = DEVLINK_CMD_PORT_PARAM_SET,
.doit = devlink_nl_cmd_port_param_set_doit,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_PORT,
},
.cmd = DEVLINK_CMD_REGION_GET,
.doit = devlink_nl_cmd_region_get_doit,
.dumpit = devlink_nl_cmd_region_get_dumpit,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
{
.cmd = DEVLINK_CMD_REGION_DEL,
.doit = devlink_nl_cmd_region_del,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
{
.cmd = DEVLINK_CMD_REGION_READ,
.dumpit = devlink_nl_cmd_region_read_dumpit,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
.cmd = DEVLINK_CMD_INFO_GET,
.doit = devlink_nl_cmd_info_get_doit,
.dumpit = devlink_nl_cmd_info_get_dumpit,
- .policy = devlink_nl_policy,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
/* can be retrieved by unprivileged users */
},
.cmd = DEVLINK_CMD_HEALTH_REPORTER_GET,
.doit = devlink_nl_cmd_health_reporter_get_doit,
.dumpit = devlink_nl_cmd_health_reporter_get_dumpit,
- .policy = devlink_nl_policy,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
/* can be retrieved by unprivileged users */
},
{
.cmd = DEVLINK_CMD_HEALTH_REPORTER_SET,
.doit = devlink_nl_cmd_health_reporter_set_doit,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
{
.cmd = DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
.doit = devlink_nl_cmd_health_reporter_recover_doit,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
{
.cmd = DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE,
.doit = devlink_nl_cmd_health_reporter_diagnose_doit,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
{
.cmd = DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET,
.doit = devlink_nl_cmd_health_reporter_dump_get_doit,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
DEVLINK_NL_FLAG_NO_LOCK,
{
.cmd = DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR,
.doit = devlink_nl_cmd_health_reporter_dump_clear_doit,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
DEVLINK_NL_FLAG_NO_LOCK,
{
.cmd = DEVLINK_CMD_FLASH_UPDATE,
.doit = devlink_nl_cmd_flash_update,
- .policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
.name = DEVLINK_GENL_NAME,
.version = DEVLINK_GENL_VERSION,
.maxattr = DEVLINK_ATTR_MAX,
+ .policy = devlink_nl_policy,
.netnsok = true,
.pre_doit = devlink_nl_pre_doit,
.post_doit = devlink_nl_post_doit,
*/
void devlink_free(struct devlink *devlink)
{
+ mutex_destroy(&devlink->lock);
WARN_ON(!list_empty(&devlink->reporter_list));
WARN_ON(!list_empty(&devlink->region_list));
WARN_ON(!list_empty(&devlink->param_list));
devlink_port->devlink = devlink;
devlink_port->index = port_index;
devlink_port->registered = true;
+ spin_lock_init(&devlink_port->type_lock);
list_add_tail(&devlink_port->list, &devlink->port_list);
INIT_LIST_HEAD(&devlink_port->param_list);
mutex_unlock(&devlink->lock);
enum devlink_port_type type,
void *type_dev)
{
+ if (WARN_ON(!devlink_port->registered))
+ return;
+ spin_lock(&devlink_port->type_lock);
devlink_port->type = type;
devlink_port->type_dev = type_dev;
+ spin_unlock(&devlink_port->type_lock);
devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
}
void devlink_port_type_eth_set(struct devlink_port *devlink_port,
struct net_device *netdev)
{
- return __devlink_port_type_set(devlink_port,
- DEVLINK_PORT_TYPE_ETH, netdev);
+ const struct net_device_ops *ops = netdev->netdev_ops;
+
+ /* If driver registers devlink port, it should set devlink port
+ * attributes accordingly so the compat functions are called
+ * and the original ops are not used.
+ */
+ if (ops->ndo_get_phys_port_name) {
+ /* Some drivers use the same set of ndos for netdevs
+ * that have devlink_port registered and also for
+ * those who don't. Make sure that ndo_get_phys_port_name
+ * returns -EOPNOTSUPP here in case it is defined.
+ * Warn if not.
+ */
+ char name[IFNAMSIZ];
+ int err;
+
+ err = ops->ndo_get_phys_port_name(netdev, name, sizeof(name));
+ WARN_ON(err != -EOPNOTSUPP);
+ }
+ if (ops->ndo_get_port_parent_id) {
+ /* Some drivers use the same set of ndos for netdevs
+ * that have devlink_port registered and also for
+ * those who don't. Make sure that ndo_get_port_parent_id
+ * returns -EOPNOTSUPP here in case it is defined.
+ * Warn if not.
+ */
+ struct netdev_phys_item_id ppid;
+ int err;
+
+ err = ops->ndo_get_port_parent_id(netdev, &ppid);
+ WARN_ON(err != -EOPNOTSUPP);
+ }
+ __devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_ETH, netdev);
}
EXPORT_SYMBOL_GPL(devlink_port_type_eth_set);
void devlink_port_type_ib_set(struct devlink_port *devlink_port,
struct ib_device *ibdev)
{
- return __devlink_port_type_set(devlink_port,
- DEVLINK_PORT_TYPE_IB, ibdev);
+ __devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_IB, ibdev);
}
EXPORT_SYMBOL_GPL(devlink_port_type_ib_set);
*/
void devlink_port_type_clear(struct devlink_port *devlink_port)
{
- return __devlink_port_type_set(devlink_port,
- DEVLINK_PORT_TYPE_NOTSET, NULL);
+ __devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_NOTSET, NULL);
}
EXPORT_SYMBOL_GPL(devlink_port_type_clear);
* @split: indicates if this is split port
* @split_subport_number: if the port is split, this is the number
* of subport.
+ * @switch_id: if the port is part of switch, this is buffer with ID,
+ * otwerwise this is NULL
+ * @switch_id_len: length of the switch_id buffer
*/
void devlink_port_attrs_set(struct devlink_port *devlink_port,
enum devlink_port_flavour flavour,
u32 port_number, bool split,
- u32 split_subport_number)
+ u32 split_subport_number,
+ const unsigned char *switch_id,
+ unsigned char switch_id_len)
{
struct devlink_port_attrs *attrs = &devlink_port->attrs;
+ if (WARN_ON(devlink_port->registered))
+ return;
attrs->set = true;
attrs->flavour = flavour;
attrs->port_number = port_number;
attrs->split = split;
attrs->split_subport_number = split_subport_number;
- devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
+ if (switch_id) {
+ attrs->switch_port = true;
+ if (WARN_ON(switch_id_len > MAX_PHYS_ITEM_ID_LEN))
+ switch_id_len = MAX_PHYS_ITEM_ID_LEN;
+ memcpy(attrs->switch_id.id, switch_id, switch_id_len);
+ attrs->switch_id.id_len = switch_id_len;
+ } else {
+ attrs->switch_port = false;
+ }
}
EXPORT_SYMBOL_GPL(devlink_port_attrs_set);
-int devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
- char *name, size_t len)
+static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
+ char *name, size_t len)
{
struct devlink_port_attrs *attrs = &devlink_port->attrs;
int n = 0;
return 0;
}
-EXPORT_SYMBOL_GPL(devlink_port_get_phys_port_name);
int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
u32 size, u16 ingress_pools_count,
dev_hold(dev);
rtnl_unlock();
- mutex_lock(&devlink_mutex);
devlink = netdev_to_devlink(dev);
if (!devlink || !devlink->ops->info_get)
- goto unlock_list;
+ goto out;
mutex_lock(&devlink->lock);
__devlink_compat_running_version(devlink, buf, len);
mutex_unlock(&devlink->lock);
-unlock_list:
- mutex_unlock(&devlink_mutex);
+out:
rtnl_lock();
dev_put(dev);
}
int devlink_compat_flash_update(struct net_device *dev, const char *file_name)
{
struct devlink *devlink;
- int ret = -EOPNOTSUPP;
+ int ret;
dev_hold(dev);
rtnl_unlock();
- mutex_lock(&devlink_mutex);
devlink = netdev_to_devlink(dev);
- if (!devlink || !devlink->ops->flash_update)
- goto unlock_list;
+ if (!devlink || !devlink->ops->flash_update) {
+ ret = -EOPNOTSUPP;
+ goto out;
+ }
mutex_lock(&devlink->lock);
ret = devlink->ops->flash_update(devlink, file_name, NULL, NULL);
mutex_unlock(&devlink->lock);
-unlock_list:
- mutex_unlock(&devlink_mutex);
+out:
rtnl_lock();
dev_put(dev);
return ret;
}
+int devlink_compat_phys_port_name_get(struct net_device *dev,
+ char *name, size_t len)
+{
+ struct devlink_port *devlink_port;
+
+ /* RTNL mutex is held here which ensures that devlink_port
+ * instance cannot disappear in the middle. No need to take
+ * any devlink lock as only permanent values are accessed.
+ */
+ ASSERT_RTNL();
+
+ devlink_port = netdev_to_devlink_port(dev);
+ if (!devlink_port)
+ return -EOPNOTSUPP;
+
+ return __devlink_port_phys_port_name_get(devlink_port, name, len);
+}
+
+int devlink_compat_switch_id_get(struct net_device *dev,
+ struct netdev_phys_item_id *ppid)
+{
+ struct devlink_port *devlink_port;
+
+ /* RTNL mutex is held here which ensures that devlink_port
+ * instance cannot disappear in the middle. No need to take
+ * any devlink lock as only permanent values are accessed.
+ */
+ ASSERT_RTNL();
+ devlink_port = netdev_to_devlink_port(dev);
+ if (!devlink_port || !devlink_port->attrs.switch_port)
+ return -EOPNOTSUPP;
+
+ memcpy(ppid, &devlink_port->attrs.switch_id, sizeof(*ppid));
+
+ return 0;
+}
+
static int __init devlink_init(void)
{
return genl_register_family(&devlink_nl_family);
#include <net/dst.h>
#include <net/dst_metadata.h>
-/*
- * Theory of operations:
- * 1) We use a list, protected by a spinlock, to add
- * new entries from both BH and non-BH context.
- * 2) In order to keep spinlock held for a small delay,
- * we use a second list where are stored long lived
- * entries, that are handled by the garbage collect thread
- * fired by a workqueue.
- * 3) This list is guarded by a mutex,
- * so that the gc_task and dst_dev_event() can be synchronized.
- */
-
-/*
- * We want to keep lock & list close together
- * to dirty as few cache lines as possible in __dst_free().
- * As this is not a very strong hint, we dont force an alignment on SMP.
- */
int dst_discard_out(struct net *net, struct sock *sk, struct sk_buff *skb)
{
kfree_skb(skb);
phy_tunable_strings[__ETHTOOL_PHY_TUNABLE_COUNT][ETH_GSTRING_LEN] = {
[ETHTOOL_ID_UNSPEC] = "Unspec",
[ETHTOOL_PHY_DOWNSHIFT] = "phy-downshift",
+ [ETHTOOL_PHY_FAST_LINK_DOWN] = "phy-fast-link-down",
};
static int ethtool_get_features(struct net_device *dev, void __user *useraddr)
{
switch (tuna->id) {
case ETHTOOL_PHY_DOWNSHIFT:
+ case ETHTOOL_PHY_FAST_LINK_DOWN:
if (tuna->len != sizeof(u8) ||
tuna->type_id != ETHTOOL_TUNABLE_U8)
return -EINVAL;
#include <net/seg6.h>
#include <net/seg6_local.h>
#include <net/lwtunnel.h>
+#include <net/ipv6_stubs.h>
/**
* sk_filter_trim_cap - run a packet through a socket filter
{
int ret;
- if (unlikely(__this_cpu_read(xmit_recursion) > XMIT_RECURSION_LIMIT)) {
+ if (dev_xmit_recursion()) {
net_crit_ratelimited("bpf: recursion limit reached on datapath, buggy bpf program?\n");
kfree_skb(skb);
return -ENETDOWN;
skb->dev = dev;
- __this_cpu_inc(xmit_recursion);
+ dev_xmit_recursion_inc();
ret = dev_queue_xmit(skb);
- __this_cpu_dec(xmit_recursion);
+ dev_xmit_recursion_dec();
return ret;
}
}
}
-static int bpf_skb_net_grow(struct sk_buff *skb, u32 len_diff)
+#define BPF_F_ADJ_ROOM_ENCAP_L3_MASK (BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 | \
+ BPF_F_ADJ_ROOM_ENCAP_L3_IPV6)
+
+#define BPF_F_ADJ_ROOM_MASK (BPF_F_ADJ_ROOM_FIXED_GSO | \
+ BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \
+ BPF_F_ADJ_ROOM_ENCAP_L4_GRE | \
+ BPF_F_ADJ_ROOM_ENCAP_L4_UDP | \
+ BPF_F_ADJ_ROOM_ENCAP_L2( \
+ BPF_ADJ_ROOM_ENCAP_L2_MASK))
+
+static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
+ u64 flags)
{
- u32 off = skb_mac_header_len(skb) + bpf_skb_net_base_len(skb);
+ u8 inner_mac_len = flags >> BPF_ADJ_ROOM_ENCAP_L2_SHIFT;
+ bool encap = flags & BPF_F_ADJ_ROOM_ENCAP_L3_MASK;
+ u16 mac_len = 0, inner_net = 0, inner_trans = 0;
+ unsigned int gso_type = SKB_GSO_DODGY;
int ret;
- if (skb_is_gso(skb) && !skb_is_gso_tcp(skb))
- return -ENOTSUPP;
+ if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
+ /* udp gso_size delineates datagrams, only allow if fixed */
+ if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) ||
+ !(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
+ return -ENOTSUPP;
+ }
- ret = skb_cow(skb, len_diff);
+ ret = skb_cow_head(skb, len_diff);
if (unlikely(ret < 0))
return ret;
+ if (encap) {
+ if (skb->protocol != htons(ETH_P_IP) &&
+ skb->protocol != htons(ETH_P_IPV6))
+ return -ENOTSUPP;
+
+ if (flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 &&
+ flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV6)
+ return -EINVAL;
+
+ if (flags & BPF_F_ADJ_ROOM_ENCAP_L4_GRE &&
+ flags & BPF_F_ADJ_ROOM_ENCAP_L4_UDP)
+ return -EINVAL;
+
+ if (skb->encapsulation)
+ return -EALREADY;
+
+ mac_len = skb->network_header - skb->mac_header;
+ inner_net = skb->network_header;
+ if (inner_mac_len > len_diff)
+ return -EINVAL;
+ inner_trans = skb->transport_header;
+ }
+
ret = bpf_skb_net_hdr_push(skb, off, len_diff);
if (unlikely(ret < 0))
return ret;
+ if (encap) {
+ skb->inner_mac_header = inner_net - inner_mac_len;
+ skb->inner_network_header = inner_net;
+ skb->inner_transport_header = inner_trans;
+ skb_set_inner_protocol(skb, skb->protocol);
+
+ skb->encapsulation = 1;
+ skb_set_network_header(skb, mac_len);
+
+ if (flags & BPF_F_ADJ_ROOM_ENCAP_L4_UDP)
+ gso_type |= SKB_GSO_UDP_TUNNEL;
+ else if (flags & BPF_F_ADJ_ROOM_ENCAP_L4_GRE)
+ gso_type |= SKB_GSO_GRE;
+ else if (flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV6)
+ gso_type |= SKB_GSO_IPXIP6;
+ else if (flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV4)
+ gso_type |= SKB_GSO_IPXIP4;
+
+ if (flags & BPF_F_ADJ_ROOM_ENCAP_L4_GRE ||
+ flags & BPF_F_ADJ_ROOM_ENCAP_L4_UDP) {
+ int nh_len = flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 ?
+ sizeof(struct ipv6hdr) :
+ sizeof(struct iphdr);
+
+ skb_set_transport_header(skb, mac_len + nh_len);
+ }
+ }
+
if (skb_is_gso(skb)) {
struct skb_shared_info *shinfo = skb_shinfo(skb);
/* Due to header grow, MSS needs to be downgraded. */
- skb_decrease_gso_size(shinfo, len_diff);
+ if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
+ skb_decrease_gso_size(shinfo, len_diff);
+
/* Header must be checked, and gso_segs recomputed. */
- shinfo->gso_type |= SKB_GSO_DODGY;
+ shinfo->gso_type |= gso_type;
shinfo->gso_segs = 0;
}
return 0;
}
-static int bpf_skb_net_shrink(struct sk_buff *skb, u32 len_diff)
+static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
+ u64 flags)
{
- u32 off = skb_mac_header_len(skb) + bpf_skb_net_base_len(skb);
int ret;
- if (skb_is_gso(skb) && !skb_is_gso_tcp(skb))
- return -ENOTSUPP;
+ if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
+ /* udp gso_size delineates datagrams, only allow if fixed */
+ if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) ||
+ !(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
+ return -ENOTSUPP;
+ }
ret = skb_unclone(skb, GFP_ATOMIC);
if (unlikely(ret < 0))
struct skb_shared_info *shinfo = skb_shinfo(skb);
/* Due to header shrink, MSS can be upgraded. */
- skb_increase_gso_size(shinfo, len_diff);
+ if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
+ skb_increase_gso_size(shinfo, len_diff);
+
/* Header must be checked, and gso_segs recomputed. */
shinfo->gso_type |= SKB_GSO_DODGY;
shinfo->gso_segs = 0;
SKB_MAX_ALLOC;
}
-static int bpf_skb_adjust_net(struct sk_buff *skb, s32 len_diff)
+BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
+ u32, mode, u64, flags)
{
- bool trans_same = skb->transport_header == skb->network_header;
u32 len_cur, len_diff_abs = abs(len_diff);
u32 len_min = bpf_skb_net_base_len(skb);
u32 len_max = __bpf_skb_max_len(skb);
__be16 proto = skb->protocol;
bool shrink = len_diff < 0;
+ u32 off;
int ret;
+ if (unlikely(flags & ~BPF_F_ADJ_ROOM_MASK))
+ return -EINVAL;
if (unlikely(len_diff_abs > 0xfffU))
return -EFAULT;
if (unlikely(proto != htons(ETH_P_IP) &&
proto != htons(ETH_P_IPV6)))
return -ENOTSUPP;
+ off = skb_mac_header_len(skb);
+ switch (mode) {
+ case BPF_ADJ_ROOM_NET:
+ off += bpf_skb_net_base_len(skb);
+ break;
+ case BPF_ADJ_ROOM_MAC:
+ break;
+ default:
+ return -ENOTSUPP;
+ }
+
len_cur = skb->len - skb_network_offset(skb);
- if (skb_transport_header_was_set(skb) && !trans_same)
- len_cur = skb_network_header_len(skb);
if ((shrink && (len_diff_abs >= len_cur ||
len_cur - len_diff_abs < len_min)) ||
(!shrink && (skb->len + len_diff_abs > len_max &&
!skb_is_gso(skb))))
return -ENOTSUPP;
- ret = shrink ? bpf_skb_net_shrink(skb, len_diff_abs) :
- bpf_skb_net_grow(skb, len_diff_abs);
+ ret = shrink ? bpf_skb_net_shrink(skb, off, len_diff_abs, flags) :
+ bpf_skb_net_grow(skb, off, len_diff_abs, flags);
bpf_compute_data_pointers(skb);
return ret;
}
-BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
- u32, mode, u64, flags)
-{
- if (unlikely(flags))
- return -EINVAL;
- if (likely(mode == BPF_ADJ_ROOM_NET))
- return bpf_skb_adjust_net(skb, len_diff);
-
- return -ENOTSUPP;
-}
-
static const struct bpf_func_proto bpf_skb_adjust_room_proto = {
.func = bpf_skb_adjust_room,
.gpl_only = false,
static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
u32 flags, bool check_mtu)
{
+ struct fib_nh_common *nhc;
struct in_device *in_dev;
struct neighbour *neigh;
struct net_device *dev;
struct fib_result res;
- struct fib_nh *nh;
struct flowi4 fl4;
int err;
u32 mtu;
return BPF_FIB_LKUP_RET_FRAG_NEEDED;
}
- nh = &res.fi->fib_nh[res.nh_sel];
+ nhc = res.nhc;
/* do not handle lwt encaps right now */
- if (nh->nh_lwtstate)
+ if (nhc->nhc_lwtstate)
return BPF_FIB_LKUP_RET_UNSUPP_LWT;
- dev = nh->nh_dev;
- if (nh->nh_gw)
- params->ipv4_dst = nh->nh_gw;
+ dev = nhc->nhc_dev;
params->rt_metric = res.fi->fib_priority;
/* xdp and cls_bpf programs are run in RCU-bh so
* rcu_read_lock_bh is not needed here
*/
- neigh = __ipv4_neigh_lookup_noref(dev, (__force u32)params->ipv4_dst);
+ if (likely(nhc->nhc_gw_family != AF_INET6)) {
+ if (nhc->nhc_gw_family)
+ params->ipv4_dst = nhc->nhc_gw.ipv4;
+
+ neigh = __ipv4_neigh_lookup_noref(dev,
+ (__force u32)params->ipv4_dst);
+ } else {
+ struct in6_addr *dst = (struct in6_addr *)params->ipv6_dst;
+
+ params->family = AF_INET6;
+ *dst = nhc->nhc_gw.ipv6;
+ neigh = __ipv6_neigh_lookup_noref_stub(dev, dst);
+ }
+
if (!neigh)
return BPF_FIB_LKUP_RET_NO_NEIGH;
return BPF_FIB_LKUP_RET_FRAG_NEEDED;
}
- if (f6i->fib6_nh.nh_lwtstate)
+ if (f6i->fib6_nh.fib_nh_lws)
return BPF_FIB_LKUP_RET_UNSUPP_LWT;
- if (f6i->fib6_flags & RTF_GATEWAY)
- *dst = f6i->fib6_nh.nh_gw;
+ if (f6i->fib6_nh.fib_nh_gw_family)
+ *dst = f6i->fib6_nh.fib_nh_gw6;
- dev = f6i->fib6_nh.nh_dev;
+ dev = f6i->fib6_nh.fib_nh_dev;
params->rt_metric = f6i->fib6_metric;
/* xdp and cls_bpf programs are run in RCU-bh so rcu_read_lock_bh is
- * not needed here. Can not use __ipv6_neigh_lookup_noref here
- * because we need to get nd_tbl via the stub
+ * not needed here.
*/
- neigh = ___neigh_lookup_noref(ipv6_stub->nd_tbl, neigh_key_eq128,
- ndisc_hashfn, dst, dev);
+ neigh = __ipv6_neigh_lookup_noref_stub(dev, dst);
if (!neigh)
return BPF_FIB_LKUP_RET_NO_NEIGH;
return sk;
}
-/* bpf_sk_lookup performs the core lookup for different types of sockets,
+/* bpf_skc_lookup performs the core lookup for different types of sockets,
* taking a reference on the socket if it doesn't have the flag SOCK_RCU_FREE.
* Returns the socket as an 'unsigned long' to simplify the casting in the
* callers to satisfy BPF_CALL declarations.
*/
-static unsigned long
-__bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
- struct net *caller_net, u32 ifindex, u8 proto, u64 netns_id,
- u64 flags)
+static struct sock *
+__bpf_skc_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
+ struct net *caller_net, u32 ifindex, u8 proto, u64 netns_id,
+ u64 flags)
{
struct sock *sk = NULL;
u8 family = AF_UNSPEC;
put_net(net);
}
+out:
+ return sk;
+}
+
+static struct sock *
+__bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
+ struct net *caller_net, u32 ifindex, u8 proto, u64 netns_id,
+ u64 flags)
+{
+ struct sock *sk = __bpf_skc_lookup(skb, tuple, len, caller_net,
+ ifindex, proto, netns_id, flags);
+
if (sk)
sk = sk_to_full_sk(sk);
-out:
- return (unsigned long) sk;
+
+ return sk;
}
-static unsigned long
-bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
- u8 proto, u64 netns_id, u64 flags)
+static struct sock *
+bpf_skc_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
+ u8 proto, u64 netns_id, u64 flags)
{
struct net *caller_net;
int ifindex;
ifindex = 0;
}
- return __bpf_sk_lookup(skb, tuple, len, caller_net, ifindex,
- proto, netns_id, flags);
+ return __bpf_skc_lookup(skb, tuple, len, caller_net, ifindex, proto,
+ netns_id, flags);
}
+static struct sock *
+bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
+ u8 proto, u64 netns_id, u64 flags)
+{
+ struct sock *sk = bpf_skc_lookup(skb, tuple, len, proto, netns_id,
+ flags);
+
+ if (sk)
+ sk = sk_to_full_sk(sk);
+
+ return sk;
+}
+
+BPF_CALL_5(bpf_skc_lookup_tcp, struct sk_buff *, skb,
+ struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags)
+{
+ return (unsigned long)bpf_skc_lookup(skb, tuple, len, IPPROTO_TCP,
+ netns_id, flags);
+}
+
+static const struct bpf_func_proto bpf_skc_lookup_tcp_proto = {
+ .func = bpf_skc_lookup_tcp,
+ .gpl_only = false,
+ .pkt_access = true,
+ .ret_type = RET_PTR_TO_SOCK_COMMON_OR_NULL,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_PTR_TO_MEM,
+ .arg3_type = ARG_CONST_SIZE,
+ .arg4_type = ARG_ANYTHING,
+ .arg5_type = ARG_ANYTHING,
+};
+
BPF_CALL_5(bpf_sk_lookup_tcp, struct sk_buff *, skb,
struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags)
{
- return bpf_sk_lookup(skb, tuple, len, IPPROTO_TCP, netns_id, flags);
+ return (unsigned long)bpf_sk_lookup(skb, tuple, len, IPPROTO_TCP,
+ netns_id, flags);
}
static const struct bpf_func_proto bpf_sk_lookup_tcp_proto = {
BPF_CALL_5(bpf_sk_lookup_udp, struct sk_buff *, skb,
struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags)
{
- return bpf_sk_lookup(skb, tuple, len, IPPROTO_UDP, netns_id, flags);
+ return (unsigned long)bpf_sk_lookup(skb, tuple, len, IPPROTO_UDP,
+ netns_id, flags);
}
static const struct bpf_func_proto bpf_sk_lookup_udp_proto = {
struct net *caller_net = dev_net(ctx->rxq->dev);
int ifindex = ctx->rxq->dev->ifindex;
- return __bpf_sk_lookup(NULL, tuple, len, caller_net, ifindex,
- IPPROTO_UDP, netns_id, flags);
+ return (unsigned long)__bpf_sk_lookup(NULL, tuple, len, caller_net,
+ ifindex, IPPROTO_UDP, netns_id,
+ flags);
}
static const struct bpf_func_proto bpf_xdp_sk_lookup_udp_proto = {
.arg5_type = ARG_ANYTHING,
};
+BPF_CALL_5(bpf_xdp_skc_lookup_tcp, struct xdp_buff *, ctx,
+ struct bpf_sock_tuple *, tuple, u32, len, u32, netns_id, u64, flags)
+{
+ struct net *caller_net = dev_net(ctx->rxq->dev);
+ int ifindex = ctx->rxq->dev->ifindex;
+
+ return (unsigned long)__bpf_skc_lookup(NULL, tuple, len, caller_net,
+ ifindex, IPPROTO_TCP, netns_id,
+ flags);
+}
+
+static const struct bpf_func_proto bpf_xdp_skc_lookup_tcp_proto = {
+ .func = bpf_xdp_skc_lookup_tcp,
+ .gpl_only = false,
+ .pkt_access = true,
+ .ret_type = RET_PTR_TO_SOCK_COMMON_OR_NULL,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_PTR_TO_MEM,
+ .arg3_type = ARG_CONST_SIZE,
+ .arg4_type = ARG_ANYTHING,
+ .arg5_type = ARG_ANYTHING,
+};
+
BPF_CALL_5(bpf_xdp_sk_lookup_tcp, struct xdp_buff *, ctx,
struct bpf_sock_tuple *, tuple, u32, len, u32, netns_id, u64, flags)
{
struct net *caller_net = dev_net(ctx->rxq->dev);
int ifindex = ctx->rxq->dev->ifindex;
- return __bpf_sk_lookup(NULL, tuple, len, caller_net, ifindex,
- IPPROTO_TCP, netns_id, flags);
+ return (unsigned long)__bpf_sk_lookup(NULL, tuple, len, caller_net,
+ ifindex, IPPROTO_TCP, netns_id,
+ flags);
}
static const struct bpf_func_proto bpf_xdp_sk_lookup_tcp_proto = {
.arg5_type = ARG_ANYTHING,
};
+BPF_CALL_5(bpf_sock_addr_skc_lookup_tcp, struct bpf_sock_addr_kern *, ctx,
+ struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags)
+{
+ return (unsigned long)__bpf_skc_lookup(NULL, tuple, len,
+ sock_net(ctx->sk), 0,
+ IPPROTO_TCP, netns_id, flags);
+}
+
+static const struct bpf_func_proto bpf_sock_addr_skc_lookup_tcp_proto = {
+ .func = bpf_sock_addr_skc_lookup_tcp,
+ .gpl_only = false,
+ .ret_type = RET_PTR_TO_SOCK_COMMON_OR_NULL,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_PTR_TO_MEM,
+ .arg3_type = ARG_CONST_SIZE,
+ .arg4_type = ARG_ANYTHING,
+ .arg5_type = ARG_ANYTHING,
+};
+
BPF_CALL_5(bpf_sock_addr_sk_lookup_tcp, struct bpf_sock_addr_kern *, ctx,
struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags)
{
- return __bpf_sk_lookup(NULL, tuple, len, sock_net(ctx->sk), 0,
- IPPROTO_TCP, netns_id, flags);
+ return (unsigned long)__bpf_sk_lookup(NULL, tuple, len,
+ sock_net(ctx->sk), 0, IPPROTO_TCP,
+ netns_id, flags);
}
static const struct bpf_func_proto bpf_sock_addr_sk_lookup_tcp_proto = {
BPF_CALL_5(bpf_sock_addr_sk_lookup_udp, struct bpf_sock_addr_kern *, ctx,
struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags)
{
- return __bpf_sk_lookup(NULL, tuple, len, sock_net(ctx->sk), 0,
- IPPROTO_UDP, netns_id, flags);
+ return (unsigned long)__bpf_sk_lookup(NULL, tuple, len,
+ sock_net(ctx->sk), 0, IPPROTO_UDP,
+ netns_id, flags);
}
static const struct bpf_func_proto bpf_sock_addr_sk_lookup_udp_proto = {
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_CTX,
};
+
+BPF_CALL_5(bpf_tcp_check_syncookie, struct sock *, sk, void *, iph, u32, iph_len,
+ struct tcphdr *, th, u32, th_len)
+{
+#ifdef CONFIG_SYN_COOKIES
+ u32 cookie;
+ int ret;
+
+ if (unlikely(th_len < sizeof(*th)))
+ return -EINVAL;
+
+ /* sk_listener() allows TCP_NEW_SYN_RECV, which makes no sense here. */
+ if (sk->sk_protocol != IPPROTO_TCP || sk->sk_state != TCP_LISTEN)
+ return -EINVAL;
+
+ if (!sock_net(sk)->ipv4.sysctl_tcp_syncookies)
+ return -EINVAL;
+
+ if (!th->ack || th->rst || th->syn)
+ return -ENOENT;
+
+ if (tcp_synq_no_recent_overflow(sk))
+ return -ENOENT;
+
+ cookie = ntohl(th->ack_seq) - 1;
+
+ switch (sk->sk_family) {
+ case AF_INET:
+ if (unlikely(iph_len < sizeof(struct iphdr)))
+ return -EINVAL;
+
+ ret = __cookie_v4_check((struct iphdr *)iph, th, cookie);
+ break;
+
+#if IS_BUILTIN(CONFIG_IPV6)
+ case AF_INET6:
+ if (unlikely(iph_len < sizeof(struct ipv6hdr)))
+ return -EINVAL;
+
+ ret = __cookie_v6_check((struct ipv6hdr *)iph, th, cookie);
+ break;
+#endif /* CONFIG_IPV6 */
+
+ default:
+ return -EPROTONOSUPPORT;
+ }
+
+ if (ret > 0)
+ return 0;
+
+ return -ENOENT;
+#else
+ return -ENOTSUPP;
+#endif
+}
+
+static const struct bpf_func_proto bpf_tcp_check_syncookie_proto = {
+ .func = bpf_tcp_check_syncookie,
+ .gpl_only = true,
+ .pkt_access = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_SOCK_COMMON,
+ .arg2_type = ARG_PTR_TO_MEM,
+ .arg3_type = ARG_CONST_SIZE,
+ .arg4_type = ARG_PTR_TO_MEM,
+ .arg5_type = ARG_CONST_SIZE,
+};
+
#endif /* CONFIG_INET */
bool bpf_helper_changes_pkt_data(void *func)
return &bpf_sock_addr_sk_lookup_udp_proto;
case BPF_FUNC_sk_release:
return &bpf_sk_release_proto;
+ case BPF_FUNC_skc_lookup_tcp:
+ return &bpf_sock_addr_skc_lookup_tcp_proto;
#endif /* CONFIG_INET */
default:
return bpf_base_func_proto(func_id);
return &bpf_tcp_sock_proto;
case BPF_FUNC_get_listener_sock:
return &bpf_get_listener_sock_proto;
+ case BPF_FUNC_skc_lookup_tcp:
+ return &bpf_skc_lookup_tcp_proto;
+ case BPF_FUNC_tcp_check_syncookie:
+ return &bpf_tcp_check_syncookie_proto;
+ case BPF_FUNC_skb_ecn_set_ce:
+ return &bpf_skb_ecn_set_ce_proto;
#endif
default:
return bpf_base_func_proto(func_id);
return &bpf_xdp_sk_lookup_tcp_proto;
case BPF_FUNC_sk_release:
return &bpf_sk_release_proto;
+ case BPF_FUNC_skc_lookup_tcp:
+ return &bpf_xdp_skc_lookup_tcp_proto;
+ case BPF_FUNC_tcp_check_syncookie:
+ return &bpf_tcp_check_syncookie_proto;
#endif
default:
return bpf_base_func_proto(func_id);
return &bpf_sk_lookup_udp_proto;
case BPF_FUNC_sk_release:
return &bpf_sk_release_proto;
+ case BPF_FUNC_skc_lookup_tcp:
+ return &bpf_skc_lookup_tcp_proto;
#endif
default:
return bpf_base_func_proto(func_id);
* @proto: protocol for which to get the flow, if @data is NULL use skb->protocol
* @nhoff: network header offset, if @data is NULL use skb_network_offset(skb)
* @hlen: packet header length, if @data is NULL use skb_headlen(skb)
+ * @flags: flags that control the dissection process, e.g.
+ * FLOW_DISSECTOR_F_STOP_AT_L3.
*
* The function will try to retrieve individual keys into target specified
* by flow_dissector from either the skbuff or a raw buffer specified by the
for_each_possible_cpu(i) {
const struct gnet_stats_queue *qcpu = per_cpu_ptr(q, i);
+ qstats->qlen = 0;
qstats->backlog += qcpu->backlog;
qstats->drops += qcpu->drops;
qstats->requeues += qcpu->requeues;
if (cpu) {
__gnet_stats_copy_queue_cpu(qstats, cpu);
} else {
+ qstats->qlen = q->qlen;
qstats->backlog = q->backlog;
qstats->drops = q->drops;
qstats->requeues = q->requeues;
#include <net/lwtunnel.h>
#include <net/gre.h>
#include <net/ip6_route.h>
+#include <net/ipv6_stubs.h>
struct bpf_lwt_prog {
struct bpf_prog *prog;
rcu_assign_pointer(queue->rps_map, map);
if (map)
- static_key_slow_inc(&rps_needed);
+ static_branch_inc(&rps_needed);
if (old_map)
- static_key_slow_dec(&rps_needed);
+ static_branch_dec(&rps_needed);
mutex_unlock(&rps_map_mutex);
peer = get_net_ns_by_fd(nla_get_u32(tb[NETNSA_FD]));
nla = tb[NETNSA_FD];
} else if (tb[NETNSA_NSID]) {
- peer = get_net_ns_by_id(net, nla_get_u32(tb[NETNSA_NSID]));
+ peer = get_net_ns_by_id(net, nla_get_s32(tb[NETNSA_NSID]));
if (!peer)
peer = ERR_PTR(-ENOENT);
nla = tb[NETNSA_NSID];
if (skb_queue_len(&npinfo->txq) == 0 && !netpoll_owner_active(dev)) {
struct netdev_queue *txq;
- txq = netdev_pick_tx(dev, skb, NULL);
+ txq = netdev_core_pick_tx(dev, skb, NULL);
/* try until next clock tick */
for (tries = jiffies_to_usecs(1)/USEC_PER_POLL;
rtnl_set_sk_err(net, RTNLGRP_NEIGH, err);
}
-/**
+/*
* ndo_dflt_fdb_add - default netdevice operation to add an FDB entry
*/
int ndo_dflt_fdb_add(struct ndmsg *ndm,
return err;
}
-/**
+/*
* ndo_dflt_fdb_del - default netdevice operation to delete an FDB entry
*/
int ndo_dflt_fdb_del(struct ndmsg *ndm,
/**
* ndo_dflt_fdb_dump - default netdevice operation to dump an FDB table.
- * @nlh: netlink message header
+ * @skb: socket buffer to store message in
+ * @cb: netlink callback
* @dev: netdevice
+ * @filter_dev: ignored
+ * @idx: the number of FDB table entries dumped is added to *@idx
*
* Default netdevice operation to dump the existing unicast address list.
* Returns number of addresses from list put in skb.
#include <linux/capability.h>
#include <linux/user_namespace.h>
+#include "datagram.h"
+
struct kmem_cache *skbuff_head_cache __ro_after_init;
static struct kmem_cache *skbuff_fclone_cache __ro_after_init;
#ifdef CONFIG_SKB_EXTENSIONS
}
EXPORT_SYMBOL_GPL(sock_zerocopy_put_abort);
-extern int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb,
- struct iov_iter *from, size_t length);
-
int skb_zerocopy_iter_dgram(struct sk_buff *skb, struct msghdr *msg, int len)
{
return __zerocopy_sg_from_iter(skb->sk, skb, &msg->msg_iter, len);
* reuseport_add_sock - Add a socket to the reuseport group of another.
* @sk: New socket to add to the group.
* @sk2: Socket belonging to the existing reuseport group.
+ * @bind_inany: Whether or not the group is bound to a local INANY address.
+ *
* May return ENOMEM and not add socket to group under memory pressure.
*/
int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany)
if (sock_table != orig_sock_table) {
rcu_assign_pointer(rps_sock_flow_table, sock_table);
if (sock_table) {
- static_key_slow_inc(&rps_needed);
- static_key_slow_inc(&rfs_needed);
+ static_branch_inc(&rps_needed);
+ static_branch_inc(&rfs_needed);
}
if (orig_sock_table) {
- static_key_slow_dec(&rps_needed);
- static_key_slow_dec(&rfs_needed);
+ static_branch_dec(&rps_needed);
+ static_branch_dec(&rfs_needed);
synchronize_rcu();
vfree(orig_sock_table);
}
skb_queue_purge(&scp->other_xmit_queue);
skb_queue_purge(&scp->other_receive_queue);
- dst_release(rcu_dereference_check(sk->sk_dst_cache, 1));
+ dst_release(rcu_dereference_protected(sk->sk_dst_cache, 1));
}
static unsigned long dn_memory_pressure;
desclen += typelen + 1;
}
- if (!namelen)
- namelen = strnlen(name, 256);
if (namelen < 3 || namelen > 255)
return -EINVAL;
desclen += namelen + 1;
depends on BRIDGE || BRIDGE=n
select NET_SWITCHDEV
select PHYLINK
+ select NET_DEVLINK
---help---
Say Y if you want to enable support for the hardware switches supported
by the Distributed Switch Architecture.
#include <linux/rtnetlink.h>
#include <linux/of.h>
#include <linux/of_net.h>
+#include <net/devlink.h>
#include "dsa_priv.h"
static int dsa_port_setup(struct dsa_port *dp)
{
+ enum devlink_port_flavour flavour;
struct dsa_switch *ds = dp->ds;
- int err = 0;
+ struct dsa_switch_tree *dst = ds->dst;
+ int err;
+
+ if (dp->type == DSA_PORT_TYPE_UNUSED)
+ return 0;
memset(&dp->devlink_port, 0, sizeof(dp->devlink_port));
+ dp->mac = of_get_mac_address(dp->dn);
- if (dp->type != DSA_PORT_TYPE_UNUSED)
- err = devlink_port_register(ds->devlink, &dp->devlink_port,
- dp->index);
+ switch (dp->type) {
+ case DSA_PORT_TYPE_CPU:
+ flavour = DEVLINK_PORT_FLAVOUR_CPU;
+ break;
+ case DSA_PORT_TYPE_DSA:
+ flavour = DEVLINK_PORT_FLAVOUR_DSA;
+ break;
+ case DSA_PORT_TYPE_USER: /* fall-through */
+ default:
+ flavour = DEVLINK_PORT_FLAVOUR_PHYSICAL;
+ break;
+ }
+
+ /* dp->index is used now as port_number. However
+ * CPU and DSA ports should have separate numbering
+ * independent from front panel port numbers.
+ */
+ devlink_port_attrs_set(&dp->devlink_port, flavour,
+ dp->index, false, 0,
+ (const char *) &dst->index, sizeof(dst->index));
+ err = devlink_port_register(ds->devlink, &dp->devlink_port,
+ dp->index);
if (err)
return err;
case DSA_PORT_TYPE_UNUSED:
break;
case DSA_PORT_TYPE_CPU:
- /* dp->index is used now as port_number. However
- * CPU ports should have separate numbering
- * independent from front panel port numbers.
- */
- devlink_port_attrs_set(&dp->devlink_port,
- DEVLINK_PORT_FLAVOUR_CPU,
- dp->index, false, 0);
err = dsa_port_link_register_of(dp);
if (err) {
dev_err(ds->dev, "failed to setup link for port %d.%d\n",
}
break;
case DSA_PORT_TYPE_DSA:
- /* dp->index is used now as port_number. However
- * DSA ports should have separate numbering
- * independent from front panel port numbers.
- */
- devlink_port_attrs_set(&dp->devlink_port,
- DEVLINK_PORT_FLAVOUR_DSA,
- dp->index, false, 0);
err = dsa_port_link_register_of(dp);
if (err) {
dev_err(ds->dev, "failed to setup link for port %d.%d\n",
}
break;
case DSA_PORT_TYPE_USER:
- devlink_port_attrs_set(&dp->devlink_port,
- DEVLINK_PORT_FLAVOUR_PHYSICAL,
- dp->index, false, 0);
err = dsa_slave_create(dp);
if (err)
dev_err(ds->dev, "failed to create slave for port %d.%d\n",
struct dsa_switch *ds = dp->ds;
struct dsa_switch_tree *dst = ds->dst;
+ /* For non-legacy ports, devlink is used and it takes
+ * care of the name generation. This ndo implementation
+ * should be removed with legacy support.
+ */
+ if (dp->ds->devlink)
+ return -EOPNOTSUPP;
+
ppid->id_len = sizeof(dst->index);
memcpy(&ppid->id, &dst->index, ppid->id_len);
{
struct dsa_port *dp = dsa_slave_to_port(dev);
+ /* For non-legacy ports, devlink is used and it takes
+ * care of the name generation. This ndo implementation
+ * should be removed with legacy support.
+ */
+ if (dp->ds->devlink)
+ return -EOPNOTSUPP;
+
if (snprintf(name, len, "p%d", dp->index) >= len)
return -EINVAL;
return dsa_port_fdb_del(dp, addr, vid);
}
+static struct devlink_port *dsa_slave_get_devlink_port(struct net_device *dev)
+{
+ struct dsa_port *dp = dsa_slave_to_port(dev);
+
+ return dp->ds->devlink ? &dp->devlink_port : NULL;
+}
+
static const struct net_device_ops dsa_slave_netdev_ops = {
.ndo_open = dsa_slave_open,
.ndo_stop = dsa_slave_close,
.ndo_get_port_parent_id = dsa_slave_get_port_parent_id,
.ndo_vlan_rx_add_vid = dsa_slave_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = dsa_slave_vlan_rx_kill_vid,
+ .ndo_get_devlink_port = dsa_slave_get_devlink_port,
};
static struct device_type dsa_type = {
phy_flags = ds->ops->get_phy_flags(ds, dp->index);
ret = phylink_of_phy_connect(dp->pl, port_dn, phy_flags);
- if (ret == -ENODEV) {
- /* We could not connect to a designated PHY or SFP, so use the
- * switch internal MDIO bus instead
+ if (ret == -ENODEV && ds->slave_mii_bus) {
+ /* We could not connect to a designated PHY or SFP, so try to
+ * use the switch internal MDIO bus instead
*/
ret = dsa_slave_phy_connect(slave_dev, dp->index);
if (ret) {
}
}
- return 0;
+ return ret;
}
static struct lock_class_key dsa_slave_netdev_xmit_lock_key;
NETIF_F_HW_VLAN_CTAG_FILTER;
slave_dev->hw_features |= NETIF_F_HW_TC;
slave_dev->ethtool_ops = &dsa_slave_ethtool_ops;
- eth_hw_addr_inherit(slave_dev, master);
+ if (port->mac && is_valid_ether_addr(port->mac))
+ ether_addr_copy(slave_dev->dev_addr, port->mac);
+ else
+ eth_hw_addr_inherit(slave_dev, master);
slave_dev->priv_flags |= IFF_NO_QUEUE;
slave_dev->netdev_ops = &dsa_slave_netdev_ops;
slave_dev->min_mtu = 0;
hsr-y := hsr_main.o hsr_framereg.o hsr_device.o \
hsr_netlink.o hsr_slave.o hsr_forward.o
+hsr-$(CONFIG_DEBUG_FS) += hsr_debugfs.o
--- /dev/null
+/*
+ * hsr_debugfs code
+ * Copyright (C) 2019 Texas Instruments Incorporated
+ *
+ * Author(s):
+ * Murali Karicheri <m-karicheri2@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation version 2.
+ *
+ * This program is distributed "as is" WITHOUT ANY WARRANTY of any
+ * kind, whether express or implied; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/debugfs.h>
+#include "hsr_main.h"
+#include "hsr_framereg.h"
+
+static void print_mac_address(struct seq_file *sfp, unsigned char *mac)
+{
+ seq_printf(sfp, "%02x:%02x:%02x:%02x:%02x:%02x:",
+ mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
+}
+
+/* hsr_node_table_show - Formats and prints node_table entries */
+static int
+hsr_node_table_show(struct seq_file *sfp, void *data)
+{
+ struct hsr_priv *priv = (struct hsr_priv *)sfp->private;
+ struct hsr_node *node;
+
+ seq_puts(sfp, "Node Table entries\n");
+ seq_puts(sfp, "MAC-Address-A, MAC-Address-B, time_in[A], ");
+ seq_puts(sfp, "time_in[B], Address-B port\n");
+ rcu_read_lock();
+ list_for_each_entry_rcu(node, &priv->node_db, mac_list) {
+ /* skip self node */
+ if (hsr_addr_is_self(priv, node->macaddress_A))
+ continue;
+ print_mac_address(sfp, &node->macaddress_A[0]);
+ seq_puts(sfp, " ");
+ print_mac_address(sfp, &node->macaddress_B[0]);
+ seq_printf(sfp, "0x%lx, ", node->time_in[HSR_PT_SLAVE_A]);
+ seq_printf(sfp, "0x%lx ", node->time_in[HSR_PT_SLAVE_B]);
+ seq_printf(sfp, "0x%x\n", node->addr_B_port);
+ }
+ rcu_read_unlock();
+ return 0;
+}
+
+/* hsr_node_table_open - Open the node_table file
+ *
+ * Description:
+ * This routine opens a debugfs file node_table of specific hsr device
+ */
+static int
+hsr_node_table_open(struct inode *inode, struct file *filp)
+{
+ return single_open(filp, hsr_node_table_show, inode->i_private);
+}
+
+static const struct file_operations hsr_fops = {
+ .owner = THIS_MODULE,
+ .open = hsr_node_table_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+/* hsr_debugfs_init - create hsr node_table file for dumping
+ * the node table
+ *
+ * Description:
+ * When debugfs is configured this routine sets up the node_table file per
+ * hsr device for dumping the node_table entries
+ */
+int hsr_debugfs_init(struct hsr_priv *priv, struct net_device *hsr_dev)
+{
+ int rc = -1;
+ struct dentry *de = NULL;
+
+ de = debugfs_create_dir(hsr_dev->name, NULL);
+ if (!de) {
+ pr_err("Cannot create hsr debugfs root\n");
+ return rc;
+ }
+
+ priv->node_tbl_root = de;
+
+ de = debugfs_create_file("node_table", S_IFREG | 0444,
+ priv->node_tbl_root, priv,
+ &hsr_fops);
+ if (!de) {
+ pr_err("Cannot create hsr node_table directory\n");
+ return rc;
+ }
+ priv->node_tbl_file = de;
+
+ return 0;
+}
+
+/* hsr_debugfs_term - Tear down debugfs intrastructure
+ *
+ * Description:
+ * When Debufs is configured this routine removes debugfs file system
+ * elements that are specific to hsr
+ */
+void
+hsr_debugfs_term(struct hsr_priv *priv)
+{
+ debugfs_remove(priv->node_tbl_file);
+ priv->node_tbl_file = NULL;
+ debugfs_remove(priv->node_tbl_root);
+ priv->node_tbl_root = NULL;
+}
+// SPDX-License-Identifier: GPL-2.0
/* Copyright 2011-2014 Autronica Fire and Security AS
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or (at your option)
- * any later version.
*
* Author(s):
* 2011-2014 Arvid Brodin, arvid.brodin@alten.se
#include "hsr_main.h"
#include "hsr_forward.h"
-
static bool is_admin_up(struct net_device *dev)
{
return dev && (dev->flags & IFF_UP);
rcu_read_lock();
hsr_for_each_port(master->hsr, port)
- if ((port->type != HSR_PT_MASTER) && is_slave_up(port->dev)) {
+ if (port->type != HSR_PT_MASTER && is_slave_up(port->dev)) {
has_carrier = true;
break;
}
return has_carrier;
}
-
static void hsr_check_announce(struct net_device *hsr_dev,
unsigned char old_operstate)
{
hsr = netdev_priv(hsr_dev);
- if ((hsr_dev->operstate == IF_OPER_UP)
- && (old_operstate != IF_OPER_UP)) {
+ if (hsr_dev->operstate == IF_OPER_UP && old_operstate != IF_OPER_UP) {
/* Went up */
hsr->announce_count = 0;
mod_timer(&hsr->announce_timer,
jiffies + msecs_to_jiffies(HSR_ANNOUNCE_INTERVAL));
}
- if ((hsr_dev->operstate != IF_OPER_UP) && (old_operstate == IF_OPER_UP))
+ if (hsr_dev->operstate != IF_OPER_UP && old_operstate == IF_OPER_UP)
/* Went down */
del_timer(&hsr->announce_timer);
}
return mtu_max - HSR_HLEN;
}
-
static int hsr_dev_change_mtu(struct net_device *dev, int new_mtu)
{
struct hsr_priv *hsr;
return 0;
}
-
static int hsr_dev_close(struct net_device *dev)
{
/* Nothing to do here. */
return 0;
}
-
static netdev_features_t hsr_features_recompute(struct hsr_priv *hsr,
netdev_features_t features)
{
return hsr_features_recompute(hsr, features);
}
-
static int hsr_dev_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct hsr_priv *hsr = netdev_priv(dev);
return NETDEV_TX_OK;
}
-
static const struct header_ops hsr_header_ops = {
.create = eth_header,
.parse = eth_header_parse,
};
static void send_hsr_supervision_frame(struct hsr_port *master,
- u8 type, u8 hsrVer)
+ u8 type, u8 hsr_ver)
{
struct sk_buff *skb;
int hlen, tlen;
hlen = LL_RESERVED_SPACE(master->dev);
tlen = master->dev->needed_tailroom;
- skb = dev_alloc_skb(
- sizeof(struct hsr_tag) +
- sizeof(struct hsr_sup_tag) +
- sizeof(struct hsr_sup_payload) + hlen + tlen);
+ skb = dev_alloc_skb(sizeof(struct hsr_tag) +
+ sizeof(struct hsr_sup_tag) +
+ sizeof(struct hsr_sup_payload) + hlen + tlen);
- if (skb == NULL)
+ if (!skb)
return;
skb_reserve(skb, hlen);
skb->dev = master->dev;
- skb->protocol = htons(hsrVer ? ETH_P_HSR : ETH_P_PRP);
+ skb->protocol = htons(hsr_ver ? ETH_P_HSR : ETH_P_PRP);
skb->priority = TC_PRIO_CONTROL;
- if (dev_hard_header(skb, skb->dev, (hsrVer ? ETH_P_HSR : ETH_P_PRP),
+ if (dev_hard_header(skb, skb->dev, (hsr_ver ? ETH_P_HSR : ETH_P_PRP),
master->hsr->sup_multicast_addr,
skb->dev->dev_addr, skb->len) <= 0)
goto out;
skb_reset_mac_header(skb);
- if (hsrVer > 0) {
+ if (hsr_ver > 0) {
hsr_tag = skb_put(skb, sizeof(struct hsr_tag));
hsr_tag->encap_proto = htons(ETH_P_PRP);
set_hsr_tag_LSDU_size(hsr_tag, HSR_V1_SUP_LSDUSIZE);
}
hsr_stag = skb_put(skb, sizeof(struct hsr_sup_tag));
- set_hsr_stag_path(hsr_stag, (hsrVer ? 0x0 : 0xf));
- set_hsr_stag_HSR_Ver(hsr_stag, hsrVer);
+ set_hsr_stag_path(hsr_stag, (hsr_ver ? 0x0 : 0xf));
+ set_hsr_stag_HSR_ver(hsr_stag, hsr_ver);
/* From HSRv1 on we have separate supervision sequence numbers. */
spin_lock_irqsave(&master->hsr->seqnr_lock, irqflags);
- if (hsrVer > 0) {
+ if (hsr_ver > 0) {
hsr_stag->sequence_nr = htons(master->hsr->sup_sequence_nr);
hsr_tag->sequence_nr = htons(master->hsr->sequence_nr);
master->hsr->sup_sequence_nr++;
}
spin_unlock_irqrestore(&master->hsr->seqnr_lock, irqflags);
- hsr_stag->HSR_TLV_Type = type;
+ hsr_stag->HSR_TLV_type = type;
/* TODO: Why 12 in HSRv0? */
- hsr_stag->HSR_TLV_Length = hsrVer ? sizeof(struct hsr_sup_payload) : 12;
+ hsr_stag->HSR_TLV_length =
+ hsr_ver ? sizeof(struct hsr_sup_payload) : 12;
/* Payload: MacAddressA */
hsr_sp = skb_put(skb, sizeof(struct hsr_sup_payload));
- ether_addr_copy(hsr_sp->MacAddressA, master->dev->dev_addr);
+ ether_addr_copy(hsr_sp->macaddress_A, master->dev->dev_addr);
if (skb_put_padto(skb, ETH_ZLEN + HSR_HLEN))
return;
kfree_skb(skb);
}
-
/* Announce (supervision frame) timer function
*/
static void hsr_announce(struct timer_list *t)
rcu_read_lock();
master = hsr_port_get_hsr(hsr, HSR_PT_MASTER);
- if (hsr->announce_count < 3 && hsr->protVersion == 0) {
+ if (hsr->announce_count < 3 && hsr->prot_version == 0) {
send_hsr_supervision_frame(master, HSR_TLV_ANNOUNCE,
- hsr->protVersion);
+ hsr->prot_version);
hsr->announce_count++;
interval = msecs_to_jiffies(HSR_ANNOUNCE_INTERVAL);
} else {
send_hsr_supervision_frame(master, HSR_TLV_LIFE_CHECK,
- hsr->protVersion);
+ hsr->prot_version);
interval = msecs_to_jiffies(HSR_LIFE_CHECK_INTERVAL);
}
rcu_read_unlock();
}
-
/* According to comments in the declaration of struct net_device, this function
* is "Called from unregister, can be used to call free_netdev". Ok then...
*/
hsr = netdev_priv(hsr_dev);
+ hsr_debugfs_term(hsr);
+
rtnl_lock();
hsr_for_each_port(hsr, port)
hsr_del_port(port);
dev->features |= NETIF_F_NETNS_LOCAL;
}
-
/* Return true if dev is a HSR master; return false otherwise.
*/
inline bool is_hsr_master(struct net_device *dev)
ether_addr_copy(hsr->sup_multicast_addr, def_multicast_addr);
hsr->sup_multicast_addr[ETH_ALEN - 1] = multicast_spec;
- hsr->protVersion = protocol_version;
+ hsr->prot_version = protocol_version;
/* FIXME: should I modify the value of these?
*
goto fail;
mod_timer(&hsr->prune_timer, jiffies + msecs_to_jiffies(PRUNE_PERIOD));
+ res = hsr_debugfs_init(hsr, hsr_dev);
+ if (res)
+ goto fail;
return 0;
+/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright 2011-2014 Autronica Fire and Security AS
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or (at your option)
- * any later version.
*
* Author(s):
* 2011-2014 Arvid Brodin, arvid.brodin@alten.se
+// SPDX-License-Identifier: GPL-2.0
/* Copyright 2011-2014 Autronica Fire and Security AS
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or (at your option)
- * any later version.
*
* Author(s):
* 2011-2014 Arvid Brodin, arvid.brodin@alten.se
#include "hsr_main.h"
#include "hsr_framereg.h"
-
struct hsr_node;
struct hsr_frame_info {
bool is_local_exclusive;
};
-
/* The uses I can see for these HSR supervision frames are:
* 1) Use the frames that are sent after node initialization ("HSR_TLV.Type =
* 22") to reset any sequence_nr counters belonging to that node. Useful if
*/
static bool is_supervision_frame(struct hsr_priv *hsr, struct sk_buff *skb)
{
- struct ethhdr *ethHdr;
- struct hsr_sup_tag *hsrSupTag;
- struct hsrv1_ethhdr_sp *hsrV1Hdr;
+ struct ethhdr *eth_hdr;
+ struct hsr_sup_tag *hsr_sup_tag;
+ struct hsrv1_ethhdr_sp *hsr_V1_hdr;
WARN_ON_ONCE(!skb_mac_header_was_set(skb));
- ethHdr = (struct ethhdr *) skb_mac_header(skb);
+ eth_hdr = (struct ethhdr *)skb_mac_header(skb);
/* Correct addr? */
- if (!ether_addr_equal(ethHdr->h_dest,
+ if (!ether_addr_equal(eth_hdr->h_dest,
hsr->sup_multicast_addr))
return false;
/* Correct ether type?. */
- if (!(ethHdr->h_proto == htons(ETH_P_PRP)
- || ethHdr->h_proto == htons(ETH_P_HSR)))
+ if (!(eth_hdr->h_proto == htons(ETH_P_PRP) ||
+ eth_hdr->h_proto == htons(ETH_P_HSR)))
return false;
/* Get the supervision header from correct location. */
- if (ethHdr->h_proto == htons(ETH_P_HSR)) { /* Okay HSRv1. */
- hsrV1Hdr = (struct hsrv1_ethhdr_sp *) skb_mac_header(skb);
- if (hsrV1Hdr->hsr.encap_proto != htons(ETH_P_PRP))
+ if (eth_hdr->h_proto == htons(ETH_P_HSR)) { /* Okay HSRv1. */
+ hsr_V1_hdr = (struct hsrv1_ethhdr_sp *)skb_mac_header(skb);
+ if (hsr_V1_hdr->hsr.encap_proto != htons(ETH_P_PRP))
return false;
- hsrSupTag = &hsrV1Hdr->hsr_sup;
+ hsr_sup_tag = &hsr_V1_hdr->hsr_sup;
} else {
- hsrSupTag = &((struct hsrv0_ethhdr_sp *) skb_mac_header(skb))->hsr_sup;
+ hsr_sup_tag =
+ &((struct hsrv0_ethhdr_sp *)skb_mac_header(skb))->hsr_sup;
}
- if ((hsrSupTag->HSR_TLV_Type != HSR_TLV_ANNOUNCE) &&
- (hsrSupTag->HSR_TLV_Type != HSR_TLV_LIFE_CHECK))
+ if (hsr_sup_tag->HSR_TLV_type != HSR_TLV_ANNOUNCE &&
+ hsr_sup_tag->HSR_TLV_type != HSR_TLV_LIFE_CHECK)
return false;
- if ((hsrSupTag->HSR_TLV_Length != 12) &&
- (hsrSupTag->HSR_TLV_Length !=
- sizeof(struct hsr_sup_payload)))
+ if (hsr_sup_tag->HSR_TLV_length != 12 &&
+ hsr_sup_tag->HSR_TLV_length != sizeof(struct hsr_sup_payload))
return false;
return true;
}
-
static struct sk_buff *create_stripped_skb(struct sk_buff *skb_in,
struct hsr_frame_info *frame)
{
skb_pull(skb_in, HSR_HLEN);
skb = __pskb_copy(skb_in, skb_headroom(skb_in) - HSR_HLEN, GFP_ATOMIC);
skb_push(skb_in, HSR_HLEN);
- if (skb == NULL)
+ if (!skb)
return NULL;
skb_reset_mac_header(skb);
if (skb->ip_summed == CHECKSUM_PARTIAL)
skb->csum_start -= HSR_HLEN;
- copylen = 2*ETH_ALEN;
+ copylen = 2 * ETH_ALEN;
if (frame->is_vlan)
copylen += VLAN_HLEN;
src = skb_mac_header(skb_in);
return skb_clone(frame->skb_std, GFP_ATOMIC);
}
-
static void hsr_fill_tag(struct sk_buff *skb, struct hsr_frame_info *frame,
- struct hsr_port *port, u8 protoVersion)
+ struct hsr_port *port, u8 proto_version)
{
struct hsr_ethhdr *hsr_ethhdr;
int lane_id;
if (frame->is_vlan)
lsdu_size -= 4;
- hsr_ethhdr = (struct hsr_ethhdr *) skb_mac_header(skb);
+ hsr_ethhdr = (struct hsr_ethhdr *)skb_mac_header(skb);
set_hsr_tag_path(&hsr_ethhdr->hsr_tag, lane_id);
set_hsr_tag_LSDU_size(&hsr_ethhdr->hsr_tag, lsdu_size);
hsr_ethhdr->hsr_tag.sequence_nr = htons(frame->sequence_nr);
hsr_ethhdr->hsr_tag.encap_proto = hsr_ethhdr->ethhdr.h_proto;
- hsr_ethhdr->ethhdr.h_proto = htons(protoVersion ?
+ hsr_ethhdr->ethhdr.h_proto = htons(proto_version ?
ETH_P_HSR : ETH_P_PRP);
}
/* Create the new skb with enough headroom to fit the HSR tag */
skb = __pskb_copy(skb_o, skb_headroom(skb_o) + HSR_HLEN, GFP_ATOMIC);
- if (skb == NULL)
+ if (!skb)
return NULL;
skb_reset_mac_header(skb);
memmove(dst, src, movelen);
skb_reset_mac_header(skb);
- hsr_fill_tag(skb, frame, port, port->hsr->protVersion);
+ hsr_fill_tag(skb, frame, port, port->hsr->prot_version);
return skb;
}
if (frame->skb_hsr)
return skb_clone(frame->skb_hsr, GFP_ATOMIC);
- if ((port->type != HSR_PT_SLAVE_A) && (port->type != HSR_PT_SLAVE_B)) {
+ if (port->type != HSR_PT_SLAVE_A && port->type != HSR_PT_SLAVE_B) {
WARN_ONCE(1, "HSR: Bug: trying to create a tagged frame for a non-ring port");
return NULL;
}
return create_tagged_skb(frame->skb_std, frame, port);
}
-
static void hsr_deliver_master(struct sk_buff *skb, struct net_device *dev,
struct hsr_node *node_src)
{
return dev_queue_xmit(skb);
}
-
/* Forward the frame through all devices except:
* - Back through the receiving device
* - If it's a HSR frame: through a device where it has passed before
continue;
/* Don't deliver locally unless we should */
- if ((port->type == HSR_PT_MASTER) && !frame->is_local_dest)
+ if (port->type == HSR_PT_MASTER && !frame->is_local_dest)
continue;
/* Deliver frames directly addressed to us to master only */
- if ((port->type != HSR_PT_MASTER) && frame->is_local_exclusive)
+ if (port->type != HSR_PT_MASTER && frame->is_local_exclusive)
continue;
/* Don't send frame over port where it has been sent before */
frame->sequence_nr))
continue;
- if (frame->is_supervision && (port->type == HSR_PT_MASTER)) {
+ if (frame->is_supervision && port->type == HSR_PT_MASTER) {
hsr_handle_sup_frame(frame->skb_hsr,
frame->node_src,
frame->port_rcv);
skb = frame_get_tagged_skb(frame, port);
else
skb = frame_get_stripped_skb(frame, port);
- if (skb == NULL) {
+ if (!skb) {
/* FIXME: Record the dropped frame? */
continue;
}
}
}
-
static void check_local_dest(struct hsr_priv *hsr, struct sk_buff *skb,
struct hsr_frame_info *frame)
{
frame->is_local_exclusive = false;
}
- if ((skb->pkt_type == PACKET_HOST) ||
- (skb->pkt_type == PACKET_MULTICAST) ||
- (skb->pkt_type == PACKET_BROADCAST)) {
+ if (skb->pkt_type == PACKET_HOST ||
+ skb->pkt_type == PACKET_MULTICAST ||
+ skb->pkt_type == PACKET_BROADCAST) {
frame->is_local_dest = true;
} else {
frame->is_local_dest = false;
}
}
-
static int hsr_fill_frame_info(struct hsr_frame_info *frame,
struct sk_buff *skb, struct hsr_port *port)
{
frame->is_supervision = is_supervision_frame(port->hsr, skb);
frame->node_src = hsr_get_node(port, skb, frame->is_supervision);
- if (frame->node_src == NULL)
+ if (!frame->node_src)
return -1; /* Unknown node and !is_supervision, or no mem */
- ethhdr = (struct ethhdr *) skb_mac_header(skb);
+ ethhdr = (struct ethhdr *)skb_mac_header(skb);
frame->is_vlan = false;
if (ethhdr->h_proto == htons(ETH_P_8021Q)) {
frame->is_vlan = true;
/* FIXME: */
WARN_ONCE(1, "HSR: VLAN not yet supported");
}
- if (ethhdr->h_proto == htons(ETH_P_PRP)
- || ethhdr->h_proto == htons(ETH_P_HSR)) {
+ if (ethhdr->h_proto == htons(ETH_P_PRP) ||
+ ethhdr->h_proto == htons(ETH_P_HSR)) {
frame->skb_std = NULL;
frame->skb_hsr = skb;
frame->sequence_nr = hsr_get_skb_sequence_nr(skb);
goto out_drop;
hsr_register_frame_in(frame.node_src, port, frame.sequence_nr);
hsr_forward_do(&frame);
+ /* Gets called for ingress frames as well as egress from master port.
+ * So check and increment stats for master port only here.
+ */
+ if (port->type == HSR_PT_MASTER) {
+ port->dev->stats.tx_packets++;
+ port->dev->stats.tx_bytes += skb->len;
+ }
- if (frame.skb_hsr != NULL)
+ if (frame.skb_hsr)
kfree_skb(frame.skb_hsr);
- if (frame.skb_std != NULL)
+ if (frame.skb_std)
kfree_skb(frame.skb_std);
return;
+/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright 2011-2014 Autronica Fire and Security AS
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or (at your option)
- * any later version.
*
* Author(s):
* 2011-2014 Arvid Brodin, arvid.brodin@alten.se
+// SPDX-License-Identifier: GPL-2.0
/* Copyright 2011-2014 Autronica Fire and Security AS
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or (at your option)
- * any later version.
*
* Author(s):
* 2011-2014 Arvid Brodin, arvid.brodin@alten.se
#include "hsr_framereg.h"
#include "hsr_netlink.h"
-
-struct hsr_node {
- struct list_head mac_list;
- unsigned char MacAddressA[ETH_ALEN];
- unsigned char MacAddressB[ETH_ALEN];
- /* Local slave through which AddrB frames are received from this node */
- enum hsr_port_type AddrB_port;
- unsigned long time_in[HSR_PT_PORTS];
- bool time_in_stale[HSR_PT_PORTS];
- u16 seq_out[HSR_PT_PORTS];
- struct rcu_head rcu_head;
-};
-
-
/* TODO: use hash lists for mac addresses (linux/jhash.h)? */
-
/* seq_nr_after(a, b) - return true if a is after (higher in sequence than) b,
* false otherwise.
*/
/* Remove inconsistency where
* seq_nr_after(a, b) == seq_nr_before(a, b)
*/
- if ((int) b - a == 32768)
+ if ((int)b - a == 32768)
return false;
- return (((s16) (b - a)) < 0);
+ return (((s16)(b - a)) < 0);
}
+
#define seq_nr_before(a, b) seq_nr_after((b), (a))
#define seq_nr_after_or_eq(a, b) (!seq_nr_before((a), (b)))
#define seq_nr_before_or_eq(a, b) (!seq_nr_after((a), (b)))
-
bool hsr_addr_is_self(struct hsr_priv *hsr, unsigned char *addr)
{
struct hsr_node *node;
return false;
}
- if (ether_addr_equal(addr, node->MacAddressA))
+ if (ether_addr_equal(addr, node->macaddress_A))
return true;
- if (ether_addr_equal(addr, node->MacAddressB))
+ if (ether_addr_equal(addr, node->macaddress_B))
return true;
return false;
/* Search for mac entry. Caller must hold rcu read lock.
*/
-static struct hsr_node *find_node_by_AddrA(struct list_head *node_db,
- const unsigned char addr[ETH_ALEN])
+static struct hsr_node *find_node_by_addr_A(struct list_head *node_db,
+ const unsigned char addr[ETH_ALEN])
{
struct hsr_node *node;
list_for_each_entry_rcu(node, node_db, mac_list) {
- if (ether_addr_equal(node->MacAddressA, addr))
+ if (ether_addr_equal(node->macaddress_A, addr))
return node;
}
return NULL;
}
-
/* Helper for device init; the self_node_db is used in hsr_rcv() to recognize
* frames from self that's been looped over the HSR ring.
*/
if (!node)
return -ENOMEM;
- ether_addr_copy(node->MacAddressA, addr_a);
- ether_addr_copy(node->MacAddressB, addr_b);
+ ether_addr_copy(node->macaddress_A, addr_a);
+ ether_addr_copy(node->macaddress_B, addr_b);
rcu_read_lock();
oldnode = list_first_or_null_rcu(self_node_db,
- struct hsr_node, mac_list);
+ struct hsr_node, mac_list);
if (oldnode) {
list_replace_rcu(&oldnode->mac_list, &node->mac_list);
rcu_read_unlock();
}
}
-/* Allocate an hsr_node and add it to node_db. 'addr' is the node's AddressA;
+/* Allocate an hsr_node and add it to node_db. 'addr' is the node's address_A;
* seq_out is used to initialize filtering of outgoing duplicate frames
* originating from the newly added node.
*/
if (!node)
return NULL;
- ether_addr_copy(node->MacAddressA, addr);
+ ether_addr_copy(node->macaddress_A, addr);
/* We are only interested in time diffs here, so use current jiffies
* as initialization. (0 could trigger an spurious ring error warning).
if (!skb_mac_header_was_set(skb))
return NULL;
- ethhdr = (struct ethhdr *) skb_mac_header(skb);
+ ethhdr = (struct ethhdr *)skb_mac_header(skb);
list_for_each_entry_rcu(node, node_db, mac_list) {
- if (ether_addr_equal(node->MacAddressA, ethhdr->h_source))
+ if (ether_addr_equal(node->macaddress_A, ethhdr->h_source))
return node;
- if (ether_addr_equal(node->MacAddressB, ethhdr->h_source))
+ if (ether_addr_equal(node->macaddress_B, ethhdr->h_source))
return node;
}
/* Everyone may create a node entry, connected node to a HSR device. */
- if (ethhdr->h_proto == htons(ETH_P_PRP)
- || ethhdr->h_proto == htons(ETH_P_HSR)) {
+ if (ethhdr->h_proto == htons(ETH_P_PRP) ||
+ ethhdr->h_proto == htons(ETH_P_HSR)) {
/* Use the existing sequence_nr from the tag as starting point
* for filtering duplicate frames.
*/
return hsr_add_node(node_db, ethhdr->h_source, seq_out);
}
-/* Use the Supervision frame's info about an eventual MacAddressB for merging
- * nodes that has previously had their MacAddressB registered as a separate
+/* Use the Supervision frame's info about an eventual macaddress_B for merging
+ * nodes that has previously had their macaddress_B registered as a separate
* node.
*/
void hsr_handle_sup_frame(struct sk_buff *skb, struct hsr_node *node_curr,
struct list_head *node_db;
int i;
- ethhdr = (struct ethhdr *) skb_mac_header(skb);
+ ethhdr = (struct ethhdr *)skb_mac_header(skb);
/* Leave the ethernet header. */
skb_pull(skb, sizeof(struct ethhdr));
/* And leave the HSR sup tag. */
skb_pull(skb, sizeof(struct hsr_sup_tag));
- hsr_sp = (struct hsr_sup_payload *) skb->data;
+ hsr_sp = (struct hsr_sup_payload *)skb->data;
- /* Merge node_curr (registered on MacAddressB) into node_real */
+ /* Merge node_curr (registered on macaddress_B) into node_real */
node_db = &port_rcv->hsr->node_db;
- node_real = find_node_by_AddrA(node_db, hsr_sp->MacAddressA);
+ node_real = find_node_by_addr_A(node_db, hsr_sp->macaddress_A);
if (!node_real)
/* No frame received from AddrA of this node yet */
- node_real = hsr_add_node(node_db, hsr_sp->MacAddressA,
+ node_real = hsr_add_node(node_db, hsr_sp->macaddress_A,
HSR_SEQNR_START - 1);
if (!node_real)
goto done; /* No mem */
/* Node has already been merged */
goto done;
- ether_addr_copy(node_real->MacAddressB, ethhdr->h_source);
+ ether_addr_copy(node_real->macaddress_B, ethhdr->h_source);
for (i = 0; i < HSR_PT_PORTS; i++) {
if (!node_curr->time_in_stale[i] &&
time_after(node_curr->time_in[i], node_real->time_in[i])) {
node_real->time_in[i] = node_curr->time_in[i];
- node_real->time_in_stale[i] = node_curr->time_in_stale[i];
+ node_real->time_in_stale[i] =
+ node_curr->time_in_stale[i];
}
if (seq_nr_after(node_curr->seq_out[i], node_real->seq_out[i]))
node_real->seq_out[i] = node_curr->seq_out[i];
}
- node_real->AddrB_port = port_rcv->type;
+ node_real->addr_B_port = port_rcv->type;
list_del_rcu(&node_curr->mac_list);
kfree_rcu(node_curr, rcu_head);
skb_push(skb, sizeof(struct hsrv1_ethhdr_sp));
}
-
/* 'skb' is a frame meant for this host, that is to be passed to upper layers.
*
* If the frame was sent by a node's B interface, replace the source
- * address with that node's "official" address (MacAddressA) so that upper
+ * address with that node's "official" address (macaddress_A) so that upper
* layers recognize where it came from.
*/
void hsr_addr_subst_source(struct hsr_node *node, struct sk_buff *skb)
return;
}
- memcpy(ð_hdr(skb)->h_source, node->MacAddressA, ETH_ALEN);
+ memcpy(ð_hdr(skb)->h_source, node->macaddress_A, ETH_ALEN);
}
/* 'skb' is a frame meant for another host.
if (!is_unicast_ether_addr(eth_hdr(skb)->h_dest))
return;
- node_dst = find_node_by_AddrA(&port->hsr->node_db, eth_hdr(skb)->h_dest);
+ node_dst = find_node_by_addr_A(&port->hsr->node_db,
+ eth_hdr(skb)->h_dest);
if (!node_dst) {
WARN_ONCE(1, "%s: Unknown node\n", __func__);
return;
}
- if (port->type != node_dst->AddrB_port)
+ if (port->type != node_dst->addr_B_port)
return;
- ether_addr_copy(eth_hdr(skb)->h_dest, node_dst->MacAddressB);
+ ether_addr_copy(eth_hdr(skb)->h_dest, node_dst->macaddress_B);
}
-
void hsr_register_frame_in(struct hsr_node *node, struct hsr_port *port,
u16 sequence_nr)
{
return 0;
}
-
static struct hsr_port *get_late_port(struct hsr_priv *hsr,
struct hsr_node *node)
{
return NULL;
}
-
/* Remove stale sequence_nr records. Called by timer every
* HSR_LIFE_CHECK_INTERVAL (two seconds or so).
*/
time_b = node->time_in[HSR_PT_SLAVE_B];
/* Check for timestamps old enough to risk wrap-around */
- if (time_after(jiffies, time_a + MAX_JIFFY_OFFSET/2))
+ if (time_after(jiffies, time_a + MAX_JIFFY_OFFSET / 2))
node->time_in_stale[HSR_PT_SLAVE_A] = true;
- if (time_after(jiffies, time_b + MAX_JIFFY_OFFSET/2))
+ if (time_after(jiffies, time_b + MAX_JIFFY_OFFSET / 2))
node->time_in_stale[HSR_PT_SLAVE_B] = true;
/* Get age of newest frame from node.
/* Warn of ring error only as long as we get frames at all */
if (time_is_after_jiffies(timestamp +
- msecs_to_jiffies(1.5*MAX_SLAVE_DIFF))) {
+ msecs_to_jiffies(1.5 * MAX_SLAVE_DIFF))) {
rcu_read_lock();
port = get_late_port(hsr, node);
- if (port != NULL)
- hsr_nl_ringerror(hsr, node->MacAddressA, port);
+ if (port)
+ hsr_nl_ringerror(hsr, node->macaddress_A, port);
rcu_read_unlock();
}
/* Prune old entries */
if (time_is_before_jiffies(timestamp +
- msecs_to_jiffies(HSR_NODE_FORGET_TIME))) {
- hsr_nl_nodedown(hsr, node->MacAddressA);
+ msecs_to_jiffies(HSR_NODE_FORGET_TIME))) {
+ hsr_nl_nodedown(hsr, node->macaddress_A);
list_del_rcu(&node->mac_list);
/* Note that we need to free this entry later: */
kfree_rcu(node, rcu_head);
}
}
rcu_read_unlock();
-}
+ /* Restart timer */
+ mod_timer(&hsr->prune_timer,
+ jiffies + msecs_to_jiffies(PRUNE_PERIOD));
+}
void *hsr_get_next_node(struct hsr_priv *hsr, void *_pos,
unsigned char addr[ETH_ALEN])
node = list_first_or_null_rcu(&hsr->node_db,
struct hsr_node, mac_list);
if (node)
- ether_addr_copy(addr, node->MacAddressA);
+ ether_addr_copy(addr, node->macaddress_A);
return node;
}
node = _pos;
list_for_each_entry_continue_rcu(node, &hsr->node_db, mac_list) {
- ether_addr_copy(addr, node->MacAddressA);
+ ether_addr_copy(addr, node->macaddress_A);
return node;
}
return NULL;
}
-
int hsr_get_node_data(struct hsr_priv *hsr,
const unsigned char *addr,
unsigned char addr_b[ETH_ALEN],
struct hsr_port *port;
unsigned long tdiff;
-
rcu_read_lock();
- node = find_node_by_AddrA(&hsr->node_db, addr);
+ node = find_node_by_addr_A(&hsr->node_db, addr);
if (!node) {
rcu_read_unlock();
return -ENOENT; /* No such entry */
}
- ether_addr_copy(addr_b, node->MacAddressB);
+ ether_addr_copy(addr_b, node->macaddress_B);
tdiff = jiffies - node->time_in[HSR_PT_SLAVE_A];
if (node->time_in_stale[HSR_PT_SLAVE_A])
*if1_seq = node->seq_out[HSR_PT_SLAVE_B];
*if2_seq = node->seq_out[HSR_PT_SLAVE_A];
- if (node->AddrB_port != HSR_PT_NONE) {
- port = hsr_port_get_hsr(hsr, node->AddrB_port);
+ if (node->addr_B_port != HSR_PT_NONE) {
+ port = hsr_port_get_hsr(hsr, node->addr_B_port);
*addr_b_ifindex = port->dev->ifindex;
} else {
*addr_b_ifindex = -1;
+/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright 2011-2014 Autronica Fire and Security AS
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or (at your option)
- * any later version.
*
* Author(s):
* 2011-2014 Arvid Brodin, arvid.brodin@alten.se
int *if2_age,
u16 *if2_seq);
+struct hsr_node {
+ struct list_head mac_list;
+ unsigned char macaddress_A[ETH_ALEN];
+ unsigned char macaddress_B[ETH_ALEN];
+ /* Local slave through which AddrB frames are received from this node */
+ enum hsr_port_type addr_B_port;
+ unsigned long time_in[HSR_PT_PORTS];
+ bool time_in_stale[HSR_PT_PORTS];
+ u16 seq_out[HSR_PT_PORTS];
+ struct rcu_head rcu_head;
+};
+
#endif /* __HSR_FRAMEREG_H */
+// SPDX-License-Identifier: GPL-2.0
/* Copyright 2011-2014 Autronica Fire and Security AS
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or (at your option)
- * any later version.
*
* Author(s):
* 2011-2014 Arvid Brodin, arvid.brodin@alten.se
#include "hsr_framereg.h"
#include "hsr_slave.h"
-
static int hsr_netdev_notify(struct notifier_block *nb, unsigned long event,
void *ptr)
{
dev = netdev_notifier_info_to_dev(ptr);
port = hsr_port_get_rtnl(dev);
- if (port == NULL) {
+ if (!port) {
if (!is_hsr_master(dev))
return NOTIFY_DONE; /* Not an HSR device */
hsr = netdev_priv(dev);
port = hsr_port_get_hsr(hsr, HSR_PT_MASTER);
- if (port == NULL) {
+ if (!port) {
/* Resend of notification concerning removed device? */
return NOTIFY_DONE;
}
if (port->type == HSR_PT_SLAVE_A) {
ether_addr_copy(master->dev->dev_addr, dev->dev_addr);
- call_netdevice_notifiers(NETDEV_CHANGEADDR, master->dev);
+ call_netdevice_notifiers(NETDEV_CHANGEADDR,
+ master->dev);
}
/* Make sure we recognize frames from ourselves in hsr_rcv() */
return NOTIFY_DONE;
}
-
struct hsr_port *hsr_port_get_hsr(struct hsr_priv *hsr, enum hsr_port_type pt)
{
struct hsr_port *port;
.notifier_call = hsr_netdev_notify, /* Slave event notifications */
};
-
static int __init hsr_init(void)
{
int res;
+/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright 2011-2014 Autronica Fire and Security AS
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or (at your option)
- * any later version.
*
* Author(s):
* 2011-2014 Arvid Brodin, arvid.brodin@alten.se
#include <linux/netdevice.h>
#include <linux/list.h>
-
/* Time constants as specified in the HSR specification (IEC-62439-3 2010)
* Table 8.
* All values in milliseconds.
#define HSR_NODE_FORGET_TIME 60000 /* ms */
#define HSR_ANNOUNCE_INTERVAL 100 /* ms */
-
/* By how much may slave1 and slave2 timestamps of latest received frame from
* each node differ before we notify of communication problem?
*/
#define HSR_SEQNR_START (USHRT_MAX - 1024)
#define HSR_SUP_SEQNR_START (HSR_SEQNR_START / 2)
-
/* How often shall we check for broken ring and remove node entries older than
* HSR_NODE_FORGET_TIME?
*/
#define PRUNE_PERIOD 3000 /* ms */
-
#define HSR_TLV_ANNOUNCE 22
#define HSR_TLV_LIFE_CHECK 23
-
/* HSR Tag.
* As defined in IEC-62439-3:2010, the HSR tag is really { ethertype = 0x88FB,
* path, LSDU_size, sequence Nr }. But we let eth_header() create { h_dest,
static inline void set_hsr_tag_path(struct hsr_tag *ht, u16 path)
{
- ht->path_and_LSDU_size = htons(
- (ntohs(ht->path_and_LSDU_size) & 0x0FFF) | (path << 12));
+ ht->path_and_LSDU_size =
+ htons((ntohs(ht->path_and_LSDU_size) & 0x0FFF) | (path << 12));
}
static inline void set_hsr_tag_LSDU_size(struct hsr_tag *ht, u16 LSDU_size)
{
- ht->path_and_LSDU_size = htons(
- (ntohs(ht->path_and_LSDU_size) & 0xF000) |
- (LSDU_size & 0x0FFF));
+ ht->path_and_LSDU_size = htons((ntohs(ht->path_and_LSDU_size) &
+ 0xF000) | (LSDU_size & 0x0FFF));
}
struct hsr_ethhdr {
struct hsr_tag hsr_tag;
} __packed;
-
/* HSR Supervision Frame data types.
* Field names as defined in the IEC:2010 standard for HSR.
*/
struct hsr_sup_tag {
- __be16 path_and_HSR_Ver;
+ __be16 path_and_HSR_ver;
__be16 sequence_nr;
- __u8 HSR_TLV_Type;
- __u8 HSR_TLV_Length;
+ __u8 HSR_TLV_type;
+ __u8 HSR_TLV_length;
} __packed;
struct hsr_sup_payload {
- unsigned char MacAddressA[ETH_ALEN];
+ unsigned char macaddress_A[ETH_ALEN];
} __packed;
static inline u16 get_hsr_stag_path(struct hsr_sup_tag *hst)
{
- return get_hsr_tag_path((struct hsr_tag *) hst);
+ return get_hsr_tag_path((struct hsr_tag *)hst);
}
static inline u16 get_hsr_stag_HSR_ver(struct hsr_sup_tag *hst)
{
- return get_hsr_tag_LSDU_size((struct hsr_tag *) hst);
+ return get_hsr_tag_LSDU_size((struct hsr_tag *)hst);
}
static inline void set_hsr_stag_path(struct hsr_sup_tag *hst, u16 path)
{
- set_hsr_tag_path((struct hsr_tag *) hst, path);
+ set_hsr_tag_path((struct hsr_tag *)hst, path);
}
-static inline void set_hsr_stag_HSR_Ver(struct hsr_sup_tag *hst, u16 HSR_Ver)
+static inline void set_hsr_stag_HSR_ver(struct hsr_sup_tag *hst, u16 HSR_ver)
{
- set_hsr_tag_LSDU_size((struct hsr_tag *) hst, HSR_Ver);
+ set_hsr_tag_LSDU_size((struct hsr_tag *)hst, HSR_ver);
}
struct hsrv0_ethhdr_sp {
struct hsr_sup_tag hsr_sup;
} __packed;
-
enum hsr_port_type {
HSR_PT_NONE = 0, /* Must be 0, used by framereg */
HSR_PT_SLAVE_A,
struct timer_list prune_timer;
int announce_count;
u16 sequence_nr;
- u16 sup_sequence_nr; /* For HSRv1 separate seq_nr for supervision */
- u8 protVersion; /* Indicate if HSRv0 or HSRv1. */
+ u16 sup_sequence_nr; /* For HSRv1 separate seq_nr for supervision */
+ u8 prot_version; /* Indicate if HSRv0 or HSRv1. */
spinlock_t seqnr_lock; /* locking for sequence_nr */
unsigned char sup_multicast_addr[ETH_ALEN];
+#ifdef CONFIG_DEBUG_FS
+ struct dentry *node_tbl_root;
+ struct dentry *node_tbl_file;
+#endif
};
#define hsr_for_each_port(hsr, port) \
{
struct hsr_ethhdr *hsr_ethhdr;
- hsr_ethhdr = (struct hsr_ethhdr *) skb_mac_header(skb);
+ hsr_ethhdr = (struct hsr_ethhdr *)skb_mac_header(skb);
return ntohs(hsr_ethhdr->hsr_tag.sequence_nr);
}
+#if IS_ENABLED(CONFIG_DEBUG_FS)
+int hsr_debugfs_init(struct hsr_priv *priv, struct net_device *hsr_dev);
+void hsr_debugfs_term(struct hsr_priv *priv);
+#else
+static inline int hsr_debugfs_init(struct hsr_priv *priv,
+ struct net_device *hsr_dev)
+{
+ return 0;
+}
+
+static inline void hsr_debugfs_term(struct hsr_priv *priv)
+{}
+#endif
+
#endif /* __HSR_PRIVATE_H */
+// SPDX-License-Identifier: GPL-2.0
/* Copyright 2011-2014 Autronica Fire and Security AS
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or (at your option)
- * any later version.
*
* Author(s):
* 2011-2014 Arvid Brodin, arvid.brodin@alten.se
[IFLA_HSR_SEQ_NR] = { .type = NLA_U16 },
};
-
/* Here, it seems a netdevice has already been allocated for us, and the
* hsr_dev_setup routine has been executed. Nice!
*/
netdev_info(dev, "HSR: Slave1 device not specified\n");
return -EINVAL;
}
- link[0] = __dev_get_by_index(src_net, nla_get_u32(data[IFLA_HSR_SLAVE1]));
+ link[0] = __dev_get_by_index(src_net,
+ nla_get_u32(data[IFLA_HSR_SLAVE1]));
if (!data[IFLA_HSR_SLAVE2]) {
netdev_info(dev, "HSR: Slave2 device not specified\n");
return -EINVAL;
}
- link[1] = __dev_get_by_index(src_net, nla_get_u32(data[IFLA_HSR_SLAVE2]));
+ link[1] = __dev_get_by_index(src_net,
+ nla_get_u32(data[IFLA_HSR_SLAVE2]));
if (!link[0] || !link[1])
return -ENODEV;
.fill_info = hsr_fill_info,
};
-
-
/* attribute policy */
static const struct nla_policy hsr_genl_policy[HSR_A_MAX + 1] = {
[HSR_A_NODE_ADDR] = { .len = ETH_ALEN },
{ .name = "hsr-network", },
};
-
-
/* This is called if for some node with MAC address addr, we only get frames
* over one of the slave interfaces. This would indicate an open network ring
* (i.e. a link has failed somewhere).
if (!skb)
goto fail;
- msg_head = genlmsg_put(skb, 0, 0, &hsr_genl_family, 0, HSR_C_RING_ERROR);
+ msg_head = genlmsg_put(skb, 0, 0, &hsr_genl_family, 0,
+ HSR_C_RING_ERROR);
if (!msg_head)
goto nla_put_failure;
if (!msg_head)
goto nla_put_failure;
-
res = nla_put(skb, HSR_A_NODE_ADDR, ETH_ALEN, addr);
if (res < 0)
goto nla_put_failure;
rcu_read_unlock();
}
-
/* HSR_C_GET_NODE_STATUS lets userspace query the internal HSR node table
* about the status of a specific node in the network, defined by its MAC
* address.
goto invalid;
hsr_dev = __dev_get_by_index(genl_info_net(info),
- nla_get_u32(info->attrs[HSR_A_IFINDEX]));
+ nla_get_u32(info->attrs[HSR_A_IFINDEX]));
if (!hsr_dev)
goto invalid;
if (!is_hsr_master(hsr_dev))
goto invalid;
-
/* Send reply */
-
skb_out = genlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
if (!skb_out) {
res = -ENOMEM;
}
msg_head = genlmsg_put(skb_out, NETLINK_CB(skb_in).portid,
- info->snd_seq, &hsr_genl_family, 0,
- HSR_C_SET_NODE_STATUS);
+ info->snd_seq, &hsr_genl_family, 0,
+ HSR_C_SET_NODE_STATUS);
if (!msg_head) {
res = -ENOMEM;
goto nla_put_failure;
hsr = netdev_priv(hsr_dev);
res = hsr_get_node_data(hsr,
- (unsigned char *) nla_data(info->attrs[HSR_A_NODE_ADDR]),
- hsr_node_addr_b,
- &addr_b_ifindex,
- &hsr_node_if1_age,
- &hsr_node_if1_seq,
- &hsr_node_if2_age,
- &hsr_node_if2_seq);
+ (unsigned char *)
+ nla_data(info->attrs[HSR_A_NODE_ADDR]),
+ hsr_node_addr_b,
+ &addr_b_ifindex,
+ &hsr_node_if1_age,
+ &hsr_node_if1_seq,
+ &hsr_node_if2_age,
+ &hsr_node_if2_seq);
if (res < 0)
goto nla_put_failure;
res = nla_put(skb_out, HSR_A_NODE_ADDR, ETH_ALEN,
- nla_data(info->attrs[HSR_A_NODE_ADDR]));
+ nla_data(info->attrs[HSR_A_NODE_ADDR]));
if (res < 0)
goto nla_put_failure;
if (addr_b_ifindex > -1) {
res = nla_put(skb_out, HSR_A_NODE_ADDR_B, ETH_ALEN,
- hsr_node_addr_b);
+ hsr_node_addr_b);
if (res < 0)
goto nla_put_failure;
- res = nla_put_u32(skb_out, HSR_A_ADDR_B_IFINDEX, addr_b_ifindex);
+ res = nla_put_u32(skb_out, HSR_A_ADDR_B_IFINDEX,
+ addr_b_ifindex);
if (res < 0)
goto nla_put_failure;
}
if (!is_hsr_master(hsr_dev))
goto invalid;
-
/* Send reply */
-
skb_out = genlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
if (!skb_out) {
res = -ENOMEM;
}
msg_head = genlmsg_put(skb_out, NETLINK_CB(skb_in).portid,
- info->snd_seq, &hsr_genl_family, 0,
- HSR_C_SET_NODE_LIST);
+ info->snd_seq, &hsr_genl_family, 0,
+ HSR_C_SET_NODE_LIST);
if (!msg_head) {
res = -ENOMEM;
goto nla_put_failure;
return res;
}
-
static const struct genl_ops hsr_ops[] = {
{
.cmd = HSR_C_GET_NODE_STATUS,
.flags = 0,
- .policy = hsr_genl_policy,
.doit = hsr_get_node_status,
.dumpit = NULL,
},
{
.cmd = HSR_C_GET_NODE_LIST,
.flags = 0,
- .policy = hsr_genl_policy,
.doit = hsr_get_node_list,
.dumpit = NULL,
},
.name = "HSR",
.version = 1,
.maxattr = HSR_A_MAX,
+ .policy = hsr_genl_policy,
.module = THIS_MODULE,
.ops = hsr_ops,
.n_ops = ARRAY_SIZE(hsr_ops),
+/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright 2011-2014 Autronica Fire and Security AS
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or (at your option)
- * any later version.
*
* Author(s):
* 2011-2014 Arvid Brodin, arvid.brodin@alten.se
+// SPDX-License-Identifier: GPL-2.0
/* Copyright 2011-2014 Autronica Fire and Security AS
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or (at your option)
- * any later version.
*
* Author(s):
* 2011-2014 Arvid Brodin, arvid.brodin@alten.se
#include "hsr_forward.h"
#include "hsr_framereg.h"
-
static rx_handler_result_t hsr_handle_frame(struct sk_buff **pskb)
{
struct sk_buff *skb = *pskb;
return rcu_access_pointer(dev->rx_handler) == hsr_handle_frame;
}
-
static int hsr_check_dev_ok(struct net_device *dev)
{
/* Don't allow HSR on non-ethernet like devices */
- if ((dev->flags & IFF_LOOPBACK) || (dev->type != ARPHRD_ETHER) ||
- (dev->addr_len != ETH_ALEN)) {
+ if ((dev->flags & IFF_LOOPBACK) || dev->type != ARPHRD_ETHER ||
+ dev->addr_len != ETH_ALEN) {
netdev_info(dev, "Cannot use loopback or non-ethernet device as HSR slave.\n");
return -EINVAL;
}
return 0;
}
-
/* Setup device to be added to the HSR bridge. */
static int hsr_portdev_setup(struct net_device *dev, struct hsr_port *port)
{
}
port = hsr_port_get_hsr(hsr, type);
- if (port != NULL)
+ if (port)
return -EBUSY; /* This port already exists */
port = kzalloc(sizeof(*port), GFP_KERNEL);
- if (port == NULL)
+ if (!port)
return -ENOMEM;
if (type != HSR_PT_MASTER) {
list_del_rcu(&port->port_list);
if (port != master) {
- if (master != NULL) {
+ if (master) {
netdev_update_features(master->dev);
dev_set_mtu(master->dev, hsr_get_max_mtu(hsr));
}
+/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright 2011-2014 Autronica Fire and Security AS
*
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or (at your option)
- * any later version.
- *
- * Author(s):
* 2011-2014 Arvid Brodin, arvid.brodin@alten.se
*/
#define IEEE802154_OP(_cmd, _func) \
{ \
.cmd = _cmd, \
- .policy = ieee802154_policy, \
.doit = _func, \
.dumpit = NULL, \
.flags = GENL_ADMIN_PERM, \
#define IEEE802154_DUMP(_cmd, _func, _dump) \
{ \
.cmd = _cmd, \
- .policy = ieee802154_policy, \
.doit = _func, \
.dumpit = _dump, \
}
.name = IEEE802154_NL_NAME,
.version = 1,
.maxattr = IEEE802154_ATTR_MAX,
+ .policy = ieee802154_policy,
.module = THIS_MODULE,
.ops = ieee802154_ops,
.n_ops = ARRAY_SIZE(ieee802154_ops),
.doit = nl802154_get_wpan_phy,
.dumpit = nl802154_dump_wpan_phy,
.done = nl802154_dump_wpan_phy_done,
- .policy = nl802154_policy,
/* can be retrieved by unprivileged users */
.internal_flags = NL802154_FLAG_NEED_WPAN_PHY |
NL802154_FLAG_NEED_RTNL,
.cmd = NL802154_CMD_GET_INTERFACE,
.doit = nl802154_get_interface,
.dumpit = nl802154_dump_interface,
- .policy = nl802154_policy,
/* can be retrieved by unprivileged users */
.internal_flags = NL802154_FLAG_NEED_WPAN_DEV |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_NEW_INTERFACE,
.doit = nl802154_new_interface,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_WPAN_PHY |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_DEL_INTERFACE,
.doit = nl802154_del_interface,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_WPAN_DEV |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_SET_CHANNEL,
.doit = nl802154_set_channel,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_WPAN_PHY |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_SET_CCA_MODE,
.doit = nl802154_set_cca_mode,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_WPAN_PHY |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_SET_CCA_ED_LEVEL,
.doit = nl802154_set_cca_ed_level,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_WPAN_PHY |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_SET_TX_POWER,
.doit = nl802154_set_tx_power,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_WPAN_PHY |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_SET_WPAN_PHY_NETNS,
.doit = nl802154_wpan_phy_netns,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_WPAN_PHY |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_SET_PAN_ID,
.doit = nl802154_set_pan_id,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_SET_SHORT_ADDR,
.doit = nl802154_set_short_addr,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_SET_BACKOFF_EXPONENT,
.doit = nl802154_set_backoff_exponent,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_SET_MAX_CSMA_BACKOFFS,
.doit = nl802154_set_max_csma_backoffs,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_SET_MAX_FRAME_RETRIES,
.doit = nl802154_set_max_frame_retries,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_SET_LBT_MODE,
.doit = nl802154_set_lbt_mode,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_SET_ACKREQ_DEFAULT,
.doit = nl802154_set_ackreq_default,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_SET_SEC_PARAMS,
.doit = nl802154_set_llsec_params,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
.cmd = NL802154_CMD_GET_SEC_KEY,
/* TODO .doit by matching key id? */
.dumpit = nl802154_dump_llsec_key,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_NEW_SEC_KEY,
.doit = nl802154_add_llsec_key,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_DEL_SEC_KEY,
.doit = nl802154_del_llsec_key,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
.cmd = NL802154_CMD_GET_SEC_DEV,
/* TODO .doit by matching extended_addr? */
.dumpit = nl802154_dump_llsec_dev,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_NEW_SEC_DEV,
.doit = nl802154_add_llsec_dev,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_DEL_SEC_DEV,
.doit = nl802154_del_llsec_dev,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
.cmd = NL802154_CMD_GET_SEC_DEVKEY,
/* TODO doit by matching ??? */
.dumpit = nl802154_dump_llsec_devkey,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_NEW_SEC_DEVKEY,
.doit = nl802154_add_llsec_devkey,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_DEL_SEC_DEVKEY,
.doit = nl802154_del_llsec_devkey,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
.cmd = NL802154_CMD_GET_SEC_LEVEL,
/* TODO .doit by matching frame_type? */
.dumpit = nl802154_dump_llsec_seclevel,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
{
.cmd = NL802154_CMD_NEW_SEC_LEVEL,
.doit = nl802154_add_llsec_seclevel,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
.cmd = NL802154_CMD_DEL_SEC_LEVEL,
/* TODO match frame_type only? */
.doit = nl802154_del_llsec_seclevel,
- .policy = nl802154_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
.hdrsize = 0, /* no private header */
.version = 1, /* no particular meaning now */
.maxattr = NL802154_ATTR_MAX,
+ .policy = nl802154_policy,
.netnsok = true,
.pre_doit = nl802154_pre_doit,
.post_doit = nl802154_post_doit,
struct inet_sock *inet = inet_sk(sk);
__skb_queue_purge(&sk->sk_receive_queue);
+ if (sk->sk_rx_skb_cache) {
+ __kfree_skb(sk->sk_rx_skb_cache);
+ sk->sk_rx_skb_cache = NULL;
+ }
__skb_queue_purge(&sk->sk_error_queue);
sk_mem_reclaim(sk);
WARN_ON(sk->sk_forward_alloc);
kfree(rcu_dereference_protected(inet->inet_opt, 1));
- dst_release(rcu_dereference_check(sk->sk_dst_cache, 1));
+ dst_release(rcu_dereference_protected(sk->sk_dst_cache, 1));
dst_release(sk->sk_rx_dst);
sk_refcnt_debug_dec(sk);
}
.flowi4_mark = vmark ? skb->mark : 0,
};
if (!fib_lookup(net, &fl4, &res, 0))
- return FIB_RES_PREFSRC(net, res);
+ return fib_result_prefsrc(net, &res);
} else {
scope = RT_SCOPE_LINK;
}
for (ret = 0; ret < fi->fib_nhs; ret++) {
struct fib_nh *nh = &fi->fib_nh[ret];
- if (nh->nh_dev == dev) {
+ if (nh->fib_nh_dev == dev) {
dev_match = true;
break;
- } else if (l3mdev_master_ifindex_rcu(nh->nh_dev) == dev->ifindex) {
+ } else if (l3mdev_master_ifindex_rcu(nh->fib_nh_dev) == dev->ifindex) {
dev_match = true;
break;
}
}
#else
- if (fi->fib_nh[0].nh_dev == dev)
+ if (fi->fib_nh[0].fib_nh_dev == dev)
dev_match = true;
#endif
dev_match = fib_info_nh_uses_dev(res.fi, dev);
if (dev_match) {
- ret = FIB_RES_NH(res).nh_scope >= RT_SCOPE_HOST;
+ ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
return ret;
}
if (no_addr)
ret = 0;
if (fib_lookup(net, &fl4, &res, FIB_LOOKUP_IGNORE_LINKSTATE) == 0) {
if (res.type == RTN_UNICAST)
- ret = FIB_RES_NH(res).nh_scope >= RT_SCOPE_HOST;
+ ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
}
return ret;
if (rt->rt_gateway.sa_family == AF_INET && addr) {
unsigned int addr_type;
- cfg->fc_gw = addr;
+ cfg->fc_gw4 = addr;
+ cfg->fc_gw_family = AF_INET;
addr_type = inet_addr_type_table(net, addr, cfg->fc_table);
if (rt->rt_flags & RTF_GATEWAY &&
addr_type == RTN_UNICAST)
if (cmd == SIOCDELRT)
return 0;
- if (rt->rt_flags & RTF_GATEWAY && !cfg->fc_gw)
+ if (rt->rt_flags & RTF_GATEWAY && !cfg->fc_gw_family)
return -EINVAL;
if (cfg->fc_scope == RT_SCOPE_NOWHERE)
[RTA_DPORT] = { .type = NLA_U16 },
};
+int fib_gw_from_via(struct fib_config *cfg, struct nlattr *nla,
+ struct netlink_ext_ack *extack)
+{
+ struct rtvia *via;
+ int alen;
+
+ if (nla_len(nla) < offsetof(struct rtvia, rtvia_addr)) {
+ NL_SET_ERR_MSG(extack, "Invalid attribute length for RTA_VIA");
+ return -EINVAL;
+ }
+
+ via = nla_data(nla);
+ alen = nla_len(nla) - offsetof(struct rtvia, rtvia_addr);
+
+ switch (via->rtvia_family) {
+ case AF_INET:
+ if (alen != sizeof(__be32)) {
+ NL_SET_ERR_MSG(extack, "Invalid IPv4 address in RTA_VIA");
+ return -EINVAL;
+ }
+ cfg->fc_gw_family = AF_INET;
+ cfg->fc_gw4 = *((__be32 *)via->rtvia_addr);
+ break;
+ case AF_INET6:
+#ifdef CONFIG_IPV6
+ if (alen != sizeof(struct in6_addr)) {
+ NL_SET_ERR_MSG(extack, "Invalid IPv6 address in RTA_VIA");
+ return -EINVAL;
+ }
+ cfg->fc_gw_family = AF_INET6;
+ cfg->fc_gw6 = *((struct in6_addr *)via->rtvia_addr);
+#else
+ NL_SET_ERR_MSG(extack, "IPv6 support not enabled in kernel");
+ return -EINVAL;
+#endif
+ break;
+ default:
+ NL_SET_ERR_MSG(extack, "Unsupported address family in RTA_VIA");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
struct nlmsghdr *nlh, struct fib_config *cfg,
struct netlink_ext_ack *extack)
{
+ bool has_gw = false, has_via = false;
struct nlattr *attr;
int err, remaining;
struct rtmsg *rtm;
cfg->fc_oif = nla_get_u32(attr);
break;
case RTA_GATEWAY:
- cfg->fc_gw = nla_get_be32(attr);
+ has_gw = true;
+ cfg->fc_gw4 = nla_get_be32(attr);
+ if (cfg->fc_gw4)
+ cfg->fc_gw_family = AF_INET;
break;
case RTA_VIA:
- NL_SET_ERR_MSG(extack, "IPv4 does not support RTA_VIA attribute");
- err = -EINVAL;
- goto errout;
+ has_via = true;
+ err = fib_gw_from_via(cfg, attr, extack);
+ if (err)
+ goto errout;
+ break;
case RTA_PRIORITY:
cfg->fc_priority = nla_get_u32(attr);
break;
}
}
+ if (has_gw && has_via) {
+ NL_SET_ERR_MSG(extack,
+ "Nexthop configuration can not contain both GATEWAY and VIA");
+ goto errout;
+ }
+
return 0;
errout:
return err;
{
/* we used to play games with refcounts, but we now use RCU */
res->fi = fi;
+ res->nhc = fib_info_nhc(fi, 0);
}
struct fib_prop {
#include <net/tcp.h>
#include <net/sock.h>
#include <net/ip_fib.h>
+#include <net/ip6_fib.h>
#include <net/netlink.h>
#include <net/nexthop.h>
#include <net/lwtunnel.h>
#include <net/fib_notifier.h>
+#include <net/addrconf.h>
#include "fib_lookup.h"
free_percpu(rtp);
}
+void fib_nh_common_release(struct fib_nh_common *nhc)
+{
+ if (nhc->nhc_dev)
+ dev_put(nhc->nhc_dev);
+
+ lwtstate_put(nhc->nhc_lwtstate);
+}
+EXPORT_SYMBOL_GPL(fib_nh_common_release);
+
+void fib_nh_release(struct net *net, struct fib_nh *fib_nh)
+{
+#ifdef CONFIG_IP_ROUTE_CLASSID
+ if (fib_nh->nh_tclassid)
+ net->ipv4.fib_num_tclassid_users--;
+#endif
+ fib_nh_common_release(&fib_nh->nh_common);
+ free_nh_exceptions(fib_nh);
+ rt_fibinfo_free_cpus(fib_nh->nh_pcpu_rth_output);
+ rt_fibinfo_free(&fib_nh->nh_rth_input);
+}
+
/* Release a nexthop info record */
static void free_fib_info_rcu(struct rcu_head *head)
{
struct fib_info *fi = container_of(head, struct fib_info, rcu);
change_nexthops(fi) {
- if (nexthop_nh->nh_dev)
- dev_put(nexthop_nh->nh_dev);
- lwtstate_put(nexthop_nh->nh_lwtstate);
- free_nh_exceptions(nexthop_nh);
- rt_fibinfo_free_cpus(nexthop_nh->nh_pcpu_rth_output);
- rt_fibinfo_free(&nexthop_nh->nh_rth_input);
+ fib_nh_release(fi->fib_net, nexthop_nh);
} endfor_nexthops(fi);
ip_fib_metrics_put(fi->fib_metrics);
return;
}
fib_info_cnt--;
-#ifdef CONFIG_IP_ROUTE_CLASSID
- change_nexthops(fi) {
- if (nexthop_nh->nh_tclassid)
- fi->fib_net->ipv4.fib_num_tclassid_users--;
- } endfor_nexthops(fi);
-#endif
+
call_rcu(&fi->rcu, free_fib_info_rcu);
}
EXPORT_SYMBOL_GPL(free_fib_info);
if (fi->fib_prefsrc)
hlist_del(&fi->fib_lhash);
change_nexthops(fi) {
- if (!nexthop_nh->nh_dev)
+ if (!nexthop_nh->fib_nh_dev)
continue;
hlist_del(&nexthop_nh->nh_hash);
} endfor_nexthops(fi)
const struct fib_nh *onh = ofi->fib_nh;
for_nexthops(fi) {
- if (nh->nh_oif != onh->nh_oif ||
- nh->nh_gw != onh->nh_gw ||
- nh->nh_scope != onh->nh_scope ||
+ if (nh->fib_nh_oif != onh->fib_nh_oif ||
+ nh->fib_nh_gw_family != onh->fib_nh_gw_family ||
+ nh->fib_nh_scope != onh->fib_nh_scope ||
#ifdef CONFIG_IP_ROUTE_MULTIPATH
- nh->nh_weight != onh->nh_weight ||
+ nh->fib_nh_weight != onh->fib_nh_weight ||
#endif
#ifdef CONFIG_IP_ROUTE_CLASSID
nh->nh_tclassid != onh->nh_tclassid ||
#endif
- lwtunnel_cmp_encap(nh->nh_lwtstate, onh->nh_lwtstate) ||
- ((nh->nh_flags ^ onh->nh_flags) & ~RTNH_COMPARE_MASK))
+ lwtunnel_cmp_encap(nh->fib_nh_lws, onh->fib_nh_lws) ||
+ ((nh->fib_nh_flags ^ onh->fib_nh_flags) & ~RTNH_COMPARE_MASK))
+ return -1;
+
+ if (nh->fib_nh_gw_family == AF_INET &&
+ nh->fib_nh_gw4 != onh->fib_nh_gw4)
return -1;
+
+ if (nh->fib_nh_gw_family == AF_INET6 &&
+ ipv6_addr_cmp(&nh->fib_nh_gw6, &onh->fib_nh_gw6))
+ return -1;
+
onh++;
} endfor_nexthops(fi);
return 0;
val ^= (__force u32)fi->fib_prefsrc;
val ^= fi->fib_priority;
for_nexthops(fi) {
- val ^= fib_devindex_hashfn(nh->nh_oif);
+ val ^= fib_devindex_hashfn(nh->fib_nh_oif);
} endfor_nexthops(fi)
return (val ^ (val >> 7) ^ (val >> 12)) & mask;
hash = fib_devindex_hashfn(dev->ifindex);
head = &fib_info_devhash[hash];
hlist_for_each_entry(nh, head, nh_hash) {
- if (nh->nh_dev == dev &&
- nh->nh_gw == gw &&
- !(nh->nh_flags & RTNH_F_DEAD)) {
+ if (nh->fib_nh_dev == dev &&
+ nh->fib_nh_gw4 == gw &&
+ !(nh->fib_nh_flags & RTNH_F_DEAD)) {
spin_unlock(&fib_info_lock);
return 0;
}
/* grab encap info */
for_nexthops(fi) {
- if (nh->nh_lwtstate) {
+ if (nh->fib_nh_lws) {
/* RTA_ENCAP_TYPE */
nh_encapsize += lwtunnel_get_encap_size(
- nh->nh_lwtstate);
+ nh->fib_nh_lws);
/* RTA_ENCAP */
nh_encapsize += nla_total_size(2);
}
struct fib_info **last_resort, int *last_idx,
int dflt)
{
+ const struct fib_nh_common *nhc = fib_info_nhc(fi, 0);
struct neighbour *n;
int state = NUD_NONE;
- n = neigh_lookup(&arp_tbl, &fi->fib_nh[0].nh_gw, fi->fib_dev);
+ if (likely(nhc->nhc_gw_family == AF_INET))
+ n = neigh_lookup(&arp_tbl, &nhc->nhc_gw.ipv4, nhc->nhc_dev);
+ else if (nhc->nhc_gw_family == AF_INET6)
+ n = neigh_lookup(ipv6_stub->nd_tbl, &nhc->nhc_gw.ipv6,
+ nhc->nhc_dev);
+ else
+ n = NULL;
+
if (n) {
state = n->nud_state;
neigh_release(n);
return 1;
}
+int fib_nh_common_init(struct fib_nh_common *nhc, struct nlattr *encap,
+ u16 encap_type, void *cfg, gfp_t gfp_flags,
+ struct netlink_ext_ack *extack)
+{
+ if (encap) {
+ struct lwtunnel_state *lwtstate;
+ int err;
+
+ if (encap_type == LWTUNNEL_ENCAP_NONE) {
+ NL_SET_ERR_MSG(extack, "LWT encap type not specified");
+ return -EINVAL;
+ }
+ err = lwtunnel_build_state(encap_type, encap, nhc->nhc_family,
+ cfg, &lwtstate, extack);
+ if (err)
+ return err;
+
+ nhc->nhc_lwtstate = lwtstate_get(lwtstate);
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fib_nh_common_init);
+
+int fib_nh_init(struct net *net, struct fib_nh *nh,
+ struct fib_config *cfg, int nh_weight,
+ struct netlink_ext_ack *extack)
+{
+ int err = -ENOMEM;
+
+ nh->fib_nh_family = AF_INET;
+
+ nh->nh_pcpu_rth_output = alloc_percpu(struct rtable __rcu *);
+ if (!nh->nh_pcpu_rth_output)
+ goto err_out;
+
+ err = fib_nh_common_init(&nh->nh_common, cfg->fc_encap,
+ cfg->fc_encap_type, cfg, GFP_KERNEL, extack);
+ if (err)
+ goto init_failure;
+
+ nh->fib_nh_oif = cfg->fc_oif;
+ nh->fib_nh_gw_family = cfg->fc_gw_family;
+ if (cfg->fc_gw_family == AF_INET)
+ nh->fib_nh_gw4 = cfg->fc_gw4;
+ else if (cfg->fc_gw_family == AF_INET6)
+ nh->fib_nh_gw6 = cfg->fc_gw6;
+
+ nh->fib_nh_flags = cfg->fc_flags;
+
+#ifdef CONFIG_IP_ROUTE_CLASSID
+ nh->nh_tclassid = cfg->fc_flow;
+ if (nh->nh_tclassid)
+ net->ipv4.fib_num_tclassid_users++;
+#endif
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+ nh->fib_nh_weight = nh_weight;
+#endif
+ return 0;
+
+init_failure:
+ rt_fibinfo_free_cpus(nh->nh_pcpu_rth_output);
+ nh->nh_pcpu_rth_output = NULL;
+err_out:
+ return err;
+}
+
#ifdef CONFIG_IP_ROUTE_MULTIPATH
static int fib_count_nexthops(struct rtnexthop *rtnh, int remaining,
int remaining, struct fib_config *cfg,
struct netlink_ext_ack *extack)
{
+ struct net *net = fi->fib_net;
+ struct fib_config fib_cfg;
int ret;
change_nexthops(fi) {
int attrlen;
+ memset(&fib_cfg, 0, sizeof(fib_cfg));
+
if (!rtnh_ok(rtnh, remaining)) {
NL_SET_ERR_MSG(extack,
"Invalid nexthop configuration - extra data after nexthop");
return -EINVAL;
}
- nexthop_nh->nh_flags =
- (cfg->fc_flags & ~0xFF) | rtnh->rtnh_flags;
- nexthop_nh->nh_oif = rtnh->rtnh_ifindex;
- nexthop_nh->nh_weight = rtnh->rtnh_hops + 1;
+ fib_cfg.fc_flags = (cfg->fc_flags & ~0xFF) | rtnh->rtnh_flags;
+ fib_cfg.fc_oif = rtnh->rtnh_ifindex;
attrlen = rtnh_attrlen(rtnh);
if (attrlen > 0) {
- struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
+ struct nlattr *nla, *nlav, *attrs = rtnh_attrs(rtnh);
nla = nla_find(attrs, attrlen, RTA_GATEWAY);
- nexthop_nh->nh_gw = nla ? nla_get_in_addr(nla) : 0;
-#ifdef CONFIG_IP_ROUTE_CLASSID
- nla = nla_find(attrs, attrlen, RTA_FLOW);
- nexthop_nh->nh_tclassid = nla ? nla_get_u32(nla) : 0;
- if (nexthop_nh->nh_tclassid)
- fi->fib_net->ipv4.fib_num_tclassid_users++;
-#endif
- nla = nla_find(attrs, attrlen, RTA_ENCAP);
+ nlav = nla_find(attrs, attrlen, RTA_VIA);
+ if (nla && nlav) {
+ NL_SET_ERR_MSG(extack,
+ "Nexthop configuration can not contain both GATEWAY and VIA");
+ return -EINVAL;
+ }
if (nla) {
- struct lwtunnel_state *lwtstate;
- struct nlattr *nla_entype;
-
- nla_entype = nla_find(attrs, attrlen,
- RTA_ENCAP_TYPE);
- if (!nla_entype) {
- NL_SET_BAD_ATTR(extack, nla);
- NL_SET_ERR_MSG(extack,
- "Encap type is missing");
- goto err_inval;
- }
-
- ret = lwtunnel_build_state(nla_get_u16(
- nla_entype),
- nla, AF_INET, cfg,
- &lwtstate, extack);
+ fib_cfg.fc_gw4 = nla_get_in_addr(nla);
+ if (fib_cfg.fc_gw4)
+ fib_cfg.fc_gw_family = AF_INET;
+ } else if (nlav) {
+ ret = fib_gw_from_via(&fib_cfg, nlav, extack);
if (ret)
goto errout;
- nexthop_nh->nh_lwtstate =
- lwtstate_get(lwtstate);
}
+
+ nla = nla_find(attrs, attrlen, RTA_FLOW);
+ if (nla)
+ fib_cfg.fc_flow = nla_get_u32(nla);
+
+ fib_cfg.fc_encap = nla_find(attrs, attrlen, RTA_ENCAP);
+ nla = nla_find(attrs, attrlen, RTA_ENCAP_TYPE);
+ if (nla)
+ fib_cfg.fc_encap_type = nla_get_u16(nla);
}
+ ret = fib_nh_init(net, nexthop_nh, &fib_cfg,
+ rtnh->rtnh_hops + 1, extack);
+ if (ret)
+ goto errout;
+
rtnh = rtnh_next(rtnh, &remaining);
} endfor_nexthops(fi);
- return 0;
-
-err_inval:
ret = -EINVAL;
-
+ if (cfg->fc_oif && fi->fib_nh->fib_nh_oif != cfg->fc_oif) {
+ NL_SET_ERR_MSG(extack,
+ "Nexthop device index does not match RTA_OIF");
+ goto errout;
+ }
+ if (cfg->fc_gw_family) {
+ if (cfg->fc_gw_family != fi->fib_nh->fib_nh_gw_family ||
+ (cfg->fc_gw_family == AF_INET &&
+ fi->fib_nh->fib_nh_gw4 != cfg->fc_gw4) ||
+ (cfg->fc_gw_family == AF_INET6 &&
+ ipv6_addr_cmp(&fi->fib_nh->fib_nh_gw6, &cfg->fc_gw6))) {
+ NL_SET_ERR_MSG(extack,
+ "Nexthop gateway does not match RTA_GATEWAY or RTA_VIA");
+ goto errout;
+ }
+ }
+#ifdef CONFIG_IP_ROUTE_CLASSID
+ if (cfg->fc_flow && fi->fib_nh->nh_tclassid != cfg->fc_flow) {
+ NL_SET_ERR_MSG(extack,
+ "Nexthop class id does not match RTA_FLOW");
+ goto errout;
+ }
+#endif
+ ret = 0;
errout:
return ret;
}
{
int total;
int w;
- struct in_device *in_dev;
if (fi->fib_nhs < 2)
return;
total = 0;
for_nexthops(fi) {
- if (nh->nh_flags & RTNH_F_DEAD)
+ if (nh->fib_nh_flags & RTNH_F_DEAD)
continue;
- in_dev = __in_dev_get_rtnl(nh->nh_dev);
-
- if (in_dev &&
- IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
- nh->nh_flags & RTNH_F_LINKDOWN)
+ if (ip_ignore_linkdown(nh->fib_nh_dev) &&
+ nh->fib_nh_flags & RTNH_F_LINKDOWN)
continue;
- total += nh->nh_weight;
+ total += nh->fib_nh_weight;
} endfor_nexthops(fi);
w = 0;
change_nexthops(fi) {
int upper_bound;
- in_dev = __in_dev_get_rtnl(nexthop_nh->nh_dev);
-
- if (nexthop_nh->nh_flags & RTNH_F_DEAD) {
+ if (nexthop_nh->fib_nh_flags & RTNH_F_DEAD) {
upper_bound = -1;
- } else if (in_dev &&
- IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
- nexthop_nh->nh_flags & RTNH_F_LINKDOWN) {
+ } else if (ip_ignore_linkdown(nexthop_nh->fib_nh_dev) &&
+ nexthop_nh->fib_nh_flags & RTNH_F_LINKDOWN) {
upper_bound = -1;
} else {
- w += nexthop_nh->nh_weight;
+ w += nexthop_nh->fib_nh_weight;
upper_bound = DIV_ROUND_CLOSEST_ULL((u64)w << 31,
total) - 1;
}
- atomic_set(&nexthop_nh->nh_upper_bound, upper_bound);
+ atomic_set(&nexthop_nh->fib_nh_upper_bound, upper_bound);
} endfor_nexthops(fi);
}
#else /* CONFIG_IP_ROUTE_MULTIPATH */
+static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh,
+ int remaining, struct fib_config *cfg,
+ struct netlink_ext_ack *extack)
+{
+ NL_SET_ERR_MSG(extack, "Multipath support not enabled in kernel");
+
+ return -EINVAL;
+}
+
#define fib_rebalance(fi) do { } while (0)
#endif /* CONFIG_IP_ROUTE_MULTIPATH */
ret = lwtunnel_build_state(encap_type, encap, AF_INET,
cfg, &lwtstate, extack);
if (!ret) {
- result = lwtunnel_cmp_encap(lwtstate, nh->nh_lwtstate);
+ result = lwtunnel_cmp_encap(lwtstate, nh->fib_nh_lws);
lwtstate_free(lwtstate);
}
if (cfg->fc_priority && cfg->fc_priority != fi->fib_priority)
return 1;
- if (cfg->fc_oif || cfg->fc_gw) {
+ if (cfg->fc_oif || cfg->fc_gw_family) {
if (cfg->fc_encap) {
if (fib_encap_match(cfg->fc_encap_type, cfg->fc_encap,
fi->fib_nh, cfg, extack))
cfg->fc_flow != fi->fib_nh->nh_tclassid)
return 1;
#endif
- if ((!cfg->fc_oif || cfg->fc_oif == fi->fib_nh->nh_oif) &&
- (!cfg->fc_gw || cfg->fc_gw == fi->fib_nh->nh_gw))
- return 0;
- return 1;
+ if ((cfg->fc_oif && cfg->fc_oif != fi->fib_nh->fib_nh_oif) ||
+ (cfg->fc_gw_family &&
+ cfg->fc_gw_family != fi->fib_nh->fib_nh_gw_family))
+ return 1;
+
+ if (cfg->fc_gw_family == AF_INET &&
+ cfg->fc_gw4 != fi->fib_nh->fib_nh_gw4)
+ return 1;
+
+ if (cfg->fc_gw_family == AF_INET6 &&
+ ipv6_addr_cmp(&cfg->fc_gw6, &fi->fib_nh->fib_nh_gw6))
+ return 1;
+
+ return 0;
}
#ifdef CONFIG_IP_ROUTE_MULTIPATH
if (!rtnh_ok(rtnh, remaining))
return -EINVAL;
- if (rtnh->rtnh_ifindex && rtnh->rtnh_ifindex != nh->nh_oif)
+ if (rtnh->rtnh_ifindex && rtnh->rtnh_ifindex != nh->fib_nh_oif)
return 1;
attrlen = rtnh_attrlen(rtnh);
if (attrlen > 0) {
- struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
+ struct nlattr *nla, *nlav, *attrs = rtnh_attrs(rtnh);
nla = nla_find(attrs, attrlen, RTA_GATEWAY);
- if (nla && nla_get_in_addr(nla) != nh->nh_gw)
- return 1;
+ nlav = nla_find(attrs, attrlen, RTA_VIA);
+ if (nla && nlav) {
+ NL_SET_ERR_MSG(extack,
+ "Nexthop configuration can not contain both GATEWAY and VIA");
+ return -EINVAL;
+ }
+
+ if (nla) {
+ if (nh->fib_nh_gw_family != AF_INET ||
+ nla_get_in_addr(nla) != nh->fib_nh_gw4)
+ return 1;
+ } else if (nlav) {
+ struct fib_config cfg2;
+ int err;
+
+ err = fib_gw_from_via(&cfg2, nlav, extack);
+ if (err)
+ return err;
+
+ switch (nh->fib_nh_gw_family) {
+ case AF_INET:
+ if (cfg2.fc_gw_family != AF_INET ||
+ cfg2.fc_gw4 != nh->fib_nh_gw4)
+ return 1;
+ break;
+ case AF_INET6:
+ if (cfg2.fc_gw_family != AF_INET6 ||
+ ipv6_addr_cmp(&cfg2.fc_gw6,
+ &nh->fib_nh_gw6))
+ return 1;
+ break;
+ }
+ }
+
#ifdef CONFIG_IP_ROUTE_CLASSID
nla = nla_find(attrs, attrlen, RTA_FLOW);
if (nla && nla_get_u32(nla) != nh->nh_tclassid)
return true;
}
+static int fib_check_nh_v6_gw(struct net *net, struct fib_nh *nh,
+ u32 table, struct netlink_ext_ack *extack)
+{
+ struct fib6_config cfg = {
+ .fc_table = table,
+ .fc_flags = nh->fib_nh_flags | RTF_GATEWAY,
+ .fc_ifindex = nh->fib_nh_oif,
+ .fc_gateway = nh->fib_nh_gw6,
+ };
+ struct fib6_nh fib6_nh = {};
+ int err;
+
+ err = ipv6_stub->fib6_nh_init(net, &fib6_nh, &cfg, GFP_KERNEL, extack);
+ if (!err) {
+ nh->fib_nh_dev = fib6_nh.fib_nh_dev;
+ dev_hold(nh->fib_nh_dev);
+ nh->fib_nh_oif = nh->fib_nh_dev->ifindex;
+ nh->fib_nh_scope = RT_SCOPE_LINK;
+
+ ipv6_stub->fib6_nh_release(&fib6_nh);
+ }
+
+ return err;
+}
/*
* Picture
* |
* |-> {local prefix} (terminal node)
*/
-static int fib_check_nh(struct fib_config *cfg, struct fib_nh *nh,
- struct netlink_ext_ack *extack)
+static int fib_check_nh_v4_gw(struct net *net, struct fib_nh *nh, u32 table,
+ u8 scope, struct netlink_ext_ack *extack)
{
- int err = 0;
- struct net *net;
struct net_device *dev;
+ struct fib_result res;
+ int err;
- net = cfg->fc_nlinfo.nl_net;
- if (nh->nh_gw) {
- struct fib_result res;
-
- if (nh->nh_flags & RTNH_F_ONLINK) {
- unsigned int addr_type;
+ if (nh->fib_nh_flags & RTNH_F_ONLINK) {
+ unsigned int addr_type;
- if (cfg->fc_scope >= RT_SCOPE_LINK) {
- NL_SET_ERR_MSG(extack,
- "Nexthop has invalid scope");
- return -EINVAL;
- }
- dev = __dev_get_by_index(net, nh->nh_oif);
- if (!dev) {
- NL_SET_ERR_MSG(extack, "Nexthop device required for onlink");
- return -ENODEV;
- }
- if (!(dev->flags & IFF_UP)) {
- NL_SET_ERR_MSG(extack,
- "Nexthop device is not up");
- return -ENETDOWN;
- }
- addr_type = inet_addr_type_dev_table(net, dev, nh->nh_gw);
- if (addr_type != RTN_UNICAST) {
- NL_SET_ERR_MSG(extack,
- "Nexthop has invalid gateway");
- return -EINVAL;
- }
- if (!netif_carrier_ok(dev))
- nh->nh_flags |= RTNH_F_LINKDOWN;
- nh->nh_dev = dev;
- dev_hold(dev);
- nh->nh_scope = RT_SCOPE_LINK;
- return 0;
+ if (scope >= RT_SCOPE_LINK) {
+ NL_SET_ERR_MSG(extack, "Nexthop has invalid scope");
+ return -EINVAL;
}
- rcu_read_lock();
- {
- struct fib_table *tbl = NULL;
- struct flowi4 fl4 = {
- .daddr = nh->nh_gw,
- .flowi4_scope = cfg->fc_scope + 1,
- .flowi4_oif = nh->nh_oif,
- .flowi4_iif = LOOPBACK_IFINDEX,
- };
-
- /* It is not necessary, but requires a bit of thinking */
- if (fl4.flowi4_scope < RT_SCOPE_LINK)
- fl4.flowi4_scope = RT_SCOPE_LINK;
-
- if (cfg->fc_table)
- tbl = fib_get_table(net, cfg->fc_table);
-
- if (tbl)
- err = fib_table_lookup(tbl, &fl4, &res,
- FIB_LOOKUP_IGNORE_LINKSTATE |
- FIB_LOOKUP_NOREF);
-
- /* on error or if no table given do full lookup. This
- * is needed for example when nexthops are in the local
- * table rather than the given table
- */
- if (!tbl || err) {
- err = fib_lookup(net, &fl4, &res,
- FIB_LOOKUP_IGNORE_LINKSTATE);
- }
-
- if (err) {
- NL_SET_ERR_MSG(extack,
- "Nexthop has invalid gateway");
- rcu_read_unlock();
- return err;
- }
+ dev = __dev_get_by_index(net, nh->fib_nh_oif);
+ if (!dev) {
+ NL_SET_ERR_MSG(extack, "Nexthop device required for onlink");
+ return -ENODEV;
}
- err = -EINVAL;
- if (res.type != RTN_UNICAST && res.type != RTN_LOCAL) {
- NL_SET_ERR_MSG(extack, "Nexthop has invalid gateway");
- goto out;
+ if (!(dev->flags & IFF_UP)) {
+ NL_SET_ERR_MSG(extack, "Nexthop device is not up");
+ return -ENETDOWN;
}
- nh->nh_scope = res.scope;
- nh->nh_oif = FIB_RES_OIF(res);
- nh->nh_dev = dev = FIB_RES_DEV(res);
- if (!dev) {
- NL_SET_ERR_MSG(extack,
- "No egress device for nexthop gateway");
- goto out;
+ addr_type = inet_addr_type_dev_table(net, dev, nh->fib_nh_gw4);
+ if (addr_type != RTN_UNICAST) {
+ NL_SET_ERR_MSG(extack, "Nexthop has invalid gateway");
+ return -EINVAL;
}
- dev_hold(dev);
if (!netif_carrier_ok(dev))
- nh->nh_flags |= RTNH_F_LINKDOWN;
- err = (dev->flags & IFF_UP) ? 0 : -ENETDOWN;
- } else {
- struct in_device *in_dev;
-
- if (nh->nh_flags & (RTNH_F_PERVASIVE | RTNH_F_ONLINK)) {
- NL_SET_ERR_MSG(extack,
- "Invalid flags for nexthop - PERVASIVE and ONLINK can not be set");
- return -EINVAL;
+ nh->fib_nh_flags |= RTNH_F_LINKDOWN;
+ nh->fib_nh_dev = dev;
+ dev_hold(dev);
+ nh->fib_nh_scope = RT_SCOPE_LINK;
+ return 0;
+ }
+ rcu_read_lock();
+ {
+ struct fib_table *tbl = NULL;
+ struct flowi4 fl4 = {
+ .daddr = nh->fib_nh_gw4,
+ .flowi4_scope = scope + 1,
+ .flowi4_oif = nh->fib_nh_oif,
+ .flowi4_iif = LOOPBACK_IFINDEX,
+ };
+
+ /* It is not necessary, but requires a bit of thinking */
+ if (fl4.flowi4_scope < RT_SCOPE_LINK)
+ fl4.flowi4_scope = RT_SCOPE_LINK;
+
+ if (table)
+ tbl = fib_get_table(net, table);
+
+ if (tbl)
+ err = fib_table_lookup(tbl, &fl4, &res,
+ FIB_LOOKUP_IGNORE_LINKSTATE |
+ FIB_LOOKUP_NOREF);
+
+ /* on error or if no table given do full lookup. This
+ * is needed for example when nexthops are in the local
+ * table rather than the given table
+ */
+ if (!tbl || err) {
+ err = fib_lookup(net, &fl4, &res,
+ FIB_LOOKUP_IGNORE_LINKSTATE);
}
- rcu_read_lock();
- err = -ENODEV;
- in_dev = inetdev_by_index(net, nh->nh_oif);
- if (!in_dev)
- goto out;
- err = -ENETDOWN;
- if (!(in_dev->dev->flags & IFF_UP)) {
- NL_SET_ERR_MSG(extack, "Device for nexthop is not up");
+
+ if (err) {
+ NL_SET_ERR_MSG(extack, "Nexthop has invalid gateway");
goto out;
}
- nh->nh_dev = in_dev->dev;
- dev_hold(nh->nh_dev);
- nh->nh_scope = RT_SCOPE_HOST;
- if (!netif_carrier_ok(nh->nh_dev))
- nh->nh_flags |= RTNH_F_LINKDOWN;
- err = 0;
}
+
+ err = -EINVAL;
+ if (res.type != RTN_UNICAST && res.type != RTN_LOCAL) {
+ NL_SET_ERR_MSG(extack, "Nexthop has invalid gateway");
+ goto out;
+ }
+ nh->fib_nh_scope = res.scope;
+ nh->fib_nh_oif = FIB_RES_OIF(res);
+ nh->fib_nh_dev = dev = FIB_RES_DEV(res);
+ if (!dev) {
+ NL_SET_ERR_MSG(extack,
+ "No egress device for nexthop gateway");
+ goto out;
+ }
+ dev_hold(dev);
+ if (!netif_carrier_ok(dev))
+ nh->fib_nh_flags |= RTNH_F_LINKDOWN;
+ err = (dev->flags & IFF_UP) ? 0 : -ENETDOWN;
+out:
+ rcu_read_unlock();
+ return err;
+}
+
+static int fib_check_nh_nongw(struct net *net, struct fib_nh *nh,
+ struct netlink_ext_ack *extack)
+{
+ struct in_device *in_dev;
+ int err;
+
+ if (nh->fib_nh_flags & (RTNH_F_PERVASIVE | RTNH_F_ONLINK)) {
+ NL_SET_ERR_MSG(extack,
+ "Invalid flags for nexthop - PERVASIVE and ONLINK can not be set");
+ return -EINVAL;
+ }
+
+ rcu_read_lock();
+
+ err = -ENODEV;
+ in_dev = inetdev_by_index(net, nh->fib_nh_oif);
+ if (!in_dev)
+ goto out;
+ err = -ENETDOWN;
+ if (!(in_dev->dev->flags & IFF_UP)) {
+ NL_SET_ERR_MSG(extack, "Device for nexthop is not up");
+ goto out;
+ }
+
+ nh->fib_nh_dev = in_dev->dev;
+ dev_hold(nh->fib_nh_dev);
+ nh->fib_nh_scope = RT_SCOPE_HOST;
+ if (!netif_carrier_ok(nh->fib_nh_dev))
+ nh->fib_nh_flags |= RTNH_F_LINKDOWN;
+ err = 0;
out:
rcu_read_unlock();
return err;
}
+static int fib_check_nh(struct fib_config *cfg, struct fib_nh *nh,
+ struct netlink_ext_ack *extack)
+{
+ struct net *net = cfg->fc_nlinfo.nl_net;
+ u32 table = cfg->fc_table;
+ int err;
+
+ if (nh->fib_nh_gw_family == AF_INET)
+ err = fib_check_nh_v4_gw(net, nh, table, cfg->fc_scope, extack);
+ else if (nh->fib_nh_gw_family == AF_INET6)
+ err = fib_check_nh_v6_gw(net, nh, table, extack);
+ else
+ err = fib_check_nh_nongw(net, nh, extack);
+
+ return err;
+}
+
static inline unsigned int fib_laddr_hashfn(__be32 val)
{
unsigned int mask = (fib_info_hash_size - 1);
__be32 fib_info_update_nh_saddr(struct net *net, struct fib_nh *nh)
{
- nh->nh_saddr = inet_select_addr(nh->nh_dev,
- nh->nh_gw,
+ nh->nh_saddr = inet_select_addr(nh->fib_nh_dev,
+ nh->fib_nh_gw4,
nh->nh_parent->fib_scope);
nh->nh_saddr_genid = atomic_read(&net->ipv4.dev_addr_genid);
return nh->nh_saddr;
}
+__be32 fib_result_prefsrc(struct net *net, struct fib_result *res)
+{
+ struct fib_nh_common *nhc = res->nhc;
+ struct fib_nh *nh;
+
+ if (res->fi->fib_prefsrc)
+ return res->fi->fib_prefsrc;
+
+ nh = container_of(nhc, struct fib_nh, nh_common);
+ if (nh->nh_saddr_genid == atomic_read(&net->ipv4.dev_addr_genid))
+ return nh->nh_saddr;
+
+ return fib_info_update_nh_saddr(net, nh);
+}
+
static bool fib_valid_prefsrc(struct fib_config *cfg, __be32 fib_prefsrc)
{
if (cfg->fc_type != RTN_LOCAL || !cfg->fc_dst ||
fi->fib_nhs = nhs;
change_nexthops(fi) {
nexthop_nh->nh_parent = fi;
- nexthop_nh->nh_pcpu_rth_output = alloc_percpu(struct rtable __rcu *);
- if (!nexthop_nh->nh_pcpu_rth_output)
- goto failure;
} endfor_nexthops(fi)
- if (cfg->fc_mp) {
-#ifdef CONFIG_IP_ROUTE_MULTIPATH
+ if (cfg->fc_mp)
err = fib_get_nhs(fi, cfg->fc_mp, cfg->fc_mp_len, cfg, extack);
- if (err != 0)
- goto failure;
- if (cfg->fc_oif && fi->fib_nh->nh_oif != cfg->fc_oif) {
- NL_SET_ERR_MSG(extack,
- "Nexthop device index does not match RTA_OIF");
- goto err_inval;
- }
- if (cfg->fc_gw && fi->fib_nh->nh_gw != cfg->fc_gw) {
- NL_SET_ERR_MSG(extack,
- "Nexthop gateway does not match RTA_GATEWAY");
- goto err_inval;
- }
-#ifdef CONFIG_IP_ROUTE_CLASSID
- if (cfg->fc_flow && fi->fib_nh->nh_tclassid != cfg->fc_flow) {
- NL_SET_ERR_MSG(extack,
- "Nexthop class id does not match RTA_FLOW");
- goto err_inval;
- }
-#endif
-#else
- NL_SET_ERR_MSG(extack,
- "Multipath support not enabled in kernel");
- goto err_inval;
-#endif
- } else {
- struct fib_nh *nh = fi->fib_nh;
-
- if (cfg->fc_encap) {
- struct lwtunnel_state *lwtstate;
-
- if (cfg->fc_encap_type == LWTUNNEL_ENCAP_NONE) {
- NL_SET_ERR_MSG(extack,
- "LWT encap type not specified");
- goto err_inval;
- }
- err = lwtunnel_build_state(cfg->fc_encap_type,
- cfg->fc_encap, AF_INET, cfg,
- &lwtstate, extack);
- if (err)
- goto failure;
+ else
+ err = fib_nh_init(net, fi->fib_nh, cfg, 1, extack);
- nh->nh_lwtstate = lwtstate_get(lwtstate);
- }
- nh->nh_oif = cfg->fc_oif;
- nh->nh_gw = cfg->fc_gw;
- nh->nh_flags = cfg->fc_flags;
-#ifdef CONFIG_IP_ROUTE_CLASSID
- nh->nh_tclassid = cfg->fc_flow;
- if (nh->nh_tclassid)
- fi->fib_net->ipv4.fib_num_tclassid_users++;
-#endif
-#ifdef CONFIG_IP_ROUTE_MULTIPATH
- nh->nh_weight = 1;
-#endif
- }
+ if (err != 0)
+ goto failure;
if (fib_props[cfg->fc_type].error) {
- if (cfg->fc_gw || cfg->fc_oif || cfg->fc_mp) {
+ if (cfg->fc_gw_family || cfg->fc_oif || cfg->fc_mp) {
NL_SET_ERR_MSG(extack,
"Gateway, device and multipath can not be specified for this route type");
goto err_inval;
"Route with host scope can not have multiple nexthops");
goto err_inval;
}
- if (nh->nh_gw) {
+ if (nh->fib_nh_gw_family) {
NL_SET_ERR_MSG(extack,
"Route with host scope can not have a gateway");
goto err_inval;
}
- nh->nh_scope = RT_SCOPE_NOWHERE;
- nh->nh_dev = dev_get_by_index(net, fi->fib_nh->nh_oif);
+ nh->fib_nh_scope = RT_SCOPE_NOWHERE;
+ nh->fib_nh_dev = dev_get_by_index(net, fi->fib_nh->fib_nh_oif);
err = -ENODEV;
- if (!nh->nh_dev)
+ if (!nh->fib_nh_dev)
goto failure;
} else {
int linkdown = 0;
err = fib_check_nh(cfg, nexthop_nh, extack);
if (err != 0)
goto failure;
- if (nexthop_nh->nh_flags & RTNH_F_LINKDOWN)
+ if (nexthop_nh->fib_nh_flags & RTNH_F_LINKDOWN)
linkdown++;
} endfor_nexthops(fi)
if (linkdown == fi->fib_nhs)
change_nexthops(fi) {
fib_info_update_nh_saddr(net, nexthop_nh);
+ if (nexthop_nh->fib_nh_gw_family == AF_INET6)
+ fi->fib_nh_is_v6 = true;
} endfor_nexthops(fi)
fib_rebalance(fi);
struct hlist_head *head;
unsigned int hash;
- if (!nexthop_nh->nh_dev)
+ if (!nexthop_nh->fib_nh_dev)
continue;
- hash = fib_devindex_hashfn(nexthop_nh->nh_dev->ifindex);
+ hash = fib_devindex_hashfn(nexthop_nh->fib_nh_dev->ifindex);
head = &fib_info_devhash[hash];
hlist_add_head(&nexthop_nh->nh_hash, head);
} endfor_nexthops(fi)
return ERR_PTR(err);
}
+int fib_nexthop_info(struct sk_buff *skb, const struct fib_nh_common *nhc,
+ unsigned int *flags, bool skip_oif)
+{
+ if (nhc->nhc_flags & RTNH_F_DEAD)
+ *flags |= RTNH_F_DEAD;
+
+ if (nhc->nhc_flags & RTNH_F_LINKDOWN) {
+ *flags |= RTNH_F_LINKDOWN;
+
+ rcu_read_lock();
+ switch (nhc->nhc_family) {
+ case AF_INET:
+ if (ip_ignore_linkdown(nhc->nhc_dev))
+ *flags |= RTNH_F_DEAD;
+ break;
+ case AF_INET6:
+ if (ip6_ignore_linkdown(nhc->nhc_dev))
+ *flags |= RTNH_F_DEAD;
+ break;
+ }
+ rcu_read_unlock();
+ }
+
+ switch (nhc->nhc_gw_family) {
+ case AF_INET:
+ if (nla_put_in_addr(skb, RTA_GATEWAY, nhc->nhc_gw.ipv4))
+ goto nla_put_failure;
+ break;
+ case AF_INET6:
+ /* if gateway family does not match nexthop family
+ * gateway is encoded as RTA_VIA
+ */
+ if (nhc->nhc_gw_family != nhc->nhc_family) {
+ int alen = sizeof(struct in6_addr);
+ struct nlattr *nla;
+ struct rtvia *via;
+
+ nla = nla_reserve(skb, RTA_VIA, alen + 2);
+ if (!nla)
+ goto nla_put_failure;
+
+ via = nla_data(nla);
+ via->rtvia_family = AF_INET6;
+ memcpy(via->rtvia_addr, &nhc->nhc_gw.ipv6, alen);
+ } else if (nla_put_in6_addr(skb, RTA_GATEWAY,
+ &nhc->nhc_gw.ipv6) < 0) {
+ goto nla_put_failure;
+ }
+ break;
+ }
+
+ *flags |= (nhc->nhc_flags & RTNH_F_ONLINK);
+ if (nhc->nhc_flags & RTNH_F_OFFLOAD)
+ *flags |= RTNH_F_OFFLOAD;
+
+ if (!skip_oif && nhc->nhc_dev &&
+ nla_put_u32(skb, RTA_OIF, nhc->nhc_dev->ifindex))
+ goto nla_put_failure;
+
+ if (nhc->nhc_lwtstate &&
+ lwtunnel_fill_encap(skb, nhc->nhc_lwtstate) < 0)
+ goto nla_put_failure;
+
+ return 0;
+
+nla_put_failure:
+ return -EMSGSIZE;
+}
+EXPORT_SYMBOL_GPL(fib_nexthop_info);
+
+#if IS_ENABLED(CONFIG_IP_ROUTE_MULTIPATH) || IS_ENABLED(CONFIG_IPV6)
+int fib_add_nexthop(struct sk_buff *skb, const struct fib_nh_common *nhc,
+ int nh_weight)
+{
+ const struct net_device *dev = nhc->nhc_dev;
+ struct rtnexthop *rtnh;
+ unsigned int flags = 0;
+
+ rtnh = nla_reserve_nohdr(skb, sizeof(*rtnh));
+ if (!rtnh)
+ goto nla_put_failure;
+
+ rtnh->rtnh_hops = nh_weight - 1;
+ rtnh->rtnh_ifindex = dev ? dev->ifindex : 0;
+
+ if (fib_nexthop_info(skb, nhc, &flags, true) < 0)
+ goto nla_put_failure;
+
+ rtnh->rtnh_flags = flags;
+
+ /* length of rtnetlink header + attributes */
+ rtnh->rtnh_len = nlmsg_get_pos(skb) - (void *)rtnh;
+
+ return 0;
+
+nla_put_failure:
+ return -EMSGSIZE;
+}
+EXPORT_SYMBOL_GPL(fib_add_nexthop);
+#endif
+
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+static int fib_add_multipath(struct sk_buff *skb, struct fib_info *fi)
+{
+ struct nlattr *mp;
+
+ mp = nla_nest_start(skb, RTA_MULTIPATH);
+ if (!mp)
+ goto nla_put_failure;
+
+ for_nexthops(fi) {
+ if (fib_add_nexthop(skb, &nh->nh_common, nh->fib_nh_weight) < 0)
+ goto nla_put_failure;
+#ifdef CONFIG_IP_ROUTE_CLASSID
+ if (nh->nh_tclassid &&
+ nla_put_u32(skb, RTA_FLOW, nh->nh_tclassid))
+ goto nla_put_failure;
+#endif
+ } endfor_nexthops(fi);
+
+ nla_nest_end(skb, mp);
+
+ return 0;
+
+nla_put_failure:
+ return -EMSGSIZE;
+}
+#else
+static int fib_add_multipath(struct sk_buff *skb, struct fib_info *fi)
+{
+ return 0;
+}
+#endif
+
int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
u32 tb_id, u8 type, __be32 dst, int dst_len, u8 tos,
struct fib_info *fi, unsigned int flags)
nla_put_in_addr(skb, RTA_PREFSRC, fi->fib_prefsrc))
goto nla_put_failure;
if (fi->fib_nhs == 1) {
- if (fi->fib_nh->nh_gw &&
- nla_put_in_addr(skb, RTA_GATEWAY, fi->fib_nh->nh_gw))
- goto nla_put_failure;
- if (fi->fib_nh->nh_oif &&
- nla_put_u32(skb, RTA_OIF, fi->fib_nh->nh_oif))
+ struct fib_nh *nh = &fi->fib_nh[0];
+ unsigned int flags = 0;
+
+ if (fib_nexthop_info(skb, &nh->nh_common, &flags, false) < 0)
goto nla_put_failure;
- if (fi->fib_nh->nh_flags & RTNH_F_LINKDOWN) {
- struct in_device *in_dev;
-
- rcu_read_lock();
- in_dev = __in_dev_get_rcu(fi->fib_nh->nh_dev);
- if (in_dev &&
- IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev))
- rtm->rtm_flags |= RTNH_F_DEAD;
- rcu_read_unlock();
- }
- if (fi->fib_nh->nh_flags & RTNH_F_OFFLOAD)
- rtm->rtm_flags |= RTNH_F_OFFLOAD;
+
+ rtm->rtm_flags = flags;
#ifdef CONFIG_IP_ROUTE_CLASSID
- if (fi->fib_nh[0].nh_tclassid &&
- nla_put_u32(skb, RTA_FLOW, fi->fib_nh[0].nh_tclassid))
+ if (nh->nh_tclassid &&
+ nla_put_u32(skb, RTA_FLOW, nh->nh_tclassid))
goto nla_put_failure;
#endif
- if (fi->fib_nh->nh_lwtstate &&
- lwtunnel_fill_encap(skb, fi->fib_nh->nh_lwtstate) < 0)
+ } else {
+ if (fib_add_multipath(skb, fi) < 0)
goto nla_put_failure;
}
-#ifdef CONFIG_IP_ROUTE_MULTIPATH
- if (fi->fib_nhs > 1) {
- struct rtnexthop *rtnh;
- struct nlattr *mp;
- mp = nla_nest_start(skb, RTA_MULTIPATH);
- if (!mp)
- goto nla_put_failure;
-
- for_nexthops(fi) {
- rtnh = nla_reserve_nohdr(skb, sizeof(*rtnh));
- if (!rtnh)
- goto nla_put_failure;
-
- rtnh->rtnh_flags = nh->nh_flags & 0xFF;
- if (nh->nh_flags & RTNH_F_LINKDOWN) {
- struct in_device *in_dev;
-
- rcu_read_lock();
- in_dev = __in_dev_get_rcu(nh->nh_dev);
- if (in_dev &&
- IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev))
- rtnh->rtnh_flags |= RTNH_F_DEAD;
- rcu_read_unlock();
- }
- rtnh->rtnh_hops = nh->nh_weight - 1;
- rtnh->rtnh_ifindex = nh->nh_oif;
-
- if (nh->nh_gw &&
- nla_put_in_addr(skb, RTA_GATEWAY, nh->nh_gw))
- goto nla_put_failure;
-#ifdef CONFIG_IP_ROUTE_CLASSID
- if (nh->nh_tclassid &&
- nla_put_u32(skb, RTA_FLOW, nh->nh_tclassid))
- goto nla_put_failure;
-#endif
- if (nh->nh_lwtstate &&
- lwtunnel_fill_encap(skb, nh->nh_lwtstate) < 0)
- goto nla_put_failure;
-
- /* length of rtnetlink header + attributes */
- rtnh->rtnh_len = nlmsg_get_pos(skb) - (void *) rtnh;
- } endfor_nexthops(fi);
-
- nla_nest_end(skb, mp);
- }
-#endif
nlmsg_end(skb, nlh);
return 0;
return ret;
}
-static int call_fib_nh_notifiers(struct fib_nh *fib_nh,
+static int call_fib_nh_notifiers(struct fib_nh *nh,
enum fib_event_type event_type)
{
- struct in_device *in_dev = __in_dev_get_rtnl(fib_nh->nh_dev);
+ bool ignore_link_down = ip_ignore_linkdown(nh->fib_nh_dev);
struct fib_nh_notifier_info info = {
- .fib_nh = fib_nh,
+ .fib_nh = nh,
};
switch (event_type) {
case FIB_EVENT_NH_ADD:
- if (fib_nh->nh_flags & RTNH_F_DEAD)
+ if (nh->fib_nh_flags & RTNH_F_DEAD)
break;
- if (IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
- fib_nh->nh_flags & RTNH_F_LINKDOWN)
+ if (ignore_link_down && nh->fib_nh_flags & RTNH_F_LINKDOWN)
break;
- return call_fib4_notifiers(dev_net(fib_nh->nh_dev), event_type,
+ return call_fib4_notifiers(dev_net(nh->fib_nh_dev), event_type,
&info.info);
case FIB_EVENT_NH_DEL:
- if ((in_dev && IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
- fib_nh->nh_flags & RTNH_F_LINKDOWN) ||
- (fib_nh->nh_flags & RTNH_F_DEAD))
- return call_fib4_notifiers(dev_net(fib_nh->nh_dev),
+ if ((ignore_link_down && nh->fib_nh_flags & RTNH_F_LINKDOWN) ||
+ (nh->fib_nh_flags & RTNH_F_DEAD))
+ return call_fib4_notifiers(dev_net(nh->fib_nh_dev),
event_type, &info.info);
default:
break;
struct fib_nh *nh;
hlist_for_each_entry(nh, head, nh_hash) {
- if (nh->nh_dev == dev)
+ if (nh->fib_nh_dev == dev)
nh_update_mtu(nh, dev->mtu, orig_mtu);
}
}
int dead;
BUG_ON(!fi->fib_nhs);
- if (nh->nh_dev != dev || fi == prev_fi)
+ if (nh->fib_nh_dev != dev || fi == prev_fi)
continue;
prev_fi = fi;
dead = 0;
change_nexthops(fi) {
- if (nexthop_nh->nh_flags & RTNH_F_DEAD)
+ if (nexthop_nh->fib_nh_flags & RTNH_F_DEAD)
dead++;
- else if (nexthop_nh->nh_dev == dev &&
- nexthop_nh->nh_scope != scope) {
+ else if (nexthop_nh->fib_nh_dev == dev &&
+ nexthop_nh->fib_nh_scope != scope) {
switch (event) {
case NETDEV_DOWN:
case NETDEV_UNREGISTER:
- nexthop_nh->nh_flags |= RTNH_F_DEAD;
+ nexthop_nh->fib_nh_flags |= RTNH_F_DEAD;
/* fall through */
case NETDEV_CHANGE:
- nexthop_nh->nh_flags |= RTNH_F_LINKDOWN;
+ nexthop_nh->fib_nh_flags |= RTNH_F_LINKDOWN;
break;
}
call_fib_nh_notifiers(nexthop_nh,
}
#ifdef CONFIG_IP_ROUTE_MULTIPATH
if (event == NETDEV_UNREGISTER &&
- nexthop_nh->nh_dev == dev) {
+ nexthop_nh->fib_nh_dev == dev) {
dead = fi->fib_nhs;
break;
}
if (next_fi->fib_scope != res->scope ||
fa->fa_type != RTN_UNICAST)
continue;
- if (!next_fi->fib_nh[0].nh_gw ||
- next_fi->fib_nh[0].nh_scope != RT_SCOPE_LINK)
+ if (!next_fi->fib_nh[0].fib_nh_gw4 ||
+ next_fi->fib_nh[0].fib_nh_scope != RT_SCOPE_LINK)
continue;
fib_alias_accessed(fa);
int alive;
BUG_ON(!fi->fib_nhs);
- if (nh->nh_dev != dev || fi == prev_fi)
+ if (nh->fib_nh_dev != dev || fi == prev_fi)
continue;
prev_fi = fi;
alive = 0;
change_nexthops(fi) {
- if (!(nexthop_nh->nh_flags & nh_flags)) {
+ if (!(nexthop_nh->fib_nh_flags & nh_flags)) {
alive++;
continue;
}
- if (!nexthop_nh->nh_dev ||
- !(nexthop_nh->nh_dev->flags & IFF_UP))
+ if (!nexthop_nh->fib_nh_dev ||
+ !(nexthop_nh->fib_nh_dev->flags & IFF_UP))
continue;
- if (nexthop_nh->nh_dev != dev ||
+ if (nexthop_nh->fib_nh_dev != dev ||
!__in_dev_get_rtnl(dev))
continue;
alive++;
- nexthop_nh->nh_flags &= ~nh_flags;
+ nexthop_nh->fib_nh_flags &= ~nh_flags;
call_fib_nh_notifiers(nexthop_nh, FIB_EVENT_NH_ADD);
} endfor_nexthops(fi)
{
int state = NUD_REACHABLE;
- if (nh->nh_scope == RT_SCOPE_LINK) {
+ if (nh->fib_nh_scope == RT_SCOPE_LINK) {
struct neighbour *n;
rcu_read_lock_bh();
- n = __ipv4_neigh_lookup_noref(nh->nh_dev,
- (__force u32)nh->nh_gw);
+ if (likely(nh->fib_nh_gw_family == AF_INET))
+ n = __ipv4_neigh_lookup_noref(nh->fib_nh_dev,
+ (__force u32)nh->fib_nh_gw4);
+ else if (nh->fib_nh_gw_family == AF_INET6)
+ n = __ipv6_neigh_lookup_noref_stub(nh->fib_nh_dev,
+ &nh->fib_nh_gw6);
+ else
+ n = NULL;
if (n)
state = n->nud_state;
struct net *net = fi->fib_net;
bool first = false;
- for_nexthops(fi) {
+ change_nexthops(fi) {
if (net->ipv4.sysctl_fib_multipath_use_neigh) {
- if (!fib_good_nh(nh))
+ if (!fib_good_nh(nexthop_nh))
continue;
if (!first) {
res->nh_sel = nhsel;
+ res->nhc = &nexthop_nh->nh_common;
first = true;
}
}
- if (hash > atomic_read(&nh->nh_upper_bound))
+ if (hash > atomic_read(&nexthop_nh->fib_nh_upper_bound))
continue;
res->nh_sel = nhsel;
+ res->nhc = &nexthop_nh->nh_common;
return;
} endfor_nexthops(fi);
}
check_saddr:
if (!fl4->saddr)
- fl4->saddr = FIB_RES_PREFSRC(net, *res);
+ fl4->saddr = fib_result_prefsrc(net, res);
}
};
static struct key_vector *resize(struct trie *t, struct key_vector *tn);
-static size_t tnode_free_size;
+static unsigned int tnode_free_size;
/*
- * synchronize_rcu after call_rcu for that many pages; it should be especially
- * useful before resizing the root node with PREEMPT_NONE configs; the value was
- * obtained experimentally, aiming to avoid visible slowdown.
+ * synchronize_rcu after call_rcu for outstanding dirty memory; it should be
+ * especially useful before resizing the root node with PREEMPT_NONE configs;
+ * the value was obtained experimentally, aiming to avoid visible slowdown.
*/
-static const int sync_pages = 128;
+unsigned int sysctl_fib_sync_mem = 512 * 1024;
+unsigned int sysctl_fib_sync_mem_min = 64 * 1024;
+unsigned int sysctl_fib_sync_mem_max = 64 * 1024 * 1024;
static struct kmem_cache *fn_alias_kmem __ro_after_init;
static struct kmem_cache *trie_leaf_kmem __ro_after_init;
tn = container_of(head, struct tnode, rcu)->kv;
}
- if (tnode_free_size >= PAGE_SIZE * sync_pages) {
+ if (tnode_free_size >= sysctl_fib_sync_mem) {
tnode_free_size = 0;
synchronize_rcu();
}
if (fi->fib_flags & RTNH_F_DEAD)
continue;
for (nhsel = 0; nhsel < fi->fib_nhs; nhsel++) {
- const struct fib_nh *nh = &fi->fib_nh[nhsel];
- struct in_device *in_dev = __in_dev_get_rcu(nh->nh_dev);
+ struct fib_nh_common *nhc = fib_info_nhc(fi, nhsel);
- if (nh->nh_flags & RTNH_F_DEAD)
+ if (nhc->nhc_flags & RTNH_F_DEAD)
continue;
- if (in_dev &&
- IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
- nh->nh_flags & RTNH_F_LINKDOWN &&
+ if (ip_ignore_linkdown(nhc->nhc_dev) &&
+ nhc->nhc_flags & RTNH_F_LINKDOWN &&
!(fib_flags & FIB_LOOKUP_IGNORE_LINKSTATE))
continue;
if (!(flp->flowi4_flags & FLOWI_FLAG_SKIP_NH_OIF)) {
if (flp->flowi4_oif &&
- flp->flowi4_oif != nh->nh_oif)
+ flp->flowi4_oif != nhc->nhc_oif)
continue;
}
res->prefix = htonl(n->key);
res->prefixlen = KEYLENGTH - fa->fa_slen;
res->nh_sel = nhsel;
+ res->nhc = nhc;
res->type = fa->fa_type;
res->scope = fi->fib_scope;
res->fi = fi;
#ifdef CONFIG_IP_FIB_TRIE_STATS
this_cpu_inc(stats->semantic_match_passed);
#endif
- trace_fib_table_lookup(tb->tb_id, flp, nh, err);
+ trace_fib_table_lookup(tb->tb_id, flp, nhc, err);
return err;
}
if (type == RTN_UNREACHABLE || type == RTN_PROHIBIT)
flags = RTF_REJECT;
- if (fi && fi->fib_nh->nh_gw)
+ if (fi && fi->fib_nh->fib_nh_gw4)
flags |= RTF_GATEWAY;
if (mask == htonl(0xFFFFFFFF))
flags |= RTF_HOST;
"%d\t%08X\t%d\t%u\t%u",
fi->fib_dev ? fi->fib_dev->name : "*",
prefix,
- fi->fib_nh->nh_gw, flags, 0, 0,
+ fi->fib_nh->fib_nh_gw4, flags, 0, 0,
fi->fib_priority,
mask,
(fi->fib_advmss ?
break;
case 1: {
- /* Direct encasulation of IPv4 or IPv6 */
+ /* Direct encapsulation of IPv4 or IPv6 */
int prot;
/* guehdr may change after pull */
guehdr = (struct guehdr *)&udp_hdr(skb)[1];
- hdrlen = sizeof(struct guehdr) + optlen;
-
- if (guehdr->version != 0 || validate_gue_flags(guehdr, optlen))
+ if (validate_gue_flags(guehdr, optlen))
goto drop;
hdrlen = sizeof(struct guehdr) + optlen;
return err;
}
-static int fou_add_to_port_list(struct net *net, struct fou *fou)
+static bool fou_cfg_cmp(struct fou *fou, struct fou_cfg *cfg)
+{
+ struct sock *sk = fou->sock->sk;
+ struct udp_port_cfg *udp_cfg = &cfg->udp_config;
+
+ if (fou->family != udp_cfg->family ||
+ fou->port != udp_cfg->local_udp_port ||
+ sk->sk_dport != udp_cfg->peer_udp_port ||
+ sk->sk_bound_dev_if != udp_cfg->bind_ifindex)
+ return false;
+
+ if (fou->family == AF_INET) {
+ if (sk->sk_rcv_saddr != udp_cfg->local_ip.s_addr ||
+ sk->sk_daddr != udp_cfg->peer_ip.s_addr)
+ return false;
+ else
+ return true;
+#if IS_ENABLED(CONFIG_IPV6)
+ } else {
+ if (ipv6_addr_cmp(&sk->sk_v6_rcv_saddr, &udp_cfg->local_ip6) ||
+ ipv6_addr_cmp(&sk->sk_v6_daddr, &udp_cfg->peer_ip6))
+ return false;
+ else
+ return true;
+#endif
+ }
+
+ return false;
+}
+
+static int fou_add_to_port_list(struct net *net, struct fou *fou,
+ struct fou_cfg *cfg)
{
struct fou_net *fn = net_generic(net, fou_net_id);
struct fou *fout;
mutex_lock(&fn->fou_lock);
list_for_each_entry(fout, &fn->fou_list, list) {
- if (fou->port == fout->port &&
- fou->family == fout->family) {
+ if (fou_cfg_cmp(fout, cfg)) {
mutex_unlock(&fn->fou_lock);
return -EALREADY;
}
sk->sk_allocation = GFP_ATOMIC;
- err = fou_add_to_port_list(net, fou);
+ err = fou_add_to_port_list(net, fou, cfg);
if (err)
goto error;
static int fou_destroy(struct net *net, struct fou_cfg *cfg)
{
struct fou_net *fn = net_generic(net, fou_net_id);
- __be16 port = cfg->udp_config.local_udp_port;
- u8 family = cfg->udp_config.family;
int err = -EINVAL;
struct fou *fou;
mutex_lock(&fn->fou_lock);
list_for_each_entry(fou, &fn->fou_list, list) {
- if (fou->port == port && fou->family == family) {
+ if (fou_cfg_cmp(fou, cfg)) {
fou_release(fou);
err = 0;
break;
static struct genl_family fou_nl_family;
static const struct nla_policy fou_nl_policy[FOU_ATTR_MAX + 1] = {
- [FOU_ATTR_PORT] = { .type = NLA_U16, },
- [FOU_ATTR_AF] = { .type = NLA_U8, },
- [FOU_ATTR_IPPROTO] = { .type = NLA_U8, },
- [FOU_ATTR_TYPE] = { .type = NLA_U8, },
- [FOU_ATTR_REMCSUM_NOPARTIAL] = { .type = NLA_FLAG, },
+ [FOU_ATTR_PORT] = { .type = NLA_U16, },
+ [FOU_ATTR_AF] = { .type = NLA_U8, },
+ [FOU_ATTR_IPPROTO] = { .type = NLA_U8, },
+ [FOU_ATTR_TYPE] = { .type = NLA_U8, },
+ [FOU_ATTR_REMCSUM_NOPARTIAL] = { .type = NLA_FLAG, },
+ [FOU_ATTR_LOCAL_V4] = { .type = NLA_U32, },
+ [FOU_ATTR_PEER_V4] = { .type = NLA_U32, },
+ [FOU_ATTR_LOCAL_V6] = { .type = sizeof(struct in6_addr), },
+ [FOU_ATTR_PEER_V6] = { .type = sizeof(struct in6_addr), },
+ [FOU_ATTR_PEER_PORT] = { .type = NLA_U16, },
+ [FOU_ATTR_IFINDEX] = { .type = NLA_S32, },
};
static int parse_nl_config(struct genl_info *info,
struct fou_cfg *cfg)
{
+ bool has_local = false, has_peer = false;
+ struct nlattr *attr;
+ int ifindex;
+ __be16 port;
+
memset(cfg, 0, sizeof(*cfg));
cfg->udp_config.family = AF_INET;
}
if (info->attrs[FOU_ATTR_PORT]) {
- __be16 port = nla_get_be16(info->attrs[FOU_ATTR_PORT]);
-
+ port = nla_get_be16(info->attrs[FOU_ATTR_PORT]);
cfg->udp_config.local_udp_port = port;
}
if (info->attrs[FOU_ATTR_REMCSUM_NOPARTIAL])
cfg->flags |= FOU_F_REMCSUM_NOPARTIAL;
+ if (cfg->udp_config.family == AF_INET) {
+ if (info->attrs[FOU_ATTR_LOCAL_V4]) {
+ attr = info->attrs[FOU_ATTR_LOCAL_V4];
+ cfg->udp_config.local_ip.s_addr = nla_get_in_addr(attr);
+ has_local = true;
+ }
+
+ if (info->attrs[FOU_ATTR_PEER_V4]) {
+ attr = info->attrs[FOU_ATTR_PEER_V4];
+ cfg->udp_config.peer_ip.s_addr = nla_get_in_addr(attr);
+ has_peer = true;
+ }
+#if IS_ENABLED(CONFIG_IPV6)
+ } else {
+ if (info->attrs[FOU_ATTR_LOCAL_V6]) {
+ attr = info->attrs[FOU_ATTR_LOCAL_V6];
+ cfg->udp_config.local_ip6 = nla_get_in6_addr(attr);
+ has_local = true;
+ }
+
+ if (info->attrs[FOU_ATTR_PEER_V6]) {
+ attr = info->attrs[FOU_ATTR_PEER_V6];
+ cfg->udp_config.peer_ip6 = nla_get_in6_addr(attr);
+ has_peer = true;
+ }
+#endif
+ }
+
+ if (has_peer) {
+ if (info->attrs[FOU_ATTR_PEER_PORT]) {
+ port = nla_get_be16(info->attrs[FOU_ATTR_PEER_PORT]);
+ cfg->udp_config.peer_udp_port = port;
+ } else {
+ return -EINVAL;
+ }
+ }
+
+ if (info->attrs[FOU_ATTR_IFINDEX]) {
+ if (!has_local)
+ return -EINVAL;
+
+ ifindex = nla_get_s32(info->attrs[FOU_ATTR_IFINDEX]);
+
+ cfg->udp_config.bind_ifindex = ifindex;
+ }
+
return 0;
}
static int fou_fill_info(struct fou *fou, struct sk_buff *msg)
{
+ struct sock *sk = fou->sock->sk;
+
if (nla_put_u8(msg, FOU_ATTR_AF, fou->sock->sk->sk_family) ||
nla_put_be16(msg, FOU_ATTR_PORT, fou->port) ||
+ nla_put_be16(msg, FOU_ATTR_PEER_PORT, sk->sk_dport) ||
nla_put_u8(msg, FOU_ATTR_IPPROTO, fou->protocol) ||
- nla_put_u8(msg, FOU_ATTR_TYPE, fou->type))
+ nla_put_u8(msg, FOU_ATTR_TYPE, fou->type) ||
+ nla_put_s32(msg, FOU_ATTR_IFINDEX, sk->sk_bound_dev_if))
return -1;
if (fou->flags & FOU_F_REMCSUM_NOPARTIAL)
if (nla_put_flag(msg, FOU_ATTR_REMCSUM_NOPARTIAL))
return -1;
+
+ if (fou->sock->sk->sk_family == AF_INET) {
+ if (nla_put_in_addr(msg, FOU_ATTR_LOCAL_V4, sk->sk_rcv_saddr))
+ return -1;
+
+ if (nla_put_in_addr(msg, FOU_ATTR_PEER_V4, sk->sk_daddr))
+ return -1;
+#if IS_ENABLED(CONFIG_IPV6)
+ } else {
+ if (nla_put_in6_addr(msg, FOU_ATTR_LOCAL_V6,
+ &sk->sk_v6_rcv_saddr))
+ return -1;
+
+ if (nla_put_in6_addr(msg, FOU_ATTR_PEER_V6, &sk->sk_v6_daddr))
+ return -1;
+#endif
+ }
+
return 0;
}
ret = -ESRCH;
mutex_lock(&fn->fou_lock);
list_for_each_entry(fout, &fn->fou_list, list) {
- if (port == fout->port && family == fout->family) {
+ if (fou_cfg_cmp(fout, &cfg)) {
ret = fou_dump_info(fout, info->snd_portid,
info->snd_seq, 0, msg,
info->genlhdr->cmd);
{
.cmd = FOU_CMD_ADD,
.doit = fou_nl_cmd_add_port,
- .policy = fou_nl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = FOU_CMD_DEL,
.doit = fou_nl_cmd_rm_port,
- .policy = fou_nl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = FOU_CMD_GET,
.doit = fou_nl_cmd_get_port,
.dumpit = fou_nl_dump,
- .policy = fou_nl_policy,
},
};
.name = FOU_GENL_NAME,
.version = FOU_GENL_VERSION,
.maxattr = FOU_ATTR_MAX,
+ .policy = fou_nl_policy,
.netnsok = true,
.module = THIS_MODULE,
.ops = fou_nl_ops,
case 0: /* Full GUE header present */
break;
case 1: {
- /* Direct encasulation of IPv4 or IPv6 */
+ /* Direct encapsulation of IPv4 or IPv6 */
skb_set_transport_header(skb, -(int)sizeof(struct icmphdr));
switch (((struct iphdr *)guehdr)->version) {
rt = ip_route_output_flow(net, fl4, sk);
if (IS_ERR(rt))
goto no_route;
- if (opt && opt->opt.is_strictroute && rt->rt_uses_gateway)
+ if (opt && opt->opt.is_strictroute && rt->rt_gw_family)
goto route_err;
rcu_read_unlock();
return &rt->dst;
rt = ip_route_output_flow(net, fl4, sk);
if (IS_ERR(rt))
goto no_route;
- if (opt && opt->opt.is_strictroute && rt->rt_uses_gateway)
+ if (opt && opt->opt.is_strictroute && rt->rt_gw_family)
goto route_err;
return &rt->dst;
rt = skb_rtable(skb);
- if (opt->is_strictroute && rt->rt_uses_gateway)
+ if (opt->is_strictroute && rt->rt_gw_family)
goto sr_failed;
IPCB(skb)->flags |= IPSKB_FORWARDED;
struct net_device *dev = dst->dev;
unsigned int hh_len = LL_RESERVED_SPACE(dev);
struct neighbour *neigh;
- u32 nexthop;
+ bool is_v6gw = false;
if (rt->rt_type == RTN_MULTICAST) {
IP_UPD_PO_STATS(net, IPSTATS_MIB_OUTMCAST, skb->len);
}
rcu_read_lock_bh();
- nexthop = (__force u32) rt_nexthop(rt, ip_hdr(skb)->daddr);
- neigh = __ipv4_neigh_lookup_noref(dev, nexthop);
- if (unlikely(!neigh))
- neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
+ neigh = ip_neigh_for_gw(rt, skb, &is_v6gw);
if (!IS_ERR(neigh)) {
int res;
sock_confirm_neigh(skb, neigh);
- res = neigh_output(neigh, skb);
-
+ /* if crossing protocols, can not use the cached header */
+ res = neigh_output(neigh, skb, is_v6gw);
rcu_read_unlock_bh();
return res;
}
skb_dst_set_noref(skb, &rt->dst);
packet_routed:
- if (inet_opt && inet_opt->opt.is_strictroute && rt->rt_uses_gateway)
+ if (inet_opt && inet_opt->opt.is_strictroute && rt->rt_gw_family)
goto no_route;
/* OK, we know where to send it, allocate and build IP header. */
return 0;
}
- while (frag) {
- skb = frag->next;
- kfree_skb(frag);
- frag = skb;
- }
+ kfree_skb_list(frag);
+
IP_INC_STATS(net, IPSTATS_MIB_FRAGFAILS);
return err;
.key_offset = offsetof(struct mfc_cache, cmparg),
.key_len = sizeof(struct mfc_cache_cmp_arg),
.nelem_hint = 3,
- .locks_mul = 1,
.obj_cmpfn = ipmr_hash_cmp,
.automatic_shrinking = true,
};
if NF_TABLES_IPV4
-config NFT_CHAIN_ROUTE_IPV4
- tristate "IPv4 nf_tables route chain support"
- help
- This option enables the "route" chain for IPv4 in nf_tables. This
- chain type is used to force packet re-routing after mangling header
- fields such as the source, destination, type of service and
- the packet mark.
-
config NFT_REJECT_IPV4
select NF_REJECT_IPV4
default NFT_REJECT
config IP_NF_TARGET_MASQUERADE
tristate "MASQUERADE target support"
- select NF_NAT_MASQUERADE
- default m if NETFILTER_ADVANCED=n
+ select NETFILTER_XT_TARGET_MASQUERADE
help
- Masquerading is a special case of NAT: all outgoing connections are
- changed to seem to come from a particular interface's address, and
- if the interface goes down, those connections are lost. This is
- only useful for dialup accounts with dynamic IP address (ie. your IP
- address will be different on next dialup).
-
- To compile it as a module, choose M here. If unsure, say N.
+ This is a backwards-compat option for the user's convenience
+ (e.g. when running oldconfig). It selects NETFILTER_XT_TARGET_MASQUERADE.
config IP_NF_TARGET_NETMAP
tristate "NETMAP target support"
$(obj)/nf_nat_snmp_basic_main.o: $(obj)/nf_nat_snmp_basic.asn1.h
obj-$(CONFIG_NF_NAT_SNMP_BASIC) += nf_nat_snmp_basic.o
-obj-$(CONFIG_NFT_CHAIN_ROUTE_IPV4) += nft_chain_route_ipv4.o
obj-$(CONFIG_NFT_REJECT_IPV4) += nft_reject_ipv4.o
obj-$(CONFIG_NFT_FIB_IPV4) += nft_fib_ipv4.o
obj-$(CONFIG_NFT_DUP_IPV4) += nft_dup_ipv4.o
# targets
obj-$(CONFIG_IP_NF_TARGET_CLUSTERIP) += ipt_CLUSTERIP.o
obj-$(CONFIG_IP_NF_TARGET_ECN) += ipt_ECN.o
-obj-$(CONFIG_IP_NF_TARGET_MASQUERADE) += ipt_MASQUERADE.o
obj-$(CONFIG_IP_NF_TARGET_REJECT) += ipt_REJECT.o
obj-$(CONFIG_IP_NF_TARGET_SYNPROXY) += ipt_SYNPROXY.o
+++ /dev/null
-/*
- * Copyright (c) 2008 Patrick McHardy <kaber@trash.net>
- * Copyright (c) 2012 Pablo Neira Ayuso <pablo@netfilter.org>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/list.h>
-#include <linux/skbuff.h>
-#include <linux/netlink.h>
-#include <linux/netfilter.h>
-#include <linux/netfilter_ipv4.h>
-#include <linux/netfilter/nfnetlink.h>
-#include <linux/netfilter/nf_tables.h>
-#include <net/netfilter/nf_tables.h>
-#include <net/netfilter/nf_tables_ipv4.h>
-#include <net/route.h>
-#include <net/ip.h>
-
-static unsigned int nf_route_table_hook(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state)
-{
- unsigned int ret;
- struct nft_pktinfo pkt;
- u32 mark;
- __be32 saddr, daddr;
- u_int8_t tos;
- const struct iphdr *iph;
- int err;
-
- nft_set_pktinfo(&pkt, skb, state);
- nft_set_pktinfo_ipv4(&pkt, skb);
-
- mark = skb->mark;
- iph = ip_hdr(skb);
- saddr = iph->saddr;
- daddr = iph->daddr;
- tos = iph->tos;
-
- ret = nft_do_chain(&pkt, priv);
- if (ret != NF_DROP && ret != NF_STOLEN) {
- iph = ip_hdr(skb);
-
- if (iph->saddr != saddr ||
- iph->daddr != daddr ||
- skb->mark != mark ||
- iph->tos != tos) {
- err = ip_route_me_harder(state->net, skb, RTN_UNSPEC);
- if (err < 0)
- ret = NF_DROP_ERR(err);
- }
- }
- return ret;
-}
-
-static const struct nft_chain_type nft_chain_route_ipv4 = {
- .name = "route",
- .type = NFT_CHAIN_T_ROUTE,
- .family = NFPROTO_IPV4,
- .owner = THIS_MODULE,
- .hook_mask = (1 << NF_INET_LOCAL_OUT),
- .hooks = {
- [NF_INET_LOCAL_OUT] = nf_route_table_hook,
- },
-};
-
-static int __init nft_chain_route_init(void)
-{
- nft_register_chain_type(&nft_chain_route_ipv4);
-
- return 0;
-}
-
-static void __exit nft_chain_route_exit(void)
-{
- nft_unregister_chain_type(&nft_chain_route_ipv4);
-}
-
-module_init(nft_chain_route_init);
-module_exit(nft_chain_route_exit);
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
-MODULE_ALIAS_NFT_CHAIN(AF_INET, "route");
struct sk_buff *skb,
const void *daddr)
{
+ const struct rtable *rt = container_of(dst, struct rtable, dst);
struct net_device *dev = dst->dev;
- const __be32 *pkey = daddr;
- const struct rtable *rt;
struct neighbour *n;
- rt = (const struct rtable *) dst;
- if (rt->rt_gateway)
- pkey = (const __be32 *) &rt->rt_gateway;
- else if (skb)
- pkey = &ip_hdr(skb)->daddr;
+ rcu_read_lock_bh();
+
+ if (likely(rt->rt_gw_family == AF_INET)) {
+ n = ip_neigh_gw4(dev, rt->rt_gw4);
+ } else if (rt->rt_gw_family == AF_INET6) {
+ n = ip_neigh_gw6(dev, &rt->rt_gw6);
+ } else {
+ __be32 pkey;
+
+ pkey = skb ? ip_hdr(skb)->daddr : *((__be32 *) daddr);
+ n = ip_neigh_gw4(dev, pkey);
+ }
+
+ if (n && !refcount_inc_not_zero(&n->refcnt))
+ n = NULL;
- n = __ipv4_neigh_lookup(dev, *(__force u32 *)pkey);
- if (n)
- return n;
- return neigh_create(&arp_tbl, pkey, dev);
+ rcu_read_unlock_bh();
+
+ return n;
}
static void ipv4_confirm_neigh(const struct dst_entry *dst, const void *daddr)
{
+ const struct rtable *rt = container_of(dst, struct rtable, dst);
struct net_device *dev = dst->dev;
const __be32 *pkey = daddr;
- const struct rtable *rt;
- rt = (const struct rtable *)dst;
- if (rt->rt_gateway)
- pkey = (const __be32 *)&rt->rt_gateway;
- else if (!daddr ||
+ if (rt->rt_gw_family == AF_INET) {
+ pkey = (const __be32 *)&rt->rt_gw4;
+ } else if (rt->rt_gw_family == AF_INET6) {
+ return __ipv6_confirm_neigh_stub(dev, &rt->rt_gw6);
+ } else if (!daddr ||
(rt->rt_flags &
- (RTCF_MULTICAST | RTCF_BROADCAST | RTCF_LOCAL)))
+ (RTCF_MULTICAST | RTCF_BROADCAST | RTCF_LOCAL))) {
return;
-
+ }
__ipv4_confirm_neigh(dev, *(__force u32 *)pkey);
}
void __ip_select_ident(struct net *net, struct iphdr *iph, int segs)
{
- static u32 ip_idents_hashrnd __read_mostly;
u32 hash, id;
- net_get_random_once(&ip_idents_hashrnd, sizeof(ip_idents_hashrnd));
+ /* Note the following code is not safe, but this is okay. */
+ if (unlikely(siphash_key_is_zero(&net->ipv4.ip_id_key)))
+ get_random_bytes(&net->ipv4.ip_id_key,
+ sizeof(net->ipv4.ip_id_key));
- hash = jhash_3words((__force u32)iph->daddr,
+ hash = siphash_3u32((__force u32)iph->daddr,
(__force u32)iph->saddr,
- iph->protocol ^ net_hash_mix(net),
- ip_idents_hashrnd);
+ iph->protocol,
+ &net->ipv4.ip_id_key);
id = ip_idents_reserve(hash, segs);
iph->id = htons(id);
}
if (fnhe->fnhe_gw) {
rt->rt_flags |= RTCF_REDIRECTED;
- rt->rt_gateway = fnhe->fnhe_gw;
- rt->rt_uses_gateway = 1;
+ rt->rt_gw_family = AF_INET;
+ rt->rt_gw4 = fnhe->fnhe_gw;
}
}
unsigned int i;
int depth;
- genid = fnhe_genid(dev_net(nh->nh_dev));
+ genid = fnhe_genid(dev_net(nh->fib_nh_dev));
hval = fnhe_hashfun(daddr);
spin_lock_bh(&fnhe_lock);
return;
}
- if (rt->rt_gateway != old_gw)
+ if (rt->rt_gw_family != AF_INET || rt->rt_gw4 != old_gw)
return;
in_dev = __in_dev_get_rcu(dev);
neigh_event_send(n, NULL);
} else {
if (fib_lookup(net, fl4, &res, 0) == 0) {
- struct fib_nh *nh = &FIB_RES_NH(res);
+ struct fib_nh_common *nhc = FIB_RES_NHC(res);
+ struct fib_nh *nh;
+ nh = container_of(nhc, struct fib_nh, nh_common);
update_or_create_fnhe(nh, fl4->daddr, new_gw,
0, false,
jiffies + ip_rt_gc_timeout);
rcu_read_lock();
if (fib_lookup(dev_net(dst->dev), fl4, &res, 0) == 0) {
- struct fib_nh *nh = &FIB_RES_NH(res);
+ struct fib_nh_common *nhc = FIB_RES_NHC(res);
+ struct fib_nh *nh;
+ nh = container_of(nhc, struct fib_nh, nh_common);
update_or_create_fnhe(nh, fl4->daddr, 0, mtu, lock,
jiffies + ip_rt_mtu_expires);
}
*
* When a PMTU/redirect information update invalidates a route,
* this is indicated by setting obsolete to DST_OBSOLETE_KILL or
- * DST_OBSOLETE_DEAD by dst_free().
+ * DST_OBSOLETE_DEAD.
*/
if (dst->obsolete != DST_OBSOLETE_FORCE_CHK || rt_is_expired(rt))
return NULL;
rcu_read_lock();
if (fib_lookup(dev_net(rt->dst.dev), &fl4, &res, 0) == 0)
- src = FIB_RES_PREFSRC(dev_net(rt->dst.dev), res);
+ src = fib_result_prefsrc(dev_net(rt->dst.dev), &res);
else
src = inet_select_addr(rt->dst.dev,
rt_nexthop(rt, iph->daddr),
mtu = READ_ONCE(dst->dev->mtu);
if (unlikely(ip_mtu_locked(dst))) {
- if (rt->rt_uses_gateway && mtu > 576)
+ if (rt->rt_gw_family && mtu > 576)
mtu = 576;
}
u32 ip_mtu_from_fib_result(struct fib_result *res, __be32 daddr)
{
+ struct fib_nh_common *nhc = res->nhc;
+ struct net_device *dev = nhc->nhc_dev;
struct fib_info *fi = res->fi;
- struct fib_nh *nh = &fi->fib_nh[res->nh_sel];
- struct net_device *dev = nh->nh_dev;
u32 mtu = 0;
if (dev_net(dev)->ipv4.sysctl_ip_fwd_use_pmtu ||
mtu = fi->fib_mtu;
if (likely(!mtu)) {
+ struct fib_nh *nh = container_of(nhc, struct fib_nh, nh_common);
struct fib_nh_exception *fnhe;
fnhe = find_exception(nh, daddr);
if (likely(!mtu))
mtu = min(READ_ONCE(dev->mtu), IP_MAX_MTU);
- return mtu - lwtunnel_headroom(nh->nh_lwtstate, mtu);
+ return mtu - lwtunnel_headroom(nhc->nhc_lwtstate, mtu);
}
static bool rt_bind_exception(struct rtable *rt, struct fib_nh_exception *fnhe,
orig = NULL;
}
fill_route_from_fnhe(rt, fnhe);
- if (!rt->rt_gateway)
- rt->rt_gateway = daddr;
+ if (!rt->rt_gw4) {
+ rt->rt_gw4 = daddr;
+ rt->rt_gw_family = AF_INET;
+ }
if (do_cache) {
dst_hold(&rt->dst);
bool cached = false;
if (fi) {
- struct fib_nh *nh = &FIB_RES_NH(*res);
-
- if (nh->nh_gw && nh->nh_scope == RT_SCOPE_LINK) {
- rt->rt_gateway = nh->nh_gw;
- rt->rt_uses_gateway = 1;
+ struct fib_nh_common *nhc = FIB_RES_NHC(*res);
+ struct fib_nh *nh;
+
+ if (nhc->nhc_gw_family && nhc->nhc_scope == RT_SCOPE_LINK) {
+ rt->rt_gw_family = nhc->nhc_gw_family;
+ /* only INET and INET6 are supported */
+ if (likely(nhc->nhc_gw_family == AF_INET))
+ rt->rt_gw4 = nhc->nhc_gw.ipv4;
+ else
+ rt->rt_gw6 = nhc->nhc_gw.ipv6;
}
+
ip_dst_init_metrics(&rt->dst, fi->fib_metrics);
+ nh = container_of(nhc, struct fib_nh, nh_common);
#ifdef CONFIG_IP_ROUTE_CLASSID
rt->dst.tclassid = nh->nh_tclassid;
#endif
- rt->dst.lwtstate = lwtstate_get(nh->nh_lwtstate);
+ rt->dst.lwtstate = lwtstate_get(nh->fib_nh_lws);
if (unlikely(fnhe))
cached = rt_bind_exception(rt, fnhe, daddr, do_cache);
else if (do_cache)
* However, if we are unsuccessful at storing this
* route into the cache we really need to set it.
*/
- if (!rt->rt_gateway)
- rt->rt_gateway = daddr;
+ if (!rt->rt_gw4) {
+ rt->rt_gw_family = AF_INET;
+ rt->rt_gw4 = daddr;
+ }
rt_add_uncached_list(rt);
}
} else
rt->rt_iif = 0;
rt->rt_pmtu = 0;
rt->rt_mtu_locked = 0;
- rt->rt_gateway = 0;
- rt->rt_uses_gateway = 0;
+ rt->rt_gw_family = 0;
+ rt->rt_gw4 = 0;
INIT_LIST_HEAD(&rt->rt_uncached);
rt->dst.output = ip_output;
struct in_device *in_dev,
__be32 daddr, __be32 saddr, u32 tos)
{
+ struct fib_nh_common *nhc = FIB_RES_NHC(*res);
+ struct net_device *dev = nhc->nhc_dev;
struct fib_nh_exception *fnhe;
struct rtable *rth;
+ struct fib_nh *nh;
int err;
struct in_device *out_dev;
bool do_cache;
u32 itag = 0;
/* get a working reference to the output device */
- out_dev = __in_dev_get_rcu(FIB_RES_DEV(*res));
+ out_dev = __in_dev_get_rcu(dev);
if (!out_dev) {
net_crit_ratelimited("Bug in ip_route_input_slow(). Please report.\n");
return -EINVAL;
do_cache = res->fi && !itag;
if (out_dev == in_dev && err && IN_DEV_TX_REDIRECTS(out_dev) &&
- skb->protocol == htons(ETH_P_IP) &&
- (IN_DEV_SHARED_MEDIA(out_dev) ||
- inet_addr_onlink(out_dev, saddr, FIB_RES_GW(*res))))
- IPCB(skb)->flags |= IPSKB_DOREDIRECT;
+ skb->protocol == htons(ETH_P_IP)) {
+ __be32 gw;
+
+ gw = nhc->nhc_gw_family == AF_INET ? nhc->nhc_gw.ipv4 : 0;
+ if (IN_DEV_SHARED_MEDIA(out_dev) ||
+ inet_addr_onlink(out_dev, saddr, gw))
+ IPCB(skb)->flags |= IPSKB_DOREDIRECT;
+ }
if (skb->protocol != htons(ETH_P_IP)) {
/* Not IP (i.e. ARP). Do not create route, if it is
}
}
- fnhe = find_exception(&FIB_RES_NH(*res), daddr);
+ nh = container_of(nhc, struct fib_nh, nh_common);
+ fnhe = find_exception(nh, daddr);
if (do_cache) {
if (fnhe)
rth = rcu_dereference(fnhe->fnhe_rth_input);
else
- rth = rcu_dereference(FIB_RES_NH(*res).nh_rth_input);
+ rth = rcu_dereference(nh->nh_rth_input);
if (rt_cache_valid(rth)) {
skb_dst_set_noref(skb, &rth->dst);
goto out;
do_cache = false;
if (res->fi) {
if (!itag) {
- rth = rcu_dereference(FIB_RES_NH(*res).nh_rth_input);
+ struct fib_nh_common *nhc = FIB_RES_NHC(*res);
+ struct fib_nh *nh;
+
+ nh = container_of(nhc, struct fib_nh, nh_common);
+ rth = rcu_dereference(nh->nh_rth_input);
if (rt_cache_valid(rth)) {
skb_dst_set_noref(skb, &rth->dst);
err = 0;
}
if (do_cache) {
- struct fib_nh *nh = &FIB_RES_NH(*res);
+ struct fib_nh_common *nhc = FIB_RES_NHC(*res);
+ struct fib_nh *nh;
- rth->dst.lwtstate = lwtstate_get(nh->nh_lwtstate);
+ rth->dst.lwtstate = lwtstate_get(nhc->nhc_lwtstate);
if (lwtunnel_input_redirect(rth->dst.lwtstate)) {
WARN_ON(rth->dst.input == lwtunnel_input);
rth->dst.lwtstate->orig_input = rth->dst.input;
rth->dst.input = lwtunnel_input;
}
+ nh = container_of(nhc, struct fib_nh, nh_common);
if (unlikely(!rt_cache_route(nh, rth)))
rt_add_uncached_list(rth);
}
fnhe = NULL;
do_cache &= fi != NULL;
if (fi) {
+ struct fib_nh_common *nhc = FIB_RES_NHC(*res);
+ struct fib_nh *nh = container_of(nhc, struct fib_nh, nh_common);
struct rtable __rcu **prth;
- struct fib_nh *nh = &FIB_RES_NH(*res);
fnhe = find_exception(nh, fl4->daddr);
if (!do_cache)
} else {
if (unlikely(fl4->flowi4_flags &
FLOWI_FLAG_KNOWN_NH &&
- !(nh->nh_gw &&
- nh->nh_scope == RT_SCOPE_LINK))) {
+ !(nhc->nhc_gw_family &&
+ nhc->nhc_scope == RT_SCOPE_LINK))) {
do_cache = false;
goto add;
}
rt->rt_genid = rt_genid_ipv4(net);
rt->rt_flags = ort->rt_flags;
rt->rt_type = ort->rt_type;
- rt->rt_gateway = ort->rt_gateway;
- rt->rt_uses_gateway = ort->rt_uses_gateway;
+ rt->rt_gw_family = ort->rt_gw_family;
+ if (rt->rt_gw_family == AF_INET)
+ rt->rt_gw4 = ort->rt_gw4;
+ else if (rt->rt_gw_family == AF_INET6)
+ rt->rt_gw6 = ort->rt_gw6;
INIT_LIST_HEAD(&rt->rt_uncached);
}
if (nla_put_in_addr(skb, RTA_PREFSRC, fl4->saddr))
goto nla_put_failure;
}
- if (rt->rt_uses_gateway &&
- nla_put_in_addr(skb, RTA_GATEWAY, rt->rt_gateway))
+ if (rt->rt_gw_family == AF_INET &&
+ nla_put_in_addr(skb, RTA_GATEWAY, rt->rt_gw4)) {
goto nla_put_failure;
+ } else if (rt->rt_gw_family == AF_INET6) {
+ int alen = sizeof(struct in6_addr);
+ struct nlattr *nla;
+ struct rtvia *via;
+
+ nla = nla_reserve(skb, RTA_VIA, alen + 2);
+ if (!nla)
+ goto nla_put_failure;
+
+ via = nla_data(nla);
+ via->rtvia_family = AF_INET6;
+ memcpy(via->rtvia_addr, &rt->rt_gw6, alen);
+ }
expires = rt->dst.expires;
if (expires) {
refcount_set(&req->rsk_refcnt, 1);
tcp_sk(child)->tsoffset = tsoff;
sock_rps_save_rxhash(child, skb);
- if (!inet_csk_reqsk_queue_add(sk, req, child)) {
- bh_unlock_sock(child);
- sock_put(child);
- child = NULL;
- reqsk_put(req);
- }
- } else {
- reqsk_free(req);
+ if (inet_csk_reqsk_queue_add(sk, req, child))
+ return child;
+
+ bh_unlock_sock(child);
+ sock_put(child);
}
- return child;
+ __reqsk_free(req);
+
+ return NULL;
}
EXPORT_SYMBOL(tcp_get_cookie_sock);
.mode = 0644,
.proc_handler = proc_doulongvec_minmax,
},
+ {
+ .procname = "fib_sync_mem",
+ .data = &sysctl_fib_sync_mem,
+ .maxlen = sizeof(sysctl_fib_sync_mem),
+ .mode = 0644,
+ .proc_handler = proc_douintvec_minmax,
+ .extra1 = &sysctl_fib_sync_mem_min,
+ .extra2 = &sysctl_fib_sync_mem_max,
+ },
{ }
};
{
struct sk_buff *skb;
+ if (likely(!size)) {
+ skb = sk->sk_tx_skb_cache;
+ if (skb && !skb_cloned(skb)) {
+ skb->truesize -= skb->data_len;
+ sk->sk_tx_skb_cache = NULL;
+ pskb_trim(skb, 0);
+ INIT_LIST_HEAD(&skb->tcp_tsorted_anchor);
+ skb_shinfo(skb)->tx_flags = 0;
+ memset(TCP_SKB_CB(skb), 0, sizeof(struct tcp_skb_cb));
+ return skb;
+ }
+ }
/* The TCP header must be at least 32-bit aligned. */
size = ALIGN(size, 4);
}
EXPORT_SYMBOL(tcp_sendpage);
-/* Do not bother using a page frag for very small frames.
- * But use this heuristic only for the first skb in write queue.
- *
- * Having no payload in skb->head allows better SACK shifting
- * in tcp_shift_skb_data(), reducing sack/rack overhead, because
- * write queue has less skbs.
- * Each skb can hold up to MAX_SKB_FRAGS * 32Kbytes, or ~0.5 MB.
- * This also speeds up tso_fragment(), since it wont fallback
- * to tcp_fragment().
- */
-static int linear_payload_sz(bool first_skb)
-{
- if (first_skb)
- return SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
- return 0;
-}
-
-static int select_size(bool first_skb, bool zc)
-{
- if (zc)
- return 0;
- return linear_payload_sz(first_skb);
-}
-
void tcp_free_fastopen_req(struct tcp_sock *tp)
{
if (tp->fastopen_req) {
if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
bool first_skb;
- int linear;
new_segment:
if (!sk_stream_memory_free(sk))
goto restart;
}
first_skb = tcp_rtx_and_write_queues_empty(sk);
- linear = select_size(first_skb, zc);
- skb = sk_stream_alloc_skb(sk, linear, sk->sk_allocation,
+ skb = sk_stream_alloc_skb(sk, 0, sk->sk_allocation,
first_skb);
if (!skb)
goto wait_for_memory;
sk_wmem_free_skb(sk, skb);
}
tcp_rtx_queue_purge(sk);
+ skb = sk->sk_tx_skb_cache;
+ if (skb) {
+ __kfree_skb(skb);
+ sk->sk_tx_skb_cache = NULL;
+ }
INIT_LIST_HEAD(&tcp_sk(sk)->tsorted_sent_queue);
sk_mem_reclaim(sk);
tcp_clear_all_retrans_hints(tcp_sk(sk));
tcp_clear_xmit_timers(sk);
__skb_queue_purge(&sk->sk_receive_queue);
+ if (sk->sk_rx_skb_cache) {
+ __kfree_skb(sk->sk_rx_skb_cache);
+ sk->sk_rx_skb_cache = NULL;
+ }
tp->copied_seq = tp->rcv_nxt;
tp->urg_data = 0;
tcp_write_queue_purge(sk);
* congestion control: Linux DCTCP asserts ECT on all packets,
* including SYN, which is most optimal solution; however,
* others, such as FreeBSD do not.
+ *
+ * Exception: At least one of the reserved bits of the TCP header (th->res1) is
+ * set, indicating the use of a future TCP extension (such as AccECN). See
+ * RFC8311 §4.3 which updates RFC3168 to allow the development of such
+ * extensions.
*/
static void tcp_ecn_create_request(struct request_sock *req,
const struct sk_buff *skb,
ecn_ok_dst = dst_feature(dst, DST_FEATURE_ECN_MASK);
ecn_ok = net->ipv4.sysctl_tcp_ecn || ecn_ok_dst;
- if ((!ect && ecn_ok) || tcp_ca_needs_ecn(listen_sk) ||
+ if (((!ect || th->res1) && ecn_ok) || tcp_ca_needs_ecn(listen_sk) ||
(ecn_ok_dst & DST_FEATURE_ECN_CA) ||
tcp_bpf_ca_needs_ecn((struct sock *)req))
inet_rsk(req)->ecn_ok = 1;
reqsk_fastopen_remove(fastopen_sk, req, false);
bh_unlock_sock(fastopen_sk);
sock_put(fastopen_sk);
- reqsk_put(req);
- goto drop;
+ goto drop_and_free;
}
sk->sk_data_ready(sk);
bh_unlock_sock(fastopen_sk);
drop_and_release:
dst_release(dst);
drop_and_free:
- reqsk_free(req);
+ __reqsk_free(req);
drop:
tcp_listendrop(sk);
return 0;
int tcp_v4_rcv(struct sk_buff *skb)
{
struct net *net = dev_net(skb->dev);
+ struct sk_buff *skb_to_free;
int sdif = inet_sdif(skb);
const struct iphdr *iph;
const struct tcphdr *th;
tcp_segs_in(tcp_sk(sk), skb);
ret = 0;
if (!sock_owned_by_user(sk)) {
+ skb_to_free = sk->sk_rx_skb_cache;
+ sk->sk_rx_skb_cache = NULL;
ret = tcp_v4_do_rcv(sk, skb);
- } else if (tcp_add_backlog(sk, skb)) {
- goto discard_and_relse;
+ } else {
+ if (tcp_add_backlog(sk, skb))
+ goto discard_and_relse;
+ skb_to_free = NULL;
}
bh_unlock_sock(sk);
+ if (skb_to_free)
+ __kfree_skb(skb_to_free);
put_and_return:
if (refcounted)
.cmd = TCP_METRICS_CMD_GET,
.doit = tcp_metrics_nl_cmd_get,
.dumpit = tcp_metrics_nl_dump,
- .policy = tcp_metrics_nl_policy,
},
{
.cmd = TCP_METRICS_CMD_DEL,
.doit = tcp_metrics_nl_cmd_del,
- .policy = tcp_metrics_nl_policy,
.flags = GENL_ADMIN_PERM,
},
};
.name = TCP_METRICS_GENL_NAME,
.version = TCP_METRICS_GENL_VERSION,
.maxattr = TCP_METRICS_ATTR_MAX,
+ .policy = tcp_metrics_nl_policy,
.netnsok = true,
.module = THIS_MODULE,
.ops = tcp_metrics_nl_ops,
{
u64 val = tcp_clock_ns();
- if (val > tp->tcp_clock_cache)
- tp->tcp_clock_cache = val;
-
- val = div_u64(val, NSEC_PER_USEC);
- if (val > tp->tcp_mstamp)
- tp->tcp_mstamp = val;
+ tp->tcp_clock_cache = val;
+ tp->tcp_mstamp = div_u64(val, NSEC_PER_USEC);
}
static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
tskb = skb_rb_last(&sk->tcp_rtx_queue);
if (tskb) {
-coalesce:
TCP_SKB_CB(tskb)->tcp_flags |= TCPHDR_FIN;
TCP_SKB_CB(tskb)->end_seq++;
tp->write_seq++;
}
} else {
skb = alloc_skb_fclone(MAX_TCP_HEADER, sk->sk_allocation);
- if (unlikely(!skb)) {
- if (tskb)
- goto coalesce;
+ if (unlikely(!skb))
return;
- }
+
INIT_LIST_HEAD(&skb->tcp_tsorted_anchor);
skb_reserve(skb, MAX_TCP_HEADER);
sk_forced_mem_schedule(sk, skb->truesize);
EXPORT_SYMBOL(udp_ioctl);
struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags,
- int noblock, int *peeked, int *off, int *err)
+ int noblock, int *off, int *err)
{
struct sk_buff_head *sk_queue = &sk->sk_receive_queue;
struct sk_buff_head *queue;
break;
error = -EAGAIN;
- *peeked = 0;
do {
spin_lock_bh(&queue->lock);
skb = __skb_try_recv_from_queue(sk, queue, flags,
udp_skb_destructor,
- peeked, off, err,
- &last);
+ off, err, &last);
if (skb) {
spin_unlock_bh(&queue->lock);
return skb;
skb = __skb_try_recv_from_queue(sk, queue, flags,
udp_skb_dtor_locked,
- peeked, off, err,
- &last);
+ off, err, &last);
spin_unlock(&sk_queue->lock);
spin_unlock_bh(&queue->lock);
if (skb)
DECLARE_SOCKADDR(struct sockaddr_in *, sin, msg->msg_name);
struct sk_buff *skb;
unsigned int ulen, copied;
- int peeked, peeking, off;
- int err;
+ int off, err, peeking = flags & MSG_PEEK;
int is_udplite = IS_UDPLITE(sk);
bool checksum_valid = false;
return ip_recv_error(sk, msg, len, addr_len);
try_again:
- peeking = flags & MSG_PEEK;
off = sk_peek_offset(sk, flags);
- skb = __skb_recv_udp(sk, flags, noblock, &peeked, &off, &err);
+ skb = __skb_recv_udp(sk, flags, noblock, &off, &err);
if (!skb)
return err;
}
if (unlikely(err)) {
- if (!peeked) {
+ if (!peeking) {
atomic_inc(&sk->sk_drops);
UDP_INC_STATS(sock_net(sk),
UDP_MIB_INERRORS, is_udplite);
return err;
}
- if (!peeked)
+ if (!peeking)
UDP_INC_STATS(sock_net(sk),
UDP_MIB_INDATAGRAMS, is_udplite);
xdst->u.rt.rt_flags = rt->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST |
RTCF_LOCAL);
xdst->u.rt.rt_type = rt->rt_type;
- xdst->u.rt.rt_gateway = rt->rt_gateway;
- xdst->u.rt.rt_uses_gateway = rt->rt_uses_gateway;
+ xdst->u.rt.rt_gw_family = rt->rt_gw_family;
+ if (rt->rt_gw_family == AF_INET)
+ xdst->u.rt.rt_gw4 = rt->rt_gw4;
+ else if (rt->rt_gw_family == AF_INET6)
+ xdst->u.rt.rt_gw6 = rt->rt_gw6;
xdst->u.rt.rt_pmtu = rt->rt_pmtu;
xdst->u.rt.rt_mtu_locked = rt->rt_mtu_locked;
INIT_LIST_HEAD(&xdst->u.rt.rt_uncached);
static struct fib6_info *addrconf_get_prefix_route(const struct in6_addr *pfx,
int plen,
const struct net_device *dev,
- u32 flags, u32 noflags);
+ u32 flags, u32 noflags,
+ bool no_gw);
static void addrconf_dad_start(struct inet6_ifaddr *ifp);
static void addrconf_dad_work(struct work_struct *w);
{
struct fib6_info *f6i;
- f6i = addrconf_get_prefix_route(&ifp->addr,
- ifp->prefix_len,
- ifp->idev->dev,
- 0, RTF_GATEWAY | RTF_DEFAULT);
+ f6i = addrconf_get_prefix_route(&ifp->addr, ifp->prefix_len,
+ ifp->idev->dev, 0, RTF_DEFAULT, true);
if (f6i) {
if (del_rt)
ip6_del_rt(dev_net(ifp->idev->dev), f6i);
static struct fib6_info *addrconf_get_prefix_route(const struct in6_addr *pfx,
int plen,
const struct net_device *dev,
- u32 flags, u32 noflags)
+ u32 flags, u32 noflags,
+ bool no_gw)
{
struct fib6_node *fn;
struct fib6_info *rt = NULL;
goto out;
for_each_fib6_node_rt_rcu(fn) {
- if (rt->fib6_nh.nh_dev->ifindex != dev->ifindex)
+ if (rt->fib6_nh.fib_nh_dev->ifindex != dev->ifindex)
+ continue;
+ if (no_gw && rt->fib6_nh.fib_nh_gw_family)
continue;
if ((rt->fib6_flags & flags) != flags)
continue;
pinfo->prefix_len,
dev,
RTF_ADDRCONF | RTF_PREFIX_RT,
- RTF_GATEWAY | RTF_DEFAULT);
+ RTF_DEFAULT, true);
if (rt) {
/* Autoconf prefix route */
struct fib6_info *f6i;
u32 prio;
- f6i = addrconf_get_prefix_route(&ifp->addr,
- ifp->prefix_len,
- ifp->idev->dev,
- 0, RTF_GATEWAY | RTF_DEFAULT);
+ f6i = addrconf_get_prefix_route(&ifp->addr, ifp->prefix_len,
+ ifp->idev->dev, 0, RTF_DEFAULT, true);
if (!f6i)
return -ENOENT;
struct fib6_info *rt;
rt = addrconf_get_prefix_route(&ifp->peer_addr, 128,
- ifp->idev->dev, 0, 0);
+ ifp->idev->dev, 0, 0,
+ false);
if (rt)
ip6_del_rt(net, rt);
}
#include <linux/export.h>
#include <net/ipv6.h>
-#include <net/addrconf.h>
+#include <net/ipv6_stubs.h>
#include <net/ip.h>
/* if ipv6 module registers this function is used by xfrm to force all
return 0;
}
+static int eafnosupport_fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh,
+ struct fib6_config *cfg, gfp_t gfp_flags,
+ struct netlink_ext_ack *extack)
+{
+ NL_SET_ERR_MSG(extack, "IPv6 support not enabled in kernel");
+ return -EAFNOSUPPORT;
+}
+
const struct ipv6_stub *ipv6_stub __read_mostly = &(struct ipv6_stub) {
.ipv6_dst_lookup = eafnosupport_ipv6_dst_lookup,
.ipv6_route_input = eafnosupport_ipv6_route_input,
.fib6_lookup = eafnosupport_fib6_lookup,
.fib6_multipath_select = eafnosupport_fib6_multipath_select,
.ip6_mtu_from_fib6 = eafnosupport_ip6_mtu_from_fib6,
+ .fib6_nh_init = eafnosupport_fib6_nh_init,
};
EXPORT_SYMBOL_GPL(ipv6_stub);
#include <net/transp_v6.h>
#include <net/ip6_route.h>
#include <net/addrconf.h>
+#include <net/ipv6_stubs.h>
#include <net/ndisc.h>
#ifdef CONFIG_IPV6_TUNNEL
#include <net/ip6_tunnel.h>
net->ipv6.sysctl.bindv6only = 0;
net->ipv6.sysctl.icmpv6_time = 1*HZ;
net->ipv6.sysctl.icmpv6_echo_ignore_all = 0;
+ net->ipv6.sysctl.icmpv6_echo_ignore_multicast = 0;
+ net->ipv6.sysctl.icmpv6_echo_ignore_anycast = 0;
net->ipv6.sysctl.flowlabel_consistency = 1;
net->ipv6.sysctl.auto_flowlabels = IP6_DEFAULT_AUTO_FLOW_LABELS;
net->ipv6.sysctl.idgen_retries = 3;
.fib6_lookup = fib6_lookup,
.fib6_multipath_select = fib6_multipath_select,
.ip6_mtu_from_fib6 = ip6_mtu_from_fib6,
+ .fib6_nh_init = fib6_nh_init,
+ .fib6_nh_release = fib6_nh_release,
.udpv6_encap_enable = udpv6_encap_enable,
.ndisc_send_na = ndisc_send_na,
.nd_tbl = &nd_tbl,
struct dst_entry *dst;
struct ipcm6_cookie ipc6;
u32 mark = IP6_REPLY_MARK(net, skb->mark);
+ bool acast;
+
+ if (ipv6_addr_is_multicast(&ipv6_hdr(skb)->daddr) &&
+ net->ipv6.sysctl.icmpv6_echo_ignore_multicast)
+ return;
saddr = &ipv6_hdr(skb)->daddr;
+ acast = ipv6_anycast_destination(skb_dst(skb), saddr);
+ if (acast && net->ipv6.sysctl.icmpv6_echo_ignore_anycast)
+ return;
+
if (!ipv6_unicast_destination(skb) &&
- !(net->ipv6.sysctl.anycast_src_echo_reply &&
- ipv6_anycast_destination(skb_dst(skb), saddr)))
+ !(net->ipv6.sysctl.anycast_src_echo_reply && acast))
saddr = NULL;
memcpy(&tmp_hdr, icmph, sizeof(tmp_hdr));
.mode = 0644,
.proc_handler = proc_dointvec,
},
+ {
+ .procname = "echo_ignore_multicast",
+ .data = &init_net.ipv6.sysctl.icmpv6_echo_ignore_multicast,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
+ .procname = "echo_ignore_anycast",
+ .data = &init_net.ipv6.sysctl.icmpv6_echo_ignore_anycast,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
{ },
};
if (table) {
table[0].data = &net->ipv6.sysctl.icmpv6_time;
table[1].data = &net->ipv6.sysctl.icmpv6_echo_ignore_all;
+ table[2].data = &net->ipv6.sysctl.icmpv6_echo_ignore_multicast;
+ table[3].data = &net->ipv6.sysctl.icmpv6_echo_ignore_anycast;
}
return table;
}
{
.cmd = ILA_CMD_ADD,
.doit = ila_xlat_nl_cmd_add_mapping,
- .policy = ila_nl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = ILA_CMD_DEL,
.doit = ila_xlat_nl_cmd_del_mapping,
- .policy = ila_nl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = ILA_CMD_FLUSH,
.doit = ila_xlat_nl_cmd_flush,
- .policy = ila_nl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.start = ila_xlat_nl_dump_start,
.dumpit = ila_xlat_nl_dump,
.done = ila_xlat_nl_dump_done,
- .policy = ila_nl_policy,
},
};
.name = ILA_GENL_NAME,
.version = ILA_GENL_VERSION,
.maxattr = ILA_ATTR_MAX,
+ .policy = ila_nl_policy,
.netnsok = true,
.parallel_ops = true,
.module = THIS_MODULE,
free_percpu(f6i->rt6i_pcpu);
}
- lwtstate_put(f6i->fib6_nh.nh_lwtstate);
-
- if (f6i->fib6_nh.nh_dev)
- dev_put(f6i->fib6_nh.nh_dev);
+ fib6_nh_release(&f6i->fib6_nh);
ip_fib_metrics_put(f6i->fib6_metrics);
{
struct fib6_info *rt = v;
struct ipv6_route_iter *iter = seq->private;
+ unsigned int flags = rt->fib6_flags;
const struct net_device *dev;
seq_printf(seq, "%pi6 %02x ", &rt->fib6_dst.addr, rt->fib6_dst.plen);
#else
seq_puts(seq, "00000000000000000000000000000000 00 ");
#endif
- if (rt->fib6_flags & RTF_GATEWAY)
- seq_printf(seq, "%pi6", &rt->fib6_nh.nh_gw);
- else
+ if (rt->fib6_nh.fib_nh_gw_family) {
+ flags |= RTF_GATEWAY;
+ seq_printf(seq, "%pi6", &rt->fib6_nh.fib_nh_gw6);
+ } else {
seq_puts(seq, "00000000000000000000000000000000");
+ }
- dev = rt->fib6_nh.nh_dev;
+ dev = rt->fib6_nh.fib_nh_dev;
seq_printf(seq, " %08x %08x %08x %08x %8s\n",
rt->fib6_metric, atomic_read(&rt->fib6_ref), 0,
- rt->fib6_flags, dev ? dev->name : "");
+ flags, dev ? dev->name : "");
iter->w.leaf = NULL;
return 0;
}
neigh = __neigh_create(&nd_tbl, nexthop, dst->dev, false);
if (!IS_ERR(neigh)) {
sock_confirm_neigh(skb, neigh);
- ret = neigh_output(neigh, skb);
+ ret = neigh_output(neigh, skb, false);
rcu_read_unlock_bh();
return ret;
}
.key_offset = offsetof(struct mfc6_cache, cmparg),
.key_len = sizeof(struct mfc6_cache_cmp_arg),
.nelem_hint = 3,
- .locks_mul = 1,
.obj_cmpfn = ip6mr_hash_cmp,
.automatic_shrinking = true,
};
rt = rt6_get_dflt_router(net, &ipv6_hdr(skb)->saddr, skb->dev);
if (rt) {
- neigh = ip6_neigh_lookup(&rt->fib6_nh.nh_gw,
- rt->fib6_nh.nh_dev, NULL,
+ neigh = ip6_neigh_lookup(&rt->fib6_nh.fib_nh_gw6,
+ rt->fib6_nh.fib_nh_dev, NULL,
&ipv6_hdr(skb)->saddr);
if (!neigh) {
ND_PRINTK(0, err,
return;
}
- neigh = ip6_neigh_lookup(&rt->fib6_nh.nh_gw,
- rt->fib6_nh.nh_dev, NULL,
+ neigh = ip6_neigh_lookup(&rt->fib6_nh.fib_nh_gw6,
+ rt->fib6_nh.fib_nh_dev, NULL,
&ipv6_hdr(skb)->saddr);
if (!neigh) {
ND_PRINTK(0, err,
if NF_TABLES_IPV6
-config NFT_CHAIN_ROUTE_IPV6
- tristate "IPv6 nf_tables route chain support"
- help
- This option enables the "route" chain for IPv6 in nf_tables. This
- chain type is used to force packet re-routing after mangling header
- fields such as the source, destination, flowlabel, hop-limit and
- the packet mark.
-
config NFT_REJECT_IPV6
select NF_REJECT_IPV6
default NFT_REJECT
config IP6_NF_TARGET_MASQUERADE
tristate "MASQUERADE target support"
- select NF_NAT_MASQUERADE
+ select NETFILTER_XT_TARGET_MASQUERADE
help
- Masquerading is a special case of NAT: all outgoing connections are
- changed to seem to come from a particular interface's address, and
- if the interface goes down, those connections are lost. This is
- only useful for dialup accounts with dynamic IP address (ie. your IP
- address will be different on next dialup).
-
- To compile it as a module, choose M here. If unsure, say N.
+ This is a backwards-compat option for the user's convenience
+ (e.g. when running oldconfig). It selects NETFILTER_XT_TARGET_MASQUERADE.
config IP6_NF_TARGET_NPT
tristate "NPT (Network Prefix translation) target support"
obj-$(CONFIG_NF_DUP_IPV6) += nf_dup_ipv6.o
# nf_tables
-obj-$(CONFIG_NFT_CHAIN_ROUTE_IPV6) += nft_chain_route_ipv6.o
obj-$(CONFIG_NFT_REJECT_IPV6) += nft_reject_ipv6.o
obj-$(CONFIG_NFT_DUP_IPV6) += nft_dup_ipv6.o
obj-$(CONFIG_NFT_FIB_IPV6) += nft_fib_ipv6.o
obj-$(CONFIG_IP6_NF_MATCH_SRH) += ip6t_srh.o
# targets
-obj-$(CONFIG_IP6_NF_TARGET_MASQUERADE) += ip6t_MASQUERADE.o
obj-$(CONFIG_IP6_NF_TARGET_NPT) += ip6t_NPT.o
obj-$(CONFIG_IP6_NF_TARGET_REJECT) += ip6t_REJECT.o
obj-$(CONFIG_IP6_NF_TARGET_SYNPROXY) += ip6t_SYNPROXY.o
+++ /dev/null
-/*
- * Copyright (c) 2011 Patrick McHardy <kaber@trash.net>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * Based on Rusty Russell's IPv6 MASQUERADE target. Development of IPv6
- * NAT funded by Astaro.
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/netdevice.h>
-#include <linux/ipv6.h>
-#include <linux/netfilter.h>
-#include <linux/netfilter_ipv6.h>
-#include <linux/netfilter/x_tables.h>
-#include <net/netfilter/nf_nat.h>
-#include <net/addrconf.h>
-#include <net/ipv6.h>
-#include <net/netfilter/ipv6/nf_nat_masquerade.h>
-
-static unsigned int
-masquerade_tg6(struct sk_buff *skb, const struct xt_action_param *par)
-{
- return nf_nat_masquerade_ipv6(skb, par->targinfo, xt_out(par));
-}
-
-static int masquerade_tg6_checkentry(const struct xt_tgchk_param *par)
-{
- const struct nf_nat_range2 *range = par->targinfo;
-
- if (range->flags & NF_NAT_RANGE_MAP_IPS)
- return -EINVAL;
- return nf_ct_netns_get(par->net, par->family);
-}
-
-static void masquerade_tg6_destroy(const struct xt_tgdtor_param *par)
-{
- nf_ct_netns_put(par->net, par->family);
-}
-
-static struct xt_target masquerade_tg6_reg __read_mostly = {
- .name = "MASQUERADE",
- .family = NFPROTO_IPV6,
- .checkentry = masquerade_tg6_checkentry,
- .destroy = masquerade_tg6_destroy,
- .target = masquerade_tg6,
- .targetsize = sizeof(struct nf_nat_range),
- .table = "nat",
- .hooks = 1 << NF_INET_POST_ROUTING,
- .me = THIS_MODULE,
-};
-
-static int __init masquerade_tg6_init(void)
-{
- int err;
-
- err = xt_register_target(&masquerade_tg6_reg);
- if (err)
- return err;
-
- err = nf_nat_masquerade_ipv6_register_notifier();
- if (err)
- xt_unregister_target(&masquerade_tg6_reg);
-
- return err;
-}
-static void __exit masquerade_tg6_exit(void)
-{
- nf_nat_masquerade_ipv6_unregister_notifier();
- xt_unregister_target(&masquerade_tg6_reg);
-}
-
-module_init(masquerade_tg6_init);
-module_exit(masquerade_tg6_exit);
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
-MODULE_DESCRIPTION("Xtables: automatic address SNAT");
+++ /dev/null
-/*
- * Copyright (c) 2008 Patrick McHardy <kaber@trash.net>
- * Copyright (c) 2012 Pablo Neira Ayuso <pablo@netfilter.org>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * Development of this code funded by Astaro AG (http://www.astaro.com/)
- */
-
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/list.h>
-#include <linux/skbuff.h>
-#include <linux/netlink.h>
-#include <linux/netfilter.h>
-#include <linux/netfilter_ipv6.h>
-#include <linux/netfilter/nfnetlink.h>
-#include <linux/netfilter/nf_tables.h>
-#include <net/netfilter/nf_tables.h>
-#include <net/netfilter/nf_tables_ipv6.h>
-#include <net/route.h>
-
-static unsigned int nf_route_table_hook(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state)
-{
- unsigned int ret;
- struct nft_pktinfo pkt;
- struct in6_addr saddr, daddr;
- u_int8_t hop_limit;
- u32 mark, flowlabel;
- int err;
-
- nft_set_pktinfo(&pkt, skb, state);
- nft_set_pktinfo_ipv6(&pkt, skb);
-
- /* save source/dest address, mark, hoplimit, flowlabel, priority */
- memcpy(&saddr, &ipv6_hdr(skb)->saddr, sizeof(saddr));
- memcpy(&daddr, &ipv6_hdr(skb)->daddr, sizeof(daddr));
- mark = skb->mark;
- hop_limit = ipv6_hdr(skb)->hop_limit;
-
- /* flowlabel and prio (includes version, which shouldn't change either */
- flowlabel = *((u32 *)ipv6_hdr(skb));
-
- ret = nft_do_chain(&pkt, priv);
- if (ret != NF_DROP && ret != NF_STOLEN &&
- (memcmp(&ipv6_hdr(skb)->saddr, &saddr, sizeof(saddr)) ||
- memcmp(&ipv6_hdr(skb)->daddr, &daddr, sizeof(daddr)) ||
- skb->mark != mark ||
- ipv6_hdr(skb)->hop_limit != hop_limit ||
- flowlabel != *((u_int32_t *)ipv6_hdr(skb)))) {
- err = ip6_route_me_harder(state->net, skb);
- if (err < 0)
- ret = NF_DROP_ERR(err);
- }
-
- return ret;
-}
-
-static const struct nft_chain_type nft_chain_route_ipv6 = {
- .name = "route",
- .type = NFT_CHAIN_T_ROUTE,
- .family = NFPROTO_IPV6,
- .owner = THIS_MODULE,
- .hook_mask = (1 << NF_INET_LOCAL_OUT),
- .hooks = {
- [NF_INET_LOCAL_OUT] = nf_route_table_hook,
- },
-};
-
-static int __init nft_chain_route_init(void)
-{
- nft_register_chain_type(&nft_chain_route_ipv6);
-
- return 0;
-}
-
-static void __exit nft_chain_route_exit(void)
-{
- nft_unregister_chain_type(&nft_chain_route_ipv6);
-}
-
-module_init(nft_chain_route_init);
-module_exit(nft_chain_route_exit);
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
-MODULE_ALIAS_NFT_CHAIN(AF_INET6, "route");
#include <net/secure_seq.h>
#include <linux/netfilter.h>
-static u32 __ipv6_select_ident(struct net *net, u32 hashrnd,
+static u32 __ipv6_select_ident(struct net *net,
const struct in6_addr *dst,
const struct in6_addr *src)
{
+ const struct {
+ struct in6_addr dst;
+ struct in6_addr src;
+ } __aligned(SIPHASH_ALIGNMENT) combined = {
+ .dst = *dst,
+ .src = *src,
+ };
u32 hash, id;
- hash = __ipv6_addr_jhash(dst, hashrnd);
- hash = __ipv6_addr_jhash(src, hash);
- hash ^= net_hash_mix(net);
+ /* Note the following code is not safe, but this is okay. */
+ if (unlikely(siphash_key_is_zero(&net->ipv4.ip_id_key)))
+ get_random_bytes(&net->ipv4.ip_id_key,
+ sizeof(net->ipv4.ip_id_key));
+
+ hash = siphash(&combined, sizeof(combined), &net->ipv4.ip_id_key);
/* Treat id of 0 as unset and if we get 0 back from ip_idents_reserve,
* set the hight order instead thus minimizing possible future
*/
__be32 ipv6_proxy_select_ident(struct net *net, struct sk_buff *skb)
{
- static u32 ip6_proxy_idents_hashrnd __read_mostly;
struct in6_addr buf[2];
struct in6_addr *addrs;
u32 id;
if (!addrs)
return 0;
- net_get_random_once(&ip6_proxy_idents_hashrnd,
- sizeof(ip6_proxy_idents_hashrnd));
-
- id = __ipv6_select_ident(net, ip6_proxy_idents_hashrnd,
- &addrs[1], &addrs[0]);
+ id = __ipv6_select_ident(net, &addrs[1], &addrs[0]);
return htonl(id);
}
EXPORT_SYMBOL_GPL(ipv6_proxy_select_ident);
const struct in6_addr *daddr,
const struct in6_addr *saddr)
{
- static u32 ip6_idents_hashrnd __read_mostly;
u32 id;
- net_get_random_once(&ip6_idents_hashrnd, sizeof(ip6_idents_hashrnd));
-
- id = __ipv6_select_ident(net, ip6_idents_hashrnd, daddr, saddr);
+ id = __ipv6_select_ident(net, daddr, saddr);
return htonl(id);
}
EXPORT_SYMBOL(ipv6_select_ident);
struct sk_buff *skb, u32 mtu);
static void rt6_do_redirect(struct dst_entry *dst, struct sock *sk,
struct sk_buff *skb);
-static int rt6_score_route(struct fib6_info *rt, int oif, int strict);
+static int rt6_score_route(const struct fib6_nh *nh, u32 fib6_flags, int oif,
+ int strict);
static size_t rt6_nlmsg_size(struct fib6_info *rt);
static int rt6_fill_node(struct net *net, struct sk_buff *skb,
struct fib6_info *rt, struct dst_entry *dst,
if (!fl6->mp_hash)
fl6->mp_hash = rt6_multipath_hash(net, fl6, skb, NULL);
- if (fl6->mp_hash <= atomic_read(&match->fib6_nh.nh_upper_bound))
+ if (fl6->mp_hash <= atomic_read(&match->fib6_nh.fib_nh_upper_bound))
return match;
list_for_each_entry_safe(sibling, next_sibling, &match->fib6_siblings,
fib6_siblings) {
+ const struct fib6_nh *nh = &sibling->fib6_nh;
int nh_upper_bound;
- nh_upper_bound = atomic_read(&sibling->fib6_nh.nh_upper_bound);
+ nh_upper_bound = atomic_read(&nh->fib_nh_upper_bound);
if (fl6->mp_hash > nh_upper_bound)
continue;
- if (rt6_score_route(sibling, oif, strict) < 0)
+ if (rt6_score_route(nh, sibling->fib6_flags, oif, strict) < 0)
break;
match = sibling;
break;
* Route lookup. rcu_read_lock() should be held.
*/
+static bool __rt6_device_match(struct net *net, const struct fib6_nh *nh,
+ const struct in6_addr *saddr, int oif, int flags)
+{
+ const struct net_device *dev;
+
+ if (nh->fib_nh_flags & RTNH_F_DEAD)
+ return false;
+
+ dev = nh->fib_nh_dev;
+ if (oif) {
+ if (dev->ifindex == oif)
+ return true;
+ } else {
+ if (ipv6_chk_addr(net, saddr, dev,
+ flags & RT6_LOOKUP_F_IFACE))
+ return true;
+ }
+
+ return false;
+}
+
static inline struct fib6_info *rt6_device_match(struct net *net,
struct fib6_info *rt,
const struct in6_addr *saddr,
int oif,
int flags)
{
+ const struct fib6_nh *nh;
struct fib6_info *sprt;
if (!oif && ipv6_addr_any(saddr) &&
- !(rt->fib6_nh.nh_flags & RTNH_F_DEAD))
+ !(rt->fib6_nh.fib_nh_flags & RTNH_F_DEAD))
return rt;
for (sprt = rt; sprt; sprt = rcu_dereference(sprt->fib6_next)) {
- const struct net_device *dev = sprt->fib6_nh.nh_dev;
-
- if (sprt->fib6_nh.nh_flags & RTNH_F_DEAD)
- continue;
-
- if (oif) {
- if (dev->ifindex == oif)
- return sprt;
- } else {
- if (ipv6_chk_addr(net, saddr, dev,
- flags & RT6_LOOKUP_F_IFACE))
- return sprt;
- }
+ nh = &sprt->fib6_nh;
+ if (__rt6_device_match(net, nh, saddr, oif, flags))
+ return sprt;
}
if (oif && flags & RT6_LOOKUP_F_IFACE)
return net->ipv6.fib6_null_entry;
- return rt->fib6_nh.nh_flags & RTNH_F_DEAD ? net->ipv6.fib6_null_entry : rt;
+ return rt->fib6_nh.fib_nh_flags & RTNH_F_DEAD ? net->ipv6.fib6_null_entry : rt;
}
#ifdef CONFIG_IPV6_ROUTER_PREF
kfree(work);
}
-static void rt6_probe(struct fib6_info *rt)
+static void rt6_probe(struct fib6_nh *fib6_nh)
{
struct __rt6_probe_work *work = NULL;
const struct in6_addr *nh_gw;
* Router Reachability Probe MUST be rate-limited
* to no more than one per minute.
*/
- if (!rt || !(rt->fib6_flags & RTF_GATEWAY))
+ if (fib6_nh->fib_nh_gw_family)
return;
- nh_gw = &rt->fib6_nh.nh_gw;
- dev = rt->fib6_nh.nh_dev;
+ nh_gw = &fib6_nh->fib_nh_gw6;
+ dev = fib6_nh->fib_nh_dev;
rcu_read_lock_bh();
idev = __in6_dev_get(dev);
neigh = __ipv6_neigh_lookup_noref(dev, nh_gw);
__neigh_set_probe_once(neigh);
}
write_unlock(&neigh->lock);
- } else if (time_after(jiffies, rt->last_probe +
+ } else if (time_after(jiffies, fib6_nh->last_probe +
idev->cnf.rtr_probe_interval)) {
work = kmalloc(sizeof(*work), GFP_ATOMIC);
}
if (work) {
- rt->last_probe = jiffies;
+ fib6_nh->last_probe = jiffies;
INIT_WORK(&work->work, rt6_probe_deferred);
work->target = *nh_gw;
dev_hold(dev);
rcu_read_unlock_bh();
}
#else
-static inline void rt6_probe(struct fib6_info *rt)
+static inline void rt6_probe(struct fib6_nh *fib6_nh)
{
}
#endif
/*
* Default Router Selection (RFC 2461 6.3.6)
*/
-static inline int rt6_check_dev(struct fib6_info *rt, int oif)
-{
- const struct net_device *dev = rt->fib6_nh.nh_dev;
-
- if (!oif || dev->ifindex == oif)
- return 2;
- return 0;
-}
-
-static inline enum rt6_nud_state rt6_check_neigh(struct fib6_info *rt)
+static enum rt6_nud_state rt6_check_neigh(const struct fib6_nh *fib6_nh)
{
enum rt6_nud_state ret = RT6_NUD_FAIL_HARD;
struct neighbour *neigh;
- if (rt->fib6_flags & RTF_NONEXTHOP ||
- !(rt->fib6_flags & RTF_GATEWAY))
- return RT6_NUD_SUCCEED;
-
rcu_read_lock_bh();
- neigh = __ipv6_neigh_lookup_noref(rt->fib6_nh.nh_dev,
- &rt->fib6_nh.nh_gw);
+ neigh = __ipv6_neigh_lookup_noref(fib6_nh->fib_nh_dev,
+ &fib6_nh->fib_nh_gw6);
if (neigh) {
read_lock(&neigh->lock);
if (neigh->nud_state & NUD_VALID)
return ret;
}
-static int rt6_score_route(struct fib6_info *rt, int oif, int strict)
+static int rt6_score_route(const struct fib6_nh *nh, u32 fib6_flags, int oif,
+ int strict)
{
- int m;
+ int m = 0;
+
+ if (!oif || nh->fib_nh_dev->ifindex == oif)
+ m = 2;
- m = rt6_check_dev(rt, oif);
if (!m && (strict & RT6_LOOKUP_F_IFACE))
return RT6_NUD_FAIL_HARD;
#ifdef CONFIG_IPV6_ROUTER_PREF
- m |= IPV6_DECODE_PREF(IPV6_EXTRACT_PREF(rt->fib6_flags)) << 2;
+ m |= IPV6_DECODE_PREF(IPV6_EXTRACT_PREF(fib6_flags)) << 2;
#endif
- if (strict & RT6_LOOKUP_F_REACHABLE) {
- int n = rt6_check_neigh(rt);
+ if ((strict & RT6_LOOKUP_F_REACHABLE) &&
+ !(fib6_flags & RTF_NONEXTHOP) && nh->fib_nh_gw_family) {
+ int n = rt6_check_neigh(nh);
if (n < 0)
return n;
}
return m;
}
-/* called with rc_read_lock held */
-static inline bool fib6_ignore_linkdown(const struct fib6_info *f6i)
+static bool find_match(struct fib6_nh *nh, u32 fib6_flags,
+ int oif, int strict, int *mpri, bool *do_rr)
{
- const struct net_device *dev = fib6_info_nh_dev(f6i);
+ bool match_do_rr = false;
bool rc = false;
-
- if (dev) {
- const struct inet6_dev *idev = __in6_dev_get(dev);
-
- rc = !!idev->cnf.ignore_routes_with_linkdown;
- }
-
- return rc;
-}
-
-static struct fib6_info *find_match(struct fib6_info *rt, int oif, int strict,
- int *mpri, struct fib6_info *match,
- bool *do_rr)
-{
int m;
- bool match_do_rr = false;
- if (rt->fib6_nh.nh_flags & RTNH_F_DEAD)
+ if (nh->fib_nh_flags & RTNH_F_DEAD)
goto out;
- if (fib6_ignore_linkdown(rt) &&
- rt->fib6_nh.nh_flags & RTNH_F_LINKDOWN &&
+ if (ip6_ignore_linkdown(nh->fib_nh_dev) &&
+ nh->fib_nh_flags & RTNH_F_LINKDOWN &&
!(strict & RT6_LOOKUP_F_IGNORE_LINKSTATE))
goto out;
- if (fib6_check_expired(rt))
- goto out;
-
- m = rt6_score_route(rt, oif, strict);
+ m = rt6_score_route(nh, fib6_flags, oif, strict);
if (m == RT6_NUD_FAIL_DO_RR) {
match_do_rr = true;
m = 0; /* lowest valid score */
}
if (strict & RT6_LOOKUP_F_REACHABLE)
- rt6_probe(rt);
+ rt6_probe(nh);
/* note that m can be RT6_NUD_FAIL_PROBE at this point */
if (m > *mpri) {
*do_rr = match_do_rr;
*mpri = m;
- match = rt;
+ rc = true;
}
out:
- return match;
+ return rc;
}
-static struct fib6_info *find_rr_leaf(struct fib6_node *fn,
- struct fib6_info *leaf,
- struct fib6_info *rr_head,
- u32 metric, int oif, int strict,
- bool *do_rr)
+static void __find_rr_leaf(struct fib6_info *rt_start,
+ struct fib6_info *nomatch, u32 metric,
+ struct fib6_info **match, struct fib6_info **cont,
+ int oif, int strict, bool *do_rr, int *mpri)
{
- struct fib6_info *rt, *match, *cont;
- int mpri = -1;
+ struct fib6_info *rt;
- match = NULL;
- cont = NULL;
- for (rt = rr_head; rt; rt = rcu_dereference(rt->fib6_next)) {
- if (rt->fib6_metric != metric) {
- cont = rt;
- break;
+ for (rt = rt_start;
+ rt && rt != nomatch;
+ rt = rcu_dereference(rt->fib6_next)) {
+ struct fib6_nh *nh;
+
+ if (cont && rt->fib6_metric != metric) {
+ *cont = rt;
+ return;
}
- match = find_match(rt, oif, strict, &mpri, match, do_rr);
+ if (fib6_check_expired(rt))
+ continue;
+
+ nh = &rt->fib6_nh;
+ if (find_match(nh, rt->fib6_flags, oif, strict, mpri, do_rr))
+ *match = rt;
}
+}
- for (rt = leaf; rt && rt != rr_head;
- rt = rcu_dereference(rt->fib6_next)) {
- if (rt->fib6_metric != metric) {
- cont = rt;
- break;
- }
+static struct fib6_info *find_rr_leaf(struct fib6_node *fn,
+ struct fib6_info *leaf,
+ struct fib6_info *rr_head,
+ u32 metric, int oif, int strict,
+ bool *do_rr)
+{
+ struct fib6_info *match = NULL, *cont = NULL;
+ int mpri = -1;
- match = find_match(rt, oif, strict, &mpri, match, do_rr);
- }
+ __find_rr_leaf(rr_head, NULL, metric, &match, &cont,
+ oif, strict, do_rr, &mpri);
+
+ __find_rr_leaf(leaf, rr_head, metric, &match, &cont,
+ oif, strict, do_rr, &mpri);
if (match || !cont)
return match;
- for (rt = cont; rt; rt = rcu_dereference(rt->fib6_next))
- match = find_match(rt, oif, strict, &mpri, match, do_rr);
+ __find_rr_leaf(cont, NULL, metric, &match, NULL,
+ oif, strict, do_rr, &mpri);
return match;
}
static bool rt6_is_gw_or_nonexthop(const struct fib6_info *rt)
{
- return (rt->fib6_flags & (RTF_NONEXTHOP | RTF_GATEWAY));
+ return (rt->fib6_flags & RTF_NONEXTHOP) || rt->fib6_nh.fib_nh_gw_family;
}
#ifdef CONFIG_IPV6_ROUTE_INFO
/* called with rcu_lock held */
static struct net_device *ip6_rt_get_dev_rcu(struct fib6_info *rt)
{
- struct net_device *dev = rt->fib6_nh.nh_dev;
+ struct net_device *dev = rt->fib6_nh.fib_nh_dev;
if (rt->fib6_flags & (RTF_LOCAL | RTF_ANYCAST)) {
/* for copies of local routes, dst->dev needs to be the
rt->dst.input = ip6_forward;
}
- if (ort->fib6_nh.nh_lwtstate) {
- rt->dst.lwtstate = lwtstate_get(ort->fib6_nh.nh_lwtstate);
+ if (ort->fib6_nh.fib_nh_lws) {
+ rt->dst.lwtstate = lwtstate_get(ort->fib6_nh.fib_nh_lws);
lwtunnel_set_redirect(&rt->dst);
}
rt->rt6i_dst = ort->fib6_dst;
rt->rt6i_idev = dev ? in6_dev_get(dev) : NULL;
- rt->rt6i_gateway = ort->fib6_nh.nh_gw;
rt->rt6i_flags = ort->fib6_flags;
+ if (ort->fib6_nh.fib_nh_gw_family) {
+ rt->rt6i_gateway = ort->fib6_nh.fib_nh_gw6;
+ rt->rt6i_flags |= RTF_GATEWAY;
+ }
rt6_set_from(rt, ort);
#ifdef CONFIG_IPV6_SUBTREES
rt->rt6i_src = ort->fib6_src;
}
}
-static bool ip6_hold_safe(struct net *net, struct rt6_info **prt,
- bool null_fallback)
+static bool ip6_hold_safe(struct net *net, struct rt6_info **prt)
{
struct rt6_info *rt = *prt;
if (dst_hold_safe(&rt->dst))
return true;
- if (null_fallback) {
+ if (net) {
rt = net->ipv6.ip6_null_entry;
dst_hold(&rt->dst);
} else {
static struct rt6_info *ip6_create_rt_rcu(struct fib6_info *rt)
{
unsigned short flags = fib6_info_dst_flags(rt);
- struct net_device *dev = rt->fib6_nh.nh_dev;
+ struct net_device *dev = rt->fib6_nh.fib_nh_dev;
struct rt6_info *nrt;
if (!fib6_info_hold_safe(rt))
fn = fib6_node_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
restart:
f6i = rcu_dereference(fn->leaf);
- if (!f6i) {
+ if (!f6i)
f6i = net->ipv6.fib6_null_entry;
- } else {
+ else
f6i = rt6_device_match(net, f6i, &fl6->saddr,
fl6->flowi6_oif, flags);
- if (f6i->fib6_nsiblings && fl6->flowi6_oif == 0)
- f6i = fib6_multipath_select(net, f6i, fl6,
- fl6->flowi6_oif, skb,
- flags);
- }
+
if (f6i == net->ipv6.fib6_null_entry) {
fn = fib6_backtrack(fn, &fl6->saddr);
if (fn)
goto restart;
- }
- trace_fib6_table_lookup(net, f6i, table, fl6);
+ rt = net->ipv6.ip6_null_entry;
+ dst_hold(&rt->dst);
+ goto out;
+ }
+ if (f6i->fib6_nsiblings && fl6->flowi6_oif == 0)
+ f6i = fib6_multipath_select(net, f6i, fl6, fl6->flowi6_oif, skb,
+ flags);
/* Search through exception table */
rt = rt6_find_cached_rt(f6i, &fl6->daddr, &fl6->saddr);
if (rt) {
- if (ip6_hold_safe(net, &rt, true))
+ if (ip6_hold_safe(net, &rt))
dst_use_noref(&rt->dst, jiffies);
- } else if (f6i == net->ipv6.fib6_null_entry) {
- rt = net->ipv6.ip6_null_entry;
- dst_hold(&rt->dst);
} else {
rt = ip6_create_rt_rcu(f6i);
}
+out:
+ trace_fib6_table_lookup(net, f6i, table, fl6);
+
rcu_read_unlock();
return rt;
pcpu_rt = *p;
if (pcpu_rt)
- ip6_hold_safe(NULL, &pcpu_rt, false);
+ ip6_hold_safe(NULL, &pcpu_rt);
return pcpu_rt;
}
mtu = min_t(unsigned int, mtu, IP6_MAX_MTU);
- return mtu - lwtunnel_headroom(rt->fib6_nh.nh_lwtstate, mtu);
+ return mtu - lwtunnel_headroom(rt->fib6_nh.fib_nh_lws, mtu);
}
static int rt6_insert_exception(struct rt6_info *nrt,
rcu_read_lock();
f6i = fib6_table_lookup(net, table, oif, fl6, strict);
- if (f6i->fib6_nsiblings)
- f6i = fib6_multipath_select(net, f6i, fl6, oif, skb, strict);
-
if (f6i == net->ipv6.fib6_null_entry) {
rt = net->ipv6.ip6_null_entry;
rcu_read_unlock();
return rt;
}
+ if (f6i->fib6_nsiblings)
+ f6i = fib6_multipath_select(net, f6i, fl6, oif, skb, strict);
+
/*Search through exception table */
rt = rt6_find_cached_rt(f6i, &fl6->daddr, &fl6->saddr);
if (rt) {
- if (ip6_hold_safe(net, &rt, true))
+ if (ip6_hold_safe(net, &rt))
dst_use_noref(&rt->dst, jiffies);
rcu_read_unlock();
return rt;
} else if (unlikely((fl6->flowi6_flags & FLOWI_FLAG_KNOWN_NH) &&
- !(f6i->fib6_flags & RTF_GATEWAY))) {
+ !f6i->fib6_nh.fib_nh_gw_family)) {
/* Create a RTF_CACHE clone which will not be
* owned by the fib6 tree. It is for the special case where
* the daddr in the skb during the neighbor look-up is different
NULL);
}
+static bool ip6_redirect_nh_match(struct fib6_info *f6i,
+ struct fib6_nh *nh,
+ struct flowi6 *fl6,
+ const struct in6_addr *gw,
+ struct rt6_info **ret)
+{
+ if (nh->fib_nh_flags & RTNH_F_DEAD || !nh->fib_nh_gw_family ||
+ fl6->flowi6_oif != nh->fib_nh_dev->ifindex)
+ return false;
+
+ /* rt_cache's gateway might be different from its 'parent'
+ * in the case of an ip redirect.
+ * So we keep searching in the exception table if the gateway
+ * is different.
+ */
+ if (!ipv6_addr_equal(gw, &nh->fib_nh_gw6)) {
+ struct rt6_info *rt_cache;
+
+ rt_cache = rt6_find_cached_rt(f6i, &fl6->daddr, &fl6->saddr);
+ if (rt_cache &&
+ ipv6_addr_equal(gw, &rt_cache->rt6i_gateway)) {
+ *ret = rt_cache;
+ return true;
+ }
+ return false;
+ }
+ return true;
+}
+
/* Handle redirects */
struct ip6rd_flowi {
struct flowi6 fl6;
int flags)
{
struct ip6rd_flowi *rdfl = (struct ip6rd_flowi *)fl6;
- struct rt6_info *ret = NULL, *rt_cache;
+ struct rt6_info *ret = NULL;
struct fib6_info *rt;
struct fib6_node *fn;
fn = fib6_node_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
restart:
for_each_fib6_node_rt_rcu(fn) {
- if (rt->fib6_nh.nh_flags & RTNH_F_DEAD)
- continue;
if (fib6_check_expired(rt))
continue;
if (rt->fib6_flags & RTF_REJECT)
break;
- if (!(rt->fib6_flags & RTF_GATEWAY))
- continue;
- if (fl6->flowi6_oif != rt->fib6_nh.nh_dev->ifindex)
- continue;
- /* rt_cache's gateway might be different from its 'parent'
- * in the case of an ip redirect.
- * So we keep searching in the exception table if the gateway
- * is different.
- */
- if (!ipv6_addr_equal(&rdfl->gateway, &rt->fib6_nh.nh_gw)) {
- rt_cache = rt6_find_cached_rt(rt,
- &fl6->daddr,
- &fl6->saddr);
- if (rt_cache &&
- ipv6_addr_equal(&rdfl->gateway,
- &rt_cache->rt6i_gateway)) {
- ret = rt_cache;
- break;
- }
- continue;
- }
- break;
+ if (ip6_redirect_nh_match(rt, &rt->fib6_nh, fl6,
+ &rdfl->gateway, &ret))
+ goto out;
}
if (!rt)
out:
if (ret)
- ip6_hold_safe(net, &ret, true);
+ ip6_hold_safe(net, &ret);
else
ret = ip6_create_rt_rcu(rt);
return err;
}
+static bool fib6_is_reject(u32 flags, struct net_device *dev, int addr_type)
+{
+ if ((flags & RTF_REJECT) ||
+ (dev && (dev->flags & IFF_LOOPBACK) &&
+ !(addr_type & IPV6_ADDR_LOOPBACK) &&
+ !(flags & RTF_LOCAL)))
+ return true;
+
+ return false;
+}
+
+int fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh,
+ struct fib6_config *cfg, gfp_t gfp_flags,
+ struct netlink_ext_ack *extack)
+{
+ struct net_device *dev = NULL;
+ struct inet6_dev *idev = NULL;
+ int addr_type;
+ int err;
+
+ fib6_nh->fib_nh_family = AF_INET6;
+
+ err = -ENODEV;
+ if (cfg->fc_ifindex) {
+ dev = dev_get_by_index(net, cfg->fc_ifindex);
+ if (!dev)
+ goto out;
+ idev = in6_dev_get(dev);
+ if (!idev)
+ goto out;
+ }
+
+ if (cfg->fc_flags & RTNH_F_ONLINK) {
+ if (!dev) {
+ NL_SET_ERR_MSG(extack,
+ "Nexthop device required for onlink");
+ goto out;
+ }
+
+ if (!(dev->flags & IFF_UP)) {
+ NL_SET_ERR_MSG(extack, "Nexthop device is not up");
+ err = -ENETDOWN;
+ goto out;
+ }
+
+ fib6_nh->fib_nh_flags |= RTNH_F_ONLINK;
+ }
+
+ fib6_nh->fib_nh_weight = 1;
+
+ /* We cannot add true routes via loopback here,
+ * they would result in kernel looping; promote them to reject routes
+ */
+ addr_type = ipv6_addr_type(&cfg->fc_dst);
+ if (fib6_is_reject(cfg->fc_flags, dev, addr_type)) {
+ /* hold loopback dev/idev if we haven't done so. */
+ if (dev != net->loopback_dev) {
+ if (dev) {
+ dev_put(dev);
+ in6_dev_put(idev);
+ }
+ dev = net->loopback_dev;
+ dev_hold(dev);
+ idev = in6_dev_get(dev);
+ if (!idev) {
+ err = -ENODEV;
+ goto out;
+ }
+ }
+ goto set_dev;
+ }
+
+ if (cfg->fc_flags & RTF_GATEWAY) {
+ err = ip6_validate_gw(net, cfg, &dev, &idev, extack);
+ if (err)
+ goto out;
+
+ fib6_nh->fib_nh_gw6 = cfg->fc_gateway;
+ fib6_nh->fib_nh_gw_family = AF_INET6;
+ }
+
+ err = -ENODEV;
+ if (!dev)
+ goto out;
+
+ if (idev->cnf.disable_ipv6) {
+ NL_SET_ERR_MSG(extack, "IPv6 is disabled on nexthop device");
+ err = -EACCES;
+ goto out;
+ }
+
+ if (!(dev->flags & IFF_UP) && !cfg->fc_ignore_dev_down) {
+ NL_SET_ERR_MSG(extack, "Nexthop device is not up");
+ err = -ENETDOWN;
+ goto out;
+ }
+
+ if (!(cfg->fc_flags & (RTF_LOCAL | RTF_ANYCAST)) &&
+ !netif_carrier_ok(dev))
+ fib6_nh->fib_nh_flags |= RTNH_F_LINKDOWN;
+
+ err = fib_nh_common_init(&fib6_nh->nh_common, cfg->fc_encap,
+ cfg->fc_encap_type, cfg, gfp_flags, extack);
+ if (err)
+ goto out;
+set_dev:
+ fib6_nh->fib_nh_dev = dev;
+ fib6_nh->fib_nh_oif = dev->ifindex;
+ err = 0;
+out:
+ if (idev)
+ in6_dev_put(idev);
+
+ if (err) {
+ lwtstate_put(fib6_nh->fib_nh_lws);
+ fib6_nh->fib_nh_lws = NULL;
+ if (dev)
+ dev_put(dev);
+ }
+
+ return err;
+}
+
+void fib6_nh_release(struct fib6_nh *fib6_nh)
+{
+ fib_nh_common_release(&fib6_nh->nh_common);
+}
+
static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg,
gfp_t gfp_flags,
struct netlink_ext_ack *extack)
{
struct net *net = cfg->fc_nlinfo.nl_net;
struct fib6_info *rt = NULL;
- struct net_device *dev = NULL;
- struct inet6_dev *idev = NULL;
struct fib6_table *table;
- int addr_type;
int err = -EINVAL;
+ int addr_type;
/* RTF_PCPU is an internal flag; can not be set by userspace */
if (cfg->fc_flags & RTF_PCPU) {
goto out;
}
#endif
- if (cfg->fc_ifindex) {
- err = -ENODEV;
- dev = dev_get_by_index(net, cfg->fc_ifindex);
- if (!dev)
- goto out;
- idev = in6_dev_get(dev);
- if (!idev)
- goto out;
- }
-
- if (cfg->fc_metric == 0)
- cfg->fc_metric = IP6_RT_PRIO_USER;
-
- if (cfg->fc_flags & RTNH_F_ONLINK) {
- if (!dev) {
- NL_SET_ERR_MSG(extack,
- "Nexthop device required for onlink");
- err = -ENODEV;
- goto out;
- }
-
- if (!(dev->flags & IFF_UP)) {
- NL_SET_ERR_MSG(extack, "Nexthop device is not up");
- err = -ENETDOWN;
- goto out;
- }
- }
err = -ENOBUFS;
if (cfg->fc_nlinfo.nlh &&
cfg->fc_protocol = RTPROT_BOOT;
rt->fib6_protocol = cfg->fc_protocol;
- addr_type = ipv6_addr_type(&cfg->fc_dst);
-
- if (cfg->fc_encap) {
- struct lwtunnel_state *lwtstate;
-
- err = lwtunnel_build_state(cfg->fc_encap_type,
- cfg->fc_encap, AF_INET6, cfg,
- &lwtstate, extack);
- if (err)
- goto out;
- rt->fib6_nh.nh_lwtstate = lwtstate_get(lwtstate);
- }
+ rt->fib6_table = table;
+ rt->fib6_metric = cfg->fc_metric;
+ rt->fib6_type = cfg->fc_type;
+ rt->fib6_flags = cfg->fc_flags & ~RTF_GATEWAY;
ipv6_addr_prefix(&rt->fib6_dst.addr, &cfg->fc_dst, cfg->fc_dst_len);
rt->fib6_dst.plen = cfg->fc_dst_len;
ipv6_addr_prefix(&rt->fib6_src.addr, &cfg->fc_src, cfg->fc_src_len);
rt->fib6_src.plen = cfg->fc_src_len;
#endif
-
- rt->fib6_metric = cfg->fc_metric;
- rt->fib6_nh.nh_weight = 1;
-
- rt->fib6_type = cfg->fc_type;
+ err = fib6_nh_init(net, &rt->fib6_nh, cfg, gfp_flags, extack);
+ if (err)
+ goto out;
/* We cannot add true routes via loopback here,
- they would result in kernel looping; promote them to reject routes
+ * they would result in kernel looping; promote them to reject routes
*/
- if ((cfg->fc_flags & RTF_REJECT) ||
- (dev && (dev->flags & IFF_LOOPBACK) &&
- !(addr_type & IPV6_ADDR_LOOPBACK) &&
- !(cfg->fc_flags & RTF_LOCAL))) {
- /* hold loopback dev/idev if we haven't done so. */
- if (dev != net->loopback_dev) {
- if (dev) {
- dev_put(dev);
- in6_dev_put(idev);
- }
- dev = net->loopback_dev;
- dev_hold(dev);
- idev = in6_dev_get(dev);
- if (!idev) {
- err = -ENODEV;
- goto out;
- }
- }
- rt->fib6_flags = RTF_REJECT|RTF_NONEXTHOP;
- goto install_route;
- }
-
- if (cfg->fc_flags & RTF_GATEWAY) {
- err = ip6_validate_gw(net, cfg, &dev, &idev, extack);
- if (err)
- goto out;
-
- rt->fib6_nh.nh_gw = cfg->fc_gateway;
- }
-
- err = -ENODEV;
- if (!dev)
- goto out;
-
- if (idev->cnf.disable_ipv6) {
- NL_SET_ERR_MSG(extack, "IPv6 is disabled on nexthop device");
- err = -EACCES;
- goto out;
- }
-
- if (!(dev->flags & IFF_UP)) {
- NL_SET_ERR_MSG(extack, "Nexthop device is not up");
- err = -ENETDOWN;
- goto out;
- }
+ addr_type = ipv6_addr_type(&cfg->fc_dst);
+ if (fib6_is_reject(cfg->fc_flags, rt->fib6_nh.fib_nh_dev, addr_type))
+ rt->fib6_flags = RTF_REJECT | RTF_NONEXTHOP;
if (!ipv6_addr_any(&cfg->fc_prefsrc)) {
+ struct net_device *dev = fib6_info_nh_dev(rt);
+
if (!ipv6_chk_addr(net, &cfg->fc_prefsrc, dev, 0)) {
NL_SET_ERR_MSG(extack, "Invalid source address");
err = -EINVAL;
} else
rt->fib6_prefsrc.plen = 0;
- rt->fib6_flags = cfg->fc_flags;
-
-install_route:
- if (!(rt->fib6_flags & (RTF_LOCAL | RTF_ANYCAST)) &&
- !netif_carrier_ok(dev))
- rt->fib6_nh.nh_flags |= RTNH_F_LINKDOWN;
- rt->fib6_nh.nh_flags |= (cfg->fc_flags & RTNH_F_ONLINK);
- rt->fib6_nh.nh_dev = dev;
- rt->fib6_table = table;
-
- if (idev)
- in6_dev_put(idev);
-
return rt;
out:
- if (dev)
- dev_put(dev);
- if (idev)
- in6_dev_put(idev);
-
fib6_info_release(rt);
return ERR_PTR(err);
}
if (fn) {
for_each_fib6_node_rt_rcu(fn) {
+ struct fib6_nh *nh;
+
if (cfg->fc_flags & RTF_CACHE) {
int rc;
}
continue;
}
+
+ nh = &rt->fib6_nh;
if (cfg->fc_ifindex &&
- (!rt->fib6_nh.nh_dev ||
- rt->fib6_nh.nh_dev->ifindex != cfg->fc_ifindex))
+ (!nh->fib_nh_dev ||
+ nh->fib_nh_dev->ifindex != cfg->fc_ifindex))
continue;
if (cfg->fc_flags & RTF_GATEWAY &&
- !ipv6_addr_equal(&cfg->fc_gateway, &rt->fib6_nh.nh_gw))
+ !ipv6_addr_equal(&cfg->fc_gateway, &nh->fib_nh_gw6))
continue;
if (cfg->fc_metric && cfg->fc_metric != rt->fib6_metric)
continue;
goto out;
for_each_fib6_node_rt_rcu(fn) {
- if (rt->fib6_nh.nh_dev->ifindex != ifindex)
+ if (rt->fib6_nh.fib_nh_dev->ifindex != ifindex)
continue;
- if ((rt->fib6_flags & (RTF_ROUTEINFO|RTF_GATEWAY)) != (RTF_ROUTEINFO|RTF_GATEWAY))
+ if (!(rt->fib6_flags & RTF_ROUTEINFO) ||
+ !rt->fib6_nh.fib_nh_gw_family)
continue;
- if (!ipv6_addr_equal(&rt->fib6_nh.nh_gw, gwaddr))
+ if (!ipv6_addr_equal(&rt->fib6_nh.fib_nh_gw6, gwaddr))
continue;
if (!fib6_info_hold_safe(rt))
continue;
rcu_read_lock();
for_each_fib6_node_rt_rcu(&table->tb6_root) {
- if (dev == rt->fib6_nh.nh_dev &&
+ struct fib6_nh *nh = &rt->fib6_nh;
+
+ if (dev == nh->fib_nh_dev &&
((rt->fib6_flags & (RTF_ADDRCONF | RTF_DEFAULT)) == (RTF_ADDRCONF | RTF_DEFAULT)) &&
- ipv6_addr_equal(&rt->fib6_nh.nh_gw, addr))
+ ipv6_addr_equal(&nh->fib_nh_gw6, addr))
break;
}
if (rt && !fib6_info_hold_safe(rt))
.fc_table = l3mdev_fib_table_by_index(net, rtmsg->rtmsg_ifindex) ?
: RT6_TABLE_MAIN,
.fc_ifindex = rtmsg->rtmsg_ifindex,
- .fc_metric = rtmsg->rtmsg_metric,
+ .fc_metric = rtmsg->rtmsg_metric ? : IP6_RT_PRIO_USER,
.fc_expires = rtmsg->rtmsg_info,
.fc_dst_len = rtmsg->rtmsg_dst_len,
.fc_src_len = rtmsg->rtmsg_src_len,
const struct in6_addr *addr,
bool anycast, gfp_t gfp_flags)
{
- u32 tb_id;
- struct net_device *dev = idev->dev;
- struct fib6_info *f6i;
-
- f6i = fib6_info_alloc(gfp_flags);
- if (!f6i)
- return ERR_PTR(-ENOMEM);
+ struct fib6_config cfg = {
+ .fc_table = l3mdev_fib_table(idev->dev) ? : RT6_TABLE_LOCAL,
+ .fc_ifindex = idev->dev->ifindex,
+ .fc_flags = RTF_UP | RTF_ADDRCONF | RTF_NONEXTHOP,
+ .fc_dst = *addr,
+ .fc_dst_len = 128,
+ .fc_protocol = RTPROT_KERNEL,
+ .fc_nlinfo.nl_net = net,
+ .fc_ignore_dev_down = true,
+ };
- f6i->fib6_metrics = ip_fib_metrics_init(net, NULL, 0, NULL);
- f6i->dst_nocount = true;
- f6i->dst_host = true;
- f6i->fib6_protocol = RTPROT_KERNEL;
- f6i->fib6_flags = RTF_UP | RTF_NONEXTHOP;
if (anycast) {
- f6i->fib6_type = RTN_ANYCAST;
- f6i->fib6_flags |= RTF_ANYCAST;
+ cfg.fc_type = RTN_ANYCAST;
+ cfg.fc_flags |= RTF_ANYCAST;
} else {
- f6i->fib6_type = RTN_LOCAL;
- f6i->fib6_flags |= RTF_LOCAL;
+ cfg.fc_type = RTN_LOCAL;
+ cfg.fc_flags |= RTF_LOCAL;
}
- f6i->fib6_nh.nh_gw = *addr;
- dev_hold(dev);
- f6i->fib6_nh.nh_dev = dev;
- f6i->fib6_dst.addr = *addr;
- f6i->fib6_dst.plen = 128;
- tb_id = l3mdev_fib_table(idev->dev) ? : RT6_TABLE_LOCAL;
- f6i->fib6_table = fib6_get_table(net, tb_id);
-
- return f6i;
+ return ip6_route_info_create(&cfg, gfp_flags, NULL);
}
/* remove deleted ip from prefsrc entries */
struct net *net = ((struct arg_dev_net_ip *)arg)->net;
struct in6_addr *addr = ((struct arg_dev_net_ip *)arg)->addr;
- if (((void *)rt->fib6_nh.nh_dev == dev || !dev) &&
+ if (((void *)rt->fib6_nh.fib_nh_dev == dev || !dev) &&
rt != net->ipv6.fib6_null_entry &&
ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr)) {
spin_lock_bh(&rt6_exception_lock);
fib6_clean_all(net, fib6_remove_prefsrc, &adni);
}
-#define RTF_RA_ROUTER (RTF_ADDRCONF | RTF_DEFAULT | RTF_GATEWAY)
+#define RTF_RA_ROUTER (RTF_ADDRCONF | RTF_DEFAULT)
/* Remove routers and update dst entries when gateway turn into host. */
static int fib6_clean_tohost(struct fib6_info *rt, void *arg)
struct in6_addr *gateway = (struct in6_addr *)arg;
if (((rt->fib6_flags & RTF_RA_ROUTER) == RTF_RA_ROUTER) &&
- ipv6_addr_equal(gateway, &rt->fib6_nh.nh_gw)) {
+ rt->fib6_nh.fib_nh_gw_family &&
+ ipv6_addr_equal(gateway, &rt->fib6_nh.fib_nh_gw6)) {
return -1;
}
static bool rt6_is_dead(const struct fib6_info *rt)
{
- if (rt->fib6_nh.nh_flags & RTNH_F_DEAD ||
- (rt->fib6_nh.nh_flags & RTNH_F_LINKDOWN &&
- fib6_ignore_linkdown(rt)))
+ if (rt->fib6_nh.fib_nh_flags & RTNH_F_DEAD ||
+ (rt->fib6_nh.fib_nh_flags & RTNH_F_LINKDOWN &&
+ ip6_ignore_linkdown(rt->fib6_nh.fib_nh_dev)))
return true;
return false;
int total = 0;
if (!rt6_is_dead(rt))
- total += rt->fib6_nh.nh_weight;
+ total += rt->fib6_nh.fib_nh_weight;
list_for_each_entry(iter, &rt->fib6_siblings, fib6_siblings) {
if (!rt6_is_dead(iter))
- total += iter->fib6_nh.nh_weight;
+ total += iter->fib6_nh.fib_nh_weight;
}
return total;
int upper_bound = -1;
if (!rt6_is_dead(rt)) {
- *weight += rt->fib6_nh.nh_weight;
+ *weight += rt->fib6_nh.fib_nh_weight;
upper_bound = DIV_ROUND_CLOSEST_ULL((u64) (*weight) << 31,
total) - 1;
}
- atomic_set(&rt->fib6_nh.nh_upper_bound, upper_bound);
+ atomic_set(&rt->fib6_nh.fib_nh_upper_bound, upper_bound);
}
static void rt6_multipath_upper_bound_set(struct fib6_info *rt, int total)
const struct arg_netdev_event *arg = p_arg;
struct net *net = dev_net(arg->dev);
- if (rt != net->ipv6.fib6_null_entry && rt->fib6_nh.nh_dev == arg->dev) {
- rt->fib6_nh.nh_flags &= ~arg->nh_flags;
+ if (rt != net->ipv6.fib6_null_entry &&
+ rt->fib6_nh.fib_nh_dev == arg->dev) {
+ rt->fib6_nh.fib_nh_flags &= ~arg->nh_flags;
fib6_update_sernum_upto_root(net, rt);
rt6_multipath_rebalance(rt);
}
{
struct fib6_info *iter;
- if (rt->fib6_nh.nh_dev == dev)
+ if (rt->fib6_nh.fib_nh_dev == dev)
return true;
list_for_each_entry(iter, &rt->fib6_siblings, fib6_siblings)
- if (iter->fib6_nh.nh_dev == dev)
+ if (iter->fib6_nh.fib_nh_dev == dev)
return true;
return false;
struct fib6_info *iter;
unsigned int dead = 0;
- if (rt->fib6_nh.nh_dev == down_dev ||
- rt->fib6_nh.nh_flags & RTNH_F_DEAD)
+ if (rt->fib6_nh.fib_nh_dev == down_dev ||
+ rt->fib6_nh.fib_nh_flags & RTNH_F_DEAD)
dead++;
list_for_each_entry(iter, &rt->fib6_siblings, fib6_siblings)
- if (iter->fib6_nh.nh_dev == down_dev ||
- iter->fib6_nh.nh_flags & RTNH_F_DEAD)
+ if (iter->fib6_nh.fib_nh_dev == down_dev ||
+ iter->fib6_nh.fib_nh_flags & RTNH_F_DEAD)
dead++;
return dead;
{
struct fib6_info *iter;
- if (rt->fib6_nh.nh_dev == dev)
- rt->fib6_nh.nh_flags |= nh_flags;
+ if (rt->fib6_nh.fib_nh_dev == dev)
+ rt->fib6_nh.fib_nh_flags |= nh_flags;
list_for_each_entry(iter, &rt->fib6_siblings, fib6_siblings)
- if (iter->fib6_nh.nh_dev == dev)
- iter->fib6_nh.nh_flags |= nh_flags;
+ if (iter->fib6_nh.fib_nh_dev == dev)
+ iter->fib6_nh.fib_nh_flags |= nh_flags;
}
/* called with write lock held for table with rt */
switch (arg->event) {
case NETDEV_UNREGISTER:
- return rt->fib6_nh.nh_dev == dev ? -1 : 0;
+ return rt->fib6_nh.fib_nh_dev == dev ? -1 : 0;
case NETDEV_DOWN:
if (rt->should_flush)
return -1;
if (!rt->fib6_nsiblings)
- return rt->fib6_nh.nh_dev == dev ? -1 : 0;
+ return rt->fib6_nh.fib_nh_dev == dev ? -1 : 0;
if (rt6_multipath_uses_dev(rt, dev)) {
unsigned int count;
}
return -2;
case NETDEV_CHANGE:
- if (rt->fib6_nh.nh_dev != dev ||
+ if (rt->fib6_nh.fib_nh_dev != dev ||
rt->fib6_flags & (RTF_LOCAL | RTF_ANYCAST))
break;
- rt->fib6_nh.nh_flags |= RTNH_F_LINKDOWN;
+ rt->fib6_nh.fib_nh_flags |= RTNH_F_LINKDOWN;
rt6_multipath_rebalance(rt);
break;
}
Since RFC 1981 doesn't include administrative MTU increase
update PMTU increase is a MUST. (i.e. jumbo frame)
*/
- if (rt->fib6_nh.nh_dev == arg->dev &&
+ if (rt->fib6_nh.fib_nh_dev == arg->dev &&
!fib6_metric_locked(rt, RTAX_MTU)) {
u32 mtu = rt->fib6_pmtu;
goto cleanup;
}
- rt->fib6_nh.nh_weight = rtnh->rtnh_hops + 1;
+ rt->fib6_nh.fib_nh_weight = rtnh->rtnh_hops + 1;
err = ip6_route_info_append(info->nl_net, &rt6_nh_list,
rt, &r_cfg);
if (err < 0)
return err;
+ if (cfg.fc_metric == 0)
+ cfg.fc_metric = IP6_RT_PRIO_USER;
+
if (cfg.fc_mp)
return ip6_route_multipath_add(&cfg, extack);
else
nexthop_len = nla_total_size(0) /* RTA_MULTIPATH */
+ NLA_ALIGN(sizeof(struct rtnexthop))
+ nla_total_size(16) /* RTA_GATEWAY */
- + lwtunnel_get_encap_size(rt->fib6_nh.nh_lwtstate);
+ + lwtunnel_get_encap_size(rt->fib6_nh.fib_nh_lws);
nexthop_len *= rt->fib6_nsiblings;
}
+ nla_total_size(sizeof(struct rta_cacheinfo))
+ nla_total_size(TCP_CA_NAME_MAX) /* RTAX_CC_ALGO */
+ nla_total_size(1) /* RTA_PREF */
- + lwtunnel_get_encap_size(rt->fib6_nh.nh_lwtstate)
+ + lwtunnel_get_encap_size(rt->fib6_nh.fib_nh_lws)
+ nexthop_len;
}
-static int rt6_nexthop_info(struct sk_buff *skb, struct fib6_info *rt,
- unsigned int *flags, bool skip_oif)
-{
- if (rt->fib6_nh.nh_flags & RTNH_F_DEAD)
- *flags |= RTNH_F_DEAD;
-
- if (rt->fib6_nh.nh_flags & RTNH_F_LINKDOWN) {
- *flags |= RTNH_F_LINKDOWN;
-
- rcu_read_lock();
- if (fib6_ignore_linkdown(rt))
- *flags |= RTNH_F_DEAD;
- rcu_read_unlock();
- }
-
- if (rt->fib6_flags & RTF_GATEWAY) {
- if (nla_put_in6_addr(skb, RTA_GATEWAY, &rt->fib6_nh.nh_gw) < 0)
- goto nla_put_failure;
- }
-
- *flags |= (rt->fib6_nh.nh_flags & RTNH_F_ONLINK);
- if (rt->fib6_nh.nh_flags & RTNH_F_OFFLOAD)
- *flags |= RTNH_F_OFFLOAD;
-
- /* not needed for multipath encoding b/c it has a rtnexthop struct */
- if (!skip_oif && rt->fib6_nh.nh_dev &&
- nla_put_u32(skb, RTA_OIF, rt->fib6_nh.nh_dev->ifindex))
- goto nla_put_failure;
-
- if (rt->fib6_nh.nh_lwtstate &&
- lwtunnel_fill_encap(skb, rt->fib6_nh.nh_lwtstate) < 0)
- goto nla_put_failure;
-
- return 0;
-
-nla_put_failure:
- return -EMSGSIZE;
-}
-
-/* add multipath next hop */
-static int rt6_add_nexthop(struct sk_buff *skb, struct fib6_info *rt)
-{
- const struct net_device *dev = rt->fib6_nh.nh_dev;
- struct rtnexthop *rtnh;
- unsigned int flags = 0;
-
- rtnh = nla_reserve_nohdr(skb, sizeof(*rtnh));
- if (!rtnh)
- goto nla_put_failure;
-
- rtnh->rtnh_hops = rt->fib6_nh.nh_weight - 1;
- rtnh->rtnh_ifindex = dev ? dev->ifindex : 0;
-
- if (rt6_nexthop_info(skb, rt, &flags, true) < 0)
- goto nla_put_failure;
-
- rtnh->rtnh_flags = flags;
-
- /* length of rtnetlink header + attributes */
- rtnh->rtnh_len = nlmsg_get_pos(skb) - (void *)rtnh;
-
- return 0;
-
-nla_put_failure:
- return -EMSGSIZE;
-}
-
static int rt6_fill_node(struct net *net, struct sk_buff *skb,
struct fib6_info *rt, struct dst_entry *dst,
struct in6_addr *dest, struct in6_addr *src,
if (!mp)
goto nla_put_failure;
- if (rt6_add_nexthop(skb, rt) < 0)
+ if (fib_add_nexthop(skb, &rt->fib6_nh.nh_common,
+ rt->fib6_nh.fib_nh_weight) < 0)
goto nla_put_failure;
list_for_each_entry_safe(sibling, next_sibling,
&rt->fib6_siblings, fib6_siblings) {
- if (rt6_add_nexthop(skb, sibling) < 0)
+ if (fib_add_nexthop(skb, &sibling->fib6_nh.nh_common,
+ sibling->fib6_nh.fib_nh_weight) < 0)
goto nla_put_failure;
}
nla_nest_end(skb, mp);
} else {
- if (rt6_nexthop_info(skb, rt, &rtm->rtm_flags, false) < 0)
+ if (fib_nexthop_info(skb, &rt->fib6_nh.nh_common,
+ &rtm->rtm_flags, false) < 0)
goto nla_put_failure;
}
static bool fib6_info_uses_dev(const struct fib6_info *f6i,
const struct net_device *dev)
{
- if (f6i->fib6_nh.nh_dev == dev)
+ if (f6i->fib6_nh.fib_nh_dev == dev)
return true;
if (f6i->fib6_nsiblings) {
list_for_each_entry_safe(sibling, next_sibling,
&f6i->fib6_siblings, fib6_siblings) {
- if (sibling->fib6_nh.nh_dev == dev)
+ if (sibling->fib6_nh.fib_nh_dev == dev)
return true;
}
}
return NOTIFY_OK;
if (event == NETDEV_REGISTER) {
- net->ipv6.fib6_null_entry->fib6_nh.nh_dev = dev;
+ net->ipv6.fib6_null_entry->fib6_nh.fib_nh_dev = dev;
net->ipv6.ip6_null_entry->dst.dev = dev;
net->ipv6.ip6_null_entry->rt6i_idev = in6_dev_get(dev);
#ifdef CONFIG_IPV6_MULTIPLE_TABLES
/* Registering of the loopback is done before this portion of code,
* the loopback reference in rt6_info will not be taken, do it
* manually for init_net */
- init_net.ipv6.fib6_null_entry->fib6_nh.nh_dev = init_net.loopback_dev;
+ init_net.ipv6.fib6_null_entry->fib6_nh.fib_nh_dev = init_net.loopback_dev;
init_net.ipv6.ip6_null_entry->dst.dev = init_net.loopback_dev;
init_net.ipv6.ip6_null_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
#ifdef CONFIG_IPV6_MULTIPLE_TABLES
{
.cmd = SEG6_CMD_SETHMAC,
.doit = seg6_genl_sethmac,
- .policy = seg6_genl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.start = seg6_genl_dumphmac_start,
.dumpit = seg6_genl_dumphmac,
.done = seg6_genl_dumphmac_done,
- .policy = seg6_genl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = SEG6_CMD_SET_TUNSRC,
.doit = seg6_genl_set_tunsrc,
- .policy = seg6_genl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = SEG6_CMD_GET_TUNSRC,
.doit = seg6_genl_get_tunsrc,
- .policy = seg6_genl_policy,
.flags = GENL_ADMIN_PERM,
},
};
.name = SEG6_GENL_NAME,
.version = SEG6_GENL_VERSION,
.maxattr = SEG6_ATTR_MAX,
+ .policy = seg6_genl_policy,
.netnsok = true,
.parallel_ops = true,
.ops = seg6_genl_ops,
}
#endif
+/* Helper returning the inet6 address from a given tcp socket.
+ * It can be used in TCP stack instead of inet6_sk(sk).
+ * This avoids a dereference and allow compiler optimizations.
+ * It is a specialized version of inet6_sk_generic().
+ */
+static struct ipv6_pinfo *tcp_inet6_sk(const struct sock *sk)
+{
+ unsigned int offset = sizeof(struct tcp6_sock) - sizeof(struct ipv6_pinfo);
+
+ return (struct ipv6_pinfo *)(((u8 *)sk) + offset);
+}
+
static void inet6_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
{
struct dst_entry *dst = skb_dst(skb);
sk->sk_rx_dst = dst;
inet_sk(sk)->rx_dst_ifindex = skb->skb_iif;
- inet6_sk(sk)->rx_dst_cookie = rt6_get_cookie(rt);
+ tcp_inet6_sk(sk)->rx_dst_cookie = rt6_get_cookie(rt);
}
}
struct sockaddr_in6 *usin = (struct sockaddr_in6 *) uaddr;
struct inet_sock *inet = inet_sk(sk);
struct inet_connection_sock *icsk = inet_csk(sk);
- struct ipv6_pinfo *np = inet6_sk(sk);
+ struct ipv6_pinfo *np = tcp_inet6_sk(sk);
struct tcp_sock *tp = tcp_sk(sk);
struct in6_addr *saddr = NULL, *final_p, final;
struct ipv6_txoptions *opt;
if (sk->sk_state == TCP_CLOSE)
goto out;
- if (ipv6_hdr(skb)->hop_limit < inet6_sk(sk)->min_hopcount) {
+ if (ipv6_hdr(skb)->hop_limit < tcp_inet6_sk(sk)->min_hopcount) {
__NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
goto out;
}
goto out;
}
- np = inet6_sk(sk);
+ np = tcp_inet6_sk(sk);
if (type == NDISC_REDIRECT) {
if (!sock_owned_by_user(sk)) {
enum tcp_synack_type synack_type)
{
struct inet_request_sock *ireq = inet_rsk(req);
- struct ipv6_pinfo *np = inet6_sk(sk);
+ struct ipv6_pinfo *np = tcp_inet6_sk(sk);
struct ipv6_txoptions *opt;
struct flowi6 *fl6 = &fl->u.ip6;
struct sk_buff *skb;
{
bool l3_slave = ipv6_l3mdev_skb(TCP_SKB_CB(skb)->header.h6.flags);
struct inet_request_sock *ireq = inet_rsk(req);
- const struct ipv6_pinfo *np = inet6_sk(sk_listener);
+ const struct ipv6_pinfo *np = tcp_inet6_sk(sk_listener);
ireq->ir_v6_rmt_addr = ipv6_hdr(skb)->saddr;
ireq->ir_v6_loc_addr = ipv6_hdr(skb)->daddr;
{
struct inet_request_sock *ireq;
struct ipv6_pinfo *newnp;
- const struct ipv6_pinfo *np = inet6_sk(sk);
+ const struct ipv6_pinfo *np = tcp_inet6_sk(sk);
struct ipv6_txoptions *opt;
- struct tcp6_sock *newtcp6sk;
struct inet_sock *newinet;
struct tcp_sock *newtp;
struct sock *newsk;
if (!newsk)
return NULL;
- newtcp6sk = (struct tcp6_sock *)newsk;
- inet_sk(newsk)->pinet6 = &newtcp6sk->inet6;
+ inet_sk(newsk)->pinet6 = tcp_inet6_sk(newsk);
newinet = inet_sk(newsk);
- newnp = inet6_sk(newsk);
+ newnp = tcp_inet6_sk(newsk);
newtp = tcp_sk(newsk);
memcpy(newnp, np, sizeof(struct ipv6_pinfo));
ip6_dst_store(newsk, dst, NULL, NULL);
inet6_sk_rx_dst_set(newsk, skb);
- newtcp6sk = (struct tcp6_sock *)newsk;
- inet_sk(newsk)->pinet6 = &newtcp6sk->inet6;
+ inet_sk(newsk)->pinet6 = tcp_inet6_sk(newsk);
newtp = tcp_sk(newsk);
newinet = inet_sk(newsk);
- newnp = inet6_sk(newsk);
+ newnp = tcp_inet6_sk(newsk);
memcpy(newnp, np, sizeof(struct ipv6_pinfo));
*/
static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
{
- struct ipv6_pinfo *np = inet6_sk(sk);
- struct tcp_sock *tp;
+ struct ipv6_pinfo *np = tcp_inet6_sk(sk);
struct sk_buff *opt_skb = NULL;
+ struct tcp_sock *tp;
/* Imagine: socket is IPv6. IPv4 packet arrives,
goes to IPv4 receive handler and backlogged.
static int tcp_v6_rcv(struct sk_buff *skb)
{
+ struct sk_buff *skb_to_free;
int sdif = inet6_sdif(skb);
const struct tcphdr *th;
const struct ipv6hdr *hdr;
return 0;
}
}
- if (hdr->hop_limit < inet6_sk(sk)->min_hopcount) {
+ if (hdr->hop_limit < tcp_inet6_sk(sk)->min_hopcount) {
__NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
goto discard_and_relse;
}
tcp_segs_in(tcp_sk(sk), skb);
ret = 0;
if (!sock_owned_by_user(sk)) {
+ skb_to_free = sk->sk_rx_skb_cache;
+ sk->sk_rx_skb_cache = NULL;
ret = tcp_v6_do_rcv(sk, skb);
- } else if (tcp_add_backlog(sk, skb)) {
- goto discard_and_relse;
+ } else {
+ if (tcp_add_backlog(sk, skb))
+ goto discard_and_relse;
+ skb_to_free = NULL;
}
bh_unlock_sock(sk);
-
+ if (skb_to_free)
+ __kfree_skb(skb_to_free);
put_and_return:
if (refcounted)
sock_put(sk);
struct dst_entry *dst = READ_ONCE(sk->sk_rx_dst);
if (dst)
- dst = dst_check(dst, inet6_sk(sk)->rx_dst_cookie);
+ dst = dst_check(dst, tcp_inet6_sk(sk)->rx_dst_cookie);
if (dst &&
inet_sk(sk)->rx_dst_ifindex == skb->skb_iif)
skb_dst_set_noref(skb, dst);
struct inet_sock *inet = inet_sk(sk);
struct sk_buff *skb;
unsigned int ulen, copied;
- int peeked, peeking, off;
- int err;
+ int off, err, peeking = flags & MSG_PEEK;
int is_udplite = IS_UDPLITE(sk);
struct udp_mib __percpu *mib;
bool checksum_valid = false;
return ipv6_recv_rxpmtu(sk, msg, len, addr_len);
try_again:
- peeking = flags & MSG_PEEK;
off = sk_peek_offset(sk, flags);
- skb = __skb_recv_udp(sk, flags, noblock, &peeked, &off, &err);
+ skb = __skb_recv_udp(sk, flags, noblock, &off, &err);
if (!skb)
return err;
goto csum_copy_err;
}
if (unlikely(err)) {
- if (!peeked) {
+ if (!peeking) {
atomic_inc(&sk->sk_drops);
SNMP_INC_STATS(mib, UDP_MIB_INERRORS);
}
kfree_skb(skb);
return err;
}
- if (!peeked)
+ if (!peeking)
SNMP_INC_STATS(mib, UDP_MIB_INDATAGRAMS);
sock_recv_ts_and_drops(msg, sk, skb);
{
.cmd = L2TP_CMD_NOOP,
.doit = l2tp_nl_cmd_noop,
- .policy = l2tp_nl_policy,
/* can be retrieved by unprivileged users */
},
{
.cmd = L2TP_CMD_TUNNEL_CREATE,
.doit = l2tp_nl_cmd_tunnel_create,
- .policy = l2tp_nl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = L2TP_CMD_TUNNEL_DELETE,
.doit = l2tp_nl_cmd_tunnel_delete,
- .policy = l2tp_nl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = L2TP_CMD_TUNNEL_MODIFY,
.doit = l2tp_nl_cmd_tunnel_modify,
- .policy = l2tp_nl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = L2TP_CMD_TUNNEL_GET,
.doit = l2tp_nl_cmd_tunnel_get,
.dumpit = l2tp_nl_cmd_tunnel_dump,
- .policy = l2tp_nl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = L2TP_CMD_SESSION_CREATE,
.doit = l2tp_nl_cmd_session_create,
- .policy = l2tp_nl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = L2TP_CMD_SESSION_DELETE,
.doit = l2tp_nl_cmd_session_delete,
- .policy = l2tp_nl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = L2TP_CMD_SESSION_MODIFY,
.doit = l2tp_nl_cmd_session_modify,
- .policy = l2tp_nl_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = L2TP_CMD_SESSION_GET,
.doit = l2tp_nl_cmd_session_get,
.dumpit = l2tp_nl_cmd_session_dump,
- .policy = l2tp_nl_policy,
.flags = GENL_ADMIN_PERM,
},
};
.version = L2TP_GENL_VERSION,
.hdrsize = 0,
.maxattr = L2TP_ATTR_MAX,
+ .policy = l2tp_nl_policy,
.netnsok = true,
.module = THIS_MODULE,
.ops = l2tp_nl_ops,
static u16 ieee80211_netdev_select_queue(struct net_device *dev,
struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
return ieee80211_select_queue(IEEE80211_DEV_TO_SUB_IF(dev), skb);
}
static u16 ieee80211_monitor_select_queue(struct net_device *dev,
struct sk_buff *skb,
- struct net_device *sb_dev,
- select_queue_fallback_t fallback)
+ struct net_device *sb_dev)
{
struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev);
struct ieee80211_local *local = sdata->local;
#if IS_ENABLED(CONFIG_IPV6)
#include <net/ipv6.h>
#endif
-#include <net/addrconf.h>
+#include <net/ipv6_stubs.h>
#include <net/nexthop.h>
#include "internal.h"
mpls_stats_inc_outucastpkts(out_dev, skb);
- if (rt)
- err = neigh_xmit(NEIGH_ARP_TABLE, out_dev, &rt->rt_gateway,
- skb);
- else if (rt6) {
+ if (rt) {
+ if (rt->rt_gw_family == AF_INET)
+ err = neigh_xmit(NEIGH_ARP_TABLE, out_dev, &rt->rt_gw4,
+ skb);
+ else if (rt->rt_gw_family == AF_INET6)
+ err = neigh_xmit(NEIGH_ND_TABLE, out_dev, &rt->rt_gw6,
+ skb);
+ } else if (rt6) {
if (ipv6_addr_v4mapped(&rt6->rt6i_gateway)) {
/* 6PE (RFC 4798) */
err = neigh_xmit(NEIGH_ARP_TABLE, out_dev, &rt6->rt6i_gateway.s6_addr32[3],
static const struct genl_ops ncsi_ops[] = {
{
.cmd = NCSI_CMD_PKG_INFO,
- .policy = ncsi_genl_policy,
.doit = ncsi_pkg_info_nl,
.dumpit = ncsi_pkg_info_all_nl,
.flags = 0,
},
{
.cmd = NCSI_CMD_SET_INTERFACE,
- .policy = ncsi_genl_policy,
.doit = ncsi_set_interface_nl,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = NCSI_CMD_CLEAR_INTERFACE,
- .policy = ncsi_genl_policy,
.doit = ncsi_clear_interface_nl,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = NCSI_CMD_SEND_CMD,
- .policy = ncsi_genl_policy,
.doit = ncsi_send_cmd_nl,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = NCSI_CMD_SET_PACKAGE_MASK,
- .policy = ncsi_genl_policy,
.doit = ncsi_set_package_mask_nl,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = NCSI_CMD_SET_CHANNEL_MASK,
- .policy = ncsi_genl_policy,
.doit = ncsi_set_channel_mask_nl,
.flags = GENL_ADMIN_PERM,
},
.name = "NCSI",
.version = 0,
.maxattr = NCSI_ATTR_MAX,
+ .policy = ncsi_genl_policy,
.module = THIS_MODULE,
.ops = ncsi_ops,
.n_ops = ARRAY_SIZE(ncsi_ops),
forms of full Network Address Port Translation. This can be
controlled by iptables, ip6tables or nft.
-config NF_NAT_NEEDED
- bool
- depends on NF_NAT
- default y
-
config NF_NAT_AMANDA
tristate
depends on NF_CONNTRACK && NF_NAT
To compile it as a module, choose M here. If unsure, say N.
+config NETFILTER_XT_TARGET_MASQUERADE
+ tristate "MASQUERADE target support"
+ depends on NF_NAT
+ default m if NETFILTER_ADVANCED=n
+ select NF_NAT_MASQUERADE
+ help
+ Masquerading is a special case of NAT: all outgoing connections are
+ changed to seem to come from a particular interface's address, and
+ if the interface goes down, those connections are lost. This is
+ only useful for dialup accounts with dynamic IP address (ie. your IP
+ address will be different on next dialup).
+
+ To compile it as a module, choose M here. If unsure, say N.
+
config NETFILTER_XT_TARGET_TEE
tristate '"TEE" - packet cloning to alternate destination'
depends on NETFILTER_ADVANCED
nf_tables-objs := nf_tables_core.o nf_tables_api.o nft_chain_filter.o \
nf_tables_trace.o nft_immediate.o nft_cmp.o nft_range.o \
nft_bitwise.o nft_byteorder.o nft_payload.o nft_lookup.o \
- nft_dynset.o nft_meta.o nft_rt.o nft_exthdr.o
+ nft_dynset.o nft_meta.o nft_rt.o nft_exthdr.o \
+ nft_chain_route.o
nf_tables_set-objs := nf_tables_set_core.o \
nft_set_hash.o nft_set_bitmap.o nft_set_rbtree.o
obj-$(CONFIG_NETFILTER_XT_TARGET_NFQUEUE) += xt_NFQUEUE.o
obj-$(CONFIG_NETFILTER_XT_TARGET_RATEEST) += xt_RATEEST.o
obj-$(CONFIG_NETFILTER_XT_TARGET_REDIRECT) += xt_REDIRECT.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_MASQUERADE) += xt_MASQUERADE.o
obj-$(CONFIG_NETFILTER_XT_TARGET_SECMARK) += xt_SECMARK.o
obj-$(CONFIG_NETFILTER_XT_TARGET_TPROXY) += xt_TPROXY.o
obj-$(CONFIG_NETFILTER_XT_TARGET_TCPMSS) += xt_TCPMSS.o
#include <linux/mm.h>
#include <linux/rcupdate.h>
#include <net/net_namespace.h>
+#include <net/netfilter/nf_queue.h>
#include <net/sock.h>
#include "nf_internals.h"
conn_flags = udest->conn_flags & IP_VS_CONN_F_DEST_MASK;
conn_flags |= IP_VS_CONN_F_INACTIVE;
+ /* set the tunnel info */
+ dest->tun_type = udest->tun_type;
+ dest->tun_port = udest->tun_port;
+
/* set the IP_VS_CONN_F_NOOUTPUT flag if not masquerading/NAT */
if ((conn_flags & IP_VS_CONN_F_FWD_MASK) != IP_VS_CONN_F_MASQ) {
conn_flags |= IP_VS_CONN_F_NOOUTPUT;
return -ERANGE;
}
+ if (udest->tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE) {
+ if (udest->tun_port == 0) {
+ pr_err("%s(): tunnel port is zero\n", __func__);
+ return -EINVAL;
+ }
+ }
+
ip_vs_addr_copy(udest->af, &daddr, &udest->addr);
/* We use function that requires RCU lock */
return -ERANGE;
}
+ if (udest->tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE) {
+ if (udest->tun_port == 0) {
+ pr_err("%s(): tunnel port is zero\n", __func__);
+ return -EINVAL;
+ }
+ }
+
ip_vs_addr_copy(udest->af, &daddr, &udest->addr);
/* We use function that requires RCU lock */
udest->u_threshold = udest_compat->u_threshold;
udest->l_threshold = udest_compat->l_threshold;
udest->af = AF_INET;
+ udest->tun_type = IP_VS_CONN_F_TUNNEL_TYPE_IPIP;
}
static int
[IPVS_DEST_ATTR_PERSIST_CONNS] = { .type = NLA_U32 },
[IPVS_DEST_ATTR_STATS] = { .type = NLA_NESTED },
[IPVS_DEST_ATTR_ADDR_FAMILY] = { .type = NLA_U16 },
+ [IPVS_DEST_ATTR_TUN_TYPE] = { .type = NLA_U8 },
+ [IPVS_DEST_ATTR_TUN_PORT] = { .type = NLA_U16 },
};
static int ip_vs_genl_fill_stats(struct sk_buff *skb, int container_type,
IP_VS_CONN_F_FWD_MASK)) ||
nla_put_u32(skb, IPVS_DEST_ATTR_WEIGHT,
atomic_read(&dest->weight)) ||
+ nla_put_u8(skb, IPVS_DEST_ATTR_TUN_TYPE,
+ dest->tun_type) ||
+ nla_put_be16(skb, IPVS_DEST_ATTR_TUN_PORT,
+ dest->tun_port) ||
nla_put_u32(skb, IPVS_DEST_ATTR_U_THRESH, dest->u_threshold) ||
nla_put_u32(skb, IPVS_DEST_ATTR_L_THRESH, dest->l_threshold) ||
nla_put_u32(skb, IPVS_DEST_ATTR_ACTIVE_CONNS,
/* If a full entry was requested, check for the additional fields */
if (full_entry) {
struct nlattr *nla_fwd, *nla_weight, *nla_u_thresh,
- *nla_l_thresh;
+ *nla_l_thresh, *nla_tun_type, *nla_tun_port;
nla_fwd = attrs[IPVS_DEST_ATTR_FWD_METHOD];
nla_weight = attrs[IPVS_DEST_ATTR_WEIGHT];
nla_u_thresh = attrs[IPVS_DEST_ATTR_U_THRESH];
nla_l_thresh = attrs[IPVS_DEST_ATTR_L_THRESH];
+ nla_tun_type = attrs[IPVS_DEST_ATTR_TUN_TYPE];
+ nla_tun_port = attrs[IPVS_DEST_ATTR_TUN_PORT];
if (!(nla_fwd && nla_weight && nla_u_thresh && nla_l_thresh))
return -EINVAL;
udest->weight = nla_get_u32(nla_weight);
udest->u_threshold = nla_get_u32(nla_u_thresh);
udest->l_threshold = nla_get_u32(nla_l_thresh);
+
+ if (nla_tun_type)
+ udest->tun_type = nla_get_u8(nla_tun_type);
+
+ if (nla_tun_port)
+ udest->tun_port = nla_get_be16(nla_tun_port);
}
return 0;
{
.cmd = IPVS_CMD_NEW_SERVICE,
.flags = GENL_ADMIN_PERM,
- .policy = ip_vs_cmd_policy,
.doit = ip_vs_genl_set_cmd,
},
{
.cmd = IPVS_CMD_SET_SERVICE,
.flags = GENL_ADMIN_PERM,
- .policy = ip_vs_cmd_policy,
.doit = ip_vs_genl_set_cmd,
},
{
.cmd = IPVS_CMD_DEL_SERVICE,
.flags = GENL_ADMIN_PERM,
- .policy = ip_vs_cmd_policy,
.doit = ip_vs_genl_set_cmd,
},
{
.flags = GENL_ADMIN_PERM,
.doit = ip_vs_genl_get_cmd,
.dumpit = ip_vs_genl_dump_services,
- .policy = ip_vs_cmd_policy,
},
{
.cmd = IPVS_CMD_NEW_DEST,
.flags = GENL_ADMIN_PERM,
- .policy = ip_vs_cmd_policy,
.doit = ip_vs_genl_set_cmd,
},
{
.cmd = IPVS_CMD_SET_DEST,
.flags = GENL_ADMIN_PERM,
- .policy = ip_vs_cmd_policy,
.doit = ip_vs_genl_set_cmd,
},
{
.cmd = IPVS_CMD_DEL_DEST,
.flags = GENL_ADMIN_PERM,
- .policy = ip_vs_cmd_policy,
.doit = ip_vs_genl_set_cmd,
},
{
.cmd = IPVS_CMD_GET_DEST,
.flags = GENL_ADMIN_PERM,
- .policy = ip_vs_cmd_policy,
.dumpit = ip_vs_genl_dump_dests,
},
{
.cmd = IPVS_CMD_NEW_DAEMON,
.flags = GENL_ADMIN_PERM,
- .policy = ip_vs_cmd_policy,
.doit = ip_vs_genl_set_daemon,
},
{
.cmd = IPVS_CMD_DEL_DAEMON,
.flags = GENL_ADMIN_PERM,
- .policy = ip_vs_cmd_policy,
.doit = ip_vs_genl_set_daemon,
},
{
{
.cmd = IPVS_CMD_SET_CONFIG,
.flags = GENL_ADMIN_PERM,
- .policy = ip_vs_cmd_policy,
.doit = ip_vs_genl_set_cmd,
},
{
{
.cmd = IPVS_CMD_ZERO,
.flags = GENL_ADMIN_PERM,
- .policy = ip_vs_cmd_policy,
.doit = ip_vs_genl_set_cmd,
},
{
.name = IPVS_GENL_NAME,
.version = IPVS_GENL_VERSION,
.maxattr = IPVS_CMD_ATTR_MAX,
+ .policy = ip_vs_cmd_policy,
.netnsok = true, /* Make ipvsadm to work on netns */
.module = THIS_MODULE,
.ops = ip_vs_genl_ops,
#include <linux/slab.h>
#include <linux/tcp.h> /* for tcphdr */
#include <net/ip.h>
+#include <net/gue.h>
#include <net/tcp.h> /* for csum_tcpudp_magic */
#include <net/udp.h>
#include <net/icmp.h> /* for icmp_send */
mtu = dst_mtu(&rt->dst);
} else {
mtu = dst_mtu(&rt->dst) - sizeof(struct iphdr);
+ if (!dest)
+ goto err_put;
+ if (dest->tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE)
+ mtu -= sizeof(struct udphdr) + sizeof(struct guehdr);
if (mtu < 68) {
IP_VS_DBG_RL("%s(): mtu less than 68\n", __func__);
goto err_put;
mtu = dst_mtu(&rt->dst);
else {
mtu = dst_mtu(&rt->dst) - sizeof(struct ipv6hdr);
+ if (!dest)
+ goto err_put;
+ if (dest->tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE)
+ mtu -= sizeof(struct udphdr) + sizeof(struct guehdr);
if (mtu < IPV6_MIN_MTU) {
IP_VS_DBG_RL("%s(): mtu less than %d\n", __func__,
IPV6_MIN_MTU);
}
}
+static int
+ipvs_gue_encap(struct net *net, struct sk_buff *skb,
+ struct ip_vs_conn *cp, __u8 *next_protocol)
+{
+ __be16 dport;
+ __be16 sport = udp_flow_src_port(net, skb, 0, 0, false);
+ struct udphdr *udph; /* Our new UDP header */
+ struct guehdr *gueh; /* Our new GUE header */
+
+ skb_push(skb, sizeof(struct guehdr));
+
+ gueh = (struct guehdr *)skb->data;
+
+ gueh->control = 0;
+ gueh->version = 0;
+ gueh->hlen = 0;
+ gueh->flags = 0;
+ gueh->proto_ctype = *next_protocol;
+
+ skb_push(skb, sizeof(struct udphdr));
+ skb_reset_transport_header(skb);
+
+ udph = udp_hdr(skb);
+
+ dport = cp->dest->tun_port;
+ udph->dest = dport;
+ udph->source = sport;
+ udph->len = htons(skb->len);
+ udph->check = 0;
+
+ *next_protocol = IPPROTO_UDP;
+
+ return 0;
+}
+
/*
* IP Tunneling transmitter
*
struct iphdr *iph; /* Our new IP header */
unsigned int max_headroom; /* The extra header space needed */
int ret, local;
+ int tun_type, gso_type;
EnterFunction(10);
*/
max_headroom = LL_RESERVED_SPACE(tdev) + sizeof(struct iphdr);
+ tun_type = cp->dest->tun_type;
+
+ if (tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE)
+ max_headroom += sizeof(struct udphdr) + sizeof(struct guehdr);
+
/* We only care about the df field if sysctl_pmtu_disc(ipvs) is set */
dfp = sysctl_pmtu_disc(ipvs) ? &df : NULL;
skb = ip_vs_prepare_tunneled_skb(skb, cp->af, max_headroom,
if (IS_ERR(skb))
goto tx_error;
- if (iptunnel_handle_offloads(skb, __tun_gso_type_mask(AF_INET, cp->af)))
+ gso_type = __tun_gso_type_mask(AF_INET, cp->af);
+ if (tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE)
+ gso_type |= SKB_GSO_UDP_TUNNEL;
+
+ if (iptunnel_handle_offloads(skb, gso_type))
goto tx_error;
skb->transport_header = skb->network_header;
+ skb_set_inner_ipproto(skb, next_protocol);
+
+ if (tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE)
+ ipvs_gue_encap(net, skb, cp, &next_protocol);
+
skb_push(skb, sizeof(struct iphdr));
skb_reset_network_header(skb);
memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
struct ip_vs_protocol *pp, struct ip_vs_iphdr *ipvsh)
{
+ struct netns_ipvs *ipvs = cp->ipvs;
+ struct net *net = ipvs->net;
struct rt6_info *rt; /* Route to the other host */
struct in6_addr saddr; /* Source for tunnel */
struct net_device *tdev; /* Device to other host */
struct ipv6hdr *iph; /* Our new IP header */
unsigned int max_headroom; /* The extra header space needed */
int ret, local;
+ int tun_type, gso_type;
EnterFunction(10);
- local = __ip_vs_get_out_rt_v6(cp->ipvs, cp->af, skb, cp->dest,
+ local = __ip_vs_get_out_rt_v6(ipvs, cp->af, skb, cp->dest,
&cp->daddr.in6,
&saddr, ipvsh, 1,
IP_VS_RT_MODE_LOCAL |
*/
max_headroom = LL_RESERVED_SPACE(tdev) + sizeof(struct ipv6hdr);
+ tun_type = cp->dest->tun_type;
+
+ if (tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE)
+ max_headroom += sizeof(struct udphdr) + sizeof(struct guehdr);
+
skb = ip_vs_prepare_tunneled_skb(skb, cp->af, max_headroom,
&next_protocol, &payload_len,
&dsfield, &ttl, NULL);
if (IS_ERR(skb))
goto tx_error;
- if (iptunnel_handle_offloads(skb, __tun_gso_type_mask(AF_INET6, cp->af)))
+ gso_type = __tun_gso_type_mask(AF_INET6, cp->af);
+ if (tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE)
+ gso_type |= SKB_GSO_UDP_TUNNEL;
+
+ if (iptunnel_handle_offloads(skb, gso_type))
goto tx_error;
skb->transport_header = skb->network_header;
+ skb_set_inner_ipproto(skb, next_protocol);
+
+ if (tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE)
+ ipvs_gue_encap(net, skb, cp, &next_protocol);
+
skb_push(skb, sizeof(struct ipv6hdr));
skb_reset_network_header(skb);
memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
ret = ip_vs_tunnel_xmit_prepare(skb, cp);
if (ret == NF_ACCEPT)
- ip6_local_out(cp->ipvs->net, skb->sk, skb);
+ ip6_local_out(net, skb->sk, skb);
else if (ret == NF_DROP)
kfree_skb(skb);
exp->tuple.dst.u.all = *dst;
-#ifdef CONFIG_NF_NAT_NEEDED
+#if IS_ENABLED(CONFIG_NF_NAT)
memset(&exp->saved_addr, 0, sizeof(exp->saved_addr));
memset(&exp->saved_proto, 0, sizeof(exp->saved_proto));
#endif
#include <net/netfilter/nf_conntrack_timestamp.h>
#include <net/netfilter/nf_conntrack_labels.h>
#include <net/netfilter/nf_conntrack_synproxy.h>
-#ifdef CONFIG_NF_NAT_NEEDED
+#if IS_ENABLED(CONFIG_NF_NAT)
#include <net/netfilter/nf_nat.h>
#include <net/netfilter/nf_nat_helper.h>
#endif
+ nla_total_size(0) /* CTA_HELP */
+ nla_total_size(NF_CT_HELPER_NAME_LEN) /* CTA_HELP_NAME */
+ ctnetlink_secctx_size(ct)
-#ifdef CONFIG_NF_NAT_NEEDED
+#if IS_ENABLED(CONFIG_NF_NAT)
+ 2 * nla_total_size(0) /* CTA_NAT_SEQ_ADJ_ORIG|REPL */
+ 6 * nla_total_size(sizeof(u_int32_t)) /* CTA_NAT_SEQ_OFFSET */
#endif
return -EOPNOTSUPP;
}
-#ifdef CONFIG_NF_NAT_NEEDED
+#if IS_ENABLED(CONFIG_NF_NAT)
static int
ctnetlink_parse_nat_setup(struct nf_conn *ct,
enum nf_nat_manip_type manip,
static int
ctnetlink_setup_nat(struct nf_conn *ct, const struct nlattr * const cda[])
{
-#ifdef CONFIG_NF_NAT_NEEDED
+#if IS_ENABLED(CONFIG_NF_NAT)
int ret;
if (!cda[CTA_NAT_DST] && !cda[CTA_NAT_SRC])
+ nla_total_size(0) /* CTA_HELP */
+ nla_total_size(NF_CT_HELPER_NAME_LEN) /* CTA_HELP_NAME */
+ ctnetlink_secctx_size(ct)
-#ifdef CONFIG_NF_NAT_NEEDED
+#if IS_ENABLED(CONFIG_NF_NAT)
+ 2 * nla_total_size(0) /* CTA_NAT_SEQ_ADJ_ORIG|REPL */
+ 6 * nla_total_size(sizeof(u_int32_t)) /* CTA_NAT_SEQ_OFFSET */
#endif
struct nf_conn *master = exp->master;
long timeout = ((long)exp->timeout.expires - (long)jiffies) / HZ;
struct nf_conn_help *help;
-#ifdef CONFIG_NF_NAT_NEEDED
+#if IS_ENABLED(CONFIG_NF_NAT)
struct nlattr *nest_parms;
struct nf_conntrack_tuple nat_tuple = {};
#endif
CTA_EXPECT_MASTER) < 0)
goto nla_put_failure;
-#ifdef CONFIG_NF_NAT_NEEDED
+#if IS_ENABLED(CONFIG_NF_NAT)
if (!nf_inet_addr_cmp(&exp->saved_addr, &any_addr) ||
exp->saved_proto.all) {
nest_parms = nla_nest_start(skb, CTA_EXPECT_NAT | NLA_F_NESTED);
struct nf_conntrack_expect *exp,
u_int8_t u3)
{
-#ifdef CONFIG_NF_NAT_NEEDED
+#if IS_ENABLED(CONFIG_NF_NAT)
struct nlattr *tb[CTA_EXPECT_NAT_MAX+1];
struct nf_conntrack_tuple nat_tuple = {};
int err;
nfct_help(exp->master)->helper != nfct_help(ct)->helper ||
exp->class != class)
break;
-#ifdef CONFIG_NF_NAT_NEEDED
+#if IS_ENABLED(CONFIG_NF_NAT)
if (!direct_rtp &&
(!nf_inet_addr_cmp(&exp->saved_addr, &exp->tuple.dst.u3) ||
exp->saved_proto.udp.port != exp->tuple.dst.u.udp.port) &&
}
EXPORT_SYMBOL_GPL(nf_ct_untimeout);
+static void __nf_ct_timeout_put(struct nf_ct_timeout *timeout)
+{
+ typeof(nf_ct_timeout_put_hook) timeout_put;
+
+ timeout_put = rcu_dereference(nf_ct_timeout_put_hook);
+ if (timeout_put)
+ timeout_put(timeout);
+}
+
+int nf_ct_set_timeout(struct net *net, struct nf_conn *ct,
+ u8 l3num, u8 l4num, const char *timeout_name)
+{
+ typeof(nf_ct_timeout_find_get_hook) timeout_find_get;
+ struct nf_ct_timeout *timeout;
+ struct nf_conn_timeout *timeout_ext;
+ const char *errmsg = NULL;
+ int ret = 0;
+
+ rcu_read_lock();
+ timeout_find_get = rcu_dereference(nf_ct_timeout_find_get_hook);
+ if (!timeout_find_get) {
+ ret = -ENOENT;
+ errmsg = "Timeout policy base is empty";
+ goto out;
+ }
+
+ timeout = timeout_find_get(net, timeout_name);
+ if (!timeout) {
+ ret = -ENOENT;
+ pr_info_ratelimited("No such timeout policy \"%s\"\n",
+ timeout_name);
+ goto out;
+ }
+
+ if (timeout->l3num != l3num) {
+ ret = -EINVAL;
+ pr_info_ratelimited("Timeout policy `%s' can only be used by "
+ "L%d protocol number %d\n",
+ timeout_name, 3, timeout->l3num);
+ goto err_put_timeout;
+ }
+ /* Make sure the timeout policy matches any existing protocol tracker,
+ * otherwise default to generic.
+ */
+ if (timeout->l4proto->l4proto != l4num) {
+ ret = -EINVAL;
+ pr_info_ratelimited("Timeout policy `%s' can only be used by "
+ "L%d protocol number %d\n",
+ timeout_name, 4, timeout->l4proto->l4proto);
+ goto err_put_timeout;
+ }
+ timeout_ext = nf_ct_timeout_ext_add(ct, timeout, GFP_ATOMIC);
+ if (!timeout_ext) {
+ ret = -ENOMEM;
+ goto err_put_timeout;
+ }
+
+ rcu_read_unlock();
+ return ret;
+
+err_put_timeout:
+ __nf_ct_timeout_put(timeout);
+out:
+ rcu_read_unlock();
+ if (errmsg)
+ pr_info_ratelimited("%s\n", errmsg);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(nf_ct_set_timeout);
+
+void nf_ct_destroy_timeout(struct nf_conn *ct)
+{
+ struct nf_conn_timeout *timeout_ext;
+ typeof(nf_ct_timeout_put_hook) timeout_put;
+
+ rcu_read_lock();
+ timeout_put = rcu_dereference(nf_ct_timeout_put_hook);
+
+ if (timeout_put) {
+ timeout_ext = nf_ct_timeout_find(ct);
+ if (timeout_ext) {
+ timeout_put(timeout_ext->timeout);
+ RCU_INIT_POINTER(timeout_ext->timeout, NULL);
+ }
+ }
+ rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(nf_ct_destroy_timeout);
+
static const struct nf_ct_ext_type timeout_extend = {
.len = sizeof(struct nf_conn_timeout),
.align = __alignof__(struct nf_conn_timeout),
if (tuplehash == NULL)
return NF_ACCEPT;
- outdev = dev_get_by_index_rcu(state->net, tuplehash->tuple.oifidx);
- if (!outdev)
- return NF_ACCEPT;
-
dir = tuplehash->tuple.dir;
flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
rt = (struct rtable *)flow->tuplehash[dir].tuple.dst_cache;
+ outdev = rt->dst.dev;
if (unlikely(nf_flow_exceeds_mtu(skb, flow->tuplehash[dir].tuple.mtu)) &&
(ip_hdr(skb)->frag_off & htons(IP_DF)) != 0)
if (tuplehash == NULL)
return NF_ACCEPT;
- outdev = dev_get_by_index_rcu(state->net, tuplehash->tuple.oifidx);
- if (!outdev)
- return NF_ACCEPT;
-
dir = tuplehash->tuple.dir;
flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
rt = (struct rt6_info *)flow->tuplehash[dir].tuple.dst_cache;
+ outdev = rt->dst.dev;
if (unlikely(nf_flow_exceeds_mtu(skb, flow->tuplehash[dir].tuple.mtu)))
return NF_ACCEPT;
#include <linux/netdevice.h>
/* nf_queue.c */
-int nf_queue(struct sk_buff *skb, struct nf_hook_state *state,
- const struct nf_hook_entries *entries, unsigned int index,
- unsigned int verdict);
void nf_queue_nf_hook_drop(struct net *net);
/* nf_log.c */
.expectfn = nf_nat_follow_master,
};
-int nf_nat_register_fn(struct net *net, const struct nf_hook_ops *ops,
+int nf_nat_register_fn(struct net *net, u8 pf, const struct nf_hook_ops *ops,
const struct nf_hook_ops *orig_nat_ops, unsigned int ops_count)
{
struct nat_net *nat_net = net_generic(net, nat_net_id);
struct nf_hook_ops *nat_ops;
int i, ret;
- if (WARN_ON_ONCE(ops->pf >= ARRAY_SIZE(nat_net->nat_proto_net)))
+ if (WARN_ON_ONCE(pf >= ARRAY_SIZE(nat_net->nat_proto_net)))
return -EINVAL;
- nat_proto_net = &nat_net->nat_proto_net[ops->pf];
+ nat_proto_net = &nat_net->nat_proto_net[pf];
for (i = 0; i < ops_count; i++) {
- if (WARN_ON(orig_nat_ops[i].pf != ops->pf))
- return -EINVAL;
if (orig_nat_ops[i].hooknum == hooknum) {
hooknum = i;
break;
return ret;
}
-void nf_nat_unregister_fn(struct net *net, const struct nf_hook_ops *ops,
- unsigned int ops_count)
+void nf_nat_unregister_fn(struct net *net, u8 pf, const struct nf_hook_ops *ops,
+ unsigned int ops_count)
{
struct nat_net *nat_net = net_generic(net, nat_net_id);
struct nf_nat_hooks_net *nat_proto_net;
int hooknum = ops->hooknum;
int i;
- if (ops->pf >= ARRAY_SIZE(nat_net->nat_proto_net))
+ if (pf >= ARRAY_SIZE(nat_net->nat_proto_net))
return;
- nat_proto_net = &nat_net->nat_proto_net[ops->pf];
+ nat_proto_net = &nat_net->nat_proto_net[pf];
mutex_lock(&nf_nat_proto_mutex);
if (WARN_ON(nat_proto_net->users == 0))
#include <linux/netfilter_ipv4.h>
#include <linux/netfilter_ipv6.h>
-#include <net/netfilter/ipv4/nf_nat_masquerade.h>
-#include <net/netfilter/ipv6/nf_nat_masquerade.h>
+#include <net/netfilter/nf_nat_masquerade.h>
static DEFINE_MUTEX(masq_mutex);
-static unsigned int masq_refcnt4 __read_mostly;
-static unsigned int masq_refcnt6 __read_mostly;
+static unsigned int masq_refcnt __read_mostly;
unsigned int
nf_nat_masquerade_ipv4(struct sk_buff *skb, unsigned int hooknum,
.notifier_call = masq_inet_event,
};
-int nf_nat_masquerade_ipv4_register_notifier(void)
-{
- int ret = 0;
-
- mutex_lock(&masq_mutex);
- if (WARN_ON_ONCE(masq_refcnt4 == UINT_MAX)) {
- ret = -EOVERFLOW;
- goto out_unlock;
- }
-
- /* check if the notifier was already set */
- if (++masq_refcnt4 > 1)
- goto out_unlock;
-
- /* Register for device down reports */
- ret = register_netdevice_notifier(&masq_dev_notifier);
- if (ret)
- goto err_dec;
- /* Register IP address change reports */
- ret = register_inetaddr_notifier(&masq_inet_notifier);
- if (ret)
- goto err_unregister;
-
- mutex_unlock(&masq_mutex);
- return ret;
-
-err_unregister:
- unregister_netdevice_notifier(&masq_dev_notifier);
-err_dec:
- masq_refcnt4--;
-out_unlock:
- mutex_unlock(&masq_mutex);
- return ret;
-}
-EXPORT_SYMBOL_GPL(nf_nat_masquerade_ipv4_register_notifier);
-
-void nf_nat_masquerade_ipv4_unregister_notifier(void)
-{
- mutex_lock(&masq_mutex);
- /* check if the notifier still has clients */
- if (--masq_refcnt4 > 0)
- goto out_unlock;
-
- unregister_netdevice_notifier(&masq_dev_notifier);
- unregister_inetaddr_notifier(&masq_inet_notifier);
-out_unlock:
- mutex_unlock(&masq_mutex);
-}
-EXPORT_SYMBOL_GPL(nf_nat_masquerade_ipv4_unregister_notifier);
-
#if IS_ENABLED(CONFIG_IPV6)
static atomic_t v6_worker_count __read_mostly;
.notifier_call = masq_inet6_event,
};
-int nf_nat_masquerade_ipv6_register_notifier(void)
+static int nf_nat_masquerade_ipv6_register_notifier(void)
+{
+ return register_inet6addr_notifier(&masq_inet6_notifier);
+}
+#else
+static inline int nf_nat_masquerade_ipv6_register_notifier(void) { return 0; }
+#endif
+
+int nf_nat_masquerade_inet_register_notifiers(void)
{
int ret = 0;
mutex_lock(&masq_mutex);
- if (WARN_ON_ONCE(masq_refcnt6 == UINT_MAX)) {
+ if (WARN_ON_ONCE(masq_refcnt == UINT_MAX)) {
ret = -EOVERFLOW;
goto out_unlock;
}
- /* check if the notifier is already set */
- if (++masq_refcnt6 > 1)
+ /* check if the notifier was already set */
+ if (++masq_refcnt > 1)
goto out_unlock;
- ret = register_inet6addr_notifier(&masq_inet6_notifier);
+ /* Register for device down reports */
+ ret = register_netdevice_notifier(&masq_dev_notifier);
if (ret)
goto err_dec;
+ /* Register IP address change reports */
+ ret = register_inetaddr_notifier(&masq_inet_notifier);
+ if (ret)
+ goto err_unregister;
+
+ ret = nf_nat_masquerade_ipv6_register_notifier();
+ if (ret)
+ goto err_unreg_inet;
mutex_unlock(&masq_mutex);
return ret;
+err_unreg_inet:
+ unregister_inetaddr_notifier(&masq_inet_notifier);
+err_unregister:
+ unregister_netdevice_notifier(&masq_dev_notifier);
err_dec:
- masq_refcnt6--;
+ masq_refcnt--;
out_unlock:
mutex_unlock(&masq_mutex);
return ret;
}
-EXPORT_SYMBOL_GPL(nf_nat_masquerade_ipv6_register_notifier);
+EXPORT_SYMBOL_GPL(nf_nat_masquerade_inet_register_notifiers);
-void nf_nat_masquerade_ipv6_unregister_notifier(void)
+void nf_nat_masquerade_inet_unregister_notifiers(void)
{
mutex_lock(&masq_mutex);
- /* check if the notifier still has clients */
- if (--masq_refcnt6 > 0)
+ /* check if the notifiers still have clients */
+ if (--masq_refcnt > 0)
goto out_unlock;
+ unregister_netdevice_notifier(&masq_dev_notifier);
+ unregister_inetaddr_notifier(&masq_inet_notifier);
+#if IS_ENABLED(CONFIG_IPV6)
unregister_inet6addr_notifier(&masq_inet6_notifier);
+#endif
out_unlock:
mutex_unlock(&masq_mutex);
}
-EXPORT_SYMBOL_GPL(nf_nat_masquerade_ipv6_unregister_notifier);
-#endif
+EXPORT_SYMBOL_GPL(nf_nat_masquerade_inet_unregister_notifiers);
return ret;
}
-static const struct nf_hook_ops nf_nat_ipv4_ops[] = {
+const struct nf_hook_ops nf_nat_ipv4_ops[] = {
/* Before packet filtering, change destination */
{
.hook = nf_nat_ipv4_in,
int nf_nat_ipv4_register_fn(struct net *net, const struct nf_hook_ops *ops)
{
- return nf_nat_register_fn(net, ops, nf_nat_ipv4_ops, ARRAY_SIZE(nf_nat_ipv4_ops));
+ return nf_nat_register_fn(net, ops->pf, ops, nf_nat_ipv4_ops,
+ ARRAY_SIZE(nf_nat_ipv4_ops));
}
EXPORT_SYMBOL_GPL(nf_nat_ipv4_register_fn);
void nf_nat_ipv4_unregister_fn(struct net *net, const struct nf_hook_ops *ops)
{
- nf_nat_unregister_fn(net, ops, ARRAY_SIZE(nf_nat_ipv4_ops));
+ nf_nat_unregister_fn(net, ops->pf, ops, ARRAY_SIZE(nf_nat_ipv4_ops));
}
EXPORT_SYMBOL_GPL(nf_nat_ipv4_unregister_fn);
return ret;
}
-static int nat_route_me_harder(struct net *net, struct sk_buff *skb)
-{
-#ifdef CONFIG_IPV6_MODULE
- const struct nf_ipv6_ops *v6_ops = nf_get_ipv6_ops();
-
- if (!v6_ops)
- return -EHOSTUNREACH;
-
- return v6_ops->route_me_harder(net, skb);
-#else
- return ip6_route_me_harder(net, skb);
-#endif
-}
-
static unsigned int
nf_nat_ipv6_local_fn(void *priv, struct sk_buff *skb,
const struct nf_hook_state *state)
if (!nf_inet_addr_cmp(&ct->tuplehash[dir].tuple.dst.u3,
&ct->tuplehash[!dir].tuple.src.u3)) {
- err = nat_route_me_harder(state->net, skb);
+ err = nf_ip6_route_me_harder(state->net, skb);
if (err < 0)
ret = NF_DROP_ERR(err);
}
return ret;
}
-static const struct nf_hook_ops nf_nat_ipv6_ops[] = {
+const struct nf_hook_ops nf_nat_ipv6_ops[] = {
/* Before packet filtering, change destination */
{
.hook = nf_nat_ipv6_in,
int nf_nat_ipv6_register_fn(struct net *net, const struct nf_hook_ops *ops)
{
- return nf_nat_register_fn(net, ops, nf_nat_ipv6_ops,
+ return nf_nat_register_fn(net, ops->pf, ops, nf_nat_ipv6_ops,
ARRAY_SIZE(nf_nat_ipv6_ops));
}
EXPORT_SYMBOL_GPL(nf_nat_ipv6_register_fn);
void nf_nat_ipv6_unregister_fn(struct net *net, const struct nf_hook_ops *ops)
{
- nf_nat_unregister_fn(net, ops, ARRAY_SIZE(nf_nat_ipv6_ops));
+ nf_nat_unregister_fn(net, ops->pf, ops, ARRAY_SIZE(nf_nat_ipv6_ops));
}
EXPORT_SYMBOL_GPL(nf_nat_ipv6_unregister_fn);
#endif /* CONFIG_IPV6 */
+
+#if defined(CONFIG_NF_TABLES_INET) && IS_ENABLED(CONFIG_NFT_NAT)
+int nf_nat_inet_register_fn(struct net *net, const struct nf_hook_ops *ops)
+{
+ int ret;
+
+ if (WARN_ON_ONCE(ops->pf != NFPROTO_INET))
+ return -EINVAL;
+
+ ret = nf_nat_register_fn(net, NFPROTO_IPV6, ops, nf_nat_ipv6_ops,
+ ARRAY_SIZE(nf_nat_ipv6_ops));
+ if (ret)
+ return ret;
+
+ ret = nf_nat_register_fn(net, NFPROTO_IPV4, ops, nf_nat_ipv4_ops,
+ ARRAY_SIZE(nf_nat_ipv4_ops));
+ if (ret)
+ nf_nat_ipv6_unregister_fn(net, ops);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(nf_nat_inet_register_fn);
+
+void nf_nat_inet_unregister_fn(struct net *net, const struct nf_hook_ops *ops)
+{
+ nf_nat_unregister_fn(net, NFPROTO_IPV4, ops, ARRAY_SIZE(nf_nat_ipv4_ops));
+ nf_nat_unregister_fn(net, NFPROTO_IPV6, ops, ARRAY_SIZE(nf_nat_ipv6_ops));
+}
+EXPORT_SYMBOL_GPL(nf_nat_inet_unregister_fn);
+#endif /* NFT INET NAT */
return 0;
}
+EXPORT_SYMBOL_GPL(nf_queue);
static unsigned int nf_iterate(struct sk_buff *skb,
struct nf_hook_state *state,
.hashfn = nft_chain_hash,
.obj_hashfn = nft_chain_hash_obj,
.obj_cmpfn = nft_chain_hash_cmp,
- .locks_mul = 1,
.automatic_shrinking = true,
};
static __be64 nf_jiffies64_to_msecs(u64 input)
{
- u64 ms = jiffies64_to_nsecs(input);
-
- return cpu_to_be64(div_u64(ms, NSEC_PER_MSEC));
+ return cpu_to_be64(jiffies64_to_msecs(input));
}
static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx,
return err;
}
-static int nf_tables_set_desc_parse(const struct nft_ctx *ctx,
- struct nft_set_desc *desc,
+static int nf_tables_set_desc_parse(struct nft_set_desc *desc,
const struct nlattr *nla)
{
struct nlattr *da[NFTA_SET_DESC_MAX + 1];
policy = ntohl(nla_get_be32(nla[NFTA_SET_POLICY]));
if (nla[NFTA_SET_DESC] != NULL) {
- err = nf_tables_set_desc_parse(&ctx, &desc, nla[NFTA_SET_DESC]);
+ err = nf_tables_set_desc_parse(&desc, nla[NFTA_SET_DESC]);
if (err < 0)
return err;
}
}
EXPORT_SYMBOL_GPL(nf_tables_bind_set);
-void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
- struct nft_set_binding *binding, bool event)
+static void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
+ struct nft_set_binding *binding, bool event)
{
list_del_rcu(&binding->list);
GFP_KERNEL);
}
}
-EXPORT_SYMBOL_GPL(nf_tables_unbind_set);
void nf_tables_deactivate_set(const struct nft_ctx *ctx, struct nft_set *set,
struct nft_set_binding *binding,
if (err < 0)
goto err5;
+ nft_chain_route_init();
return err;
err5:
rhltable_destroy(&nft_objname_ht);
nfnetlink_subsys_unregister(&nf_tables_subsys);
unregister_netdevice_notifier(&nf_tables_flowtable_notifier);
nft_chain_filter_fini();
+ nft_chain_route_fini();
unregister_pernet_subsys(&nf_tables_net_ops);
cancel_work_sync(&trans_destroy_work);
rcu_barrier();
}
EXPORT_SYMBOL_GPL(nf_osf_match);
-const char *nf_osf_find(const struct sk_buff *skb,
- const struct list_head *nf_osf_fingers,
- const int ttl_check)
+bool nf_osf_find(const struct sk_buff *skb,
+ const struct list_head *nf_osf_fingers,
+ const int ttl_check, struct nf_osf_data *data)
{
const struct iphdr *ip = ip_hdr(skb);
const struct nf_osf_user_finger *f;
const struct nf_osf_finger *kf;
struct nf_osf_hdr_ctx ctx;
const struct tcphdr *tcp;
- const char *genre = NULL;
memset(&ctx, 0, sizeof(ctx));
tcp = nf_osf_hdr_ctx_init(&ctx, skb, ip, opts);
if (!tcp)
- return NULL;
+ return false;
list_for_each_entry_rcu(kf, &nf_osf_fingers[ctx.df], finger_entry) {
f = &kf->finger;
if (!nf_osf_match_one(skb, f, ttl_check, &ctx))
continue;
- genre = f->genre;
+ data->genre = f->genre;
+ data->version = f->version;
break;
}
- return genre;
+ return true;
}
EXPORT_SYMBOL_GPL(nf_osf_find);
};
#endif
+#ifdef CONFIG_NF_TABLES_INET
+static int nft_nat_inet_reg(struct net *net, const struct nf_hook_ops *ops)
+{
+ return nf_nat_inet_register_fn(net, ops);
+}
+
+static void nft_nat_inet_unreg(struct net *net, const struct nf_hook_ops *ops)
+{
+ nf_nat_inet_unregister_fn(net, ops);
+}
+
+static const struct nft_chain_type nft_chain_nat_inet = {
+ .name = "nat",
+ .type = NFT_CHAIN_T_NAT,
+ .family = NFPROTO_INET,
+ .hook_mask = (1 << NF_INET_PRE_ROUTING) |
+ (1 << NF_INET_LOCAL_IN) |
+ (1 << NF_INET_LOCAL_OUT) |
+ (1 << NF_INET_POST_ROUTING),
+ .hooks = {
+ [NF_INET_PRE_ROUTING] = nft_nat_do_chain,
+ [NF_INET_LOCAL_IN] = nft_nat_do_chain,
+ [NF_INET_LOCAL_OUT] = nft_nat_do_chain,
+ [NF_INET_POST_ROUTING] = nft_nat_do_chain,
+ },
+ .ops_register = nft_nat_inet_reg,
+ .ops_unregister = nft_nat_inet_unreg,
+};
+#endif
+
static int __init nft_chain_nat_init(void)
{
#ifdef CONFIG_NF_TABLES_IPV6
#ifdef CONFIG_NF_TABLES_IPV4
nft_register_chain_type(&nft_chain_nat_ipv4);
#endif
+#ifdef CONFIG_NF_TABLES_INET
+ nft_register_chain_type(&nft_chain_nat_inet);
+#endif
return 0;
}
#ifdef CONFIG_NF_TABLES_IPV6
nft_unregister_chain_type(&nft_chain_nat_ipv6);
#endif
+#ifdef CONFIG_NF_TABLES_INET
+ nft_unregister_chain_type(&nft_chain_nat_inet);
+#endif
}
module_init(nft_chain_nat_init);
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/skbuff.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter_ipv4.h>
+#include <linux/netfilter_ipv6.h>
+#include <linux/netfilter/nfnetlink.h>
+#include <linux/netfilter/nf_tables.h>
+#include <net/netfilter/nf_tables.h>
+#include <net/netfilter/nf_tables_ipv4.h>
+#include <net/netfilter/nf_tables_ipv6.h>
+#include <net/route.h>
+#include <net/ip.h>
+
+#ifdef CONFIG_NF_TABLES_IPV4
+static unsigned int nf_route_table_hook4(void *priv,
+ struct sk_buff *skb,
+ const struct nf_hook_state *state)
+{
+ const struct iphdr *iph;
+ struct nft_pktinfo pkt;
+ __be32 saddr, daddr;
+ unsigned int ret;
+ u32 mark;
+ int err;
+ u8 tos;
+
+ nft_set_pktinfo(&pkt, skb, state);
+ nft_set_pktinfo_ipv4(&pkt, skb);
+
+ mark = skb->mark;
+ iph = ip_hdr(skb);
+ saddr = iph->saddr;
+ daddr = iph->daddr;
+ tos = iph->tos;
+
+ ret = nft_do_chain(&pkt, priv);
+ if (ret == NF_ACCEPT) {
+ iph = ip_hdr(skb);
+
+ if (iph->saddr != saddr ||
+ iph->daddr != daddr ||
+ skb->mark != mark ||
+ iph->tos != tos) {
+ err = ip_route_me_harder(state->net, skb, RTN_UNSPEC);
+ if (err < 0)
+ ret = NF_DROP_ERR(err);
+ }
+ }
+ return ret;
+}
+
+static const struct nft_chain_type nft_chain_route_ipv4 = {
+ .name = "route",
+ .type = NFT_CHAIN_T_ROUTE,
+ .family = NFPROTO_IPV4,
+ .hook_mask = (1 << NF_INET_LOCAL_OUT),
+ .hooks = {
+ [NF_INET_LOCAL_OUT] = nf_route_table_hook4,
+ },
+};
+#endif
+
+#ifdef CONFIG_NF_TABLES_IPV6
+static unsigned int nf_route_table_hook6(void *priv,
+ struct sk_buff *skb,
+ const struct nf_hook_state *state)
+{
+ struct in6_addr saddr, daddr;
+ struct nft_pktinfo pkt;
+ u32 mark, flowlabel;
+ unsigned int ret;
+ u8 hop_limit;
+ int err;
+
+ nft_set_pktinfo(&pkt, skb, state);
+ nft_set_pktinfo_ipv6(&pkt, skb);
+
+ /* save source/dest address, mark, hoplimit, flowlabel, priority */
+ memcpy(&saddr, &ipv6_hdr(skb)->saddr, sizeof(saddr));
+ memcpy(&daddr, &ipv6_hdr(skb)->daddr, sizeof(daddr));
+ mark = skb->mark;
+ hop_limit = ipv6_hdr(skb)->hop_limit;
+
+ /* flowlabel and prio (includes version, which shouldn't change either)*/
+ flowlabel = *((u32 *)ipv6_hdr(skb));
+
+ ret = nft_do_chain(&pkt, priv);
+ if (ret == NF_ACCEPT &&
+ (memcmp(&ipv6_hdr(skb)->saddr, &saddr, sizeof(saddr)) ||
+ memcmp(&ipv6_hdr(skb)->daddr, &daddr, sizeof(daddr)) ||
+ skb->mark != mark ||
+ ipv6_hdr(skb)->hop_limit != hop_limit ||
+ flowlabel != *((u32 *)ipv6_hdr(skb)))) {
+ err = nf_ip6_route_me_harder(state->net, skb);
+ if (err < 0)
+ ret = NF_DROP_ERR(err);
+ }
+
+ return ret;
+}
+
+static const struct nft_chain_type nft_chain_route_ipv6 = {
+ .name = "route",
+ .type = NFT_CHAIN_T_ROUTE,
+ .family = NFPROTO_IPV6,
+ .hook_mask = (1 << NF_INET_LOCAL_OUT),
+ .hooks = {
+ [NF_INET_LOCAL_OUT] = nf_route_table_hook6,
+ },
+};
+#endif
+
+#ifdef CONFIG_NF_TABLES_INET
+static unsigned int nf_route_table_inet(void *priv,
+ struct sk_buff *skb,
+ const struct nf_hook_state *state)
+{
+ struct nft_pktinfo pkt;
+
+ switch (state->pf) {
+ case NFPROTO_IPV4:
+ return nf_route_table_hook4(priv, skb, state);
+ case NFPROTO_IPV6:
+ return nf_route_table_hook6(priv, skb, state);
+ default:
+ nft_set_pktinfo(&pkt, skb, state);
+ break;
+ }
+
+ return nft_do_chain(&pkt, priv);
+}
+
+static const struct nft_chain_type nft_chain_route_inet = {
+ .name = "route",
+ .type = NFT_CHAIN_T_ROUTE,
+ .family = NFPROTO_INET,
+ .hook_mask = (1 << NF_INET_LOCAL_OUT),
+ .hooks = {
+ [NF_INET_LOCAL_OUT] = nf_route_table_inet,
+ },
+};
+#endif
+
+void __init nft_chain_route_init(void)
+{
+#ifdef CONFIG_NF_TABLES_IPV6
+ nft_register_chain_type(&nft_chain_route_ipv6);
+#endif
+#ifdef CONFIG_NF_TABLES_IPV4
+ nft_register_chain_type(&nft_chain_route_ipv4);
+#endif
+#ifdef CONFIG_NF_TABLES_INET
+ nft_register_chain_type(&nft_chain_route_inet);
+#endif
+}
+
+void __exit nft_chain_route_fini(void)
+{
+#ifdef CONFIG_NF_TABLES_IPV6
+ nft_unregister_chain_type(&nft_chain_route_ipv6);
+#endif
+#ifdef CONFIG_NF_TABLES_IPV4
+ nft_unregister_chain_type(&nft_chain_route_ipv4);
+#endif
+#ifdef CONFIG_NF_TABLES_INET
+ nft_unregister_chain_type(&nft_chain_route_inet);
+#endif
+}
#include <linux/netfilter/nf_tables.h>
#include <net/netfilter/nf_tables.h>
#include <net/netfilter/nf_nat.h>
-#include <net/netfilter/ipv4/nf_nat_masquerade.h>
-#include <net/netfilter/ipv6/nf_nat_masquerade.h>
+#include <net/netfilter/nf_nat_masquerade.h>
struct nft_masq {
u32 flags;
static int __init nft_masq_module_init_ipv6(void)
{
- int ret = nft_register_expr(&nft_masq_ipv6_type);
-
- if (ret)
- return ret;
-
- ret = nf_nat_masquerade_ipv6_register_notifier();
- if (ret < 0)
- nft_unregister_expr(&nft_masq_ipv6_type);
-
- return ret;
+ return nft_register_expr(&nft_masq_ipv6_type);
}
static void nft_masq_module_exit_ipv6(void)
{
nft_unregister_expr(&nft_masq_ipv6_type);
- nf_nat_masquerade_ipv6_unregister_notifier();
}
#else
static inline int nft_masq_module_init_ipv6(void) { return 0; }
static inline void nft_masq_module_exit_ipv6(void) {}
#endif
+#ifdef CONFIG_NF_TABLES_INET
+static void nft_masq_inet_eval(const struct nft_expr *expr,
+ struct nft_regs *regs,
+ const struct nft_pktinfo *pkt)
+{
+ switch (nft_pf(pkt)) {
+ case NFPROTO_IPV4:
+ return nft_masq_ipv4_eval(expr, regs, pkt);
+ case NFPROTO_IPV6:
+ return nft_masq_ipv6_eval(expr, regs, pkt);
+ }
+
+ WARN_ON_ONCE(1);
+}
+
+static void
+nft_masq_inet_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr)
+{
+ nf_ct_netns_put(ctx->net, NFPROTO_INET);
+}
+
+static struct nft_expr_type nft_masq_inet_type;
+static const struct nft_expr_ops nft_masq_inet_ops = {
+ .type = &nft_masq_inet_type,
+ .size = NFT_EXPR_SIZE(sizeof(struct nft_masq)),
+ .eval = nft_masq_inet_eval,
+ .init = nft_masq_init,
+ .destroy = nft_masq_inet_destroy,
+ .dump = nft_masq_dump,
+ .validate = nft_masq_validate,
+};
+
+static struct nft_expr_type nft_masq_inet_type __read_mostly = {
+ .family = NFPROTO_INET,
+ .name = "masq",
+ .ops = &nft_masq_inet_ops,
+ .policy = nft_masq_policy,
+ .maxattr = NFTA_MASQ_MAX,
+ .owner = THIS_MODULE,
+};
+
+static int __init nft_masq_module_init_inet(void)
+{
+ return nft_register_expr(&nft_masq_inet_type);
+}
+
+static void nft_masq_module_exit_inet(void)
+{
+ nft_unregister_expr(&nft_masq_inet_type);
+}
+#else
+static inline int nft_masq_module_init_inet(void) { return 0; }
+static inline void nft_masq_module_exit_inet(void) {}
+#endif
+
static int __init nft_masq_module_init(void)
{
int ret;
if (ret < 0)
return ret;
+ ret = nft_masq_module_init_inet();
+ if (ret < 0) {
+ nft_masq_module_exit_ipv6();
+ return ret;
+ }
+
ret = nft_register_expr(&nft_masq_ipv4_type);
if (ret < 0) {
+ nft_masq_module_exit_inet();
nft_masq_module_exit_ipv6();
return ret;
}
- ret = nf_nat_masquerade_ipv4_register_notifier();
+ ret = nf_nat_masquerade_inet_register_notifiers();
if (ret < 0) {
nft_masq_module_exit_ipv6();
+ nft_masq_module_exit_inet();
nft_unregister_expr(&nft_masq_ipv4_type);
return ret;
}
static void __exit nft_masq_module_exit(void)
{
nft_masq_module_exit_ipv6();
+ nft_masq_module_exit_inet();
nft_unregister_expr(&nft_masq_ipv4_type);
- nf_nat_masquerade_ipv4_unregister_notifier();
+ nf_nat_masquerade_inet_unregister_notifiers();
}
module_init(nft_masq_module_init);
return -EINVAL;
family = ntohl(nla_get_be32(tb[NFTA_NAT_FAMILY]));
- if (family != ctx->family)
+ if (ctx->family != NFPROTO_INET && ctx->family != family)
return -EOPNOTSUPP;
switch (family) {
.owner = THIS_MODULE,
};
+#ifdef CONFIG_NF_TABLES_INET
+static void nft_nat_inet_eval(const struct nft_expr *expr,
+ struct nft_regs *regs,
+ const struct nft_pktinfo *pkt)
+{
+ const struct nft_nat *priv = nft_expr_priv(expr);
+
+ if (priv->family == nft_pf(pkt))
+ nft_nat_eval(expr, regs, pkt);
+}
+
+static const struct nft_expr_ops nft_nat_inet_ops = {
+ .type = &nft_nat_type,
+ .size = NFT_EXPR_SIZE(sizeof(struct nft_nat)),
+ .eval = nft_nat_inet_eval,
+ .init = nft_nat_init,
+ .destroy = nft_nat_destroy,
+ .dump = nft_nat_dump,
+ .validate = nft_nat_validate,
+};
+
+static struct nft_expr_type nft_inet_nat_type __read_mostly = {
+ .name = "nat",
+ .family = NFPROTO_INET,
+ .ops = &nft_nat_inet_ops,
+ .policy = nft_nat_policy,
+ .maxattr = NFTA_NAT_MAX,
+ .owner = THIS_MODULE,
+};
+
+static int nft_nat_inet_module_init(void)
+{
+ return nft_register_expr(&nft_inet_nat_type);
+}
+
+static void nft_nat_inet_module_exit(void)
+{
+ nft_unregister_expr(&nft_inet_nat_type);
+}
+#else
+static int nft_nat_inet_module_init(void) { return 0; }
+static void nft_nat_inet_module_exit(void) { }
+#endif
+
static int __init nft_nat_module_init(void)
{
- return nft_register_expr(&nft_nat_type);
+ int ret = nft_nat_inet_module_init();
+
+ if (ret)
+ return ret;
+
+ ret = nft_register_expr(&nft_nat_type);
+ if (ret)
+ nft_nat_inet_module_exit();
+
+ return ret;
}
static void __exit nft_nat_module_exit(void)
{
+ nft_nat_inet_module_exit();
nft_unregister_expr(&nft_nat_type);
}
struct nft_osf {
enum nft_registers dreg:8;
u8 ttl;
+ u32 flags;
};
static const struct nla_policy nft_osf_policy[NFTA_OSF_MAX + 1] = {
[NFTA_OSF_DREG] = { .type = NLA_U32 },
[NFTA_OSF_TTL] = { .type = NLA_U8 },
+ [NFTA_OSF_FLAGS] = { .type = NLA_U32 },
};
static void nft_osf_eval(const struct nft_expr *expr, struct nft_regs *regs,
struct nft_osf *priv = nft_expr_priv(expr);
u32 *dest = ®s->data[priv->dreg];
struct sk_buff *skb = pkt->skb;
+ char os_match[NFT_OSF_MAXGENRELEN + 1];
const struct tcphdr *tcp;
+ struct nf_osf_data data;
struct tcphdr _tcph;
- const char *os_name;
tcp = skb_header_pointer(skb, ip_hdrlen(skb),
sizeof(struct tcphdr), &_tcph);
return;
}
- os_name = nf_osf_find(skb, nf_osf_fingers, priv->ttl);
- if (!os_name)
+ if (!nf_osf_find(skb, nf_osf_fingers, priv->ttl, &data)) {
strncpy((char *)dest, "unknown", NFT_OSF_MAXGENRELEN);
- else
- strncpy((char *)dest, os_name, NFT_OSF_MAXGENRELEN);
+ } else {
+ if (priv->flags & NFT_OSF_F_VERSION)
+ snprintf(os_match, NFT_OSF_MAXGENRELEN, "%s:%s",
+ data.genre, data.version);
+ else
+ strlcpy(os_match, data.genre, NFT_OSF_MAXGENRELEN);
+
+ strncpy((char *)dest, os_match, NFT_OSF_MAXGENRELEN);
+ }
}
static int nft_osf_init(const struct nft_ctx *ctx,
const struct nlattr * const tb[])
{
struct nft_osf *priv = nft_expr_priv(expr);
+ u32 flags;
int err;
u8 ttl;
priv->ttl = ttl;
}
+ if (tb[NFTA_OSF_FLAGS]) {
+ flags = ntohl(nla_get_be32(tb[NFTA_OSF_FLAGS]));
+ if (flags != NFT_OSF_F_VERSION)
+ return -EINVAL;
+ priv->flags = flags;
+ }
+
priv->dreg = nft_parse_register(tb[NFTA_OSF_DREG]);
err = nft_validate_register_store(ctx, priv->dreg, NULL,
NFT_DATA_VALUE, NFT_OSF_MAXGENRELEN);
if (nla_put_u8(skb, NFTA_OSF_TTL, priv->ttl))
goto nla_put_failure;
+ if (nla_put_be32(skb, NFTA_OSF_FLAGS, ntohl(priv->flags)))
+ goto nla_put_failure;
+
if (nft_dump_register(skb, NFTA_OSF_DREG, priv->dreg))
goto nla_put_failure;
return nf_ct_netns_get(ctx->net, ctx->family);
}
-int nft_redir_dump(struct sk_buff *skb, const struct nft_expr *expr)
+static int nft_redir_dump(struct sk_buff *skb, const struct nft_expr *expr)
{
const struct nft_redir *priv = nft_expr_priv(expr);
};
#endif
+#ifdef CONFIG_NF_TABLES_INET
+static void nft_redir_inet_eval(const struct nft_expr *expr,
+ struct nft_regs *regs,
+ const struct nft_pktinfo *pkt)
+{
+ switch (nft_pf(pkt)) {
+ case NFPROTO_IPV4:
+ return nft_redir_ipv4_eval(expr, regs, pkt);
+ case NFPROTO_IPV6:
+ return nft_redir_ipv6_eval(expr, regs, pkt);
+ }
+
+ WARN_ON_ONCE(1);
+}
+
+static void
+nft_redir_inet_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr)
+{
+ nf_ct_netns_put(ctx->net, NFPROTO_INET);
+}
+
+static struct nft_expr_type nft_redir_inet_type;
+static const struct nft_expr_ops nft_redir_inet_ops = {
+ .type = &nft_redir_inet_type,
+ .size = NFT_EXPR_SIZE(sizeof(struct nft_redir)),
+ .eval = nft_redir_inet_eval,
+ .init = nft_redir_init,
+ .destroy = nft_redir_inet_destroy,
+ .dump = nft_redir_dump,
+ .validate = nft_redir_validate,
+};
+
+static struct nft_expr_type nft_redir_inet_type __read_mostly = {
+ .family = NFPROTO_INET,
+ .name = "redir",
+ .ops = &nft_redir_inet_ops,
+ .policy = nft_redir_policy,
+ .maxattr = NFTA_MASQ_MAX,
+ .owner = THIS_MODULE,
+};
+
+static int __init nft_redir_module_init_inet(void)
+{
+ return nft_register_expr(&nft_redir_inet_type);
+}
+#else
+static inline int nft_redir_module_init_inet(void) { return 0; }
+#endif
+
static int __init nft_redir_module_init(void)
{
int ret = nft_register_expr(&nft_redir_ipv4_type);
}
#endif
+ ret = nft_redir_module_init_inet();
+ if (ret < 0) {
+ nft_unregister_expr(&nft_redir_ipv4_type);
+#ifdef CONFIG_NF_TABLES_IPV6
+ nft_unregister_expr(&nft_redir_ipv6_type);
+#endif
+ return ret;
+ }
+
return ret;
}
#ifdef CONFIG_NF_TABLES_IPV6
nft_unregister_expr(&nft_redir_ipv6_type);
#endif
+#ifdef CONFIG_NF_TABLES_INET
+ nft_unregister_expr(&nft_redir_inet_type);
+#endif
}
module_init(nft_redir_module_init);
EXPORT_SYMBOL_GPL(xt_request_find_match);
/* Find target, grabs ref. Returns ERR_PTR() on error. */
-struct xt_target *xt_find_target(u8 af, const char *name, u8 revision)
+static struct xt_target *xt_find_target(u8 af, const char *name, u8 revision)
{
struct xt_target *t;
int err = -ENOENT;
return ERR_PTR(err);
}
-EXPORT_SYMBOL(xt_find_target);
struct xt_target *xt_request_find_target(u8 af, const char *name, u8 revision)
{
return 0;
}
-#ifdef CONFIG_NF_CONNTRACK_TIMEOUT
-static void __xt_ct_tg_timeout_put(struct nf_ct_timeout *timeout)
-{
- typeof(nf_ct_timeout_put_hook) timeout_put;
-
- timeout_put = rcu_dereference(nf_ct_timeout_put_hook);
- if (timeout_put)
- timeout_put(timeout);
-}
-#endif
-
static int
xt_ct_set_timeout(struct nf_conn *ct, const struct xt_tgchk_param *par,
const char *timeout_name)
{
#ifdef CONFIG_NF_CONNTRACK_TIMEOUT
- typeof(nf_ct_timeout_find_get_hook) timeout_find_get;
const struct nf_conntrack_l4proto *l4proto;
- struct nf_ct_timeout *timeout;
- struct nf_conn_timeout *timeout_ext;
- const char *errmsg = NULL;
- int ret = 0;
u8 proto;
- rcu_read_lock();
- timeout_find_get = rcu_dereference(nf_ct_timeout_find_get_hook);
- if (timeout_find_get == NULL) {
- ret = -ENOENT;
- errmsg = "Timeout policy base is empty";
- goto out;
- }
-
proto = xt_ct_find_proto(par);
if (!proto) {
- ret = -EINVAL;
- errmsg = "You must specify a L4 protocol and not use inversions on it";
- goto out;
- }
-
- timeout = timeout_find_get(par->net, timeout_name);
- if (timeout == NULL) {
- ret = -ENOENT;
- pr_info_ratelimited("No such timeout policy \"%s\"\n",
- timeout_name);
- goto out;
- }
-
- if (timeout->l3num != par->family) {
- ret = -EINVAL;
- pr_info_ratelimited("Timeout policy `%s' can only be used by L%d protocol number %d\n",
- timeout_name, 3, timeout->l3num);
- goto err_put_timeout;
+ pr_info_ratelimited("You must specify a L4 protocol and not "
+ "use inversions on it");
+ return -EINVAL;
}
- /* Make sure the timeout policy matches any existing protocol tracker,
- * otherwise default to generic.
- */
l4proto = nf_ct_l4proto_find(proto);
- if (timeout->l4proto->l4proto != l4proto->l4proto) {
- ret = -EINVAL;
- pr_info_ratelimited("Timeout policy `%s' can only be used by L%d protocol number %d\n",
- timeout_name, 4, timeout->l4proto->l4proto);
- goto err_put_timeout;
- }
- timeout_ext = nf_ct_timeout_ext_add(ct, timeout, GFP_ATOMIC);
- if (!timeout_ext) {
- ret = -ENOMEM;
- goto err_put_timeout;
- }
+ return nf_ct_set_timeout(par->net, ct, par->family, l4proto->l4proto,
+ timeout_name);
- rcu_read_unlock();
- return ret;
-
-err_put_timeout:
- __xt_ct_tg_timeout_put(timeout);
-out:
- rcu_read_unlock();
- if (errmsg)
- pr_info_ratelimited("%s\n", errmsg);
- return ret;
#else
return -EOPNOTSUPP;
#endif
return xt_ct_tg_check(par, par->targinfo);
}
-static void xt_ct_destroy_timeout(struct nf_conn *ct)
-{
-#ifdef CONFIG_NF_CONNTRACK_TIMEOUT
- struct nf_conn_timeout *timeout_ext;
- typeof(nf_ct_timeout_put_hook) timeout_put;
-
- rcu_read_lock();
- timeout_put = rcu_dereference(nf_ct_timeout_put_hook);
-
- if (timeout_put) {
- timeout_ext = nf_ct_timeout_find(ct);
- if (timeout_ext) {
- timeout_put(timeout_ext->timeout);
- RCU_INIT_POINTER(timeout_ext->timeout, NULL);
- }
- }
- rcu_read_unlock();
-#endif
-}
-
static void xt_ct_tg_destroy(const struct xt_tgdtor_param *par,
struct xt_ct_target_info_v1 *info)
{
nf_ct_netns_put(par->net, par->family);
- xt_ct_destroy_timeout(ct);
+ nf_ct_destroy_timeout(ct);
nf_ct_put(info->ct);
}
}
* published by the Free Software Foundation.
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-#include <linux/types.h>
-#include <linux/inetdevice.h>
-#include <linux/ip.h>
-#include <linux/timer.h>
#include <linux/module.h>
-#include <linux/netfilter.h>
-#include <net/protocol.h>
-#include <net/ip.h>
-#include <net/checksum.h>
-#include <net/route.h>
-#include <linux/netfilter_ipv4.h>
#include <linux/netfilter/x_tables.h>
#include <net/netfilter/nf_nat.h>
-#include <net/netfilter/ipv4/nf_nat_masquerade.h>
+#include <net/netfilter/nf_nat_masquerade.h>
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Netfilter Core Team <coreteam@netfilter.org>");
nf_ct_netns_put(par->net, par->family);
}
-static struct xt_target masquerade_tg_reg __read_mostly = {
- .name = "MASQUERADE",
- .family = NFPROTO_IPV4,
- .target = masquerade_tg,
- .targetsize = sizeof(struct nf_nat_ipv4_multi_range_compat),
- .table = "nat",
- .hooks = 1 << NF_INET_POST_ROUTING,
- .checkentry = masquerade_tg_check,
- .destroy = masquerade_tg_destroy,
- .me = THIS_MODULE,
+#if IS_ENABLED(CONFIG_IPV6)
+static unsigned int
+masquerade_tg6(struct sk_buff *skb, const struct xt_action_param *par)
+{
+ return nf_nat_masquerade_ipv6(skb, par->targinfo, xt_out(par));
+}
+
+static int masquerade_tg6_checkentry(const struct xt_tgchk_param *par)
+{
+ const struct nf_nat_range2 *range = par->targinfo;
+
+ if (range->flags & NF_NAT_RANGE_MAP_IPS)
+ return -EINVAL;
+
+ return nf_ct_netns_get(par->net, par->family);
+}
+#endif
+
+static struct xt_target masquerade_tg_reg[] __read_mostly = {
+ {
+#if IS_ENABLED(CONFIG_IPV6)
+ .name = "MASQUERADE",
+ .family = NFPROTO_IPV6,
+ .target = masquerade_tg6,
+ .targetsize = sizeof(struct nf_nat_range),
+ .table = "nat",
+ .hooks = 1 << NF_INET_POST_ROUTING,
+ .checkentry = masquerade_tg6_checkentry,
+ .destroy = masquerade_tg_destroy,
+ .me = THIS_MODULE,
+ }, {
+#endif
+ .name = "MASQUERADE",
+ .family = NFPROTO_IPV4,
+ .target = masquerade_tg,
+ .targetsize = sizeof(struct nf_nat_ipv4_multi_range_compat),
+ .table = "nat",
+ .hooks = 1 << NF_INET_POST_ROUTING,
+ .checkentry = masquerade_tg_check,
+ .destroy = masquerade_tg_destroy,
+ .me = THIS_MODULE,
+ }
};
static int __init masquerade_tg_init(void)
{
int ret;
- ret = xt_register_target(&masquerade_tg_reg);
+ ret = xt_register_targets(masquerade_tg_reg,
+ ARRAY_SIZE(masquerade_tg_reg));
if (ret)
return ret;
- ret = nf_nat_masquerade_ipv4_register_notifier();
- if (ret)
- xt_unregister_target(&masquerade_tg_reg);
+ ret = nf_nat_masquerade_inet_register_notifiers();
+ if (ret) {
+ xt_unregister_targets(masquerade_tg_reg,
+ ARRAY_SIZE(masquerade_tg_reg));
+ return ret;
+ }
return ret;
}
static void __exit masquerade_tg_exit(void)
{
- xt_unregister_target(&masquerade_tg_reg);
- nf_nat_masquerade_ipv4_unregister_notifier();
+ xt_unregister_targets(masquerade_tg_reg, ARRAY_SIZE(masquerade_tg_reg));
+ nf_nat_masquerade_inet_unregister_notifiers();
}
module_init(masquerade_tg_init);
module_exit(masquerade_tg_exit);
+#if IS_ENABLED(CONFIG_IPV6)
+MODULE_ALIAS("ip6t_MASQUERADE");
+#endif
+MODULE_ALIAS("ipt_MASQUERADE");
{
.cmd = NLBL_CALIPSO_C_ADD,
.flags = GENL_ADMIN_PERM,
- .policy = calipso_genl_policy,
.doit = netlbl_calipso_add,
.dumpit = NULL,
},
{
.cmd = NLBL_CALIPSO_C_REMOVE,
.flags = GENL_ADMIN_PERM,
- .policy = calipso_genl_policy,
.doit = netlbl_calipso_remove,
.dumpit = NULL,
},
{
.cmd = NLBL_CALIPSO_C_LIST,
.flags = 0,
- .policy = calipso_genl_policy,
.doit = netlbl_calipso_list,
.dumpit = NULL,
},
{
.cmd = NLBL_CALIPSO_C_LISTALL,
.flags = 0,
- .policy = calipso_genl_policy,
.doit = NULL,
.dumpit = netlbl_calipso_listall,
},
.name = NETLBL_NLTYPE_CALIPSO_NAME,
.version = NETLBL_PROTO_VERSION,
.maxattr = NLBL_CALIPSO_A_MAX,
+ .policy = calipso_genl_policy,
.module = THIS_MODULE,
.ops = netlbl_calipso_ops,
.n_ops = ARRAY_SIZE(netlbl_calipso_ops),
{
.cmd = NLBL_CIPSOV4_C_ADD,
.flags = GENL_ADMIN_PERM,
- .policy = netlbl_cipsov4_genl_policy,
.doit = netlbl_cipsov4_add,
.dumpit = NULL,
},
{
.cmd = NLBL_CIPSOV4_C_REMOVE,
.flags = GENL_ADMIN_PERM,
- .policy = netlbl_cipsov4_genl_policy,
.doit = netlbl_cipsov4_remove,
.dumpit = NULL,
},
{
.cmd = NLBL_CIPSOV4_C_LIST,
.flags = 0,
- .policy = netlbl_cipsov4_genl_policy,
.doit = netlbl_cipsov4_list,
.dumpit = NULL,
},
{
.cmd = NLBL_CIPSOV4_C_LISTALL,
.flags = 0,
- .policy = netlbl_cipsov4_genl_policy,
.doit = NULL,
.dumpit = netlbl_cipsov4_listall,
},
.name = NETLBL_NLTYPE_CIPSOV4_NAME,
.version = NETLBL_PROTO_VERSION,
.maxattr = NLBL_CIPSOV4_A_MAX,
+ .policy = netlbl_cipsov4_genl_policy,
.module = THIS_MODULE,
.ops = netlbl_cipsov4_ops,
.n_ops = ARRAY_SIZE(netlbl_cipsov4_ops),
{
.cmd = NLBL_MGMT_C_ADD,
.flags = GENL_ADMIN_PERM,
- .policy = netlbl_mgmt_genl_policy,
.doit = netlbl_mgmt_add,
.dumpit = NULL,
},
{
.cmd = NLBL_MGMT_C_REMOVE,
.flags = GENL_ADMIN_PERM,
- .policy = netlbl_mgmt_genl_policy,
.doit = netlbl_mgmt_remove,
.dumpit = NULL,
},
{
.cmd = NLBL_MGMT_C_LISTALL,
.flags = 0,
- .policy = netlbl_mgmt_genl_policy,
.doit = NULL,
.dumpit = netlbl_mgmt_listall,
},
{
.cmd = NLBL_MGMT_C_ADDDEF,
.flags = GENL_ADMIN_PERM,
- .policy = netlbl_mgmt_genl_policy,
.doit = netlbl_mgmt_adddef,
.dumpit = NULL,
},
{
.cmd = NLBL_MGMT_C_REMOVEDEF,
.flags = GENL_ADMIN_PERM,
- .policy = netlbl_mgmt_genl_policy,
.doit = netlbl_mgmt_removedef,
.dumpit = NULL,
},
{
.cmd = NLBL_MGMT_C_LISTDEF,
.flags = 0,
- .policy = netlbl_mgmt_genl_policy,
.doit = netlbl_mgmt_listdef,
.dumpit = NULL,
},
{
.cmd = NLBL_MGMT_C_PROTOCOLS,
.flags = 0,
- .policy = netlbl_mgmt_genl_policy,
.doit = NULL,
.dumpit = netlbl_mgmt_protocols,
},
{
.cmd = NLBL_MGMT_C_VERSION,
.flags = 0,
- .policy = netlbl_mgmt_genl_policy,
.doit = netlbl_mgmt_version,
.dumpit = NULL,
},
.name = NETLBL_NLTYPE_MGMT_NAME,
.version = NETLBL_PROTO_VERSION,
.maxattr = NLBL_MGMT_A_MAX,
+ .policy = netlbl_mgmt_genl_policy,
.module = THIS_MODULE,
.ops = netlbl_mgmt_genl_ops,
.n_ops = ARRAY_SIZE(netlbl_mgmt_genl_ops),
{
.cmd = NLBL_UNLABEL_C_STATICADD,
.flags = GENL_ADMIN_PERM,
- .policy = netlbl_unlabel_genl_policy,
.doit = netlbl_unlabel_staticadd,
.dumpit = NULL,
},
{
.cmd = NLBL_UNLABEL_C_STATICREMOVE,
.flags = GENL_ADMIN_PERM,
- .policy = netlbl_unlabel_genl_policy,
.doit = netlbl_unlabel_staticremove,
.dumpit = NULL,
},
{
.cmd = NLBL_UNLABEL_C_STATICLIST,
.flags = 0,
- .policy = netlbl_unlabel_genl_policy,
.doit = NULL,
.dumpit = netlbl_unlabel_staticlist,
},
{
.cmd = NLBL_UNLABEL_C_STATICADDDEF,
.flags = GENL_ADMIN_PERM,
- .policy = netlbl_unlabel_genl_policy,
.doit = netlbl_unlabel_staticadddef,
.dumpit = NULL,
},
{
.cmd = NLBL_UNLABEL_C_STATICREMOVEDEF,
.flags = GENL_ADMIN_PERM,
- .policy = netlbl_unlabel_genl_policy,
.doit = netlbl_unlabel_staticremovedef,
.dumpit = NULL,
},
{
.cmd = NLBL_UNLABEL_C_STATICLISTDEF,
.flags = 0,
- .policy = netlbl_unlabel_genl_policy,
.doit = NULL,
.dumpit = netlbl_unlabel_staticlistdef,
},
{
.cmd = NLBL_UNLABEL_C_ACCEPT,
.flags = GENL_ADMIN_PERM,
- .policy = netlbl_unlabel_genl_policy,
.doit = netlbl_unlabel_accept,
.dumpit = NULL,
},
{
.cmd = NLBL_UNLABEL_C_LIST,
.flags = 0,
- .policy = netlbl_unlabel_genl_policy,
.doit = netlbl_unlabel_list,
.dumpit = NULL,
},
.name = NETLBL_NLTYPE_UNLABELED_NAME,
.version = NETLBL_PROTO_VERSION,
.maxattr = NLBL_UNLABEL_A_MAX,
+ .policy = netlbl_unlabel_genl_policy,
.module = THIS_MODULE,
.ops = netlbl_unlabel_genl_ops,
.n_ops = ARRAY_SIZE(netlbl_unlabel_genl_ops),
if (attrbuf) {
err = nlmsg_parse(nlh, hdrlen, attrbuf, family->maxattr,
- ops->policy, extack);
+ family->policy, extack);
if (err < 0)
goto out;
}
op_flags |= GENL_CMD_CAP_DUMP;
if (ops->doit)
op_flags |= GENL_CMD_CAP_DO;
- if (ops->policy)
+ if (family->policy)
op_flags |= GENL_CMD_CAP_HASPOL;
nest = nla_nest_start(skb, i + 1);
.cmd = CTRL_CMD_GETFAMILY,
.doit = ctrl_getfamily,
.dumpit = ctrl_dumpfamily,
- .policy = ctrl_policy,
},
};
.name = "nlctrl",
.version = 0x2,
.maxattr = CTRL_ATTR_MAX,
+ .policy = ctrl_policy,
.netnsok = true,
};
.doit = nfc_genl_get_device,
.dumpit = nfc_genl_dump_devices,
.done = nfc_genl_dump_devices_done,
- .policy = nfc_genl_policy,
},
{
.cmd = NFC_CMD_DEV_UP,
.doit = nfc_genl_dev_up,
- .policy = nfc_genl_policy,
},
{
.cmd = NFC_CMD_DEV_DOWN,
.doit = nfc_genl_dev_down,
- .policy = nfc_genl_policy,
},
{
.cmd = NFC_CMD_START_POLL,
.doit = nfc_genl_start_poll,
- .policy = nfc_genl_policy,
},
{
.cmd = NFC_CMD_STOP_POLL,
.doit = nfc_genl_stop_poll,
- .policy = nfc_genl_policy,
},
{
.cmd = NFC_CMD_DEP_LINK_UP,
.doit = nfc_genl_dep_link_up,
- .policy = nfc_genl_policy,
},
{
.cmd = NFC_CMD_DEP_LINK_DOWN,
.doit = nfc_genl_dep_link_down,
- .policy = nfc_genl_policy,
},
{
.cmd = NFC_CMD_GET_TARGET,
.dumpit = nfc_genl_dump_targets,
.done = nfc_genl_dump_targets_done,
- .policy = nfc_genl_policy,
},
{
.cmd = NFC_CMD_LLC_GET_PARAMS,
.doit = nfc_genl_llc_get_params,
- .policy = nfc_genl_policy,
},
{
.cmd = NFC_CMD_LLC_SET_PARAMS,
.doit = nfc_genl_llc_set_params,
- .policy = nfc_genl_policy,
},
{
.cmd = NFC_CMD_LLC_SDREQ,
.doit = nfc_genl_llc_sdreq,
- .policy = nfc_genl_policy,
},
{
.cmd = NFC_CMD_FW_DOWNLOAD,
.doit = nfc_genl_fw_download,
- .policy = nfc_genl_policy,
},
{
.cmd = NFC_CMD_ENABLE_SE,
.doit = nfc_genl_enable_se,
- .policy = nfc_genl_policy,
},
{
.cmd = NFC_CMD_DISABLE_SE,
.doit = nfc_genl_disable_se,
- .policy = nfc_genl_policy,
},
{
.cmd = NFC_CMD_GET_SE,
.dumpit = nfc_genl_dump_ses,
.done = nfc_genl_dump_ses_done,
- .policy = nfc_genl_policy,
},
{
.cmd = NFC_CMD_SE_IO,
.doit = nfc_genl_se_io,
- .policy = nfc_genl_policy,
},
{
.cmd = NFC_CMD_ACTIVATE_TARGET,
.doit = nfc_genl_activate_target,
- .policy = nfc_genl_policy,
},
{
.cmd = NFC_CMD_VENDOR,
.doit = nfc_genl_vendor_cmd,
- .policy = nfc_genl_policy,
},
{
.cmd = NFC_CMD_DEACTIVATE_TARGET,
.doit = nfc_genl_deactivate_target,
- .policy = nfc_genl_policy,
},
};
.name = NFC_GENL_NAME,
.version = NFC_GENL_VERSION,
.maxattr = NFC_ATTR_MAX,
+ .policy = nfc_genl_policy,
.module = THIS_MODULE,
.ops = nfc_genl_ops,
.n_ops = ARRAY_SIZE(nfc_genl_ops),
const struct nlattr *actions, int len,
bool last, bool clone_flow_key);
+static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
+ struct sw_flow_key *key,
+ const struct nlattr *attr, int len);
+
static void update_ethertype(struct sk_buff *skb, struct ethhdr *hdr,
__be16 ethertype)
{
return clone_execute(dp, skb, key, recirc_id, NULL, 0, last, true);
}
+static int execute_check_pkt_len(struct datapath *dp, struct sk_buff *skb,
+ struct sw_flow_key *key,
+ const struct nlattr *attr, bool last)
+{
+ const struct nlattr *actions, *cpl_arg;
+ const struct check_pkt_len_arg *arg;
+ int rem = nla_len(attr);
+ bool clone_flow_key;
+
+ /* The first netlink attribute in 'attr' is always
+ * 'OVS_CHECK_PKT_LEN_ATTR_ARG'.
+ */
+ cpl_arg = nla_data(attr);
+ arg = nla_data(cpl_arg);
+
+ if (skb->len <= arg->pkt_len) {
+ /* Second netlink attribute in 'attr' is always
+ * 'OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL'.
+ */
+ actions = nla_next(cpl_arg, &rem);
+ clone_flow_key = !arg->exec_for_lesser_equal;
+ } else {
+ /* Third netlink attribute in 'attr' is always
+ * 'OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER'.
+ */
+ actions = nla_next(cpl_arg, &rem);
+ actions = nla_next(actions, &rem);
+ clone_flow_key = !arg->exec_for_greater;
+ }
+
+ return clone_execute(dp, skb, key, 0, nla_data(actions),
+ nla_len(actions), last, clone_flow_key);
+}
+
/* Execute a list of actions against 'skb'. */
static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
struct sw_flow_key *key,
break;
}
+
+ case OVS_ACTION_ATTR_CHECK_PKT_LEN: {
+ bool last = nla_is_last(a, rem);
+
+ err = execute_check_pkt_len(dp, skb, key, a, last);
+ if (last)
+ return err;
+
+ break;
+ }
}
if (unlikely(err)) {
#include <net/netfilter/nf_conntrack_helper.h>
#include <net/netfilter/nf_conntrack_labels.h>
#include <net/netfilter/nf_conntrack_seqadj.h>
+#include <net/netfilter/nf_conntrack_timeout.h>
#include <net/netfilter/nf_conntrack_zones.h>
#include <net/netfilter/ipv6/nf_defrag_ipv6.h>
#include <net/ipv6_frag.h>
-#ifdef CONFIG_NF_NAT_NEEDED
+#if IS_ENABLED(CONFIG_NF_NAT)
#include <net/netfilter/nf_nat.h>
#endif
u32 eventmask; /* Mask of 1 << IPCT_*. */
struct md_mark mark;
struct md_labels labels;
-#ifdef CONFIG_NF_NAT_NEEDED
+ char timeout[CTNL_TIMEOUT_NAME_MAX];
+#if IS_ENABLED(CONFIG_NF_NAT)
struct nf_nat_range2 range; /* Only present for SRC NAT and DST NAT. */
#endif
};
return ct_executed;
}
-#ifdef CONFIG_NF_NAT_NEEDED
+#if IS_ENABLED(CONFIG_NF_NAT)
/* Modelled after nf_nat_ipv[46]_fn().
* range is only used for new, uninitialized NAT state.
* Returns either NF_ACCEPT or NF_DROP.
return err;
}
-#else /* !CONFIG_NF_NAT_NEEDED */
+#else /* !CONFIG_NF_NAT */
static int ovs_ct_nat(struct net *net, struct sw_flow_key *key,
const struct ovs_conntrack_info *info,
struct sk_buff *skb, struct nf_conn *ct,
GFP_ATOMIC);
if (err)
return err;
+
+ /* helper installed, add seqadj if NAT is required */
+ if (info->nat && !nfct_seqadj(ct)) {
+ if (!nfct_seqadj_ext_add(ct))
+ return -EINVAL;
+ }
}
/* Call the helper only if:
return 0;
}
-#ifdef CONFIG_NF_NAT_NEEDED
+#if IS_ENABLED(CONFIG_NF_NAT)
static int parse_nat(const struct nlattr *attr,
struct ovs_conntrack_info *info, bool log)
{
.maxlen = sizeof(struct md_labels) },
[OVS_CT_ATTR_HELPER] = { .minlen = 1,
.maxlen = NF_CT_HELPER_NAME_LEN },
-#ifdef CONFIG_NF_NAT_NEEDED
+#if IS_ENABLED(CONFIG_NF_NAT)
/* NAT length is checked when parsing the nested attributes. */
[OVS_CT_ATTR_NAT] = { .minlen = 0, .maxlen = INT_MAX },
#endif
[OVS_CT_ATTR_EVENTMASK] = { .minlen = sizeof(u32),
.maxlen = sizeof(u32) },
+ [OVS_CT_ATTR_TIMEOUT] = { .minlen = 1,
+ .maxlen = CTNL_TIMEOUT_NAME_MAX },
};
static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info,
return -EINVAL;
}
break;
-#ifdef CONFIG_NF_NAT_NEEDED
+#if IS_ENABLED(CONFIG_NF_NAT)
case OVS_CT_ATTR_NAT: {
int err = parse_nat(a, info, log);
info->have_eventmask = true;
info->eventmask = nla_get_u32(a);
break;
+#ifdef CONFIG_NF_CONNTRACK_TIMEOUT
+ case OVS_CT_ATTR_TIMEOUT:
+ memcpy(info->timeout, nla_data(a), nla_len(a));
+ if (!memchr(info->timeout, '\0', nla_len(a))) {
+ OVS_NLERR(log, "Invalid conntrack helper");
+ return -EINVAL;
+ }
+ break;
+#endif
default:
OVS_NLERR(log, "Unknown conntrack attr (%d)",
OVS_NLERR(log, "Failed to allocate conntrack template");
return -ENOMEM;
}
+
+ if (ct_info.timeout[0]) {
+ if (nf_ct_set_timeout(net, ct_info.ct, family, key->ip.proto,
+ ct_info.timeout))
+ pr_info_ratelimited("Failed to associated timeout "
+ "policy `%s'\n", ct_info.timeout);
+ }
+
if (helper) {
err = ovs_ct_add_helper(&ct_info, helper, key, log);
if (err)
return err;
}
-#ifdef CONFIG_NF_NAT_NEEDED
+#if IS_ENABLED(CONFIG_NF_NAT)
static bool ovs_ct_nat_to_attr(const struct ovs_conntrack_info *info,
struct sk_buff *skb)
{
if (ct_info->have_eventmask &&
nla_put_u32(skb, OVS_CT_ATTR_EVENTMASK, ct_info->eventmask))
return -EMSGSIZE;
+ if (ct_info->timeout[0]) {
+ if (nla_put_string(skb, OVS_CT_ATTR_TIMEOUT, ct_info->timeout))
+ return -EMSGSIZE;
+ }
-#ifdef CONFIG_NF_NAT_NEEDED
+#if IS_ENABLED(CONFIG_NF_NAT)
if (ct_info->nat && !ovs_ct_nat_to_attr(ct_info, skb))
return -EMSGSIZE;
#endif
{
if (ct_info->helper)
nf_conntrack_helper_put(ct_info->helper);
- if (ct_info->ct)
+ if (ct_info->ct) {
+ if (ct_info->timeout[0])
+ nf_ct_destroy_timeout(ct_info->ct);
nf_ct_tmpl_free(ct_info->ct);
+ }
}
#if IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
{ .cmd = OVS_CT_LIMIT_CMD_SET,
.flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN
* privilege. */
- .policy = ct_limit_policy,
.doit = ovs_ct_limit_cmd_set,
},
{ .cmd = OVS_CT_LIMIT_CMD_DEL,
.flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN
* privilege. */
- .policy = ct_limit_policy,
.doit = ovs_ct_limit_cmd_del,
},
{ .cmd = OVS_CT_LIMIT_CMD_GET,
.flags = 0, /* OK for unprivileged users. */
- .policy = ct_limit_policy,
.doit = ovs_ct_limit_cmd_get,
},
};
.name = OVS_CT_LIMIT_FAMILY,
.version = OVS_CT_LIMIT_VERSION,
.maxattr = OVS_CT_LIMIT_ATTR_MAX,
+ .policy = ct_limit_policy,
.netnsok = true,
.parallel_ops = true,
.ops = ct_limit_genl_ops,
static const struct genl_ops dp_packet_genl_ops[] = {
{ .cmd = OVS_PACKET_CMD_EXECUTE,
.flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
- .policy = packet_policy,
.doit = ovs_packet_cmd_execute
}
};
.name = OVS_PACKET_FAMILY,
.version = OVS_PACKET_VERSION,
.maxattr = OVS_PACKET_ATTR_MAX,
+ .policy = packet_policy,
.netnsok = true,
.parallel_ops = true,
.ops = dp_packet_genl_ops,
static const struct genl_ops dp_flow_genl_ops[] = {
{ .cmd = OVS_FLOW_CMD_NEW,
.flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
- .policy = flow_policy,
.doit = ovs_flow_cmd_new
},
{ .cmd = OVS_FLOW_CMD_DEL,
.flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
- .policy = flow_policy,
.doit = ovs_flow_cmd_del
},
{ .cmd = OVS_FLOW_CMD_GET,
.flags = 0, /* OK for unprivileged users. */
- .policy = flow_policy,
.doit = ovs_flow_cmd_get,
.dumpit = ovs_flow_cmd_dump
},
{ .cmd = OVS_FLOW_CMD_SET,
.flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
- .policy = flow_policy,
.doit = ovs_flow_cmd_set,
},
};
.name = OVS_FLOW_FAMILY,
.version = OVS_FLOW_VERSION,
.maxattr = OVS_FLOW_ATTR_MAX,
+ .policy = flow_policy,
.netnsok = true,
.parallel_ops = true,
.ops = dp_flow_genl_ops,
static const struct genl_ops dp_datapath_genl_ops[] = {
{ .cmd = OVS_DP_CMD_NEW,
.flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
- .policy = datapath_policy,
.doit = ovs_dp_cmd_new
},
{ .cmd = OVS_DP_CMD_DEL,
.flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
- .policy = datapath_policy,
.doit = ovs_dp_cmd_del
},
{ .cmd = OVS_DP_CMD_GET,
.flags = 0, /* OK for unprivileged users. */
- .policy = datapath_policy,
.doit = ovs_dp_cmd_get,
.dumpit = ovs_dp_cmd_dump
},
{ .cmd = OVS_DP_CMD_SET,
.flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
- .policy = datapath_policy,
.doit = ovs_dp_cmd_set,
},
};
.name = OVS_DATAPATH_FAMILY,
.version = OVS_DATAPATH_VERSION,
.maxattr = OVS_DP_ATTR_MAX,
+ .policy = datapath_policy,
.netnsok = true,
.parallel_ops = true,
.ops = dp_datapath_genl_ops,
static const struct genl_ops dp_vport_genl_ops[] = {
{ .cmd = OVS_VPORT_CMD_NEW,
.flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
- .policy = vport_policy,
.doit = ovs_vport_cmd_new
},
{ .cmd = OVS_VPORT_CMD_DEL,
.flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
- .policy = vport_policy,
.doit = ovs_vport_cmd_del
},
{ .cmd = OVS_VPORT_CMD_GET,
.flags = 0, /* OK for unprivileged users. */
- .policy = vport_policy,
.doit = ovs_vport_cmd_get,
.dumpit = ovs_vport_cmd_dump
},
{ .cmd = OVS_VPORT_CMD_SET,
.flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
- .policy = vport_policy,
.doit = ovs_vport_cmd_set,
},
};
.name = OVS_VPORT_FAMILY,
.version = OVS_VPORT_VERSION,
.maxattr = OVS_VPORT_ATTR_MAX,
+ .policy = vport_policy,
.netnsok = true,
.parallel_ops = true,
.ops = dp_vport_genl_ops,
case OVS_ACTION_ATTR_SET:
case OVS_ACTION_ATTR_SET_MASKED:
case OVS_ACTION_ATTR_METER:
+ case OVS_ACTION_ATTR_CHECK_PKT_LEN:
default:
return true;
}
[OVS_TUNNEL_KEY_ATTR_IPV6_SRC] = { .len = sizeof(struct in6_addr) },
[OVS_TUNNEL_KEY_ATTR_IPV6_DST] = { .len = sizeof(struct in6_addr) },
[OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS] = { .len = OVS_ATTR_VARIABLE },
+ [OVS_TUNNEL_KEY_ATTR_IPV4_INFO_BRIDGE] = { .len = 0 },
};
static const struct ovs_len_tbl
bool log)
{
bool ttl = false, ipv4 = false, ipv6 = false;
+ bool info_bridge_mode = false;
__be16 tun_flags = 0;
int opts_type = 0;
struct nlattr *a;
tun_flags |= TUNNEL_ERSPAN_OPT;
opts_type = type;
break;
+ case OVS_TUNNEL_KEY_ATTR_IPV4_INFO_BRIDGE:
+ info_bridge_mode = true;
+ ipv4 = true;
+ break;
default:
OVS_NLERR(log, "Unknown IP tunnel attribute %d",
type);
OVS_NLERR(log, "IP tunnel dst address not specified");
return -EINVAL;
}
- if (ipv4 && !match->key->tun_key.u.ipv4.dst) {
- OVS_NLERR(log, "IPv4 tunnel dst address is zero");
- return -EINVAL;
+ if (ipv4) {
+ if (info_bridge_mode) {
+ if (match->key->tun_key.u.ipv4.src ||
+ match->key->tun_key.u.ipv4.dst ||
+ match->key->tun_key.tp_src ||
+ match->key->tun_key.tp_dst ||
+ match->key->tun_key.ttl ||
+ match->key->tun_key.tos ||
+ tun_flags & ~TUNNEL_KEY) {
+ OVS_NLERR(log, "IPv4 tun info is not correct");
+ return -EINVAL;
+ }
+ } else if (!match->key->tun_key.u.ipv4.dst) {
+ OVS_NLERR(log, "IPv4 tunnel dst address is zero");
+ return -EINVAL;
+ }
}
if (ipv6 && ipv6_addr_any(&match->key->tun_key.u.ipv6.dst)) {
OVS_NLERR(log, "IPv6 tunnel dst address is zero");
return -EINVAL;
}
- if (!ttl) {
+ if (!ttl && !info_bridge_mode) {
OVS_NLERR(log, "IP tunnel TTL not specified.");
return -EINVAL;
}
static int __ip_tun_to_nlattr(struct sk_buff *skb,
const struct ip_tunnel_key *output,
const void *tun_opts, int swkey_tun_opts_len,
- unsigned short tun_proto)
+ unsigned short tun_proto, u8 mode)
{
if (output->tun_flags & TUNNEL_KEY &&
nla_put_be64(skb, OVS_TUNNEL_KEY_ATTR_ID, output->tun_id,
OVS_TUNNEL_KEY_ATTR_PAD))
return -EMSGSIZE;
+
+ if (mode & IP_TUNNEL_INFO_BRIDGE)
+ return nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_IPV4_INFO_BRIDGE)
+ ? -EMSGSIZE : 0;
+
switch (tun_proto) {
case AF_INET:
if (output->u.ipv4.src &&
static int ip_tun_to_nlattr(struct sk_buff *skb,
const struct ip_tunnel_key *output,
const void *tun_opts, int swkey_tun_opts_len,
- unsigned short tun_proto)
+ unsigned short tun_proto, u8 mode)
{
struct nlattr *nla;
int err;
return -EMSGSIZE;
err = __ip_tun_to_nlattr(skb, output, tun_opts, swkey_tun_opts_len,
- tun_proto);
+ tun_proto, mode);
if (err)
return err;
return __ip_tun_to_nlattr(skb, &tun_info->key,
ip_tunnel_info_opts(tun_info),
tun_info->options_len,
- ip_tunnel_info_af(tun_info));
+ ip_tunnel_info_af(tun_info), tun_info->mode);
}
static int encode_vlan_from_nlattrs(struct sw_flow_match *match,
opts = TUN_METADATA_OPTS(output, swkey->tun_opts_len);
if (ip_tun_to_nlattr(skb, &output->tun_key, opts,
- swkey->tun_opts_len, swkey->tun_proto))
+ swkey->tun_opts_len, swkey->tun_proto, 0))
goto nla_put_failure;
}
tun_info->mode = IP_TUNNEL_INFO_TX;
if (key.tun_proto == AF_INET6)
tun_info->mode |= IP_TUNNEL_INFO_IPV6;
+ else if (key.tun_proto == AF_INET && key.tun_key.u.ipv4.dst == 0)
+ tun_info->mode |= IP_TUNNEL_INFO_BRIDGE;
tun_info->key = key.tun_key;
/* We need to store the options in the action itself since
return 0;
}
+static const struct nla_policy cpl_policy[OVS_CHECK_PKT_LEN_ATTR_MAX + 1] = {
+ [OVS_CHECK_PKT_LEN_ATTR_PKT_LEN] = {.type = NLA_U16 },
+ [OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER] = {.type = NLA_NESTED },
+ [OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL] = {.type = NLA_NESTED },
+};
+
+static int validate_and_copy_check_pkt_len(struct net *net,
+ const struct nlattr *attr,
+ const struct sw_flow_key *key,
+ struct sw_flow_actions **sfa,
+ __be16 eth_type, __be16 vlan_tci,
+ bool log, bool last)
+{
+ const struct nlattr *acts_if_greater, *acts_if_lesser_eq;
+ struct nlattr *a[OVS_CHECK_PKT_LEN_ATTR_MAX + 1];
+ struct check_pkt_len_arg arg;
+ int nested_acts_start;
+ int start, err;
+
+ err = nla_parse_strict(a, OVS_CHECK_PKT_LEN_ATTR_MAX, nla_data(attr),
+ nla_len(attr), cpl_policy, NULL);
+ if (err)
+ return err;
+
+ if (!a[OVS_CHECK_PKT_LEN_ATTR_PKT_LEN] ||
+ !nla_get_u16(a[OVS_CHECK_PKT_LEN_ATTR_PKT_LEN]))
+ return -EINVAL;
+
+ acts_if_lesser_eq = a[OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL];
+ acts_if_greater = a[OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER];
+
+ /* Both the nested action should be present. */
+ if (!acts_if_greater || !acts_if_lesser_eq)
+ return -EINVAL;
+
+ /* validation done, copy the nested actions. */
+ start = add_nested_action_start(sfa, OVS_ACTION_ATTR_CHECK_PKT_LEN,
+ log);
+ if (start < 0)
+ return start;
+
+ arg.pkt_len = nla_get_u16(a[OVS_CHECK_PKT_LEN_ATTR_PKT_LEN]);
+ arg.exec_for_lesser_equal =
+ last || !actions_may_change_flow(acts_if_lesser_eq);
+ arg.exec_for_greater =
+ last || !actions_may_change_flow(acts_if_greater);
+
+ err = ovs_nla_add_action(sfa, OVS_CHECK_PKT_LEN_ATTR_ARG, &arg,
+ sizeof(arg), log);
+ if (err)
+ return err;
+
+ nested_acts_start = add_nested_action_start(sfa,
+ OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL, log);
+ if (nested_acts_start < 0)
+ return nested_acts_start;
+
+ err = __ovs_nla_copy_actions(net, acts_if_lesser_eq, key, sfa,
+ eth_type, vlan_tci, log);
+
+ if (err)
+ return err;
+
+ add_nested_action_end(*sfa, nested_acts_start);
+
+ nested_acts_start = add_nested_action_start(sfa,
+ OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER, log);
+ if (nested_acts_start < 0)
+ return nested_acts_start;
+
+ err = __ovs_nla_copy_actions(net, acts_if_greater, key, sfa,
+ eth_type, vlan_tci, log);
+
+ if (err)
+ return err;
+
+ add_nested_action_end(*sfa, nested_acts_start);
+ add_nested_action_end(*sfa, start);
+ return 0;
+}
+
static int copy_action(const struct nlattr *from,
struct sw_flow_actions **sfa, bool log)
{
[OVS_ACTION_ATTR_POP_NSH] = 0,
[OVS_ACTION_ATTR_METER] = sizeof(u32),
[OVS_ACTION_ATTR_CLONE] = (u32)-1,
+ [OVS_ACTION_ATTR_CHECK_PKT_LEN] = (u32)-1,
};
const struct ovs_action_push_vlan *vlan;
int type = nla_type(a);
break;
}
+ case OVS_ACTION_ATTR_CHECK_PKT_LEN: {
+ bool last = nla_is_last(a, rem);
+
+ err = validate_and_copy_check_pkt_len(net, a, key, sfa,
+ eth_type,
+ vlan_tci, log,
+ last);
+ if (err)
+ return err;
+ skip_copy = true;
+ break;
+ }
+
default:
OVS_NLERR(log, "Unknown Action type %d", type);
return -EINVAL;
return err;
}
+static int check_pkt_len_action_to_attr(const struct nlattr *attr,
+ struct sk_buff *skb)
+{
+ struct nlattr *start, *ac_start = NULL;
+ const struct check_pkt_len_arg *arg;
+ const struct nlattr *a, *cpl_arg;
+ int err = 0, rem = nla_len(attr);
+
+ start = nla_nest_start(skb, OVS_ACTION_ATTR_CHECK_PKT_LEN);
+ if (!start)
+ return -EMSGSIZE;
+
+ /* The first nested attribute in 'attr' is always
+ * 'OVS_CHECK_PKT_LEN_ATTR_ARG'.
+ */
+ cpl_arg = nla_data(attr);
+ arg = nla_data(cpl_arg);
+
+ if (nla_put_u16(skb, OVS_CHECK_PKT_LEN_ATTR_PKT_LEN, arg->pkt_len)) {
+ err = -EMSGSIZE;
+ goto out;
+ }
+
+ /* Second nested attribute in 'attr' is always
+ * 'OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL'.
+ */
+ a = nla_next(cpl_arg, &rem);
+ ac_start = nla_nest_start(skb,
+ OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL);
+ if (!ac_start) {
+ err = -EMSGSIZE;
+ goto out;
+ }
+
+ err = ovs_nla_put_actions(nla_data(a), nla_len(a), skb);
+ if (err) {
+ nla_nest_cancel(skb, ac_start);
+ goto out;
+ } else {
+ nla_nest_end(skb, ac_start);
+ }
+
+ /* Third nested attribute in 'attr' is always
+ * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER.
+ */
+ a = nla_next(a, &rem);
+ ac_start = nla_nest_start(skb,
+ OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER);
+ if (!ac_start) {
+ err = -EMSGSIZE;
+ goto out;
+ }
+
+ err = ovs_nla_put_actions(nla_data(a), nla_len(a), skb);
+ if (err) {
+ nla_nest_cancel(skb, ac_start);
+ goto out;
+ } else {
+ nla_nest_end(skb, ac_start);
+ }
+
+ nla_nest_end(skb, start);
+ return 0;
+
+out:
+ nla_nest_cancel(skb, start);
+ return err;
+}
+
static int set_action_to_attr(const struct nlattr *a, struct sk_buff *skb)
{
const struct nlattr *ovs_key = nla_data(a);
err = ip_tun_to_nlattr(skb, &tun_info->key,
ip_tunnel_info_opts(tun_info),
tun_info->options_len,
- ip_tunnel_info_af(tun_info));
+ ip_tunnel_info_af(tun_info), tun_info->mode);
if (err)
return err;
nla_nest_end(skb, start);
return err;
break;
+ case OVS_ACTION_ATTR_CHECK_PKT_LEN:
+ err = check_pkt_len_action_to_attr(a, skb);
+ if (err)
+ return err;
+ break;
+
default:
if (nla_put(skb, type, nla_len(a), nla_data(a)))
return -EMSGSIZE;
static struct genl_ops dp_meter_genl_ops[] = {
{ .cmd = OVS_METER_CMD_FEATURES,
.flags = 0, /* OK for unprivileged users. */
- .policy = meter_policy,
.doit = ovs_meter_cmd_features
},
{ .cmd = OVS_METER_CMD_SET,
.flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN
* privilege.
*/
- .policy = meter_policy,
.doit = ovs_meter_cmd_set,
},
{ .cmd = OVS_METER_CMD_GET,
.flags = 0, /* OK for unprivileged users. */
- .policy = meter_policy,
.doit = ovs_meter_cmd_get,
},
{ .cmd = OVS_METER_CMD_DEL,
.flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN
* privilege.
*/
- .policy = meter_policy,
.doit = ovs_meter_cmd_del
},
};
.name = OVS_METER_FAMILY,
.version = OVS_METER_VERSION,
.maxattr = OVS_METER_ATTR_MAX,
+ .policy = meter_policy,
.netnsok = true,
.parallel_ops = true,
.ops = dp_meter_genl_ops,
return po->xmit == packet_direct_xmit;
}
-static u16 __packet_pick_tx_queue(struct net_device *dev, struct sk_buff *skb,
- struct net_device *sb_dev)
-{
- return dev_pick_tx_cpu_id(dev, skb, sb_dev, NULL);
-}
-
static u16 packet_pick_tx_queue(struct sk_buff *skb)
{
struct net_device *dev = skb->dev;
const struct net_device_ops *ops = dev->netdev_ops;
+ int cpu = raw_smp_processor_id();
u16 queue_index;
+#ifdef CONFIG_XPS
+ skb->sender_cpu = cpu + 1;
+#endif
+ skb_record_rx_queue(skb, cpu % dev->real_num_tx_queues);
if (ops->ndo_select_queue) {
- queue_index = ops->ndo_select_queue(dev, skb, NULL,
- __packet_pick_tx_queue);
+ queue_index = ops->ndo_select_queue(dev, skb, NULL);
queue_index = netdev_cap_txqueue(dev, queue_index);
} else {
- queue_index = __packet_pick_tx_queue(dev, skb, NULL);
+ queue_index = netdev_pick_tx(dev, skb, NULL);
}
return queue_index;
/* Fall through and set IPv4 options too otherwise we don't get
* errors from IPv4 packets sent through the IPv6 socket.
*/
-
+ /* Fall through */
case AF_INET:
/* we want to receive ICMP errors */
opt = 1;
entry->tunnel = tcf_tunnel_info(act);
} else if (is_tcf_tunnel_release(act)) {
entry->id = FLOW_ACTION_TUNNEL_DECAP;
- entry->tunnel = tcf_tunnel_info(act);
} else if (is_tcf_pedit(act)) {
for (k = 0; k < tcf_pedit_nkeys(act); k++) {
switch (tcf_pedit_cmd(act, k)) {
#include <linux/module.h>
#include <linux/rhashtable.h>
#include <linux/workqueue.h>
+#include <linux/refcount.h>
#include <linux/if_ether.h>
#include <linux/in6.h>
struct list_head filters;
struct rcu_work rwork;
struct list_head list;
+ refcount_t refcnt;
};
struct fl_flow_tmplt {
struct cls_fl_head {
struct rhashtable ht;
+ spinlock_t masks_lock; /* Protect masks list */
struct list_head masks;
struct rcu_work rwork;
struct idr handle_idr;
u32 in_hw_count;
struct rcu_work rwork;
struct net_device *hw_dev;
+ /* Flower classifier is unlocked, which means that its reference counter
+ * can be changed concurrently without any kind of external
+ * synchronization. Use atomic reference counter to be concurrency-safe.
+ */
+ refcount_t refcnt;
+ bool deleted;
};
static const struct rhashtable_params mask_ht_params = {
if (!head)
return -ENOBUFS;
+ spin_lock_init(&head->masks_lock);
INIT_LIST_HEAD_RCU(&head->masks);
rcu_assign_pointer(tp->root, head);
idr_init(&head->handle_idr);
static void fl_mask_free(struct fl_flow_mask *mask)
{
+ WARN_ON(!list_empty(&mask->filters));
rhashtable_destroy(&mask->ht);
kfree(mask);
}
fl_mask_free(mask);
}
-static bool fl_mask_put(struct cls_fl_head *head, struct fl_flow_mask *mask,
- bool async)
+static bool fl_mask_put(struct cls_fl_head *head, struct fl_flow_mask *mask)
{
- if (!list_empty(&mask->filters))
+ if (!refcount_dec_and_test(&mask->refcnt))
return false;
rhashtable_remove_fast(&head->ht, &mask->ht_node, mask_ht_params);
+
+ spin_lock(&head->masks_lock);
list_del_rcu(&mask->list);
- if (async)
- tcf_queue_work(&mask->rwork, fl_mask_free_work);
- else
- fl_mask_free(mask);
+ spin_unlock(&head->masks_lock);
+
+ tcf_queue_work(&mask->rwork, fl_mask_free_work);
return true;
}
struct cls_fl_filter *f = container_of(to_rcu_work(work),
struct cls_fl_filter, rwork);
- rtnl_lock();
__fl_destroy_filter(f);
- rtnl_unlock();
}
static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f,
- struct netlink_ext_ack *extack)
+ bool rtnl_held, struct netlink_ext_ack *extack)
{
struct tc_cls_flower_offload cls_flower = {};
struct tcf_block *block = tp->chain->block;
+ if (!rtnl_held)
+ rtnl_lock();
+
tc_cls_common_offload_init(&cls_flower.common, tp, f->flags, extack);
cls_flower.command = TC_CLSFLOWER_DESTROY;
cls_flower.cookie = (unsigned long) f;
tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false);
+ spin_lock(&tp->lock);
tcf_block_offload_dec(block, &f->flags);
+ spin_unlock(&tp->lock);
+
+ if (!rtnl_held)
+ rtnl_unlock();
}
static int fl_hw_replace_filter(struct tcf_proto *tp,
- struct cls_fl_filter *f,
+ struct cls_fl_filter *f, bool rtnl_held,
struct netlink_ext_ack *extack)
{
struct tc_cls_flower_offload cls_flower = {};
struct tcf_block *block = tp->chain->block;
bool skip_sw = tc_skip_sw(f->flags);
- int err;
+ int err = 0;
+
+ if (!rtnl_held)
+ rtnl_lock();
cls_flower.rule = flow_rule_alloc(tcf_exts_num_actions(&f->exts));
- if (!cls_flower.rule)
- return -ENOMEM;
+ if (!cls_flower.rule) {
+ err = -ENOMEM;
+ goto errout;
+ }
tc_cls_common_offload_init(&cls_flower.common, tp, f->flags, extack);
cls_flower.command = TC_CLSFLOWER_REPLACE;
err = tc_setup_flow_action(&cls_flower.rule->action, &f->exts);
if (err) {
kfree(cls_flower.rule);
- if (skip_sw) {
+ if (skip_sw)
NL_SET_ERR_MSG_MOD(extack, "Failed to setup flow action");
- return err;
- }
- return 0;
+ else
+ err = 0;
+ goto errout;
}
err = tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, skip_sw);
kfree(cls_flower.rule);
if (err < 0) {
- fl_hw_destroy_filter(tp, f, NULL);
- return err;
+ fl_hw_destroy_filter(tp, f, true, NULL);
+ goto errout;
} else if (err > 0) {
f->in_hw_count = err;
+ err = 0;
+ spin_lock(&tp->lock);
tcf_block_offload_inc(block, &f->flags);
+ spin_unlock(&tp->lock);
}
- if (skip_sw && !(f->flags & TCA_CLS_FLAGS_IN_HW))
- return -EINVAL;
+ if (skip_sw && !(f->flags & TCA_CLS_FLAGS_IN_HW)) {
+ err = -EINVAL;
+ goto errout;
+ }
- return 0;
+errout:
+ if (!rtnl_held)
+ rtnl_unlock();
+
+ return err;
}
-static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f)
+static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f,
+ bool rtnl_held)
{
struct tc_cls_flower_offload cls_flower = {};
struct tcf_block *block = tp->chain->block;
+ if (!rtnl_held)
+ rtnl_lock();
+
tc_cls_common_offload_init(&cls_flower.common, tp, f->flags, NULL);
cls_flower.command = TC_CLSFLOWER_STATS;
cls_flower.cookie = (unsigned long) f;
tcf_exts_stats_update(&f->exts, cls_flower.stats.bytes,
cls_flower.stats.pkts,
cls_flower.stats.lastused);
+
+ if (!rtnl_held)
+ rtnl_unlock();
}
-static bool __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
- struct netlink_ext_ack *extack)
+static struct cls_fl_head *fl_head_dereference(struct tcf_proto *tp)
{
- struct cls_fl_head *head = rtnl_dereference(tp->root);
- bool async = tcf_exts_get_net(&f->exts);
- bool last;
+ /* Flower classifier only changes root pointer during init and destroy.
+ * Users must obtain reference to tcf_proto instance before calling its
+ * API, so tp->root pointer is protected from concurrent call to
+ * fl_destroy() by reference counting.
+ */
+ return rcu_dereference_raw(tp->root);
+}
+
+static void __fl_put(struct cls_fl_filter *f)
+{
+ if (!refcount_dec_and_test(&f->refcnt))
+ return;
+
+ WARN_ON(!f->deleted);
+
+ if (tcf_exts_get_net(&f->exts))
+ tcf_queue_work(&f->rwork, fl_destroy_filter_work);
+ else
+ __fl_destroy_filter(f);
+}
+
+static struct cls_fl_filter *__fl_get(struct cls_fl_head *head, u32 handle)
+{
+ struct cls_fl_filter *f;
+
+ rcu_read_lock();
+ f = idr_find(&head->handle_idr, handle);
+ if (f && !refcount_inc_not_zero(&f->refcnt))
+ f = NULL;
+ rcu_read_unlock();
+
+ return f;
+}
+
+static struct cls_fl_filter *fl_get_next_filter(struct tcf_proto *tp,
+ unsigned long *handle)
+{
+ struct cls_fl_head *head = fl_head_dereference(tp);
+ struct cls_fl_filter *f;
+
+ rcu_read_lock();
+ while ((f = idr_get_next_ul(&head->handle_idr, handle))) {
+ /* don't return filters that are being deleted */
+ if (refcount_inc_not_zero(&f->refcnt))
+ break;
+ ++(*handle);
+ }
+ rcu_read_unlock();
+
+ return f;
+}
+
+static int __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
+ bool *last, bool rtnl_held,
+ struct netlink_ext_ack *extack)
+{
+ struct cls_fl_head *head = fl_head_dereference(tp);
+
+ *last = false;
+
+ spin_lock(&tp->lock);
+ if (f->deleted) {
+ spin_unlock(&tp->lock);
+ return -ENOENT;
+ }
+ f->deleted = true;
+ rhashtable_remove_fast(&f->mask->ht, &f->ht_node,
+ f->mask->filter_ht_params);
idr_remove(&head->handle_idr, f->handle);
list_del_rcu(&f->list);
- last = fl_mask_put(head, f->mask, async);
+ spin_unlock(&tp->lock);
+
+ *last = fl_mask_put(head, f->mask);
if (!tc_skip_hw(f->flags))
- fl_hw_destroy_filter(tp, f, extack);
+ fl_hw_destroy_filter(tp, f, rtnl_held, extack);
tcf_unbind_filter(tp, &f->res);
- if (async)
- tcf_queue_work(&f->rwork, fl_destroy_filter_work);
- else
- __fl_destroy_filter(f);
+ __fl_put(f);
- return last;
+ return 0;
}
static void fl_destroy_sleepable(struct work_struct *work)
static void fl_destroy(struct tcf_proto *tp, bool rtnl_held,
struct netlink_ext_ack *extack)
{
- struct cls_fl_head *head = rtnl_dereference(tp->root);
+ struct cls_fl_head *head = fl_head_dereference(tp);
struct fl_flow_mask *mask, *next_mask;
struct cls_fl_filter *f, *next;
+ bool last;
list_for_each_entry_safe(mask, next_mask, &head->masks, list) {
list_for_each_entry_safe(f, next, &mask->filters, list) {
- if (__fl_delete(tp, f, extack))
+ __fl_delete(tp, f, &last, rtnl_held, extack);
+ if (last)
break;
}
}
tcf_queue_work(&head->rwork, fl_destroy_sleepable);
}
+static void fl_put(struct tcf_proto *tp, void *arg)
+{
+ struct cls_fl_filter *f = arg;
+
+ __fl_put(f);
+}
+
static void *fl_get(struct tcf_proto *tp, u32 handle)
{
- struct cls_fl_head *head = rtnl_dereference(tp->root);
+ struct cls_fl_head *head = fl_head_dereference(tp);
- return idr_find(&head->handle_idr, handle);
+ return __fl_get(head, handle);
}
static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
INIT_LIST_HEAD_RCU(&newmask->filters);
- err = rhashtable_insert_fast(&head->ht, &newmask->ht_node,
- mask_ht_params);
+ refcount_set(&newmask->refcnt, 1);
+ err = rhashtable_replace_fast(&head->ht, &mask->ht_node,
+ &newmask->ht_node, mask_ht_params);
if (err)
goto errout_destroy;
+ /* Wait until any potential concurrent users of mask are finished */
+ synchronize_rcu();
+
+ spin_lock(&head->masks_lock);
list_add_tail_rcu(&newmask->list, &head->masks);
+ spin_unlock(&head->masks_lock);
return newmask;
struct fl_flow_mask *mask)
{
struct fl_flow_mask *newmask;
+ int ret = 0;
- fnew->mask = rhashtable_lookup_fast(&head->ht, mask, mask_ht_params);
+ rcu_read_lock();
+
+ /* Insert mask as temporary node to prevent concurrent creation of mask
+ * with same key. Any concurrent lookups with same key will return
+ * -EAGAIN because mask's refcnt is zero. It is safe to insert
+ * stack-allocated 'mask' to masks hash table because we call
+ * synchronize_rcu() before returning from this function (either in case
+ * of error or after replacing it with heap-allocated mask in
+ * fl_create_new_mask()).
+ */
+ fnew->mask = rhashtable_lookup_get_insert_fast(&head->ht,
+ &mask->ht_node,
+ mask_ht_params);
if (!fnew->mask) {
- if (fold)
- return -EINVAL;
+ rcu_read_unlock();
+
+ if (fold) {
+ ret = -EINVAL;
+ goto errout_cleanup;
+ }
newmask = fl_create_new_mask(head, mask);
- if (IS_ERR(newmask))
- return PTR_ERR(newmask);
+ if (IS_ERR(newmask)) {
+ ret = PTR_ERR(newmask);
+ goto errout_cleanup;
+ }
fnew->mask = newmask;
+ return 0;
+ } else if (IS_ERR(fnew->mask)) {
+ ret = PTR_ERR(fnew->mask);
} else if (fold && fold->mask != fnew->mask) {
- return -EINVAL;
+ ret = -EINVAL;
+ } else if (!refcount_inc_not_zero(&fnew->mask->refcnt)) {
+ /* Mask was deleted concurrently, try again */
+ ret = -EAGAIN;
}
+ rcu_read_unlock();
+ return ret;
- return 0;
+errout_cleanup:
+ rhashtable_remove_fast(&head->ht, &mask->ht_node,
+ mask_ht_params);
+ /* Wait until any potential concurrent users of mask are finished */
+ synchronize_rcu();
+ return ret;
}
static int fl_set_parms(struct net *net, struct tcf_proto *tp,
struct cls_fl_filter *f, struct fl_flow_mask *mask,
unsigned long base, struct nlattr **tb,
struct nlattr *est, bool ovr,
- struct fl_flow_tmplt *tmplt,
+ struct fl_flow_tmplt *tmplt, bool rtnl_held,
struct netlink_ext_ack *extack)
{
int err;
- err = tcf_exts_validate(net, tp, tb, est, &f->exts, ovr, true,
+ err = tcf_exts_validate(net, tp, tb, est, &f->exts, ovr, rtnl_held,
extack);
if (err < 0)
return err;
if (tb[TCA_FLOWER_CLASSID]) {
f->res.classid = nla_get_u32(tb[TCA_FLOWER_CLASSID]);
+ if (!rtnl_held)
+ rtnl_lock();
tcf_bind_filter(tp, &f->res, base);
+ if (!rtnl_held)
+ rtnl_unlock();
}
err = fl_set_key(net, tb, &f->key, &mask->key, extack);
return 0;
}
+static int fl_ht_insert_unique(struct cls_fl_filter *fnew,
+ struct cls_fl_filter *fold,
+ bool *in_ht)
+{
+ struct fl_flow_mask *mask = fnew->mask;
+ int err;
+
+ err = rhashtable_lookup_insert_fast(&mask->ht,
+ &fnew->ht_node,
+ mask->filter_ht_params);
+ if (err) {
+ *in_ht = false;
+ /* It is okay if filter with same key exists when
+ * overwriting.
+ */
+ return fold && err == -EEXIST ? 0 : err;
+ }
+
+ *in_ht = true;
+ return 0;
+}
+
static int fl_change(struct net *net, struct sk_buff *in_skb,
struct tcf_proto *tp, unsigned long base,
u32 handle, struct nlattr **tca,
void **arg, bool ovr, bool rtnl_held,
struct netlink_ext_ack *extack)
{
- struct cls_fl_head *head = rtnl_dereference(tp->root);
+ struct cls_fl_head *head = fl_head_dereference(tp);
struct cls_fl_filter *fold = *arg;
struct cls_fl_filter *fnew;
struct fl_flow_mask *mask;
struct nlattr **tb;
+ bool in_ht;
int err;
- if (!tca[TCA_OPTIONS])
- return -EINVAL;
+ if (!tca[TCA_OPTIONS]) {
+ err = -EINVAL;
+ goto errout_fold;
+ }
mask = kzalloc(sizeof(struct fl_flow_mask), GFP_KERNEL);
- if (!mask)
- return -ENOBUFS;
+ if (!mask) {
+ err = -ENOBUFS;
+ goto errout_fold;
+ }
tb = kcalloc(TCA_FLOWER_MAX + 1, sizeof(struct nlattr *), GFP_KERNEL);
if (!tb) {
err = -ENOBUFS;
goto errout_tb;
}
+ refcount_set(&fnew->refcnt, 1);
err = tcf_exts_init(&fnew->exts, net, TCA_FLOWER_ACT, 0);
if (err < 0)
}
err = fl_set_parms(net, tp, fnew, mask, base, tb, tca[TCA_RATE], ovr,
- tp->chain->tmplt_priv, extack);
+ tp->chain->tmplt_priv, rtnl_held, extack);
if (err)
goto errout;
if (err)
goto errout;
- if (!handle) {
- handle = 1;
- err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
- INT_MAX, GFP_KERNEL);
- } else if (!fold) {
- /* user specifies a handle and it doesn't exist */
- err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
- handle, GFP_KERNEL);
- }
+ err = fl_ht_insert_unique(fnew, fold, &in_ht);
if (err)
goto errout_mask;
- fnew->handle = handle;
-
- if (!fold && __fl_lookup(fnew->mask, &fnew->mkey)) {
- err = -EEXIST;
- goto errout_idr;
- }
-
- err = rhashtable_insert_fast(&fnew->mask->ht, &fnew->ht_node,
- fnew->mask->filter_ht_params);
- if (err)
- goto errout_idr;
if (!tc_skip_hw(fnew->flags)) {
- err = fl_hw_replace_filter(tp, fnew, extack);
+ err = fl_hw_replace_filter(tp, fnew, rtnl_held, extack);
if (err)
- goto errout_mask_ht;
+ goto errout_ht;
}
if (!tc_in_hw(fnew->flags))
fnew->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
+ spin_lock(&tp->lock);
+
+ /* tp was deleted concurrently. -EAGAIN will cause caller to lookup
+ * proto again or create new one, if necessary.
+ */
+ if (tp->deleting) {
+ err = -EAGAIN;
+ goto errout_hw;
+ }
+
+ refcount_inc(&fnew->refcnt);
if (fold) {
+ /* Fold filter was deleted concurrently. Retry lookup. */
+ if (fold->deleted) {
+ err = -EAGAIN;
+ goto errout_hw;
+ }
+
+ fnew->handle = handle;
+
+ if (!in_ht) {
+ struct rhashtable_params params =
+ fnew->mask->filter_ht_params;
+
+ err = rhashtable_insert_fast(&fnew->mask->ht,
+ &fnew->ht_node,
+ params);
+ if (err)
+ goto errout_hw;
+ in_ht = true;
+ }
+
rhashtable_remove_fast(&fold->mask->ht,
&fold->ht_node,
fold->mask->filter_ht_params);
- if (!tc_skip_hw(fold->flags))
- fl_hw_destroy_filter(tp, fold, NULL);
- }
-
- *arg = fnew;
-
- if (fold) {
idr_replace(&head->handle_idr, fnew, fnew->handle);
list_replace_rcu(&fold->list, &fnew->list);
+ fold->deleted = true;
+
+ spin_unlock(&tp->lock);
+
+ fl_mask_put(head, fold->mask);
+ if (!tc_skip_hw(fold->flags))
+ fl_hw_destroy_filter(tp, fold, rtnl_held, NULL);
tcf_unbind_filter(tp, &fold->res);
- tcf_exts_get_net(&fold->exts);
- tcf_queue_work(&fold->rwork, fl_destroy_filter_work);
+ /* Caller holds reference to fold, so refcnt is always > 0
+ * after this.
+ */
+ refcount_dec(&fold->refcnt);
+ __fl_put(fold);
} else {
+ if (handle) {
+ /* user specifies a handle and it doesn't exist */
+ err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
+ handle, GFP_ATOMIC);
+
+ /* Filter with specified handle was concurrently
+ * inserted after initial check in cls_api. This is not
+ * necessarily an error if NLM_F_EXCL is not set in
+ * message flags. Returning EAGAIN will cause cls_api to
+ * try to update concurrently inserted rule.
+ */
+ if (err == -ENOSPC)
+ err = -EAGAIN;
+ } else {
+ handle = 1;
+ err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
+ INT_MAX, GFP_ATOMIC);
+ }
+ if (err)
+ goto errout_hw;
+
+ fnew->handle = handle;
list_add_tail_rcu(&fnew->list, &fnew->mask->filters);
+ spin_unlock(&tp->lock);
}
+ *arg = fnew;
+
kfree(tb);
kfree(mask);
return 0;
-errout_mask_ht:
- rhashtable_remove_fast(&fnew->mask->ht, &fnew->ht_node,
- fnew->mask->filter_ht_params);
-
-errout_idr:
- if (!fold)
- idr_remove(&head->handle_idr, fnew->handle);
-
+errout_hw:
+ spin_unlock(&tp->lock);
+ if (!tc_skip_hw(fnew->flags))
+ fl_hw_destroy_filter(tp, fnew, rtnl_held, NULL);
+errout_ht:
+ if (in_ht)
+ rhashtable_remove_fast(&fnew->mask->ht, &fnew->ht_node,
+ fnew->mask->filter_ht_params);
errout_mask:
- fl_mask_put(head, fnew->mask, false);
-
+ fl_mask_put(head, fnew->mask);
errout:
- tcf_exts_destroy(&fnew->exts);
- kfree(fnew);
+ tcf_exts_get_net(&fnew->exts);
+ tcf_queue_work(&fnew->rwork, fl_destroy_filter_work);
errout_tb:
kfree(tb);
errout_mask_alloc:
kfree(mask);
+errout_fold:
+ if (fold)
+ __fl_put(fold);
return err;
}
static int fl_delete(struct tcf_proto *tp, void *arg, bool *last,
bool rtnl_held, struct netlink_ext_ack *extack)
{
- struct cls_fl_head *head = rtnl_dereference(tp->root);
+ struct cls_fl_head *head = fl_head_dereference(tp);
struct cls_fl_filter *f = arg;
+ bool last_on_mask;
+ int err = 0;
- rhashtable_remove_fast(&f->mask->ht, &f->ht_node,
- f->mask->filter_ht_params);
- __fl_delete(tp, f, extack);
+ err = __fl_delete(tp, f, &last_on_mask, rtnl_held, extack);
*last = list_empty(&head->masks);
- return 0;
+ __fl_put(f);
+
+ return err;
}
static void fl_walk(struct tcf_proto *tp, struct tcf_walker *arg,
bool rtnl_held)
{
- struct cls_fl_head *head = rtnl_dereference(tp->root);
struct cls_fl_filter *f;
arg->count = arg->skip;
- while ((f = idr_get_next_ul(&head->handle_idr,
- &arg->cookie)) != NULL) {
+ while ((f = fl_get_next_filter(tp, &arg->cookie)) != NULL) {
if (arg->fn(tp, f, arg) < 0) {
+ __fl_put(f);
arg->stop = 1;
break;
}
- arg->cookie = f->handle + 1;
+ __fl_put(f);
+ arg->cookie++;
arg->count++;
}
}
static int fl_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
void *cb_priv, struct netlink_ext_ack *extack)
{
- struct cls_fl_head *head = rtnl_dereference(tp->root);
struct tc_cls_flower_offload cls_flower = {};
struct tcf_block *block = tp->chain->block;
- struct fl_flow_mask *mask;
+ unsigned long handle = 0;
struct cls_fl_filter *f;
int err;
- list_for_each_entry(mask, &head->masks, list) {
- list_for_each_entry(f, &mask->filters, list) {
- if (tc_skip_hw(f->flags))
- continue;
-
- cls_flower.rule =
- flow_rule_alloc(tcf_exts_num_actions(&f->exts));
- if (!cls_flower.rule)
- return -ENOMEM;
-
- tc_cls_common_offload_init(&cls_flower.common, tp,
- f->flags, extack);
- cls_flower.command = add ?
- TC_CLSFLOWER_REPLACE : TC_CLSFLOWER_DESTROY;
- cls_flower.cookie = (unsigned long)f;
- cls_flower.rule->match.dissector = &mask->dissector;
- cls_flower.rule->match.mask = &mask->key;
- cls_flower.rule->match.key = &f->mkey;
-
- err = tc_setup_flow_action(&cls_flower.rule->action,
- &f->exts);
- if (err) {
- kfree(cls_flower.rule);
- if (tc_skip_sw(f->flags)) {
- NL_SET_ERR_MSG_MOD(extack, "Failed to setup flow action");
- return err;
- }
- continue;
- }
+ while ((f = fl_get_next_filter(tp, &handle))) {
+ if (tc_skip_hw(f->flags))
+ goto next_flow;
- cls_flower.classid = f->res.classid;
+ cls_flower.rule =
+ flow_rule_alloc(tcf_exts_num_actions(&f->exts));
+ if (!cls_flower.rule) {
+ __fl_put(f);
+ return -ENOMEM;
+ }
- err = cb(TC_SETUP_CLSFLOWER, &cls_flower, cb_priv);
+ tc_cls_common_offload_init(&cls_flower.common, tp, f->flags,
+ extack);
+ cls_flower.command = add ?
+ TC_CLSFLOWER_REPLACE : TC_CLSFLOWER_DESTROY;
+ cls_flower.cookie = (unsigned long)f;
+ cls_flower.rule->match.dissector = &f->mask->dissector;
+ cls_flower.rule->match.mask = &f->mask->key;
+ cls_flower.rule->match.key = &f->mkey;
+
+ err = tc_setup_flow_action(&cls_flower.rule->action, &f->exts);
+ if (err) {
kfree(cls_flower.rule);
-
- if (err) {
- if (add && tc_skip_sw(f->flags))
- return err;
- continue;
+ if (tc_skip_sw(f->flags)) {
+ NL_SET_ERR_MSG_MOD(extack, "Failed to setup flow action");
+ __fl_put(f);
+ return err;
}
+ goto next_flow;
+ }
+
+ cls_flower.classid = f->res.classid;
+
+ err = cb(TC_SETUP_CLSFLOWER, &cls_flower, cb_priv);
+ kfree(cls_flower.rule);
- tc_cls_offload_cnt_update(block, &f->in_hw_count,
- &f->flags, add);
+ if (err) {
+ if (add && tc_skip_sw(f->flags)) {
+ __fl_put(f);
+ return err;
+ }
+ goto next_flow;
}
+
+ spin_lock(&tp->lock);
+ tc_cls_offload_cnt_update(block, &f->in_hw_count, &f->flags,
+ add);
+ spin_unlock(&tp->lock);
+next_flow:
+ handle++;
+ __fl_put(f);
}
return 0;
struct cls_fl_filter *f = fh;
struct nlattr *nest;
struct fl_flow_key *key, *mask;
+ bool skip_hw;
if (!f)
return skb->len;
if (!nest)
goto nla_put_failure;
+ spin_lock(&tp->lock);
+
if (f->res.classid &&
nla_put_u32(skb, TCA_FLOWER_CLASSID, f->res.classid))
- goto nla_put_failure;
+ goto nla_put_failure_locked;
key = &f->key;
mask = &f->mask->key;
+ skip_hw = tc_skip_hw(f->flags);
if (fl_dump_key(skb, net, key, mask))
- goto nla_put_failure;
-
- if (!tc_skip_hw(f->flags))
- fl_hw_update_stats(tp, f);
+ goto nla_put_failure_locked;
if (f->flags && nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags))
- goto nla_put_failure;
+ goto nla_put_failure_locked;
+
+ spin_unlock(&tp->lock);
+
+ if (!skip_hw)
+ fl_hw_update_stats(tp, f, rtnl_held);
if (nla_put_u32(skb, TCA_FLOWER_IN_HW_COUNT, f->in_hw_count))
goto nla_put_failure;
return skb->len;
+nla_put_failure_locked:
+ spin_unlock(&tp->lock);
nla_put_failure:
nla_nest_cancel(skb, nest);
return -1;
.init = fl_init,
.destroy = fl_destroy,
.get = fl_get,
+ .put = fl_put,
.change = fl_change,
.delete = fl_delete,
.walk = fl_walk,
.tmplt_destroy = fl_tmplt_destroy,
.tmplt_dump = fl_tmplt_dump,
.owner = THIS_MODULE,
+ .flags = TCF_PROTO_OPS_DOIT_UNLOCKED,
};
static int __init cls_fl_init(void)
qdisc_put(old);
}
+static void qdisc_clear_nolock(struct Qdisc *sch)
+{
+ sch->flags &= ~TCQ_F_NOLOCK;
+ if (!(sch->flags & TCQ_F_CPUSTATS))
+ return;
+
+ free_percpu(sch->cpu_bstats);
+ free_percpu(sch->cpu_qstats);
+ sch->cpu_bstats = NULL;
+ sch->cpu_qstats = NULL;
+ sch->flags &= ~TCQ_F_CPUSTATS;
+}
+
/* Graft qdisc "new" to class "classid" of qdisc "parent" or
* to device "dev".
*
/* Only support running class lockless if parent is lockless */
if (new && (new->flags & TCQ_F_NOLOCK) &&
parent && !(parent->flags & TCQ_F_NOLOCK))
- new->flags &= ~TCQ_F_NOLOCK;
+ qdisc_clear_nolock(new);
if (!cops || !cops->graft)
return -EOPNOTSUPP;
#include <linux/string.h>
#include <linux/errno.h>
#include <linux/skbuff.h>
+#include <net/netevent.h>
#include <net/netlink.h>
#include <net/sch_generic.h>
#include <net/pkt_sched.h>
+static LIST_HEAD(cbs_list);
+static DEFINE_SPINLOCK(cbs_list_lock);
+
#define BYTES_PER_KBIT (1000LL / 8)
struct cbs_sched_data {
bool offload;
int queue;
- s64 port_rate; /* in bytes/s */
+ atomic64_t port_rate; /* in bytes/s */
s64 last; /* timestamp in ns */
s64 credits; /* in bytes */
s32 locredit; /* in bytes */
struct sk_buff **to_free);
struct sk_buff *(*dequeue)(struct Qdisc *sch);
struct Qdisc *qdisc;
+ struct list_head cbs_list;
};
static int cbs_child_enqueue(struct sk_buff *skb, struct Qdisc *sch,
s64 credits;
int len;
+ if (atomic64_read(&q->port_rate) == -1) {
+ WARN_ONCE(1, "cbs: dequeue() called with unknown port rate.");
+ return NULL;
+ }
+
if (q->credits < 0) {
credits = timediff_to_credits(now - q->last, q->idleslope);
/* As sendslope is a negative number, this will decrease the
* amount of q->credits.
*/
- credits = credits_from_len(len, q->sendslope, q->port_rate);
+ credits = credits_from_len(len, q->sendslope,
+ atomic64_read(&q->port_rate));
credits += q->credits;
q->credits = max_t(s64, credits, q->locredit);
return 0;
}
+static void cbs_set_port_rate(struct net_device *dev, struct cbs_sched_data *q)
+{
+ struct ethtool_link_ksettings ecmd;
+ int port_rate = -1;
+
+ if (!__ethtool_get_link_ksettings(dev, &ecmd) &&
+ ecmd.base.speed != SPEED_UNKNOWN)
+ port_rate = ecmd.base.speed * 1000 * BYTES_PER_KBIT;
+
+ atomic64_set(&q->port_rate, port_rate);
+ netdev_dbg(dev, "cbs: set %s's port_rate to: %lld, linkspeed: %d\n",
+ dev->name, (long long)atomic64_read(&q->port_rate),
+ ecmd.base.speed);
+}
+
+static int cbs_dev_notifier(struct notifier_block *nb, unsigned long event,
+ void *ptr)
+{
+ struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+ struct cbs_sched_data *q;
+ struct net_device *qdev;
+ bool found = false;
+
+ ASSERT_RTNL();
+
+ if (event != NETDEV_UP && event != NETDEV_CHANGE)
+ return NOTIFY_DONE;
+
+ spin_lock(&cbs_list_lock);
+ list_for_each_entry(q, &cbs_list, cbs_list) {
+ qdev = qdisc_dev(q->qdisc);
+ if (qdev == dev) {
+ found = true;
+ break;
+ }
+ }
+ spin_unlock(&cbs_list_lock);
+
+ if (found)
+ cbs_set_port_rate(dev, q);
+
+ return NOTIFY_DONE;
+}
+
static int cbs_change(struct Qdisc *sch, struct nlattr *opt,
struct netlink_ext_ack *extack)
{
qopt = nla_data(tb[TCA_CBS_PARMS]);
if (!qopt->offload) {
- struct ethtool_link_ksettings ecmd;
- s64 link_speed;
-
- if (!__ethtool_get_link_ksettings(dev, &ecmd))
- link_speed = ecmd.base.speed;
- else
- link_speed = SPEED_1000;
-
- q->port_rate = link_speed * 1000 * BYTES_PER_KBIT;
-
+ cbs_set_port_rate(dev, q);
cbs_disable_offload(dev, q);
} else {
err = cbs_enable_offload(dev, q, qopt, extack);
{
struct cbs_sched_data *q = qdisc_priv(sch);
struct net_device *dev = qdisc_dev(sch);
+ int err;
if (!opt) {
NL_SET_ERR_MSG(extack, "Missing CBS qdisc options which are mandatory");
qdisc_watchdog_init(&q->watchdog, sch);
- return cbs_change(sch, opt, extack);
+ err = cbs_change(sch, opt, extack);
+ if (err)
+ return err;
+
+ if (!q->offload) {
+ spin_lock(&cbs_list_lock);
+ list_add(&q->cbs_list, &cbs_list);
+ spin_unlock(&cbs_list_lock);
+ }
+
+ return 0;
}
static void cbs_destroy(struct Qdisc *sch)
struct cbs_sched_data *q = qdisc_priv(sch);
struct net_device *dev = qdisc_dev(sch);
- qdisc_watchdog_cancel(&q->watchdog);
+ spin_lock(&cbs_list_lock);
+ list_del(&q->cbs_list);
+ spin_unlock(&cbs_list_lock);
+ qdisc_watchdog_cancel(&q->watchdog);
cbs_disable_offload(dev, q);
if (q->qdisc)
.owner = THIS_MODULE,
};
+static struct notifier_block cbs_device_notifier = {
+ .notifier_call = cbs_dev_notifier,
+};
+
static int __init cbs_module_init(void)
{
+ int err = register_netdevice_notifier(&cbs_device_notifier);
+
+ if (err)
+ return err;
+
return register_qdisc(&cbs_qdisc_ops);
}
static void __exit cbs_module_exit(void)
{
unregister_qdisc(&cbs_qdisc_ops);
+ unregister_netdevice_notifier(&cbs_device_notifier);
}
module_init(cbs_module_init)
module_exit(cbs_module_exit)
skb = __skb_dequeue(&q->skb_bad_txq);
if (qdisc_is_percpu_stats(q)) {
qdisc_qstats_cpu_backlog_dec(q, skb);
- qdisc_qstats_atomic_qlen_dec(q);
+ qdisc_qstats_cpu_qlen_dec(q);
} else {
qdisc_qstats_backlog_dec(q, skb);
q->q.qlen--;
if (qdisc_is_percpu_stats(q)) {
qdisc_qstats_cpu_backlog_inc(q, skb);
- qdisc_qstats_atomic_qlen_inc(q);
+ qdisc_qstats_cpu_qlen_inc(q);
} else {
qdisc_qstats_backlog_inc(q, skb);
q->q.qlen++;
spin_unlock(lock);
}
-static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
+static inline void dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
{
- while (skb) {
- struct sk_buff *next = skb->next;
-
- __skb_queue_tail(&q->gso_skb, skb);
- q->qstats.requeues++;
- qdisc_qstats_backlog_inc(q, skb);
- q->q.qlen++; /* it's still part of the queue */
+ spinlock_t *lock = NULL;
- skb = next;
+ if (q->flags & TCQ_F_NOLOCK) {
+ lock = qdisc_lock(q);
+ spin_lock(lock);
}
- __netif_schedule(q);
- return 0;
-}
-
-static inline int dev_requeue_skb_locked(struct sk_buff *skb, struct Qdisc *q)
-{
- spinlock_t *lock = qdisc_lock(q);
-
- spin_lock(lock);
while (skb) {
struct sk_buff *next = skb->next;
__skb_queue_tail(&q->gso_skb, skb);
- qdisc_qstats_cpu_requeues_inc(q);
- qdisc_qstats_cpu_backlog_inc(q, skb);
- qdisc_qstats_atomic_qlen_inc(q);
+ /* it's still part of the queue */
+ if (qdisc_is_percpu_stats(q)) {
+ qdisc_qstats_cpu_requeues_inc(q);
+ qdisc_qstats_cpu_backlog_inc(q, skb);
+ qdisc_qstats_cpu_qlen_inc(q);
+ } else {
+ q->qstats.requeues++;
+ qdisc_qstats_backlog_inc(q, skb);
+ q->q.qlen++;
+ }
skb = next;
}
- spin_unlock(lock);
-
+ if (lock)
+ spin_unlock(lock);
__netif_schedule(q);
-
- return 0;
-}
-
-static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
-{
- if (q->flags & TCQ_F_NOLOCK)
- return dev_requeue_skb_locked(skb, q);
- else
- return __dev_requeue_skb(skb, q);
}
static void try_bulk_dequeue_skb(struct Qdisc *q,
skb = __skb_dequeue(&q->gso_skb);
if (qdisc_is_percpu_stats(q)) {
qdisc_qstats_cpu_backlog_dec(q, skb);
- qdisc_qstats_atomic_qlen_dec(q);
+ qdisc_qstats_cpu_qlen_dec(q);
} else {
qdisc_qstats_backlog_dec(q, skb);
q->q.qlen--;
if (unlikely(err))
return qdisc_drop_cpu(skb, qdisc, to_free);
- qdisc_qstats_atomic_qlen_inc(qdisc);
- /* Note: skb can not be used after skb_array_produce(),
- * so we better not use qdisc_qstats_cpu_backlog_inc()
- */
- this_cpu_add(qdisc->cpu_qstats->backlog, pkt_len);
+ qdisc_update_stats_at_enqueue(qdisc, pkt_len);
return NET_XMIT_SUCCESS;
}
skb = __skb_array_consume(q);
}
if (likely(skb)) {
- qdisc_qstats_cpu_backlog_dec(qdisc, skb);
- qdisc_bstats_cpu_update(qdisc, skb);
- qdisc_qstats_atomic_qlen_dec(qdisc);
+ qdisc_update_stats_at_dequeue(qdisc, skb);
+ } else {
+ qdisc->empty = true;
}
return skb;
struct gnet_stats_queue *q = per_cpu_ptr(qdisc->cpu_qstats, i);
q->backlog = 0;
+ q->qlen = 0;
}
}
sch->enqueue = ops->enqueue;
sch->dequeue = ops->dequeue;
sch->dev_queue = dev_queue;
+ sch->empty = true;
dev_hold(dev);
refcount_set(&sch->refcnt, 1);
#include <net/pkt_cls.h>
#include <net/sch_generic.h>
+static LIST_HEAD(taprio_list);
+static DEFINE_SPINLOCK(taprio_list_lock);
+
#define TAPRIO_ALL_GATES_OPEN -1
struct sched_entry {
struct Qdisc *root;
s64 base_time;
int clockid;
- int picos_per_byte; /* Using picoseconds because for 10Gbps+
- * speeds it's sub-nanoseconds per byte
- */
+ atomic64_t picos_per_byte; /* Using picoseconds because for 10Gbps+
+ * speeds it's sub-nanoseconds per byte
+ */
size_t num_entries;
/* Protects the update side of the RCU protected current_entry */
struct list_head entries;
ktime_t (*get_time)(void);
struct hrtimer advance_timer;
+ struct list_head taprio_list;
};
static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
static inline int length_to_duration(struct taprio_sched *q, int len)
{
- return (len * q->picos_per_byte) / 1000;
+ return (len * atomic64_read(&q->picos_per_byte)) / 1000;
}
static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
u32 gate_mask;
int i;
+ if (atomic64_read(&q->picos_per_byte) == -1) {
+ WARN_ONCE(1, "taprio: dequeue() called with unknown picos per byte.");
+ return NULL;
+ }
+
rcu_read_lock();
entry = rcu_dereference(q->current_entry);
/* if there's no entry, it means that the schedule didn't
next->close_time = close_time;
atomic_set(&next->budget,
- (next->interval * 1000) / q->picos_per_byte);
+ (next->interval * 1000) / atomic64_read(&q->picos_per_byte));
first_run:
rcu_assign_pointer(q->current_entry, next);
first->close_time = ktime_add_ns(start, first->interval);
atomic_set(&first->budget,
- (first->interval * 1000) / q->picos_per_byte);
+ (first->interval * 1000) /
+ atomic64_read(&q->picos_per_byte));
rcu_assign_pointer(q->current_entry, NULL);
spin_unlock_irqrestore(&q->current_entry_lock, flags);
hrtimer_start(&q->advance_timer, start, HRTIMER_MODE_ABS);
}
+static void taprio_set_picos_per_byte(struct net_device *dev,
+ struct taprio_sched *q)
+{
+ struct ethtool_link_ksettings ecmd;
+ int picos_per_byte = -1;
+
+ if (!__ethtool_get_link_ksettings(dev, &ecmd) &&
+ ecmd.base.speed != SPEED_UNKNOWN)
+ picos_per_byte = div64_s64(NSEC_PER_SEC * 1000LL * 8,
+ ecmd.base.speed * 1000 * 1000);
+
+ atomic64_set(&q->picos_per_byte, picos_per_byte);
+ netdev_dbg(dev, "taprio: set %s's picos_per_byte to: %lld, linkspeed: %d\n",
+ dev->name, (long long)atomic64_read(&q->picos_per_byte),
+ ecmd.base.speed);
+}
+
+static int taprio_dev_notifier(struct notifier_block *nb, unsigned long event,
+ void *ptr)
+{
+ struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+ struct net_device *qdev;
+ struct taprio_sched *q;
+ bool found = false;
+
+ ASSERT_RTNL();
+
+ if (event != NETDEV_UP && event != NETDEV_CHANGE)
+ return NOTIFY_DONE;
+
+ spin_lock(&taprio_list_lock);
+ list_for_each_entry(q, &taprio_list, taprio_list) {
+ qdev = qdisc_dev(q->root);
+ if (qdev == dev) {
+ found = true;
+ break;
+ }
+ }
+ spin_unlock(&taprio_list_lock);
+
+ if (found)
+ taprio_set_picos_per_byte(dev, q);
+
+ return NOTIFY_DONE;
+}
+
static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
struct netlink_ext_ack *extack)
{
struct taprio_sched *q = qdisc_priv(sch);
struct net_device *dev = qdisc_dev(sch);
struct tc_mqprio_qopt *mqprio = NULL;
- struct ethtool_link_ksettings ecmd;
int i, err, size;
- s64 link_speed;
ktime_t start;
err = nla_parse_nested(tb, TCA_TAPRIO_ATTR_MAX, opt,
mqprio->prio_tc_map[i]);
}
- if (!__ethtool_get_link_ksettings(dev, &ecmd))
- link_speed = ecmd.base.speed;
- else
- link_speed = SPEED_1000;
-
- q->picos_per_byte = div64_s64(NSEC_PER_SEC * 1000LL * 8,
- link_speed * 1000 * 1000);
-
+ taprio_set_picos_per_byte(dev, q);
start = taprio_get_start_time(sch);
if (!start)
return 0;
struct sched_entry *entry, *n;
unsigned int i;
+ spin_lock(&taprio_list_lock);
+ list_del(&q->taprio_list);
+ spin_unlock(&taprio_list_lock);
+
hrtimer_cancel(&q->advance_timer);
if (q->qdiscs) {
if (!opt)
return -EINVAL;
+ spin_lock(&taprio_list_lock);
+ list_add(&q->taprio_list, &taprio_list);
+ spin_unlock(&taprio_list_lock);
+
return taprio_change(sch, opt, extack);
}
.owner = THIS_MODULE,
};
+static struct notifier_block taprio_device_notifier = {
+ .notifier_call = taprio_dev_notifier,
+};
+
static int __init taprio_module_init(void)
{
+ int err = register_netdevice_notifier(&taprio_device_notifier);
+
+ if (err)
+ return err;
+
return register_qdisc(&taprio_qdisc_ops);
}
static void __exit taprio_module_exit(void)
{
unregister_qdisc(&taprio_qdisc_ops);
+ unregister_netdevice_notifier(&taprio_device_notifier);
}
module_init(taprio_module_init);
* in sctp_ulpevent_make_rcvmsg will drop the frame if we grow our
* memory usage too much
*/
- if (*sk->sk_prot_creator->memory_pressure) {
+ if (sk_under_memory_pressure(sk)) {
if (sctp_tsnmap_has_gap(map) &&
(sctp_tsnmap_get_ctsn(map) + 1) == tsn) {
pr_debug("%s: under pressure, reneging for tsn:%u\n",
__func__, tsn);
deliver = SCTP_CMD_RENEGE;
- }
+ } else {
+ sk_mem_reclaim(sk);
+ }
}
/*
if (sctp_wspace(asoc) < (int)msg_len)
sctp_prsctp_prune(asoc, sinfo, msg_len - sctp_wspace(asoc));
- if (sctp_wspace(asoc) <= 0) {
+ if (sk_under_memory_pressure(sk))
+ sk_mem_reclaim(sk);
+
+ if (sctp_wspace(asoc) <= 0 || !sk_wmem_schedule(sk, msg_len)) {
timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len);
if (err)
goto do_error;
if (signal_pending(current))
goto do_interrupted;
- if ((int)msg_len <= sctp_wspace(asoc))
+ if (sk_under_memory_pressure(sk))
+ sk_mem_reclaim(sk);
+ if ((int)msg_len <= sctp_wspace(asoc) &&
+ sk_wmem_schedule(sk, msg_len))
break;
/* Let another process have a go. Since we are going
}
static int sctp_enqueue_event(struct sctp_ulpq *ulpq,
- struct sctp_ulpevent *event)
+ struct sk_buff_head *skb_list)
{
- struct sk_buff *skb = sctp_event2skb(event);
struct sock *sk = ulpq->asoc->base.sk;
struct sctp_sock *sp = sctp_sk(sk);
- struct sk_buff_head *skb_list;
+ struct sctp_ulpevent *event;
+ struct sk_buff *skb;
- skb_list = (struct sk_buff_head *)skb->prev;
+ skb = __skb_peek(skb_list);
+ event = sctp_skb2event(skb);
if (sk->sk_shutdown & RCV_SHUTDOWN &&
(sk->sk_shutdown & SEND_SHUTDOWN ||
if (!(event->msg_flags & SCTP_DATA_UNORDERED)) {
event = sctp_intl_reasm(ulpq, event);
- if (event && event->msg_flags & MSG_EOR) {
+ if (event) {
skb_queue_head_init(&temp);
__skb_queue_tail(&temp, sctp_event2skb(event));
- event = sctp_intl_order(ulpq, event);
+ if (event->msg_flags & MSG_EOR)
+ event = sctp_intl_order(ulpq, event);
}
} else {
event = sctp_intl_reasm_uo(ulpq, event);
+ if (event) {
+ skb_queue_head_init(&temp);
+ __skb_queue_tail(&temp, sctp_event2skb(event));
+ }
}
if (event) {
event_eor = (event->msg_flags & MSG_EOR) ? 1 : 0;
- sctp_enqueue_event(ulpq, event);
+ sctp_enqueue_event(ulpq, &temp);
}
return event_eor;
static void sctp_intl_start_pd(struct sctp_ulpq *ulpq, gfp_t gfp)
{
struct sctp_ulpevent *event;
+ struct sk_buff_head temp;
if (!skb_queue_empty(&ulpq->reasm)) {
do {
event = sctp_intl_retrieve_first(ulpq);
- if (event)
- sctp_enqueue_event(ulpq, event);
+ if (event) {
+ skb_queue_head_init(&temp);
+ __skb_queue_tail(&temp, sctp_event2skb(event));
+ sctp_enqueue_event(ulpq, &temp);
+ }
} while (event);
}
if (!skb_queue_empty(&ulpq->reasm_uo)) {
do {
event = sctp_intl_retrieve_first_uo(ulpq);
- if (event)
- sctp_enqueue_event(ulpq, event);
+ if (event) {
+ skb_queue_head_init(&temp);
+ __skb_queue_tail(&temp, sctp_event2skb(event));
+ sctp_enqueue_event(ulpq, &temp);
+ }
} while (event);
}
}
if (event) {
sctp_intl_retrieve_ordered(ulpq, event);
- sctp_enqueue_event(ulpq, event);
+ sctp_enqueue_event(ulpq, &temp);
}
}
ntohl(skip->mid), skip->flags);
}
+static int do_ulpq_tail_event(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event)
+{
+ struct sk_buff_head temp;
+
+ skb_queue_head_init(&temp);
+ __skb_queue_tail(&temp, sctp_event2skb(event));
+ return sctp_ulpq_tail_event(ulpq, &temp);
+}
+
static struct sctp_stream_interleave sctp_stream_interleave_0 = {
.data_chunk_len = sizeof(struct sctp_data_chunk),
.ftsn_chunk_len = sizeof(struct sctp_fwdtsn_chunk),
.assign_number = sctp_chunk_assign_ssn,
.validate_data = sctp_validate_data,
.ulpevent_data = sctp_ulpq_tail_data,
- .enqueue_event = sctp_ulpq_tail_event,
+ .enqueue_event = do_ulpq_tail_event,
.renege_events = sctp_ulpq_renege,
.start_pd = sctp_ulpq_partial_delivery,
.abort_pd = sctp_ulpq_abort_pd,
.handle_ftsn = sctp_handle_fwdtsn,
};
+static int do_sctp_enqueue_event(struct sctp_ulpq *ulpq,
+ struct sctp_ulpevent *event)
+{
+ struct sk_buff_head temp;
+
+ skb_queue_head_init(&temp);
+ __skb_queue_tail(&temp, sctp_event2skb(event));
+ return sctp_enqueue_event(ulpq, &temp);
+}
+
static struct sctp_stream_interleave sctp_stream_interleave_1 = {
.data_chunk_len = sizeof(struct sctp_idata_chunk),
.ftsn_chunk_len = sizeof(struct sctp_ifwdtsn_chunk),
.assign_number = sctp_chunk_assign_mid,
.validate_data = sctp_validate_idata,
.ulpevent_data = sctp_ulpevent_idata,
- .enqueue_event = sctp_enqueue_event,
+ .enqueue_event = do_sctp_enqueue_event,
.renege_events = sctp_renege_events,
.start_pd = sctp_intl_start_pd,
.abort_pd = sctp_intl_abort_pd,
gfp_t gfp)
{
struct sctp_ulpevent *event = NULL;
- struct sk_buff *skb;
- size_t padding, len;
+ struct sk_buff *skb = chunk->skb;
+ struct sock *sk = asoc->base.sk;
+ size_t padding, datalen;
int rx_count;
/*
if (asoc->ep->rcvbuf_policy)
rx_count = atomic_read(&asoc->rmem_alloc);
else
- rx_count = atomic_read(&asoc->base.sk->sk_rmem_alloc);
+ rx_count = atomic_read(&sk->sk_rmem_alloc);
- if (rx_count >= asoc->base.sk->sk_rcvbuf) {
+ datalen = ntohs(chunk->chunk_hdr->length);
- if ((asoc->base.sk->sk_userlocks & SOCK_RCVBUF_LOCK) ||
- (!sk_rmem_schedule(asoc->base.sk, chunk->skb,
- chunk->skb->truesize)))
- goto fail;
- }
+ if (rx_count >= sk->sk_rcvbuf || !sk_rmem_schedule(sk, skb, datalen))
+ goto fail;
/* Clone the original skb, sharing the data. */
skb = skb_clone(chunk->skb, gfp);
* The sender should never pad with more than 3 bytes. The receiver
* MUST ignore the padding bytes.
*/
- len = ntohs(chunk->chunk_hdr->length);
- padding = SCTP_PAD4(len) - len;
+ padding = SCTP_PAD4(datalen) - datalen;
/* Fixup cloned skb with just this chunks data. */
skb_trim(skb, chunk->chunk_end - padding - skb->data);
event = sctp_ulpq_reasm(ulpq, event);
/* Do ordering if needed. */
- if ((event) && (event->msg_flags & MSG_EOR)) {
+ if (event) {
/* Create a temporary list to collect chunks on. */
skb_queue_head_init(&temp);
__skb_queue_tail(&temp, sctp_event2skb(event));
- event = sctp_ulpq_order(ulpq, event);
+ if (event->msg_flags & MSG_EOR)
+ event = sctp_ulpq_order(ulpq, event);
}
/* Send event to the ULP. 'event' is the sctp_ulpevent for
*/
if (event) {
event_eor = (event->msg_flags & MSG_EOR) ? 1 : 0;
- sctp_ulpq_tail_event(ulpq, event);
+ sctp_ulpq_tail_event(ulpq, &temp);
}
return event_eor;
return sctp_clear_pd(ulpq->asoc->base.sk, ulpq->asoc);
}
-/* If the SKB of 'event' is on a list, it is the first such member
- * of that list.
- */
-int sctp_ulpq_tail_event(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event)
+int sctp_ulpq_tail_event(struct sctp_ulpq *ulpq, struct sk_buff_head *skb_list)
{
struct sock *sk = ulpq->asoc->base.sk;
struct sctp_sock *sp = sctp_sk(sk);
- struct sk_buff_head *queue, *skb_list;
- struct sk_buff *skb = sctp_event2skb(event);
+ struct sctp_ulpevent *event;
+ struct sk_buff_head *queue;
+ struct sk_buff *skb;
int clear_pd = 0;
- skb_list = (struct sk_buff_head *) skb->prev;
+ skb = __skb_peek(skb_list);
+ event = sctp_skb2event(skb);
/* If the socket is just going to throw this away, do not
* even try to deliver it.
}
}
- /* If we are harvesting multiple skbs they will be
- * collected on a list.
- */
- if (skb_list)
- skb_queue_splice_tail_init(skb_list, queue);
- else
- __skb_queue_tail(queue, skb);
+ skb_queue_splice_tail_init(skb_list, queue);
/* Did we just complete partial delivery and need to get
* rolling again? Move pending data to the receive
static void sctp_ulpq_reasm_drain(struct sctp_ulpq *ulpq)
{
struct sctp_ulpevent *event = NULL;
- struct sk_buff_head temp;
if (skb_queue_empty(&ulpq->reasm))
return;
while ((event = sctp_ulpq_retrieve_reassembled(ulpq)) != NULL) {
- /* Do ordering if needed. */
- if ((event) && (event->msg_flags & MSG_EOR)) {
- skb_queue_head_init(&temp);
- __skb_queue_tail(&temp, sctp_event2skb(event));
+ struct sk_buff_head temp;
+
+ skb_queue_head_init(&temp);
+ __skb_queue_tail(&temp, sctp_event2skb(event));
+ /* Do ordering if needed. */
+ if (event->msg_flags & MSG_EOR)
event = sctp_ulpq_order(ulpq, event);
- }
/* Send event to the ULP. 'event' is the
* sctp_ulpevent for very first SKB on the temp' list.
*/
if (event)
- sctp_ulpq_tail_event(ulpq, event);
+ sctp_ulpq_tail_event(ulpq, &temp);
}
}
if (event) {
/* see if we have more ordered that we can deliver */
sctp_ulpq_retrieve_ordered(ulpq, event);
- sctp_ulpq_tail_event(ulpq, event);
+ sctp_ulpq_tail_event(ulpq, &temp);
}
}
event = sctp_ulpq_retrieve_first(ulpq);
/* Send event to the ULP. */
if (event) {
- sctp_ulpq_tail_event(ulpq, event);
+ struct sk_buff_head temp;
+
+ skb_queue_head_init(&temp);
+ __skb_queue_tail(&temp, sctp_event2skb(event));
+ sctp_ulpq_tail_event(ulpq, &temp);
sctp_ulpq_set_pd(ulpq);
return;
}
freed += sctp_ulpq_renege_frags(ulpq, needed - freed);
}
/* If able to free enough room, accept this chunk. */
- if (freed >= needed) {
+ if (sk_rmem_schedule(asoc->base.sk, chunk->skb, needed) &&
+ freed >= needed) {
int retval = sctp_ulpq_tail_data(ulpq, chunk, gfp);
/*
* Enter partial delivery if chunk has not been
smc = smc_sk(sk);
/* cleanup for a dangling non-blocking connect */
- if (smc->connect_info && sk->sk_state == SMC_INIT)
+ if (smc->connect_nonblock && sk->sk_state == SMC_INIT)
tcp_abort(smc->clcsock->sk, ECONNABORTED);
flush_work(&smc->connect_work);
- kfree(smc->connect_info);
- smc->connect_info = NULL;
if (sk->sk_state == SMC_LISTEN)
/* smc_close_non_accepted() is called and acquires
smc_switch_to_fallback(smc);
smc->fallback_rsn = reason_code;
smc_copy_sock_settings_to_clc(smc);
+ smc->connect_nonblock = 0;
if (smc->sk.sk_state == SMC_INIT)
smc->sk.sk_state = SMC_ACTIVE;
return 0;
mutex_unlock(&smc_client_lgr_pending);
smc_conn_free(&smc->conn);
+ smc->connect_nonblock = 0;
return reason_code;
}
/* check if there is a rdma device available for this connection. */
/* called for connect and listen */
-static int smc_check_rdma(struct smc_sock *smc, struct smc_ib_device **ibdev,
- u8 *ibport, unsigned short vlan_id, u8 gid[])
+static int smc_find_rdma_device(struct smc_sock *smc, struct smc_init_info *ini)
{
- int reason_code = 0;
-
/* PNET table look up: search active ib_device and port
* within same PNETID that also contains the ethernet device
* used for the internal TCP socket
*/
- smc_pnet_find_roce_resource(smc->clcsock->sk, ibdev, ibport, vlan_id,
- gid);
- if (!(*ibdev))
- reason_code = SMC_CLC_DECL_CNFERR; /* configuration error */
-
- return reason_code;
+ smc_pnet_find_roce_resource(smc->clcsock->sk, ini);
+ if (!ini->ib_dev)
+ return SMC_CLC_DECL_NOSMCRDEV;
+ return 0;
}
/* check if there is an ISM device available for this connection. */
/* called for connect and listen */
-static int smc_check_ism(struct smc_sock *smc, struct smcd_dev **ismdev)
+static int smc_find_ism_device(struct smc_sock *smc, struct smc_init_info *ini)
{
/* Find ISM device with same PNETID as connecting interface */
- smc_pnet_find_ism_resource(smc->clcsock->sk, ismdev);
- if (!(*ismdev))
- return SMC_CLC_DECL_CNFERR; /* configuration error */
+ smc_pnet_find_ism_resource(smc->clcsock->sk, ini);
+ if (!ini->ism_dev)
+ return SMC_CLC_DECL_NOSMCDDEV;
return 0;
}
/* Check for VLAN ID and register it on ISM device just for CLC handshake */
static int smc_connect_ism_vlan_setup(struct smc_sock *smc,
- struct smcd_dev *ismdev,
- unsigned short vlan_id)
+ struct smc_init_info *ini)
{
- if (vlan_id && smc_ism_get_vlan(ismdev, vlan_id))
- return SMC_CLC_DECL_CNFERR;
+ if (ini->vlan_id && smc_ism_get_vlan(ini->ism_dev, ini->vlan_id))
+ return SMC_CLC_DECL_ISMVLANERR;
return 0;
}
* used, the VLAN ID will be registered again during the connection setup.
*/
static int smc_connect_ism_vlan_cleanup(struct smc_sock *smc, bool is_smcd,
- struct smcd_dev *ismdev,
- unsigned short vlan_id)
+ struct smc_init_info *ini)
{
if (!is_smcd)
return 0;
- if (vlan_id && smc_ism_put_vlan(ismdev, vlan_id))
+ if (ini->vlan_id && smc_ism_put_vlan(ini->ism_dev, ini->vlan_id))
return SMC_CLC_DECL_CNFERR;
return 0;
}
/* CLC handshake during connect */
static int smc_connect_clc(struct smc_sock *smc, int smc_type,
struct smc_clc_msg_accept_confirm *aclc,
- struct smc_ib_device *ibdev, u8 ibport,
- u8 gid[], struct smcd_dev *ismdev)
+ struct smc_init_info *ini)
{
int rc = 0;
/* do inband token exchange */
- rc = smc_clc_send_proposal(smc, smc_type, ibdev, ibport, gid, ismdev);
+ rc = smc_clc_send_proposal(smc, smc_type, ini);
if (rc)
return rc;
/* receive SMC Accept CLC message */
/* setup for RDMA connection of client */
static int smc_connect_rdma(struct smc_sock *smc,
struct smc_clc_msg_accept_confirm *aclc,
- struct smc_ib_device *ibdev, u8 ibport)
+ struct smc_init_info *ini)
{
- int local_contact = SMC_FIRST_CONTACT;
struct smc_link *link;
int reason_code = 0;
+ ini->is_smcd = false;
+ ini->ib_lcl = &aclc->lcl;
+ ini->ib_clcqpn = ntoh24(aclc->qpn);
+ ini->srv_first_contact = aclc->hdr.flag;
+
mutex_lock(&smc_client_lgr_pending);
- local_contact = smc_conn_create(smc, false, aclc->hdr.flag, ibdev,
- ibport, ntoh24(aclc->qpn), &aclc->lcl,
- NULL, 0);
- if (local_contact < 0) {
- if (local_contact == -ENOMEM)
- reason_code = SMC_CLC_DECL_MEM;/* insufficient memory*/
- else if (local_contact == -ENOLINK)
- reason_code = SMC_CLC_DECL_SYNCERR; /* synchr. error */
- else
- reason_code = SMC_CLC_DECL_INTERR; /* other error */
+ reason_code = smc_conn_create(smc, ini);
+ if (reason_code) {
mutex_unlock(&smc_client_lgr_pending);
return reason_code;
}
/* create send buffer and rmb */
if (smc_buf_create(smc, false))
- return smc_connect_abort(smc, SMC_CLC_DECL_MEM, local_contact);
+ return smc_connect_abort(smc, SMC_CLC_DECL_MEM,
+ ini->cln_first_contact);
- if (local_contact == SMC_FIRST_CONTACT)
+ if (ini->cln_first_contact == SMC_FIRST_CONTACT)
smc_link_save_peer_info(link, aclc);
if (smc_rmb_rtoken_handling(&smc->conn, aclc))
return smc_connect_abort(smc, SMC_CLC_DECL_ERR_RTOK,
- local_contact);
+ ini->cln_first_contact);
smc_close_init(smc);
smc_rx_init(smc);
- if (local_contact == SMC_FIRST_CONTACT) {
+ if (ini->cln_first_contact == SMC_FIRST_CONTACT) {
if (smc_ib_ready_link(link))
return smc_connect_abort(smc, SMC_CLC_DECL_ERR_RDYLNK,
- local_contact);
+ ini->cln_first_contact);
} else {
if (smc_reg_rmb(link, smc->conn.rmb_desc, true))
return smc_connect_abort(smc, SMC_CLC_DECL_ERR_REGRMB,
- local_contact);
+ ini->cln_first_contact);
}
smc_rmb_sync_sg_for_device(&smc->conn);
reason_code = smc_clc_send_confirm(smc);
if (reason_code)
- return smc_connect_abort(smc, reason_code, local_contact);
+ return smc_connect_abort(smc, reason_code,
+ ini->cln_first_contact);
smc_tx_init(smc);
- if (local_contact == SMC_FIRST_CONTACT) {
+ if (ini->cln_first_contact == SMC_FIRST_CONTACT) {
/* QP confirmation over RoCE fabric */
reason_code = smc_clnt_conf_first_link(smc);
if (reason_code)
return smc_connect_abort(smc, reason_code,
- local_contact);
+ ini->cln_first_contact);
}
mutex_unlock(&smc_client_lgr_pending);
smc_copy_sock_settings_to_clc(smc);
+ smc->connect_nonblock = 0;
if (smc->sk.sk_state == SMC_INIT)
smc->sk.sk_state = SMC_ACTIVE;
/* setup for ISM connection of client */
static int smc_connect_ism(struct smc_sock *smc,
struct smc_clc_msg_accept_confirm *aclc,
- struct smcd_dev *ismdev)
+ struct smc_init_info *ini)
{
- int local_contact = SMC_FIRST_CONTACT;
int rc = 0;
+ ini->is_smcd = true;
+ ini->ism_gid = aclc->gid;
+ ini->srv_first_contact = aclc->hdr.flag;
+
/* there is only one lgr role for SMC-D; use server lock */
mutex_lock(&smc_server_lgr_pending);
- local_contact = smc_conn_create(smc, true, aclc->hdr.flag, NULL, 0, 0,
- NULL, ismdev, aclc->gid);
- if (local_contact < 0) {
+ rc = smc_conn_create(smc, ini);
+ if (rc) {
mutex_unlock(&smc_server_lgr_pending);
- return SMC_CLC_DECL_MEM;
+ return rc;
}
/* Create send and receive buffers */
if (smc_buf_create(smc, true))
- return smc_connect_abort(smc, SMC_CLC_DECL_MEM, local_contact);
+ return smc_connect_abort(smc, SMC_CLC_DECL_MEM,
+ ini->cln_first_contact);
smc_conn_save_peer_info(smc, aclc);
smc_close_init(smc);
rc = smc_clc_send_confirm(smc);
if (rc)
- return smc_connect_abort(smc, rc, local_contact);
+ return smc_connect_abort(smc, rc, ini->cln_first_contact);
mutex_unlock(&smc_server_lgr_pending);
smc_copy_sock_settings_to_clc(smc);
+ smc->connect_nonblock = 0;
if (smc->sk.sk_state == SMC_INIT)
smc->sk.sk_state = SMC_ACTIVE;
{
bool ism_supported = false, rdma_supported = false;
struct smc_clc_msg_accept_confirm aclc;
- struct smc_ib_device *ibdev;
- struct smcd_dev *ismdev;
- u8 gid[SMC_GID_SIZE];
- unsigned short vlan;
+ struct smc_init_info ini = {0};
int smc_type;
int rc = 0;
- u8 ibport;
sock_hold(&smc->sk); /* sock put in passive closing */
if (using_ipsec(smc))
return smc_connect_decline_fallback(smc, SMC_CLC_DECL_IPSEC);
- /* check for VLAN ID */
- if (smc_vlan_by_tcpsk(smc->clcsock, &vlan))
- return smc_connect_decline_fallback(smc, SMC_CLC_DECL_CNFERR);
+ /* get vlan id from IP device */
+ if (smc_vlan_by_tcpsk(smc->clcsock, &ini))
+ return smc_connect_decline_fallback(smc,
+ SMC_CLC_DECL_GETVLANERR);
/* check if there is an ism device available */
- if (!smc_check_ism(smc, &ismdev) &&
- !smc_connect_ism_vlan_setup(smc, ismdev, vlan)) {
+ if (!smc_find_ism_device(smc, &ini) &&
+ !smc_connect_ism_vlan_setup(smc, &ini)) {
/* ISM is supported for this connection */
ism_supported = true;
smc_type = SMC_TYPE_D;
}
/* check if there is a rdma device available */
- if (!smc_check_rdma(smc, &ibdev, &ibport, vlan, gid)) {
+ if (!smc_find_rdma_device(smc, &ini)) {
/* RDMA is supported for this connection */
rdma_supported = true;
if (ism_supported)
return smc_connect_decline_fallback(smc, SMC_CLC_DECL_NOSMCDEV);
/* perform CLC handshake */
- rc = smc_connect_clc(smc, smc_type, &aclc, ibdev, ibport, gid, ismdev);
+ rc = smc_connect_clc(smc, smc_type, &aclc, &ini);
if (rc) {
- smc_connect_ism_vlan_cleanup(smc, ism_supported, ismdev, vlan);
+ smc_connect_ism_vlan_cleanup(smc, ism_supported, &ini);
return smc_connect_decline_fallback(smc, rc);
}
/* depending on previous steps, connect using rdma or ism */
if (rdma_supported && aclc.hdr.path == SMC_TYPE_R)
- rc = smc_connect_rdma(smc, &aclc, ibdev, ibport);
+ rc = smc_connect_rdma(smc, &aclc, &ini);
else if (ism_supported && aclc.hdr.path == SMC_TYPE_D)
- rc = smc_connect_ism(smc, &aclc, ismdev);
+ rc = smc_connect_ism(smc, &aclc, &ini);
else
rc = SMC_CLC_DECL_MODEUNSUPP;
if (rc) {
- smc_connect_ism_vlan_cleanup(smc, ism_supported, ismdev, vlan);
+ smc_connect_ism_vlan_cleanup(smc, ism_supported, &ini);
return smc_connect_decline_fallback(smc, rc);
}
- smc_connect_ism_vlan_cleanup(smc, ism_supported, ismdev, vlan);
+ smc_connect_ism_vlan_cleanup(smc, ism_supported, &ini);
return 0;
}
{
struct smc_sock *smc = container_of(work, struct smc_sock,
connect_work);
- int rc;
+ long timeo = smc->sk.sk_sndtimeo;
+ int rc = 0;
- lock_sock(&smc->sk);
- rc = kernel_connect(smc->clcsock, &smc->connect_info->addr,
- smc->connect_info->alen, smc->connect_info->flags);
+ if (!timeo)
+ timeo = MAX_SCHEDULE_TIMEOUT;
+ lock_sock(smc->clcsock->sk);
if (smc->clcsock->sk->sk_err) {
smc->sk.sk_err = smc->clcsock->sk->sk_err;
- goto out;
- }
- if (rc < 0) {
- smc->sk.sk_err = -rc;
+ } else if ((1 << smc->clcsock->sk->sk_state) &
+ (TCPF_SYN_SENT | TCP_SYN_RECV)) {
+ rc = sk_stream_wait_connect(smc->clcsock->sk, &timeo);
+ if ((rc == -EPIPE) &&
+ ((1 << smc->clcsock->sk->sk_state) &
+ (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)))
+ rc = 0;
+ }
+ release_sock(smc->clcsock->sk);
+ lock_sock(&smc->sk);
+ if (rc != 0 || smc->sk.sk_err) {
+ smc->sk.sk_state = SMC_CLOSED;
+ if (rc == -EPIPE || rc == -EAGAIN)
+ smc->sk.sk_err = EPIPE;
+ else if (signal_pending(current))
+ smc->sk.sk_err = -sock_intr_errno(timeo);
goto out;
}
smc->sk.sk_write_space(&smc->sk);
}
}
- kfree(smc->connect_info);
- smc->connect_info = NULL;
release_sock(&smc->sk);
}
smc_copy_sock_settings_to_clc(smc);
tcp_sk(smc->clcsock->sk)->syn_smc = 1;
+ if (smc->connect_nonblock) {
+ rc = -EALREADY;
+ goto out;
+ }
+ rc = kernel_connect(smc->clcsock, addr, alen, flags);
+ if (rc && rc != -EINPROGRESS)
+ goto out;
if (flags & O_NONBLOCK) {
- if (smc->connect_info) {
- rc = -EALREADY;
- goto out;
- }
- smc->connect_info = kzalloc(alen + 2 * sizeof(int), GFP_KERNEL);
- if (!smc->connect_info) {
- rc = -ENOMEM;
- goto out;
- }
- smc->connect_info->alen = alen;
- smc->connect_info->flags = flags ^ O_NONBLOCK;
- memcpy(&smc->connect_info->addr, addr, alen);
- schedule_work(&smc->connect_work);
+ if (schedule_work(&smc->connect_work))
+ smc->connect_nonblock = 1;
rc = -EINPROGRESS;
} else {
- rc = kernel_connect(smc->clcsock, addr, alen, flags);
- if (rc)
- goto out;
-
rc = __smc_connect(smc);
if (rc < 0)
goto out;
}
/* listen worker: check prefixes */
-static int smc_listen_rdma_check(struct smc_sock *new_smc,
+static int smc_listen_prfx_check(struct smc_sock *new_smc,
struct smc_clc_msg_proposal *pclc)
{
struct smc_clc_msg_proposal_prefix *pclc_prfx;
pclc_prfx = smc_clc_proposal_get_prefix(pclc);
if (smc_clc_prfx_match(newclcsock, pclc_prfx))
- return SMC_CLC_DECL_CNFERR;
+ return SMC_CLC_DECL_DIFFPREFIX;
return 0;
}
/* listen worker: initialize connection and buffers */
static int smc_listen_rdma_init(struct smc_sock *new_smc,
- struct smc_clc_msg_proposal *pclc,
- struct smc_ib_device *ibdev, u8 ibport,
- int *local_contact)
+ struct smc_init_info *ini)
{
+ int rc;
+
/* allocate connection / link group */
- *local_contact = smc_conn_create(new_smc, false, 0, ibdev, ibport, 0,
- &pclc->lcl, NULL, 0);
- if (*local_contact < 0) {
- if (*local_contact == -ENOMEM)
- return SMC_CLC_DECL_MEM;/* insufficient memory*/
- return SMC_CLC_DECL_INTERR; /* other error */
- }
+ rc = smc_conn_create(new_smc, ini);
+ if (rc)
+ return rc;
/* create send buffer and rmb */
if (smc_buf_create(new_smc, false))
/* listen worker: initialize connection and buffers for SMC-D */
static int smc_listen_ism_init(struct smc_sock *new_smc,
struct smc_clc_msg_proposal *pclc,
- struct smcd_dev *ismdev,
- int *local_contact)
+ struct smc_init_info *ini)
{
struct smc_clc_msg_smcd *pclc_smcd;
+ int rc;
pclc_smcd = smc_get_clc_msg_smcd(pclc);
- *local_contact = smc_conn_create(new_smc, true, 0, NULL, 0, 0, NULL,
- ismdev, pclc_smcd->gid);
- if (*local_contact < 0) {
- if (*local_contact == -ENOMEM)
- return SMC_CLC_DECL_MEM;/* insufficient memory*/
- return SMC_CLC_DECL_INTERR; /* other error */
- }
+ ini->ism_gid = pclc_smcd->gid;
+ rc = smc_conn_create(new_smc, ini);
+ if (rc)
+ return rc;
/* Check if peer can be reached via ISM device */
if (smc_ism_cantalk(new_smc->conn.lgr->peer_gid,
new_smc->conn.lgr->vlan_id,
new_smc->conn.lgr->smcd)) {
- if (*local_contact == SMC_FIRST_CONTACT)
+ if (ini->cln_first_contact == SMC_FIRST_CONTACT)
smc_lgr_forget(new_smc->conn.lgr);
smc_conn_free(&new_smc->conn);
- return SMC_CLC_DECL_CNFERR;
+ return SMC_CLC_DECL_SMCDNOTALK;
}
/* Create send and receive buffers */
if (smc_buf_create(new_smc, true)) {
- if (*local_contact == SMC_FIRST_CONTACT)
+ if (ini->cln_first_contact == SMC_FIRST_CONTACT)
smc_lgr_forget(new_smc->conn.lgr);
smc_conn_free(&new_smc->conn);
return SMC_CLC_DECL_MEM;
struct socket *newclcsock = new_smc->clcsock;
struct smc_clc_msg_accept_confirm cclc;
struct smc_clc_msg_proposal *pclc;
- struct smc_ib_device *ibdev;
+ struct smc_init_info ini = {0};
bool ism_supported = false;
- struct smcd_dev *ismdev;
u8 buf[SMC_CLC_MAX_LEN];
- int local_contact = 0;
- unsigned short vlan;
- int reason_code = 0;
int rc = 0;
- u8 ibport;
if (new_smc->listen_smc->sk.sk_state != SMC_LISTEN)
return smc_listen_out_err(new_smc);
* wait for and receive SMC Proposal CLC message
*/
pclc = (struct smc_clc_msg_proposal *)&buf;
- reason_code = smc_clc_wait_msg(new_smc, pclc, SMC_CLC_MAX_LEN,
- SMC_CLC_PROPOSAL, CLC_WAIT_TIME);
- if (reason_code) {
- smc_listen_decline(new_smc, reason_code, 0);
- return;
- }
+ rc = smc_clc_wait_msg(new_smc, pclc, SMC_CLC_MAX_LEN,
+ SMC_CLC_PROPOSAL, CLC_WAIT_TIME);
+ if (rc)
+ goto out_decl;
/* IPSec connections opt out of SMC-R optimizations */
if (using_ipsec(new_smc)) {
- smc_listen_decline(new_smc, SMC_CLC_DECL_IPSEC, 0);
- return;
+ rc = SMC_CLC_DECL_IPSEC;
+ goto out_decl;
+ }
+
+ /* check for matching IP prefix and subnet length */
+ rc = smc_listen_prfx_check(new_smc, pclc);
+ if (rc)
+ goto out_decl;
+
+ /* get vlan id from IP device */
+ if (smc_vlan_by_tcpsk(new_smc->clcsock, &ini)) {
+ rc = SMC_CLC_DECL_GETVLANERR;
+ goto out_decl;
}
mutex_lock(&smc_server_lgr_pending);
smc_tx_init(new_smc);
/* check if ISM is available */
- if ((pclc->hdr.path == SMC_TYPE_D || pclc->hdr.path == SMC_TYPE_B) &&
- !smc_check_ism(new_smc, &ismdev) &&
- !smc_listen_ism_init(new_smc, pclc, ismdev, &local_contact)) {
- ism_supported = true;
+ if (pclc->hdr.path == SMC_TYPE_D || pclc->hdr.path == SMC_TYPE_B) {
+ ini.is_smcd = true; /* prepare ISM check */
+ rc = smc_find_ism_device(new_smc, &ini);
+ if (!rc)
+ rc = smc_listen_ism_init(new_smc, pclc, &ini);
+ if (!rc)
+ ism_supported = true;
+ else if (pclc->hdr.path == SMC_TYPE_D)
+ goto out_unlock; /* skip RDMA and decline */
}
/* check if RDMA is available */
- if (!ism_supported &&
- ((pclc->hdr.path != SMC_TYPE_R && pclc->hdr.path != SMC_TYPE_B) ||
- smc_vlan_by_tcpsk(new_smc->clcsock, &vlan) ||
- smc_check_rdma(new_smc, &ibdev, &ibport, vlan, NULL) ||
- smc_listen_rdma_check(new_smc, pclc) ||
- smc_listen_rdma_init(new_smc, pclc, ibdev, ibport,
- &local_contact) ||
- smc_listen_rdma_reg(new_smc, local_contact))) {
- /* SMC not supported, decline */
- mutex_unlock(&smc_server_lgr_pending);
- smc_listen_decline(new_smc, SMC_CLC_DECL_MODEUNSUPP,
- local_contact);
- return;
+ if (!ism_supported) { /* SMC_TYPE_R or SMC_TYPE_B */
+ /* prepare RDMA check */
+ memset(&ini, 0, sizeof(ini));
+ ini.is_smcd = false;
+ ini.ib_lcl = &pclc->lcl;
+ rc = smc_find_rdma_device(new_smc, &ini);
+ if (rc) {
+ /* no RDMA device found */
+ if (pclc->hdr.path == SMC_TYPE_B)
+ /* neither ISM nor RDMA device found */
+ rc = SMC_CLC_DECL_NOSMCDEV;
+ goto out_unlock;
+ }
+ rc = smc_listen_rdma_init(new_smc, &ini);
+ if (rc)
+ goto out_unlock;
+ rc = smc_listen_rdma_reg(new_smc, ini.cln_first_contact);
+ if (rc)
+ goto out_unlock;
}
/* send SMC Accept CLC message */
- rc = smc_clc_send_accept(new_smc, local_contact);
- if (rc) {
- mutex_unlock(&smc_server_lgr_pending);
- smc_listen_decline(new_smc, rc, local_contact);
- return;
- }
+ rc = smc_clc_send_accept(new_smc, ini.cln_first_contact);
+ if (rc)
+ goto out_unlock;
/* SMC-D does not need this lock any more */
if (ism_supported)
mutex_unlock(&smc_server_lgr_pending);
/* receive SMC Confirm CLC message */
- reason_code = smc_clc_wait_msg(new_smc, &cclc, sizeof(cclc),
- SMC_CLC_CONFIRM, CLC_WAIT_TIME);
- if (reason_code) {
+ rc = smc_clc_wait_msg(new_smc, &cclc, sizeof(cclc),
+ SMC_CLC_CONFIRM, CLC_WAIT_TIME);
+ if (rc) {
if (!ism_supported)
- mutex_unlock(&smc_server_lgr_pending);
- smc_listen_decline(new_smc, reason_code, local_contact);
- return;
+ goto out_unlock;
+ goto out_decl;
}
/* finish worker */
if (!ism_supported) {
- rc = smc_listen_rdma_finish(new_smc, &cclc, local_contact);
+ rc = smc_listen_rdma_finish(new_smc, &cclc,
+ ini.cln_first_contact);
mutex_unlock(&smc_server_lgr_pending);
if (rc)
return;
}
smc_conn_save_peer_info(new_smc, &cclc);
smc_listen_out_connected(new_smc);
+ return;
+
+out_unlock:
+ mutex_unlock(&smc_server_lgr_pending);
+out_decl:
+ smc_listen_decline(new_smc, rc, ini.cln_first_contact);
}
static void smc_tcp_listen_work(struct work_struct *work)
poll_table *wait)
{
struct sock *sk = sock->sk;
- __poll_t mask = 0;
struct smc_sock *smc;
+ __poll_t mask = 0;
if (!sk)
return EPOLLNVAL;
/* delegate to CLC child sock */
mask = smc->clcsock->ops->poll(file, smc->clcsock, wait);
sk->sk_err = smc->clcsock->sk->sk_err;
- if (sk->sk_err)
- mask |= EPOLLERR;
} else {
if (sk->sk_state != SMC_CLOSED)
sock_poll_wait(file, sock, wait);
mask |= EPOLLHUP;
if (sk->sk_state == SMC_LISTEN) {
/* woken up by sk_data_ready in smc_listen_work() */
- mask = smc_accept_poll(sk);
+ mask |= smc_accept_poll(sk);
+ } else if (smc->use_fallback) { /* as result of connect_work()*/
+ mask |= smc->clcsock->ops->poll(file, smc->clcsock,
+ wait);
+ sk->sk_err = smc->clcsock->sk->sk_err;
} else {
- if (atomic_read(&smc->conn.sndbuf_space) ||
+ if ((sk->sk_state != SMC_INIT &&
+ atomic_read(&smc->conn.sndbuf_space)) ||
sk->sk_shutdown & SEND_SHUTDOWN) {
mask |= EPOLLOUT | EPOLLWRNORM;
} else {
u64 peer_token; /* SMC-D token of peer */
};
-struct smc_connect_info {
- int flags;
- int alen;
- struct sockaddr addr;
-};
-
struct smc_sock { /* smc sock container */
struct sock sk;
struct socket *clcsock; /* internal tcp socket */
struct smc_connection conn; /* smc connection */
struct smc_sock *listen_smc; /* listen parent */
- struct smc_connect_info *connect_info; /* connect address & flags */
struct work_struct connect_work; /* handle non-blocking connect*/
struct work_struct tcp_listen_work;/* handle tcp socket accepts */
struct work_struct smc_listen_work;/* prepare new accept socket */
* started, waiting for unsent
* data to be sent
*/
+ u8 connect_nonblock : 1;
+ /* non-blocking connect in
+ * flight
+ */
struct mutex clcsock_release_lock;
/* protects clcsock of a listen
* socket
/* send CLC PROPOSAL message across internal TCP socket */
int smc_clc_send_proposal(struct smc_sock *smc, int smc_type,
- struct smc_ib_device *ibdev, u8 ibport, u8 gid[],
- struct smcd_dev *ismdev)
+ struct smc_init_info *ini)
{
struct smc_clc_ipv6_prefix ipv6_prfx[SMC_CLC_MAX_V6_PREFIX];
struct smc_clc_msg_proposal_prefix pclc_prfx;
/* add SMC-R specifics */
memcpy(pclc.lcl.id_for_peer, local_systemid,
sizeof(local_systemid));
- memcpy(&pclc.lcl.gid, gid, SMC_GID_SIZE);
- memcpy(&pclc.lcl.mac, &ibdev->mac[ibport - 1], ETH_ALEN);
+ memcpy(&pclc.lcl.gid, ini->ib_gid, SMC_GID_SIZE);
+ memcpy(&pclc.lcl.mac, &ini->ib_dev->mac[ini->ib_port - 1],
+ ETH_ALEN);
pclc.iparea_offset = htons(0);
}
if (smc_type == SMC_TYPE_D || smc_type == SMC_TYPE_B) {
memset(&pclc_smcd, 0, sizeof(pclc_smcd));
plen += sizeof(pclc_smcd);
pclc.iparea_offset = htons(SMC_CLC_PROPOSAL_MAX_OFFSET);
- pclc_smcd.gid = ismdev->local_gid;
+ pclc_smcd.gid = ini->ism_dev->local_gid;
}
pclc.hdr.length = htons(plen);
#define SMC_CLC_DECL_CNFERR 0x03000000 /* configuration error */
#define SMC_CLC_DECL_PEERNOSMC 0x03010000 /* peer did not indicate SMC */
#define SMC_CLC_DECL_IPSEC 0x03020000 /* IPsec usage */
-#define SMC_CLC_DECL_NOSMCDEV 0x03030000 /* no SMC device found */
+#define SMC_CLC_DECL_NOSMCDEV 0x03030000 /* no SMC device found (R or D) */
+#define SMC_CLC_DECL_NOSMCDDEV 0x03030001 /* no SMC-D device found */
+#define SMC_CLC_DECL_NOSMCRDEV 0x03030002 /* no SMC-R device found */
+#define SMC_CLC_DECL_SMCDNOTALK 0x03030003 /* SMC-D dev can't talk to peer */
#define SMC_CLC_DECL_MODEUNSUPP 0x03040000 /* smc modes do not match (R or D)*/
#define SMC_CLC_DECL_RMBE_EC 0x03050000 /* peer has eyecatcher in RMBE */
#define SMC_CLC_DECL_OPTUNSUPP 0x03060000 /* fastopen sockopt not supported */
+#define SMC_CLC_DECL_DIFFPREFIX 0x03070000 /* IP prefix / subnet mismatch */
+#define SMC_CLC_DECL_GETVLANERR 0x03080000 /* err to get vlan id of ip device*/
+#define SMC_CLC_DECL_ISMVLANERR 0x03090000 /* err to reg vlan id on ism dev */
#define SMC_CLC_DECL_SYNCERR 0x04000000 /* synchronization error */
#define SMC_CLC_DECL_PEERDECL 0x05000000 /* peer declined during handshake */
-#define SMC_CLC_DECL_INTERR 0x99990000 /* internal error */
-#define SMC_CLC_DECL_ERR_RTOK 0x99990001 /* rtoken handling failed */
-#define SMC_CLC_DECL_ERR_RDYLNK 0x99990002 /* ib ready link failed */
-#define SMC_CLC_DECL_ERR_REGRMB 0x99990003 /* reg rmb failed */
+#define SMC_CLC_DECL_INTERR 0x09990000 /* internal error */
+#define SMC_CLC_DECL_ERR_RTOK 0x09990001 /* rtoken handling failed */
+#define SMC_CLC_DECL_ERR_RDYLNK 0x09990002 /* ib ready link failed */
+#define SMC_CLC_DECL_ERR_REGRMB 0x09990003 /* reg rmb failed */
struct smc_clc_msg_hdr { /* header1 of clc messages */
u8 eyecatcher[4]; /* eye catcher */
}
struct smcd_dev;
+struct smc_init_info;
int smc_clc_prfx_match(struct socket *clcsock,
struct smc_clc_msg_proposal_prefix *prop);
u8 expected_type, unsigned long timeout);
int smc_clc_send_decline(struct smc_sock *smc, u32 peer_diag_info);
int smc_clc_send_proposal(struct smc_sock *smc, int smc_type,
- struct smc_ib_device *smcibdev, u8 ibport, u8 gid[],
- struct smcd_dev *ismdev);
+ struct smc_init_info *ini);
int smc_clc_send_confirm(struct smc_sock *smc);
int smc_clc_send_accept(struct smc_sock *smc, int srv_first_contact);
}
/* create a new SMC link group */
-static int smc_lgr_create(struct smc_sock *smc, bool is_smcd,
- struct smc_ib_device *smcibdev, u8 ibport,
- char *peer_systemid, unsigned short vlan_id,
- struct smcd_dev *smcismdev, u64 peer_gid)
+static int smc_lgr_create(struct smc_sock *smc, struct smc_init_info *ini)
{
struct smc_link_group *lgr;
struct smc_link *lnk;
int rc = 0;
int i;
- if (is_smcd && vlan_id) {
- rc = smc_ism_get_vlan(smcismdev, vlan_id);
- if (rc)
+ if (ini->is_smcd && ini->vlan_id) {
+ if (smc_ism_get_vlan(ini->ism_dev, ini->vlan_id)) {
+ rc = SMC_CLC_DECL_ISMVLANERR;
goto out;
+ }
}
lgr = kzalloc(sizeof(*lgr), GFP_KERNEL);
if (!lgr) {
- rc = -ENOMEM;
+ rc = SMC_CLC_DECL_MEM;
goto out;
}
- lgr->is_smcd = is_smcd;
+ lgr->is_smcd = ini->is_smcd;
lgr->sync_err = 0;
- lgr->vlan_id = vlan_id;
+ lgr->vlan_id = ini->vlan_id;
rwlock_init(&lgr->sndbufs_lock);
rwlock_init(&lgr->rmbs_lock);
rwlock_init(&lgr->conns_lock);
memcpy(&lgr->id, (u8 *)&smc_lgr_list.num, SMC_LGR_ID_SIZE);
INIT_DELAYED_WORK(&lgr->free_work, smc_lgr_free_work);
lgr->conns_all = RB_ROOT;
- if (is_smcd) {
+ if (ini->is_smcd) {
/* SMC-D specific settings */
- lgr->peer_gid = peer_gid;
- lgr->smcd = smcismdev;
+ lgr->peer_gid = ini->ism_gid;
+ lgr->smcd = ini->ism_dev;
} else {
/* SMC-R specific settings */
lgr->role = smc->listen_smc ? SMC_SERV : SMC_CLNT;
- memcpy(lgr->peer_systemid, peer_systemid, SMC_SYSTEMID_LEN);
+ memcpy(lgr->peer_systemid, ini->ib_lcl->id_for_peer,
+ SMC_SYSTEMID_LEN);
lnk = &lgr->lnk[SMC_SINGLE_LINK];
/* initialize link */
lnk->state = SMC_LNK_ACTIVATING;
lnk->link_id = SMC_SINGLE_LINK;
- lnk->smcibdev = smcibdev;
- lnk->ibport = ibport;
- lnk->path_mtu = smcibdev->pattr[ibport - 1].active_mtu;
- if (!smcibdev->initialized)
- smc_ib_setup_per_ibdev(smcibdev);
+ lnk->smcibdev = ini->ib_dev;
+ lnk->ibport = ini->ib_port;
+ lnk->path_mtu =
+ ini->ib_dev->pattr[ini->ib_port - 1].active_mtu;
+ if (!ini->ib_dev->initialized)
+ smc_ib_setup_per_ibdev(ini->ib_dev);
get_random_bytes(rndvec, sizeof(rndvec));
lnk->psn_initial = rndvec[0] + (rndvec[1] << 8) +
(rndvec[2] << 16);
rc = smc_ib_determine_gid(lnk->smcibdev, lnk->ibport,
- vlan_id, lnk->gid, &lnk->sgid_index);
+ ini->vlan_id, lnk->gid,
+ &lnk->sgid_index);
if (rc)
goto free_lgr;
rc = smc_llc_link_init(lnk);
free_lgr:
kfree(lgr);
out:
+ if (rc < 0) {
+ if (rc == -ENOMEM)
+ rc = SMC_CLC_DECL_MEM;
+ else
+ rc = SMC_CLC_DECL_INTERR;
+ }
return rc;
}
/* Determine vlan of internal TCP socket.
* @vlan_id: address to store the determined vlan id into
*/
-int smc_vlan_by_tcpsk(struct socket *clcsock, unsigned short *vlan_id)
+int smc_vlan_by_tcpsk(struct socket *clcsock, struct smc_init_info *ini)
{
struct dst_entry *dst = sk_dst_get(clcsock->sk);
struct net_device *ndev;
int i, nest_lvl, rc = 0;
- *vlan_id = 0;
+ ini->vlan_id = 0;
if (!dst) {
rc = -ENOTCONN;
goto out;
ndev = dst->dev;
if (is_vlan_dev(ndev)) {
- *vlan_id = vlan_dev_vlan_id(ndev);
+ ini->vlan_id = vlan_dev_vlan_id(ndev);
goto out_rel;
}
lower = lower->next;
ndev = (struct net_device *)netdev_lower_get_next(ndev, &lower);
if (is_vlan_dev(ndev)) {
- *vlan_id = vlan_dev_vlan_id(ndev);
+ ini->vlan_id = vlan_dev_vlan_id(ndev);
break;
}
}
}
/* create a new SMC connection (and a new link group if necessary) */
-int smc_conn_create(struct smc_sock *smc, bool is_smcd, int srv_first_contact,
- struct smc_ib_device *smcibdev, u8 ibport, u32 clcqpn,
- struct smc_clc_msg_local *lcl, struct smcd_dev *smcd,
- u64 peer_gid)
+int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
{
struct smc_connection *conn = &smc->conn;
- int local_contact = SMC_FIRST_CONTACT;
struct smc_link_group *lgr;
- unsigned short vlan_id;
enum smc_lgr_role role;
int rc = 0;
+ ini->cln_first_contact = SMC_FIRST_CONTACT;
role = smc->listen_smc ? SMC_SERV : SMC_CLNT;
- rc = smc_vlan_by_tcpsk(smc->clcsock, &vlan_id);
- if (rc)
- return rc;
-
- if ((role == SMC_CLNT) && srv_first_contact)
+ if (role == SMC_CLNT && ini->srv_first_contact)
/* create new link group as well */
goto create;
spin_lock_bh(&smc_lgr_list.lock);
list_for_each_entry(lgr, &smc_lgr_list.list, list) {
write_lock_bh(&lgr->conns_lock);
- if ((is_smcd ? smcd_lgr_match(lgr, smcd, peer_gid) :
- smcr_lgr_match(lgr, lcl, role, clcqpn)) &&
+ if ((ini->is_smcd ?
+ smcd_lgr_match(lgr, ini->ism_dev, ini->ism_gid) :
+ smcr_lgr_match(lgr, ini->ib_lcl, role, ini->ib_clcqpn)) &&
!lgr->sync_err &&
- lgr->vlan_id == vlan_id &&
+ lgr->vlan_id == ini->vlan_id &&
(role == SMC_CLNT ||
lgr->conns_num < SMC_RMBS_PER_LGR_MAX)) {
/* link group found */
- local_contact = SMC_REUSE_CONTACT;
+ ini->cln_first_contact = SMC_REUSE_CONTACT;
conn->lgr = lgr;
smc_lgr_register_conn(conn); /* add smc conn to lgr */
if (delayed_work_pending(&lgr->free_work))
}
spin_unlock_bh(&smc_lgr_list.lock);
- if (role == SMC_CLNT && !srv_first_contact &&
- (local_contact == SMC_FIRST_CONTACT)) {
+ if (role == SMC_CLNT && !ini->srv_first_contact &&
+ ini->cln_first_contact == SMC_FIRST_CONTACT) {
/* Server reuses a link group, but Client wants to start
* a new one
* send out_of_sync decline, reason synchr. error
*/
- return -ENOLINK;
+ return SMC_CLC_DECL_SYNCERR;
}
create:
- if (local_contact == SMC_FIRST_CONTACT) {
- rc = smc_lgr_create(smc, is_smcd, smcibdev, ibport,
- lcl->id_for_peer, vlan_id, smcd, peer_gid);
+ if (ini->cln_first_contact == SMC_FIRST_CONTACT) {
+ rc = smc_lgr_create(smc, ini);
if (rc)
goto out;
smc_lgr_register_conn(conn); /* add smc conn to lgr */
conn->local_tx_ctrl.common.type = SMC_CDC_MSG_TYPE;
conn->local_tx_ctrl.len = SMC_WR_TX_SIZE;
conn->urg_state = SMC_URG_READ;
- if (is_smcd) {
+ if (ini->is_smcd) {
conn->rx_off = sizeof(struct smcd_cdc_msg);
smcd_cdc_rx_init(conn); /* init tasklet for this conn */
}
#endif
out:
- return rc ? rc : local_contact;
+ return rc;
}
/* convert the RMB size into the compressed notation - minimum 16K.
};
};
+struct smc_clc_msg_local;
+
+struct smc_init_info {
+ u8 is_smcd;
+ unsigned short vlan_id;
+ int srv_first_contact;
+ int cln_first_contact;
+ /* SMC-R */
+ struct smc_clc_msg_local *ib_lcl;
+ struct smc_ib_device *ib_dev;
+ u8 ib_gid[SMC_GID_SIZE];
+ u8 ib_port;
+ u32 ib_clcqpn;
+ /* SMC-D */
+ u64 ism_gid;
+ struct smcd_dev *ism_dev;
+};
+
/* Find the connection associated with the given alert token in the link group.
* To use rbtrees we have to implement our own search core.
* Requires @conns_lock
void smc_sndbuf_sync_sg_for_device(struct smc_connection *conn);
void smc_rmb_sync_sg_for_cpu(struct smc_connection *conn);
void smc_rmb_sync_sg_for_device(struct smc_connection *conn);
-int smc_vlan_by_tcpsk(struct socket *clcsock, unsigned short *vlan_id);
+int smc_vlan_by_tcpsk(struct socket *clcsock, struct smc_init_info *ini);
void smc_conn_free(struct smc_connection *conn);
-int smc_conn_create(struct smc_sock *smc, bool is_smcd, int srv_first_contact,
- struct smc_ib_device *smcibdev, u8 ibport, u32 clcqpn,
- struct smc_clc_msg_local *lcl, struct smcd_dev *smcd,
- u64 peer_gid);
+int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini);
void smcd_conn_free(struct smc_connection *conn);
void smc_lgr_schedule_free_work_fast(struct smc_link_group *lgr);
void smc_core_exit(void);
#include "smc_pnet.h"
#include "smc_ib.h"
#include "smc_ism.h"
+#include "smc_core.h"
#define SMC_ASCII_BLANK 32
{
.cmd = SMC_PNETID_GET,
.flags = GENL_ADMIN_PERM,
- .policy = smc_pnet_policy,
.doit = smc_pnet_get,
.dumpit = smc_pnet_dump,
.start = smc_pnet_dump_start
{
.cmd = SMC_PNETID_ADD,
.flags = GENL_ADMIN_PERM,
- .policy = smc_pnet_policy,
.doit = smc_pnet_add
},
{
.cmd = SMC_PNETID_DEL,
.flags = GENL_ADMIN_PERM,
- .policy = smc_pnet_policy,
.doit = smc_pnet_del
},
{
.cmd = SMC_PNETID_FLUSH,
.flags = GENL_ADMIN_PERM,
- .policy = smc_pnet_policy,
.doit = smc_pnet_flush
}
};
.name = SMCR_GENL_FAMILY_NAME,
.version = SMCR_GENL_FAMILY_VERSION,
.maxattr = SMC_PNETID_MAX,
+ .policy = smc_pnet_policy,
.netnsok = true,
.module = THIS_MODULE,
.ops = smc_pnet_ops,
* IB device and port
*/
static void smc_pnet_find_rdma_dev(struct net_device *netdev,
- struct smc_ib_device **smcibdev,
- u8 *ibport, unsigned short vlan_id, u8 gid[])
+ struct smc_init_info *ini)
{
struct smc_ib_device *ibdev;
dev_put(ndev);
if (netdev == ndev &&
smc_ib_port_active(ibdev, i) &&
- !smc_ib_determine_gid(ibdev, i, vlan_id, gid,
- NULL)) {
- *smcibdev = ibdev;
- *ibport = i;
+ !smc_ib_determine_gid(ibdev, i, ini->vlan_id,
+ ini->ib_gid, NULL)) {
+ ini->ib_dev = ibdev;
+ ini->ib_port = i;
break;
}
}
* If nothing found, try to use handshake device
*/
static void smc_pnet_find_roce_by_pnetid(struct net_device *ndev,
- struct smc_ib_device **smcibdev,
- u8 *ibport, unsigned short vlan_id,
- u8 gid[])
+ struct smc_init_info *ini)
{
u8 ndev_pnetid[SMC_MAX_PNETID_LEN];
struct smc_ib_device *ibdev;
if (smc_pnetid_by_dev_port(ndev->dev.parent, ndev->dev_port,
ndev_pnetid) &&
smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid)) {
- smc_pnet_find_rdma_dev(ndev, smcibdev, ibport, vlan_id, gid);
+ smc_pnet_find_rdma_dev(ndev, ini);
return; /* pnetid could not be determined */
}
continue;
if (smc_pnet_match(ibdev->pnetid[i - 1], ndev_pnetid) &&
smc_ib_port_active(ibdev, i) &&
- !smc_ib_determine_gid(ibdev, i, vlan_id, gid,
- NULL)) {
- *smcibdev = ibdev;
- *ibport = i;
+ !smc_ib_determine_gid(ibdev, i, ini->vlan_id,
+ ini->ib_gid, NULL)) {
+ ini->ib_dev = ibdev;
+ ini->ib_port = i;
goto out;
}
}
}
static void smc_pnet_find_ism_by_pnetid(struct net_device *ndev,
- struct smcd_dev **smcismdev)
+ struct smc_init_info *ini)
{
u8 ndev_pnetid[SMC_MAX_PNETID_LEN];
struct smcd_dev *ismdev;
spin_lock(&smcd_dev_list.lock);
list_for_each_entry(ismdev, &smcd_dev_list.list, list) {
if (smc_pnet_match(ismdev->pnetid, ndev_pnetid)) {
- *smcismdev = ismdev;
+ ini->ism_dev = ismdev;
break;
}
}
* determine ib_device and port belonging to used internal TCP socket
* ethernet interface.
*/
-void smc_pnet_find_roce_resource(struct sock *sk,
- struct smc_ib_device **smcibdev, u8 *ibport,
- unsigned short vlan_id, u8 gid[])
+void smc_pnet_find_roce_resource(struct sock *sk, struct smc_init_info *ini)
{
struct dst_entry *dst = sk_dst_get(sk);
- *smcibdev = NULL;
- *ibport = 0;
-
+ ini->ib_dev = NULL;
+ ini->ib_port = 0;
if (!dst)
goto out;
if (!dst->dev)
goto out_rel;
- smc_pnet_find_roce_by_pnetid(dst->dev, smcibdev, ibport, vlan_id, gid);
+ smc_pnet_find_roce_by_pnetid(dst->dev, ini);
out_rel:
dst_release(dst);
return;
}
-void smc_pnet_find_ism_resource(struct sock *sk, struct smcd_dev **smcismdev)
+void smc_pnet_find_ism_resource(struct sock *sk, struct smc_init_info *ini)
{
struct dst_entry *dst = sk_dst_get(sk);
- *smcismdev = NULL;
+ ini->ism_dev = NULL;
if (!dst)
goto out;
if (!dst->dev)
goto out_rel;
- smc_pnet_find_ism_by_pnetid(dst->dev, smcismdev);
+ smc_pnet_find_ism_by_pnetid(dst->dev, ini);
out_rel:
dst_release(dst);
struct smc_ib_device;
struct smcd_dev;
+struct smc_init_info;
/**
* struct smc_pnettable - SMC PNET table anchor
int smc_pnet_net_init(struct net *net);
void smc_pnet_exit(void);
void smc_pnet_net_exit(struct net *net);
-void smc_pnet_find_roce_resource(struct sock *sk,
- struct smc_ib_device **smcibdev, u8 *ibport,
- unsigned short vlan_id, u8 gid[]);
-void smc_pnet_find_ism_resource(struct sock *sk, struct smcd_dev **smcismdev);
+void smc_pnet_find_roce_resource(struct sock *sk, struct smc_init_info *ini);
+void smc_pnet_find_ism_resource(struct sock *sk, struct smc_init_info *ini);
#endif
break;
}
- /* Positive extra indicates ore bytes than needed for the
+ /* Positive extra indicates more bytes than needed for the
* message
*/
* @dests: array keeping number of reachable destinations per bearer
* @primary_bearer: a bearer having links to all broadcast destinations, if any
* @bcast_support: indicates if primary bearer, if any, supports broadcast
+ * @force_bcast: forces broadcast for multicast traffic
* @rcast_support: indicates if all peer nodes support replicast
+ * @force_rcast: forces replicast for multicast traffic
* @rc_ratio: dest count as percentage of cluster size where send method changes
* @bc_threshold: calculated from rc_ratio; if dests > threshold use broadcast
*/
int dests[MAX_BEARERS];
int primary_bearer;
bool bcast_support;
+ bool force_bcast;
bool rcast_support;
+ bool force_rcast;
int rc_ratio;
int bc_threshold;
};
}
/* Can current method be changed ? */
method->expires = jiffies + TIPC_METHOD_EXPIRE;
- if (method->mandatory || time_before(jiffies, exp))
+ if (method->mandatory)
return;
+ if (!(tipc_net(net)->capabilities & TIPC_MCAST_RBCTL) &&
+ time_before(jiffies, exp))
+ return;
+
+ /* Configuration as force 'broadcast' method */
+ if (bb->force_bcast) {
+ method->rcast = false;
+ return;
+ }
+ /* Configuration as force 'replicast' method */
+ if (bb->force_rcast) {
+ method->rcast = true;
+ return;
+ }
+ /* Configuration as 'autoselect' or default method */
/* Determine method to use now */
method->rcast = dests <= bb->bc_threshold;
}
return 0;
}
+/* tipc_mcast_send_sync - deliver a dummy message with SYN bit
+ * @net: the applicable net namespace
+ * @skb: socket buffer to copy
+ * @method: send method to be used
+ * @dests: destination nodes for message.
+ * @cong_link_cnt: returns number of encountered congested destination links
+ * Returns 0 if success, otherwise errno
+ */
+static int tipc_mcast_send_sync(struct net *net, struct sk_buff *skb,
+ struct tipc_mc_method *method,
+ struct tipc_nlist *dests,
+ u16 *cong_link_cnt)
+{
+ struct tipc_msg *hdr, *_hdr;
+ struct sk_buff_head tmpq;
+ struct sk_buff *_skb;
+
+ /* Is a cluster supporting with new capabilities ? */
+ if (!(tipc_net(net)->capabilities & TIPC_MCAST_RBCTL))
+ return 0;
+
+ hdr = buf_msg(skb);
+ if (msg_user(hdr) == MSG_FRAGMENTER)
+ hdr = msg_get_wrapped(hdr);
+ if (msg_type(hdr) != TIPC_MCAST_MSG)
+ return 0;
+
+ /* Allocate dummy message */
+ _skb = tipc_buf_acquire(MCAST_H_SIZE, GFP_KERNEL);
+ if (!_skb)
+ return -ENOMEM;
+
+ /* Preparing for 'synching' header */
+ msg_set_syn(hdr, 1);
+
+ /* Copy skb's header into a dummy header */
+ skb_copy_to_linear_data(_skb, hdr, MCAST_H_SIZE);
+ skb_orphan(_skb);
+
+ /* Reverse method for dummy message */
+ _hdr = buf_msg(_skb);
+ msg_set_size(_hdr, MCAST_H_SIZE);
+ msg_set_is_rcast(_hdr, !msg_is_rcast(hdr));
+
+ skb_queue_head_init(&tmpq);
+ __skb_queue_tail(&tmpq, _skb);
+ if (method->rcast)
+ tipc_bcast_xmit(net, &tmpq, cong_link_cnt);
+ else
+ tipc_rcast_xmit(net, &tmpq, dests, cong_link_cnt);
+
+ /* This queue should normally be empty by now */
+ __skb_queue_purge(&tmpq);
+
+ return 0;
+}
+
/* tipc_mcast_xmit - deliver message to indicated destination nodes
* and to identified node local sockets
* @net: the applicable net namespace
u16 *cong_link_cnt)
{
struct sk_buff_head inputq, localq;
+ bool rcast = method->rcast;
+ struct tipc_msg *hdr;
+ struct sk_buff *skb;
int rc = 0;
skb_queue_head_init(&inputq);
/* Send according to determined transmit method */
if (dests->remote) {
tipc_bcast_select_xmit_method(net, dests->remote, method);
+
+ skb = skb_peek(pkts);
+ hdr = buf_msg(skb);
+ if (msg_user(hdr) == MSG_FRAGMENTER)
+ hdr = msg_get_wrapped(hdr);
+ msg_set_is_rcast(hdr, method->rcast);
+
+ /* Switch method ? */
+ if (rcast != method->rcast)
+ tipc_mcast_send_sync(net, skb, method,
+ dests, cong_link_cnt);
+
if (method->rcast)
rc = tipc_rcast_xmit(net, pkts, dests, cong_link_cnt);
else
return 0;
}
+static int tipc_bc_link_set_broadcast_mode(struct net *net, u32 bc_mode)
+{
+ struct tipc_bc_base *bb = tipc_bc_base(net);
+
+ switch (bc_mode) {
+ case BCLINK_MODE_BCAST:
+ if (!bb->bcast_support)
+ return -ENOPROTOOPT;
+
+ bb->force_bcast = true;
+ bb->force_rcast = false;
+ break;
+ case BCLINK_MODE_RCAST:
+ if (!bb->rcast_support)
+ return -ENOPROTOOPT;
+
+ bb->force_bcast = false;
+ bb->force_rcast = true;
+ break;
+ case BCLINK_MODE_SEL:
+ if (!bb->bcast_support || !bb->rcast_support)
+ return -ENOPROTOOPT;
+
+ bb->force_bcast = false;
+ bb->force_rcast = false;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int tipc_bc_link_set_broadcast_ratio(struct net *net, u32 bc_ratio)
+{
+ struct tipc_bc_base *bb = tipc_bc_base(net);
+
+ if (!bb->bcast_support || !bb->rcast_support)
+ return -ENOPROTOOPT;
+
+ if (bc_ratio > 100 || bc_ratio <= 0)
+ return -EINVAL;
+
+ bb->rc_ratio = bc_ratio;
+ tipc_bcast_lock(net);
+ tipc_bcbase_calc_bc_threshold(net);
+ tipc_bcast_unlock(net);
+
+ return 0;
+}
+
int tipc_nl_bc_link_set(struct net *net, struct nlattr *attrs[])
{
int err;
u32 win;
+ u32 bc_mode;
+ u32 bc_ratio;
struct nlattr *props[TIPC_NLA_PROP_MAX + 1];
if (!attrs[TIPC_NLA_LINK_PROP])
if (err)
return err;
- if (!props[TIPC_NLA_PROP_WIN])
+ if (!props[TIPC_NLA_PROP_WIN] &&
+ !props[TIPC_NLA_PROP_BROADCAST] &&
+ !props[TIPC_NLA_PROP_BROADCAST_RATIO]) {
return -EOPNOTSUPP;
+ }
+
+ if (props[TIPC_NLA_PROP_BROADCAST]) {
+ bc_mode = nla_get_u32(props[TIPC_NLA_PROP_BROADCAST]);
+ err = tipc_bc_link_set_broadcast_mode(net, bc_mode);
+ }
+
+ if (!err && props[TIPC_NLA_PROP_BROADCAST_RATIO]) {
+ bc_ratio = nla_get_u32(props[TIPC_NLA_PROP_BROADCAST_RATIO]);
+ err = tipc_bc_link_set_broadcast_ratio(net, bc_ratio);
+ }
- win = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
+ if (!err && props[TIPC_NLA_PROP_WIN]) {
+ win = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
+ err = tipc_bc_link_set_queue_limits(net, win);
+ }
- return tipc_bc_link_set_queue_limits(net, win);
+ return err;
}
int tipc_bcast_init(struct net *net)
goto enomem;
bb->link = l;
tn->bcl = l;
- bb->rc_ratio = 25;
+ bb->rc_ratio = 10;
bb->rcast_support = true;
return 0;
enomem:
nl->remote = 0;
nl->local = false;
}
+
+u32 tipc_bcast_get_broadcast_mode(struct net *net)
+{
+ struct tipc_bc_base *bb = tipc_bc_base(net);
+
+ if (bb->force_bcast)
+ return BCLINK_MODE_BCAST;
+
+ if (bb->force_rcast)
+ return BCLINK_MODE_RCAST;
+
+ if (bb->bcast_support && bb->rcast_support)
+ return BCLINK_MODE_SEL;
+
+ return 0;
+}
+
+u32 tipc_bcast_get_broadcast_ratio(struct net *net)
+{
+ struct tipc_bc_base *bb = tipc_bc_base(net);
+
+ return bb->rc_ratio;
+}
+
+void tipc_mcast_filter_msg(struct net *net, struct sk_buff_head *defq,
+ struct sk_buff_head *inputq)
+{
+ struct sk_buff *skb, *_skb, *tmp;
+ struct tipc_msg *hdr, *_hdr;
+ bool match = false;
+ u32 node, port;
+
+ skb = skb_peek(inputq);
+ if (!skb)
+ return;
+
+ hdr = buf_msg(skb);
+
+ if (likely(!msg_is_syn(hdr) && skb_queue_empty(defq)))
+ return;
+
+ node = msg_orignode(hdr);
+ if (node == tipc_own_addr(net))
+ return;
+
+ port = msg_origport(hdr);
+
+ /* Has the twin SYN message already arrived ? */
+ skb_queue_walk(defq, _skb) {
+ _hdr = buf_msg(_skb);
+ if (msg_orignode(_hdr) != node)
+ continue;
+ if (msg_origport(_hdr) != port)
+ continue;
+ match = true;
+ break;
+ }
+
+ if (!match) {
+ if (!msg_is_syn(hdr))
+ return;
+ __skb_dequeue(inputq);
+ __skb_queue_tail(defq, skb);
+ return;
+ }
+
+ /* Deliver non-SYN message from other link, otherwise queue it */
+ if (!msg_is_syn(hdr)) {
+ if (msg_is_rcast(hdr) != msg_is_rcast(_hdr))
+ return;
+ __skb_dequeue(inputq);
+ __skb_queue_tail(defq, skb);
+ return;
+ }
+
+ /* Queue non-SYN/SYN message from same link */
+ if (msg_is_rcast(hdr) == msg_is_rcast(_hdr)) {
+ __skb_dequeue(inputq);
+ __skb_queue_tail(defq, skb);
+ return;
+ }
+
+ /* Matching SYN messages => return the one with data, if any */
+ __skb_unlink(_skb, defq);
+ if (msg_data_sz(hdr)) {
+ kfree_skb(_skb);
+ } else {
+ __skb_dequeue(inputq);
+ kfree_skb(skb);
+ __skb_queue_tail(inputq, _skb);
+ }
+
+ /* Deliver subsequent non-SYN messages from same peer */
+ skb_queue_walk_safe(defq, _skb, tmp) {
+ _hdr = buf_msg(_skb);
+ if (msg_orignode(_hdr) != node)
+ continue;
+ if (msg_origport(_hdr) != port)
+ continue;
+ if (msg_is_syn(_hdr))
+ break;
+ __skb_unlink(_skb, defq);
+ __skb_queue_tail(inputq, _skb);
+ }
+}
#define TIPC_METHOD_EXPIRE msecs_to_jiffies(5000)
+#define BCLINK_MODE_BCAST 0x1
+#define BCLINK_MODE_RCAST 0x2
+#define BCLINK_MODE_SEL 0x4
+
struct tipc_nlist {
struct list_head list;
u32 self;
/* Cookie to be used between socket and broadcast layer
* @rcast: replicast (instead of broadcast) was used at previous xmit
* @mandatory: broadcast/replicast indication was set by user
+ * @deferredq: defer queue to make message in order
* @expires: re-evaluate non-mandatory transmit method if we are past this
*/
struct tipc_mc_method {
bool rcast;
bool mandatory;
+ struct sk_buff_head deferredq;
unsigned long expires;
};
int tipc_nl_bc_link_set(struct net *net, struct nlattr *attrs[]);
int tipc_bclink_reset_stats(struct net *net);
+u32 tipc_bcast_get_broadcast_mode(struct net *net);
+u32 tipc_bcast_get_broadcast_ratio(struct net *net);
+
+void tipc_mcast_filter_msg(struct net *net, struct sk_buff_head *defq,
+ struct sk_buff_head *inputq);
+
static inline void tipc_bcast_lock(struct net *net)
{
spin_lock_bh(&tipc_net(net)->bclock);
#include "net.h"
#include "socket.h"
#include "bcast.h"
+#include "node.h"
#include <linux/module.h>
tn->node_addr = 0;
tn->trial_addr = 0;
tn->addr_trial_end = 0;
+ tn->capabilities = TIPC_NODE_CAPABILITIES;
memset(tn->node_id, 0, sizeof(tn->node_id));
memset(tn->node_id_string, 0, sizeof(tn->node_id_string));
tn->mon_threshold = TIPC_DEF_MON_THRESHOLD;
/* Topology subscription server */
struct tipc_topsrv *topsrv;
atomic_t subscription_count;
+
+ /* Cluster capabilities */
+ u16 capabilities;
};
static inline struct tipc_net *tipc_net(struct net *net)
/* Failover/synch */
u16 drop_point;
struct sk_buff *failover_reasm_skb;
+ struct sk_buff_head failover_deferdq;
/* Max packet negotiation */
u16 mtu;
};
#define TIPC_BC_RETR_LIM msecs_to_jiffies(10) /* [ms] */
+#define TIPC_UC_RETR_TIME (jiffies + msecs_to_jiffies(1))
/*
* Interval between NACKs when packets arrive out of order
static void tipc_link_build_bc_init_msg(struct tipc_link *l,
struct sk_buff_head *xmitq);
static bool tipc_link_release_pkts(struct tipc_link *l, u16 to);
+static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void *data);
+static void tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap,
+ struct tipc_gap_ack_blks *ga,
+ struct sk_buff_head *xmitq);
/*
* Simple non-static link routines (i.e. referenced outside this file)
__skb_queue_head_init(&l->transmq);
__skb_queue_head_init(&l->backlogq);
__skb_queue_head_init(&l->deferdq);
+ __skb_queue_head_init(&l->failover_deferdq);
skb_queue_head_init(&l->wakeupq);
skb_queue_head_init(l->inputq);
return true;
__skb_queue_purge(&l->transmq);
__skb_queue_purge(&l->deferdq);
__skb_queue_purge(&l->backlogq);
+ __skb_queue_purge(&l->failover_deferdq);
l->backlog[TIPC_LOW_IMPORTANCE].len = 0;
l->backlog[TIPC_MEDIUM_IMPORTANCE].len = 0;
l->backlog[TIPC_HIGH_IMPORTANCE].len = 0;
* Consumes buffer
*/
static int tipc_link_input(struct tipc_link *l, struct sk_buff *skb,
- struct sk_buff_head *inputq)
+ struct sk_buff_head *inputq,
+ struct sk_buff **reasm_skb)
{
struct tipc_msg *hdr = buf_msg(skb);
- struct sk_buff **reasm_skb = &l->reasm_buf;
struct sk_buff *iskb;
struct sk_buff_head tmpq;
int usr = msg_user(hdr);
- int rc = 0;
int pos = 0;
- int ipos = 0;
-
- if (unlikely(usr == TUNNEL_PROTOCOL)) {
- if (msg_type(hdr) == SYNCH_MSG) {
- __skb_queue_purge(&l->deferdq);
- goto drop;
- }
- if (!tipc_msg_extract(skb, &iskb, &ipos))
- return rc;
- kfree_skb(skb);
- skb = iskb;
- hdr = buf_msg(skb);
- if (less(msg_seqno(hdr), l->drop_point))
- goto drop;
- if (tipc_data_input(l, skb, inputq))
- return rc;
- usr = msg_user(hdr);
- reasm_skb = &l->failover_reasm_skb;
- }
if (usr == MSG_BUNDLER) {
skb_queue_head_init(&tmpq);
tipc_link_bc_init_rcv(l->bc_rcvlink, hdr);
tipc_bcast_unlock(l->net);
}
-drop:
+
kfree_skb(skb);
return 0;
}
+/* tipc_link_tnl_rcv() - receive TUNNEL_PROTOCOL message, drop or process the
+ * inner message along with the ones in the old link's
+ * deferdq
+ * @l: tunnel link
+ * @skb: TUNNEL_PROTOCOL message
+ * @inputq: queue to put messages ready for delivery
+ */
+static int tipc_link_tnl_rcv(struct tipc_link *l, struct sk_buff *skb,
+ struct sk_buff_head *inputq)
+{
+ struct sk_buff **reasm_skb = &l->failover_reasm_skb;
+ struct sk_buff_head *fdefq = &l->failover_deferdq;
+ struct tipc_msg *hdr = buf_msg(skb);
+ struct sk_buff *iskb;
+ int ipos = 0;
+ int rc = 0;
+ u16 seqno;
+
+ /* SYNCH_MSG */
+ if (msg_type(hdr) == SYNCH_MSG)
+ goto drop;
+
+ /* FAILOVER_MSG */
+ if (!tipc_msg_extract(skb, &iskb, &ipos)) {
+ pr_warn_ratelimited("Cannot extract FAILOVER_MSG, defq: %d\n",
+ skb_queue_len(fdefq));
+ return rc;
+ }
+
+ do {
+ seqno = buf_seqno(iskb);
+
+ if (unlikely(less(seqno, l->drop_point))) {
+ kfree_skb(iskb);
+ continue;
+ }
+
+ if (unlikely(seqno != l->drop_point)) {
+ __tipc_skb_queue_sorted(fdefq, seqno, iskb);
+ continue;
+ }
+
+ l->drop_point++;
+
+ if (!tipc_data_input(l, iskb, inputq))
+ rc |= tipc_link_input(l, iskb, inputq, reasm_skb);
+ if (unlikely(rc))
+ break;
+ } while ((iskb = __tipc_skb_dequeue(fdefq, l->drop_point)));
+
+drop:
+ kfree_skb(skb);
+ return rc;
+}
+
static bool tipc_link_release_pkts(struct tipc_link *l, u16 acked)
{
bool released = false;
return released;
}
+/* tipc_build_gap_ack_blks - build Gap ACK blocks
+ * @l: tipc link that data have come with gaps in sequence if any
+ * @data: data buffer to store the Gap ACK blocks after built
+ *
+ * returns the actual allocated memory size
+ */
+static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void *data)
+{
+ struct sk_buff *skb = skb_peek(&l->deferdq);
+ struct tipc_gap_ack_blks *ga = data;
+ u16 len, expect, seqno = 0;
+ u8 n = 0;
+
+ if (!skb)
+ goto exit;
+
+ expect = buf_seqno(skb);
+ skb_queue_walk(&l->deferdq, skb) {
+ seqno = buf_seqno(skb);
+ if (unlikely(more(seqno, expect))) {
+ ga->gacks[n].ack = htons(expect - 1);
+ ga->gacks[n].gap = htons(seqno - expect);
+ if (++n >= MAX_GAP_ACK_BLKS) {
+ pr_info_ratelimited("Too few Gap ACK blocks!\n");
+ goto exit;
+ }
+ } else if (unlikely(less(seqno, expect))) {
+ pr_warn("Unexpected skb in deferdq!\n");
+ continue;
+ }
+ expect = seqno + 1;
+ }
+
+ /* last block */
+ ga->gacks[n].ack = htons(seqno);
+ ga->gacks[n].gap = 0;
+ n++;
+
+exit:
+ len = tipc_gap_ack_blks_sz(n);
+ ga->len = htons(len);
+ ga->gack_cnt = n;
+ return len;
+}
+
+/* tipc_link_advance_transmq - advance TIPC link transmq queue by releasing
+ * acked packets, also doing retransmissions if
+ * gaps found
+ * @l: tipc link with transmq queue to be advanced
+ * @acked: seqno of last packet acked by peer without any gaps before
+ * @gap: # of gap packets
+ * @ga: buffer pointer to Gap ACK blocks from peer
+ * @xmitq: queue for accumulating the retransmitted packets if any
+ */
+static void tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap,
+ struct tipc_gap_ack_blks *ga,
+ struct sk_buff_head *xmitq)
+{
+ struct sk_buff *skb, *_skb, *tmp;
+ struct tipc_msg *hdr;
+ u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1;
+ u16 ack = l->rcv_nxt - 1;
+ u16 seqno;
+ u16 n = 0;
+
+ skb_queue_walk_safe(&l->transmq, skb, tmp) {
+ seqno = buf_seqno(skb);
+
+next_gap_ack:
+ if (less_eq(seqno, acked)) {
+ /* release skb */
+ __skb_unlink(skb, &l->transmq);
+ kfree_skb(skb);
+ } else if (less_eq(seqno, acked + gap)) {
+ /* retransmit skb */
+ if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr))
+ continue;
+ TIPC_SKB_CB(skb)->nxt_retr = TIPC_UC_RETR_TIME;
+
+ _skb = __pskb_copy(skb, MIN_H_SIZE, GFP_ATOMIC);
+ if (!_skb)
+ continue;
+ hdr = buf_msg(_skb);
+ msg_set_ack(hdr, ack);
+ msg_set_bcast_ack(hdr, bc_ack);
+ _skb->priority = TC_PRIO_CONTROL;
+ __skb_queue_tail(xmitq, _skb);
+ l->stats.retransmitted++;
+ } else {
+ /* retry with Gap ACK blocks if any */
+ if (!ga || n >= ga->gack_cnt)
+ break;
+ acked = ntohs(ga->gacks[n].ack);
+ gap = ntohs(ga->gacks[n].gap);
+ n++;
+ goto next_gap_ack;
+ }
+ }
+}
+
/* tipc_link_build_state_msg: prepare link state message for transmission
*
* Note that sending of broadcast ack is coordinated among nodes, to reduce
struct sk_buff_head *xmitq)
{
u32 def_cnt = ++l->stats.deferred_recv;
+ u32 defq_len = skb_queue_len(&l->deferdq);
int match1, match2;
if (link_is_bc_rcvlink(l)) {
return 0;
}
- if ((skb_queue_len(&l->deferdq) == 1) || !(def_cnt % TIPC_NACK_INTV))
+ if (defq_len >= 3 && !((defq_len - 3) % 16))
tipc_link_build_proto_msg(l, STATE_MSG, 0, 0, 0, 0, 0, xmitq);
return 0;
}
struct sk_buff_head *xmitq)
{
struct sk_buff_head *defq = &l->deferdq;
- struct tipc_msg *hdr;
+ struct tipc_msg *hdr = buf_msg(skb);
u16 seqno, rcv_nxt, win_lim;
int rc = 0;
+ /* Verify and update link state */
+ if (unlikely(msg_user(hdr) == LINK_PROTOCOL))
+ return tipc_link_proto_rcv(l, skb, xmitq);
+
+ /* Don't send probe at next timeout expiration */
+ l->silent_intv_cnt = 0;
+
do {
hdr = buf_msg(skb);
seqno = msg_seqno(hdr);
rcv_nxt = l->rcv_nxt;
win_lim = rcv_nxt + TIPC_MAX_LINK_WIN;
- /* Verify and update link state */
- if (unlikely(msg_user(hdr) == LINK_PROTOCOL))
- return tipc_link_proto_rcv(l, skb, xmitq);
-
if (unlikely(!link_is_up(l))) {
if (l->state == LINK_ESTABLISHING)
rc = TIPC_LINK_UP_EVT;
goto drop;
}
- /* Don't send probe at next timeout expiration */
- l->silent_intv_cnt = 0;
-
/* Drop if outside receive window */
if (unlikely(less(seqno, rcv_nxt) || more(seqno, win_lim))) {
l->stats.duplicates++;
/* Deliver packet */
l->rcv_nxt++;
l->stats.recv_pkts++;
- if (!tipc_data_input(l, skb, l->inputq))
- rc |= tipc_link_input(l, skb, l->inputq);
+
+ if (unlikely(msg_user(hdr) == TUNNEL_PROTOCOL))
+ rc |= tipc_link_tnl_rcv(l, skb, l->inputq);
+ else if (!tipc_data_input(l, skb, l->inputq))
+ rc |= tipc_link_input(l, skb, l->inputq, &l->reasm_buf);
if (unlikely(++l->rcv_unacked >= TIPC_MIN_LINK_WIN))
rc |= tipc_link_build_state_msg(l, xmitq);
if (unlikely(rc & ~TIPC_LINK_SND_STATE))
break;
- } while ((skb = __skb_dequeue(defq)));
+ } while ((skb = __tipc_skb_dequeue(defq, l->rcv_nxt)));
return rc;
drop:
struct tipc_mon_state *mstate = &l->mon_state;
int dlen = 0;
void *data;
+ u16 glen = 0;
/* Don't send protocol message during reset or link failover */
if (tipc_link_is_blocked(l))
rcvgap = buf_seqno(skb_peek(dfq)) - l->rcv_nxt;
skb = tipc_msg_create(LINK_PROTOCOL, mtyp, INT_H_SIZE,
- tipc_max_domain_size, l->addr,
- tipc_own_addr(l->net), 0, 0, 0);
+ tipc_max_domain_size + MAX_GAP_ACK_BLKS_SZ,
+ l->addr, tipc_own_addr(l->net), 0, 0, 0);
if (!skb)
return;
msg_set_bc_gap(hdr, link_bc_rcv_gap(bcl));
msg_set_probe(hdr, probe);
msg_set_is_keepalive(hdr, probe || probe_reply);
- tipc_mon_prep(l->net, data, &dlen, mstate, l->bearer_id);
- msg_set_size(hdr, INT_H_SIZE + dlen);
- skb_trim(skb, INT_H_SIZE + dlen);
+ if (l->peer_caps & TIPC_GAP_ACK_BLOCK)
+ glen = tipc_build_gap_ack_blks(l, data);
+ tipc_mon_prep(l->net, data + glen, &dlen, mstate, l->bearer_id);
+ msg_set_size(hdr, INT_H_SIZE + glen + dlen);
+ skb_trim(skb, INT_H_SIZE + glen + dlen);
l->stats.sent_states++;
l->rcv_unacked = 0;
} else {
void tipc_link_tnl_prepare(struct tipc_link *l, struct tipc_link *tnl,
int mtyp, struct sk_buff_head *xmitq)
{
+ struct sk_buff_head *fdefq = &tnl->failover_deferdq;
struct sk_buff *skb, *tnlskb;
struct tipc_msg *hdr, tnlhdr;
struct sk_buff_head *queue = &l->transmq;
/* Initialize reusable tunnel packet header */
tipc_msg_init(tipc_own_addr(l->net), &tnlhdr, TUNNEL_PROTOCOL,
mtyp, INT_H_SIZE, l->addr);
- pktcnt = skb_queue_len(&l->transmq) + skb_queue_len(&l->backlogq);
+ if (mtyp == SYNCH_MSG)
+ pktcnt = l->snd_nxt - buf_seqno(skb_peek(&l->transmq));
+ else
+ pktcnt = skb_queue_len(&l->transmq);
+ pktcnt += skb_queue_len(&l->backlogq);
msg_set_msgcnt(&tnlhdr, pktcnt);
msg_set_bearer_id(&tnlhdr, l->peer_bearer_id);
tnl:
tnl->drop_point = l->rcv_nxt;
tnl->failover_reasm_skb = l->reasm_buf;
l->reasm_buf = NULL;
+
+ /* Failover the link's deferdq */
+ if (unlikely(!skb_queue_empty(fdefq))) {
+ pr_warn("Link failover deferdq not empty: %d!\n",
+ skb_queue_len(fdefq));
+ __skb_queue_purge(fdefq);
+ }
+ skb_queue_splice_init(&l->deferdq, fdefq);
}
}
struct sk_buff_head *xmitq)
{
struct tipc_msg *hdr = buf_msg(skb);
+ struct tipc_gap_ack_blks *ga = NULL;
u16 rcvgap = 0;
u16 ack = msg_ack(hdr);
u16 gap = msg_seq_gap(hdr);
u16 dlen = msg_data_sz(hdr);
int mtyp = msg_type(hdr);
bool reply = msg_probe(hdr);
+ u16 glen = 0;
void *data;
char *if_name;
int rc = 0;
rc = TIPC_LINK_UP_EVT;
break;
}
- tipc_mon_rcv(l->net, data, dlen, l->addr,
+
+ /* Receive Gap ACK blocks from peer if any */
+ if (l->peer_caps & TIPC_GAP_ACK_BLOCK) {
+ ga = (struct tipc_gap_ack_blks *)data;
+ glen = ntohs(ga->len);
+ /* sanity check: if failed, ignore Gap ACK blocks */
+ if (glen != tipc_gap_ack_blks_sz(ga->gack_cnt))
+ ga = NULL;
+ }
+
+ tipc_mon_rcv(l->net, data + glen, dlen - glen, l->addr,
&l->mon_state, l->bearer_id);
/* Send NACK if peer has sent pkts we haven't received yet */
if (rcvgap || reply)
tipc_link_build_proto_msg(l, STATE_MSG, 0, reply,
rcvgap, 0, 0, xmitq);
- tipc_link_release_pkts(l, ack);
+
+ tipc_link_advance_transmq(l, ack, gap, ga, xmitq);
/* If NACK, retransmit will now start at right position */
- if (gap) {
- rc = tipc_link_retrans(l, l, ack + 1, ack + gap, xmitq);
+ if (gap)
l->stats.recv_nacks++;
- }
tipc_link_advance_backlog(l, xmitq);
if (unlikely(!skb_queue_empty(&l->wakeupq)))
struct nlattr *attrs;
struct nlattr *prop;
struct tipc_net *tn = net_generic(net, tipc_net_id);
+ u32 bc_mode = tipc_bcast_get_broadcast_mode(net);
+ u32 bc_ratio = tipc_bcast_get_broadcast_ratio(net);
struct tipc_link *bcl = tn->bcl;
if (!bcl)
goto attr_msg_full;
if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, bcl->window))
goto prop_msg_full;
+ if (nla_put_u32(msg->skb, TIPC_NLA_PROP_BROADCAST, bc_mode))
+ goto prop_msg_full;
+ if (bc_mode & BCLINK_MODE_SEL)
+ if (nla_put_u32(msg->skb, TIPC_NLA_PROP_BROADCAST_RATIO,
+ bc_ratio))
+ goto prop_msg_full;
nla_nest_end(msg->skb, prop);
err = __tipc_nl_add_bc_link_stat(msg->skb, &bcl->stats);
__be32 hdr[15];
};
+/* struct tipc_gap_ack - TIPC Gap ACK block
+ * @ack: seqno of the last consecutive packet in link deferdq
+ * @gap: number of gap packets since the last ack
+ *
+ * E.g:
+ * link deferdq: 1 2 3 4 10 11 13 14 15 20
+ * --> Gap ACK blocks: <4, 5>, <11, 1>, <15, 4>, <20, 0>
+ */
+struct tipc_gap_ack {
+ __be16 ack;
+ __be16 gap;
+};
+
+/* struct tipc_gap_ack_blks
+ * @len: actual length of the record
+ * @gack_cnt: number of Gap ACK blocks in the record
+ * @gacks: array of Gap ACK blocks
+ */
+struct tipc_gap_ack_blks {
+ __be16 len;
+ u8 gack_cnt;
+ u8 reserved;
+ struct tipc_gap_ack gacks[];
+};
+
+#define tipc_gap_ack_blks_sz(n) (sizeof(struct tipc_gap_ack_blks) + \
+ sizeof(struct tipc_gap_ack) * (n))
+
+#define MAX_GAP_ACK_BLKS 32
+#define MAX_GAP_ACK_BLKS_SZ tipc_gap_ack_blks_sz(MAX_GAP_ACK_BLKS)
+
static inline struct tipc_msg *buf_msg(struct sk_buff *skb)
{
return (struct tipc_msg *)skb->data;
msg_set_bits(m, 0, 18, 1, d);
}
+static inline bool msg_is_rcast(struct tipc_msg *m)
+{
+ return msg_bits(m, 0, 18, 0x1);
+}
+
+static inline void msg_set_is_rcast(struct tipc_msg *m, bool d)
+{
+ msg_set_bits(m, 0, 18, 0x1, d);
+}
+
static inline void msg_set_size(struct tipc_msg *m, u32 sz)
{
m->hdr[0] = htonl((msg_word(m, 0) & ~0x1ffff) | sz);
tipc_skb_queue_splice_tail(&tmp, head);
}
+/* __tipc_skb_dequeue() - dequeue the head skb according to expected seqno
+ * @list: list to be dequeued from
+ * @seqno: seqno of the expected msg
+ *
+ * returns skb dequeued from the list if its seqno is less than or equal to
+ * the expected one, otherwise the skb is still hold
+ *
+ * Note: must be used with appropriate locks held only
+ */
+static inline struct sk_buff *__tipc_skb_dequeue(struct sk_buff_head *list,
+ u16 seqno)
+{
+ struct sk_buff *skb = skb_peek(list);
+
+ if (skb && less_eq(buf_seqno(skb), seqno)) {
+ __skb_unlink(skb, list);
+ return skb;
+ }
+ return NULL;
+}
+
#endif
[TIPC_NLA_PROP_UNSPEC] = { .type = NLA_UNSPEC },
[TIPC_NLA_PROP_PRIO] = { .type = NLA_U32 },
[TIPC_NLA_PROP_TOL] = { .type = NLA_U32 },
- [TIPC_NLA_PROP_WIN] = { .type = NLA_U32 }
+ [TIPC_NLA_PROP_WIN] = { .type = NLA_U32 },
+ [TIPC_NLA_PROP_BROADCAST] = { .type = NLA_U32 },
+ [TIPC_NLA_PROP_BROADCAST_RATIO] = { .type = NLA_U32 }
};
const struct nla_policy tipc_nl_bearer_policy[TIPC_NLA_BEARER_MAX + 1] = {
{
.cmd = TIPC_NL_BEARER_DISABLE,
.doit = tipc_nl_bearer_disable,
- .policy = tipc_nl_policy,
},
{
.cmd = TIPC_NL_BEARER_ENABLE,
.doit = tipc_nl_bearer_enable,
- .policy = tipc_nl_policy,
},
{
.cmd = TIPC_NL_BEARER_GET,
.doit = tipc_nl_bearer_get,
.dumpit = tipc_nl_bearer_dump,
- .policy = tipc_nl_policy,
},
{
.cmd = TIPC_NL_BEARER_ADD,
.doit = tipc_nl_bearer_add,
- .policy = tipc_nl_policy,
},
{
.cmd = TIPC_NL_BEARER_SET,
.doit = tipc_nl_bearer_set,
- .policy = tipc_nl_policy,
},
{
.cmd = TIPC_NL_SOCK_GET,
.start = tipc_dump_start,
.dumpit = tipc_nl_sk_dump,
.done = tipc_dump_done,
- .policy = tipc_nl_policy,
},
{
.cmd = TIPC_NL_PUBL_GET,
.dumpit = tipc_nl_publ_dump,
- .policy = tipc_nl_policy,
},
{
.cmd = TIPC_NL_LINK_GET,
.doit = tipc_nl_node_get_link,
.dumpit = tipc_nl_node_dump_link,
- .policy = tipc_nl_policy,
},
{
.cmd = TIPC_NL_LINK_SET,
.doit = tipc_nl_node_set_link,
- .policy = tipc_nl_policy,
},
{
.cmd = TIPC_NL_LINK_RESET_STATS,
.doit = tipc_nl_node_reset_link_stats,
- .policy = tipc_nl_policy,
},
{
.cmd = TIPC_NL_MEDIA_GET,
.doit = tipc_nl_media_get,
.dumpit = tipc_nl_media_dump,
- .policy = tipc_nl_policy,
},
{
.cmd = TIPC_NL_MEDIA_SET,
.doit = tipc_nl_media_set,
- .policy = tipc_nl_policy,
},
{
.cmd = TIPC_NL_NODE_GET,
.dumpit = tipc_nl_node_dump,
- .policy = tipc_nl_policy,
},
{
.cmd = TIPC_NL_NET_GET,
.dumpit = tipc_nl_net_dump,
- .policy = tipc_nl_policy,
},
{
.cmd = TIPC_NL_NET_SET,
.doit = tipc_nl_net_set,
- .policy = tipc_nl_policy,
},
{
.cmd = TIPC_NL_NAME_TABLE_GET,
.dumpit = tipc_nl_name_table_dump,
- .policy = tipc_nl_policy,
},
{
.cmd = TIPC_NL_MON_SET,
.doit = tipc_nl_node_set_monitor,
- .policy = tipc_nl_policy,
},
{
.cmd = TIPC_NL_MON_GET,
.doit = tipc_nl_node_get_monitor,
.dumpit = tipc_nl_node_dump_monitor,
- .policy = tipc_nl_policy,
},
{
.cmd = TIPC_NL_MON_PEER_GET,
.dumpit = tipc_nl_node_dump_monitor_peer,
- .policy = tipc_nl_policy,
},
{
.cmd = TIPC_NL_PEER_REMOVE,
.doit = tipc_nl_peer_rm,
- .policy = tipc_nl_policy,
},
#ifdef CONFIG_TIPC_MEDIA_UDP
{
.cmd = TIPC_NL_UDP_GET_REMOTEIP,
.dumpit = tipc_udp_nl_dump_remoteip,
- .policy = tipc_nl_policy,
},
#endif
};
.version = TIPC_GENL_V2_VERSION,
.hdrsize = 0,
.maxattr = TIPC_NLA_MAX,
+ .policy = tipc_nl_policy,
.netnsok = true,
.module = THIS_MODULE,
.ops = tipc_genl_v2_ops,
if (n->capabilities == capabilities)
goto exit;
/* Same node may come back with new capabilities */
- write_lock_bh(&n->lock);
+ tipc_node_write_lock(n);
n->capabilities = capabilities;
for (bearer_id = 0; bearer_id < MAX_BEARERS; bearer_id++) {
l = n->links[bearer_id].link;
if (l)
tipc_link_update_caps(l, capabilities);
}
- write_unlock_bh(&n->lock);
+ tipc_node_write_unlock_fast(n);
+
+ /* Calculate cluster capabilities */
+ tn->capabilities = TIPC_NODE_CAPABILITIES;
+ list_for_each_entry_rcu(temp_node, &tn->node_list, list) {
+ tn->capabilities &= temp_node->capabilities;
+ }
goto exit;
}
n = kzalloc(sizeof(*n), GFP_ATOMIC);
break;
}
list_add_tail_rcu(&n->list, &temp_node->list);
+ /* Calculate cluster capabilities */
+ tn->capabilities = TIPC_NODE_CAPABILITIES;
+ list_for_each_entry_rcu(temp_node, &tn->node_list, list) {
+ tn->capabilities &= temp_node->capabilities;
+ }
trace_tipc_node_create(n, true, " ");
exit:
spin_unlock_bh(&tn->node_list_lock);
*/
static bool tipc_node_cleanup(struct tipc_node *peer)
{
+ struct tipc_node *temp_node;
struct tipc_net *tn = tipc_net(peer->net);
bool deleted = false;
deleted = true;
}
tipc_node_write_unlock(peer);
+
+ /* Calculate cluster capabilities */
+ tn->capabilities = TIPC_NODE_CAPABILITIES;
+ list_for_each_entry_rcu(temp_node, &tn->node_list, list) {
+ tn->capabilities &= temp_node->capabilities;
+ }
+
spin_unlock_bh(&tn->node_list_lock);
return deleted;
}
TIPC_BLOCK_FLOWCTL = (1 << 3),
TIPC_BCAST_RCAST = (1 << 4),
TIPC_NODE_ID128 = (1 << 5),
- TIPC_LINK_PROTO_SEQNO = (1 << 6)
+ TIPC_LINK_PROTO_SEQNO = (1 << 6),
+ TIPC_MCAST_RBCTL = (1 << 7),
+ TIPC_GAP_ACK_BLOCK = (1 << 8)
};
#define TIPC_NODE_CAPABILITIES (TIPC_SYN_BIT | \
TIPC_BCAST_RCAST | \
TIPC_BLOCK_FLOWCTL | \
TIPC_NODE_ID128 | \
- TIPC_LINK_PROTO_SEQNO)
+ TIPC_LINK_PROTO_SEQNO | \
+ TIPC_MCAST_RBCTL | \
+ TIPC_GAP_ACK_BLOCK)
#define INVALID_BEARER_ID -1
void tipc_node_stop(struct net *net);
tsk_set_unreturnable(tsk, true);
if (sock->type == SOCK_DGRAM)
tsk_set_unreliable(tsk, true);
+ __skb_queue_head_init(&tsk->mc_method.deferredq);
}
trace_tipc_sk_create(sk, NULL, TIPC_DUMP_NONE, " ");
sk->sk_shutdown = SHUTDOWN_MASK;
tipc_sk_leave(tsk);
tipc_sk_withdraw(tsk, 0, NULL);
+ __skb_queue_purge(&tsk->mc_method.deferredq);
sk_stop_timer(sk, &sk->sk_timer);
tipc_sk_remove(tsk);
struct tipc_msg *hdr = buf_msg(skb);
struct net *net = sock_net(sk);
struct sk_buff_head inputq;
+ int mtyp = msg_type(hdr);
int limit, err = TIPC_OK;
trace_tipc_sk_filter_rcv(sk, skb, TIPC_DUMP_ALL, " ");
if (unlikely(grp))
tipc_group_filter_msg(grp, &inputq, xmitq);
+ if (unlikely(!grp) && mtyp == TIPC_MCAST_MSG)
+ tipc_mcast_filter_msg(net, &tsk->mc_method.deferredq, &inputq);
+
/* Validate and add to receive buffer if there is space */
while ((skb = __skb_dequeue(&inputq))) {
hdr = buf_msg(skb);
#include <net/sock.h>
#include <net/ip.h>
#include <net/udp_tunnel.h>
-#include <net/addrconf.h>
+#include <net/ipv6_stubs.h>
#include <linux/tipc_netlink.h>
#include "core.h"
#include "addr.h"
switch (crypto_info->cipher_type) {
case TLS_CIPHER_AES_GCM_128:
+ optsize = sizeof(struct tls12_crypto_info_aes_gcm_128);
+ break;
case TLS_CIPHER_AES_GCM_256: {
- optsize = crypto_info->cipher_type == TLS_CIPHER_AES_GCM_128 ?
- sizeof(struct tls12_crypto_info_aes_gcm_128) :
- sizeof(struct tls12_crypto_info_aes_gcm_256);
- if (optlen != optsize) {
- rc = -EINVAL;
- goto err_crypto_info;
- }
- rc = copy_from_user(crypto_info + 1, optval + sizeof(*crypto_info),
- optlen - sizeof(*crypto_info));
- if (rc) {
- rc = -EFAULT;
- goto err_crypto_info;
- }
+ optsize = sizeof(struct tls12_crypto_info_aes_gcm_256);
break;
}
+ case TLS_CIPHER_AES_CCM_128:
+ optsize = sizeof(struct tls12_crypto_info_aes_ccm_128);
+ break;
default:
rc = -EINVAL;
goto err_crypto_info;
}
+ if (optlen != optsize) {
+ rc = -EINVAL;
+ goto err_crypto_info;
+ }
+
+ rc = copy_from_user(crypto_info + 1, optval + sizeof(*crypto_info),
+ optlen - sizeof(*crypto_info));
+ if (rc) {
+ rc = -EFAULT;
+ goto err_crypto_info;
+ }
+
if (tx) {
#ifdef CONFIG_TLS_DEVICE
rc = tls_set_device_offload(sk, ctx);
#include <net/strparser.h>
#include <net/tls.h>
-#define MAX_IV_SIZE TLS_CIPHER_AES_GCM_128_IV_SIZE
-
static int __skb_nsg(struct sk_buff *skb, int offset, int len,
unsigned int recursion_level)
{
/* Using skb->sk to push sk through to crypto async callback
* handler. This allows propagating errors up to the socket
* if needed. It _must_ be cleared in the async handler
- * before kfree_skb is called. We _know_ skb->sk is NULL
+ * before consume_skb is called. We _know_ skb->sk is NULL
* because it is a clone from strparser.
*/
skb->sk = sk;
struct tls_rec *rec = ctx->open_rec;
struct sk_msg *msg_en = &rec->msg_encrypted;
struct scatterlist *sge = sk_msg_elem(msg_en, start);
- int rc;
+ int rc, iv_offset = 0;
+
+ /* For CCM based ciphers, first byte of IV is a constant */
+ if (prot->cipher_type == TLS_CIPHER_AES_CCM_128) {
+ rec->iv_data[0] = TLS_AES_CCM_IV_B0_BYTE;
+ iv_offset = 1;
+ }
+
+ memcpy(&rec->iv_data[iv_offset], tls_ctx->tx.iv,
+ prot->iv_size + prot->salt_size);
- memcpy(rec->iv_data, tls_ctx->tx.iv, sizeof(rec->iv_data));
- xor_iv_with_seq(prot->version, rec->iv_data,
- tls_ctx->tx.rec_seq);
+ xor_iv_with_seq(prot->version, rec->iv_data, tls_ctx->tx.rec_seq);
sge->offset += prot->prepend_size;
sge->length -= prot->prepend_size;
struct scatterlist *sgout = NULL;
const int data_len = rxm->full_len - prot->overhead_size +
prot->tail_size;
+ int iv_offset = 0;
if (*zc && (out_iov || out_sg)) {
if (out_iov)
aad = (u8 *)(sgout + n_sgout);
iv = aad + prot->aad_size;
+ /* For CCM based ciphers, first byte of nonce+iv is always '2' */
+ if (prot->cipher_type == TLS_CIPHER_AES_CCM_128) {
+ iv[0] = 2;
+ iv_offset = 1;
+ }
+
/* Prepare IV */
err = skb_copy_bits(skb, rxm->offset + TLS_HEADER_SIZE,
- iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
+ iv + iv_offset + prot->salt_size,
prot->iv_size);
if (err < 0) {
kfree(mem);
return err;
}
if (prot->version == TLS_1_3_VERSION)
- memcpy(iv, tls_ctx->rx.iv, crypto_aead_ivsize(ctx->aead_recv));
+ memcpy(iv + iv_offset, tls_ctx->rx.iv,
+ crypto_aead_ivsize(ctx->aead_recv));
else
- memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
+ memcpy(iv + iv_offset, tls_ctx->rx.iv, prot->salt_size);
xor_iv_with_seq(prot->version, iv, tls_ctx->rx.rec_seq);
rxm->full_len -= len;
return false;
}
- kfree_skb(skb);
+ consume_skb(skb);
}
/* Finished with message */
if (!is_peek) {
skb_unlink(skb, &ctx->rx_list);
- kfree_skb(skb);
+ consume_skb(skb);
}
skb = next_skb;
struct tls_crypto_info *crypto_info;
struct tls12_crypto_info_aes_gcm_128 *gcm_128_info;
struct tls12_crypto_info_aes_gcm_256 *gcm_256_info;
+ struct tls12_crypto_info_aes_ccm_128 *ccm_128_info;
struct tls_sw_context_tx *sw_ctx_tx = NULL;
struct tls_sw_context_rx *sw_ctx_rx = NULL;
struct cipher_context *cctx;
struct crypto_aead **aead;
struct strp_callbacks cb;
- u16 nonce_size, tag_size, iv_size, rec_seq_size;
+ u16 nonce_size, tag_size, iv_size, rec_seq_size, salt_size;
struct crypto_tfm *tfm;
- char *iv, *rec_seq, *key, *salt;
+ char *iv, *rec_seq, *key, *salt, *cipher_name;
size_t keysize;
int rc = 0;
keysize = TLS_CIPHER_AES_GCM_128_KEY_SIZE;
key = gcm_128_info->key;
salt = gcm_128_info->salt;
+ salt_size = TLS_CIPHER_AES_GCM_128_SALT_SIZE;
+ cipher_name = "gcm(aes)";
break;
}
case TLS_CIPHER_AES_GCM_256: {
keysize = TLS_CIPHER_AES_GCM_256_KEY_SIZE;
key = gcm_256_info->key;
salt = gcm_256_info->salt;
+ salt_size = TLS_CIPHER_AES_GCM_256_SALT_SIZE;
+ cipher_name = "gcm(aes)";
+ break;
+ }
+ case TLS_CIPHER_AES_CCM_128: {
+ nonce_size = TLS_CIPHER_AES_CCM_128_IV_SIZE;
+ tag_size = TLS_CIPHER_AES_CCM_128_TAG_SIZE;
+ iv_size = TLS_CIPHER_AES_CCM_128_IV_SIZE;
+ iv = ((struct tls12_crypto_info_aes_ccm_128 *)crypto_info)->iv;
+ rec_seq_size = TLS_CIPHER_AES_CCM_128_REC_SEQ_SIZE;
+ rec_seq =
+ ((struct tls12_crypto_info_aes_ccm_128 *)crypto_info)->rec_seq;
+ ccm_128_info =
+ (struct tls12_crypto_info_aes_ccm_128 *)crypto_info;
+ keysize = TLS_CIPHER_AES_CCM_128_KEY_SIZE;
+ key = ccm_128_info->key;
+ salt = ccm_128_info->salt;
+ salt_size = TLS_CIPHER_AES_CCM_128_SALT_SIZE;
+ cipher_name = "ccm(aes)";
break;
}
default:
prot->overhead_size = prot->prepend_size +
prot->tag_size + prot->tail_size;
prot->iv_size = iv_size;
- cctx->iv = kmalloc(iv_size + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
- GFP_KERNEL);
+ prot->salt_size = salt_size;
+ cctx->iv = kmalloc(iv_size + salt_size, GFP_KERNEL);
if (!cctx->iv) {
rc = -ENOMEM;
goto free_priv;
}
/* Note: 128 & 256 bit salt are the same size */
- memcpy(cctx->iv, salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
- memcpy(cctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, iv, iv_size);
prot->rec_seq_size = rec_seq_size;
+ memcpy(cctx->iv, salt, salt_size);
+ memcpy(cctx->iv + salt_size, iv, iv_size);
cctx->rec_seq = kmemdup(rec_seq, rec_seq_size, GFP_KERNEL);
if (!cctx->rec_seq) {
rc = -ENOMEM;
}
if (!*aead) {
- *aead = crypto_alloc_aead("gcm(aes)", 0, 0);
+ *aead = crypto_alloc_aead(cipher_name, 0, 0);
if (IS_ERR(*aead)) {
rc = PTR_ERR(*aead);
*aead = NULL;
struct unix_sock *u = unix_sk(sk);
struct sk_buff *skb, *last;
long timeo;
+ int skip;
int err;
- int peeked, skip;
err = -EOPNOTSUPP;
if (flags&MSG_OOB)
mutex_lock(&u->iolock);
skip = sk_peek_offset(sk, flags);
- skb = __skb_try_recv_datagram(sk, flags, NULL, &peeked, &skip,
- &err, &last);
+ skb = __skb_try_recv_datagram(sk, flags, NULL, &skip, &err,
+ &last);
if (skb)
break;
{
.cmd = WIMAX_GNL_OP_MSG_FROM_USER,
.flags = GENL_ADMIN_PERM,
- .policy = wimax_gnl_policy,
.doit = wimax_gnl_doit_msg_from_user,
},
{
.cmd = WIMAX_GNL_OP_RESET,
.flags = GENL_ADMIN_PERM,
- .policy = wimax_gnl_policy,
.doit = wimax_gnl_doit_reset,
},
{
.cmd = WIMAX_GNL_OP_RFKILL,
.flags = GENL_ADMIN_PERM,
- .policy = wimax_gnl_policy,
.doit = wimax_gnl_doit_rfkill,
},
{
.cmd = WIMAX_GNL_OP_STATE_GET,
.flags = GENL_ADMIN_PERM,
- .policy = wimax_gnl_policy,
.doit = wimax_gnl_doit_state_get,
},
};
.version = WIMAX_GNL_VERSION,
.hdrsize = 0,
.maxattr = WIMAX_GNL_ATTR_MAX,
+ .policy = wimax_gnl_policy,
.module = THIS_MODULE,
.ops = wimax_gnl_ops,
.n_ops = ARRAY_SIZE(wimax_gnl_ops),
.doit = nl80211_get_wiphy,
.dumpit = nl80211_dump_wiphy,
.done = nl80211_dump_wiphy_done,
- .policy = nl80211_policy,
/* can be retrieved by unprivileged users */
.internal_flags = NL80211_FLAG_NEED_WIPHY |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_SET_WIPHY,
.doit = nl80211_set_wiphy,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_RTNL,
},
.cmd = NL80211_CMD_GET_INTERFACE,
.doit = nl80211_get_interface,
.dumpit = nl80211_dump_interface,
- .policy = nl80211_policy,
/* can be retrieved by unprivileged users */
.internal_flags = NL80211_FLAG_NEED_WDEV |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_SET_INTERFACE,
.doit = nl80211_set_interface,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_NEW_INTERFACE,
.doit = nl80211_new_interface,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WIPHY |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_DEL_INTERFACE,
.doit = nl80211_del_interface,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WDEV |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_GET_KEY,
.doit = nl80211_get_key,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_SET_KEY,
.doit = nl80211_set_key,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL |
{
.cmd = NL80211_CMD_NEW_KEY,
.doit = nl80211_new_key,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL |
{
.cmd = NL80211_CMD_DEL_KEY,
.doit = nl80211_del_key,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
},
{
.cmd = NL80211_CMD_SET_BEACON,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.doit = nl80211_set_beacon,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
},
{
.cmd = NL80211_CMD_START_AP,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.doit = nl80211_start_ap,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
},
{
.cmd = NL80211_CMD_STOP_AP,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.doit = nl80211_stop_ap,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
.cmd = NL80211_CMD_GET_STATION,
.doit = nl80211_get_station,
.dumpit = nl80211_dump_station,
- .policy = nl80211_policy,
.internal_flags = NL80211_FLAG_NEED_NETDEV |
NL80211_FLAG_NEED_RTNL,
},
{
.cmd = NL80211_CMD_SET_STATION,
.doit = nl80211_set_station,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_NEW_STATION,
.doit = nl80211_new_station,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_DEL_STATION,
.doit = nl80211_del_station,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
.cmd = NL80211_CMD_GET_MPATH,
.doit = nl80211_get_mpath,
.dumpit = nl80211_dump_mpath,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
.cmd = NL80211_CMD_GET_MPP,
.doit = nl80211_get_mpp,
.dumpit = nl80211_dump_mpp,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_SET_MPATH,
.doit = nl80211_set_mpath,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_NEW_MPATH,
.doit = nl80211_new_mpath,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_DEL_MPATH,
.doit = nl80211_del_mpath,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_SET_BSS,
.doit = nl80211_set_bss,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
.cmd = NL80211_CMD_GET_REG,
.doit = nl80211_get_reg_do,
.dumpit = nl80211_get_reg_dump,
- .policy = nl80211_policy,
.internal_flags = NL80211_FLAG_NEED_RTNL,
/* can be retrieved by unprivileged users */
},
{
.cmd = NL80211_CMD_SET_REG,
.doit = nl80211_set_reg,
- .policy = nl80211_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_RTNL,
},
{
.cmd = NL80211_CMD_REQ_SET_REG,
.doit = nl80211_req_set_reg,
- .policy = nl80211_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = NL80211_CMD_RELOAD_REGDB,
.doit = nl80211_reload_regdb,
- .policy = nl80211_policy,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = NL80211_CMD_GET_MESH_CONFIG,
.doit = nl80211_get_mesh_config,
- .policy = nl80211_policy,
/* can be retrieved by unprivileged users */
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_SET_MESH_CONFIG,
.doit = nl80211_update_mesh_config,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_TRIGGER_SCAN,
.doit = nl80211_trigger_scan,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_ABORT_SCAN,
.doit = nl80211_abort_scan,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WDEV_UP |
NL80211_FLAG_NEED_RTNL,
},
{
.cmd = NL80211_CMD_GET_SCAN,
- .policy = nl80211_policy,
.dumpit = nl80211_dump_scan,
},
{
.cmd = NL80211_CMD_START_SCHED_SCAN,
.doit = nl80211_start_sched_scan,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_STOP_SCHED_SCAN,
.doit = nl80211_stop_sched_scan,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_AUTHENTICATE,
.doit = nl80211_authenticate,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL |
{
.cmd = NL80211_CMD_ASSOCIATE,
.doit = nl80211_associate,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL |
{
.cmd = NL80211_CMD_DEAUTHENTICATE,
.doit = nl80211_deauthenticate,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_DISASSOCIATE,
.doit = nl80211_disassociate,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_JOIN_IBSS,
.doit = nl80211_join_ibss,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_LEAVE_IBSS,
.doit = nl80211_leave_ibss,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
.cmd = NL80211_CMD_TESTMODE,
.doit = nl80211_testmode_do,
.dumpit = nl80211_testmode_dump,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WIPHY |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_CONNECT,
.doit = nl80211_connect,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL |
{
.cmd = NL80211_CMD_UPDATE_CONNECT_PARAMS,
.doit = nl80211_update_connect_params,
- .policy = nl80211_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL |
{
.cmd = NL80211_CMD_DISCONNECT,
.doit = nl80211_disconnect,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_SET_WIPHY_NETNS,
.doit = nl80211_wiphy_netns,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WIPHY |
NL80211_FLAG_NEED_RTNL,
},
{
.cmd = NL80211_CMD_GET_SURVEY,
- .policy = nl80211_policy,
.dumpit = nl80211_dump_survey,
},
{
.cmd = NL80211_CMD_SET_PMKSA,
.doit = nl80211_setdel_pmksa,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL |
{
.cmd = NL80211_CMD_DEL_PMKSA,
.doit = nl80211_setdel_pmksa,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_FLUSH_PMKSA,
.doit = nl80211_flush_pmksa,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_REMAIN_ON_CHANNEL,
.doit = nl80211_remain_on_channel,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_CANCEL_REMAIN_ON_CHANNEL,
.doit = nl80211_cancel_remain_on_channel,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_SET_TX_BITRATE_MASK,
.doit = nl80211_set_tx_bitrate_mask,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_REGISTER_FRAME,
.doit = nl80211_register_mgmt,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WDEV |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_FRAME,
.doit = nl80211_tx_mgmt,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_FRAME_WAIT_CANCEL,
.doit = nl80211_tx_mgmt_cancel_wait,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_SET_POWER_SAVE,
.doit = nl80211_set_power_save,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_GET_POWER_SAVE,
.doit = nl80211_get_power_save,
- .policy = nl80211_policy,
/* can be retrieved by unprivileged users */
.internal_flags = NL80211_FLAG_NEED_NETDEV |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_SET_CQM,
.doit = nl80211_set_cqm,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_SET_CHANNEL,
.doit = nl80211_set_channel,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_SET_WDS_PEER,
.doit = nl80211_set_wds_peer,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_JOIN_MESH,
.doit = nl80211_join_mesh,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_LEAVE_MESH,
.doit = nl80211_leave_mesh,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_JOIN_OCB,
.doit = nl80211_join_ocb,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_LEAVE_OCB,
.doit = nl80211_leave_ocb,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_GET_WOWLAN,
.doit = nl80211_get_wowlan,
- .policy = nl80211_policy,
/* can be retrieved by unprivileged users */
.internal_flags = NL80211_FLAG_NEED_WIPHY |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_SET_WOWLAN,
.doit = nl80211_set_wowlan,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WIPHY |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_SET_REKEY_OFFLOAD,
.doit = nl80211_set_rekey_data,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL |
{
.cmd = NL80211_CMD_TDLS_MGMT,
.doit = nl80211_tdls_mgmt,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_TDLS_OPER,
.doit = nl80211_tdls_oper,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_UNEXPECTED_FRAME,
.doit = nl80211_register_unexpected_frame,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_PROBE_CLIENT,
.doit = nl80211_probe_client,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_REGISTER_BEACONS,
.doit = nl80211_register_beacons,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WIPHY |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_SET_NOACK_MAP,
.doit = nl80211_set_noack_map,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_START_P2P_DEVICE,
.doit = nl80211_start_p2p_device,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WDEV |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_STOP_P2P_DEVICE,
.doit = nl80211_stop_p2p_device,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_START_NAN,
.doit = nl80211_start_nan,
- .policy = nl80211_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WDEV |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_STOP_NAN,
.doit = nl80211_stop_nan,
- .policy = nl80211_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_ADD_NAN_FUNCTION,
.doit = nl80211_nan_add_func,
- .policy = nl80211_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_DEL_NAN_FUNCTION,
.doit = nl80211_nan_del_func,
- .policy = nl80211_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_CHANGE_NAN_CONFIG,
.doit = nl80211_nan_change_config,
- .policy = nl80211_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_SET_MCAST_RATE,
.doit = nl80211_set_mcast_rate,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_SET_MAC_ACL,
.doit = nl80211_set_mac_acl,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_RADAR_DETECT,
.doit = nl80211_start_radar_detection,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_GET_PROTOCOL_FEATURES,
.doit = nl80211_get_protocol_features,
- .policy = nl80211_policy,
},
{
.cmd = NL80211_CMD_UPDATE_FT_IES,
.doit = nl80211_update_ft_ies,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_CRIT_PROTOCOL_START,
.doit = nl80211_crit_protocol_start,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_CRIT_PROTOCOL_STOP,
.doit = nl80211_crit_protocol_stop,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_GET_COALESCE,
.doit = nl80211_get_coalesce,
- .policy = nl80211_policy,
.internal_flags = NL80211_FLAG_NEED_WIPHY |
NL80211_FLAG_NEED_RTNL,
},
{
.cmd = NL80211_CMD_SET_COALESCE,
.doit = nl80211_set_coalesce,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WIPHY |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_CHANNEL_SWITCH,
.doit = nl80211_channel_switch,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
.cmd = NL80211_CMD_VENDOR,
.doit = nl80211_vendor_cmd,
.dumpit = nl80211_vendor_cmd_dump,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WIPHY |
NL80211_FLAG_NEED_RTNL |
{
.cmd = NL80211_CMD_SET_QOS_MAP,
.doit = nl80211_set_qos_map,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_ADD_TX_TS,
.doit = nl80211_add_tx_ts,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_DEL_TX_TS,
.doit = nl80211_del_tx_ts,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_TDLS_CHANNEL_SWITCH,
.doit = nl80211_tdls_channel_switch,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_TDLS_CANCEL_CHANNEL_SWITCH,
.doit = nl80211_tdls_cancel_channel_switch,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_SET_MULTICAST_TO_UNICAST,
.doit = nl80211_set_multicast_to_unicast,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_SET_PMK,
.doit = nl80211_set_pmk,
- .policy = nl80211_policy,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL |
NL80211_FLAG_CLEAR_SKB,
{
.cmd = NL80211_CMD_DEL_PMK,
.doit = nl80211_del_pmk,
- .policy = nl80211_policy,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
},
{
.cmd = NL80211_CMD_EXTERNAL_AUTH,
.doit = nl80211_external_auth,
- .policy = nl80211_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_CONTROL_PORT_FRAME,
.doit = nl80211_tx_control_port,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_GET_FTM_RESPONDER_STATS,
.doit = nl80211_get_ftm_responder_stats,
- .policy = nl80211_policy,
.internal_flags = NL80211_FLAG_NEED_NETDEV |
NL80211_FLAG_NEED_RTNL,
},
{
.cmd = NL80211_CMD_PEER_MEASUREMENT_START,
.doit = nl80211_pmsr_start,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_WDEV_UP |
NL80211_FLAG_NEED_RTNL,
{
.cmd = NL80211_CMD_NOTIFY_RADAR,
.doit = nl80211_notify_radar_detection,
- .policy = nl80211_policy,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
NL80211_FLAG_NEED_RTNL,
.hdrsize = 0, /* no private header */
.version = 1, /* no particular meaning now */
.maxattr = NL80211_ATTR_MAX,
+ .policy = nl80211_policy,
.netnsok = true,
.pre_doit = nl80211_pre_doit,
.post_doit = nl80211_post_doit,
unsigned long flags;
rcu_read_lock();
- txq = netdev_pick_tx(dev, skb, NULL);
+ txq = netdev_core_pick_tx(dev, skb, NULL);
HARD_TX_LOCK(dev, txq, smp_processor_id());
if (!netif_xmit_frozen_or_stopped(txq))
xdp_redirect_map
xdp_router_ipv4
xdp_rxq_info
+xdp_sample_pkts
xdp_tx_iptunnel
xdpsock
#define asm_volatile_goto(x...) asm volatile("invalid use of asm_volatile_goto")
#endif
+#define volatile(x...) volatile("")
#endif
if (!addr)
return;
sym = ksym_search(addr);
+ if (!sym) {
+ printf("ksym not found. Is kallsyms loaded?\n");
+ return;
+ }
+
if (PRINT_RAW_ADDR)
printf("%s/%llx;", sym->name, addr);
else
for (i = 0; i < max; i++) {
if (counts[i].ip > PAGE_OFFSET) {
sym = ksym_search(counts[i].ip);
+ if (!sym) {
+ printf("ksym not found. Is kallsyms loaded?\n");
+ continue;
+ }
+
printf("0x%-17llx %-32s %u\n", counts[i].ip, sym->name,
counts[i].count);
} else {
bpf_map_lookup_elem(map_fd[0], &next_key, &value);
assert(next_key == value);
sym = ksym_search(value);
- printf(" %s", sym->name);
key = next_key;
+ if (!sym) {
+ printf("ksym not found. Is kallsyms loaded?\n");
+ continue;
+ }
+
+ printf(" %s", sym->name);
}
if (key)
printf("\n");
if (!addr)
return;
sym = ksym_search(addr);
+ if (!sym) {
+ printf("ksym not found. Is kallsyms loaded?\n");
+ return;
+ }
+
printf("%s;", sym->name);
if (!strcmp(sym->name, "sys_read"))
sys_read_seen = true;
info()
{
if [ "${quiet}" != "silent_" ]; then
- printf " %-7s %s\n" ${1} ${2}
+ printf " %-7s %s\n" "${1}" "${2}"
fi
}
fi
}
+# generate .BTF typeinfo from DWARF debuginfo
+gen_btf()
+{
+ local pahole_ver;
+
+ pahole_ver=$(${PAHOLE} --version | sed -E 's/v([0-9]+)\.([0-9]+)/\1\2/')
+ if [ "${pahole_ver}" -lt "113" ]; then
+ info "BTF" "${1}: pahole version $(${PAHOLE} --version) is too old, need at least v1.13"
+ exit 0
+ fi
+
+ info "BTF" ${1}
+ LLVM_OBJCOPY=${OBJCOPY} ${PAHOLE} -J ${1}
+}
# Create ${2} .o file with all symbols from the ${1} object file
kallsyms()
info LD vmlinux
vmlinux_link "${kallsymso}" vmlinux
+if [ -n "${CONFIG_DEBUG_INFO_BTF}" ]; then
+ gen_btf vmlinux
+fi
+
if [ -n "${CONFIG_BUILDTIME_EXTABLE_SORT}" ]; then
info SORTEX vmlinux
sortextable vmlinux
#define wmb() asm volatile("dmb ishst" ::: "memory")
#define rmb() asm volatile("dmb ishld" ::: "memory")
+/*
+ * Kernel uses dmb variants on arm64 for smp_*() barriers. Pretty much the same
+ * implementation as above mb()/wmb()/rmb(), though for the latter kernel uses
+ * dsb. In any case, should above mb()/wmb()/rmb() change, make sure the below
+ * smp_*() don't.
+ */
+#define smp_mb() asm volatile("dmb ish" ::: "memory")
+#define smp_wmb() asm volatile("dmb ishst" ::: "memory")
+#define smp_rmb() asm volatile("dmb ishld" ::: "memory")
+
#define smp_store_release(p, v) \
do { \
union { typeof(*p) __val; char __c[1]; } __u = \
#define rmb() asm volatile("lock; addl $0,0(%%esp)" ::: "memory")
#define wmb() asm volatile("lock; addl $0,0(%%esp)" ::: "memory")
#elif defined(__x86_64__)
-#define mb() asm volatile("mfence":::"memory")
-#define rmb() asm volatile("lfence":::"memory")
+#define mb() asm volatile("mfence" ::: "memory")
+#define rmb() asm volatile("lfence" ::: "memory")
#define wmb() asm volatile("sfence" ::: "memory")
+#define smp_rmb() barrier()
+#define smp_wmb() barrier()
+#define smp_mb() asm volatile("lock; addl $0,-132(%%rsp)" ::: "memory", "cc")
#endif
#if defined(__x86_64__)
return ret;
}
+static int btf_dumper_var(const struct btf_dumper *d, __u32 type_id,
+ __u8 bit_offset, const void *data)
+{
+ const struct btf_type *t = btf__type_by_id(d->btf, type_id);
+ int ret;
+
+ jsonw_start_object(d->jw);
+ jsonw_name(d->jw, btf__name_by_offset(d->btf, t->name_off));
+ ret = btf_dumper_do_type(d, t->type, bit_offset, data);
+ jsonw_end_object(d->jw);
+
+ return ret;
+}
+
+static int btf_dumper_datasec(const struct btf_dumper *d, __u32 type_id,
+ const void *data)
+{
+ struct btf_var_secinfo *vsi;
+ const struct btf_type *t;
+ int ret = 0, i, vlen;
+
+ t = btf__type_by_id(d->btf, type_id);
+ if (!t)
+ return -EINVAL;
+
+ vlen = BTF_INFO_VLEN(t->info);
+ vsi = (struct btf_var_secinfo *)(t + 1);
+
+ jsonw_start_object(d->jw);
+ jsonw_name(d->jw, btf__name_by_offset(d->btf, t->name_off));
+ jsonw_start_array(d->jw);
+ for (i = 0; i < vlen; i++) {
+ ret = btf_dumper_do_type(d, vsi[i].type, 0, data + vsi[i].offset);
+ if (ret)
+ break;
+ }
+ jsonw_end_array(d->jw);
+ jsonw_end_object(d->jw);
+
+ return ret;
+}
+
static int btf_dumper_do_type(const struct btf_dumper *d, __u32 type_id,
__u8 bit_offset, const void *data)
{
case BTF_KIND_CONST:
case BTF_KIND_RESTRICT:
return btf_dumper_modifier(d, type_id, bit_offset, data);
+ case BTF_KIND_VAR:
+ return btf_dumper_var(d, type_id, bit_offset, data);
+ case BTF_KIND_DATASEC:
+ return btf_dumper_datasec(d, type_id, data);
default:
jsonw_printf(d->jw, "(unsupported-kind");
return -EINVAL;
{
const struct btf_type *proto_type;
const struct btf_array *array;
+ const struct btf_var *var;
const struct btf_type *t;
if (!type_id) {
if (pos == -1)
return -1;
break;
+ case BTF_KIND_VAR:
+ var = (struct btf_var *)(t + 1);
+ if (var->linkage == BTF_VAR_STATIC)
+ BTF_PRINT_ARG("static ");
+ BTF_PRINT_TYPE(t->type);
+ BTF_PRINT_ARG(" %s",
+ btf__name_by_offset(btf, t->name_off));
+ break;
+ case BTF_KIND_DATASEC:
+ BTF_PRINT_ARG("section (\"%s\") ",
+ btf__name_by_offset(btf, t->name_off));
+ break;
case BTF_KIND_UNKN:
default:
return -1;
/* start of key-value pair */
jsonw_start_object(d->jw);
- jsonw_name(d->jw, "key");
+ if (map_info->btf_key_type_id) {
+ jsonw_name(d->jw, "key");
- ret = btf_dumper_type(d, map_info->btf_key_type_id, key);
- if (ret)
- goto err_end_obj;
+ ret = btf_dumper_type(d, map_info->btf_key_type_id, key);
+ if (ret)
+ goto err_end_obj;
+ }
if (!map_is_per_cpu(map_info->type)) {
jsonw_name(d->jw, "value");
if (info->nr_map_ids)
show_prog_maps(fd, info->nr_map_ids);
+ if (info->btf_id)
+ jsonw_int_field(json_wtr, "btf_id", info->btf_id);
+
if (!hash_empty(prog_table.table)) {
struct pinned_obj *obj;
}
}
+ if (info->btf_id)
+ printf("\n\tbtf_id %d\n", info->btf_id);
+
printf("\n");
}
if (insn->src_reg == BPF_PSEUDO_MAP_FD)
snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
"map[id:%u]", insn->imm);
+ else if (insn->src_reg == BPF_PSEUDO_MAP_VALUE)
+ snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
+ "map[id:%u][0]+%u", insn->imm, (insn + 1)->imm);
else
snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
"0x%llx", (unsigned long long)full_imm);
.off = 0, \
.imm = ((__u64) (IMM)) >> 32 })
+#define BPF_LD_IMM64_RAW_FULL(DST, SRC, OFF1, OFF2, IMM1, IMM2) \
+ ((struct bpf_insn) { \
+ .code = BPF_LD | BPF_DW | BPF_IMM, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF1, \
+ .imm = IMM1 }), \
+ ((struct bpf_insn) { \
+ .code = 0, /* zero is reserved opcode */ \
+ .dst_reg = 0, \
+ .src_reg = 0, \
+ .off = OFF2, \
+ .imm = IMM2 })
+
/* pseudo BPF_LD_IMM64 insn used to refer to process-local map_fd */
#define BPF_LD_MAP_FD(DST, MAP_FD) \
- BPF_LD_IMM64_RAW(DST, BPF_PSEUDO_MAP_FD, MAP_FD)
+ BPF_LD_IMM64_RAW_FULL(DST, BPF_PSEUDO_MAP_FD, 0, 0, \
+ MAP_FD, 0)
+
+#define BPF_LD_MAP_VALUE(DST, MAP_FD, VALUE_OFF) \
+ BPF_LD_IMM64_RAW_FULL(DST, BPF_PSEUDO_MAP_VALUE, 0, 0, \
+ MAP_FD, VALUE_OFF)
/* Relative call */
BPF_BTF_GET_FD_BY_ID,
BPF_TASK_FD_QUERY,
BPF_MAP_LOOKUP_AND_DELETE_ELEM,
+ BPF_MAP_FREEZE,
};
enum bpf_map_type {
*/
#define BPF_F_ANY_ALIGNMENT (1U << 1)
-/* when bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_FD, bpf_ldimm64->imm == fd */
+/* When BPF ldimm64's insn[0].src_reg != 0 then this can have
+ * two extensions:
+ *
+ * insn[0].src_reg: BPF_PSEUDO_MAP_FD BPF_PSEUDO_MAP_VALUE
+ * insn[0].imm: map fd map fd
+ * insn[1].imm: 0 offset into value
+ * insn[0].off: 0 0
+ * insn[1].off: 0 0
+ * ldimm64 rewrite: address of map address of map[0]+offset
+ * verifier type: CONST_PTR_TO_MAP PTR_TO_MAP_VALUE
+ */
#define BPF_PSEUDO_MAP_FD 1
+#define BPF_PSEUDO_MAP_VALUE 2
/* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
* offset to another bpf function
#define BPF_OBJ_NAME_LEN 16U
-/* Flags for accessing BPF object */
+/* Flags for accessing BPF object from syscall side. */
#define BPF_F_RDONLY (1U << 3)
#define BPF_F_WRONLY (1U << 4)
/* Zero-initialize hash function seed. This should only be used for testing. */
#define BPF_F_ZERO_SEED (1U << 6)
+/* Flags for accessing BPF object from program side. */
+#define BPF_F_RDONLY_PROG (1U << 7)
+#define BPF_F_WRONLY_PROG (1U << 8)
+
/* flags for BPF_PROG_QUERY */
#define BPF_F_QUERY_EFFECTIVE (1U << 0)
__aligned_u64 data_out;
__u32 repeat;
__u32 duration;
+ __u32 ctx_size_in; /* input: len of ctx_in */
+ __u32 ctx_size_out; /* input/output: len of ctx_out
+ * returns ENOSPC if ctx_out
+ * is too small.
+ */
+ __aligned_u64 ctx_in;
+ __aligned_u64 ctx_out;
} test;
struct { /* anonymous struct used by BPF_*_GET_*_ID */
* Grow or shrink the room for data in the packet associated to
* *skb* by *len_diff*, and according to the selected *mode*.
*
- * There is a single supported mode at this time:
+ * There are two supported modes at this time:
+ *
+ * * **BPF_ADJ_ROOM_MAC**: Adjust room at the mac layer
+ * (room space is added or removed below the layer 2 header).
*
* * **BPF_ADJ_ROOM_NET**: Adjust room at the network layer
* (room space is added or removed below the layer 3 header).
*
- * All values for *flags* are reserved for future usage, and must
- * be left at zero.
+ * The following flags are supported at this time:
+ *
+ * * **BPF_F_ADJ_ROOM_FIXED_GSO**: Do not adjust gso_size.
+ * Adjusting mss in this way is not allowed for datagrams.
+ *
+ * * **BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 **:
+ * * **BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 **:
+ * Any new space is reserved to hold a tunnel header.
+ * Configure skb offsets and other fields accordingly.
+ *
+ * * **BPF_F_ADJ_ROOM_ENCAP_L4_GRE **:
+ * * **BPF_F_ADJ_ROOM_ENCAP_L4_UDP **:
+ * Use with ENCAP_L3 flags to further specify the tunnel type.
+ *
+ * * **BPF_F_ADJ_ROOM_ENCAP_L2(len) **:
+ * Use with ENCAP_L3/L4 flags to further specify the tunnel
+ * type; **len** is the length of the inner MAC header.
*
* A call to this helper is susceptible to change the underlaying
* packet buffer. Therefore, at load time, all checks on pointers
* Return
* A **struct bpf_sock** pointer on success, or **NULL** in
* case of failure.
+ *
+ * struct bpf_sock *bpf_skc_lookup_tcp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u64 netns, u64 flags)
+ * Description
+ * Look for TCP socket matching *tuple*, optionally in a child
+ * network namespace *netns*. The return value must be checked,
+ * and if non-**NULL**, released via **bpf_sk_release**\ ().
+ *
+ * This function is identical to bpf_sk_lookup_tcp, except that it
+ * also returns timewait or request sockets. Use bpf_sk_fullsock
+ * or bpf_tcp_socket to access the full structure.
+ *
+ * This helper is available only if the kernel was compiled with
+ * **CONFIG_NET** configuration option.
+ * Return
+ * Pointer to **struct bpf_sock**, or **NULL** in case of failure.
+ * For sockets with reuseport option, the **struct bpf_sock**
+ * result is from **reuse->socks**\ [] using the hash of the tuple.
+ *
+ * int bpf_tcp_check_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
+ * Description
+ * Check whether iph and th contain a valid SYN cookie ACK for
+ * the listening socket in sk.
+ *
+ * iph points to the start of the IPv4 or IPv6 header, while
+ * iph_len contains sizeof(struct iphdr) or sizeof(struct ip6hdr).
+ *
+ * th points to the start of the TCP header, while th_len contains
+ * sizeof(struct tcphdr).
+ *
+ * Return
+ * 0 if iph and th are a valid SYN cookie ACK, or a negative error
+ * otherwise.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
FN(sk_fullsock), \
FN(tcp_sock), \
FN(skb_ecn_set_ce), \
- FN(get_listener_sock),
+ FN(get_listener_sock), \
+ FN(skc_lookup_tcp), \
+ FN(tcp_check_syncookie),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
/* Current network namespace */
#define BPF_F_CURRENT_NETNS (-1L)
+/* BPF_FUNC_skb_adjust_room flags. */
+#define BPF_F_ADJ_ROOM_FIXED_GSO (1ULL << 0)
+
+#define BPF_ADJ_ROOM_ENCAP_L2_MASK 0xff
+#define BPF_ADJ_ROOM_ENCAP_L2_SHIFT 56
+
+#define BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 (1ULL << 1)
+#define BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 (1ULL << 2)
+#define BPF_F_ADJ_ROOM_ENCAP_L4_GRE (1ULL << 3)
+#define BPF_F_ADJ_ROOM_ENCAP_L4_UDP (1ULL << 4)
+#define BPF_F_ADJ_ROOM_ENCAP_L2(len) (((__u64)len & \
+ BPF_ADJ_ROOM_ENCAP_L2_MASK) \
+ << BPF_ADJ_ROOM_ENCAP_L2_SHIFT)
+
/* Mode for BPF_FUNC_skb_adjust_room helper. */
enum bpf_adj_room_mode {
BPF_ADJ_ROOM_NET,
+ BPF_ADJ_ROOM_MAC,
};
/* Mode for BPF_FUNC_skb_load_bytes_relative helper. */
* struct, union and fwd
*/
__u32 info;
- /* "size" is used by INT, ENUM, STRUCT and UNION.
+ /* "size" is used by INT, ENUM, STRUCT, UNION and DATASEC.
* "size" tells the size of the type it is describing.
*
* "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
- * FUNC and FUNC_PROTO.
+ * FUNC, FUNC_PROTO and VAR.
* "type" is a type_id referring to another type.
*/
union {
#define BTF_KIND_RESTRICT 11 /* Restrict */
#define BTF_KIND_FUNC 12 /* Function */
#define BTF_KIND_FUNC_PROTO 13 /* Function Proto */
-#define BTF_KIND_MAX 13
-#define NR_BTF_KINDS 14
+#define BTF_KIND_VAR 14 /* Variable */
+#define BTF_KIND_DATASEC 15 /* Section */
+#define BTF_KIND_MAX BTF_KIND_DATASEC
+#define NR_BTF_KINDS (BTF_KIND_MAX + 1)
/* For some specific BTF_KIND, "struct btf_type" is immediately
* followed by extra data.
__u32 type;
};
+enum {
+ BTF_VAR_STATIC = 0,
+ BTF_VAR_GLOBAL_ALLOCATED,
+};
+
+/* BTF_KIND_VAR is followed by a single "struct btf_var" to describe
+ * additional information related to the variable such as its linkage.
+ */
+struct btf_var {
+ __u32 linkage;
+};
+
+/* BTF_KIND_DATASEC is followed by multiple "struct btf_var_secinfo"
+ * to describe all BTF_KIND_VAR types it contains along with it's
+ * in-section offset as well as size.
+ */
+struct btf_var_secinfo {
+ __u32 type;
+ __u32 offset;
+ __u32 size;
+};
+
#endif /* _UAPI__LINUX_BTF_H__ */
libbpf_version.h
+libbpf.pc
FEATURE-DUMP.libbpf
test_libbpf
BPF_VERSION = 0
BPF_PATCHLEVEL = 0
-BPF_EXTRAVERSION = 2
+BPF_EXTRAVERSION = 3
MAKEFLAGS += --no-print-directory
LIB_TARGET = libbpf.a libbpf.so.$(LIBBPF_VERSION)
LIB_FILE = libbpf.a libbpf.so*
+PC_FILE = libbpf.pc
# Set compile option CFLAGS
ifdef EXTRA_CFLAGS
LIB_TARGET := $(addprefix $(OUTPUT),$(LIB_TARGET))
LIB_FILE := $(addprefix $(OUTPUT),$(LIB_FILE))
+PC_FILE := $(addprefix $(OUTPUT),$(PC_FILE))
GLOBAL_SYM_COUNT = $(shell readelf -s --wide $(BPF_IN) | \
awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {s++} END{print s}')
VERSIONED_SYM_COUNT = $(shell readelf -s --wide $(OUTPUT)libbpf.so | \
grep -Eo '[^ ]+@LIBBPF_' | cut -d@ -f1 | sort -u | wc -l)
-CMD_TARGETS = $(LIB_TARGET)
+CMD_TARGETS = $(LIB_TARGET) $(PC_FILE)
CXX_TEST_TARGET = $(OUTPUT)test_libbpf
$(OUTPUT)test_libbpf: test_libbpf.cpp $(OUTPUT)libbpf.a
$(QUIET_LINK)$(CXX) $(INCLUDES) $^ -lelf -o $@
+$(OUTPUT)libbpf.pc:
+ $(QUIET_GEN)sed -e "s|@PREFIX@|$(prefix)|" \
+ -e "s|@LIBDIR@|$(libdir_SQ)|" \
+ -e "s|@VERSION@|$(LIBBPF_VERSION)|" \
+ < libbpf.pc.template > $@
+
check: check_abi
check_abi: $(OUTPUT)libbpf.so
$(call do_install,btf.h,$(prefix)/include/bpf,644); \
$(call do_install,xsk.h,$(prefix)/include/bpf,644);
-install: install_lib
+install_pkgconfig: $(PC_FILE)
+ $(call QUIET_INSTALL, $(PC_FILE)) \
+ $(call do_install,$(PC_FILE),$(libdir_SQ)/pkgconfig,644)
+
+install: install_lib install_pkgconfig
### Cleaning rules
clean:
$(call QUIET_CLEAN, libbpf) $(RM) $(TARGETS) $(CXX_TEST_TARGET) \
- *.o *~ *.a *.so *.so.$(VERSION) .*.d .*.cmd LIBBPF-CFLAGS
+ *.o *~ *.a *.so *.so.$(VERSION) .*.d .*.cmd *.pc LIBBPF-CFLAGS
$(call QUIET_CLEAN, core-gen) $(RM) $(OUTPUT)FEATURE-DUMP.libbpf
int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
{
- __u32 name_len = create_attr->name ? strlen(create_attr->name) : 0;
union bpf_attr attr;
memset(&attr, '\0', sizeof(attr));
attr.value_size = create_attr->value_size;
attr.max_entries = create_attr->max_entries;
attr.map_flags = create_attr->map_flags;
- memcpy(attr.map_name, create_attr->name,
- min(name_len, BPF_OBJ_NAME_LEN - 1));
+ if (create_attr->name)
+ memcpy(attr.map_name, create_attr->name,
+ min(strlen(create_attr->name), BPF_OBJ_NAME_LEN - 1));
attr.numa_node = create_attr->numa_node;
attr.btf_fd = create_attr->btf_fd;
attr.btf_key_type_id = create_attr->btf_key_type_id;
int key_size, int inner_map_fd, int max_entries,
__u32 map_flags, int node)
{
- __u32 name_len = name ? strlen(name) : 0;
union bpf_attr attr;
memset(&attr, '\0', sizeof(attr));
attr.inner_map_fd = inner_map_fd;
attr.max_entries = max_entries;
attr.map_flags = map_flags;
- memcpy(attr.map_name, name, min(name_len, BPF_OBJ_NAME_LEN - 1));
+ if (name)
+ memcpy(attr.map_name, name,
+ min(strlen(name), BPF_OBJ_NAME_LEN - 1));
if (node >= 0) {
attr.map_flags |= BPF_F_NUMA_NODE;
void *finfo = NULL, *linfo = NULL;
union bpf_attr attr;
__u32 log_level;
- __u32 name_len;
int fd;
if (!load_attr || !log_buf != !log_buf_sz)
return -EINVAL;
log_level = load_attr->log_level;
- if (log_level > 2 || (log_level && !log_buf))
+ if (log_level > (4 | 2 | 1) || (log_level && !log_buf))
return -EINVAL;
- name_len = load_attr->name ? strlen(load_attr->name) : 0;
-
memset(&attr, 0, sizeof(attr));
attr.prog_type = load_attr->prog_type;
attr.expected_attach_type = load_attr->expected_attach_type;
attr.line_info_rec_size = load_attr->line_info_rec_size;
attr.line_info_cnt = load_attr->line_info_cnt;
attr.line_info = ptr_to_u64(load_attr->line_info);
- memcpy(attr.prog_name, load_attr->name,
- min(name_len, BPF_OBJ_NAME_LEN - 1));
+ if (load_attr->name)
+ memcpy(attr.prog_name, load_attr->name,
+ min(strlen(load_attr->name), BPF_OBJ_NAME_LEN - 1));
fd = sys_bpf_prog_load(&attr, sizeof(attr));
if (fd >= 0)
return sys_bpf(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
}
+int bpf_map_freeze(int fd)
+{
+ union bpf_attr attr;
+
+ memset(&attr, 0, sizeof(attr));
+ attr.map_fd = fd;
+
+ return sys_bpf(BPF_MAP_FREEZE, &attr, sizeof(attr));
+}
+
int bpf_obj_pin(int fd, const char *pathname)
{
union bpf_attr attr;
attr.test.data_out = ptr_to_u64(test_attr->data_out);
attr.test.data_size_in = test_attr->data_size_in;
attr.test.data_size_out = test_attr->data_size_out;
+ attr.test.ctx_in = ptr_to_u64(test_attr->ctx_in);
+ attr.test.ctx_out = ptr_to_u64(test_attr->ctx_out);
+ attr.test.ctx_size_in = test_attr->ctx_size_in;
+ attr.test.ctx_size_out = test_attr->ctx_size_out;
attr.test.repeat = test_attr->repeat;
ret = sys_bpf(BPF_PROG_TEST_RUN, &attr, sizeof(attr));
test_attr->data_size_out = attr.test.data_size_out;
+ test_attr->ctx_size_out = attr.test.ctx_size_out;
test_attr->retval = attr.test.retval;
test_attr->duration = attr.test.duration;
return ret;
#define MAPS_RELAX_COMPAT 0x01
/* Recommend log buffer size */
-#define BPF_LOG_BUF_SIZE (256 * 1024)
+#define BPF_LOG_BUF_SIZE (16 * 1024 * 1024) /* verifier maximum in kernels <= 5.1 */
LIBBPF_API int
bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
char *log_buf, size_t log_buf_sz);
void *value);
LIBBPF_API int bpf_map_delete_elem(int fd, const void *key);
LIBBPF_API int bpf_map_get_next_key(int fd, const void *key, void *next_key);
+LIBBPF_API int bpf_map_freeze(int fd);
LIBBPF_API int bpf_obj_pin(int fd, const char *pathname);
LIBBPF_API int bpf_obj_get(const char *pathname);
LIBBPF_API int bpf_prog_attach(int prog_fd, int attachable_fd,
* out: length of data_out */
__u32 retval; /* out: return code of the BPF program */
__u32 duration; /* out: average per repetition in ns */
+ const void *ctx_in; /* optional */
+ __u32 ctx_size_in;
+ void *ctx_out; /* optional */
+ __u32 ctx_size_out; /* in: max length of ctx_out
+ * out: length of cxt_out */
};
LIBBPF_API int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr);
((k) == BTF_KIND_CONST) || \
((k) == BTF_KIND_RESTRICT))
+#define IS_VAR(k) ((k) == BTF_KIND_VAR)
+
static struct btf_type btf_void;
struct btf {
return base_size + vlen * sizeof(struct btf_member);
case BTF_KIND_FUNC_PROTO:
return base_size + vlen * sizeof(struct btf_param);
+ case BTF_KIND_VAR:
+ return base_size + sizeof(struct btf_var);
+ case BTF_KIND_DATASEC:
+ return base_size + vlen * sizeof(struct btf_var_secinfo);
default:
pr_debug("Unsupported BTF_KIND:%u\n", BTF_INFO_KIND(t->info));
return -EINVAL;
case BTF_KIND_STRUCT:
case BTF_KIND_UNION:
case BTF_KIND_ENUM:
+ case BTF_KIND_DATASEC:
size = t->size;
goto done;
case BTF_KIND_PTR:
case BTF_KIND_VOLATILE:
case BTF_KIND_CONST:
case BTF_KIND_RESTRICT:
+ case BTF_KIND_VAR:
type_id = t->type;
break;
case BTF_KIND_ARRAY:
t = btf__type_by_id(btf, type_id);
while (depth < MAX_RESOLVE_DEPTH &&
!btf_type_is_void_or_null(t) &&
- IS_MODIFIER(BTF_INFO_KIND(t->info))) {
+ (IS_MODIFIER(BTF_INFO_KIND(t->info)) ||
+ IS_VAR(BTF_INFO_KIND(t->info)))) {
type_id = t->type;
t = btf__type_by_id(btf, type_id);
depth++;
return btf;
}
+static int compare_vsi_off(const void *_a, const void *_b)
+{
+ const struct btf_var_secinfo *a = _a;
+ const struct btf_var_secinfo *b = _b;
+
+ return a->offset - b->offset;
+}
+
+static int btf_fixup_datasec(struct bpf_object *obj, struct btf *btf,
+ struct btf_type *t)
+{
+ __u32 size = 0, off = 0, i, vars = BTF_INFO_VLEN(t->info);
+ const char *name = btf__name_by_offset(btf, t->name_off);
+ const struct btf_type *t_var;
+ struct btf_var_secinfo *vsi;
+ struct btf_var *var;
+ int ret;
+
+ if (!name) {
+ pr_debug("No name found in string section for DATASEC kind.\n");
+ return -ENOENT;
+ }
+
+ ret = bpf_object__section_size(obj, name, &size);
+ if (ret || !size || (t->size && t->size != size)) {
+ pr_debug("Invalid size for section %s: %u bytes\n", name, size);
+ return -ENOENT;
+ }
+
+ t->size = size;
+
+ for (i = 0, vsi = (struct btf_var_secinfo *)(t + 1);
+ i < vars; i++, vsi++) {
+ t_var = btf__type_by_id(btf, vsi->type);
+ var = (struct btf_var *)(t_var + 1);
+
+ if (BTF_INFO_KIND(t_var->info) != BTF_KIND_VAR) {
+ pr_debug("Non-VAR type seen in section %s\n", name);
+ return -EINVAL;
+ }
+
+ if (var->linkage == BTF_VAR_STATIC)
+ continue;
+
+ name = btf__name_by_offset(btf, t_var->name_off);
+ if (!name) {
+ pr_debug("No name found in string section for VAR kind\n");
+ return -ENOENT;
+ }
+
+ ret = bpf_object__variable_offset(obj, name, &off);
+ if (ret) {
+ pr_debug("No offset found in symbol table for VAR %s\n", name);
+ return -ENOENT;
+ }
+
+ vsi->offset = off;
+ }
+
+ qsort(t + 1, vars, sizeof(*vsi), compare_vsi_off);
+ return 0;
+}
+
+int btf__finalize_data(struct bpf_object *obj, struct btf *btf)
+{
+ int err = 0;
+ __u32 i;
+
+ for (i = 1; i <= btf->nr_types; i++) {
+ struct btf_type *t = btf->types[i];
+
+ /* Loader needs to fix up some of the things compiler
+ * couldn't get its hands on while emitting BTF. This
+ * is section size and global variable offset. We use
+ * the info from the ELF itself for this purpose.
+ */
+ if (BTF_INFO_KIND(t->info) == BTF_KIND_DATASEC) {
+ err = btf_fixup_datasec(obj, btf, t);
+ if (err)
+ break;
+ }
+ }
+
+ return err;
+}
+
int btf__load(struct btf *btf)
{
__u32 log_buf_size = BPF_LOG_BUF_SIZE;
struct btf_ext;
struct btf_type;
+struct bpf_object;
+
/*
* The .BTF.ext ELF section layout defined as
* struct btf_ext_header
LIBBPF_API void btf__free(struct btf *btf);
LIBBPF_API struct btf *btf__new(__u8 *data, __u32 size);
+LIBBPF_API int btf__finalize_data(struct bpf_object *obj, struct btf *btf);
LIBBPF_API int btf__load(struct btf *btf);
LIBBPF_API __s32 btf__find_by_name(const struct btf *btf,
const char *type_name);
* Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
* Copyright (C) 2015 Huawei Inc.
* Copyright (C) 2017 Nicira, Inc.
+ * Copyright (C) 2019 Isovalent, Inc.
*/
#ifndef _GNU_SOURCE
#define BPF_FS_MAGIC 0xcafe4a11
#endif
+/* vsprintf() in __base_pr() uses nonliteral format string. It may break
+ * compilation if user enables corresponding warning. Disable it explicitly.
+ */
+#pragma GCC diagnostic ignored "-Wformat-nonliteral"
+
#define __printf(a, b) __attribute__((format(printf, a, b)))
static int __base_pr(enum libbpf_print_level level, const char *format,
enum {
RELO_LD64,
RELO_CALL,
+ RELO_DATA,
} type;
int insn_idx;
union {
};
} *reloc_desc;
int nr_reloc;
+ int log_level;
struct {
int nr;
__u32 line_info_cnt;
};
+enum libbpf_map_type {
+ LIBBPF_MAP_UNSPEC,
+ LIBBPF_MAP_DATA,
+ LIBBPF_MAP_BSS,
+ LIBBPF_MAP_RODATA,
+};
+
+static const char * const libbpf_type_to_btf_name[] = {
+ [LIBBPF_MAP_DATA] = ".data",
+ [LIBBPF_MAP_BSS] = ".bss",
+ [LIBBPF_MAP_RODATA] = ".rodata",
+};
+
struct bpf_map {
int fd;
char *name;
__u32 btf_value_type_id;
void *priv;
bpf_map_clear_priv_t clear_priv;
+ enum libbpf_map_type libbpf_type;
+};
+
+struct bpf_secdata {
+ void *rodata;
+ void *data;
};
static LIST_HEAD(bpf_objects_list);
struct bpf_object {
+ char name[BPF_OBJ_NAME_LEN];
char license[64];
__u32 kern_version;
size_t nr_programs;
struct bpf_map *maps;
size_t nr_maps;
+ struct bpf_secdata sections;
bool loaded;
bool has_pseudo_calls;
Elf *elf;
GElf_Ehdr ehdr;
Elf_Data *symbols;
+ Elf_Data *data;
+ Elf_Data *rodata;
+ Elf_Data *bss;
size_t strtabidx;
struct {
GElf_Shdr shdr;
int nr_reloc;
int maps_shndx;
int text_shndx;
+ int data_shndx;
+ int rodata_shndx;
+ int bss_shndx;
} efile;
/*
* All loaded bpf_object is linked in a list, which is
size_t obj_buf_sz)
{
struct bpf_object *obj;
+ char *end;
obj = calloc(1, sizeof(struct bpf_object) + strlen(path) + 1);
if (!obj) {
}
strcpy(obj->path, path);
- obj->efile.fd = -1;
+ /* Using basename() GNU version which doesn't modify arg. */
+ strncpy(obj->name, basename((void *)path),
+ sizeof(obj->name) - 1);
+ end = strchr(obj->name, '.');
+ if (end)
+ *end = 0;
+ obj->efile.fd = -1;
/*
* Caller of this function should also calls
* bpf_object__elf_finish() after data collection to return
obj->efile.obj_buf = obj_buf;
obj->efile.obj_buf_sz = obj_buf_sz;
obj->efile.maps_shndx = -1;
+ obj->efile.data_shndx = -1;
+ obj->efile.rodata_shndx = -1;
+ obj->efile.bss_shndx = -1;
obj->loaded = false;
obj->efile.elf = NULL;
}
obj->efile.symbols = NULL;
+ obj->efile.data = NULL;
+ obj->efile.rodata = NULL;
+ obj->efile.bss = NULL;
zfree(&obj->efile.reloc);
obj->efile.nr_reloc = 0;
return false;
}
+static int bpf_object_search_section_size(const struct bpf_object *obj,
+ const char *name, size_t *d_size)
+{
+ const GElf_Ehdr *ep = &obj->efile.ehdr;
+ Elf *elf = obj->efile.elf;
+ Elf_Scn *scn = NULL;
+ int idx = 0;
+
+ while ((scn = elf_nextscn(elf, scn)) != NULL) {
+ const char *sec_name;
+ Elf_Data *data;
+ GElf_Shdr sh;
+
+ idx++;
+ if (gelf_getshdr(scn, &sh) != &sh) {
+ pr_warning("failed to get section(%d) header from %s\n",
+ idx, obj->path);
+ return -EIO;
+ }
+
+ sec_name = elf_strptr(elf, ep->e_shstrndx, sh.sh_name);
+ if (!sec_name) {
+ pr_warning("failed to get section(%d) name from %s\n",
+ idx, obj->path);
+ return -EIO;
+ }
+
+ if (strcmp(name, sec_name))
+ continue;
+
+ data = elf_getdata(scn, 0);
+ if (!data) {
+ pr_warning("failed to get section(%d) data from %s(%s)\n",
+ idx, name, obj->path);
+ return -EIO;
+ }
+
+ *d_size = data->d_size;
+ return 0;
+ }
+
+ return -ENOENT;
+}
+
+int bpf_object__section_size(const struct bpf_object *obj, const char *name,
+ __u32 *size)
+{
+ int ret = -ENOENT;
+ size_t d_size;
+
+ *size = 0;
+ if (!name) {
+ return -EINVAL;
+ } else if (!strcmp(name, ".data")) {
+ if (obj->efile.data)
+ *size = obj->efile.data->d_size;
+ } else if (!strcmp(name, ".bss")) {
+ if (obj->efile.bss)
+ *size = obj->efile.bss->d_size;
+ } else if (!strcmp(name, ".rodata")) {
+ if (obj->efile.rodata)
+ *size = obj->efile.rodata->d_size;
+ } else {
+ ret = bpf_object_search_section_size(obj, name, &d_size);
+ if (!ret)
+ *size = d_size;
+ }
+
+ return *size ? 0 : ret;
+}
+
+int bpf_object__variable_offset(const struct bpf_object *obj, const char *name,
+ __u32 *off)
+{
+ Elf_Data *symbols = obj->efile.symbols;
+ const char *sname;
+ size_t si;
+
+ if (!name || !off)
+ return -EINVAL;
+
+ for (si = 0; si < symbols->d_size / sizeof(GElf_Sym); si++) {
+ GElf_Sym sym;
+
+ if (!gelf_getsym(symbols, si, &sym))
+ continue;
+ if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL ||
+ GELF_ST_TYPE(sym.st_info) != STT_OBJECT)
+ continue;
+
+ sname = elf_strptr(obj->efile.elf, obj->efile.strtabidx,
+ sym.st_name);
+ if (!sname) {
+ pr_warning("failed to get sym name string for var %s\n",
+ name);
+ return -EIO;
+ }
+ if (strcmp(name, sname) == 0) {
+ *off = sym.st_value;
+ return 0;
+ }
+ }
+
+ return -ENOENT;
+}
+
+static bool bpf_object__has_maps(const struct bpf_object *obj)
+{
+ return obj->efile.maps_shndx >= 0 ||
+ obj->efile.data_shndx >= 0 ||
+ obj->efile.rodata_shndx >= 0 ||
+ obj->efile.bss_shndx >= 0;
+}
+
+static int
+bpf_object__init_internal_map(struct bpf_object *obj, struct bpf_map *map,
+ enum libbpf_map_type type, Elf_Data *data,
+ void **data_buff)
+{
+ struct bpf_map_def *def = &map->def;
+ char map_name[BPF_OBJ_NAME_LEN];
+
+ map->libbpf_type = type;
+ map->offset = ~(typeof(map->offset))0;
+ snprintf(map_name, sizeof(map_name), "%.8s%.7s", obj->name,
+ libbpf_type_to_btf_name[type]);
+ map->name = strdup(map_name);
+ if (!map->name) {
+ pr_warning("failed to alloc map name\n");
+ return -ENOMEM;
+ }
+
+ def->type = BPF_MAP_TYPE_ARRAY;
+ def->key_size = sizeof(int);
+ def->value_size = data->d_size;
+ def->max_entries = 1;
+ def->map_flags = type == LIBBPF_MAP_RODATA ?
+ BPF_F_RDONLY_PROG : 0;
+ if (data_buff) {
+ *data_buff = malloc(data->d_size);
+ if (!*data_buff) {
+ zfree(&map->name);
+ pr_warning("failed to alloc map content buffer\n");
+ return -ENOMEM;
+ }
+ memcpy(*data_buff, data->d_buf, data->d_size);
+ }
+
+ pr_debug("map %ld is \"%s\"\n", map - obj->maps, map->name);
+ return 0;
+}
+
static int
bpf_object__init_maps(struct bpf_object *obj, int flags)
{
+ int i, map_idx, map_def_sz = 0, nr_syms, nr_maps = 0, nr_maps_glob = 0;
bool strict = !(flags & MAPS_RELAX_COMPAT);
- int i, map_idx, map_def_sz, nr_maps = 0;
- Elf_Scn *scn;
- Elf_Data *data = NULL;
Elf_Data *symbols = obj->efile.symbols;
+ Elf_Data *data = NULL;
+ int ret = 0;
- if (obj->efile.maps_shndx < 0)
- return -EINVAL;
if (!symbols)
return -EINVAL;
+ nr_syms = symbols->d_size / sizeof(GElf_Sym);
- scn = elf_getscn(obj->efile.elf, obj->efile.maps_shndx);
- if (scn)
- data = elf_getdata(scn, NULL);
- if (!scn || !data) {
- pr_warning("failed to get Elf_Data from map section %d\n",
- obj->efile.maps_shndx);
- return -EINVAL;
+ if (obj->efile.maps_shndx >= 0) {
+ Elf_Scn *scn = elf_getscn(obj->efile.elf,
+ obj->efile.maps_shndx);
+
+ if (scn)
+ data = elf_getdata(scn, NULL);
+ if (!scn || !data) {
+ pr_warning("failed to get Elf_Data from map section %d\n",
+ obj->efile.maps_shndx);
+ return -EINVAL;
+ }
}
/*
*
* TODO: Detect array of map and report error.
*/
- for (i = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) {
+ if (obj->efile.data_shndx >= 0)
+ nr_maps_glob++;
+ if (obj->efile.rodata_shndx >= 0)
+ nr_maps_glob++;
+ if (obj->efile.bss_shndx >= 0)
+ nr_maps_glob++;
+ for (i = 0; data && i < nr_syms; i++) {
GElf_Sym sym;
if (!gelf_getsym(symbols, i, &sym))
/* Alloc obj->maps and fill nr_maps. */
pr_debug("maps in %s: %d maps in %zd bytes\n", obj->path,
nr_maps, data->d_size);
-
- if (!nr_maps)
+ if (!nr_maps && !nr_maps_glob)
return 0;
/* Assume equally sized map definitions */
- map_def_sz = data->d_size / nr_maps;
- if (!data->d_size || (data->d_size % nr_maps) != 0) {
- pr_warning("unable to determine map definition size "
- "section %s, %d maps in %zd bytes\n",
- obj->path, nr_maps, data->d_size);
- return -EINVAL;
+ if (data) {
+ map_def_sz = data->d_size / nr_maps;
+ if (!data->d_size || (data->d_size % nr_maps) != 0) {
+ pr_warning("unable to determine map definition size "
+ "section %s, %d maps in %zd bytes\n",
+ obj->path, nr_maps, data->d_size);
+ return -EINVAL;
+ }
}
+ nr_maps += nr_maps_glob;
obj->maps = calloc(nr_maps, sizeof(obj->maps[0]));
if (!obj->maps) {
pr_warning("alloc maps for object failed\n");
/*
* Fill obj->maps using data in "maps" section.
*/
- for (i = 0, map_idx = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) {
+ for (i = 0, map_idx = 0; data && i < nr_syms; i++) {
GElf_Sym sym;
const char *map_name;
struct bpf_map_def *def;
map_name = elf_strptr(obj->efile.elf,
obj->efile.strtabidx,
sym.st_name);
+
+ obj->maps[map_idx].libbpf_type = LIBBPF_MAP_UNSPEC;
obj->maps[map_idx].offset = sym.st_value;
if (sym.st_value + map_def_sz > data->d_size) {
pr_warning("corrupted maps section in %s: last map \"%s\" too small\n",
map_idx++;
}
- qsort(obj->maps, obj->nr_maps, sizeof(obj->maps[0]), compare_bpf_map);
- return 0;
+ /*
+ * Populate rest of obj->maps with libbpf internal maps.
+ */
+ if (obj->efile.data_shndx >= 0)
+ ret = bpf_object__init_internal_map(obj, &obj->maps[map_idx++],
+ LIBBPF_MAP_DATA,
+ obj->efile.data,
+ &obj->sections.data);
+ if (!ret && obj->efile.rodata_shndx >= 0)
+ ret = bpf_object__init_internal_map(obj, &obj->maps[map_idx++],
+ LIBBPF_MAP_RODATA,
+ obj->efile.rodata,
+ &obj->sections.rodata);
+ if (!ret && obj->efile.bss_shndx >= 0)
+ ret = bpf_object__init_internal_map(obj, &obj->maps[map_idx++],
+ LIBBPF_MAP_BSS,
+ obj->efile.bss, NULL);
+ if (!ret)
+ qsort(obj->maps, obj->nr_maps, sizeof(obj->maps[0]),
+ compare_bpf_map);
+ return ret;
}
static bool section_have_execinstr(struct bpf_object *obj, int idx)
Elf *elf = obj->efile.elf;
GElf_Ehdr *ep = &obj->efile.ehdr;
Elf_Data *btf_ext_data = NULL;
+ Elf_Data *btf_data = NULL;
Elf_Scn *scn = NULL;
int idx = 0, err = 0;
(int)sh.sh_link, (unsigned long)sh.sh_flags,
(int)sh.sh_type);
- if (strcmp(name, "license") == 0)
+ if (strcmp(name, "license") == 0) {
err = bpf_object__init_license(obj,
data->d_buf,
data->d_size);
- else if (strcmp(name, "version") == 0)
+ } else if (strcmp(name, "version") == 0) {
err = bpf_object__init_kversion(obj,
data->d_buf,
data->d_size);
- else if (strcmp(name, "maps") == 0)
+ } else if (strcmp(name, "maps") == 0) {
obj->efile.maps_shndx = idx;
- else if (strcmp(name, BTF_ELF_SEC) == 0) {
- obj->btf = btf__new(data->d_buf, data->d_size);
- if (IS_ERR(obj->btf)) {
- pr_warning("Error loading ELF section %s: %ld. Ignored and continue.\n",
- BTF_ELF_SEC, PTR_ERR(obj->btf));
- obj->btf = NULL;
- continue;
- }
- err = btf__load(obj->btf);
- if (err) {
- pr_warning("Error loading %s into kernel: %d. Ignored and continue.\n",
- BTF_ELF_SEC, err);
- btf__free(obj->btf);
- obj->btf = NULL;
- err = 0;
- }
+ } else if (strcmp(name, BTF_ELF_SEC) == 0) {
+ btf_data = data;
} else if (strcmp(name, BTF_EXT_ELF_SEC) == 0) {
btf_ext_data = data;
} else if (sh.sh_type == SHT_SYMTAB) {
obj->efile.symbols = data;
obj->efile.strtabidx = sh.sh_link;
}
- } else if ((sh.sh_type == SHT_PROGBITS) &&
- (sh.sh_flags & SHF_EXECINSTR) &&
- (data->d_size > 0)) {
- if (strcmp(name, ".text") == 0)
- obj->efile.text_shndx = idx;
- err = bpf_object__add_program(obj, data->d_buf,
- data->d_size, name, idx);
- if (err) {
- char errmsg[STRERR_BUFSIZE];
- char *cp = libbpf_strerror_r(-err, errmsg,
- sizeof(errmsg));
-
- pr_warning("failed to alloc program %s (%s): %s",
- name, obj->path, cp);
+ } else if (sh.sh_type == SHT_PROGBITS && data->d_size > 0) {
+ if (sh.sh_flags & SHF_EXECINSTR) {
+ if (strcmp(name, ".text") == 0)
+ obj->efile.text_shndx = idx;
+ err = bpf_object__add_program(obj, data->d_buf,
+ data->d_size, name, idx);
+ if (err) {
+ char errmsg[STRERR_BUFSIZE];
+ char *cp = libbpf_strerror_r(-err, errmsg,
+ sizeof(errmsg));
+
+ pr_warning("failed to alloc program %s (%s): %s",
+ name, obj->path, cp);
+ }
+ } else if (strcmp(name, ".data") == 0) {
+ obj->efile.data = data;
+ obj->efile.data_shndx = idx;
+ } else if (strcmp(name, ".rodata") == 0) {
+ obj->efile.rodata = data;
+ obj->efile.rodata_shndx = idx;
+ } else {
+ pr_debug("skip section(%d) %s\n", idx, name);
}
} else if (sh.sh_type == SHT_REL) {
void *reloc = obj->efile.reloc;
obj->efile.reloc[n].shdr = sh;
obj->efile.reloc[n].data = data;
}
+ } else if (sh.sh_type == SHT_NOBITS && strcmp(name, ".bss") == 0) {
+ obj->efile.bss = data;
+ obj->efile.bss_shndx = idx;
} else {
pr_debug("skip section(%d) %s\n", idx, name);
}
pr_warning("Corrupted ELF file: index of strtab invalid\n");
return LIBBPF_ERRNO__FORMAT;
}
+ if (btf_data) {
+ obj->btf = btf__new(btf_data->d_buf, btf_data->d_size);
+ if (IS_ERR(obj->btf)) {
+ pr_warning("Error loading ELF section %s: %ld. Ignored and continue.\n",
+ BTF_ELF_SEC, PTR_ERR(obj->btf));
+ obj->btf = NULL;
+ } else {
+ err = btf__finalize_data(obj, obj->btf);
+ if (!err)
+ err = btf__load(obj->btf);
+ if (err) {
+ pr_warning("Error finalizing and loading %s into kernel: %d. Ignored and continue.\n",
+ BTF_ELF_SEC, err);
+ btf__free(obj->btf);
+ obj->btf = NULL;
+ err = 0;
+ }
+ }
+ }
if (btf_ext_data) {
if (!obj->btf) {
pr_debug("Ignore ELF section %s because its depending ELF section %s is not found.\n",
}
}
}
- if (obj->efile.maps_shndx >= 0) {
+ if (bpf_object__has_maps(obj)) {
err = bpf_object__init_maps(obj, flags);
if (err)
goto out;
return NULL;
}
+static bool bpf_object__shndx_is_data(const struct bpf_object *obj,
+ int shndx)
+{
+ return shndx == obj->efile.data_shndx ||
+ shndx == obj->efile.bss_shndx ||
+ shndx == obj->efile.rodata_shndx;
+}
+
+static bool bpf_object__shndx_is_maps(const struct bpf_object *obj,
+ int shndx)
+{
+ return shndx == obj->efile.maps_shndx;
+}
+
+static bool bpf_object__relo_in_known_section(const struct bpf_object *obj,
+ int shndx)
+{
+ return shndx == obj->efile.text_shndx ||
+ bpf_object__shndx_is_maps(obj, shndx) ||
+ bpf_object__shndx_is_data(obj, shndx);
+}
+
+static enum libbpf_map_type
+bpf_object__section_to_libbpf_map_type(const struct bpf_object *obj, int shndx)
+{
+ if (shndx == obj->efile.data_shndx)
+ return LIBBPF_MAP_DATA;
+ else if (shndx == obj->efile.bss_shndx)
+ return LIBBPF_MAP_BSS;
+ else if (shndx == obj->efile.rodata_shndx)
+ return LIBBPF_MAP_RODATA;
+ else
+ return LIBBPF_MAP_UNSPEC;
+}
+
static int
bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
Elf_Data *data, struct bpf_object *obj)
{
Elf_Data *symbols = obj->efile.symbols;
- int text_shndx = obj->efile.text_shndx;
- int maps_shndx = obj->efile.maps_shndx;
struct bpf_map *maps = obj->maps;
size_t nr_maps = obj->nr_maps;
int i, nrels;
GElf_Sym sym;
GElf_Rel rel;
unsigned int insn_idx;
+ unsigned int shdr_idx;
struct bpf_insn *insns = prog->insns;
+ enum libbpf_map_type type;
+ const char *name;
size_t map_idx;
if (!gelf_getrel(data, i, &rel)) {
GELF_R_SYM(rel.r_info));
return -LIBBPF_ERRNO__FORMAT;
}
- pr_debug("relo for %lld value %lld name %d\n",
+
+ name = elf_strptr(obj->efile.elf, obj->efile.strtabidx,
+ sym.st_name) ? : "<?>";
+
+ pr_debug("relo for %lld value %lld name %d (\'%s\')\n",
(long long) (rel.r_info >> 32),
- (long long) sym.st_value, sym.st_name);
+ (long long) sym.st_value, sym.st_name, name);
- if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
- pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",
- prog->section_name, sym.st_shndx);
+ shdr_idx = sym.st_shndx;
+ if (!bpf_object__relo_in_known_section(obj, shdr_idx)) {
+ pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
+ prog->section_name, shdr_idx);
return -LIBBPF_ERRNO__RELOC;
}
return -LIBBPF_ERRNO__RELOC;
}
- /* TODO: 'maps' is sorted. We can use bsearch to make it faster. */
- for (map_idx = 0; map_idx < nr_maps; map_idx++) {
- if (maps[map_idx].offset == sym.st_value) {
- pr_debug("relocation: find map %zd (%s) for insn %u\n",
- map_idx, maps[map_idx].name, insn_idx);
- break;
+ if (bpf_object__shndx_is_maps(obj, shdr_idx) ||
+ bpf_object__shndx_is_data(obj, shdr_idx)) {
+ type = bpf_object__section_to_libbpf_map_type(obj, shdr_idx);
+ if (type != LIBBPF_MAP_UNSPEC &&
+ GELF_ST_BIND(sym.st_info) == STB_GLOBAL) {
+ pr_warning("bpf: relocation: not yet supported relo for non-static global \'%s\' variable found in insns[%d].code 0x%x\n",
+ name, insn_idx, insns[insn_idx].code);
+ return -LIBBPF_ERRNO__RELOC;
}
- }
- if (map_idx >= nr_maps) {
- pr_warning("bpf relocation: map_idx %d large than %d\n",
- (int)map_idx, (int)nr_maps - 1);
- return -LIBBPF_ERRNO__RELOC;
- }
+ for (map_idx = 0; map_idx < nr_maps; map_idx++) {
+ if (maps[map_idx].libbpf_type != type)
+ continue;
+ if (type != LIBBPF_MAP_UNSPEC ||
+ (type == LIBBPF_MAP_UNSPEC &&
+ maps[map_idx].offset == sym.st_value)) {
+ pr_debug("relocation: find map %zd (%s) for insn %u\n",
+ map_idx, maps[map_idx].name, insn_idx);
+ break;
+ }
+ }
+
+ if (map_idx >= nr_maps) {
+ pr_warning("bpf relocation: map_idx %d large than %d\n",
+ (int)map_idx, (int)nr_maps - 1);
+ return -LIBBPF_ERRNO__RELOC;
+ }
- prog->reloc_desc[i].type = RELO_LD64;
- prog->reloc_desc[i].insn_idx = insn_idx;
- prog->reloc_desc[i].map_idx = map_idx;
+ prog->reloc_desc[i].type = type != LIBBPF_MAP_UNSPEC ?
+ RELO_DATA : RELO_LD64;
+ prog->reloc_desc[i].insn_idx = insn_idx;
+ prog->reloc_desc[i].map_idx = map_idx;
+ }
}
return 0;
}
static int bpf_map_find_btf_info(struct bpf_map *map, const struct btf *btf)
{
struct bpf_map_def *def = &map->def;
- __u32 key_type_id, value_type_id;
+ __u32 key_type_id = 0, value_type_id = 0;
int ret;
- ret = btf__get_map_kv_tids(btf, map->name, def->key_size,
- def->value_size, &key_type_id,
- &value_type_id);
- if (ret)
+ if (!bpf_map__is_internal(map)) {
+ ret = btf__get_map_kv_tids(btf, map->name, def->key_size,
+ def->value_size, &key_type_id,
+ &value_type_id);
+ } else {
+ /*
+ * LLVM annotates global data differently in BTF, that is,
+ * only as '.data', '.bss' or '.rodata'.
+ */
+ ret = btf__find_by_name(btf,
+ libbpf_type_to_btf_name[map->libbpf_type]);
+ }
+ if (ret < 0)
return ret;
map->btf_key_type_id = key_type_id;
- map->btf_value_type_id = value_type_id;
-
+ map->btf_value_type_id = bpf_map__is_internal(map) ?
+ ret : value_type_id;
return 0;
}
return bpf_object__probe_name(obj);
}
+static int
+bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map)
+{
+ char *cp, errmsg[STRERR_BUFSIZE];
+ int err, zero = 0;
+ __u8 *data;
+
+ /* Nothing to do here since kernel already zero-initializes .bss map. */
+ if (map->libbpf_type == LIBBPF_MAP_BSS)
+ return 0;
+
+ data = map->libbpf_type == LIBBPF_MAP_DATA ?
+ obj->sections.data : obj->sections.rodata;
+
+ err = bpf_map_update_elem(map->fd, &zero, data, 0);
+ /* Freeze .rodata map as read-only from syscall side. */
+ if (!err && map->libbpf_type == LIBBPF_MAP_RODATA) {
+ err = bpf_map_freeze(map->fd);
+ if (err) {
+ cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
+ pr_warning("Error freezing map(%s) as read-only: %s\n",
+ map->name, cp);
+ err = 0;
+ }
+ }
+ return err;
+}
+
static int
bpf_object__create_maps(struct bpf_object *obj)
{
size_t j;
err = *pfd;
+err_out:
cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
pr_warning("failed to create map (name: '%s'): %s\n",
map->name, cp);
zclose(obj->maps[j].fd);
return err;
}
+
+ if (bpf_map__is_internal(map)) {
+ err = bpf_object__populate_internal_map(obj, map);
+ if (err < 0) {
+ zclose(*pfd);
+ goto err_out;
+ }
+ }
+
pr_debug("create map %s: fd=%d\n", map->name, *pfd);
}
return 0;
for (i = 0; i < prog->nr_reloc; i++) {
- if (prog->reloc_desc[i].type == RELO_LD64) {
+ if (prog->reloc_desc[i].type == RELO_LD64 ||
+ prog->reloc_desc[i].type == RELO_DATA) {
+ bool relo_data = prog->reloc_desc[i].type == RELO_DATA;
struct bpf_insn *insns = prog->insns;
int insn_idx, map_idx;
insn_idx = prog->reloc_desc[i].insn_idx;
map_idx = prog->reloc_desc[i].map_idx;
- if (insn_idx >= (int)prog->insns_cnt) {
+ if (insn_idx + 1 >= (int)prog->insns_cnt) {
pr_warning("relocation out of range: '%s'\n",
prog->section_name);
return -LIBBPF_ERRNO__RELOC;
}
- insns[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
+
+ if (!relo_data) {
+ insns[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
+ } else {
+ insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
+ insns[insn_idx + 1].imm = insns[insn_idx].imm;
+ }
insns[insn_idx].imm = obj->maps[map_idx].fd;
- } else {
+ } else if (prog->reloc_desc[i].type == RELO_CALL) {
err = bpf_program__reloc_text(prog, obj,
&prog->reloc_desc[i]);
if (err)
{
struct bpf_load_program_attr load_attr;
char *cp, errmsg[STRERR_BUFSIZE];
+ int log_buf_size = BPF_LOG_BUF_SIZE;
char *log_buf;
int ret;
load_attr.line_info = prog->line_info;
load_attr.line_info_rec_size = prog->line_info_rec_size;
load_attr.line_info_cnt = prog->line_info_cnt;
+ load_attr.log_level = prog->log_level;
if (!load_attr.insns || !load_attr.insns_cnt)
return -EINVAL;
- log_buf = malloc(BPF_LOG_BUF_SIZE);
+retry_load:
+ log_buf = malloc(log_buf_size);
if (!log_buf)
pr_warning("Alloc log buffer for bpf loader error, continue without log\n");
- ret = bpf_load_program_xattr(&load_attr, log_buf, BPF_LOG_BUF_SIZE);
+ ret = bpf_load_program_xattr(&load_attr, log_buf, log_buf_size);
if (ret >= 0) {
+ if (load_attr.log_level)
+ pr_debug("verifier log:\n%s", log_buf);
*pfd = ret;
ret = 0;
goto out;
}
+ if (errno == ENOSPC) {
+ log_buf_size <<= 1;
+ free(log_buf);
+ goto retry_load;
+ }
ret = -LIBBPF_ERRNO__LOAD;
cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
pr_warning("load bpf program failed: %s\n", cp);
obj->maps[i].priv = NULL;
obj->maps[i].clear_priv = NULL;
}
+
+ zfree(&obj->sections.rodata);
+ zfree(&obj->sections.data);
zfree(&obj->maps);
obj->nr_maps = 0;
return map->def.type == BPF_MAP_TYPE_PERF_EVENT_ARRAY;
}
+bool bpf_map__is_internal(struct bpf_map *map)
+{
+ return map->libbpf_type != LIBBPF_MAP_UNSPEC;
+}
+
void bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex)
{
map->map_ifindex = ifindex;
bpf_program__set_expected_attach_type(prog,
expected_attach_type);
+ prog->log_level = attr->log_level;
if (!first_prog)
first_prog = prog;
}
LIBBPF_API struct bpf_object *bpf_object__open_buffer(void *obj_buf,
size_t obj_buf_sz,
const char *name);
+int bpf_object__section_size(const struct bpf_object *obj, const char *name,
+ __u32 *size);
+int bpf_object__variable_offset(const struct bpf_object *obj, const char *name,
+ __u32 *off);
LIBBPF_API int bpf_object__pin_maps(struct bpf_object *obj, const char *path);
LIBBPF_API int bpf_object__unpin_maps(struct bpf_object *obj,
const char *path);
LIBBPF_API int bpf_map__reuse_fd(struct bpf_map *map, int fd);
LIBBPF_API int bpf_map__resize(struct bpf_map *map, __u32 max_entries);
LIBBPF_API bool bpf_map__is_offload_neutral(struct bpf_map *map);
+LIBBPF_API bool bpf_map__is_internal(struct bpf_map *map);
LIBBPF_API void bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex);
LIBBPF_API int bpf_map__pin(struct bpf_map *map, const char *path);
LIBBPF_API int bpf_map__unpin(struct bpf_map *map, const char *path);
enum bpf_prog_type prog_type;
enum bpf_attach_type expected_attach_type;
int ifindex;
+ int log_level;
};
LIBBPF_API int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
bpf_program__bpil_addr_to_offs;
bpf_program__bpil_offs_to_addr;
} LIBBPF_0.0.1;
+
+LIBBPF_0.0.3 {
+ global:
+ bpf_map__is_internal;
+ bpf_map_freeze;
+ btf__finalize_data;
+} LIBBPF_0.0.2;
--- /dev/null
+# SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+
+prefix=@PREFIX@
+libdir=@LIBDIR@
+includedir=${prefix}/include
+
+Name: libbpf
+Description: BPF library
+Version: @VERSION@
+Libs: -L${libdir} -lbpf
+Requires.private: libelf
+Cflags: -I${includedir}
static int xsk_load_xdp_prog(struct xsk_socket *xsk)
{
- char bpf_log_buf[BPF_LOG_BUF_SIZE];
+ static const int log_buf_size = 16 * 1024;
+ char log_buf[log_buf_size];
int err, prog_fd;
/* This is the C-program:
size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
prog_fd = bpf_load_program(BPF_PROG_TYPE_XDP, prog, insns_cnt,
- "LGPL-2.1 or BSD-2-Clause", 0, bpf_log_buf,
- BPF_LOG_BUF_SIZE);
+ "LGPL-2.1 or BSD-2-Clause", 0, log_buf,
+ log_buf_size);
if (prog_fd < 0) {
- pr_warning("BPF log buffer:\n%s", bpf_log_buf);
+ pr_warning("BPF log buffer:\n%s", log_buf);
return prog_fd;
}
test_section_names
test_tcpnotify_user
test_libbpf
+test_tcp_check_syncookie_user
alu32
test_skb_cgroup_id.sh \
test_flow_dissector.sh \
test_xdp_vlan.sh \
- test_lwt_ip_encap.sh
+ test_lwt_ip_encap.sh \
+ test_tcp_check_syncookie.sh \
+ test_tc_tunnel.sh \
+ test_tc_edt.sh
TEST_PROGS_EXTENDED := with_addr.sh \
with_tunnels.sh \
# Compile but not part of 'make run_tests'
TEST_GEN_PROGS_EXTENDED = test_libbpf_open test_sock_addr test_skb_cgroup_id_user \
- flow_dissector_load test_flow_dissector
+ flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user
include ../lib.mk
all: $(TEST_CUSTOM_PROGS)
$(OUTPUT)/urandom_read: $(OUTPUT)/%: %.c
- $(CC) -o $@ -static $< -Wl,--build-id
+ $(CC) -o $@ $< -Wl,--build-id
BPFOBJ := $(OUTPUT)/libbpf.a
endif
PROG_TESTS_H := $(OUTPUT)/prog_tests/tests.h
-$(OUTPUT)/test_progs: $(PROG_TESTS_H)
+test_progs.c: $(PROG_TESTS_H)
$(OUTPUT)/test_progs: CFLAGS += $(TEST_PROGS_CFLAGS)
$(OUTPUT)/test_progs: prog_tests/*.c
) > $(PROG_TESTS_H))
VERIFIER_TESTS_H := $(OUTPUT)/verifier/tests.h
-$(OUTPUT)/test_verifier: $(VERIFIER_TESTS_H)
+test_verifier.c: $(VERIFIER_TESTS_H)
$(OUTPUT)/test_verifier: CFLAGS += $(TEST_VERIFIER_CFLAGS)
VERIFIER_TESTS_DIR = $(OUTPUT)/verifier
#define SEC(NAME) __attribute__((section(NAME), used))
/* helper functions called from eBPF programs written in C */
-static void *(*bpf_map_lookup_elem)(void *map, void *key) =
+static void *(*bpf_map_lookup_elem)(void *map, const void *key) =
(void *) BPF_FUNC_map_lookup_elem;
-static int (*bpf_map_update_elem)(void *map, void *key, void *value,
+static int (*bpf_map_update_elem)(void *map, const void *key, const void *value,
unsigned long long flags) =
(void *) BPF_FUNC_map_update_elem;
-static int (*bpf_map_delete_elem)(void *map, void *key) =
+static int (*bpf_map_delete_elem)(void *map, const void *key) =
(void *) BPF_FUNC_map_delete_elem;
-static int (*bpf_map_push_elem)(void *map, void *value,
+static int (*bpf_map_push_elem)(void *map, const void *value,
unsigned long long flags) =
(void *) BPF_FUNC_map_push_elem;
static int (*bpf_map_pop_elem)(void *map, void *value) =
int size, unsigned long long netns_id,
unsigned long long flags) =
(void *) BPF_FUNC_sk_lookup_tcp;
+static struct bpf_sock *(*bpf_skc_lookup_tcp)(void *ctx,
+ struct bpf_sock_tuple *tuple,
+ int size, unsigned long long netns_id,
+ unsigned long long flags) =
+ (void *) BPF_FUNC_skc_lookup_tcp;
static struct bpf_sock *(*bpf_sk_lookup_udp)(void *ctx,
struct bpf_sock_tuple *tuple,
int size, unsigned long long netns_id,
(void *) BPF_FUNC_get_listener_sock;
static int (*bpf_skb_ecn_set_ce)(void *ctx) =
(void *) BPF_FUNC_skb_ecn_set_ce;
+static int (*bpf_tcp_check_syncookie)(struct bpf_sock *sk,
+ void *ip, int ip_len, void *tcp, int tcp_len) =
+ (void *) BPF_FUNC_tcp_check_syncookie;
/* llvm builtin functions that eBPF C program may use to
* emit BPF_LD_ABS and BPF_LD_IND instructions
#elif defined(__TARGET_ARCH_s930x)
#define bpf_target_s930x
#define bpf_target_defined
+#elif defined(__TARGET_ARCH_arm)
+ #define bpf_target_arm
+ #define bpf_target_defined
#elif defined(__TARGET_ARCH_arm64)
#define bpf_target_arm64
#define bpf_target_defined
#define bpf_target_x86
#elif defined(__s390x__)
#define bpf_target_s930x
+#elif defined(__arm__)
+ #define bpf_target_arm
#elif defined(__aarch64__)
#define bpf_target_arm64
#elif defined(__mips__)
#define PT_REGS_SP(x) ((x)->gprs[15])
#define PT_REGS_IP(x) ((x)->psw.addr)
+#elif defined(bpf_target_arm)
+
+#define PT_REGS_PARM1(x) ((x)->uregs[0])
+#define PT_REGS_PARM2(x) ((x)->uregs[1])
+#define PT_REGS_PARM3(x) ((x)->uregs[2])
+#define PT_REGS_PARM4(x) ((x)->uregs[3])
+#define PT_REGS_PARM5(x) ((x)->uregs[4])
+#define PT_REGS_RET(x) ((x)->uregs[14])
+#define PT_REGS_FP(x) ((x)->uregs[11]) /* Works only with CONFIG_FRAME_POINTER */
+#define PT_REGS_RC(x) ((x)->uregs[0])
+#define PT_REGS_SP(x) ((x)->uregs[13])
+#define PT_REGS_IP(x) ((x)->uregs[12])
+
#elif defined(bpf_target_arm64)
#define PT_REGS_PARM1(x) ((x)->regs[0])
CONFIG_BPF_STREAM_PARSER=y
CONFIG_XDP_SOCKETS=y
CONFIG_FTRACE_SYSCALLS=y
+CONFIG_IPV6_TUNNEL=y
+CONFIG_IPV6_GRE=y
+CONFIG_NET_FOU=m
+CONFIG_NET_FOU_IP_TUNNELS=y
+CONFIG_IPV6_FOU=m
+CONFIG_IPV6_FOU_TUNNEL=m
+CONFIG_MPLS=y
+CONFIG_NET_MPLS_GSO=m
+CONFIG_MPLS_ROUTING=m
+CONFIG_MPLS_IPTUNNEL=m
sprintf(command, "rm -r %s", cfg_pin_path);
ret = system(command);
if (ret)
- error(1, errno, command);
+ error(1, errno, "%s", command);
}
static void parse_opts(int argc, char **argv)
info_len != sizeof(struct bpf_map_info) ||
strcmp((char *)map_infos[i].name, expected_map_name),
"get-map-info(fd)",
- "err %d errno %d type %d(%d) info_len %u(%Zu) key_size %u value_size %u max_entries %u map_flags %X name %s(%s)\n",
+ "err %d errno %d type %d(%d) info_len %u(%zu) key_size %u value_size %u max_entries %u map_flags %X name %s(%s)\n",
err, errno,
map_infos[i].type, BPF_MAP_TYPE_ARRAY,
info_len, sizeof(struct bpf_map_info),
*(int *)(long)prog_infos[i].map_ids != map_infos[i].id ||
strcmp((char *)prog_infos[i].name, expected_prog_name),
"get-prog-info(fd)",
- "err %d errno %d i %d type %d(%d) info_len %u(%Zu) jit_enabled %d jited_prog_len %u xlated_prog_len %u jited_prog %d xlated_prog %d load_time %lu(%lu) uid %u(%u) nr_map_ids %u(%u) map_id %u(%u) name %s(%s)\n",
+ "err %d errno %d i %d type %d(%d) info_len %u(%zu) jit_enabled %d jited_prog_len %u xlated_prog_len %u jited_prog %d xlated_prog %d load_time %lu(%lu) uid %u(%u) nr_map_ids %u(%u) map_id %u(%u) name %s(%s)\n",
err, errno, i,
prog_infos[i].type, BPF_PROG_TYPE_SOCKET_FILTER,
info_len, sizeof(struct bpf_prog_info),
memcmp(&prog_info, &prog_infos[i], info_len) ||
*(int *)(long)prog_info.map_ids != saved_map_id,
"get-prog-info(next_id->fd)",
- "err %d errno %d info_len %u(%Zu) memcmp %d map_id %u(%u)\n",
+ "err %d errno %d info_len %u(%zu) memcmp %d map_id %u(%u)\n",
err, errno, info_len, sizeof(struct bpf_prog_info),
memcmp(&prog_info, &prog_infos[i], info_len),
*(int *)(long)prog_info.map_ids, saved_map_id);
memcmp(&map_info, &map_infos[i], info_len) ||
array_value != array_magic_value,
"check get-map-info(next_id->fd)",
- "err %d errno %d info_len %u(%Zu) memcmp %d array_value %llu(%llu)\n",
+ "err %d errno %d info_len %u(%zu) memcmp %d array_value %llu(%llu)\n",
err, errno, info_len, sizeof(struct bpf_map_info),
memcmp(&map_info, &map_infos[i], info_len),
array_value, array_magic_value);
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+#include <test_progs.h>
+static int libbpf_debug_print(enum libbpf_print_level level,
+ const char *format, va_list args)
+{
+ if (level != LIBBPF_DEBUG)
+ return 0;
+
+ if (!strstr(format, "verifier log"))
+ return 0;
+ return vfprintf(stderr, "%s", args);
+}
+
+static int check_load(const char *file)
+{
+ struct bpf_prog_load_attr attr;
+ struct bpf_object *obj;
+ int err, prog_fd;
+
+ memset(&attr, 0, sizeof(struct bpf_prog_load_attr));
+ attr.file = file;
+ attr.prog_type = BPF_PROG_TYPE_SCHED_CLS;
+ attr.log_level = 4;
+ err = bpf_prog_load_xattr(&attr, &obj, &prog_fd);
+ bpf_object__close(obj);
+ if (err)
+ error_cnt++;
+ return err;
+}
+
+void test_bpf_verif_scale(void)
+{
+ const char *file1 = "./test_verif_scale1.o";
+ const char *file2 = "./test_verif_scale2.o";
+ const char *file3 = "./test_verif_scale3.o";
+ int err;
+
+ if (verifier_stats)
+ libbpf_set_print(libbpf_debug_print);
+
+ err = check_load(file1);
+ err |= check_load(file2);
+ err |= check_load(file3);
+ if (!err)
+ printf("test_verif_scale:OK\n");
+ else
+ printf("test_verif_scale:FAIL\n");
+}
} else {
for (i = 0; i < num_stack; i++) {
ks = ksym_search(raw_data[i]);
- if (strcmp(ks->name, nonjit_func) == 0) {
+ if (ks && (strcmp(ks->name, nonjit_func) == 0)) {
found = true;
break;
}
} else {
for (i = 0; i < num_stack; i++) {
ks = ksym_search(e->kern_stack[i]);
- if (strcmp(ks->name, nonjit_func) == 0) {
+ if (ks && (strcmp(ks->name, nonjit_func) == 0)) {
good_kern_stack = true;
break;
}
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+
+static void test_global_data_number(struct bpf_object *obj, __u32 duration)
+{
+ int i, err, map_fd;
+ uint64_t num;
+
+ map_fd = bpf_find_map(__func__, obj, "result_number");
+ if (map_fd < 0) {
+ error_cnt++;
+ return;
+ }
+
+ struct {
+ char *name;
+ uint32_t key;
+ uint64_t num;
+ } tests[] = {
+ { "relocate .bss reference", 0, 0 },
+ { "relocate .data reference", 1, 42 },
+ { "relocate .rodata reference", 2, 24 },
+ { "relocate .bss reference", 3, 0 },
+ { "relocate .data reference", 4, 0xffeeff },
+ { "relocate .rodata reference", 5, 0xabab },
+ { "relocate .bss reference", 6, 1234 },
+ { "relocate .bss reference", 7, 0 },
+ { "relocate .rodata reference", 8, 0xab },
+ { "relocate .rodata reference", 9, 0x1111111111111111 },
+ { "relocate .rodata reference", 10, ~0 },
+ };
+
+ for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
+ err = bpf_map_lookup_elem(map_fd, &tests[i].key, &num);
+ CHECK(err || num != tests[i].num, tests[i].name,
+ "err %d result %lx expected %lx\n",
+ err, num, tests[i].num);
+ }
+}
+
+static void test_global_data_string(struct bpf_object *obj, __u32 duration)
+{
+ int i, err, map_fd;
+ char str[32];
+
+ map_fd = bpf_find_map(__func__, obj, "result_string");
+ if (map_fd < 0) {
+ error_cnt++;
+ return;
+ }
+
+ struct {
+ char *name;
+ uint32_t key;
+ char str[32];
+ } tests[] = {
+ { "relocate .rodata reference", 0, "abcdefghijklmnopqrstuvwxyz" },
+ { "relocate .data reference", 1, "abcdefghijklmnopqrstuvwxyz" },
+ { "relocate .bss reference", 2, "" },
+ { "relocate .data reference", 3, "abcdexghijklmnopqrstuvwxyz" },
+ { "relocate .bss reference", 4, "\0\0hello" },
+ };
+
+ for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
+ err = bpf_map_lookup_elem(map_fd, &tests[i].key, str);
+ CHECK(err || memcmp(str, tests[i].str, sizeof(str)),
+ tests[i].name, "err %d result \'%s\' expected \'%s\'\n",
+ err, str, tests[i].str);
+ }
+}
+
+struct foo {
+ __u8 a;
+ __u32 b;
+ __u64 c;
+};
+
+static void test_global_data_struct(struct bpf_object *obj, __u32 duration)
+{
+ int i, err, map_fd;
+ struct foo val;
+
+ map_fd = bpf_find_map(__func__, obj, "result_struct");
+ if (map_fd < 0) {
+ error_cnt++;
+ return;
+ }
+
+ struct {
+ char *name;
+ uint32_t key;
+ struct foo val;
+ } tests[] = {
+ { "relocate .rodata reference", 0, { 42, 0xfefeefef, 0x1111111111111111ULL, } },
+ { "relocate .bss reference", 1, { } },
+ { "relocate .rodata reference", 2, { } },
+ { "relocate .data reference", 3, { 41, 0xeeeeefef, 0x2111111111111111ULL, } },
+ };
+
+ for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
+ err = bpf_map_lookup_elem(map_fd, &tests[i].key, &val);
+ CHECK(err || memcmp(&val, &tests[i].val, sizeof(val)),
+ tests[i].name, "err %d result { %u, %u, %llu } expected { %u, %u, %llu }\n",
+ err, val.a, val.b, val.c, tests[i].val.a, tests[i].val.b, tests[i].val.c);
+ }
+}
+
+static void test_global_data_rdonly(struct bpf_object *obj, __u32 duration)
+{
+ int err = -ENOMEM, map_fd, zero = 0;
+ struct bpf_map *map;
+ __u8 *buff;
+
+ map = bpf_object__find_map_by_name(obj, "test_glo.rodata");
+ if (!map || !bpf_map__is_internal(map)) {
+ error_cnt++;
+ return;
+ }
+
+ map_fd = bpf_map__fd(map);
+ if (map_fd < 0) {
+ error_cnt++;
+ return;
+ }
+
+ buff = malloc(bpf_map__def(map)->value_size);
+ if (buff)
+ err = bpf_map_update_elem(map_fd, &zero, buff, 0);
+ free(buff);
+ CHECK(!err || errno != EPERM, "test .rodata read-only map",
+ "err %d errno %d\n", err, errno);
+}
+
+void test_global_data(void)
+{
+ const char *file = "./test_global_data.o";
+ __u32 duration = 0, retval;
+ struct bpf_object *obj;
+ int err, prog_fd;
+
+ err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
+ if (CHECK(err, "load program", "error %d loading %s\n", err, file))
+ return;
+
+ err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
+ NULL, NULL, &retval, &duration);
+ CHECK(err || retval, "pass global data run",
+ "err %d errno %d retval %d duration %d\n",
+ err, errno, retval, duration);
+
+ test_global_data_number(obj, duration);
+ test_global_data_string(obj, duration);
+ test_global_data_struct(obj, duration);
+ test_global_data_rdonly(obj, duration);
+
+ bpf_object__close(obj);
+}
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+
+void test_skb_ctx(void)
+{
+ struct __sk_buff skb = {
+ .cb[0] = 1,
+ .cb[1] = 2,
+ .cb[2] = 3,
+ .cb[3] = 4,
+ .cb[4] = 5,
+ .priority = 6,
+ };
+ struct bpf_prog_test_run_attr tattr = {
+ .data_in = &pkt_v4,
+ .data_size_in = sizeof(pkt_v4),
+ .ctx_in = &skb,
+ .ctx_size_in = sizeof(skb),
+ .ctx_out = &skb,
+ .ctx_size_out = sizeof(skb),
+ };
+ struct bpf_object *obj;
+ int err;
+ int i;
+
+ err = bpf_prog_load("./test_skb_ctx.o", BPF_PROG_TYPE_SCHED_CLS, &obj,
+ &tattr.prog_fd);
+ if (CHECK_ATTR(err, "load", "err %d errno %d\n", err, errno))
+ return;
+
+ /* ctx_in != NULL, ctx_size_in == 0 */
+
+ tattr.ctx_size_in = 0;
+ err = bpf_prog_test_run_xattr(&tattr);
+ CHECK_ATTR(err == 0, "ctx_size_in", "err %d errno %d\n", err, errno);
+ tattr.ctx_size_in = sizeof(skb);
+
+ /* ctx_out != NULL, ctx_size_out == 0 */
+
+ tattr.ctx_size_out = 0;
+ err = bpf_prog_test_run_xattr(&tattr);
+ CHECK_ATTR(err == 0, "ctx_size_out", "err %d errno %d\n", err, errno);
+ tattr.ctx_size_out = sizeof(skb);
+
+ /* non-zero [len, tc_index] fields should be rejected*/
+
+ skb.len = 1;
+ err = bpf_prog_test_run_xattr(&tattr);
+ CHECK_ATTR(err == 0, "len", "err %d errno %d\n", err, errno);
+ skb.len = 0;
+
+ skb.tc_index = 1;
+ err = bpf_prog_test_run_xattr(&tattr);
+ CHECK_ATTR(err == 0, "tc_index", "err %d errno %d\n", err, errno);
+ skb.tc_index = 0;
+
+ /* non-zero [hash, sk] fields should be rejected */
+
+ skb.hash = 1;
+ err = bpf_prog_test_run_xattr(&tattr);
+ CHECK_ATTR(err == 0, "hash", "err %d errno %d\n", err, errno);
+ skb.hash = 0;
+
+ skb.sk = (struct bpf_sock *)1;
+ err = bpf_prog_test_run_xattr(&tattr);
+ CHECK_ATTR(err == 0, "sk", "err %d errno %d\n", err, errno);
+ skb.sk = 0;
+
+ err = bpf_prog_test_run_xattr(&tattr);
+ CHECK_ATTR(err != 0 || tattr.retval,
+ "run",
+ "err %d errno %d retval %d\n",
+ err, errno, tattr.retval);
+
+ CHECK_ATTR(tattr.ctx_size_out != sizeof(skb),
+ "ctx_size_out",
+ "incorrect output size, want %lu have %u\n",
+ sizeof(skb), tattr.ctx_size_out);
+
+ for (i = 0; i < 5; i++)
+ CHECK_ATTR(skb.cb[i] != i + 2,
+ "ctx_out_cb",
+ "skb->cb[i] == %d, expected %d\n",
+ skb.cb[i], i + 2);
+ CHECK_ATTR(skb.priority != 7,
+ "ctx_out_priority",
+ "skb->priority == %d, expected %d\n",
+ skb.priority, 7);
+}
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
+static __u64 read_perf_max_sample_freq(void)
+{
+ __u64 sample_freq = 5000; /* fallback to 5000 on error */
+ FILE *f;
+
+ f = fopen("/proc/sys/kernel/perf_event_max_sample_rate", "r");
+ if (f == NULL)
+ return sample_freq;
+ fscanf(f, "%llu", &sample_freq);
+ fclose(f);
+ return sample_freq;
+}
+
void test_stacktrace_build_id_nmi(void)
{
int control_map_fd, stackid_hmap_fd, stackmap_fd, stack_amap_fd;
const char *file = "./test_stacktrace_build_id.o";
int err, pmu_fd, prog_fd;
struct perf_event_attr attr = {
- .sample_freq = 5000,
.freq = 1,
.type = PERF_TYPE_HARDWARE,
.config = PERF_COUNT_HW_CPU_CYCLES,
int build_id_matches = 0;
int retry = 1;
+ attr.sample_freq = read_perf_max_sample_freq();
+
retry:
err = bpf_prog_load(file, BPF_PROG_TYPE_PERF_EVENT, &obj, &prog_fd);
if (CHECK(err, "prog_load", "err %d errno %d\n", err, errno))
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Isovalent, Inc.
+
+#include <linux/bpf.h>
+#include <linux/pkt_cls.h>
+#include <string.h>
+
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") result_number = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(__u32),
+ .value_size = sizeof(__u64),
+ .max_entries = 11,
+};
+
+struct bpf_map_def SEC("maps") result_string = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(__u32),
+ .value_size = 32,
+ .max_entries = 5,
+};
+
+struct foo {
+ __u8 a;
+ __u32 b;
+ __u64 c;
+};
+
+struct bpf_map_def SEC("maps") result_struct = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(__u32),
+ .value_size = sizeof(struct foo),
+ .max_entries = 5,
+};
+
+/* Relocation tests for __u64s. */
+static __u64 num0;
+static __u64 num1 = 42;
+static const __u64 num2 = 24;
+static __u64 num3 = 0;
+static __u64 num4 = 0xffeeff;
+static const __u64 num5 = 0xabab;
+static const __u64 num6 = 0xab;
+
+/* Relocation tests for strings. */
+static const char str0[32] = "abcdefghijklmnopqrstuvwxyz";
+static char str1[32] = "abcdefghijklmnopqrstuvwxyz";
+static char str2[32];
+
+/* Relocation tests for structs. */
+static const struct foo struct0 = {
+ .a = 42,
+ .b = 0xfefeefef,
+ .c = 0x1111111111111111ULL,
+};
+static struct foo struct1;
+static const struct foo struct2;
+static struct foo struct3 = {
+ .a = 41,
+ .b = 0xeeeeefef,
+ .c = 0x2111111111111111ULL,
+};
+
+#define test_reloc(map, num, var) \
+ do { \
+ __u32 key = num; \
+ bpf_map_update_elem(&result_##map, &key, var, 0); \
+ } while (0)
+
+SEC("static_data_load")
+int load_static_data(struct __sk_buff *skb)
+{
+ static const __u64 bar = ~0;
+
+ test_reloc(number, 0, &num0);
+ test_reloc(number, 1, &num1);
+ test_reloc(number, 2, &num2);
+ test_reloc(number, 3, &num3);
+ test_reloc(number, 4, &num4);
+ test_reloc(number, 5, &num5);
+ num4 = 1234;
+ test_reloc(number, 6, &num4);
+ test_reloc(number, 7, &num0);
+ test_reloc(number, 8, &num6);
+
+ test_reloc(string, 0, str0);
+ test_reloc(string, 1, str1);
+ test_reloc(string, 2, str2);
+ str1[5] = 'x';
+ test_reloc(string, 3, str1);
+ __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
+ test_reloc(string, 4, str2);
+
+ test_reloc(struct, 0, &struct0);
+ test_reloc(struct, 1, &struct1);
+ test_reloc(struct, 2, &struct2);
+ test_reloc(struct, 3, &struct3);
+
+ test_reloc(number, 9, &struct0.c);
+ test_reloc(number, 10, &bar);
+
+ return TC_ACT_OK;
+}
+
+char _license[] SEC("license") = "GPL";
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+
+typedef unsigned int u32;
+
+static __attribute__((always_inline)) u32 rol32(u32 word, unsigned int shift)
+{
+ return (word << shift) | (word >> ((-shift) & 31));
+}
+
+#define __jhash_mix(a, b, c) \
+{ \
+ a -= c; a ^= rol32(c, 4); c += b; \
+ b -= a; b ^= rol32(a, 6); a += c; \
+ c -= b; c ^= rol32(b, 8); b += a; \
+ a -= c; a ^= rol32(c, 16); c += b; \
+ b -= a; b ^= rol32(a, 19); a += c; \
+ c -= b; c ^= rol32(b, 4); b += a; \
+}
+
+#define __jhash_final(a, b, c) \
+{ \
+ c ^= b; c -= rol32(b, 14); \
+ a ^= c; a -= rol32(c, 11); \
+ b ^= a; b -= rol32(a, 25); \
+ c ^= b; c -= rol32(b, 16); \
+ a ^= c; a -= rol32(c, 4); \
+ b ^= a; b -= rol32(a, 14); \
+ c ^= b; c -= rol32(b, 24); \
+}
+
+#define JHASH_INITVAL 0xdeadbeef
+
+static ATTR
+u32 jhash(const void *key, u32 length, u32 initval)
+{
+ u32 a, b, c;
+ const unsigned char *k = key;
+
+ a = b = c = JHASH_INITVAL + length + initval;
+
+ while (length > 12) {
+ a += *(volatile u32 *)(k);
+ b += *(volatile u32 *)(k + 4);
+ c += *(volatile u32 *)(k + 8);
+ __jhash_mix(a, b, c);
+ length -= 12;
+ k += 12;
+ }
+ switch (length) {
+ case 12: c += (u32)k[11]<<24;
+ case 11: c += (u32)k[10]<<16;
+ case 10: c += (u32)k[9]<<8;
+ case 9: c += k[8];
+ case 8: b += (u32)k[7]<<24;
+ case 7: b += (u32)k[6]<<16;
+ case 6: b += (u32)k[5]<<8;
+ case 5: b += k[4];
+ case 4: a += (u32)k[3]<<24;
+ case 3: a += (u32)k[2]<<16;
+ case 2: a += (u32)k[1]<<8;
+ case 1: a += k[0];
+ c ^= a;
+ __jhash_final(a, b, c);
+ case 0: /* Nothing left to add */
+ break;
+ }
+
+ return c;
+}
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+
+int _version SEC("version") = 1;
+char _license[] SEC("license") = "GPL";
+
+SEC("skb_ctx")
+int process(struct __sk_buff *skb)
+{
+ #pragma clang loop unroll(full)
+ for (int i = 0; i < 5; i++) {
+ if (skb->cb[i] != i + 1)
+ return 1;
+ skb->cb[i]++;
+ }
+ skb->priority++;
+
+ return 0;
+}
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+#include <stdint.h>
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/pkt_cls.h>
+#include <linux/tcp.h>
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+/* the maximum delay we are willing to add (drop packets beyond that) */
+#define TIME_HORIZON_NS (2000 * 1000 * 1000)
+#define NS_PER_SEC 1000000000
+#define ECN_HORIZON_NS 5000000
+#define THROTTLE_RATE_BPS (5 * 1000 * 1000)
+
+/* flow_key => last_tstamp timestamp used */
+struct bpf_map_def SEC("maps") flow_map = {
+ .type = BPF_MAP_TYPE_HASH,
+ .key_size = sizeof(uint32_t),
+ .value_size = sizeof(uint64_t),
+ .max_entries = 1,
+};
+
+static inline int throttle_flow(struct __sk_buff *skb)
+{
+ int key = 0;
+ uint64_t *last_tstamp = bpf_map_lookup_elem(&flow_map, &key);
+ uint64_t delay_ns = ((uint64_t)skb->len) * NS_PER_SEC /
+ THROTTLE_RATE_BPS;
+ uint64_t now = bpf_ktime_get_ns();
+ uint64_t tstamp, next_tstamp = 0;
+
+ if (last_tstamp)
+ next_tstamp = *last_tstamp + delay_ns;
+
+ tstamp = skb->tstamp;
+ if (tstamp < now)
+ tstamp = now;
+
+ /* should we throttle? */
+ if (next_tstamp <= tstamp) {
+ if (bpf_map_update_elem(&flow_map, &key, &tstamp, BPF_ANY))
+ return TC_ACT_SHOT;
+ return TC_ACT_OK;
+ }
+
+ /* do not queue past the time horizon */
+ if (next_tstamp - now >= TIME_HORIZON_NS)
+ return TC_ACT_SHOT;
+
+ /* set ecn bit, if needed */
+ if (next_tstamp - now >= ECN_HORIZON_NS)
+ bpf_skb_ecn_set_ce(skb);
+
+ if (bpf_map_update_elem(&flow_map, &key, &next_tstamp, BPF_EXIST))
+ return TC_ACT_SHOT;
+ skb->tstamp = next_tstamp;
+
+ return TC_ACT_OK;
+}
+
+static inline int handle_tcp(struct __sk_buff *skb, struct tcphdr *tcp)
+{
+ void *data_end = (void *)(long)skb->data_end;
+
+ /* drop malformed packets */
+ if ((void *)(tcp + 1) > data_end)
+ return TC_ACT_SHOT;
+
+ if (tcp->dest == bpf_htons(9000))
+ return throttle_flow(skb);
+
+ return TC_ACT_OK;
+}
+
+static inline int handle_ipv4(struct __sk_buff *skb)
+{
+ void *data_end = (void *)(long)skb->data_end;
+ void *data = (void *)(long)skb->data;
+ struct iphdr *iph;
+ uint32_t ihl;
+
+ /* drop malformed packets */
+ if (data + sizeof(struct ethhdr) > data_end)
+ return TC_ACT_SHOT;
+ iph = (struct iphdr *)(data + sizeof(struct ethhdr));
+ if ((void *)(iph + 1) > data_end)
+ return TC_ACT_SHOT;
+ ihl = iph->ihl * 4;
+ if (((void *)iph) + ihl > data_end)
+ return TC_ACT_SHOT;
+
+ if (iph->protocol == IPPROTO_TCP)
+ return handle_tcp(skb, (struct tcphdr *)(((void *)iph) + ihl));
+
+ return TC_ACT_OK;
+}
+
+SEC("cls_test") int tc_prog(struct __sk_buff *skb)
+{
+ if (skb->protocol == bpf_htons(ETH_P_IP))
+ return handle_ipv4(skb);
+
+ return TC_ACT_OK;
+}
+
+char __license[] SEC("license") = "GPL";
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+
+/* In-place tunneling */
+
+#include <stdbool.h>
+#include <string.h>
+
+#include <linux/stddef.h>
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/mpls.h>
+#include <linux/tcp.h>
+#include <linux/udp.h>
+#include <linux/pkt_cls.h>
+#include <linux/types.h>
+
+#include "bpf_endian.h"
+#include "bpf_helpers.h"
+
+static const int cfg_port = 8000;
+
+static const int cfg_udp_src = 20000;
+
+#define UDP_PORT 5555
+#define MPLS_OVER_UDP_PORT 6635
+#define ETH_OVER_UDP_PORT 7777
+
+/* MPLS label 1000 with S bit (last label) set and ttl of 255. */
+static const __u32 mpls_label = __bpf_constant_htonl(1000 << 12 |
+ MPLS_LS_S_MASK | 0xff);
+
+struct gre_hdr {
+ __be16 flags;
+ __be16 protocol;
+} __attribute__((packed));
+
+union l4hdr {
+ struct udphdr udp;
+ struct gre_hdr gre;
+};
+
+struct v4hdr {
+ struct iphdr ip;
+ union l4hdr l4hdr;
+ __u8 pad[16]; /* enough space for L2 header */
+} __attribute__((packed));
+
+struct v6hdr {
+ struct ipv6hdr ip;
+ union l4hdr l4hdr;
+ __u8 pad[16]; /* enough space for L2 header */
+} __attribute__((packed));
+
+static __always_inline void set_ipv4_csum(struct iphdr *iph)
+{
+ __u16 *iph16 = (__u16 *)iph;
+ __u32 csum;
+ int i;
+
+ iph->check = 0;
+
+#pragma clang loop unroll(full)
+ for (i = 0, csum = 0; i < sizeof(*iph) >> 1; i++)
+ csum += *iph16++;
+
+ iph->check = ~((csum & 0xffff) + (csum >> 16));
+}
+
+static __always_inline int encap_ipv4(struct __sk_buff *skb, __u8 encap_proto,
+ __u16 l2_proto)
+{
+ __u16 udp_dst = UDP_PORT;
+ struct iphdr iph_inner;
+ struct v4hdr h_outer;
+ struct tcphdr tcph;
+ int olen, l2_len;
+ __u64 flags;
+
+ if (bpf_skb_load_bytes(skb, ETH_HLEN, &iph_inner,
+ sizeof(iph_inner)) < 0)
+ return TC_ACT_OK;
+
+ /* filter only packets we want */
+ if (iph_inner.ihl != 5 || iph_inner.protocol != IPPROTO_TCP)
+ return TC_ACT_OK;
+
+ if (bpf_skb_load_bytes(skb, ETH_HLEN + sizeof(iph_inner),
+ &tcph, sizeof(tcph)) < 0)
+ return TC_ACT_OK;
+
+ if (tcph.dest != __bpf_constant_htons(cfg_port))
+ return TC_ACT_OK;
+
+ olen = sizeof(h_outer.ip);
+ l2_len = 0;
+
+ flags = BPF_F_ADJ_ROOM_FIXED_GSO | BPF_F_ADJ_ROOM_ENCAP_L3_IPV4;
+
+ switch (l2_proto) {
+ case ETH_P_MPLS_UC:
+ l2_len = sizeof(mpls_label);
+ udp_dst = MPLS_OVER_UDP_PORT;
+ break;
+ case ETH_P_TEB:
+ l2_len = ETH_HLEN;
+ udp_dst = ETH_OVER_UDP_PORT;
+ break;
+ }
+ flags |= BPF_F_ADJ_ROOM_ENCAP_L2(l2_len);
+
+ switch (encap_proto) {
+ case IPPROTO_GRE:
+ flags |= BPF_F_ADJ_ROOM_ENCAP_L4_GRE;
+ olen += sizeof(h_outer.l4hdr.gre);
+ h_outer.l4hdr.gre.protocol = bpf_htons(l2_proto);
+ h_outer.l4hdr.gre.flags = 0;
+ break;
+ case IPPROTO_UDP:
+ flags |= BPF_F_ADJ_ROOM_ENCAP_L4_UDP;
+ olen += sizeof(h_outer.l4hdr.udp);
+ h_outer.l4hdr.udp.source = __bpf_constant_htons(cfg_udp_src);
+ h_outer.l4hdr.udp.dest = bpf_htons(udp_dst);
+ h_outer.l4hdr.udp.check = 0;
+ h_outer.l4hdr.udp.len = bpf_htons(bpf_ntohs(iph_inner.tot_len) +
+ sizeof(h_outer.l4hdr.udp) +
+ l2_len);
+ break;
+ case IPPROTO_IPIP:
+ break;
+ default:
+ return TC_ACT_OK;
+ }
+
+ /* add L2 encap (if specified) */
+ switch (l2_proto) {
+ case ETH_P_MPLS_UC:
+ *((__u32 *)((__u8 *)&h_outer + olen)) = mpls_label;
+ break;
+ case ETH_P_TEB:
+ if (bpf_skb_load_bytes(skb, 0, (__u8 *)&h_outer + olen,
+ ETH_HLEN))
+ return TC_ACT_SHOT;
+ break;
+ }
+ olen += l2_len;
+
+ /* add room between mac and network header */
+ if (bpf_skb_adjust_room(skb, olen, BPF_ADJ_ROOM_MAC, flags))
+ return TC_ACT_SHOT;
+
+ /* prepare new outer network header */
+ h_outer.ip = iph_inner;
+ h_outer.ip.tot_len = bpf_htons(olen +
+ bpf_ntohs(h_outer.ip.tot_len));
+ h_outer.ip.protocol = encap_proto;
+
+ set_ipv4_csum(&h_outer.ip);
+
+ /* store new outer network header */
+ if (bpf_skb_store_bytes(skb, ETH_HLEN, &h_outer, olen,
+ BPF_F_INVALIDATE_HASH) < 0)
+ return TC_ACT_SHOT;
+
+ return TC_ACT_OK;
+}
+
+static __always_inline int encap_ipv6(struct __sk_buff *skb, __u8 encap_proto,
+ __u16 l2_proto)
+{
+ __u16 udp_dst = UDP_PORT;
+ struct ipv6hdr iph_inner;
+ struct v6hdr h_outer;
+ struct tcphdr tcph;
+ int olen, l2_len;
+ __u16 tot_len;
+ __u64 flags;
+
+ if (bpf_skb_load_bytes(skb, ETH_HLEN, &iph_inner,
+ sizeof(iph_inner)) < 0)
+ return TC_ACT_OK;
+
+ /* filter only packets we want */
+ if (bpf_skb_load_bytes(skb, ETH_HLEN + sizeof(iph_inner),
+ &tcph, sizeof(tcph)) < 0)
+ return TC_ACT_OK;
+
+ if (tcph.dest != __bpf_constant_htons(cfg_port))
+ return TC_ACT_OK;
+
+ olen = sizeof(h_outer.ip);
+ l2_len = 0;
+
+ flags = BPF_F_ADJ_ROOM_FIXED_GSO | BPF_F_ADJ_ROOM_ENCAP_L3_IPV6;
+
+ switch (l2_proto) {
+ case ETH_P_MPLS_UC:
+ l2_len = sizeof(mpls_label);
+ udp_dst = MPLS_OVER_UDP_PORT;
+ break;
+ case ETH_P_TEB:
+ l2_len = ETH_HLEN;
+ udp_dst = ETH_OVER_UDP_PORT;
+ break;
+ }
+ flags |= BPF_F_ADJ_ROOM_ENCAP_L2(l2_len);
+
+ switch (encap_proto) {
+ case IPPROTO_GRE:
+ flags |= BPF_F_ADJ_ROOM_ENCAP_L4_GRE;
+ olen += sizeof(h_outer.l4hdr.gre);
+ h_outer.l4hdr.gre.protocol = bpf_htons(l2_proto);
+ h_outer.l4hdr.gre.flags = 0;
+ break;
+ case IPPROTO_UDP:
+ flags |= BPF_F_ADJ_ROOM_ENCAP_L4_UDP;
+ olen += sizeof(h_outer.l4hdr.udp);
+ h_outer.l4hdr.udp.source = __bpf_constant_htons(cfg_udp_src);
+ h_outer.l4hdr.udp.dest = bpf_htons(udp_dst);
+ tot_len = bpf_ntohs(iph_inner.payload_len) + sizeof(iph_inner) +
+ sizeof(h_outer.l4hdr.udp);
+ h_outer.l4hdr.udp.check = 0;
+ h_outer.l4hdr.udp.len = bpf_htons(tot_len);
+ break;
+ case IPPROTO_IPV6:
+ break;
+ default:
+ return TC_ACT_OK;
+ }
+
+ /* add L2 encap (if specified) */
+ switch (l2_proto) {
+ case ETH_P_MPLS_UC:
+ *((__u32 *)((__u8 *)&h_outer + olen)) = mpls_label;
+ break;
+ case ETH_P_TEB:
+ if (bpf_skb_load_bytes(skb, 0, (__u8 *)&h_outer + olen,
+ ETH_HLEN))
+ return TC_ACT_SHOT;
+ break;
+ }
+ olen += l2_len;
+
+ /* add room between mac and network header */
+ if (bpf_skb_adjust_room(skb, olen, BPF_ADJ_ROOM_MAC, flags))
+ return TC_ACT_SHOT;
+
+ /* prepare new outer network header */
+ h_outer.ip = iph_inner;
+ h_outer.ip.payload_len = bpf_htons(olen +
+ bpf_ntohs(h_outer.ip.payload_len));
+
+ h_outer.ip.nexthdr = encap_proto;
+
+ /* store new outer network header */
+ if (bpf_skb_store_bytes(skb, ETH_HLEN, &h_outer, olen,
+ BPF_F_INVALIDATE_HASH) < 0)
+ return TC_ACT_SHOT;
+
+ return TC_ACT_OK;
+}
+
+SEC("encap_ipip_none")
+int __encap_ipip_none(struct __sk_buff *skb)
+{
+ if (skb->protocol == __bpf_constant_htons(ETH_P_IP))
+ return encap_ipv4(skb, IPPROTO_IPIP, ETH_P_IP);
+ else
+ return TC_ACT_OK;
+}
+
+SEC("encap_gre_none")
+int __encap_gre_none(struct __sk_buff *skb)
+{
+ if (skb->protocol == __bpf_constant_htons(ETH_P_IP))
+ return encap_ipv4(skb, IPPROTO_GRE, ETH_P_IP);
+ else
+ return TC_ACT_OK;
+}
+
+SEC("encap_gre_mpls")
+int __encap_gre_mpls(struct __sk_buff *skb)
+{
+ if (skb->protocol == __bpf_constant_htons(ETH_P_IP))
+ return encap_ipv4(skb, IPPROTO_GRE, ETH_P_MPLS_UC);
+ else
+ return TC_ACT_OK;
+}
+
+SEC("encap_gre_eth")
+int __encap_gre_eth(struct __sk_buff *skb)
+{
+ if (skb->protocol == __bpf_constant_htons(ETH_P_IP))
+ return encap_ipv4(skb, IPPROTO_GRE, ETH_P_TEB);
+ else
+ return TC_ACT_OK;
+}
+
+SEC("encap_udp_none")
+int __encap_udp_none(struct __sk_buff *skb)
+{
+ if (skb->protocol == __bpf_constant_htons(ETH_P_IP))
+ return encap_ipv4(skb, IPPROTO_UDP, ETH_P_IP);
+ else
+ return TC_ACT_OK;
+}
+
+SEC("encap_udp_mpls")
+int __encap_udp_mpls(struct __sk_buff *skb)
+{
+ if (skb->protocol == __bpf_constant_htons(ETH_P_IP))
+ return encap_ipv4(skb, IPPROTO_UDP, ETH_P_MPLS_UC);
+ else
+ return TC_ACT_OK;
+}
+
+SEC("encap_udp_eth")
+int __encap_udp_eth(struct __sk_buff *skb)
+{
+ if (skb->protocol == __bpf_constant_htons(ETH_P_IP))
+ return encap_ipv4(skb, IPPROTO_UDP, ETH_P_TEB);
+ else
+ return TC_ACT_OK;
+}
+
+SEC("encap_ip6tnl_none")
+int __encap_ip6tnl_none(struct __sk_buff *skb)
+{
+ if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6))
+ return encap_ipv6(skb, IPPROTO_IPV6, ETH_P_IPV6);
+ else
+ return TC_ACT_OK;
+}
+
+SEC("encap_ip6gre_none")
+int __encap_ip6gre_none(struct __sk_buff *skb)
+{
+ if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6))
+ return encap_ipv6(skb, IPPROTO_GRE, ETH_P_IPV6);
+ else
+ return TC_ACT_OK;
+}
+
+SEC("encap_ip6gre_mpls")
+int __encap_ip6gre_mpls(struct __sk_buff *skb)
+{
+ if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6))
+ return encap_ipv6(skb, IPPROTO_GRE, ETH_P_MPLS_UC);
+ else
+ return TC_ACT_OK;
+}
+
+SEC("encap_ip6gre_eth")
+int __encap_ip6gre_eth(struct __sk_buff *skb)
+{
+ if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6))
+ return encap_ipv6(skb, IPPROTO_GRE, ETH_P_TEB);
+ else
+ return TC_ACT_OK;
+}
+
+SEC("encap_ip6udp_none")
+int __encap_ip6udp_none(struct __sk_buff *skb)
+{
+ if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6))
+ return encap_ipv6(skb, IPPROTO_UDP, ETH_P_IPV6);
+ else
+ return TC_ACT_OK;
+}
+
+SEC("encap_ip6udp_mpls")
+int __encap_ip6udp_mpls(struct __sk_buff *skb)
+{
+ if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6))
+ return encap_ipv6(skb, IPPROTO_UDP, ETH_P_MPLS_UC);
+ else
+ return TC_ACT_OK;
+}
+
+SEC("encap_ip6udp_eth")
+int __encap_ip6udp_eth(struct __sk_buff *skb)
+{
+ if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6))
+ return encap_ipv6(skb, IPPROTO_UDP, ETH_P_TEB);
+ else
+ return TC_ACT_OK;
+}
+
+static int decap_internal(struct __sk_buff *skb, int off, int len, char proto)
+{
+ char buf[sizeof(struct v6hdr)];
+ struct gre_hdr greh;
+ struct udphdr udph;
+ int olen = len;
+
+ switch (proto) {
+ case IPPROTO_IPIP:
+ case IPPROTO_IPV6:
+ break;
+ case IPPROTO_GRE:
+ olen += sizeof(struct gre_hdr);
+ if (bpf_skb_load_bytes(skb, off + len, &greh, sizeof(greh)) < 0)
+ return TC_ACT_OK;
+ switch (bpf_ntohs(greh.protocol)) {
+ case ETH_P_MPLS_UC:
+ olen += sizeof(mpls_label);
+ break;
+ case ETH_P_TEB:
+ olen += ETH_HLEN;
+ break;
+ }
+ break;
+ case IPPROTO_UDP:
+ olen += sizeof(struct udphdr);
+ if (bpf_skb_load_bytes(skb, off + len, &udph, sizeof(udph)) < 0)
+ return TC_ACT_OK;
+ switch (bpf_ntohs(udph.dest)) {
+ case MPLS_OVER_UDP_PORT:
+ olen += sizeof(mpls_label);
+ break;
+ case ETH_OVER_UDP_PORT:
+ olen += ETH_HLEN;
+ break;
+ }
+ break;
+ default:
+ return TC_ACT_OK;
+ }
+
+ if (bpf_skb_adjust_room(skb, -olen, BPF_ADJ_ROOM_MAC,
+ BPF_F_ADJ_ROOM_FIXED_GSO))
+ return TC_ACT_SHOT;
+
+ return TC_ACT_OK;
+}
+
+static int decap_ipv4(struct __sk_buff *skb)
+{
+ struct iphdr iph_outer;
+
+ if (bpf_skb_load_bytes(skb, ETH_HLEN, &iph_outer,
+ sizeof(iph_outer)) < 0)
+ return TC_ACT_OK;
+
+ if (iph_outer.ihl != 5)
+ return TC_ACT_OK;
+
+ return decap_internal(skb, ETH_HLEN, sizeof(iph_outer),
+ iph_outer.protocol);
+}
+
+static int decap_ipv6(struct __sk_buff *skb)
+{
+ struct ipv6hdr iph_outer;
+
+ if (bpf_skb_load_bytes(skb, ETH_HLEN, &iph_outer,
+ sizeof(iph_outer)) < 0)
+ return TC_ACT_OK;
+
+ return decap_internal(skb, ETH_HLEN, sizeof(iph_outer),
+ iph_outer.nexthdr);
+}
+
+SEC("decap")
+int decap_f(struct __sk_buff *skb)
+{
+ switch (skb->protocol) {
+ case __bpf_constant_htons(ETH_P_IP):
+ return decap_ipv4(skb);
+ case __bpf_constant_htons(ETH_P_IPV6):
+ return decap_ipv6(skb);
+ default:
+ /* does not match, ignore */
+ return TC_ACT_OK;
+ }
+}
+
+char __license[] SEC("license") = "GPL";
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Facebook
+// Copyright (c) 2019 Cloudflare
+
+#include <string.h>
+
+#include <linux/bpf.h>
+#include <linux/pkt_cls.h>
+#include <linux/if_ether.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <sys/socket.h>
+#include <linux/tcp.h>
+
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+struct bpf_map_def SEC("maps") results = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(__u32),
+ .value_size = sizeof(__u64),
+ .max_entries = 1,
+};
+
+static __always_inline void check_syncookie(void *ctx, void *data,
+ void *data_end)
+{
+ struct bpf_sock_tuple tup;
+ struct bpf_sock *sk;
+ struct ethhdr *ethh;
+ struct iphdr *ipv4h;
+ struct ipv6hdr *ipv6h;
+ struct tcphdr *tcph;
+ int ret;
+ __u32 key = 0;
+ __u64 value = 1;
+
+ ethh = data;
+ if (ethh + 1 > data_end)
+ return;
+
+ switch (bpf_ntohs(ethh->h_proto)) {
+ case ETH_P_IP:
+ ipv4h = data + sizeof(struct ethhdr);
+ if (ipv4h + 1 > data_end)
+ return;
+
+ if (ipv4h->ihl != 5)
+ return;
+
+ tcph = data + sizeof(struct ethhdr) + sizeof(struct iphdr);
+ if (tcph + 1 > data_end)
+ return;
+
+ tup.ipv4.saddr = ipv4h->saddr;
+ tup.ipv4.daddr = ipv4h->daddr;
+ tup.ipv4.sport = tcph->source;
+ tup.ipv4.dport = tcph->dest;
+
+ sk = bpf_skc_lookup_tcp(ctx, &tup, sizeof(tup.ipv4),
+ BPF_F_CURRENT_NETNS, 0);
+ if (!sk)
+ return;
+
+ if (sk->state != BPF_TCP_LISTEN)
+ goto release;
+
+ ret = bpf_tcp_check_syncookie(sk, ipv4h, sizeof(*ipv4h),
+ tcph, sizeof(*tcph));
+ break;
+
+ case ETH_P_IPV6:
+ ipv6h = data + sizeof(struct ethhdr);
+ if (ipv6h + 1 > data_end)
+ return;
+
+ if (ipv6h->nexthdr != IPPROTO_TCP)
+ return;
+
+ tcph = data + sizeof(struct ethhdr) + sizeof(struct ipv6hdr);
+ if (tcph + 1 > data_end)
+ return;
+
+ memcpy(tup.ipv6.saddr, &ipv6h->saddr, sizeof(tup.ipv6.saddr));
+ memcpy(tup.ipv6.daddr, &ipv6h->daddr, sizeof(tup.ipv6.daddr));
+ tup.ipv6.sport = tcph->source;
+ tup.ipv6.dport = tcph->dest;
+
+ sk = bpf_skc_lookup_tcp(ctx, &tup, sizeof(tup.ipv6),
+ BPF_F_CURRENT_NETNS, 0);
+ if (!sk)
+ return;
+
+ if (sk->state != BPF_TCP_LISTEN)
+ goto release;
+
+ ret = bpf_tcp_check_syncookie(sk, ipv6h, sizeof(*ipv6h),
+ tcph, sizeof(*tcph));
+ break;
+
+ default:
+ return;
+ }
+
+ if (ret == 0)
+ bpf_map_update_elem(&results, &key, &value, 0);
+
+release:
+ bpf_sk_release(sk);
+}
+
+SEC("clsact/check_syncookie")
+int check_syncookie_clsact(struct __sk_buff *skb)
+{
+ check_syncookie(skb, (void *)(long)skb->data,
+ (void *)(long)skb->data_end);
+ return TC_ACT_OK;
+}
+
+SEC("xdp/check_syncookie")
+int check_syncookie_xdp(struct xdp_md *ctx)
+{
+ check_syncookie(ctx, (void *)(long)ctx->data,
+ (void *)(long)ctx->data_end);
+ return XDP_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+#define ATTR __attribute__((noinline))
+#include "test_jhash.h"
+
+SEC("scale90_noinline")
+int balancer_ingress(struct __sk_buff *ctx)
+{
+ void *data_end = (void *)(long)ctx->data_end;
+ void *data = (void *)(long)ctx->data;
+ void *ptr;
+ int ret = 0, nh_off, i = 0;
+
+ nh_off = 14;
+
+ /* pragma unroll doesn't work on large loops */
+
+#define C do { \
+ ptr = data + i; \
+ if (ptr + nh_off > data_end) \
+ break; \
+ ctx->tc_index = jhash(ptr, nh_off, ctx->cb[0] + i++); \
+ } while (0);
+#define C30 C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;
+ C30;C30;C30; /* 90 calls */
+ return 0;
+}
+char _license[] SEC("license") = "GPL";
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+#define ATTR __attribute__((always_inline))
+#include "test_jhash.h"
+
+SEC("scale90_inline")
+int balancer_ingress(struct __sk_buff *ctx)
+{
+ void *data_end = (void *)(long)ctx->data_end;
+ void *data = (void *)(long)ctx->data;
+ void *ptr;
+ int ret = 0, nh_off, i = 0;
+
+ nh_off = 14;
+
+ /* pragma unroll doesn't work on large loops */
+
+#define C do { \
+ ptr = data + i; \
+ if (ptr + nh_off > data_end) \
+ break; \
+ ctx->tc_index = jhash(ptr, nh_off, ctx->cb[0] + i++); \
+ } while (0);
+#define C30 C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;
+ C30;C30;C30; /* 90 calls */
+ return 0;
+}
+char _license[] SEC("license") = "GPL";
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+#define ATTR __attribute__((noinline))
+#include "test_jhash.h"
+
+SEC("scale90_noinline32")
+int balancer_ingress(struct __sk_buff *ctx)
+{
+ void *data_end = (void *)(long)ctx->data_end;
+ void *data = (void *)(long)ctx->data;
+ void *ptr;
+ int ret = 0, nh_off, i = 0;
+
+ nh_off = 32;
+
+ /* pragma unroll doesn't work on large loops */
+
+#define C do { \
+ ptr = data + i; \
+ if (ptr + nh_off > data_end) \
+ break; \
+ ctx->tc_index = jhash(ptr, nh_off, ctx->cb[0] + i++); \
+ } while (0);
+#define C30 C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;
+ C30;C30;C30; /* 90 calls */
+ return 0;
+}
+char _license[] SEC("license") = "GPL";
#define BTF_UNION_ENC(name, nr_elems, sz) \
BTF_TYPE_ENC(name, BTF_INFO_ENC(BTF_KIND_UNION, 0, nr_elems), sz)
+#define BTF_VAR_ENC(name, type, linkage) \
+ BTF_TYPE_ENC(name, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), type), (linkage)
+#define BTF_VAR_SECINFO_ENC(type, offset, size) \
+ (type), (offset), (size)
+
#define BTF_MEMBER_ENC(name, type, bits_offset) \
(name), (type), (bits_offset)
#define BTF_ENUM_ENC(name, val) (name), (val)
.value_type_id = 3,
.max_entries = 4,
},
-
{
.descr = "struct test #3 Invalid member offset",
.raw_types = {
.btf_load_err = true,
.err_str = "Invalid member bits_offset",
},
-
+/*
+ * struct A {
+ * unsigned long long m;
+ * int n;
+ * char o;
+ * [3 bytes hole]
+ * int p[8];
+ * };
+ */
+{
+ .descr = "global data test #1",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ /* unsigned long long */
+ BTF_TYPE_INT_ENC(0, 0, 0, 64, 8), /* [2] */
+ /* char */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 8, 1), /* [3] */
+ /* int[8] */
+ BTF_TYPE_ARRAY_ENC(1, 1, 8), /* [4] */
+ /* struct A { */ /* [5] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 4), 48),
+ BTF_MEMBER_ENC(NAME_TBD, 2, 0), /* unsigned long long m;*/
+ BTF_MEMBER_ENC(NAME_TBD, 1, 64),/* int n; */
+ BTF_MEMBER_ENC(NAME_TBD, 3, 96),/* char o; */
+ BTF_MEMBER_ENC(NAME_TBD, 4, 128),/* int p[8] */
+ /* } */
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0m\0n\0o\0p",
+ .str_sec_size = sizeof("\0A\0m\0n\0o\0p"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "struct_test1_map",
+ .key_size = sizeof(int),
+ .value_size = 48,
+ .key_type_id = 1,
+ .value_type_id = 5,
+ .max_entries = 4,
+},
+/*
+ * struct A {
+ * unsigned long long m;
+ * int n;
+ * char o;
+ * [3 bytes hole]
+ * int p[8];
+ * };
+ * static struct A t; <- in .bss
+ */
+{
+ .descr = "global data test #2",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ /* unsigned long long */
+ BTF_TYPE_INT_ENC(0, 0, 0, 64, 8), /* [2] */
+ /* char */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 8, 1), /* [3] */
+ /* int[8] */
+ BTF_TYPE_ARRAY_ENC(1, 1, 8), /* [4] */
+ /* struct A { */ /* [5] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 4), 48),
+ BTF_MEMBER_ENC(NAME_TBD, 2, 0), /* unsigned long long m;*/
+ BTF_MEMBER_ENC(NAME_TBD, 1, 64),/* int n; */
+ BTF_MEMBER_ENC(NAME_TBD, 3, 96),/* char o; */
+ BTF_MEMBER_ENC(NAME_TBD, 4, 128),/* int p[8] */
+ /* } */
+ /* static struct A t */
+ BTF_VAR_ENC(NAME_TBD, 5, 0), /* [6] */
+ /* .bss section */ /* [7] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 48),
+ BTF_VAR_SECINFO_ENC(6, 0, 48),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0m\0n\0o\0p\0t\0.bss",
+ .str_sec_size = sizeof("\0A\0m\0n\0o\0p\0t\0.bss"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 48,
+ .key_type_id = 0,
+ .value_type_id = 7,
+ .max_entries = 1,
+},
+{
+ .descr = "global data test #3",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ /* static int t */
+ BTF_VAR_ENC(NAME_TBD, 1, 0), /* [2] */
+ /* .bss section */ /* [3] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4),
+ BTF_VAR_SECINFO_ENC(2, 0, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0t\0.bss",
+ .str_sec_size = sizeof("\0t\0.bss"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 4,
+ .key_type_id = 0,
+ .value_type_id = 3,
+ .max_entries = 1,
+},
+{
+ .descr = "global data test #4, unsupported linkage",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ /* static int t */
+ BTF_VAR_ENC(NAME_TBD, 1, 2), /* [2] */
+ /* .bss section */ /* [3] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4),
+ BTF_VAR_SECINFO_ENC(2, 0, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0t\0.bss",
+ .str_sec_size = sizeof("\0t\0.bss"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 4,
+ .key_type_id = 0,
+ .value_type_id = 3,
+ .max_entries = 1,
+ .btf_load_err = true,
+ .err_str = "Linkage not supported",
+},
+{
+ .descr = "global data test #5, invalid var type",
+ .raw_types = {
+ /* static void t */
+ BTF_VAR_ENC(NAME_TBD, 0, 0), /* [1] */
+ /* .bss section */ /* [2] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4),
+ BTF_VAR_SECINFO_ENC(1, 0, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0t\0.bss",
+ .str_sec_size = sizeof("\0t\0.bss"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 4,
+ .key_type_id = 0,
+ .value_type_id = 2,
+ .max_entries = 1,
+ .btf_load_err = true,
+ .err_str = "Invalid type_id",
+},
+{
+ .descr = "global data test #6, invalid var type (fwd type)",
+ .raw_types = {
+ /* union A */
+ BTF_TYPE_ENC(NAME_TBD,
+ BTF_INFO_ENC(BTF_KIND_FWD, 1, 0), 0), /* [1] */
+ /* static union A t */
+ BTF_VAR_ENC(NAME_TBD, 1, 0), /* [2] */
+ /* .bss section */ /* [3] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4),
+ BTF_VAR_SECINFO_ENC(2, 0, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0t\0.bss",
+ .str_sec_size = sizeof("\0A\0t\0.bss"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 4,
+ .key_type_id = 0,
+ .value_type_id = 2,
+ .max_entries = 1,
+ .btf_load_err = true,
+ .err_str = "Invalid type",
+},
+{
+ .descr = "global data test #7, invalid var type (fwd type)",
+ .raw_types = {
+ /* union A */
+ BTF_TYPE_ENC(NAME_TBD,
+ BTF_INFO_ENC(BTF_KIND_FWD, 1, 0), 0), /* [1] */
+ /* static union A t */
+ BTF_VAR_ENC(NAME_TBD, 1, 0), /* [2] */
+ /* .bss section */ /* [3] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4),
+ BTF_VAR_SECINFO_ENC(1, 0, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0t\0.bss",
+ .str_sec_size = sizeof("\0A\0t\0.bss"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 4,
+ .key_type_id = 0,
+ .value_type_id = 2,
+ .max_entries = 1,
+ .btf_load_err = true,
+ .err_str = "Invalid type",
+},
+{
+ .descr = "global data test #8, invalid var size",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ /* unsigned long long */
+ BTF_TYPE_INT_ENC(0, 0, 0, 64, 8), /* [2] */
+ /* char */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 8, 1), /* [3] */
+ /* int[8] */
+ BTF_TYPE_ARRAY_ENC(1, 1, 8), /* [4] */
+ /* struct A { */ /* [5] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 4), 48),
+ BTF_MEMBER_ENC(NAME_TBD, 2, 0), /* unsigned long long m;*/
+ BTF_MEMBER_ENC(NAME_TBD, 1, 64),/* int n; */
+ BTF_MEMBER_ENC(NAME_TBD, 3, 96),/* char o; */
+ BTF_MEMBER_ENC(NAME_TBD, 4, 128),/* int p[8] */
+ /* } */
+ /* static struct A t */
+ BTF_VAR_ENC(NAME_TBD, 5, 0), /* [6] */
+ /* .bss section */ /* [7] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 48),
+ BTF_VAR_SECINFO_ENC(6, 0, 47),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0m\0n\0o\0p\0t\0.bss",
+ .str_sec_size = sizeof("\0A\0m\0n\0o\0p\0t\0.bss"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 48,
+ .key_type_id = 0,
+ .value_type_id = 7,
+ .max_entries = 1,
+ .btf_load_err = true,
+ .err_str = "Invalid size",
+},
+{
+ .descr = "global data test #9, invalid var size",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ /* unsigned long long */
+ BTF_TYPE_INT_ENC(0, 0, 0, 64, 8), /* [2] */
+ /* char */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 8, 1), /* [3] */
+ /* int[8] */
+ BTF_TYPE_ARRAY_ENC(1, 1, 8), /* [4] */
+ /* struct A { */ /* [5] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 4), 48),
+ BTF_MEMBER_ENC(NAME_TBD, 2, 0), /* unsigned long long m;*/
+ BTF_MEMBER_ENC(NAME_TBD, 1, 64),/* int n; */
+ BTF_MEMBER_ENC(NAME_TBD, 3, 96),/* char o; */
+ BTF_MEMBER_ENC(NAME_TBD, 4, 128),/* int p[8] */
+ /* } */
+ /* static struct A t */
+ BTF_VAR_ENC(NAME_TBD, 5, 0), /* [6] */
+ /* .bss section */ /* [7] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 46),
+ BTF_VAR_SECINFO_ENC(6, 0, 48),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0m\0n\0o\0p\0t\0.bss",
+ .str_sec_size = sizeof("\0A\0m\0n\0o\0p\0t\0.bss"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 48,
+ .key_type_id = 0,
+ .value_type_id = 7,
+ .max_entries = 1,
+ .btf_load_err = true,
+ .err_str = "Invalid size",
+},
+{
+ .descr = "global data test #10, invalid var size",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ /* unsigned long long */
+ BTF_TYPE_INT_ENC(0, 0, 0, 64, 8), /* [2] */
+ /* char */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 8, 1), /* [3] */
+ /* int[8] */
+ BTF_TYPE_ARRAY_ENC(1, 1, 8), /* [4] */
+ /* struct A { */ /* [5] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 4), 48),
+ BTF_MEMBER_ENC(NAME_TBD, 2, 0), /* unsigned long long m;*/
+ BTF_MEMBER_ENC(NAME_TBD, 1, 64),/* int n; */
+ BTF_MEMBER_ENC(NAME_TBD, 3, 96),/* char o; */
+ BTF_MEMBER_ENC(NAME_TBD, 4, 128),/* int p[8] */
+ /* } */
+ /* static struct A t */
+ BTF_VAR_ENC(NAME_TBD, 5, 0), /* [6] */
+ /* .bss section */ /* [7] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 46),
+ BTF_VAR_SECINFO_ENC(6, 0, 46),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0m\0n\0o\0p\0t\0.bss",
+ .str_sec_size = sizeof("\0A\0m\0n\0o\0p\0t\0.bss"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 48,
+ .key_type_id = 0,
+ .value_type_id = 7,
+ .max_entries = 1,
+ .btf_load_err = true,
+ .err_str = "Invalid size",
+},
+{
+ .descr = "global data test #11, multiple section members",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ /* unsigned long long */
+ BTF_TYPE_INT_ENC(0, 0, 0, 64, 8), /* [2] */
+ /* char */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 8, 1), /* [3] */
+ /* int[8] */
+ BTF_TYPE_ARRAY_ENC(1, 1, 8), /* [4] */
+ /* struct A { */ /* [5] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 4), 48),
+ BTF_MEMBER_ENC(NAME_TBD, 2, 0), /* unsigned long long m;*/
+ BTF_MEMBER_ENC(NAME_TBD, 1, 64),/* int n; */
+ BTF_MEMBER_ENC(NAME_TBD, 3, 96),/* char o; */
+ BTF_MEMBER_ENC(NAME_TBD, 4, 128),/* int p[8] */
+ /* } */
+ /* static struct A t */
+ BTF_VAR_ENC(NAME_TBD, 5, 0), /* [6] */
+ /* static int u */
+ BTF_VAR_ENC(NAME_TBD, 1, 0), /* [7] */
+ /* .bss section */ /* [8] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 2), 62),
+ BTF_VAR_SECINFO_ENC(6, 10, 48),
+ BTF_VAR_SECINFO_ENC(7, 58, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0m\0n\0o\0p\0t\0u\0.bss",
+ .str_sec_size = sizeof("\0A\0m\0n\0o\0p\0t\0u\0.bss"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 62,
+ .key_type_id = 0,
+ .value_type_id = 8,
+ .max_entries = 1,
+},
+{
+ .descr = "global data test #12, invalid offset",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ /* unsigned long long */
+ BTF_TYPE_INT_ENC(0, 0, 0, 64, 8), /* [2] */
+ /* char */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 8, 1), /* [3] */
+ /* int[8] */
+ BTF_TYPE_ARRAY_ENC(1, 1, 8), /* [4] */
+ /* struct A { */ /* [5] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 4), 48),
+ BTF_MEMBER_ENC(NAME_TBD, 2, 0), /* unsigned long long m;*/
+ BTF_MEMBER_ENC(NAME_TBD, 1, 64),/* int n; */
+ BTF_MEMBER_ENC(NAME_TBD, 3, 96),/* char o; */
+ BTF_MEMBER_ENC(NAME_TBD, 4, 128),/* int p[8] */
+ /* } */
+ /* static struct A t */
+ BTF_VAR_ENC(NAME_TBD, 5, 0), /* [6] */
+ /* static int u */
+ BTF_VAR_ENC(NAME_TBD, 1, 0), /* [7] */
+ /* .bss section */ /* [8] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 2), 62),
+ BTF_VAR_SECINFO_ENC(6, 10, 48),
+ BTF_VAR_SECINFO_ENC(7, 60, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0m\0n\0o\0p\0t\0u\0.bss",
+ .str_sec_size = sizeof("\0A\0m\0n\0o\0p\0t\0u\0.bss"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 62,
+ .key_type_id = 0,
+ .value_type_id = 8,
+ .max_entries = 1,
+ .btf_load_err = true,
+ .err_str = "Invalid offset+size",
+},
+{
+ .descr = "global data test #13, invalid offset",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ /* unsigned long long */
+ BTF_TYPE_INT_ENC(0, 0, 0, 64, 8), /* [2] */
+ /* char */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 8, 1), /* [3] */
+ /* int[8] */
+ BTF_TYPE_ARRAY_ENC(1, 1, 8), /* [4] */
+ /* struct A { */ /* [5] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 4), 48),
+ BTF_MEMBER_ENC(NAME_TBD, 2, 0), /* unsigned long long m;*/
+ BTF_MEMBER_ENC(NAME_TBD, 1, 64),/* int n; */
+ BTF_MEMBER_ENC(NAME_TBD, 3, 96),/* char o; */
+ BTF_MEMBER_ENC(NAME_TBD, 4, 128),/* int p[8] */
+ /* } */
+ /* static struct A t */
+ BTF_VAR_ENC(NAME_TBD, 5, 0), /* [6] */
+ /* static int u */
+ BTF_VAR_ENC(NAME_TBD, 1, 0), /* [7] */
+ /* .bss section */ /* [8] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 2), 62),
+ BTF_VAR_SECINFO_ENC(6, 10, 48),
+ BTF_VAR_SECINFO_ENC(7, 12, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0m\0n\0o\0p\0t\0u\0.bss",
+ .str_sec_size = sizeof("\0A\0m\0n\0o\0p\0t\0u\0.bss"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 62,
+ .key_type_id = 0,
+ .value_type_id = 8,
+ .max_entries = 1,
+ .btf_load_err = true,
+ .err_str = "Invalid offset",
+},
+{
+ .descr = "global data test #14, invalid offset",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ /* unsigned long long */
+ BTF_TYPE_INT_ENC(0, 0, 0, 64, 8), /* [2] */
+ /* char */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 8, 1), /* [3] */
+ /* int[8] */
+ BTF_TYPE_ARRAY_ENC(1, 1, 8), /* [4] */
+ /* struct A { */ /* [5] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 4), 48),
+ BTF_MEMBER_ENC(NAME_TBD, 2, 0), /* unsigned long long m;*/
+ BTF_MEMBER_ENC(NAME_TBD, 1, 64),/* int n; */
+ BTF_MEMBER_ENC(NAME_TBD, 3, 96),/* char o; */
+ BTF_MEMBER_ENC(NAME_TBD, 4, 128),/* int p[8] */
+ /* } */
+ /* static struct A t */
+ BTF_VAR_ENC(NAME_TBD, 5, 0), /* [6] */
+ /* static int u */
+ BTF_VAR_ENC(NAME_TBD, 1, 0), /* [7] */
+ /* .bss section */ /* [8] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 2), 62),
+ BTF_VAR_SECINFO_ENC(7, 58, 4),
+ BTF_VAR_SECINFO_ENC(6, 10, 48),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0m\0n\0o\0p\0t\0u\0.bss",
+ .str_sec_size = sizeof("\0A\0m\0n\0o\0p\0t\0u\0.bss"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 62,
+ .key_type_id = 0,
+ .value_type_id = 8,
+ .max_entries = 1,
+ .btf_load_err = true,
+ .err_str = "Invalid offset",
+},
+{
+ .descr = "global data test #15, not var kind",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ BTF_VAR_ENC(NAME_TBD, 1, 0), /* [2] */
+ /* .bss section */ /* [3] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4),
+ BTF_VAR_SECINFO_ENC(1, 0, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0t\0.bss",
+ .str_sec_size = sizeof("\0A\0t\0.bss"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 4,
+ .key_type_id = 0,
+ .value_type_id = 3,
+ .max_entries = 1,
+ .btf_load_err = true,
+ .err_str = "Not a VAR kind member",
+},
+{
+ .descr = "global data test #16, invalid var referencing sec",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ BTF_VAR_ENC(NAME_TBD, 5, 0), /* [2] */
+ BTF_VAR_ENC(NAME_TBD, 2, 0), /* [3] */
+ /* a section */ /* [4] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4),
+ BTF_VAR_SECINFO_ENC(3, 0, 4),
+ /* a section */ /* [5] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4),
+ BTF_VAR_SECINFO_ENC(6, 0, 4),
+ BTF_VAR_ENC(NAME_TBD, 1, 0), /* [6] */
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0t\0s\0a\0a",
+ .str_sec_size = sizeof("\0A\0t\0s\0a\0a"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 4,
+ .key_type_id = 0,
+ .value_type_id = 4,
+ .max_entries = 1,
+ .btf_load_err = true,
+ .err_str = "Invalid type_id",
+},
+{
+ .descr = "global data test #17, invalid var referencing var",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ BTF_VAR_ENC(NAME_TBD, 1, 0), /* [2] */
+ BTF_VAR_ENC(NAME_TBD, 2, 0), /* [3] */
+ /* a section */ /* [4] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4),
+ BTF_VAR_SECINFO_ENC(3, 0, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0t\0s\0a\0a",
+ .str_sec_size = sizeof("\0A\0t\0s\0a\0a"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 4,
+ .key_type_id = 0,
+ .value_type_id = 4,
+ .max_entries = 1,
+ .btf_load_err = true,
+ .err_str = "Invalid type_id",
+},
+{
+ .descr = "global data test #18, invalid var loop",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ BTF_VAR_ENC(NAME_TBD, 2, 0), /* [2] */
+ /* .bss section */ /* [3] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4),
+ BTF_VAR_SECINFO_ENC(2, 0, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0t\0aaa",
+ .str_sec_size = sizeof("\0A\0t\0aaa"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 4,
+ .key_type_id = 0,
+ .value_type_id = 4,
+ .max_entries = 1,
+ .btf_load_err = true,
+ .err_str = "Invalid type_id",
+},
+{
+ .descr = "global data test #19, invalid var referencing var",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ BTF_VAR_ENC(NAME_TBD, 3, 0), /* [2] */
+ BTF_VAR_ENC(NAME_TBD, 1, 0), /* [3] */
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0t\0s\0a\0a",
+ .str_sec_size = sizeof("\0A\0t\0s\0a\0a"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 4,
+ .key_type_id = 0,
+ .value_type_id = 4,
+ .max_entries = 1,
+ .btf_load_err = true,
+ .err_str = "Invalid type_id",
+},
+{
+ .descr = "global data test #20, invalid ptr referencing var",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ /* PTR type_id=3 */ /* [2] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 3),
+ BTF_VAR_ENC(NAME_TBD, 1, 0), /* [3] */
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0t\0s\0a\0a",
+ .str_sec_size = sizeof("\0A\0t\0s\0a\0a"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 4,
+ .key_type_id = 0,
+ .value_type_id = 4,
+ .max_entries = 1,
+ .btf_load_err = true,
+ .err_str = "Invalid type_id",
+},
+{
+ .descr = "global data test #21, var included in struct",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ /* struct A { */ /* [2] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 2), sizeof(int) * 2),
+ BTF_MEMBER_ENC(NAME_TBD, 1, 0), /* int m; */
+ BTF_MEMBER_ENC(NAME_TBD, 3, 32),/* VAR type_id=3; */
+ /* } */
+ BTF_VAR_ENC(NAME_TBD, 1, 0), /* [3] */
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0t\0s\0a\0a",
+ .str_sec_size = sizeof("\0A\0t\0s\0a\0a"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 4,
+ .key_type_id = 0,
+ .value_type_id = 4,
+ .max_entries = 1,
+ .btf_load_err = true,
+ .err_str = "Invalid member",
+},
+{
+ .descr = "global data test #22, array of var",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ BTF_TYPE_ARRAY_ENC(3, 1, 4), /* [2] */
+ BTF_VAR_ENC(NAME_TBD, 1, 0), /* [3] */
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0t\0s\0a\0a",
+ .str_sec_size = sizeof("\0A\0t\0s\0a\0a"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = ".bss",
+ .key_size = sizeof(int),
+ .value_size = 4,
+ .key_type_id = 0,
+ .value_type_id = 4,
+ .max_entries = 1,
+ .btf_load_err = true,
+ .err_str = "Invalid elem",
+},
/* Test member exceeds the size of struct.
*
* struct A {
} aenum;
uint32_t ui32b;
uint32_t bits2c:2;
+ uint8_t si8_4[2][2];
};
#ifdef __SIZEOF_INT128__
BTF_ENUM_ENC(NAME_TBD, 2),
BTF_ENUM_ENC(NAME_TBD, 3),
/* struct pprint_mapv */ /* [16] */
- BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 10), 40),
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 11), 40),
BTF_MEMBER_ENC(NAME_TBD, 11, 0), /* uint32_t ui32 */
BTF_MEMBER_ENC(NAME_TBD, 10, 32), /* uint16_t ui16 */
BTF_MEMBER_ENC(NAME_TBD, 12, 64), /* int32_t si32 */
BTF_MEMBER_ENC(NAME_TBD, 15, 192), /* aenum */
BTF_MEMBER_ENC(NAME_TBD, 11, 224), /* uint32_t ui32b */
BTF_MEMBER_ENC(NAME_TBD, 6, 256), /* bits2c */
+ BTF_MEMBER_ENC(NAME_TBD, 17, 264), /* si8_4 */
+ BTF_TYPE_ARRAY_ENC(18, 1, 2), /* [17] */
+ BTF_TYPE_ARRAY_ENC(1, 1, 2), /* [18] */
BTF_END_RAW,
},
- BTF_STR_SEC("\0unsigned char\0unsigned short\0unsigned int\0int\0unsigned long long\0uint8_t\0uint16_t\0uint32_t\0int32_t\0uint64_t\0ui64\0ui8a\0ENUM_ZERO\0ENUM_ONE\0ENUM_TWO\0ENUM_THREE\0pprint_mapv\0ui32\0ui16\0si32\0unused_bits2a\0bits28\0unused_bits2b\0aenum\0ui32b\0bits2c"),
+ BTF_STR_SEC("\0unsigned char\0unsigned short\0unsigned int\0int\0unsigned long long\0uint8_t\0uint16_t\0uint32_t\0int32_t\0uint64_t\0ui64\0ui8a\0ENUM_ZERO\0ENUM_ONE\0ENUM_TWO\0ENUM_THREE\0pprint_mapv\0ui32\0ui16\0si32\0unused_bits2a\0bits28\0unused_bits2b\0aenum\0ui32b\0bits2c\0si8_4"),
.key_size = sizeof(unsigned int),
.value_size = sizeof(struct pprint_mapv),
.key_type_id = 3, /* unsigned int */
BTF_ENUM_ENC(NAME_TBD, 2),
BTF_ENUM_ENC(NAME_TBD, 3),
/* struct pprint_mapv */ /* [16] */
- BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 1, 10), 40),
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 1, 11), 40),
BTF_MEMBER_ENC(NAME_TBD, 11, BTF_MEMBER_OFFSET(0, 0)), /* uint32_t ui32 */
BTF_MEMBER_ENC(NAME_TBD, 10, BTF_MEMBER_OFFSET(0, 32)), /* uint16_t ui16 */
BTF_MEMBER_ENC(NAME_TBD, 12, BTF_MEMBER_OFFSET(0, 64)), /* int32_t si32 */
BTF_MEMBER_ENC(NAME_TBD, 15, BTF_MEMBER_OFFSET(0, 192)), /* aenum */
BTF_MEMBER_ENC(NAME_TBD, 11, BTF_MEMBER_OFFSET(0, 224)), /* uint32_t ui32b */
BTF_MEMBER_ENC(NAME_TBD, 6, BTF_MEMBER_OFFSET(2, 256)), /* bits2c */
+ BTF_MEMBER_ENC(NAME_TBD, 17, 264), /* si8_4 */
+ BTF_TYPE_ARRAY_ENC(18, 1, 2), /* [17] */
+ BTF_TYPE_ARRAY_ENC(1, 1, 2), /* [18] */
BTF_END_RAW,
},
- BTF_STR_SEC("\0unsigned char\0unsigned short\0unsigned int\0int\0unsigned long long\0uint8_t\0uint16_t\0uint32_t\0int32_t\0uint64_t\0ui64\0ui8a\0ENUM_ZERO\0ENUM_ONE\0ENUM_TWO\0ENUM_THREE\0pprint_mapv\0ui32\0ui16\0si32\0unused_bits2a\0bits28\0unused_bits2b\0aenum\0ui32b\0bits2c"),
+ BTF_STR_SEC("\0unsigned char\0unsigned short\0unsigned int\0int\0unsigned long long\0uint8_t\0uint16_t\0uint32_t\0int32_t\0uint64_t\0ui64\0ui8a\0ENUM_ZERO\0ENUM_ONE\0ENUM_TWO\0ENUM_THREE\0pprint_mapv\0ui32\0ui16\0si32\0unused_bits2a\0bits28\0unused_bits2b\0aenum\0ui32b\0bits2c\0si8_4"),
.key_size = sizeof(unsigned int),
.value_size = sizeof(struct pprint_mapv),
.key_type_id = 3, /* unsigned int */
BTF_ENUM_ENC(NAME_TBD, 2),
BTF_ENUM_ENC(NAME_TBD, 3),
/* struct pprint_mapv */ /* [16] */
- BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 1, 10), 40),
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 1, 11), 40),
BTF_MEMBER_ENC(NAME_TBD, 11, BTF_MEMBER_OFFSET(0, 0)), /* uint32_t ui32 */
BTF_MEMBER_ENC(NAME_TBD, 10, BTF_MEMBER_OFFSET(0, 32)), /* uint16_t ui16 */
BTF_MEMBER_ENC(NAME_TBD, 12, BTF_MEMBER_OFFSET(0, 64)), /* int32_t si32 */
BTF_MEMBER_ENC(NAME_TBD, 15, BTF_MEMBER_OFFSET(0, 192)), /* aenum */
BTF_MEMBER_ENC(NAME_TBD, 11, BTF_MEMBER_OFFSET(0, 224)), /* uint32_t ui32b */
BTF_MEMBER_ENC(NAME_TBD, 17, BTF_MEMBER_OFFSET(2, 256)), /* bits2c */
+ BTF_MEMBER_ENC(NAME_TBD, 20, BTF_MEMBER_OFFSET(0, 264)), /* si8_4 */
/* typedef unsigned int ___int */ /* [17] */
BTF_TYPEDEF_ENC(NAME_TBD, 18),
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_VOLATILE, 0, 0), 6), /* [18] */
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_CONST, 0, 0), 15), /* [19] */
+ BTF_TYPE_ARRAY_ENC(21, 1, 2), /* [20] */
+ BTF_TYPE_ARRAY_ENC(1, 1, 2), /* [21] */
BTF_END_RAW,
},
- BTF_STR_SEC("\0unsigned char\0unsigned short\0unsigned int\0int\0unsigned long long\0uint8_t\0uint16_t\0uint32_t\0int32_t\0uint64_t\0ui64\0ui8a\0ENUM_ZERO\0ENUM_ONE\0ENUM_TWO\0ENUM_THREE\0pprint_mapv\0ui32\0ui16\0si32\0unused_bits2a\0bits28\0unused_bits2b\0aenum\0ui32b\0bits2c\0___int"),
+ BTF_STR_SEC("\0unsigned char\0unsigned short\0unsigned int\0int\0unsigned long long\0uint8_t\0uint16_t\0uint32_t\0int32_t\0uint64_t\0ui64\0ui8a\0ENUM_ZERO\0ENUM_ONE\0ENUM_TWO\0ENUM_THREE\0pprint_mapv\0ui32\0ui16\0si32\0unused_bits2a\0bits28\0unused_bits2b\0aenum\0ui32b\0bits2c\0___int\0si8_4"),
.key_size = sizeof(unsigned int),
.value_size = sizeof(struct pprint_mapv),
.key_type_id = 3, /* unsigned int */
v->aenum = i & 0x03;
v->ui32b = 4;
v->bits2c = 1;
+ v->si8_4[0][0] = (cpu + i) & 0xff;
+ v->si8_4[0][1] = (cpu + i + 1) & 0xff;
+ v->si8_4[1][0] = (cpu + i + 2) & 0xff;
+ v->si8_4[1][1] = (cpu + i + 3) & 0xff;
v = (void *)v + rounded_value_size;
}
}
nexpected_line = snprintf(expected_line, line_size,
"%s%u: {%u,0,%d,0x%x,0x%x,0x%x,"
"{%lu|[%u,%u,%u,%u,%u,%u,%u,%u]},%s,"
- "%u,0x%x}\n",
+ "%u,0x%x,[[%d,%d],[%d,%d]]}\n",
percpu_map ? "\tcpu" : "",
percpu_map ? cpu : next_key,
v->ui32, v->si32,
v->ui8a[6], v->ui8a[7],
pprint_enum_str[v->aenum],
v->ui32b,
- v->bits2c);
+ v->bits2c,
+ v->si8_4[0][0], v->si8_4[0][1],
+ v->si8_4[1][0], v->si8_4[1][1]);
}
#ifdef __SIZEOF_INT128__
start_test("Test if netdev removal waits for translation...")
delay_msec = 500
- sim.dfs["bpf_bind_verifier_delay"] = delay_msec
+ sim.dfs["sdev/bpf_bind_verifier_delay"] = delay_msec
start = time.time()
cmd_line = "tc filter add dev %s ingress bpf %s da skip_sw" % \
(sim['ifname'], obj)
int error_cnt, pass_cnt;
bool jit_enabled;
+bool verifier_stats = false;
struct ipv4_packet pkt_v4 = {
.eth.h_proto = __bpf_constant_htons(ETH_P_IP),
#include <prog_tests/tests.h>
#undef DECLARE
-int main(void)
+int main(int ac, char **av)
{
srand(time(NULL));
jit_enabled = is_jit_enabled();
+ if (ac == 2 && strcmp(av[1], "-s") == 0)
+ verifier_stats = true;
+
#define CALL
#include <prog_tests/tests.h>
#undef CALL
extern int error_cnt, pass_cnt;
extern bool jit_enabled;
+extern bool verifier_stats;
#define MAGIC_BYTES 123
--- /dev/null
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# This test installs a TC bpf program that throttles a TCP flow
+# with dst port = 9000 down to 5MBps. Then it measures actual
+# throughput of the flow.
+
+if [[ $EUID -ne 0 ]]; then
+ echo "This script must be run as root"
+ echo "FAIL"
+ exit 1
+fi
+
+# check that nc, dd, and timeout are present
+command -v nc >/dev/null 2>&1 || \
+ { echo >&2 "nc is not available"; exit 1; }
+command -v dd >/dev/null 2>&1 || \
+ { echo >&2 "nc is not available"; exit 1; }
+command -v timeout >/dev/null 2>&1 || \
+ { echo >&2 "timeout is not available"; exit 1; }
+
+readonly NS_SRC="ns-src-$(mktemp -u XXXXXX)"
+readonly NS_DST="ns-dst-$(mktemp -u XXXXXX)"
+
+readonly IP_SRC="172.16.1.100"
+readonly IP_DST="172.16.2.100"
+
+cleanup()
+{
+ ip netns del ${NS_SRC}
+ ip netns del ${NS_DST}
+}
+
+trap cleanup EXIT
+
+set -e # exit on error
+
+ip netns add "${NS_SRC}"
+ip netns add "${NS_DST}"
+ip link add veth_src type veth peer name veth_dst
+ip link set veth_src netns ${NS_SRC}
+ip link set veth_dst netns ${NS_DST}
+
+ip -netns ${NS_SRC} addr add ${IP_SRC}/24 dev veth_src
+ip -netns ${NS_DST} addr add ${IP_DST}/24 dev veth_dst
+
+ip -netns ${NS_SRC} link set dev veth_src up
+ip -netns ${NS_DST} link set dev veth_dst up
+
+ip -netns ${NS_SRC} route add ${IP_DST}/32 dev veth_src
+ip -netns ${NS_DST} route add ${IP_SRC}/32 dev veth_dst
+
+# set up TC on TX
+ip netns exec ${NS_SRC} tc qdisc add dev veth_src root fq
+ip netns exec ${NS_SRC} tc qdisc add dev veth_src clsact
+ip netns exec ${NS_SRC} tc filter add dev veth_src egress \
+ bpf da obj test_tc_edt.o sec cls_test
+
+
+# start the listener
+ip netns exec ${NS_DST} bash -c \
+ "nc -4 -l -s ${IP_DST} -p 9000 >/dev/null &"
+declare -i NC_PID=$!
+sleep 1
+
+declare -ir TIMEOUT=20
+declare -ir EXPECTED_BPS=5000000
+
+# run the load, capture RX bytes on DST
+declare -ir RX_BYTES_START=$( ip netns exec ${NS_DST} \
+ cat /sys/class/net/veth_dst/statistics/rx_bytes )
+
+set +e
+ip netns exec ${NS_SRC} bash -c "timeout ${TIMEOUT} dd if=/dev/zero \
+ bs=1000 count=1000000 > /dev/tcp/${IP_DST}/9000 2>/dev/null"
+set -e
+
+declare -ir RX_BYTES_END=$( ip netns exec ${NS_DST} \
+ cat /sys/class/net/veth_dst/statistics/rx_bytes )
+
+declare -ir ACTUAL_BPS=$(( ($RX_BYTES_END - $RX_BYTES_START) / $TIMEOUT ))
+
+echo $TIMEOUT $ACTUAL_BPS $EXPECTED_BPS | \
+ awk '{printf "elapsed: %d sec; bps difference: %.2f%%\n",
+ $1, ($2-$3)*100.0/$3}'
+
+# Pass the test if the actual bps is within 1% of the expected bps.
+# The difference is usually about 0.1% on a 20-sec test, and ==> zero
+# the longer the test runs.
+declare -ir RES=$( echo $ACTUAL_BPS $EXPECTED_BPS | \
+ awk 'function abs(x){return ((x < 0.0) ? -x : x)}
+ {if (abs(($1-$2)*100.0/$2) > 1.0) { print "1" }
+ else { print "0"} }' )
+if [ "${RES}" == "0" ] ; then
+ echo "PASS"
+else
+ echo "FAIL"
+ exit 1
+fi
--- /dev/null
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# In-place tunneling
+
+# must match the port that the bpf program filters on
+readonly port=8000
+
+readonly ns_prefix="ns-$$-"
+readonly ns1="${ns_prefix}1"
+readonly ns2="${ns_prefix}2"
+
+readonly ns1_v4=192.168.1.1
+readonly ns2_v4=192.168.1.2
+readonly ns1_v6=fd::1
+readonly ns2_v6=fd::2
+
+# Must match port used by bpf program
+readonly udpport=5555
+# MPLSoverUDP
+readonly mplsudpport=6635
+readonly mplsproto=137
+
+readonly infile="$(mktemp)"
+readonly outfile="$(mktemp)"
+
+setup() {
+ ip netns add "${ns1}"
+ ip netns add "${ns2}"
+
+ ip link add dev veth1 mtu 1500 netns "${ns1}" type veth \
+ peer name veth2 mtu 1500 netns "${ns2}"
+
+ ip netns exec "${ns1}" ethtool -K veth1 tso off
+
+ ip -netns "${ns1}" link set veth1 up
+ ip -netns "${ns2}" link set veth2 up
+
+ ip -netns "${ns1}" -4 addr add "${ns1_v4}/24" dev veth1
+ ip -netns "${ns2}" -4 addr add "${ns2_v4}/24" dev veth2
+ ip -netns "${ns1}" -6 addr add "${ns1_v6}/64" dev veth1 nodad
+ ip -netns "${ns2}" -6 addr add "${ns2_v6}/64" dev veth2 nodad
+
+ # clamp route to reserve room for tunnel headers
+ ip -netns "${ns1}" -4 route flush table main
+ ip -netns "${ns1}" -6 route flush table main
+ ip -netns "${ns1}" -4 route add "${ns2_v4}" mtu 1458 dev veth1
+ ip -netns "${ns1}" -6 route add "${ns2_v6}" mtu 1438 dev veth1
+
+ sleep 1
+
+ dd if=/dev/urandom of="${infile}" bs="${datalen}" count=1 status=none
+}
+
+cleanup() {
+ ip netns del "${ns2}"
+ ip netns del "${ns1}"
+
+ if [[ -f "${outfile}" ]]; then
+ rm "${outfile}"
+ fi
+ if [[ -f "${infile}" ]]; then
+ rm "${infile}"
+ fi
+}
+
+server_listen() {
+ ip netns exec "${ns2}" nc "${netcat_opt}" -l -p "${port}" > "${outfile}" &
+ server_pid=$!
+ sleep 0.2
+}
+
+client_connect() {
+ ip netns exec "${ns1}" timeout 2 nc "${netcat_opt}" -w 1 "${addr2}" "${port}" < "${infile}"
+ echo $?
+}
+
+verify_data() {
+ wait "${server_pid}"
+ # sha1sum returns two fields [sha1] [filepath]
+ # convert to bash array and access first elem
+ insum=($(sha1sum ${infile}))
+ outsum=($(sha1sum ${outfile}))
+ if [[ "${insum[0]}" != "${outsum[0]}" ]]; then
+ echo "data mismatch"
+ exit 1
+ fi
+}
+
+set -e
+
+# no arguments: automated test, run all
+if [[ "$#" -eq "0" ]]; then
+ echo "ipip"
+ $0 ipv4 ipip none 100
+
+ echo "ip6ip6"
+ $0 ipv6 ip6tnl none 100
+
+ for mac in none mpls eth ; do
+ echo "ip gre $mac"
+ $0 ipv4 gre $mac 100
+
+ echo "ip6 gre $mac"
+ $0 ipv6 ip6gre $mac 100
+
+ echo "ip gre $mac gso"
+ $0 ipv4 gre $mac 2000
+
+ echo "ip6 gre $mac gso"
+ $0 ipv6 ip6gre $mac 2000
+
+ echo "ip udp $mac"
+ $0 ipv4 udp $mac 100
+
+ echo "ip6 udp $mac"
+ $0 ipv6 ip6udp $mac 100
+
+ echo "ip udp $mac gso"
+ $0 ipv4 udp $mac 2000
+
+ echo "ip6 udp $mac gso"
+ $0 ipv6 ip6udp $mac 2000
+ done
+
+ echo "OK. All tests passed"
+ exit 0
+fi
+
+if [[ "$#" -ne "4" ]]; then
+ echo "Usage: $0"
+ echo " or: $0 <ipv4|ipv6> <tuntype> <none|mpls|eth> <data_len>"
+ exit 1
+fi
+
+case "$1" in
+"ipv4")
+ readonly addr1="${ns1_v4}"
+ readonly addr2="${ns2_v4}"
+ readonly ipproto=4
+ readonly netcat_opt=-${ipproto}
+ readonly foumod=fou
+ readonly foutype=ipip
+ readonly fouproto=4
+ readonly fouproto_mpls=${mplsproto}
+ readonly gretaptype=gretap
+ ;;
+"ipv6")
+ readonly addr1="${ns1_v6}"
+ readonly addr2="${ns2_v6}"
+ readonly ipproto=6
+ readonly netcat_opt=-${ipproto}
+ readonly foumod=fou6
+ readonly foutype=ip6tnl
+ readonly fouproto="41 -6"
+ readonly fouproto_mpls="${mplsproto} -6"
+ readonly gretaptype=ip6gretap
+ ;;
+*)
+ echo "unknown arg: $1"
+ exit 1
+ ;;
+esac
+
+readonly tuntype=$2
+readonly mac=$3
+readonly datalen=$4
+
+echo "encap ${addr1} to ${addr2}, type ${tuntype}, mac ${mac} len ${datalen}"
+
+trap cleanup EXIT
+
+setup
+
+# basic communication works
+echo "test basic connectivity"
+server_listen
+client_connect
+verify_data
+
+# clientside, insert bpf program to encap all TCP to port ${port}
+# client can no longer connect
+ip netns exec "${ns1}" tc qdisc add dev veth1 clsact
+ip netns exec "${ns1}" tc filter add dev veth1 egress \
+ bpf direct-action object-file ./test_tc_tunnel.o \
+ section "encap_${tuntype}_${mac}"
+echo "test bpf encap without decap (expect failure)"
+server_listen
+! client_connect
+
+if [[ "$tuntype" =~ "udp" ]]; then
+ # Set up fou tunnel.
+ ttype="${foutype}"
+ targs="encap fou encap-sport auto encap-dport $udpport"
+ # fou may be a module; allow this to fail.
+ modprobe "${foumod}" ||true
+ if [[ "$mac" == "mpls" ]]; then
+ dport=${mplsudpport}
+ dproto=${fouproto_mpls}
+ tmode="mode any ttl 255"
+ else
+ dport=${udpport}
+ dproto=${fouproto}
+ fi
+ ip netns exec "${ns2}" ip fou add port $dport ipproto ${dproto}
+ targs="encap fou encap-sport auto encap-dport $dport"
+elif [[ "$tuntype" =~ "gre" && "$mac" == "eth" ]]; then
+ ttype=$gretaptype
+else
+ ttype=$tuntype
+ targs=""
+fi
+
+# serverside, insert decap module
+# server is still running
+# client can connect again
+ip netns exec "${ns2}" ip link add name testtun0 type "${ttype}" \
+ ${tmode} remote "${addr1}" local "${addr2}" $targs
+
+expect_tun_fail=0
+
+if [[ "$tuntype" == "ip6udp" && "$mac" == "mpls" ]]; then
+ # No support for MPLS IPv6 fou tunnel; expect failure.
+ expect_tun_fail=1
+elif [[ "$tuntype" =~ "udp" && "$mac" == "eth" ]]; then
+ # No support for TEB fou tunnel; expect failure.
+ expect_tun_fail=1
+elif [[ "$tuntype" =~ "gre" && "$mac" == "eth" ]]; then
+ # Share ethernet address between tunnel/veth2 so L2 decap works.
+ ethaddr=$(ip netns exec "${ns2}" ip link show veth2 | \
+ awk '/ether/ { print $2 }')
+ ip netns exec "${ns2}" ip link set testtun0 address $ethaddr
+elif [[ "$mac" == "mpls" ]]; then
+ modprobe mpls_iptunnel ||true
+ modprobe mpls_gso ||true
+ ip netns exec "${ns2}" sysctl -qw net.mpls.platform_labels=65536
+ ip netns exec "${ns2}" ip -f mpls route add 1000 dev lo
+ ip netns exec "${ns2}" ip link set lo up
+ ip netns exec "${ns2}" sysctl -qw net.mpls.conf.testtun0.input=1
+ ip netns exec "${ns2}" sysctl -qw net.ipv4.conf.lo.rp_filter=0
+fi
+
+# Because packets are decapped by the tunnel they arrive on testtun0 from
+# the IP stack perspective. Ensure reverse path filtering is disabled
+# otherwise we drop the TCP SYN as arriving on testtun0 instead of the
+# expected veth2 (veth2 is where 192.168.1.2 is configured).
+ip netns exec "${ns2}" sysctl -qw net.ipv4.conf.all.rp_filter=0
+# rp needs to be disabled for both all and testtun0 as the rp value is
+# selected as the max of the "all" and device-specific values.
+ip netns exec "${ns2}" sysctl -qw net.ipv4.conf.testtun0.rp_filter=0
+ip netns exec "${ns2}" ip link set dev testtun0 up
+if [[ "$expect_tun_fail" == 1 ]]; then
+ # This tunnel mode is not supported, so we expect failure.
+ echo "test bpf encap with tunnel device decap (expect failure)"
+ ! client_connect
+else
+ echo "test bpf encap with tunnel device decap"
+ client_connect
+ verify_data
+ server_listen
+fi
+
+# serverside, use BPF for decap
+ip netns exec "${ns2}" ip link del dev testtun0
+ip netns exec "${ns2}" tc qdisc add dev veth2 clsact
+ip netns exec "${ns2}" tc filter add dev veth2 ingress \
+ bpf direct-action object-file ./test_tc_tunnel.o section decap
+echo "test bpf encap with bpf decap"
+client_connect
+verify_data
+
+echo OK
--- /dev/null
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2018 Facebook
+# Copyright (c) 2019 Cloudflare
+
+set -eu
+
+wait_for_ip()
+{
+ local _i
+ printf "Wait for IP %s to become available " "$1"
+ for _i in $(seq ${MAX_PING_TRIES}); do
+ printf "."
+ if ns1_exec ping -c 1 -W 1 "$1" >/dev/null 2>&1; then
+ echo " OK"
+ return
+ fi
+ sleep 1
+ done
+ echo 1>&2 "ERROR: Timeout waiting for test IP to become available."
+ exit 1
+}
+
+get_prog_id()
+{
+ awk '/ id / {sub(/.* id /, "", $0); print($1)}'
+}
+
+ns1_exec()
+{
+ ip netns exec ns1 "$@"
+}
+
+setup()
+{
+ ip netns add ns1
+ ns1_exec ip link set lo up
+
+ ns1_exec sysctl -w net.ipv4.tcp_syncookies=2
+
+ wait_for_ip 127.0.0.1
+ wait_for_ip ::1
+}
+
+cleanup()
+{
+ ip netns del ns1 2>/dev/null || :
+}
+
+main()
+{
+ trap cleanup EXIT 2 3 6 15
+ setup
+
+ printf "Testing clsact..."
+ ns1_exec tc qdisc add dev "${TEST_IF}" clsact
+ ns1_exec tc filter add dev "${TEST_IF}" ingress \
+ bpf obj "${BPF_PROG_OBJ}" sec "${CLSACT_SECTION}" da
+
+ BPF_PROG_ID=$(ns1_exec tc filter show dev "${TEST_IF}" ingress | \
+ get_prog_id)
+ ns1_exec "${PROG}" "${BPF_PROG_ID}"
+ ns1_exec tc qdisc del dev "${TEST_IF}" clsact
+
+ printf "Testing XDP..."
+ ns1_exec ip link set "${TEST_IF}" xdp \
+ object "${BPF_PROG_OBJ}" section "${XDP_SECTION}"
+ BPF_PROG_ID=$(ns1_exec ip link show "${TEST_IF}" | get_prog_id)
+ ns1_exec "${PROG}" "${BPF_PROG_ID}"
+}
+
+DIR=$(dirname $0)
+TEST_IF=lo
+MAX_PING_TRIES=5
+BPF_PROG_OBJ="${DIR}/test_tcp_check_syncookie_kern.o"
+CLSACT_SECTION="clsact/check_syncookie"
+XDP_SECTION="xdp/check_syncookie"
+BPF_PROG_ID=0
+PROG="${DIR}/test_tcp_check_syncookie_user"
+
+main
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Facebook
+// Copyright (c) 2019 Cloudflare
+
+#include <string.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <arpa/inet.h>
+#include <netinet/in.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include "bpf_rlimit.h"
+#include "cgroup_helpers.h"
+
+static int start_server(const struct sockaddr *addr, socklen_t len)
+{
+ int fd;
+
+ fd = socket(addr->sa_family, SOCK_STREAM, 0);
+ if (fd == -1) {
+ log_err("Failed to create server socket");
+ goto out;
+ }
+
+ if (bind(fd, addr, len) == -1) {
+ log_err("Failed to bind server socket");
+ goto close_out;
+ }
+
+ if (listen(fd, 128) == -1) {
+ log_err("Failed to listen on server socket");
+ goto close_out;
+ }
+
+ goto out;
+
+close_out:
+ close(fd);
+ fd = -1;
+out:
+ return fd;
+}
+
+static int connect_to_server(int server_fd)
+{
+ struct sockaddr_storage addr;
+ socklen_t len = sizeof(addr);
+ int fd = -1;
+
+ if (getsockname(server_fd, (struct sockaddr *)&addr, &len)) {
+ log_err("Failed to get server addr");
+ goto out;
+ }
+
+ fd = socket(addr.ss_family, SOCK_STREAM, 0);
+ if (fd == -1) {
+ log_err("Failed to create client socket");
+ goto out;
+ }
+
+ if (connect(fd, (const struct sockaddr *)&addr, len) == -1) {
+ log_err("Fail to connect to server");
+ goto close_out;
+ }
+
+ goto out;
+
+close_out:
+ close(fd);
+ fd = -1;
+out:
+ return fd;
+}
+
+static int get_map_fd_by_prog_id(int prog_id)
+{
+ struct bpf_prog_info info = {};
+ __u32 info_len = sizeof(info);
+ __u32 map_ids[1];
+ int prog_fd = -1;
+ int map_fd = -1;
+
+ prog_fd = bpf_prog_get_fd_by_id(prog_id);
+ if (prog_fd < 0) {
+ log_err("Failed to get fd by prog id %d", prog_id);
+ goto err;
+ }
+
+ info.nr_map_ids = 1;
+ info.map_ids = (__u64)(unsigned long)map_ids;
+
+ if (bpf_obj_get_info_by_fd(prog_fd, &info, &info_len)) {
+ log_err("Failed to get info by prog fd %d", prog_fd);
+ goto err;
+ }
+
+ if (!info.nr_map_ids) {
+ log_err("No maps found for prog fd %d", prog_fd);
+ goto err;
+ }
+
+ map_fd = bpf_map_get_fd_by_id(map_ids[0]);
+ if (map_fd < 0)
+ log_err("Failed to get fd by map id %d", map_ids[0]);
+err:
+ if (prog_fd >= 0)
+ close(prog_fd);
+ return map_fd;
+}
+
+static int run_test(int server_fd, int results_fd)
+{
+ int client = -1, srv_client = -1;
+ int ret = 0;
+ __u32 key = 0;
+ __u64 value = 0;
+
+ if (bpf_map_update_elem(results_fd, &key, &value, 0) < 0) {
+ log_err("Can't clear results");
+ goto err;
+ }
+
+ client = connect_to_server(server_fd);
+ if (client == -1)
+ goto err;
+
+ srv_client = accept(server_fd, NULL, 0);
+ if (srv_client == -1) {
+ log_err("Can't accept connection");
+ goto err;
+ }
+
+ if (bpf_map_lookup_elem(results_fd, &key, &value) < 0) {
+ log_err("Can't lookup result");
+ goto err;
+ }
+
+ if (value != 1) {
+ log_err("Didn't match syncookie: %llu", value);
+ goto err;
+ }
+
+ goto out;
+
+err:
+ ret = 1;
+out:
+ close(client);
+ close(srv_client);
+ return ret;
+}
+
+int main(int argc, char **argv)
+{
+ struct sockaddr_in addr4;
+ struct sockaddr_in6 addr6;
+ int server = -1;
+ int server_v6 = -1;
+ int results = -1;
+ int err = 0;
+
+ if (argc < 2) {
+ fprintf(stderr, "Usage: %s prog_id\n", argv[0]);
+ exit(1);
+ }
+
+ results = get_map_fd_by_prog_id(atoi(argv[1]));
+ if (results < 0) {
+ log_err("Can't get map");
+ goto err;
+ }
+
+ memset(&addr4, 0, sizeof(addr4));
+ addr4.sin_family = AF_INET;
+ addr4.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+ addr4.sin_port = 0;
+
+ memset(&addr6, 0, sizeof(addr6));
+ addr6.sin6_family = AF_INET6;
+ addr6.sin6_addr = in6addr_loopback;
+ addr6.sin6_port = 0;
+
+ server = start_server((const struct sockaddr *)&addr4, sizeof(addr4));
+ if (server == -1)
+ goto err;
+
+ server_v6 = start_server((const struct sockaddr *)&addr6,
+ sizeof(addr6));
+ if (server_v6 == -1)
+ goto err;
+
+ if (run_test(server, results))
+ goto err;
+
+ if (run_test(server_v6, results))
+ goto err;
+
+ printf("ok\n");
+ goto out;
+err:
+ err = 1;
+out:
+ close(server);
+ close(server_v6);
+ close(results);
+ return err;
+}
#include "../../../include/linux/filter.h"
#define MAX_INSNS BPF_MAXINSNS
+#define MAX_TEST_INSNS 1000000
#define MAX_FIXUPS 8
-#define MAX_NR_MAPS 14
+#define MAX_NR_MAPS 16
#define MAX_TEST_RUNS 8
#define POINTER_VALUE 0xcafe4all
#define TEST_DATA_LEN 64
struct bpf_test {
const char *descr;
struct bpf_insn insns[MAX_INSNS];
+ struct bpf_insn *fill_insns;
int fixup_map_hash_8b[MAX_FIXUPS];
int fixup_map_hash_48b[MAX_FIXUPS];
int fixup_map_hash_16b[MAX_FIXUPS];
int fixup_cgroup_storage[MAX_FIXUPS];
int fixup_percpu_cgroup_storage[MAX_FIXUPS];
int fixup_map_spin_lock[MAX_FIXUPS];
+ int fixup_map_array_ro[MAX_FIXUPS];
+ int fixup_map_array_wo[MAX_FIXUPS];
+ int fixup_map_array_small[MAX_FIXUPS];
const char *errstr;
const char *errstr_unpriv;
uint32_t retval, retval_unpriv, insn_processed;
+ int prog_len;
enum {
UNDEF,
ACCEPT,
static void bpf_fill_ld_abs_vlan_push_pop(struct bpf_test *self)
{
- /* test: {skb->data[0], vlan_push} x 68 + {skb->data[0], vlan_pop} x 68 */
+ /* test: {skb->data[0], vlan_push} x 51 + {skb->data[0], vlan_pop} x 51 */
#define PUSH_CNT 51
- unsigned int len = BPF_MAXINSNS;
- struct bpf_insn *insn = self->insns;
+ /* jump range is limited to 16 bit. PUSH_CNT of ld_abs needs room */
+ unsigned int len = (1 << 15) - PUSH_CNT * 2 * 5 * 6;
+ struct bpf_insn *insn = self->fill_insns;
int i = 0, j, k = 0;
insn[i++] = BPF_MOV64_REG(BPF_REG_6, BPF_REG_1);
for (; i < len - 1; i++)
insn[i] = BPF_ALU32_IMM(BPF_MOV, BPF_REG_0, 0xbef);
insn[len - 1] = BPF_EXIT_INSN();
+ self->prog_len = len;
}
static void bpf_fill_jump_around_ld_abs(struct bpf_test *self)
{
- struct bpf_insn *insn = self->insns;
- unsigned int len = BPF_MAXINSNS;
+ struct bpf_insn *insn = self->fill_insns;
+ /* jump range is limited to 16 bit. every ld_abs is replaced by 6 insns */
+ unsigned int len = (1 << 15) / 6;
int i = 0;
insn[i++] = BPF_MOV64_REG(BPF_REG_6, BPF_REG_1);
while (i < len - 1)
insn[i++] = BPF_LD_ABS(BPF_B, 1);
insn[i] = BPF_EXIT_INSN();
+ self->prog_len = i + 1;
}
static void bpf_fill_rand_ld_dw(struct bpf_test *self)
{
- struct bpf_insn *insn = self->insns;
+ struct bpf_insn *insn = self->fill_insns;
uint64_t res = 0;
int i = 0;
insn[i++] = BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 32);
insn[i++] = BPF_ALU64_REG(BPF_XOR, BPF_REG_0, BPF_REG_1);
insn[i] = BPF_EXIT_INSN();
+ self->prog_len = i + 1;
res ^= (res >> 32);
self->retval = (uint32_t)res;
}
/* BPF_SK_LOOKUP contains 13 instructions, if you need to fix up maps */
-#define BPF_SK_LOOKUP \
+#define BPF_SK_LOOKUP(func) \
/* struct bpf_sock_tuple tuple = {} */ \
BPF_MOV64_IMM(BPF_REG_2, 0), \
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8), \
BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -32), \
BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -40), \
BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -48), \
- /* sk = sk_lookup_tcp(ctx, &tuple, sizeof tuple, 0, 0) */ \
+ /* sk = func(ctx, &tuple, sizeof tuple, 0, 0) */ \
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), \
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -48), \
BPF_MOV64_IMM(BPF_REG_3, sizeof(struct bpf_sock_tuple)), \
BPF_MOV64_IMM(BPF_REG_4, 0), \
BPF_MOV64_IMM(BPF_REG_5, 0), \
- BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp)
+ BPF_EMIT_CALL(BPF_FUNC_ ## func)
/* BPF_DIRECT_PKT_R2 contains 7 instructions, it initializes default return
* value into 0 and does necessary preparation for direct packet access
return false;
}
-static int create_map(uint32_t type, uint32_t size_key,
- uint32_t size_value, uint32_t max_elem)
+static int __create_map(uint32_t type, uint32_t size_key,
+ uint32_t size_value, uint32_t max_elem,
+ uint32_t extra_flags)
{
int fd;
fd = bpf_create_map(type, size_key, size_value, max_elem,
- type == BPF_MAP_TYPE_HASH ? BPF_F_NO_PREALLOC : 0);
+ (type == BPF_MAP_TYPE_HASH ?
+ BPF_F_NO_PREALLOC : 0) | extra_flags);
if (fd < 0) {
if (skip_unsupported_map(type))
return -1;
return fd;
}
+static int create_map(uint32_t type, uint32_t size_key,
+ uint32_t size_value, uint32_t max_elem)
+{
+ return __create_map(type, size_key, size_value, max_elem, 0);
+}
+
static void update_map(int fd, int index)
{
struct test_val value = {
int *fixup_cgroup_storage = test->fixup_cgroup_storage;
int *fixup_percpu_cgroup_storage = test->fixup_percpu_cgroup_storage;
int *fixup_map_spin_lock = test->fixup_map_spin_lock;
+ int *fixup_map_array_ro = test->fixup_map_array_ro;
+ int *fixup_map_array_wo = test->fixup_map_array_wo;
+ int *fixup_map_array_small = test->fixup_map_array_small;
- if (test->fill_helper)
+ if (test->fill_helper) {
+ test->fill_insns = calloc(MAX_TEST_INSNS, sizeof(struct bpf_insn));
test->fill_helper(test);
+ }
/* Allocating HTs with 1 elem is fine here, since we only test
* for verifier and not do a runtime lookup, so the only thing
fixup_map_spin_lock++;
} while (*fixup_map_spin_lock);
}
+ if (*fixup_map_array_ro) {
+ map_fds[14] = __create_map(BPF_MAP_TYPE_ARRAY, sizeof(int),
+ sizeof(struct test_val), 1,
+ BPF_F_RDONLY_PROG);
+ update_map(map_fds[14], 0);
+ do {
+ prog[*fixup_map_array_ro].imm = map_fds[14];
+ fixup_map_array_ro++;
+ } while (*fixup_map_array_ro);
+ }
+ if (*fixup_map_array_wo) {
+ map_fds[15] = __create_map(BPF_MAP_TYPE_ARRAY, sizeof(int),
+ sizeof(struct test_val), 1,
+ BPF_F_WRONLY_PROG);
+ update_map(map_fds[15], 0);
+ do {
+ prog[*fixup_map_array_wo].imm = map_fds[15];
+ fixup_map_array_wo++;
+ } while (*fixup_map_array_wo);
+ }
+ if (*fixup_map_array_small) {
+ map_fds[16] = __create_map(BPF_MAP_TYPE_ARRAY, sizeof(int),
+ 1, 1, 0);
+ update_map(map_fds[16], 0);
+ do {
+ prog[*fixup_map_array_small].imm = map_fds[16];
+ fixup_map_array_small++;
+ } while (*fixup_map_array_small);
+ }
}
static int set_admin(bool admin)
prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
fixup_skips = skips;
do_test_fixup(test, prog_type, prog, map_fds);
+ if (test->fill_insns) {
+ prog = test->fill_insns;
+ prog_len = test->prog_len;
+ } else {
+ prog_len = probe_filter_length(prog);
+ }
/* If there were some map skips during fixup due to missing bpf
* features, skip this test.
*/
if (fixup_skips != skips)
return;
- prog_len = probe_filter_length(prog);
pflags = 0;
if (test->flags & F_LOAD_WITH_STRICT_ALIGNMENT)
if (test->flags & F_NEEDS_EFFICIENT_UNALIGNED_ACCESS)
pflags |= BPF_F_ANY_ALIGNMENT;
fd_prog = bpf_verify_program(prog_type, prog, prog_len, pflags,
- "GPL", 0, bpf_vlog, sizeof(bpf_vlog), 1);
+ "GPL", 0, bpf_vlog, sizeof(bpf_vlog), 4);
if (fd_prog < 0 && !bpf_probe_prog_type(prog_type, 0)) {
printf("SKIP (unsupported program type %d)\n", prog_type);
skips++;
goto fail_log;
}
close_fds:
+ if (test->fill_insns)
+ free(test->fill_insns);
close(fd_prog);
for (i = 0; i < MAX_NR_MAPS; i++)
close(map_fds[i]);
int start = 0, end = sym_cnt;
int result;
+ /* kallsyms not loaded. return NULL */
+ if (sym_cnt <= 0)
+ return NULL;
+
while (start < end) {
size_t mid = start + (end - start) / 2;
#define BUF_SIZE 256
+static __attribute__((noinline))
+void urandom_read(int fd, int count)
+{
+ char buf[BUF_SIZE];
+ int i;
+
+ for (i = 0; i < count; ++i)
+ read(fd, buf, BUF_SIZE);
+}
+
int main(int argc, char *argv[])
{
int fd = open("/dev/urandom", O_RDONLY);
- int i;
- char buf[BUF_SIZE];
int count = 4;
if (fd < 0)
if (argc == 2)
count = atoi(argv[1]);
- for (i = 0; i < count; ++i)
- read(fd, buf, BUF_SIZE);
+ urandom_read(fd, count);
close(fd);
return 0;
.result = REJECT,
.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
+{
+ "valid read map access into a read-only array 1",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
+ BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_ro = { 3 },
+ .result = ACCEPT,
+ .retval = 28,
+},
+{
+ "valid read map access into a read-only array 2",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+ BPF_MOV64_IMM(BPF_REG_2, 4),
+ BPF_MOV64_IMM(BPF_REG_3, 0),
+ BPF_MOV64_IMM(BPF_REG_4, 0),
+ BPF_MOV64_IMM(BPF_REG_5, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_csum_diff),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .fixup_map_array_ro = { 3 },
+ .result = ACCEPT,
+ .retval = -29,
+},
+{
+ "invalid write map access into a read-only array 1",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
+ BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 42),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_ro = { 3 },
+ .result = REJECT,
+ .errstr = "write into map forbidden",
+},
+{
+ "invalid write map access into a read-only array 2",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+ BPF_MOV64_IMM(BPF_REG_2, 0),
+ BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
+ BPF_MOV64_IMM(BPF_REG_4, 8),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_skb_load_bytes),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .fixup_map_array_ro = { 4 },
+ .result = REJECT,
+ .errstr = "write into map forbidden",
+},
+{
+ "valid write map access into a write-only array 1",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
+ BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 42),
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_wo = { 3 },
+ .result = ACCEPT,
+ .retval = 1,
+},
+{
+ "valid write map access into a write-only array 2",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+ BPF_MOV64_IMM(BPF_REG_2, 0),
+ BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
+ BPF_MOV64_IMM(BPF_REG_4, 8),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_skb_load_bytes),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .fixup_map_array_wo = { 4 },
+ .result = ACCEPT,
+ .retval = 0,
+},
+{
+ "invalid read map access into a write-only array 1",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
+ BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_wo = { 3 },
+ .result = REJECT,
+ .errstr = "read from map forbidden",
+},
+{
+ "invalid read map access into a write-only array 2",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+ BPF_MOV64_IMM(BPF_REG_2, 4),
+ BPF_MOV64_IMM(BPF_REG_3, 0),
+ BPF_MOV64_IMM(BPF_REG_4, 0),
+ BPF_MOV64_IMM(BPF_REG_5, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_csum_diff),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .fixup_map_array_wo = { 3 },
+ .result = REJECT,
+ .errstr = "read from map forbidden",
+},
.errstr = "invalid bpf_context access",
.result = REJECT,
.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
- .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"check cb access: half, wrong type",
--- /dev/null
+{
+ "direct map access, write test 1",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0),
+ BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 4242),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = ACCEPT,
+ .retval = 1,
+},
+{
+ "direct map access, write test 2",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_MAP_VALUE(BPF_REG_1, 0, 8),
+ BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 4242),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = ACCEPT,
+ .retval = 1,
+},
+{
+ "direct map access, write test 3",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_MAP_VALUE(BPF_REG_1, 0, 8),
+ BPF_ST_MEM(BPF_DW, BPF_REG_1, 8, 4242),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = ACCEPT,
+ .retval = 1,
+},
+{
+ "direct map access, write test 4",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_MAP_VALUE(BPF_REG_1, 0, 40),
+ BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 4242),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = ACCEPT,
+ .retval = 1,
+},
+{
+ "direct map access, write test 5",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_MAP_VALUE(BPF_REG_1, 0, 32),
+ BPF_ST_MEM(BPF_DW, BPF_REG_1, 8, 4242),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = ACCEPT,
+ .retval = 1,
+},
+{
+ "direct map access, write test 6",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_MAP_VALUE(BPF_REG_1, 0, 40),
+ BPF_ST_MEM(BPF_DW, BPF_REG_1, 4, 4242),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = REJECT,
+ .errstr = "R1 min value is outside of the array range",
+},
+{
+ "direct map access, write test 7",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_MAP_VALUE(BPF_REG_1, 0, -1),
+ BPF_ST_MEM(BPF_DW, BPF_REG_1, 4, 4242),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = REJECT,
+ .errstr = "direct value offset of 4294967295 is not allowed",
+},
+{
+ "direct map access, write test 8",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_MAP_VALUE(BPF_REG_1, 0, 1),
+ BPF_ST_MEM(BPF_DW, BPF_REG_1, -1, 4242),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = ACCEPT,
+ .retval = 1,
+},
+{
+ "direct map access, write test 9",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_MAP_VALUE(BPF_REG_1, 0, 48),
+ BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 4242),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = REJECT,
+ .errstr = "invalid access to map value pointer",
+},
+{
+ "direct map access, write test 10",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_MAP_VALUE(BPF_REG_1, 0, 47),
+ BPF_ST_MEM(BPF_B, BPF_REG_1, 0, 4),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = ACCEPT,
+ .retval = 1,
+},
+{
+ "direct map access, write test 11",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_MAP_VALUE(BPF_REG_1, 0, 48),
+ BPF_ST_MEM(BPF_B, BPF_REG_1, 0, 4),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = REJECT,
+ .errstr = "invalid access to map value pointer",
+},
+{
+ "direct map access, write test 12",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_MAP_VALUE(BPF_REG_1, 0, (1<<29)),
+ BPF_ST_MEM(BPF_B, BPF_REG_1, 0, 4),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = REJECT,
+ .errstr = "direct value offset of 536870912 is not allowed",
+},
+{
+ "direct map access, write test 13",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_MAP_VALUE(BPF_REG_1, 0, (1<<29)-1),
+ BPF_ST_MEM(BPF_B, BPF_REG_1, 0, 4),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = REJECT,
+ .errstr = "invalid access to map value pointer, value_size=48 off=536870911",
+},
+{
+ "direct map access, write test 14",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_MAP_VALUE(BPF_REG_1, 0, 47),
+ BPF_LD_MAP_VALUE(BPF_REG_2, 0, 46),
+ BPF_ST_MEM(BPF_H, BPF_REG_2, 0, 0xffff),
+ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1, 3 },
+ .result = ACCEPT,
+ .retval = 0xff,
+},
+{
+ "direct map access, write test 15",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_MAP_VALUE(BPF_REG_1, 0, 46),
+ BPF_LD_MAP_VALUE(BPF_REG_2, 0, 46),
+ BPF_ST_MEM(BPF_H, BPF_REG_2, 0, 0xffff),
+ BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1, 3 },
+ .result = ACCEPT,
+ .retval = 0xffff,
+},
+{
+ "direct map access, write test 16",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_MAP_VALUE(BPF_REG_1, 0, 46),
+ BPF_LD_MAP_VALUE(BPF_REG_2, 0, 47),
+ BPF_ST_MEM(BPF_H, BPF_REG_2, 0, 0xffff),
+ BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1, 3 },
+ .result = REJECT,
+ .errstr = "invalid access to map value, value_size=48 off=47 size=2",
+},
+{
+ "direct map access, write test 17",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_MAP_VALUE(BPF_REG_1, 0, 46),
+ BPF_LD_MAP_VALUE(BPF_REG_2, 0, 46),
+ BPF_ST_MEM(BPF_H, BPF_REG_2, 1, 0xffff),
+ BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1, 3 },
+ .result = REJECT,
+ .errstr = "invalid access to map value, value_size=48 off=47 size=2",
+},
+{
+ "direct map access, write test 18",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0),
+ BPF_ST_MEM(BPF_H, BPF_REG_1, 0, 42),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_small = { 1 },
+ .result = REJECT,
+ .errstr = "R1 min value is outside of the array range",
+},
+{
+ "direct map access, write test 19",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0),
+ BPF_ST_MEM(BPF_B, BPF_REG_1, 0, 42),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_small = { 1 },
+ .result = ACCEPT,
+ .retval = 1,
+},
+{
+ "direct map access, write test 20",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_MAP_VALUE(BPF_REG_1, 0, 1),
+ BPF_ST_MEM(BPF_B, BPF_REG_1, 0, 42),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_small = { 1 },
+ .result = REJECT,
+ .errstr = "invalid access to map value pointer",
+},
+{
+ "direct map access, invalid insn test 1",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_IMM64_RAW_FULL(BPF_REG_1, BPF_PSEUDO_MAP_VALUE, 0, 1, 0, 47),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = REJECT,
+ .errstr = "invalid bpf_ld_imm64 insn",
+},
+{
+ "direct map access, invalid insn test 2",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_IMM64_RAW_FULL(BPF_REG_1, BPF_PSEUDO_MAP_VALUE, 1, 0, 0, 47),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = REJECT,
+ .errstr = "BPF_LD_IMM64 uses reserved fields",
+},
+{
+ "direct map access, invalid insn test 3",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_IMM64_RAW_FULL(BPF_REG_1, BPF_PSEUDO_MAP_VALUE, ~0, 0, 0, 47),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = REJECT,
+ .errstr = "BPF_LD_IMM64 uses reserved fields",
+},
+{
+ "direct map access, invalid insn test 4",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_IMM64_RAW_FULL(BPF_REG_1, BPF_PSEUDO_MAP_VALUE, 0, ~0, 0, 47),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = REJECT,
+ .errstr = "invalid bpf_ld_imm64 insn",
+},
+{
+ "direct map access, invalid insn test 5",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_IMM64_RAW_FULL(BPF_REG_1, BPF_PSEUDO_MAP_VALUE, ~0, ~0, 0, 47),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = REJECT,
+ .errstr = "invalid bpf_ld_imm64 insn",
+},
+{
+ "direct map access, invalid insn test 6",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_IMM64_RAW_FULL(BPF_REG_1, BPF_PSEUDO_MAP_FD, ~0, 0, 0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = REJECT,
+ .errstr = "BPF_LD_IMM64 uses reserved fields",
+},
+{
+ "direct map access, invalid insn test 7",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_IMM64_RAW_FULL(BPF_REG_1, BPF_PSEUDO_MAP_FD, 0, ~0, 0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = REJECT,
+ .errstr = "invalid bpf_ld_imm64 insn",
+},
+{
+ "direct map access, invalid insn test 8",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_IMM64_RAW_FULL(BPF_REG_1, BPF_PSEUDO_MAP_FD, ~0, ~0, 0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = REJECT,
+ .errstr = "invalid bpf_ld_imm64 insn",
+},
+{
+ "direct map access, invalid insn test 9",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_LD_IMM64_RAW_FULL(BPF_REG_1, BPF_PSEUDO_MAP_FD, 0, 0, 0, 47),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_array_48b = { 1 },
+ .result = REJECT,
+ .errstr = "unrecognized bpf_ld_imm64 insn",
+},
.result = ACCEPT,
.retval = 5,
},
+{
+ "ld_dw: xor semi-random 64 bit imms, test 5",
+ .insns = { },
+ .data = { },
+ .fill_helper = bpf_fill_rand_ld_dw,
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .result = ACCEPT,
+ .retval = 1000000 - 6,
+},
{
"reference tracking: leak potential reference",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), /* leak reference */
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .errstr = "Unreleased reference",
+ .result = REJECT,
+},
+{
+ "reference tracking: leak potential reference to sock_common",
+ .insns = {
+ BPF_SK_LOOKUP(skc_lookup_tcp),
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), /* leak reference */
BPF_EXIT_INSN(),
},
{
"reference tracking: leak potential reference on stack",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_4, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8),
BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_0, 0),
{
"reference tracking: leak potential reference on stack 2",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_4, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8),
BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_0, 0),
{
"reference tracking: zero potential reference",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
+ BPF_MOV64_IMM(BPF_REG_0, 0), /* leak reference */
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .errstr = "Unreleased reference",
+ .result = REJECT,
+},
+{
+ "reference tracking: zero potential reference to sock_common",
+ .insns = {
+ BPF_SK_LOOKUP(skc_lookup_tcp),
BPF_MOV64_IMM(BPF_REG_0, 0), /* leak reference */
BPF_EXIT_INSN(),
},
{
"reference tracking: copy and zero potential references",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_MOV64_IMM(BPF_REG_7, 0), /* leak reference */
{
"reference tracking: release reference without check",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
/* reference in r0 may be NULL */
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_MOV64_IMM(BPF_REG_2, 0),
.errstr = "type=sock_or_null expected=sock",
.result = REJECT,
},
+{
+ "reference tracking: release reference to sock_common without check",
+ .insns = {
+ BPF_SK_LOOKUP(skc_lookup_tcp),
+ /* reference in r0 may be NULL */
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+ BPF_MOV64_IMM(BPF_REG_2, 0),
+ BPF_EMIT_CALL(BPF_FUNC_sk_release),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .errstr = "type=sock_common_or_null expected=sock",
+ .result = REJECT,
+},
{
"reference tracking: release reference",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
+ BPF_EMIT_CALL(BPF_FUNC_sk_release),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .result = ACCEPT,
+},
+{
+ "reference tracking: release reference to sock_common",
+ .insns = {
+ BPF_SK_LOOKUP(skc_lookup_tcp),
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
BPF_EMIT_CALL(BPF_FUNC_sk_release),
{
"reference tracking: release reference 2",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
BPF_EXIT_INSN(),
{
"reference tracking: release reference twice",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
{
"reference tracking: release reference twice inside branch",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3), /* goto end */
BPF_EXIT_INSN(),
BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_2,
offsetof(struct __sk_buff, mark)),
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_6, 0, 1), /* mark == 0? */
/* Leak reference in R0 */
BPF_EXIT_INSN(),
BPF_EXIT_INSN(),
BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_2,
offsetof(struct __sk_buff, mark)),
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_6, 0, 4), /* mark == 0? */
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), /* sk NULL? */
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
{
"reference tracking in call: free reference in subprog",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), /* unchecked reference */
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
BPF_MOV64_IMM(BPF_REG_0, 0),
{
"reference tracking in call: free reference in subprog and outside",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), /* unchecked reference */
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 3),
/* subprog 1 */
BPF_MOV64_REG(BPF_REG_6, BPF_REG_4),
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
/* spill unchecked sk_ptr into stack of caller */
BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_0, 0),
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_EXIT_INSN(),
/* subprog 1 */
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_EXIT_INSN(), /* return sk */
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
BPF_EXIT_INSN(),
/* subprog 2 */
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
BPF_EXIT_INSN(),
/* subprog 2 */
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
"reference tracking: allow LD_ABS",
.insns = {
BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
BPF_EMIT_CALL(BPF_FUNC_sk_release),
"reference tracking: forbid LD_ABS while holding reference",
.insns = {
BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_LD_ABS(BPF_B, 0),
BPF_LD_ABS(BPF_H, 0),
BPF_LD_ABS(BPF_W, 0),
"reference tracking: allow LD_IND",
.insns = {
BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
BPF_EMIT_CALL(BPF_FUNC_sk_release),
"reference tracking: forbid LD_IND while holding reference",
.insns = {
BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_4, BPF_REG_0),
BPF_MOV64_IMM(BPF_REG_7, 1),
BPF_LD_IND(BPF_W, BPF_REG_7, -0x200000),
"reference tracking: check reference or tail call",
.insns = {
BPF_MOV64_REG(BPF_REG_7, BPF_REG_1),
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
/* if (sk) bpf_sk_release() */
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 7),
"reference tracking: release reference then tail call",
.insns = {
BPF_MOV64_REG(BPF_REG_7, BPF_REG_1),
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
/* if (sk) bpf_sk_release() */
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1),
.insns = {
BPF_MOV64_REG(BPF_REG_7, BPF_REG_1),
/* Look up socket and store in REG_6 */
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
/* bpf_tail_call() */
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
BPF_MOV64_IMM(BPF_REG_3, 2),
.insns = {
BPF_MOV64_REG(BPF_REG_7, BPF_REG_1),
/* Look up socket and store in REG_6 */
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
/* if (!sk) goto end */
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
{
"reference tracking: mangle and release sock_or_null",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 5),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
{
"reference tracking: mangle and release sock",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 5),
{
"reference tracking: access member",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_0, 4),
{
"reference tracking: write to member",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
{
"reference tracking: invalid 64-bit access of member",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_0, 0),
{
"reference tracking: access after release",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
BPF_EMIT_CALL(BPF_FUNC_sk_release),
{
"reference tracking: use ptr from bpf_tcp_sock() after release",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
BPF_EXIT_INSN(),
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
{
"reference tracking: use ptr from bpf_sk_fullsock() after release",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
BPF_EXIT_INSN(),
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
{
"reference tracking: use ptr from bpf_sk_fullsock(tp) after release",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
BPF_EXIT_INSN(),
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
{
"reference tracking: use sk after bpf_sk_release(tp)",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
BPF_EXIT_INSN(),
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
{
"reference tracking: use ptr from bpf_get_listener_sock() after bpf_sk_release(sk)",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
BPF_EXIT_INSN(),
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
{
"reference tracking: bpf_sk_release(listen_sk)",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
BPF_EXIT_INSN(),
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
/* !bpf_sk_fullsock(sk) is checked but !bpf_tcp_sock(sk) is not checked */
"reference tracking: tp->snd_cwnd after bpf_sk_fullsock(sk) and bpf_tcp_sock(sk)",
.insns = {
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
BPF_EXIT_INSN(),
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
.insns = {
BPF_MOV64_REG(BPF_REG_8, BPF_REG_1),
/* struct bpf_sock *sock = bpf_sock_lookup(...); */
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
/* u64 foo; */
/* void *target = &foo; */
.insns = {
BPF_MOV64_REG(BPF_REG_8, BPF_REG_1),
/* struct bpf_sock *sock = bpf_sock_lookup(...); */
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
/* u64 foo; */
/* void *target = &foo; */
.insns = {
BPF_MOV64_REG(BPF_REG_8, BPF_REG_1),
/* struct bpf_sock *sock = bpf_sock_lookup(...); */
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
/* u64 foo; */
/* void *target = &foo; */
.insns = {
BPF_MOV64_REG(BPF_REG_8, BPF_REG_1),
/* struct bpf_sock *sock = bpf_sock_lookup(...); */
- BPF_SK_LOOKUP,
+ BPF_SK_LOOKUP(sk_lookup_tcp),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
/* u64 foo; */
/* void *target = &foo; */
.prog_type = BPF_PROG_TYPE_LWT_IN,
},
{
- "indirect variable-offset stack access",
+ "indirect variable-offset stack access, unbounded",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_2, 6),
+ BPF_MOV64_IMM(BPF_REG_3, 28),
+ /* Fill the top 16 bytes of the stack. */
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, 0),
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ /* Get an unknown value. */
+ BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_1, offsetof(struct bpf_sock_ops,
+ bytes_received)),
+ /* Check the lower bound but don't check the upper one. */
+ BPF_JMP_IMM(BPF_JSLT, BPF_REG_4, 0, 4),
+ /* Point the lower bound to initialized stack. Offset is now in range
+ * from fp-16 to fp+0x7fffffffffffffef, i.e. max value is unbounded.
+ */
+ BPF_ALU64_IMM(BPF_SUB, BPF_REG_4, 16),
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_4, BPF_REG_10),
+ BPF_MOV64_IMM(BPF_REG_5, 8),
+ /* Dereference it indirectly. */
+ BPF_EMIT_CALL(BPF_FUNC_getsockopt),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .errstr = "R4 unbounded indirect variable offset stack access",
+ .result = REJECT,
+ .prog_type = BPF_PROG_TYPE_SOCK_OPS,
+},
+{
+ "indirect variable-offset stack access, max out of bound",
.insns = {
/* Fill the top 8 bytes of the stack */
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_EXIT_INSN(),
},
.fixup_map_hash_8b = { 5 },
- .errstr = "variable stack read R2",
+ .errstr = "R2 max value is outside of stack bound",
+ .result = REJECT,
+ .prog_type = BPF_PROG_TYPE_LWT_IN,
+},
+{
+ "indirect variable-offset stack access, min out of bound",
+ .insns = {
+ /* Fill the top 8 bytes of the stack */
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ /* Get an unknown value */
+ BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
+ /* Make it small and 4-byte aligned */
+ BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 4),
+ BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 516),
+ /* add it to fp. We now have either fp-516 or fp-512, but
+ * we don't know which
+ */
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10),
+ /* dereference it indirectly */
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_hash_8b = { 5 },
+ .errstr = "R2 min value is outside of stack bound",
+ .result = REJECT,
+ .prog_type = BPF_PROG_TYPE_LWT_IN,
+},
+{
+ "indirect variable-offset stack access, max_off+size > max_initialized",
+ .insns = {
+ /* Fill only the second from top 8 bytes of the stack. */
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, 0),
+ /* Get an unknown value. */
+ BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
+ /* Make it small and 4-byte aligned. */
+ BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 4),
+ BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 16),
+ /* Add it to fp. We now have either fp-12 or fp-16, but we don't know
+ * which. fp-12 size 8 is partially uninitialized stack.
+ */
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10),
+ /* Dereference it indirectly. */
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_hash_8b = { 5 },
+ .errstr = "invalid indirect read from stack var_off",
+ .result = REJECT,
+ .prog_type = BPF_PROG_TYPE_LWT_IN,
+},
+{
+ "indirect variable-offset stack access, min_off < min_initialized",
+ .insns = {
+ /* Fill only the top 8 bytes of the stack. */
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ /* Get an unknown value */
+ BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
+ /* Make it small and 4-byte aligned. */
+ BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 4),
+ BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 16),
+ /* Add it to fp. We now have either fp-12 or fp-16, but we don't know
+ * which. fp-16 size 8 is partially uninitialized stack.
+ */
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10),
+ /* Dereference it indirectly. */
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_hash_8b = { 5 },
+ .errstr = "invalid indirect read from stack var_off",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_LWT_IN,
},
+{
+ "indirect variable-offset stack access, priv vs unpriv",
+ .insns = {
+ /* Fill the top 16 bytes of the stack. */
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, 0),
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ /* Get an unknown value. */
+ BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
+ /* Make it small and 4-byte aligned. */
+ BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 4),
+ BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 16),
+ /* Add it to fp. We now have either fp-12 or fp-16, we don't know
+ * which, but either way it points to initialized stack.
+ */
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10),
+ /* Dereference it indirectly. */
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_hash_8b = { 6 },
+ .errstr_unpriv = "R2 stack pointer arithmetic goes out of range, prohibited for !root",
+ .result_unpriv = REJECT,
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+},
+{
+ "indirect variable-offset stack access, uninitialized",
+ .insns = {
+ BPF_MOV64_IMM(BPF_REG_2, 6),
+ BPF_MOV64_IMM(BPF_REG_3, 28),
+ /* Fill the top 16 bytes of the stack. */
+ BPF_ST_MEM(BPF_W, BPF_REG_10, -16, 0),
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ /* Get an unknown value. */
+ BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1, 0),
+ /* Make it small and 4-byte aligned. */
+ BPF_ALU64_IMM(BPF_AND, BPF_REG_4, 4),
+ BPF_ALU64_IMM(BPF_SUB, BPF_REG_4, 16),
+ /* Add it to fp. We now have either fp-12 or fp-16, we don't know
+ * which, but either way it points to initialized stack.
+ */
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_4, BPF_REG_10),
+ BPF_MOV64_IMM(BPF_REG_5, 8),
+ /* Dereference it indirectly. */
+ BPF_EMIT_CALL(BPF_FUNC_getsockopt),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .errstr = "invalid indirect read from stack var_off",
+ .result = REJECT,
+ .prog_type = BPF_PROG_TYPE_SOCK_OPS,
+},
+{
+ "indirect variable-offset stack access, ok",
+ .insns = {
+ /* Fill the top 16 bytes of the stack. */
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, 0),
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ /* Get an unknown value. */
+ BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
+ /* Make it small and 4-byte aligned. */
+ BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 4),
+ BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 16),
+ /* Add it to fp. We now have either fp-12 or fp-16, we don't know
+ * which, but either way it points to initialized stack.
+ */
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10),
+ /* Dereference it indirectly. */
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map_hash_8b = { 6 },
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_LWT_IN,
+},
--- /dev/null
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# A test for strict prioritization of traffic in the switch. Run two streams of
+# traffic, each through a different ingress port, one tagged with PCP of 1, the
+# other with PCP of 2. Both streams converge at one egress port, where they are
+# assigned TC of, respectively, 1 and 2, with strict priority configured between
+# them. In H3, we expect to see (almost) exclusively the high-priority traffic.
+#
+# Please see qos_mc_aware.sh for an explanation of why we use mausezahn and
+# counters instead of just running iperf3.
+#
+# +---------------------------+ +-----------------------------+
+# | H1 | | H2 |
+# | $h1.111 + | | + $h2.222 |
+# | 192.0.2.33/28 | | | | 192.0.2.65/28 |
+# | e-qos-map 0:1 | | | | e-qos-map 0:2 |
+# | | | | | |
+# | $h1 + | | + $h2 |
+# +-----------------|---------+ +---------|-------------------+
+# | |
+# +-----------------|-------------------------------------|-------------------+
+# | $swp1 + + $swp2 |
+# | >1Gbps | | >1Gbps |
+# | +---------------|-----------+ +----------|----------------+ |
+# | | $swp1.111 + | | + $swp2.222 | |
+# | | BR111 | SW | BR222 | |
+# | | $swp3.111 + | | + $swp3.222 | |
+# | +---------------|-----------+ +----------|----------------+ |
+# | \_____________________________________/ |
+# | | |
+# | + $swp3 |
+# | | 1Gbps bottleneck |
+# | | ETS: (up n->tc n for n in 0..7) |
+# | | strict priority |
+# +------------------------------------|--------------------------------------+
+# |
+# +--------------------|--------------------+
+# | + $h3 H3 |
+# | / \ |
+# | / \ |
+# | $h3.111 + + $h3.222 |
+# | 192.0.2.34/28 192.0.2.66/28 |
+# +-----------------------------------------+
+
+ALL_TESTS="
+ ping_ipv4
+ test_ets_strict
+"
+
+lib_dir=$(dirname $0)/../../../net/forwarding
+
+NUM_NETIFS=6
+source $lib_dir/lib.sh
+source $lib_dir/devlink_lib.sh
+source qos_lib.sh
+
+h1_create()
+{
+ simple_if_init $h1
+ mtu_set $h1 10000
+
+ vlan_create $h1 111 v$h1 192.0.2.33/28
+ ip link set dev $h1.111 type vlan egress-qos-map 0:1
+}
+
+h1_destroy()
+{
+ vlan_destroy $h1 111
+
+ mtu_restore $h1
+ simple_if_fini $h1
+}
+
+h2_create()
+{
+ simple_if_init $h2
+ mtu_set $h2 10000
+
+ vlan_create $h2 222 v$h2 192.0.2.65/28
+ ip link set dev $h2.222 type vlan egress-qos-map 0:2
+}
+
+h2_destroy()
+{
+ vlan_destroy $h2 222
+
+ mtu_restore $h2
+ simple_if_fini $h2
+}
+
+h3_create()
+{
+ simple_if_init $h3
+ mtu_set $h3 10000
+
+ vlan_create $h3 111 v$h3 192.0.2.34/28
+ vlan_create $h3 222 v$h3 192.0.2.66/28
+}
+
+h3_destroy()
+{
+ vlan_destroy $h3 222
+ vlan_destroy $h3 111
+
+ mtu_restore $h3
+ simple_if_fini $h3
+}
+
+switch_create()
+{
+ ip link set dev $swp1 up
+ mtu_set $swp1 10000
+
+ ip link set dev $swp2 up
+ mtu_set $swp2 10000
+
+ # prio n -> TC n, strict scheduling
+ lldptool -T -i $swp3 -V ETS-CFG up2tc=0:0,1:1,2:2,3:3,4:4,5:5,6:6,7:7
+ lldptool -T -i $swp3 -V ETS-CFG tsa=$(
+ )"0:strict,"$(
+ )"1:strict,"$(
+ )"2:strict,"$(
+ )"3:strict,"$(
+ )"4:strict,"$(
+ )"5:strict,"$(
+ )"6:strict,"$(
+ )"7:strict"
+ sleep 1
+
+ ip link set dev $swp3 up
+ mtu_set $swp3 10000
+ ethtool -s $swp3 speed 1000 autoneg off
+
+ vlan_create $swp1 111
+ vlan_create $swp2 222
+ vlan_create $swp3 111
+ vlan_create $swp3 222
+
+ ip link add name br111 up type bridge vlan_filtering 0
+ ip link set dev $swp1.111 master br111
+ ip link set dev $swp3.111 master br111
+
+ ip link add name br222 up type bridge vlan_filtering 0
+ ip link set dev $swp2.222 master br222
+ ip link set dev $swp3.222 master br222
+
+ # Make sure that ingress quotas are smaller than egress so that there is
+ # room for both streams of traffic to be admitted to shared buffer.
+ devlink_pool_size_thtype_set 0 dynamic 10000000
+ devlink_pool_size_thtype_set 4 dynamic 10000000
+
+ devlink_port_pool_th_set $swp1 0 6
+ devlink_tc_bind_pool_th_set $swp1 1 ingress 0 6
+
+ devlink_port_pool_th_set $swp2 0 6
+ devlink_tc_bind_pool_th_set $swp2 2 ingress 0 6
+
+ devlink_tc_bind_pool_th_set $swp3 1 egress 4 7
+ devlink_tc_bind_pool_th_set $swp3 2 egress 4 7
+ devlink_port_pool_th_set $swp3 4 7
+}
+
+switch_destroy()
+{
+ devlink_port_pool_th_restore $swp3 4
+ devlink_tc_bind_pool_th_restore $swp3 2 egress
+ devlink_tc_bind_pool_th_restore $swp3 1 egress
+
+ devlink_tc_bind_pool_th_restore $swp2 2 ingress
+ devlink_port_pool_th_restore $swp2 0
+
+ devlink_tc_bind_pool_th_restore $swp1 1 ingress
+ devlink_port_pool_th_restore $swp1 0
+
+ devlink_pool_size_thtype_restore 4
+ devlink_pool_size_thtype_restore 0
+
+ ip link del dev br222
+ ip link del dev br111
+
+ vlan_destroy $swp3 222
+ vlan_destroy $swp3 111
+ vlan_destroy $swp2 222
+ vlan_destroy $swp1 111
+
+ ethtool -s $swp3 autoneg on
+ mtu_restore $swp3
+ ip link set dev $swp3 down
+ lldptool -T -i $swp3 -V ETS-CFG up2tc=0:0,1:0,2:0,3:0,4:0,5:0,6:0,7:0
+
+ mtu_restore $swp2
+ ip link set dev $swp2 down
+
+ mtu_restore $swp1
+ ip link set dev $swp1 down
+}
+
+setup_prepare()
+{
+ h1=${NETIFS[p1]}
+ swp1=${NETIFS[p2]}
+
+ swp2=${NETIFS[p3]}
+ h2=${NETIFS[p4]}
+
+ swp3=${NETIFS[p5]}
+ h3=${NETIFS[p6]}
+
+ h3mac=$(mac_get $h3)
+
+ vrf_prepare
+
+ h1_create
+ h2_create
+ h3_create
+ switch_create
+}
+
+cleanup()
+{
+ pre_cleanup
+
+ switch_destroy
+ h3_destroy
+ h2_destroy
+ h1_destroy
+
+ vrf_cleanup
+}
+
+ping_ipv4()
+{
+ ping_test $h1 192.0.2.34 " from H1"
+ ping_test $h2 192.0.2.66 " from H2"
+}
+
+rel()
+{
+ local old=$1; shift
+ local new=$1; shift
+
+ bc <<< "
+ scale=2
+ ret = 100 * $new / $old
+ if (ret > 0) { ret } else { 0 }
+ "
+}
+
+test_ets_strict()
+{
+ RET=0
+
+ # Run high-prio traffic on its own.
+ start_traffic $h2.222 192.0.2.65 192.0.2.66 $h3mac
+ local -a rate_2
+ rate_2=($(measure_rate $swp2 $h3 rx_octets_prio_2 "prio 2"))
+ check_err $? "Could not get high enough prio-2 ingress rate"
+ local rate_2_in=${rate_2[0]}
+ local rate_2_eg=${rate_2[1]}
+ stop_traffic # $h2.222
+
+ # Start low-prio stream.
+ start_traffic $h1.111 192.0.2.33 192.0.2.34 $h3mac
+
+ local -a rate_1
+ rate_1=($(measure_rate $swp1 $h3 rx_octets_prio_1 "prio 1"))
+ check_err $? "Could not get high enough prio-1 ingress rate"
+ local rate_1_in=${rate_1[0]}
+ local rate_1_eg=${rate_1[1]}
+
+ # High-prio and low-prio on their own should have about the same
+ # throughput.
+ local rel21=$(rel $rate_1_eg $rate_2_eg)
+ check_err $(bc <<< "$rel21 < 95")
+ check_err $(bc <<< "$rel21 > 105")
+
+ # Start the high-prio stream--now both streams run.
+ start_traffic $h2.222 192.0.2.65 192.0.2.66 $h3mac
+ rate_3=($(measure_rate $swp2 $h3 rx_octets_prio_2 "prio 2 w/ 1"))
+ check_err $? "Could not get high enough prio-2 ingress rate with prio-1"
+ local rate_3_in=${rate_3[0]}
+ local rate_3_eg=${rate_3[1]}
+ stop_traffic # $h2.222
+
+ stop_traffic # $h1.111
+
+ # High-prio should have about the same throughput whether or not
+ # low-prio is in the system.
+ local rel32=$(rel $rate_2_eg $rate_3_eg)
+ check_err $(bc <<< "$rel32 < 95")
+
+ log_test "strict priority"
+ echo "Ingress to switch:"
+ echo " p1 in rate $(humanize $rate_1_in)"
+ echo " p2 in rate $(humanize $rate_2_in)"
+ echo " p2 in rate w/ p1 $(humanize $rate_3_in)"
+ echo "Egress from switch:"
+ echo " p1 eg rate $(humanize $rate_1_eg)"
+ echo " p2 eg rate $(humanize $rate_2_eg) ($rel21% of p1)"
+ echo " p2 eg rate w/ p1 $(humanize $rate_3_eg) ($rel32% of p2)"
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tests_run
+
+exit $EXIT_STATUS
--- /dev/null
+# SPDX-License-Identifier: GPL-2.0
+
+humanize()
+{
+ local speed=$1; shift
+
+ for unit in bps Kbps Mbps Gbps; do
+ if (($(echo "$speed < 1024" | bc))); then
+ break
+ fi
+
+ speed=$(echo "scale=1; $speed / 1024" | bc)
+ done
+
+ echo "$speed${unit}"
+}
+
+rate()
+{
+ local t0=$1; shift
+ local t1=$1; shift
+ local interval=$1; shift
+
+ echo $((8 * (t1 - t0) / interval))
+}
+
+start_traffic()
+{
+ local h_in=$1; shift # Where the traffic egresses the host
+ local sip=$1; shift
+ local dip=$1; shift
+ local dmac=$1; shift
+
+ $MZ $h_in -p 8000 -A $sip -B $dip -c 0 \
+ -a own -b $dmac -t udp -q &
+ sleep 1
+}
+
+stop_traffic()
+{
+ # Suppress noise from killing mausezahn.
+ { kill %% && wait %%; } 2>/dev/null
+}
+
+check_rate()
+{
+ local rate=$1; shift
+ local min=$1; shift
+ local what=$1; shift
+
+ if ((rate > min)); then
+ return 0
+ fi
+
+ echo "$what $(humanize $ir) < $(humanize $min)" > /dev/stderr
+ return 1
+}
+
+measure_rate()
+{
+ local sw_in=$1; shift # Where the traffic ingresses the switch
+ local host_in=$1; shift # Where it ingresses another host
+ local counter=$1; shift # Counter to use for measurement
+ local what=$1; shift
+
+ local interval=10
+ local i
+ local ret=0
+
+ # Dips in performance might cause momentary ingress rate to drop below
+ # 1Gbps. That wouldn't saturate egress and MC would thus get through,
+ # seemingly winning bandwidth on account of UC. Demand at least 2Gbps
+ # average ingress rate to somewhat mitigate this.
+ local min_ingress=2147483648
+
+ for i in {5..0}; do
+ local t0=$(ethtool_stats_get $host_in $counter)
+ local u0=$(ethtool_stats_get $sw_in $counter)
+ sleep $interval
+ local t1=$(ethtool_stats_get $host_in $counter)
+ local u1=$(ethtool_stats_get $sw_in $counter)
+
+ local ir=$(rate $u0 $u1 $interval)
+ local er=$(rate $t0 $t1 $interval)
+
+ if check_rate $ir $min_ingress "$what ingress rate"; then
+ break
+ fi
+
+ # Fail the test if we can't get the throughput.
+ if ((i == 0)); then
+ ret=1
+ fi
+ done
+
+ echo $ir $er
+ return $ret
+}
NUM_NETIFS=6
source $lib_dir/lib.sh
+source $lib_dir/devlink_lib.sh
+source qos_lib.sh
h1_create()
{
ip link set dev br111 up
ip link set dev $swp2.111 master br111
ip link set dev $swp3.111 master br111
+
+ # Make sure that ingress quotas are smaller than egress so that there is
+ # room for both streams of traffic to be admitted to shared buffer.
+ devlink_port_pool_th_set $swp1 0 5
+ devlink_tc_bind_pool_th_set $swp1 0 ingress 0 5
+
+ devlink_port_pool_th_set $swp2 0 5
+ devlink_tc_bind_pool_th_set $swp2 1 ingress 0 5
+
+ devlink_port_pool_th_set $swp3 4 12
}
switch_destroy()
{
+ devlink_port_pool_th_restore $swp3 4
+
+ devlink_tc_bind_pool_th_restore $swp2 1 ingress
+ devlink_port_pool_th_restore $swp2 0
+
+ devlink_tc_bind_pool_th_restore $swp1 0 ingress
+ devlink_port_pool_th_restore $swp1 0
+
ip link del dev br111
ip link del dev br1
ping_test $h2 192.0.2.130
}
-humanize()
-{
- local speed=$1; shift
-
- for unit in bps Kbps Mbps Gbps; do
- if (($(echo "$speed < 1024" | bc))); then
- break
- fi
-
- speed=$(echo "scale=1; $speed / 1024" | bc)
- done
-
- echo "$speed${unit}"
-}
-
-rate()
-{
- local t0=$1; shift
- local t1=$1; shift
- local interval=$1; shift
-
- echo $((8 * (t1 - t0) / interval))
-}
-
-check_rate()
-{
- local rate=$1; shift
- local min=$1; shift
- local what=$1; shift
-
- if ((rate > min)); then
- return 0
- fi
-
- echo "$what $(humanize $ir) < $(humanize $min_ingress)" > /dev/stderr
- return 1
-}
-
-measure_uc_rate()
-{
- local what=$1; shift
-
- local interval=10
- local i
- local ret=0
-
- # Dips in performance might cause momentary ingress rate to drop below
- # 1Gbps. That wouldn't saturate egress and MC would thus get through,
- # seemingly winning bandwidth on account of UC. Demand at least 2Gbps
- # average ingress rate to somewhat mitigate this.
- local min_ingress=2147483648
-
- $MZ $h2.111 -p 8000 -A 192.0.2.129 -B 192.0.2.130 -c 0 \
- -a own -b $h3mac -t udp -q &
- sleep 1
-
- for i in {5..0}; do
- local t0=$(ethtool_stats_get $h3 rx_octets_prio_1)
- local u0=$(ethtool_stats_get $swp2 rx_octets_prio_1)
- sleep $interval
- local t1=$(ethtool_stats_get $h3 rx_octets_prio_1)
- local u1=$(ethtool_stats_get $swp2 rx_octets_prio_1)
-
- local ir=$(rate $u0 $u1 $interval)
- local er=$(rate $t0 $t1 $interval)
-
- if check_rate $ir $min_ingress "$what ingress rate"; then
- break
- fi
-
- # Fail the test if we can't get the throughput.
- if ((i == 0)); then
- ret=1
- fi
- done
-
- # Suppress noise from killing mausezahn.
- { kill %% && wait; } 2>/dev/null
-
- echo $ir $er
- exit $ret
-}
-
test_mc_aware()
{
RET=0
local -a uc_rate
- uc_rate=($(measure_uc_rate "UC-only"))
+ start_traffic $h2.111 192.0.2.129 192.0.2.130 $h3mac
+ uc_rate=($(measure_rate $swp2 $h3 rx_octets_prio_1 "UC-only"))
check_err $? "Could not get high enough UC-only ingress rate"
+ stop_traffic
local ucth1=${uc_rate[1]}
- $MZ $h1 -p 8000 -c 0 -a own -b bc -t udp -q &
+ start_traffic $h1 own bc bc
local d0=$(date +%s)
local t0=$(ethtool_stats_get $h3 rx_octets_prio_0)
local u0=$(ethtool_stats_get $swp1 rx_octets_prio_0)
local -a uc_rate_2
- uc_rate_2=($(measure_uc_rate "UC+MC"))
+ start_traffic $h2.111 192.0.2.129 192.0.2.130 $h3mac
+ uc_rate_2=($(measure_rate $swp2 $h3 rx_octets_prio_1 "UC+MC"))
check_err $? "Could not get high enough UC+MC ingress rate"
+ stop_traffic
local ucth2=${uc_rate_2[1]}
local d1=$(date +%s)
local mc_ir=$(rate $u0 $u1 $interval)
local mc_er=$(rate $t0 $t1 $interval)
- # Suppress noise from killing mausezahn.
- { kill %% && wait; } 2>/dev/null
+ stop_traffic
log_test "UC performace under MC overload"
{
RET=0
- $MZ $h2.111 -p 8000 -A 192.0.2.129 -B 192.0.2.130 -c 0 \
- -a own -b $h3mac -t udp -q &
+ start_traffic $h2.111 192.0.2.129 192.0.2.130 $h3mac
local d0=$(date +%s)
local t0=$(ethtool_stats_get $h3 rx_octets_prio_1)
((attempts == passes))
check_err $?
- # Suppress noise from killing mausezahn.
- { kill %% && wait; } 2>/dev/null
+ stop_traffic
log_test "MC performace under UC overload"
echo " ingress UC throughput $(humanize ${uc_ir})"
lag_dev_deletion_test
vlan_interface_uppers_test
bridge_extern_learn_test
+ neigh_offload_test
devlink_reload_test
"
NUM_NETIFS=2
ip link del dev br0
}
+neigh_offload_test()
+{
+ # Test that IPv4 and IPv6 neighbour entries are marked as offloaded
+ RET=0
+
+ ip -4 address add 192.0.2.1/24 dev $swp1
+ ip -6 address add 2001:db8:1::1/64 dev $swp1
+
+ ip -4 neigh add 192.0.2.2 lladdr de:ad:be:ef:13:37 nud perm dev $swp1
+ ip -6 neigh add 2001:db8:1::2 lladdr de:ad:be:ef:13:37 nud perm \
+ dev $swp1
+
+ ip -4 neigh show dev $swp1 | grep 192.0.2.2 | grep -q offload
+ check_err $? "ipv4 neigh entry not marked as offloaded when should"
+ ip -6 neigh show dev $swp1 | grep 2001:db8:1::2 | grep -q offload
+ check_err $? "ipv6 neigh entry not marked as offloaded when should"
+
+ log_test "neighbour offload indication"
+
+ ip -6 neigh del 2001:db8:1::2 dev $swp1
+ ip -4 neigh del 192.0.2.2 dev $swp1
+ ip -6 address del 2001:db8:1::1/64 dev $swp1
+ ip -4 address del 192.0.2.1/24 dev $swp1
+}
+
devlink_reload_test()
{
# Test that after executing all the above configuration tests, a
delta_two_masks_one_key_test delta_simple_rehash_test \
bloom_simple_test bloom_complex_test bloom_delta_test"
NUM_NETIFS=2
+source $lib_dir/lib.sh
source $lib_dir/tc_common.sh
source $lib_dir/devlink_lib.sh
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+lib_dir=$(dirname $0)/../../../../net/forwarding
+
NUM_NETIFS=1
+source $lib_dir/lib.sh
source devlink_lib_spectrum.sh
setup_prepare()
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+lib_dir=$(dirname $0)/../../../../net/forwarding
+
NUM_NETIFS=6
-source ../../../../net/forwarding/tc_common.sh
+source $lib_dir/lib.sh
+source $lib_dir/tc_common.sh
source devlink_lib_spectrum.sh
current_test=""
ksft_skip=4
# all tests in this script. Can be overridden with -t option
-TESTS="unregister down carrier nexthop ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics ipv4_route_metrics"
+TESTS="unregister down carrier nexthop ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics ipv4_route_metrics ipv4_route_v6_gw"
+
VERBOSE=0
PAUSE_ON_FAIL=no
PAUSE=no
{
set -e
ip netns add ns1
+ ip netns set ns1 auto
$IP link set dev lo up
ip netns exec ns1 sysctl -qw net.ipv4.ip_forward=1
ip netns exec ns1 sysctl -qw net.ipv6.conf.all.forwarding=1
set -e
ip netns add ns2
+ ip netns set ns2 auto
ip -netns ns2 link set dev lo up
ip netns exec ns2 sysctl -qw net.ipv4.ip_forward=1
ip netns exec ns2 sysctl -qw net.ipv6.conf.all.forwarding=1
route_cleanup
}
+ipv4_route_v6_gw_test()
+{
+ local rc
+
+ echo
+ echo "IPv4 route with IPv6 gateway tests"
+
+ route_setup
+ sleep 2
+
+ #
+ # single path route
+ #
+ run_cmd "$IP ro add 172.16.104.0/24 via inet6 2001:db8:101::2"
+ rc=$?
+ log_test $rc 0 "Single path route with IPv6 gateway"
+ if [ $rc -eq 0 ]; then
+ check_route "172.16.104.0/24 via inet6 2001:db8:101::2 dev veth1"
+ fi
+
+ run_cmd "ip netns exec ns1 ping -w1 -c1 172.16.104.1"
+ log_test $rc 0 "Single path route with IPv6 gateway - ping"
+
+ run_cmd "$IP ro del 172.16.104.0/24 via inet6 2001:db8:101::2"
+ rc=$?
+ log_test $rc 0 "Single path route delete"
+ if [ $rc -eq 0 ]; then
+ check_route "172.16.112.0/24"
+ fi
+
+ #
+ # multipath - v6 then v4
+ #
+ run_cmd "$IP ro add 172.16.104.0/24 nexthop via inet6 2001:db8:101::2 dev veth1 nexthop via 172.16.103.2 dev veth3"
+ rc=$?
+ log_test $rc 0 "Multipath route add - v6 nexthop then v4"
+ if [ $rc -eq 0 ]; then
+ check_route "172.16.104.0/24 nexthop via inet6 2001:db8:101::2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1"
+ fi
+
+ run_cmd "$IP ro del 172.16.104.0/24 nexthop via 172.16.103.2 dev veth3 nexthop via inet6 2001:db8:101::2 dev veth1"
+ log_test $? 2 " Multipath route delete - nexthops in wrong order"
+
+ run_cmd "$IP ro del 172.16.104.0/24 nexthop via inet6 2001:db8:101::2 dev veth1 nexthop via 172.16.103.2 dev veth3"
+ log_test $? 0 " Multipath route delete exact match"
+
+ #
+ # multipath - v4 then v6
+ #
+ run_cmd "$IP ro add 172.16.104.0/24 nexthop via 172.16.103.2 dev veth3 nexthop via inet6 2001:db8:101::2 dev veth1"
+ rc=$?
+ log_test $rc 0 "Multipath route add - v4 nexthop then v6"
+ if [ $rc -eq 0 ]; then
+ check_route "172.16.104.0/24 nexthop via 172.16.103.2 dev veth3 weight 1 nexthop via inet6 2001:db8:101::2 dev veth1 weight 1"
+ fi
+
+ run_cmd "$IP ro del 172.16.104.0/24 nexthop via inet6 2001:db8:101::2 dev veth1 nexthop via 172.16.103.2 dev veth3"
+ log_test $? 2 " Multipath route delete - nexthops in wrong order"
+
+ run_cmd "$IP ro del 172.16.104.0/24 nexthop via 172.16.103.2 dev veth3 nexthop via inet6 2001:db8:101::2 dev veth1"
+ log_test $? 0 " Multipath route delete exact match"
+
+ route_cleanup
+}
################################################################################
# usage
ipv4_addr_metric) ipv4_addr_metric_test;;
ipv6_route_metrics) ipv6_route_metrics_test;;
ipv4_route_metrics) ipv4_route_metrics_test;;
+ ipv4_route_v6_gw) ipv4_route_v6_gw_test;;
help) echo "Test names: $TESTS"; exit 0;;
esac
--- /dev/null
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+ALL_TESTS="reportleave_test"
+NUM_NETIFS=4
+CHECK_TC="yes"
+TEST_GROUP="239.10.10.10"
+TEST_GROUP_MAC="01:00:5e:0a:0a:0a"
+source lib.sh
+
+h1_create()
+{
+ simple_if_init $h1 192.0.2.1/24 2001:db8:1::1/64
+}
+
+h1_destroy()
+{
+ simple_if_fini $h1 192.0.2.1/24 2001:db8:1::1/64
+}
+
+h2_create()
+{
+ simple_if_init $h2 192.0.2.2/24 2001:db8:1::2/64
+}
+
+h2_destroy()
+{
+ simple_if_fini $h2 192.0.2.2/24 2001:db8:1::2/64
+}
+
+switch_create()
+{
+ ip link add dev br0 type bridge mcast_snooping 1 mcast_querier 1
+
+ ip link set dev $swp1 master br0
+ ip link set dev $swp2 master br0
+
+ ip link set dev br0 up
+ ip link set dev $swp1 up
+ ip link set dev $swp2 up
+}
+
+switch_destroy()
+{
+ ip link set dev $swp2 down
+ ip link set dev $swp1 down
+
+ ip link del dev br0
+}
+
+setup_prepare()
+{
+ h1=${NETIFS[p1]}
+ swp1=${NETIFS[p2]}
+
+ swp2=${NETIFS[p3]}
+ h2=${NETIFS[p4]}
+
+ vrf_prepare
+
+ h1_create
+ h2_create
+
+ switch_create
+}
+
+cleanup()
+{
+ pre_cleanup
+
+ switch_destroy
+
+ # Always cleanup the mcast group
+ ip address del dev $h2 $TEST_GROUP/32 2>&1 1>/dev/null
+
+ h2_destroy
+ h1_destroy
+
+ vrf_cleanup
+}
+
+# return 0 if the packet wasn't seen on host2_if or 1 if it was
+mcast_packet_test()
+{
+ local mac=$1
+ local ip=$2
+ local host1_if=$3
+ local host2_if=$4
+ local seen=0
+
+ # Add an ACL on `host2_if` which will tell us whether the packet
+ # was received by it or not.
+ tc qdisc add dev $host2_if ingress
+ tc filter add dev $host2_if ingress protocol ip pref 1 handle 101 \
+ flower dst_mac $mac action drop
+
+ $MZ $host1_if -c 1 -p 64 -b $mac -B $ip -t udp "dp=4096,sp=2048" -q
+ sleep 1
+
+ tc -j -s filter show dev $host2_if ingress \
+ | jq -e ".[] | select(.options.handle == 101) \
+ | select(.options.actions[0].stats.packets == 1)" &> /dev/null
+ if [[ $? -eq 0 ]]; then
+ seen=1
+ fi
+
+ tc filter del dev $host2_if ingress protocol ip pref 1 handle 101 flower
+ tc qdisc del dev $host2_if ingress
+
+ return $seen
+}
+
+reportleave_test()
+{
+ RET=0
+ ip address add dev $h2 $TEST_GROUP/32 autojoin
+ check_err $? "Could not join $TEST_GROUP"
+
+ sleep 5
+ bridge mdb show dev br0 | grep $TEST_GROUP 1>/dev/null
+ check_err $? "Report didn't create mdb entry for $TEST_GROUP"
+
+ mcast_packet_test $TEST_GROUP_MAC $TEST_GROUP $h1 $h2
+ check_fail $? "Traffic to $TEST_GROUP wasn't forwarded"
+
+ log_test "IGMP report $TEST_GROUP"
+
+ RET=0
+ bridge mdb show dev br0 | grep $TEST_GROUP 1>/dev/null
+ check_err $? "mdb entry for $TEST_GROUP is missing"
+
+ ip address del dev $h2 $TEST_GROUP/32
+ check_err $? "Could not leave $TEST_GROUP"
+
+ sleep 5
+ bridge mdb show dev br0 | grep $TEST_GROUP 1>/dev/null
+ check_fail $? "Leave didn't delete mdb entry for $TEST_GROUP"
+
+ mcast_packet_test $TEST_GROUP_MAC $TEST_GROUP $h1 $h2
+ check_err $? "Traffic to $TEST_GROUP was forwarded without mdb entry"
+
+ log_test "IGMP leave $TEST_GROUP"
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tests_run
+
+exit $EXIT_STATUS
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
-##############################################################################
-# Source library
-
-relative_path="${BASH_SOURCE%/*}"
-if [[ "$relative_path" == "${BASH_SOURCE}" ]]; then
- relative_path="."
-fi
-
-source "$relative_path/lib.sh"
-
##############################################################################
# Defines
-DEVLINK_DEV=$(devlink port show | grep "${NETIFS[p1]}" | \
- grep -v "${NETIFS[p1]}[0-9]" | cut -d" " -f1 | \
- rev | cut -d"/" -f2- | rev)
+DEVLINK_DEV=$(devlink port show "${NETIFS[p1]}" -j \
+ | jq -r '.port | keys[]' | cut -d/ -f-2)
if [ -z "$DEVLINK_DEV" ]; then
echo "SKIP: ${NETIFS[p1]} has no devlink device registered for it"
exit 1
grep -c "size_new")
check_err $still_pending "Failed reload - There are still unset sizes"
}
+
+declare -A DEVLINK_ORIG
+
+devlink_port_pool_threshold()
+{
+ local port=$1; shift
+ local pool=$1; shift
+
+ devlink sb port pool show $port pool $pool -j \
+ | jq '.port_pool."'"$port"'"[].threshold'
+}
+
+devlink_port_pool_th_set()
+{
+ local port=$1; shift
+ local pool=$1; shift
+ local th=$1; shift
+ local key="port_pool($port,$pool).threshold"
+
+ DEVLINK_ORIG[$key]=$(devlink_port_pool_threshold $port $pool)
+ devlink sb port pool set $port pool $pool th $th
+}
+
+devlink_port_pool_th_restore()
+{
+ local port=$1; shift
+ local pool=$1; shift
+ local key="port_pool($port,$pool).threshold"
+
+ devlink sb port pool set $port pool $pool th ${DEVLINK_ORIG[$key]}
+}
+
+devlink_pool_size_thtype()
+{
+ local pool=$1; shift
+
+ devlink sb pool show "$DEVLINK_DEV" pool $pool -j \
+ | jq -r '.pool[][] | (.size, .thtype)'
+}
+
+devlink_pool_size_thtype_set()
+{
+ local pool=$1; shift
+ local thtype=$1; shift
+ local size=$1; shift
+ local key="pool($pool).size_thtype"
+
+ DEVLINK_ORIG[$key]=$(devlink_pool_size_thtype $pool)
+ devlink sb pool set "$DEVLINK_DEV" pool $pool size $size thtype $thtype
+}
+
+devlink_pool_size_thtype_restore()
+{
+ local pool=$1; shift
+ local key="pool($pool).size_thtype"
+ local -a orig=(${DEVLINK_ORIG[$key]})
+
+ devlink sb pool set "$DEVLINK_DEV" pool $pool \
+ size ${orig[0]} thtype ${orig[1]}
+}
+
+devlink_tc_bind_pool_th()
+{
+ local port=$1; shift
+ local tc=$1; shift
+ local dir=$1; shift
+
+ devlink sb tc bind show $port tc $tc type $dir -j \
+ | jq -r '.tc_bind[][] | (.pool, .threshold)'
+}
+
+devlink_tc_bind_pool_th_set()
+{
+ local port=$1; shift
+ local tc=$1; shift
+ local dir=$1; shift
+ local pool=$1; shift
+ local th=$1; shift
+ local key="tc_bind($port,$dir,$tc).pool_th"
+
+ DEVLINK_ORIG[$key]=$(devlink_tc_bind_pool_th $port $tc $dir)
+ devlink sb tc bind set $port tc $tc type $dir pool $pool th $th
+}
+
+devlink_tc_bind_pool_th_restore()
+{
+ local port=$1; shift
+ local tc=$1; shift
+ local dir=$1; shift
+ local key="tc_bind($port,$dir,$tc).pool_th"
+ local -a orig=(${DEVLINK_ORIG[$key]})
+
+ devlink sb tc bind set $port tc $tc type $dir \
+ pool ${orig[0]} th ${orig[1]}
+}
# +------------------+ +------------------+
#
-ALL_TESTS="mcast_v4 mcast_v6"
+ALL_TESTS="mcast_v4 mcast_v6 rpf_v4 rpf_v6"
NUM_NETIFS=6
source lib.sh
source tc_common.sh
ip route add 2001:db8:2::/64 vrf v$h1 nexthop via 2001:db8:1::1
ip route add 2001:db8:3::/64 vrf v$h1 nexthop via 2001:db8:1::1
+
+ tc qdisc add dev $h1 ingress
}
h1_destroy()
{
+ tc qdisc del dev $h1 ingress
+
ip route del 2001:db8:3::/64 vrf v$h1
ip route del 2001:db8:2::/64 vrf v$h1
ip address add 2001:db8:1::1/64 dev $rp1
ip address add 2001:db8:2::1/64 dev $rp2
ip address add 2001:db8:3::1/64 dev $rp3
+
+ tc qdisc add dev $rp3 ingress
}
router_destroy()
{
+ tc qdisc del dev $rp3 ingress
+
ip address del 2001:db8:3::1/64 dev $rp3
ip address del 2001:db8:2::1/64 dev $rp2
ip address del 2001:db8:1::1/64 dev $rp1
log_test "mcast IPv6"
}
+rpf_v4()
+{
+ # Add a multicast route from first router port to the other two. Send
+ # matching packets and test that both hosts receive them. Then, send
+ # the same packets via the third router port and test that they do not
+ # reach any host due to RPF check. A filter with 'skip_hw' is added to
+ # test that devices capable of multicast routing offload trap those
+ # packets. The filter is essentialy a NOP in other scenarios.
+
+ RET=0
+
+ tc filter add dev $h1 ingress protocol ip pref 1 handle 1 flower \
+ dst_ip 225.1.2.3 ip_proto udp dst_port 12345 action drop
+ tc filter add dev $h2 ingress protocol ip pref 1 handle 1 flower \
+ dst_ip 225.1.2.3 ip_proto udp dst_port 12345 action drop
+ tc filter add dev $h3 ingress protocol ip pref 1 handle 1 flower \
+ dst_ip 225.1.2.3 ip_proto udp dst_port 12345 action drop
+ tc filter add dev $rp3 ingress protocol ip pref 1 handle 1 flower \
+ skip_hw dst_ip 225.1.2.3 ip_proto udp dst_port 12345 action pass
+
+ create_mcast_sg $rp1 198.51.100.2 225.1.2.3 $rp2 $rp3
+
+ $MZ $h1 -c 5 -p 128 -t udp "ttl=10,sp=54321,dp=12345" \
+ -a 00:11:22:33:44:55 -b 01:00:5e:01:02:03 \
+ -A 198.51.100.2 -B 225.1.2.3 -q
+
+ tc_check_packets "dev $h2 ingress" 1 5
+ check_err $? "Multicast not received on first host"
+ tc_check_packets "dev $h3 ingress" 1 5
+ check_err $? "Multicast not received on second host"
+
+ $MZ $h3 -c 5 -p 128 -t udp "ttl=10,sp=54321,dp=12345" \
+ -a 00:11:22:33:44:55 -b 01:00:5e:01:02:03 \
+ -A 198.51.100.2 -B 225.1.2.3 -q
+
+ tc_check_packets "dev $h1 ingress" 1 0
+ check_err $? "Multicast received on first host when should not"
+ tc_check_packets "dev $h2 ingress" 1 5
+ check_err $? "Multicast received on second host when should not"
+ tc_check_packets "dev $rp3 ingress" 1 5
+ check_err $? "Packets not trapped due to RPF check"
+
+ delete_mcast_sg $rp1 198.51.100.2 225.1.2.3 $rp2 $rp3
+
+ tc filter del dev $rp3 ingress protocol ip pref 1 handle 1 flower
+ tc filter del dev $h3 ingress protocol ip pref 1 handle 1 flower
+ tc filter del dev $h2 ingress protocol ip pref 1 handle 1 flower
+ tc filter del dev $h1 ingress protocol ip pref 1 handle 1 flower
+
+ log_test "RPF IPv4"
+}
+
+rpf_v6()
+{
+ RET=0
+
+ tc filter add dev $h1 ingress protocol ipv6 pref 1 handle 1 flower \
+ dst_ip ff0e::3 ip_proto udp dst_port 12345 action drop
+ tc filter add dev $h2 ingress protocol ipv6 pref 1 handle 1 flower \
+ dst_ip ff0e::3 ip_proto udp dst_port 12345 action drop
+ tc filter add dev $h3 ingress protocol ipv6 pref 1 handle 1 flower \
+ dst_ip ff0e::3 ip_proto udp dst_port 12345 action drop
+ tc filter add dev $rp3 ingress protocol ipv6 pref 1 handle 1 flower \
+ skip_hw dst_ip ff0e::3 ip_proto udp dst_port 12345 action pass
+
+ create_mcast_sg $rp1 2001:db8:1::2 ff0e::3 $rp2 $rp3
+
+ $MZ $h1 -6 -c 5 -p 128 -t udp "ttl=10,sp=54321,dp=12345" \
+ -a 00:11:22:33:44:55 -b 33:33:00:00:00:03 \
+ -A 2001:db8:1::2 -B ff0e::3 -q
+
+ tc_check_packets "dev $h2 ingress" 1 5
+ check_err $? "Multicast not received on first host"
+ tc_check_packets "dev $h3 ingress" 1 5
+ check_err $? "Multicast not received on second host"
+
+ $MZ $h3 -6 -c 5 -p 128 -t udp "ttl=10,sp=54321,dp=12345" \
+ -a 00:11:22:33:44:55 -b 33:33:00:00:00:03 \
+ -A 2001:db8:1::2 -B ff0e::3 -q
+
+ tc_check_packets "dev $h1 ingress" 1 0
+ check_err $? "Multicast received on first host when should not"
+ tc_check_packets "dev $h2 ingress" 1 5
+ check_err $? "Multicast received on second host when should not"
+ tc_check_packets "dev $rp3 ingress" 1 5
+ check_err $? "Packets not trapped due to RPF check"
+
+ delete_mcast_sg $rp1 2001:db8:1::2 ff0e::3 $rp2 $rp3
+
+ tc filter del dev $rp3 ingress protocol ipv6 pref 1 handle 1 flower
+ tc filter del dev $h3 ingress protocol ipv6 pref 1 handle 1 flower
+ tc filter del dev $h2 ingress protocol ipv6 pref 1 handle 1 flower
+ tc filter del dev $h1 ingress protocol ipv6 pref 1 handle 1 flower
+
+ log_test "RPF IPv6"
+}
+
trap cleanup EXIT
setup_prepare
# SPDX-License-Identifier: GPL-2.0
ALL_TESTS="match_dst_mac_test match_src_mac_test match_dst_ip_test \
- match_src_ip_test match_ip_flags_test"
+ match_src_ip_test match_ip_flags_test match_pcp_test match_vlan_test"
NUM_NETIFS=2
source tc_common.sh
source lib.sh
log_test "ip_flags match ($tcflags)"
}
+match_pcp_test()
+{
+ RET=0
+
+ vlan_create $h2 85 v$h2 192.0.2.11/24
+
+ tc filter add dev $h2 ingress protocol 802.1q pref 1 handle 101 \
+ flower vlan_prio 6 $tcflags dst_mac $h2mac action drop
+ tc filter add dev $h2 ingress protocol 802.1q pref 2 handle 102 \
+ flower vlan_prio 7 $tcflags dst_mac $h2mac action drop
+
+ $MZ $h1 -c 1 -p 64 -a $h1mac -b $h2mac -B 192.0.2.11 -Q 7:85 -t ip -q
+ $MZ $h1 -c 1 -p 64 -a $h1mac -b $h2mac -B 192.0.2.11 -Q 0:85 -t ip -q
+
+ tc_check_packets "dev $h2 ingress" 101 0
+ check_err $? "Matched on specified PCP when should not"
+
+ tc_check_packets "dev $h2 ingress" 102 1
+ check_err $? "Did not match on specified PCP"
+
+ tc filter del dev $h2 ingress protocol 802.1q pref 2 handle 102 flower
+ tc filter del dev $h2 ingress protocol 802.1q pref 1 handle 101 flower
+
+ vlan_destroy $h2 85
+
+ log_test "PCP match ($tcflags)"
+}
+
+match_vlan_test()
+{
+ RET=0
+
+ vlan_create $h2 85 v$h2 192.0.2.11/24
+ vlan_create $h2 75 v$h2 192.0.2.10/24
+
+ tc filter add dev $h2 ingress protocol 802.1q pref 1 handle 101 \
+ flower vlan_id 75 $tcflags action drop
+ tc filter add dev $h2 ingress protocol 802.1q pref 2 handle 102 \
+ flower vlan_id 85 $tcflags action drop
+
+ $MZ $h1 -c 1 -p 64 -a $h1mac -b $h2mac -B 192.0.2.11 -Q 0:85 -t ip -q
+
+ tc_check_packets "dev $h2 ingress" 101 0
+ check_err $? "Matched on specified VLAN when should not"
+
+ tc_check_packets "dev $h2 ingress" 102 1
+ check_err $? "Did not match on specified VLAN"
+
+ tc filter del dev $h2 ingress protocol 802.1q pref 2 handle 102 flower
+ tc filter del dev $h2 ingress protocol 802.1q pref 1 handle 101 flower
+
+ vlan_destroy $h2 75
+ vlan_destroy $h2 85
+
+ log_test "VLAN match ($tcflags)"
+}
+
setup_prepare()
{
h1=${NETIFS[p1]}
--- /dev/null
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+ALL_TESTS="
+ vlan_modify_ingress
+ vlan_modify_egress
+"
+
+NUM_NETIFS=4
+CHECK_TC="yes"
+source lib.sh
+
+h1_create()
+{
+ simple_if_init $h1 192.0.2.1/28 2001:db8:1::1/64
+ vlan_create $h1 85 v$h1 192.0.2.17/28 2001:db8:2::1/64
+}
+
+h1_destroy()
+{
+ vlan_destroy $h1 85
+ simple_if_fini $h1 192.0.2.1/28 2001:db8:1::1/64
+}
+
+h2_create()
+{
+ simple_if_init $h2 192.0.2.2/28 2001:db8:1::2/64
+ vlan_create $h2 65 v$h2 192.0.2.18/28 2001:db8:2::2/64
+}
+
+h2_destroy()
+{
+ vlan_destroy $h2 65
+ simple_if_fini $h2 192.0.2.2/28 2001:db8:1::2/64
+}
+
+switch_create()
+{
+ ip link add dev br0 type bridge vlan_filtering 1 mcast_snooping 0
+
+ ip link set dev $swp1 master br0
+ ip link set dev $swp2 master br0
+
+ ip link set dev br0 up
+ ip link set dev $swp1 up
+ ip link set dev $swp2 up
+
+ bridge vlan add dev $swp1 vid 85
+ bridge vlan add dev $swp2 vid 65
+
+ bridge vlan add dev $swp2 vid 85
+ bridge vlan add dev $swp1 vid 65
+
+ tc qdisc add dev $swp1 clsact
+ tc qdisc add dev $swp2 clsact
+}
+
+switch_destroy()
+{
+ tc qdisc del dev $swp2 clsact
+ tc qdisc del dev $swp1 clsact
+
+ bridge vlan del vid 65 dev $swp1
+ bridge vlan del vid 85 dev $swp2
+
+ bridge vlan del vid 65 dev $swp2
+ bridge vlan del vid 85 dev $swp1
+
+ ip link set dev $swp2 down
+ ip link set dev $swp1 down
+
+ ip link del dev br0
+}
+
+setup_prepare()
+{
+ h1=${NETIFS[p1]}
+ swp1=${NETIFS[p2]}
+
+ swp2=${NETIFS[p3]}
+ h2=${NETIFS[p4]}
+
+ vrf_prepare
+
+ h1_create
+ h2_create
+
+ switch_create
+}
+
+cleanup()
+{
+ pre_cleanup
+
+ switch_destroy
+
+ h2_destroy
+ h1_destroy
+
+ vrf_cleanup
+}
+
+vlan_modify_ingress()
+{
+ RET=0
+
+ ping_do $h1.85 192.0.2.18
+ check_fail $? "ping between two different vlans passed when should not"
+
+ ping6_do $h1.85 2001:db8:2::2
+ check_fail $? "ping6 between two different vlans passed when should not"
+
+ tc filter add dev $swp1 ingress protocol all pref 1 handle 1 \
+ flower action vlan modify id 65
+ tc filter add dev $swp2 ingress protocol all pref 1 handle 1 \
+ flower action vlan modify id 85
+
+ ping_do $h1.85 192.0.2.18
+ check_err $? "ping between two different vlans failed when should not"
+
+ ping6_do $h1.85 2001:db8:2::2
+ check_err $? "ping6 between two different vlans failed when should not"
+
+ log_test "VLAN modify at ingress"
+
+ tc filter del dev $swp2 ingress protocol all pref 1 handle 1 flower
+ tc filter del dev $swp1 ingress protocol all pref 1 handle 1 flower
+}
+
+vlan_modify_egress()
+{
+ RET=0
+
+ ping_do $h1.85 192.0.2.18
+ check_fail $? "ping between two different vlans passed when should not"
+
+ ping6_do $h1.85 2001:db8:2::2
+ check_fail $? "ping6 between two different vlans passed when should not"
+
+ tc filter add dev $swp1 egress protocol all pref 1 handle 1 \
+ flower action vlan modify id 85
+ tc filter add dev $swp2 egress protocol all pref 1 handle 1 \
+ flower action vlan modify id 65
+
+ ping_do $h1.85 192.0.2.18
+ check_err $? "ping between two different vlans failed when should not"
+
+ ping6_do $h1.85 2001:db8:2::2
+ check_err $? "ping6 between two different vlans failed when should not"
+
+ log_test "VLAN modify at egress"
+
+ tc filter del dev $swp2 egress protocol all pref 1 handle 1 flower
+ tc filter del dev $swp1 egress protocol all pref 1 handle 1 flower
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tests_run
+
+exit $EXIT_STATUS
# Kselftest framework requirement - SKIP code is 4.
ksft_skip=4
+PAUSE_ON_FAIL=no
+VERBOSE=0
+TRACING=0
+
# Some systems don't have a ping6 binary anymore
which ping6 > /dev/null 2>&1 && ping6=$(which ping6) || ping6=$(which ping)
err_buf=
}
+run_cmd() {
+ cmd="$*"
+
+ if [ "$VERBOSE" = "1" ]; then
+ printf " COMMAND: $cmd\n"
+ fi
+
+ out="$($cmd 2>&1)"
+ rc=$?
+ if [ "$VERBOSE" = "1" -a -n "$out" ]; then
+ echo " $out"
+ echo
+ fi
+
+ return $rc
+}
+
# Find the auto-generated name for this namespace
nsname() {
eval echo \$NS_$1
fi
fi
- ${ns_a} ip fou add port 5555 ipproto ${ipproto} || return 2
- ${ns_a} ip link add ${encap}_a type ${type} ${mode} local ${a_addr} remote ${b_addr} encap ${encap} encap-sport auto encap-dport 5556 || return 2
+ run_cmd ${ns_a} ip fou add port 5555 ipproto ${ipproto} || return 2
+ run_cmd ${ns_a} ip link add ${encap}_a type ${type} ${mode} local ${a_addr} remote ${b_addr} encap ${encap} encap-sport auto encap-dport 5556 || return 2
- ${ns_b} ip fou add port 5556 ipproto ${ipproto}
- ${ns_b} ip link add ${encap}_b type ${type} ${mode} local ${b_addr} remote ${a_addr} encap ${encap} encap-sport auto encap-dport 5555
+ run_cmd ${ns_b} ip fou add port 5556 ipproto ${ipproto}
+ run_cmd ${ns_b} ip link add ${encap}_b type ${type} ${mode} local ${b_addr} remote ${a_addr} encap ${encap} encap-sport auto encap-dport 5555
if [ "${inner}" = "4" ]; then
- ${ns_a} ip addr add ${tunnel4_a_addr}/${tunnel4_mask} dev ${encap}_a
- ${ns_b} ip addr add ${tunnel4_b_addr}/${tunnel4_mask} dev ${encap}_b
+ run_cmd ${ns_a} ip addr add ${tunnel4_a_addr}/${tunnel4_mask} dev ${encap}_a
+ run_cmd ${ns_b} ip addr add ${tunnel4_b_addr}/${tunnel4_mask} dev ${encap}_b
else
- ${ns_a} ip addr add ${tunnel6_a_addr}/${tunnel6_mask} dev ${encap}_a
- ${ns_b} ip addr add ${tunnel6_b_addr}/${tunnel6_mask} dev ${encap}_b
+ run_cmd ${ns_a} ip addr add ${tunnel6_a_addr}/${tunnel6_mask} dev ${encap}_a
+ run_cmd ${ns_b} ip addr add ${tunnel6_b_addr}/${tunnel6_mask} dev ${encap}_b
fi
- ${ns_a} ip link set ${encap}_a up
- ${ns_b} ip link set ${encap}_b up
+ run_cmd ${ns_a} ip link set ${encap}_a up
+ run_cmd ${ns_b} ip link set ${encap}_b up
}
setup_fou44() {
}
setup_veth() {
- ${ns_a} ip link add veth_a type veth peer name veth_b || return 1
- ${ns_a} ip link set veth_b netns ${NS_B}
+ run_cmd ${ns_a} ip link add veth_a type veth peer name veth_b || return 1
+ run_cmd ${ns_a} ip link set veth_b netns ${NS_B}
- ${ns_a} ip addr add ${veth4_a_addr}/${veth4_mask} dev veth_a
- ${ns_b} ip addr add ${veth4_b_addr}/${veth4_mask} dev veth_b
+ run_cmd ${ns_a} ip addr add ${veth4_a_addr}/${veth4_mask} dev veth_a
+ run_cmd ${ns_b} ip addr add ${veth4_b_addr}/${veth4_mask} dev veth_b
- ${ns_a} ip addr add ${veth6_a_addr}/${veth6_mask} dev veth_a
- ${ns_b} ip addr add ${veth6_b_addr}/${veth6_mask} dev veth_b
+ run_cmd ${ns_a} ip addr add ${veth6_a_addr}/${veth6_mask} dev veth_a
+ run_cmd ${ns_b} ip addr add ${veth6_b_addr}/${veth6_mask} dev veth_b
- ${ns_a} ip link set veth_a up
- ${ns_b} ip link set veth_b up
+ run_cmd ${ns_a} ip link set veth_a up
+ run_cmd ${ns_b} ip link set veth_b up
}
setup_vti() {
[ ${proto} -eq 6 ] && vti_type="vti6" || vti_type="vti"
- ${ns_a} ip link add vti${proto}_a type ${vti_type} local ${veth_a_addr} remote ${veth_b_addr} key 10 || return 1
- ${ns_b} ip link add vti${proto}_b type ${vti_type} local ${veth_b_addr} remote ${veth_a_addr} key 10
+ run_cmd ${ns_a} ip link add vti${proto}_a type ${vti_type} local ${veth_a_addr} remote ${veth_b_addr} key 10 || return 1
+ run_cmd ${ns_b} ip link add vti${proto}_b type ${vti_type} local ${veth_b_addr} remote ${veth_a_addr} key 10
- ${ns_a} ip addr add ${vti_a_addr}/${vti_mask} dev vti${proto}_a
- ${ns_b} ip addr add ${vti_b_addr}/${vti_mask} dev vti${proto}_b
+ run_cmd ${ns_a} ip addr add ${vti_a_addr}/${vti_mask} dev vti${proto}_a
+ run_cmd ${ns_b} ip addr add ${vti_b_addr}/${vti_mask} dev vti${proto}_b
- ${ns_a} ip link set vti${proto}_a up
- ${ns_b} ip link set vti${proto}_b up
+ run_cmd ${ns_a} ip link set vti${proto}_a up
+ run_cmd ${ns_b} ip link set vti${proto}_b up
}
setup_vti4() {
opts_b=""
fi
- ${ns_a} ip link add ${type}_a type ${type} id 1 ${opts_a} remote ${b_addr} ${opts} || return 1
- ${ns_b} ip link add ${type}_b type ${type} id 1 ${opts_b} remote ${a_addr} ${opts}
+ run_cmd ${ns_a} ip link add ${type}_a type ${type} id 1 ${opts_a} remote ${b_addr} ${opts} || return 1
+ run_cmd ${ns_b} ip link add ${type}_b type ${type} id 1 ${opts_b} remote ${a_addr} ${opts}
- ${ns_a} ip addr add ${tunnel4_a_addr}/${tunnel4_mask} dev ${type}_a
- ${ns_b} ip addr add ${tunnel4_b_addr}/${tunnel4_mask} dev ${type}_b
+ run_cmd ${ns_a} ip addr add ${tunnel4_a_addr}/${tunnel4_mask} dev ${type}_a
+ run_cmd ${ns_b} ip addr add ${tunnel4_b_addr}/${tunnel4_mask} dev ${type}_b
- ${ns_a} ip addr add ${tunnel6_a_addr}/${tunnel6_mask} dev ${type}_a
- ${ns_b} ip addr add ${tunnel6_b_addr}/${tunnel6_mask} dev ${type}_b
+ run_cmd ${ns_a} ip addr add ${tunnel6_a_addr}/${tunnel6_mask} dev ${type}_a
+ run_cmd ${ns_b} ip addr add ${tunnel6_b_addr}/${tunnel6_mask} dev ${type}_b
- ${ns_a} ip link set ${type}_a up
- ${ns_b} ip link set ${type}_b up
+ run_cmd ${ns_a} ip link set ${type}_a up
+ run_cmd ${ns_b} ip link set ${type}_b up
}
setup_geneve4() {
veth_a_addr="${2}"
veth_b_addr="${3}"
- ${ns_a} ip -${proto} xfrm state add src ${veth_a_addr} dst ${veth_b_addr} spi 0x1000 proto esp aead "rfc4106(gcm(aes))" 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f 128 mode tunnel || return 1
- ${ns_a} ip -${proto} xfrm state add src ${veth_b_addr} dst ${veth_a_addr} spi 0x1001 proto esp aead "rfc4106(gcm(aes))" 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f 128 mode tunnel
- ${ns_a} ip -${proto} xfrm policy add dir out mark 10 tmpl src ${veth_a_addr} dst ${veth_b_addr} proto esp mode tunnel
- ${ns_a} ip -${proto} xfrm policy add dir in mark 10 tmpl src ${veth_b_addr} dst ${veth_a_addr} proto esp mode tunnel
+ run_cmd "${ns_a} ip -${proto} xfrm state add src ${veth_a_addr} dst ${veth_b_addr} spi 0x1000 proto esp aead 'rfc4106(gcm(aes))' 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f 128 mode tunnel" || return 1
+ run_cmd "${ns_a} ip -${proto} xfrm state add src ${veth_b_addr} dst ${veth_a_addr} spi 0x1001 proto esp aead 'rfc4106(gcm(aes))' 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f 128 mode tunnel"
+ run_cmd "${ns_a} ip -${proto} xfrm policy add dir out mark 10 tmpl src ${veth_a_addr} dst ${veth_b_addr} proto esp mode tunnel"
+ run_cmd "${ns_a} ip -${proto} xfrm policy add dir in mark 10 tmpl src ${veth_b_addr} dst ${veth_a_addr} proto esp mode tunnel"
- ${ns_b} ip -${proto} xfrm state add src ${veth_a_addr} dst ${veth_b_addr} spi 0x1000 proto esp aead "rfc4106(gcm(aes))" 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f 128 mode tunnel
- ${ns_b} ip -${proto} xfrm state add src ${veth_b_addr} dst ${veth_a_addr} spi 0x1001 proto esp aead "rfc4106(gcm(aes))" 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f 128 mode tunnel
- ${ns_b} ip -${proto} xfrm policy add dir out mark 10 tmpl src ${veth_b_addr} dst ${veth_a_addr} proto esp mode tunnel
- ${ns_b} ip -${proto} xfrm policy add dir in mark 10 tmpl src ${veth_a_addr} dst ${veth_b_addr} proto esp mode tunnel
+ run_cmd "${ns_b} ip -${proto} xfrm state add src ${veth_a_addr} dst ${veth_b_addr} spi 0x1000 proto esp aead 'rfc4106(gcm(aes))' 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f 128 mode tunnel"
+ run_cmd "${ns_b} ip -${proto} xfrm state add src ${veth_b_addr} dst ${veth_a_addr} spi 0x1001 proto esp aead 'rfc4106(gcm(aes))' 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f 128 mode tunnel"
+ run_cmd "${ns_b} ip -${proto} xfrm policy add dir out mark 10 tmpl src ${veth_b_addr} dst ${veth_a_addr} proto esp mode tunnel"
+ run_cmd "${ns_b} ip -${proto} xfrm policy add dir in mark 10 tmpl src ${veth_a_addr} dst ${veth_b_addr} proto esp mode tunnel"
}
setup_xfrm4() {
}
trace() {
- [ $tracing -eq 0 ] && return
+ [ $TRACING -eq 0 ] && return
for arg do
[ "${ns_cmd}" = "" ] && ns_cmd="${arg}" && continue
mtu "${ns_b}" veth_B-R2 1500
# Create route exceptions
- ${ns_a} ${ping} -q -M want -i 0.1 -w 1 -s 1800 ${dst1} > /dev/null
- ${ns_a} ${ping} -q -M want -i 0.1 -w 1 -s 1800 ${dst2} > /dev/null
+ run_cmd ${ns_a} ${ping} -q -M want -i 0.1 -w 1 -s 1800 ${dst1}
+ run_cmd ${ns_a} ${ping} -q -M want -i 0.1 -w 1 -s 1800 ${dst2}
# Check that exceptions have been created with the correct PMTU
pmtu_1="$(route_get_dst_pmtu_from_exception "${ns_a}" ${dst1})"
# Decrease remote MTU on path via R2, get new exception
mtu "${ns_r2}" veth_R2-B 400
mtu "${ns_b}" veth_B-R2 400
- ${ns_a} ${ping} -q -M want -i 0.1 -w 1 -s 1400 ${dst2} > /dev/null
+ run_cmd ${ns_a} ${ping} -q -M want -i 0.1 -w 1 -s 1400 ${dst2}
pmtu_2="$(route_get_dst_pmtu_from_exception "${ns_a}" ${dst2})"
check_pmtu_value "lock 552" "${pmtu_2}" "exceeding MTU, with MTU < min_pmtu" || return 1
check_pmtu_value "1500" "${pmtu_2}" "increasing local MTU" || return 1
# Get new exception
- ${ns_a} ${ping} -q -M want -i 0.1 -w 1 -s 1400 ${dst2} > /dev/null
+ run_cmd ${ns_a} ${ping} -q -M want -i 0.1 -w 1 -s 1400 ${dst2}
pmtu_2="$(route_get_dst_pmtu_from_exception "${ns_a}" ${dst2})"
check_pmtu_value "lock 552" "${pmtu_2}" "exceeding MTU, with MTU < min_pmtu" || return 1
}
mtu "${ns_a}" ${type}_a $((${ll_mtu} + 1000))
mtu "${ns_b}" ${type}_b $((${ll_mtu} + 1000))
- ${ns_a} ${ping} -q -M want -i 0.1 -w 1 -s $((${ll_mtu} + 500)) ${dst} > /dev/null
+ run_cmd ${ns_a} ${ping} -q -M want -i 0.1 -w 1 -s $((${ll_mtu} + 500)) ${dst}
# Check that exception was created
pmtu="$(route_get_dst_pmtu_from_exception "${ns_a}" ${dst})"
mtu "${ns_a}" ${encap}_a $((${ll_mtu} + 1000))
mtu "${ns_b}" ${encap}_b $((${ll_mtu} + 1000))
- ${ns_a} ${ping} -q -M want -i 0.1 -w 1 -s $((${ll_mtu} + 500)) ${dst} > /dev/null
+ run_cmd ${ns_a} ${ping} -q -M want -i 0.1 -w 1 -s $((${ll_mtu} + 500)) ${dst}
# Check that exception was created
pmtu="$(route_get_dst_pmtu_from_exception "${ns_a}" ${dst})"
# Send DF packet without exceeding link layer MTU, check that no
# exception is created
- ${ns_a} ping -q -M want -i 0.1 -w 1 -s ${ping_payload} ${tunnel4_b_addr} > /dev/null
+ run_cmd ${ns_a} ping -q -M want -i 0.1 -w 1 -s ${ping_payload} ${tunnel4_b_addr}
pmtu="$(route_get_dst_pmtu_from_exception "${ns_a}" ${tunnel4_b_addr})"
check_pmtu_value "" "${pmtu}" "sending packet smaller than PMTU (IP payload length ${esp_payload_rfc4106})" || return 1
# Now exceed link layer MTU by one byte, check that exception is created
# with the right PMTU value
- ${ns_a} ping -q -M want -i 0.1 -w 1 -s $((ping_payload + 1)) ${tunnel4_b_addr} > /dev/null
+ run_cmd ${ns_a} ping -q -M want -i 0.1 -w 1 -s $((ping_payload + 1)) ${tunnel4_b_addr}
pmtu="$(route_get_dst_pmtu_from_exception "${ns_a}" ${tunnel4_b_addr})"
check_pmtu_value "${esp_payload_rfc4106}" "${pmtu}" "exceeding PMTU (IP payload length $((esp_payload_rfc4106 + 1)))"
}
mtu "${ns_b}" veth_b 4000
mtu "${ns_a}" vti6_a 5000
mtu "${ns_b}" vti6_b 5000
- ${ns_a} ${ping6} -q -i 0.1 -w 1 -s 60000 ${tunnel6_b_addr} > /dev/null
+ run_cmd ${ns_a} ${ping6} -q -i 0.1 -w 1 -s 60000 ${tunnel6_b_addr}
# Check that exception was created
pmtu="$(route_get_dst_pmtu_from_exception "${ns_a}" ${tunnel6_b_addr})"
test_pmtu_vti4_link_add_mtu() {
setup namespaces || return 2
- ${ns_a} ip link add vti4_a type vti local ${veth4_a_addr} remote ${veth4_b_addr} key 10
+ run_cmd ${ns_a} ip link add vti4_a type vti local ${veth4_a_addr} remote ${veth4_b_addr} key 10
[ $? -ne 0 ] && err " vti not supported" && return 2
- ${ns_a} ip link del vti4_a
+ run_cmd ${ns_a} ip link del vti4_a
fail=0
max=$((65535 - 20))
# Check invalid values first
for v in $((min - 1)) $((max + 1)); do
- ${ns_a} ip link add vti4_a mtu ${v} type vti local ${veth4_a_addr} remote ${veth4_b_addr} key 10 2>/dev/null
+ run_cmd ${ns_a} ip link add vti4_a mtu ${v} type vti local ${veth4_a_addr} remote ${veth4_b_addr} key 10
# This can fail, or MTU can be adjusted to a proper value
[ $? -ne 0 ] && continue
mtu="$(link_get_mtu "${ns_a}" vti4_a)"
err " vti tunnel created with invalid MTU ${mtu}"
fail=1
fi
- ${ns_a} ip link del vti4_a
+ run_cmd ${ns_a} ip link del vti4_a
done
# Now check valid values
for v in ${min} 1300 ${max}; do
- ${ns_a} ip link add vti4_a mtu ${v} type vti local ${veth4_a_addr} remote ${veth4_b_addr} key 10
+ run_cmd ${ns_a} ip link add vti4_a mtu ${v} type vti local ${veth4_a_addr} remote ${veth4_b_addr} key 10
mtu="$(link_get_mtu "${ns_a}" vti4_a)"
- ${ns_a} ip link del vti4_a
+ run_cmd ${ns_a} ip link del vti4_a
if [ "${mtu}" != "${v}" ]; then
err " vti MTU ${mtu} doesn't match configured value ${v}"
fail=1
test_pmtu_vti6_link_add_mtu() {
setup namespaces || return 2
- ${ns_a} ip link add vti6_a type vti6 local ${veth6_a_addr} remote ${veth6_b_addr} key 10
+ run_cmd ${ns_a} ip link add vti6_a type vti6 local ${veth6_a_addr} remote ${veth6_b_addr} key 10
[ $? -ne 0 ] && err " vti6 not supported" && return 2
- ${ns_a} ip link del vti6_a
+ run_cmd ${ns_a} ip link del vti6_a
fail=0
max=$((65535 - 40))
# Check invalid values first
for v in $((min - 1)) $((max + 1)); do
- ${ns_a} ip link add vti6_a mtu ${v} type vti6 local ${veth6_a_addr} remote ${veth6_b_addr} key 10 2>/dev/null
+ run_cmd ${ns_a} ip link add vti6_a mtu ${v} type vti6 local ${veth6_a_addr} remote ${veth6_b_addr} key 10
# This can fail, or MTU can be adjusted to a proper value
[ $? -ne 0 ] && continue
mtu="$(link_get_mtu "${ns_a}" vti6_a)"
err " vti6 tunnel created with invalid MTU ${v}"
fail=1
fi
- ${ns_a} ip link del vti6_a
+ run_cmd ${ns_a} ip link del vti6_a
done
# Now check valid values
for v in 68 1280 1300 $((65535 - 40)); do
- ${ns_a} ip link add vti6_a mtu ${v} type vti6 local ${veth6_a_addr} remote ${veth6_b_addr} key 10
+ run_cmd ${ns_a} ip link add vti6_a mtu ${v} type vti6 local ${veth6_a_addr} remote ${veth6_b_addr} key 10
mtu="$(link_get_mtu "${ns_a}" vti6_a)"
- ${ns_a} ip link del vti6_a
+ run_cmd ${ns_a} ip link del vti6_a
if [ "${mtu}" != "${v}" ]; then
err " vti6 MTU ${mtu} doesn't match configured value ${v}"
fail=1
test_pmtu_vti6_link_change_mtu() {
setup namespaces || return 2
- ${ns_a} ip link add dummy0 mtu 1500 type dummy
+ run_cmd ${ns_a} ip link add dummy0 mtu 1500 type dummy
[ $? -ne 0 ] && err " dummy not supported" && return 2
- ${ns_a} ip link add dummy1 mtu 3000 type dummy
- ${ns_a} ip link set dummy0 up
- ${ns_a} ip link set dummy1 up
+ run_cmd ${ns_a} ip link add dummy1 mtu 3000 type dummy
+ run_cmd ${ns_a} ip link set dummy0 up
+ run_cmd ${ns_a} ip link set dummy1 up
- ${ns_a} ip addr add ${dummy6_0_addr}/${dummy6_mask} dev dummy0
- ${ns_a} ip addr add ${dummy6_1_addr}/${dummy6_mask} dev dummy1
+ run_cmd ${ns_a} ip addr add ${dummy6_0_addr}/${dummy6_mask} dev dummy0
+ run_cmd ${ns_a} ip addr add ${dummy6_1_addr}/${dummy6_mask} dev dummy1
fail=0
# Create vti6 interface bound to device, passing MTU, check it
- ${ns_a} ip link add vti6_a mtu 1300 type vti6 remote ${dummy6_0_addr} local ${dummy6_0_addr}
+ run_cmd ${ns_a} ip link add vti6_a mtu 1300 type vti6 remote ${dummy6_0_addr} local ${dummy6_0_addr}
mtu="$(link_get_mtu "${ns_a}" vti6_a)"
if [ ${mtu} -ne 1300 ]; then
err " vti6 MTU ${mtu} doesn't match configured value 1300"
# Move to another device with different MTU, without passing MTU, check
# MTU is adjusted
- ${ns_a} ip link set vti6_a type vti6 remote ${dummy6_1_addr} local ${dummy6_1_addr}
+ run_cmd ${ns_a} ip link set vti6_a type vti6 remote ${dummy6_1_addr} local ${dummy6_1_addr}
mtu="$(link_get_mtu "${ns_a}" vti6_a)"
if [ ${mtu} -ne $((3000 - 40)) ]; then
err " vti MTU ${mtu} is not dummy MTU 3000 minus IPv6 header length"
fi
# Move it back, passing MTU, check MTU is not overridden
- ${ns_a} ip link set vti6_a mtu 1280 type vti6 remote ${dummy6_0_addr} local ${dummy6_0_addr}
+ run_cmd ${ns_a} ip link set vti6_a mtu 1280 type vti6 remote ${dummy6_0_addr} local ${dummy6_0_addr}
mtu="$(link_get_mtu "${ns_a}" vti6_a)"
if [ ${mtu} -ne 1280 ]; then
err " vti6 MTU ${mtu} doesn't match configured value 1280"
# Fill exception cache for multiple CPUs (2)
# we can always use inner IPv4 for that
for cpu in ${cpu_list}; do
- taskset --cpu-list ${cpu} ${ns_a} ping -q -M want -i 0.1 -w 1 -s $((${ll_mtu} + 500)) ${tunnel4_b_addr} > /dev/null
+ run_cmd taskset --cpu-list ${cpu} ${ns_a} ping -q -M want -i 0.1 -w 1 -s $((${ll_mtu} + 500)) ${tunnel4_b_addr}
done
${ns_a} ip link del dev veth_A-R1 &
exit 1
}
+################################################################################
+#
exitcode=0
desc=0
+
+while getopts :ptv o
+do
+ case $o in
+ p) PAUSE_ON_FAIL=yes;;
+ v) VERBOSE=1;;
+ t) if which tcpdump > /dev/null 2>&1; then
+ TRACING=1
+ else
+ echo "=== tcpdump not available, tracing disabled"
+ fi
+ ;;
+ *) usage;;
+ esac
+done
+shift $(($OPTIND-1))
+
IFS="
"
-tracing=0
for arg do
- if [ "${arg}" != "${arg#--*}" ]; then
- opt="${arg#--}"
- if [ "${opt}" = "trace" ]; then
- if which tcpdump > /dev/null 2>&1; then
- tracing=1
- else
- echo "=== tcpdump not available, tracing disabled"
- fi
- else
- usage
- fi
- else
- # Check first that all requested tests are available before
- # running any
- command -v > /dev/null "test_${arg}" || { echo "=== Test ${arg} not found"; usage; }
- fi
+ # Check first that all requested tests are available before running any
+ command -v > /dev/null "test_${arg}" || { echo "=== Test ${arg} not found"; usage; }
done
trap cleanup EXIT
(
unset IFS
+
+ if [ "$VERBOSE" = "1" ]; then
+ printf "\n##########################################################################\n\n"
+ fi
+
eval test_${name}
ret=$?
cleanup
printf "TEST: %-60s [ OK ]\n" "${t}"
elif [ $ret -eq 1 ]; then
printf "TEST: %-60s [FAIL]\n" "${t}"
+ if [ "${PAUSE_ON_FAIL}" = "yes" ]; then
+ echo
+ echo "Pausing. Hit enter to continue"
+ read a
+ fi
err_flush
exit 1
elif [ $ret -eq 2 ]; then
# SPDX-License-Identifier: GPL-2.0
# Makefile for netfilter selftests
-TEST_PROGS := nft_trans_stress.sh nft_nat.sh
+TEST_PROGS := nft_trans_stress.sh nft_nat.sh bridge_brouter.sh
include ../lib.mk
--- /dev/null
+#!/bin/bash
+#
+# This test is for bridge 'brouting', i.e. make some packets being routed
+# rather than getting bridged even though they arrive on interface that is
+# part of a bridge.
+
+# eth0 br0 eth0
+# setup is: ns1 <-> ns0 <-> ns2
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+ret=0
+
+ebtables -V > /dev/null 2>&1
+if [ $? -ne 0 ];then
+ echo "SKIP: Could not run test without ebtables"
+ exit $ksft_skip
+fi
+
+ip -Version > /dev/null 2>&1
+if [ $? -ne 0 ];then
+ echo "SKIP: Could not run test without ip tool"
+ exit $ksft_skip
+fi
+
+ip netns add ns0
+ip netns add ns1
+ip netns add ns2
+
+ip link add veth0 netns ns0 type veth peer name eth0 netns ns1
+if [ $? -ne 0 ]; then
+ echo "SKIP: Can't create veth device"
+ exit $ksft_skip
+fi
+ip link add veth1 netns ns0 type veth peer name eth0 netns ns2
+
+ip -net ns0 link set lo up
+ip -net ns0 link set veth0 up
+ip -net ns0 link set veth1 up
+
+ip -net ns0 link add br0 type bridge
+if [ $? -ne 0 ]; then
+ echo "SKIP: Can't create bridge br0"
+ exit $ksft_skip
+fi
+
+ip -net ns0 link set veth0 master br0
+ip -net ns0 link set veth1 master br0
+ip -net ns0 link set br0 up
+ip -net ns0 addr add 10.0.0.1/24 dev br0
+
+# place both in same subnet, ns1 and ns2 connected via ns0:br0
+for i in 1 2; do
+ ip -net ns$i link set lo up
+ ip -net ns$i link set eth0 up
+ ip -net ns$i addr add 10.0.0.1$i/24 dev eth0
+done
+
+test_ebtables_broute()
+{
+ local cipt
+
+ # redirect is needed so the dstmac is rewritten to the bridge itself,
+ # ip stack won't process OTHERHOST (foreign unicast mac) packets.
+ ip netns exec ns0 ebtables -t broute -A BROUTING -p ipv4 --ip-protocol icmp -j redirect --redirect-target=DROP
+ if [ $? -ne 0 ]; then
+ echo "SKIP: Could not add ebtables broute redirect rule"
+ return $ksft_skip
+ fi
+
+ # ping netns1, expected to not work (ip forwarding is off)
+ ip netns exec ns1 ping -q -c 1 10.0.0.12 > /dev/null 2>&1
+ if [ $? -eq 0 ]; then
+ echo "ERROR: ping works, should have failed" 1>&2
+ return 1
+ fi
+
+ # enable forwarding on both interfaces.
+ # neither needs an ip address, but at least the bridge needs
+ # an ip address in same network segment as ns1 and ns2 (ns0
+ # needs to be able to determine route for to-be-forwarded packet).
+ ip netns exec ns0 sysctl -q net.ipv4.conf.veth0.forwarding=1
+ ip netns exec ns0 sysctl -q net.ipv4.conf.veth1.forwarding=1
+
+ sleep 1
+
+ ip netns exec ns1 ping -q -c 1 10.0.0.12 > /dev/null
+ if [ $? -ne 0 ]; then
+ echo "ERROR: ping did not work, but it should (broute+forward)" 1>&2
+ return 1
+ fi
+
+ echo "PASS: ns1/ns2 connectivity with active broute rule"
+ ip netns exec ns0 ebtables -t broute -F
+
+ # ping netns1, expected to work (frames are bridged)
+ ip netns exec ns1 ping -q -c 1 10.0.0.12 > /dev/null
+ if [ $? -ne 0 ]; then
+ echo "ERROR: ping did not work, but it should (bridged)" 1>&2
+ return 1
+ fi
+
+ ip netns exec ns0 ebtables -t filter -A FORWARD -p ipv4 --ip-protocol icmp -j DROP
+
+ # ping netns1, expected to not work (DROP in bridge forward)
+ ip netns exec ns1 ping -q -c 1 10.0.0.12 > /dev/null 2>&1
+ if [ $? -eq 0 ]; then
+ echo "ERROR: ping works, should have failed (icmp forward drop)" 1>&2
+ return 1
+ fi
+
+ # re-activate brouter
+ ip netns exec ns0 ebtables -t broute -A BROUTING -p ipv4 --ip-protocol icmp -j redirect --redirect-target=DROP
+
+ ip netns exec ns2 ping -q -c 1 10.0.0.11 > /dev/null
+ if [ $? -ne 0 ]; then
+ echo "ERROR: ping did not work, but it should (broute+forward 2)" 1>&2
+ return 1
+ fi
+
+ echo "PASS: ns1/ns2 connectivity with active broute rule and bridge forward drop"
+ return 0
+}
+
+# test basic connectivity
+ip netns exec ns1 ping -c 1 -q 10.0.0.12 > /dev/null
+if [ $? -ne 0 ]; then
+ echo "ERROR: Could not reach ns2 from ns1" 1>&2
+ ret=1
+fi
+
+ip netns exec ns2 ping -c 1 -q 10.0.0.11 > /dev/null
+if [ $? -ne 0 ]; then
+ echo "ERROR: Could not reach ns1 from ns2" 1>&2
+ ret=1
+fi
+
+if [ $ret -eq 0 ];then
+ echo "PASS: netns connectivity: ns1 and ns2 can reach each other"
+fi
+
+test_ebtables_broute
+ret=$?
+for i in 0 1 2; do ip netns del ns$i;done
+
+exit $ret
# Kselftest framework requirement - SKIP code is 4.
ksft_skip=4
ret=0
+test_inet_nat=true
nft --version > /dev/null 2>&1
if [ $? -ne 0 ];then
test_local_dnat6()
{
+ local family=$1
local lret=0
+ local IPF=""
+
+ if [ $family = "inet" ];then
+ IPF="ip6"
+ fi
+
ip netns exec ns0 nft -f - <<EOF
-table ip6 nat {
+table $family nat {
chain output {
type nat hook output priority 0; policy accept;
- ip6 daddr dead:1::99 dnat to dead:2::99
+ ip6 daddr dead:1::99 dnat $IPF to dead:2::99
}
}
EOF
if [ $? -ne 0 ]; then
- echo "SKIP: Could not add add ip6 dnat hook"
+ echo "SKIP: Could not add add $family dnat hook"
return $ksft_skip
fi
fi
done
- test $lret -eq 0 && echo "PASS: ipv6 ping to ns1 was NATted to ns2"
+ test $lret -eq 0 && echo "PASS: ipv6 ping to ns1 was $family NATted to ns2"
ip netns exec ns0 nft flush chain ip6 nat output
return $lret
test_local_dnat()
{
+ local family=$1
local lret=0
-ip netns exec ns0 nft -f - <<EOF
-table ip nat {
+ local IPF=""
+
+ if [ $family = "inet" ];then
+ IPF="ip"
+ fi
+
+ip netns exec ns0 nft -f - <<EOF 2>/dev/null
+table $family nat {
chain output {
type nat hook output priority 0; policy accept;
- ip daddr 10.0.1.99 dnat to 10.0.2.99
+ ip daddr 10.0.1.99 dnat $IPF to 10.0.2.99
}
}
EOF
+ if [ $? -ne 0 ]; then
+ if [ $family = "inet" ];then
+ echo "SKIP: inet nat tests"
+ test_inet_nat=false
+ return $ksft_skip
+ fi
+ echo "SKIP: Could not add add $family dnat hook"
+ return $ksft_skip
+ fi
+
# ping netns1, expect rewrite to netns2
ip netns exec ns0 ping -q -c 1 10.0.1.99 > /dev/null
if [ $? -ne 0 ]; then
fi
done
- test $lret -eq 0 && echo "PASS: ping to ns1 was NATted to ns2"
+ test $lret -eq 0 && echo "PASS: ping to ns1 was $family NATted to ns2"
- ip netns exec ns0 nft flush chain ip nat output
+ ip netns exec ns0 nft flush chain $family nat output
reset_counters
ip netns exec ns0 ping -q -c 1 10.0.1.99 > /dev/null
fi
done
- test $lret -eq 0 && echo "PASS: ping to ns1 OK after nat output chain flush"
+ test $lret -eq 0 && echo "PASS: ping to ns1 OK after $family nat output chain flush"
return $lret
}
test_masquerade6()
{
+ local family=$1
local lret=0
ip netns exec ns0 sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
# add masquerading rule
ip netns exec ns0 nft -f - <<EOF
-table ip6 nat {
+table $family nat {
chain postrouting {
type nat hook postrouting priority 0; policy accept;
meta oif veth0 masquerade
}
}
EOF
+ if [ $? -ne 0 ]; then
+ echo "SKIP: Could not add add $family masquerade hook"
+ return $ksft_skip
+ fi
+
ip netns exec ns2 ping -q -c 1 dead:1::99 > /dev/null # ping ns2->ns1
if [ $? -ne 0 ] ; then
- echo "ERROR: cannot ping ns1 from ns2 with active ipv6 masquerading"
+ echo "ERROR: cannot ping ns1 from ns2 with active $family masquerading"
lret=1
fi
fi
done
- ip netns exec ns0 nft flush chain ip6 nat postrouting
+ ip netns exec ns0 nft flush chain $family nat postrouting
if [ $? -ne 0 ]; then
- echo "ERROR: Could not flush ip6 nat postrouting" 1>&2
+ echo "ERROR: Could not flush $family nat postrouting" 1>&2
lret=1
fi
- test $lret -eq 0 && echo "PASS: IPv6 masquerade for ns2"
+ test $lret -eq 0 && echo "PASS: $family IPv6 masquerade for ns2"
return $lret
}
test_masquerade()
{
+ local family=$1
local lret=0
ip netns exec ns0 sysctl net.ipv4.conf.veth0.forwarding=1 > /dev/null
# add masquerading rule
ip netns exec ns0 nft -f - <<EOF
-table ip nat {
+table $family nat {
chain postrouting {
type nat hook postrouting priority 0; policy accept;
meta oif veth0 masquerade
}
}
EOF
+ if [ $? -ne 0 ]; then
+ echo "SKIP: Could not add add $family masquerade hook"
+ return $ksft_skip
+ fi
+
ip netns exec ns2 ping -q -c 1 10.0.1.99 > /dev/null # ping ns2->ns1
if [ $? -ne 0 ] ; then
- echo "ERROR: cannot ping ns1 from ns2 with active ip masquerading"
+ echo "ERROR: cannot ping ns1 from ns2 with active $family masquerading"
lret=1
fi
fi
done
- ip netns exec ns0 nft flush chain ip nat postrouting
+ ip netns exec ns0 nft flush chain $family nat postrouting
if [ $? -ne 0 ]; then
- echo "ERROR: Could not flush nat postrouting" 1>&2
+ echo "ERROR: Could not flush $family nat postrouting" 1>&2
lret=1
fi
- test $lret -eq 0 && echo "PASS: IP masquerade for ns2"
+ test $lret -eq 0 && echo "PASS: $family IP masquerade for ns2"
return $lret
}
test_redirect6()
{
+ local family=$1
local lret=0
ip netns exec ns0 sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
# add redirect rule
ip netns exec ns0 nft -f - <<EOF
-table ip6 nat {
+table $family nat {
chain prerouting {
type nat hook prerouting priority 0; policy accept;
meta iif veth1 meta l4proto icmpv6 ip6 saddr dead:2::99 ip6 daddr dead:1::99 redirect
}
}
EOF
+ if [ $? -ne 0 ]; then
+ echo "SKIP: Could not add add $family redirect hook"
+ return $ksft_skip
+ fi
+
ip netns exec ns2 ping -q -c 1 dead:1::99 > /dev/null # ping ns2->ns1
if [ $? -ne 0 ] ; then
- echo "ERROR: cannot ping ns1 from ns2 with active ip6 redirect"
+ echo "ERROR: cannot ping ns1 from ns2 via ipv6 with active $family redirect"
lret=1
fi
fi
done
- ip netns exec ns0 nft delete table ip6 nat
+ ip netns exec ns0 nft delete table $family nat
if [ $? -ne 0 ]; then
- echo "ERROR: Could not delete ip6 nat table" 1>&2
+ echo "ERROR: Could not delete $family nat table" 1>&2
lret=1
fi
- test $lret -eq 0 && echo "PASS: IPv6 redirection for ns2"
+ test $lret -eq 0 && echo "PASS: $family IPv6 redirection for ns2"
return $lret
}
test_redirect()
{
+ local family=$1
local lret=0
ip netns exec ns0 sysctl net.ipv4.conf.veth0.forwarding=1 > /dev/null
# add redirect rule
ip netns exec ns0 nft -f - <<EOF
-table ip nat {
+table $family nat {
chain prerouting {
type nat hook prerouting priority 0; policy accept;
meta iif veth1 ip protocol icmp ip saddr 10.0.2.99 ip daddr 10.0.1.99 redirect
}
}
EOF
+ if [ $? -ne 0 ]; then
+ echo "SKIP: Could not add add $family redirect hook"
+ return $ksft_skip
+ fi
+
ip netns exec ns2 ping -q -c 1 10.0.1.99 > /dev/null # ping ns2->ns1
if [ $? -ne 0 ] ; then
- echo "ERROR: cannot ping ns1 from ns2 with active ip redirect"
+ echo "ERROR: cannot ping ns1 from ns2 with active $family ip redirect"
lret=1
fi
fi
done
- ip netns exec ns0 nft delete table ip nat
+ ip netns exec ns0 nft delete table $family nat
if [ $? -ne 0 ]; then
- echo "ERROR: Could not delete nat table" 1>&2
+ echo "ERROR: Could not delete $family nat table" 1>&2
lret=1
fi
- test $lret -eq 0 && echo "PASS: IP redirection for ns2"
+ test $lret -eq 0 && echo "PASS: $family IP redirection for ns2"
return $lret
}
fi
reset_counters
-test_local_dnat
-test_local_dnat6
+test_local_dnat ip
+test_local_dnat6 ip6
+reset_counters
+$test_inet_nat && test_local_dnat inet
+$test_inet_nat && test_local_dnat6 inet
reset_counters
-test_masquerade
-test_masquerade6
+test_masquerade ip
+test_masquerade6 ip6
+reset_counters
+$test_inet_nat && test_masquerade inet
+$test_inet_nat && test_masquerade6 inet
reset_counters
-test_redirect
-test_redirect6
+test_redirect ip
+test_redirect6 ip6
+reset_counters
+$test_inet_nat && test_redirect inet
+$test_inet_nat && test_redirect6 inet
for i in 0 1 2; do ip netns del ns$i;done
"teardown": [
"$TC actions flush action pedit"
]
+ },
+ {
+ "id": "377e",
+ "name": "Add pedit action with RAW_OP offset u32",
+ "category": [
+ "actions",
+ "pedit",
+ "raw_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit munge offset 12 u32 set 0x90abcdef",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": "12: val 90abcdef mask 00000000",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "a0ca",
+ "name": "Add pedit action with RAW_OP offset u32 (INVALID)",
+ "category": [
+ "actions",
+ "pedit",
+ "raw_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit munge offset 2 u32 set 0x12345678",
+ "expExitCode": "255",
+ "verifyCmd": "/bin/true",
+ "matchPattern": " ",
+ "matchCount": "0",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "dd8a",
+ "name": "Add pedit action with RAW_OP offset u16 u16",
+ "category": [
+ "actions",
+ "pedit",
+ "raw_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit munge offset 12 u16 set 0x1234 munge offset 14 u16 set 0x5678",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": "val 12340000 mask 0000ffff.*val 00005678 mask ffff0000",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "53db",
+ "name": "Add pedit action with RAW_OP offset u16 (INVALID)",
+ "category": [
+ "actions",
+ "pedit",
+ "raw_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit munge offset 15 u16 set 0x1234",
+ "expExitCode": "255",
+ "verifyCmd": "/bin/true",
+ "matchPattern": " ",
+ "matchCount": "0",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "5c7e",
+ "name": "Add pedit action with RAW_OP offset u8 add value",
+ "category": [
+ "actions",
+ "pedit",
+ "raw_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit ex munge offset 16 u8 add 0xf",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": " 16: add 0f000000 mask 00ffffff",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "2893",
+ "name": "Add pedit action with RAW_OP offset u8 quad",
+ "category": [
+ "actions",
+ "pedit",
+ "raw_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit munge offset 12 u8 set 0x12 munge offset 13 u8 set 0x34 munge offset 14 u8 set 0x56 munge offset 15 u8 set 0x78",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": "val 12000000 mask 00ffffff.*val 00340000 mask ff00ffff.*val 00005600 mask ffff00ff.*val 00000078 mask ffffff00",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "3a07",
+ "name": "Add pedit action with RAW_OP offset u8-u16-u8",
+ "category": [
+ "actions",
+ "pedit",
+ "raw_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit munge offset 0 u8 set 0x12 munge offset 1 u16 set 0x3456 munge offset 3 u8 set 0x78",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": "val 12000000 mask 00ffffff.*val 00345600 mask ff0000ff.*val 00000078 mask ffffff00",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "ab0f",
+ "name": "Add pedit action with RAW_OP offset u16-u8-u8",
+ "category": [
+ "actions",
+ "pedit",
+ "raw_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit munge offset 0 u16 set 0x1234 munge offset 2 u8 set 0x56 munge offset 3 u8 set 0x78",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": "val 12340000 mask 0000ffff.*val 00005600 mask ffff00ff.*val 00000078 mask ffffff00",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "9d12",
+ "name": "Add pedit action with RAW_OP offset u32 set u16 clear u8 invert",
+ "category": [
+ "actions",
+ "pedit",
+ "raw_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit munge offset 0 u32 set 0x12345678 munge offset 1 u16 clear munge offset 2 u8 invert",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": "val 12345678 mask 00000000.*val 00000000 mask ff0000ff.*val 0000ff00 mask ffffffff",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "ebfa",
+ "name": "Add pedit action with RAW_OP offset overflow u32 (INVALID)",
+ "category": [
+ "actions",
+ "pedit",
+ "raw_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit munge offset 0xffffffffffffffffffffffffffffffffffffffffff u32 set 0x1",
+ "expExitCode": "255",
+ "verifyCmd": "/bin/true",
+ "matchPattern": " ",
+ "matchCount": "0",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "f512",
+ "name": "Add pedit action with RAW_OP offset u16 at offmask shift set",
+ "category": [
+ "actions",
+ "pedit",
+ "raw_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit munge offset 12 u16 at 12 ffff 1 set 0xaaaa",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": " 12: val aaaa0000 mask 0000ffff",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "c2cb",
+ "name": "Add pedit action with RAW_OP offset u32 retain value",
+ "category": [
+ "actions",
+ "pedit",
+ "raw_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit munge offset 12 u32 set 0x12345678 retain 0xff00",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": " 12: val 00005600 mask ffff00ff",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "86d4",
+ "name": "Add pedit action with LAYERED_OP eth set src & dst",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit ex munge eth src set 11:22:33:44:55:66 munge eth dst set ff:ee:dd:cc:bb:aa",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": "eth\\+4: val 00001122 mask ffff0000.*eth\\+8: val 33445566 mask 00000000.*eth\\+0: val ffeeddcc mask 00000000.*eth\\+4: val bbaa0000 mask 0000ffff",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "c715",
+ "name": "Add pedit action with LAYERED_OP eth set src (INVALID)",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit ex munge eth src set %e:11:m2:33:x4:-5",
+ "expExitCode": "255",
+ "verifyCmd": "/bin/true",
+ "matchPattern": " ",
+ "matchCount": "0",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "ba22",
+ "name": "Add pedit action with LAYERED_OP eth type set/clear sequence",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit ex munge eth type set 0x1 munge eth type clear munge eth type set 0x1 munge eth type clear munge eth type set 0x1 munge eth type clear munge eth type set 0x1 munge eth type clear munge eth type set 0x1 munge eth type clear munge eth type set 0x1 munge eth type clear",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": "eth\\+12: val 00010000 mask 0000ffff.*eth\\+12: val 00000000 mask 0000ffff.*eth\\+12: val 00010000 mask 0000ffff.*eth\\+12: val 00000000 mask 0000ffff.*eth\\+12: val 00010000 mask 0000ffff.*eth\\+12: val 00000000 mask 0000ffff.*eth\\+12: val 00010000 mask 0000ffff.*eth\\+12: val 00000000 mask 0000ffff.*eth\\+12: val 00010000 mask 0000ffff.*eth\\+12: val 00000000 mask 0000ffff.*eth\\+12: val 00010000 mask 0000ffff.*eth\\+12: val 00000000 mask 0000ffff",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "5810",
+ "name": "Add pedit action with LAYERED_OP ip set src & dst",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit munge ip src set 18.52.86.120 munge ip dst set 18.52.86.120",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": " 12: val 12345678 mask 00000000.* 16: val 12345678 mask 00000000",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "1092",
+ "name": "Add pedit action with LAYERED_OP ip set ihl & dsfield",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit munge ip ihl set 0xff munge ip dsfield set 0xff",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": " 0: val 0f000000 mask f0ffffff.* 0: val 00ff0000 mask ff00ffff",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "02d8",
+ "name": "Add pedit action with LAYERED_OP ip set ttl & protocol",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit munge ip ttl set 0x1 munge ip protocol set 0xff",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": " 8: val 01000000 mask 00ffffff.* 8: val 00ff0000 mask ff00ffff",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "3e2d",
+ "name": "Add pedit action with LAYERED_OP ip set ttl (INVALID)",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit munge ip ttl set 300",
+ "expExitCode": "255",
+ "verifyCmd": "/bin/true",
+ "matchPattern": " ",
+ "matchCount": "0",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "31ae",
+ "name": "Add pedit action with LAYERED_OP ip ttl clear/set",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit munge ip ttl clear munge ip ttl set 0x1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": " 8: val 00000000 mask 00ffffff.* 8: val 01000000 mask 00ffffff",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "486f",
+ "name": "Add pedit action with LAYERED_OP ip set duplicate fields",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit munge ip ttl set 0x1 munge ip ttl set 0x1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": " 8: val 01000000 mask 00ffffff.* 8: val 01000000 mask 00ffffff",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "e790",
+ "name": "Add pedit action with LAYERED_OP ip set ce, df, mf, firstfrag, nofrag fields",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit munge ip ce set 0xff munge ip df set 0xff munge ip mf set 0xff munge ip firstfrag set 0xff munge ip nofrag set 0xff",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": " 4: val 00008000 mask ffff7fff.* 4: val 00004000 mask ffffbfff.* 4: val 00002000 mask ffffdfff.* 4: val 00001f00 mask ffffe0ff.* 4: val 00003f00 mask ffffc0ff",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "6829",
+ "name": "Add pedit action with LAYERED_OP beyond ip set dport & sport",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit munge ip dport set 0x1234 munge ip sport set 0x5678",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": " 20: val 00001234 mask ffff0000.* 20: val 56780000 mask 0000ffff",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "afd8",
+ "name": "Add pedit action with LAYERED_OP beyond ip set icmp_type & icmp_code",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit munge ip icmp_type set 0xff munge ip icmp_code set 0xff",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": " 20: val ff000000 mask 00ffffff.* 20: val ff000000 mask 00ffffff",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "3143",
+ "name": "Add pedit action with LAYERED_OP beyond ip set dport (INVALID)",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit ex munge ip dport set 0x1234",
+ "expExitCode": "255",
+ "verifyCmd": "/bin/true",
+ "matchPattern": " ",
+ "matchCount": "0",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "fc1f",
+ "name": "Add pedit action with LAYERED_OP ip6 set src & dst",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit ex munge ip6 src set 2001:0db8:0:f101::1 munge ip6 dst set 2001:0db8:0:f101::1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": "ipv6\\+8: val 20010db8 mask 00000000.*ipv6\\+12: val 0000f101 mask 00000000.*ipv6\\+16: val 00000000 mask 00000000.*ipv6\\+20: val 00000001 mask 00000000.*ipv6\\+24: val 20010db8 mask 00000000.*ipv6\\+28: val 0000f101 mask 00000000.*ipv6\\+32: val 00000000 mask 00000000.*ipv6\\+36: val 00000001 mask 00000000",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "6d34",
+ "name": "Add pedit action with LAYERED_OP ip6 dst retain value (INVALID)",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit ex munge ip6 dst set 2001:0db8:0:f101::1 retain 0xff0000",
+ "expExitCode": "255",
+ "verifyCmd": "/bin/true",
+ "matchPattern": " ",
+ "matchCount": "0",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "6f5e",
+ "name": "Add pedit action with LAYERED_OP ip6 flow_lbl",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit ex munge ip6 flow_lbl set 0xfffff",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": "ipv6\\+0: val 0007ffff mask fff80000",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "6795",
+ "name": "Add pedit action with LAYERED_OP ip6 set payload_len, nexthdr, hoplimit",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit ex munge ip6 payload_len set 0xffff munge ip6 nexthdr set 0xff munge ip6 hoplimit set 0xff",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": "ipv6\\+4: val ffff0000 mask 0000ffff.*ipv6\\+4: val 0000ff00 mask ffff00ff.*ipv6\\+4: val 000000ff mask ffffff00",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "1442",
+ "name": "Add pedit action with LAYERED_OP tcp set dport & sport",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit ex munge tcp dport set 4789 munge tcp sport set 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": "tcp\\+0: val 000012b5 mask ffff0000.*tcp\\+0: val 00010000 mask 0000ffff",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "b7ac",
+ "name": "Add pedit action with LAYERED_OP tcp sport set (INVALID)",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit ex munge tcp sport set -200",
+ "expExitCode": "255",
+ "verifyCmd": "/bin/true",
+ "matchPattern": " ",
+ "matchCount": "0",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "cfcc",
+ "name": "Add pedit action with LAYERED_OP tcp flags set",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit ex munge tcp flags set 0x16",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": "tcp\\+12: val 00160000 mask ff00ffff",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "3bc4",
+ "name": "Add pedit action with LAYERED_OP tcp set dport, sport & flags fields",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit ex munge tcp dport set 4789 munge tcp sport set 1 munge tcp flags set 0x1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": "tcp\\+0: val 000012b5 mask ffff0000.*tcp\\+0: val 00010000 mask 0000ffff.*tcp\\+12: val 00010000 mask ff00ffff",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "f1c8",
+ "name": "Add pedit action with LAYERED_OP udp set dport & sport",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit ex munge udp dport set 4789 munge udp sport set 4789",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": "udp\\+0: val 000012b5 mask ffff0000.*udp\\+0: val 12b50000 mask 0000ffff",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "d784",
+ "name": "Add pedit action with mixed RAW/LAYERED_OP #1",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op",
+ "raw_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit ex munge eth src set 11:22:33:44:55:66 munge ip ttl set 0xff munge tcp flags clear munge offset 15 u8 add 40 retain 0xf0 munge udp dport add 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": "eth\\+4: val 00001122 mask ffff0000.*eth\\+8: val 33445566 mask 00000000.*ipv4\\+8: val ff000000 mask 00ffffff.*tcp\\+12: val 00000000 mask ff00ffff.* 12: add 00000020 mask ffffff0f.*udp\\+0: add 00000001 mask ffff0000",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
+ },
+ {
+ "id": "70ca",
+ "name": "Add pedit action with mixed RAW/LAYERED_OP #2",
+ "category": [
+ "actions",
+ "pedit",
+ "layered_op",
+ "raw_op"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action pedit",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action pedit ex munge eth src set 11:22:33:44:55:66 munge eth dst set ff:ee:dd:cc:bb:aa munge ip6 payload_len set 0xffff munge ip6 nexthdr set 0xff munge ip6 hoplimit preserve munge offset 0 u8 set 0x12 munge offset 1 u16 set 0x3456 munge offset 3 u8 set 0x78 munge ip ttl set 0xaa munge ip protocol set 0xff",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action pedit | grep 'key '",
+ "matchPattern": "eth\\+4: val 00001122 mask ffff0000.*eth\\+8: val 33445566 mask 00000000.*eth\\+0: val ffeeddcc mask 00000000.*eth\\+4: val bbaa0000 mask 0000ffff.*ipv6\\+4: val ffff0000 mask 0000ffff.*ipv6\\+4: val 0000ff00 mask ffff00ff.*ipv6\\+4: val 00000000 mask ffffffff.* 0: val 12000000 mask 00ffffff.* 0: val 00345600 mask ff0000ff.* 0: val 00000078 mask ffffff00.*ipv4\\+8: val aa000000 mask 00ffffff.*ipv4\\+8: val 00ff0000 mask ff00ffff",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action pedit"
+ ]
}
+
]
"$TC qdisc del dev $DEV2 ingress",
"/bin/rm $BATCH_FILE"
]
+ },
+ {
+ "id": "4cbd",
+ "name": "Try to add filter with duplicate key",
+ "category": [
+ "filter",
+ "flower"
+ ],
+ "setup": [
+ "$TC qdisc add dev $DEV2 ingress",
+ "$TC filter add dev $DEV2 protocol ip prio 1 parent ffff: flower dst_mac e4:11:22:11:4a:51 src_mac e4:11:22:11:4a:50 ip_proto tcp src_ip 1.1.1.1 dst_ip 2.2.2.2 action drop"
+ ],
+ "cmdUnderTest": "$TC filter add dev $DEV2 protocol ip prio 1 parent ffff: flower dst_mac e4:11:22:11:4a:51 src_mac e4:11:22:11:4a:50 ip_proto tcp src_ip 1.1.1.1 dst_ip 2.2.2.2 action drop",
+ "expExitCode": "2",
+ "verifyCmd": "$TC -s filter show dev $DEV2 ingress",
+ "matchPattern": "filter protocol ip pref 1 flower chain 0 handle",
+ "matchCount": "1",
+ "teardown": [
+ "$TC qdisc del dev $DEV2 ingress"
+ ]
}
]