Internet-Draft Per multicast flow Designated Forwarder July 2023
Sajassi, et al. Expires 11 January 2024
Workgroup: BESS WorkGroup
Internet-Draft: draft-ietf-bess-evpn-per-mcast-flow-df-election-09
Published: July 2023
Intended Status: Standards Track
Expires: 11 January 2024
Authors:
Ali Sajassi
Cisco Systems
Mankamana Mishra
Cisco Systems
Samir Thoria
Cisco Systems
Jorge Rabadan
Nokia
John Drake
Juniper Networks

Per multicast flow Designated Forwarder Election for EVPN

Abstract

[RFC7432] describes a mechanism to elect a Designated Forwarder (DF) at the granularity of (ESI, EVI), which is per VLAN (or per group of VLANs in the case of a VLAN bundle or VLAN-aware bundle service). However, per-VLAN granularity is not adequate for some applications. [RFC8584] improves the baseline DF election by introducing Highest Random Weight (HRW) DF election. [RFC9251] describes the applicability of EVPN to multicast flows, the routes used to synchronize them, and a default DF election. This document extends [RFC8584] and further enhances the HRW algorithm so that, for multicast flows, DF election is performed at the granularity of (ESI, VLAN, Mcast flow).

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 11 January 2024.

1. Introduction

EVPN-based All-Active multi-homing is becoming the basic building block for providing redundancy in next-generation data center deployments as well as in service provider access/aggregation networks. [RFC7432] defines the role of a Designated Forwarder (DF) as the node in the redundancy group that is responsible for forwarding Broadcast, Unknown unicast, and Multicast (BUM) traffic on that Ethernet Segment (CE device or network) in All-Active multi-homing.

The default DF election mechanism allows selecting a DF at the granularity of (ES, VLAN) or (ES, VLAN bundle) for BUM traffic. While [RFC8584] improves on the default DF election procedure, some service provider residential applications, where all multicast flows are delivered on a single VLAN, require a finer granularity.


                            (Multicast sources)
                                     |
                                     |
                                   +---+
                                   |CE4|
                                   +---+
                                     |
                                     |
                               +-----+-----+
                  +------------|   PE-1    |------------+
                  |            |           |            |
                  |            +-----------+            |
                  |                                     |
                  |                   EVPN              |
                  |                                     |
                  |                                     |
                  | (DF)                           (NDF)|
            +-----------+                        +-----------+
            |  |EVI-1|  |                        |  |EVI-1|  |
            |   PE-2    |------------------------|   PE-3    |
            +-----------+                        +-----------+
                   AC1  \                       / AC2
                         \                     /
                          \      ESI-1        /
                           \                 /
                            \               /
                            +---------------+
                            |    CE2        |
                            +---------------+
                                   |
                                   |
                          (Multiple receivers)


                Figure 1: Multi-homing Network of EVPN
                          for IPTV deployments

Consider the above topology, which shows a typical residential deployment scenario in which multiple receivers are behind an all-active multihomed segment. All of the multicast traffic is provisioned on EVI-1. Assume PE-2 is elected as the DF. According to [RFC7432], PE-2 is then responsible for forwarding all multicast traffic to that Ethernet segment, and PE-3 forwards none of it.

This document proposes an extension to the HRW procedure of [RFC8584] that allows DF election at the granularity of (ESI, VLAN, Mcast flow), so that multicast flows are better distributed among the PEs in the redundancy group and the load is shared.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

For EVPN, this document uses the terminology defined in [RFC7432]; for multicast, it uses the terminology defined in [RFC4601].

3. The DF Election Extended Community

[RFC8584] defines an extended community that the PEs in a redundancy group use to reach consensus on which DF election procedure to use. A PE can notify the other participating PEs in the redundancy group of its willingness to support per multicast flow DF election by signaling a DF election extended community along with the Ethernet Segment route (Type-4). This document extends the existing extended community defined in [RFC8584] by defining a new DF type.

4. HRW-based per multicast flow EVPN DF election

This document is an extension of [RFC8584] and therefore does not repeat the description of the HRW algorithm itself.

An EVPN PE discovers the redundancy group as described in [RFC7432]. If the redundancy group consists of N peering EVPN PE nodes, then after discovery every PE builds an unordered list of the IP addresses of all nodes in the redundancy group. The procedure defined in this document does not require the list of PEs to be ordered. Address(i) denotes the IP address of the ith EVPN PE in the redundancy group, where 0 < i <= N.

4.1. DF election for IGMP (S,G) membership request

The DF is the PE that has the maximum weight for (S, G, V, Es), where

  • S - Multicast Source
  • G - Multicast Group
  • V - VLAN ID
  • Es - Ethernet Segment Identifier

Address(i) is the address of the ith PE. The length of a PE's IP address does not matter, because only the low-order 31 bits are significant under the modulo operation.

  1. Weight

    • The weight of PE(i) for (S, G, V, Es) is calculated by the function Weight(S, G, V, Es, Address(i)), where 0 < i <= N and PE(i) is the PE at ordinal i.
    • Weight(S, G, V, Es, Address(i)) = (1103515245 * ((1103515245 * Address(i) + 12345) XOR D(S, G, V, Es)) + 12345) mod 2^31
    • In case of a tie, the PE whose IP address is numerically least is chosen.
  2. Digest

    • D(S, G, V, Es) = CRC_32(S, G, V, Es)
    • Here D(S, G, V, Es) is the 31-bit digest (the CRC_32 with the MSB discarded) of the Source IP, Group IP, VLAN ID, and Es. The CRC MUST be computed as if the architecture were in network byte order (big-endian).
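
The weight and digest computation above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not specify the CRC-32 polynomial or the exact octet layout fed to the CRC, so the sketch assumes the CRC-32 computed by zlib.crc32 and a concatenation of the packed source address, the packed group address, a 4-octet big-endian VLAN ID, and a 10-octet ESI. The names digest_sg, weight, and elect_df, and the sample addresses, are hypothetical.

```python
import ipaddress
import zlib

MULTIPLIER = 1103515245
INCREMENT = 12345
MODULUS = 1 << 31  # 2^31


def digest_sg(source: str, group: str, vlan: int, esi: bytes) -> int:
    """31-bit digest D(S, G, V, Es): CRC-32 with the MSB discarded.

    Assumption: the fields are concatenated in network byte order as
    S || G || 4-octet VLAN ID || 10-octet ESI.
    """
    stream = (
        ipaddress.ip_address(source).packed
        + ipaddress.ip_address(group).packed
        + vlan.to_bytes(4, "big")
        + esi
    )
    return zlib.crc32(stream) & (MODULUS - 1)


def weight(address: str, d: int) -> int:
    """Weight of the PE with the given IP address for digest d."""
    addr = int(ipaddress.ip_address(address))
    return (MULTIPLIER * ((MULTIPLIER * addr + INCREMENT) ^ d) + INCREMENT) % MODULUS


def elect_df(pe_addresses, d: int) -> str:
    """Return the PE with the maximum weight; ties go to the lowest address."""
    return max(
        pe_addresses,
        key=lambda a: (weight(a, d), -int(ipaddress.ip_address(a))),
    )


if __name__ == "__main__":
    pes = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
    d = digest_sg("198.51.100.10", "232.1.1.1", 100, bytes(10))
    print(elect_df(pes, d))
```

Because the election takes the maximum over the set of PEs, the result does not depend on the order in which the addresses are listed, which matches the statement that the list of PEs need not be ordered.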

4.2. DF election for IGMP (*,G) membership request

The DF is the PE that has the maximum weight for (G, V, Es), where

  • G - Multicast Group
  • V - VLAN ID
  • Es - Ethernet Segment Identifier

Address(i) is the address of the ith PE. The length of a PE's IP address does not matter, because only the low-order 31 bits are significant under the modulo operation.

  1. Weight

    • The weight of PE(i) for (G, V, Es) is calculated by the function Weight(G, V, Es, Address(i)), where 0 < i <= N and PE(i) is the PE at ordinal i.
    • Weight(G, V, Es, Address(i)) = (1103515245 * ((1103515245 * Address(i) + 12345) XOR D(G, V, Es)) + 12345) mod 2^31
    • In case of a tie, the PE whose IP address is numerically least is chosen.
  2. Digest

    • D(G, V, Es) = CRC_32(G, V, Es)
    • Here D(G, V, Es) is the 31-bit digest (the CRC_32 with the MSB discarded) of the Group IP, VLAN ID, and Es. The CRC MUST be computed as if the architecture were in network byte order (big-endian).
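
The (*,G) case differs from the (S,G) case only in that the source is left out of the digest. The sketch below elects a DF for several groups on one VLAN and Ethernet Segment, to illustrate how different groups may land on different PEs. It assumes the CRC-32 computed by zlib.crc32 and a G || 4-octet VLAN ID || 10-octet ESI layout, neither of which is specified in the text. All function names, addresses, and groups are hypothetical.

```python
import ipaddress
import zlib
from collections import Counter

MODULUS = 1 << 31  # 2^31


def digest_g(group, vlan, esi):
    """31-bit digest D(G, V, Es); the octet layout is an assumption."""
    stream = ipaddress.ip_address(group).packed + vlan.to_bytes(4, "big") + esi
    return zlib.crc32(stream) & (MODULUS - 1)


def weight(address, d):
    """Weight(G, V, Es, Address(i)) for a precomputed digest d."""
    addr = int(ipaddress.ip_address(address))
    return (1103515245 * ((1103515245 * addr + 12345) ^ d) + 12345) % MODULUS


def elect_df(pes, d):
    """Maximum weight wins; ties go to the numerically lowest address."""
    return max(pes, key=lambda a: (weight(a, d), -int(ipaddress.ip_address(a))))


def df_per_group(pes, groups, vlan, esi):
    """Map each (*,G) membership to its elected DF."""
    return {g: elect_df(pes, digest_g(g, vlan, esi)) for g in groups}


if __name__ == "__main__":
    pes = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
    groups = [f"239.1.1.{n}" for n in range(1, 21)]
    result = df_per_group(pes, groups, vlan=100, esi=bytes(10))
    # Number of groups for which each PE is the DF.
    print(Counter(result.values()))
```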

4.3. Default DF election procedure

The per multicast flow DF election procedure applies only once hosts behind an Attachment Circuit (AC) of the Es start sending IGMP membership requests. Membership requests are synchronized using the procedures defined in [RFC9251], so each PE in the redundancy group can run the per-flow DF election and create DF state per multicast flow. The HRW DF election "Type 1" procedure defined in [RFC8584] MUST be used for the Es DF election and SHOULD be performed on the Es even before any multicast membership request state is learned. The result of this default election MUST be used at the port level, but it is overridden by the per-flow DF election as new membership request state is learned.

5. Procedure to use per multicast flow DF election algorithm


                                     Multicast  Source
                                             |
                                             |
                                             |
                                             |
                                         +---------+
                          +--------------+  PE-4   +--------------+
                          |              |         |              |
                          |              +---------+              |
                          |                                       |
                          |              EVPN CORE                |
                          |                                       |
                          |                                       |
                          |                                       |
                      +---------+        +---------+         +---------+
                      |  PE-1   +--------+   PE-2  +---------+   PE-3  |
                      |  EVI-1  |        |  EVI-1  |         | EVI-1   |
                      +---------+        +---------+         +---------+
                           |__________________|___________________|
                         AC-1    ESI-1        | AC-2               AC-3
                                         +---------+
                                         |  CE-1   |
                                         |         |
                                         +---------+
                                              |
                                              |
                                              |
                                              |
                                      Multicast Receivers

                      Figure 2: Multihomed network

Figure 2 shows a multihomed network in which CE-1 is multihomed to EVPN PE-1, PE-2, and PE-3. Multiple multicast receivers are behind the all-active multihomed segment.

  1. PEs connected to the same Ethernet segment can automatically discover each other through the exchange of Ethernet Segment routes. This document does not change this procedure; it uses the procedure defined in [RFC7432].
  2. Each PE in the redundancy group advertises an Ethernet Segment route with an extended community indicating its ability to participate in the per multicast flow DF election procedure. Because per multicast flow DF election is not applicable until a PE learns about a membership request from a receiver, a default DF election among the PEs in the redundancy group is needed for BUM traffic. Until multicast membership state is learned, the DF election procedure in Section 4.3 is used, namely HRW per (V, Es) as defined in [RFC8584].
  3. When a receiver starts sending membership requests for (s1,g1), where s1 is the multicast source address and g1 is the multicast group address, CE-1 could hash the membership request (IGMP join) to any of the PEs in the redundancy group. Suppose it is hashed to PE-2. [RFC9251] defines a procedure to synchronize IGMP join state among the PEs in the redundancy group. Each PE then has information about the membership request (s1,g1), and each of them runs the DF election procedure in Section 4.1 to elect a DF among the participating PEs in the redundancy group. Suppose PE-2 is elected as the DF for multicast flow (s1,g1) and PE-1 is the default DF from step 2. Then:

    1. PE-1's forwarding state is nDF for flow (s1,g1) and DF for all other BUM traffic.
    2. PE-2's forwarding state is DF for flow (s1,g1) and nDF for all other BUM traffic.
    3. PE-3's forwarding state is nDF both for flow (s1,g1) and for all other BUM traffic.
  4. As each new multicast membership request arrives, the same procedure is repeated.
  5. If the DF election extended community (Section 3) signals DF type 4, a membership request for (S,G) MUST use the procedure in Section 4.1 to elect the DF among the participating PEs, and a membership request for (*,G) MUST use the procedure in Section 4.2.
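
The forwarding state described in step 3 can be sketched as a lookup that falls back to the default DF of Section 4.3 whenever no per-flow DF state exists. The sketch takes the election results as given rather than computing them: PE-2 as the DF for (s1,g1), as in the example, and PE-1 as the default DF, which is an assumption consistent with the states listed in step 3. The class and method names are hypothetical.

```python
class DfState:
    """DF state for one Ethernet Segment: a default DF plus per-flow overrides."""

    def __init__(self, default_df):
        self.default_df = default_df
        self.flow_df = {}  # (source, group) -> elected PE

    def learn_flow(self, flow, elected_pe):
        """Record the per-flow DF once membership state has been synchronized."""
        self.flow_df[flow] = elected_pe

    def role(self, pe, flow=None):
        """Return "DF" or "nDF" for the PE; flow=None means all other BUM traffic."""
        df = self.flow_df.get(flow, self.default_df)
        return "DF" if pe == df else "nDF"


state = DfState(default_df="PE-1")
state.learn_flow(("s1", "g1"), "PE-2")
for pe in ("PE-1", "PE-2", "PE-3"):
    # Columns: PE, role for flow (s1,g1), role for all other BUM traffic.
    print(pe, state.role(pe, ("s1", "g1")), state.role(pe))
```

A flow for which no membership state has been learned is handled by the default DF, which is how the sketch models the override behavior of Section 4.3.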

6. Triggers for DF re-election

There are multiple triggers that can cause DF re-election. Possible triggers include:

  1. The local ES going down due to a physical failure or a configuration change, which triggers DF re-election at the peering PEs.
  2. Detection of a new PE through an ES route.
  3. An AC going up or down.
  4. An ESI change.
  5. A remote PE being removed or going down.
  6. A local configuration change of the DF election type, with the peering PEs reaching consensus on the new DF type.

This document does not define any new mechanism to handle DF re-election; it uses the existing mechanism defined in [RFC7432]. Whenever any of these triggers occurs, a DF re-election is performed and all of the flows are redistributed among the PEs currently in the redundancy group for the ES.
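
As a hedged illustration of re-election after a remote PE is removed, the sketch below re-runs a highest-weight election for every flow over the remaining PEs. To stay self-contained it uses the weight formula of Section 4.1 with an arbitrary integer standing in for each flow's digest; the addresses, flow names, and digest values are hypothetical. The election is re-run for every flow, but in this sketch the result changes only for flows whose DF was the removed PE, because for every other flow the maximum weight among the remaining PEs is unchanged.

```python
import ipaddress

MODULUS = 1 << 31  # 2^31


def weight(address, d):
    """Weight of a PE address for a precomputed 31-bit flow digest d."""
    addr = int(ipaddress.ip_address(address))
    return (1103515245 * ((1103515245 * addr + 12345) ^ d) + 12345) % MODULUS


def elect_df(pes, d):
    """Maximum weight wins; ties go to the numerically lowest address."""
    return max(pes, key=lambda a: (weight(a, d), -int(ipaddress.ip_address(a))))


def reelect(pes, flow_digests):
    """Run the election for every flow; returns {flow: DF address}."""
    return {flow: elect_df(pes, d) for flow, d in flow_digests.items()}


pes = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
# Arbitrary stand-in digests, not real CRC values.
flows = {f"flow-{n}": (n * 2654435761) % MODULUS for n in range(1, 9)}

before = reelect(pes, flows)
after = reelect([p for p in pes if p != "192.0.2.3"], flows)
moved = [f for f in flows if before[f] != after[f]]
print("flows that changed DF:", moved)
```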

7. Security Considerations

The security considerations described in [RFC7432] apply to this document.

8. IANA Considerations

This document requests the allocation of a DF type in the DF election extended community for EVPN.

9. Acknowledgement

The authors would like to acknowledge the helpful comments and contributions of Luc Andre Burdet.

10. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC4601]
Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas, "Protocol Independent Multicast - Sparse Mode (PIM-SM): Protocol Specification (Revised)", RFC 4601, DOI 10.17487/RFC4601, <https://www.rfc-editor.org/info/rfc4601>.
[RFC7432]
Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, <https://www.rfc-editor.org/info/rfc7432>.
[RFC8584]
Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake, J., Nagaraj, K., and S. Sathappan, "Framework for Ethernet VPN Designated Forwarder Election Extensibility", RFC 8584, DOI 10.17487/RFC8584, <https://www.rfc-editor.org/info/rfc8584>.
[RFC9251]
Sajassi, A., Thoria, S., Mishra, M., Patel, K., Drake, J., and W. Lin, "Internet Group Management Protocol (IGMP) and Multicast Listener Discovery (MLD) Proxies for Ethernet VPN (EVPN)", RFC 9251, DOI 10.17487/RFC9251, <https://www.rfc-editor.org/info/rfc9251>.

Authors' Addresses

Ali Sajassi
Cisco Systems
821 Alder Drive
MILPITAS, CALIFORNIA 95035
United States

Mankamana Mishra
Cisco Systems
821 Alder Drive
MILPITAS, CALIFORNIA 95035
United States

Samir Thoria
Cisco Systems
821 Alder Drive
MILPITAS, CALIFORNIA 95035
United States

Jorge Rabadan
Nokia
777 E. Middlefield Road
Mountain View, CA 94043
United States

John Drake
Juniper Networks