Internet-Draft | Media Header Extension Wireless Networks | July 2023 |
Kaippallimalil, et al. | Expires 6 January 2024 | [Page] |
Wireless networks like 5G cellular or Wi-Fi experience significant variations in link capacity over short intervals due to wireless channel conditions, interference, or the end-user's movement. These variations in capacity take place in the order of hundreds of milliseconds and are much too fast for end-to-end congestion signaling by itself to convey the changes for an application to adapt. Media applications, on the other hand, demand both high throughput and low latency, and are able to dynamically adjust the size and quality of a stream to match available network bandwidth. However, catering to such media flows over a radio link where the capacity changes rapidly requires the buffers and QoS in general to be managed carefully. This draft proposes to provide metadata about the media transported in each packet to allow the wireless network to manage radio resources optimally and to maximize network utilization while also improving application performance.¶
This draft discusses at a high level potential solution options to this problem and the trade-offs involved. The draft then defines a solution that uses a new UDP option to carry media metadata between a UDP source and destination. This option is compact and has low processing overhead at the wireless router.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 6 January 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Wireless networks inherently experience large variations in link capacity due to a number of factors. These include changes in wireless channel conditions, interference between proximate cells and channels, and the end user's movement. These variations in link capacity take place in a short time, in the order of hundreds of milliseconds. End-to-end congestion control at the IP layer does not react fast enough to these changes when a combination of high throughput and low latency is required. Media packets, on the other hand, can demand both high throughput and low latency, and many emerging applications are expected to increase the strain on radio network capacity and utilization. The application is able to adapt, but when the feedback signal (i.e., via end-to-end congestion signaling or application level feedback) is of low resolution and frequency compared to the rapid (but transient) changes in the wireless network, the result is that the application settles to a longer term average sending rate that is well below the capacity available. One option is for the application to increase the sending rate to match the radio network capacity available in theory. If the application increases the sending rate aggressively, it can result in packet loss because the radio network keeps smaller buffers to ensure low latency for these flows. Low latency for the media flow and maximal usage of radio network capacity without affecting media application performance are not easy to realize in practice.¶
With the aim of providing low latency, maximizing radio network resource utilization and improving media application performance, 3GPP studied QoS and other enhancements in the wireless network in [TR.23.700-60-3GPP]. The findings of the study are now standardized in [TR.23.501-3GPP]. The recommendations include providing the wireless network with information on groups of media packets that should be handled similarly (e.g., all packets of a video I-frame), the importance of a media packet group relative to other such groups of packets (defined as Media Data Unit (MDU) in Section 2), as well as delay and error tolerance.¶
The specification in [TR.23.501-3GPP] relies on inspecting RTP headers and using that information for packet classification in the radio network. However, further specification is needed for handling of fully encrypted media streams (RTP over QUIC, media over QUIC, RTP cryptex) and end-to-end flow aspects (i.e., feedback and packet pacing). These and other related gaps are covered in detail in Appendix A. The rest of this document focuses on media header extensions in UDP for fully encrypted media packets. Appendix B discusses other solution options including DSCP marking and congestion control options.¶
Media packets that are fully encrypted and carry fragments of multiple media streams in a packet are not easy to classify since classification depends on the sets of media being encoded and the application's choices on packetization of the various streams. Examining or inferring based on patterns or other heuristics is expensive, unreliable and defeats the goal of minimizing sojourn time in the wireless network. The simplest way is to examine metadata inserted by the application as a basis for classification in the wireless network. This is also in line with the recommendations in [RFC8558] that discuss explicit signals to on-path network elements. Section 4 proposes a set of metadata that the wireless network can use to optimize media packet forwarding in the wireless network.¶
Media payload and metadata may be inserted by the application server in one of two ways. One option is for the application server (UDP source) to carry the metadata in an overlay path between application server and wireless node, and the inner packet carries the media payload. Alternatively, metadata is sent along with media payload by the application server, inspected on-path in the wireless network and terminated at the wireless client. The transport is designed to allow carrying metadata for a range of media transports including SRTP [RFC3711] and HTTP/3 media over QUIC. A new UDP option [I-D.ietf-tsvwg-udp-options] is proposed here to carry the metadata. The trade-offs in terms of lookup efficiency, protocol overhead, the constraints for transporting metadata across trusted networks and other related aspects are discussed in Section 3.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
The following terms are used in this document:¶
Section 1 has outlined the issue around changes in link capacity in a wireless network and the need for additional information to handle such flows in the wireless network. This section provides an end-to-end view of what the wireless network needs to optimize its resource handling and the actions of clients, servers and entities in the network to facilitate it.¶
Figure 1 outlines the scenario where a packet containing a media payload from a server (e.g., a media server or relay) is sent to a client (i.e., a wireless end point). Media metadata is carried along with the packet payload in UDP option MED. The wireless node inspects metadata but does not alter the UDP option. The client (UDP destination) can use timestamps for determining one way delay, received / dropped packets and other statistics that can be fed back to the server. The server in turn adjusts the sending rate and quality based on the feedback.¶
In Figure 1 the assumption is that the server and on-path wireless node that serves the client (wireless end point) are in two networks. These networks share a trust relationship that allows entities in these networks to exchange media metadata. The exposure of the media metadata is limited to authorized entities within the two networks. A trusted domain (e.g., as outlined in [RFC8799]) is associated with the wireless and application networks; using a public key and trust anchors within each network, it can authorize, enroll, and manage nodes with specific policies and roles (i.e., server, wireless node, gateways) so that media metadata is handled in a secure manner. When the application (server, UDP source) and wireless network are not directly connected, a secure overlay network with encryption MUST be used between the two domains.¶
It is assumed here that the server and client in Figure 1 have completed signaling to set up the media session (e.g., using SDP, HTTP) prior to sending media packets. The UDP source (e.g., an Application Server) is responsible for inserting relevant metadata based on the media content of the packet and using the metadata format specified in Section 4. The metadata in the UDP option is inspected by the wireless node (e.g., a 3GPP UPF), which classifies packets using this metadata along with other network policies. The metadata and its transport are designed to be efficient in processing and byte overhead per packet. The metadata is expected to work with any UDP media transport including RTP, SRTP and HTTP/3. Metadata parameters are encoded in binary format for compact representation. Details are in Section 4.¶
The UDP option and metadata defined in this specification must only be exchanged between entities that are trusted. The server (UDP source) and the wireless end point (UDP destination) trust the other to send/receive media. The server (UDP source) and Wireless Node (access router in wireless network) are configured with data that allow establishment of trust between the entities and the network(s) in between prior to the exchange of metadata using the UDP option defined in this specification. When there are insecure network segments in between, all packets that carry the metadata in the MED UDP option must be secured with encryption between these segments (e.g., secure GRE/VXLAN or MASQUE tunnel). Section 6 describes a few common deployments.¶
The application server (server in Figure 1) is responsible for inserting the metadata in the UDP option. The application server determines the importance and other metadata parameters based on the type of media encoded as well as other information (e.g., configured information on destination wireless network, live feedback from the session). The application encrypts the payload (i.e., media content) in the UDP packet and adds the MED UDP option to be used in the wireless network (end point and wireless router). Entities on-path do not process the UDP option, but security gateways or other network entities at the boundary of a trust domain may remove the option if there is an untrusted network segment on-path. The wireless node receives the UDP packet, inspects the metadata in the UDP option and applies local policies to the metadata to derive optimal scheduling and forwarding on the wireless path. The wireless node does not examine the content of the packet, which may use various encrypted application transports like SRTP cryptex or HTTP/3 and may have a variable number of media streams.¶
Media packets are encoded and formatted to enable efficient and reliable processing of the data at both the encoding and decoding endpoints. Media may consist of audio, live video, static pictures and overlaid objects among others. Each of these may have different tolerance to delays in the network, resiliency (i.e., the ability to recover from loss) or even subjective importance (e.g., the loss of packets of a video base layer I-frame is more significant than the loss of packets of an enhanced layer P-frame). Media encoding is evolving continually and modern codecs use complex prediction structures and make various dynamic decisions in the encoding process. However, it is expected that there are differences in priority, delay and acceptable loss across sets of packets.¶
A media application that uses this specification provides a set of metadata about the media packet that an end point or authorized wireless network can inspect, either to provide feedback to the server or to optimize handling during adverse radio conditions. Metadata for media packets is carried in a new UDP option discussed further in Section 5.¶
Metadata defined in Section 4.2 is broad enough to be applied regardless of whether the application uses RTP, HTTP or another application transport protocol.¶
The media application (server, UDP source) is responsible for and retains control over the metadata that is inserted at the UDP source. Feedback from the end point on packets received, latency and jitter may be used by the application to determine the sending rate, quality and other statistics of the data received at the UDP destination (e.g., via RTCP receiver report). The application may use heuristics or other algorithms on the feedback, explicit network congestion information, encoding characteristics of the media or other aspects of the data to obtain the desired handling in the wireless network. Details of the mechanisms an application uses are not in the scope of this document. The feedback provided allows the application server (or UDP sender) to remain in control and determine if there is any potential malicious or incoherent handling of media packets. In such cases, the application server (or UDP sender) can revert to marking all packets with the same level of importance.¶
The media application only inserts metadata if the destination (wireless end point) is a device in a trusted wireless network. For example, the destination may be identified by a range of IP addresses that belong to the trusted wireless network. The wireless network verifies that a packet with MED UDP option metadata has originated from a trusted server. The wireless network that inspects metadata may defer or drop packets to optimize the use of radio resources.¶
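The destination check described above can be sketched as a simple prefix match. This is only an illustration under stated assumptions: the prefix list, the name `TRUSTED_WIRELESS_PREFIXES` and the function `may_insert_med` are hypothetical and not defined by this document, and the addresses are documentation prefixes used as placeholders.

```python
import ipaddress

# Hypothetical configuration: prefixes of wireless networks that the
# application network trusts (documentation prefixes used as placeholders).
TRUSTED_WIRELESS_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8:100::/48"),
]


def may_insert_med(destination: str) -> bool:
    """Return True if the UDP destination falls inside a trusted prefix."""
    addr = ipaddress.ip_address(destination)
    # A membership test between an IPv4 address and an IPv6 network
    # (or the reverse) simply evaluates to False.
    return any(addr in net for net in TRUSTED_WIRELESS_PREFIXES)
```

A UDP source following this sketch would add the MED option only when `may_insert_med()` returns True for the destination address. How the prefix list is provisioned is a deployment matter.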
The on-path wireless network entity that inspects metadata does not rely on packets arriving in order. The metadata itself should provide sufficient information and the network entity should factor in these assumptions when calculating jitter and burst length using the metadata in each packet. For example jitter may be calculated as a moving average across multiple packets and burst length should compensate for potential out-of-order packet arrivals especially towards the tail end of a burst.¶
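One possible way to compute jitter as a moving average is sketched below. It borrows the inter-arrival jitter estimator of RFC 3550 (gain of 1/16), which is an assumption made for illustration. This document does not mandate a particular estimator. The class name `JitterEstimator` is hypothetical. Because each update compares only the current packet with the previously received one, the estimate does not depend on packets arriving in order.

```python
class JitterEstimator:
    """Exponentially weighted moving average of inter-arrival jitter."""

    def __init__(self, gain: float = 1 / 16):
        self.gain = gain    # smoothing factor (assumed, as in RFC 3550)
        self.jitter = 0.0   # current estimate, in seconds
        self._prev = None   # (send_time, arrival_time) of the last packet

    def update(self, send_time: float, arrival_time: float) -> float:
        """Feed one packet; times are in seconds. Returns the new estimate."""
        if self._prev is not None:
            prev_send, prev_arrival = self._prev
            # Difference in transit time between this packet and the last one.
            d = (arrival_time - prev_arrival) - (send_time - prev_send)
            self.jitter += self.gain * (abs(d) - self.jitter)
        self._prev = (send_time, arrival_time)
        return self.jitter
```

Here `send_time` would come from the Timestamp field of the MED option and `arrival_time` from the local clock of the wireless node.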
Metadata is transported in a new UDP option, MED, defined in Section 5. The metadata in MED UDP option is carried in each packet that the application server (or UDP source) inserts. Thus, the wireless entity keeps some state information to use the metadata. For example, a sequence counter is used to track the set of packets that belong to a media data unit (MDU), and a series of timestamps may be used to derive jitter. The different metadata parameters are described below in Section 4.2.¶
This specification describes one set of metadata described as a profile. A Profile field makes this specification extendable to future specifications that describe a new metadata profile.¶
The media application provides a set of metadata about the content of the packet and the wireless network inspects the metadata and uses it to optimize handling during adverse radio conditions. Information that is useful to wireless networks includes the importance of a packet (or a group of packets), the number of packets in a burst, timestamps and acceptable end-to-end latency of the packet. Importance of a packet (or group of packets) is useful to provide some flexibility to the radio scheduler to prioritize packets that are essential during low capacity intervals and to defer packets that can tolerate some additional delay, or even drop the packet. For example, if some set of packets carries a video image that was stored in advance, it may be able to tolerate some additional delay compared to a real-time video encoding carried in another stream. Only the media application is able to provide such information since even inspecting a clear media header (e.g., RTP packet carrying an I-frame fragment) does not provide the on-path network entity with sufficient information as to whether that represents live media, the length of a data burst or the actual delay budget within which the packet is useful for decoding.¶
The parameters below identify a minimum set that an on-path network entity can use for optimizing the use of wireless network resources.¶
This parameter allows for more metadata profiles to be carried by the MED UDP option. This specification only defines one profile.¶
 0
 0 1 2 3 4
+-+-+-+-+-+
| Profile |
+-+-+-+-+-+

Value   Meaning
-----------------------------------------------------
0       RESERVED
1       Basic - defined in this specification
2-31    Unassigned (assignable by IANA)¶
Specifications may define a new metadata format in future using one of the unassigned values.¶
Timestamp contains the wallclock time (absolute date and time) of transmission of the packet and is represented in a compact format where the first 16 bits represent seconds relative to 0h UTC on 1 January 1900, and the second 16 bits represent the fractional part of a second.¶
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+¶
For a pair of packets with sequential Packet counter values, the timestamps S1 and S2 define the interval (S2 - S1) between their transmissions. The transmission time contained in the field may be used for network jitter calculations.¶
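The compact timestamp and the interval (S2 - S1) can be illustrated as follows. This is a sketch under an explicit assumption: since only 16 bits are available for seconds, the code carries the low-order 16 bits of the seconds count since 1 January 1900, so the field wraps roughly every 18 hours. The function names are hypothetical.

```python
NTP_UNIX_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def encode_timestamp(unix_time: float) -> int:
    """Encode a Unix time as 16 bits of seconds plus 16 bits of fraction."""
    ntp_time = unix_time + NTP_UNIX_OFFSET
    seconds = int(ntp_time) & 0xFFFF  # assumed: low-order 16 bits only
    fraction = int((ntp_time % 1) * 65536) & 0xFFFF
    return (seconds << 16) | fraction


def interval_seconds(s1: int, s2: int) -> float:
    """Return (S2 - S1) in seconds, tolerating one wrap of the 32-bit field."""
    return ((s2 - s1) & 0xFFFFFFFF) / 65536
```

The modular subtraction in `interval_seconds()` keeps the interval correct when the seconds part wraps between the two packets, provided the true interval is shorter than one wrap period.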
The Media Data Unit (MDU) sequence is a cyclical counter that has the same value for a set of packets identified by an application to be treated as a unit (i.e., an MDU), and is incremented for the next MDU.¶
 0
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
| MDU Sequence  |
+-+-+-+-+-+-+-+-+¶
The wireless network uses this field to provide consistent treatment to the set of packets that belong to the same MDU. In some cases, based on the priority and tolerance to delay and loss, the wireless network may delay or drop the sequence of packets that has the same MDU sequence value. An MDU sequence of 8 bits means that there can be up to 256 (2^8) concurrent MDU sequences for a UDP source/destination pair that a wireless network can distinguish.¶
The MDU sequence value is not itself associated with any set of media properties. These media properties are defined by the Importance, Data Burst and Delay parameters in the sections that follow.¶
This parameter provides a counter starting at "0" that is incremented for each subsequent packet belonging to a Media Data Unit (MDU).¶
 0                   1                   2
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Packet Counter                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+¶
The delay between subsequent packets of an MDU may be averaged or otherwise used to extrapolate jitter in the arrival stream at the wireless node.¶
Importance represents the media characteristics of the set of packets that form a media data unit (MDU) relative to the characteristics of another MDU. The characteristics represented in importance are the priority level and the ability to tolerate delay and transmission errors.¶
 0
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
| L |  D  |  P  |
+-+-+-+-+-+-+-+-+

Value   Meaning
-----------------------------------------------------
L       Delay Tolerance
          00   limited value if delayed
          01   should be forwarded even if delayed
D       Inter-MDU Dependency
          000  No dependency information provided
          001  Independent
          010  Base MDU
          011  Enhanced MDU (dependent on previous base MDU)
P       Priority level
          001  high priority
          010  medium priority
          100  low priority¶
The application determines the priority of a packet in terms of how critical the loss of packets of an MDU is for a destination/decoding end. Some media frames may be extremely important but not as sensitive to delay, others may be important and should be delivered even past a delay deadline. There are various other factors such as packets with medium or lower priority and varying tolerance for delay that need to be considered.¶
The dependency flags indicate whether the packet is independent or dependent on packets of other MDUs. TBD - specification/behavior of the different values of priority.¶
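The Importance octet can be packed and unpacked with plain bit operations. The sketch below assumes the widths implied by the value table (L: 2 bits, D: 3 bits, P: 3 bits) and that L occupies the most significant bits, as the field layout suggests. The function names are hypothetical.

```python
def pack_importance(delay_tolerance: int, dependency: int, priority: int) -> int:
    """Pack L (2 bits), D (3 bits) and P (3 bits) into one octet."""
    if not 0 <= delay_tolerance <= 0b11:
        raise ValueError("L must fit in 2 bits")
    if not 0 <= dependency <= 0b111 or not 0 <= priority <= 0b111:
        raise ValueError("D and P must each fit in 3 bits")
    return (delay_tolerance << 6) | (dependency << 3) | priority


def unpack_importance(octet: int) -> tuple:
    """Split an Importance octet into (L, D, P)."""
    return (octet >> 6) & 0b11, (octet >> 3) & 0b111, octet & 0b111
```

For example, an MDU that should be forwarded even if delayed (L=01), is a base MDU (D=010) and has high priority (P=001) is encoded as the octet 0x51.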
The data burst field represents the number of bytes of data in a continuous burst of packets. This may be the result of a large amount of media encoded at a particular time. In many cases, the distribution of packets tends to be heavy tailed. If the burst size is available to the wireless network at the beginning of the burst, the network can plan radio resources in advance. In RTP streams, a burst may for example represent the number of bytes to send in a video I-frame. However, in more complex encodings where the media in a packet belongs to multiple streams (e.g., AR/VR), the application should determine the length of a burst of data.¶
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Data Burst                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+¶
If the value is set to "zero", it indicates that the application does not provide the size of the data burst. All other values indicate the actual size of the data burst in bytes, up to a maximum of 2^32 - 1 bytes. The wireless node keeps track of the number of bytes in each packet payload to determine the total number of bytes in a burst.¶
The delay budget represents an upper bound in milliseconds between the reception of the first packet of the MDU and the reception of the last packet of the MDU.¶
 0
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|     Delay     |
+-+-+-+-+-+-+-+-+¶
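The byte accounting described above might look like the following sketch. The class `BurstTracker`, its method name and the use of a (flow, MDU sequence) key are illustrative assumptions. A real wireless node would also expire state, since the 8-bit MDU sequence value is reused after it wraps.

```python
from collections import defaultdict


class BurstTracker:
    """Tracks payload bytes seen per MDU against the declared Data Burst."""

    def __init__(self):
        # Keyed by (flow 5-tuple, MDU sequence); values are bytes received.
        self._received = defaultdict(int)

    def on_packet(self, flow, mdu_seq: int, payload_len: int, data_burst: int):
        """Return bytes still expected, or None if no burst size was given."""
        key = (flow, mdu_seq)
        self._received[key] += payload_len
        if data_burst == 0:  # zero means the size was not provided
            return None
        return max(data_burst - self._received[key], 0)
```

The remaining byte count gives the scheduler an estimate of how much of the burst is still to arrive, independent of the order in which packets are received.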
The delay budget along with data burst and importance (priority) is used to convey to the wireless network in advance the duration of time over which the burst of packets is sent. This can allow the wireless scheduler to plan for the appropriate level of resources.¶
Metadata in this specification consists of the set of parameters in Section 4.2 and always uses a Profile value of "1".¶
The application server (UDP source) inserts the metadata into each packet. The application server should only prepare metadata in UDP MED option if the UDP destination belongs to a wireless network that has a trust relationship with the application network. Importance, data burst and delay budget parameters are the same for all packets of an MDU (identified by an MDU sequence value for the UDP source/destination). The timestamp indicates the sending time of each packet while the packet counter is incremented for each packet in an MDU.¶
The wireless node that receives metadata in the UDP MED option should verify that it originated from an application network with which it has a trust relationship. The metadata is used to prioritize, defer or drop packets of an MDU when radio resources are limited.¶
Transport of metadata between the application and wireless network may be based on one of several protocol options but it would be preferable to have one mechanism (or limited number) so that wireless network entities do not have to support a large number of options. Some considerations include the ease with which an application can encode the metadata in a transport header, compactness and efficiency for lookup in the wireless network as this is applied per packet, and the security of the metadata itself (not unique to wireless networks). In this specification, the media metadata is transported in UDP options. UDP transport of metadata is efficient and applicable to not only HTTP/3 media but also RTP/SRTP for any further extensions related to wireless networks.¶
A new UDP option, MED, that conforms to [I-D.ietf-tsvwg-udp-options] is defined to carry media metadata. Figure 2 shows the parameters in the MED UDP option. The Kind value for this option is (TBD - IANA assigned). The MED option is a SAFE option as it does not alter the UDP data payload in any manner and should therefore be assigned a value in the 0..191 range as defined in [I-D.ietf-tsvwg-udp-options]. The length of this option can be variable since another specification can define a new media "Profile" of a different length.¶
The MED UDP option in this specification has a size of 17 bytes. In this specification, the Profile field MUST be set to "1". Following the length field, 3 bits are left reserved (RES) for future use. The MDU sequence identifies the set of packets that form a media data unit within a flow identified by the UDP/IP 5-tuple. The MDU sequence value should be the same for all packets that form a media data unit (MDU). Other UDP/IP flows (e.g., from the same server to another client) that have the same value of MDU sequence represent a different MDU set. The Importance of a packet includes its priority relative to other MDUs of the same UDP/IP flow (5-tuple). The Timestamp value in this option represents the transmission time of the packet and along with Packet counter may be used to derive latency and jitter information. For a media flow/sequence identified by IP 5-tuple, the MDU sequence is incremented for every subsequent MDU. The Packet counter represents a sequence of packets of an MDU and may be used along with timestamps to derive jitter. The wireless node does not attempt to sequence packets arriving out of order using the Packet counter. The Data burst, when provided, indicates the number of bytes of the MDU and this value remains the same for all packets of the MDU. The Delay field conveys the upper bound in milliseconds between the reception of the first packet of the MDU and the reception of the last packet of the MDU. All packets of an MDU have the same value of Delay.¶
The UDP source (application server) MUST NOT add the UDP MED option if the UDP destination (wireless client) does not belong to a wireless network that has a trust relationship with the application network. The wireless network MUST NOT use metadata in the UDP MED option if the UDP source (application server) does not belong to an application network that has a trust relationship with it. The wireless network MUST NOT remove the UDP MED option when forwarding the packet to the wireless node.¶
A security gateway at the boundary of an application network or wireless network that share a trust relationship should inspect the UDP MED option to ensure that the origin/destination network complies with the policies of the domain.¶
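To make the 17-byte size concrete, the sketch below serializes the fields with Python's `struct` module. Several details are assumptions made only for illustration: the Kind value is a placeholder because the real value is to be assigned by IANA, the field order follows the order in which the parameters are described, and Profile is placed in the high 5 bits of its octet with RES in the low 3 bits. The function names are hypothetical.

```python
import struct

MED_KIND = 127      # placeholder only; the real Kind is TBD (IANA assigned)
MED_LENGTH = 17
PROFILE_BASIC = 1
# Kind, Length, Profile/RES, MDU sequence, Importance,
# Timestamp, Packet counter (3 bytes), Data burst, Delay
_FMT = ">BBBBBI3sIB"


def pack_med(mdu_seq, importance, timestamp, packet_counter, data_burst, delay_ms):
    """Serialize one MED option (assumed layout) into 17 bytes."""
    profile_res = PROFILE_BASIC << 3  # assumed: Profile high 5 bits, RES = 0
    return struct.pack(
        _FMT, MED_KIND, MED_LENGTH, profile_res, mdu_seq & 0xFF, importance,
        timestamp, packet_counter.to_bytes(3, "big"), data_burst, delay_ms,
    )


def unpack_med(option: bytes) -> dict:
    """Parse a 17-byte MED option produced by pack_med()."""
    (_kind, length, profile_res, mdu_seq, importance,
     timestamp, counter, data_burst, delay_ms) = struct.unpack(_FMT, option)
    if length != MED_LENGTH or profile_res >> 3 != PROFILE_BASIC:
        raise ValueError("unsupported MED length or profile")
    return {
        "mdu_seq": mdu_seq,
        "importance": importance,
        "timestamp": timestamp,
        "packet_counter": int.from_bytes(counter, "big"),
        "data_burst": data_burst,
        "delay_ms": delay_ms,
    }
```

The fixed-size format means a wireless node can read every field at a constant offset, which matches the goal of low per-packet processing overhead.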
This section provides a few examples of common deployments and the use of the MED UDP option to carry media metadata.¶
In this deployment scenario, the UDP source (i.e., App Server) and the wireless network entity (i.e., Wireless Node) are within the same Data Center and within a secure network.¶
The UDP MED option is inserted by the Application Server and forwarded. The network in between is within the boundaries of the trust domain. The Wireless Node processes the metadata in the MED UDP option and forwards the packet to the client (wireless end point). The MED UDP option is used to calculate packet statistics, one way delay and jitter. The packet statistics and other information are sent to the application server, which tunes the delivery of media.¶
In this deployment scenario, the UDP sender (i.e., App Server) and the wireless network entity (i.e., Wireless Node) have a trust relationship between them and security gateways are used to encrypt all traffic traversing an insecure network segment in between.¶
As in Section 6.1, the UDP MED option is inserted by the Application Server and forwarded. The security gateways encrypt the packet across the insecure network segment. The Wireless Node processes the metadata in the MED UDP option and forwards the packet to the client (wireless end point). The MED UDP option is used to calculate packet statistics, one way delay and jitter. The packet statistics and other information are sent to the application server, which tunes the delivery of media.¶
Thanks to Tiru Reddy for extensive discussions on security, metadata and UDP options formats in this draft. Thanks to Dan Wing for input on security and reliability of messages for this draft. Xavier De Foy and the authors of this draft have discussed the similarities and differences of this draft with the MoQ draft for carrying media metadata.¶
IANA is requested to assign a new Kind value from the UDP options registry defined by [I-D.ietf-tsvwg-udp-options].¶
Kind    Length    Meaning
-----------------------------------------------------
TBA1    17        Media Metadata (MED)¶
Metadata in the UDP option MED must only be exchanged between entities that have a trust relationship that permits sending/receiving this UDP option.¶
Metadata in the MED UDP option MUST NOT be sent to a wireless network that does not have a trust relationship with the application network (UDP source). A wireless network that receives a MED UDP option MUST verify that the origin of the metadata is from a trusted network. After processing the MED option, the wireless network node MUST delete the option before forwarding the packet.¶
If the application network that sends the media packet with MED UDP option and the wireless network that receives the UDP packet/MED option are separated by an untrusted network, the traffic must be encrypted across the untrusted network segment. Security gateways at the boundary of the origin/destination networks SHOULD inspect the MED UDP option to verify that the origin and destination of the packet carrying it are within the two trusted networks.¶
Section 1 outlined the issues around providing high throughput and low latency when link capacity fluctuates in very short periods of time as is the case in a wireless network. Some deployment examples are also shown in Section 6. This applies not only in wireless downstream, but also for upstream. [TR.22.847-3GPP], section 5.8 describes an Industry 4.0 use case which includes support for various aspects to optimize production some of which use enhanced media. Examples include monitor camera capture of robot movement, observation using VR glasses and related control signaling. Use cases include wireless upstream and downstream video, haptics and other media processing that require low latency. From an IP transport/protocol viewpoint, these examples additionally illustrate the need for a wireless end-point (UDP source) to provide classification information.¶
End-to-end congestion control reacts in the order of round trip times (RTT) while wireless capacity variations take place in the order of hundreds of milliseconds. When a wireless network provides low latency handling for flows while maximizing the use of all available bandwidth, it results in either packet drops or delays. The application is not able to adapt quickly enough when maximizing bandwidth use and packets may be dropped to keep the queues short/latency low.¶
Packets dropped due to short-term (order of milliseconds) capacity fluctuation, and the resulting feedback to the server (e.g., via RTCP), can cause the server (UDP source) to reduce the flow rate. Over time it could result in the application ramping the sending rate up and down, reducing the encoding quality of sent packets, or settling for a lower flow rate. None of the above results in higher quality media delivery. In 3GPP studies (see [TR.23.700-60-3GPP]) and most recent standards updates for QoS in [TR.23.501-3GPP], the approach considered is to identify which media frames are more or less important and, if absolutely necessary, to drop media frames as a whole rather than random packets. However, when fully encrypted packets such as with QUIC or RTP-cryptex [RFC9335] are sent, it is not practical to inspect the media headers and classify packets into sets of frames with priority/importance levels.¶
When addressing these gaps, solutions should also consider the evolution of media encoding, feedback for packet pacing, multipath, performance and security aspects.¶
This section outlines some possible solution approaches to handling the media frames/media data units (MDU) identified in Section 1. The aim is to provide low latency, maximize radio network resource utilization, and improve media application performance. The approaches considered here include providing metadata for each MDU, assigning different DSCP values within a single media flow, and new congestion control handling. Each has different trade-offs to consider, but these options are not mutually exclusive.¶
In this case the wireless router inspects metadata inserted by the media application and uses it for classification in the wireless network. Since media headers are encrypted, the application would provide this information in a header that is visible to the wireless router. One option is to send the metadata in a new UDP option and this is described in the main body of this document (see Section 5 for UDP transport details). Another option is to transport the metadata in a MASQUE tunnel between the media/application server and the wireless router.¶
Both approaches (new UDP option, MASQUE tunnel) can use the metadata defined in Section 4.2; the main difference is how the metadata is transported between the server, the wireless router, and the end-point. The UDP option in this draft requires that the wireless and application networks have a degree of trust in order to exchange the metadata parameters. If the application and wireless networks are not directly connected, a secure overlay network with encryption is necessary between the two domains. The packet that arrives at the wireless router contains metadata in its original form (i.e., all packets are decrypted after exiting the overlay network). As a result, the demands on the wireless router for processing the metadata in each packet are minimal.¶
The transport of metadata over MASQUE is similar; however, it is encrypted end-to-end and terminated at the wireless router. The end-to-end encryption provides inherently safe transmission of metadata, but the wireless router has to decrypt the metadata in the MASQUE tunnel to process it. This can have a significant impact on the performance/delay of classifying each packet. The MASQUE approach also requires the wireless router to set up the tunnel at the beginning of the media session, which is additional configuration overhead (i.e., determining which upstream flows should trigger the initiation of the MASQUE tunnel).¶
The wireless router uses the metadata to classify packets into MDUs of different priorities and delay tolerances, and the wireless network uses this classification to optimize handling. However, the feedback from the client/end-point back to the server (e.g., via RTCP) can skew the behavior/sending rate of the server. For example, if a wireless network drops an entire media frame due to a transient lack of bandwidth and this is reported back to the server, the server should not interpret the report as extreme congestion and reduce its sending rate. Such a reduction is probably not the desired way to manage transient changes in bandwidth.¶
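To give a sense of why per-packet processing of cleartext metadata can be light, the following sketch encodes and decodes a small fixed-size Kind/Length/Value record. It is an illustration under stated assumptions only: the kind value, the field names (`importance`, `mdu_seq`, `delay_budget_ms`), and the byte layout are placeholders invented for this example and are not the option format or metadata defined in this draft.

```python
import struct

# Placeholder values for this sketch only: not an assigned option kind and
# not the field layout defined in this draft.
EXAMPLE_KIND = 127
_LAYOUT = "!BBBHH"  # kind, length, importance, mdu_seq, delay_budget_ms
_LENGTH = struct.calcsize(_LAYOUT)  # 7 bytes; "!" means no padding


def encode_option(importance: int, mdu_seq: int, delay_budget_ms: int) -> bytes:
    """Pack the hypothetical metadata fields into one TLV record."""
    return struct.pack(
        _LAYOUT, EXAMPLE_KIND, _LENGTH, importance, mdu_seq, delay_budget_ms
    )


def decode_option(data: bytes):
    """Unpack the record; one bounds check and one fixed-size unpack."""
    if len(data) < _LENGTH:
        raise ValueError("option truncated")
    kind, length, importance, mdu_seq, delay_budget_ms = struct.unpack_from(
        _LAYOUT, data
    )
    if kind != EXAMPLE_KIND or length != _LENGTH:
        raise ValueError("not the example metadata option")
    return importance, mdu_seq, delay_budget_ms
```

In this sketch the router's work per packet is a length check and a single fixed-size unpack, with no decryption step, which is the property the paragraph above relies on.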
DSCP [RFC2474] could have offered an excellent solution if it were possible to assign a separate code to each category of media frame (audio frames, different sets of video frames, etc.). However, DSCP codes are relatively limited and, additionally, cannot convey a delay budget or related constraint that is valuable for a wireless scheduler. [RFC8837] has recommendations for using two DSCP values for WebRTC flows; however, they apply to flows (media flow, data flow) and not at the granularity of media frames/MDU. Extending DSCP to the granularity of media frames (assuming enough codes are available) has different implications that need to be examined. It would also have to be assumed in this case that DSCP values are not overwritten (or re-classified) between the application and wireless networks, which may not always be a safe assumption.¶
Even if DSCP does not provide the level of detail that metadata provides, it may be able to complement the overall solution for handling media along the lines indicated in [RFC8837].¶
One option would be to deploy media relays/proxies close to the wireless network (for example, in an edge data center). The media relays/proxies would then use specific congestion control mechanisms that are developed for the wireless network in that network segment.¶
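The size limit on DSCP can be made concrete with a short sketch. The DSCP occupies six bits of the DS field [RFC2474], so at most 64 code points exist and the field has no room for an additional value such as a delay budget. The mapping from an importance level to an AF4x code point below is hypothetical and is shown only to illustrate what per-MDU marking might look like; it is not a recommendation from [RFC8837] or from this draft.

```python
DSCP_BITS = 6
MAX_DSCP_CODEPOINTS = 1 << DSCP_BITS  # 64 possible values in total

# Standard code point values: EF is 46; AFxy is 8*x + 2*y.
EF = 46
AF41, AF42, AF43 = 34, 36, 38


def dscp_to_tos(dscp: int) -> int:
    """Place a DSCP in the upper six bits of the DS/Traffic Class byte."""
    if not 0 <= dscp < MAX_DSCP_CODEPOINTS:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2  # the low two bits of the byte are left as zero


# Hypothetical per-MDU mapping: importance level -> code point.
IMPORTANCE_TO_DSCP = {3: AF41, 2: AF42, 1: AF43}


def tos_for_importance(importance: int) -> int:
    """Return the DS byte for a hypothetical MDU importance level."""
    return dscp_to_tos(IMPORTANCE_TO_DSCP[importance])
```

Even in this small mapping, the marking carries only a relative class; a scheduler would learn nothing about how long a given MDU can wait.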
A congestion control solution between an application proxy and wireless end-host would still operate at different timescales. The metadata/DSCP information is used to optimize radio resource usage in very short timeframes (10-100 ms) while E2E congestion control can operate to stabilize over a longer timeframe. This option also implies the provisioning and deployment of proxies on-path which may add to the cost. In any case, this would be complementary to the metadata/DSCP based approach at the transport layer.¶
Some other solutions can potentially be considered but have significant disadvantages:¶