Internet-Draft | transport challenges | September 2023 |
Huang, et al. | Expires 15 March 2024 |
This document discusses the challenges of improving transmission quality when too little information is exchanged between the network and applications, and then provides some basic requirements that new synergy mechanisms should meet.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 15 March 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Internet transport protocols are currently evolving rapidly. On the one hand, concerns about user privacy and security are driving transport protocols toward built-in encryption. On the other hand, TCP ossification, caused by excessive intervention of intermediate devices, continues to frustrate the industry, which has made end-to-end (e2e) built-in encryption the most popular design choice for new transport protocols. However, the network and the transport layer are neither independent nor unrelated; they rely closely on each other, so synergy mechanisms between them are needed for transmission to work well. In the past, transport protocols like TCP enabled collaboration between the network and applications through plaintext headers. This is no longer possible with increasingly popular secure transport protocols such as QUIC, and the industry urgently needs a new way to achieve this synergy.¶
This document discusses the challenges of improving transmission quality when too little information is exchanged between the network and applications, and then provides some basic requirements that new synergy mechanisms should meet.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
DSCP is designed to provide Quality of Service (QoS) in the network by using 6 bits in the IP header to classify traffic into service classes and thereby achieve differentiated services. However, as the variety of Internet applications keeps growing, the current service classes have become too coarse-grained: e.g., Internet traffic is all treated as Best Effort, and network devices cannot obtain effective and legitimate application information with which to forward traffic with appropriate quality. For instance, service-specific bandwidth, latency, or jitter requirements cannot be adequately met, resulting in relatively poor end-user experience. This is also pointed out in [I-D.kaippallimalil-tsvwg-media-hdr-wireless]. Even where DSCP is deployed under agreements between service providers and ISPs, the benefit is quite limited because the marking carries so little information. For example, traffic for which a higher-quality service has been paid still may not see a satisfactory quality improvement during busy hours.¶
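To make the point about information density concrete, the following Python sketch shows how a DSCP value and the 2-bit ECN field share the single IPv4 TOS / IPv6 Traffic Class byte. The well-known code points used (CS0 = 0, EF = 46, and AFxy = 8x + 2y) come from the DiffServ specifications; the function names are hypothetical and the example is illustrative only.

```python
# Well-known DSCP code points: CS0 is the default (best-effort) class (RFC 2474),
# EF is Expedited Forwarding (RFC 3246); AF classes follow RFC 2597.
DSCP_CS0 = 0
DSCP_EF = 46


def af_codepoint(af_class: int, drop_precedence: int) -> int:
    """AFxy code point: class x in 1..4, drop precedence y in 1..3 (DSCP = 8x + 2y)."""
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError("AF class must be 1..4 and drop precedence 1..3")
    return 8 * af_class + 2 * drop_precedence


def pack_traffic_class(dscp: int, ecn: int = 0) -> int:
    """Build the 8-bit TOS / Traffic Class byte: DSCP in the upper 6 bits, ECN in the lower 2."""
    if not 0 <= dscp < 2 ** 6:
        raise ValueError("DSCP is a 6-bit field (64 possible values)")
    if not 0 <= ecn < 2 ** 2:
        raise ValueError("ECN is a 2-bit field")
    return (dscp << 2) | ecn


def unpack_traffic_class(tc: int) -> tuple[int, int]:
    """Split a TOS / Traffic Class byte into (DSCP, ECN)."""
    return tc >> 2, tc & 0b11
```

For example, `pack_traffic_class(DSCP_EF)` yields 184 (0xB8). Whatever an application needs, it can only be expressed as one of 64 class labels, only some of which have standardized per-hop behaviors; there is no room for attributes such as a bandwidth, latency, or jitter target.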
In addition, undifferentiated scheduling in the network prevents some network functions from being fully utilized. One example is CoMP (Coordinated Multipoint transmission/reception) in LTE, which mitigates interference through collaborative processing among different cells or base stations, thereby improving network efficiency. In our experience with intra-eNB deployments, if additional service-level information, such as the desired completion time and start/end signals, is provided, the CoMP success rate can be greatly improved, and so can the network's goodput.¶
Application transmissions rely on the network, so network conditions greatly affect application performance. Because applications and the network currently cooperate only loosely and share little information, applications can only passively make speculative adjustments based on end-to-end feedback. Such adjustments are not only imprecise but also lag behind actual network changes, as discussed in the following sections.¶
Current transport protocols increase their sending rate gradually through slow start, usually starting from a small initial window of around 10 segments, to avoid injecting too many packets into the network. This has effectively prevented network congestion collapse for decades, but it also means that bandwidth utilization is low during the slow start phase. The effect becomes significant with the widespread adoption of technologies such as 5G and fiber-to-the-home (FTTH). In particular, as the network's BDP (Bandwidth-Delay Product) increases, slow start lasts longer, resulting in poor transmission efficiency for short flows. The test reports in [_5G] mention that BBR's slow start phase lasts around 6 s before it converges to the high network bandwidth in a 5G mobile web browsing scenario. [flash] highlights that for a flow lasting 1 s (transferring over 1 MB of data), the bandwidth efficiencies of Cubic and BBR were only 53% and 48%, respectively. This significantly impacts the transmission quality of short-flow applications, e.g., mobile app download/update, cloud albums, or the first page load of apps.¶
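The dependence of slow start duration on the BDP can be estimated with simple arithmetic. The following Python sketch assumes an idealized slow start in which the window starts at 10 segments of 1460 bytes and doubles every RTT, with no loss, pacing, or delayed ACKs; the function names and parameter defaults are illustrative assumptions, not measurements.

```python
import math


def slow_start_rounds(bandwidth_bps: float, rtt_s: float,
                      mss_bytes: int = 1460, iw_segments: int = 10) -> int:
    """RTTs an idealized slow start needs before its window reaches the BDP."""
    bdp_segments = bandwidth_bps * rtt_s / 8 / mss_bytes
    if bdp_segments <= iw_segments:
        return 0
    # The window doubles every RTT: iw * 2**n >= BDP  =>  n = ceil(log2(BDP / iw)).
    return math.ceil(math.log2(bdp_segments / iw_segments))


def slow_start_utilization(bandwidth_bps: float, rtt_s: float, rounds: int,
                           mss_bytes: int = 1460, iw_segments: int = 10) -> float:
    """Fraction of path capacity actually used during the first `rounds` RTTs."""
    if rounds <= 0:
        raise ValueError("rounds must be positive")
    bdp_segments = bandwidth_bps * rtt_s / 8 / mss_bytes
    # In each RTT the sender emits one window, capped at what the path can carry.
    sent = sum(min(iw_segments * 2 ** i, bdp_segments) for i in range(rounds))
    return sent / (rounds * bdp_segments)


for bw in (100e6, 1e9):
    n = slow_start_rounds(bw, 0.040)
    print(f"{bw / 1e6:.0f} Mbit/s, 40 ms RTT: {n} RTTs, "
          f"{slow_start_utilization(bw, 0.040, n):.0%} of capacity used meanwhile")
```

Under these assumptions, a 100 Mbit/s path with a 40 ms RTT needs 6 RTTs to fill the pipe, while a 1 Gbit/s path with the same RTT needs 9, using only about 31% and 17% of the capacity, respectively, during that time. A flow that completes within a few RTTs therefore never reaches the available bandwidth.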
Current congestion control algorithms often rely on E2E feedback to infer the network state and adjust packet transmission accordingly. However, when the RTT is relatively large, which is common in WAN scenarios, the longer transmission time in the network lengthens the E2E feedback cycle, and feedback signals may not reach the sender in time. In such situations, the sender cannot accurately perceive congestion or react to it promptly, which lowers effective throughput in wide-area and long-distance networks.¶
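A back-of-the-envelope calculation illustrates why long feedback loops hurt. Because a sender learns about a congestion event only through feedback that takes roughly one RTT to return, it keeps sending at its old rate for about that long, and its control loop can close at most about once per RTT. The Python sketch below uses a hypothetical helper and an idealized constant-rate sender to quantify both effects.

```python
def feedback_budget(rate_bps: float, rtt_s: float) -> tuple[float, float]:
    """Return (bytes sent before feedback can arrive, control-loop iterations per second).

    Idealized model: a constant-rate sender whose congestion signals take one RTT to return.
    """
    if rtt_s <= 0:
        raise ValueError("RTT must be positive")
    blind_bytes = rate_bps * rtt_s / 8  # data already sent when the signal arrives
    loops_per_second = 1.0 / rtt_s      # at most one reaction per RTT
    return blind_bytes, loops_per_second


for rtt_ms in (5, 50, 150):
    blind, loops = feedback_budget(2e9, rtt_ms / 1000)
    print(f"RTT {rtt_ms:>3} ms: {blind / 1e6:6.2f} MB sent before feedback, "
          f"{loops:5.1f} reactions/s")
```

At 2 Gbit/s, the sender emits about 1.25 MB before feedback can arrive with a 5 ms RTT but about 37.5 MB with a 150 ms RTT, and it gets 30 times fewer opportunities per second to adjust.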
We tested the throughput of BBR and CUBIC under different network conditions, with 64 concurrent flows, a 2 Gbps link capacity, and varying levels of latency and packet loss. With 5 ms latency and a 0.01% packet loss rate, the total throughput of CUBIC already dropped to less than 10% of the total bandwidth. BBR performed significantly better, achieving a throughput of over 50% of the total bandwidth even with 5 ms latency and a 0.1% packet loss rate. However, as the latency increased to 10 ms, the throughput of BBR decreased to only about 30%, and it further decreased to around 20% with 15 ms latency. Although BBR improves overall throughput, it fails to fully utilize the available network resources as latency increases.¶
The problem is particularly prominent in heterogeneous environments, e.g., where traffic aggregates across both data centers and the WAN. Delays inside the data center are short and allow quick convergence, so the bandwidth consumed there changes rapidly and significantly. The WAN side, on the other hand, has a longer feedback period and converges more slowly, which makes it difficult to predict the available bandwidth accurately. As a result, overall network resources and performance cannot be well balanced. This is also discussed in [Annulus] and [Cross-Datacenter].¶
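The asymmetry between short and long feedback loops can be reproduced with a deliberately simplified model. The Python sketch below is a toy AIMD simulation, not a model of any specific algorithm or of the cited work: each flow adds one segment to its window once per its own RTT, and all flows halve their windows whenever the combined sending rate exceeds the bottleneck capacity (synchronized loss; feedback delay is not modeled). The function name and all parameter values are hypothetical.

```python
def simulate_shared_bottleneck(rtts_ms: list[int], capacity_seg_per_ms: float,
                               duration_ms: int = 10_000,
                               init_cwnd: float = 10.0) -> list[float]:
    """Toy AIMD model; returns each flow's share of the delivered traffic."""
    cwnd = [init_cwnd] * len(rtts_ms)
    delivered = [0.0] * len(rtts_ms)
    for t in range(1, duration_ms + 1):
        # Additive increase: one segment per RTT, so short-RTT flows probe more often.
        for i, rtt in enumerate(rtts_ms):
            if t % rtt == 0:
                cwnd[i] += 1.0
        rates = [w / rtt for w, rtt in zip(cwnd, rtts_ms)]  # segments per ms
        # Multiplicative decrease for every flow when the bottleneck is oversubscribed.
        if sum(rates) > capacity_seg_per_ms:
            cwnd = [max(1.0, w / 2) for w in cwnd]
            rates = [w / rtt for w, rtt in zip(cwnd, rtts_ms)]
        for i, r in enumerate(rates):
            delivered[i] += r
    total = sum(delivered)
    return [d / total for d in delivered]


shares = simulate_shared_bottleneck([1, 50], capacity_seg_per_ms=100.0)
print(f"1 ms flow: {shares[0]:.1%}, 50 ms flow: {shares[1]:.1%}")
```

With a 1 ms intra-data-center flow and a 50 ms WAN flow sharing the bottleneck, the short-RTT flow takes nearly all of the capacity: it probes for bandwidth 50 times as often, while the WAN flow barely grows between successive window reductions.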
As network coverage and diversity continue to improve, wide-area multipath applications are becoming a trend. 3GPP has already introduced Access Traffic Steering, Switching, and Splitting (ATSSS), one of the prevalent use cases of network-assisted multipath transport. However, practical multipath deployments often combine high-quality and low-quality links, with different loss rates and RTTs on disjoint paths. Relying solely on per-path e2e congestion control to guess the network condition of each path, especially over highly dynamic wireless networks, can easily lead to unstable traffic scheduling and suboptimal operation. Quantifying network behaviour precisely and exploiting it in multipath mechanisms could be a way to achieve fast convergence and a better experience.¶
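As an illustration of how quantified path information could be used, the following Python sketch implements a simple earliest-arrival scheduler. It assumes, hypothetically, that the network exposes each path's available bandwidth and one-way delay; the `PathInfo` structure, its fields, and the path names are illustrative and do not correspond to ATSSS or any standardized interface. Queue draining over time is ignored, so the example only shows how a batch of packets would be split.

```python
from dataclasses import dataclass


@dataclass
class PathInfo:
    """Hypothetical per-path metrics that the network might expose to an endpoint."""
    name: str
    bandwidth_bps: float    # available bandwidth on the path
    one_way_delay_s: float  # one-way propagation delay
    queued_bytes: int = 0   # bytes already assigned to this path in this batch

    def arrival_time(self, size_bytes: int) -> float:
        """Estimated time at which a new packet would reach the receiver on this path."""
        return (self.queued_bytes + size_bytes) * 8 / self.bandwidth_bps + self.one_way_delay_s


def schedule(paths: list[PathInfo], packet_sizes: list[int]) -> list[str]:
    """Assign each packet to the path with the earliest estimated arrival time."""
    assignment = []
    for size in packet_sizes:
        best = min(paths, key=lambda p: p.arrival_time(size))
        best.queued_bytes += size
        assignment.append(best.name)
    return assignment
```

For example, with `PathInfo("fiber", 100e6, 0.010)` and `PathInfo("wireless", 10e6, 0.030)`, `schedule` sends the first packets only on the faster path and starts using the slower one once queued data makes the faster path the later arrival.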
Real networks are segmented and heterogeneous: for example, WAN-side traffic and data center internal traffic may influence each other, and a flow may traverse a comparatively less stable wireless segment inside an enterprise or home broadband network as well as a stable fixed cable segment in the WAN. A single set of parameters, or even a single congestion control algorithm, cannot achieve optimal performance in such a complex environment. For instance, as shown by the tests in [Pantheon], BBR can handle scenarios with random packet loss such as 5G and Wi-Fi, but its throughput may be lower than CUBIC's in other situations, while CUBIC's throughput is poor in scenarios with random packet loss. In a multipath scenario, because paths are dynamic and diverse, a fixed set of algorithm parameters may not achieve optimal performance either. Currently, IETF work is mainly limited to idealized scenarios that rely only on the e2e feedback used for decades, and has not extensively considered new approaches, e.g., adaptive transport solutions for traversing heterogeneous networks. Such approaches may require good collaboration and information exchange between the network and endpoints.¶
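One form such collaboration might take is letting a network hint drive the endpoint's choice of congestion control. The Python sketch below assumes a hypothetical hint indicating that a path segment experiences random (non-congestive) loss and, following the [Pantheon] observation above, prefers BBR in that case and CUBIC otherwise. The hint and the policy are assumptions; the socket option used to apply the choice, `TCP_CONGESTION`, is Linux-specific, and the requested algorithm must be available in the kernel.

```python
import socket
from dataclasses import dataclass


@dataclass
class PathHint:
    """Hypothetical information a network segment might share with endpoints."""
    random_loss: bool  # loss is mostly non-congestive (e.g., radio errors)


def choose_congestion_control(hint: PathHint) -> str:
    # Policy assumption: BBR copes better with random loss; CUBIC otherwise.
    return "bbr" if hint.random_loss else "cubic"


def apply_congestion_control(sock: socket.socket, algorithm: str) -> None:
    """Select the per-socket congestion control module (Linux only)."""
    opt = getattr(socket, "TCP_CONGESTION", None)
    if opt is None:
        raise OSError("TCP_CONGESTION is not supported on this platform")
    sock.setsockopt(socket.IPPROTO_TCP, opt, algorithm.encode())
```

For example, `apply_congestion_control(sock, choose_congestion_control(PathHint(random_loss=True)))` would request BBR on a TCP socket; `setsockopt` raises `OSError` if the kernel does not provide that algorithm.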
In conclusion, improving transmission quality should not rely solely on endpoints passively inferring network conditions through heuristics. Further enhancement should involve synergy between the network and the client side. Several requirements for this collaborative mechanism are listed as follows:¶
ECN [rfc3168], which uses 2 bits in the IP header to convey congestion information, is widely deployed in the industry. It works together with AQM mechanisms in network devices, which set the CE codepoint in the IP header to indicate congestion before the queue overflows, thereby notifying endpoints to reduce their sending rate. Furthermore, L4S [rfc9330] redefines the semantics of the ECT(1) codepoint and isolates L4S traffic from traditional traffic by using a dual-queue AQM in network nodes, to achieve low latency.¶
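The ECN codepoints and the AQM behaviour described above can be sketched as follows. The codepoint values are those defined in RFC 3168, and the classification rule that ECT(1) and CE packets go to the low-latency (L) queue follows the L4S identifier; the single fixed delay threshold is a simplifying assumption, since deployed AQMs typically mark probabilistically. Function names are hypothetical.

```python
from enum import IntEnum


class ECN(IntEnum):
    """Two-bit ECN field in the IP header (RFC 3168)."""
    NOT_ECT = 0b00  # transport is not ECN-capable
    ECT1 = 0b01     # ECN-capable; identifies L4S traffic under L4S
    ECT0 = 0b10     # ECN-capable (classic)
    CE = 0b11       # Congestion Experienced, set by the network


def dualq_classify(ecn: ECN) -> str:
    """L4S DualQ classification: ECT(1) and CE use the L queue, others the classic (C) queue."""
    return "L" if ecn in (ECN.ECT1, ECN.CE) else "C"


def aqm_decide(ecn: ECN, queue_delay_s: float, threshold_s: float) -> tuple[str, ECN]:
    """Simplified step AQM: once the queue is too long, mark ECN-capable packets, drop others."""
    if queue_delay_s <= threshold_s:
        return "forward", ecn
    if ecn == ECN.NOT_ECT:
        return "drop", ecn
    return "forward", ECN.CE  # signal congestion without losing the packet
```

A single packet's two bits can only say whether congestion was experienced; the extent of congestion has to be inferred from the marking rate over many packets.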
ECN and L4S are essentially forms of collaboration between the network and endpoints to achieve low loss and low latency. However, this approach cannot completely address the challenges described in Section 3. Additionally, as elaborated in [L4SinCellular], L4S is quite sensitive to time-varying networks such as wireless and Wi-Fi networks, which may make it difficult to achieve high throughput and low latency simultaneously in such environments. If more information were available for collaboration, these issues might be overcome more easily.¶
This document has no security considerations.¶
This document has no IANA actions.¶