Internet-Draft | Protocol for interactive low-latency med | July 2023 |
Liu, et al. | Expires 11 January 2024 |
This document introduces a protocol that enables WebRTC-based pulling, merging, and switching of media content over a media transmission network.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [1].¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 2 January 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Emerging real-time interactive video/audio communication applications bring new challenges for existing protocols. This document introduces the use cases, requirements, and protocol for a WebRTC-HTTP interactive low-latency multimedia transmission network over the Internet.¶
Interactive real-time media communication is becoming increasingly popular with the rapid growth of short video, online education, online gaming, and similar applications. Some application providers build their own interactive real-time media communication networks to support their applications, but they face high costs and technical issues. For example, interaction between users is unpredictable, so using dedicated entities for interaction results in high costs and wastes the resources reserved for interaction.¶
To avoid these issues, other application providers attempt to use third-party interactive real-time media communication networks provided by cloud operators. However, existing protocols face several challenges in supporting these scenarios:¶
1. Interactive online broadcasting services are flexible and much more complicated than traditional media broadcasting services. In interactive online broadcasting applications, an audience member may occasionally request to set up bidirectional real-time communication with the broadcaster, and all other audience members are expected to receive the merged interactive media containing both the broadcaster and the connected audience member. To this end, a standardized signaling protocol is needed that supports media stream merging, switching, and pulling in these complex scenarios.¶
2. Applications such as interactive online broadcasting, short video, online education, and online gaming are highly delay-sensitive. Thus, the protocols for media stream merging, switching, and pulling are expected to meet the latency requirements of these applications.¶
3. WebRTC is now widely used in the multimedia ecosystem. The protocols for media stream merging, switching, and pulling are expected to be compatible with WebRTC in order to deliver interactive media services to customers.¶
This section specifies the system architecture of the Interactive real-time media communication system.¶
The WHISP communication network can be provided by cloud providers. The network provides fundamental media stream capabilities, including media pulling, and can additionally support media merging and media switching. These capabilities can be triggered by the control server and the server for media stream merging, which can be provided by a third party. Based on these capabilities, an audience member can seamlessly receive either the broadcaster's media or the merged media of the broadcaster and the audience member who requested interaction.¶
This section defines the signaling procedure of WHISP communication network.¶
Figure 2 shows the signaling procedure of interactive real-time media communication among the broadcaster, the audience member requesting interaction, and the other audience members. HTTP POST is used for signaling throughout this procedure. The broadcaster and the audience first ingest their media streams into the interactive real-time media communication network. An audience member who wishes to interact with the broadcaster sends an interaction request to the control server. The control server processes the request and sends a media merging command to the server for media stream merging. Upon receipt of the merging request from the control server, the server for media stream merging pulls the corresponding streams of both the broadcaster and the requesting audience member and merges them.¶
After merging completes, the server for media stream merging ingests the merged media into the interactive real-time media communication network, which then sends the merged media to the corresponding edge media distribution servers that connect the audiences consuming the media. After distribution, the control server sends a media switching command to the interactive real-time media communication network. The network then forwards the switching signaling message to the edge node. Upon receipt of the signaling message, the edge node performs the media switching by ingesting the merged media to the audiences.¶
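The procedure above can be summarized as an ordered list of steps. The following Python sketch is illustrative only: the entity labels and step descriptions are hypothetical names chosen for this example, not message names defined by this document, and the `signaling` flag simply marks the steps that the text describes as HTTP POST signaling rather than media transfer.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical labels for the entities described in the procedure.
BROADCASTER = "broadcaster"
REQUESTER = "requesting audience member"
AUDIENCES = "audiences"
CONTROL = "control server"
MERGER = "server for media stream merging"
NETWORK = "media communication network"
EDGE = "edge node"


@dataclass(frozen=True)
class Step:
    sender: str
    receiver: str
    action: str
    signaling: bool  # True for HTTP POST signaling, False for media flow


def interaction_procedure() -> List[Step]:
    """Return the interaction procedure as an ordered list of steps."""
    return [
        Step(BROADCASTER, NETWORK, "ingest broadcaster media", False),
        Step(REQUESTER, NETWORK, "ingest audience media", False),
        Step(REQUESTER, CONTROL, "request interaction", True),
        Step(CONTROL, MERGER, "command media merging", True),
        Step(NETWORK, MERGER, "pull broadcaster and requester media", False),
        Step(MERGER, NETWORK, "ingest merged media", False),
        Step(NETWORK, EDGE, "distribute merged media", False),
        Step(CONTROL, NETWORK, "command media switching", True),
        Step(NETWORK, EDGE, "forward switching message", True),
        Step(EDGE, AUDIENCES, "switch audiences to merged media", False),
    ]


if __name__ == "__main__":
    for number, step in enumerate(interaction_procedure(), start=1):
        kind = "POST " if step.signaling else "media"
        print(f"{number:2d}. [{kind}] {step.sender} -> {step.receiver}: {step.action}")
```

Keeping the merging and switching commands as separate steps mirrors the ordering in the text: audiences are switched only after the merged media has been distributed to their edge nodes.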
This section defines the signaling specification for interactive real-time media communication. To achieve the merging and switching functionalities for different media sources, signaling messages need to be delivered to the corresponding entities (e.g., the control server or an edge node) so that they can perform the proper operations. All the messages below are transmitted using HTTP POST. The format of the interactive media control protocol signaling message is shown as follows:¶
To process a signaling message, the receiving entity needs to identify its type. This is achieved using the message type carried in the message header. The message types of the interactive media control protocol are as follows:¶
ID | Messages |
---|---|
0x0 | Merging |
0x1 | Switching |
0x2 | Pulling |
0x3 | Grabbing |
The message length indicates the total length of the message payload field in bytes. The message payload contains the information for controlling the requested operation (merging, switching, pulling, or grabbing). The following subsections describe each message type and its payload in detail.¶
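To illustrate how a receiver can dispatch on the message type and use the message length, the following Python sketch frames a payload with a header. The field widths (a 1-byte type followed by a 4-byte big-endian length) and the use of a binary header at all are assumptions made for this example; this document does not specify them, and in an HTTP POST deployment the same information may instead be carried in HTTP headers or the request path.

```python
import struct
from enum import IntEnum
from typing import Tuple


class MessageType(IntEnum):
    """Message type IDs from the table above."""
    MERGING = 0x0
    SWITCHING = 0x1
    PULLING = 0x2
    GRABBING = 0x3


# ASSUMPTION: 1-byte message type, then 4-byte big-endian payload length.
# Field widths and byte order are not defined by this document.
HEADER = struct.Struct("!BI")


def encode(msg_type: MessageType, payload: bytes) -> bytes:
    """Prefix the payload with a header carrying its type and length."""
    return HEADER.pack(msg_type, len(payload)) + payload


def decode(data: bytes) -> Tuple[MessageType, bytes]:
    """Split a framed message into its type and payload.

    Raises ValueError on a truncated message or an unknown type.
    """
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    raw_type, length = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return MessageType(raw_type), payload  # unknown IDs raise ValueError
```

Rejecting unknown type IDs at decode time lets an entity refuse a message it cannot process before it parses the payload.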
Merging signaling message is used to request the server for media stream merging to perform media merging between a broadcaster and an audience. The merging signaling message is shown as follows:¶
The payload type field "/whisp/merging/endpoint" in the header indicates the merging signaling message. The main media determines the media-related parameters (such as video format) of the merged media, and the secondary media needs to comply with these parameters during merging. The merge template determines the video layout of the merged media when merging the main media and the secondary media; the merge template ID identifies the merge template. The media ID is the ID of a media. The amsid and vmsid fields stand for audio stream ID and video stream ID, respectively. The media ID is a string that uniquely identifies a media source, and its format follows the definition in RFC 8830 [3]. The media URL is the address of the edge node that interacts with the audience, and its format follows the definition in RFC 3986 [2].¶
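A merging payload could be assembled and posted roughly as follows. This Python sketch assumes a JSON encoding, snake_case field names, an integer merge template ID, and that the payload type path is appended to the merging server's base URL; none of these are specified by this document. The host `merger.example.com`, the edge URLs, and all stream IDs are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from urllib.parse import urlparse
from urllib.request import Request

MERGING_PATH = "/whisp/merging/endpoint"  # payload type named in the text


@dataclass
class Media:
    media_id: str  # unique media source ID (RFC 8830 format)
    amsid: str     # audio stream ID
    vmsid: str     # video stream ID
    url: str       # address of the serving edge node (RFC 3986)


@dataclass
class MergingPayload:
    main_media: Media       # decides parameters such as video format
    secondary_media: Media  # must comply with the main media parameters
    merge_template_id: int  # ASSUMED integer; selects the merged video layout

    def validate(self) -> None:
        for media in (self.main_media, self.secondary_media):
            # RFC 8830 allows 1 to 64 token characters; only length is checked.
            if not 1 <= len(media.media_id) <= 64:
                raise ValueError(f"bad media ID length: {media.media_id!r}")
            parsed = urlparse(media.url)
            if not (parsed.scheme and parsed.netloc):
                raise ValueError(f"not an absolute URL: {media.url!r}")

    def to_json(self) -> str:
        self.validate()
        return json.dumps(asdict(self))  # asdict also converts nested Media


def build_merging_request(merger_base: str, payload: MergingPayload) -> Request:
    """Build (but do not send) an HTTP POST carrying the merging payload."""
    return Request(
        merger_base.rstrip("/") + MERGING_PATH,
        data=payload.to_json().encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    payload = MergingPayload(
        main_media=Media("bc-1", "bc-1-a", "bc-1-v",
                         "https://edge1.example.com/live/bc-1"),
        secondary_media=Media("au-7", "au-7-a", "au-7-v",
                              "https://edge2.example.com/live/au-7"),
        merge_template_id=2,
    )
    request = build_merging_request("https://merger.example.com", payload)
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Validating the payload before building the request means a malformed media URL or ID is caught by the sender rather than by the merging server.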
Switching signaling message is used to instruct the Interactive real-time media communication system to perform media switching upon the receipt of the request from the control server. The switching signaling message is shown as follows:¶
The payload type field "/whisp/switching/endpoint" in the header indicates the switching signaling message. Source media contains information about the source media from the broadcaster. Destination media contains information about the destination media, which is the merged media of the broadcaster and the audience member who requested interaction. Each media entry contains the media ID and the media URL.¶
The switching signaling message is sent to the edge node that manages media delivery for the audience. If the edge node acknowledges the media switching, it redirects the delivered media content to the destination media using WebRTC. Upon receipt of the switching signaling message, the media transmission protocol determines the timestamp, the I-frame information, and optionally the sequence number needed to redirect to the new merged media. This ensures that the audience can switch smoothly to the merged media without degrading the user experience.¶
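The switch-point selection described above can be sketched as follows. This Python example assumes the edge node has the destination (merged) media frames ordered by timestamp and starts forwarding them at the first I-frame at or after the time of the switching request, so that the audience's decoder never receives frames that reference pictures it did not get. The optional sequence-number step assumes 16-bit RTP-style sequence numbers rewritten to continue the numbering the audience already sees. `Frame`, `SwitchPoint`, and both functions are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Frame:
    timestamp: int   # media timestamp (units assumed, e.g. 90 kHz ticks)
    is_iframe: bool  # True if the frame is an I-frame (keyframe)
    seq: int         # 16-bit sequence number of the frame's first packet


@dataclass(frozen=True)
class SwitchPoint:
    timestamp: int
    seq: int


def find_switch_point(dest_frames: Sequence[Frame],
                      request_ts: int) -> Optional[SwitchPoint]:
    """Return the first destination I-frame at or after request_ts.

    Assumes dest_frames is sorted by timestamp. Returns None when no
    suitable I-frame is available yet, in which case the edge node
    keeps forwarding the source media.
    """
    for frame in dest_frames:
        if frame.is_iframe and frame.timestamp >= request_ts:
            return SwitchPoint(frame.timestamp, frame.seq)
    return None


def rewrite_seq(dest_seq: int, switch: SwitchPoint, last_src_seq: int) -> int:
    """Renumber a destination packet so that the packet at the switch
    point directly follows the last source packet sent to the audience,
    wrapping at 2**16 like an RTP sequence number."""
    return (last_src_seq + 1 + dest_seq - switch.seq) % 65536
```

Waiting for an I-frame trades a short delay for a clean decode; switching on an arbitrary frame would be faster but would show corrupted pictures until the next keyframe.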
The grabbing signaling message is used to instruct the interactive real-time media communication system to switch the edge node serving an audience member, for example, in mobility scenarios. In the mobility case, the system may decide, based on location information, to switch to a more suitable edge node for media ingestion to the audience member. The grabbing signaling message is shown as follows:¶
The grabbing signaling message is sent from the interactive real-time media communication system to the edge node. A new edge node first starts ingesting media to the audience. Meanwhile, it registers the service with the interactive real-time media communication system. The system detects that the media ingestion service already exists and therefore sends the grabbing signaling message to the old edge node, instructing it to drop the media ingestion to the audience. The error code indicates the reason for dropping, as listed below:¶
Code | Reason |
---|---|
0x0 | Dropped by Mobility |
0x1 | Proactive dropping |
0x2 | Passive dropping |
Dropped by Mobility indicates that a new edge node has taken over and ingests the media to the audience instead of the old edge node. Proactive dropping indicates that an edge node has encountered problems with media ingestion; the audience can request reconnection for delivery of the media. Passive dropping indicates that the corresponding media has been banned and thus can no longer be ingested.¶
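The grabbing procedure and the drop reasons above can be sketched together. In this Python example, a hypothetical system-side registry records which edge node serves each (audience, media) pair; when a second edge node registers the same service, the registry queues a grabbing message with the Dropped by Mobility code for the old node. The `should_reconnect` helper encodes one reading of the text: only proactive dropping leaves the audience with a reason to request reconnection. The registry design and all names are assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List, Tuple


class DropReason(IntEnum):
    """Error codes carried by the grabbing message (table above)."""
    MOBILITY = 0x0   # a new edge node has taken over the ingestion
    PROACTIVE = 0x1  # the edge node had problems with media ingestion
    PASSIVE = 0x2    # the media has been banned


@dataclass(frozen=True)
class Grab:
    old_edge: str
    audience: str
    media_id: str
    reason: DropReason


class EdgeRegistry:
    """Hypothetical record of which edge node serves each audience/media pair."""

    def __init__(self) -> None:
        self._serving: Dict[Tuple[str, str], str] = {}
        self.outbox: List[Grab] = []  # grabbing messages waiting to be sent

    def register(self, edge: str, audience: str, media_id: str) -> None:
        key = (audience, media_id)
        old_edge = self._serving.get(key)
        if old_edge is not None and old_edge != edge:
            # The service already exists on another node, so grab it.
            self.outbox.append(Grab(old_edge, audience, media_id, DropReason.MOBILITY))
        self._serving[key] = edge


def should_reconnect(reason: DropReason) -> bool:
    """After a mobility drop a new node already serves the audience, and
    banned media (passive drop) can no longer be ingested, so only a
    proactive drop calls for a reconnection request."""
    return reason is DropReason.PROACTIVE
```

Re-registering the same edge node is treated as a no-op, so a node that refreshes its registration does not grab its own service.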
Pulling signaling message is sent from audience to the edge node. Once the pulling signaling message is acknowledged, the edge node sends the corresponding media to the audience. The pulling signaling message is shown below:¶
The payload type field in the header indicates the pulling signaling message. The media URL indicates the address of the target media which can be obtained from the edge node.¶
The edge node allocates a media ID for the broadcaster or the audience member requesting interaction so that the media can be uniquely identified in the communication system. Upon receipt of the pulling signaling message, the edge node acknowledges it with the media ID that uniquely identifies the target media.¶
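The media ID allocation and pulling acknowledgement can be sketched as follows. This Python example assumes the edge node keys media by URL, builds IDs from its own node name plus a counter (unique system-wide only if node names are unique), and reports the outcome with HTTP-style status codes; these choices, the `EdgeNode` class, and the example URLs are hypothetical.

```python
import itertools
from typing import Dict


class EdgeNode:
    """Hypothetical edge node that allocates media IDs and answers pulls."""

    def __init__(self, node_name: str) -> None:
        self._node_name = node_name
        self._counter = itertools.count(1)
        self._ids_by_url: Dict[str, str] = {}

    def register_media(self, media_url: str) -> str:
        """Allocate a media ID for ingested media; repeated calls for the
        same URL return the ID allocated the first time."""
        if media_url not in self._ids_by_url:
            self._ids_by_url[media_url] = f"{self._node_name}-{next(self._counter)}"
        return self._ids_by_url[media_url]

    def handle_pull(self, media_url: str) -> dict:
        """Acknowledge a pulling message with the target media ID."""
        media_id = self._ids_by_url.get(media_url)
        if media_id is None:
            return {"status": 404, "media_id": None}  # unknown target media
        return {"status": 200, "media_id": media_id}
```

Making allocation idempotent per URL keeps the media ID stable when the same broadcaster or audience member re-ingests, so later merging and switching messages can keep referring to it.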
TBD.¶
The signaling messages defined in this document should be protected by security mechanisms.¶