Chapter 23
Conferencing on the Internet
Conferencing involves communication among several users. Multimedia conferencing, including audio, video, instant messaging, whiteboard sharing, and file transfer, is a popular service on the Internet and in enterprises. Chat rooms where users exchange instant messages are an example of a conference service on the Internet. The collaboration tools used in most enterprises are also examples of conferences.
Thus, conferences are not limited to traditional unmoderated audio or video conferences. They can include all types of media and can be moderated by using floor control mechanisms. Conferencing is an important area for enterprises with employees working in different countries. A conference system that includes collaboration tools can save considerable money and time by reducing the need for face-to-face meetings to which attendees must travel great distances. However, we are still far from having conference systems that can replace face-to-face meetings completely. That is why there is much ongoing research in areas such as telepresence and virtual reality. The goal is to make virtual interactions as close to real ones as possible.
23.1 Conferencing Standardization at the IETF
In the past, working groups such as MMUSIC did some work on conferencing (e.g., SDP was designed with multiparty sessions in mind). Lately, the working groups that have been active in this area have been SIPPING and XCON. In fact, implementers sometimes find it confusing to have similar specifications in the same area coming from two different working groups. Knowing the history behind conferencing standardization at the IETF will help readers understand how the specifications coming from the two working groups relate to each other.
Initially, the SIPPING working group developed a set of specifications that described how to provide conferencing services using SIP. Coming from the SIPPING working group, these specifications were, unsurprisingly, very much focused on SIP. Pieces needed to build a complete conference service, such as floor control and conference management mechanisms (beyond the simple ones SIP provides), were out of the scope of this work.
The XCON working group was chartered to generalize the work done in SIPPING so that different signaling protocols (not only SIP) could be used, and to specify the missing pieces needed to build a complete conference system. The charter was limited to centralized conferences where clients connect to a central server following a star topology.
The 3G IP Multimedia Subsystem (IMS): Merging the Internet and the Cellular Worlds, Third Edition
Gonzalo Camarillo and Miguel A. García-Martín
© 2008 John Wiley & Sons, Ltd. ISBN: 978-0-470-51662-1
Conferences using different topologies, such as full-meshed and cascaded conferences, were left out of scope.
The results of the work of these two working groups include two conferencing frameworks: the SIPPING conferencing framework and the XCON conferencing framework. We discuss both of them, their differences, and how they relate to each other.
23.2 The SIPPING Conferencing Framework
The SIPPING conferencing framework (specified in RFC 4353 [272]) describes three conferencing models: loosely coupled, fully distributed, and tightly coupled. In the loosely-coupled conferencing model, shown in Figure 23.1, media streams are multicast. Conference participants join the multicast group of the conference using, for example, IGMP (Internet Group Management Protocol, specified in RFC 3376 [95]) in order to receive media. Conference participants do not typically have any signaling relationship between them. Still, they can use SIP to invite new participants into the conference. A SIP INVITE request sent to a new participant would contain (in its body) all of the information needed to join the multicast group.
Figure 23.1: The loosely-coupled conference model
In the fully-distributed conferencing model, shown in Figure 23.2, each participant has a signaling relationship with all of the other participants in the conference. Each participant sends media to all of the other participants.
In the tightly-coupled conferencing model, shown in Figure 23.3, each participant has a signaling relationship with a central conference server. The central conference server mixes the media received from different participants and distributes it to all of them.
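To make the difference between the last two models concrete, the following Python sketch counts unidirectional media streams for a conference with n participants. It assumes a single media type, that every participant both sends and receives, and that no multicast is used. The function names are illustrative only and do not come from RFC 4353.

```python
def full_mesh_streams(n: int) -> int:
    """Fully distributed: each participant sends one stream to each of the n - 1 others."""
    return n * (n - 1)


def central_server_streams(n: int) -> int:
    """Tightly coupled: each participant sends one stream to the server
    and receives one mixed stream back from it."""
    return 2 * n


if __name__ == "__main__":
    for n in (3, 10, 33):
        print(n, full_mesh_streams(n), central_server_streams(n))
```

Under these assumptions, a 10-party conference needs 90 streams in the fully-distributed model and 20 in the tightly-coupled one, which illustrates why a central server scales better for larger conferences.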
Of course, the three conferencing models just described are not the only models that can be implemented with SIP. Many other variants are possible. For example, when the central conference server in a tightly-coupled conference is distributed among several SIP nodes, the resulting model is typically referred to as the cascaded conferencing model. In any case, the SIPPING conferencing framework focuses on the tightly-coupled conferencing model; the rest of the models are considered to be out of its scope.
Figure 23.2: The fully-distributed conference model
Figure 23.3: The tightly-coupled conference model
23.2.1 Signaling Architecture
Figure 23.4 shows the signaling architecture proposed by the SIPPING conferencing framework. The conference server consists of several logical functions: the conference policy, the conference policy server, and the focus, which includes the conference notification service.
The conference policy is the set of rules that define a conference. The conference policy includes information about the participants of the conference, the time and date when the conference will take place, the media streams the conference has, etc. Participants manipulate the conference policy (e.g., to add a video stream to an audio-only conference) through the conference policy server. The protocol between participants and the conference policy server is left unspecified.
Figure 23.4: Signaling architecture in the SIPPING framework
The focus interacts with the conference participants using SIP. It acts as a user agent towards all of the participants. The focus includes the conference notification service, which provides participants with information about the conference using the SIP event package for conference state (specified in RFC 4575 [289]). This event package defines an XML-based format to convey conference-related information. Figure 23.5 shows an example of a document that uses this format. This document, which is mostly self-explanatory, describes a conference and provides information about two of its participants: Bob and Alice. Bob was kicked out of the conference because he experienced bad voice quality, and Alice was brought into the conference by Mike. Note that even though the number of participants in the conference is 33 (see the <user-count> element), the document only provides detailed information about two of them (Bob and Alice). Conferencing servers can omit information about certain users for policy reasons.
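The gap between the advertised participant count and the users actually listed can be checked programmatically. The following Python sketch parses an abbreviated copy of the document in Figure 23.5 (most child elements are omitted for brevity) using the standard library. The `summarize` helper is a hypothetical name introduced here for illustration; it is not part of RFC 4575.

```python
import xml.etree.ElementTree as ET

# Namespace used by the conference-info documents shown in Figure 23.5.
NS = {"ci": "urn:ietf:params:xml:ns:conference-info"}

# Abbreviated version of the document in Figure 23.5.
DOC = """
<conference-info xmlns="urn:ietf:params:xml:ns:conference-info"
    entity="sips:conf233@example.com" state="full" version="1">
  <conference-state>
    <user-count>33</user-count>
  </conference-state>
  <users>
    <user entity="sip:bob@example.com" state="full"/>
    <user entity="sip:alice@example.com" state="full"/>
  </users>
</conference-info>
"""


def summarize(xml_text):
    """Return (advertised user count, list of user URIs actually described)."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("ci:conference-state/ci:user-count", namespaces=NS))
    listed = [user.get("entity") for user in root.findall("ci:users/ci:user", NS)]
    return count, listed


if __name__ == "__main__":
    count, listed = summarize(DOC)
    print(f"{count} participants, {len(listed)} described: {listed}")
```

A client should therefore not assume that the number of <user> elements equals the value of <user-count>.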
The XML document in Figure 23.5 is already fairly long, even though it only carries information about two users. A document describing a large conference with many users would be much longer. In principle, every time a small change occurs in the conference (e.g., one user leaves the conference), the conference notification service would need to send a new large XML document that would be very similar to the last one it sent (e.g., the only difference would be in the elements related to the user that left). This would result in an inefficient use of bandwidth.
In order to avoid this situation, the SIP event package for conference state implements a mechanism for partial notifications. The “state” attribute indicates whether an element carries full or partial information. In addition, the “state” attribute can also indicate that an element
<?xml version="1.0" encoding="UTF-8"?>
<conference-info
    xmlns="urn:ietf:params:xml:ns:conference-info"
    entity="sips:conf233@example.com"
    state="full" version="1">
  <!-- CONFERENCE INFO -->
  <conference-description>
    <subject>Agenda: This month’s goals</subject>
    <service-uris>
      <entry>
        <uri>http://sharepoint/salesgroup/</uri>
        <purpose>web-page</purpose>
      </entry>
    </service-uris>
  </conference-description>
  <!-- CONFERENCE STATE -->
  <conference-state>
    <user-count>33</user-count>
  </conference-state>
  <!-- USERS -->
  <users>
    <user entity="sip:bob@example.com" state="full">
      <display-text>Bob Hoskins</display-text>
      <!-- ENDPOINTS -->
      <endpoint entity="sip:bob@pc33.example.com">
        <display-text>Bob’s Laptop</display-text>
        <status>disconnected</status>
        <disconnection-method>departed</disconnection-method>
        <disconnection-info>
          <when>2005-03-04T20:00:00Z</when>
          <reason>bad voice quality</reason>
          <by>sip:mike@example.com</by>
        </disconnection-info>
        <!-- MEDIA -->
        <media id="1">
          <display-text>main audio</display-text>
          <type>audio</type>
          <label>34567</label>
          <src-id>432424</src-id>
          <status>sendrecv</status>
        </media>
      </endpoint>
    </user>
Figure 23.5: Example of an XML-based conference description (part 1)
    <!-- USER -->
    <user entity="sip:alice@example.com" state="full">
      <display-text>Alice</display-text>
      <!-- ENDPOINTS -->
      <endpoint entity="sip:4kfk4j392jsu@example.com;grid=433kj4j3u">
        <status>connected</status>
        <joining-method>dialed-out</joining-method>
        <joining-info>
          <when>2005-03-04T20:00:00Z</when>
          <by>sip:mike@example.com</by>
        </joining-info>
        <!-- MEDIA -->
        <media id="1">
          <display-text>main audio</display-text>
          <type>audio</type>
          <label>34567</label>
          <src-id>534232</src-id>
          <status>sendrecv</status>
        </media>
      </endpoint>
    </user>
  </users>
</conference-info>
Figure 23.6: Example of an XML-based conference description (part 2)
has been deleted. Accordingly, the “state” attribute can take on the following values: full, partial, or deleted. An element with a “state” attribute with a value of partial carries only the information that has changed since the previous document was sent to the participant.
If a parent element has a “state” of full, all of its child elements should also have a “state” of full. On the other hand, if a parent element has a “state” of partial, its child elements can have any “state”. The default value for the “state” attribute is full. The only elements that can carry a “state” attribute are <conference-info>, <users>, <user>, <endpoint>, <sidebars-by-val>, and <sidebars-by-ref>. In Figure 23.5, all of the “state” attributes have a value of full.
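The partial-notification mechanism just described can be sketched in Python as follows. This is a deliberately simplified illustration, not the complete merging procedure of RFC 4575: it only merges <user> elements (matched by their entity attribute), it replaces whole child elements of a partially updated user instead of merging them one by one, and it does not check version ordering. The function name `apply_notification` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

NS = "urn:ietf:params:xml:ns:conference-info"


def q(tag):
    """Qualify a tag name with the conference-info namespace."""
    return f"{{{NS}}}{tag}"


def apply_notification(local, update):
    """Merge a received <conference-info> element into the local copy.

    Returns the resulting document, or None if the conference was deleted.
    """
    state = update.get("state", "full")  # "full" is the default value
    if state == "full" or local is None:
        return update  # a full document replaces whatever we had
    if state == "deleted":
        return None
    # state == "partial": only the changed users are present in the update.
    new_users = update.find(q("users"))
    if new_users is not None:
        users = local.find(q("users"))
        if users is None:
            users = ET.SubElement(local, q("users"))
        known = {u.get("entity"): u for u in users.findall(q("user"))}
        for user in new_users.findall(q("user")):
            old = known.get(user.get("entity"))
            user_state = user.get("state", "full")
            if old is not None and user_state == "partial":
                # Simplification: replace every child whose tag appears in the update.
                changed = {child.tag for child in user}
                for child in list(old):
                    if child.tag in changed:
                        old.remove(child)
                old.extend(list(user))
                continue
            if old is not None:
                users.remove(old)  # deleted, or about to be replaced in full
            if user_state != "deleted":
                users.append(user)
    if update.get("version") is not None:
        local.set("version", update.get("version"))
    return local
```

For instance, a partial document that marks Bob's <user> element as deleted and carries a new user with a “state” of full would remove Bob from the local copy and append the new user, leaving all other users untouched.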
23.2.2 Media Architecture
The SIPPING conferencing framework describes the following media plane realizations: centralized server, endpoint server, media server component, distributed mixing, and cascaded mixers.
In the centralized-server model, a central server handles both signaling and media, as shown in Figure 23.3. In the endpoint-server model, one of the endpoints behaves as the central server in the centralized-server model, as shown in Figure 23.7. The endpoint-server model is typically the result of a two-party call between two endpoints that transitions into an ad-hoc conference. This is the case when the users involved in the original two-party call decide to bring one or more additional users into the call at some point.
Figure 23.7: The endpoint-server model
The endpoint-server model works well when the endpoint performing the mixing does not have processing, bandwidth, or battery constraints. Conferences between endpoints with those constraints are better handled by a central server.
In the media-server-component model, the central server of the centralized-server model is divided into two servers: an application server and a mixing server. The application server interacts with the conference participants but does not have mixing capabilities. The mixing server performs the actual media mixing.
The interface between the application server and the mixing server is based on SIP. The application server can use SIP mechanisms such as third-party call control (specified in RFC 3725 [282]) to instruct the mixing server how to mix the conference’s media streams. The SIPPING conferencing framework does not cover distributed conference servers that use a protocol other than SIP (e.g., H.248 [189]) between the server handling SIP signaling (i.e., hosting the focus) and the server performing the mixing. However, this model can be considered a special case of the centralized-server model in which the internal structure of the server is distributed.
In the distributed-mixing model, the central server of the centralized-server model handles signaling but not media. The central server does not have any media mixing capabilities; instead, it instructs users to exchange media among themselves. In this model, the conference server is, effectively, a third-party call controller (as specified in RFC 3725 [282]). Figure 23.8 shows how, in this model, media can be exchanged using unicast or multicast.
In the cascaded-mixers model, the mixing functionality is distributed among several physical mixers. The central server handling the signaling of the conference coordinates all of the mixers so that all users receive the conference’s media correctly.
23.3 The XCON Conferencing Framework
As discussed earlier, the XCON working group was chartered to generalize the work on conferencing performed in SIPPING, which was specific to SIP. The XCON framework (specified in the Internet-Draft “A Framework and Data Model for Centralized Conferencing” [82]) defines the conferencing architecture shown in Figure 23.9. This figure shows a conference system able to host several conferences. That is why the figure shows more than one conference object.
Figure 23.8: The distributed-mixing model
23.3.1 Conference Objects
A conference object contains all of the information related to a given conference. It is the same concept as the conference policy in the SIPPING conferencing framework (see Figure 23.4) with a different name.
Figure 23.5 shows an example of the XML-based format to describe conference policies developed by the SIPPING working group (which is specified in RFC 4575 [289]). The XCON working group extended this format so that it can be used to describe more general conferences (i.e., not only SIP-based conferences) and to provide more information about a given conference (e.g., floor-control-related information was missing from the original format and was added by the XCON working group). The resulting format is referred to as the XCON data model (which is specified in the Internet-Draft “Conference Information Data Model for Centralized Conferencing (XCON)” [224]).
The improvements in the XCON data model, with respect to the original format defined by the SIPPING working group, include the ability to carry different types of URIs and the inclusion of information that relates to floor control, conference scheduling, and media controls (e.g., a control to mute a media stream).
In order to create a conference, it is necessary to create its conference object. The initial values for the variables of a conference object are typically taken from a conference blueprint. A conference blueprint is a template to create conference objects. For example, a conference system may have a conference blueprint with the typical values to create an audio-only conference.
Figure 23.9: XCON architecture
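The relationship between a blueprint and the conference objects created from it can be sketched in Python as follows. The dictionary keys and the `create_conference_object` helper are hypothetical names chosen for this illustration; they are not taken from the XCON data model, which describes conference objects as XML documents.

```python
import copy

# Hypothetical blueprint for an audio-only conference (illustrative fields only).
AUDIO_ONLY_BLUEPRINT = {
    "subject": "",
    "max_participants": 10,
    "media": [{"type": "audio", "label": "main audio"}],
}


def create_conference_object(blueprint, **overrides):
    """Create a new conference object from a blueprint.

    The blueprint is deep-copied so that later changes to one conference
    object never leak into the blueprint or into other conferences.
    """
    conference = copy.deepcopy(blueprint)
    conference.update(overrides)
    return conference


if __name__ == "__main__":
    weekly = create_conference_object(AUDIO_ONLY_BLUEPRINT, subject="Weekly sync")
    weekly["media"].append({"type": "video", "label": "main video"})
    print(weekly)
    print(AUDIO_ONLY_BLUEPRINT)  # still audio-only
```

The deep copy reflects the idea that a blueprint only supplies initial values: once the conference object exists, it evolves independently of the template.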
23.3.2 Conference Control Server
Users can manipulate conference objects and, thus, the properties of any conference, using a conference control protocol. Such a protocol runs between the participant’s conference control client and the conference control server.
The XCON working group is chartered to develop a conference control protocol. We expect this working group to specify such a protocol in the future. One of the main decisions concerning this protocol is whether it should follow a semantic approach or a syntactic approach.
A semantically-oriented protocol would have primitives to perform conference-related operations such as creating a conference, adding a user to a conference, and removing a media stream from a conference. Such primitives would have an effect on a conference object (which is described by an XML document). For example, the creation of a conference would create a new conference object. The addition of a user would add a new <user> element to the XML document describing the conference object.
A syntactically-oriented protocol would have primitives to operate directly at the XML level. For example, in order to add a user to a conference, the protocol would directly instruct the conference control server to add a <user> element to the XML document describing the conference object.
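The contrast between the two approaches can be illustrated with a small Python sketch in which both kinds of primitive are applied to the same conference object. Everything here is hypothetical: `add_user` stands in for a semantic primitive and `insert_element` for a generic syntactic one; neither corresponds to an operation of any actual XCON protocol.

```python
import xml.etree.ElementTree as ET

NS = "urn:ietf:params:xml:ns:conference-info"


def q(tag):
    """Qualify a tag name with the conference-info namespace."""
    return f"{{{NS}}}{tag}"


def new_conference_object(entity):
    """Create a minimal conference object with an empty <users> element."""
    root = ET.Element(q("conference-info"), {"entity": entity})
    ET.SubElement(root, q("users"))
    return root


def add_user(conference, user_uri):
    """Semantic primitive: the caller states *what* should happen.

    The server decides how the XML document changes as a result.
    """
    users = conference.find(q("users"))
    return ET.SubElement(users, q("user"), {"entity": user_uri})


def insert_element(document, parent_path, fragment):
    """Syntactic primitive: the caller states *which XML* to insert and where.

    The same primitive works for any element, including ones defined later.
    """
    parent = document.find(parent_path)
    parent.append(ET.fromstring(fragment))


if __name__ == "__main__":
    semantic = new_conference_object("sips:conf233@example.com")
    add_user(semantic, "sip:alice@example.com")

    syntactic = new_conference_object("sips:conf233@example.com")
    insert_element(
        syntactic,
        q("users"),
        f'<user xmlns="{NS}" entity="sip:alice@example.com"/>',
    )
    print(ET.tostring(semantic, encoding="unicode"))
    print(ET.tostring(syntactic, encoding="unicode"))
```

Both calls leave the conference object with the same <user> element. The difference lies in the interface: `add_user` is specific to one operation, whereas `insert_element` is generic but exposes the XML structure to the client.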
Both approaches have advantages and disadvantages. A syntactically-oriented protocol may initially be more complex since it would need to provide general XML manipulation mechanisms. On the other hand, it would not need to be extended in order to manipulate new data model elements that may be defined in the future. A semantically-oriented protocol may initially be simpler and, in general, more efficient, but it would need to be extended in order to perform new operations. Specifying policies (e.g., only the moderator can add new users to the conference) seems to be easier if the semantic approach is followed.
The XCON working group started working on an XCAP-based protocol that followed a syntactic approach. However, that protocol was abandoned and, at present, it seems that the conference control protocol to be developed by XCON will follow a semantic approach.
23.3.3 Foci and Notification Service
As in the SIPPING conferencing framework, an XCON focus has a signaling relationship with the user agents in the conference. However, in the XCON framework, a conference can have multiple foci, each one handling a different protocol (e.g., SIP and H.323).
In the SIPPING framework, both the focus and the notification service used SIP and, thus, were part of the same logical entity. The XCON framework separates them into two different logical entities because they can use different protocols.
As discussed earlier, the XCON data model extends the XML-based format used by the SIPPING notification service (which is specified in RFC 4575 [289]). The XCON notification service needs to be able to use this extended format (i.e., the XCON data model) in its notifications. An extension to the SIP event package for conference state (also specified in RFC 4575 [289]) has been defined so that the event package can carry information in the format specified in the XCON data model (this extension is specified in the Internet-Draft “Conference Event Package Data Format Extension for Centralized Conferencing (XCON)” [113]).
23.3.4 Floor Control Server
Floor control is used to manage access to a shared resource. Examples of resources in a conferencing environment are a shared whiteboard, a video stream, and a voice stream. The user that has the floor corresponding to a resource at a given moment is allowed to access the resource. For example, the user that has the floor corresponding to a shared whiteboard is allowed to draw on the whiteboard.
It is important to note the difference between not being allowed to do something and actually being kept from doing it. Let us think of a face-to-face conference where all participants have their own microphone. The conference’s chair will indicate which participant can speak (e.g., to ask a question) at a given time. However, the chair does not need to manage access to the microphones. If the participants are polite enough, they will only talk into their microphones when they are told to by the chair.
However, if participants start talking when it is not their turn, the chair may have to disable all of the microphones except the one of the participant that has the floor at any given time. Therefore, the fact that a conference uses floor control does not imply that floor-control-related decisions are enforced in any way. They may or may not be enforced, depending on the environment.
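The distinction between being granted the floor and being physically prevented from accessing a resource can be captured in a short Python sketch. The `Floor` class below is a hypothetical model written for this discussion: it uses a simple first-come, first-served queue and an `enforce` flag, and it does not implement any actual floor control protocol.

```python
from collections import deque


class Floor:
    """One floor guarding one shared resource (e.g., a microphone channel)."""

    def __init__(self, enforce=False):
        self.holder = None      # user that currently has the floor
        self.queue = deque()    # users waiting for the floor, in request order
        self.enforce = enforce  # whether decisions are actually enforced

    def request(self, user):
        """Ask for the floor; returns True if the user holds it afterwards."""
        if self.holder is None:
            self.holder = user
        elif user != self.holder and user not in self.queue:
            self.queue.append(user)
        return self.holder == user

    def release(self, user):
        """Give up the floor; the next user in the queue (if any) gets it."""
        if user == self.holder:
            self.holder = self.queue.popleft() if self.queue else None

    def is_allowed(self, user):
        """What the chair has decided: only the floor holder may use the resource."""
        return user == self.holder

    def can_access(self, user):
        """What actually happens: without enforcement, anyone can still get through."""
        return self.is_allowed(user) or not self.enforce


if __name__ == "__main__":
    polite = Floor(enforce=False)  # participants are trusted to behave
    strict = Floor(enforce=True)   # other microphones are disabled
    for floor in (polite, strict):
        floor.request("alice")
        floor.request("bob")
        print(floor.is_allowed("bob"), floor.can_access("bob"))
```

In both conferences Bob is not allowed to speak while Alice holds the floor, but only the conference with `enforce=True` actually keeps him from doing so.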