Friday, September 28, 2012

How to weather signalling storms

I was struck a few months back when I heard an anecdote from Telecom Italia about a signalling storm in their network that caused unanticipated outages. After investigation, the operator found that the Android launch of Angry Birds differed from the iOS version in one major respect: it was a free app monetized through advertising. Ads were requested and served between each level (or retry).
If you are like me, you can easily go through 10 or more levels (mmmh... retries) in a minute. Each one of these created a request to the ad server, which in turn generated queries over Diameter to the subscriber database, location and charging engine, resulting in a 351% increase in Diameter traffic.
The traffic generated by one chatty app brought the network to its knees within days of its launch.
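
To get a feel for how quickly this adds up, here is a back-of-the-envelope sketch in Python. The three lookups per ad request mirror the subscriber database, location and charging queries mentioned above. The two messages per lookup (one request, one answer) and the player counts are purely illustrative assumptions, not figures from Telecom Italia.

```python
# Rough estimate of the Diameter load caused by one ad-funded app.
# Every constant below is an illustrative assumption, not a measurement.

AD_REQUESTS_PER_MINUTE = 10   # one ad per level or retry, about 10 per minute
LOOKUPS_PER_AD_REQUEST = 3    # subscriber database, location, charging engine
MESSAGES_PER_LOOKUP = 2       # assumed: one request plus one answer


def diameter_messages_per_second(active_players: int) -> float:
    """Return the estimated Diameter messages per second for a player count."""
    per_player_per_minute = (
        AD_REQUESTS_PER_MINUTE * LOOKUPS_PER_AD_REQUEST * MESSAGES_PER_LOOKUP
    )
    return active_players * per_player_per_minute / 60.0


if __name__ == "__main__":
    # Hypothetical numbers of simultaneously active players.
    for players in (1_000, 100_000, 1_000_000):
        rate = diameter_messages_per_second(players)
        print(f"{players:>9,} players -> {rate:>12,.0f} Diameter messages/s")
```

Under these assumptions each active player generates roughly one Diameter message per second, so the signalling load scales directly with the popularity of the app rather than with the volume of data it moves.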



As video traffic congestion becomes more prevalent and operators start to measure subscriber satisfaction in that area, we have seen several solutions emerge (video optimization, RAN optimization, policy management, HSPA+ and LTE upgrades, new pricing models...).
Signalling congestion, by contrast, remains an emerging issue. I sat down yesterday with Tekelec's Director of Strategic Marketing, Joanne Steinberg, to discuss the topic and what operators should do about it.
Tekelec recently (September 2012) released its LTE Diameter Signalling Index. The report projects that Diameter traffic will grow at a 252% CAGR through 2016, from 800,000 to 46 million messages per second globally. This is due to a radical change in application behavior, as well as the new pricing and business models put in place by operators. Policy management, QoS management, metered charging, two-sided business models and M2M traffic are some of the culprits highlighted in the report.

Diameter is a protocol that was originally designed as a successor to RADIUS, for the main purposes of Authentication, Authorization and Accounting (AAA). Real-time charging and the evolution to IN drove its implementation. The protocol was created to be lighter than RADIUS while remaining extensible, so that a variety of vendor-specific fields (attribute-value pairs, or AVPs) could be added for specific uses. This extensibility was the main criterion for its adoption as the protocol of choice for Policy and Charging functions.
A victim of its own success, the protocol is now used in LTE for a variety of tasks, ranging from querying subscriber databases (HSS) and user balances to transactional charging and policy traffic.

Tekelec's signalling solutions, together with its policy product line (inherited from the Camiant acquisition), provide a variety of ways to handle the increasing load of Diameter signalling traffic, and the company is proposing its "Diameter Signaling Router" as a means to manage, throttle, load balance and route Diameter traffic.
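
To make "throttle" and "load balance" a little more concrete, here is a minimal Python sketch of the two ideas: a token bucket that caps the admitted message rate, and a round-robin choice among downstream peers. The class names, peer names and rate values are hypothetical, and this is a toy model under those assumptions, not a description of how Tekelec's product works.

```python
# Toy model of two functions of a Diameter routing node: throttling and
# load balancing. All names and numbers are hypothetical illustrations.
from itertools import cycle
from typing import Iterable, Optional


class TokenBucket:
    """Admit at most `rate` messages per second, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)   # start with a full bucket
        self.last = 0.0              # timestamp of the previous call

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the bucket size.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class SignallingRouter:
    """Throttle incoming messages, then spread them over peers in turn."""

    def __init__(self, peers: Iterable[str], rate: float, burst: int) -> None:
        self._peers = cycle(list(peers))
        self._bucket = TokenBucket(rate, burst)

    def route(self, now: float) -> Optional[str]:
        """Return the chosen peer, or None if the message is throttled."""
        if not self._bucket.allow(now):
            return None
        return next(self._peers)


if __name__ == "__main__":
    router = SignallingRouter(["hss-1", "hss-2"], rate=2.0, burst=2)
    # Timestamps in seconds; three messages arrive at once, then two later.
    for t in (0.0, 0.0, 0.0, 0.5, 1.0):
        print(t, router.route(t))
```

In this sketch the third simultaneous message is rejected because the burst allowance of two is exhausted, while the later messages are admitted as tokens refill. A real node would typically answer a rejected request with an error such as DIAMETER_TOO_BUSY rather than drop it silently, and would route on the message's application and destination rather than by simple rotation.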

In my opinion, data browsing is less predictable than voice or messaging traffic when it comes to signalling. In the past, one message at the establishment of the session, one at the end and optionally a few interim updates were sufficient. Today, sophisticated business models and price plans require far more signalling traffic. Additionally, Diameter is starting to extend outside the core packet network, towards the RAN (for RAN optimization) and towards the internet (for OTT two-sided business models). OTT content and app providers do not understand how mobile networks function, and we cannot expect device and app signalling traffic to self-regulate. While some 3GPP effort is being spent on evaluating new architectures and rules such as fast dormancy, the problem is likely to grow faster than the standards' capacity to contain it. I believe that Diameter management and planning are necessary for network operators who are moving away from all-you-can-eat data plans towards policy-driven traffic and charging models.
