Wednesday, June 18, 2014

Are we ready for experience assurance? part II

Many vendors’ reporting capabilities are just fine when it comes to troubleshooting issues associated with connectivity or health of their system. Their capability to infer, beyond observation of their own system, the health of a connection or the network is oftentimes limited. 

Analytics, by definition require a large dataset that is ideally covering several systems and elements to provide correlation and pattern recognition on otherwise seemingly random events. With a complex environment like the mobile network, it is extremely difficult to understand what a user’s experience is on their phone. There are means to extrapolate and infer the state of a connection, a cell, a service by looking at network connections fluctuations. 

Traffic management vendors routinely report on the state of a session by measuring the TCP connection and its changes. Being able to associate with that session the device type, time of the day, location, service being used is good but a far cry from analytics.
Most systems will be able to detect if a connection went wrong and a user had a sub-par experience. Being able to tell why, is where analytics’ value is. Being able to prevent it is big data territory.
So what is experience assurance? How does (should) it work?

For instance, a client calls the call center to complain about a poor video experience. The video was sluggish to start with, started 7 seconds after pressing play and started buffering after 15 seconds of playback.
A DPI engine would be able to identify whether TCP and HTTP traffic were running efficiently at the time of the connection.
A probe in the RAN would be able to report a congestion event in a specific location.
A video reporting engine would be able to look at whether the definition and encoding of the video was compatible with the network speed at the time.
The media player in the device would be able to report whether there was enough resources locally to decode, buffer, process and play the video.
A video gateway should be able to detect the connection impairment in real time and to provide the means to correct or elegantly notify of the impending state of the video before the customer experiences a negative QoE.
A big data analytics platform should be able to point out that the poor experience is the result of a congestion in that cell that occurs nearly daily at the same time because the antenna serving that cell is in an area where there is a train station and every day the rush hour brings throngs of people connecting to that cell at roughly the same time.
An experience assurance framework would be able to provide feedback instruction to the policy framework, forcing download, emails and non-real-time data traffic to be delayed to account for short burst of video usage until the congestion passes. It should also allow to decide what the minimum level of quality should be for video and data traffic, in term of delivery, encoding speed, picture quality, start up time, etc… and proactively manage the video traffic to that target when the network “knows” that congestion is likely

Experience assurance is a concept that is making its debut when it comes to data and video services. To be effective, a proper solution should ideally be able to gather real time events from the RAN, the core, the content, the service provider and the device and to decide in real-time what is the nature of the potential impairment, what are the possible course of actions to reduce or negate the impairment or what are the means to notify the user of a sub-optimal experience. No single vendor, to my knowledge is able to achieve this use case, either on its own or through partnerships at this point in time. The technology vendors are too specialized, the elements involved in the delivery and management of data traffic too loosely integrated to offer real experience assurance at this point in time.

Vendors who want to provide experience assurance should first focus on the data. Most systems create event or call logs, registering hundreds of parameters every session, every second. Properly representing what is happening on the platform itself is quite difficult. It is an exercise in interpretation and representation of what is relevant and actionable and what is merely interesting. This is an exercise in small data. Understanding relevance and discriminating good data from over engineered logs is key.

A good experience assurance solution must rely on a strong detection, analytics and traffic management solution. When it comes to video, this means a video gateway that is able to perform deep media inspection and to extract data points that can be exported into a reporting engine. The data exported cannot be just a dump of every event or every session. The reporting engine is only going to be as good as the quality of the data that is fed into it. This is why traffic management products must be designed with analytics in mind from the ground up if they are to be efficiently integrated within an experience assurance framework.

1 comment:

Martin Geddes said...

Given that all operators and vendors use measurements that are single point averages, they are unable to spatially isolate issues, identify true cause and effect, and distinguish static structural problems from dynamic load ones.

The answer is to move to multi-point distribution measurements.