Monday, February 20, 2012

Mobile video QoE part I: Subjective measurement


As video traffic continues to flood many wireless networks, over 80 mobile network operators have turned towards video optimization as a means to reduce the costs associated with growing their capacity for video traffic.
In many of the trials and deployments I have been involved in, carriers were at a loss when it came to comparing one vendor or technology against another. Lately, a few specialized vendors have been offering video QoE (Quality of Experience) tools to measure the quality of the video transmitted over wireless networks. In some cases, the video optimization vendors themselves have also started to package measurement capabilities with their tools to illustrate the quality of their encoding.
In the next few posts, and in more detail in my report "Video Optimization 2012", I examine the challenges and benefits of measuring video QoE in wireless networks, together with the most popular methods and their limitations.
Video QoE subjective measurement
Video quality is a very subjective matter. There is a whole body of science dedicated to providing an objective measure for a subjective quality. The attempt here is to rationalize the differences in quality between two videos through a mathematical measurement. These are called objective measurements and will be addressed in my next posts. Subjective measurement, on the other hand, is a more reliable means of determining a video's quality. It is also the most expensive and most time-consuming technique when performed properly.
For video optimization, a subjective measurement usually requires a focus group that is shown several versions of a video at different quality levels (read: encodings). Each viewer's individual opinion is recorded in a templatized feedback form and averaged. For this method to work, all users need to see the same videos, in the same sequence, under the same conditions. This means that if the videos are to be streamed over a wireless network, it should be in a controlled environment, so that the same level of QoS is served for the same videos. You can then vary the protocol, for instance by having users compare the original video with a modified version, both played at the same time on the same device.
The averaged opinion, the Mean Opinion Score (MOS), of each video is then used to rank the different versions. In the case of video optimization, we can imagine an original video encoded at 2 Mbps, then four versions provided by each vendor at 1 Mbps, 750 kbps, 500 kbps and 250 kbps. Each subject in the focus group then rates each version from each vendor from 1 to 5, for instance.
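As an illustration, here is a minimal Python sketch of how such ratings could be aggregated into a MOS per vendor and bit rate. The vendor names, bit rates and individual scores below are hypothetical placeholders, not real test results.

```python
# Hypothetical example: computing Mean Opinion Scores (MOS) per vendor
# and per bit rate from individual 1-5 ratings collected in a focus group.
# All names and numbers are illustrative placeholders.
from collections import defaultdict
from statistics import mean

# (vendor, bitrate_kbps) -> list of 1-5 scores, one per viewer
ratings = {
    ("vendor_a", 1000): [4, 5, 4, 4, 5],
    ("vendor_a", 500):  [2, 3, 2, 2, 3],
    ("vendor_b", 1000): [4, 4, 4, 5, 4],
    ("vendor_b", 500):  [4, 3, 4, 4, 3],
}

# MOS is simply the average of the individual scores
mos = {key: mean(scores) for key, scores in ratings.items()}

# Rank vendors at each bit rate by their MOS, highest first
by_bitrate = defaultdict(list)
for (vendor, bitrate), score in mos.items():
    by_bitrate[bitrate].append((vendor, score))

for bitrate, results in sorted(by_bitrate.items(), reverse=True):
    ranked = sorted(results, key=lambda x: x[1], reverse=True)
    print(f"{bitrate} kbps:", ", ".join(f"{v}={s:.1f}" for v, s in ranked))
```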
The environment must be strictly controlled for the results to be meaningful. The variables must be the same for each vendor: all performing transcoding in real time or all offline, the same network conditions for all playback streams and, of course, the same devices and the same group of users.
You can easily understand that this method can be time-consuming and costly: network equipment and lab time must be reserved, network QoS must be controlled, the focus group must be available for the duration, etc.
In that example, the carrier would have the corresponding versions from each vendor compared in parallel for the computation of the MOS. The result could be something like this:
The size of the sample (the number of users in the focus group) and how tightly the environment is controlled can dramatically affect the results, and it is not rare to find aberrant results, as in the example above, where vendor "a" sees its score increase from version 2 to version 3.
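To give a sense of how sample size drives the reliability of a MOS, here is a small sketch, again with placeholder scores, that computes a rough 95% confidence interval around a mean score. With only a handful of viewers the interval is wide enough to explain such crossovers between versions.

```python
# Rough 95% confidence interval around a MOS (normal approximation),
# to illustrate how a small focus group leaves room for "aberrant"
# crossovers between versions. Scores are hypothetical placeholders.
from statistics import mean, stdev
from math import sqrt

def mos_with_ci(scores, z=1.96):
    m = mean(scores)
    half_width = z * stdev(scores) / sqrt(len(scores))
    return m, half_width

small_panel = [3, 4, 2, 4, 3]       # 5 viewers
large_panel = small_panel * 6       # 30 viewers with the same spread

for label, panel in [("5 viewers", small_panel), ("30 viewers", large_panel)]:
    m, hw = mos_with_ci(panel)
    print(f"{label}: MOS = {m:.2f} +/- {hw:.2f}")
```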
If correctly executed, this test can track the relative quality of each vendor at different levels of optimization. In this case, you can see that vendor "a" achieves a high level of perceived quality at medium-to-high bit rates but performs poorly at lower bit rates. Vendor "b" shows little degradation as the encoding bit rate decreases, while vendors "c" and "d" show near-linear degradation as the bit rate drops.
In every case, the test must be performed in a controlled environment to be valid. Results will vary, sometimes greatly, from one vendor to another, and sometimes for the same vendor at different bit rates, so a video expert is necessary to create the testing protocol, evaluate the vendors' setups, analyse the results and interpret the scores. As you can see, this is not an easy task, and rare are the carriers who have successfully performed subjective analysis with meaningful results for vendor evaluation. This is why, by and large, vendors and carriers have started to look at automated tools to evaluate existing video quality in a given network, to compare different vendors and technologies, and to measure ongoing perceived quality degradation due to network congestion or destructive video optimization. This will be the subject of my next posts.
