Call Quality Assessment

Today, we'll take a brief look at quality assessment of media (audio and video) and how we can use it to measure the quality of the media being transmitted to our users.

This post draws on multiple ITU documents and related studies.

Why?

Quality assessment is important for several reasons:

  • It lets us measure the quality of the content we deliver to our users
  • It lets us benchmark the quality of multiple products against each other
  • It serves as a guideline for comparing improvements or regressions in the product being developed

By developing good quality assessment metrics and systems, we can work on providing a better product to our users while being able to verify what it actually delivers.

How?

Quality assessment of media aims to predict the quality of a given piece of media using a defined set of metrics and algorithms. The result is a MOS (Mean Opinion Score), a value on a 5-level scale that predicts how a real human subject would rate the media. There are many different types of MOS values, each designed with different qualities in mind.
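As a minimal sketch (the function name is illustrative), a MOS is simply the arithmetic mean of individual opinion scores on the 1-5 scale:

```python
def mean_opinion_score(ratings):
    """Average individual 1-5 opinion scores into a MOS."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    return sum(ratings) / len(ratings)

# Five subjects rate the same clip:
print(mean_opinion_score([4, 5, 3, 4, 4]))  # 4.0
```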

As shown in the image above, the original video goes through a distortion channel (such as WebRTC, a streaming service, or some other form of network transmission), and the result is fed into the quality assessment, which generates a quality score.

Some quality assessment methodologies use both the original signal and the distorted signal to produce a quality score.

Type of Quality Assessments

Because there are many different kinds of media (such as real-time streams, conversational media, and video streaming services), different quality assessment methods are used.

Perceptual vs. Error sensitivity

Perceptual vs. Error Sensitivity Model

Perceptual and error-sensitivity quality assessment models differ in how they predict the degradation in a given piece of media.

A perceptual model uses properties of human audio-visual perception to predict the quality of the media the way an actual human subject would perceive it.

In contrast, an error-sensitivity model uses measured error signals in the media to predict its quality more or less linearly. Examples of error-sensitivity measures are signal-to-noise ratio, packet loss, and pixel distortion.

Because perceptual quality assessment models use human perceptual attributes to model the perceived quality of the media, they are often more accurate than error-sensitivity models.

In the image above, the error-sensitivity model assigns the same quality level to all three rows of images, while the perceptual model assigns quality levels in ascending order, which matches how we actually perceive the images.
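A toy illustration of this difference, using 1-D "rows of pixels" (the numbers are made up): two distortions with identical mean squared error can look very different to a viewer, yet a pure error-sensitivity metric cannot tell them apart.

```python
def mse(a, b):
    """Mean squared error between two equal-length pixel rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

original = [100] * 8
# Mild noise spread evenly over every pixel:
uniform_noise = [104, 96, 104, 96, 104, 96, 104, 96]
# The same total error concentrated in one visible spot:
local_blotch = [100, 100, 100, 108, 92, 100, 100, 100]

print(mse(original, uniform_noise))  # 16.0
print(mse(original, local_blotch))   # 16.0 -- same score, different appearance
```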

Subjective vs. Objective

Another distinction in quality assessment is between subjective and objective quality assessment.

Subjective quality assessment is performed by real people rating the media according to their own judgment. Because everyone has different standards, subjective assessments can produce different outputs for the same input.

Objective quality assessment predicts the quality of the media using a standardized model, such as an algorithm or a machine learning model. Unlike subjective assessment, it produces the same output for the same input.

Subjective quality assessment is costly and time-consuming because it requires human effort, so it is usually used when building and validating a quality assessment model. Objective quality assessment is simpler and easier to run, so it is often used for iterative testing and in production environments.

Full reference vs. No reference vs. Reduced reference

Quality assessment is done using one of three methods:

FR vs. NR vs. RR
  • Full reference: Uses both the original signal and the distorted signal to produce a quality score. Because it can directly compare the two signals, it can use more accurate and detailed models. However, due to privacy and performance issues, it usually isn't used in production environments.
  • No reference: Uses only the distorted signal to produce a quality score. Because it has no original signal to compare against, it can only use simpler and possibly less accurate models. No-reference methodologies are still under active development and lack the accuracy to be used widely.
  • Reduced reference: Uses the distorted signal plus some features of the original signal to produce a quality score. Because it uses only partial information from the original, it can avoid the privacy and performance issues of full-reference models. For example, a model could extract just a few details about each frame of the original media and use them to improve the accuracy of the quality score. Like no-reference models, reduced-reference models are still maturing.

Most standardized models are full-reference models.
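In code, the difference between the three families comes down to what each scorer is allowed to see. The toy scorers below are not real metrics, just a sketch of the three interfaces (all names and the crude error-to-score mappings are made up):

```python
def fr_score(reference, degraded):
    """Full reference: compares the two signals sample by sample."""
    err = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    return max(1.0, 5.0 - err)                 # crude mapping to a 1-5 scale

def nr_score(degraded):
    """No reference: can only inspect the degraded signal itself."""
    mean = sum(degraded) / len(degraded)
    roughness = sum(abs(x - mean) for x in degraded) / len(degraded)
    return max(1.0, 5.0 - roughness)

def extract_rr_features(reference):
    """Sender side: keep only a tiny summary of the reference (here, its mean)."""
    return sum(reference) / len(reference)

def rr_score(ref_mean, degraded):
    """Reduced reference: degraded signal plus the small reference summary."""
    mean = sum(degraded) / len(degraded)
    return max(1.0, 5.0 - abs(mean - ref_mean))
```

Note that in the reduced-reference case only the output of `extract_rr_features` (a few numbers) needs to leave the sender, which is what sidesteps the privacy and bandwidth cost of shipping the full original.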

MOS Score

Mean Opinion Score

| MOS | Quality   | Impairment                   |
|-----|-----------|------------------------------|
| 5   | Excellent | Imperceptible                |
| 4   | Good      | Perceptible but not annoying |
| 3   | Fair      | Slightly annoying            |
| 2   | Poor      | Annoying                     |
| 1   | Bad       | Very annoying                |

Mean Opinion Score is a standardized scale used to predict the quality of a media. It ranges from 1 to 5, with 1 being the worst and 5 being the best.

MOS comes in many different types depending on the type of media and the use case being assessed.

Audio Quality

Now that we've covered the basics of quality assessment and its different types, let's look at audio quality assessment specifically.

When conducting an audio quality assessment, the methodology varies depending on the type of audio being assessed.

  • Speech audio: Uses frequencies up to 8 kHz, the range in which speech normally takes place.
  • General audio: Uses the full audible range, with sampling rates up to 48 kHz.

Psychoacoustic Model

How we perceive audio cannot be described on a simple linear scale. Frequencies also cannot be compared directly, since external factors like noise can distort the signals.

So, to model how people actually perceive audio, we must model the human auditory system as a psychoacoustic model. There are many ways to achieve this; FFT (Fast Fourier Transform) based and filter-bank based ear models are two examples.

The following are a few of the strategies such models use:

  • applying a different response to each frequency
  • band-pass filtering and smoothing
  • modeling the reactions that happen inside an actual ear

There are many ways to model the human auditory system, and many other techniques are used to improve these models.
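As a very rough sketch of the first strategy above (a different response per frequency), an FFT-based model moves the signal into the frequency domain and weights each bin by how sensitive the ear is to that frequency. The weighting curve below is invented for illustration; real models use measured data such as equal-loudness contours.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitude spectrum via a naive DFT (use a real FFT library in practice)."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def ear_weight(freq_hz):
    """Toy sensitivity curve peaking around 2 kHz (illustrative only)."""
    return math.exp(-(math.log10(freq_hz + 1.0) - math.log10(2000.0)) ** 2)

def weighted_spectrum(samples, sample_rate):
    """Spectrum after applying the frequency-dependent ear weighting."""
    bin_hz = sample_rate / len(samples)
    return [m * ear_weight(k * bin_hz)
            for k, m in enumerate(dft_magnitudes(samples))]
```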

This topic is too broad to cover in a blog post. For an implementation of such methods, take a look at Google's ViSQOL.

Video Quality

Now, let's look at video quality assessment.

Similar to audio, video quality assessment is done either by modeling the human visual system or by checking for literal errors in the pixels.

While images used to be compared as a whole, current methodologies rely more on how humans perceive objects structurally. Instead of comparing every pixel, the more important parts of the image (such as the center of focus) are weighted more heavily.

  • Perceptual: Models the human visual system
  • Error sensitivity: Uses basic signals such as mean squared error or signal to noise ratio.

The most significant difference between the two is that perceptual models check for relative errors while error-sensitivity models check for absolute errors. Unlike absolute error, relative error takes additional features, such as the image's focus and the amount of light, into account.

As media goes through compression, decompression, and network transmission, a number of side effects can appear, such as jitter, timing issues, temporal effects, masking, and compression artifacts. These factors must be taken into account, and their effect on how humans perceive the image must be modeled correctly.

A few examples of existing video quality assessment methodologies are:

  • VMAF
  • VIF
  • SSIM
  • PSNR
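Of these, PSNR is simple enough to sketch in a few lines. It is a full-reference, error-sensitivity metric: the mean squared error between the two images, expressed in decibels relative to the peak pixel value.

```python
import math

def psnr(reference, degraded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-size images,
    given here as flat lists of pixel values."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf            # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

reference = [52, 55, 61, 59]
degraded = [53, 55, 61, 59]        # one pixel off by one level
print(round(psnr(reference, degraded), 1))  # 54.2
```

Higher is better; for 8-bit images, values above roughly 40 dB usually indicate differences that are hard to see.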

Unlike static images, different types of video must be tested differently. Depending on the purpose of the video (gaming, conferencing, etc.), people focus on different parts of it, and different factors affect the perceived quality. For example, while a smooth frame rate is not particularly important for a conference video, it is very important for gaming videos.

Additionally, because video is a continuous flow of images, its flow and naturalness must be tested as well. Due to this complexity, there is not yet a standardized methodology for objective video quality assessment.

Audiovisual Quality

Since end users listen to audio and watch video at the same time, we must also be able to predict the audiovisual quality of the media. An audiovisual score can be produced either by combining separate audio and video scores or by measuring both together.

However, there is no standardized method for objectively predicting the audiovisual quality of media content. One of the difficulties is the need to correctly synchronize the audio and video.

Validation Process

Objective quality assessment methodologies need to be validated against corresponding subjective quality assessments, checking that their results agree.

Subjective quality assessments can show biased results depending on the subject's mood, audiovisual abilities, preference for the media, surrounding environment, and so on. To overcome these biases, methods such as Single Stimulus Continuous Quality Evaluation and Double Stimulus Continuous Quality Scale are used.

Single Stimulus Continuous Quality Evaluation
  • Single Stimulus Continuous Quality Evaluation: plays one stimulus (one piece of media) continuously and allows the subject to freely adjust the quality rating throughout playback. This yields more accurate data within each time interval.
Double Stimulus Continuous Quality Scale
  • Double Stimulus Continuous Quality Scale: plays two stimuli (two pieces of media) in a mixed sequence. Presenting the original and the distorted version in a mixed sequence keeps the subject from fixating on the previously played media, allowing for less biased opinions of both the original and distorted signals.

Even with such efforts, the quality score may change depending on the participants and their viewing conditions. Still, these methods meaningfully improve the subjective data points.
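One common way to quantify that remaining spread is to report the MOS together with a 95% confidence interval over the individual ratings. A sketch under a normal approximation (for small panels, a Student-t multiplier should replace 1.96):

```python
import math

def mos_with_ci(ratings, z=1.96):
    """Mean opinion score plus a 95% confidence half-width
    (normal approximation over the per-subject ratings)."""
    n = len(ratings)
    mean = sum(ratings) / n
    variance = sum((r - mean) ** 2 for r in ratings) / (n - 1)
    half_width = z * math.sqrt(variance / n)
    return mean, half_width

ratings = [4, 5, 3, 4, 4, 5, 3, 4]
mos, ci = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")  # MOS = 4.00 +/- 0.52
```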


This was a general look at media quality assessment. If you'd like to learn more, take a look at the ITU documents and studies on quality assessment.

Reference

For more ITU-T Recommendations, please see https://www.itu.int/rec/T-REC-P/en