Benefiting from AI and deep learning for video summarization

Author: Divya Jain

Date published: May 9, 2019

The global video market is taking center stage. According to Forbes, over 500 million hours of video are watched on YouTube every day. Google adds that almost 50% of internet users look for videos related to a product or service before visiting a store. Statistics like these show how video content is growing and will remain a mainstream means of sharing information. We are already seeing a shift from copy and text to snapshot stories and visual posts (on Instagram, for instance) for sharing content. Artificial intelligence (AI) is also playing a large role in this shift to video. We can use AI to improve video quality through stabilization, to understand and classify content for editing purposes, or to better deliver and target that content.

AI is also playing a key role in video summarization, the process of shortening a video by selecting keyframes or segments that capture its main points. Summarization has many use cases, one of the most significant being the ability to gauge interest in the content. A flashcard summary can determine how many people will actually watch an entire video. Even a single thumbnail plays a crucial role in determining how many people will click on a video to play it. Beyond driving clicks, video summarization is also necessary for efficient viewing of the material and for adapting video length to different mediums, such as Instagram, Facebook, and others.

Recently, there have been many advances in using deep learning to improve the processing of images. The ability of AI to understand an image’s context has rapidly improved in accuracy. Similar techniques can be used to understand videos too, but this is a much more complex process. A video is not just a large collection of frames or images; it is multi-dimensional, combining audio, motion, and a time-series dimension. Each of these dimensions is key to understanding a video, and depending on what the summarization is targeting, different dimensions can be crucial.

The anatomy of AI video summarization

Video summarization can be categorized into two broad areas of machine learning: supervised and unsupervised. Supervised summarization entails learning patterns from previously annotated videos and examples. This works very well for videos where a pattern exists, like sporting events. For these videos, we can annotate some sequences and learn from them. However, the biggest challenge with supervised learning is the labeled data. It is costly to create these well-defined datasets. Labeling data requires domain knowledge, and the approach does not work well for the wide variety of content present on the web.
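As a rough illustration of the supervised idea, the sketch below trains a tiny logistic-regression scorer on annotated frames and then keeps the highest-scoring frames of a new video. Everything specific here is an assumption made for the example: the two per-frame features (a motion score and an audio-energy score), the toy labels, and the function names `train_frame_scorer` and `summarize`. A real system would use far richer features and a deep model.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Predicted probability that a frame belongs in the summary."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_frame_scorer(features: Sequence[Sequence[float]],
                       labels: Sequence[int],
                       epochs: int = 500,
                       lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            error = score(x, weights, bias) - y  # gradient of the log loss
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def summarize(features: Sequence[Sequence[float]],
              weights: Sequence[float],
              bias: float,
              k: int) -> List[int]:
    """Return the indices of the k highest-scoring frames, in temporal order."""
    ranked = sorted(range(len(features)),
                    key=lambda i: score(features[i], weights, bias),
                    reverse=True)
    return sorted(ranked[:k])


# Hypothetical annotated frames: [motion, audio energy], label 1 = highlight.
train_x = [[0.9, 0.8], [0.1, 0.2], [0.8, 0.9], [0.2, 0.1]]
train_y = [1, 0, 1, 0]
weights, bias = train_frame_scorer(train_x, train_y)

# Frames of a new, unlabeled video described with the same two features.
new_video = [[0.1, 0.1], [0.9, 0.9], [0.2, 0.3], [0.85, 0.7]]
print(summarize(new_video, weights, bias, k=2))
```

The dependence on `train_y` is the cost discussed above: every training frame needs a human judgment before the scorer can learn anything.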

The other form of summarization is unsupervised, where a small number of frames is selected from the original video through change detection. Low-level features such as color, motion, and texture have commonly been used to create histograms and clusters that identify similar frames within a video. A few frames are then selected for the summary based on the information they convey from the original video. These techniques work best when the video has distinct visual content, for example, a video taken across the different days of a vacation. However, these summaries often lack context and come out as disjointed images.
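The histogram-based change detection described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production method: each frame is represented as a flat sequence of 0-255 grayscale values, the histogram has 8 bins, and the threshold of 0.5 is arbitrary. The names `histogram`, `l1_distance`, and `select_keyframes` are hypothetical.

```python
from typing import List, Sequence

# Assumption: a frame is a flat sequence of 0-255 grayscale pixel values.
Frame = Sequence[int]


def histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Normalized intensity histogram of one frame."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(frame) or 1  # avoid dividing by zero on an empty frame
    return [c / total for c in counts]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences: 0 for identical histograms, 2 for disjoint ones."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_keyframes(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Keep the first frame, then every frame that differs enough from the last keyframe."""
    if not frames:
        return []
    keyframes = [0]
    last_hist = histogram(frames[0])
    for index in range(1, len(frames)):
        current = histogram(frames[index])
        if l1_distance(current, last_hist) > threshold:
            keyframes.append(index)
            last_hist = current
    return keyframes


# Toy video: three dark frames, three bright frames, then two mid-gray frames.
video = [[10] * 16] * 3 + [[240] * 16] * 3 + [[128] * 16] * 2
print(select_keyframes(video))  # [0, 3, 6]
```

Because the selection looks only at pixel statistics, it returns one frame per visual change and nothing about why the change matters, which is the loss of context noted above.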

Recent forms of deep learning look very promising in addressing the above-mentioned challenges. They lend themselves to much more effective creation of video summaries. While supervised deep learning techniques popularized the process, unsupervised techniques such as generative adversarial networks (GANs) and reinforcement learning are showing great promise, offering advantages that are making them front-runners in video summarization.

The power of emerging unsupervised deep learning techniques in video summarization

For videos that don’t adhere to any pattern and are completely different from each other, GANs work very well. A GAN has two neural nets:

A generator that tries to mimic the real data.
A discriminator that tries to learn whether the data it is shown is real or generated.

This helps GANs learn the data distribution very effectively and create data that is very difficult to distinguish from the original dataset. In this case, each video can be treated as a dataset, with the GAN producing a subset of frames that is most representative of the given video. This generates unique summaries for videos while preserving the context and meaning of the videos themselves. Marketers can use this technique to create shorter versions of full-length ads or campaigns, tailored to specific devices and aimed at the right audience. Creative artists can also use it to give a preview of their upcoming releases.
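For reference, the adversarial training described above is usually written as a two-player minimax game. In the standard GAN formulation, the network that produces data is called the generator G and the network that judges it is called the discriminator D. The formula below is that general objective, not a summarization-specific one; how a given summarizer maps frames and summaries onto x and G(z) varies by method and is not specified here.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

Here x is a real sample, z is the generator's input, D(·) is the discriminator's estimated probability that its input is real, and G(z) is the generated sample. The discriminator pushes V up by telling real from generated data, while the generator pushes V down by making its output harder to tell apart from real data.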

For videos that have a common structure, like sporting events, reinforcement learning can be more effective than supervised learning because it does not require labeled data. Here, the neural nets learn which frames to choose based on a reward function. The reward can be derived from previous summaries, for example from whether certain frames were watched or skipped. Reward functions can also be defined so that no previous information is required, using criteria such as frame diversity and representativeness or frame category classification. Campaign managers can employ such techniques to create more watchable and memorable summaries from past experience and to engage with their customers effectively.
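To make the idea of a label-free reward concrete, here is one plausible way to score a candidate summary on diversity and representativeness. The exact formulas are assumptions chosen for illustration: diversity is the mean pairwise cosine dissimilarity among the selected frames, and representativeness is the exponential of the negative mean distance from each frame to its nearest selected frame. The per-frame feature vectors and all function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def diversity_reward(features: Sequence[Vector], selected: Sequence[int]) -> float:
    """Mean pairwise dissimilarity (1 - cosine similarity) among selected frames."""
    if len(selected) < 2:
        return 0.0
    total, pairs = 0.0, 0
    for i in selected:
        for j in selected:
            if i != j:
                total += 1.0 - cosine_similarity(features[i], features[j])
                pairs += 1
    return total / pairs


def representativeness_reward(features: Sequence[Vector], selected: Sequence[int]) -> float:
    """exp(-mean distance from every frame to its nearest selected frame)."""
    if not selected or not features:
        return 0.0
    mean_min = sum(
        min(euclidean(frame, features[s]) for s in selected) for frame in features
    ) / len(features)
    return math.exp(-mean_min)


def summary_reward(features: Sequence[Vector], selected: Sequence[int]) -> float:
    """Combined reward that a policy would try to maximize."""
    return diversity_reward(features, selected) + representativeness_reward(features, selected)


# Toy video with two visually distinct scenes, two frames each.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(summary_reward(frames, [0, 2]))  # one frame from each scene
print(summary_reward(frames, [0, 1]))  # two frames from the same scene
```

A summary that covers both scenes earns a higher reward than one that repeats a single scene, and no human annotation is involved. In a reinforcement learning setup, a frame-selection policy would be updated to favor selections with higher reward.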

These new unsupervised techniques are just the start of a new era in deep learning technology when it comes to video summarization. Many advances will be made in the near future to create and optimize the best summaries based on the audience, delivery medium, and intent of summarization. Together with efforts across the industry, we’ll make video summarization highly scalable, reliable, and incredibly efficient.

Divya Jain is Director of Machine Learning at Adobe Sensei. She can be found on Twitter @divyajain1.
