
Source: Baidu Wenku; editor: 中财网; time: 2024/05/07 14:32:06

Fundamentals of Embedded Video Processing (Part 1 of a 5-part series)


By David Katz and Rick Gentile, Analog Devices, Inc. (ADI)

 

As consumers, we’re intimately familiar with video systems in many embodiments.  However, from the embedded developer’s viewpoint, video represents a tangled web of different resolutions, formats, standards, sources and displays.   


 

In this series, we will strive to untangle some of this intricate web, focusing on the most common circumstances you’re likely to face in today’s media processing systems.  After reviewing the basics of video, we will discuss some common scenarios you may encounter in embedded video design and provide some tips and tricks for dealing with challenging video design issues.  


 

Human Visual Perception  


Let’s start by discussing a little physiology.  As we’ll see, understanding how our eyes work has paved an important path in the evolution of video and imaging.


Our eyes contain two types of vision cells: rods and cones. Rods are primarily sensitive to light intensity as opposed to color, and they give us night vision capability. Cones, on the other hand, are not tuned primarily to intensity; instead, they are sensitive to wavelengths of light between 400 nm (violet) and 770 nm (red). Thus, the cones provide the foundation for our color perception.


There are three types of cones, each with a different pigment that is most sensitive to red, green or blue energy, although the three responses overlap considerably. Taken together, the response of our cones peaks in the green region, at around 555 nm. This is why, as we'll see, we can make compromises in LCD displays by assigning the green channel more bits of resolution than the red or blue channels.

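The article does not name a specific pixel format, but a widely used example of giving green the extra bit is 16-bit RGB565 (5 bits red, 6 bits green, 5 bits blue). The sketch below, with hypothetical helper names, packs 8-bit channels into that layout and expands them back:

```python
def pack_rgb565(r: int, g: int, b: int) -> int:
    """Pack 8-bit R, G, B into a 16-bit RGB565 word (5/6/5 bits)."""
    for value in (r, g, b):
        if not 0 <= value <= 255:
            raise ValueError("channel values must be in 0..255")
    # Keep the most significant bits: green keeps 6, red and blue keep 5.
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def unpack_rgb565(word: int) -> tuple[int, int, int]:
    """Expand an RGB565 word back to 8-bit channels by bit replication."""
    r5 = (word >> 11) & 0x1F
    g6 = (word >> 5) & 0x3F
    b5 = word & 0x1F
    # Copy the top bits into the vacated low bits so full scale maps to 255.
    return ((r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2))
```

With 6 bits, green gets 64 levels against 32 for red and blue, spending the extra bit where the eye is most sensitive.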

The discovery of the Red, Green and Blue cones ties into the development of the trichromatic color theory, which states that almost any color of light can be conveyed by combining proportions of monochromatic Red, Green and Blue wavelengths. 


Because our eyes have many more rods than cones, they are more sensitive to intensity than to actual color. This allows us to save bandwidth in video and image representations by subsampling the color information.

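As a minimal sketch of chroma subsampling, assuming 8-bit Y, Cb and Cr samples and simple pairwise averaging as the filter (real systems often use longer filters), halving the horizontal color resolution, an arrangement usually called 4:2:2, cuts an SD frame from 3 to 2 bytes per pixel. The function names are illustrative:

```python
def subsample_chroma_2to1(chroma_row: list[int]) -> list[int]:
    """Halve horizontal chroma resolution by averaging adjacent sample pairs."""
    if len(chroma_row) % 2:
        raise ValueError("row length must be even")
    # Integer average of each (even, odd) pair, rounding halves up.
    return [(chroma_row[i] + chroma_row[i + 1] + 1) // 2
            for i in range(0, len(chroma_row), 2)]


def bytes_per_frame(width: int, height: int, chroma_h_factor: int) -> int:
    """8-bit Y for every pixel plus one 8-bit Cb and Cr every chroma_h_factor pixels."""
    luma = width * height
    chroma = 2 * (width // chroma_h_factor) * height
    return luma + chroma


full = bytes_per_frame(720, 480, 1)   # full color resolution: 1,036,800 bytes
half = bytes_per_frame(720, 480, 2)   # chroma halved horizontally: 691,200 bytes
```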

Our perception of brightness is roughly logarithmic, not linear. In other words, the actual intensity required to produce a 50% gray image (perceptually halfway between total black and total white) is only around 18% of the intensity needed to produce total white. This characteristic is extremely important in camera sensor and display technology, as we'll see in our discussion of gamma correction. This nonlinearity also makes us less sensitive to quantization distortion at high intensities, a trait that many media encoding algorithms exploit.

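The 18% figure can be checked against CIE 1976 lightness (L*), a standard perceptual model swapped in here for illustration. Strictly, L* is a cube-root curve with a short linear segment near black rather than a true logarithm, but it captures the same compressive behavior:

```python
def cie_lightness(y: float) -> float:
    """CIE 1976 L* (0..100) from relative luminance Y/Yn in 0..1."""
    eps = (6 / 29) ** 3              # about 0.008856, where the curve becomes linear
    if y > eps:
        return 116 * y ** (1 / 3) - 16
    return (29 / 3) ** 3 * y         # about 903.3 * y near black


def luminance_for_lightness(l_star: float) -> float:
    """Inverse of cie_lightness: relative luminance for a given L*."""
    if l_star > 8:                   # L* = 8 corresponds to y = eps
        return ((l_star + 16) / 116) ** 3
    return l_star / (29 / 3) ** 3
```

Here `cie_lightness(0.18)` comes out at about 49.5 and `luminance_for_lightness(50)` at about 0.184, in line with the rule of thumb that mid-gray needs only around 18% of full-white intensity.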

Another visual novelty is that our eyes adjust to the viewing environment, always creating their own reference for white, even in low-lighting or artificial-lighting situations.  Because camera sensors don’t innately act the same way, this gives rise to a white balance control in which the camera picks its reference point for absolute white.

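The article leaves the white-balance algorithm open. One simple possibility, assumed here, is to take a pixel that the camera (or user) designates as white and compute per-channel gains that make it neutral, anchoring on green. The helper names are hypothetical:

```python
def white_balance_gains(ref_rgb: tuple[float, float, float]) -> tuple[float, float, float]:
    """Per-channel gains that map a chosen reference 'white' pixel to neutral gray.

    Green is used as the anchor channel, so its gain is 1.0.
    """
    r, g, b = ref_rgb
    if min(r, g, b) <= 0:
        raise ValueError("reference channels must be positive")
    return (g / r, 1.0, g / b)


def apply_gains(pixel, gains, max_value=255):
    """Scale each channel, round to an integer code and clip to max_value."""
    return tuple(min(max_value, round(c * k)) for c, k in zip(pixel, gains))


gains = white_balance_gains((200, 220, 180))   # a tinted reference patch
print(apply_gains((200, 220, 180), gains))      # (220, 220, 220)
```

Every other pixel in the image is then scaled by the same gains, which is what shifts the whole picture toward the chosen white point.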

The eye is less sensitive to high-frequency information than to low-frequency information. What's more, although it can detect fine details and color resolution in still images, it cannot do so for rapidly moving images. As a result, transform coding (DCT, FFT, etc.) and low-pass filtering can be used to reduce the total bandwidth needed to represent an image or video sequence.

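To show why transform coding helps, here is a plain orthonormal 1-D DCT-II and its inverse, written directly from the textbook definition (not an optimized or codec-specific transform). Smooth rows concentrate their energy in the low-index coefficients, so zeroing the high-frequency ones removes detail the eye is less likely to miss. The helper names are illustrative:

```python
import math


def _scale(k: int, n_len: int) -> float:
    """Orthonormal scale factor for coefficient k."""
    return math.sqrt(1 / n_len) if k == 0 else math.sqrt(2 / n_len)


def dct_ii(x: list[float]) -> list[float]:
    """Orthonormal DCT-II: X[k] = s(k) * sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n_len = len(x)
    return [_scale(k, n_len) * sum(x[n] * math.cos(math.pi / n_len * (n + 0.5) * k)
                                   for n in range(n_len))
            for k in range(n_len)]


def idct_ii(coeffs: list[float]) -> list[float]:
    """Inverse of the orthonormal DCT-II (the transpose of the same matrix)."""
    n_len = len(coeffs)
    return [sum(_scale(k, n_len) * coeffs[k] * math.cos(math.pi / n_len * (n + 0.5) * k)
                for k in range(n_len))
            for n in range(n_len)]


def lowpass_via_dct(x: list[float], keep: int) -> list[float]:
    """Zero all but the first `keep` (lowest-frequency) coefficients, then invert."""
    coeffs = dct_ii(x)
    return idct_ii([c if k < keep else 0.0 for k, c in enumerate(coeffs)])
```

A constant row produces a single nonzero (DC) coefficient, and `lowpass_via_dct(row, keep)` with a small `keep` approximates a smooth row while discarding its fine detail.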

Our eyes can notice a “flicker” effect at image update rates less than 50-60 times per second, or 50-60 Hz, in bright light. Under dim lighting conditions, this rate drops to about 24 Hz. Additionally, we tend to notice flicker in large uniform regions more so than in localized areas. These traits have important implications for interlaced video, refresh rates and display technologies.  


 

What’s a video signal? 


 

Figure 1: Composition of Luma signal (figure labels: Breakdown of Luma Signal, Back Porch, Horizontal Sync, White Level, Grey Level, Black Level)

 

At its root, a video signal is basically just a two-dimensional array of intensity and color data that is updated at a regular frame rate, conveying the perception of motion. On conventional cathode-ray tube (CRT) TVs and monitors, an electron beam modulated by the analog video signal shown in Figure 1 illuminates phosphors on the screen line by line, left to right and top to bottom. Synchronization signals embedded in the analog signal define when the beam is actively “painting” phosphors and when it is inactive, so that the electron beam can retrace from right to left to start on the next row, or from bottom to top to begin the next video field or frame. These synchronization signals are represented in Figure 2.

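Treating a frame as a 2-D array, the beam's left-to-right, top-to-bottom path corresponds to row-major order. The sketch below assumes a progressive frame stored row-major in memory, which is common but not stated in the article; the function names are illustrative:

```python
def raster_offset(x: int, y: int, width: int) -> int:
    """Offset of pixel (x, y) in a row-major frame buffer (x = column, y = row)."""
    return y * width + x


def raster_scan(width: int, height: int):
    """Yield (x, y) in CRT beam order: left to right in a row, rows top to bottom."""
    for y in range(height):        # vertical retrace follows the last row
        for x in range(width):     # horizontal retrace follows the last column
            yield x, y
```

Walking `raster_scan` and reading memory at `raster_offset` visits the buffer strictly sequentially, which is one reason line-by-line processing tends to be efficient.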

 

 

HSYNC is the horizontal synchronization signal. It demarcates the start of active video on each row (left to right) of a video frame. Horizontal Blanking is the interval in which the electron gun retraces from the right side of the screen back over to the next row on the left side.     

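HSYNC recurs once per scan line, so its period follows from the frame rate and the total line count. The sketch uses the standard NTSC figure of 525 total lines per frame (active lines plus blanking), which the article does not state, together with the exact 30000/1001 form of 29.97 fps:

```python
from fractions import Fraction

NTSC_FRAME_RATE = Fraction(30000, 1001)   # the "29.97 fps" quoted in the text
NTSC_TOTAL_LINES = 525                    # active lines plus vertical blanking

line_rate_hz = NTSC_FRAME_RATE * NTSC_TOTAL_LINES   # HSYNC pulses per second
line_period_us = 1_000_000 / line_rate_hz           # time between HSYNC pulses

print(float(line_rate_hz))     # about 15734.27
print(float(line_period_us))   # about 63.56
```

Horizontal blanking and active video together must fit inside that roughly 63.56 µs line period.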

 

VSYNC is the vertical synchronization signal. It defines the start (top to bottom) of a new video image. Vertical Blanking is the interval in which the electron gun retraces from the bottom right corner of the screen image back up to the top left corner.

 

FIELD distinguishes, for interlaced video, which field is currently being displayed.  This signal is not applicable for progressive-scan video systems.   

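For interlaced NTSC, a vertical sync interval precedes each field, so VSYNC recurs at the 59.94 Hz field rate while FIELD toggles between the two fields of a frame. This sketch assumes FIELD is 0 for the first field and 1 for the second; the actual polarity depends on the interface, and `vsync_events` is a hypothetical helper:

```python
from fractions import Fraction

NTSC_FIELD_RATE = Fraction(60000, 1001)     # "59.94 fields/sec" from the text
FIELD_PERIOD_MS = 1000 / NTSC_FIELD_RATE    # one vertical interval per field


def vsync_events(num_fields: int) -> list[tuple[float, int]]:
    """(time in ms, FIELD level) at each VSYNC of an interlaced stream.

    FIELD alternates 0, 1, 0, 1, ... so a receiver can tell the two fields
    of each frame apart. Progressive video has no FIELD signal.
    """
    return [(float(i * FIELD_PERIOD_MS), i % 2) for i in range(num_fields)]
```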

 

  

Figure 2: Typical timing relationships between HSYNC, VSYNC and FIELD

 

The transmission of video information originated as a display of relative luminance from black to white – thus was born the black-and-white television system. The voltage level at a given point in space correlates to the brightness level of the image at that point.  

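To make the voltage-brightness relationship concrete, the sketch below maps normalized brightness onto the NTSC active-video range in IRE units. The levels used (blanking at 0 IRE, black at a 7.5 IRE setup as in North America, white at 100 IRE, sync tip at -40 IRE, with 140 IRE spanning 1 V peak to peak) are standard NTSC values assumed here; the article itself gives no numbers:

```python
BLANK_IRE = 0.0
BLACK_IRE = 7.5        # North American NTSC "setup"; some variants (e.g. Japan) use 0
WHITE_IRE = 100.0
SYNC_TIP_IRE = -40.0
MV_PER_IRE = 1000.0 / 140.0   # 140 IRE spans 1 V peak to peak


def luma_to_ire(luma: float) -> float:
    """Map normalized brightness (0 = black, 1 = white) linearly onto black..white."""
    if not 0.0 <= luma <= 1.0:
        raise ValueError("luma must be in [0, 1]")
    return BLACK_IRE + luma * (WHITE_IRE - BLACK_IRE)


def ire_to_millivolts(ire: float) -> float:
    """Signal voltage in mV relative to the blanking level."""
    return ire * MV_PER_IRE
```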

 

When color TV became available, it had to be backward-compatible with B/W systems, so the color burst information was added on top of the existing luminance signal, as shown in Figure 3. Color information is also called chrominance. We’ll talk more about it in our discussion on color spaces (in part 2 of this series).   

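Conceptually, the color information rides on a subcarrier added to the luma. The simplified sketch below assumes NTSC's roughly 3.58 MHz subcarrier and generic color-difference components `u` and `v` in quadrature; sync, blanking and the burst itself are omitted, and the function name is hypothetical. Because the modulated chroma averages to zero over each subcarrier cycle, a black-and-white receiver mostly sees the luma:

```python
import math

F_SC_HZ = 315e6 / 88   # NTSC color subcarrier, about 3.579545 MHz


def composite_sample(y: float, u: float, v: float, t: float) -> float:
    """Simplified composite value at time t (seconds): luma plus quadrature chroma."""
    phase = 2 * math.pi * F_SC_HZ * t
    return y + u * math.sin(phase) + v * math.cos(phase)
```

Averaging samples taken evenly across one subcarrier period returns the luma value, which is the backward-compatibility property described above.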

 

Figure 3: Analog video signal with color burst (figure labels: Luma Channel, Chroma Channel, Composite Video Signal, Color Burst Demodulation Reference Signal)

 

 

Broadcast TV – NTSC and PAL    


Analog video standards differ in the ways they encode brightness and color information. Two standards dominate the broadcast television realm: NTSC and PAL. NTSC, devised by the National Television System Committee, is prevalent in North America and parts of Asia (notably Japan), whereas PAL (“Phase Alternating Line”) dominates Europe and much of South America. PAL developed as an offshoot of NTSC, improving on its color distortion performance. A third standard, SECAM, is popular in France and parts of eastern Europe, but many of these areas use PAL as well. Our discussions will center on NTSC systems, but the results relate also to PAL-based systems.


 

Video Resolution  


Horizontal resolution indicates the number of pixels on each line of the image, and vertical resolution designates how many horizontal lines are displayed on the screen to create the entire frame.  Standard definition (SD) NTSC systems are interlaced-scan, with 480 lines of active pixels, each with 720 active pixels per line (i.e., 720x480 pixels).  Frames refresh at a rate of roughly 30 frames/second (actually 29.97 fps), with interlaced fields updating at a rate of 60 fields/second (actually 59.94 fields/sec).   

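Resolution and frame rate together fix the raw data rate. As an illustration, assuming 2 bytes per pixel (8-bit luma plus horizontally subsampled 8-bit chroma) and counting only active pixels, with the PAL figures (720 x 576 at 25 frames/sec) shown for comparison:

```python
from fractions import Fraction


def active_video_bytes_per_second(width, height, frame_rate, bytes_per_pixel):
    """Raw data rate of the active picture only (blanking intervals excluded)."""
    return width * height * bytes_per_pixel * frame_rate


ntsc = active_video_bytes_per_second(720, 480, Fraction(30000, 1001), 2)
pal = active_video_bytes_per_second(720, 576, Fraction(25), 2)
print(round(float(ntsc) / 1e6, 2))   # 20.72 (MB/s)
print(round(float(pal) / 1e6, 2))    # 20.74 (MB/s)
```

Both SD systems land at roughly 20.7 MB/s of active video: PAL trades NTSC's higher frame rate for more lines per frame.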

 

High definition systems (HD) often employ progressive scanning and can have much higher horizontal and vertical resolutions than SD systems.  We will focus on SD systems rather than HD systems, but most of our discussion also generalizes to the higher frame and pixel rates of the high-definition systems.   


 

When discussing video, there are two main branches along which resolutions and frame rates have evolved: computer graphics formats and broadcast video formats. Table 1 shows some common screen resolutions belonging to each category. Even though these two branches emerged from separate domains with different requirements (for instance, computer graphics uses RGB progressive-scan schemes, while broadcast video uses YCbCr interlaced schemes), today they are used almost interchangeably in the embedded world. That is, VGA compares closely with the NTSC “D-1” broadcast format, and QVGA parallels CIF. It should be noted that although D-1 is 720 pixels × 486 rows, it's commonly referred to as 720 × 480 pixels (which is really the arrangement of the NTSC “DV” format used for DVDs and other digital video).


 

Table 1: Graphics vs. broadcast standards

Video source        Standard      Horizontal (pixels)   Vertical (pixels)   Total pixels
Broadcast           QCIF          176                   144                 25,344
Computer graphics   QVGA          320                   240                 76,800
Broadcast           CIF           352                   288                 101,376
Computer graphics   VGA           640                   480                 307,200
Broadcast           NTSC          720                   480                 345,600
Broadcast           PAL           720                   576                 414,720
Computer graphics   SVGA          800                   600                 480,000
Computer graphics   XGA           1024                  768                 786,432
Broadcast           HDTV (720p)   1280                  720                 921,600
Computer graphics   SXGA          1280                  1024                1,310,720
Computer graphics   UXGA          1600                  1200                1,920,000
Computer graphics   QXGA          2048                  1536                3,145,728
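Each total in Table 1 is simply the horizontal resolution times the vertical resolution. The short script below recomputes that column from the table's own figures. It also makes the naming visible: QVGA and QCIF are "quarter" formats that halve each dimension of VGA and CIF, whereas QXGA is "quad" XGA, with four times XGA's pixels:

```python
# (video source, standard, horizontal pixels, vertical pixels), as listed in Table 1
FORMATS = [
    ("Broadcast", "QCIF", 176, 144),
    ("Computer graphics", "QVGA", 320, 240),
    ("Broadcast", "CIF", 352, 288),
    ("Computer graphics", "VGA", 640, 480),
    ("Broadcast", "NTSC", 720, 480),
    ("Broadcast", "PAL", 720, 576),
    ("Computer graphics", "SVGA", 800, 600),
    ("Computer graphics", "XGA", 1024, 768),
    ("Broadcast", "HDTV (720p)", 1280, 720),
    ("Computer graphics", "SXGA", 1280, 1024),
    ("Computer graphics", "UXGA", 1600, 1200),
    ("Computer graphics", "QXGA", 2048, 1536),
]

# Total pixels is just horizontal x vertical resolution.
totals = {name: h * v for _, name, h, v in FORMATS}

for source, name, h, v in FORMATS:
    print(f"{name:<12} {source:<18} {h:>5} x {v:<5} {totals[name]:>10,}")
```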

 

 

Interlaced vs. Progressive Scanning  


 

Interlaced scanning originates from early analog television broadcast, where the image needed to be updated rapidly in order to minimize visual flicker, but the technology available did not allow for refreshing the entire screen this quickly. Therefore, each frame was “interlaced,” or split into two fields, one consisting of odd-numbered scan lines and the other composed of even-numbered scan lines, as depicted in Figure 4. The frame refresh rate was set at approximately 30 frames/sec for NTSC (25 frames/sec for PAL). Thus, large areas flicker at 60 Hz (50 Hz for PAL), while localized regions flicker at 30 Hz (25 Hz for PAL). This was a compromise to conserve bandwidth while accounting for the eye's greater sensitivity to flicker in large uniform regions.

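Splitting a frame into its two fields amounts to taking every other line. The sketch numbers scan lines from 1, so the odd field holds lines 1, 3, 5, and so on; which field is transmitted first is an assumption here, since it differs between systems, and `split_fields` is an illustrative name:

```python
def split_fields(frame):
    """Split a frame (list of rows, top row first) into its two fields.

    With scan lines numbered from 1, the odd field holds lines 1, 3, 5, ...
    (list indexes 0, 2, 4, ...) and the even field holds lines 2, 4, 6, ...
    """
    odd_field = frame[0::2]
    even_field = frame[1::2]
    return odd_field, even_field
```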

 

Not only does some flickering persist, but interlacing also causes other artifacts. For one, the scan lines themselves are often visible. Because each NTSC field is a snapshot of activity occurring at 1/60-second intervals, a video frame consists of two temporally different fields. This isn't a problem when you're watching the display, because it presents the video in a temporally appropriate manner. However, converting interlaced fields into progressive frames (a process known as “deinterlacing”) can cause jagged edges when there's motion in an image. Deinterlacing matters because it's often more efficient to process video frames as a series of adjacent lines.

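Two common deinterlacing approaches, not described in the article and named here by their usual industry terms, are "weave" (interleave the two fields) and "bob" (line-double each field). Weave reproduces still scenes exactly but shows the jagged edges described above when objects move; bob avoids them at the cost of vertical detail. This is a minimal sketch, not a production deinterlacer:

```python
def weave(odd_field, even_field):
    """Interleave two fields back into one frame (exact for still scenes).

    With motion, the fields were captured 1/60 s apart in NTSC, so moving
    edges alternate between two positions line by line.
    """
    if len(odd_field) != len(even_field):
        raise ValueError("fields must have the same number of lines")
    frame = []
    for odd_row, even_row in zip(odd_field, even_field):
        frame.append(list(odd_row))
        frame.append(list(even_row))
    return frame


def bob(field):
    """Line-double one field into a full-height frame (no combing, half the detail)."""
    frame = []
    for row in field:
        frame.append(list(row))
        frame.append(list(row))
    return frame
```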

 

With the advent of digital television, progressive (that is, non-interlaced) scan has become a very popular input and output video format for improved image quality.  Here, the entire image updates sequentially from top to bottom, at twice the scan rate of a comparable interlaced system. This eliminates many of the artifacts associated with interlaced scanning. In progressive scanning, the notion of two fields composing a video frame does not apply.    

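The "twice the scan rate" comparison can be checked numerically, assuming the same 525 total lines and 29.97 Hz frame rate as NTSC: the interlaced system draws 262.5 lines per field at 59.94 fields/sec, while the comparable progressive system draws all 525 lines at that same 59.94 Hz rate:

```python
from fractions import Fraction

FRAME_RATE = Fraction(30000, 1001)   # interlaced frame rate, "29.97 fps"
TOTAL_LINES = 525                    # lines per full frame, blanking included

interlaced_line_rate = TOTAL_LINES * FRAME_RATE        # 262.5 lines x 59.94 fields/s
progressive_line_rate = TOTAL_LINES * 2 * FRAME_RATE   # 525 lines x 59.94 frames/s

print(float(interlaced_line_rate))    # about 15734 lines/s
print(float(progressive_line_rate))   # about 31469 lines/s
```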

  

Figure 4: Interlaced scan vs. progressive scan (figure labels: 486 Lines: One Frame; Line; Interlaced: Frame is split into 2 fields; Progressive: Frame is displayed in sequence as a single field)

 

Now that we’ve briefly discussed the basis for video signals and some common terminology, we’re almost ready to move to the really interesting stuff – digital video.  We’ll get to that in the next installment of this series.  
