Sound File Operations with Visual C++ 6.0


Sound is one of the most important channels through which people convey information, and adding sound to an application greatly increases its appeal; in research and development, moreover, audio signal processing is an important field in its own right. Visual C++, as a powerful development tool, is a natural first choice for sound processing. Yet in the Visual C++ programming material currently available, whether thick reference books or computer magazines, sound file handling is usually only touched on in passing, and many programming enthusiasts feel they never get a thorough picture of it. Drawing on experience accumulated while studying and developing, this article explores sound file processing together with the reader. It cannot possibly cover every aspect of the subject; the hope is simply that it will serve as a guide for those just entering the field and help them move on more quickly to its deeper levels.

  There are currently two ways to process sound files on a computer. One is to use ready-made software: tools such as Microsoft's Sound Recorder, Sound Forge, and Cool Edit can record, edit, and play back sound signals, but their capabilities are fixed. For more flexible and more thorough control over sound data, you must turn to the second approach, namely using the multimedia services Microsoft provides and writing your own Windows programs to process the sound and implement specific functionality. The following sections introduce the sound file format and the methods for programming sound files with Visual C++ under Windows.

  I. Implementation Approach

  1. The RIFF structure and the WAVE file format

  Windows supports two audio file formats based on RIFF (Resource Interchange File Format): RMID files for MIDI, and the waveform audio format, WAVE. The latter is the most common digitized sound format in computing; it is the waveform format (Waveform Audio) that Microsoft defined specifically for Windows, and because its extension is "*.wav" such files are also called WAVE files. To keep this article focused, "sound file" below always means a WAVE file. Two WAVE configurations are especially common: mono at an 11.025 kHz sample rate with 8-bit samples, and stereo at a 44.1 kHz sample rate with 16-bit samples. The sample rate is the number of samples taken per unit time during analog-to-digital conversion; the sample value records the amplitude of the analog sound signal at each sampling instant. In an 8-bit file each sample is an unsigned byte (00H-FFH), whereas in a 16-bit file each sample is a 16-bit integer; in a 16-bit stereo file every sample point consists of two 16-bit values, one for the left channel and one for the right, stored interleaved. The data chunk of a WAVE file holds these samples in pulse code modulation (PCM) format. Before we start programming, let us first look at the RIFF and WAVE file formats.

  The RIFF structure can be viewed as a tree whose basic unit is the "chunk". Each chunk consists of an identifier, a data size, and the data itself, as shown in Figure 1:

 Chunk ID (4 bytes) | Data size (4 bytes) | Data

Figure 1. Structure of a chunk

  As the figure shows, the identifier is a four-character code such as "RIFF" or "LIST" that specifies the chunk's ID; the data size, also four bytes, gives the size of the chunk's data field; and the data field describes the actual sound signal and may itself consist of several subchunks. Ordinarily chunks sit side by side and do not nest inside one another, but two chunk types may contain subchunks: those tagged "RIFF" and "LIST". The RIFF chunk is the top level and may contain LIST chunks. RIFF and LIST chunks also differ from other chunks in that the data of a RIFF chunk always begins with a four-character code specifying the storage format of the file's data (called the form type); a WAVE file, for example, has the form type "WAVE". The data of a LIST chunk always begins with a four-character code specifying the contents of the list (called the list type); an ".AVI" video file, for instance, has an "strl" list type. The RIFF and LIST chunk layout is as follows:

 "RIFF"/"LIST" ID (4 bytes) | Data size (4 bytes) | Form type / list type (4 bytes) | Data

Figure 2. RIFF/LIST chunk structure

  A WAVE file is a very simple kind of RIFF file whose form type is "WAVE". Its RIFF chunk contains two subchunks with the IDs "fmt " and "data". The "fmt " subchunk consists of a PCMWAVEFORMAT structure: the subchunk's size is sizeof(PCMWAVEFORMAT) and its contents are the fields of that structure. The overall layout of a WAVE file is shown in Figure 3:

 "RIFF" | Data size | Form type ("WAVE") | "fmt " | sizeof(PCMWAVEFORMAT) | PCMWAVEFORMAT | "data" | Sound data size | Sound data

Figure 3. WAVE file structure
      The PCMWAVEFORMAT structure is defined as follows:

typedef struct
{
 WAVEFORMAT wf; // the waveform format
 WORD wBitsPerSample; // sample size of the WAVE file, in bits
} PCMWAVEFORMAT;
The WAVEFORMAT structure is defined as follows:
typedef struct
{
 WORD wFormatTag; // encoding format: WAVE_FORMAT_PCM, WAVE_FORMAT_ADPCM, etc.
 WORD nChannels; // number of channels: 1 for mono, 2 for stereo
 DWORD nSamplesPerSec; // sample rate
 DWORD nAvgBytesPerSec; // average data rate, in bytes per second
 WORD nBlockAlign; // block alignment
} WAVEFORMAT;
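
To make the relationships among these fields concrete, here is a minimal sketch (not from the original article) that fills in a PCMWAVEFORMAT for 16-bit, 44.1 kHz stereo PCM; nBlockAlign is nChannels times wBitsPerSample divided by 8, and nAvgBytesPerSec is nSamplesPerSec times nBlockAlign:

 // Hedged example: describe 44.1 kHz / 16-bit / stereo PCM with PCMWAVEFORMAT.
 PCMWAVEFORMAT pcm;
 pcm.wf.wFormatTag = WAVE_FORMAT_PCM; // uncompressed PCM
 pcm.wf.nChannels = 2; // stereo
 pcm.wf.nSamplesPerSec = 44100; // sample rate
 pcm.wBitsPerSample = 16; // bits per sample
 pcm.wf.nBlockAlign = pcm.wf.nChannels * pcm.wBitsPerSample / 8; // 4 bytes per sample frame
 pcm.wf.nAvgBytesPerSec = pcm.wf.nSamplesPerSec * pcm.wf.nBlockAlign; // 176,400 bytes/s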

  "data"子块包含WAVE文件的数字化波形声音数据,其存放格式依赖于"fmt"子块中wFormatTag成员指定的格式种类,在多声道WAVE文件中,样本是交替出现的。如16bit的单声道WAVE文件和双声道WAVE文件的数据采样格式分别如图四所示:

  16-bit mono:

 Sample 1            | Sample 2            | ...
 low byte, high byte | low byte, high byte | ...

  16-bit stereo:

 Sample 1                                  | ...
 left channel        | right channel       | ...
 low byte, high byte | low byte, high byte | ...

Figure 4. WAVE file sample layout
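
Because the two channels are interleaved sample by sample, splitting a 16-bit stereo buffer into separate channel arrays is a simple stride-2 copy. The following sketch illustrates the layout; the names pData and size are assumptions (the buffer returned by the GetData function in the next section is one source of such data):

 // Hedged sketch: de-interleave a 16-bit stereo PCM buffer into two channel arrays.
 // pData/size are assumed to come from the "data" chunk of such a file.
 void SplitStereo(const BYTE *pData, long size, short *pLeft, short *pRight)
 {
     const short *pSamples = (const short *)pData; // 16-bit samples
     long nFrames = size / 4;                      // 4 bytes per stereo sample frame
     for (long i = 0; i < nFrames; i++)
     {
         pLeft[i]  = pSamples[2 * i];     // even samples: left channel
         pRight[i] = pSamples[2 * i + 1]; // odd samples: right channel
     }
 }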


 2. Reading the sound data from a sound file

  Operating on a sound file means opening the WAVE file, extracting the sound data it contains, applying whatever mathematical processing your algorithm requires, and then storing the result back into a WAVE-format file. The reading can be done with the CFile class, or alternatively with the multimedia functions provided by Windows (their names all begin with mmio). This section shows how to use these functions to extract the data from a sound file; how you then process it depends on the algorithm you choose for your purpose. The workflow for a WAVE file is: 1) call mmioOpen to open the WAVE file and obtain a file handle of type HMMIO; 2) following the WAVE file structure, call mmioRead, mmioWrite, and mmioSeek to read, write, and seek within the file; 3) call mmioClose to close the WAVE file.

  The following function reads the data of a two-channel stereo file according to the WAVE format described above. To use it, remember to link Winmm.lib into the project and include the header "Mmsystem.h".

BYTE * GetData(CString *pString)
// Returns the sound data of a WAVE file; pString names the file to open.
// Assumes the canonical layout in which the "fmt " chunk immediately follows the form type.
{
 if (pString == NULL)
  return NULL;
 HMMIO file1; // HMMIO file handle
 file1 = mmioOpen((LPSTR)(LPCTSTR)*pString, NULL, MMIO_READWRITE);
 // open the given WAVE file in read/write mode
 if (file1 == NULL)
 {
  AfxMessageBox("Failed to open the WAVE file!");
  return NULL;
 }
 char style[4]; // four bytes holding the file's form type
 mmioSeek(file1, 8, SEEK_SET); // seek to the form type of the WAVE file
 mmioRead(file1, style, 4);
 if (style[0] != 'W' || style[1] != 'A' || style[2] != 'V' || style[3] != 'E')
  // check that this really is a "WAVE"-format file
 {
  AfxMessageBox("This is not a WAVE-format file!");
  mmioClose(file1, 0);
  return NULL;
 }

 PCMWAVEFORMAT format; // PCMWAVEFORMAT object used to inspect the WAVE format
 mmioSeek(file1, 20, SEEK_SET);
 // position the open file at the PCMWAVEFORMAT data of the WAVE file
 mmioRead(file1, (char*)&format, sizeof(PCMWAVEFORMAT)); // read the structure
 if (format.wf.nChannels != 2) // is this stereo sound?
 {
  AfxMessageBox("This sound file is not two-channel stereo");
  mmioClose(file1, 0);
  return NULL;
 }
 mmioSeek(file1, 24 + sizeof(PCMWAVEFORMAT), SEEK_SET);
 // read the size of the sound data in the WAVE file
 long size;
 mmioRead(file1, (char*)&size, 4);
 BYTE *pData;
 pData = (BYTE*)new char[size]; // allocate a buffer of that size
 mmioSeek(file1, 28 + sizeof(PCMWAVEFORMAT), SEEK_SET); // seek to the sound data
 mmioRead(file1, (char*)pData, size); // read the sound data
 mmioClose(file1, 0); // close the WAVE file
 return pData;
}
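
The caller owns the buffer GetData returns; a brief hedged usage sketch (the file name and the processing step are placeholders):

 // Hypothetical caller: read the samples, process them, release the buffer.
 CString name("d:\\sound.wav");
 BYTE *pData = GetData(&name);
 if (pData != NULL)
 {
     // ... run your processing algorithm over pData here ...
     delete [] pData; // GetData allocates with new[], so free with delete[]
 }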
  3. Operating on sound files with MCI

  The most basic operation on a WAVE sound file is playing back the sound data it contains. The Windows API function BOOL sndPlaySound(LPCSTR lpszSound, UINT fuSound) can play small WAV files; the parameter lpszSound names the sound file to play, and fuSound holds the flags used during playback. For example, to play the file Sound.wav asynchronously you only need to call sndPlaySound("c:\\windows\\Sound.wav", SND_ASYNC) (note that backslashes must be doubled in a C string literal); as you can see, sndPlaySound is very simple to use. But once a WAVE file grows large (beyond roughly 100 KB), the system can no longer read the sound data into memory in one piece, and sndPlaySound cannot play it. To solve this problem, one option is to operate on the sound file with MCI. Before using MCI, add winmm.lib under Project->Settings->Link->Object/library modules in your project and include the "mmsystem.h" header.
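
As a minimal, self-contained sketch of the call just described (the path and flags are illustrative):

 // Hedged sketch: fire-and-forget playback of a small WAV file.
 #include <windows.h>
 #include <mmsystem.h>             // sndPlaySound
 #pragma comment(lib, "winmm.lib") // link the multimedia library

 void PlayChime()
 {
     // SND_ASYNC returns immediately; SND_NODEFAULT suppresses the default
     // system sound when the file cannot be found.
     sndPlaySound("c:\\windows\\Sound.wav", SND_ASYNC | SND_NODEFAULT);
 }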

  The Microsoft API provides the MCI (Media Control Interface) functions mciSendCommand() and mciSendString() for playing WAVE files; only mciSendCommand() is covered here.

  Prototype: DWORD mciSendCommand(UINT wDeviceID, UINT wMessage, DWORD dwParam1, DWORD dwParam2);

  Parameters:

  wDeviceID: the ID of the device that receives the message;

  wMessage: the MCI command message;

  dwParam1: flags for the command;

  dwParam2: pointer to the parameter block used by the command.

  Return value: zero on success; otherwise the low-order word of the returned DWORD holds the error code.

  When playing a sound file with MCI, the audio device must be opened first. To that end, define an MCI_OPEN_PARMS variable OpenParms and set its members accordingly:

OpenParms.lpstrDeviceType = (LPCSTR) MCI_DEVTYPE_WAVEFORM_AUDIO; // WAVE device type
OpenParms.lpstrElementName = (LPCSTR) Filename; // name of the sound file to open
OpenParms.wDeviceID = 0; // ID of the audio device being opened
  After the call mciSendCommand(NULL, MCI_OPEN, MCI_WAIT | MCI_OPEN_TYPE | MCI_OPEN_TYPE_ID | MCI_OPEN_ELEMENT, (DWORD)(LPVOID)&OpenParms) sends the MCI_OPEN command, the wDeviceID member returned in OpenParms indicates which device was opened. When the audio device needs to be closed, simply call mciSendCommand(m_wDeviceID, MCI_CLOSE, NULL, NULL).
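
Since every mciSendCommand call reports failure through its return value, it is worth checking it. A hedged sketch using mciGetErrorString, which turns the error code into readable text, might look like this:

 // Hedged sketch: open the device and report any MCI error as text.
 DWORD dwResult = mciSendCommand(NULL, MCI_OPEN,
     MCI_WAIT | MCI_OPEN_TYPE | MCI_OPEN_TYPE_ID | MCI_OPEN_ELEMENT,
     (DWORD)(LPVOID)&OpenParms);
 if (dwResult != 0)
 {
     char szError[256];
     mciGetErrorString(dwResult, szError, sizeof(szError)); // decode the error code
     AfxMessageBox(szError); // show the MCI error message
 }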

  To play a WAVE file, define an MCI_PLAY_PARMS variable PlayParms and set PlayParms.dwFrom = 0; this specifies the position (time) at which playback of the WAVE file starts. Once it is set, calling mciSendCommand(m_wDeviceID, MCI_PLAY, MCI_FROM, (DWORD)(LPVOID)&PlayParms) plays the WAVE sound file.

  In addition, calling mciSendCommand(m_wDeviceID, MCI_PAUSE, 0, (DWORD)(LPVOID)&PlayParms) pauses playback, and mciSendCommand(m_wDeviceID, MCI_STOP, NULL, NULL) stops it. As you can see, the different operations are selected simply by passing different values for the message parameter. Different combinations of the message with dwParam1 and dwParam2 can also seek within the file; for example, the following call jumps to the end of the WAVE file: mciSendCommand(m_wDeviceID, MCI_SEEK, MCI_SEEK_TO_END, NULL).

  4. Operating on WAVE files with DirectSound

  MCI is simple to call and powerful enough for the basic needs of sound file processing, but it has a drawback: it can only play one WAVE file at a time. In practice, when you want a mixing effect by playing two or more WAVE files simultaneously, you need DirectSound from Microsoft's DirectX technology. DirectSound drives the sound card at a low level and can play eight or more WAV files at once.

  Implementing DirectSound takes the following steps: 1. create and initialize DirectSound; 2. set the application's priority level for the sound device, normally DSSCL_NORMAL; 3. read the WAV file into memory and locate the format chunk, the data chunk position, and the data length; 4. create a sound buffer; 5. load the sound data; 6. play and stop. The complete routine appears in Section III below.

  II. Programming Steps

  1. Start Visual C++ 6.0 and generate a single-document (SDI) application named "playsound";

  2. Add "MCI Play" and "PlaySound" items to the program's main menu, and use the ClassWizard to add the corresponding message handler functions, which play the sound files by the two different methods;

  3. Add the "dsound.lib, dxguid.lib, winmm.lib" libraries in the program's "Link" settings, include "mmsystem.h" in the program's view class, and put the sound files to be played, "chimes.wav" and "sound.wav", in the program's Debug directory;

  4. Add the code, then compile and run the program.

 III. Program Code

////////////////////////////////////////////////////
void CPlaysoundView::OnMciplay() // the following code plays a WAVE sound file via MCI
{
 // TODO: Add your command handler code here
 MCI_OPEN_PARMS mciOpenParms;
 MCI_PLAY_PARMS PlayParms;
 mciOpenParms.dwCallback = 0;
 mciOpenParms.lpstrElementName = "d:\\chimes.wav"; // file to play
 mciOpenParms.wDeviceID = 0;
 mciOpenParms.lpstrDeviceType = "waveaudio";
 mciOpenParms.lpstrAlias = NULL;
 PlayParms.dwCallback = 0;
 PlayParms.dwTo = 0;
 PlayParms.dwFrom = 0;
 mciSendCommand(NULL, MCI_OPEN, MCI_OPEN_TYPE | MCI_OPEN_ELEMENT, (DWORD)(LPVOID)&mciOpenParms); // open the audio device
 mciSendCommand(mciOpenParms.wDeviceID, MCI_PLAY, MCI_WAIT, (DWORD)(LPVOID)&PlayParms); // play the WAVE sound file
 mciSendCommand(mciOpenParms.wDeviceID, MCI_CLOSE, NULL, NULL); // close the audio device
}
//////////////////////////////////////////////////////////////////////////////
/* The following function plays a WAVE sound file using DirectSound (note that the project settings must include "dsound.lib, dxguid.lib"); code and comments follow: */
void CPlaysoundView::OnPlaySound() 
{
 // TODO: Add your command handler code here
 LPVOID lpPtr1; // first pointer into the locked buffer
 LPVOID lpPtr2; // second pointer into the locked buffer
 HRESULT hResult;
 DWORD dwLen1, dwLen2;
 LPVOID m_pMemory; // pointer to the in-memory image of the file
 LPWAVEFORMATEX m_pFormat = NULL; // pointer to the format block
 LPVOID m_pData = NULL; // pointer to the block of sound data
 DWORD m_dwSize = 0; // length of the sound data block in the WAVE file
 CFile File; // CFile object
 DWORD dwSize; // length of the WAV file
 // open the file sound.wav
 if (!File.Open("d:\\sound.wav", CFile::modeRead | CFile::shareDenyNone))
  return;
 dwSize = File.Seek(0, CFile::end); // get the length of the WAVE file
 File.Seek(0, CFile::begin); // seek back to the start of the open WAVE file
 // allocate m_pMemory (type LPVOID) to hold the data of the WAVE file
 m_pMemory = GlobalAlloc(GMEM_FIXED, dwSize);
 if (File.ReadHuge(m_pMemory, dwSize) != dwSize) // read the file's data
 {
  File.Close();
  GlobalFree(m_pMemory); // release the memory image on failure
  return;
 }
 File.Close();
 LPDWORD pdw, pdwEnd;
 DWORD dwRiff, dwType, dwLength;
 pdw = (DWORD *)m_pMemory;
 dwRiff = *pdw++;
 dwLength = *pdw++;
 dwType = *pdw++;
 if (dwRiff != mmioFOURCC('R', 'I', 'F', 'F') ||
     dwType != mmioFOURCC('W', 'A', 'V', 'E'))
 {
  GlobalFree(m_pMemory);
  return; // the header must be "RIFF" and the form type must be "WAVE"
 }
 // locate the format chunk, the data chunk position, and the data length
 pdwEnd = (DWORD *)((BYTE *)m_pMemory + dwLength - 4);
 bool m_bend = false;
 while ((pdw < pdwEnd) && (!m_bend))
 // continue while pdw has not reached the end of the file and the
 // sound data has not been found yet
 {
  dwType = *pdw++;
  dwLength = *pdw++;
  switch (dwType)
  {
   case mmioFOURCC('f', 'm', 't', ' '): // the "fmt " tag
    if (!m_pFormat) // capture the WAVEFORMATEX data
    {
     if (dwLength < sizeof(WAVEFORMAT))
     {
      GlobalFree(m_pMemory);
      return;
     }
     m_pFormat = (LPWAVEFORMATEX)pdw;
    }
    break;
   case mmioFOURCC('d', 'a', 't', 'a'): // the "data" tag
    if (!m_pData || !m_dwSize)
    {
     m_pData = (LPBYTE)pdw; // pointer to the block of sound data
     m_dwSize = dwLength; // length of the sound data block
     if (m_pFormat)
      m_bend = true;
    }
    break;
  }
  pdw = (DWORD *)((BYTE *)pdw + ((dwLength + 1) & ~1)); // advance pdw to the next (word-aligned) chunk
 }
 DSBUFFERDESC BufferDesc; // DSBUFFERDESC object describing the sound buffer
 memset(&BufferDesc, 0, sizeof(BufferDesc));
 BufferDesc.lpwfxFormat = (LPWAVEFORMATEX)m_pFormat;
 BufferDesc.dwSize = sizeof(DSBUFFERDESC);
 BufferDesc.dwBufferBytes = m_dwSize;
 BufferDesc.dwFlags = 0;
 HRESULT hRes;
 LPDIRECTSOUND m_lpDirectSound;
 hRes = ::DirectSoundCreate(0, &m_lpDirectSound, 0); // create the DirectSound object
 if (hRes != DS_OK)
 {
  GlobalFree(m_pMemory);
  return;
 }
 m_lpDirectSound->SetCooperativeLevel(this->GetSafeHwnd(), DSSCL_NORMAL);
 // set the sound device priority level to "NORMAL"
 // create the sound data buffer
 LPDIRECTSOUNDBUFFER m_pDSoundBuffer;
 if (m_lpDirectSound->CreateSoundBuffer(&BufferDesc, &m_pDSoundBuffer, 0) != DS_OK)
 {
  GlobalFree(m_pMemory);
  return;
 }
 // Load the sound data. Two pointers, lpPtr1 and lpPtr2, address the data in the
 // DirectSoundBuffer; this is designed with large WAVE files in mind. dwLen1 and
 // dwLen2 are the lengths of the regions addressed by the two pointers.
 hResult = m_pDSoundBuffer->Lock(0, m_dwSize, &lpPtr1, &dwLen1, &lpPtr2, &dwLen2, 0);
 if (hResult == DS_OK)
 {
  memcpy(lpPtr1, m_pData, dwLen1);
  if (dwLen2 > 0)
  {
   BYTE *m_pData1 = (BYTE*)m_pData + dwLen1;
   m_pData = (void *)m_pData1;
   memcpy(lpPtr2, m_pData, dwLen2);
  }
  m_pDSoundBuffer->Unlock(lpPtr1, dwLen1, lpPtr2, dwLen2);
 }
 GlobalFree(m_pMemory); // the samples have been copied into the DirectSound buffer
 DWORD dwFlags = 0;
 m_pDSoundBuffer->Play(0, 0, dwFlags); // play the WAVE sound data
}
  IV. Summary

  To make the DirectSound implementation easier to follow, a single function performs all the operations; readers can of course wrap the code above in a class for better encapsulation. How to do that needs no further explanation here; if it truly puzzles you, pick up a C++ book and have a look. Once such a class is defined, you can declare several objects at once to mix and play multiple WAVE sound files together. Attentive readers may have noticed that the discussion of the WAVE format introduced the PCMWAVEFORMAT structure, yet the code that reads the WAVE file data uses the LPWAVEFORMATEX type instead. Is that a mistake? No: for a PCM-format WAVE file the two structures are identical, and using LPWAVEFORMATEX merely makes it convenient to set up the DSBUFFERDESC object.

  There are many ways to operate on WAVE sound files, and used flexibly they give you full control over WAVE data; readers can consult MSDN for the details of these functions. This article has only scratched the surface of WAVE file handling; I hope it serves as a starting point for the reader's own exploration.


Direct Audio: DirectSound and DirectMusic
Music is a series of notes played or stopped at different times and at different volumes. A great many instructions are used to play music, but they all work in essentially the same way, manipulating all the various notes. Composing on a computer really means storing many groups of notes; on playback, the audio hardware plays those notes out.

The MIDI format (file extension .MID) is the standard format for storing digital music.

DirectMusic music segments use the .SGT file extension. Related files include band files (.BND), which contain instrument information; chordmap files (.CDM), which contain chord instructions that modify the music during playback; style files (.STY), which contain playback style information; and template files (.TPL), which contain templates for creating music segments.

MIDI is a very powerful music format; its only drawback is that the quality of the music depends on the performance of the music synthesizer, because MIDI records only the notes, so playback quality is determined by the software and hardware doing the playing. An MP3 file (extension .MP3) is a format similar to a wave file, but the biggest difference between MP3 and WAV is that MP3 compresses the sound to a minimum size while the audio quality stays essentially unchanged. MP3 files can be played with the DirectShow component, an extremely powerful multimedia component: DirectShow can play almost any media file, audio and video alike, and some sound files can only be played with DirectShow.

Direct Audio is a composite component made up of the DirectSound and DirectMusic components.

DirectMusic was greatly enhanced in DirectX 8, while DirectSound remained largely as it was. DirectSound is the main component for digital sound playback. DirectMusic handles all song formats, including MIDI, DirectMusic native files, and wave files; after processing them it feeds them into DirectSound for further processing, which means digitized instruments can be used when playing back MIDI.

Using DirectSound

To use DirectSound you create a COM object that communicates with the sound card, and with that object you then create a number of separate sound-data buffers (called secondary sound buffers) to store audio data. The data in these buffers is mixed in the main mixing buffer (called the primary sound buffer) and can then be played back in any format you specify. The playback format is determined by sample rate, channel count, and sample precision; the possible sample rates are 8000 Hz, 11025 Hz, 22050 Hz, and 44100 Hz (CD quality).

For the channel count there are two choices: single-channel mono or two-channel stereo. The sample precision is limited to two options: low-quality 8-bit sound and high-fidelity 16-bit sound. Unless you change it, the default format of the DirectSound primary buffer is a 22050 Hz sample rate, 8-bit precision, stereo. In DirectSound you can adjust the playback rate of a sound (which also changes its pitch), adjust its volume, loop it, and so on; you can even play it in a virtual 3D environment to simulate sound that actually surrounds the listener.

What you must do is keep the buffers filled with sound data. If the sound data is too large, you have to create a streaming playback scheme: load a small block of the sound data, and when that block finishes playing, load the next block into the buffer, continuing the process until the sound has been fully processed. Streaming audio is achieved by tracking the play position within the buffer and notifying the application to refresh the audio data when playback reaches a given point; this notify-and-refresh mechanism is called "notification". There is no limit on how many buffers may play at the same time, but you should still keep the number of buffers modest, because every additional buffer costs a good deal of memory and CPU resources.
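As a hedged sketch of that notification mechanism (buffer names and sizes are illustrative): the IDirectSoundNotify8 interface lets you attach Win32 events to play-cursor positions with DSBPOSITIONNOTIFY, so a worker thread can wake up and refill the half of the buffer that just finished:

 // Hedged sketch: signal an event at the middle and at the end of a streaming buffer.
 // ds_buffer is assumed to be an IDirectSoundBuffer8* created with
 // DSBCAPS_CTRLPOSITIONNOTIFY; buffer_size is its size in bytes.
 LPDIRECTSOUNDNOTIFY8 notify = NULL;
 HANDLE events[2];
 events[0] = CreateEvent(NULL, FALSE, FALSE, NULL); // first half played
 events[1] = CreateEvent(NULL, FALSE, FALSE, NULL); // second half played

 if (SUCCEEDED(ds_buffer->QueryInterface(IID_IDirectSoundNotify8, (void**)&notify)))
 {
     DSBPOSITIONNOTIFY points[2];
     points[0].dwOffset = buffer_size / 2 - 1; points[0].hEventNotify = events[0];
     points[1].dwOffset = buffer_size - 1;     points[1].hEventNotify = events[1];
     notify->SetNotificationPositions(2, points); // arm while the buffer is stopped
     notify->Release();
 }
 // A streaming thread would then WaitForMultipleObjects on events[] and
 // Lock/refill the half of the buffer that just finished playing.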

To use DirectSound and DirectMusic in a project, add the headers dsound.h and dmusic.h and link DSound.lib into the included libraries; adding the DXGuid.lib library makes DirectSound easier to use.

The DirectSound COM interfaces are:

IDirectSound8: the DirectSound interface.
IDirectSoundBuffer8: the interface for the primary and secondary buffers; it holds the data and controls playback.
IDirectSoundNotify8: the notification object, which tells the application that a specified play position has been reached.

The relationships among these objects (device, buffers, notifications) are as follows:



IDirectSound8 is the main interface: you use it to create buffers (IDirectSoundBuffer8), and from a buffer interface you create the notification interface (IDirectSoundNotify8), which tells the application when a specified position has been reached. The notification interface is extremely useful when streaming audio files.
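
A hedged sketch of that chain of object creation (error handling trimmed; buffer_desc and hwnd are assumptions, and IDirectSoundBuffer8 is obtained from the first-version buffer via QueryInterface, as the reference below notes):

 // Hedged sketch: device -> buffer -> notify, mirroring the object diagram.
 LPDIRECTSOUND8       device  = NULL;
 LPDIRECTSOUNDBUFFER  buffer1 = NULL; // first-version interface from CreateSoundBuffer
 LPDIRECTSOUNDBUFFER8 buffer8 = NULL;
 LPDIRECTSOUNDNOTIFY8 notify  = NULL;

 DirectSoundCreate8(NULL, &device, NULL);                 // 1. the device object
 device->SetCooperativeLevel(hwnd, DSSCL_NORMAL);         // hwnd: your window handle
 device->CreateSoundBuffer(&buffer_desc, &buffer1, NULL); // 2. a secondary buffer
 buffer1->QueryInterface(IID_IDirectSoundBuffer8, (void**)&buffer8);
 buffer8->QueryInterface(IID_IDirectSoundNotify8, (void**)&notify); // 3. notifications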

Initializing DirectSound

The first step in using DirectSound is to create the IDirectSound8 object, which acts as the controller of the audio hardware device; it is created with the DirectSoundCreate8 function.

The DirectSoundCreate8 function creates and initializes an object that supports the IDirectSound8 interface.

HRESULT DirectSoundCreate8(
LPCGUID lpcGuidDevice,
LPDIRECTSOUND8 * ppDS8,
LPUNKNOWN pUnkOuter
);

Parameters

lpcGuidDevice
Address of the GUID that identifies the sound device. The value of this parameter must be one of the GUIDs returned by DirectSoundEnumerate, or NULL for the default device, or one of the following values.

 

DSDEVID_DefaultPlayback: System-wide default audio playback device. Equivalent to NULL.
DSDEVID_DefaultVoicePlayback: Default voice playback device.

 

ppDS8
Address of a variable to receive an IDirectSound8 interface pointer.
pUnkOuter
Address of the controlling object's IUnknown interface for COM aggregation. Must be NULL, because aggregation is not supported.

Return Values

If the function succeeds, it returns DS_OK. If it fails, the return value may be one of the following.

Return codes: DSERR_ALLOCATED, DSERR_INVALIDPARAM, DSERR_NOAGGREGATION, DSERR_NODRIVER, DSERR_OUTOFMEMORY.

Remarks

The application must call the IDirectSound8::SetCooperativeLevel method immediately after creating a device object.
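
Following that remark, here is a minimal hedged sketch of device creation (g_ds and hwnd are assumed names; the same g_ds is reused by the buffer-creation code further below):

 // Hedged sketch: create the device object and immediately set its
 // cooperative level, as the remark above requires.
 LPDIRECTSOUND8 g_ds = NULL;

 if (FAILED(DirectSoundCreate8(NULL, &g_ds, NULL))) // NULL = default device
 {
     MessageBox(NULL, "Unable to create DirectSound object", "Error", MB_OK);
 }
 else
 {
     g_ds->SetCooperativeLevel(hwnd, DSSCL_NORMAL); // hwnd: application window
 }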

 

Creating the Primary Sound Buffer

The primary sound buffer is controlled through an IDirectSoundBuffer object; creating the primary buffer does not require a DirectX 8 interface, because that interface has never changed. The function used to create sound buffers is IDirectSound8::CreateSoundBuffer.

The CreateSoundBuffer method creates a sound buffer object to manage audio samples.

HRESULT CreateSoundBuffer(
LPCDSBUFFERDESC pcDSBufferDesc,
LPDIRECTSOUNDBUFFER * ppDSBuffer,
LPUNKNOWN pUnkOuter
);

Parameters

pcDSBufferDesc
Address of a DSBUFFERDESC structure that describes the sound buffer to create.
ppDSBuffer
Address of a variable that receives the IDirectSoundBuffer interface of the new buffer object. Use QueryInterface to obtain IDirectSoundBuffer8. IDirectSoundBuffer8 is not available for the primary buffer.
pUnkOuter
Address of the controlling object's IUnknown interface for COM aggregation. Must be NULL.

Return Values

If the method succeeds, the return value is DS_OK, or DS_NO_VIRTUALIZATION if a requested 3D algorithm was not available and stereo panning was substituted. See the description of the guid3DAlgorithm member of DSBUFFERDESC. If the method fails, the return value may be one of the error values shown in the following table.

Return codes: DSERR_ALLOCATED, DSERR_BADFORMAT, DSERR_BUFFERTOOSMALL, DSERR_CONTROLUNAVAIL, DSERR_DS8_REQUIRED, DSERR_INVALIDCALL, DSERR_INVALIDPARAM, DSERR_NOAGGREGATION, DSERR_OUTOFMEMORY, DSERR_UNINITIALIZED, DSERR_UNSUPPORTED.

Remarks

DirectSound does not initialize the contents of the buffer, and the application cannot assume that it contains silence.

If an attempt is made to create a buffer with the DSBCAPS_LOCHARDWARE flag on a system where hardware acceleration is not available, the method fails with either DSERR_CONTROLUNAVAIL or DSERR_INVALIDCALL, depending on the operating system.


pcDSBufferDesc is a pointer to a DSBUFFERDESC structure holding the description of the buffer to be created.

The DSBUFFERDESC structure describes the characteristics of a new buffer object. It is used by the IDirectSound8::CreateSoundBuffer method and by the DirectSoundFullDuplexCreate8 function.

An earlier version of this structure, DSBUFFERDESC1, is maintained in Dsound.h for compatibility with DirectX 7 and earlier.

typedef struct DSBUFFERDESC {
DWORD dwSize;
DWORD dwFlags;
DWORD dwBufferBytes;
DWORD dwReserved;
LPWAVEFORMATEX lpwfxFormat;
GUID guid3DAlgorithm;
} DSBUFFERDESC;

Members

dwSize
Size of the structure, in bytes. This member must be initialized before the structure is used.
dwFlags
Flags specifying the capabilities of the buffer. See the dwFlags member of the DSBCAPS structure for a detailed listing of valid flags.
dwBufferBytes
Size of the new buffer, in bytes. This value must be 0 when creating a buffer with the DSBCAPS_PRIMARYBUFFER flag. For secondary buffers, the minimum and maximum sizes allowed are specified by DSBSIZE_MIN and DSBSIZE_MAX, defined in Dsound.h.
dwReserved
Reserved. Must be 0.
lpwfxFormat
Address of a WAVEFORMATEX or WAVEFORMATEXTENSIBLE structure specifying the waveform format for the buffer. This value must be NULL for primary buffers.
guid3DAlgorithm
Unique identifier of the two-speaker virtualization algorithm to be used by DirectSound3D hardware emulation. If DSBCAPS_CTRL3D is not set in dwFlags, this member must be GUID_NULL (DS3DALG_DEFAULT). The following algorithm identifiers are defined.

 

DS3DALG_DEFAULT: DirectSound uses the default algorithm. In most cases this is DS3DALG_NO_VIRTUALIZATION. On WDM drivers, if the user has selected a surround sound speaker configuration in Control Panel, the sound is panned among the available directional speakers. Applies to software mixing only; available on WDM or VxD drivers.

DS3DALG_NO_VIRTUALIZATION: 3D output is mapped onto normal left and right stereo panning. At 90 degrees to the left, the sound is coming out of only the left speaker; at 90 degrees to the right, sound is coming out of only the right speaker. The vertical axis is ignored except for scaling of volume due to distance. Doppler shift and volume scaling are still applied, but the 3D filtering is not performed on this buffer. This is the most efficient software implementation, but provides no virtual 3D audio effect. When this algorithm is specified, HRTF processing is not done. Because it uses only normal stereo panning, a buffer created with this algorithm may be accelerated by a 2D hardware voice if no free 3D hardware voices are available. Applies to software mixing only; available on WDM or VxD drivers.

DS3DALG_HRTF_FULL: The 3D API is processed with the high-quality 3D audio algorithm. This gives the highest-quality 3D audio effect, but uses more CPU cycles. See Remarks. Applies to software mixing only; available on Microsoft Windows 98 Second Edition and later operating systems when using WDM drivers.

DS3DALG_HRTF_LIGHT: The 3D API is processed with the efficient 3D audio algorithm. This gives a good 3D audio effect, but uses fewer CPU cycles than DS3DALG_HRTF_FULL. Applies to software mixing only; available on Windows 98 Second Edition and later operating systems when using WDM drivers.

The one member you must decide on is dwFlags, a set of flags that determine the buffer's capabilities.
dwFlags
Flags that specify buffer-object capabilities. Use one or more of the values shown in the following table.

 

DSBCAPS_CTRL3D: The buffer has 3D control capability.
DSBCAPS_CTRLFREQUENCY: The buffer has frequency control capability.
DSBCAPS_CTRLFX: The buffer supports effects processing.
DSBCAPS_CTRLPAN: The buffer has pan control capability.
DSBCAPS_CTRLVOLUME: The buffer has volume control capability.
DSBCAPS_CTRLPOSITIONNOTIFY: The buffer has position notification capability. See the Remarks for DSCBUFFERDESC.
DSBCAPS_GETCURRENTPOSITION2: The buffer uses the new behavior of the play cursor when IDirectSoundBuffer8::GetCurrentPosition is called. In the first version of DirectSound, the play cursor was significantly ahead of the actual playing sound on emulated sound cards; it was directly behind the write cursor. Now, if this flag is specified, the application can get a more accurate play cursor. If this flag is not specified, the old behavior is preserved for compatibility. This flag affects only emulated devices; if a DirectSound driver is present, the play cursor is accurate for DirectSound in all versions of DirectX.
DSBCAPS_GLOBALFOCUS: The buffer is a global sound buffer. With this flag set, an application using DirectSound can continue to play its buffers if the user switches focus to another application, even if the new application uses DirectSound. The one exception is if you switch focus to a DirectSound application that uses the DSSCL_WRITEPRIMARY flag for its cooperative level; in this case, the global sounds from other applications will not be audible.
DSBCAPS_LOCDEFER: The buffer can be assigned to a hardware or software resource at play time, or when IDirectSoundBuffer8::AcquireResources is called.
DSBCAPS_LOCHARDWARE: The buffer uses hardware mixing.
DSBCAPS_LOCSOFTWARE: The buffer is in software memory and uses software mixing.
DSBCAPS_MUTE3DATMAXDISTANCE: The sound is reduced to silence at the maximum distance. The buffer will stop playing when the maximum distance is exceeded, so that processor time is not wasted. Applies only to software buffers.
DSBCAPS_PRIMARYBUFFER: The buffer is a primary buffer.
DSBCAPS_STATIC: The buffer is in on-board hardware memory.
DSBCAPS_STICKYFOCUS: The buffer has sticky focus. If the user switches to another application not using DirectSound, the buffer is still audible; however, if the user switches to another DirectSound application, the buffer is muted.
DSBCAPS_TRUEPLAYPOSITION: Forces IDirectSoundBuffer8::GetCurrentPosition to return the buffer's true play position. This flag is only valid in Windows Vista.
The following code creates a sound buffer:
    // set up the DSBUFFERDESC structure
    DSBUFFERDESC ds_buffer_desc;

    // zero out the structure
    ZeroMemory(&ds_buffer_desc, sizeof(DSBUFFERDESC));

    ds_buffer_desc.dwSize        = sizeof(DSBUFFERDESC); 
    ds_buffer_desc.dwFlags       = DSBCAPS_CTRLVOLUME;
    ds_buffer_desc.dwBufferBytes = wave_format.nAvgBytesPerSec * 2;  // 2 seconds
    ds_buffer_desc.lpwfxFormat   = &wave_format; // format set up below

    // create the first-version buffer object (ds: LPDIRECTSOUNDBUFFER)
    if (FAILED(g_ds->CreateSoundBuffer(&ds_buffer_desc, &ds, NULL)))
    {
        // an error occurred
        MessageBox(NULL, "Unable to create sound buffer", "Error", MB_OK);
    }
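
Because CreateSoundBuffer returns the first-version interface, a hedged follow-up (variable names assumed) is to query it for IDirectSoundBuffer8 and release the original, as the reference above suggests:

    // Hedged sketch: upgrade the buffer to the DirectX 8 interface.
    LPDIRECTSOUNDBUFFER8 ds8 = NULL;
    if (SUCCEEDED(ds->QueryInterface(IID_IDirectSoundBuffer8, (void**)&ds8)))
    {
        ds->Release(); // keep only the IDirectSoundBuffer8 interface
        ds = NULL;
    }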
Setting the Format

You have a range of format choices, but the recommendation is to pick between 11025 Hz, 16-bit, mono and 22050 Hz, 16-bit, mono. When choosing a format, do not bother with stereo: it wastes processing time and its effect is hard to judge. Likewise do not use any sample precision other than 16 bits, since lower precision causes a sharp drop in sound quality. As for the sample rate, higher is better, but do not go above 22050 Hz; at that rate you can still deliver close to CD-quality sound without much loss.

The playback format is set by calling IDirectSoundBuffer::SetFormat.

The SetFormat method sets the format of the primary buffer. Whenever this application has the input focus, DirectSound will set the primary buffer to the specified format.

HRESULT SetFormat(
LPCWAVEFORMATEX pcfxFormat
);

Parameters

pcfxFormat
Address of a WAVEFORMATEX structure that describes the new format for the primary sound buffer.

Return Values

If the method succeeds, the return value is DS_OK. If the method fails, the return value may be one of the following error values:

Return codes: DSERR_BADFORMAT, DSERR_INVALIDCALL, DSERR_INVALIDPARAM, DSERR_OUTOFMEMORY, DSERR_PRIOLEVELNEEDED, DSERR_UNSUPPORTED.

Remarks

The format of the primary buffer should be set before secondary buffers are created.

The method fails if the application has the DSSCL_NORMAL cooperative level.

If the application is using DirectSound at the DSSCL_WRITEPRIMARY cooperative level, and the format is not supported, the method fails.

If the cooperative level is DSSCL_PRIORITY, DirectSound stops the primary buffer, changes the format, and restarts the buffer. The method succeeds even if the hardware does not support the requested format; DirectSound sets the buffer to the closest supported format. To determine whether this has happened, an application can call the GetFormat method for the primary buffer and compare the result with the format that was requested with the SetFormat method.

This method is not available for secondary sound buffers. If a new format is required, the application must create a new DirectSoundBuffer object.


The function's only parameter is a pointer to a WAVEFORMATEX structure holding the format information to set.

The WAVEFORMATEX structure defines the format of waveform-audio data. Only format information common to all waveform-audio data formats is included in this structure. For formats that require additional information, this structure is included as the first member in another structure, along with the additional information.

This structure is part of the Platform SDK and is not declared in Dsound.h. It is documented here for convenience.

typedef struct WAVEFORMATEX {
WORD wFormatTag;
WORD nChannels;
DWORD nSamplesPerSec;
DWORD nAvgBytesPerSec;
WORD nBlockAlign;
WORD wBitsPerSample;
WORD cbSize;
} WAVEFORMATEX;

Members

wFormatTag
Waveform-audio format type. Format tags are registered with Microsoft Corporation for many compression algorithms. A complete list of format tags can be found in the Mmreg.h header file. For one- or two-channel PCM data, this value should be WAVE_FORMAT_PCM.
nChannels
Number of channels in the waveform-audio data. Monaural data uses one channel and stereo data uses two channels.
nSamplesPerSec
Sample rate, in samples per second (hertz). If wFormatTag is WAVE_FORMAT_PCM, then common values for nSamplesPerSec are 8.0 kHz, 11.025 kHz, 22.05 kHz, and 44.1 kHz. For non-PCM formats, this member must be computed according to the manufacturer's specification of the format tag.
nAvgBytesPerSec
Required average data-transfer rate, in bytes per second, for the format tag. If wFormatTag is WAVE_FORMAT_PCM, nAvgBytesPerSec should be equal to the product of nSamplesPerSec and nBlockAlign. For non-PCM formats, this member must be computed according to the manufacturer's specification of the format tag.
nBlockAlign
Block alignment, in bytes. The block alignment is the minimum atomic unit of data for the wFormatTag format type. If wFormatTag is WAVE_FORMAT_PCM or WAVE_FORMAT_EXTENSIBLE, nBlockAlign must be equal to the product of nChannels and wBitsPerSample divided by 8 (bits per byte). For non-PCM formats, this member must be computed according to the manufacturer's specification of the format tag.

Software must process a multiple of nBlockAlign bytes of data at a time. Data written to and read from a device must always start at the beginning of a block. For example, it is illegal to start playback of PCM data in the middle of a sample (that is, on a non-block-aligned boundary).

wBitsPerSample
Bits per sample for the wFormatTag format type. If wFormatTag is WAVE_FORMAT_PCM, then wBitsPerSample should be equal to 8 or 16. For non-PCM formats, this member must be set according to the manufacturer's specification of the format tag. If wFormatTag is WAVE_FORMAT_EXTENSIBLE, this value can be any integer multiple of 8. Some compression schemes cannot define a value for wBitsPerSample, so this member can be zero.
cbSize
Size, in bytes, of extra format information appended to the end of the WAVEFORMATEX structure. This information can be used by non-PCM formats to store extra attributes for the wFormatTag. If no extra information is required by the wFormatTag, this member must be set to zero. For WAVE_FORMAT_PCM formats (and only WAVE_FORMAT_PCM formats), this member is ignored.
The following code sets the audio format to 11025 Hz, mono, 16-bit:
    // set up the WAVEFORMATEX structure
    WAVEFORMATEX wave_format;

    ZeroMemory(&wave_format, sizeof(WAVEFORMATEX));

    wave_format.wFormatTag      = WAVE_FORMAT_PCM;
    wave_format.nChannels       = 1;        // mono
    wave_format.nSamplesPerSec  = 11025;
    wave_format.wBitsPerSample  = 16;
    wave_format.nBlockAlign     = (wave_format.wBitsPerSample / 8) * wave_format.nChannels;
    wave_format.nAvgBytesPerSec = wave_format.nSamplesPerSec * wave_format.nBlockAlign;
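
To actually apply this format to the primary buffer, a hedged closing sketch (reusing g_ds and hwnd from the earlier sketches, and raising the cooperative level to DSSCL_PRIORITY, since SetFormat fails at DSSCL_NORMAL as noted above):

    // Hedged sketch: create the primary buffer and give it the format above.
    DSBUFFERDESC primary_desc;
    ZeroMemory(&primary_desc, sizeof(DSBUFFERDESC));
    primary_desc.dwSize  = sizeof(DSBUFFERDESC);
    primary_desc.dwFlags = DSBCAPS_PRIMARYBUFFER; // dwBufferBytes/lpwfxFormat stay 0/NULL

    LPDIRECTSOUNDBUFFER primary = NULL;
    g_ds->SetCooperativeLevel(hwnd, DSSCL_PRIORITY); // SetFormat fails under DSSCL_NORMAL
    if (SUCCEEDED(g_ds->CreateSoundBuffer(&primary_desc, &primary, NULL)))
    {
        primary->SetFormat(&wave_format); // set the primary-buffer playback format
    }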