Toru Takahashi, Hideki Banno, Toshio Irino, Hideki Kawahara
European Signal Processing Conference, December 1, 2006
A simple, efficient, and high-quality speech style conversion algorithm based on STRAIGHT is proposed. STRAIGHT, a very high-quality vocoder, consists of an instantaneous-frequency-based F0 and source information extraction part and an F0-adaptive time-frequency smoothing part that eliminates periodicity interference. The proposed method uses only vowel information to design the desired conversion functions and parameters, which reduces the amount of training data required for conversion. The method proceeds in five steps: 1) produce abstract spectra, each represented on the perceptual frequency axis and derived as the average spectrum for each vowel and each style; 2) decompose the original spectrum into the abstract spectrum and a residual fine structure; 3) replace the abstract spectrum of the original style with that of the target style; 4) map the fine structure with nonlinear frequency warping to adapt it to the fine structure of the target style; 5) add the two together to produce the target speech. An efficient algorithm for this conversion was developed using an orthogonal transformation referred to as warped-DCT. An informal listening test indicated that the proposed method yields more natural and higher-quality speech style conversion than previous methods.
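The five steps above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes log-magnitude spectra on a linear frequency axis rather than the perceptual axis, omits the warped-DCT entirely, and uses a hypothetical power-law warp (`alpha`) as a stand-in for the nonlinear frequency warping. All function names are illustrative.

```python
import numpy as np

def average_spectrum(frames):
    """Step 1 (simplified): mean log spectrum over the frames (rows)
    of one vowel in one style. The perceptual axis is not modeled."""
    return np.mean(frames, axis=0)

def warp_frequency(values, alpha):
    """Step 4 (stand-in): resample along a normalized frequency axis.
    The output bin at position f reads the input at f**alpha, so
    alpha = 1.0 is the identity. The power law is an assumption."""
    axis = np.linspace(0.0, 1.0, len(values))
    return np.interp(axis ** alpha, axis, values)

def convert_frame(frame, abstract_src, abstract_tgt, alpha=1.0):
    """Steps 2, 3, and 5: split off the fine structure, swap the
    abstract spectrum, warp the fine structure, and add them."""
    residual = frame - abstract_src             # step 2: fine structure
    warped = warp_frequency(residual, alpha)    # step 4: warp it
    return abstract_tgt + warped                # steps 3 and 5
```

With `alpha=1.0` the fine structure passes through unchanged, so the conversion reduces to swapping the averaged spectrum of the source style for that of the target style.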