Research Achievements

Toru Takahashi (高橋 徹)

Basic Information

Affiliation
Professor, Department of Information Systems Engineering, Faculty of Design Technology, Osaka Sangyo University
Degree
Doctor of Engineering (Nagoya Institute of Technology)

Researcher Number
30419494
J-GLOBAL ID
201201026236304402
researchmap Member ID
7000000887

Papers

 132
  • Takeshi Mizumoto, Hiroshi Tsujino, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
    IPSJ Journal (情報処理学会論文誌) 51(10) 2007-2019 October 2010  Peer-reviewed
  • Takuma Otsuka, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
    PALADYN Journal of Behavioral Robotics 1(1) 80-88 March 2010  Peer-reviewed
  • Toru Takahashi, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata, et al.
    Paladyn 1(1) 37-47 January 2010  Peer-reviewed
  • Kazuhiro Nakadai, Toru Takahashi, Hiroshi G. Okuno, Hirofumi Nakajima, Yuji Hasegawa, Hiroshi Tsujino
    ADVANCED ROBOTICS 24(5-6) 739-761 2010  Peer-reviewed
    This paper presents the design and implementation of the HARK robot audition software system, consisting of sound source localization modules, sound source separation modules and automatic speech recognition modules for separated speech signals, that works on any robot with any microphone configuration. Since a robot with ears may be deployed to various auditory environments, the robot audition system should provide an easy way to adapt to them. HARK provides a set of modules to cope with various auditory environments by using an open-source middleware, FlowDesigner, and reduces the overheads of data transfer between modules. HARK has been open-sourced since April 2008. The resulting implementation of HARK with MUSIC-based sound source localization, GSS-based sound source separation and Missing Feature Theory-based automatic speech recognition on Honda ASIMO, SIG2 and Robovie R2 recognizes three simultaneous utterances with a delay of 1.9 s at a word correct rate of 80-90% for three speakers. © Koninklijke Brill NV, Leiden and The Robotics Society of Japan, 2010
  • Wataru Hinoshita, Tetsuya Ogata, Hideki Kozima, Toru Takahashi, Hiroshi G. Okuno
    Journal of the Robotics Society of Japan (日本ロボット学会誌) 27(4) 532-543 2010  Peer-reviewed
  • Hisashi Kanda, Tetsuya Ogata, Toru Takahashi, Kazunori Komatani, Hiroshi G. Okuno
    2009 IEEE/RSJ International Conference on Intelligent Robots and Systems October 2009  Peer-reviewed
  • Shun Nishide, Tatsuhiro Nakagawa, Tetsuya Ogata, Jun Tani, Toru Takahashi, Hiroshi G. Okuno
    2009 IEEE/RSJ International Conference on Intelligent Robots and Systems October 2009  Peer-reviewed
  • Wataru Hinoshita, Tetsuya Ogata, Hideki Kozima, Hisashi Kanda, Toru Takahashi, Hiroshi G. Okuno
    2009 IEEE/RSJ International Conference on Intelligent Robots and Systems October 2009  Peer-reviewed
  • Ryu Takeda, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
    2009 IEEE/RSJ International Conference on Intelligent Robots and Systems October 2009  Peer-reviewed
  • Takeshi Mizumoto, Hiroshi Tsujino, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno
    2009 IEEE/RSJ International Conference on Intelligent Robots and Systems October 2009  Peer-reviewed
  • Toru Takahashi, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
    2009 IEEE/RSJ International Conference on Intelligent Robots and Systems October 2009  Peer-reviewed
  • Takuma Otsuka, Toru Takahashi, Hiroshi G. Okuno, Kazunori Komatani, Tetsuya Ogata, Kazumasa Murata, Kazuhiro Nakadai
    2009 IEEE/RSJ International Conference on Intelligent Robots and Systems October 2009  Peer-reviewed
  • Ikkyu Aihara, Ryu Takeda, Takeshi Mizumoto, Toru Takahashi, Hiroshi G. Okuno
    RIMS Kôkyûroku (数理解析研究所講究録) 1663 153-158 September 2009
  • Shun Shiramatsu, Yuji Kubota, Kazunori Komatani, Tetsuya Ogata, Toru Takahashi, Hiroshi G. Okuno
    Opportunities and Challenges for Next-Generation Applied Intelligence, Studies in Computational Intelligence Springer-Verlag 214 111-117 May 2009  Peer-reviewed
  • Ryu Takeda, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
    2009 IEEE International Conference on Acoustics, Speech and Signal Processing 3677-+ April 2009  Peer-reviewed
  • Hideki Kawahara, Masanori Morise, Toru Takahashi, Hideki Banno, Ryuichi Nisimura, Toshio Irino
    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2647-2650 2009
    A simple and fast voice conversion method based only on vowel information is proposed. The proposed method relies on the empirical distribution of perceptual spectral distances between representative examples of each vowel segment extracted using the TANDEM-STRAIGHT spectral envelope estimation procedure [1]. Mapping functions of vowel spectra are designed to preserve the vowel space structure defined by the observed empirical distribution while transforming the position and orientation of the structure in an abstract vowel spectral space. By introducing physiological constraints in vocal tract shapes and vocal tract length normalization, difficulties in careful frequency alignment between vowel template spectra of the source and the target speakers can be alleviated without significant degradation in converted speech. The proposed method is a frame-based instantaneous method and is relevant for real-time processing. Applications of the proposed method in cross-language voice conversion are also discussed. Copyright © 2009 ISCA.
  • Hisashi Kanda, Tetsuya Ogata, Toru Takahashi, Kazunori Komatani, Hiroshi G. Okuno
    ICRA: 2009 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS 1-7 4036-4041 2009  Peer-reviewed
    A continuous vocal imitation system was developed using a computational model that explains the process of phoneme acquisition by infants. Human infants perceive speech sounds not as discrete phoneme sequences but as continuous acoustic signals. One of the critical problems in phoneme acquisition is how to segment these continuous speech sounds. The key idea to solve this problem is that articulatory mechanisms such as the vocal tract help human beings to perceive speech sound units corresponding to phonemes. To segment acoustic signals using articulatory movement, our system applies a segmentation method based on a Recurrent Neural Network with Parametric Bias (RNNPB). This method determines multiple segmentation boundaries in a temporal sequence using the prediction error of the RNNPB model, and the PB values obtained by the method can be encoded as a kind of phoneme. Our system was implemented using a physical vocal tract model called the Maeda model. Experimental results demonstrated that our system can self-organize the same phonemes in different continuous sounds, and can imitate vocal sounds involving arbitrary numbers of vowels using the vowel space in the RNNPB. This suggests that our model reflects the process of phoneme acquisition.
  • Naoki Yasuraoka, Takehiro Abe, Katsutoshi Itoyama, Toru Takahashi, et al.
    Proceedings of the 17th International Conference on Multimedia 2009, Vancouver, British Columbia, Canada, October 19-24, 2009 203-212 2009  Peer-reviewed
  • Shun Shiramatsu, Tadachika Ozono, Toramatsu Shintani, Kazunori Komatani, Tetsuya Ogata, Toru Takahashi, Hiroshi G. Okuno
    2009 International Conference on Computational Science and Engineering 2009  Peer-reviewed
  • Takuma Otsuka, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, et al.
    9th IEEE-RAS International Conference on Humanoid Robots, Humanoids 2009, Paris, France, December 7-10, 2009 405-410 2009  Peer-reviewed
  • Ryu Takeda, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, et al.
    9th IEEE-RAS International Conference on Humanoid Robots, Humanoids 2009, Paris, France, December 7-10, 2009 250-255 2009  Peer-reviewed
  • Akira Maezawa, Katsutoshi Itoyama, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno
    2009 11th IEEE International Symposium on Multimedia 2009  Peer-reviewed
  • Masanori Morise, Toru Takahashi, Hideki Kawahara, Toshio Irino
    IEICE Transactions A (電子情報通信学会論文誌A) J92-A(3) 163-171 2009  Peer-reviewed
  • Masato Onishi, Toru Takahashi, Toshio Irino, Hideki Kawahara
    2008 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY: SLT 2008, PROCEEDINGS 25-+ 2008
    New design procedures of time-frequency alignment for automatic speech morphing are proposed. The frequency alignment function at a specific frame is represented as a weighted average of vowel alignment functions based on similarity to each vowel. Julian, an open source speech recognition system, was used to design a time alignment function. Objective and subjective tests were conducted to evaluate the proposed method, and test results indicated that the proposed method yields comparable naturalness to the manually morphed samples in terms of time alignment. The results also illustrated that the proposed frequency alignment provides significantly better naturalness than morphed samples without frequency alignment.
  • Hideki Kawahara, Masanori Morise, Hideki Banno, Toru Takahashi, Ryuichi Nisimura, Toshio Irino
    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 650-653 2008
    A simple new method to recover details in a spectral envelope is proposed based on a recently introduced speech analysis, modification and resynthesis framework called TANDEM-STRAIGHT. Spectral envelope recovery of voiced sounds is a discrete-to-analog conversion in the frequency domain. However, there is a fundamental problem because the spatial frequency contents of vocal tract functions generally exceed the Nyquist limit of the equivalent sampling rate determined by the fundamental frequency. TANDEM-STRAIGHT yields a method to recover a spectral envelope based on the consistent sampling theory and provides base information for exceeding this limit. At the final stage, the AR spectral envelope estimated from the TANDEM-STRAIGHT spectrum is divided by the F0 adaptively smoothed version of itself to supply the missing high-spatial- frequency details of the envelope. The underlying principle of the proposed method can also be applied to other speech synthesis frameworks. Copyright © 2008 ISCA.
  • Toru Takahashi, Shun'ichi Yamamoto, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 1(1) 992-+ 2008  Peer-reviewed
  • Toru Takahashi, Toshio Irino, Hideki Kawahara
    19th International Congress on Acoustics (ICA2007), Madrid, Spain, 2-7 Sept. 2007 September 2007  Peer-reviewed
    (Presented 2 Sept.)
  • Hideki Banno, Hiroaki Hata, Masanori Morise, Toru Takahashi, Toshio Irino, Hideki Kawahara
    Acoustical Science and Technology 28 140-146 8 May 2007
    A very high quality speech analysis, modification and synthesis system - STRAIGHT - has now been implemented in C language and operated in realtime. This article first provides a brief summary of STRAIGHT components and then introduces the underlying principles that enabled realtime operation. In STRAIGHT, the built-in extended pitch synchronous analysis, which does not require analysis window alignment, plays an important role in realtime implementation. A detailed description of the processing steps, which are based on the so-called "just-in-time" architecture, is presented. Further, discussions on other issues related to realtime implementation and performance measures are also provided. The software will be available to researchers upon request. © 2007 The Acoustical Society of Japan.
  • Hideki Kawahara, Masanori Morise, Toru Takahashi, Toshio Irino, Hideki Banno, Osamu Fujimura
    European Signal Processing Conference 2219-2223 2007
    A new framework is proposed for representing acoustic events based on bandwise durations derived from a group delay function and bandwise aperiodicity indices. The goal is to provide an efficient and detailed source information for a high-quality speech manipulation system, STRAIGHT. The proposed representation enables event based processing of speech parameters and provides means to fill the gap between waveform based methods and VOCODERs in a perceptually relevant manner. Simulations using a pulse plus noise source and a time varying filter demonstrated that the proposed method provides accurate estimates of the source aperiodicity. Application of the proposed method to STRAIGHT illustrated that it enables significant reduction in storage size and improves reproduced sound quality. © 2007 EURASIP.
  • Toru Takahashi, Hideki Banno, Toshio Irino, Hideki Kawahara
    European Signal Processing Conference 1 December 2006
    A simple, efficient, and high-quality speech style conversion algorithm is proposed based on STRAIGHT. The very high-quality VOCODER STRAIGHT consists of an instantaneous-frequency-based F0 and source information extraction part and an F0-adaptive time-frequency smoothing part that eliminates periodicity interference. The proposed method uses only vowel information to design the desired conversion functions and parameters, so it is possible to reduce the amount of training data required for conversion. The processing steps of the proposed method are: 1) produce abstract spectra that are represented on a perceptual frequency axis and derived as the average spectrum for each vowel and each style; 2) decompose the original spectrum into the abstract spectrum and the residual fine structure; 3) replace the abstract spectrum of the original style with that of the target style; 4) map the fine structure with nonlinear frequency warping to adapt it to the target-style fine structure; 5) add them together to produce the target speech. An efficient algorithm for this conversion was developed using an orthogonal transformation referred to as warped-DCT. An informal listening test indicated that the proposed method yields more natural and higher-quality speech style conversion than previous methods.
  • Toru Takahashi, Masashi Nishi, Toshio Irino, Hideki Kawahara
    INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP 5 2514-2517 2006
    The automatic assignment of anchoring points is proposed to define the correspondence between the time-frequency representations of speech samples for speech morphing, speech texture mapping, and so on. The correspondence is modeled as a set of segmental bilinear functions, whose parameters are called anchoring points. Although the correspondence significantly affects the quality of manipulated speech sounds such as morphed and texture-mapped speech, anchoring points have conventionally been aligned manually on time-frequency representations. Anchoring points should be placed at auditorily important locations. When a spectrogram is used as the time-frequency representation, auditorily important locations are given by formant frequencies around vowel transitions. The central idea of the proposed method is to prepare vowel template spectra with pre-assigned anchoring points in advance and to deform one of the templates to match the input speech spectrum. Finally, anchoring points on the input spectrum are copied from the pre-assigned anchoring points. Experimental results suggest that the naturalness of morphed speech based on the proposed automatic assignment method is equivalent in quality to STRAIGHT synthetic speech samples.
  • Hideki Kawahara, Alain De Cheveigné, Hideki Banno, Toru Takahashi, Toshio Irino
    9th European Conference on Speech Communication and Technology 537-540 2005
    A new method for source information extraction is proposed. The aim of the method is to provide optimal source information for the very high quality speech manipulation system STRAIGHT. The method is based on both time interval and frequency cues, and it provides fundamental frequency and periodicity information within each frequency band, to allow mixed mode excitation. The method is designed to minimize perceptual disturbance due to errors in source information extraction. A preliminary evaluation using a database of simultaneously recorded EGG and speech signals yielded very low gross error rates (0.029% for females and 0.14% for males). In addition, the method is designed so as to minimize the perceptual disturbance caused by any such gross error.
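
For readers unfamiliar with the techniques named in the HARK paper above, the MUSIC step it relies on for sound source localization can be illustrated in isolation. The sketch below is a generic textbook MUSIC pseudo-spectrum, not HARK's actual implementation; the array geometry, variable names, and regularization floor are assumptions:

```python
import numpy as np

def music_spectrum(R, steering, n_sources):
    """Generic MUSIC pseudo-spectrum over candidate directions.

    R         : (M, M) Hermitian spatial correlation matrix of M mic signals
    steering  : (D, M) steering vectors, one row per candidate direction
    n_sources : assumed number of simultaneous sources
    """
    # Eigen-decompose R; np.linalg.eigh returns eigenvalues in ascending order.
    _, V = np.linalg.eigh(R)
    # Noise subspace: eigenvectors of the M - n_sources smallest eigenvalues.
    En = V[:, : R.shape[0] - n_sources]
    # MUSIC power is large where a steering vector is (nearly) orthogonal
    # to the noise subspace, i.e. at a true source direction.
    num = np.einsum("dm,dm->d", steering.conj(), steering).real
    proj = steering.conj() @ En                      # (D, M - n_sources)
    den = np.einsum("dk,dk->d", proj, proj.conj()).real
    return num / np.maximum(den, 1e-12)              # floor avoids div-by-zero

# Toy check: a 4-mic half-wavelength linear array and one source.
angles = np.linspace(-np.pi / 2, np.pi / 2, 181)
A = np.exp(-1j * np.pi * np.outer(np.sin(angles), np.arange(4)))
a0 = A[120]                                          # true source direction
R = np.outer(a0, a0.conj()) + 0.01 * np.eye(4)       # signal plus noise power
spectrum = music_spectrum(R, A, n_sources=1)
```

The pseudo-spectrum peaks sharply at the steering vector matching the source, which is how a direction estimate is read off.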
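
The phoneme-acquisition paper above segments continuous sound where the RNNPB model's prediction error rises. That boundary-picking step can be shown on its own; the sketch below assumes a per-frame error sequence has already been computed by some predictor (the RNNPB itself is not modeled, and the function name and threshold are illustrative):

```python
import numpy as np

def boundaries_from_prediction_error(err, threshold):
    """Indices of frames treated as segment boundaries.

    A frame counts as a boundary when its prediction error is a local
    maximum and exceeds the threshold: the predictor fits poorly right
    where one quasi-stationary unit ends and the next begins.
    """
    err = np.asarray(err, dtype=float)
    return [
        t for t in range(1, len(err) - 1)
        if err[t] > threshold and err[t] > err[t - 1] and err[t] >= err[t + 1]
    ]
```

On a toy error trace such as [0.1, 0.2, 0.9, 0.3, 0.1, 0.8, 0.2] with threshold 0.5, the picked boundaries are frames 2 and 5.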
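
The last entry reports gross error rates of 0.029% and 0.14% for F0 extraction. As context for those figures, the conventional gross-pitch-error metric counts voiced frames whose estimate deviates from a reference by more than 20%; a minimal sketch follows (function and argument names are assumptions, and this is not the paper's evaluation code):

```python
import numpy as np

def gross_error_rate(f0_est, f0_ref, tol=0.2):
    """Fraction of voiced reference frames (f0_ref > 0) whose F0 estimate
    deviates from the reference by more than tol (20% by convention)."""
    f0_est = np.asarray(f0_est, dtype=float)
    f0_ref = np.asarray(f0_ref, dtype=float)
    voiced = f0_ref > 0
    rel_dev = np.abs(f0_est[voiced] - f0_ref[voiced]) / f0_ref[voiced]
    return float(np.mean(rel_dev > tol))
```

For example, with reference [100, 200, 0, 150] Hz and estimate [101, 150, 0, 150] Hz, one of the three voiced frames is off by 25%, giving a rate of 1/3.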

MISC

 109

Books and Other Publications

 8

Presentations

 80

Courses Taught

 18

Academic Society Memberships

 6

Works

 1

Research Projects (Joint Research and Competitive Funding)

 15

Industrial Property Rights

 5

Research Themes

 1
  • Research Theme
    Human-robot interaction, speech communication, speech recognition, auditory scene understanding
    Keywords
    Microphone arrays, acoustic features, speech recognition, sound source localization, sound source separation
    Overview
    Working on the challenges of realizing natural dialogue between robots and humans in real-world environments