Research Achievements

高橋 徹

Takahashi Toru

Basic Information

Affiliation
Professor, Department of Information Systems Engineering, Faculty of Design Technology, Osaka Sangyo University
Degree
Doctor of Engineering (Nagoya Institute of Technology)

Researcher Number
30419494
J-GLOBAL ID
201201026236304402
researchmap Member ID
7000000887

Papers

 115

MISC

 71
  • 乾 聡志, 高橋 徹
    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 116(477) 129-134 Mar 1, 2017
  • 高橋徹, 山田耕嗣
    大阪産業大学論文集, 自然科学編 128(128) 31-40 Mar 2017
    We describe the design and implementation of a signal transmission system based on amplitude modulation in audible frequency bands. In a conventional amplitude-modulation transmission system, the signal is modulated by a very high carrier frequency. Our system instead uses very low frequencies, namely audible signals between 2 and 20000 Hz. We experimentally show that signals can be transmitted in the audible bands.
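
A note on the entry above: the core idea, amplitude modulation with a carrier that itself lies inside the audible band, is easy to illustrate with numpy. This is a minimal sketch, not the authors' implementation; the carrier frequency, modulation depth, and envelope-detector window are illustrative assumptions.

```python
import numpy as np

fs = 44100                       # audio sample rate (Hz)
t = np.arange(fs) / fs           # 1 second of samples
fc, fm = 8000.0, 200.0           # assumed audible carrier / message freqs

message = np.sin(2 * np.pi * fm * t)
am = (1.0 + 0.5 * message) * np.sin(2 * np.pi * fc * t)   # AM, depth 0.5

# Envelope detection: rectify, then moving-average low-pass.
rectified = np.abs(am)
win = int(fs / fm / 4)           # window shorter than a message period
envelope = np.convolve(rectified, np.ones(win) / win, mode="same")
recovered = envelope - envelope.mean()   # remove the envelope's DC offset

# Crude check: the recovered signal correlates strongly with the message.
corr = np.corrcoef(recovered, message)[0, 1]
print(f"correlation with original message: {corr:.2f}")
```
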
  • 高橋 徹, 能勢 和夫, 塚本 直幸
    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 115(354) 43-46 Dec 8, 2015
  • 高橋 徹, 能勢 和夫, 塚本 直幸, 吉川 耕司
    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 114(357) 57-62 Dec 11, 2014
    This paper describes the development and evaluation of a GPS-based system that reports tram positions, and presents its design philosophy. The most important design principle is to build the system from readily available, general-purpose components, because we aim to promote deployment to other modes of transportation such as buses, taxis, and trains. We developed a prototype system based on this concept and evaluated it in a trial service on the Hankai Tramway. Instead of plotting raw positioning values directly on the map, we corrected them with a map-matching algorithm to reduce estimation error, and we evaluated the spacing between the anchor points used by the algorithm. After tuning the system based on these evaluations, the maximum error was kept to 100 m and the delay until information display to about 3 seconds.
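
The map-matching step described above can be illustrated by snapping each raw GPS fix to the nearest point on a polyline of anchor points. This is a generic nearest-segment projection under assumed coordinates and anchor spacing, not the authors' algorithm.

```python
import numpy as np

def snap_to_polyline(fix, anchors):
    """Project a GPS fix onto the nearest segment of an anchor-point polyline."""
    best, best_d2 = None, np.inf
    for a, b in zip(anchors[:-1], anchors[1:]):
        ab, af = b - a, fix - a
        t = np.clip(np.dot(af, ab) / np.dot(ab, ab), 0.0, 1.0)
        p = a + t * ab                     # closest point on segment [a, b]
        d2 = np.sum((fix - p) ** 2)
        if d2 < best_d2:
            best, best_d2 = p, d2
    return best

# Hypothetical anchor points along a straight track, 50 m apart.
anchors = np.array([[i * 50.0, 0.0] for i in range(10)])
noisy_fix = np.array([122.0, 18.0])          # raw GPS fix, 18 m off the track
print(snap_to_polyline(noisy_fix, anchors))  # -> [122.  0.], back on the line
```
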
  • 阿曽 慎平, 齋藤 毅, 後藤 真孝, 糸山 克寿, 高橋 徹, 尾形 哲也, 奥乃 博
    研究報告音楽情報科学(MUS) 2012(13) 1-8 Jan 27, 2012
    In this paper we describe a system that discriminates between singing and speaking voices. Given a clean speech signal, it outputs the likelihood of each of the singing and speaking voices (as continuous values). Previous systems use the temporal transition of the spectral envelope (MFCC) and the fundamental frequency (F0) as discrimination features. Our system adds the peak interval of spectral change, a feature related to phoneme duration, and weights these per-feature discriminators according to the duration of the input signal. Experimental results with one-second signals show that our system achieves 90.2% accuracy, compared with 86.7% for previous systems. We also describe a demonstration application in which the system runs in real time.
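
The duration-dependent weighting described above can be sketched generically: combine per-feature log-likelihood ratios with weights that shift as the input gets longer. All weights and likelihood values below are placeholders, not the paper's trained values.

```python
def combine(loglik_per_feature, duration_sec):
    """Weighted sum of per-feature log-likelihood ratios (singing vs. speech).

    loglik_per_feature: dict feature -> log p(x|singing) - log p(x|speaking)
    The duration-dependent weights below are illustrative only.
    """
    # Short inputs: trust frame-level spectral cues; long inputs: trust
    # F0 dynamics and phoneme-duration cues more.
    w = min(duration_sec / 3.0, 1.0)
    weights = {"mfcc": 1.0 - 0.5 * w, "f0": 0.5 + 0.25 * w,
               "peak_interval": 0.5 + 0.25 * w}
    score = sum(weights[k] * v for k, v in loglik_per_feature.items())
    return "singing" if score > 0 else "speaking"

print(combine({"mfcc": -0.2, "f0": 0.8, "peak_interval": 0.4}, duration_sec=1.0))
```
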
  • Kohei Nagira, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7191 388-396 2012, Peer-reviewed
    We present a method of blind source separation (BSS) for speech signals using a complex extension of infinite sparse factor analysis (ISFA) in the frequency domain. Our method is robust against delayed signals that usually occur in real environments, such as reflections, short-time reverberations, and time lags of signals arriving at microphones. ISFA is a conventional non-parametric Bayesian method of BSS, which has only been applied to time domain signals because it can only deal with real signals. Our method uses complex normal distributions to estimate source signals and mixing matrix. Experimental results indicate that our method outperforms the conventional ISFA in the average signal-to-distortion ratio (SDR). © 2012 Springer-Verlag.
  • Yasuharu Hirasawa, Naoki Yasuraoka, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7191 446-453 2012, Peer-reviewed
    This paper focuses on blind speech separation in under-determined conditions, that is, in the case when there are more sound sources than microphones. We introduce a sound source model based on the Gaussian mixture model (GMM) to represent a speech signal in the time-frequency domain, and derive rules for updating the model parameters using the auxiliary function method. Our GMM sound source model consists of two kinds of Gaussians: sharp ones representing harmonic parts and smooth ones representing nonharmonic parts. Experimental results reveal that our method outperforms the method based on non-negative matrix factorization (NMF) by 0.7dB in the signal-to-distortion ratio (SDR), and by 1.7dB in the signal-to-interference ratio (SIR). This means that our method effectively removes interference coming from other talkers. © 2012 Springer-Verlag.
  • 駒谷和範, 松山匡子, 武田龍, 高橋徹, 尾形哲也, 奥乃博
    情報処理学会論文誌ジャーナル(CD-ROM) 52(12) 3374-3385 Dec 15, 2011
  • 糸原達彦, 大塚琢馬, 水本武志, 高橋徹, 尾形哲也, 奥乃博
    全国大会講演論文集 2011(1) 235-237 Mar 2, 2011
    In ensemble playing, beat tracking is a fundamental technique for acquiring action timing. For an ensemble with a guitar, beat tracking must be robust against fluctuations in playing tempo and against diverse rhythms including off-beat patterns; that is, it must follow variations in both (1) tempo and (2) note length. Conventional methods could not satisfy both requirements. This work improves tracking of both kinds of variation through audio-visual integration. For problem (1) we apply STPM, an audio-based method. For problem (2) we exploit the periodicity of the guitarist's playing motion to obtain hand positions, and apply a particle filter to those positions together with the reliability function obtained from STPM.
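
The particle-filter step described above can be illustrated with a stripped-down tempo tracker: particles hold beat-period hypotheses, are weighted by noisy interval observations (standing in here for the STPM reliability function and hand-position cues), and are resampled when degenerate. All noise levels are assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Particle filter over the beat period (seconds).
n = 500
particles = rng.uniform(0.3, 1.0, n)          # initial tempo hypotheses
weights = np.ones(n) / n

true_period = 0.5                              # 120 bpm ground truth
for step in range(50):
    particles += rng.normal(0.0, 0.005, n)     # random-walk tempo drift
    obs = true_period + rng.normal(0.0, 0.02)  # noisy inter-onset observation
    weights *= np.exp(-0.5 * ((obs - particles) / 0.02) ** 2)
    weights /= weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < n / 2:
        idx = rng.choice(n, n, p=weights)
        particles, weights = particles[idx], np.ones(n) / n

print(f"estimated beat period: {np.average(particles, weights=weights):.3f} s")
```
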
  • 山川暢英, 高橋徹, 北原鉄朗, 尾形哲也, 奥乃博
    情報処理学会全国大会講演論文集 73rd(2) 2.113-2.114 Mar 2, 2011
    This work aims to develop a robot audition system that recognizes multiple environmental sounds and uses the results in dialogue. The challenges for environmental sound recognition in robot audition are: (1) recognition in noisy environments, where the noise comes mainly from the robot itself, and (2) the need for features robust against the spectral distortion caused by sound source separation. In this paper, two simultaneously occurring environmental sounds are localized and separated, and recognition experiments are run on the individual signals using matching pursuit (MP) with Gabor wavelets, which extracts only the salient features of a signal. The results show the noise robustness of the MP-based features, although performance depends on the characteristics of the sound sources.
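
Matching pursuit with Gabor atoms, the feature extractor named above, can be sketched in a few lines: greedily pick the dictionary atom best correlated with the residual and subtract it. The dictionary layout and atom parameters below are illustrative, not the paper's configuration.

```python
import numpy as np

def gabor(length, center, freq, width):
    """Real Gabor atom: Gaussian-windowed cosine, unit L2 norm."""
    t = np.arange(length)
    g = np.exp(-0.5 * ((t - center) / width) ** 2) * np.cos(2 * np.pi * freq * t)
    return g / np.linalg.norm(g)

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy MP: repeatedly subtract the best-correlated atom."""
    residual, atoms = signal.copy(), []
    for _ in range(n_atoms):
        corrs = dictionary @ residual
        k = np.argmax(np.abs(corrs))
        atoms.append((k, corrs[k]))
        residual -= corrs[k] * dictionary[k]
    return atoms, residual

length = 256
dictionary = np.array([gabor(length, c, f, 16.0)
                       for c in range(0, length, 32)
                       for f in (0.05, 0.1, 0.2)])
signal = 2.0 * gabor(length, 128, 0.1, 16.0) + 0.1 * np.random.randn(length)
atoms, residual = matching_pursuit(signal, dictionary, n_atoms=3)
print(atoms[0], np.linalg.norm(residual))   # dominant atom, small residual
```
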
  • Takeshi Mizumoto, Kazuhiro Nakadai, Takami Yoshida, Ryu Takeda, Takuma Otsuka, Toru Takahashi, Hiroshi G. Okuno
    2011 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA) 2130-2137 2011, Peer-reviewed
    This paper presents the design and implementation of selectable sound separation functions on the telepresence system "Texai" using the robot audition software "HARK." An operator of Texai can "walk" around a faraway office to attend a meeting or talk with people through video-conference instead of meeting in person. With a normal microphone, the operator has difficulty recognizing the auditory scene of the Texai, e.g., he/she cannot know the number and the locations of sounds. To solve this problem, we design selectable sound separation functions with 8 microphones in two modes, overview and filter modes, and implement them using HARK's sound source localization and separation. The overview mode visualizes the direction-of-arrival of surrounding sounds, while the filter mode provides sounds that originate from the range of directions he/she specifies. The functions enable the operator to be aware of a sound even if it comes from behind the Texai, and to concentrate on a particular sound. The design and implementation was completed in five days due to the portability of HARK. Experimental evaluations with actual and simulated data show that the resulting system localizes sound sources with a tolerance of 5 degrees.
  • Nobuhide Yamakawa, Toru Takahashi, Tetsuro Kitahara, Tetsuya Ogata, Hiroshi G. Okuno
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6704(2) 1-10 2011, Peer-reviewed
    Our goal is to achieve a robot audition system that is capable of recognizing multiple environmental sounds and making use of them in human-robot interaction. The main problems in environmental sound recognition in robot audition are: (1) recognition under a large amount of background noise including the noise from the robot itself, and (2) the necessity of robust feature extraction against spectrum distortion due to separation of multiple sound sources. This paper presents the environmental recognition of two sound sources fired simultaneously using matching pursuit (MP) with the Gabor wavelet, which extracts salient audio features from a signal. The two environmental sounds come from different directions, and they are localized by multiple signal classification and, using their geometric information, separated by geometric source separation with the aid of measured head-related transfer functions. The experimental results show the noise-robustness of MP although the performance depends on the properties of the sound sources. © 2011 Springer-Verlag.
  • Yasuharu Hirasawa, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6703(1) 348-358 2011, Peer-reviewed
    In real-world situations, people often hear more than two simultaneous sounds. For robots, when the number of sound sources exceeds that of sensors, the situation is called under-determined, and robots with two ears need to deal with this situation. Some studies on under-determined sound source separation use L1-norm minimization methods, but the performance of automatic speech recognition with separated speech signals is poor due to its spectral distortion. In this paper, a two-stage separation method to improve separation quality with low computational cost is presented. The first stage uses a L1-norm minimization method in order to extract the harmonic structures. The second stage exploits reliable harmonic structures to maintain acoustic features. Experiments that simulate three utterances recorded by two microphones in an anechoic chamber show that our method improves speech recognition correctness by about three points and is fast enough for real-time separation. © 2011 Springer-Verlag.
  • Yang Zhang, Shun Nishide, Toru Takahashi, Hiroshi G. Okuno, Tetsuya Ogata
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2011, PT I 6791 167-175 2011, Peer-reviewed
    Our goal is to develop a system that is able to learn and classify environmental sounds for robots working in the real world. In the real world, two main restrictions pertain in learning. First, the system has to learn using only a small amount of data in a limited time because of hardware restrictions. Second, it has to adapt to unknown data since it is virtually impossible to collect samples of all environmental sounds. We used a neuro-dynamical model to build a prediction and classification system which can self-organize sound classes into parameters by learning samples. The proposed system searches space of parameters for classifying. In the experiment, we evaluated the accuracy of classification for known and unknown sound classes.
  • Yasuharu Hirasawa, Naoki Yasuraoka, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 1756-1759 2011, Peer-reviewed
    This paper presents an efficient algorithm to solve Lp-norm minimization problem for under-determined speech separation; that is, for the case that there are more sound sources than microphones. We employ an auxiliary function method in order to derive update rules under the assumption that the amplitude of each sound source follows generalized Gaussian distribution. Experiments reveal that our method solves the L1-norm minimization problem ten times faster than a general solver, and also solves Lp-norm minimization problem efficiently, especially when the parameter p is small; when p is not more than 0.7, it runs in real-time without loss of separation quality.
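
For flavor, the textbook iteratively reweighted least squares (IRLS) scheme below solves the same constrained Lp-norm minimization. It is not the paper's auxiliary-function update rules, just a compact stand-in under a generalized-Gaussian-style sparsity assumption.

```python
import numpy as np

def lp_min_irls(A, b, p=0.7, iters=50, eps=1e-6):
    """min ||x||_p s.t. Ax = b, via iteratively reweighted least squares.

    Textbook IRLS sketch, not the paper's exact auxiliary-function updates.
    """
    x = np.linalg.lstsq(A, b, rcond=None)[0]   # minimum-L2 starting point
    for _ in range(iters):
        w = (np.abs(x) + eps) ** (2 - p)       # weights from current estimate
        W = np.diag(w)
        # Weighted minimum-norm solution: x = W A^T (A W A^T)^{-1} b
        x = W @ A.T @ np.linalg.solve(A @ W @ A.T, b)
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))               # 3 sources, 2 microphones
x_true = np.array([0.0, 1.5, 0.0])            # sparse source amplitudes
x = lp_min_irls(A, A @ x_true, p=0.7)
print(np.round(x, 3))                          # ideally close to sparse x_true
```
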
  • Hiromitsu Awano, Shun Nishide, Hiroaki Arie, Jun Tani, Toru Takahashi, Hiroshi G. Okuno, Tetsuya Ogata
    NEURAL INFORMATION PROCESSING, PT III 7064 323-+ 2011, Peer-reviewed
    The objective of our study is to find out how a sparse structure affects the performance of a recurrent neural network (RNN). Only a few existing studies have dealt with sparse RNN structures trained with methods like Back Propagation Through Time (BPTT). In this paper, we propose an RNN with sparse connections trained by BPTT, called Multiple Timescale RNN (MTRNN), and investigate how sparse connection affects generalization performance and noise robustness. In experiments using data composed of alphabetic sequences, the MTRNN showed the best generalization performance when the connection rate was 40%. We also measured the sparseness of neural activity and found that it corresponds to generalization performance. These results mean that sparse connections improved learning performance and that the sparseness of neural activity can be used as a metric of generalization performance.
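
The sparse-connection idea above amounts to fixing a random binary mask on the recurrent weight matrix at a given connection rate. A minimal forward-pass sketch follows (a vanilla RNN rather than MTRNN, and without BPTT); sizes and scales are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_sparse_rnn(n_units, connection_rate):
    """Random recurrent weights with a fixed sparse connectivity mask."""
    W = rng.standard_normal((n_units, n_units)) * 0.1
    mask = rng.random((n_units, n_units)) < connection_rate
    return W * mask, mask          # the mask would also gate BPTT gradients

def forward(W, x_seq, n_units):
    h = np.zeros(n_units)
    for x in x_seq:
        h = np.tanh(W @ h + x)     # masked recurrence, elementwise input
    return h

n_units = 50
W, mask = make_sparse_rnn(n_units, connection_rate=0.4)  # 40%: best in paper
x_seq = rng.standard_normal((20, n_units)) * 0.1
print(forward(W, x_seq, n_units)[:5])
```
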
  • Yang Zhang, Tetsuya Ogata, Shun Nishide, Toru Takahashi, Hiroshi G. Okuno
    in Proc. of Joint 5th Int. Conf. on Soft Computing and Intelligent Systems and 11th International Symposium on advanced Intelligent Systems (SCIS & ISIS 2010) 378-383 Dec 2010, Peer-reviewed
  • 水本 武志, 辻野 広司, 高橋 徹, 駒谷 和範, 尾形 哲也, 奥乃 博
    情報処理学会論文誌 51(10) 2007-2019 Oct 15, 2010
    We present a theremin player robot, working towards ensembles between humans and robots. A theremin, whose pitch and volume change continuously, can be played without any physical contact. We thus expect a robot system to have high portability, because it requires few physical constraints. The problems in theremin playing are: (1) there are no physical reference points for motion generation, and (2) the environment seriously affects the sound characteristics. To solve them, we develop a model-based feedforward arm control method based on our novel models of the theremin's pitch and volume characteristics, which realizes playing an arbitrary note using only a few measurements. Experimental results show that our method works under four environments and with three different robots.
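
As a toy version of the model-based feedforward control above: fit a parametric pitch-versus-hand-distance characteristic from a handful of calibration measurements, then invert it to command the arm. The 1/(d + b) form and all constants are assumptions for illustration, not the paper's actual model.

```python
import numpy as np

# Hypothetical pitch characteristic: frequency vs. hand distance follows
# f(d) = a / (d + b).  a, b capture the electrostatic environment, so they
# are re-fit from a few measurements whenever the environment changes.
def fit_pitch_model(d, f):
    # 1/f = d/a + b/a is linear in d -> ordinary least squares.
    slope, intercept = np.polyfit(d, 1.0 / f, 1)
    a = 1.0 / slope
    b = intercept * a
    return a, b

def hand_distance_for(f_target, a, b):
    return a / f_target - b        # inverse model: feedforward arm command

# Simulated calibration: about 12 measurements, as in the paper's evaluation.
a_true, b_true = 120.0, 0.05
d = np.linspace(0.05, 0.6, 12)
f = a_true / (d + b_true) + np.random.default_rng(0).normal(0, 2.0, 12)
a, b = fit_pitch_model(d, f)
print(hand_distance_for(440.0, a, b))   # distance commanded for A4
```
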
  • Takuma Otsuka, Kazuhiro Nakadai, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno
    Proceedings of IEEE/RSJ-2010 Workshop on Robots and Musical Expression, CD-ROM Oct 2010, Peer-reviewed
  • Takeshi Mizumoto, Angelica Lim, Takuma Otsuka, Kazuhiro Nakadai, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno
    Proceedings of IEEE/RSJ-2010 Workshop on Robots and Musical Expression, CD-ROM 159-171 Oct 2010, Peer-reviewed
  • Angelica Lim, Takeshi Mizumoto, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno
    Proceedings of IEEE/RSJ-2010 Workshop on Robots and Musical Expression, CD-ROM Oct 2010, Peer-reviewed
  • Shinpei Aso, Takuya Saitou, Masataka Goto, Katsutoshi Itoyama, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
    Proceedings of the 13th International Conference on Digital Audio Effects (DAFx-10) Sep 2010, Peer-reviewed
  • 奥乃 博, 中臺 一博, 高橋 徹
    電子情報通信学会ソサイエティ大会講演論文集 2010 "SS-72"-"SS-73" Aug 31, 2010
  • Akira Maezawa, Katsutoshi Itoyama, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
    Proceedings of 11th International Conference on Music Information Retrieval (ISMIR-2010) Aug 2010, Peer-reviewed
  • 安良岡 直希, 糸山克寿, 吉岡 拓也, 高橋 徹, 駒谷 和範, 尾形 哲也, 奥乃 博
    研究報告音楽情報科学(MUS) 2010(20) 1-8 Jul 21, 2010
    This paper presents a music manipulation system that enables a user to replace an instrument performance phrase in a polyphonic audio mixture. Two technical problems must be solved to realize this system: 1) separating the melody part from the accompaniment, and 2) synthesizing a new instrument performance that has the timbre and expression of the original one. Our method first performs the separation using a statistical model integrating a harmonic-plus-inharmonic Gaussian mixture and nonnegative matrix factorization. It then synthesizes a new performance by adding the acoustic characteristics given by the Gaussian mixture parameters to a MIDI-synthesizer-generated sound. Two evaluations confirm the effectiveness of the proposed method.
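
The accompaniment model above relies on standard nonnegative matrix factorization. A minimal multiplicative-update NMF (Euclidean cost) on a toy magnitude spectrogram, rather than the paper's integrated harmonic/inharmonic model, looks like this:

```python
import numpy as np

def nmf(V, rank, iters=200, eps=1e-9):
    """Standard multiplicative-update NMF (Euclidean cost), V ~= W @ H."""
    rng = np.random.default_rng(0)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy magnitude spectrogram: two spectral templates with different gains.
V = np.outer([1, 0, 2, 0], [1, 1, 0, 0]) + np.outer([0, 3, 0, 1], [0, 0, 1, 1.0])
W, H = nmf(V, rank=2)
print(np.round(W @ H, 2))     # reconstruction close to V
```
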
  • 前澤 陽, 後藤 真孝, 高橋 徹, 駒谷 和範, 尾形 哲也, 奥乃 博
    全国大会講演論文集 72 143-144 Mar 8, 2010
  • 安良岡 直希, 糸山 克寿, 高橋 徹, 駒谷 和範, 尾形 哲也, 奥乃 博
    全国大会講演論文集 72 183-184 Mar 8, 2010
  • 水本 武志, 大塚 琢馬, 高橋 徹, 駒谷 和範, 尾形 哲也, 奥乃 博
    全国大会講演論文集 72 201-202 Mar 8, 2010
  • 水本 武志, 高橋 徹, 駒谷 和範, 尾形 哲也, 奥乃 博
    全国大会講演論文集 72 203-204 Mar 8, 2010
  • 平澤 恭治, 高橋 徹, 駒谷 和範, 尾形 哲也, 奥乃 博
    全国大会講演論文集 72 253-254 Mar 8, 2010
  • 山川 暢英, 北原 鉄朗, 高橋 徹, 駒谷 和範, 尾形 哲也, 奥乃 博
    全国大会講演論文集 72 257-258 Mar 8, 2010
  • 穐山 空道, 駒谷 和範, 高橋 徹, 尾形 哲也, 奥乃 博
    全国大会講演論文集 72 291-292 Mar 8, 2010
  • 阿曽 慎平, 齋藤 毅, 後藤 真孝, 糸山 克寿, 高橋 徹, 駒谷 和範, 尾形 哲也, 奥乃 博
    全国大会講演論文集 72 295-296 Mar 8, 2010
  • 粟野 皓光, 尾形 哲也, 高橋 徹, 駒谷 和範, 奥乃 博
    全国大会講演論文集 72 395-396 Mar 8, 2010
  • 日下 航, 有江 浩明, 谷 淳, 尾形 哲也, 高橋 徹, 駒谷 和範, 奥乃 博
    全国大会講演論文集 72 525-526 Mar 8, 2010
  • 武田 龍, 中臺 一博, 高橋 徹, 駒谷 和範, 尾形 哲也, 奥乃 博
    全国大会講演論文集 72 27-28 Mar 8, 2010
  • 高橋 徹, 中臺 一博, 駒谷 和範, 尾形 哲也, 奥乃 博
    全国大会講演論文集 72 29-30 Mar 8, 2010
  • 松山 匡子, 駒谷 和範, 高橋 徹, 尾形 哲也, 奥乃 博
    全国大会講演論文集 72 129-130 Mar 8, 2010
  • 山川暢英, 高橋徹, 北原鉄朗, 尾形哲也, 奥乃博
    日本ロボット学会学術講演会予稿集(CD-ROM) 28th ROMBUNNO.1H2-4 2010
  • Toru Takahashi, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
    2010 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA) 470-475 2010, Peer-reviewed
    This paper describes an improvement of sound source separation for a simultaneous automatic speech recognition (ASR) system of a humanoid robot. Recognition errors in the system are caused by separation errors and interference from other sources. To improve separability, the original geometric source separation (GSS) is extended. Our GSS uses a measured head-related transfer function (HRTF) of the robot to estimate a separation matrix. Since the original GSS uses a simulated HRTF calculated from the distance between microphone and sound source, there is a large mismatch between the simulated and the measured transfer functions, and this mismatch causes a severe degradation of recognition performance. Faster convergence of the separation matrix reduces the separation error. Our approach provides an initial separation matrix, based on a measured transfer function, that is nearer to the optimal separation matrix than a simulated one, so we expect our GSS to converge faster. Our GSS can also handle an adaptive step-size parameter. These new features have been added into the open-source robot audition software "HARK", newly updated as version 1.0.0. HARK has been installed on an HRP-2 humanoid with an 8-element microphone array. The listening capability of HRP-2 is evaluated by recognizing a target speech signal separated from the simultaneous speech of three talkers. The word correct rate (WCR) of ASR improves by 5 points under normal acoustic environments and by 10 points under noisy environments. Experimental results show that HARK 1.0.0 improves robustness against noise.
  • Ryu Takeda, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
    2010 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA) 4366-4371 2010, Peer-reviewed
    This paper presents the upper-limit evaluation of robot audition based on ICA-BSS in multi-source, barge-in and highly reverberant conditions. The goal is that the robot can automatically distinguish a target speech from its own speech and other sound sources in a reverberant environment. We focus on the multi-channel semi-blind ICA (MCSB-ICA), which is one of the sound source separation methods with a microphone array, to achieve such an audition system because it can separate sound source signals including reverberations with few assumptions on environments. The evaluation of MCSB-ICA has been limited to robot's speech separation and reverberation separation. In this paper, we evaluate MCSB-ICA extensively by applying it to multi-source separation problems under common reverberant environments. Experimental results prove that MCSB-ICA outperforms conventional ICA by 30 points in automatic speech recognition performance.
  • Takuma Otsuka, Takeshi Mizumoto, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
    TRENDS IN APPLIED INTELLIGENT SYSTEMS, PT I, PROCEEDINGS 6096 102-+ 2010, Peer-reviewed
    Our goal is to achieve a musical ensemble among a robot and human musicians where the robot listens to the music with its own microphones. The main issues are (1) robust beat-tracking, since the robot hears its own generated sounds in addition to the accompaniment, and (2) robust synchronization of its performance with the accompanying music even when the humans' musical performance fluctuates. This paper presents a music-ensemble theremin-playing robot implemented on the humanoid HRP-2 with the following three functions: (1) self-generated theremin sound suppression by semi-blind Independent Component Analysis, (2) beat tracking robust against tempo fluctuation in the humans' performance, and (3) feedforward control of theremin pitch. Experimental results with a human drummer show the capability of this robot to adapt to the temporal fluctuation in his performance.
  • Kyoko Matsuyama, Kazunori Komatani, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno
    TRENDS IN APPLIED INTELLIGENT SYSTEMS, PT II, PROCEEDINGS 6097 585-594 2010, Peer-reviewed
    We describe a novel dialogue strategy enabling robust interaction under noisy environments where automatic speech recognition (ASR) results are not necessarily reliable. We have developed a method that exploits utterance timing together with ASR results to interpret user intention, that is, to identify the one item that a user wants to indicate from among those enumerated by the system. The timing of utterances containing referential expressions is approximated by a Gamma distribution, which is integrated with ASR results by expressing both as probabilities. In this paper, we improve the identification accuracy by extending the method. First, we enable interpretation of utterances including ordinal numbers, which appear several times in our data collected from users. Then we use proper acoustic models and parameters, improving the identification accuracy by 4.0% in total. We also show that Latent Semantic Mapping enables more expressions to be handled in our framework.
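
The timing integration described above can be sketched as follows: score each listed item by the product of its ASR-derived probability and a Gamma likelihood of the elapsed time since the item was read out. The shape/scale values and item timings below are illustrative, not the paper's estimates.

```python
import math

def gamma_pdf(t, k, theta):
    """Gamma density used to model barge-in timing of referential utterances."""
    return t ** (k - 1) * math.exp(-t / theta) / (math.gamma(k) * theta ** k)

def identify(read_times, asr_probs, utter_time, k=2.0, theta=1.5):
    """Pick the item maximizing ASR probability x timing likelihood.

    Timing model: the user tends to speak shortly after the intended item is
    read out; read_times[i] is when the system started reading item i.
    k, theta are illustrative shape/scale values, not the paper's estimates.
    """
    scores = {}
    for i, t_read in enumerate(read_times):
        dt = utter_time - t_read
        timing = gamma_pdf(dt, k, theta) if dt > 0 else 0.0
        scores[i] = asr_probs[i] * timing
    return max(scores, key=scores.get)

# Three listed items read out at t = 0, 4, 8 s; the user speaks at t = 9.2 s.
print(identify([0.0, 4.0, 8.0], asr_probs=[0.3, 0.35, 0.35], utter_time=9.2))
```
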
  • Akira Maezawa, Katsutoshi Itoyama, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
    TRENDS IN APPLIED INTELLIGENT SYSTEMS, PT III, PROCEEDINGS 6098 249-259 2010, Peer-reviewed
    This work presents an automated violin fingering estimation method that helps a student violinist acquire the "sound" of his/her favorite recording artist created by the artist's unique fingering. Our method realizes this by analyzing an audio recording played by the artist and recovering the most playable fingering that recreates the aural characteristics of the recording. Recovering the aural characteristics requires bowed-string estimation from an audio recording, and use of the estimated result for the optimal fingering decision. The former requires high accuracy and robustness against the use of different violins or brands of strings; the latter needs to create a natural fingering for the violinist. We solve the first problem by detecting estimation errors using rule-based algorithms and by adapting the estimator to the recording based on mean normalization. We solve the second problem by incorporating, in addition to the generic stringed-instrument model used in existing studies, a fingering model based on pedagogical practices of violin playing, defined on a sequence of two or three notes. The accuracy of the bowed-string estimator improved by 21 points in a realistic situation (from 38% to 59%) by incorporating error correction and mean normalization. Subjective evaluation of the optimal fingering decision algorithm by seven violinists on 22 musical excerpts showed that our proposed model was preferred over the one used in existing studies (p = 0.01), but no significant preference for the proposed model defined on sequences of two notes versus three notes was observed (p = 0.05).
  • Takuma Otsuka, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
    PROCEEDINGS OF THE TWENTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-10) 1238-1244 2010, Peer-reviewed
    Our goal is to develop an interactive music robot, i.e., a robot that presents a musical expression together with humans. A music interaction requires two important functions: synchronization with the music and musical expression, such as singing and dancing. Many instrument-performing robots are only capable of the latter function; thus they may have difficulty playing live with human performers. The synchronization function is critical for the interaction. We classify synchronization and musical expression into two levels: (1) the rhythm level and (2) the melody level. Two issues in achieving two-layer synchronization and musical expression are: (1) simultaneous estimation of the rhythm structure and the current part of the music and (2) derivation of the estimation confidence to switch behavior between the rhythm level and the melody level. This paper presents a score following algorithm, incremental audio to score alignment, that conforms to the two-level synchronization design using a particle filter. Our method estimates the score position for the melody level and the tempo for the rhythm level. The reliability of the score position estimation is extracted from the probability distribution of the score position. Experiments are carried out using polyphonic jazz songs. The results confirm that our method switches levels in accordance with the difficulty of the score estimation. When the tempo of the music is less than 120 (beats per minute; bpm), the estimated score positions are accurate and reported; when the tempo is over 120 (bpm), the system tends to report only the tempo to suppress the error in the reported score position predictions.
  • Hideki Kawahara, Masanori Morise, Toru Takahashi, Hideki Banno, Ryuichi Nisimura, Toshio Irino
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 38-+ 2010, Peer-reviewed
    A systematic framework for non-periodic excitation source representation is proposed for high-quality speech manipulation systems such as TANDEM-STRAIGHT, which is basically a channel VOCODER. The proposed method consists of two subsystems for non-periodic components; a colored noise source and an event analyzer/generator. The colored noise source is represented by using a sigmoid model with non-linear level conversion. Two model parameters, boundary frequency and slope parameters, are estimated based on pitch range linear prediction combined with F0 adaptive temporal axis warping and those on the original temporal axis. The event subsystem detects events based on kurtosis of filtered speech signals. The proposed framework provides significant quality improvement for high-quality recorded speech materials.
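
A rough sketch of the sigmoid-type colored-noise model described above: a boundary frequency and a slope parameter define a sigmoid weight that moves the excitation from pulse-dominated to noise-dominated across frequency. The exact functional form used in TANDEM-STRAIGHT may differ; this log-frequency sigmoid and its parameter values are assumptions for illustration.

```python
import numpy as np

def aperiodicity_weight(freq_hz, boundary_hz, slope):
    """Sigmoid mixing curve: ~0 (periodic) below the boundary frequency,
    ~1 (noise-dominated) above it.  Parameter values are illustrative."""
    x = slope * np.log2(freq_hz / boundary_hz)
    return 1.0 / (1.0 + np.exp(-x))

fs = 16000
freqs = np.linspace(50, fs / 2, 8)
w = aperiodicity_weight(freqs, boundary_hz=2500.0, slope=4.0)
# An excitation built from these weights mixes a pulse train (1 - w) with
# shaped noise (w) in each frequency band.
print(np.round(w, 2))
```
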
  • Kyoko Matsuyama, Kazunori Komatani, Ryu Takeda, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4 3050-3053 2010, Peer-reviewed
    In our barge-in-able spoken dialogue system, the user's behaviors, such as barge-in timing and utterance expressions, vary according to his/her characteristics and situation. The system adapts to these behaviors by modeling them. We analyzed 1584 utterances collected by our systems on quiz and news-listing tasks and showed that the ratio of referential expressions used depends on the individual user and on the average length of the listed items. This tendency was incorporated as a prior probability into our method and improved the identification accuracy of the user's intended items.
  • Nobuhide Yamakawa, Tetsuro Kitahara, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4 2342-+ 2010, Peer-reviewed
    Research on environmental sound recognition has not shown great development in comparison with that on speech and musical signals. One of the reasons is that the category of environmental sounds covers a broad range of acoustic natures. We classified them in order to explore suitable recognition techniques for each characteristic. We focus on impulsive sounds and their non-stationary features within and between analysis frames. We used matching pursuit as a framework for applying wavelet analysis to extract the temporal variation of audio features inside a frame. We also investigated the validity of modeling the decaying patterns of sounds using hidden Markov models. Experimental results indicate that sounds with multiple impulsive signals are recognized better by using time-frequency analysis bases than by frequency-domain analysis. Classification of sound classes with a long and clear decaying pattern improves when HMMs with multiple hidden states are applied.
  • Hiromitsu Awano, Tetsuya Ogata, Shun Nishide, Toru Takahashi, Kazunori Komatani, Hiroshi G. Okuno
    IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2010) 2010, Peer-reviewed
    The objective of our study was to develop dynamic collaboration between a human and a robot. Most conventional studies have created pre-designed rule-based collaboration systems to determine the timing and behavior of robots to participate in tasks. Our aim is to introduce the confidence of the task as a criterion for robots to determine their timing and behavior. In this paper, we report the effectiveness of applying reproduction accuracy as a measure for quantitatively evaluating confidence in an object arrangement task. Our method is comprised of three phases. First, we obtain human-robot interaction data through the Wizard of OZ method. Second, the obtained data are trained using a neuro-dynamical system, namely, the Multiple Time-scales Recurrent Neural Network (MTRNN). Finally, the prediction error in MTRNN is applied as a confidence measure to determine the robot's behavior. The robot participated in the task when its confidence was high, while it just observed when its confidence was low. Training data were acquired using an actual robot platform, Hiro. The method was evaluated using a robot simulator. The results revealed that motion trajectories could be precisely reproduced with a high degree of confidence, demonstrating the effectiveness of the method.
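
The confidence criterion above can be mimicked generically: map the model's reproduction (prediction) error to a bounded confidence score and act only above a threshold. The exponential mapping, scale, and threshold below are assumptions, not values from the paper.

```python
import numpy as np

def confidence(predicted_seq, observed_seq, scale=0.1):
    """Map reproduction error of the learned model to a (0, 1] confidence.

    The exponential mapping and `scale` are illustrative; in the paper the
    error comes from MTRNN prediction, for which this stands in generically.
    """
    err = np.mean((np.asarray(predicted_seq) - np.asarray(observed_seq)) ** 2)
    return float(np.exp(-err / scale))

THRESHOLD = 0.6     # assumed: act when confident, otherwise just observe
pred = [[0.1, 0.2], [0.2, 0.3]]
obs = [[0.12, 0.21], [0.18, 0.33]]
c = confidence(pred, obs)
print("participate" if c > THRESHOLD else "observe", round(c, 3))
```
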
  • Ryu Takeda, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
    IEEE/RSJ 2010 INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2010) 1949-1956 2010, Peer-reviewed
    This paper describes a speedup and performance improvement of multi-channel semi-blind ICA (MCSB-ICA) with parallel and resampling-based block-wise processing. MCSB-ICA is an integrated method of sound source separation that accomplishes blind source separation, blind dereverberation, and echo cancellation. This method enables robots to separate user's speech signals from observed signals including the robot's own speech, other speech and their reverberations without a priori information. The main problem when MCSB-ICA is applied to robot audition is its high computational cost. We tackle this by multi-threading programming, and the two main issues are 1) the design of parallel processing and 2) incremental implementation. These are solved by a) multiple-stack-based parallel implementation, and b) resampling-based overlaps and block-wise separation. The experimental results proved that our method reduced the real-time factor to less than 0.5 with an eight-core CPU, and it improves the performance of automatic speech recognition by 2-10 points compared with the single-stack-based parallel implementation without the resampling technique.

Books and Other Publications

 8

Lectures and Oral Presentations

 79

Teaching Experience (Courses)

 18

Academic Society Memberships

 6

Works

 1

Research Projects (Joint Research and Competitive Funding)

 14

Industrial Property Rights

 2

Research Themes

 1
  • Research theme
    Human-robot interaction, speech communication, speech recognition, auditory scene understanding
    Keywords
    Microphone arrays, acoustic features, speech recognition, sound source localization, sound source separation
    Outline
    Working on the challenges of realizing natural dialogue between robots and humans in real-world environments