Researcher Achievements

中山 雅人

ナカヤマ マサト  (Masato Nakayama)

Basic Information

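Affiliation
Professor, Department of Information Systems, Faculty of Design Engineering, Osaka Sangyo University (Vice President)
(Concurrent) Faculty member in charge of the master's program, Graduate School of Engineering
Degrees
Bachelor of Engineering (Kindai University)
Master of Engineering (Wakayama University)
Doctor of Engineering (Ritsumeikan University)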

Researcher Number
90511056
J-GLOBAL ID
201601002814105918
researchmap Member ID
7000017209

External Links

Papers

 124
  • Nakasako Noboru, Kawanishi Keiji, Shinohara Toshihiro, Nakayama Masato, Uebo Tetsuji
    2012 IEEE International Conference on Signal Processing, Communications and Computing, ICSPCC 2012 680-685 2012  Peer-reviewed
    Since the distance to a target is very important information, we have proposed an acoustic distance measurement method using a standing wave, which is generated by interference between transmitted and reflected waves. This method takes a very simple form: the distance between microphone and target is estimated from the peak of the range spectrum (i.e., the absolute value of the Fourier transform of the power spectrum of the observed wave). However, measuring short distances requires a wide transmitted-wave bandwidth. This paper describes a new distance estimation method that can measure from 0 m, based on the interference between transmitted and reflected audible sound, using only a single microphone (i.e., single-channel observations). More concretely, we introduce an analytic signal instead of the power spectrum and examine the validity and effectiveness of our method through computer simulation and by applying it to an actual sound field with a band-limited impulse sound. © 2012 IEEE.
  • Masato Nakayama, Yuma Neki, Noboru Nakasako, Tetsuji Uebo, Takanobu Nishiura
    2012 IEEE International Conference on Signal Processing, Communications and Computing, ICSPCC 2012 674-679 2012  Peer-reviewed
    The distance to talkers is very important information for both hands-free speech interfaces and nursing-care robots. We have proposed an acoustic distance measurement method based on interference between the transmitted and reflected waves, which can measure distance in a short range. In the present paper, we propose an acoustic distance measurement method based on interference of speech presented by a dialogue system. In a dialogue system, dry sources of presented speech are known. Therefore, we can easily perform voice activity detection (VAD) for presented speech using a phoneme segmentation with a hidden Markov model. The proposed method estimates the distance to talkers in speech segments using the results of VAD. Finally, we confirmed the effectiveness of the proposed method through experiments in real environments. © 2012 IEEE.
  • 中山 雅人, 英 慎平, 中迫 昇, 篠原 寿広, 上保 徹志
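    IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌 C) Vol.131-C(No.11) 1864-1870 November 2011  Peer-reviewed
    ・Proposed a noise-robust acoustic distance measurement method based on phase interference, using noise suppression based on repeated synchronous addition and spectral subtraction. ・Responsible for all aspects: theoretical formulation of the proposed method, experiments, and writing.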
  • 中山 雅人, 英 慎平, 中迫 昇, 篠原 寿広, 上保 徹志
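    IEICE Transactions on Fundamentals, Japanese Edition (電子情報通信学会論文誌 A), Letter Vol.J94-A(No.8) 676-679 August 2011  Peer-reviewed
    ・Proposed an acoustic distance measurement method that removes background components by spectral subtraction between single synchronized recordings made with and without the target present. ・Responsible for all aspects: theoretical formulation of the proposed method, experiments, and writing.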
  • Masato Nakayama, Shimpei Hanabusa, Tetsuji Uebo, Noboru Nakasako
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES E94A(8) 1638-1646 August 2011  Peer-reviewed
    Distance to target is fundamental and very important information in numerous engineering fields. Many distance measurement methods using sound use the time delay of a reflected wave, which is measured in reference to the transmitted wave. This method, however, cannot measure short distances because the transmitted wave, which has not attenuated sufficiently by the time the reflected waves are received, suppresses the reflected waves for short distances. Therefore, we proposed an acoustic distance measurement method based on the interference between the transmitted wave and the reflected waves, which can measure distance in a short range. The proposed method requires a cancellation processing for background components due to the spectrum of the transmitted wave and the transfer function of the measurement system in real environments. We refer to this processing as background components cancellation processing (BGCCP). We proposed BGCCP based on subtraction or whitening. However, the proposed method had a limitation with respect to the transmitted wave or additive noise in real environments. In the present paper, we propose an acoustic distance measurement method based on the new BGCCP. In the new BGCCP, we use the calibration of a real measurement system and the whitening processing of the transmitted wave and introduce the concept of the cepstrum to the proposed method in order to achieve robustness. Although the conventional BGCCP requires the recording of the transmitted wave under the condition without targets, the new BGCCP does not have this requirement. Finally, we confirmed the effectiveness of the proposed method through experiments in real environments. As a result, the proposed method was confirmed to be valid and effective, even in noisy environments.
  • 中山 雅人, 廣畑 和紀, 中迫 昇
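    IEICE Transactions on Fundamentals, Japanese Edition (電子情報通信学会論文誌 A) Vol.J94-A(No.5) 313-322 May 2011  Peer-reviewed
    ・Proposed a method that realizes 3D audio by measuring the acoustic distance between the loudspeakers and both ears using near-field loudspeakers placed to the left and right of the head. ・Responsible for all aspects: theoretical formulation of the proposed method, experiments, and writing.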
  • Takahiro Fukumori, Takanobu Nishiura, Masato Nakayama, Yuki Denda, Norihide Kitaoka, Takeshi Yamada, Kazumasa Yamamoto, Satoru Tsuge, Masakiyo Fujimoto, Tetsuya Takiguchi, Chiyomi Miyajima, Satoshi Tamura, Tetsuji Ogawa, Shigeki Matsuda, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura
    Acoustical Science and Technology 32(5) 201-210 2011  Peer-reviewed
    We have been distributing a new collection of databases and evaluation tools called CENSREC-4, which is a framework for evaluating distant-talking speech in reverberant environments. The data contained in CENSREC-4 are connected digit utterances as in CENSREC-1. Two subsets are included in the data: "basic data sets" and "extra data sets." The basic data sets are used for evaluating the room impulse response-convolved speech data to simulate the various reverberations. The extra data sets consist of simulated data and corresponding real recorded data. Evaluation tools are presently provided only for the basic data sets and will be provided for the extra data sets in the future. The task of CENSREC-4 with a basic data set appears simple; however, the results of experiments prove that CENSREC-4 provides a challenging reverberant speech-recognition task, in the sense that a traditional technique to improve recognition and a widely used criterion to represent the difficulty of recognition deliver poor performance. Within this context, this common framework can be an important step toward the future evolution of reverberant speech-recognition methodologies. © 2011 The Acoustical Society of Japan.
  • 中山 雅人
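    Doctoral dissertation 1-119 December 2010  Peer-reviewed
    ・Studied distant-talking speech capture in noisy environments using acoustic distance measurement and microphone arrays. ・Responsible for all research and writing.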
  • 中山 雅人, 中迫 昇, 篠原 寿広, 上保 徹志
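    IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌 C) Vol.130-C(No.11) 1994-2000 November 2010  Peer-reviewed
    ・Proposed a talker localization method that extends to multiple channels the acoustic distance measurement method based on phase interference between transmitted and reflected audible sound. ・Responsible for all aspects: theoretical formulation of the proposed method, experiments, and writing.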
  • Satoshi Tamura, Chiyomi Miyajima, Norihide Kitaoka, Takeshi Yamada, Satoru Tsuge, Tetsuya Takiguchi, Kazumasa Yamamoto, Takanobu Nishiura, Masato Nakayama, Yuki Denda, Masakiyo Fujimoto, Shigeki Matsuda, Tetsuji Ogawa, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura
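    Proc. AVSP 2010 1-6 October 2010  Peer-reviewed
    ・Built a multimodal speech recognition evaluation environment (CENSREC-1-AV) as one of the IPSJ SIG-SLP noisy speech recognition evaluation environments (CENSREC). ・As a member of the Noisy Speech Recognition Evaluation Working Group, contributed across the work, mainly to database design.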
  • 田村 哲嗣, 宮島 千代美, 北岡 教英, 武田 一哉, 山田 武志, 滝口 哲也, 柘植 覚, 山本 一公, 西浦 敬信, 中山 雅人, 傳田 遊亀, 藤本 雅清, 松田 繁樹, 小川 哲司, 黒岩 眞吾, 中村 哲
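    IPSJ SIG Technical Reports, SLP (Spoken Language Processing) (情報処理学会研究報告. SLP, 音声言語情報処理) 2010(7) 1-6 July 2010
    This paper introduces CENSREC-1-AV, a common evaluation framework for multimodal speech recognition using audio and images. CENSREC-1-AV provides an audio-visual database and a baseline system. For audio, in addition to clean training data, data with added in-car driving noise were recorded. For images, color video and near-infrared video were recorded, and gamma correction was used to create simulated in-car driving images as test data. The baseline system performs recognition with multi-stream HMMs, using MFCCs and either eigenfaces or optical flow as features.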
  • Masato Nakayama, Shimpei Hanabusa, Noboru Nakasako, Tetsuji Uebo
    ISCIT 2010 - 2010 10th International Symposium on Communications and Information Technologies 176-181 2010  Peer-reviewed
    Distance to target is fundamental and very important information in many engineering fields. Many sound-based distance measurement methods use the time delay of the reflected wave, measured with reference to the transmitted wave. This approach, however, cannot measure short distances because the transmitted wave, which has not attenuated sufficiently by the time the reflected waves are received, suppresses the reflected waves. Therefore, we proposed an acoustic distance measurement method based on interference between the transmitted wave and the reflected waves, which can measure distance in a short range. However, the performance of our method in noisy environments had not been discussed. In this paper, we focus on noisy environments with additive noise. We describe our method using a real measuring system in noisy environments. In addition, we propose noise reduction suited to our method by applying synchronous addition and spectral subtraction. Finally, we confirm the effectiveness of our method with noise reduction through simulation and experiment. As a result, it has been confirmed that the distance can be measured at SNR = 5 or 15 dB by our method without noise reduction and at SNR = -5 to 5 dB by our method with noise reduction. ©2010 IEEE.
  • 中山 雅人, 西浦 敬信, 山下 洋一
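    IEICE Transactions on Information and Systems, Japanese Edition (電子情報通信学会論文誌 D) Vol.J92-D(No.9) 1568-1578 September 2009  Peer-reviewed
    ・Proposed a method for speech recognition in noisy environments that classifies speech into vowels and consonants and uses an adaptive microphone array that emphasizes their features. ・Responsible for all aspects: theoretical formulation of the proposed method, experiments, and writing.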
  • 傳田 遊亀, 田中 貴雅, 溝口 遊, 中山 雅人, 西浦 敬信, 山下 洋一
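    IEICE Transactions on Information and Systems, Japanese Edition (電子情報通信学会論文誌 D) Vol.J92-D(No.1) 112-122 January 2009  Peer-reviewed
    ・Proposed a method for detecting distant-talking speech segments using talker direction estimation by the CSP method, based on dynamic time-domain processing that uses the likelihood of statistics. ・Responsible for writing the paper overall, and in particular for verifying the theoretical formulation.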
  • Norihide Kitaoka, Takeshi Yamada, Satoru Tsuge, Chiyomi Miyajima, Kazumasa Yamamoto, Takanobu Nishiura, Masato Nakayama, Yuki Denda, Masakiyo Fujimoto, Tetsuya Takiguchi, Satoshi Tamura, Shigeki Matsuda, Tetsuji Ogawa, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura
    Acoustical Science and Technology 30(5) 363-371 2009  Peer-reviewed
    Voice activity detection (VAD) plays an important role in speech processing including speech recognition, speech enhancement, and speech coding under noisy environments. We have developed an evaluation framework for VAD under noisy environments, named CENSREC-1-C. We designed this framework for simple isolated utterance detection and hence, this framework consists of noisy continuous digit utterances and evaluation tools for VAD results. We define two evaluation measures, one for frame-level detection performance and the other for utterance-level detection performance. We also provide the evaluation results of a power-based VAD method as a reference. ©2009 The Acoustical Society of Japan.
  • Takanobu Nishiura, Masato Nakayama, Yuki Denda, Norihide Kitaoka, Kazumasa Yamamoto, Takeshi Yamada, Satoru Tsuge, Chiyomi Miyajima, Masakiyo Fujimoto, Tetsuya Takiguchi, Satoshi Tamura, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008 1828-1834 2008  Peer-reviewed
    Recently, speech recognition performance has been drastically improved by statistical methods and huge speech databases. Now performance improvement under such realistic environments as noisy conditions is being focused on. Since October 2001, we from the working group of the Information Processing Society in Japan have been working on evaluation methodologies and frameworks for Japanese noisy speech recognition. We have released frameworks including databases and evaluation tools called CENSREC-1 (Corpus and Environment for Noisy Speech RECognition 1; formerly AURORA-2J), CENSREC-2 (in-car connected digits recognition), CENSREC-3 (in-car isolated word recognition), and CENSREC-1-C (voice activity detection under noisy conditions). In this paper, we newly introduce a collection of databases and evaluation tools named CENSREC-4, which is an evaluation framework for distant-talking speech under hands-free conditions. Distant-talking speech recognition is crucial for a hands-free speech interface. Therefore, we measured room impulse responses to investigate reverberant speech recognition. The results of evaluation experiments proved that CENSREC-4 is an effective database suitable for evaluating the new dereverberation method because the traditional dereverberation process had difficulty sufficiently improving the recognition performance. The framework was released in March 2008, and many studies are being conducted with it in Japan.
  • Masato Nakayama, Takanobu Nishiura, Yuki Denda, Norihide Kitaoka, Kazumasa Yamamoto, Takeshi Yamada, Satoru Tsuge, Chiyomi Miyajima, Masakiyo Fujimoto, Tetsuya Takiguchi, Satoshi Tamura, Tetsuji Ogawa, Shigeki Matsuda, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura
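    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 968-+ 2008  Peer-reviewed
    ・Carried out evaluations toward building CENSREC-4, an evaluation framework for distant-talking speech recognition under reverberation, in order to evaluate reverberation-robust speech recognition methods. In particular, this paper reports the speech recognition rates obtained with representative conventional methods. ・As a member of the Noisy Speech Recognition Evaluation Working Group, responsible for database design, recording, and manuscript writing overall.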
  • Yuki Denda, Takamasa Tanaka, Masato Nakayama, Takanobu Nishiura, Yoichi Yamashita
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4 1477-+ 2007  Peer-reviewed
    This paper proposes a novel hands-free voice activity detection (VAD) method utilizing not only temporal features but also spatial features, called adaptive zero crossing detection (AZCD), that uses talker direction estimation. It firstly estimates talker direction to extract two spatial features: spatial reliability and spatial variance, based on weighted cross-power spectrum phase analysis and maximum likelihood estimation. Then, the AZCD detects voice activity frames by robustly detecting zero crossing information of speech with adaptively controlled thresholds using the extracted spatial features in noisy environments. The experimental results in an actual office room confirmed that the VAD performance of the proposed method that utilizes both temporal and spatial features is superior to that of the conventional method that utilizes only the temporal or spatial features.
  • Takanobu Nishiura, Yoshiki Hirano, Yuki Denda, Masato Nakayama
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4 1369-1372 2007  Peer-reviewed
    Reverberation-robust speech recognition has become very important in the recognition of distant-talking speech. However, as no common reverberation criteria for the recognition of reverberant speech have been proposed, it has been difficult to estimate this. We have thus focused on a reverberation criterion for the recognition of distant-talking speech. The reverberation time is currently in general use as a reverberation criterion for the recognition of distant-talking speech. It takes a single value for a room and does not depend on the position of the source in the room. However, distant-talking speech recognition greatly depends on the location of the talker relative to that of the microphone and the distance between them. To overcome this problem, we investigated a suitable reverberation criterion for distant-talking speech recognition using the ISO3382 acoustic parameters. We first calculated distant-talking speech recognition with early and late reflections based on the impulse response between the talker and microphone. As a result, we found that early reflections within about 12.5 ms from the duration of direct sound contributed slightly to distant-talking speech recognition in non-noisy environments. We then evaluated it based on ISO3382 acoustic parameters. We consequently confirmed that the ISO3382 acoustic parameters are strong candidates for the new reverberation criteria for distant-talking speech recognition.
  • Norihide Kitaoka, Kazumasa Yamamoto, Tomohiro Kusamizu, Seiichi Nakagawa, Takeshi Yamada, Satoru Tsuge, Chiyomi Miyajima, Takanobu Nishiura, Masato Nakayama, Yuki Denda, Masakiyo Fujimoto, Tetsuya Takiguchi, Satoshi Tamura, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura
    2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2 607-+ 2007  Peer-reviewed
    Voice activity detection (VAD) plays an important role in speech processing including speech recognition, speech enhancement, and speech coding in noisy environments. We developed an evaluation framework for VAD in such environments, called Corpus and Environment for Noisy Speech Recognition 1 Concatenated (CENSREC-1-C). This framework consists of noisy continuous digit utterances and evaluation tools for VAD results. By adopting two evaluation measures, one for frame-level detection performance and the other for utterance-level detection performance, we provide the evaluation results of a power-based VAD method as a baseline. When using VAD in a speech recognizer, the detected speech segments are extended to avoid the loss of speech frames and the pause segments are then absorbed by a pause model. We investigate the balance of an explicit segmentation by VAD and an implicit segmentation by a pause model using an experimental simulation of segment extension and show that a small extension improves speech recognition.
  • Masato Nakayama, Yuki Denda, Takanobu Nishiura, Hideki Kawahara, Toshio Irino
    18th International Congress on Acoustics (ICA2004) 4 3041-3044 April 2004  Peer-reviewed
    Kyoto, Japan, 4-9 Apr. 2004 (abstract review)
  • Masato Nakayama, Takanobu Nishiura, Hideki Kawahara
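    Proc. IWAENC 2003 243-246 September 2003  Peer-reviewed
    ・Proposed VC-AMNOR, which classifies speech into vowels and consonants, estimates their segments, and switches between adaptive arrays suited to each, and clarified the relationship between vowel/consonant classification accuracy and speech recognition rate. ・Responsible for all aspects: theoretical formulation of the proposed method, experiments, and writing.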
  • 中山 雅人
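    Master's thesis 1-42 March 2003  Peer-reviewed
    ・Studied speech enhancement with an adaptive beamformer that focuses on the speech spectra of vowels and consonants. ・Responsible for all research and writing.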
  • T Nishiura, M Nakayama, S Nakamura
    2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL III, PROCEEDINGS 209-212 2003  Peer-reviewed
    Distant-talking speech recognition in noisy environments is indispensable for self-moving robots or tele-conference systems. However, background noise and room reverberations seriously degrade the sound-capture quality in real acoustic environments. A microphone array is an ideal candidate as an effective method for capturing distant-talking speech. AMNOR (Adaptive Microphone-array for NOise Reduction) was proposed by Kaneda et al. as an adaptive beamformer for capturing the desired distant signals in noisy environments. Although the AMNOR has been proven effective, it can be further improved if we know the spectrum characteristics of the desired distant signals in advance. Therefore, we regarded speech as a desired distant signal and designed an AMNOR based on the average speech spectrum. In this paper, we particularly focused on the performance of AMNOR based on the average speech spectrum for distant-talking speech capture and recognition. As a result of evaluation experiments in real acoustic environments, we confirmed that the ASR (Automatic Speech Recognition) performance was improved by 5-10% by using an AMNOR based on the average speech spectrum in noisy environments. In addition, the proposed AMNOR provides better noise reduction performance than that of conventional AMNOR.
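
    Note on the standing-wave ranging idea: several abstracts in this list describe estimating distance from the range spectrum, i.e., the absolute value of the Fourier transform of the power spectrum of the observed wave. The following is a minimal Python sketch of that idea under idealized assumptions, not the authors' implementation: a flat transmitted spectrum, a single reflection, no noise, a sound speed of 340 m/s, and a round-trip delay of 2d/c. The function names, the Hann window, and the zero-padded FFT length are illustrative choices that do not come from the papers.

```python
import numpy as np

def simulate_power_spectrum(distance_m, freqs_hz, reflection=0.5, sound_speed=340.0):
    """Power spectrum of direct wave + one reflection (assumed flat source spectrum).

    |1 + g*exp(-j*2*pi*f*tau)|^2 = 1 + g^2 + 2*g*cos(2*pi*f*tau), tau = 2*d/c.
    """
    delay = 2.0 * distance_m / sound_speed
    return 1.0 + reflection**2 + 2.0 * reflection * np.cos(2.0 * np.pi * freqs_hz * delay)

def estimate_distance(power_spectrum, freq_step_hz, sound_speed=340.0, n_fft=1 << 16):
    """Return (distance estimate, distance axis, range spectrum)."""
    p = np.asarray(power_spectrum, dtype=float)
    p = p - p.mean()                 # remove the constant term of the power spectrum
    p = p * np.hanning(p.size)       # taper to reduce leakage (illustrative choice)
    # Fourier transform along the frequency axis; its magnitude is the range spectrum.
    range_spectrum = np.abs(np.fft.rfft(p, n=n_fft))
    # Bin k corresponds to a delay of k / (n_fft * freq_step) seconds.
    delays = np.arange(range_spectrum.size) / (n_fft * freq_step_hz)
    distances = sound_speed * delays / 2.0
    return distances[np.argmax(range_spectrum)], distances, range_spectrum

# Example: a 1000-5000 Hz band observed at 2048 frequency points, target at 1.0 m.
freqs = np.linspace(1000.0, 5000.0, 2048)
spectrum = simulate_power_spectrum(1.0, freqs)
estimate, _, _ = estimate_distance(spectrum, freqs[1] - freqs[0])
print(f"estimated distance: {estimate:.3f} m")
```

    In this sketch the distance resolution is roughly c / (2 × bandwidth), which illustrates why the abstracts note that short distances require a wide transmitted-wave bandwidth.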

MISC

 192

Presentations

 495

Teaching Experience (Courses)

 17

Research Projects (Joint Research, Competitive Funding, etc.)

 11

Industrial Property Rights

 10

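Research Themes

 3
  • Research theme
    Microphone arrays, speech capture in noisy environments, speech recognition
    Research period (start)
    2003/04/01
  • Research theme
    Active noise control, pleasant-sound design
    Research period (start)
    2007/04/01
  • Research theme
    Parametric loudspeakers, sound field reproduction, 3D audio, acoustic radar
    Research period (start)
    2009/04/01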