Ethology and Soundscape Analysis
The analysis of wild bird songs is an important problem in several fields, including ethology, bioacoustics, robot audition, and machine learning. In ethology, observers listen carefully in the field, write down the species, direction, and time of each song, and then analyze these notes by hand. This work demands sustained concentration and is time-consuming. Because the method is manual, it is difficult to obtain consistent results, and the observations cannot be checked again afterwards.
In recent years, several projects overseas have begun collecting recordings of wild bird songs, and machine learning has been applied to species identification. However, because these recordings are made with a single microphone, such studies do not address the localization of wild birds, which is important for behavior analysis.
We address the localization task by combining robot audition and machine learning. Using multiple microphone arrays, we are developing technology that locates wild birds by estimating the three-dimensional position of sound sources over a wide outdoor area, and technology that identifies bird species from their songs using deep learning, even with a small amount of labeled training data. In collaboration with research groups at Nagoya University and Kyoto University, we are also developing these technologies further and using them to analyze how wild birds communicate through song.
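As a rough illustration of how several microphone arrays can yield a three-dimensional position, the sketch below triangulates a source from direction-of-arrival bearings. It assumes each array has already produced a bearing vector toward the bird (for example from a robot audition toolkit), and it finds the point that minimizes the squared distance to all bearing lines. The function names `solve3` and `triangulate` are hypothetical, and this is a textbook least-squares formulation, not necessarily the method used in this project.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("bearings are (nearly) parallel; position is not determined")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def triangulate(array_positions, directions):
    """Return the point closest (in least squares) to all bearing lines.

    array_positions: list of (x, y, z) positions of the microphone arrays.
    directions: list of (dx, dy, dz) bearings from each array toward the source.
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for p, d in zip(array_positions, directions):
        norm = math.sqrt(sum(c * c for c in d))
        u = [c / norm for c in d]  # unit bearing vector
        for i in range(3):
            for j in range(3):
                # (I - u u^T) projects onto the plane perpendicular to the bearing.
                proj = (1.0 if i == j else 0.0) - u[i] * u[j]
                a[i][j] += proj
                b[i] += proj * p[j]
    return solve3(a, b)
```

At least two arrays with non-parallel bearings are needed. With noisy bearings, adding more arrays generally improves the estimate.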
Sound Position Estimation of Bird Songs Using Multiple Microphone Arrays
Bird Song Scene Analysis (when, where, what)

Bird Song Reconstruction
Merging Bioacoustic Classifiers Without Shared Data
We want to build one AI classifier that covers many animal groups, like birds, whales, and frogs. The usual way is to gather all the training data and train on it together, but in many cases the data cannot be shared. We notice that animal groups tend to vocalize in non-overlapping frequency bands. We use this property to merge separate classifiers by simply averaging their parameters. The result is a single classifier covering many species, with no data sharing required.
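The merging step described above can be sketched as a parameter-wise average. This minimal example assumes every classifier has exactly the same architecture, so that parameters with the same name have the same shape, and it represents each model as a plain dictionary mapping a parameter name to a flat list of floats. The function name `average_parameters` and the parameter names in the usage example are hypothetical. How the individual classifiers are trained before merging is not covered here.

```python
def average_parameters(state_dicts, weights=None):
    """Merge models by taking a (weighted) average of each parameter.

    state_dicts: list of dicts, each mapping parameter name -> list of floats.
    weights: optional list of merge weights; defaults to a uniform average.
    """
    if not state_dicts:
        raise ValueError("need at least one model to merge")
    names = set(state_dicts[0])
    if any(set(sd) != names for sd in state_dicts):
        raise ValueError("all models must have the same parameter names")
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    if len(weights) != len(state_dicts):
        raise ValueError("need exactly one weight per model")

    merged = {}
    for name in state_dicts[0]:
        length = len(state_dicts[0][name])
        if any(len(sd[name]) != length for sd in state_dicts):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            sum(w * sd[name][k] for w, sd in zip(weights, state_dicts))
            for k in range(length)
        ]
    return merged

# Usage: merge a bird classifier and a whale classifier trained separately.
birds = {"conv.weight": [1.0, 3.0], "conv.bias": [0.0]}
whales = {"conv.weight": [3.0, 5.0], "conv.bias": [2.0]}
merged = average_parameters([birds, whales])
```

Only the parameters cross institutional boundaries in this scheme, so the recordings themselves never need to be exchanged.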

Weakly Supervised Whale Call Detection in Long Recordings
To train an AI to detect whale calls, we usually need to mark every call by hand in long recordings, which takes hundreds of hours. We propose a method that only needs one yes/no label per recording. The method also learns to point to when each call happens inside the recording.
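One common way to learn from a single label per recording is multiple-instance learning: a model scores every short frame, the frame scores are pooled into one recording-level probability, and only that pooled value is compared with the yes/no label. The sketch below shows the pooling, the loss, and the read-out of call times under that assumption. The function names, the linear-softmax pooling, and the 0.5 threshold are illustrative choices, not the method proposed in this work.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def clip_probability(frame_logits):
    """Pool frame scores into one recording-level probability.

    Linear-softmax pooling weights each frame probability by itself, so
    confident frames dominate. Returns (clip_prob, frame_probs).
    """
    probs = [sigmoid(z) for z in frame_logits]
    total = sum(probs)
    if total == 0.0:
        return 0.0, probs
    return sum(p * p for p in probs) / total, probs

def clip_loss(frame_logits, label):
    """Binary cross-entropy between the pooled probability and the 0/1 label."""
    p, _ = clip_probability(frame_logits)
    p = min(max(p, 1e-7), 1.0 - 1e-7)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def detect_calls(frame_probs, hop_seconds, threshold=0.5):
    """Turn frame probabilities into (start, end) times of detected calls."""
    events, start = [], None
    for i, p in enumerate(frame_probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            events.append((start * hop_seconds, i * hop_seconds))
            start = None
    if start is not None:
        events.append((start * hop_seconds, len(frame_probs) * hop_seconds))
    return events
```

During training only `clip_loss` is minimized. The frame probabilities are never supervised directly, yet after training they indicate when each call occurs.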

Multi-Channel Audio Alignment Under Clock Drift
To localize wildlife in 3D, we deploy many recorders across a habitat. The internal clocks of these recorders drift apart over time, so the recordings do not line up. We propose a deep learning method that predicts and corrects this drift. This work won first place in the BioDCASE 2025 international challenge.
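To make the drift problem concrete, the sketch below uses a simple classical baseline rather than the deep learning method described above. It assumes the drift is linear in time and that a few time offsets between a recorder and a reference have already been measured, for example from sound events heard by both. The function names and the linear model are assumptions for illustration only.

```python
def fit_linear_drift(times, offsets):
    """Fit offset(t) = skew * t + offset0 by ordinary least squares.

    times: recorder timestamps in seconds at which offsets were measured.
    offsets: recorder time minus reference time, in seconds, at those times.
    Returns (skew, offset0); skew * 1e6 is the drift in parts per million.
    """
    n = len(times)
    if n < 2 or n != len(offsets):
        raise ValueError("need at least two (time, offset) pairs")
    mean_t = sum(times) / n
    mean_o = sum(offsets) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("measurement times must not all be identical")
    skew = sum((t - mean_t) * (o - mean_o) for t, o in zip(times, offsets)) / var_t
    return skew, mean_o - skew * mean_t

def correct_timestamp(t, skew, offset0):
    """Map a recorder timestamp onto the reference clock."""
    return t - (skew * t + offset0)
```

For scale, a drift of 50 parts per million accumulates 0.18 s of error in one hour, and sound travels roughly 60 m in that time, so uncorrected drift can easily dominate the localization error.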

| Publications |
|---|

