We present a new approach to speech separation and localization. The method performs a binary search over angular regions of space and can handle arbitrarily many speakers, moving speakers, and background sounds.
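To give a feel for the search structure, here is a toy sketch of a spatial binary search: keep bisecting any angular region that contains voice activity, and prune silent regions wholesale. In the real method a separation network decides whether a region contains speech; the sketch below substitutes ground-truth angles for that decision, purely for illustration.

```python
import math

def localize(speaker_angles, resolution=math.radians(2)):
    """Binary search over angular regions: split any region that
    contains voice activity until it is narrower than `resolution`.
    Here a region "contains voice" if a ground-truth angle falls in
    it; the real method queries a separation network instead."""
    found = []
    regions = [(0.0, 2 * math.pi)]
    while regions:
        lo, hi = regions.pop()
        if not any(lo <= a < hi for a in speaker_angles):
            continue  # silent region: prune the whole cone at once
        if hi - lo <= resolution:
            found.append((lo + hi) / 2)  # narrow enough: report center
        else:
            mid = (lo + hi) / 2
            regions += [(lo, mid), (mid, hi)]
    return sorted(found)

angles = [math.radians(a) for a in (45, 200, 300)]
estimates = [math.degrees(a) for a in localize(angles)]
print(estimates)  # each estimate within ~1 degree of 45, 200, 300
```

Because silent regions are discarded at a coarse level, the cost grows with the number of active speakers rather than the angular resolution, which is what makes handling arbitrarily many speakers tractable.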
Generative models have become remarkably good at producing unconditional samples, but related tasks like source separation have yet to yield equally convincing results. In this paper we propose a new way to tap into state-of-the-art generative models to solve source separation. We call our method BASIS Separation (Bayesian Annealed SIgnal Source Separation).
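The core idea can be sketched on a toy problem: run annealed Langevin dynamics under a prior for each source while projecting onto the constraint that the sources sum to the observed mixture. The sketch below uses simple analytic Gaussian priors in place of the deep generative priors the method actually taps into; the means, noise schedule, and step sizes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: two sources with Gaussian priors, observed only through
# their sum m = x1 + x2. (Illustrative stand-in for deep priors.)
mu1, mu2 = -2.0, 3.0                 # assumed prior means
x1_true = rng.normal(mu1, 0.1, size=8)
x2_true = rng.normal(mu2, 0.1, size=8)
m = x1_true + x2_true                # observed mixture

def prior_score(x, mu, sigma):
    """Score (gradient of log-density) of a Gaussian prior N(mu, sigma^2)."""
    return (mu - x) / sigma**2

# Annealed Langevin dynamics: start at a broad noise level and shrink it,
# nudging (x1, x2) toward their priors under the constraint x1 + x2 = m.
x1 = np.zeros_like(m)
x2 = m - x1
for sigma in [1.0, 0.5, 0.2, 0.1, 0.05]:
    eta = 0.05 * sigma**2            # step size scaled to noise level
    for _ in range(200):
        x1 = x1 + eta * prior_score(x1, mu1, sigma) \
                + np.sqrt(2 * eta) * rng.normal(size=x1.shape)
        x2 = x2 + eta * prior_score(x2, mu2, sigma) \
                + np.sqrt(2 * eta) * rng.normal(size=x2.shape)
        # Project back onto the mixture constraint x1 + x2 = m.
        r = (m - x1 - x2) / 2
        x1, x2 = x1 + r, x2 + r

print(np.mean(np.abs(x1 - x1_true)))  # small residual: x1 recovered
```

As the noise level anneals down, the samples concentrate on source estimates that are both plausible under the priors and consistent with the mixture.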
Background replacement has many applications, from VFX to privacy (has anyone else struggled with Zoom's virtual background?). In this work we push the state of the art in separating a subject from their background.
Do you ever feel like songs nowadays repeat themselves a lot? That was the motivation for a recent method I worked on for detecting choruses in music. By looking for repetition in the spectrogram, it's possible to discover song structure. The method is fast, runs on the CPU, can be installed with pip, and works across a wide variety of music genres.
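The repetition idea can be illustrated with a minimal sketch: compute the cosine self-similarity between spectrogram frames, then score how similar the signal is to itself at a given lag. This is not the method's actual pipeline, just the underlying intuition on a synthetic spectrogram.

```python
import numpy as np

def self_similarity(frames):
    """Cosine self-similarity between spectrogram frames (rows)."""
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    unit = frames / np.maximum(norms, 1e-12)
    return unit @ unit.T

def repetition_score(frames, lag):
    """Mean similarity along the diagonal offset by `lag` frames;
    values near 1 suggest the signal repeats with that period."""
    s = self_similarity(frames)
    return float(np.mean(np.diag(s, k=lag)))

# Synthetic spectrogram: a 4-frame pattern repeated 5 times.
rng = np.random.default_rng(0)
pattern = rng.random((4, 16))
spec = np.tile(pattern, (5, 1))
print(repetition_score(spec, 4))  # near 1.0: strong lag-4 repetition
print(repetition_score(spec, 3))  # lower: no repetition at lag 3
```

Peaks in the repetition score over lags reveal repeated sections, and the most-repeated section is a natural chorus candidate.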
During my senior year I was a researcher in the Cox Lab at Harvard, working at the intersection of neuroscience and computer vision. My thesis project borrowed the idea of feature-based attention from neuroscience to improve visual classification in neural networks.