Voice activity detection powered by SileroVAD.
A Haskell implementation of SileroVAD, a pre-trained enterprise-grade voice activity detector.
silero-vad-hs
Supported architectures
Tested on GHC 9.8, GHC 9.2.8, and GHC 8.10.7.
Quick start
This README is a literate Haskell file. You can run the example with the following command:
nix develop --command bash -c '
export LD_LIBRARY_PATH=lib:$(nix path-info .#stdenv.cc.cc.lib)/lib
cabal run --flags="build-readme"
'
Necessary language extensions and imports for the example:
import qualified Data.Vector.Storable as Vector
import Data.Function ((&))
import Data.WAVE (sampleToDouble, WAVE (waveSamples), getWAVEFile)
import Silero (withVad, withModel, detectSegments, detectSpeech, windowLength)
For simplicity, this example uses the WAVE library.
Unfortunately, WAVE represents audio as a lazy linked list, which is slow for anything but small files.
Prefer the wave library for better performance.
main :: IO ()
main = do
  wav <- getWAVEFile "lib/jfk.wav"
The functions below expect a Vector Float. The following converts the samples to that format.
  let samples =
        concat (waveSamples wav)
          & Vector.fromList
          & Vector.map (realToFrac . sampleToDouble)
Use detectSegments to detect the start and end times of voice activity segments.
  withVad $ \vad -> do
    segments <- detectSegments vad samples
    print segments
Alternatively, use detectSpeech to check whether speech is present in a single window.
  withModel $ \model -> do
    probability <- detectSpeech model $ Vector.take windowLength samples
    putStrLn $ "Probability: " <> show probability
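The detectSpeech call above only scores the first window of the recording. To score a whole recording window by window, the samples first have to be cut into chunks of exactly windowLength samples. The following is a minimal sketch of one way to do that, not part of the library: the windows helper is hypothetical, windowSize is a local stand-in for Silero's windowLength (512), and zero-padding the final short window is an assumption. Dropping the last partial window would be an equally valid choice.

```haskell
import qualified Data.Vector.Storable as Vector

-- Local stand-in for Silero's windowLength (documented as 512).
windowSize :: Int
windowSize = 512

-- Hypothetical helper: split samples into consecutive windows of exactly
-- windowSize samples, zero-padding the final window if it is short.
windows :: Vector.Vector Float -> [Vector.Vector Float]
windows samples
  | Vector.null samples = []
  | otherwise =
      let (chunk, rest) = Vector.splitAt windowSize samples
          padding = Vector.replicate (windowSize - Vector.length chunk) 0
       in (chunk Vector.++ padding) : windows rest

main :: IO ()
main = do
  -- 1300 samples of dummy audio: two full windows plus one padded window.
  let samples = Vector.replicate 1300 0.1 :: Vector.Vector Float
  print (map Vector.length (windows samples)) -- prints [512,512,512]
```

Each resulting window could then be passed to detectSpeech in turn, for example with mapM inside the withModel callback.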
[!NOTE] Audio passed to the detectSegments and detectSpeech functions must meet the following requirements:
- Must have a 16 kHz sample rate.
- Must be mono channel.
- Must be 16-bit audio.
When using detectSpeech, audio samples must be of size windowLength (defined as 512).
If length samples /= windowLength, the probability will always be 0.
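Because audio in the wrong format is not rejected outright, it can help to validate the format before running detection. Below is a minimal sketch of such a check. The formatProblems helper is hypothetical and not part of silero-vad-hs; it takes the channel count, sample rate, and bit depth as plain integers. With the WAVE package these values should be available from the waveNumChannels, waveFrameRate, and waveBitsPerSample fields of waveHeader, but treat that mapping as an assumption and check it against the version you use.

```haskell
-- Hypothetical helper: describe every way an audio format violates the
-- requirements listed above. An empty list means the format is acceptable.
formatProblems :: Int -> Int -> Int -> [String]
formatProblems channels sampleRate bitsPerSample =
  [ "expected mono audio, got " <> show channels <> " channels" | channels /= 1 ]
    <> [ "expected 16000 Hz, got " <> show sampleRate <> " Hz" | sampleRate /= 16000 ]
    <> [ "expected 16-bit samples, got " <> show bitsPerSample <> "-bit" | bitsPerSample /= 16 ]

main :: IO ()
main = do
  -- A conforming format yields no problems.
  print (formatProblems 1 16000 16) -- prints []
  -- A stereo, 44.1 kHz, 24-bit file violates all three requirements.
  mapM_ putStrLn (formatProblems 2 44100 24)
```

Returning a list of messages rather than a Bool makes it possible to report all violations at once instead of stopping at the first.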