Combining articulatory features with end-to-end learning in speech recognition

Link:

https://doi.org/10.1007/978-3-030-01424-7_49

Author:

Year of publication:

2018

Media type:

Text

Keywords:

Articulatory features
Automatic speech recognition
Deep neural networks (DNN)
End-to-end learning

Description:

End-to-end neural networks have shown promising results in large-vocabulary continuous speech recognition (LVCSR). However, it is challenging to integrate domain knowledge into such systems. In particular, articulatory features (AFs), which are inspired by the human speech production mechanism, can help in speech recognition. This paper presents two approaches to incorporating this domain knowledge into end-to-end training: (a) fine-tuning networks, which reuse the hidden-layer representations of AF extractors as input for ASR tasks; and (b) progressive networks, which incorporate articulatory knowledge through lateral connections from AF extractors. We evaluate the proposed approaches on the Wall Street Journal speech corpus and test on the standard eval92 evaluation set. Results show that both fine-tuning and progressive networks can integrate articulatory information into end-to-end learning and outperform previous systems.
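
The difference between the two approaches can be illustrated with a minimal sketch. It is a toy forward pass under stated assumptions, not the architecture from the paper: the layer sizes, the single feed-forward hidden layer, the ReLU activation, the random weights, and all names (`W_af`, `W_asr`, `U_lat`, `finetune_hidden`, `progressive_hidden`) are hypothetical, and training is omitted entirely.

```python
import random


def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(u, v):
    """Element-wise sum of two vectors of equal length."""
    return [a + b for a, b in zip(u, v)]


def rand_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
N_IN, N_AF_HID, N_ASR_HID = 4, 3, 5  # hypothetical dimensions

W_af = rand_matrix(N_AF_HID, N_IN, rng)        # AF extractor (assumed pretrained)
W_top = rand_matrix(N_ASR_HID, N_AF_HID, rng)  # ASR layer stacked on the AF layer
W_asr = rand_matrix(N_ASR_HID, N_IN, rng)      # separate ASR column
U_lat = rand_matrix(N_ASR_HID, N_AF_HID, rng)  # lateral connection weights


def af_hidden(x):
    """Hidden representation of the AF extractor for one acoustic frame."""
    return relu(matvec(W_af, x))


def finetune_hidden(x):
    """(a) Fine-tuning style: the AF hidden layer is the ASR layer's input."""
    return relu(matvec(W_top, af_hidden(x)))


def progressive_hidden(x):
    """(b) Progressive style: the ASR column sees the raw input and also
    receives the AF hidden layer through a lateral connection."""
    return relu(add(matvec(W_asr, x), matvec(U_lat, af_hidden(x))))


frame = [0.5, -0.2, 0.1, 0.9]  # made-up acoustic feature vector
h_a = finetune_hidden(frame)
h_b = progressive_hidden(frame)
```

In (a) the articulatory representation sits directly in the ASR signal path, whereas in (b) it enters as an additional term next to the ASR column's own projection of the input. Whether the AF weights stay frozen or are updated during ASR training is a design choice this sketch does not model.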

License:

info:eu-repo/semantics/closedAccess

Source system:

Research Information System of the UHH (Forschungsinformationssystem der UHH)

Internal metadata

Source record: oai:www.edit.fis.uni-hamburg.de:publications/72e1fef5-91ad-43d2-87d0-6487229f5272