multi language speech dataset

Multilingual LibriSpeech Dataset - Papers With Code paperswithcode.com › dataset › multilingual-lib...

The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish. It ...

legacy-datasets/multilingual_librispeech - Hugging Face huggingface.co › datasets › multilingual_librisp...

The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

MLCommons Multilingual Spoken Words Dataset mlcommons.org › Datasets

A large and growing audio dataset of spoken words in 50 languages for academic research and commercial applications in keyword spotting and spoken term search.

CVSS: A Massively Multilingual Speech-to-Speech Translation ... github.com › google-research-datasets › cvss

CVSS is a massively multilingual-to-English speech-to-speech translation corpus, covering sentence-level parallel speech-to-speech translation pairs from 21 ...

Multi-lingual HateSpeech Dataset - Kaggle www.kaggle.com › datasets › wajidhassanmoosa

This dataset contains hate speech text with labels, where 0 represents non-hate and 1 represents hate text. Also, the data from different languages needed to be ...

Multilingual LibriSpeech (MLS) - openslr.org www.openslr.org › ...

The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

Speech Wikimedia: A 77 Language Multilingual Speech Dataset arxiv.org › cs

Aug 30, 2023 · It includes 1780 hours (195 GB) of CC-BY-SA licensed transcribed speech from a diverse set of scenarios and speakers, in 77 different languages.

+172 Multi lingual Datasets - NLP Database - Metatext.AI metatext.io › datasets-list › multi-lingual-language

Dataset contains conversational, bilingual speech test and tuning data for English, Chinese, and Japanese. It includes audio data, transcripts, and translations ...

CMU Wilderness Multilingual Speech Dataset - Festvox festvox.org › cmu_wilderness

The CMU Wilderness Multilingual Speech Dataset is a speech dataset of aligned sentences and audio for some 700 different languages. It is based on readings of ...

[PDF] A Multilingual Speech Dataset for SLU and Beyond www.isca-archive.org › interspeech_2024 › lee24i_interspeech

Sep 5, 2024 · We introduced Speech-MASSIVE, a multilingual SLU dataset spanning 12 languages for intent prediction and slot-filling tasks. Alongside dataset ...

Related searches

voice dataset

common voice dataset

speech recognition dataset

cvss corpus and massively multilingual speech to speech translation

english speech dataset

fleurs dataset

cvss dataset

voxpopuli dataset