Soumi Maiti
Verified email at andrew.cmu.edu
Title
Cited by
Year
VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks
S Maiti, Y Peng, S Choi, J Jung, X Chang, S Watanabe
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by: 51 | Year: 2024
Improving massively multilingual ASR with auxiliary CTC objectives
W Chen, B Yan, J Shi, Y Peng, S Maiti, S Watanabe
ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023
Cited by: 42 | Year: 2023
Reproducing Whisper-style training using an open-source toolkit and publicly available data
Y Peng, J Tian, B Yan, D Berrebbi, X Chang, X Li, J Shi, S Arora, W Chen, ...
2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1-8, 2023
Cited by: 41 | Year: 2023
Exploring speech recognition, translation, and understanding with discrete speech units: A comparative study
X Chang, B Yan, K Choi, JW Jung, Y Lu, S Maiti, R Sharma, J Shi, J Tian, ...
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by: 35 | Year: 2024
SpeechLMScore: Evaluating speech generation using speech language model
S Maiti, Y Peng, T Saeki, S Watanabe
ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023
Cited by: 32 | Year: 2023
EEND-SS: Joint end-to-end neural speaker diarization and speech separation for flexible number of speakers
S Maiti, Y Ueda, S Watanabe, C Zhang, M Yu, SX Zhang, Y Xu
2022 IEEE Spoken Language Technology Workshop (SLT), 480-487, 2023
Cited by: 32 | Year: 2023
Reducing barriers to self-supervised learning: HuBERT pre-training with academic compute
W Chen, X Chang, Y Peng, Z Ni, S Maiti, S Watanabe
arXiv preprint arXiv:2306.06672, 2023
Cited by: 27 | Year: 2023
Parametric resynthesis with neural vocoders
S Maiti, MI Mandel
2019 IEEE Workshop on Applications of Signal Processing to Audio and …, 2019
Cited by: 26 | Year: 2019
Generating multilingual voices using speaker space translation based on bilingual speaker data
S Maiti, E Marchi, A Conkie
ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020
Cited by: 24 | Year: 2020
End-to-end diarization for variable number of speakers with local-global networks and discriminative speaker embeddings
S Maiti, H Erdogan, K Wilson, S Wisdom, S Watanabe, JR Hershey
ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021
Cited by: 23 | Year: 2021
Speaker independence of neural vocoders and their effect on parametric resynthesis speech enhancement
S Maiti, MI Mandel
ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020
Cited by: 23 | Year: 2020
ESPnet-ST-v2: Multipurpose spoken language translation toolkit
B Yan, J Shi, Y Tang, H Inaguma, Y Peng, S Dalmia, P Polák, ...
arXiv preprint arXiv:2304.04596, 2023
Cited by: 13 | Year: 2023
Learning to speak from text: Zero-shot multilingual text-to-speech with unsupervised text pretraining
T Saeki, S Maiti, X Li, S Watanabe, S Takamichi, H Saruwatari
arXiv preprint arXiv:2301.12596, 2023
Cited by: 13 | Year: 2023
Speech denoising by parametric resynthesis
S Maiti, MI Mandel
ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019
Cited by: 13 | Year: 2019
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics
T Saeki, S Maiti, S Takamichi, S Watanabe, H Saruwatari
arXiv preprint arXiv:2401.16812, 2024
Cited by: 12 | Year: 2024
Joint prediction and denoising for large-scale multilingual self-supervised learning
W Chen, J Shi, B Yan, D Berrebbi, W Zhang, Y Peng, X Chang, S Maiti, ...
2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1-8, 2023
Cited by: 11 | Year: 2023
TriniTTS: Pitch-controllable End-to-end TTS without External Aligner
Y Ju, I Kim, H Yang, JH Kim, B Kim, S Maiti, S Watanabe
INTERSPEECH, 16-20, 2022
Cited by: 11 | Year: 2022
Unsupervised data selection for TTS: Using Arabic broadcast news as a case study
M Baali, T Hayashi, H Mubarak, S Maiti, S Watanabe, W El-Hajj, A Ali
arXiv preprint arXiv:2301.09099, 2023
Cited by: 10 | Year: 2023
Predicting interaction quality in customer service dialogs
S Stoyanchev, S Maiti, S Bangalore
Advanced Social Interaction with Agents: 8th International Workshop on …, 2018
Cited by: 9 | Year: 2018
Towards practical and efficient image-to-speech captioning with vision-language pre-training and multi-modal tokens
M Kim, J Choi, S Maiti, JH Yeo, S Watanabe, YM Ro
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by: 8 | Year: 2024