Chae Young Lee

Tutorial – Dealing with the Lack of Audio Data

Clova AI Research Team

Bio

Chae Young Lee recently graduated from Hankuk Academy of Foreign Studies and currently works as a data scientist at Naver’s Clova AI Research Team. Last year, she participated in Deep Learning Camp Jeju 2018 as the only high school student and developed the Conditional WaveGAN. She is also well known for her Medium post on TPUs.

Abstract

In recent years, speech data has been in the spotlight for various deep learning applications, from Automatic Speech Recognition (ASR) systems to source separation. Yet far fewer augmentation techniques have been explored for speech data than for image data. In this track, we will therefore explore various methods for augmenting speech data. This hands-on tutorial follows the task of building a simple speech classifier on the Speech Commands Zero to Nine (SC09) dataset, made available by TensorFlow, and covers traditional augmentation techniques, transfer learning, GAN augmentation, and style transfer as ways to increase classification accuracy. Participants are required to download the libraries and pre-trained models, which will be available in late January.

Area

Speech
Domain Adaptation
Transfer Learning
Data Augmentation

Pre-requisites for tutorial

TensorFlow, librosa, NumPy
Pre-trained models of CNN (SC20), DCGAN, and CycleGAN
SC09 dataset

Outline

1. Motivation

Implementing ML in custom tasks

Data shortage

2. Classifier

Model: CNN trainable on a laptop CPU (speech.py)
Dataset: Speech Commands Zero to Nine (SC09)
Input: Spectrogram images
Pre-processing data → model setup → initial training
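
The pre-processing step above turns each audio clip into a spectrogram image. A minimal NumPy-only sketch of that idea is shown here; it is not the tutorial's speech.py. The function name `log_spectrogram`, the frame and hop sizes, and the 16 kHz one-second clip are all assumptions chosen for illustration.

```python
import numpy as np

def log_spectrogram(wave, frame_len=256, hop=128, eps=1e-10):
    """Return a log-magnitude spectrogram of shape (n_frames, frame_len // 2 + 1).

    frame_len and hop are illustrative values, not the tutorial's settings.
    """
    wave = np.asarray(wave, dtype=np.float32)
    if len(wave) < frame_len:
        # Pad very short clips so that at least one frame exists.
        wave = np.pad(wave, (0, frame_len - len(wave)))
    n_frames = 1 + (len(wave) - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack(
        [wave[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Magnitude of the real FFT of each frame, then log compression.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps)

# Stand-in for a real clip: one second of a 440 Hz tone at an assumed 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)
spec = log_spectrogram(tone)
print(spec.shape)
```

The resulting 2-D array can be treated like a single-channel image and fed to a CNN. In practice, a library routine such as librosa's STFT or mel-spectrogram functions would likely replace the hand-written framing.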

3. Traditional augmentation

Adding noise

Stretching

Shifting pitch

Rolling
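
The traditional augmentations listed above can be sketched on a raw waveform with NumPy alone. This is an illustrative sketch, not the tutorial's code: the function names, the noise level, and the 16 kHz test tone are assumptions. The `change_speed` helper resamples by linear interpolation, so it changes duration and pitch together; for independent time stretching and pitch shifting, librosa provides `librosa.effects.time_stretch` and `librosa.effects.pitch_shift`.

```python
import numpy as np

def add_noise(wave, noise_level=0.005, rng=None):
    """Add white Gaussian noise scaled by noise_level (an illustrative default)."""
    rng = np.random.default_rng() if rng is None else rng
    return wave + noise_level * rng.standard_normal(len(wave))

def roll(wave, shift):
    """Circularly shift the waveform by `shift` samples."""
    return np.roll(wave, shift)

def change_speed(wave, rate):
    """Resample by linear interpolation: rate > 1 speeds up, rate < 1 slows down.

    Speed and pitch change together. The output is cropped or zero-padded
    back to the input length so every clip keeps the same shape.
    """
    n = len(wave)
    new_len = max(1, int(round(n / rate)))
    positions = np.linspace(0, n - 1, num=new_len)
    resampled = np.interp(positions, np.arange(n), wave)
    if new_len >= n:
        return resampled[:n]
    return np.pad(resampled, (0, n - new_len))

# Stand-in for a real clip: one second of a 440 Hz tone at an assumed 16 kHz.
sr = 16000
clip = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
rng = np.random.default_rng(0)

noisy = add_noise(clip, rng=rng)
rolled = roll(clip, sr // 10)      # shift by 100 ms
faster = change_speed(clip, 1.1)   # 10% faster, tail zero-padded
```

Each function returns a clip of the original length, so augmented copies can be appended directly to the training set alongside the unmodified clips.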

4. Transfer Learning

Model: the same CNN as in the classifier
Dataset: Speech Commands (20 classes)

5. GAN augmentation

Model: DCGAN

Generating more SC09 data

Pre-trained model → generation (on the spot)

6. Style transfer

Model: CycleGAN/StarGAN

Generate data by converting gender, age, etc.

Pre-trained model → generation (on the spot)

7. Conclusion

Compare accuracy

Insights

This tutorial is first come, first served. Please register soon, as the number of spots is limited.
As a prerequisite, participants are asked to download all the dev packages and data.

Registration opens soon – stay tuned!

Planned Agenda