Deep Sequence Models: Context Representation, Regularization, and Application to Language

Date: Thursday, April 12, 2018, 11:00am
Location: Gates Building, Room 219
Speaker: Adji Bousso Dieng, Columbia University


Recurrent Neural Networks (RNNs) are among the most successful models for sequential data. They have achieved state-of-the-art results in many tasks, including language modeling, image and text generation, speech recognition, and machine translation. Despite these successes, RNNs still face two key challenges: they fail to capture long-term dependencies (don't believe the myth that they do!) and they easily overfit.

The ability to capture long-term dependencies in sequential data depends on how context is represented. In theory, RNNs capture all the dependencies in a sequence through recurrence and parameter sharing. In practice, however, RNNs face optimization issues, and the assumptions made to counter these optimization challenges limit their ability to capture long-term dependencies. The overfitting problem of RNNs, on the other hand, stems from the strong dependence of the hidden units on one another.

I will talk about my research on context representation and regularization for RNNs. First, I will make the case that, in the context of language, topic models are very effective at representing context and can be used jointly with RNNs to facilitate learning and capture long-term dependencies. Second, I will discuss NOISIN, our newly proposed method for regularizing RNNs. NOISIN injects unbiased noise into the hidden units of an RNN to reduce co-adaptation, and it significantly improves the generalization of existing RNN-based models. For example, it improves RNNs with dropout by as much as 12.2% on the Penn Treebank dataset and 9.4% on the Wikitext-2 dataset.
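
To make these two ideas more concrete, here is a minimal PyTorch-style sketch of how a document-level topic vector might be combined with an RNN language model. The class and parameter names are hypothetical and the speaker's actual model may differ; the sketch only illustrates the general idea of letting a topic vector carry document-level context alongside the RNN's local context.

```python
import torch
import torch.nn as nn

class TopicAwareLanguageModel(nn.Module):
    """Hypothetical sketch: a document-level topic vector supplies long-range
    context and contributes an additive bias to the RNN's per-token logits."""

    def __init__(self, vocab_size, embed_size, hidden_size, num_topics):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.rnn = nn.GRU(embed_size, hidden_size, batch_first=True)
        self.rnn_out = nn.Linear(hidden_size, vocab_size)
        self.topic_out = nn.Linear(num_topics, vocab_size, bias=False)

    def forward(self, tokens, topic_proportions):
        # tokens: [batch, seq_len] token ids
        # topic_proportions: [batch, num_topics], inferred by a topic model
        hidden, _ = self.rnn(self.embed(tokens))         # local, short-range context
        topic_bias = self.topic_out(topic_proportions)   # global, document-level context
        return self.rnn_out(hidden) + topic_bias.unsqueeze(1)
```

Similarly, the sketch below shows one way to inject unbiased noise into the hidden units of a recurrent cell. The additive zero-mean Gaussian noise is an assumption for illustration only (it keeps the noised hidden state equal to the clean one in expectation); it is not necessarily the noise family or variance used in NOISIN.

```python
class NoisyGRUCell(nn.Module):
    """Sketch of unbiased noise injection: the noised hidden state equals the
    clean hidden state in expectation, discouraging co-adaptation of the
    hidden units. Noise form and scale are illustrative assumptions."""

    def __init__(self, input_size, hidden_size, noise_std=0.1):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        self.noise_std = noise_std

    def forward(self, x_t, h_prev):
        h_t = self.cell(x_t, h_prev)
        if self.training:
            # Zero-mean noise keeps E[h_noisy] = h_t (the "unbiased" property).
            return h_t + self.noise_std * torch.randn_like(h_t)
        return h_t  # no noise at evaluation time
```

In this sketch, the noised hidden state is what gets fed back as h_prev at the next time step during training, so the regularization acts on the recurrent dynamics themselves rather than only on the outputs.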