Parallel Scans for Nonlinear Sequence Models
I've recently come across two remarkable papers that are making me rethink recurrent neural networks (RNNs). The first is "Were RNNs All We Needed?" and the second is "Towards Scalable and Stable Parallelization of Nonlinear RNNs".
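The thing that held back RNNs for a long time was the need for sequential rollouts: each hidden state depends on the previous one, so the computation can't be parallelized across the sequence length. But now it seems we can get around that, which is crazy powerful.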
Basically, the idea is that we can generate valid trajectories from a sequential model in two ways: sequentially, one step at a time, or by starting from a random trajectory and repeatedly updating it with an iteration whose fixed points are exactly the "valid" trajectories.
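To make the fixed-point view concrete, here is a minimal sketch in plain Python. It uses a toy scalar recurrence h_t = tanh(w * h_{t-1} + x_t) and a simple Jacobi-style sweep that updates every time step at once from the previous guess. The recurrence, the weight `w = 0.5`, and all function names are my own assumptions for illustration. This is not the method of either paper, which as far as I understand rely on more sophisticated updates.

```python
import math
import random

def step(h_prev, x, w=0.5):
    # Toy nonlinear recurrence (an assumption for illustration only).
    return math.tanh(w * h_prev + x)

def sequential_rollout(xs, h0=0.0):
    # The usual way: each state needs the previous one.
    hs, h = [], h0
    for x in xs:
        h = step(h, x)
        hs.append(h)
    return hs

def jacobi_sweep(hs, xs, h0=0.0):
    # Update every time step from the *previous guess* of its predecessor.
    # The updates within one sweep are independent of each other, so they
    # could run in parallel. A valid trajectory is left unchanged.
    prev = [h0] + hs[:-1]
    return [step(hp, x) for hp, x in zip(prev, xs)]

def fixed_point_rollout(xs, h0=0.0, tol=1e-12, seed=0):
    # Start from a random trajectory and sweep until it stops changing.
    # After k sweeps the first k states are exact, so at most len(xs)
    # sweeps are ever needed.
    rng = random.Random(seed)
    hs = [rng.uniform(-1.0, 1.0) for _ in xs]
    sweeps = 0
    for _ in range(len(xs)):
        new_hs = jacobi_sweep(hs, xs, h0)
        delta = max(abs(a - b) for a, b in zip(new_hs, hs))
        hs = new_hs
        sweeps += 1
        if delta < tol:
            break
    return hs, sweeps

xs = [math.sin(0.3 * t) for t in range(200)]
hs_seq = sequential_rollout(xs)
hs_fp, sweeps = fixed_point_rollout(xs)
print(sweeps, max(abs(a - b) for a, b in zip(hs_seq, hs_fp)))
```

In this toy the step function is a contraction in `h_prev` (since |w| < 1 and tanh has slope at most 1), so an error in the guess shrinks by at least half each time it propagates one step forward, and the sweeps settle well before the worst case of one sweep per time step.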
Interesting questions arise when the underlying model itself is known to be stable, since that stability naturally carries over to these iterations.