Inception-v4, Inception-ResNet and the impact of residual connections on learning.
Learning transferable architectures for scalable image recognition.
Masking: Simulate the embedding lookup by expanding the 2D input to 3D, with an embedding dimension of 10.
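The expansion described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the original code: the array `padded_inputs`, its values, and the choice of 0 as the padding value are all assumptions made for the example.

```python
import numpy as np

# Hypothetical padded batch of integer token ids (assumption: 0 marks padding).
padded_inputs = np.array([
    [711, 632, 71, 0, 0, 0],
    [73, 8, 3215, 55, 927, 0],
    [83, 91, 1, 645, 1253, 927],
])

embedding_dim = 10

# Stand-in for an embedding lookup: add a trailing axis and repeat each id
# embedding_dim times, giving shape (batch, timesteps, embedding_dim).
unmasked_embedding = np.tile(
    padded_inputs[:, :, np.newaxis], (1, 1, embedding_dim)
).astype(np.float32)

# A timestep is kept (True) when at least one feature differs from the
# padding value, and masked (False) when every feature equals it.
mask = np.any(unmasked_embedding != 0.0, axis=-1)
```

Under these assumptions `unmasked_embedding` has shape `(3, 6, 10)` and `mask` has shape `(3, 6)`, with `False` at the padded positions. A real embedding layer would produce learned vectors instead of repeated ids; the repetition only makes the 2D-to-3D step easy to inspect.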
Each input sequence will be of size (28, 28) (height is treated like time).
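To make the "height as time" reading concrete, the sketch below walks a batch of 28x28 arrays one row at a time. The batch size of 4 and the all-zero data are placeholders chosen for illustration, not values from the source.

```python
import numpy as np

# Hypothetical batch of 4 images, each 28x28 (placeholder data).
batch = np.zeros((4, 28, 28), dtype=np.float32)

# Read the axes as (batch, time, features): height plays the role of time.
batch_size, timesteps, features = batch.shape

# At each timestep the model would see one 28-value row per image.
step_shapes = [batch[:, t, :].shape for t in range(timesteps)]
```

Each entry of `step_shapes` is `(4, 28)`: one row of 28 features for every image in the batch, repeated over 28 timesteps.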
Identity mappings in deep residual networks.