Convolution
\[s_i = w_l x_{i-l}\]
A bilinear operation in \(x\), \(w\) (summation over the repeated index \(l\)), defined over all integer indices. By convention \(x\) and \(w\) have non-zero values beginning from the 0th index. The largest index (plus one) of \(w\) with a non-zero value defines the kernel size, giving the longest-range correlation between \(s\) and \(x\).
Is a constrained linear operation (dignified as ‘Toeplitz matrix’) and generalizes to multiple dimensions, \(s_{ijk ...} = A_{lmn ...} x_{i-l, j-m, k-n, ...}\). Some dimensions may be general linear transformations (‘channels’ in image processing).
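The definition can be sanity-checked numerically; a minimal sketch (assuming `numpy`), verified against `np.convolve`, which computes the same sum:

```python
import numpy as np

def conv(x, w):
    """Direct implementation of s_i = sum_l w_l x_{i-l},
    with x and w zero outside their stored ranges."""
    n = len(x) + len(w) - 1          # number of possibly non-zero outputs
    s = np.zeros(n)
    for i in range(n):
        for l in range(len(w)):
            if 0 <= i - l < len(x):  # x is zero outside [0, len(x))
                s[i] += w[l] * x[i - l]
    return s

x = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, -1.0])            # kernel size 2
assert np.allclose(conv(x, w), np.convolve(x, w))
```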
Determined by translation invariance under the Lie group \(\mathbb{R}^n\). Given \(T\) a translation (group element of \(\mathbb{R}^n\)) and \(K\) the convolution operator,
\[T[x]_i := x_{i+k}\] \[K[x]_i := w_l x_{i-l}\]
Invariance under the group action requires
\[[T, K] = 0\] \[K_{i, m-k} = K_{i+k, m}\] \[K_{im} = \xi(i-m)\]
for some function \(\xi\). So \(K\) is a convolution. The same logic applies with other symmetry groups.
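The commutation \([T, K] = 0\) is easy to verify numerically on a periodic domain, where translations are exact; a sketch assuming `numpy` and a circulant \(K_{im} = \xi((i-m) \bmod N)\):

```python
import numpy as np

N, k = 8, 3
rng = np.random.default_rng(0)
xi = rng.normal(size=N)

# circulant matrix K_{im} = xi[(i - m) mod N]
K = np.array([[xi[(i - m) % N] for m in range(N)] for i in range(N)])

# translation operator: (T x)_i = x_{(i + k) mod N}
T = np.roll(np.eye(N), -k, axis=0)

assert np.allclose(T @ K - K @ T, 0)   # [T, K] = 0

# a generic (non-Toeplitz) matrix does not commute with T
M = rng.normal(size=(N, N))
assert not np.allclose(T @ M - M @ T, 0)
```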
Translation invariance example: object in image detected regardless of location in scene.
Neural network lingo
Dilation, stride, tiling common structures in convolutional kernel:
\(s_i = x_{\lambda i - l} w_{l}\) (stride \(\lambda\))
\(s_i = x_{i - \alpha l} w_{l}\) (dilation \(\alpha\))
\(s_i = x_{i-l} w_{l\bmod\beta}\) (tiling \(\beta\))
A dilation samples every \(\alpha\)-th input within the kernel's span, and a stride keeps only every \(\lambda\)-th output, shifting the kernel by \(\lambda\) inputs per output. Tiling makes the weights periodic with period \(\beta\); from its definition \(\beta\) must be less than the kernel size to have any effect.
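These variants can be sketched with plain `numpy`: stride by subsampling the full convolution, dilation by zero-stuffing the kernel, tiling by repeating a short weight vector (variable names are illustrative):

```python
import numpy as np

x = np.arange(10.0)
w = np.array([1.0, 2.0, 3.0])     # kernel size k = 3

full = np.convolve(x, w)          # s_i = sum_l w_l x_{i-l}

lam = 2                           # stride: keep every lam-th output
strided = full[::lam]

alpha = 2                         # dilation: alpha - 1 zeros between taps
w_dil = np.zeros(alpha * (len(w) - 1) + 1)
w_dil[::alpha] = w
dilated = np.convolve(x, w_dil)   # s_i = sum_l w_l x_{i - alpha*l}
assert len(w_dil) == alpha * (len(w) - 1) + 1   # receptive field length

beta, L = 2, 5                    # tiling: weights repeat with period beta
w_tiled = w[np.arange(L) % beta]  # [w0, w1, w0, w1, w0]
tiled = np.convolve(x, w_tiled)   # s_i = sum_l w_{l mod beta} x_{i-l}

# spot check the dilated sum at i = 4: w0*x4 + w1*x2 + w2*x0
assert dilated[4] == w[0] * x[4] + w[1] * x[2] + w[2] * x[0]
```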
Composition of convolutions is a convolution (\(K_1 K_2\) commutes with \(T\) for \(K_1, K_2\) commuting separately). For kernel sizes \(k_1, k_2\) the composed kernel has size \(k_1 + k_2 - 1\), so the composed receptive field is no greater than the sum of the individual receptive fields.
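That composition is itself a convolution can be checked directly (a sketch assuming `numpy`): applying two kernels in sequence matches a single convolution with the composed kernel.

```python
import numpy as np

x = np.arange(6.0)
w1 = np.array([1.0, -1.0])             # kernel size k1 = 2
w2 = np.array([0.5, 0.5, 0.5])         # kernel size k2 = 3

# two convolutions in sequence vs. one convolution with the composed kernel
composed_kernel = np.convolve(w1, w2)
assert np.allclose(np.convolve(np.convolve(x, w1), w2),
                   np.convolve(x, composed_kernel))

# composed kernel size is k1 + k2 - 1, no greater than the sum k1 + k2
assert len(composed_kernel) == len(w1) + len(w2) - 1
```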
Dilation \(\alpha\) creates receptive field length \(\alpha (k - 1) + 1\) for kernel size \(k\). With a dilation increase factor \(n\), one can stack \(l + 1\) increasingly dilated convolutions with dilations \(\alpha, \alpha n, \dots, \alpha n^l\); since a layer with dilation \(d\) extends the receptive field by \(d(k-1)\), the stack has an effective receptive field of size
\[1 + \alpha (k - 1)(1 + n + n^2 + \dots + n^l) = 1 + \alpha (k - 1) \frac{n^{l+1}-1}{n-1} \sim \mathcal{O}(\alpha k n^l) \sim \mathcal{O}(n^l)\]
with respect to dilation factor \(n\) and number of layers \(l\). For example WaveNet sets \(n = 2\).
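The growth of the stacked receptive field can be sketched as a small helper (illustrative function name; uses the fact that a layer with dilation \(d\) and kernel size \(k\) extends the receptive field by \(d(k-1)\)):

```python
def stacked_receptive_field(k, alpha, n, l):
    """Receptive field of l + 1 stacked convolutions with kernel size k
    and dilations alpha, alpha*n, ..., alpha*n**l."""
    rf, d = 1, alpha
    for _ in range(l + 1):
        rf += d * (k - 1)   # each layer extends the field by d*(k-1)
        d *= n
    return rf

# WaveNet-style stack: k = 2, dilations 1, 2, 4, 8
assert stacked_receptive_field(k=2, alpha=1, n=2, l=3) == 16
# a single layer reduces to the familiar alpha*(k-1) + 1
assert stacked_receptive_field(k=3, alpha=4, n=2, l=0) == 4 * 2 + 1
```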