These notes cover in detail the development of the algorithms required to implement the ResNet CNN architecture in the JPEG transform domain. This means that the resulting network can perform inference and learning directly on JPEG-compressed images, which do not need to be decompressed before being fed into the network.
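One reason transform-domain processing is possible at all is that the DCT is linear, so any linear spatial operation can be composed with decompression and recompression into a single linear map that acts directly on DCT coefficients. The Python sketch below illustrates only that general idea on one 8x8 block; it is not the algorithm developed in these notes. The 3x3 box blur (`box_blur`), the helper names, and the restriction to a single block with zero padding (no interaction with neighboring blocks) are assumptions made for illustration.

```python
import math
import random

N = 8  # JPEG block size


def dct_matrix(n=N):
    # Orthonormal DCT-II basis: C[k][x] = a(k) * cos((2x + 1) k pi / (2n)).
    rows = []
    for k in range(n):
        a = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([a * math.cos((2 * x + 1) * k * math.pi / (2 * n)) for x in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


C = dct_matrix()
CT = [list(row) for row in zip(*C)]


def dct2(block):  # forward 2-D DCT: F = C X C^T
    return matmul(matmul(C, block), CT)


def idct2(coef):  # inverse 2-D DCT: X = C^T F C (C is orthonormal)
    return matmul(matmul(CT, coef), C)


def box_blur(block):
    # Hypothetical linear spatial operator: 3x3 mean filter with zero padding.
    out = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            s = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < N and 0 <= jj < N:
                        s += block[ii][jj]
            out[i][j] = s / 9.0
    return out


def flatten(m):
    return [v for row in m for v in row]


def unflatten(v):
    return [v[i * N:(i + 1) * N] for i in range(N)]


def transform_domain_operator(spatial_op):
    # Column k of T is DCT(spatial_op(IDCT(e_k))) for the k-th coefficient basis vector.
    cols = []
    for k in range(N * N):
        e = [0.0] * (N * N)
        e[k] = 1.0
        cols.append(flatten(dct2(spatial_op(idct2(unflatten(e))))))
    return [[cols[j][i] for j in range(N * N)] for i in range(N * N)]  # 64x64 matrix


def apply(T, coef):
    v = flatten(coef)
    return unflatten([sum(T[i][j] * v[j] for j in range(N * N)) for i in range(N * N)])


random.seed(0)
pixels = [[random.uniform(0, 255) for _ in range(N)] for _ in range(N)]
coef = dct2(pixels)

T = transform_domain_operator(box_blur)
direct = apply(T, coef)                   # never leaves the DCT domain
reference = dct2(box_blur(idct2(coef)))   # decompress, filter, recompress
max_err = max(abs(a - b) for a, b in zip(flatten(direct), flatten(reference)))
print(max_err < 1e-8)
```

Nonlinear operations such as ReLU do not reduce to a fixed linear map in this way, which is why they need separate treatment.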

Although this saves time up front by skipping the decompression process, that is not considered the main contribution of the work. JPEG files are highly sparse, and CNNs mostly perform additions and multiplications, so many of these operations on a JPEG should be no-ops (adding or multiplying by zero), greatly speeding up processing across the entire network. Furthermore, sparse data can be stored in much less space than dense data, so this should permit larger batch sizes and therefore more accurate gradient estimates, increasing the accuracy of the network. Finally, JPEG is by far the most popular image compression scheme due to its high compression ratio, so this method should find wide applicability. For example, the ImageNet dataset and challenge consist entirely of JPEG images.
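To make the sparsity argument concrete, the Python sketch below applies the JPEG forward DCT to one smooth 8x8 block, quantizes it with the example luminance quantization table from Annex K of the JPEG standard (ITU-T T.81), and counts the zero coefficients. It then evaluates one output of a linear layer both densely and by skipping zeros. The block contents and the `weights` vector are hypothetical values chosen for illustration; real images will show different amounts of sparsity.

```python
import math

N = 8

# Example JPEG luminance quantization table (ITU-T T.81, Annex K.1).
Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def dct2(block):
    # Orthonormal 8x8 DCT-II; for N = 8 this matches the JPEG forward DCT.
    def a(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return [[a(u) * a(v) * sum(block[x][y]
                               * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                               * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                               for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


# Hypothetical smooth block (a brightness ramp), level-shifted by -128 as in JPEG.
pixels = [[100 + 4 * x + 2 * y - 128 for y in range(N)] for x in range(N)]
quantized = [[round(c / q) for c, q in zip(crow, qrow)]
             for crow, qrow in zip(dct2(pixels), Q)]

flat = [v for row in quantized for v in row]
nonzero = [(i, v) for i, v in enumerate(flat) if v != 0]  # sparse (index, value) storage
print(f"{len(flat) - len(nonzero)} of {len(flat)} coefficients are zero")

# Hypothetical weights for one output of a linear layer acting on the block.
weights = [((7 * i) % 13) / 13.0 - 0.5 for i in range(N * N)]
dense = sum(w * v for w, v in zip(weights, flat))   # 64 multiplications
sparse = sum(weights[i] * v for i, v in nonzero)    # len(nonzero) multiplications
print(math.isclose(dense, sparse))
```

Because a product with a zero coefficient contributes nothing, the sparse loop produces the same result with only as many multiplications as there are nonzero coefficients, and the `(index, value)` list is the kind of compact representation the storage argument relies on.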

The main contributions of these notes are:

- A general method for CNN processing in the JPEG transform domain
- A model conversion algorithm for pre-trained spatial domain models
- Approximated Spatial Masking: An accurate approximation algorithm for computing piecewise linear functions on DCT coefficients (see "Approximated Spatial Masks and ReLu")
- Half-Spatial Masking: A highly efficient algorithm for applying spatial domain masks to DCT coefficients (see "Approximated Spatial Masks and ReLu")
- The DCT Mean-Variance Theorem (see "Batch Normalization")
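The statement and proof of the DCT Mean-Variance Theorem are given in "Batch Normalization"; the Python sketch below does not reproduce them. It only illustrates two standard facts about the orthonormal 8x8 DCT that relate block statistics to coefficients: the DC coefficient equals 8 times the block mean, and, because the transform preserves the sum of squares (Parseval), the block's population variance equals the sum of the squared AC coefficients divided by 64. The random block is a hypothetical input.

```python
import math
import random

N = 8


def dct2(block):
    # Orthonormal 8x8 DCT-II, as used by JPEG.
    def a(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return [[a(u) * a(v) * sum(block[x][y]
                               * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                               * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                               for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


random.seed(1)
pixels = [[random.uniform(-128, 127) for _ in range(N)] for _ in range(N)]
coef = dct2(pixels)

# Statistics computed in the spatial domain.
values = [p for row in pixels for p in row]
mean = sum(values) / len(values)
var = sum((p - mean) ** 2 for p in values) / len(values)

# The same statistics read directly from the DCT coefficients.
dct_mean = coef[0][0] / N                              # DC = N * mean
energy = sum(c * c for row in coef for c in row)       # equals sum of squared pixels
ac_var = (energy - coef[0][0] ** 2) / (N * N)          # squared AC coefficients only
print(math.isclose(mean, dct_mean, abs_tol=1e-9), math.isclose(var, ac_var, rel_tol=1e-9))
```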

The notes are separated by topic and the individual components of ResNet are developed in isolation.

**Background**

**Convolutions**

**Nonlinearity and Utilities**

**End-to-End**

**Appendix**

M. Ehrlich and L. Davis. Deep Residual Learning in the JPEG Transform Domain. *arXiv preprint arXiv:1812.11690, 2018*

© 2018 Max Ehrlich