Convolutional Neural Networks (CNNs from here on) triumph in the field of image processing because they are designed to effectively handle strong spatial dependencies. Simply put, adjacent pixel-values are close to each other, often changing only gradually from one pixel to the next. In a picture where you wear a blue shirt, all the pixels in that area of the picture are blue. You can think of a strong autocorrelated time series, just for spatial data rather than sequential data. This post explains few important concepts related to CNNs: sparsity of connections, parameter sharing, and hierarchical feature engineering.
A distinctive power of neural networks (neural nets from here on) is their ability to flex themselves in order to capture complex underlying data structure. This post shows that the expressive power of neural networks can be quite swiftly taken to the extreme, in a bad way.
What does it mean? A paper from 1989 (universal approximation theorem, reference below) shows that any reasonable function can be approximated arbitrarily well by fairly a shallow neural net.
Speaking freely, if one wants to abuse the data, to overfit it like there is no tomorrow, then neural nets is the way to go; with neural nets you can perfectly map your fitted values to any data shape. Let’s code an example and explain the meaning of this.