Why Are Convolutional Neural Networks Nice For Pictures?

May 1, 2025

7

Common Approximation Theorem states {that a} neural community with a single hidden layer and a nonlinear activation perform can approximate any steady perform.

Sensible points apart, such that the variety of neurons on this hidden layer would develop enormously giant, we don’t want different community architectures. A easy feed-forward neural community may do the trick.

It’s difficult to estimate what number of community architectures have been developed.

While you open the favored AI mannequin platform Hugging Face in the present day, you will see a couple of million pretrained fashions. Relying on the duty, you’ll use totally different architectures, for instance transformers for pure language processing and convolutional networks for picture classification.

So, why will we want so many Neural Community architectures?

On this submit, I would like supply a solution to this query from a physics perspective. It’s the construction within the knowledge that evokes novel neural community architectures.

Symmetry and invariance

Physicists love symmetry. The basic legal guidelines of physics make use of symmetries, akin to the truth that the movement of a particle might be described by the identical equations, no matter the place it finds itself in time and area.

Symmetry all the time implies invariance with respect to some transformation. These ice crystals are an instance of translational invariance. The smaller buildings look the identical, no matter the place they seem within the bigger context.

By Picture by PtrQs, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=127396876

Exploiting symmetries: convolutional neural community

Should you already know {that a} sure symmetry persists in your knowledge, you’ll be able to exploit this reality to simplify your neural community structure.

Let’s clarify this with the instance of picture classification. The panel reveals three scenes together with a goldfish. It will possibly present up in any location throughout the picture, however the picture ought to all the time be categorized as goldfish.

Three panels showing the same goldfish in different locations — Pictures created by the writer utilizing Midjourney.

A feed-forward neural community may actually obtain this, given adequate coaching knowledge.

This community structure requires a flattened enter picture. Weights are then assigned between every enter layer neuron (representing one pixel within the picture) and every hidden layer neuron. Additionally, weights are assigned between every neuron within the hidden and the output layer.

Together with this structure, the panel reveals a “flattened” model of the three goldfish photos from above. Do they nonetheless look alike to you?

Schematic explaining the flattening of an image and the resulting network architecture. — Picture created by the writer. Utilizing photos created by the writer with Midjourney and ANN structure created with https://alexlenail.me/NN-SVG/.

By flattening the picture, now we have incurred two issues:

Pictures that comprise an analogous object don’t look alike as soon as they’re flattened,
For prime-resolution photos, we might want to prepare numerous weights connecting the enter layer and the hidden layer.

Convolutional networks, then again, work with kernels. Kernel sizes usually vary between 3 and seven pixels, and the kernel parameters are learnable in coaching.

The kernel is utilized like a raster to the picture. A convolutional layer may have a couple of kernel, permitting every kernel to give attention to totally different facets of the picture.

For instance, one kernel would possibly decide up on horizontal strains within the picture, whereas one other would possibly decide up on convex curves.

Convolutional neural networks protect the order of pixels and are nice to study localized buildings. The convolutional layers might be nested to create deep layers. Along with pooling layers, high-level options might be discovered.

The ensuing networks are significantly smaller than for those who would use a fully-connected neural community. A convolutional layer solely requires kernel_size x kernel_size x n_kernel trainable parameters.

It can save you reminiscence and computational finances, all by exploiting the truth that your object could also be positioned wherever inside your picture!

Extra superior deep studying architectures that exploit symmetries are Graph Neural Networks and physics-informed neural networks.

Abstract

Convolutional neural networks work nice with photos as a result of they protect the native data in your picture. As a substitute of flattening all of the pixels, rendering the picture meaningless, kernels with learnable parameters decide up on native options.

Why Are Convolutional Neural Networks Nice For Pictures?

Symmetry and invariance

Exploiting symmetries: convolutional neural community

Abstract

Additional studying

Related Articles

Dwell translation in iOS 26: The way it works and the place to search out it

Roboworx to assist Miso Robotics in set up, upkeep of its Flippy robots

MXene infused printed nanogenerator advances ecofriendly wearable vitality techniques

LEAVE A REPLY Cancel reply

Latest Articles

Dwell translation in iOS 26: The way it works and the place to search out it

Roboworx to assist Miso Robotics in set up, upkeep of its Flippy robots

MXene infused printed nanogenerator advances ecofriendly wearable vitality techniques

Get three months of Audible for less than $3, plus save on Disney+, Starz and extra

Why Information Scientists Ought to Care About SFX Energy Provides

Dwell translation in iOS 26: The way it works and the...