I am having an argument with my colleague about whether an image is a 3-dimensional object.
He says that, in the geometric sense, an image is a 2D square and not a cube.
But since an image has three components (height, width, and channels), I feel it should be treated as a cube, i.e. a 3D volume.
The following passage from the cs231n notes supports my point:
For example, suppose that the input volume has size [32x32x3], (e.g. an RGB CIFAR-10 image). If the receptive field (or the filter size) is 5x5, then each neuron in the Conv Layer will have weights to a [5x5x3] region in the input volume, for a total of 5*5*3 = 75 weights (and +1 bias parameter). Notice that the extent of the connectivity along the depth axis must be 3, since this is the depth of the input volume.
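To make the point concrete, here is a small sketch (using NumPy, with a zero-filled placeholder array standing in for an actual CIFAR-10 image) showing that an RGB image is stored as an array with three axes, and that a 5x5 filter over it carries 5*5*3 = 75 weights, exactly as the quote computes:

```python
import numpy as np

# Placeholder for a CIFAR-10-sized RGB image: height x width x channels
image = np.zeros((32, 32, 3))
print(image.ndim)   # 3 -> the array has three axes, like a "cube"
print(image.shape)  # (32, 32, 3)

# A 5x5 conv filter must span the full depth of the input volume,
# so its weight tensor has shape 5x5x3:
filter_weights = np.zeros((5, 5, 3))
print(filter_weights.size)  # 75 weights (plus 1 bias parameter)
```

So in memory an image really is a rank-3 array, even though each channel is geometrically a 2D grid.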
Moreover, in the pre-deep-learning era we had image segmentation techniques such as mean shift and the HOG feature descriptor, and they all relied on the assumption of an image being a cube. (I am just making that up.)