Rise of AI in image processing: CNNs
- Yasin Uzun, MSc, PhD
- Sep 1, 2024
- 3 min read
Updated: Jul 12
CNNs were the architecture that started the deep learning revolution.

Although AI was heavily publicized in the 2010s with the Go defeats, it had been around in industry and research for decades. Various AI models were quietly working in the background of everyday products, handling forecasting, image processing, data analysis, and scientific research, but they received little publicity. While AlphaGo's defeats of professional Go players were making headlines and fueling media speculation about AI ending professional jobs, significant developments were taking place in AI research that were well known to the scientific community but largely invisible to the public.
AlphaGo was built mainly on a new AI paradigm called "deep reinforcement learning", which is essentially the combination of two relatively new AI techniques: reinforcement learning and deep learning. The first, reinforcement learning, is inspired by how you train your dog: give treats for small steps of improvement and penalties for bad behavior. In programming, positive scores correspond to treats and negative scores to penalties. The algorithm is trained to maximize its score through iterative trials of different paths. It has many application areas, such as game playing, robotics, self-driving cars, finance, and healthcare. The sketch below illustrates the idea with a toy example.
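To make this concrete, here is a minimal, purely illustrative sketch of reinforcement learning (tabular Q-learning) on a made-up toy problem: an agent walks a five-cell corridor and earns a "treat" only for reaching the goal. All names and numbers are assumptions for illustration, not anything taken from AlphaGo.

```python
import random

N_STATES = 5          # positions 0..4; position 4 is the goal
ACTIONS = [-1, +1]    # step left or right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

# Q-table: the expected future score of taking each action in each state
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(500):
    state = 0
    while state != N_STATES - 1:
        # Explore occasionally; otherwise pick the action with the best score
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state = min(max(state + action, 0), N_STATES - 1)
        # The "treat" (+1) for reaching the goal, a tiny penalty otherwise
        reward = 1.0 if next_state == N_STATES - 1 else -0.01
        # Nudge the estimate toward reward + discounted best future score
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# After training, the learned policy should step right (+1) in every state
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)])
```

The essential loop is exactly the treat-and-penalty idea: try an action, observe the score, and adjust the estimate of that action's value.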
The second AI technique that powered AlphaGo was deep learning. It is an enhancement of an older AI algorithm, the neural network, which was first proposed in 1958 and has been researched and used since the 1960s. Neural networks were well known in the AI research community and found applications in industry for decades. However, their use was limited because they required large amounts of computationally expensive calculations: training large networks was infeasible due to the growing computational complexity. The second problem was that they needed very large amounts of training data, owing to the extremely high number of parameters to be optimized; the toy network below gives a sense of scale.
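As a rough, hypothetical illustration of why parameter counts matter, here is a tiny fully connected network in Python. The layer sizes are made up; the point is only that even a small network already has hundreds of thousands of weights to optimize.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [784, 512, 256, 10]   # e.g., a 28x28 image in, 10 classes out

# Each layer has a weight matrix and a bias vector -- all must be optimized
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(f"parameters to optimize: {n_params:,}")   # ~536,000 for this small net

def forward(x):
    # One forward pass: matrix multiply + bias + nonlinearity per layer
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ w + b, 0.0)           # ReLU activation
    return x @ weights[-1] + biases[-1]          # raw class scores

print(forward(rng.standard_normal(784)).shape)   # (10,)
```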
In the 21st century, developments resolved both problems. On the hardware side, graphics processing units (GPUs) were introduced. They initially targeted the game industry, accelerating computer graphics, which gave them their name. These processors were designed to handle the expensive mathematical calculations required for rapidly changing high-resolution images in games. Scientific research, AI, and particularly neural networks benefited from this technology for faster computing.
The second development was the data explosion, driven first by the internet and then by smart devices. Billions of connected devices, large and small, began generating a continuous stream of digital data. This data, combined with a global labor force for labeling, enabled the large-scale datasets that were later used to train neural networks. This solved the problem of training data.
Solving the scalability problem gave a significant boost to the application of neural networks to real-life problems. Thanks to this scalability, AI developers were able to design deeper networks (giving rise to the name "deep learning") with many layers and to apply different neural network architectures specialized to specific problems. Convolutional neural networks (CNNs), the architecture at the heart of AlphaGo, were among the earliest deep learning models to become famous.
In fact, even before the success of AlphaGo, the name "deep learning" was used mostly in reference to CNNs, which had started a silent revolution in research and industry. The architecture is based on small matrix-shaped data structures called "filters". Somewhat in the spirit of divide-and-conquer, these filters turned out to be highly successful tools for unconventional machine learning problems such as image recognition. Long before AlphaGo, CNNs showed far superior performance in image classification problems (e.g., telling cat pictures from dog pictures) in Kaggle machine learning competitions, and their success was noticed not only by the machine learning and image processing communities but by the broader research world. The sketch below shows the filter idea.
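Here is a minimal sketch of what a filter does, assuming a hand-picked edge-detection kernel and a toy image (neither is from any real CNN): the small matrix slides over the image, and each output value is the weighted sum of the patch beneath it. In a real CNN, the filter values are learned from data rather than chosen by hand.

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over every position; each output pixel is the
    # element-wise product of the kernel and the patch beneath it, summed
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A classic edge-detection filter, hard-coded for illustration
edge_filter = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]], dtype=float)

# A toy 6x6 "image": a bright square on a dark background
image = np.zeros((6, 6))
image[2:4, 2:4] = 1.0

response = convolve2d(image, edge_filter)
print(response)   # large values where the square's edges are
```

Because the same small filter is reused across the whole image, a CNN needs far fewer parameters than a fully connected network of the same input size, which is part of why it scaled so well to images.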
In a very short time, convolutional neural networks found widespread adoption across many domains in the 2010s, including object detection, autonomous driving, face recognition, license plate recognition on highways, security, and robotics. Today, CNNs are commonly used in everyday applications not only for image recognition but also for generative AI, especially image generation. Yet for all their success in image processing, CNNs were no silver bullet for every machine learning problem. One such problem was the long-studied field of machine translation, which aims to automate the translation of text between languages. It turned out that another novel algorithm would fit that problem better. I will touch on this solution in my next article.


