
Rise of AI in automated translation: RNNs

  • Yasin Uzun, MSc, PhD
  • Sep 21, 2024
  • 3 min read

Updated: May 25

RNNs were the first deep learning method that could handle sequential information.


Machine translation is one of the long-standing research fields of the computational community. The task is essentially to translate sentences, paragraphs, and documents between languages automatically. It is not hard to appreciate the value of such an effort in a globalized world with instant communication. In fact, we have free access to this capability through services such as Google Translate. But just about two decades ago (in the early 2000s), the level of quality that is now available to the public was beyond most people's dreams.


For decades, natural language processing researchers worked to understand the grammatical components of human languages and embed them into language processing systems for tasks such as understanding texts and translating them between languages. For a long period, progress in this field was limited and incremental.


Traditionally, machine learning methods were only applied to problems with table-formatted inputs, mostly fixed-size sets of numerical values. Language processing did not fit this format well. First, the data is not numerical; it is a stream of words. Second, it does not come as fixed-length lists but has variable length, which is why it is considered "unstructured" data. Hence, methods like neural networks (the ancestors, or backbone, of deep learning) could not be widely applied to language processing. Another problem was that the translation of a word depends on context: the word "fly" has a different meaning when you are talking about airplanes (a verb) than when you are talking about fishing (a noun). This association was not straightforward to capture with machine learning methods. Moreover, the order of the words is also important, posing an additional challenge.


The first problem was addressed in two ways. In one approach, called one-hot encoding, each word in the dictionary is assigned an index, and the word is represented as a vector of zeros with a single one at that index. But the more popular way (which also powers ChatGPT) is word embedding. In this method, each word has some degree of association with each of a number of different topics (for example, "football" has a strong association with sports but a weak association with farming). Each word is then represented as a list of numerical values that capture these topic associations.
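To make the two representations concrete, here is a minimal sketch in Python with NumPy. The tiny vocabulary and the three-dimensional embedding are toy assumptions; in a real system, the embedding values are learned from data rather than drawn at random.

import numpy as np

vocab = ["the", "fly", "landed", "on", "plane"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Represent a word as a vector of zeros with a single 1 at its index."""
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

# Toy embedding matrix: each row is a word, each column a "topic" score.
# In a real system these values are learned, not drawn at random.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(len(vocab), 3))

def embed(word):
    """Look up a word's dense embedding vector."""
    return embedding_matrix[word_to_index[word]]

print(one_hot("fly"))  # [0. 1. 0. 0. 0.]
print(embed("fly"))    # a dense 3-dimensional vector

Note how the one-hot vector grows with the size of the dictionary, while the embedding stays small and dense; this is one practical reason embeddings became the preferred representation.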


The other problems were solved with a special architecture. Just as convolutional neural networks (CNNs) revolutionized the field of image processing, so-called Recurrent Neural Networks (RNNs), in some sense the ancestors of the algorithm used in ChatGPT, would revolutionize the field of natural language processing.


At the heart of these RNNs is the "Encoder-Decoder" architecture, which is also the heart of the Transformer models that power ChatGPT and other large language models. Essentially, the Encoder summarizes a set of words in a numerical way. While encoding, the Encoder does not use only one word at a time, but surrounding words as well. Moreover, for each word's encoding, it also uses the combined encoding of all the previous words, generating a memory.
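The sketch below illustrates that recurrence in Python with NumPy, using random, untrained weights as a toy assumption: at each step, the hidden state mixes the current word's embedding with the state carried over from all previous words, which acts as the network's "memory".

import numpy as np

embedding_dim, hidden_dim = 3, 4
rng = np.random.default_rng(0)
W_x = rng.normal(size=(hidden_dim, embedding_dim))  # input-to-hidden weights
W_h = rng.normal(size=(hidden_dim, hidden_dim))     # hidden-to-hidden weights

def encode(embedded_words):
    """Fold a sequence of word embeddings into one fixed-size summary vector."""
    h = np.zeros(hidden_dim)  # the memory starts empty
    for x in embedded_words:
        # The new state depends on the current word AND the accumulated memory.
        h = np.tanh(W_x @ x + W_h @ h)
    return h

# A "sentence" of five random word embeddings; any length would work.
sentence = [rng.normal(size=embedding_dim) for _ in range(5)]
print(encode(sentence).shape)  # (4,) regardless of sentence length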


The "Encoder" can be thought as a tool that compresses information a numerical format. The "Decoder" reads the encoding and decompresses the information into words. More clearly, you can think that "Encoder" transforms a text in to an "abstract language" that is only understandable to a "Decoder". Then, the "Decoder" transforms this abstract language to the target language.


It is crucial to understand the Encoder-Decoder concept because it is the fundamental architecture behind several other deep learning methods as well. Probably the most important of these is the Transformer, which powers ChatGPT (the "T" in GPT stands for "Transformer") and other large language models (LLMs).


Hence, the application of recurrent neural networks to natural language processing is a very important milestone in deep learning, and understanding the essence of RNNs is key to understanding modern LLMs. In my next article, I will touch on the Transformer, which is currently the poster child and star of not just AI but the overall tech industry.


Additional Reading:

For a clear description of RNNs, please read the fantastic post by Thomas Tracey on Towards Data Science.

