<?xml version="1.0" encoding="UTF-8" ?><!-- generator=Zoho Sites --><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><atom:link href="https://www.maadhyamik.com/blogs/tag/malayalam-keyboard/feed" rel="self" type="application/rss+xml"/><title>Lexifyd - Blog #Malayalam Keyboard</title><description>Lexifyd - Blog #Malayalam Keyboard</description><link>https://www.maadhyamik.com/blogs/tag/malayalam-keyboard</link><lastBuildDate>Sat, 04 Apr 2026 13:39:55 +0530</lastBuildDate><generator>http://zoho.com/sites/</generator><item><title><![CDATA[Designing a New Input Method for Tamil]]></title><link>https://www.maadhyamik.com/blogs/post/designing-a-new-input-method-for-tamil</link><description><![CDATA[Learn our approach and process in designing a new Tamil input method to improve typing speed, accuracy, and user experience. Explore innovations in Tamil keyboard layouts, language technology, and native script usability.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_GLM76cwvSaGAHk7PHlyLqw" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_W_SNgeJHSv-eKE7SLsZ40A" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_ajF33mJMQ0uMMn26Vf6ZmQ" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_yROeHF6VTFqsJ-wc8zxo0Q" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_yROeHF6VTFqsJ-wc8zxo0Q"].zpelem-text { border-radius:1px; } </style><div class="zptext zptext-align-center zptext-align-mobile-center zptext-align-tablet-center " data-editor="true"><div style="color:inherit;"><div 
style="color:inherit;"><div style="color:inherit;"><div style="color:inherit;"><div style="color:inherit;line-height:2;"><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">The present input methods for Tamil, while providing reasonable support for using the language on computers and other devices, have several shortcomings. We will briefly explore the current input methods and their shortcomings before proposing a new Input Method for Tamil that can be used on both keyboard-based and touch devices.</span></p><p style="text-align:justify;"><span style="font-size:18px;font-weight:500;">Current Input Methods:</span></p><div><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Presently, there are 3 predominant input methods in wide use.</span></p><ul><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Tamil99</span></li><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Murasu Anjal</span></li><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Google Keyboard (GBoard)</span></li></ul><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">The inadequacy of existing Input methods for Tamil and the idea of a new Input Method for the language have been around for several years [2]. More recently, Elango Cheran published a detailed blog post [1] that not only explains the shortcomings of the current Input methods but also expounds his idea of exploiting the phonemic nature of the Tamil alphabet for a new Input method. 
Here is a brief summary of the key shortcomings of the current Input methods.</span></p><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;"><span style="font-weight:bold;">Unnatural Design:</span> In Tamil, Consonant Vowels such as&nbsp;'க' and 'தை' are generated by combining the pure consonants 'க்' and 'த்' with the vowels 'அ' and 'ஐ' respectively. Thus the Vowels and the Consonants, which form the basic units of sound (phonemes) in Tamil, should be the basis for designing a good Input method. Several Input methods, including Tamil99, follow the unnatural design of using Vowels and Consonant Vowels (CV) such as 'க', 'ங' and 'ச' as the basic units in the keyboard. To be fair, this design came to be used, because these CV characters</span></p><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Coming back to the Input methods, one of the unintended consequences of this design approach is that it can produce illegal character sequences in Tamil, such as those with dangling consonant vowel modifiers such as '்', '</span></p><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;"><span style="font-weight:bold;">Transliteration Dependency:</span> Murasu Anjal and other transliteration keyboards rely on users being familiar with the English alphabet and typing Tamil words in transliterated English letters, which are then transliterated back to Tamil. This method is hugely popular due to the high English literacy among Tamil speakers around the world. However, we believe this does more harm than good because the speakers no longer have to learn the script but only the sounds of the language. 
Secondly, there are multiple ways to represent a Tamil character in English, because of the variations in the sounds of the two languages.</span></p></div><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;"><span style="font-weight:bold;">Non-adherence to QWERTY layout:</span> Most of the <strong><a href="https://www.maadhyamik.com/tamil-izhai" title="Tamil keyboards" rel="">Tamil keyboards</a></strong> do not adhere to the widely-used QWERTY layout in terms of the key positions reserved for punctuation and other symbols in the keyboard. These Input methods assign Tamil characters to these positions. Consequently, bilingual users of QWERTY find it difficult to switch back and forth between English and Tamil typing, and they are forced to learn and follow different key positions for typing punctuation and symbols while using Tamil.</span></p><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;"><span style="font-weight:bold;">GBoard Design Incongruity:</span> The Google Keyboard or GBoard for Tamil is the soft key layout launched for touch devices. It lays out the Vowels and Consonant Vowels ('அ' வரிசை, i.e. the 'அ' series, such as 'க', 'ங', 'ச', ...) in a 9x4 matrix.&nbsp;The vowel character panel on the left changes every time a consonant vowel is pressed to show its other CV variations. The layout uses a simplistic sequential positioning of characters in alphabetical order, without any concern for either optimizing finger movements or designing the layout around character frequency. 
Combined with the incongruity of the ever-changing vowel panel, GBoard's design choice is probably the least efficient Tamil key layout in use.</span></p><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;"><br/></span></p><p style="text-align:justify;"><span style="font-size:18px;font-weight:500;">A New Input Method - Design Principles:</span></p><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">We want to design a new Input Method for Tamil that addresses the shortcomings of the existing ones and is easy to learn, with a short learning curve. Based on our research, we decided on the following design goals for the new Input method.</span></p><ol><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">A design that adheres to and exploits the Phonemic nature of Tamil, taking the phonemes as the basic unit</span></li><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Frequency analysis of base phonemes and consonant vowel combinations in order to achieve an optimal design that speeds up touch typing on computers and equivalently reduces finger movement on touch devices</span></li><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">An intuitive arrangement of keys that makes learning easier and is consistent across different platforms and devices</span></li><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Prevent any illegal character sequences in the output text</span></li><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Maximize compatibility with the QWERTY keyboard to make the transition between English and Tamil typing seamless and easier.</span></li><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Eliminate the forced requirement for the user to know another script or language and instead facilitate typing in the 
Tamil script</span></li></ol><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Note that the same design principles could be used for designing better Input methods for other languages written in Abugida scripts as well.</span></p><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;"><br/></span></p><p style="text-align:justify;"><span style="font-size:18px;font-weight:500;">Designing the Input Method:</span></p><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">We started the design by identifying Tamil corpus data for a usage frequency analysis of the characters in the language. We identified sufficiently large corpora (approx.&nbsp;<span style="font-weight:bold;">537M words</span>) from two different sources, as below:</span></p><ul><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;"><a href="https://www.kaggle.com/datasets/praveengovi/tamil-language-corpus-for-nlp" target="_blank" rel="nofollow noreferrer noopener">Kaggle - Tamil Language Corpus for NLP</a></span></li><ul><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Tamil Articles Corpus</span></li><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Tamil New Corpus</span></li><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Tamil Language Corpus</span></li></ul><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;"><a href="https://github.com/ajithalbus/TamilCorpus" target="_blank" rel="nofollow noreferrer noopener">GitHub - Opensource Tamil Corpus</a></span></li><ul><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Wikipedia, TheHindu - 58M words in total</span></li></ul></ul><div style="text-align:justify;"><span style="font-family:quicksand, sans-serif;"><br/></span></div><p style="text-align:justify;"><span 
style="font-size:18px;font-weight:500;">Frequency Analysis:</span></p><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Our goal is to understand the usage frequency of the basic phonemes as well as of the full set of Tamil characters, including the consonant vowel characters. Once we understand these frequencies, we can exploit this information to design the keyboard layout. It should be noted that we have omitted the Sanskritized characters (வடமொழி எழுத்துக்கள்) 'ஜ்', 'ஶ்', 'ஷ்',&nbsp;</span></p><p><span style="font-family:quicksand, sans-serif;"><img src="/images/Blog-images/ta_chars_freq.png"/><br/></span></p><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Among the top-10 most frequent characters, we have 5 pure consonants, 4 'அ' ending CVs and 1 'உ' ending CV:</span></p><ol><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Consonants: ம், ர், ல், க் and ன் - 349.13M</span></li><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">'அ' ending CVs: க, த, ப and வ - 306.19M</span></li><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">'உ' ending CV: து - 64.63M</span></li></ol><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Thus, using the base phonemes (vowels and pure consonants) for our keyboard layout would result in a saving of nearly 43M keystrokes (349.13M - 306.19M) for this dataset. 
Now consider two more heatmaps: i) characters grouped by their ending vowel sound (column-wise sum of the above heatmap) and ii) characters grouped by consonant-vowel series (row-wise sum).</span></p><p><img src="/images/Blog-images/ta_vowel_freq.png" alt="Freq analysis of characters ending with Vowel sound" style="width:915px;"/><br/></p><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Notice that the pure consonants (right-most cell) tend to be more frequent than any consonant vowel series. Here again, by using the basic phonemes as the keys instead of the 'அ' ending consonant vowels, these pure consonants can be typed with a single key press as opposed to two presses, saving about 37M keystrokes on this dataset. Also notice that the vowels and CVs ending in a short form vowel sound are much more frequent than their long form counterparts.</span></p><p><span style="font-family:quicksand, sans-serif;"><img src="/images/Blog-images/ta_consvowel_freq.png"/><br/></span></p><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Using the frequency statistics of the Vowels (first two in the first heatmap)&nbsp;and the&nbsp;Consonant vowels above, we can design an optimal keyboard layout that minimizes the movement of fingers and uses the dominant fingers for the high frequency phonemes. The next section discusses the design decisions and explains our Tamil Phonemic keyboard layout.</span></p><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;"><br/></span></p><p style="text-align:justify;"><span style="font-weight:500;font-size:18px;">Phonemic Keyboard Layout Design for Tamil:</span></p><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">As we mentioned earlier in our design goals, we want the new keyboard layout to be easy for users to learn and use across different devices. 
We want to minimize&nbsp; Given the constraints of available keys and total required keys, we had to make certain design decisions in assigning characters to the key positions.</span></p><ol><ol><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">We want to use the Home row of the keyboard for the Vowels and some high-frequency Consonants in the language</span></li><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">We also believe that the dominant index and middle finger keys in the two non-Home rows should take precedence over the Home row keys typed with weaker fingers. We'll be using this later in optimizing the character assignment to the keys.</span></li><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Given the high frequency of short form vowel ending characters, we have retained the short vowels in the Home row and assigned the corresponding long form vowels to the same keys in the Shift layout. Following the wider convention, we've assigned the vowels to the left side of the keyboard.</span></li><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Most of the Consonants are assigned to the right-hand side of the keyboard to exploit the dominant hand of the majority of the population. Note that this arrangement allows the CVs to be typed efficiently by a mix of both hands, without forcing the same hand or finger to move to a different position for typing a single character.</span></li><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Based on our observation #2 above, the frequent Consonants starting with 'க்' and 'த்' are assigned to the dominant finger positions in the Home row and the rows above and below. 
We assign the rest of the consonants to the weaker key positions in decreasing order of frequency.</span></li><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">We now specifically consider the case of மெல்லினம் (nasalized consonants), which are typically followed by the corresponding வல்லினம் (plosive/ stop consonant) in Tamil text. Thus, it made sense for us to place these nasalized consonants on the left side of the keyboard (above and below the home row) so that the following வல்லினம் can be typed with the right hand. We made an exception for 'ம்' and assigned it to a dominant key position on the right side, due to its high frequency in both Consonant and CV forms.</span></li><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">On the Shift key layout, we assigned the Tamil numerals right below the roman numerals to make the typing intuitive and easier. Additionally, the Sanskritized consonants and other Tamil symbols are assigned on this Shift layout.</span></li></ol></ol><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">The screenshots of the new keyboard layout for the regular and Shift keys are shown below.</span></p><p style="text-align:justify;"><br/></p><p><img src="/images/Blog-images/Tamil_Phonemic_Keyboard_Shift.png" style="width:808.01px;"/></p><p style="text-align:justify;"><br/></p><p><img src="/images/Blog-images/Tamil_Phonemic_Keyboard_Reg.png" style="width:809.02px;"/><br/></p><p style="text-align:justify;"><br/></p><p style="text-align:justify;"><span style="font-weight:500;font-size:18px;">Input Method Analysis:</span></p><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Based on the keyboard layouts for the three input methods, viz. Anjal, Phonemic and Tamil99, we analyzed their efficiency and ease of typing in two ways. 
We first calculated the absolute number of keystrokes required to type the Tamil words in the above corpora of 537M words. To keep the analysis simple, we ignored punctuation and any non-Tamil words/ characters for this. We also ignored the shift key here because the shift key is pressed <span style="font-style:italic;">simultaneously</span> with the key following it. Here is the absolute number of keystrokes required for typing the above Tamil corpora with each of the 3 input methods.</span></p><ul><ul><li style="text-align:justify;"><span><span style="color:inherit;text-align:center;font-family:quicksand, sans-serif;">Anjal : 4,470,795,879</span></span></li><li style="text-align:justify;"><span><span style="color:inherit;text-align:center;font-family:quicksand, sans-serif;">Phonemic : 4,045,040,635</span></span></li><li style="text-align:justify;"><span><span style="color:inherit;text-align:center;font-family:quicksand, sans-serif;">Tamil99 : 4,124,838,873</span></span></li></ul></ul><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;"><span style="color:inherit;text-align:center;">The Phonemic method requires the fewest keystrokes among the 3 methods; specifically, it requires about 80M fewer keystrokes than Tamil99. This is because the pure consonants are usually more frequent than their CV combinations. 
Tamil99 requires an additional keystroke </span><span style="color:inherit;text-align:center;">'்</span><span style="color:inherit;text-align:center;">' for typing the pure consonants.&nbsp;</span><span style="color:inherit;">For Anjal, we used the standard transliteration mapping as suggested in the Sellinam app, thus requiring two keystrokes for each long vowel as well as for long CV combinations.</span></span></p><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">We then analysed the heatmap on the keyboard layouts of the three input methods to see which keys are typed more frequently and their relative positions in the keyboard. We plotted the heatmap on the 3 keyboard layouts separately for this analysis. As above, we ignored the punctuation marks and non-Tamil words to keep this analysis simple. However, we considered the shift key in this analysis.</span></p><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;"><img src="/images/Blog-images/anjal_heatmap.png" style="color:inherit;"/><br/></span></p><div style="color:inherit;"><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">A layout is easier to type on if the frequently typed keys are in the positions of the dominant fingers of both hands or in the Home row of the keyboard. The Anjal keyboard is clearly the least efficient option, as most of the frequently used keys are outside the dominant finger positions of the keyboard.</span></p></div><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;"><img src="/images/Blog-images/phonemic_heatmap.png"/><br/></span></p><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">In both the Phonemic and Tamil99 keyboards, the frequent keys are mostly located in the dominant finger positions, which makes typing easier. 
The dominant left and right index finger positions (in all 3 rows) alone account for 61.22% and&nbsp;<span style="color:inherit;text-align:center;">46.32%&nbsp;</span><span style="color:inherit;">of the overall typing in the Phonemic and Tamil99 keyboards respectively. This difference of nearly 15 percentage points is significant and makes the Phonemic layout a better option (in terms of ease of use) than the Tamil99 keyboard.</span></span></p><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;"><img src="/images/Blog-images/tamil99_heatmap.png"/><br/></span></p><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;"><span style="color:inherit;">We then look at the percentage of typing for the keys in the Home row, which is the usual resting position for the hands when not typing. It thus has the advantage that the user does not have to move their hands from the resting position. In the Phonemic layout, the Home row accounts for 58.33% of overall typing, while Tamil99 is slightly better with 61.83%. We believe this small difference in Home row typing is far outweighed by the advantage the Phonemic keyboard layout gains from the dominant index fingers across all the rows.&nbsp;</span><span style="color:inherit;">In addition to the efficiency in typing, the new Phonemic keyboard layout offers other advantages over the Tamil99 keyboard as discussed earlier in this post.</span></span></p><p style="text-align:justify;"><br/></p><div style="text-align:left;"><div><b><span style="font-size:18px;font-weight:500;">Explore Frequency Analyses in Other Languages:</span></b></div></div><div><b><div style="text-align:left;"></div></b><div style="text-align:left;"><span>We’ve performed similar analyses for other Indian languages as well. 
Explore them below:</span></div><ul><li style="text-align:left;"><span>Character Frequency Analysis for&nbsp;<a href="https://www.maadhyamik.com/blogs/post/character-frequency-analysis-for-hindi">Hindi</a></span></li><li style="text-align:left;"><span>Character Frequency Analysis for&nbsp;<a href="https://www.maadhyamik.com/blogs/post/character-frequency-analysis-for-kannada" rel="">Kannada</a></span></li><li style="text-align:left;"><span>Character Frequency Analysis for&nbsp;<a href="https://www.maadhyamik.com/blogs/post/character-frequency-analysis-for-malayalam">Malayalam</a></span></li><li style="text-align:left;">Character Frequency Analysis for&nbsp;<a href="https://www.maadhyamik.com/blogs/post/character-frequency-analysis-for-telugu" title="Telugu" rel="">Telugu</a></li></ul></div><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;"><br/></span></p><p style="text-align:justify;"><span style="font-weight:500;font-size:18px;">References:</span></p><ol><ol><li style="text-align:justify;"><span style="font-family:quicksand, sans-serif;"><a href="https://elangocheran.com/2022/02/14/redesigning-an-input-method-for-an-abugida-script/" target="_blank" rel="nofollow noreferrer noopener">Redesigning an Input Method for an Abugida Script</a>. Elango Cheran's Blog</span></li><li style="text-align:justify;"><a href="https://echeran.files.wordpress.com/2020/11/07scheran.pdf" target="_blank" rel="nofollow noreferrer noopener" style="font-family:quicksand, sans-serif;">Optimization of Tamil Phonetic Keyboard</a><span style="font-family:quicksand, sans-serif;color:inherit;">. Sendhil Kumar Cheran, Thuraiappah Vaseeharan and Elango Cheran. Tamil Internet Conference. 2004.</span><br/></li></ol></ol></div></div></div></div></div></div>
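<div style="text-align:justify;"><p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">As a rough illustration of the keystroke counting described in the Input Method Analysis section, the Python sketch below compares a phoneme-based layout with a CV-based layout on a piece of Tamil text. It is a simplified model under stated assumptions, not the script used to produce the figures in this post: the function name <code>count_keystrokes</code> is hypothetical, the CV-based rules only approximate a layout such as Tamil99, and the Shift key, Aytham, punctuation and non-Tamil characters are ignored.</span></p>
<pre>
```python
# Simplified keystroke model (an assumption for illustration, not the
# authors' analysis code). Code points come from the Unicode Tamil block.
PULLI = "\u0bcd"  # virama, marks a pure consonant
VOWELS = {chr(cp) for cp in range(0x0B85, 0x0B95)}       # independent vowels
CONSONANTS = {chr(cp) for cp in range(0x0B95, 0x0BBA)}   # consonant letters
VOWEL_SIGNS = {chr(cp) for cp in range(0x0BBE, 0x0BCD)}  # dependent vowel signs


def count_keystrokes(text):
    """Return (phonemic, cv_based) keystroke counts for Tamil text."""
    phonemic = 0
    cv_based = 0
    # Pair every character with its successor; pad the end with a space.
    for ch, nxt in zip(text, text[1:] + " "):
        if ch in CONSONANTS:
            if nxt == PULLI:
                phonemic += 1  # pure consonant is a single phoneme key
                cv_based += 2  # CV key followed by the pulli key
            elif nxt in VOWEL_SIGNS:
                phonemic += 2  # consonant key + vowel key
                cv_based += 2  # CV key + vowel sign key
            else:
                phonemic += 2  # consonant key + short 'a' vowel key
                cv_based += 1  # the short 'a' ending CV is a single key
        elif ch in VOWELS:
            phonemic += 1
            cv_based += 1
        # The pulli and vowel signs were already counted together with
        # their consonant; all other characters are ignored.
    return phonemic, cv_based


if __name__ == "__main__":
    for word in ("\u0b85\u0bae\u0bcd\u0bae\u0bbe", "\u0b95\u0b9f\u0bb2\u0bcd"):
        print(word, count_keystrokes(word))
```
</pre>
<p style="text-align:justify;"><span style="font-family:quicksand, sans-serif;">Under this model, 'அம்மா' costs 4 keystrokes on the phoneme-based layout and 5 on the CV-based one, while 'கடல்' costs 5 and 4 respectively. Which layout wins on a whole corpus therefore depends on how often pure consonants occur relative to 'அ' ending CVs, which is what the frequency analysis above measures.</span></p></div>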
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Wed, 22 Mar 2023 16:15:41 +0530</pubDate></item></channel></rss>