Dataset Summary
Our Telugu analysis is based on a large and diverse dataset compiled from multiple sources on the web, including the following key resources:
- IndicCorp
- Leipzig Telugu Corpus
- Samanantar v0.3 (En-Indic; Indic-Indic)
- Telugu News Articles Dataset
- Telugu Books Dataset
After pre-processing, this dataset contained over 1.45 billion total words and 12.3 million unique words. A corpus of this size provides a strong statistical foundation for studying character usage patterns in Telugu, helping us design an optimal keyboard layout.
Overall Character Frequency Heatmap
We created a heatmap to visualize the frequency of Telugu characters, with vowels in the columns and consonants in the rows. Each cell represents the frequency of a basic vowel, consonant, or consonant-vowel (CV) combination, with darker shades indicating higher usage.

Vowels such as అ (a) dominate the frequency spectrum, similar to how 'e' and 'a' are dominant in English. Consonants like క (ka), న (na), and ల (la) also show high usage, reflecting their foundational role in Telugu script. A number of characters, particularly aspirated or Sanskrit-derived forms, show very low or near-zero usage, indicating their limited role in everyday Telugu. We also note a unique pattern where long vowels like ఓ (oo) are more frequent than their short counterparts, a trend not commonly seen in other Indic scripts.
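A minimal Python sketch of how the cells of such a matrix could be counted from raw text. This is an illustration under stated assumptions, not the pipeline used for this analysis: the function name `count_cells`, the Unicode range checks, and the choice to ignore signs such as anusvara are all hypothetical simplifications.

```python
from collections import Counter

VIRAMA = "\u0C4D"          # pollu sign, marks a pure consonant
INHERENT_VOWEL = "\u0C05"  # అ, implied when no sign follows a consonant

# Dependent vowel sign -> independent vowel used as the column label.
SIGN_TO_VOWEL = {
    "\u0C3E": "\u0C06",  # sign for ఆ
    "\u0C3F": "\u0C07",  # sign for ఇ
    "\u0C40": "\u0C08",  # sign for ఈ
    "\u0C41": "\u0C09",  # sign for ఉ
    "\u0C42": "\u0C0A",  # sign for ఊ
    "\u0C43": "\u0C0B",  # sign for ఋ
    "\u0C46": "\u0C0E",  # sign for ఎ
    "\u0C47": "\u0C0F",  # sign for ఏ
    "\u0C48": "\u0C10",  # sign for ఐ
    "\u0C4A": "\u0C12",  # sign for ఒ
    "\u0C4B": "\u0C13",  # sign for ఓ
    "\u0C4C": "\u0C14",  # sign for ఔ
}


def is_consonant(ch):
    # Assumption: the block U+0C15..U+0C39 approximates the consonant set.
    return "\u0C15" <= ch <= "\u0C39"


def is_independent_vowel(ch):
    # Assumption: the block U+0C05..U+0C14 approximates the vowel set.
    return "\u0C05" <= ch <= "\u0C14"


def count_cells(text):
    """Count (consonant, vowel) cells; "" as the consonant marks a bare vowel."""
    counts = Counter()
    for i, ch in enumerate(text):
        if is_independent_vowel(ch):
            counts[("", ch)] += 1
        elif is_consonant(ch):
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt == VIRAMA:
                column = VIRAMA  # pure consonant column
            else:
                # No sign after the consonant means the inherent vowel అ.
                column = SIGN_TO_VOWEL.get(nxt, INHERENT_VOWEL)
            counts[(ch, column)] += 1
    return counts
```

Calling `count_cells` on a corpus string returns a `Counter` keyed by (row, column) pairs, which can then be laid out as the consonant-by-vowel grid described above.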
Vowel Frequency Analysis
By calculating the column-wise sum of the above character frequency matrix, we obtain the frequency of each vowel, whether it appears in its base form or as a vowel sign in a consonant-vowel (CV) combination or consonant conjunct.
As in many other languages, the short vowel forms—అ (a), ఇ (i), and ఎ (e)—are generally more frequent than their long counterparts. However, Telugu exhibits a unique pattern with the long vowels ఏ (ee) and ఓ (oo), which occur significantly more often than their corresponding short forms. Notably, ఓ (oo) is used nearly five times more frequently than ఒ (o), highlighting a distinct phonological preference in Telugu. Vowels like ఊ (U) and ఋ (vR) appear far less frequently, indicating limited usage in modern Telugu text. These insights are crucial for prioritizing vowel placement in keyboard layouts.
The final cell in the heatmap represents the frequency of the Telugu pollu sign, which denotes a pure consonant (i.e., without an inherent vowel). Interestingly, its distribution is comparable to that of the CV forms with the vowel అ (a)—found in the first column—indicating a similar usage pattern across consonants.
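The column-wise sum can be sketched in a few lines of Python. The matrix below holds small placeholder numbers chosen only for illustration; they are not figures from the dataset, and the names `matrix`, `vowels`, and `column_sums` are hypothetical.

```python
# Rows are consonants, columns are vowels.
# Placeholder counts for illustration only, not corpus figures.
vowels = ["అ", "ఇ", "ఉ"]
matrix = [
    [4, 2, 1],  # row for క
    [3, 0, 2],  # row for న
    [5, 1, 1],  # row for ల
]


def column_sums(matrix):
    """Sum each column: one total per vowel across all consonants."""
    return [sum(column) for column in zip(*matrix)]


vowel_totals = dict(zip(vowels, column_sums(matrix)))
```

Here `zip(*matrix)` transposes the rows into columns, so each vowel's total covers every consonant it combines with.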
Consonant Frequency Analysis
We then analyzed the Telugu consonants. Summing the rows of the character frequency matrix gives the frequency of each consonant across its different vowel combinations. This row-wise analysis reveals the most commonly used consonants in Telugu.
The consonant forms of ర్ (r), న్ (na), and ల్ (la) emerge as the most frequently used in Telugu, underscoring their central role in the phonetic structure of the language. Following closely are క్ (ka), త్ (ta), and ప్ (pa), each with approximately 300 million occurrences. In contrast, aspirated and less commonly used consonants such as ఙ్ (nga), ఝ్ (jha), and ఢ్ (ḍha) appear only rarely, reflecting their limited presence in contemporary Telugu usage.
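The row-wise ranking can be sketched as follows. The counts are small placeholders for illustration, not the corpus figures quoted above, and `rows` and `rank_consonants` are hypothetical names.

```python
# One list of per-vowel counts for each consonant.
# Placeholder numbers for illustration only, not corpus figures.
rows = {
    "ర": [7, 3, 2],
    "న": [6, 2, 1],
    "ఝ": [0, 1, 0],
}


def rank_consonants(rows):
    """Sum each row, then sort consonants from most to least frequent."""
    totals = {consonant: sum(counts) for consonant, counts in rows.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_consonants(rows)
```

The resulting list of (consonant, total) pairs gives a direct ordering that a layout designer could use when deciding which consonants deserve the most accessible keys.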
Special Characters and Ligatures
Telugu, like many Indic scripts, includes a variety of ligatures and conjunct characters. To support the most commonly used consonant conjuncts, we’ve assigned త్ర్ (tr), క్ష్ (kss), శ్ర్ (shr), and జ్ఞ్ (jny) to the Alt + Shift positions of specific keys on desktop keyboards. On mobile keyboards, where the Alt key is not available, these conjuncts can be accessed by long-pressing the keys Y, U, I, and O, respectively. This design ensures that these frequently used clusters remain easily accessible across both desktop and mobile platforms.
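As a rough illustration, the mobile long-press assignment described above could be represented as a simple lookup table. The table structure and the `long_press` helper are assumptions made for this sketch and do not describe how the Varta keyboard is actually implemented.

```python
# Long-press outputs for the mobile layout, following the Y, U, I, O
# assignment described in the text. The data structure is hypothetical.
LONG_PRESS = {
    "Y": "\u0C24\u0C4D\u0C30\u0C4D",  # త్ర్ (tr)
    "U": "\u0C15\u0C4D\u0C37\u0C4D",  # క్ష్ (kss)
    "I": "\u0C36\u0C4D\u0C30\u0C4D",  # శ్ర్ (shr)
    "O": "\u0C1C\u0C4D\u0C1E\u0C4D",  # జ్ఞ్ (jny)
}


def long_press(key):
    """Return the conjunct for a long-pressed key, or None if unassigned."""
    return LONG_PRESS.get(key.upper())
```

Each conjunct is stored as its underlying code point sequence (consonant, pollu sign, consonant, pollu sign), so the rendering engine forms the ligature on display.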
Keyboard Design Implications
Our Telugu character frequency analysis is intended for anyone interested in refining Telugu keyboard layouts. With growing demand for intuitive Telugu phonetic keyboards and mobile-friendly Telugu keyboard solutions, this data helps developers validate their designs and align them with actual language usage patterns.
The insights from this frequency analysis directly inform our design strategy for the Varta Telugu phonetic keyboard. Frequently used characters are placed in more accessible positions (home row or index finger positions), while less common ones are assigned to secondary layers (e.g., Shift or long-press positions). This ensures a balance between comprehensive script coverage and ease of use. To streamline the typing experience, we decided not to include separate vowel signs. Instead, the correct sign is automatically generated from the consonant and vowel input, reducing errors and simplifying the keyboard design.
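The automatic generation of vowel signs can be sketched as below. This is a minimal illustration assuming the input arrives as a consonant followed by an optional independent vowel; the `compose` function and its behavior for a missing vowel are hypothetical and not taken from the Varta implementation.

```python
VIRAMA = "\u0C4D"          # pollu sign, marks a pure consonant
INHERENT_VOWEL = "\u0C05"  # అ, carried by a bare consonant letter

# Independent vowel -> dependent vowel sign that attaches to a consonant.
VOWEL_TO_SIGN = {
    "\u0C06": "\u0C3E",  # ఆ
    "\u0C07": "\u0C3F",  # ఇ
    "\u0C08": "\u0C40",  # ఈ
    "\u0C09": "\u0C41",  # ఉ
    "\u0C0A": "\u0C42",  # ఊ
    "\u0C0B": "\u0C43",  # ఋ
    "\u0C0E": "\u0C46",  # ఎ
    "\u0C0F": "\u0C47",  # ఏ
    "\u0C10": "\u0C48",  # ఐ
    "\u0C12": "\u0C4A",  # ఒ
    "\u0C13": "\u0C4B",  # ఓ
    "\u0C14": "\u0C4C",  # ఔ
}


def compose(consonant, vowel=None):
    """Combine a consonant with a typed vowel into the correct written form."""
    if vowel is None:
        # Assumption: a consonant typed alone yields the pure consonant.
        return consonant + VIRAMA
    if vowel == INHERENT_VOWEL:
        # అ needs no sign; the bare consonant letter already carries it.
        return consonant
    return consonant + VOWEL_TO_SIGN[vowel]
```

With a mapping like this, the user only ever types independent vowels, and the sign form never needs its own key.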
You can explore our optimized keyboard layout through the Varta Keyboard apps, available on Android and iOS, as well as through browser extensions for Chrome, Edge, and Safari.
Explore Frequency Analyses in Other Languages
We’ve performed similar analyses for other Indian languages as well. Explore them below:
- Character Frequency Analysis for Hindi
- Character Frequency Analysis for Kannada
- Character Frequency Analysis for Malayalam
- Character Frequency Analysis for Tamil (includes design principles)