AI-powered Spellchecker for Khmer to Raise Language Skills

Data scientist Danh Hong has developed an AI-powered spellchecker for the Khmer language and is hoping it will help Cambodians with their writing
Data scientist Danh Hong. Kiripost/Bun Tharum

Prominent data scientist Danh Hong hopes that the AI-powered spellchecker he is developing for the Khmer language will be another breakthrough for writers, as writing in their native language is a major obstacle for many Cambodians.

Hong, 51, began working on Optical Character Recognition (OCR) in 2010 and put it into use between 2018 and 2019. The platform, which users can also download from smartphone app stores, has only recently become popular.

When Hong writes in Khmer, he copies and pastes the text into the spellchecker to catch any mistakes before he posts on Facebook.

It is not that he doesn’t know how to spell the words, he said, but sometimes there can be a typo or a misunderstanding.

He is from Bạc Liêu, a coastal province in southern Vietnam, in the region known to Khmers as Kampuchea Krom. He moved to Cambodia in 2000 to work with an NGO and in the IT sector.

Hong said that Khmer spellcheck presents unique challenges due to the complexity of the script and the presence of Sanskrit and Pali loan words.

Additionally, Khmer does not use spaces between words, making it difficult to identify word boundaries, he said. However, he added, machine learning makes it possible to develop an effective spellchecker for Khmer.
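
To illustrate the word-boundary problem, here is a minimal Python sketch of greedy longest-match segmentation against a word list. It is a classic dictionary baseline, not Hong's machine-learning approach, and the names `segment` and `TOY_WORDS` as well as the four-word dictionary are assumptions made for this example only.

```python
def segment(text, dictionary):
    """Split unspaced text into (token, is_known) pairs by greedy longest match."""
    max_len = max((len(word) for word in dictionary), default=0)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary word fits.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append((text[i:j], True))
                i = j
                break
        else:
            # No dictionary word starts here: emit one character flagged as unknown.
            tokens.append((text[i], False))
            i += 1
    return tokens


# Toy dictionary (an assumption for illustration): "I", "love", "language", "Khmer".
TOY_WORDS = ["ខ្ញុំ", "ស្រឡាញ់", "ភាសា", "ខ្មែរ"]

# Khmer is written without spaces, so the sentence is the words run together.
sentence = "".join(TOY_WORDS)

for token, is_known in segment(sentence, set(TOY_WORDS)):
    print(token, "known" if is_known else "unknown")
```

A spellchecker can only flag a misspelled word after a step like this has decided where each word starts and ends. Any stretch that matches no dictionary entry comes out as unknown characters, which is where a real system would need something more capable than a fixed word list.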

He added that the tool is helpful for journalists who write in Khmer, as it encourages consistent spelling.

Many other platforms and apps offer similar Khmer spell checking, but Hong said his uses AI to update its data regularly as new words enter the system, working like a human brain that can detect new words and distinguish right from wrong.

“We are using new modern technology to implement, and for our Khmer language, it is very difficult. It is not like the English language, which is not so difficult because words are space and separated, so finding wrong words is not difficult,” he said.

“But for our Khmer language, it is complicated, so we use AI technology, which helps a lot,” he added.

Hong's spellchecker is based on Chuon Nath’s dictionary, but that dictionary does not include newer words, such as computer names and other tech-related terms. For those, he draws on other websites and on how the public writes the words.

“With AI, the more data there is, the more accurate it is, so to be 100 percent accurate, it is not easy, it will take some time,” he said.

Pa Chanroeun, President of the Cambodian Institute for Democracy, said the spellchecker will be a big help in writing Khmer. He cited the example of a certificate that contained misspelled words and urged the government to support Hong, including financially.

Chanroeun said that the inability to write and speak the Khmer language properly is a major issue and that the quality of language education should be reviewed.

“Those who study English for five or six years, perhaps they can use English more properly than the Khmer language nowadays, especially writing and speaking,” Chanroeun told Kiripost.

Writing Khmer is also a lot harder because it has not been taught well, he said, adding that the platform will help with spell checking.

Tuy Engly, a freelance writer, said that she has used the next spell check before. But she does not use it so much now as she does not find it necessary.

“Perhaps I know about it more now, so I no longer need to use it,” Engly told Kiripost.