Innovation in Machine Learning Translates to Top Honors for Stevens Computer Science Students
Team enables Hinglish translation to bring home three first-place awards in Stevens’ first-ever entry in the elite WMT22 machine translation competition
Thanks to free services such as Google Translate, it’s astonishingly easy for anyone with a computer or smartphone to translate written text among more than a hundred well-known, established languages in a matter of seconds. But what about languages such as Hinglish, a hybrid of English and Hindi that has become the common vernacular of Hindi and Urdu speakers, particularly on platforms including Facebook and Twitter?
Translating that emerging language is the challenge Stevens Institute of Technology computer science master’s degree students Girish Bundhrani, Preet Jhanglani and Hrishikesh Kanade and their advisors tackled at the Seventh Conference on Machine Translation, held in 2022. Better known as WMT22, this annual event is one of the world’s largest machine translation competitions, showcasing expertise and advances in the field. The Stevens team worked on the shared task titled “Code-mixed Machine Translation.”
Although it was the first time Stevens had competed in this prestigious event, the team brought home top honors: three first-place awards and one third-place award.
‘We are getting closer to achieving human-like translation.’
Multilingual communities in countries such as India and Spain are known to mix words and phrases from different languages, often in the same sentence, in a phenomenon known as code-mixing or code-switching. For WMT22, the Stevens team competed against seven other teams, including one from the University of Edinburgh, to enable machine translation of English and Hindi inputs to Hinglish, and of Hinglish inputs to English.
“Hindi, being the fourth-largest spoken language, and using Devanagari script, is often combined with Roman script in social media applications in India and many other South Asian countries,” said Bundhrani. “With these increasing trends, it has become imperative to develop machine translation models that include Hinglish. WMT22 has been a wonderful place to explore this area and gain hands-on experience with the latest tools and techniques in the natural language processing space.”
The team’s approach drew on its research into domain adaptation (a form of transfer learning), data augmentation (to address the challenge of limited training data), back-translation (translating already-translated content back into its original language), ensemble techniques (creating and combining multiple models), and pre-trained multilingual natural language processing models, both existing ones such as Google’s mT5 XL and others created by the team.
“So far, a machine can never match the accuracy of human translation,” said Jhanglani, “but with these kinds of competition and research, we are getting closer to achieving human-like translation. As a student at Stevens, I’ve appreciated the opportunity to use my theoretical knowledge into application-based projects such as this.”
The team earned first place in the translation of Hinglish to English on both the WER (word error rate) and ROUGE-L (Recall-Oriented Understudy for Gisting Evaluation, longest-common-subsequence variant) metrics, as well as first place on WER and third place on ROUGE-L for its translation of Hindi and English to Hinglish.
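For readers curious about how these two metrics work, the following is a minimal Python sketch, not the competition's official scoring code. WER counts the word-level insertions, deletions and substitutions needed to turn a system's output into the reference translation, divided by the reference length (lower is better). ROUGE-L scores the longest common subsequence of words shared by the output and the reference (higher is better). The function names, the whitespace tokenization, the plain F1 weighting and the example sentences are all assumptions made for illustration; the official evaluation may tokenize and weight differently.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference word
                            curr[j - 1] + 1,      # insert a hypothesis word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def lcs_length(ref, hyp):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(hyp) + 1)
    for r in ref:
        curr = [0]
        for j, h in enumerate(hyp, start=1):
            curr.append(prev[j - 1] + 1 if r == h else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, hypothesis):
    """ROUGE-L as the F1 of LCS-based precision and recall (assumed weighting)."""
    ref, hyp = reference.split(), hypothesis.split()
    lcs = lcs_length(ref, hyp)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(hyp), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentences, not taken from the WMT22 data.
reference = "I am going to the market tomorrow"
hypothesis = "I am going to market today"
print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"ROUGE-L F1: {rouge_l_f1(reference, hypothesis):.3f}")
```

In this made-up example the output drops one word and gets another wrong, so two of the seven reference words count as errors, while five words appear in the same order in both sentences and contribute to the ROUGE-L score.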
“Stevens has a proud tradition and records in athletic competitions, and we believe we can also pursue the same in scientific competitions,” said Jia Xu, assistant professor in the Department of Computer Science, and the team’s co-advisor with Abdul Rafae Khan, postdoctoral researcher at Stevens. “Even so, I was surprised by the fast learning of these students. They had no experience in natural language processing research, and in a few months, they were competing with researchers who have been working on machine translation for years. Their knowledge, dedication, persistence and unity were the keys to their outperforming.”
In December, the team will travel to Abu Dhabi to present its systems at the 2022 Conference on Empirical Methods in Natural Language Processing and to network with others in the field.