Machine Translation from English to Odia language.

Project Moved

This project has been moved under the OdiaNLP GitHub organization. For more details, please visit:

Machine translation from English to Odia language

Analysis so far:

Machine Translation (MT) research started as early as the 1950s. Based on the progress in this field, MT can be broadly categorized into types that include Rule-Based (RBMT), Statistical (SMT), and Neural (NMT) Machine Translation.

NMT gives the best BLEU score, followed by SMT and RBMT. As explained in this paper, for Indic languages SMT performs better than RBMT (at least 10% higher). Based on our reading and analysis of some existing papers, and because the available corpus is small, we will start with SMT (Statistical Machine Translation) for the time being. As the corpus grows, we will start experimenting with NMT (Neural Machine Translation).
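Since the comparison above rests on BLEU, a minimal sketch of how a sentence-level BLEU score can be computed may help. It is a simplified illustration, not the evaluation code used by this project or by the cited papers: it assumes whitespace tokenization, a single reference, uniform weights over 1- to 4-grams, and no smoothing. The function names `ngrams` and `sentence_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Simplified single-reference BLEU with uniform n-gram weights.

    Assumption: tokens are separated by whitespace. No smoothing is
    applied, so any n-gram order with zero matches gives a score of 0.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(sentence_bleu(reference, "the cat sat on the mat"))
    print(sentence_bleu(reference, "the cat sat on a mat"))
```

Published BLEU numbers are usually computed at corpus level with a standardized tokenizer, so scores from this sketch are only useful for building intuition about how n-gram overlap and length affect the metric.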

High level Roadmap

This roadmap is based on my spare time and availability to work. If I get more help, we can deliver earlier.

| Month Year | Milestone | Status |
|---|---|---|
| December 2018 | Analyze and study the existing resources available on Internet | Completed |
| January 2019 | Study the reference papers and experts in NMT and analyze their opinions | Completed |
| February 2019 | Do same as January, concentrate more on the state-of-the-art practices | Completed |
| Mar-Dec 2019 | Parallel corpora generation | |
| September 2019 | Data ingestion pipeline | Initial draft prepared |
| October 2019 | Read existing papers on MT and write the summary | Delegated |
| Nov-Dec 2019 | Parallel corpora generation and data cleaning | In-progress |

Further progress can be followed at:

The parallel corpus generation data has been moved to Odia Wikimedia. No further work will be done here until we have a parallel corpus of at least 10k (12k/10k achieved).
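As a rough illustration of the data cleaning step and the 10k threshold mentioned above, the sketch below filters a list of (English, Odia) pairs and checks the corpus size. It is an assumption-laden example, not the project's actual pipeline: the function names `clean_parallel_pairs` and `corpus_ready`, the whitespace-based word counts, the length-ratio limit of 3.0, and the reading of "10k" as a count of pairs are all hypothetical choices made for illustration.

```python
def clean_parallel_pairs(pairs, max_len_ratio=3.0):
    """Filter (english, odia) sentence pairs.

    Assumptions: words are separated by whitespace, and max_len_ratio
    is an illustrative threshold rather than a tuned value.
    """
    seen = set()
    cleaned = []
    for en, od in pairs:
        # Collapse runs of whitespace and trim both sides.
        en, od = " ".join(en.split()), " ".join(od.split())
        if not en or not od:
            continue  # drop pairs with an empty side
        if (en, od) in seen:
            continue  # drop exact duplicates
        n_en, n_od = len(en.split()), len(od.split())
        if max(n_en, n_od) / min(n_en, n_od) > max_len_ratio:
            continue  # drop pairs whose lengths differ too much
        seen.add((en, od))
        cleaned.append((en, od))
    return cleaned


def corpus_ready(pairs, minimum=10_000):
    """Return True once the corpus reaches the minimum number of pairs."""
    return len(pairs) >= minimum


if __name__ == "__main__":
    raw = [
        ("I eat rice", "ମୁଁ ଭାତ ଖାଏ"),
        ("I  eat rice ", "ମୁଁ ଭାତ ଖାଏ"),  # duplicate after normalization
        ("", "ଖାଲି"),                      # empty English side
    ]
    cleaned = clean_parallel_pairs(raw)
    print(len(cleaned), corpus_ready(cleaned))
```

A length-ratio filter is a blunt heuristic for spotting misaligned pairs; the right limit depends on the language pair and would need to be checked against real English-Odia data.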

Detailed works completed in December 2018

Detailed works completed in January 2019

Detailed works completed/ongoing in February 2019

Data Ingestion pipeline

Reading papers

Six Challenges for Neural Machine Translation

NMT deployment


Referred articles/websites:

Useful Open source libraries

Data collected from:

Prospective data corpus

These are a few places where relevant data may be present; however, getting the data is not straightforward.

Key Contributors

“In my dream of the 21st century for the State, I would have young men and women who put the interest of the State before them. They will have pride in themselves, confidence in themselves. They will not be at anybody’s mercy, except their own selves. By their brains, intelligence and capacity, they will recapture the history of Kalinga.” - Biju Pattnaik

This Website’s documentation work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.