Jia Xu
Assistant Professor
Charles V. Schaefer, Jr. School of Engineering and Science
Department of Computer Science
- PhD (2010) RWTH-Aachen University (Computer Science)
I develop methods for competitive machine translation systems, often advancing beyond the current state of the art. To achieve this, I devise general machine learning methods, study their empirical and theoretical limitations, and introduce techniques in ensemble learning and subsampling, as well as geometric techniques for the study of structured prediction.
General Information
I am an assistant professor at the Stevens Institute of Technology. My current research interests are in machine learning, with a focus on highly competitive machine translation systems. Lately, I have been developing techniques that explore the underlying metric and geometric properties of machine translation systems. I publish in mainstream venues in computational linguistics and machine learning (e.g., AAAI, ICML, ACL).
Institutional Service
- Undergraduate Advisor Member
- SES Faculty Advisory Council (FAC) Member
- Participation on the Election of Committee of Committee Member
- Hiring Committee Member
- Working group on improving student-faculty interaction at Stevens Member
Professional Service
- AAAI Program Committee
- ACL Program Committee
- EMNLP Program Committee
- IJCAI Program Committee
- NAACL Program Committee
- ICML Program Committee
- NeurIPS Program Committee
- Transactions on Knowledge Discovery from Data Reviewer
Consulting Service
Hiring Committee member
Working group on improving student-faculty interaction
Honors and Awards
WMT 2022 SIT 1st Code-Mixing MT subtask II Hinglish->English
WMT 2022 SIT 1st (w.r.t. WER) Code-Mixing MT subtask I Hindi+English->Hinglish
WMT 2018 Hunter 1st French-English Biomedical track team leader
WMT 2017 Hunter 1st (w.r.t. BLEU) Finnish-English News track team leader
NIST 2015 ICT-DCU 1st (academic inst.) and 4th (overall) team leader and main contributor
WMT 2011 DFKI 1st (w.r.t. BLEU) English-German News track team leader
NIST 2008 MSR 1st intern at MSR
NIST 2006 RWTH-Aachen 4th
NIST 2005 RWTH-Aachen 4th
NIST 2004 RWTH-Aachen 2nd
GALE 2008 RWTH-Aachen 2nd in NightInGale
GALE 2007 RWTH-Aachen 2nd in NightInGale
GALE 2006 RWTH-Aachen 2nd in NightInGale
TC-Star 2006 RWTH-Aachen 1st
TC-Star 2005 RWTH-Aachen 1st
TC-Star 2004 RWTH-Aachen 1st
Professional Societies
- Program Committee members of EMNLP, ACL, NAACL, AAAI, IJCAI, NIPS, ICML Member
- Reviewer of IEEE Journal Member
- ACL 2019 – Program Committee of ACL Member
- Program Committee of IJCAI Member
- AAAI – Program Committee of AAAI Member
- Program committee of EMNLP Member
Grants, Contracts and Funds
NSF CRAFT Pilot 30,000 USD Principal investigator 2022 Center for Research toward Advanced Financial Technologies (CRAFT) NSF IUCRC
NSF grant 299,000 USD Co-PI 2018-2023 IUCRC Phase I Rutgers, Newark: Center for Accelerated Real Time Analytics (CARTA) ID: 1747728
NSFC (NSF-China) grant 660,000 RMB (100,000 USD) Co-PI 2017--2019 Key Problems for Tightly-coupled, Multi-signal Fusion based Simultaneously Locating and Mapping ID: 61672524
ICT-CAS grant (Innovation subjects) 500,000 RMB (83,000 USD) Principal investigator 2015--2017 Ensemble learning in machine translation ID: 20156020
KLIIP-ICT-CAS grant 200,000 RMB (33,000 USD) Principal investigator 2015--2016 Novel machine learning methods 20156020
NSFC grant 660,000 RMB (100,000 USD) Co-PI 2014-2017 New approaches to the limits of efficient propositional reasoning: algorithms, approximations and foundations ID: 20131351464
IIIS-Tsinghua grant 150,000 RMB (25,000 USD) Principal investigator 2012-2015 Machine learning and machine translation
Patents and Inventions
Unsupervised Chinese Word Segmentation for Statistical Machine Translation
Jianfeng Gao, Kristina Nikolova Toutanova, and Jia Xu
Selected Publications
Book Chapter
- Xu, J.; Gao, J.; Toutanova, K.; Ney, H. (2011). Synchronous Learning of Chinese Word Segmentation and Word Alignment. Handbook of Natural Language Processing and Machine Translation.
Conference Proceeding
- Tang, X.; Xu, J.; Wang, S. (2023). Resilient Multi-Agent Reinforcement Learning with Dynamic Participating Agents. Proceedings of IEEE 12th International Conference on Cloud Networking (CloudNet).
- Tang, X.; Zhang, M.; Khan, A.; Yang, .; Xu, J. (2023). Unveiling Equity: Exploring Feature Dependency using Complex-Valued Neural Networks for Fair Data Analysis. Proceedings of IEEE 12th International Conference on Cloud Networking (CloudNet).
- Xu, J. (2023). From Hybrid Dialogers to Neural Responders. https://www.amazon.science/alexa-prize/proceedings/nam-from-hybrid-dialogers-to-neural-responders. Amazon Science Online.
- Alam, F.; Dalvi, F.; Durrani, N.; Sajjad, H.; Khan, A.; Xu, J. (2023). ConceptX: A Framework for Latent Concept Analysis. Proceedings of AAAI-23 Demonstrations Program.
- Yu, Y.; Sajjad, H.; Xu, J. (2023). Learning Uncertainty for Unknown Domains with Zero-Target-Assumption. Proceedings of ICLR.
- Yu, Y.; Sajjad, H.; Xu, J. (2023). Probabilistic Robustness for Data Filtering. Proceedings of EACL.
- Khan, A. R.; Kanade, H.; Budhrani, G. A.; Jhanglani, P.; Xu, J. (2022). SIT at MixMT 2022: Fluent Translation Built on Giant Pre-trained Models. Proceedings of WMT at EMNLP.
- Zhang, M.; Xu, J. (2022). Byte-based Multilingual NMT for Endangered Languages.
- Yu, Y.; Khadivi, S.; Xu, J. (2022). Can Data Diversity Enhance Learning Generalization?. Proceedings of COLING.
- Yu, Y.; Khan, A. R.; Xu, J. (2022). Measuring Robustness for NLP. Proceedings of COLING.
- Sajjad, H.; Durrani, N.; Dalvi, F.; Alam, F.; Khan, A. R.; Xu, J. (2022). Analyzing Encoded Concepts in Transformer Language Models. Proceedings of NAACL.
- Tang, X.; Khan, A. R.; Wang, S.; Xu, J. (2022). Learning by Interpreting. Proceedings of IJCAI.
- Dalvi, F.; Khan, A.; Alam, F.; Durrani, N.; Xu, J.; Sajjad, H. (2022). Discovering Latent Concepts Learned in BERT. Proceedings of ICLR.
- Chubarian, K.; Khan, A.; Sidiropoulos, A.; Xu, J. (2021). Grouping Words with Semantic Diversity. NAACL.
- Khan, A.; Xu, J. (2021). Interpreting Criminal Charge Prediction and Its Algorithmic Bias via Quantum-Inspired Complex Valued Networks. XAI Workshop at ICML.
- Khan, A. R.; Xu, J.; Sun, W. (2020). Coding Textual Inputs Boosts the Accuracy of Neural Networks. Proceedings of EMNLP.
- Lyu, W.; Huang, S.; Zhang, S.; Khan, A. R.; Sun, W.; Xu, J. (2019). CUNY-PKU Parser at SemEval-2019 Task 1: Cross-lingual Semantic Parsing with UCCA. Proceedings of SemEval 2019.
- Cuong, H.; Xu, J. (2018). Assessing Quality Estimation Models for Sentence-Level Prediction. COLING.
- Khan, A.; Panda, S.; Xu, J.; Flokas, L. (2018). Hunter NMT System for WMT'18 Biomedical Translation Task: Transfer Learning in Neural Machine Translation.
- Xu, J.; Kuang, Y.; Baijoo, S.; Lee, J.; Shahazad, U.; Lancaster, M.; Carlan, C. (2017). Hunter MT: A Course for Young Researchers in WMT'17. Conference on Machine Translation at EMNLP.
- Lei, Z.; Ye, X.; Wang, Y.; Li, D.; Xu, J. (2017). On the Efficient Online Model Adaptation by Incremental Simplex Tableau. AAAI.
- Papakonstantinou, P.; Xu, J.; Yang, G. (2016). On the Power and Limits of Distance-Based Learning. ICML.
- Javadi, S.; Khadivi, S.; Shiri, M.; Xu, J. (2014). An Ant Colony Optimization Method to Detect Communities in Social Networks. ASONAM.
- Papakonstantinou, P.; Xu, J.; Cao, Z. (2014). Bagging by Design (On the Sub-optimality of Bagging). AAAI.
- Dong, M.; Cheng, Y.; Liu, Y.; Xu, J.; Sun, M. (2014). Query Lattice for Translation Retrieval. COLING.
- Gan, C.; Qin, Z.; Xu, J.; Wan, T. (2013). Salient Object Detection in Image Sequences via Spatial-Temporal Cue. Conference on Visual Communications and Image Processing.
- Sun, W.; Xu, J. (2011). Enhancing Chinese Word Segmentation Using Unlabeled Data. EMNLP.
- Xu, J.; Sun, W. (2011). Generating Virtual Parallel Corpus: A Compatibility Centric Method. Machine Translation Summit.
- Federmann, C.; Eisele, A.; Chen, Y.; Hunsicker, S.; Xu, J.; Uszkoreit, H. (2010). Further Experiments with Shallow Hybrid MT Systems. ACL Workshop on Statistical Machine Translation.
- Eisele, A.; Xu, J. (2010). Improving Machine Translation Performance Using Comparable Corpora. LREC.
- Xu, J.; Gao, J.; Toutanova, K.; Ney, H. (2008). Bayesian Semi-Supervised Chinese Word Segmentation for Statistical Machine Translation. COLING.
- Deng, Y.; Xu, J.; Gao, Y. (2008). Phrase Table Training for Precision and Recall: What Makes a Good Phrase and a Good Phrase Pair?. ACL.
- Xu, J.; Deng, Y.; Gao, Y.; Ney, H. (2007). Domain Dependent Machine Translation. Machine Translation Summit.
- Vilar, D.; Xu, J.; D'Haro, L.; Ney, H. (2006). Error Analysis of Statistical Machine Translation Output. LREC.
- Xu, J.; Zens, R.; Ney, H. (2006). Partitioning Parallel Documents Using Binary Segmentation. Workshop on Statistical Machine Translation at NAACL.
- Xu, J.; Matusov, E.; Zens, R.; Ney, H. (2005). Integrated Chinese Word Segmentation in Statistical Machine Translation. IWSLT.
- Zens, R.; Bender, O.; Hasan, S.; Khadivi, S.; Matusov, E.; Xu, J.; Zhang, Y.; Ney, H. (2005). The RWTH-Aachen Phrase-based Statistical Machine Translation System. IWSLT.
- Xu, J.; Zens, R.; Ney, H. (2004). Do We Need Chinese Word Segmentation for Statistical Machine Translation?. SIGHAN Workshop on Chinese Language Processing at ACL.
Journal Article
- Khan, A.; Karim, A.; Sajjad, H.; Kamiran, F.; Xu, J. (2020). A Clustering Framework for Lexical Normalization of Roman Urdu. Journal of Natural Language Engineering (Impact factor: 1.065 in 2016). Cambridge Press.
- Sun, T.; Wang, Y.; Li, D.; Gu, Z. WCS: Robust Network Localization by Weighted Component Stitching. IEEE/ACM Transactions on Networking.
Deep Learning
Data Structures