Jan Šnajder

Jan Šnajder received his PhD degree in Computer Science from the University of Zagreb in 2010. Since 2010, he has been an Assistant Professor at the Faculty of Electrical Engineering and Computing (FER) at the University of Zagreb and a member of the Text Analysis and Knowledge Engineering Lab (TakeLab). His research revolves around artificial intelligence, more specifically natural language processing (NLP) and machine learning, with a focus on statistical methods for lexical semantics, information extraction, and opinion mining. From 2012 to 2016 he was a visiting research fellow at the Department of Computational Linguistics at Heidelberg University, the National Institute of Information and Communications Technology in Kyoto, the Institute for Natural Language Processing at Stuttgart University, and the University of Melbourne.
Jan has participated in several national and international research projects, as well as projects with industry, and is currently the principal investigator on several NLP research projects. He has (co)authored more than 70 research papers and reviews for the major journals and conferences in the field. He teaches several courses at the University of Zagreb, including Artificial Intelligence, Machine Learning, and Text Analysis and Retrieval, and has supervised more than 70 bachelor's and master's students.
Jan is the co-founder and secretary of the Special Interest Group for the Natural Language Processing of Slavic Languages, endorsed by the Association for Computational Linguistics, and a member of the newly established Scientific Board of the Croatian Scientific Excellence Centre for Data Science. He was a member of the winning team for the VIDI e-Novation Award in 2007 and 2009, and was awarded a Croatian Science Foundation fellowship in 2012, a fellowship of the Japan Society for the Promotion of Science in 2014, and an Endeavour Fellowship of the Australian Government in 2015.

Artificial Intelligence Meets Terminology Extraction

Artificial Intelligence (AI) is making huge waves once again. Fuelled by recent advances in machine learning, AI has made remarkable progress in developing systems with human-level competence on a variety of complex cognitive tasks, such as playing the game of Go against a human champion, recognizing objects in a photograph, or navigating an autonomous vehicle. On the other hand, while we have also witnessed significant progress in the AI field of natural language processing (NLP), full understanding of language remains a distant dream. Much of the effort in the NLP community is currently directed towards machine translation. At the same time, the translation community is in dire need of more mature computer-aided translation technologies. A key component of such technology is the automated extraction of terminology lexica. This task has received considerable attention in the NLP community, especially from the perspective of statistical collocation and multi-word expression processing.
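To make the statistical angle concrete, below is a minimal, illustrative sketch of one classic association measure, pointwise mutual information (PMI), applied to adjacent word pairs. The function name and toy corpus are mine, and real terminology extraction tools combine several such measures with linguistic filtering rather than relying on raw PMI alone.

```python
import math
from collections import Counter

def pmi_collocations(tokens, min_count=3):
    """Score adjacent word pairs by pointwise mutual information:
    PMI(x, y) = log2(P(x, y) / (P(x) * P(y))). High values indicate
    pairs that co-occur more often than chance would predict, a common
    (if noisy) signal for collocations and candidate terms."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (x, y), c_xy in bigrams.items():
        if c_xy < min_count:
            continue  # PMI is unreliable for rare pairs
        p_xy = c_xy / (n - 1)          # bigram probability
        p_x, p_y = unigrams[x] / n, unigrams[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy corpus: "natural language" should rank as a strong collocation.
corpus = ("natural language processing is a field of artificial "
          "intelligence and natural language processing studies language").split() * 10
for pair, score in pmi_collocations(corpus)[:5]:
    print(pair, round(score, 2))
```

In practice, PMI overweights rare pairs, which is why measures such as the log-likelihood ratio or the Dice coefficient are often preferred; the min_count cut-off above is only a crude mitigation.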
In this talk, I will review the research on multi-word expression processing in NLP and discuss how this research relates to terminology and translation. In particular, I will focus on recent developments in statistical and semantic approaches to multi-word expression extraction and identification, and on how these can be leveraged in computer-aided translation tools. I will outline some of the outstanding challenges and caveats that end users of these tools should be aware of. Finally, I will present TermeX, a state-of-the-art terminology extraction tool developed at TakeLab, FER, and announce a new release of this tool. For this new version, we are working together with the translation community to create a product that specifically meets the needs of translators in their daily work, by assisting them in delivering high-quality, accurate translations under increasingly tight time constraints.
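As a hedged illustration of the semantic side mentioned above: one common family of methods measures how compositional a candidate expression is by comparing its distributional vector with a composition (here, the average) of its parts; low similarity hints at an idiomatic multi-word expression. The toy vectors below are fabricated stand-ins for corpus-derived embeddings, and the function name is mine; this is a sketch of the general technique, not TermeX's implementation.

```python
import numpy as np

def compositionality(phrase_vec, word_vecs):
    """Cosine similarity between a phrase vector and the average of its
    component word vectors; low values suggest the phrase's meaning is
    not composed from its parts (i.e., a likely idiomatic MWE)."""
    comp = np.mean(word_vecs, axis=0)
    return float(np.dot(phrase_vec, comp)
                 / (np.linalg.norm(phrase_vec) * np.linalg.norm(comp)))

# Fabricated 3-d vectors; real systems use corpus-derived embeddings.
vecs = {
    "red":         np.array([0.9, 0.1, 0.0]),
    "tape":        np.array([0.1, 0.9, 0.0]),
    "sticky":      np.array([0.2, 0.8, 0.1]),
    "red tape":    np.array([0.0, 0.1, 0.9]),  # idiomatic: bureaucracy
    "sticky tape": np.array([0.2, 0.8, 0.0]),  # literal, compositional
}
print(compositionality(vecs["red tape"], [vecs["red"], vecs["tape"]]))        # low
print(compositionality(vecs["sticky tape"], [vecs["sticky"], vecs["tape"]]))  # high
```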