Cross-Language Information Retrieval

Douglas W. Oard

    

Abstract:

    In this session, I will explain what we know about how to build information retrieval systems that can be used to find information in languages that the user is not able to understand. I'll start by explaining what we know about how to rank documents written in one language based on a query that is expressed in another language. To structure this discussion, I will decompose the problem into three parts: (1) what to translate, (2) what translations are possible, and (3) how to use those translations in the ranking process. Along the way, I will draw on evaluation results obtained using standard test collections to illustrate the effect of alternative design choices. I'll then turn my attention to how users can make effective use of systems that rank documents that, by assumption, those users cannot understand without the help of translation technology. I'll begin this part of the discussion by briefly reviewing the present state of the art in the design of machine translation systems. I will then focus on two types of user studies: (1) controlled quantitative studies using highly structured tasks, and (2) qualitative observational studies using a more natural range of tasks. I'll conclude the session with some remarks about where the research frontier is today, and how future developments in related fields might help to create new opportunities.

    

Bio:

    Douglas Oard is the Director of the Computational Linguistics and Information Processing (CLIP) lab at the University of Maryland, College Park in the USA. He is a Professor with joint appointments in the College of Information Studies and the Institute for Advanced Computer Studies (UMIACS), and his research interests center on the use of emerging technologies to support information seeking by end users. His recent work has focused on interactive techniques for cross-language information retrieval, searching conversational media such as speech or email, and support for sense-making in large collections of archived digital media. He has served as a track coordinator for the TREC Arabic CLIR and Legal Tracks, and for the CLEF Interactive and Cross-Language Speech Retrieval Tracks. He earned his Ph.D. in electrical engineering from the University of Maryland, College Park. Additional information is available at http://terpconnect.umd.edu/~oard/.