Search queries, keywords, tags and other short text entries pose many challenges for traditional natural language processing techniques. Professor Huan Liu hosts Haixun Wang, director of natural language processing at Amazon, to discuss methods to make short texts easier to handle.
NLP for Short Text Understanding
Presented by Haixun Wang, director of Natural Language Processing at Amazon. Hosted by Professor Huan Liu.
Monday, January 29, 2018
Brickyard (BYENG) 210, Tempe campus
Billions of short texts are produced every day in the form of search queries, ad keywords, tags, tweets, messenger conversations, social network posts and more. Unlike documents, short texts have unique characteristics that make them difficult to handle. First, short texts, especially search queries, do not always observe the syntax of a written language. This means traditional NLP techniques, such as syntactic parsing, do not always apply to short texts. Second, short texts contain limited context. The majority of search queries contain fewer than five words, and tweets can have no more than 140 characters. For these reasons, short texts give rise to a significant amount of ambiguity, which makes them extremely difficult to handle.
On the other hand, many applications, such as search engines, automatic question answering, online advertising and recommendation systems, rely on short text understanding. In this talk, Haixun Wang, Director of Natural Language Processing at Amazon, will go over various techniques in knowledge acquisition, representation and inferencing that have been proposed for text understanding. He will also describe the massive structured and semi-structured data sets made available in the past decade that directly or indirectly encode human knowledge, turning the knowledge representation problem into a computational grand challenge with feasible solutions in sight.
About the speaker
Haixun Wang is a Director of Natural Language Processing at Amazon and an IEEE Fellow. Before Amazon, he led the NLP Infra team at Facebook, working on Query and Document Understanding. From 2013 to 2015, he was with Google Research, working on natural language processing. From 2009 to 2013, he led research in semantic search, graph data processing systems, and distributed query processing at Microsoft Research Asia. His knowledge base project Probase has had significant impact in industry and academia.
He was a research staff member at IBM T. J. Watson Research Center from 2000 to 2009. He was Technical Assistant to Stuart Feldman (Vice President of Computer Science of IBM Research) from 2006 to 2007, and Technical Assistant to Mark Wegman (Head of Computer Science of IBM Research) from 2007 to 2009.
He received a doctoral degree in computer science from the University of California, Los Angeles in 2000.
He has published more than 150 research papers in refereed international journals and conference proceedings.
He has served as PC Chair of conferences such as CIKM’12, and he is on the editorial board of journals such as IEEE Transactions on Knowledge and Data Engineering (TKDE) and Journal of Computer Science and Technology (JCST). He won the best paper award at ICDE 2015, the 10-year best paper award at ICDM 2013, and the best paper award at ER 2009.