Join us for the Dean’s Distinguished Lecture on Friday, Feb. 24, to hear about a scalable approach to extracting knowledge from text data. This year’s lecture is presented by Jiawei Han from the University of Illinois at Urbana-Champaign.
Dean’s Distinguished Lecture: Text Mining — A Pretrained Language Model-Based, Annotation-Free Approach
Friday, Feb. 24, 2023
Lecture: 1 p.m.
Q&A: 1:45 p.m.
Reception: 2–3 p.m.
Interdisciplinary Science and Technology IV (ISTB4), Marston Exploration Theater, Tempe campus [map]
Register to attend
Faculty are invited to attend a meeting with Han prior to the lecture. Sign up for the lecture, faculty meeting or both when you register to attend the event.
Download driving directions to ISTB4.
Real-world big data are largely dynamic, interconnected and unstructured texts. It is important to transform such massive unstructured text into structured knowledge. Many researchers rely on labor-intensive labeling and annotation to extract knowledge from text data. Such approaches, however, are not scalable.
We envision that massive text itself may disclose a large body of hidden structures and knowledge. Equipped with pre-trained language models and data mining/machine learning methods, it is promising to transform unstructured text into structured knowledge without extensive human annotation.
In this talk, Han gives an overview of a set of annotation-free text mining methods developed recently by his group for such an exploration, including discriminative topic mining, taxonomy construction, text classification and taxonomy-guided text analysis. He shows that a weakly supervised, annotation-free approach is promising for transforming massive text into structured knowledge.
About the speaker
Jiawei Han is Michael Aiken Chair Professor in the Department of Computer Science, University of Illinois at Urbana-Champaign. He received the ACM SIGKDD Innovation Award in 2004, the IEEE Computer Society Technical Achievement Award in 2005, the IEEE Computer Society W. Wallace McDowell Award in 2009 and Japan’s Funai Achievement Award in 2018, and he was elevated to Fellow of the Royal Society of Canada in 2022.
He is a Fellow of ACM and a Fellow of IEEE. He served as the director of the Information Network Academic Research Center, or INARC, from 2009 to 2016. Han has been supported by the Network Science-Collaborative Technology Alliance program of the U.S. Army Research Lab. He was co-director of KnowEnG, a Center of Excellence in Big Data Computing, from 2014 to 2019. He was also funded by the NIH Big Data to Knowledge, or BD2K, Initiative.
Currently, he is serving on the executive committees of two National Science Foundation-funded research centers: MMLI (Molecular Make Research Institute), one of the NSF-funded national AI centers, since 2020; and I-Guide, the NSF Institute for Geospatial Understanding through an Integrative Discovery Environment, since 2021.