Managing Director, MSR Outreach Academic Services, USA
Kuansan Wang is a Principal Researcher and Managing Director of Microsoft Research Outreach, where he is responsible for engaging with the global academic community on jointly advancing the state of the art in the areas in which MSR conducts research. He leads a team that conducts research on web-scale machine reading, intelligent inference, deep semantic analytics and user behavior modeling. In addition to contributing to the development of Microsoft Bing and Cortana, the technologies developed by his team can also be seen in Microsoft Academic services, which include a search engine at academic.microsoft.com and the Academic Knowledge API available through Microsoft Cognitive Services. Dr. Wang joined MSR in 1998 as a researcher in the speech technology group, where he conducted research in language modeling and multimodal interactions. He then became a software architect for the Microsoft speech product group, responsible for Microsoft Speech Server and Response Point, and represented Microsoft to W3C, ECMA and ISO to help author international standards in speech, language and communication. He returned to MSR to work on web search in 2007 and has been a key driving force in evolving web search from a keyword-based to a semantic-based paradigm. Kuansan received his BS from National Taiwan University and his MS and PhD, as an NSF Fellow, from the University of Maryland, College Park, all in electrical engineering.
Keynote Title: Mitigating scholarly corpus biases with citations: A case study on CORD-19
With the broad adoption of evidence-based decision-making processes, recent years have witnessed more frequent examples of biases in datasets or analytical algorithms leading to unfortunate and sometimes harmful outcomes. Being mindful of potential biases and actively taking measures to mitigate them have become second nature for scholars and decision makers alike. Citations in scholarly publications have long been known to represent the crowd-sourced collective judgments of the scholarly community and can be a valuable source of information in analyzing scholarly documents. This study describes a methodology that uses citations to identify biases in a scholarly corpus, taking as an example the COVID-19 Open Research Dataset, or CORD-19, a corpus created to advance the development of intelligent technologies that can assist scientists in navigating the voluminous COVID-19 literature. By using three distinct algorithms to expand CORD-19 to the articles in its citation networks, the study shows that CORD-19 has a strong tilt in favor of recent articles and uneven coverage of topical fields and publication venues. Using CORD-19 to identify critical knowledge and assess journal importance, for example, leads to conclusions that differ from analyses based on the three expanded datasets, whose results largely agree with one another. CORD-19, however, does not appear to exhibit biases in describing research collaborations in terms of team sizes or geolocations. Currently, the three citation network traversal algorithms use only bibliographic records. Possible improvements, such as more sophisticated use of citation contexts, will also be discussed.
Scientific Director of arXiv, Professor in the Department of Astronomy & Astrophysics at The Pennsylvania State University
Steinn Sigurðsson is a Professor in the Department of Astronomy & Astrophysics at the Pennsylvania State University. He received his PhD in physics in 1991 from the California Institute of Technology and completed postdoctoral fellowships at the University of California at Santa Cruz and Cambridge University. He does research in theoretical astrophysics. Dr. Sigurðsson is a member of the Center for Exoplanets and Habitable Worlds at Penn State, the Institute for Gravitation and the Cosmos at Penn State, and the Penn State Astrobiology Research Center. Dr. Sigurðsson is a Science Editor of the AAS Journals, a Trustee of the Aspen Center for Physics, and the Scientific Director of arXiv at Cornell University.
Keynote Title: The future of arXiv and knowledge discovery in open science
arXiv, the preprint server for the physical and mathematical sciences, is in its third decade of operation. As the flow of new, open-access research increases inexorably, the challenges of keeping up with and discovering research content also grow. I will discuss the status and future of arXiv, and possibilities and plans to make more effective use of the research database to enhance ongoing research efforts.