CL-SciSumm is the first medium-scale shared task on scientific document summarization, with over 500 annotated documents. Last year's CL-SciSumm shared task introduced large-scale training datasets, both annotated from ScisummNet and auto-annotated. For the task, systems were provided with a Reference Paper (RP) and 10 or more Citing Papers (CPs), all of which contain citations to the RP, which they used to summarize the RP. Outputs were evaluated with ROUGE against the abstract and human-written summaries. The shared task attracted 17 registrations and 9 final system submissions. This year, CL-SciSumm '20 will add two new tracks: LaySumm and LongSumm.
The task is defined as follows:
Task 1 will be scored by the overlap of text spans, measured by the number of sentences, between the system output and the gold standard created by human annotators. Task 2 will be scored using the ROUGE family of metrics between (i) the system output and the gold standard summary built from the reference spans, and (ii) the system output and the abstract of the reference paper.
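To make the Task 1 metric concrete, here is a minimal sketch of sentence-level overlap scoring, assuming system and gold reference spans are represented as sets of sentence IDs (this representation and the F1 aggregation are illustrative, not the official evaluation script):

```python
def task1_overlap_f1(system_ids, gold_ids):
    """Score overlapping text spans as sentence-level precision/recall/F1.

    system_ids, gold_ids: sets of sentence identifiers in the reference
    paper (an assumed representation of the annotated spans).
    """
    if not system_ids or not gold_ids:
        return 0.0
    overlap = len(system_ids & gold_ids)
    precision = overlap / len(system_ids)
    recall = overlap / len(gold_ids)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: the system selects sentences 3, 4, 7; annotators marked 4, 7, 9.
print(task1_overlap_f1({3, 4, 7}, {4, 7, 9}))  # 0.666...
```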
The training and test sets from previous years can be downloaded from GitHub.
For further information about this task and dataset, please contact:
To ensure and increase the relevance of science for all of society, not just a small group of niche practitioners, researchers are increasingly asked by funders and publishers to outline the scope of their research for the general public by writing a summary for a lay audience, or lay summary. The LaySumm summarization task considers automating this responsibility by enabling systems to generate lay summaries automatically.
The CL-LaySumm Shared Task is to automatically produce Lay Summaries of technical (scientific research article) texts. A Lay Summary is defined as a textual summary intended for a non-technical audience. It is typically produced either by the authors or by a journalist or commentator. Examples are provided in the training data. The corpus will cover three distinct domains: epilepsy, archeology, and materials engineering.
In more detail, a lay summary explains, succinctly and without technical jargon, the overall scope, goal, and potential impact of a scientific paper. It is typically 70-100 words long. The task is to generate summaries that are representative of the content, comprehensible, and interesting to a lay audience.
The intrinsic evaluation will use ROUGE-1, ROUGE-2, and skip-gram ROUGE metrics. In addition, a randomly selected subset of the summaries will undergo human evaluation by science journalists and communicators for comprehensiveness, legibility, and interest.
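Participants can sanity-check their output locally before submission. Below is a minimal sketch using the `rouge-score` Python package for ROUGE-1 and ROUGE-2; note that the skip-gram variants (e.g., ROUGE-SU4) are implemented in the original ROUGE-1.5.5 Perl toolkit rather than this package, and the summary texts here are placeholders:

```python
from rouge_score import rouge_scorer  # pip install rouge-score

# ROUGE-1 and ROUGE-2 as in the intrinsic evaluation; skip-gram variants
# are not available in this package.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2"], use_stemmer=True)

# Placeholder texts, not drawn from the actual corpus.
gold_lay_summary = "The study shows how sleep quality affects seizure frequency."
system_lay_summary = "This paper studies the effect of sleep quality on seizures."

scores = scorer.score(gold_lay_summary, system_lay_summary)
for metric, result in scores.items():
    print(f"{metric}: P={result.precision:.2f} "
          f"R={result.recall:.2f} F={result.fmeasure:.2f}")
```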
All nominated entries will be invited to publish a paper in Open Access (article processing charges will be waived) in a selected Elsevier publication. Authors will be asked to provide an automatically generated lay summary of their paper, together with their contribution.
The task is defined as follows:
The Lay Summary Task will be scored by using several ROUGE metrics to compare the system output and the gold standard lay summary. As a follow-up to the intrinsic evaluation, we will crowdsource the evaluation of a number of automatically generated lay summaries to a panel of judges and a lay audience. Details of the crowdsourcing evaluation will be announced with the release of the final test corpus on July 15, 2020.
The corpus for this task will comprise full-text papers with lay summaries, in a variety of domains and from a number of journals. Elsevier will make available a collection of lay summaries from a multidisciplinary collection of journals, together with the abstracts and full text of the corresponding papers. For a small sample dataset, see the task GitHub repository. To obtain access to the full LaySumm (training and test) corpus, please send an email to a.dewaard@elsevier.com. You will be emailed and asked to sign a contract that grants you research access to the full corpus of approximately 600 full-text articles, abstracts, and lay summaries. A training corpus consisting of approximately two-thirds of the corpus will be made available directly; the full corpus will be available on the test set release date, July 15, 2020.
For any questions on the contracts or the corpus, please contact a.dewaard@elsevier.com.
For further information about this data release, contact the following members of the SDP 2020 workshop organizing committee:
Most work on scientific document summarization focuses on generating relatively short, abstract-like summaries. While such a length constraint may suffice for summarizing news articles, it is far from sufficient for summarizing scientific work. In fact, such a short summary resembles an abstract more than a summary that aims to cover all the salient information conveyed in a given text. Writing such summaries requires expertise and a deep understanding of a scientific domain, as can be found in some researchers' blogs.
The LongSumm task leverages blogs created by researchers in the NLP and machine learning communities and uses these summaries as the reference summaries against which submissions are compared.
The corpus for this task includes a training set of 1,705 extractive summaries and around 700 abstractive summaries of NLP and machine learning papers. The extractive summaries are derived from video talks at the associated conferences (Lev et al., 2019, TalkSumm), and the abstractive summaries come from blogs created by NLP and ML researchers. In addition, we create a test set of abstractive summaries. Each submission is judged against one reference (gold) summary on ROUGE and should not exceed 600 words.
The task is defined as follows:
The Long Summary Task will be scored using the ROUGE family of metrics (ROUGE-1, ROUGE-2, ROUGE-L, and skip-gram) to compare the system output against the gold standard summary. In addition, a randomly selected subset of the summaries will undergo human evaluation.
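Because each LongSumm submission is capped at 600 words, a simple pre-submission check can enforce the limit. This sketch counts whitespace-separated tokens, which may differ from how the official scorer counts words:

```python
MAX_WORDS = 600  # LongSumm submission limit

def truncate_summary(summary: str, max_words: int = MAX_WORDS) -> str:
    """Truncate a summary to at most max_words whitespace-separated tokens."""
    words = summary.split()
    return " ".join(words[:max_words])

# Placeholder 700-word summary, trimmed to the limit before submission.
summary = " ".join(["token"] * 700)
trimmed = truncate_summary(summary)
assert len(trimmed.split()) <= MAX_WORDS
```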
The training data is composed of abstractive and extractive summaries. To download both datasets, and for further details, see the LongSumm GitHub repository.
The (blind) test dataset will be released on July 15, 2020.
For further information about this dataset please contact the organizers of the shared task:
To register for participation in the shared tasks, please use this registration form.
Please consult the SDP Workshop website for official dates for the workshop. All submission deadlines are 11:59 PM Anywhere on Earth (AoE, UTC-12).
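As a convenience, an AoE deadline can be checked programmatically; the sketch below uses the August 16, 2020 report deadline from the schedule that follows:

```python
from datetime import datetime, timedelta, timezone

AOE = timezone(timedelta(hours=-12))  # Anywhere on Earth, UTC-12

# Preliminary system reports due Aug 16, 2020, 11:59 PM AoE.
deadline = datetime(2020, 8, 16, 23, 59, tzinfo=AOE)

now = datetime.now(timezone.utc)
print("Before deadline:", now <= deadline)
```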
Event | Date |
---|---|
Training Set Release | Feb 15, 2020 (an additional development set will be made available closer to the test set release date) |
Deadline for Registration | April 30, 2020 (remains open until the evaluation window starts) |
Test Set Release (Blind) | July 15, 2020 |
System Runs Due | |
Preliminary System Reports Due in SoftConf | August 16, 2020 |
Camera-Ready Contributions Due in SoftConf | |
Participant Presentations at SDP 2020 | |
Amazon, Seattle, US
Muthu Kumar Chandrasekaran is a Research Scientist at Amazon, Seattle, working on natural language understanding. Previously he was a Scientist at SRI International's Artificial Intelligence Center. He completed his Ph.D. at the NUS School of Computing. He is broadly interested in natural language processing, machine learning, and their applications to information retrieval; specifically, in retrieving and organizing information from asynchronous conversation media such as scholarly publications and discussion forums. He has been co-chairing the CL-SciSumm Shared Task series and the BIRNDL workshop series since 2014. He also reviews for the ACL, EMNLP, NAACL, CoNLL, and JCDL conferences. During his Ph.D. he interned at the Allen Institute for Artificial Intelligence's Semantic Scholar research team and at the National Institute of Informatics, Tokyo.
Elsevier, USA
Anita de Waard is VP of Research Collaborations at Elsevier, where her work focuses on projects with academic and industry partners pertaining to advancing modes and frameworks for scholarly communication. Since 1997, she has worked on bridging the gap between science publishing and computational and information technologies, collaborating with groups in Europe and the US. Since 2006, de Waard has been working on a discourse analysis of scientific narrative, with an emphasis on finding key epistemic components in biological text, and within that scope helped start the TAC SciSumm workshops in 2013. She is a cofounder of Force11 and a cofounder of the Research Data Alliance's group on data retrieval technologies.
IBM Research AI, Haifa Research Lab, Israel
Guy Feigenblat is a team leader in the Language and Retrieval group at IBM Research AI. He is interested in AI, NLP, and Information Retrieval (IR) research. He currently leads projects focusing on automatic document summarization (query-based, generic, extractive, and abstractive) for various domains and use cases. He is involved in the development of IBM Science Summarizer, a novel search engine for scientific literature. Guy holds a Ph.D. in computer science from Bar-Ilan University and co-organized the Stringology 2012 workshop.
SRI International, San Diego, USA
Dayne Freitag is Program Director at SRI International's Artificial Intelligence Center, where he leads the Advanced Analytics group. His research seeks to apply artificial intelligence to information assimilation, management, and exploitation. He has served as principal investigator for a number of research projects, including several large, multi-institutional efforts. His research has focused on the automation of data science; the automatic extension of mechanistic models through machine reading; knowledge federation over diverse information sources through data analytics and natural language processing; explaining the spread of ideas through online communities; and novel approaches to institutional knowledge management using controlled English. Freitag holds a B.A. in English literature from Reed College and a Ph.D. in computer science from Carnegie Mellon University.
Research Professor, LTI, Carnegie Mellon University
Eduard Hovy's research includes work on computational semantics of human language (such as text analysis, event detection and coreference, text summarization and generation, question answering, discourse processing, ontologies, text mining, text annotation, and machine translation evaluation), aspects of social media (such as event detection and tracking, sentiment and opinion analysis, and author profile creation), analysis of the semantics of non-textual information such as tables, and aspects of digital government.
IBM Research AI, Haifa Research Lab, Israel
David Konopnicki manages the Language and Retrieval group at IBM Research AI and leads a variety of R&D projects: the development of large-scale full-text search engines, building customer profiles from enterprise and social media sources, affective computing on conversation data, and more. He leads the development of IBM Science Summarizer, a novel search engine for scientific literature. David was a co-organizer of the THUM workshop at UMAP 2017 and the organizer of ISCOL 2018 and 2019.
IBM Research AI, Haifa Research Lab, Israel
Dr. Michal Shmueli-Scheuer is a lead researcher in the Language and Retrieval group (AI Language department) at IBM Research - Haifa, with over 12 years of industry experience. She holds a Ph.D. (2009) in Information and Computer Science from the University of California, Irvine, USA. Her expertise spans conversational bots, affective computing, user modeling, large-scale analytics, databases, and information systems, with a focus on user behavior analytics and information management on the web. She has published more than 30 academic papers in leading conferences and journals, as well as book chapters, and has served as a PC member and reviewer for numerous leading conferences and journals. Within IBM, she has led numerous user-modeling projects and has been recognized for her significant contributions.