Scholarly Document Processing @ EMNLP 2020

Shared Tasks: Call for Participation


Navigation

  • CL-SciSumm 2020
  • CL-LaySumm 2020
  • LongSumm 2020
  • Registration
  • Important Dates
  • Organizing Committee

CL-SciSumm 2020: The 6th Computational Linguistics Scientific Document Summarization Shared Task

CL-SciSumm is the first medium-scale shared task on scientific document summarization, with over 500 annotated documents. Last year's CL-SciSumm introduced large-scale training datasets, both human-annotated (from ScisummNet) and auto-annotated. Systems were provided with a Reference Paper (RP) and 10 or more Citing Papers (CPs), all containing citations to the RP, and used them to summarize the RP. The resulting summaries were evaluated with ROUGE against the RP's abstract and against human-written summaries. The task attracted 17 registrations and 9 final system submissions. This year, CL-SciSumm '20 adds two new tracks: LaySumm and LongSumm.

CL-SciSumm Task

The task is defined as follows:

  • Given: A topic consisting of a Reference Paper (RP) and Citing Papers (CPs) that all contain citations to the RP. In each CP, the text spans (i.e., citances) have been identified that pertain to a particular citation to the RP.
  • Task 1A: For each citance, identify the spans of text (cited text spans) in the RP that most accurately reflect the citance. These are of the granularity of a sentence fragment, a full sentence, or several consecutive sentences (no more than 5). A minimal baseline sketch follows this list.
  • Task 1B: For each cited text span, identify what facet of the paper it belongs to, from a predefined set of facets.
  • Task 2 (optional bonus task): Generate a structured summary of the RP from its cited text spans. The length of the summary should not exceed 250 words.
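
For Task 1A, a common unsupervised starting point is to rank RP sentences by lexical similarity to the citance. The sketch below is only illustrative and is not an official baseline; the function name and inputs (rp_sentences, citance) are hypothetical, and it assumes scikit-learn is installed.

    # Minimal Task 1A sketch: rank Reference Paper (RP) sentences against a
    # citance by TF-IDF cosine similarity. Inputs are hypothetical; the
    # official data format is documented on the CL-SciSumm GitHub repository.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def rank_cited_text_spans(rp_sentences, citance, top_k=3):
        # Fit one vocabulary over the RP sentences plus the citance.
        vectorizer = TfidfVectorizer(stop_words="english")
        matrix = vectorizer.fit_transform(list(rp_sentences) + [citance])
        # Cosine similarity of the citance (last row) to every RP sentence.
        sims = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
        # Indices of the top-k candidate sentences; consecutive indices can
        # be merged into a span of at most 5 sentences, per the task rules.
        return sims.argsort()[::-1][:top_k].tolist()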

Evaluation

Task 1 will be scored by the overlap of text spans, measured in sentences, between the system output and the gold standard created by human annotators. Task 2 will be scored using the ROUGE family of metrics between (i) the system output and the gold standard summary built from the reference spans, and (ii) the system output and the abstract of the reference paper.
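
As a concrete reading of the Task 1 metric, the sketch below computes sentence-level precision, recall, and F1 between system and gold cited text spans represented as sets of sentence IDs. This is a simplified illustration under our own assumptions; the organizers' evaluation script is authoritative.

    # Simplified Task 1 scoring sketch: represent system and gold cited text
    # spans as sets of RP sentence IDs and compute precision/recall/F1.
    def span_overlap_f1(system_ids, gold_ids):
        system_ids, gold_ids = set(system_ids), set(gold_ids)
        overlap = len(system_ids & gold_ids)
        if overlap == 0:
            return 0.0, 0.0, 0.0
        precision = overlap / len(system_ids)
        recall = overlap / len(gold_ids)
        return precision, recall, 2 * precision * recall / (precision + recall)

    # Example: the system selected sentences 12-14, the gold span is 13-14.
    print(span_overlap_f1({12, 13, 14}, {13, 14}))  # -> (0.666..., 1.0, 0.8)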

Corpus

The training and test sets from previous years can be downloaded from GitHub.

Contact

For further information about this task and dataset, please contact:

  • Muthu Kumar Chandrasekaran (Amazon), cmkumar087@gmail.com

CL-LaySumm 2020: The 1st Computational Linguistics Lay Summary Challenge Shared Task

To ensure and increase the relevance of science to society at large, and not just to a small group of specialist practitioners, researchers are increasingly asked by funders and publishers to outline the scope of their work for a general readership by writing a summary for a lay audience, or lay summary. The LaySumm task explores automating this responsibility by having systems generate lay summaries automatically.

The CL-LaySumm Shared Task is to automatically produce lay summaries of technical texts (scientific research articles). A Lay Summary is defined as a textual summary intended for a non-technical audience, typically produced either by the authors or by a journalist or commentator. Examples are provided in the training data. The corpus will cover three distinct domains: epilepsy, archeology, and materials engineering.

In more detail, a lay summary explains, succinctly and without technical jargon, the overall scope, goal, and potential impact of a scientific paper. It is typically about 70-100 words long. The task is to generate summaries that are representative of the content, comprehensible, and interesting to a lay audience.

The intrinsic evaluation will use the ROUGE-1, ROUGE-2, and skip-bigram ROUGE metrics. In addition, a randomly selected subset of the summaries will undergo human evaluation by science journalists and communicators for comprehensiveness, legibility, and interest.

All nominated entries will be invited to publish a paper in Open Access (article processing charges will be waived) in a selected Elsevier publication. Authors will be asked to provide an automatically generated lay summary of their paper together with their contribution.

Lay Summary Task

The task is defined as follows:

  • Given: A full-text paper, its abstract, and (for training) its Lay Summary
  • Task: For each paper, generate a Lay Summary of the specified length
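
As a naive point of reference, and purely as an illustration of the length budget rather than a real lay-summary system, one could truncate the abstract to roughly 100 words. The helper below (lead_lay_summary) is hypothetical and not part of the task tooling.

    # Hypothetical lead baseline: reuse the opening of the abstract, trimmed
    # to the ~100-word lay-summary budget. A real system must also remove
    # jargon and rewrite for a general audience; this only shows the budget.
    def lead_lay_summary(abstract, max_words=100):
        return " ".join(abstract.split()[:max_words])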

Evaluation

The Lay Summary Task will be scored with several ROUGE metrics comparing the system output to the gold standard Lay Summary. As a follow-up to the intrinsic evaluation, we will crowdsource a number of automatically generated lay summaries to a panel of judges and a lay audience. Details of the crowdsourced evaluation will be announced when the final test corpus is shared on July 15, 2020.
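
As one hedged reading of "several ROUGE metrics", the sketch below scores a system summary with the open-source rouge-score package (pip install rouge-score). The metric set and stemming choice here are our assumptions; the organizers' exact configuration may differ.

    # ROUGE scoring sketch using the open-source `rouge-score` package.
    from rouge_score import rouge_scorer

    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                      use_stemmer=True)
    gold_summary = "Gold standard lay summary text goes here."       # placeholder
    system_summary = "System-generated lay summary text goes here."  # placeholder
    for name, score in scorer.score(gold_summary, system_summary).items():
        print(name, round(score.fmeasure, 4))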

Corpus

The corpus for this task will comprise full-text papers with lay summaries, in a variety of domains, from a number of journals. Elsevier will make available a collection of lay summaries from a multidisciplinary set of journals, together with the abstracts and full text of the corresponding articles. For a small sample dataset, see the task GitHub repository. To obtain access to the full LaySumm (training and test) corpus, please send an email to a.dewaard@elsevier.com. You will then be asked to sign a contract granting you research access to the full corpus of approximately 600 full-text articles, abstracts, and lay summaries. A training corpus consisting of approximately two thirds of the data will be made available directly; the full corpus will be available on the Test Set Release date, July 15, 2020.

For any questions on the contracts or the corpus, please contact a.dewaard@elsevier.com.

Contact

For further information about this data release, contact the following members of the SDP 2020 workshop organizing committee:

  • Anita de Waard (Elsevier, VT), a.dewaard@elsevier.com
  • Eduard Hovy (LTI, CMU), hovy@cmu.edu

LongSumm 2020: Shared Task on Generating Long Summaries for Scientific Documents

Most work on scientific document summarization focuses on generating relatively short, abstract-like summaries. While such a length constraint may suffice for news articles, it is far from sufficient for scientific work: a summary that short resembles an abstract more than a summary that covers all the salient information in the text. Writing such longer summaries requires expertise and a deep understanding of the scientific domain, as found in some researchers' blogs.

The LongSumm task therefore leverages blogs written by researchers in the NLP and Machine Learning communities, using them as reference summaries against which submissions are compared.

The corpus for this task includes a training set of 1,705 extractive summaries and around 700 abstractive summaries of NLP and Machine Learning papers. The extractive summaries are derived from video talks on the papers (Lev et al., 2019, TalkSumm); the abstractive summaries come from blogs written by NLP and ML researchers. In addition, we create a test set of abstractive summaries. Each submission is judged against one reference (gold) summary on ROUGE and should not exceed 600 words.

Long Summary Task

The task is defined as follows:

  • Given: Scientific papers with reference summaries; for a detailed description of the provided data, see the LongSumm GitHub repository
  • Task: Generate abstractive and extractive summaries for scientific papers
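
Purely as an illustration of the 600-word budget (not an official baseline), a minimal extractive sketch might keep sentences in document order until the budget is exhausted; sentence splitting and any ranking step are omitted, and the inputs are hypothetical.

    # Minimal LongSumm-style extractive sketch: keep sentences in document
    # order until adding another would exceed the 600-word limit.
    def extractive_long_summary(sentences, budget=600):
        summary, used = [], 0
        for sentence in sentences:
            n_words = len(sentence.split())
            if used + n_words > budget:
                break
            summary.append(sentence)
            used += n_words
        return " ".join(summary)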

Evaluation

The Long Summary Task will be scored with the ROUGE family of metrics, using ROUGE-1, -2, -L, and skip-bigram, comparing the system output to the gold standard summary. In addition, a randomly selected subset of the summaries will undergo human evaluation.

Corpus

The training data is composed of abstractive and extractive summaries. To download both datasets, and for further details, see the LongSumm GitHub repository.

The (blind) test dataset will be released on July 15, 2020.

Contact

For further information about this dataset please contact the organizers of the shared task:

  • Michal Shmueli-Scheuer - IBM Research AI
  • Guy Feigenblat - IBM Research AI

Registration

To register for participation in the shared tasks, please use this registration form.


Important Dates

Please consult the SDP Workshop website for official workshop dates. All submission deadlines are 11:59 PM, Anywhere on Earth (AoE, UTC-12).

  • Training Set Release: Feb 15, 2020 (an additional development set will be made available closer to the test set release date)
  • Deadline for Registration: April 30, 2020 (remains open until the evaluation window starts)
  • Test Set Release (Blind): July 15, 2020 (originally July 1, 2020)
  • System Runs Due: Aug 15, 2020 (originally Aug 1, 2020)
  • Preliminary System Reports Due in SoftConf: Aug 16, 2020
  • Camera-Ready Contributions Due in SoftConf: Oct 10, 2020 (originally Aug 31, 2020)
  • Participant Presentations at SDP 2020: Nov 19, 2020 (originally Nov 12, 2020)

Organizing Committee

Muthu Kumar Chandrasekaran

Amazon, Seattle, US

Muthu Kumar Chandrasekaran is a Research Scientist at Amazon, Seattle, working on natural language understanding. Previously he was a Scientist at SRI International's Artificial Intelligence Center. He completed his Ph.D. at the NUS School of Computing. He is broadly interested in natural language processing, machine learning, and their applications to information retrieval; specifically, in retrieving and organizing information from asynchronous conversational media such as scholarly publications and discussion forums. He has co-chaired the CL-SciSumm Shared Task series and the BIRNDL workshop series since 2014, and reviews for the ACL, EMNLP, NAACL, CoNLL, and JCDL conferences. During his Ph.D. he interned at the Allen Institute for Artificial Intelligence's Semantic Scholar research team and at the National Institute of Informatics, Tokyo.

Anita de Waard

Elsevier, USA

Anita is VP of Research Collaborations at Elsevier, where she works with academic and industry partners on projects that advance modes and frameworks for scholarly communication. Since 1997, she has worked on bridging the gap between science publishing and computational and information technologies, collaborating with groups in Europe and the US. From 2006 onwards, de Waard has worked on discourse analysis of scientific narrative, with an emphasis on finding key epistemic components in biological text; within that scope, she helped start the TAC SciSumm workshops in 2013. She is a cofounder of Force11 and of the Research Data Alliance's group on data retrieval technologies.

Guy Feigenblat

IBM Research AI, Haifa Research Lab, Israel

Guy Feigenblat is a team leader in the Language and Retrieval group at IBM Research AI, interested in AI, NLP, and Information Retrieval (IR) research. He currently leads projects on automatic document summarization (query-based, generic, extractive, and abstractive) for various domains and use cases, and is involved in the development of IBM Science Summarizer, a novel search engine for scientific literature. Guy holds a Ph.D. in computer science from Bar-Ilan University and co-organized the Stringology workshop in 2012.

Dayne Freitag

SRI International, San Diego, USA

Dayne Freitag is a Program Director at SRI International's Artificial Intelligence Center, where he leads the Advanced Analytics group. His research seeks to apply artificial intelligence to information assimilation, management, and exploitation. Freitag has served as principal investigator on a number of research projects, including several large, multi-institutional efforts. His research goals have focused on the automation of data science; the automatic extension of mechanistic models through machine reading; knowledge federation over diverse information sources through data analytics and natural language processing; explaining the spread of ideas through online communities; and novel approaches to institutional knowledge management using controlled English. Freitag holds a B.A. in English literature from Reed College and a Ph.D. in computer science from Carnegie Mellon University.

Eduard Hovy

Research Professor, LTI, Carnegie Mellon University

His research includes work on computational semantics of human language (such as text analysis, event detection and coreference, text summarization and generation, question answering, discourse processing, ontologies, text mining, text annotation, and machine translation evaluation), aspects of social media (such as event detection and tracking, sentiment and opinion analysis, and author profile creation), analysis of the semantics of non-textual information such as tables, and aspects of digital government.

David Konopnicki

IBM Research AI, Haifa Research Lab, Israel

David Konopnicki manages the Language and Retrieval group at IBM Research AI and leads a variety of R&D projects: development of large-scale full-text search engines, building customer profiles from enterprise and social media sources, affective computing on conversational data, and more. David leads the development of IBM Science Summarizer, a novel search engine for scientific literature. He co-organized the THUM workshop at UMAP 2017 and organized ISCOL 2018 and 2019.

Michal Shmueli-Scheuer

IBM Research AI, Haifa Research Lab, Israel

Dr. Michal Shmueli-Scheuer is a lead researcher in the Language and Retrieval group (AI Language department) at IBM Research - Haifa, with over 12 years of industry experience. She holds a Ph.D. (2009) in Information and Computer Science from the University of California, Irvine, USA. Her expertise spans conversational bots, affective computing, user modeling, large-scale analytics, databases, and information systems, with a focus on user behavior analytics and information management on the web. She has published more than 30 academic papers in leading conferences and journals, as well as book chapters, and has served as a PC member and reviewer for numerous leading conferences and journals. Within IBM, she has led numerous user modeling projects and has been recognized for her significant contributions.



Contact: sdproc@googlegroups.com

Follow us: https://twitter.com/SDProc

