Rosni Vasu (University of Zurich) - SciHyp: A Fine-grained Dataset Describing Hypotheses and Their Components from Scientific Articles

Neurocognition, Language and Visual Processing (NLVP) series

Event details

Abstract: Scientific discovery involves understanding and structuring hypotheses, a challenging task due to the complexity of scientific texts. This talk presents SciHyp, a novel dataset containing RDF descriptions of 689 hypothesis sentences from 479 computer science articles. These hypotheses include relation-finding and comparative types. The dataset was created using a multi-step annotation pipeline that combines expert annotation, Language Models (LMs) like BERT and Sci-BERT, and crowd-based refinement. Our pipeline identified non-hypothesis sentences with a 96.1% consensus rate between the LMs and crowd annotations, demonstrating its effectiveness in identifying relevant sentences that contain hypotheses. We also used GPT-4 to extract hypothesis components. SciHyp aims to benefit the scientific community by providing a structured dataset for model training and evaluation. The talk concludes with a glimpse into an ongoing project that uses the SciHyp pipeline for scientific hypothesis generation.

Bio: Rosni Vasu is an Informatics PhD student at the University of Zurich, where she is advised by Prof. Abraham Bernstein. Her research focuses on human-machine collaboration for scientific text understanding and reasoning. She is particularly interested in how humans and machines can jointly contribute to the tasks of scientific hypothesis detection, generation, and ranking.

Zoom meeting link:

(Meeting ID: 937 0760 9239 Password: 259613)

Upcoming webinars:

· Roberto Navigli (November 14, 2024, 3-4pm BST)
· Vered Shwartz (December 12, 2024, 4-5pm BST)

Check past and upcoming seminars at the following URL:

If you want to follow future NLVP seminars, you are welcome to join our *Google group*: