Published November 30, 2021 | Version 1.0.0
Dataset Open

The CLEF-IP 2009 Test Collection

  • 1. Technische Universität Wien, Vienna, Austria
  • 2. Matrixware GmbH, Vienna, Austria
  • 3. Information Retrieval Facility, Vienna, Austria

Contributors

Contact person:

  • 1. TU Wien

Description

CLEF-IP: Cross-Language Evaluation Forum - Intellectual Property

The CLEF-IP track was launched in 2009, as part of the CLEF 2009 evaluation campaign, to investigate IR techniques for patent retrieval. The track uses a collection of more than 1M patent documents derived from EPO (European Patent Office) sources. The collection contains documents in English, French and German, with at least 100,000 documents in each language. The task is to find patent documents that constitute prior art. The topics are complete patent documents, which participants can process to extract queries. In addition to the Main task, CLEF-IP 2009 provided three language tasks (English, German, French), in which all topics were in one of these three languages.

Relevance judgements were produced by two methods: automatically, using patent citations from seed patents; and manually, for a small number of queries whose search results are reviewed by intellectual property experts.

Files

  1. Document Collection
    The CLEF-IP 2009 document collection consists of XML files. There are 1.9 million XML files, corresponding to approximately 1 million individual patents filed between 1985 and 2000. A DTD file for the XML format is provided as well.
  2. Topics and Answers (Qrels)
    Both the training and the test topic sets also contain the relevance assessments for the topics. For each task of the CLEF-IP 2009 track, we provide four test topic sets of different sizes: XLarge, Large, Medium, Small.
  3. Guidelines
    Contains a detailed explanation of how to work with the four tasks defined on the corpus.
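
Since the document collection is distributed as a large number of XML files, a first sanity check after unpacking could be to walk the directory tree and tally the root element of every file. The following is a minimal sketch only: the function name `count_root_tags`, the `.xml` file extension, and the directory argument are assumptions made for illustration, and no particular element names or folder layout of the CLEF-IP collection are implied.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def count_root_tags(collection_dir):
    """Tally the root element name of every *.xml file below collection_dir.

    Hypothetical helper: it assumes nothing about the CLEF-IP schema and
    only reports which root tags occur and how often.
    """
    counts = Counter()
    for xml_path in Path(collection_dir).rglob("*.xml"):
        try:
            root = ET.parse(xml_path).getroot()
        except ET.ParseError:
            # Keep going on malformed files, but record that they exist.
            counts["<unparseable>"] += 1
            continue
        counts[root.tag] += 1
    return counts
```

Calling `count_root_tags("path/to/unpacked/collection")` (a placeholder path) returns a `Counter`; the sum of its values can then be compared against the roughly 1.9 million files stated above.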


 

Files (14.0 GiB)

Checksum                              Size
md5:c60860b3caae9728c796492ff7de253e  13.6 GiB
md5:123370ee60df353de237192dcd7c58c1  403.3 MiB
md5:5ee9d6233e9ff0d8c64b174fdf56ee75  581.8 KiB
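
The MD5 checksums listed above can be used to confirm that a download completed without corruption, which is worth doing for a 13.6 GiB file. The sketch below uses Python's standard `hashlib` module and reads the file in chunks so that it does not need to fit in memory. The helper names `md5_of_file` and `matches_listed_md5` are hypothetical, as is any local path you pass in; the file names of the three downloads are not given in this record.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a local file against a checksum written as 'md5:<hex>'.

    A bare hex string without the 'md5:' prefix is accepted as well.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_md5("path/to/downloaded/file", "md5:c60860b3caae9728c796492ff7de253e")` should return `True` for an intact copy of the largest file, where the path is a placeholder for wherever the download was saved.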

Additional details

Related works

Is described by
Publication: 10.1007/978-3-642-15754-7_47 (DOI)