The CLEF-IP 2010 Test Collection

Piroi, Florina; Tait, John

doi:10.48436/jqrsc-jbq51

Published November 30, 2021 | Version 1.0.0

Dataset Open

1. Technische Universität Wien, Vienna, Austria
2. Information Retrieval Facility, Vienna, Austria

CLEF-IP: Cross-Language Evaluation Forum - Intellectual Property

The CLEF-IP track was launched in 2009 to investigate IR techniques for patent retrieval, and it is part of the CLEF 2010 evaluation campaign. The track uses a collection of more than 1.3 million patent documents (about 2.6 million files) derived from EPO (European Patent Office) sources and published before 2001. The collection contains documents in English, French, and German, with at least 150,000 documents in each language. The task is to find patent documents that constitute prior art.

There are two tasks in the 2010 track. The first is to find patent documents that are candidates to constitute prior art for a given document. The second is to classify a given document according to the International Patent Classification (IPC) system. Relevance judgements are produced using the patent citations and metadata (bibliographic data).

Files

Document Collection
The collection contains over 2.6 million XML files.
Topics and Answers
Both the training and the test topic sets also contain the relevance assessments for the topics.
Guidelines
A detailed explanation of how to work on the tasks using the corpus.

Files (9.3 GiB)

Name	MD5	Size
01_document_collection.tgz	058e49042ba09355a47eed20141293d4	9.3 GiB
02_topics.tgz	30ce87c22a9f845f7c93b6f08631bf29	43.9 MiB
03_guidelines_docs.tgz	ceefbd68c3ef315a5fca94f44bf7c5e0	147.7 KiB
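
Since the file listing publishes an MD5 checksum for each archive, a download can be checked before unpacking. The following is a minimal sketch, not part of the collection's own tooling: the function names md5_of_file and verify_downloads are hypothetical, and the download directory "." is an assumption. Only the file names and checksums are taken from the listing.

```python
import hashlib
from pathlib import Path

# File names and MD5 checksums copied from the file listing.
EXPECTED_MD5 = {
    "01_document_collection.tgz": "058e49042ba09355a47eed20141293d4",
    "02_topics.tgz": "30ce87c22a9f845f7c93b6f08631bf29",
    "03_guidelines_docs.tgz": "ceefbd68c3ef315a5fca94f44bf7c5e0",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use bounded, which matters for the
    9.3 GiB document collection archive.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to True (checksum matches),
    False (checksum differs), or None (file not found)."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = None
        else:
            results[name] = md5_of_file(path) == checksum
    return results


if __name__ == "__main__":
    # "." is an assumed download directory; adjust as needed.
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, status in verify_downloads(".").items():
        print(f"{name}: {labels[status]}")
```

A mismatch usually points to an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.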

Additional details

Is described by: Publication: http://ceur-ws.org/Vol-1176/CLEF2010wn-CLEF-IP-PiroiEt2010.pdf
