Published February 20, 2023 | Version 1.0.0
Dataset Open

Legal CaPER Benchmark

Creators

  • 1. TU Wien

Contributors

Data manager:

  • 1. TU Wien

Description

The Legal Case Passage Extraction and Retrieval benchmark is an information retrieval benchmark collection for court case passage retrieval. Specifically, it is a collection for evaluating Cited Case Passage Retrieval (CCPR) and contains case passages from the Austrian building regulations domain (Source: RIS). The following files are included in the dataset:

  • full_collection.tsv

A tab-separated file containing the passage texts of court cases from the building regulations domain. Column 1 contains the ID of the passage, Column 2 contains the passage text, and Column 3 contains the case ID (Geschäftszahl) of the case from which the passage originates.

  • queries.tsv

A tab-separated file containing the queries / topics for which relevance assessments exist in this collection. Column 1 contains the ID of the query, Column 2 contains the query passage text, and Column 3 contains the case ID (Geschäftszahl) of the cited case. For the CCPR task, results are intended to be additionally filtered to passages whose case ID exactly matches the case ID in Column 3. For each query, relevance assessments exist only for passages that match this case ID.

  • qrel.json

A JSON file containing the relevance assessments for each query. In this dictionary, a passage from the full collection is relevant for a query if qrel[<query ID>][<passage ID>] == 1. If a passage ID is not in qrel[<query ID>], the passage is not relevant. Relevance assessments exist only for full collection passages that match the case ID of the query.

  • qrel.json.txt

A conversion of the qrel.json file into a format compatible with trec_eval.
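The file descriptions above can be tied together in a minimal sketch: read the three-column TSV layout, filter candidate passages by exact case ID match, and look up relevance in the qrel dictionary. The helper names (read_tsv, candidates_for, is_relevant, to_trec_qrels) and the toy rows are illustrative assumptions, not part of the dataset. The sketch also assumes that the TSV files have no header row and that the trec_eval conversion uses the common four-column qrels layout (query ID, iteration, document ID, relevance), which the record does not spell out.

```python
import csv
import io
import json


def read_tsv(handle):
    """Read (id, text, case_id) tuples from a three-column TSV handle.

    Assumption: no header row, and quote characters are plain text.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row[0], row[1], row[2]) for row in reader if len(row) >= 3]


def candidates_for(query, passages):
    """Keep only passages whose case ID exactly matches the query's cited case."""
    _, _, cited_case = query
    return [p for p in passages if p[2] == cited_case]


def is_relevant(qrel, query_id, passage_id):
    """A passage is relevant only if qrel[query_id][passage_id] == 1."""
    return qrel.get(query_id, {}).get(passage_id, 0) == 1


def to_trec_qrels(qrel):
    """Render 'query_id 0 passage_id relevance' lines (assumed layout)."""
    return [
        f"{query_id} 0 {passage_id} {rel}"
        for query_id, judged in qrel.items()
        for passage_id, rel in judged.items()
    ]


# Toy stand-ins for full_collection.tsv, queries.tsv and qrel.json.
# All IDs, texts and case IDs below are made up for illustration.
passages_tsv = (
    "p1\tFirst toy passage.\tCASE-A\n"
    "p2\tSecond toy passage.\tCASE-A\n"
    "p3\tThird toy passage.\tCASE-B\n"
)
queries_tsv = "q1\tToy query passage.\tCASE-A\n"
qrel = json.loads('{"q1": {"p1": 1, "p2": 0}}')

passages = read_tsv(io.StringIO(passages_tsv))
queries = read_tsv(io.StringIO(queries_tsv))

# Restrict the search space to passages of the cited case, then check relevance.
candidate_ids = [p[0] for p in candidates_for(queries[0], passages)]
relevant_ids = [pid for pid in candidate_ids if is_relevant(qrel, "q1", pid)]
trec_lines = to_trec_qrels(qrel)
```

To apply the same steps to the real files, the io.StringIO objects could be replaced with open("full_collection.tsv", encoding="utf-8", newline="") and open("queries.tsv", encoding="utf-8", newline=""), and the qrel dictionary loaded with json.load from qrel.json. The UTF-8 encoding is an assumption.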

Files

Files (33.4 MiB)

Checksum                               Size
md5:2bfcddde3ad0cd2a3e546f6bb1e87311   33.3 MiB
md5:27516b812762e6da5045b95c0bba2f24   7.1 KiB
md5:1d3e87f731f76df02b0a6da075f34403   7.0 KiB
md5:0b4b5f720ae2eba2b08263efee7055d2   74.7 KiB