Document Type

Article

Abstract

This paper explores new methods for locating the sources used to write a text by fine-tuning a variety of language models to rerank candidate sources. These methods promise to shed new light on traditions with complex citational practices, such as medieval Arabic, where citations are ambiguous and the boundaries of quotation are poorly defined. After retrieving candidate sources with a baseline BM25 retrieval model, we test a variety of reranking methods to see how effective they are at the task of source attribution. We conduct experiments on two datasets (English Wikipedia and medieval Arabic historical writing) and employ a variety of retrieval- and generation-based reranking models. In particular, we seek to understand how the degree of supervision required affects the performance of various reranking models. We find that semi-supervised methods can be nearly as effective as fully supervised methods while avoiding potentially costly span-level annotation of the target and source documents.

Publication (Name of Journal)

CEUR Workshop Proceedings

Recommended Citation

Muther, R., Barber, M., Smith, D. (2023). Querying the Past: Automatic Source Attribution with Language Models. CEUR Workshop Proceedings, 3558, 344-355.
Available at: https://ecommons.aku.edu/uk_ismc_faculty_publications/287

Included in

Arabic Studies Commons, Databases and Information Systems Commons, History Commons, Reading and Language Commons

Querying the Past: Automatic Source Attribution with Language Models
