longformer-base-4096
Ai2's Longformer base model that extends transformer attention to handle documents up to 4,096 tokens long for NLP tasks.
Model Card
Longformer is a transformer model for long documents.
longformer-base-4096 is a BERT-like model started from the RoBERTa checkpoint and pretrained for MLM on long documents. It supports sequences of length up to 4,096.
Longformer uses a combination of a sliding window (local) attention and global attention. Global attention is user-configured based on the task to allow the model to learn task-specific representations.
Please refer to the examples in modeling_longformer.py and the paper for more details on how to set global attention.
Citing
If you use Longformer in your research, please cite Longformer: The Long-Document Transformer.
@article{Beltagy2020Longformer,
title={Longformer: The Long-Document Transformer},
author={Iz Beltagy and Matthew E. Peters and Arman Cohan},
journal={arXiv:2004.05150},
year={2020},
}
Longformer is an open-source project developed by the Allen Institute for Artificial Intelligence (AI2).
AI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering.
Sign up to read complete case studies, access detailed metrics, and unlock all use cases.
Sign up to read complete case studies, access detailed metrics, and unlock all use cases.