Who created scibert_scivocab_uncased?

scibert_scivocab_uncased was published by Ai2 on Hugging Face.

scibert_scivocab_uncased

Name: scibert_scivocab_uncased
Author: Ai2

A BERT model from Ai2, pretrained on scientific text with a domain-specific vocabulary for NLP tasks in the sciences.

Model Description

This is the pretrained model presented in SciBERT: A Pretrained Language Model for Scientific Text, which is a BERT model trained on scientific text.

The training corpus consists of papers taken from Semantic Scholar: 1.14M papers, 3.1B tokens in total. We use the full text of the papers in training, not just abstracts.

SciBERT has its own wordpiece vocabulary (scivocab) that's built to best match the training corpus. We trained cased and uncased versions.
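To illustrate why a corpus-matched vocabulary matters, here is a minimal sketch of the greedy longest-match-first WordPiece splitting used by BERT-style tokenizers. The two vocabularies are tiny, invented examples (they are not scivocab or the original BERT vocabulary), and the function name `wordpiece` is hypothetical; the point is only that a word present in the vocabulary stays whole, while a missing word is broken into several pieces.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into WordPiece tokens by greedy longest match.

    Continuation pieces carry the "##" prefix, as in BERT vocabularies.
    The word is lowercased first, mirroring an uncased model.
    """
    word = word.lower()
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabularies, invented purely for illustration.
general_vocab = {"chrom", "##ato", "##graphy", "##graph", "##y"}
science_vocab = general_vocab | {"chromatography"}

print(wordpiece("Chromatography", general_vocab))
print(wordpiece("Chromatography", science_vocab))
```

With the toy general vocabulary the word is split into three pieces, while the toy science vocabulary keeps it as a single token. Fewer pieces per scientific term is the kind of benefit a domain-specific vocabulary is meant to provide.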

Available models include:

scibert_scivocab_cased
scibert_scivocab_uncased
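As a rough sketch of how either model might be loaded with the transformers library listed on this page: the example below assumes that both checkpoints live under the `allenai` namespace shown here, that `transformers` and PyTorch are installed, and that the Hub is reachable. The helper names `hub_id` and `load_scibert` are hypothetical.

```python
# Namespace and model names as listed on this page.
HUB_NAMESPACE = "allenai"
MODEL_NAMES = ("scibert_scivocab_cased", "scibert_scivocab_uncased")


def hub_id(name):
    """Build the Hub repository id, assuming the `allenai/<name>` layout."""
    if name not in MODEL_NAMES:
        raise ValueError(f"unknown SciBERT model: {name!r}")
    return f"{HUB_NAMESPACE}/{name}"


def load_scibert(name="scibert_scivocab_uncased"):
    """Download (or load from cache) the tokenizer and encoder weights."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    repo = hub_id(name)
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModel.from_pretrained(repo)
    return tokenizer, model
```

A typical call would be `tokenizer, model = load_scibert()`, followed by `model(**tokenizer("some scientific sentence", return_tensors="pt"))` to obtain contextual embeddings. The first call needs network access to fetch the weights.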

The original repo can be found here.

If using these models, please cite the following paper:

@inproceedings{beltagy-etal-2019-scibert,
    title = "SciBERT: A Pretrained Language Model for Scientific Text",
    author = "Beltagy, Iz  and Lo, Kyle  and Cohan, Arman",
    booktitle = "EMNLP",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1371"
}

Author

Ai2

Organization · ✓

allenai

Details

Downloads: 174.8K

Likes: 173

Access: Open Source

Library: transformers

Created: 2 Mar 2022

Updated: 3 Oct 2022


