How many parameters does scibert_scivocab_uncased have?

Parameter count for scibert_scivocab_uncased is not available. See the Hugging Face model page for full specifications.
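The count can be computed locally once the weights are downloaded. The following is a minimal sketch, not an official recipe: it assumes the `transformers` library with a PyTorch backend is installed, and that the checkpoint is published under the Hub ID `allenai/scibert_scivocab_uncased` (inferred from the author handle and model name on this page). `count_parameters` and `load_and_count` are hypothetical helper names.

```python
# Assumed Hub ID, built as <organization handle>/<model name>.
MODEL_ID = "allenai/scibert_scivocab_uncased"


def count_parameters(model):
    """Return the total number of scalar weights in a PyTorch-style module."""
    return sum(p.numel() for p in model.parameters())


def load_and_count(model_id=MODEL_ID):
    """Download the checkpoint (network access required) and count its weights."""
    # Imported lazily so count_parameters stays usable without transformers.
    from transformers import AutoModel

    model = AutoModel.from_pretrained(model_id)
    return count_parameters(model)
```

Calling `load_and_count()` downloads the weights on first use and returns the total as an integer. The count covers the encoder loaded by `AutoModel` only, not any task-specific head.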

Who created scibert_scivocab_uncased?

scibert_scivocab_uncased was published by Ai2 on Hugging Face.

scibert_scivocab_uncased

Name: scibert_scivocab_uncased
Author: Ai2

Ai2's BERT model pretrained on scientific text with a domain-specific vocabulary for NLP tasks in scientific domains.

Model Description

This is the pretrained model presented in SciBERT: A Pretrained Language Model for Scientific Text, which is a BERT model trained on scientific text.

The training corpus consists of papers taken from Semantic Scholar: 1.14M papers, or 3.1B tokens. We use the full text of the papers in training, not just the abstracts.

SciBERT has its own wordpiece vocabulary (scivocab) that's built to best match the training corpus. We trained cased and uncased versions.

Available models include:

scibert_scivocab_cased
scibert_scivocab_uncased

The original repo can be found on GitHub at https://github.com/allenai/scibert.
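Either variant listed above can be loaded through the `transformers` library to see how scivocab splits scientific terms into wordpieces. This is a minimal sketch under stated assumptions: the Hub IDs follow the pattern `allenai/scibert_scivocab_<variant>`, and `hub_id` and `tokenize` are hypothetical helper names, not part of any library.

```python
VARIANTS = ("cased", "uncased")


def hub_id(variant="uncased", org="allenai"):
    """Build the assumed Hub ID for one of the two listed SciBERT variants."""
    if variant not in VARIANTS:
        raise ValueError(f"variant must be one of {VARIANTS}, got {variant!r}")
    return f"{org}/scibert_scivocab_{variant}"


def tokenize(text, variant="uncased"):
    """Split text into scivocab wordpieces (downloads tokenizer files on first use)."""
    # Imported lazily so hub_id stays usable without transformers installed.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(hub_id(variant))
    return tokenizer.tokenize(text)
```

For example, `tokenize("The patient received immunotherapy.")` returns the list of wordpieces for that sentence. Comparing the result against a general-domain BERT tokenizer is one way to see the effect of the domain-specific vocabulary.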

If using these models, please cite the following paper:

@inproceedings{beltagy-etal-2019-scibert,
    title = "SciBERT: A Pretrained Language Model for Scientific Text",
    author = "Beltagy, Iz  and Lo, Kyle  and Cohan, Arman",
    booktitle = "EMNLP",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1371"
}

Author

Ai2 (verified organization; Hugging Face handle: allenai)

Details

Downloads: 174.8K
Likes: 173
Access: Open Source
Library: transformers
Created: Mar 2, 2022
Updated: Oct 3, 2022
