Llama 2 70B Size

Understanding LLaMA 2: Training Details and Performance Enhancements

Pretraining Data and Batch Size

The token counts reported for LLaMA 2 refer to pretraining data only. All models are trained with a global batch size of 4M tokens.
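
As a back-of-the-envelope illustration (not a figure from the LLaMA 2 paper), dividing a 4M-token global batch by LLaMA 2's 4,096-token context gives the number of sequences processed per optimizer step; the names below are illustrative.

```python
# Rough check: how many sequences make up a 4M-token global batch?
# Assumes 4M = 4 * 2**20 and one full 4,096-token sequence per sample (illustrative).
GLOBAL_BATCH_TOKENS = 4 * 1024 * 1024   # 4M tokens per optimizer step
SEQ_LEN = 4096                          # LLaMA 2 context length

sequences_per_batch = GLOBAL_BATCH_TOKENS // SEQ_LEN
print(sequences_per_batch)  # 1024 sequences per global batch
```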

Model-Specific Enhancements

* The larger 70B model employs Grouped-Query Attention (GQA) to improve inference scalability (a minimal sketch of GQA follows below).
* In one test, the LLaMA-2 70B q3_K_S quantization was run at a 32k context using launch arguments originally tuned for a 16k context size.
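
For intuition about what GQA changes relative to standard multi-head attention, here is a minimal PyTorch sketch: each group of query heads shares a single key/value head, which shrinks the KV cache. The dimensions and weights are toy values, and rotary embeddings and other details of the real LLaMA 2 implementation are omitted.

```python
# Minimal grouped-query attention (GQA) sketch.
# LLaMA 2 70B pairs 64 query heads with 8 KV heads; the toy setup below uses 8 and 2.
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """x: (batch, seq, dim). Each group of query heads shares one K/V head."""
    b, s, d = x.shape
    head_dim = d // n_q_heads
    group = n_q_heads // n_kv_heads  # query heads per KV head

    q = (x @ wq).view(b, s, n_q_heads, head_dim).transpose(1, 2)   # (b, Hq, s, hd)
    k = (x @ wk).view(b, s, n_kv_heads, head_dim).transpose(1, 2)  # (b, Hkv, s, hd)
    v = (x @ wv).view(b, s, n_kv_heads, head_dim).transpose(1, 2)

    # Repeat each KV head so it is shared across its group of query heads.
    k = k.repeat_interleave(group, dim=1)  # (b, Hq, s, hd)
    v = v.repeat_interleave(group, dim=1)

    attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return attn.transpose(1, 2).reshape(b, s, d)

# Toy usage: 8 query heads sharing 2 KV heads.
d, hq, hkv = 64, 8, 2
x = torch.randn(1, 16, d)
wq = torch.randn(d, d)
wk = torch.randn(d, (d // hq) * hkv)
wv = torch.randn(d, (d // hq) * hkv)
print(grouped_query_attention(x, wq, wk, wv, hq, hkv).shape)  # torch.Size([1, 16, 64])
```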

Enhanced Context Length and Model Sizes

Compared to LLaMA 1, LLaMA 2 models double the context length (4,096 tokens versus 2,048). All three available sizes (7B, 13B, and 70B) are trained on 2 trillion tokens.
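
For a rough sense of the training scale these numbers imply, the common C ≈ 6·N·D rule of thumb (an external approximation, not a figure reported by Meta) can be applied to the 2-trillion-token budget:

```python
# Rough training-compute estimate using the C ≈ 6 * N * D approximation,
# where N is the parameter count and D the number of training tokens.
def approx_train_flops(params: float, tokens: float) -> float:
    return 6.0 * params * tokens

for n_params in (7e9, 13e9, 70e9):
    print(f"{n_params / 1e9:.0f}B: {approx_train_flops(n_params, 2e12):.2e} FLOPs")
# 7B:  8.40e+22 FLOPs
# 13B: 1.56e+23 FLOPs
# 70B: 8.40e+23 FLOPs
```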

Fine-Tuning Options and GPU Requirements

LLaMA 2 can be fine-tuned using Amazon SageMaker. The vocab_size parameter is optional and defaults to 32,000, matching the LLaMA 2 tokenizer. For optimal performance with LLaMA-65B and the 70B model, GPUs with at least 40 GB of VRAM are recommended.
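
A minimal sketch of launching a fine-tuning job through the SageMaker JumpStart Python SDK is shown below; the model_id, instance type, and S3 path are placeholders, and a real run needs an IAM execution role plus acceptance of the Llama 2 license.

```python
# Hedged sketch: fine-tuning Llama 2 via SageMaker JumpStart.
# The model_id, instance type, and S3 path are illustrative placeholders.
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-2-70b",  # assumed JumpStart model id
    environment={"accept_eula": "true"},         # required to pull Llama 2 weights
    instance_type="ml.g5.48xlarge",              # multi-GPU instance; adjust to budget
)

# Training data channel pointing at a dataset in S3 (placeholder path).
estimator.fit({"training": "s3://my-bucket/llama2-finetune/"})
```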

