LLM Collection

This section presents a collection and brief description of notable and foundational large language models (LLMs).

Models

Model	Release Date	Size (B)	Checkpoints	Description
Falcon LLM	May 2023	7, 40	Falcon-7B, Falcon-40B	Falcon LLM is a foundational large language model (LLM) from TII with 40 billion parameters, trained on one trillion tokens.
PaLM 2	May 2023	-	-	A Language Model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM.
Med-PaLM 2	May 2023	-	-	Towards Expert-Level Medical Question Answering with Large Language Models
Gorilla	May 2023	7	Gorilla	Gorilla: Large Language Model Connected with Massive APIs
RedPajama-INCITE	May 2023	3, 7	RedPajama-INCITE	A family of models including base, instruction-tuned & chat models.
LIMA	May 2023	65	-	A 65B parameter LLaMA language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling.
Replit Code	May 2023	3	Replit Code	replit-code-v1-3b model is a 2.7B LLM trained on 20 languages from the Stack Dedup v1.2 dataset.
h2oGPT	May 2023	12	h2oGPT	h2oGPT is a large language model (LLM) fine-tuning framework and chatbot UI with document question-answering capabilities.
CodeGen2	May 2023	1, 3, 7, 16	CodeGen2	Code models for program synthesis.
CodeT5 and CodeT5+	May 2023	16	CodeT5	CodeT5 and CodeT5+ models for Code Understanding and Generation from Salesforce Research.
StarCoder	May 2023	15	StarCoder	StarCoder: A State-of-the-Art LLM for Code
MPT-7B	May 2023	7	MPT-7B	MPT-7B is a GPT-style model, and the first in the MosaicML Foundation Series of models.
DLite	May 2023	0.124 - 1.5	DLite-v2-1.5B	Lightweight instruction following models which exhibit ChatGPT-like interactivity.
Dolly	April 2023	3, 7, 12	Dolly	An instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use.
StableLM	April 2023	3, 7	StableLM-Alpha	Stability AI's StableLM series of language models
Pythia	April 2023	0.070 - 12	Pythia	A suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters.
Open Assistant (Pythia Family)	March 2023	12	Open Assistant	OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
Cerebras-GPT	March 2023	0.111 - 13	Cerebras-GPT	Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster
BloombergGPT	March 2023	50	-	BloombergGPT: A Large Language Model for Finance
PanGu-Σ	March 2023	1085	-	PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing
GPT-4	March 2023	-	-	GPT-4 Technical Report
LLaMA	Feb 2023	7, 13, 33, 65	LLaMA	LLaMA: Open and Efficient Foundation Language Models
ChatGPT	Nov 2022	-	-	A model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.
Galactica	Nov 2022	0.125 - 120	Galactica	Galactica: A Large Language Model for Science
mT0	Nov 2022	13	mT0-xxl	Crosslingual Generalization through Multitask Finetuning
BLOOM	Nov 2022	176	BLOOM	BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
U-PaLM	Oct 2022	540	-	Transcending Scaling Laws with 0.1% Extra Compute
UL2	Oct 2022	20	UL2, Flan-UL2	UL2: Unifying Language Learning Paradigms
Sparrow	Sep 2022	70	-	Improving alignment of dialogue agents via targeted human judgements
Flan-T5	Oct 2022	11	Flan-T5-xxl	Scaling Instruction-Finetuned Language Models
AlexaTM	Aug 2022	20	-	AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
GLM-130B	Oct 2022	130	GLM-130B	GLM-130B: An Open Bilingual Pre-trained Model
OPT-IML	Dec 2022	30, 175	OPT-IML	OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
OPT	May 2022	175	OPT-13B, OPT-66B	OPT: Open Pre-trained Transformer Language Models
PaLM	April 2022	540	-	PaLM: Scaling Language Modeling with Pathways
Tk-Instruct	April 2022	11	Tk-Instruct-11B	Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
GPT-NeoX-20B	April 2022	20	GPT-NeoX-20B	GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Chinchilla	Mar 2022	70	-	Shows that for a compute budget, the best performances are not achieved by the largest models but by smaller models trained on more data.
InstructGPT	Mar 2022	175	-	Training language models to follow instructions with human feedback
CodeGen	Mar 2022	0.350 - 16	CodeGen	CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
AlphaCode	Feb 2022	41	-	Competition-Level Code Generation with AlphaCode
MT-NLG	Jan 2022	530	-	Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
LaMDA	Jan 2022	137	-	LaMDA: Language Models for Dialog Applications
GLaM	Dec 2021	1200	-	GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Gopher	Dec 2021	280	-	Scaling Language Models: Methods, Analysis & Insights from Training Gopher
WebGPT	Dec 2021	175	-	WebGPT: Browser-assisted question-answering with human feedback
Yuan 1.0	Oct 2021	245	-	Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning
T0	Oct 2021	11	T0	Multitask Prompted Training Enables Zero-Shot Task Generalization
FLAN	Sep 2021	137	-	Finetuned Language Models Are Zero-Shot Learners
HyperCLOVA	Sep 2021	82	-	What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers
ERNIE 3.0 Titan	Dec 2021	260	-	ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
Jurassic-1	Aug 2021	178	-	Jurassic-1: Technical Details and Evaluation
ERNIE 3.0	July 2021	10	-	ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
Codex	July 2021	12	-	Evaluating Large Language Models Trained on Code
GPT-J-6B	June 2021	6	GPT-J-6B	A 6 billion parameter, autoregressive text generation model trained on The Pile.
CPM-2	Jun 2021	198	CPM	CPM-2: Large-scale Cost-effective Pre-trained Language Models
PanGu-α	April 2021	13	PanGu-α	PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
mT5	Oct 2020	13	mT5	mT5: A massively multilingual pre-trained text-to-text transformer
BART	Jul 2020	-	BART	Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
GShard	Jun 2020	600	-	GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
GPT-3	May 2020	175	-	Language Models are Few-Shot Learners
CTRL	Sep 2019	1.63	CTRL	CTRL: A Conditional Transformer Language Model for Controllable Generation
ALBERT	Sep 2019	0.235	ALBERT	A Lite BERT for Self-supervised Learning of Language Representations
XLNet	Jun 2019	-	XLNet	Generalized Autoregressive Pretraining for Language Understanding
T5	Oct 2019	0.06 - 11	Flan-T5	Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
GPT-2	Nov 2019	1.5	GPT-2	Language Models are Unsupervised Multitask Learners
RoBERTa	July 2019	0.125 - 0.355	RoBERTa	A Robustly Optimized BERT Pretraining Approach
BERT	Oct 2018	-	BERT	Bidirectional Encoder Representations from Transformers
GPT	June 2018	-	GPT	Improving Language Understanding by Generative Pre-Training
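
The Size (B) column mixes three notations: a single value, a comma-separated list of sizes, and a "low - high" range, with "-" marking an undisclosed size. The sketch below shows one way such a row could be read programmatically. It assumes the five tab-separated columns listed in the header row; the function names `parse_size` and `parse_row` are hypothetical helpers introduced only for illustration, not part of any library.

```python
from typing import Optional, Tuple


def parse_size(cell: str) -> Optional[Tuple[float, float]]:
    """Return (smallest, largest) size in billions of parameters.

    Returns None when the size is undisclosed (written "-" in the table).
    Assumption: ranges are written "low - high" and lists "a, b, c".
    """
    cell = cell.strip()
    if cell in ("", "-"):
        return None
    if " - " in cell:
        parts = cell.split(" - ")
    else:
        parts = cell.split(",")
    values = [float(p) for p in parts]  # float() tolerates surrounding spaces
    return min(values), max(values)


def parse_row(line: str) -> dict:
    """Split one tab-separated table row into named fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}")
    model, released, size, checkpoints, description = fields
    return {
        "model": model,
        "released": released,
        "size_b": parse_size(size),
        "checkpoints": None if checkpoints.strip() == "-" else checkpoints,
        "description": description,
    }


if __name__ == "__main__":
    row = "DLite\tMay 2023\t0.124 - 1.5\tDLite-v2-1.5B\tLightweight instruction following models."
    print(parse_row(row)["size_b"])
```

Returning a (smallest, largest) pair keeps single values, lists, and ranges comparable, for example when filtering the table for models that have a variant under a given parameter count.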

⚠️ This section is under development.

The data for this section is drawn from Papers with Code and from the recent work of Zhao et al. (2023).
