Tin Rabzelj
Blog
These are my notes and thoughts, jotted down for future reference. They may be outdated, inaccurate, or completely useless.
- Fast Inference from Transformers via Speculative Decoding (AI, Paper Notes)
- Fast Transformer Decoding: One Write-Head is All You Need (AI, Paper Notes)
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints (AI, Paper Notes)
- Ring Attention with Blockwise Transformers for Near-Infinite Context (AI, Paper Notes)
- Effective Long-Context Scaling of Foundation Models (AI, Paper Notes)
- YaRN: Efficient Context Window Extension of Large Language Models (AI, Paper Notes)
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention (AI, Paper Notes)
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (AI, Paper Notes)
- Longformer: The Long-Document Transformer (AI, Paper Notes)
- ReAct: Synergizing Reasoning and Acting in Language Models (AI, Paper Notes)
- RoFormer: Enhanced Transformer with Rotary Position Embedding (AI, Paper Notes)
- The Impact of Positional Encoding on Length Generalization in Transformers (AI, Paper Notes)
- Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation (AI, Paper Notes)
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (AI, Paper Notes)
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models (AI, Paper Notes)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (AI, Paper Notes)
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (AI, Paper Notes)
- Language Models are Few-Shot Learners (AI, Paper Notes)
- Language Models are Unsupervised Multitask Learners (AI, Paper Notes)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (AI, Paper Notes)
- Improving Language Understanding by Generative Pre-Training (AI, Paper Notes)
- Attention Is All You Need (AI, Paper Notes, Python)
- Neural Machine Translation by Jointly Learning to Align and Translate (AI, Paper Notes, Python)
- Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference (AI, Paper Notes)
- Sequence to Sequence Learning with Neural Networks (AI, Paper Notes, Python)
- GloVe: Global Vectors for Word Representation (AI, Paper Notes, Python)
- Efficient Estimation of Word Representations in Vector Space (AI, Paper Notes, Python)