Tin Rabzelj
Paper Notes
Notes on papers I read.
2025
September
Fast Inference from Transformers via Speculative Decoding (AI, Paper Notes)
Fast Transformer Decoding: One Write-Head is All You Need (AI, Paper Notes)
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints (AI, Paper Notes)
Ring Attention with Blockwise Transformers for Near-Infinite Context (AI, Paper Notes)
Effective Long-Context Scaling of Foundation Models (AI, Paper Notes)
YaRN: Efficient Context Window Extension of Large Language Models (AI, Paper Notes)
August
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention (AI, Paper Notes)
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (AI, Paper Notes)
Longformer: The Long-Document Transformer (AI, Paper Notes)
ReAct: Synergizing Reasoning and Acting in Language Models (AI, Paper Notes)
RoFormer: Enhanced Transformer with Rotary Position Embedding (AI, Paper Notes)
The Impact of Positional Encoding on Length Generalization in Transformers (AI, Paper Notes)
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation (AI, Paper Notes)
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (AI, Paper Notes)
Tree of Thoughts: Deliberate Problem Solving with Large Language Models (AI, Paper Notes)
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (AI, Paper Notes)
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (AI, Paper Notes)
Language Models are Few-Shot Learners (AI, Paper Notes)
Language Models are Unsupervised Multitask Learners (AI, Paper Notes)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (AI, Paper Notes)
Improving Language Understanding by Generative Pre-Training (AI, Paper Notes)
Attention Is All You Need (AI, Paper Notes, Python)
Neural Machine Translation by Jointly Learning to Align and Translate (AI, Paper Notes, Python)
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference (AI, Paper Notes)
Sequence to Sequence Learning with Neural Networks (AI, Paper Notes, Python)
July
GloVe: Global Vectors for Word Representation (AI, Paper Notes, Python)
Efficient Estimation of Word Representations in Vector Space (AI, Paper Notes, Python)