In this section, we provide a list of tutorials for writing various distributed operations with Triton-distributed. We recommend first reading the technical report, which covers the design and implementation details, and then working through these tutorials.
- [Primitives]: Basic notify and wait operations
- [Primitives & Communication]: Use copy engine and NVSHMEM primitives for AllGather
- [Communication]: Inter-node AllGather
- [Communication]: Intra-node and Inter-node DeepSeek EP AllToAll
- [Communication]: Intra-node ReduceScatter
- [Communication]: Inter-node ReduceScatter
- [Overlapping]: AllGather GEMM overlapping
- [Overlapping]: GEMM ReduceScatter overlapping
- [Overlapping]: AllGather GEMM overlapping on AMD
- [Overlapping]: GEMM ReduceScatter overlapping on AMD