Posts
2025
- [Note] Attention Optimization (January 15, 2025)
- [Note] Transformer (January 12, 2025)

2024

- MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs (December 31, 2024)
- [Note] Roofline Model (December 31, 2024)
- TwinPilots: A New Computing Paradigm for GPU-CPU Parallel LLM Inference (December 30, 2024)
- Hybrid Heterogeneous Clusters Can Lower the Energy Consumption of LLM Inference Workloads (December 29, 2024)
- HETEGAN: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices (December 28, 2024)
- FIDDLER: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models (December 27, 2024)
- Accelerating Distributed MoE Training and Inference with Lina (June 6, 2024)
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models (June 5, 2024)
- Shuffling, Fast and Slow: Scalable Analytics on Serverless Infrastructure (January 27, 2024)
- [PCPP Note] Topic_7 Lock Free Data Structures (January 3, 2024)
- [PCPP Note] Topic_6 Performance and Scalability (January 3, 2024)

2023

- [PCPP Note] Topic_5 Performance Measurements (December 22, 2023)
- [PCPP Note] Topic_4 Testing (December 22, 2023)
- [PCPP Note] Topic_3 Shared Memory II (December 20, 2023)
- [PCPP Note] Topic_2 Shared Memory I (December 17, 2023)
- [PCPP Note] Topic_1 Intro to Concurrency and the Mutual Exclusion Problem (December 17, 2023)
- FalconDB: Blockchain-based Collaborative Database (October 25, 2023)
- Last_test (September 9, 2023)
- My First Post (September 9, 2023)