HanumaGPT
HanumaGPT is an enhanced nanoGPT architecture featuring:
- Optimized key/query vector sizes
- Sliding window attention
- Register tokens
- Advanced MLP layers
- Alternative Softmax approximations
Together, these modifications make HanumaGPT a faster, more memory-efficient variant of nanoGPT.
📌 GitHub Repository
🔗 HanumaGPT on GitHub

🔹 Features & Contributions
- Efficient Memory Optimization: Reduced key/query vector sizes shrink the attention projections and memory footprint (see the first sketch after this list).
- Enhanced Attention Mechanisms: Sliding window attention keeps long-sequence processing affordable (second sketch below).
- Custom Softmax Approximations: Experiments with alternative Softmax substitutes for cheaper attention weights (third sketch below).
- Improved MLP Layers: A feed-forward block inspired by the LLaMA models, improving performance (fourth sketch below).
- Register Tokens: Learnable tokens prepended to the sequence to improve autoregressive decoding (fifth sketch below).
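The sketches below illustrate each feature in PyTorch, the framework nanoGPT is written in. All class and parameter names (e.g. `SlimKQAttention`, `kq_dim`) are illustrative assumptions, not the repository's actual identifiers.

First, reduced key/query vector sizes: queries and keys are projected to a narrower per-head width than values, shrinking the Q/K projections (and the key half of any KV cache) while values keep the full head dimension.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlimKQAttention(nn.Module):
    """Causal self-attention with key/query vectors narrower than values.

    `kq_dim` is a hypothetical knob: shrinking it cuts the size of the
    Q/K projections while the value path keeps the full head width.
    """
    def __init__(self, n_embd=768, n_head=12, kq_dim=32):
        super().__init__()
        self.n_head = n_head
        self.kq_dim = kq_dim                  # per-head Q/K width (< head_dim)
        self.head_dim = n_embd // n_head      # per-head V width
        self.q_proj = nn.Linear(n_embd, n_head * kq_dim, bias=False)
        self.k_proj = nn.Linear(n_embd, n_head * kq_dim, bias=False)
        self.v_proj = nn.Linear(n_embd, n_embd, bias=False)
        self.out_proj = nn.Linear(n_embd, n_embd, bias=False)

    def forward(self, x):
        B, T, C = x.shape
        q = self.q_proj(x).view(B, T, self.n_head, self.kq_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_head, self.kq_dim).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.n_head, self.head_dim).transpose(1, 2)
        # SDPA allows Q/K and V to differ in their last dimension.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out_proj(y.transpose(1, 2).reshape(B, T, C))
```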
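Next, a minimal sketch of sliding window attention, assuming a dense banded mask. This is fine for moderate sequence lengths; a production kernel would avoid materializing the full T×T mask.

```python
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window=256):
    """Causal attention restricted to the most recent `window` positions.

    q, k, v: (batch, heads, seq, head_dim). The boolean band lets query
    position i attend only to keys in [max(0, i - window + 1), i].
    """
    T = q.size(-2)
    i = torch.arange(T, device=q.device).unsqueeze(1)  # query positions
    j = torch.arange(T, device=q.device).unsqueeze(0)  # key positions
    band = (j <= i) & (j > i - window)                 # causal band mask
    return F.scaled_dot_product_attention(q, k, v, attn_mask=band)
```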
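The README does not say which Softmax approximation the experiments settled on, so the sketch below shows just one illustrative candidate: squared-ReLU attention weights, row-normalized and swapped in for the exponential.

```python
import torch

def relu2_attention_weights(scores):
    """One possible Softmax substitute: squared ReLU, row-normalized.

    `scores` are pre-normalization attention logits; masked positions
    set to -inf still end up with zero weight, since relu(-inf) == 0.
    """
    w = torch.relu(scores).square()
    return w / w.sum(dim=-1, keepdim=True).clamp_min(1e-6)
```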
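For the LLaMA-inspired MLP, a plausible reading (an assumption, since the exact block is not described here) is a SwiGLU feed-forward layer, which is what LLaMA itself uses in place of nanoGPT's GELU MLP. The hidden width follows LLaMA's 2/3 · 4 · n_embd convention.

```python
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUMLP(nn.Module):
    """LLaMA-style feed-forward block: a SiLU-gated linear unit."""
    def __init__(self, n_embd=768):
        super().__init__()
        hidden = int(2 * 4 * n_embd / 3)           # LLaMA's width rule
        self.gate = nn.Linear(n_embd, hidden, bias=False)
        self.up   = nn.Linear(n_embd, hidden, bias=False)
        self.down = nn.Linear(hidden, n_embd, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))
```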
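Finally, a minimal sketch of register tokens, assuming they are learnable embeddings prepended to the sequence so that, under a causal mask, every real position can attend to them as scratch slots; they would be stripped again before the language-model head.

```python
import torch
import torch.nn as nn

class RegisterTokens(nn.Module):
    """Prepends `n_registers` learnable tokens to the embedded sequence."""
    def __init__(self, n_embd=768, n_registers=4):
        super().__init__()
        self.registers = nn.Parameter(torch.randn(1, n_registers, n_embd) * 0.02)

    def forward(self, x):                          # x: (B, T, n_embd)
        B = x.size(0)
        regs = self.registers.expand(B, -1, -1)    # share weights across batch
        return torch.cat([regs, x], dim=1)         # (B, n_registers + T, n_embd)
```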
🚀 Results & Impact
- Improved computational efficiency over standard nanoGPT.
- Reduced inference costs while maintaining performance.
- Better handling of long-sequence generation with enhanced attention mechanisms.