HanumaGPT

HanumaGPT is an enhanced nanoGPT architecture featuring:

  • Optimized key/query vector sizes
  • Sliding window attention
  • Register tokens
  • Advanced MLP layers
  • Alternative Softmax approximations

Together, these modifications make HanumaGPT a more memory- and compute-efficient variant of nanoGPT.

📌 GitHub Repository

🔗 HanumaGPT on GitHub

🔹 Features & Contributions

  • Optimized Key/Query Vector Sizes: Reduced the key/query head dimension to cut attention memory usage (see the attention sketch below).
  • Enhanced Attention Mechanisms: Restricted attention to a sliding window of recent context for better long-sequence processing (also in the attention sketch below).
  • Custom Softmax Approximations: Experimented with alternative Softmax formulations for improved efficiency (one candidate is sketched below).
  • Improved MLP Layers: Redesigned the feed-forward blocks, taking inspiration from LLaMA models, for better performance (see the MLP sketch below).
  • Register Tokens: Prepended learned register tokens to improve autoregressive decoding (see the final sketch below).
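
The reduced key/query dimension and the sliding attention window combine naturally in a single attention module. The following is a minimal PyTorch sketch under assumed names (`head_dim_kq`, `window`), not the repository's actual code:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlidingWindowAttention(nn.Module):
    """Causal self-attention with a reduced key/query head dimension and a
    fixed sliding window (illustrative sketch, not the repository's code)."""
    def __init__(self, n_embd: int = 768, n_head: int = 12,
                 head_dim_kq: int = 32, window: int = 256):
        super().__init__()
        self.n_head = n_head
        self.head_dim_v = n_embd // n_head    # values keep the full head width
        self.head_dim_kq = head_dim_kq        # smaller key/query heads save memory
        self.window = window
        self.q_proj = nn.Linear(n_embd, n_head * head_dim_kq, bias=False)
        self.k_proj = nn.Linear(n_embd, n_head * head_dim_kq, bias=False)
        self.v_proj = nn.Linear(n_embd, n_embd, bias=False)
        self.out_proj = nn.Linear(n_embd, n_embd, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q = self.q_proj(x).view(B, T, self.n_head, self.head_dim_kq).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_head, self.head_dim_kq).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.n_head, self.head_dim_v).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(self.head_dim_kq)
        # Causal mask limited to the last `window` tokens.
        idx = torch.arange(T, device=x.device)
        banned = (idx[None, :] > idx[:, None]) | (idx[:, None] - idx[None, :] >= self.window)
        att = att.masked_fill(banned, float('-inf'))
        att = F.softmax(att, dim=-1)
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.out_proj(y)
```

Shrinking only the key/query heads cuts the QKᵀ compute and the key-cache memory, while the value heads keep the full width so the residual stream loses no capacity.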
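
The section does not name which Softmax alternatives were tried. One commonly discussed variant, shown here purely as an example, is the "off-by-one" Softmax, which adds 1 to the denominator so an attention head can assign near-zero weight everywhere:

```python
import torch

def softmax_one(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Off-by-one softmax: exp(x_i) / (1 + sum_j exp(x_j)).
    # Shift by a clamped max for numerical stability; the algebra stays exact.
    m = x.max(dim=dim, keepdim=True).values.clamp(min=0)
    e = torch.exp(x - m)
    return e / (torch.exp(-m) + e.sum(dim=dim, keepdim=True))

# Drop-in replacement for F.softmax on the attention scores:
# att = softmax_one(att, dim=-1)
```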
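
LLaMA models use a SwiGLU-gated feed-forward block, so "inspired by LLaMA" most likely points in that direction; the sketch below assumes SwiGLU and an illustrative hidden width:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUMLP(nn.Module):
    """LLaMA-style gated MLP: down( SiLU(gate(x)) * up(x) )."""
    def __init__(self, n_embd: int = 768, hidden: int = 2048):
        super().__init__()
        self.gate = nn.Linear(n_embd, hidden, bias=False)
        self.up = nn.Linear(n_embd, hidden, bias=False)
        self.down = nn.Linear(hidden, n_embd, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise gating modulates features multiplicatively, a design
        # the LLaMA papers adopt in place of the plain GELU MLP.
        return self.down(F.silu(self.gate(x)) * self.up(x))
```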
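
Register tokens are typically learned embeddings prepended to the sequence: they participate in attention but are dropped from the output, giving the model scratch slots to park global information. A minimal sketch with hypothetical names:

```python
import torch
import torch.nn as nn

class RegisterTokens(nn.Module):
    """Learned register tokens, prepended before the transformer blocks and
    stripped afterwards (hypothetical helper, not the repository's API)."""
    def __init__(self, n_registers: int = 4, n_embd: int = 768):
        super().__init__()
        self.n_registers = n_registers
        self.registers = nn.Parameter(0.02 * torch.randn(1, n_registers, n_embd))

    def prepend(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, C) -> (B, n_registers + T, C)
        return torch.cat([self.registers.expand(x.size(0), -1, -1), x], dim=1)

    def strip(self, x: torch.Tensor) -> torch.Tensor:
        # Drop the register positions before the LM head.
        return x[:, self.n_registers:, :]
```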

🚀 Results & Impact

  • Improved computational efficiency over standard nanoGPT.
  • Reduced inference costs while maintaining performance.
  • Better handling of long-sequence generation, thanks to the sliding window attention.