We research and develop linguistically grounded optimization techniques for Large Language Models, focusing on how ancient linguistic structures can solve modern computational challenges.
Our flagship project explores using Arabic morphological structure as an intermediate representation layer for LLMs.
Current tokenizers fragment text inefficiently, especially for morphologically rich languages like Arabic, creating a "Token Tax": each word is split into several subword pieces, inflating sequence length and compute cost.
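To make the Token Tax concrete, here is a minimal sketch that counts subword tokens per word with an off-the-shelf BPE tokenizer. The model name (`gpt2`) and the sample sentence are illustrative choices, not project artifacts.

```python
# Minimal sketch: measure how many subword tokens a standard BPE tokenizer
# spends per Arabic word. "gpt2" is only an example model choice.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# "The writer wrote a book in the library"
sentence = "كتب الكاتب كتابا في المكتبة"
words = sentence.split()
tokens = tokenizer.tokenize(sentence)

print(f"words:           {len(words)}")
print(f"tokens:          {len(tokens)}")
print(f"tokens per word: {len(tokens) / len(words):.2f}")
```

Running this on morphologically related words shows each surface form paying the tax separately, which is exactly the redundancy a shared root layer is meant to remove.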
Arabic's 1,400-year-old root system offers a mathematical framework for semantic compression:
```
ك-ت-ب (k-t-b) = "writing"
│
├─ كَتَبَ    wrote
├─ كِتَاب   book
├─ كَاتِب   writer
├─ مَكْتُوب  written
└─ مَكْتَبَة  library
```

One root → Many meanings
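As a hedged illustration of how this structure could serve as an intermediate representation, the sketch below maps a handful of surface forms onto a shared root plus a pattern. The lexicon, the pattern labels, and the function name are hypothetical placeholders, not a released component; a real system would need full morphological analysis.

```python
# Toy lexicon mirroring the k-t-b example above: surface form -> (root, pattern).
LEXICON = {
    "كتب":   ("ك-ت-ب", "CaCaCa"),   # kataba  - wrote
    "كتاب":  ("ك-ت-ب", "CiCaaC"),   # kitaab  - book
    "كاتب":  ("ك-ت-ب", "CaaCiC"),   # kaatib  - writer
    "مكتوب": ("ك-ت-ب", "maCCuuC"),  # maktuub - written
    "مكتبة": ("ك-ت-ب", "maCCaCa"),  # maktaba - library
}

def to_root_representation(words):
    """Replace each known surface form with a shared root ID plus a pattern ID,
    so semantically related words reuse the same root representation."""
    encoded = []
    for w in words:
        if w in LEXICON:
            root, pattern = LEXICON[w]
            encoded.append((root, pattern))
        else:
            encoded.append((w, None))  # fall back to the raw surface form
    return encoded

print(to_root_representation(["كاتب", "كتب", "كتاب"]))
# All three words share the root ك-ت-ب; only the pattern differs.
```

Because all five words collapse onto the single root ك-ت-ب, a model operating on this representation could share one root embedding across them instead of learning each fragment sequence independently.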
Expected impact: shorter token sequences for the same text, longer effective context windows, and lower training and inference cost.
We're working on releasing:
| Type | Description | Status |
|---|---|---|
| 🤖 Models | Root-compressed LLM variants | 🔬 In Research |
| 📊 Datasets | Arabic root-to-concept mappings | 📋 Planned |
| 🚀 Spaces | Interactive compression demos | 📋 Planned |
We're an open research initiative seeking collaborators.
Making AI more efficient through linguistic insight
Open Research • Open Source • Open Collaboration