ghastlyDB
a lightweight vector database built from first principles
this is *not* a production grade database! it's just something i wanted to build because i was curious.
multiple embeddings
text-embedding-3-small, colBERT-ir/v2, NVEmbed/v2
LSM tree storage
the coolest data structure i've seen
containerized
because managing dependencies is hard
why did i build this? 🤔
i had to do a case study / vendor analysis for a vector DB migration task during an internship. that got me interested in how these things work under the hood, and how vector arithmetic actually works.
i decided the best way to learn was to build one from scratch. ghastlyDB uses a Log-Structured Merge (LSM) tree architecture (like RocksDB or ScyllaDB), with a memory-mapped memtable for writes and persistent SSTables on disk. the actual storage is powered by a skiplist index built from the ground up. vector similarity search takes a simple kNN approach for now (HNSW coming soon, hopefully).
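to make the LSM idea above a bit more concrete, here's a minimal sketch of the write and read path. this is not ghastlyDB's actual code: `ToyLSM`, `memtable_limit` and `TOMBSTONE` are hypothetical names, a plain dict stands in for the skiplist, and the "SSTables" are just sorted lists kept in memory instead of files on disk.

```python
from bisect import bisect_left

TOMBSTONE = object()  # hypothetical sentinel that marks a deleted key


class ToyLSM:
    """Toy LSM store: one mutable memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}    # newest writes (a dict stands in for the skiplist)
        self.sstables = []    # list of (keys, values) runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def delete(self, key):
        # a delete is just another write: record a tombstone
        self.put(key, TOMBSTONE)

    def _flush(self):
        # freeze the memtable into a sorted, immutable run
        keys = sorted(self.memtable)
        values = [self.memtable[k] for k in keys]
        self.sstables.append((keys, values))
        self.memtable = {}

    def get(self, key):
        # the memtable holds the newest data, so check it first
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        # then search the runs from newest to oldest
        for keys, values in reversed(self.sstables):
            i = bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return None if values[i] is TOMBSTONE else values[i]
        return None


db = ToyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)      # memtable is full, so it gets flushed to a run
db.put("a", 3)      # newer value shadows the one in the older run
db.delete("b")      # the tombstone hides the older value of "b"
print(db.get("a"), db.get("b"))
```

the newest-first lookup order is what lets a newer value or a tombstone shadow older data without rewriting it. a real implementation would also need compaction to merge runs and drop tombstones.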
right now, it's super basic - you can only do puts, gets, deletes, and semantic search. but i'm actively working on adding more features like proper indexing, compaction, write-ahead logs and maybe fully implementing the Redis API! also planning to add more embedding models and distance metrics. if you're curious about any of this stuff, the code is open source! 📚
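for a rough idea of what the semantic search boils down to with a simple kNN approach, here's a small sketch that scores every stored vector against the query and keeps the top k. the function names are hypothetical, and cosine similarity is an assumption on my part: this README doesn't say which distance metric ghastlyDB uses.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def knn_search(query, vectors, k=3):
    """Return the k (key, score) pairs most similar to query, best first.

    `vectors` is a hypothetical in-memory mapping of key -> embedding.
    Every vector is scored, so the cost grows linearly with the data.
    """
    scored = ((cosine_similarity(query, vec), key) for key, vec in vectors.items())
    top = heapq.nlargest(k, scored)
    return [(key, score) for score, key in top]


# toy 2-d "embeddings", made up for illustration
vectors = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
}
for key, score in knn_search([1.0, 0.0], vectors, k=2):
    print(key, round(score, 3))
```

scanning everything is exact but slow on large collections, which is the gap an approximate index like HNSW is meant to close.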
ps: this is not production ready yet! it's just a fun learning project. if you need a real vector db, check out weaviate or milvus! 🚀