ghastlyDB

a lightweight vector database built from first principles

this is *not* a production grade database! it's just something i wanted to build because i was curious.

multiple embeddings

text-embedding-3-small, colBERT-ir/v2, NVEmbed/v2

LSM tree storage

the coolest data structure i've seen

containerized

because managing dependencies is hard

why did i build this? 🤔

i had to perform a case study / vendor analysis for a vector DB migration task during an internship. that got me interested in how these things work under the hood, how vector arithmetic works etc.

i decided the best way to learn was to build one from scratch. ghastlyDB uses a Log-Structured Merge (LSM) tree architecture (like RocksDB or ScyllaDB), with a memory-mapped memtable for writes and persistent SSTables on disk. the actual storage is powered by a skiplist index built from the ground up, and the vector similarity search takes a simple kNN approach for now (HNSW coming soon, hopefully).
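to make the LSM idea a bit more concrete, here's a tiny sketch of the write path: writes land in a memtable, a full memtable gets frozen into a sorted immutable run, and reads check the newest data first. this is *not* ghastlyDB's actual code. `TinyLSM`, `memtable_limit` and `TOMBSTONE` are names made up for illustration, the memtable is a plain dict instead of a skiplist, and the "SSTables" are just sorted lists kept in memory rather than files on disk.

```python
import bisect

# Sentinel written in place of a value to mark a key as deleted (assumption:
# a common LSM convention, not necessarily what ghastlyDB does).
TOMBSTONE = object()


class TinyLSM:
    """Toy LSM store: a dict memtable plus a list of immutable sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}            # newest writes; a real one would be a skiplist
        self.sstables = []            # oldest first; each entry is (keys, values)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # Deletes are just writes of a tombstone, so they shadow older values.
        self.put(key, TOMBSTONE)

    def flush(self):
        """Freeze the memtable into a sorted run (stand-in for an on-disk SSTable)."""
        if not self.memtable:
            return
        keys = sorted(self.memtable)
        values = [self.memtable[k] for k in keys]
        self.sstables.append((keys, values))
        self.memtable = {}

    def get(self, key):
        # 1. The memtable always holds the newest version of a key.
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        # 2. Otherwise search runs from newest to oldest with binary search.
        for keys, values in reversed(self.sstables):
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return None if values[i] is TOMBSTONE else values[i]
        return None


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)      # memtable is full, so it is flushed to a sorted run
db.put("a", 3)      # newer value shadows the flushed one
db.delete("b")      # tombstone shadows the flushed value of "b"
print(db.get("a"), db.get("b"))
```

without compaction the list of runs just keeps growing and old values and tombstones never get cleaned up, which is why compaction matters for a real LSM store.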

right now, it's super basic - you can only do puts, gets, deletes, and semantic search. but i'm actively working on adding more features like proper indexing, compaction, write-ahead logs and maybe fully implementing the Redis API! also planning to add more embedding models and distance metrics. if you're curious about any of this stuff, the code is open source! 📚
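for the curious, a simple kNN semantic search boils down to "score every stored vector against the query and keep the best k". here's a minimal sketch of that idea, not the real implementation: `knn_search` and `cosine_similarity` are hypothetical names, cosine similarity is an assumed metric (this readme doesn't say which one ghastlyDB uses), and the 2-d vectors are toy stand-ins for real embeddings.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_search(query, vectors, k=3):
    """Brute-force kNN: score every stored vector, return the top k as (key, score)."""
    scored = ((cosine_similarity(query, vec), key) for key, vec in vectors.items())
    top = heapq.nlargest(k, scored)   # highest similarity first
    return [(key, score) for score, key in top]


# Toy "embeddings"; a real setup would get these from an embedding model.
store = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
}
for key, score in knn_search([1.0, 0.0], store, k=2):
    print(key, round(score, 3))
```

this scans every vector on every query, so the cost grows linearly with the number of stored vectors. that's fine for a learning project, and it's the reason approximate indexes like HNSW exist.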

ps: this is not production ready yet! it's just a fun learning project. if you need a real vector db, check out weaviate or milvus! 🚀

GhastlyDB Console

Welcome to GhastlyDB CLI v0.1.0
Type 'help' to see available commands