STATUS · ARCHIVED

VQA Viet

Vietnamese Visual Question Answering — deep-learning final project

▸ Summary

A multi-modal model that answers natural-language questions about images, trained and evaluated on Vietnamese-language data.

▸ Highlights

End-to-end pipeline: image encoder + language encoder fused into a joint representation for answer prediction.
Adapted English-first VQA techniques to Vietnamese tokenization and grammar quirks.
Final project for the Deep Learning course — full report and ablations included in the repo.
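
The encoder-fusion-prediction flow in the first highlight can be sketched roughly as follows. This is a minimal illustration and not the project's actual architecture: the fusion operator (an element-wise product of projected features), the layer sizes, the class name `FusionVQA`, and the random feature vectors standing in for real image and question encoder outputs are all assumptions made for the example.

```python
import math
import random


def linear(weights, bias, x):
    """Apply y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, out_dim, in_dim):
    """Small random weights and zero bias (illustrative, untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


class FusionVQA:
    """Hypothetical two-branch model: project each modality into a shared
    space, fuse by element-wise product, then classify over a fixed
    answer vocabulary."""

    def __init__(self, image_dim, question_dim, joint_dim, num_answers, seed=0):
        rng = random.Random(seed)
        self.img_proj = init_layer(rng, joint_dim, image_dim)
        self.txt_proj = init_layer(rng, joint_dim, question_dim)
        self.classifier = init_layer(rng, num_answers, joint_dim)

    def forward(self, image_feat, question_feat):
        # Project both modalities to the same dimensionality.
        img = [math.tanh(v) for v in linear(*self.img_proj, image_feat)]
        txt = [math.tanh(v) for v in linear(*self.txt_proj, question_feat)]
        # Joint representation: element-wise product (assumed fusion choice).
        joint = [a * b for a, b in zip(img, txt)]
        # Probability distribution over candidate answers.
        return softmax(linear(*self.classifier, joint))


# Stand-ins for encoder outputs; a real pipeline would use a vision
# backbone for the image and a Vietnamese-aware text encoder for the question.
rng = random.Random(1)
image_feat = [rng.random() for _ in range(8)]
question_feat = [rng.random() for _ in range(6)]

model = FusionVQA(image_dim=8, question_dim=6, joint_dim=4, num_answers=3)
probs = model.forward(image_feat, question_feat)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(f"predicted answer index: {predicted}")
```

Treating answer prediction as classification over a fixed answer set is one common VQA formulation; whether this project used classification or generation is documented in the repo's report rather than here.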