STATUS · ARCHIVED
VQA Viet
Vietnamese Visual Question Answering — deep-learning final project
A multi-modal model that answers natural-language questions about images, trained and evaluated on Vietnamese-language data.
- End-to-end pipeline: image encoder + language encoder fused into a joint representation for answer prediction.
- Adapted English-first VQA techniques to Vietnamese tokenization (many words span several space-separated syllables, so whitespace splitting does not give word tokens) and grammar.
- Final project for the Deep Learning course — full report and ablations included in the repo.
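The card does not say how the two encoders' outputs are combined. The sketch below assumes element-wise (Hadamard) product fusion followed by a linear classifier over a fixed answer vocabulary. This is a common VQA baseline design and is not necessarily what this project used. It is plain Python, the encoders are stubbed out as precomputed feature vectors, and every function name, dimension, weight and answer below is hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical sketch: real image/question encoders would produce much larger
# vectors; here they are stubbed as short, precomputed feature lists.


def fuse(image_feat: List[float], question_feat: List[float]) -> List[float]:
    """Element-wise (Hadamard) product fusion of two equal-length feature vectors."""
    if len(image_feat) != len(question_feat):
        raise ValueError("image and question features must have the same dimension")
    return [i * q for i, q in zip(image_feat, question_feat)]


def linear(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """One logit per answer: logits[k] = sum_j weights[k][j] * x[j] + bias[k]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: List[float]) -> List[float]:
    """Turn logits into probabilities; subtracting the max keeps exp() stable."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_answer(
    image_feat: List[float],
    question_feat: List[float],
    weights: List[List[float]],
    bias: List[float],
    answers: List[str],
) -> Tuple[str, List[float]]:
    """Fuse features, score every candidate answer, return the most probable one."""
    probs = softmax(linear(fuse(image_feat, question_feat), weights, bias))
    best = max(range(len(answers)), key=lambda k: probs[k])
    return answers[best], probs


# Hypothetical answer vocabulary: "có" (yes), "không" (no), "hai" (two).
answers = ["có", "không", "hai"]
image_feat = [1.0, 0.5, 2.0]
question_feat = [0.5, 2.0, 1.0]
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
bias = [0.0, 0.0, 0.0]

answer, probs = predict_answer(image_feat, question_feat, weights, bias, answers)
print(answer)  # → hai
```

Treating answering as classification over a closed answer set keeps the output head simple. The trade-off is that the model can never produce an answer outside that vocabulary.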
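The card does not describe how Vietnamese tokenization was handled. One standard approach to the multi-syllable-word problem is greedy forward maximum matching against a word lexicon, sketched below. It joins matched syllables with `_` (a common convention in Vietnamese NLP) so each word stays a single token. The lexicon, the `segment` function and `max_len=3` are illustrative assumptions, not the project's actual tokenizer. Input is NFC-normalized because the same Vietnamese diacritic can be encoded as a precomposed or a decomposed character.

```python
import unicodedata
from typing import List, Set

# Hypothetical toy lexicon of multi-syllable words; a real one would be large.
LEXICON = {"bao nhiêu", "học sinh", "bức ảnh"}


def segment(sentence: str, lexicon: Set[str], max_len: int = 3) -> List[str]:
    """Greedy forward maximum matching over space-separated syllables.

    At each position, take the longest run of up to max_len syllables found in
    the lexicon, falling back to a single syllable when nothing longer matches.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    # Normalize both sides so precomposed and decomposed diacritics compare equal.
    lex = {unicodedata.normalize("NFC", w.lower()) for w in lexicon}
    syllables = unicodedata.normalize("NFC", sentence.lower()).split()
    tokens: List[str] = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first; n == 1 always succeeds.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if n == 1 or candidate in lex:
                tokens.append(candidate.replace(" ", "_"))
                i += n
                break
    return tokens


# "How many students are in the picture" (question mark omitted for simplicity).
print(segment("có bao nhiêu học sinh trong bức ảnh", LEXICON))
# → ['có', 'bao_nhiêu', 'học_sinh', 'trong', 'bức_ảnh']
```

Greedy matching is fast and needs only a word list, but it can pick the wrong split when two lexicon words overlap. Statistical or neural segmenters handle those ambiguous cases better.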