STATUS · ARCHIVED

VQA Viet

Vietnamese Visual Question Answering — deep-learning final project

▸ Summary

A multi-modal model that answers natural-language questions about images, trained and evaluated on Vietnamese-language data.

▸ Highlights
  • End-to-end pipeline: image encoder + language encoder fused into a joint representation for answer prediction.
  • Adapted English-first VQA techniques to Vietnamese tokenization and grammar quirks.
  • Final project for the Deep Learning course — full report and ablations included in the repo.
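The encoder-fusion pipeline in the highlights can be pictured with a minimal, dependency-free sketch. It assumes both encoders have already produced feature vectors of the same length. It fuses them by element-wise product after L2 normalization, a common parameter-free baseline choice, and it treats answering as classification over a fixed answer list. The fusion method the project actually used isn't stated here. All function names, the toy weights and the three-answer vocabulary are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a feature vector to unit length (zero vectors come back as zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def fuse(image_feat: Sequence[float], question_feat: Sequence[float]) -> Vector:
    """Joint representation: element-wise product of the two normalized features.

    Assumption: both encoders already project into the same dimension.
    """
    if len(image_feat) != len(question_feat):
        raise ValueError("image and question features must have the same length")
    img, q = l2_normalize(image_feat), l2_normalize(question_feat)
    return [a * b for a, b in zip(img, q)]


def linear(weights: Sequence[Sequence[float]], bias: Sequence[float], x: Sequence[float]) -> Vector:
    """Compute W x + b, with one row of W per candidate answer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_answer(image_feat, question_feat, weights, bias, answers) -> Tuple[str, float]:
    """Fuse both modalities, score every candidate answer, return the best one and its probability."""
    probs = softmax(linear(weights, bias, fuse(image_feat, question_feat)))
    best = max(range(len(answers)), key=lambda i: probs[i])
    return answers[best], probs[best]


# Toy usage with made-up 2-d features and a hypothetical 3-answer vocabulary.
answers = ["có", "không", "hai"]  # "yes", "no", "two"
weights = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0, 0.0]
print(predict_answer([3.0, 4.0], [1.0, 0.0], weights, bias, answers))
```

The element-wise product is used here only because it is the simplest fusion that adds no parameters. Concatenation followed by an MLP, bilinear pooling and attention-based fusion are common alternatives.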
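The Vietnamese tokenization point in the highlights comes from how the language is written. Spaces separate syllables, so a single word such as "học sinh" (student) spans two whitespace tokens. One simple, widely taught way to recover word boundaries is greedy longest matching against a lexicon. The sketch below assumes that approach; the project's actual tokenizer isn't described here. The mini-lexicon, the underscore-joining convention and all function names are hypothetical. The NFC step matters because Vietnamese diacritics can be stored either precomposed or as combining marks.

```python
import unicodedata
from typing import Iterable, List, Set, Tuple


def normalize(text: str) -> str:
    """NFC-normalize and lowercase, so precomposed and decomposed diacritics compare equal."""
    return unicodedata.normalize("NFC", text).lower()


def build_lexicon(words: Iterable[str]) -> Tuple[Set[Tuple[str, ...]], int]:
    """Store each word as a tuple of syllables; also return the longest entry length."""
    lexicon = {tuple(normalize(w).split()) for w in words}
    max_len = max((len(entry) for entry in lexicon), default=1)
    return lexicon, max_len


def segment(sentence: str, lexicon: Set[Tuple[str, ...]], max_len: int) -> List[str]:
    """Greedy longest matching: at each position take the longest lexicon entry, else one syllable."""
    syllables = normalize(sentence).split()
    tokens: List[str] = []
    i = 0
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            span = tuple(syllables[i:i + n])
            if n == 1 or span in lexicon:
                tokens.append("_".join(span))  # join the syllables of one word with "_"
                i += n
                break
    return tokens


# Hypothetical mini-lexicon; a real lexicon would be far larger.
lexicon, max_len = build_lexicon(["học sinh", "xe đạp", "bao nhiêu"])
print(segment("Có bao nhiêu học sinh đang đi xe đạp", lexicon, max_len))
# ['có', 'bao_nhiêu', 'học_sinh', 'đang', 'đi', 'xe_đạp']
```

Greedy matching is fast but can pick a wrong split when two dictionary words overlap. Statistical or neural segmenters trained on annotated Vietnamese text handle those cases better.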