IIIT Hyderabad’s BharatGen Team Launches Patram, India’s First Vision-Language AI Model for Documents

The BharatGen team from IIIT Hyderabad has launched Patram, India’s first vision-language foundational model designed specifically for document understanding.
BharatGen, a government-supported initiative focused on developing India-centric multimodal large language models, has achieved a major milestone with the launch of Patram-7B-Instruct, India's first vision-language foundational model built from the ground up for complex document understanding. The model has been developed by a team representing BharatGen from the International Institute of Information Technology, Hyderabad (IIIT-H) and the Indian Institute of Technology, Bombay (IIT-B).
Patram by BharatGen
Patram is part of the BharatGen suite of multimodal large language models being developed with funding support from the Department of Science and Technology (DST). Patram-7B-Instruct is a 7-billion-parameter vision-language AI model trained on a large and diverse corpus of Indian documents. Designed to analyze and understand scanned or photographed documents, the model can follow natural-language instructions and answer questions about their content. It is now freely available as an open-source release on Hugging Face and on the AIKosh platform of MeitY's IndiaAI initiative.
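As a rough illustration of what an open-source release on Hugging Face typically allows, the sketch below shows how such a vision-language model might be loaded and queried about a document image using the transformers library. The repository id, model/processor classes, and generation settings shown here are assumptions for illustration, not details confirmed by the announcement.

# Illustrative sketch only: the repository id and class choices are assumptions.
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

repo_id = "bharatgenai/Patram-7B-Instruct"  # hypothetical repository id

# Load the processor and model; trust_remote_code is often needed for custom architectures.
processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(repo_id, trust_remote_code=True)

# Ask a natural-language question about a scanned or photographed document.
image = Image.open("sample_document.jpg")
prompt = "What is the date mentioned in this document?"

inputs = processor(images=image, text=prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])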
Developed in just five months, Patram was created by a team based at IIIT Hyderabad comprising alumni engineers and student interns, with institutional support from IIIT-H and TiH-IoT at IIT Bombay. The project was led by Dr. Ravi Kiran Sarvadevabhatla, Associate Professor at IIIT Hyderabad, and Dr. Ganesh Ramakrishnan, Professor at IIT Bombay.
Despite its compact size, Patram outperforms several larger international models, including DeepSeek-VL-2, on key benchmarks such as DocVQA and VisualMRC. It also delivers strong results on Patram-Bench, a custom benchmark designed to reflect real-world Indian document scenarios.
Prof. P. J. Narayanan, Director, IIIT Hyderabad, said, "Patram marks a significant step as India designs state-of-the-art foundational models. With this launch, we integrate language available in all forms: as text, as speech, and as images. This can power multimodal applications with integrated vision-language intelligence."
Dr. Ravi Kiran Sarvadevabhatla, Associate Professor at IIIT-Hyderabad and lead researcher on the project, said, “With Patram, we’ve built a model that understands the unique structure and diversity of Indian documents. This is just the beginning of what India can achieve in vision-language AI.”