CaLoRAify: Calorie Estimation with Visual-Text Pairing and LoRA-Driven Visual Language Models

Abstract

Obesity, sometimes dubbed "the heavy issue," is a leading cause of preventable chronic diseases worldwide. Traditional calorie estimation tools often rely on specific data formats or complex pipelines, limiting their practicality in real-world scenarios. Recently, vision-language models (VLMs) have excelled at understanding real-world contexts and enabling conversational interaction, making them well suited to downstream tasks such as ingredient analysis. However, applying VLMs to calorie estimation requires domain-specific data and alignment strategies. To this end, we curated CalData, a 330K image-text pair dataset tailored for ingredient recognition and calorie estimation, combining a large-scale recipe dataset with detailed nutritional instructions for robust vision-language training. Building on this dataset, we present CaLoRAify, a novel VLM framework that aligns ingredient recognition with calorie estimation through training on visual-text pairs. At inference time, users need only a single monocular food image to estimate calories, while retaining the flexibility of agent-based conversational interaction. Using Low-Rank Adaptation (LoRA) and Retrieval-Augmented Generation (RAG), our system enhances the performance of foundation VLMs in the vertical domain of calorie estimation.
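As a rough illustration of the LoRA component, the sketch below shows how low-rank adapters might be attached to a generic vision-language backbone using Hugging Face's peft library. The checkpoint name, target modules, and hyperparameters here are illustrative assumptions for exposition, not the configuration used in CaLoRAify.

```python
# Hypothetical sketch: attaching LoRA adapters to a vision-language model
# with Hugging Face's peft library. The base model and all hyperparameters
# below are illustrative assumptions, not the paper's actual setup.
from transformers import AutoModelForVision2Seq, AutoProcessor
from peft import LoraConfig, get_peft_model

base_model = AutoModelForVision2Seq.from_pretrained("llava-hf/llava-1.5-7b-hf")
processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")

lora_config = LoraConfig(
    r=8,                                  # low-rank dimension of the adapters
    lora_alpha=16,                        # scaling factor for adapter updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Wrap the backbone so that only the small LoRA adapter weights are trained,
# while the original VLM parameters stay frozen.
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```

Because only the low-rank adapter weights are updated, this style of fine-tuning adapts a foundation VLM to the calorie-estimation domain at a small fraction of the cost of full fine-tuning; the RAG component would then ground the model's answers in a retrieved nutritional knowledge base.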

Date: Jan 8, 2025, 12:00–1:00 PM
Event: EMIL Spring'25 Seminars
Location: Online (Zoom)
Saman Khamesian
Graduate Research Associate

I am a Ph.D. researcher at Arizona State University, specializing in Artificial Intelligence with a focus on health applications. As a Graduate Research Assistant in the EMIL Lab under Dr. Hassan Ghasemzadeh, I work on developing advanced machine learning solutions for Type 1 diabetes management, including personalized glucose forecasting and automated insulin delivery systems.