Hybrid Attention Model Using Feature Decomposition and Knowledge Distillation for Blood Glucose Forecasting

Ebrahim Farahmand, Shovito Barua Soumma, Nooshin Taheri Chatrudi, Hassan Ghasemzadeh

April 2026

Abstract

The availability of continuous glucose monitors (CGMs) as over-the-counter commodities has created a unique opportunity to monitor a person’s blood glucose levels, forecast blood glucose trajectories, and provide automated interventions to prevent devastating chronic complications that arise from poor glucose control. However, forecasting blood glucose levels (BGL) is challenging because blood glucose changes continuously in response to food intake, medication intake, physical activity, sleep, and stress. It is particularly difficult to predict BGL accurately from multimodal, irregularly sampled mobile sensor data and over long prediction horizons. Furthermore, these forecasting models need to operate in real time on edge devices to provide in-the-moment interventions. To address these challenges, we propose GlucoNet, an AI model that forecasts blood glucose patterns using sensor data about behavioral and physiological health. GlucoNet is a feature decomposition-based lightweight transformer model that incorporates patients’ behavioral and physiological data (e.g., blood glucose, diet, medication). It transforms sparse and irregular patient data (e.g., diet and medication intake data) into continuous features using a mathematical model, facilitating better integration with the BGL signals. Given the non-linear and non-stationary nature of blood glucose signals, we propose a decomposition method that extracts both low-frequency (long-term) and high-frequency (short-term) components from the BGL signals, enabling the model to capture complex glucose dynamics for accurate forecasting. To reduce the computational complexity of transformer-based predictions, we employ knowledge distillation (KD) to compress the transformer model.
Our comprehensive analysis on two real-world T1D cohorts demonstrates that GlucoNet achieves a 35% improvement in RMSE, a 33% improvement in MAE, and a 62% reduction in the number of parameters over state-of-the-art models such as PatchTST on the OhioT1DM dataset (12 patients). Additional experiments on the AZT1D dataset (25 patients), together with extensive ablation and robustness analyses, further demonstrate its generalizability and stability. These results underscore GlucoNet’s potential as a compact and reliable tool for real-world diabetes prevention and management.
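
The three ideas named in the abstract (turning sparse events into continuous features, splitting the BGL signal into low- and high-frequency parts, and distilling a teacher into a smaller student) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the exponential-decay kernel, the moving-average decomposition, the MSE-based distillation loss, and all function and parameter names are hypothetical stand-ins, not the methods or code used in GlucoNet.

```python
import math

def events_to_continuous(events, n_steps, decay=0.2):
    """Spread sparse (step, amount) events over time.

    Assumption: an exponential-decay kernel stands in for the paper's
    unspecified mathematical model of diet/medication absorption.
    """
    signal = [0.0] * n_steps
    for step, amount in events:
        for t in range(step, n_steps):
            signal[t] += amount * math.exp(-decay * (t - step))
    return signal

def decompose(series, window=5):
    """Split a series into a low-frequency trend and a high-frequency residual.

    Assumption: a centered moving average is used as a simple placeholder
    decomposition; by construction low[i] + high[i] == series[i].
    """
    half = window // 2
    low = []
    for i in range(len(series)):
        start, stop = max(0, i - half), min(len(series), i + half + 1)
        low.append(sum(series[start:stop]) / (stop - start))
    high = [x - l for x, l in zip(series, low)]
    return low, high

def distillation_loss(student, teacher, target, alpha=0.5):
    """Blend the student's error against ground truth with its error
    against the teacher's predictions (a common regression-KD form)."""
    n = len(target)
    mse_true = sum((s - y) ** 2 for s, y in zip(student, target)) / n
    mse_teacher = sum((s - t) ** 2 for s, t in zip(student, teacher)) / n
    return alpha * mse_true + (1 - alpha) * mse_teacher

# Illustrative values only; these are not data from either cohort.
carbs = events_to_continuous([(2, 40.0)], n_steps=6)
bgl = [110.0, 112.0, 118.0, 130.0, 141.0, 138.0]
low, high = decompose(bgl, window=3)
```

In this sketch the continuous `carbs` feature can be aligned sample-by-sample with the CGM series, and `low` and `high` could be forecast separately and summed; whether GlucoNet combines its components this way is not stated in the abstract.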

Type

Journal article

Publication

IEEE Transactions on Mobile Computing (IEEE TMC) - April 2026

Ebrahim Farahmand

Graduate Teaching Assistant

Shovito Barua Soumma

Graduate Research Associate

Nooshin Taheri Chatrudi

Graduate Teaching Assistant

Hassan Ghasemzadeh

Director