Diet and physical activity are important lifestyle factors in the self-management and prevention of many chronic diseases. Mobile sensors such as accelerometers have been used to measure physical activity and detect eating episodes. Many intervention studies, however, require stringent monitoring of overall dietary composition and energy intake. Currently, such monitoring relies on self-reported data, with users either entering text or taking an image of their food intake. These approaches suffer from limitations such as low adherence in technology adoption and sensitivity to the timing of dietary recording. To address these limitations, we present the development and validation of Speech2Health, a voice-based mobile nutrition monitoring system that combines speech processing, natural language processing (NLP), and text mining techniques in a unified platform to facilitate nutrition monitoring. After converting the spoken data to text, nutrition-specific data are identified within the text using an NLP-based approach that combines standard NLP with our pattern mapping technique. We then develop a tiered matching algorithm that searches for the food name in our nutrition database and accurately computes calorie intake. We evaluate Speech2Health using real data collected from 30 participants. Our experimental results show that Speech2Health achieves an accuracy of 92.2% in computing calorie intake. Furthermore, our user study demonstrates that Speech2Health scores significantly higher on technology adoption metrics than text-based and image-based nutrition monitoring. Our research demonstrates that new sensor modalities such as voice can be used either standalone or as a complementary source of information to existing modalities to improve the accuracy and acceptability of mobile health technologies for dietary composition monitoring.
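To illustrate the kind of tiered matching the abstract refers to, the following is a minimal sketch, not the paper's actual implementation: it assumes a simple in-memory name-to-calories table (the entries shown are illustrative, not from the paper's database) and falls back from exact matching, to token-overlap matching, to fuzzy string matching to tolerate speech-recognition errors.

```python
import difflib

# Hypothetical nutrition database: food name -> calories per serving.
# Entries and values are illustrative placeholders.
NUTRITION_DB = {
    "apple": 95,
    "banana": 105,
    "peanut butter sandwich": 380,
    "grilled chicken breast": 165,
}

def lookup_calories(food_name):
    """Resolve a spoken food name to a database entry in three tiers."""
    query = food_name.lower().strip()

    # Tier 1: exact match on the full food name.
    if query in NUTRITION_DB:
        return query, NUTRITION_DB[query]

    # Tier 2: partial match -- pick the database entry that shares
    # the most tokens with the query (e.g., "chicken breast" maps to
    # "grilled chicken breast").
    query_tokens = set(query.split())
    best, overlap = None, 0
    for name in NUTRITION_DB:
        shared = len(query_tokens & set(name.split()))
        if shared > overlap:
            best, overlap = name, shared
    if best is not None:
        return best, NUTRITION_DB[best]

    # Tier 3: fuzzy string match to absorb transcription errors.
    close = difflib.get_close_matches(query, list(NUTRITION_DB), n=1, cutoff=0.6)
    if close:
        return close[0], NUTRITION_DB[close[0]]

    return None  # no match at any tier

if __name__ == "__main__":
    print(lookup_calories("banana"))          # tier 1: exact match
    print(lookup_calories("chicken breast"))  # tier 2: token overlap
    print(lookup_calories("banaan"))          # tier 3: fuzzy match
```

Falling through progressively looser tiers in this way trades precision for recall only when stricter matching fails, which is one plausible way a system like Speech2Health could keep calorie computation accurate while remaining robust to noisy speech-to-text output.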