This presentation examines how well large language models (LLMs) such as GPT-4 and Llama 2 identify early signs of cognitive decline in real-world electronic health record (EHR) clinical notes. The study compares these LLMs with traditional machine learning models and introduces an ensemble method that combines their predictions. Results show that the LLMs provide complementary value and that the ensemble approach significantly improves diagnostic accuracy, achieving an F1 score of 92.1% with 90.2% precision and 94.2% recall. The talk highlights how combining general-purpose LLMs with local models can support early detection of dementia-related disorders and enhance clinical decision-making.
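
As a rough illustration of the two ideas in the abstract, the sketch below combines per-note predictions from several models and relates precision and recall to F1. The abstract does not say how the ensemble combines predictions, so the majority vote used here is an assumption, and the function names and the toy labels are hypothetical rather than taken from the study.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per note by majority vote.

    Assumption: the study's actual ensemble rule is not described in the
    abstract; majority voting is used here only as a simple stand-in.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for four notes (1 = signs of cognitive decline).
gpt4 = [1, 0, 1, 1]
llama2 = [1, 1, 0, 1]
local_model = [0, 0, 1, 1]

ensemble = majority_vote([gpt4, llama2, local_model])

# Precision and recall as reported in the abstract.
reported_f1 = f1_score(0.902, 0.942)
```

With an odd number of binary voters there are no ties, so each note gets a well-defined label. Plugging the reported precision and recall into the harmonic mean gives roughly 0.92, which agrees with the reported F1 of 92.1% up to rounding of the inputs.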