Deep learning models for clinical event forecasting (CEF) based on a patient’s medical history have improved significantly over the past decade. However, their transition into practice has been limited, particularly for diseases with very low prevalence. In this paper, we introduce CEF-CL, a novel method based on contrastive learning to forecast in the face of a limited number of positive training instances. CEF-CL consists of two primary components: (1) unsupervised contrastive learning for patient representation and (2) supervised transfer learning over the derived representation. We evaluate the new method against state-of-the-art model architectures trained in a supervised manner, using electronic health record data from Vanderbilt University Medical Center and the All of Us Research Program, covering 48,000 and 16,000 patients, respectively. We assess forecasting for over 100 diagnosis codes with respect to the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). We investigate the correlation between forecasting performance improvement and code prevalence via a Wald test. CEF-CL achieved average AUROC and AUPRC improvements over the state-of-the-art of 8.0%–9.3% and 11.7%–32.0%, respectively. The improvement in AUROC was negatively correlated with the number of positive training instances (P < .001). This investigation indicates that clinical event forecasting can be improved significantly through contrastive representation learning, especially when the number of positive training instances is small.
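
To make the two-stage design concrete, the following is a minimal sketch of the CEF-CL idea, assuming bag-of-codes patient vectors as input. The encoder architecture, feature-dropout augmentation, NT-Xent contrastive loss, synthetic data, and all hyperparameters (N_CODES, EMB_DIM, temperature, learning rates) are illustrative assumptions, not the authors' exact configuration.

```python
# A minimal sketch of the two-stage CEF-CL idea (contrastive pretraining, then
# supervised transfer), assuming bag-of-codes patient vectors. The encoder,
# augmentation, loss, data, and hyperparameters are illustrative, not the
# exact configuration used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

N_CODES, EMB_DIM = 1000, 128          # assumed vocabulary and embedding sizes

encoder = nn.Sequential(nn.Linear(N_CODES, 256), nn.ReLU(), nn.Linear(256, EMB_DIM))
proj = nn.Linear(EMB_DIM, 64)         # projection head used only during pretraining

def augment(x, p=0.2):
    """Create a second 'view' of a patient by randomly masking codes."""
    return x * (torch.rand_like(x) > p).float()

def nt_xent(z1, z2, tau=0.5):
    """SimCLR-style NT-Xent loss: paired views are positives, the rest of the batch are negatives."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = (z @ z.t() / tau).masked_fill(torch.eye(2 * n, dtype=torch.bool), float("-inf"))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Synthetic stand-in data so the sketch runs end to end.
unlabeled_loader = DataLoader(TensorDataset(torch.rand(512, N_CODES).round()),
                              batch_size=64)
labeled_loader = DataLoader(TensorDataset(torch.rand(64, N_CODES).round(),
                                          torch.randint(0, 2, (64,)).float()),
                            batch_size=32)

# Stage 1: unsupervised contrastive learning over all patients (no event labels).
opt = torch.optim.Adam(list(encoder.parameters()) + list(proj.parameters()), lr=1e-3)
for (x,) in unlabeled_loader:
    loss = nt_xent(proj(encoder(augment(x))), proj(encoder(augment(x))))
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: supervised transfer learning with a small head over the frozen
# representation, using the (few) labeled positive/negative instances.
head = nn.Linear(EMB_DIM, 1)
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
for x, y in labeled_loader:
    with torch.no_grad():
        h = encoder(x)                # frozen pretrained patient representation
    loss = F.binary_cross_entropy_with_logits(head(h).squeeze(1), y)
    opt.zero_grad(); loss.backward(); opt.step()
```

The design choice this reflects is that stage 1 uses no event labels, so every patient record contributes to the learned representation, while stage 2 trains only a small predictive head on the limited set of labeled positive instances.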