We present an electrocardiogram (ECG)-based emotion recognition system using self-supervised learning. Our proposed architecture consists of two main networks: a signal transformation recognition network and an emotion recognition network. First, in the self-supervised learning step, unlabelled data are used to train the former network to detect specific pre-determined signal transformations. Next, the weights of the convolutional layers of this network are transferred to the emotion recognition network, and two dense layers are trained to classify arousal and valence scores. We show that our self-supervised approach helps the model learn the ECG feature manifold required for emotion recognition, performing as well as or better than the fully-supervised version of the model. Our proposed method outperforms the state of the art in ECG-based emotion recognition on two publicly available datasets, SWELL and AMIGOS. Further analysis highlights the advantage of our self-supervised approach in requiring significantly less data to achieve acceptable results.
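The self-supervised step described above can be illustrated with a minimal sketch of how pretext training data might be built from unlabelled ECG segments. The abstract does not name the transformations, so the set used here (noise addition, scaling, negation, temporal reversal), the parameter values, the multi-class labelling, and the helper `make_pretext_dataset` are all assumptions chosen for illustration, not the authors' configuration.

```python
import numpy as np

# Hypothetical transformation set: the abstract only says "pre-determined
# signal transformations", so these four (plus identity) are assumptions.
def identity(x, rng):
    return x.copy()

def add_noise(x, rng, sigma=0.05):
    # Additive Gaussian noise; sigma is an assumed value.
    return x + rng.normal(0.0, sigma, size=x.shape)

def scale(x, rng, factor=1.5):
    # Amplitude scaling; factor is an assumed value.
    return x * factor

def negate(x, rng):
    # Flip the signal about the zero line.
    return -x

def time_reverse(x, rng):
    # Reverse the sample order in time.
    return x[::-1].copy()

TRANSFORMS = [identity, add_noise, scale, negate, time_reverse]

def make_pretext_dataset(segments, seed=0):
    """Apply every transformation to every unlabelled segment.

    The pretext label is the index of the applied transformation, so no
    emotion annotations are needed for this step.
    """
    rng = np.random.default_rng(seed)
    inputs, labels = [], []
    for seg in segments:
        for label, transform in enumerate(TRANSFORMS):
            inputs.append(transform(seg, rng))
            labels.append(label)
    return np.stack(inputs), np.array(labels)

# Usage with synthetic stand-in data (not real ECG recordings).
segments = np.random.default_rng(1).standard_normal((4, 256))
X, y = make_pretext_dataset(segments)
print(X.shape, y.shape)
```

A network trained to predict `y` from `X` would play the role of the signal transformation recognition network; under the scheme in the abstract, its convolutional weights would then be reused while new dense layers are trained on the labelled arousal and valence data.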