Activity recognition using data collected with smart devices such as mobile and wearable sensors has become a critical component of many emerging applications ranging from behavioral medicine to gaming. However, an unprecedented increase in the diversity of smart devices in the Internet-of-Things era has limited the adoption of activity recognition models across different devices. This lack of cross-domain adaptation is particularly notable across sensors of different modalities, where mapping the sensor data at the traditional feature level is highly challenging. To address this challenge, we propose ActiLabel, a combinatorial framework that learns structural similarities between the events that occur in a target domain and those of a source domain, and identifies an optimal mapping between the two domains at their structural level. The structural similarities are captured through a graph model, referred to as the dependency graph, which abstracts away the details of activity patterns in the low-level signal and feature space. The activity labels are then learned autonomously in the target domain by finding an optimal tiered mapping between the dependency graphs. We carry out an extensive set of experiments on three large datasets collected from human subjects with wearable sensors. The results demonstrate the superiority of ActiLabel over state-of-the-art transfer learning and deep learning methods. In particular, ActiLabel improves on such algorithms by 36.3%, 32.7%, and 9.1% in average F1-score for cross-modality, cross-location, and cross-subject activity recognition, respectively.
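To make the idea of mapping at the structural level concrete, the following is a minimal illustrative sketch and not the ActiLabel algorithm itself. It assumes each domain has already been clustered into the same number of activity clusters, builds a toy graph from normalized distances between cluster centroids, and finds the node mapping by brute force rather than by the tiered mapping described above. All function names, the centroid representation, and the edge-weight definition are assumptions made for illustration.

```python
from itertools import permutations
import math


def dependency_graph(centroids):
    """Toy stand-in for a dependency graph (assumption): a symmetric matrix of
    pairwise Euclidean distances between cluster centroids, scaled to [0, 1]."""
    n = len(centroids)
    w = [[math.dist(centroids[i], centroids[j]) for j in range(n)] for i in range(n)]
    peak = max(max(row) for row in w) or 1.0  # avoid division by zero
    return [[v / peak for v in row] for row in w]


def structural_cost(g_src, g_tgt, perm):
    """Total edge-weight mismatch when target node i is mapped to source node perm[i]."""
    n = len(g_tgt)
    return sum(
        abs(g_tgt[i][j] - g_src[perm[i]][perm[j]])
        for i in range(n)
        for j in range(i + 1, n)
    )


def best_mapping(g_src, g_tgt):
    """Exhaustive search over node mappings; only practical for a handful of nodes."""
    n = len(g_tgt)
    return min(permutations(range(n)), key=lambda p: structural_cost(g_src, g_tgt, p))


def transfer_labels(src_centroids, src_labels, tgt_centroids):
    """Label each target cluster with the label of its structurally matched source cluster."""
    g_src = dependency_graph(src_centroids)
    g_tgt = dependency_graph(tgt_centroids)
    perm = best_mapping(g_src, g_tgt)
    return [src_labels[k] for k in perm]


# Hypothetical example: the source and target live in different feature spaces
# (2-D vs. 3-D, different scales), so only the graph structure is comparable.
source = [(0, 0), (1, 0), (5, 0)]
labels = ["sit", "walk", "run"]
target = [(100, 0, 0), (0, 0, 0), (20, 0, 0)]
print(transfer_labels(source, labels, target))
```

Because each graph is computed within its own domain, the two domains never need a shared feature space, which is the property that matters for the cross-modality case. The exhaustive search grows factorially with the number of clusters, so this sketch is suitable only for small examples.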