Machine learning promises to revolutionize clinical decision-making and diagnosis. In medical diagnosis, a doctor aims to explain a patient’s symptoms by determining the diseases causing them. However, existing diagnostic algorithms are purely associative, identifying diseases that are strongly correlated with a patient’s symptoms and medical history. We show that this inability to disentangle correlation from causation can result in sub-optimal or dangerous diagnoses. To overcome this, we reformulate diagnosis as a counterfactual inference task and derive new counterfactual diagnostic algorithms. We show that this approach is closer to the diagnostic reasoning of clinicians and significantly improves the accuracy and safety of the resulting diagnoses. We compare our counterfactual algorithm to the standard Bayesian diagnostic algorithm and to a cohort of 44 doctors, using a test set of clinical vignettes. While the Bayesian algorithm achieves an accuracy comparable to the average doctor, placing in the top 48% of doctors in our cohort, our counterfactual algorithm places in the top 25% of doctors, achieving expert clinical accuracy. This improvement is achieved simply by changing how we query our model, without requiring any additional model improvements. Our results show that counterfactual reasoning is a vital missing ingredient for applying machine learning to medical diagnosis.