**Methods:** The basic idea of our method is to approximate the radiation physics in order to quickly calculate a first-order exposure estimate. This initial estimate is then refined using prior knowledge derived from MC simulations. To this end, the propagation of primary photons inside a voxelized patient model is estimated using a less accurate but fast photon ray casting (RC) simulation based on the Beer–Lambert law. The results of the RC simulation are then fed into a convolutional neural network (CNN), which maps the propagation of primary photons to the dose deposition inside the patient model. In addition, the patient model itself, including anatomy and material properties such as mass density and mass energy-absorption coefficients, is fed into the CNN. The CNN is trained using smoothed results of MC simulations as output and RC simulations of identical imaging settings and patient models as input.
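To illustrate the RC step, the following is a minimal sketch of Beer–Lambert attenuation along rays through a voxel grid. It is not the implementation used in this work: the function names `cast_ray` and `cast_parallel_rays`, the parallel-ray geometry, the uniform path length per voxel, and the use of a single linear attenuation coefficient per voxel (i.e., one photon energy instead of a full spectrum) are all simplifying assumptions made for illustration.

```python
import math


def cast_ray(mu_along_ray, step_cm, i0=1.0):
    """Primary intensity arriving at each voxel along a single ray.

    mu_along_ray: linear attenuation coefficients (1/cm) of the voxels
                  the ray traverses, in order (hypothetical input format).
    step_cm:      path length of the ray inside each voxel, in cm
                  (assumed identical for all voxels).
    i0:           intensity entering the first voxel.
    """
    intensities = []
    optical_depth = 0.0  # accumulated sum of mu * path length
    for mu in mu_along_ray:
        # Beer-Lambert law: I = I0 * exp(-sum(mu_i * d_i)),
        # evaluated at the entrance of the current voxel.
        intensities.append(i0 * math.exp(-optical_depth))
        optical_depth += mu * step_cm
    return intensities


def cast_parallel_rays(mu_grid, step_cm, i0=1.0):
    """Apply cast_ray to every row of a 2D grid (one ray per row)."""
    return [cast_ray(row, step_cm, i0) for row in mu_grid]


if __name__ == "__main__":
    # Illustrative attenuation values only, not measured material data.
    grid = [
        [0.0, 0.2, 0.2, 0.5],
        [0.0, 0.0, 0.2, 0.2],
    ]
    for ray in cast_parallel_rays(grid, step_cm=1.0):
        print([round(v, 4) for v in ray])
```

A map of this kind, together with the material properties of the patient model, is the sort of input that the CNN would then translate into a dose distribution. A realistic RC simulation would additionally account for the beam geometry and the energy spectrum.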

**Results:** In total, 163 MC and associated RC simulations were carried out for the head, thorax, abdomen, and pelvis in three different voxel phantoms. Each MC simulation used either 1e8 or 1e9 primary photons sampled from a 125 kV peak voltage spectrum. Edge-preserving smoothing (EPS) was applied to reduce (a) general stochastic uncertainties and (b) the stochastic uncertainty of MC simulations with fewer primary photons. The CNN was trained using seven imaging settings of the abdomen in a single phantom. When tested on the remaining datasets, the CNN estimated skin dose with an error below 10% for the majority of test cases.

**Conclusion:** The combination of deep neural networks and MC simulation of particle physics has the potential to decrease the computational complexity of accurate skin dose estimation. The proposed approach can provide dose distributions in under one second when running on high-end hardware. On lower-cost hardware, it takes up to 2 min to arrive at the same result. This makes our approach applicable in high-end environments as well as in budget solutions. Furthermore, the number of primary photons affects only the training time; the execution time is independent of it.