Diffused Multi-scale Generative Adversarial Network for low-dose PET images reconstruction
BioMedical Engineering OnLine volume 24, Article number: 16 (2025)
Abstract
Purpose
The aim of this study is to convert low-dose PET (L-PET) images to full-dose PET (F-PET) images based on our Diffused Multi-scale Generative Adversarial Network (DMGAN) to offer a potential balance between reducing radiation exposure and maintaining diagnostic performance.
Methods
The proposed method includes two modules: the diffusion generator and the u-net discriminator. The first module extracts information at different levels, enhancing the generator's ability to generalize to the image and improving the stability of training. The generated images are then fed into the u-net discriminator, which extracts details from both global and local perspectives to enhance the quality of the generated F-PET images. We conducted both qualitative assessments and quantitative measurements. For the quantitative comparisons, we employed two metrics, the structural similarity index measure (SSIM) and the peak signal-to-noise ratio (PSNR), to evaluate the performance of the different methods.
Results
Our proposed method achieved the highest PSNR and SSIM scores among the compared methods, improving PSNR by at least 6.2% over the other methods. The full-dose PET images synthesized by our method also exhibit a more accurate voxel-wise metabolic intensity distribution than those of the other methods, resulting in a clearer depiction of the epilepsy focus.
Conclusions
The proposed method demonstrates improved restoration of original details from low-dose PET images compared to other models trained on the same datasets. This method offers a potential balance between minimizing radiation exposure and preserving diagnostic performance.
Introduction
As one of the most widely used medical imaging technologies, PET plays a key role in navigated surgery, medical assessment and clinical examination [1]; unlike other imaging technologies, such as magnetic resonance imaging and computed tomography, it detects biochemical and physiological changes [2]. Because biochemical and physiological alterations frequently precede anatomical changes, PET is also extensively employed for preventive treatment and early disease identification. PET can assess molecular changes within the human body in vivo. Despite the considerable advantages offered by PET, there is growing concern about the potential health hazards linked to radiation exposure during the scanning procedure. For example, the injected activity is often restricted in clinical practice by concerns about the radiation dose to patients, because a higher radiation dose can increase cancer risk and harm the body to some extent [3, 4]. Hence, low-dose PET (L-PET) imaging, which enables image acquisition with minimal radiation exposure, has garnered significant interest among researchers [5, 6]. However, L-PET images exhibit elevated noise levels, diminished image contrast, and heightened artifacts compared with full-dose PET (F-PET) images [5], which makes it challenging for physicians to perform a precise diagnosis. Therefore, obtaining high-quality images from low-dose acquisitions, minimizing radiation exposure while preserving image quality, has strong practical significance.
Numerous methods have been suggested to improve the quality of PET images [7,8,9]. One approach for achieving high-quality PET images involves integrating prior information into the image reconstruction process [10]. This method allows for the direct incorporation of imaging physics information. However, it faces challenges related to intensive computation, and access to the physics projection model is essential. Numerous studies have explored voxel-wise estimation methods post-image reconstruction. These methods include the random forest-based regression approach [11], the mapping-based sparse representation method [2], the semi-supervised tripled dictionary learning method [12], and the multi-level canonical correlation analysis framework [13]. While these existing methods have demonstrated promising results, they tend to produce overly smoothed images.
In recent years, deep learning methods have been widely investigated in the field of medical imaging [14]. Generative Adversarial Networks and Convolutional Neural Networks have proven successful at denoising low-dose CT images [15, 16]. As deep learning has spread to various fields, it has also had a substantial impact on L-PET image tasks [17]. Xiang et al. proposed a sophisticated convolutional neural network model with auto-context learning that predicts high-dose PET images using only 1/4-dose PET images and their corresponding MR T1 images [18]. Wang et al. developed a comprehensive framework using 3D conditional Generative Adversarial Networks (GANs) to generate superior-quality PET images from corresponding L-PET images [19]. Kaplan and Zhu presented a model that incorporates particular image characteristics into the loss function to denoise 1/10-dose PET image slices and estimate their full-dose counterparts [20]. Chen et al. proposed synthesizing PET images of superior quality and precision utilizing either PET-only data or a combination of PET and MR information [21]. Ouyang et al. showed that a Generative Adversarial Network can achieve similar performance levels even in the absence of MR information [22]. More recent work by Yu et al. introduced a streamlined framework for L-PET image reconstruction, capable of rapidly producing F-PET images and leveraging the spatial details of the generated F-PET slices to improve the overall quality of the final 3D F-PET images [23].
However, several limitations remain, such as how to enhance the comprehension of semantic information from different views in L-PET reconstruction. To tackle these concerns, we introduce a novel framework for L-PET reconstruction comprising a diffusion generator and a u-net discriminator. We validated it experimentally, and the results show that our approach outperforms other reconstruction methods for L-PET images.
Results
Experimental setting
The experimental dataset comprised 45 pediatric subjects diagnosed with epilepsy, each of whom underwent a single full-dose brain PET scan, from which low-dose PET images were reconstructed through 5% undersampling of the corresponding list-mode full-dose PET data. By processing the three-dimensional brain images in the PET dataset, we obtained 256 × 256 2D brain image slices, which facilitates their use in training our network model. Extracting 2D slices from the 3D images lets us focus on key spatial features and reduces data complexity, making training deep learning models more computationally feasible. The dataset was divided randomly into training, validation, and testing sets, with 80% for training, 10% for validation, and 10% for testing, to ensure diversity and representativeness. To verify the performance of the proposed method, we compared it with cGAN [26], CycleGAN [27], and transGAN [23]. To eliminate the impact of different initialization parameters on the experiments, we set the same random seeds for all methods. We used the PyTorch library on an NVIDIA RTX 4090 GPU; the batch size was 1 and the number of epochs was set to 300. Finally, we evaluated the methods through qualitative evaluation and quantitative measures. For the quantitative comparisons, we chose two metrics, SSIM and PSNR, to evaluate the performance of the various methods.
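For illustration, the following is a minimal PyTorch sketch of the reproducibility setup described above (fixed random seeds and an 80/10/10 split); the seed value, tensor shapes, and dataset object are stand-ins for demonstration, not values taken from the paper.

```python
import random
import numpy as np
import torch
from torch.utils.data import TensorDataset, random_split

SEED = 42  # hypothetical seed; the paper only states that all methods share one seed

random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)

# Stand-in for the 256 x 256 2D slice dataset of paired (L-PET, F-PET) images.
num_slices = 1000
lpet = torch.randn(num_slices, 1, 256, 256)
fpet = torch.randn(num_slices, 1, 256, 256)
dataset = TensorDataset(lpet, fpet)

# 80% / 10% / 10% split, as in the experimental setting.
n_train = int(0.8 * len(dataset))
n_val = int(0.1 * len(dataset))
n_test = len(dataset) - n_train - n_val
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n_test],
    generator=torch.Generator().manual_seed(SEED),
)
```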
Quantitative evaluation
As shown in Table 1, the proposed method achieves the highest PSNR and SSIM scores compared to the other methods. The bold values in Table 1 indicate the best results for each metric. These results indicate that the images generated by our approach more closely resemble the corresponding original F-PET images in terms of these quantitative metrics. Specifically, our method achieves an improvement in PSNR of at least 6.2% over the compared methods.
Qualitative evaluation
In Fig. 1, randomly selected samples are compared. Our proposed method preserves more accurate structural details than the other approaches. However, these observations are based on visual comparisons, so further clinical validation is needed to confirm them. While SSIM and PSNR are standard measures for quantifying image quality in computer vision, in nuclear medicine a physician's observer score is often more relevant and necessary. Deep learning methods can sometimes introduce structural changes that do not substantially affect these quantitative measures but may still matter for clinical diagnosis. Therefore, a more in-depth evaluation of the images, such as expert assessment, is valuable. To address this concern, we included evaluations from an experienced nuclear medicine physician. The opinion scores aim to evaluate clinical feasibility, specifically with respect to the metabolic details of the brain in the generated F-PET images. Test-set images, with labels removed, were presented to the physician in randomized order, and each PET image was rated for image quality on a five-point scale. As depicted in Fig. 2, the opinion scores for DMGAN are higher than those of the other compared methods, indicating its potential advantage in L-PET image reconstruction based on expert assessment. A more detailed comparison is presented in Fig. 3: pseudo-color difference maps between the generated F-PET images and the original F-PET images reveal that the proposed method exhibits the smallest voxel-scale difference among the compared methods. Given the clinical importance of precise low-dose reconstruction of the epilepsy focus in pediatric patients, we present the epilepsy-focus L-PET image, the synthesized epilepsy-focus F-PET images, and the ground-truth epilepsy-focus F-PET image in Fig. 4. The results indicate that the F-PET image synthesized by DMGAN shows a more accurate voxel-wise metabolic intensity distribution, leading to a clearer depiction of the epilepsy focus than the other methods.
Ablation study
To verify the effectiveness of each module, we designed an ablation study in which key modules were removed from the full method on the PET dataset (Table 2). Both modules of the proposed method contribute to the final result: compared with the full method, removing either the diffusion generator or the u-net discriminator lowers the metric values, in each case by more than 0.04. These results provide evidence for the effectiveness of each module in the proposed method. Figure 5 shows the comparative results from this ablation study.
Discussion
PET is a widely utilized medical imaging technology for diagnosing various diseases. While full-dose imaging ensures high-quality images, concerns about radiation exposure persist, and balancing the need to reduce radiation exposure with maintaining diagnostic accuracy is crucial. This challenge can be addressed by reconstructing L-PET images to match the quality of their F-PET counterparts. In this study, we proposed DMGAN for efficient full-dose reconstruction of L-PET images. DMGAN comprises two modules: the diffusion generator and the u-net discriminator. Sequential L-PET slices are processed by the diffusion generator, which produces higher quality F-PET images. The u-net discriminator learns to distinguish, from both global and local views, among the real data, the generated data, and the generated diffused data. Experimental results demonstrated that our proposed DMGAN performs excellently according to commonly used assessment criteria. The performance of the model is summarized as follows: compared with L-PET images, it achieves a 38.6% improvement in PSNR and a 24.6% improvement in SSIM, demonstrating its ability to generate F-PET images from L-PET images. To further highlight its advantages over existing models, we compared it with other methods: relative to cGAN, CycleGAN and transGAN, it achieves about 24.2%, 25.1% and 6.2% improvement, respectively, in PSNR, and 9.6%, 9.9% and 1.6% improvement, respectively, in SSIM. Our results showed that the proposed model can convert L-PET images to F-PET images, potentially balancing the reduction of radiation exposure with the preservation of diagnostic performance.
In previous research, the training process of cGAN involves fusing conditional information into the generator and discriminator [26]. For example, the input to both the generator and the discriminator includes conditional information derived from PET images, guiding the generation process and ensuring that the generated data exhibit heightened realism and controllability under the given PET images. Overall, cGAN serves as an extension of Generative Adversarial Networks, introducing PET image information so that the generative model produces data tailored to the distribution of F-PET images. CycleGAN was originally designed for image translation tasks, with the primary goal of learning mappings between two different domains without the need for paired training data [27]. Its most common application is translating images between two domains, for example mapping L-PET to F-PET. In comparison with earlier GAN architectures, CycleGAN introduced the cycle-consistency loss to address the training challenges faced by traditional GANs when dealing with unpaired data. The cycle-consistency loss ensures that the two-directional image transformations performed by the generators are reversible, meaning that the transformed images can be accurately reverted to their original form; this helps prevent the generator from producing unnatural outputs. The overall objective of training the model is to minimize the discrepancies between the PET images generated by the network and the corresponding F-PET images. TransGAN is a GAN variant built upon the Transformer architecture, originally designed for image generation tasks [23]. Leveraging the self-attention mechanism inherent to Transformers, TransGAN aims to capture global information and dependencies within PET images, and it utilizes the multi-head attention mechanism to simultaneously focus on different regions of PET images. Both the generator and the discriminator of TransGAN are constructed on the Transformer architecture. In L-PET reconstruction tasks, TransGAN exhibits greater flexibility and potential for global modeling, allowing it to handle PET images effectively.
We have described and explained the principle of the proposed method. Extracting different information from different views of the same image aims to improve the ability of the generator. A u-net discriminator is introduced to extract features from various views; by using it, each layer makes its own assessment, so that different layers capture distinct semantic information and provide judgments on the image from a multi-scale perspective. These two modules are the main components of the architecture, and together with the modified loss function they enhance the quality of the synthesized PET images. Comprehensive experiments, incorporating qualitative assessments and quantitative metrics, have substantiated that the proposed DMGAN is capable of synthesizing realistic PET images. It can restore original details from L-PET images, showing superior performance compared with other models trained on the same datasets. The synthesized F-PET image generated by DMGAN provides a sharper and clearer depiction of the epilepsy focus. We also conducted ablation experiments to further demonstrate the contribution of each module to the final result.
There are some limitations in our study that need to be acknowledged. First, although we added efficient modules to improve performance, this may increase the hardware requirements at runtime, which may limit the applicability of our method in resource-constrained environments; for example, resource limitations in actual clinical environments may lead to slower running speeds. Second, the current method is specifically designed and optimized for PET image reconstruction. Although it shows promising results, this may limit its versatility for other imaging modalities. Future research can focus on improving efficiency while maintaining high accuracy, and on expanding the framework to integrate and analyze multimodal medical images (such as CT or MRI) to further verify and enhance the robustness of this method.
Conclusion
This study introduced an innovative framework based on generative adversarial networks for reconstructing low-dose PET images. In addition, we proposed a diffusion generator that produces higher quality reconstructed images, which more closely approach full-dose PET images. The final experimental results showed that our method performs better than other low-dose PET reconstruction methods in both quantitative and qualitative evaluation. This study has the potential to mitigate patients' radiation exposure while maintaining clinical diagnostic efficacy.
Materials and methods
Development of GAN
A Generative Adversarial Network is widely recognized as a zero-sum game between two network structures: a generator, which may consist of an autoencoder or one of its improved variants, and a discriminator, typically an autoencoder or a Convolutional Neural Network (CNN)-based discriminator. Since 2017, GAN-based models have been rapidly adopted in the realm of medical image synthesis [24]. The proliferation of GAN-based models can be attributed to their adaptable components and the increasing availability of Graphics Processing Unit (GPU) resources. In this paper, several GAN-based models are included in Table 1 for analysis and reproduction as comparison baselines. Many contemporary GAN-based models rely on the image-to-image translation technique introduced by Isola et al. [25]. This method integrated the conditional GAN loss, as proposed by Mirza and Osindero [26], with an L1 regularizer loss. Consequently, the network learns both the transformation from input to target image and the associated loss function, enabling the generation of images closely resembling the ground truth. The conditional GAN loss is mathematically formulated as follows:
\[ L_{cGAN}(G,D)=\mathbb{E}_{x,y}\left[\log D(x,y)\right]+\mathbb{E}_{x,z}\left[\log \left(1-D\left(x,G(x,z)\right)\right)\right] \tag{1} \]
where \(z\sim p(z)\) is random noise, \(x\) is the input and \(y\) is the target. The L1 regularization loss can be described as follows:
\[ L_{L1}(G)=\mathbb{E}_{x,y,z}\left[\left\|y-G(x,z)\right\|_{1}\right] \tag{2} \]
Therefore, considering Eqs. (1) and (2), the final objective function can be written as:
\[ G^{*}=\arg\min_{G}\max_{D} L_{cGAN}(G,D)+\lambda L_{L1}(G) \tag{3} \]
where \(\uplambda\) is a hyper-parameter that balances the two losses.
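As an illustration of Eqs. (1)–(3), a minimal PyTorch sketch of a conditional-GAN-plus-L1 objective follows; the toy discriminator, tensor shapes, and the weight `lam` are assumptions for demonstration, not values taken from the paper.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()
lam = 100.0  # hypothetical L1 weight (lambda in Eq. (3)); not specified here

def generator_loss(D, x, y, fake):
    # Adversarial term of Eq. (1) from the generator's side plus the L1 term of Eq. (2).
    pred_fake = D(torch.cat([x, fake], dim=1))
    adv = bce(pred_fake, torch.ones_like(pred_fake))
    return adv + lam * l1(fake, y)

def discriminator_loss(D, x, y, fake):
    # Conditional discriminator: judges (input, target) pairs, Eq. (1).
    pred_real = D(torch.cat([x, y], dim=1))
    pred_fake = D(torch.cat([x, fake.detach()], dim=1))
    return bce(pred_real, torch.ones_like(pred_real)) + \
           bce(pred_fake, torch.zeros_like(pred_fake))

# Toy conditional discriminator over concatenated (L-PET, F-PET) slices.
D = nn.Sequential(nn.Conv2d(2, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.Conv2d(16, 1, 4, stride=2, padding=1))
x = torch.randn(2, 1, 256, 256)     # L-PET slices
y = torch.randn(2, 1, 256, 256)     # F-PET targets
fake = torch.randn(2, 1, 256, 256)  # stand-in for G(x)
print(generator_loss(D, x, y, fake).item(), discriminator_loss(D, x, y, fake).item())
```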
CycleGAN is employed for high-resolution image-to-image translation utilizing both paired and unpaired data [27]. It consists of two generators and two discriminators: \({G}_{AB}\) transfers image A to image B, \({G}_{BA}\) performs the opposite transformation, and \({D}_{A}\) and \({D}_{B}\) decide the domain of an image. The adversarial loss function for the \({G}_{AB}\) and \({D}_{B}\) pair is expressed as:
\[ L_{GAN}\left(G_{AB},D_{B}\right)=\mathbb{E}_{b\sim p(B)}\left[\log D_{B}(b)\right]+\mathbb{E}_{a\sim p(A)}\left[\log \left(1-D_{B}\left(G_{AB}(a)\right)\right)\right] \tag{4} \]
The adversarial loss for \({G}_{BA}\) and \({D}_{A}\) is represented analogously as \({L}_{GAN}\left({G}_{BA},{D}_{A}\right)\). The cycle-consistency loss in CycleGAN is:
\[ L_{cyc}\left(G_{AB},G_{BA}\right)=\mathbb{E}_{a\sim p(A)}\left[\left\|G_{BA}\left(G_{AB}(a)\right)-a\right\|_{1}\right]+\mathbb{E}_{b\sim p(B)}\left[\left\|G_{AB}\left(G_{BA}(b)\right)-b\right\|_{1}\right] \tag{5} \]
Considering both of the above, the overall loss of this model can be expressed as:
\[ L\left(G_{AB},G_{BA},D_{A},D_{B}\right)=L_{GAN}\left(G_{AB},D_{B}\right)+L_{GAN}\left(G_{BA},D_{A}\right)+\lambda L_{cyc}\left(G_{AB},G_{BA}\right) \tag{6} \]
Therefore, the objective function of CycleGAN is:
\[ G_{AB}^{*},G_{BA}^{*}=\arg\min_{G_{AB},G_{BA}}\max_{D_{A},D_{B}} L\left(G_{AB},G_{BA},D_{A},D_{B}\right) \tag{7} \]
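For reference, the following is a minimal PyTorch sketch of the cycle-consistency term of Eq. (5); the toy generators and tensor shapes are placeholders, not CycleGAN's actual architecture.

```python
import torch
import torch.nn as nn

l1 = nn.L1Loss()

def cycle_consistency_loss(G_AB, G_BA, a, b):
    # || G_BA(G_AB(a)) - a ||_1 + || G_AB(G_BA(b)) - b ||_1, as in Eq. (5)
    return l1(G_BA(G_AB(a)), a) + l1(G_AB(G_BA(b)), b)

# Toy generators standing in for the two mapping networks.
G_AB = nn.Conv2d(1, 1, 3, padding=1)
G_BA = nn.Conv2d(1, 1, 3, padding=1)
a = torch.randn(2, 1, 256, 256)  # e.g. L-PET slices (domain A)
b = torch.randn(2, 1, 256, 256)  # e.g. F-PET slices (domain B)
print(cycle_consistency_loss(G_AB, G_BA, a, b).item())
```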
Architecture of DMGAN
We present a new framework based on generative adversarial networks, called the Diffused Multi-scale Generative Adversarial Network, comprising two modules: the diffusion generator and the u-net discriminator, as shown in Fig. 6. Their specific structures are introduced below. For PET images in clinical settings, our objective is to generate images from L-PET images that closely resemble the original F-PET images. The advantage of our proposed method is that it generates output samples by exploiting a learned distribution of real images and making full use of its variability. First, the original low-dose PET image slices, accompanied by a sequence of corresponding target slices, are fed into the diffusion generator to synthesize the full-dose images. In the diffusion generator, noised L-PET images are produced by introducing noise into the input L-PET images; the diffusion generator then generates F-PET images and noised F-PET images. The purpose of this step is to enhance the generalization ability of the generator and improve the stability of training. The generated images are fed into the u-net discriminator, which extracts details from both global and local perspectives to enhance the quality of the generated F-PET images.
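To make the noising step concrete, the following is a hypothetical sketch of how noised L-PET inputs might be produced; additive Gaussian noise and the value of `sigma` are assumptions, since the exact noise model is not specified here.

```python
import torch

def add_noise(lpet: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    # Additive Gaussian noise as a stand-in for the noising step described above.
    return lpet + sigma * torch.randn_like(lpet)

x = torch.randn(4, 1, 256, 256)            # batch of L-PET slices
x_noised = add_noise(x)                    # noised L-PET inputs
# fake_fpet = generator(x)                 # generated F-PET images
# fake_fpet_noised = generator(x_noised)   # generated noised F-PET images
```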
The diffusion generator
Unlike previous generative adversarial networks that use a u-net generator [28] or a ResNet generator [29, 30], which were widely adopted in earlier studies, we designed a new generator based on the generator of transGAN [23], as shown in Fig. 7. The main advantage of this design is that it gathers different information from different levels, improving the generator's ability to extract information from the original images. In other words, the diffusion generator offers an additional perspective for learning the original image distribution relative to other common generators. While training DMGAN, the adversarial loss function is expressed as follows:
where \(x\) denotes the L-PET images, \(G(x)\) denotes the F-PET images generated by the generator, and \(y\) denotes the corresponding original F-PET images. We use the Charbonnier loss [31] to penalize the Euclidean disparity not only between the generated F-PET images and the original F-PET images:
but also between the generated noised F-PET images and the original F-PET images:
Inspired by [32,33,34,35,36,37], and considering the perceptual difference between the generated F-PET images, the generated noised F-PET images and the original F-PET images, we adopt a VGG16-Net trained on ImageNet [38] to extract feature representations of \(G(x)\) and \(y\). The perceptual loss is:
where V denotes the feature maps extracted by the VGG16-Net. Based on the equations above, the total loss of the diffusion generator is expressed as:
The hyper-parameters \(\upalpha\) and \(\upbeta\) are both set to 100.
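The following PyTorch sketch summarizes the generator objective described in this subsection, combining an adversarial term, Charbonnier terms and a VGG16 perceptual term with \(\upalpha = \upbeta = 100\); which weight multiplies which term, the choice of VGG16 layers, and the channel handling are assumptions rather than the exact formulation used here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

def charbonnier(pred, target, eps=1e-6):
    # Charbonnier penalty sqrt(diff^2 + eps^2), averaged over all voxels.
    return torch.mean(torch.sqrt((pred - target) ** 2 + eps ** 2))

# Frozen VGG16 feature extractor for the perceptual term (ImageNet weights;
# ImageNet input normalization is omitted for brevity).
vgg_feat = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
for p in vgg_feat.parameters():
    p.requires_grad_(False)

def perceptual(pred, target):
    # L1 distance between VGG16 feature maps; single-channel PET slices are
    # repeated to 3 channels to match the VGG input format.
    return F.l1_loss(vgg_feat(pred.repeat(1, 3, 1, 1)),
                     vgg_feat(target.repeat(1, 3, 1, 1)))

alpha, beta = 100.0, 100.0  # as stated in the text

def generator_total_loss(adv, fake, fake_noised, real):
    # adv: adversarial term; Charbonnier and perceptual terms are applied to
    # both the generated F-PET and the generated noised F-PET images.
    return (adv
            + alpha * (charbonnier(fake, real) + charbonnier(fake_noised, real))
            + beta * (perceptual(fake, real) + perceptual(fake_noised, real)))
```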
The u-net discriminator
U-nets [39] have shown state-of-the-art performance in numerous intricate image segmentation tasks. In these models, the encoder downsamples the input to capture global information, the decoder then performs upsampling, and skip connections transfer data between the encoder and decoder. Inspired by recent GAN studies [40], we introduce a u-net discriminator into our model to obtain information about the original images from both global and local views, which cooperates better with the diffusion generator. Unlike the original u-net, the u-net discriminator consists of the original downsampling network and a new upsampling network, connected by skip connections and a bottleneck. In contrast with conventional discriminator networks, the u-net discriminator makes decisions on a per-pixel basis. The loss for the encoder is:
The loss for the decoder, taken as the mean over all pixels, can be expressed as:
In Eq. (14), \([D_{dec}(x)]_{i,j}\) and \([D_{dec}(G(z))]_{i,j}\) represent the discriminator decisions at pixel \((i, j)\). The per-pixel outputs of \({D}_{dec}\) are obtained by integrating specific details from lower-level features, provided by skip connections from intermediate layers of the encoder network, with global information derived from high-level features through upsampling from the bottleneck.
Considering Eqs. (13) and (14), the generator objective is:
encouraging the generator to concentrate on synthesizing images that capture both global structures and local details effectively, making it better able to deceive the discriminator.
Inspired by [25], the loss of the basic discriminator is:
In our proposed method, the u-net discriminator returns two values, representing the outputs of the decoder and the encoder. The middle loss describes the loss of the u-net discriminator encoder and is expressed as:
where \({D}_{encfake}\) and \({D}_{encreal}\) are the scalar outputs of the encoder. Therefore, the overall loss of the discriminator is:
where \(\upnu\) is a hyper-parameter set to be 1.
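As a sketch in the style of the U-Net GAN discriminator [40], the encoder and decoder loss terms and their combination weighted by \(\upnu\) can be written as follows; the binary cross-entropy formulation and the toy shapes are assumptions, not the exact equations of this paper.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # averages over all elements, i.e. over pixels for the decoder

def encoder_loss(enc_real, enc_fake):
    # Image-level (global) real/fake decision of the encoder head.
    return bce(enc_real, torch.ones_like(enc_real)) + \
           bce(enc_fake, torch.zeros_like(enc_fake))

def decoder_loss(dec_real, dec_fake):
    # Per-pixel (local) decision of the decoder head, averaged over all pixels.
    return bce(dec_real, torch.ones_like(dec_real)) + \
           bce(dec_fake, torch.zeros_like(dec_fake))

nu = 1.0  # hyper-parameter from the text

def discriminator_total_loss(enc_real, enc_fake, dec_real, dec_fake):
    return encoder_loss(enc_real, enc_fake) + nu * decoder_loss(dec_real, dec_fake)

# Toy usage with random logits: the encoder outputs one scalar per image,
# the decoder outputs one logit per pixel.
enc_real, enc_fake = torch.randn(4, 1), torch.randn(4, 1)
dec_real, dec_fake = torch.randn(4, 1, 256, 256), torch.randn(4, 1, 256, 256)
print(discriminator_total_loss(enc_real, enc_fake, dec_real, dec_fake).item())
```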
Datasets
The experimental dataset utilized in this study comprises 45 pediatric subjects diagnosed with epilepsy, encompassing both low-dose brain PET scans and corresponding full-dose brain PET scans, collected in 2020. The FDG-PET brain images of all subjects were obtained using a whole-body hybrid PET/MR system (SIGNA PET/MR, GE Healthcare). To reflect clinical practice, we did not exclude images of comparatively lower quality. L-PET images were generated by reconstructing the list-mode F-PET data after a 5% undersampling process. The full-dose PET scan had an acquisition time of 20 min, with an administered radiotracer activity of 3.7 MBq/kg, and the low-dose PET images were reconstructed using the first minute of the list-mode data. Using a 5% dose can significantly reduce radiation risk, making the research results more valuable in actual clinical practice; it also provides an extremely challenging test setting for verifying the ability of our reconstruction algorithm to generate high-quality images under extremely low-dose conditions, which helps demonstrate the robustness and effectiveness of the algorithm. Prior to reconstruction, both L-PET and F-PET images were preprocessed using Statistical Parametric Mapping for realignment and normalization. Following this preprocessing, the voxel size of the PET images was standardized to 1 × 1 × 1 mm³. This voxel size was chosen to provide higher spatial resolution, which is particularly important for detecting and analyzing fine structures during the reconstruction of low-dose PET images; high-resolution images help improve image quality and the presentation of details. Through the processing of the three-dimensional brain images within the PET dataset, we derived 256 × 256 2D brain image slices, which served as the basis for our experimental analyses.
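For illustration, a minimal sketch of turning a preprocessed 3D brain PET volume into 2D axial slices for training is given below; the NIfTI reader, intensity scaling and file name are assumptions for demonstration only.

```python
import numpy as np
import nibabel as nib  # common reader for SPM-realigned/normalized NIfTI volumes

def volume_to_slices(path: str) -> np.ndarray:
    vol = nib.load(path).get_fdata().astype(np.float32)        # 3D brain volume, 1 mm isotropic voxels
    vol = (vol - vol.min()) / (vol.max() - vol.min() + 1e-8)   # scale intensities to [0, 1]
    return np.stack([vol[:, :, k] for k in range(vol.shape[2])])  # stack of axial 2D slices

# slices = volume_to_slices("subject01_fpet_normalized.nii")   # hypothetical file name
```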
Data analysis
In this study, to illustrate the effectiveness of DMGAN for L-PET image reconstruction, we calculated SSIM and PSNR for the model on the dataset. SSIM measures the similarity between the reconstructed image and the reference image; we computed it over images of the entire brain region, because the structural characteristics of the whole brain are important for evaluating image quality. The SSIM value is computed over a local window around each voxel, and image similarity is evaluated by comparing luminance, contrast and structural information. PSNR measures the noise level of the reconstructed image relative to the reference image; we also computed it over the entire brain region, because evaluating the noise level requires the global characteristics of the whole image. The reconstruction ability of each compared model was analyzed in the same way. Pseudo-color difference images were generated and compared using the OpenCV package to further visualize the differences among the methods' results. To mitigate the influence of different initialization parameters on the experimental outcomes, we set the same random seeds for all methods. We utilized the PyTorch library to conduct this experiment.
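A minimal sketch of this evaluation pipeline, assuming 2D slices scaled to [0, 1]: whole-image SSIM and PSNR via scikit-image, and an OpenCV pseudo-color difference map.

```python
import cv2
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_slice(generated: np.ndarray, reference: np.ndarray):
    # Whole-image SSIM and PSNR; inputs are 2D float arrays scaled to [0, 1].
    ssim = structural_similarity(reference, generated, data_range=1.0)
    psnr = peak_signal_noise_ratio(reference, generated, data_range=1.0)
    return ssim, psnr

def difference_map(generated: np.ndarray, reference: np.ndarray) -> np.ndarray:
    # Pseudo-color (JET) map of the absolute voxel-wise difference.
    diff = np.abs(reference - generated)
    diff8 = np.uint8(255 * diff / (diff.max() + 1e-8))
    return cv2.applyColorMap(diff8, cv2.COLORMAP_JET)

gen = np.random.rand(256, 256).astype(np.float32)  # stand-in for a generated slice
ref = np.random.rand(256, 256).astype(np.float32)  # stand-in for the reference slice
print(evaluate_slice(gen, ref))
cv2.imwrite("difference_map.png", difference_map(gen, ref))
```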
Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
Tian M, He X, Jin C, et al. Transpathology: molecular imaging-based pathology. Eur J Nucl Med Mol Imaging. 2021;48:2338–50.
Wang Y, Zhang P, An L, et al. Predicting standard-dose PET image from low-dose PET and multimodal MR images using mapping-based sparse representation. Phys Med Biol. 2016;61(2):791.
Tan H, Sui X, Yin H, et al. Total-body PET/CT using half-dose FDG and compared with conventional PET/CT using full-dose FDG in lung cancer. Eur J Nucl Med Mol Imaging. 2021;48:1966–75.
Boellaard R, Delgado-Bolton R, Oyen WJG, et al. FDG PET/CT: EANM procedure guidelines for tumour imaging: version 2.0. Eur J Nucl Med Mol Imaging. 2015;42:328–54.
Zhou L, Schaefferkoetter JD, Tham IWK, et al. Supervised learning with CycleGAN for low-dose FDG PET image denoising. Med Image Anal. 2020;65:101770.
Wang YR, Baratto L, Hawk KE, et al. Artificial intelligence enables whole-body positron emission tomography scans with minimal radiation exposure. Eur J Nucl Med Mol Imaging. 2021;48:2771–81.
Fu Y, Dong S, Niu M, et al. AIGAN: attention–encoding integrated generative adversarial network for the reconstruction of low-dose CT and low-dose PET images. Med Image Anal. 2023;86: 102787.
Fu Y, Dong S, Huang Y, et al. MPGAN: multi pareto generative adversarial network for the denoising and quantitative analysis of low-dose PET images of human brain. Med Image Anal. 2024;98: 103306.
Zhou X, Fu Y, Dong S, et al. Intelligent ultrafast total-body PET for sedation-free pediatric [18F] FDG imaging. Eur J Nucl Med Mol Imaging. 2024: 1–14.
Wang C, Hu Z, Shi P, et al. Low dose PET reconstruction with total variation regularization. In: 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE; 2014. p. 1917–20.
Kang J, Gao Y, Shi F, et al. Prediction of standard-dose brain PET image by using MRI and low-dose brain [18F] FDG PET images. Med Phys. 2015;42(9):5301–9.
Wang Y, Ma G, An L, et al. Semisupervised tripled dictionary learning for standard-dose PET image prediction using low-dose PET and multimodal MRI. IEEE Trans Biomed Eng. 2016;64(3):569–79.
An L, Zhang P, Adeli E, et al. Multi-level canonical correlation analysis for standard-dose PET image estimation. IEEE Trans Image Process. 2016;25(7):3303–15.
Kawahara J, Brown CJ, Miller SP, et al. BrainNetCNN: convolutional neural networks for brain networks; towards predicting neurodevelopment. Neuroimage. 2017;146:1038–49.
Chen H, Zhang Y, Kalra MK, et al. Low-dose CT with a residual encoder–decoder convolutional neural network. IEEE Trans Med Imaging. 2017;36(12):2524–35.
Yang Q, Yan P, Zhang Y, et al. Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss. IEEE Trans Med Imaging. 2018;37(6):1348–57.
Shan H, Zhang Y, Yang Q, et al. 3-D convolutional encoder–decoder network for low-dose CT via transfer learning from a 2-D trained network. IEEE Trans Med Imaging. 2018;37(6):1522–34.
Xiang L, Qiao Y, Nie D, et al. Deep auto-context convolutional neural networks for standard-dose PET image estimation from low-dose PET/MRI. Neurocomputing. 2017;267:406–16.
Wang Y, Yu B, Wang L, et al. 3D conditional generative adversarial networks for high-quality PET image estimation at low dose. Neuroimage. 2018;174:550–62.
Kaplan S, Zhu YM. Full-dose PET image estimation from low-dose PET image using deep learning: a pilot study. J Digit Imaging. 2019;32(5):773–8.
Chen KT, Gong E, de Carvalho MFB, et al. Ultra–low-dose 18F-florbetaben amyloid PET imaging using deep learning with multi-contrast MRI inputs. Radiology. 2019;290(3):649–56.
Ouyang J, Chen KT, Gong E, et al. Ultra-low-dose PET reconstruction using generative adversarial network with feature matching and task-specific perceptual loss. Med Phys. 2019;46(8):3555–64.
Fu Y, Dong S, Liao Y, et al. A resource-efficient deep learning framework for low-dose brain PET image reconstruction and analysis. In: 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI). IEEE; 2022. p. 1–5.
Wang T, Lei Y, Fu Y, et al. A review on medical imaging synthesis using deep learning and its clinical applications. J Appl Clin Med Phys. 2021;22(1):11–36.
Isola P, Zhu JY, Zhou T, et al. Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. p. 1125–34.
Mirza M, Osindero S. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
Zhu JY, Park T, Isola P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision. 2017. p. 2223–32.
Isola P, Zhu JY, Zhou T, Efros AA. Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. p. 1125–34.
Dar SUH, Yurt M, Karacan L, et al. Image synthesis in multi-contrast MRI with conditional generative adversarial networks. IEEE Trans Med Imaging. 2019;38(10):2375–88.
Hu S, Shen Y, Wang S, et al. Brain MR to PET synthesis via bidirectional generative adversarial network. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part II. Springer; 2020. p. 698–707.
Barron JT. A general and adaptive robust loss function. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 4331–4339.
Ouyang J, Chen KT, Gong E, Pauly J, Zaharchuk G. Ultra-low-dose PET reconstruction using generative adversarial network with feature matching and task-specific perceptual loss. Med Phys. 2019;46(8):3555–64. https://doi.org/10.1002/mp.13626.
Lei Y, Dong X, Wang T, et al. Whole-body pet estimation from low count statistics using cycle-consistent generative adversarial networks. Phys Med Biol. 2019;64(21): 215017.
Spuhler K, Serrano-Sosa M, Cattell R, DeLorenzo C, Huang C. Full-count pet recovery from low-count image using a dilated convolutional neural network. Med Phys. 2020;47(10):4928–38.
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
Isola P, Zhu J-Y, Zhou T, Efros AA. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 2017, pp. 1125–1134.
Dar SUH, Yurt M, Karacan L, Erdem A, Erdem E, Cukur T. Image synthesis in multi-contrast MRI with conditional generative adversarial networks. IEEE Trans Med Imaging. 2019;38(10):2375–88. https://doi.org/10.1109/TMI.2019.2901750.
Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition. IEEE, 2009, pp. 248–255
Ronneberger O, Fischer P, Brox T. U-net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III. Springer; 2015. p. 234–41.
Schonfeld E, Schiele B, Khoreva A. A U-net based discriminator for generative adversarial networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. p. 8207–16.
Acknowledgements
This study was supported by the National Natural Science Foundation of China (82030049), the National Key Research and Development Program of China (2021YFA1101700, 2021YFE0108300) and the Fundamental Research Funds for the Central Universities.
Funding
National Natural Science Foundation of China (82030049); National Key Research and Development Program of China (2021YFA1101700); Fundamental Research Funds for the Central Universities.
Author information
Contributions
XY contributed to designing the study, acquiring the data, developing the deep learning algorithm, analyzing the study data and drafting the manuscript. DYH helped analyzing the data and revised the manuscript. QY helped revising the manuscript. YF contributed to the study design and data acquisition. YZ and JW contributed to the study design. MT and HZ contributed to the study design, data acquisition and revising the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
This study was performed in line with the principles of the Declaration of Helsinki. This retrospective study received approval from the institutional review board of Hangzhou Universal Medical Imaging Diagnostic Center and the requirement of informed consent was waived (Approval No. [2021] 001).
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yu, X., Hu, D., Yao, Q. et al. Diffused Multi-scale Generative Adversarial Network for low-dose PET images reconstruction. BioMed Eng OnLine 24, 16 (2025). https://doi.org/10.1186/s12938-025-01348-x
DOI: https://doi.org/10.1186/s12938-025-01348-x