Frequency domain-enhanced spectral-spatial fusion transformer for semantic segmentation of remote sensing images

Li X, Xu F, Li J, Su Y, Li L, Lyu X, Xu Z, Kaup A (2026)


Publication Type: Journal article

Publication year: 2026

Journal: Information Fusion

Book Volume: 132

Article Number: 104248

DOI: 10.1016/j.inffus.2026.104248

Abstract

Semantic segmentation of remote sensing images (RSIs) is challenged by the coexistence of coarse spatial layouts, fine structural boundaries, and heterogeneous textures that are distributed unevenly across frequency components. To better characterize these complementary properties, we introduce FSSFFormer, a frequency domain-enhanced transformer that leverages multi-resolution wavelet decomposition to enrich representation learning. A fixed 2D discrete wavelet transform (DWT) is first applied to decompose RSIs into low-frequency and high-frequency subbands, allowing the model to explicitly preserve global spatial structures while retaining fine-scale details. On top of these frequency-decomposed features, two lightweight modules are employed: a Spectral Enhancement Attention (SEA) module that selectively emphasizes informative subband responses, and a Spectral-Spatial Context Attention (SSCA) module that improves contextual modeling across the decomposed feature spectrum. The refined representation is subsequently reconstructed in the spatial domain using inverse DWT for dense prediction. Extensive experiments on three public benchmarks, ISPRS Vaihingen, ISPRS Potsdam, and LoveDA, show that FSSFFormer delivers consistently strong performance, with notable gains in boundary fidelity as measured by BF-score. Ablation studies further demonstrate the complementary roles of subband decomposition and the two attention modules in enhancing both structural detail and semantic consistency.
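The decompose, reweight, and reconstruct flow described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not name the wavelet, so a single-level Haar transform is assumed here, and the per-subband scalar gains in `reweight_subbands` are a hypothetical stand-in for the learned SEA and SSCA modules, used only to show where subband-wise processing would sit between the forward and inverse transforms.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar DWT (assumed wavelet) of an even-sized 2D array.

    Returns one low-frequency subband (ll) and three high-frequency
    subbands (lh, hl, hh), each half the input height and width.
    """
    x = np.asarray(x, dtype=float)
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # block average: coarse layout
    lh = (a + b - c - d) / 2.0  # difference between the two rows
    hl = (a - b + c - d) / 2.0  # difference between the two columns
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the full-resolution array."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def reweight_subbands(subbands, gains):
    """Hypothetical placeholder for learned attention: scale each subband."""
    return tuple(g * s for g, s in zip(gains, subbands))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 8))  # toy single-channel "image"

    subbands = haar_dwt2(image)
    # Unit gains leave the signal unchanged, so reconstruction is exact.
    restored = haar_idwt2(*reweight_subbands(subbands, (1.0, 1.0, 1.0, 1.0)))
    print(np.allclose(restored, image))

    # Boosting the high-frequency subbands amplifies fine-scale detail.
    sharpened = haar_idwt2(*reweight_subbands(subbands, (1.0, 1.5, 1.5, 1.5)))
    print(sharpened.shape)
```

With unit gains the round trip reproduces the input, which is the property that lets a model process features in the wavelet domain and still return to the spatial domain for dense prediction. In practice a library such as PyWavelets, or a learnable attention module in place of fixed gains, would replace these hand-written helpers.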

How to cite

APA:

Li, X., Xu, F., Li, J., Su, Y., Li, L., Lyu, X., ... Kaup, A. (2026). Frequency domain-enhanced spectral-spatial fusion transformer for semantic segmentation of remote sensing images. Information Fusion, 132, Article 104248. https://doi.org/10.1016/j.inffus.2026.104248

MLA:

Li, Xin, et al. "Frequency domain-enhanced spectral-spatial fusion transformer for semantic segmentation of remote sensing images." Information Fusion 132 (2026).
