The integration of multiple remote sensing modalities has recently gained significant attention in land use classification research because it can improve classification performance. However, this approach introduces additional challenges, such as modality-specific feature extraction and effective feature fusion. In this work, a deep learning (DL)-based technique is proposed that uses two remote sensing modalities, hyperspectral imagery (HSI) and LiDAR, for land use classification. The proposed technique consists of three modules: 1) a CNN-based feature extraction module; 2) attention modules designed specifically for each modality, namely a Convolutional Block Attention Module (CBAM) for the HSI features and a spatial attention module for the LiDAR features; and 3) a fusion module that combines the separately extracted features of both modalities. The features extracted by the convolution blocks are first enhanced by the attention modules; feature-level fusion is then performed, and the fused features are used for the final classification. This novel combination of modules demonstrates a notable performance gain over CNN-based approaches across different classes and metrics on the Trento dataset, where it achieves 98.21% average accuracy. These results indicate its potential for applications in resource management and planning as well as environmental monitoring.
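
The data flow described above (per-modality attention followed by feature-level fusion) can be illustrated with a minimal NumPy sketch. This is not the proposed implementation: the CNN outputs are stood in for by random arrays, the weights `w1` and `w2` are untrained placeholders, the spatial attention uses two scalar weights instead of the convolution used in CBAM, fusion is assumed to be simple concatenation, and the classifier is omitted. All function names and tensor shapes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """CBAM-style channel attention on a (C, H, W) feature map.

    A shared two-layer MLP (w1, w2) is applied to the average-pooled and
    max-pooled channel descriptors; their sum gives per-channel weights.
    """
    avg = feat.mean(axis=(1, 2))                      # (C,)
    mx = feat.max(axis=(1, 2))                        # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)      # ReLU hidden layer
    weights = sigmoid(mlp(avg) + mlp(mx))             # (C,), values in (0, 1)
    return feat * weights[:, None, None]

def spatial_attention(feat, k_avg, k_max):
    """Simplified spatial attention on a (C, H, W) feature map.

    Assumption: the channel-pooled maps are mixed with two scalars
    (k_avg, k_max) rather than a learned convolution kernel.
    """
    avg = feat.mean(axis=0)                           # (H, W)
    mx = feat.max(axis=0)                             # (H, W)
    mask = sigmoid(k_avg * avg + k_max * mx)          # (H, W), values in (0, 1)
    return feat * mask[None, :, :]

def cbam(feat, w1, w2, k_avg, k_max):
    """Channel attention followed by spatial attention."""
    return spatial_attention(channel_attention(feat, w1, w2), k_avg, k_max)

def fuse(hsi_feat, lidar_feat):
    """Feature-level fusion, assumed here to be flatten-and-concatenate."""
    return np.concatenate([hsi_feat.ravel(), lidar_feat.ravel()])

# Placeholder CNN outputs for one patch (shapes are illustrative only).
rng = np.random.default_rng(0)
hsi_feat = rng.standard_normal((8, 5, 5))     # 8 channels from the HSI branch
lidar_feat = rng.standard_normal((4, 5, 5))   # 4 channels from the LiDAR branch

# Untrained placeholder MLP weights with a reduction from 8 to 2 channels.
w1 = rng.standard_normal((2, 8))
w2 = rng.standard_normal((8, 2))

hsi_att = cbam(hsi_feat, w1, w2, k_avg=0.5, k_max=0.5)
lidar_att = spatial_attention(lidar_feat, k_avg=0.5, k_max=0.5)
fused = fuse(hsi_att, lidar_att)              # vector passed to a classifier
```

In this sketch both attention steps only rescale features by weights between 0 and 1, so the tensor shapes are preserved until the fusion step flattens and joins the two branches.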