The surface roughness of the wheel and rail strongly influences the rolling noise level. The presence of a third body, such as frost or grease, at the wheel-rail interface changes the adhesion coefficient and, with it, the level of the radiated acoustic noise. It should therefore be possible to estimate the adhesion condition between wheel and rail by analysing the audio patterns generated by wheel-rail interaction. In this study, a new approach for estimating the adhesion condition is proposed that takes rolling noise as its input.

An acoustic sensor (a Behringer B-5 condenser microphone) was installed on a scaled bogie test rig for audio data acquisition. The cardioid pickup pattern was selected so that the sensor captured the source signal while rejecting surrounding sound. The test rig was operated at speeds of 40 rpm and 60 rpm, multiple times under both dry and wet friction conditions. The runs were repeated in order to collect a large set of acoustic signals under different adhesion conditions, since no such dataset exists in public repositories. From the continuous audio signal, 30-second intervals of rolling noise were extracted as samples for training the W-RICE model.

Each sample was pre-processed with the Librosa Python package to extract seven basic features/signatures: zero-crossing rate, spectral centroid, spectral bandwidth, spectral roll-off, MFCCs (Mel-frequency cepstral coefficients), RMS (root-mean-square) energy, and chroma frequencies. These features served as input to an MLP (multi-layer perceptron) neural network trained on four classes (dry and wet conditions at 40 rpm and at 60 rpm each). Model training and testing were implemented with the Keras deep learning library. An MLP can automatically learn meaningful relationships among the input features and assign an input audio sample to one of the output classes. For the four output categories considered, 100% classification accuracy was achieved on the test set, while accuracies of 99.56%, 73.24%, and 63.59% were achieved on three validation sets consisting of audio samples with varying noise levels.
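As a concrete illustration of the pre-processing step, the sketch below loads one 30-second rolling-noise sample and computes the seven Librosa features listed above. The text does not specify how frame-level features were aggregated into fixed-length inputs; the sketch assumes means over time frames. The number of MFCC coefficients (20) and the file path are likewise illustrative assumptions.

```python
import numpy as np
import librosa

def extract_features(path, offset=0.0, duration=30.0, sr=22050):
    """Load a 30-second rolling-noise segment and return a fixed-length
    feature vector built from the seven Librosa features."""
    # offset/duration carve a 30-second sample out of the continuous recording
    y, sr = librosa.load(path, sr=sr, offset=offset, duration=duration)

    # Scalar summaries: mean of each frame-level feature over time (assumed)
    scalars = np.array([
        np.mean(librosa.feature.zero_crossing_rate(y)),
        np.mean(librosa.feature.spectral_centroid(y=y, sr=sr)),
        np.mean(librosa.feature.spectral_bandwidth(y=y, sr=sr)),
        np.mean(librosa.feature.spectral_rolloff(y=y, sr=sr)),
        np.mean(librosa.feature.rms(y=y)),
    ])

    # MFCCs and chroma are multi-dimensional; average each coefficient/bin
    # over time to obtain fixed-length descriptors
    mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20), axis=1)
    chroma = np.mean(librosa.feature.chroma_stft(y=y, sr=sr), axis=1)

    return np.concatenate([scalars, mfcc, chroma])

# Hypothetical usage: feature vector for one recorded sample
# x = extract_features("recordings/dry_40rpm_run1.wav")
```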
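The architecture of the MLP is not detailed in the text; a minimal Keras sketch for the four-class problem might look as follows. The layer sizes, optimizer, loss, and training hyperparameters are illustrative assumptions, and the input dimension matches the feature vector produced by the extraction sketch above (5 scalar features + 20 MFCCs + 12 chroma bins).

```python
from tensorflow import keras
from tensorflow.keras import layers

NUM_FEATURES = 37  # 5 scalars + 20 MFCCs + 12 chroma bins (assumed)
NUM_CLASSES = 4    # dry/40 rpm, dry/60 rpm, wet/40 rpm, wet/60 rpm

# A small fully connected network; depth and widths are assumptions
model = keras.Sequential([
    layers.Input(shape=(NUM_FEATURES,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hypothetical training call: X_train is an (n_samples, NUM_FEATURES)
# matrix of extracted features, y_train holds integer labels 0-3
# model.fit(X_train, y_train, epochs=50, batch_size=16,
#           validation_split=0.2)
```

Softmax over four output units with a categorical cross-entropy loss is the standard Keras choice for a single-label, multi-class problem of this kind; the reported test and validation accuracies would then be obtained from `model.evaluate` on the held-out sets.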