Pre-harvest fruit yield estimation is useful to guide harvesting and marketing resourcing, but machine vision estimates based on a single view from each side of the tree ("dual-view") underestimates the fruit yield as fruit can be hidden from view. A method is proposed involving deep learning, Kalman filter, and Hungarian algorithm for on-tree mango fruit detection, tracking, and counting from 10 frame-per-second videos captured of trees from a platform moving along the inter row at 5 km/h. The deep learning based mango fruit detection algorithm, MangoYOLO, was used to detect fruit in each frame. The Hungarian algorithm was used to correlate fruit between neighbouring frames, with the improvement of enabling multiple-to-one assignment. The Kalman filter was used to predict the position of fruit in following frames, to avoid multiple counts of a single fruit that is obscured or otherwise not detected with a frame series. A "borrow" concept was added to the Kalman filter to predict fruit position when its precise prediction model was absent, by borrowing the horizontal and vertical speed from neighbouring fruit. By comparison with human count for a video with 110 frames and 192 (human count) fruit, the method produced 9.9% double counts and 7.3% missing count errors, resulting in around 2.6% over count. In another test, a video (of 1162 frames, with 42 images centred on the tree trunk) was acquired of both sides of a row of 21 trees, for which the harvest fruit count was 3286 (i.e., average of 156 fruit/tree). The trees had thick canopies, such that the proportion of fruit hidden from view from any given perspective was high. The proposed method recorded 2050 fruit (62% of harvest) with a bias corrected Root Mean Square Error (RMSE) = 18.0 fruit/tree while the dual-view image method (also using MangoYOLO) recorded 1322 fruit (40%) with a bias corrected RMSE = 21.7 fruit/tree. The video tracking system is recommended over the dual-view imaging system for mango orchard fruit count.