In this test, three rectangles (masses) are fixed on the shaking table at different heights, so each has its own resonant frequency in the horizontal direction even though all three have the same mass. The excitation frequency of the shaking table is increased from 4 Hz to 13.65 Hz to drive the masses into harmonic vibration. From the video, 150 of the 6,674 frames are randomly selected and labeled to train the Mask R-CNN; each frame is 640×480 pixels. SIFT is not applied to smooth the measurement here, because the goal of this test is to detect vibration frequencies rather than accurate amplitudes, which are reported in pixels. The motion of each bounding box therefore represents the translation of the corresponding object. To verify our method, the LK tracker is used to track the same vibration of the shaking table. All raw data are processed with a Butterworth filter, and an FFT is then applied to extract the frequencies of each tracking target. First, the filtered vibrations of the shaking table are obtained with both the LK tracker and the Mask R-CNN. The table excites three designed vibration frequencies: 4 Hz, 6.35 Hz and 11.35 Hz. Both methods capture these frequencies with an error of less than 2.6%. Second, the vibrations of the three rectangles are measured by the Mask R-CNN, and their resonant frequencies are very close to the intended frequencies (i.e., 4 Hz, 6.35 Hz and 11.35 Hz), with errors of 0%, 0.3% and 2.6%, respectively. This indicates that the proposed Mask R-CNN can be used alone to track multiple objects and capture their vibrations precisely.
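The filtering and frequency-extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the test: the frame rate (`fs = 60.0`), the pass band, the filter order, the helper names `dominant_frequency` and `percent_error`, and the synthetic trace are all assumptions introduced here, since the text does not state them. The trace stands in for the horizontal position of a bounding-box centre in pixels.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def dominant_frequency(displacement, fs, band=(1.0, 20.0), order=4):
    """Return the strongest frequency (Hz) in a 1-D displacement trace.

    `band` and `order` are illustrative assumptions, not values from the test.
    """
    x = np.asarray(displacement, dtype=float)
    x = x - x.mean()  # remove the static offset of the bounding box
    # Band-pass Butterworth filter, run forward and backward (zero phase shift).
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, x)
    # One-sided amplitude spectrum and the matching frequency axis.
    spectrum = np.abs(np.fft.rfft(filtered))
    freqs = np.fft.rfftfreq(filtered.size, d=1.0 / fs)
    return float(freqs[np.argmax(spectrum)])


def percent_error(measured_hz, designed_hz):
    """Relative error of a measured frequency, in percent."""
    return abs(measured_hz - designed_hz) / designed_hz * 100.0


# Synthetic stand-in for a bounding-box centre trace (pixels).
fs = 60.0                     # ASSUMED frame rate, not stated in the text
t = np.arange(1200) / fs      # 20 s of samples
rng = np.random.default_rng(0)
trace = 320.0 + 5.0 * np.sin(2 * np.pi * 6.35 * t) + rng.normal(0.0, 0.5, t.size)

measured = dominant_frequency(trace, fs)
print(f"dominant frequency: {measured:.2f} Hz, "
      f"error vs 6.35 Hz: {percent_error(measured, 6.35):.2f}%")
```

Applying the same function to the trace of each tracked object (the table and each rectangle) would give one dominant frequency per target, which can then be compared with the designed values. The frequency resolution of this approach is `fs / N`, so longer recordings give finer estimates.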
Related paper: Automatic Displacement and Vibration Measurement in Laboratory Experiments with a Deep Learning Method, Bai Y., Ramzi M. A., Sezen H., Yilmaz A., IEEE SENSORS 2021 Conference, 2021.