Current face video forgery detectors rely on large or dual-stream backbones. We show that a single, lightweight fusion of two handcrafted cues can achieve higher accuracy with a much smaller model. Based on the Xception baseline model (21.9 million parameters), we build two detectors: LFWS, which adds a 1×1 convolution to combine a low-frequency Wavelet-Denoised Feature (WDF) with the phase-only Spatial-Phase Shallow Learning (SPSL) map, and LFWL, which merges WDF with Local Binary Patterns (LBP) in the same way. This extra module adds only 292 parameters, keeping the total at 21.9 million, smaller than F3Net (22.5 million) and less than half the size of SRM (55.3 million). Even with this minimal overhead, the fused models raise the average area under the curve (AUC) from 74.8% to 78.6% on FaceForensics++ and from 70.5% to 74.9% on DFDC-Preview, gains of 3.8 and 4.4 percentage points over the Xception baseline. They also consistently outperform F3Net, SRM, and SPSL on eight public benchmarks, without extra data or test-time augmentation. These results show that carefully paired handcrafted features, combined through the lightweight fusion block, can provide state-of-the-art robustness at a significantly lower cost. Our findings suggest a need to reevaluate scale-driven design choices in face video forgery detection.
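
As an illustration of why a 1×1 convolutional fusion is so cheap, the following minimal sketch concatenates two cue maps along the channel axis and mixes them with a per-pixel linear map. It is not the implementation used in this work: the function names (`conv1x1`, `fuse_cues`, `conv1x1_param_count`), the channel counts, and the toy weights are all hypothetical, and the sketch does not attempt to reproduce the exact 292-parameter configuration.

```python
# Illustrative sketch only: names, shapes, and weights are assumptions,
# not the configuration used for LFWS or LFWL.

def conv1x1(feature_map, weights, bias):
    """Apply a 1x1 convolution: one linear map over channels at each pixel.

    feature_map: list of C_in channels, each an H x W list of lists.
    weights:     C_out x C_in list of lists.
    bias:        list of C_out values.
    """
    c_in = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [
        [
            [
                bias[o] + sum(weights[o][c] * feature_map[c][y][x]
                              for c in range(c_in))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for o in range(len(weights))
    ]


def fuse_cues(cue_a, cue_b, weights, bias):
    """Concatenate two cue maps along channels, then mix them with a 1x1 conv."""
    return conv1x1(cue_a + cue_b, weights, bias)


def conv1x1_param_count(c_in, c_out, use_bias=True):
    """Parameter count of a 1x1 convolution: it does not depend on image size."""
    return c_in * c_out + (c_out if use_bias else 0)


# Toy usage: two single-channel 2x2 cue maps averaged into one channel.
cue_a = [[[1.0, 2.0], [3.0, 4.0]]]  # stand-in for a WDF-like map
cue_b = [[[3.0, 2.0], [1.0, 0.0]]]  # stand-in for an SPSL- or LBP-like map
fused = fuse_cues(cue_a, cue_b, weights=[[0.5, 0.5]], bias=[0.0])
```

Because the kernel has no spatial extent, the cost of such a block grows only with the product of input and output channel counts. Under the hypothetical choice of two 3-channel cues fused into 3 channels, `conv1x1_param_count(6, 3)` gives 21 parameters, which is negligible next to a backbone with millions of weights.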
- ‡ Carnegie Mellon University
- ** Work done while at Apple
