Monday, May 11, 2026

Bootstrapping Sign Language Annotations with Sign Language Models


AI-driven sign language interpretation is limited by a lack of high-quality annotated data. New datasets such as ASL STEM Wiki and FLEURS-ASL feature professional interpreters and hundreds of hours of data, but they remain only partially annotated and thus underutilized, in part due to the prohibitive cost of annotation at this scale. In this work, we develop a pseudo-annotation pipeline that takes signed video and English as input and outputs a ranked set of likely annotations, with time intervals, for glosses, fingerspelled words, and sign classifiers. Our pipeline uses sparse predictions from our fingerspelling recognizer and isolated sign recognizer (ISR), together with a K-shot LLM approach, to estimate these annotations. In support of this pipeline, we establish simple yet effective baseline fingerspelling and ISR models, achieving state-of-the-art results on FSBoard (6.7% CER) and on the ASL Citizen dataset (74% top-1 accuracy). To validate the pipeline and provide a gold-standard benchmark, an expert interpreter annotated nearly 500 videos from ASL STEM Wiki with sequence-level gloss labels containing glosses, classifiers, and fingerspelling signs. These human annotations and over 300 hours of pseudo-annotations are released in the supplementary material.
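To make the ranking step concrete, here is a minimal sketch of how sparse predictions from several recognizers could be merged into a single ranked candidate list. All names, labels, and scores below are hypothetical illustrations, not the paper's actual implementation, and the K-shot LLM stage of the pipeline is not shown.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    # One candidate pseudo-annotation: its type, label, time span, and confidence.
    kind: str    # "gloss", "fingerspelling", or "classifier"
    label: str
    start: float # seconds into the video
    end: float
    score: float # recognizer confidence in [0, 1]

def rank_candidates(candidates):
    """Merge sparse predictions from multiple recognizers into one
    list, ranked by confidence (highest first)."""
    return sorted(candidates, key=lambda a: a.score, reverse=True)

# Hypothetical sparse outputs from a fingerspelling recognizer and an
# isolated sign recognizer (ISR) for one video.
fs_preds = [Annotation("fingerspelling", "DNA", 2.1, 3.0, 0.92)]
isr_preds = [
    Annotation("gloss", "SCIENCE", 0.5, 1.2, 0.81),
    Annotation("gloss", "CELL", 3.4, 4.0, 0.67),
]

ranked = rank_candidates(fs_preds + isr_preds)
for ann in ranked:
    print(f"{ann.kind:>15} {ann.label:<8} [{ann.start:.1f}-{ann.end:.1f}s] {ann.score:.2f}")
```

In the actual pipeline, each entry would carry a time interval estimated from the video, and the ranked list would serve as pseudo-annotations for downstream training or human verification.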
