Answers to multiple questions (Does the person on the ladder have three points of contact? Are they using the ladder as stilts to move around?) are combined to determine whether the ladder in the picture is being used safely. “Our system has over a dozen layers of questioning just to get to that answer,” Lorenzo says. DroneDeploy has not publicly released its data for review, but he says he hopes to have his methodology independently audited by safety experts.
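To illustrate how several yes/no answers might be combined into a single verdict, here is a minimal Python sketch. It is not DroneDeploy's implementation, which has not been released. The `Answer` class, its `unsafe_if` field, and the `assess_ladder` function are hypothetical names, and the rule that any single hazardous answer makes the picture unsafe is an assumption. The real system reportedly uses more than a dozen layers of questioning rather than the two shown here.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    question: str
    value: bool      # the model's yes/no answer to the question
    unsafe_if: bool  # which answer signals a hazard (an assumption for this sketch)


def assess_ladder(answers):
    """Combine per-question answers into one verdict.

    Assumed rule: the scene is unsafe if any answer matches its hazardous value.
    Returns the verdict and the questions that triggered it.
    """
    triggered = [a.question for a in answers if a.value == a.unsafe_if]
    verdict = "unsafe" if triggered else "safe"
    return verdict, triggered


answers = [
    Answer("Does the person on the ladder have three points of contact?",
           value=True, unsafe_if=False),
    Answer("Are they using the ladder as stilts to move around?",
           value=False, unsafe_if=True),
]

verdict, reasons = assess_ladder(answers)
print(verdict, reasons)
```

Returning the triggering questions along with the verdict keeps the result explainable, which matters if a safety manager has to act on the alert.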
The missing 5%
Using vision language models for construction AI shows promise, but there are “some pretty fundamental issues” to resolve, including hallucinations and the problem of edge cases, those anomalous hazards on which the VLM hasn’t been trained, says Chen Feng. He leads New York University’s AI4CE lab, which develops technologies for 3D mapping and scene understanding in construction robotics and other areas. “Ninety-five percent is encouraging—but how do we fix that remaining 5%?” he asks of Safety AI’s success rate.
Feng points to a 2024 paper called “Eyes Wide Shut?”—written by Shengbang Tong, a PhD student at NYU, and coauthored by AI luminary Yann LeCun—that noted “systematic shortcomings” in VLMs. “For object detection, they can reach human-level performance pretty well,” Feng says. “However, for more complicated things—these capabilities are still to be improved.” He notes that VLMs have struggled to interpret 3D scene structure from 2D images, don’t have good situational awareness in reasoning about spatial relationships, and often lack “common sense” about visual scenes.
Lorenzo concedes that there are “some major flaws” with LLMs and that they struggle with spatial reasoning. So Safety AI also employs some older machine-learning methods to help create spatial models of construction sites. These methods include image segmentation, which divides an image into its crucial components, and photogrammetry, an established technique for creating a 3D digital model from 2D images. Safety AI has also been trained heavily in 10 different problem areas, including ladder usage, to anticipate the most common violations.
Even so, Lorenzo admits there are edge cases that the LLM will fail to recognize. But he notes that for overworked safety managers, who are often responsible for as many as 15 sites at once, having an extra set of digital “eyes” is still an improvement.
Aaron Tan, a concrete project manager based in the San Francisco Bay Area, says that a tool like Safety AI could be helpful for these overextended safety managers, who will save a lot of time if they can get an emailed alert rather than having to make a two-hour drive to visit a site in person. And if the software can demonstrate that it is helping keep people safe, he thinks workers will eventually embrace it.
However, Tan notes that workers also fear that these types of tools will be “bossware” used to get them in trouble. “At my last company, we implemented cameras [as] a security system. And the guys didn’t like that,” he says. “They were like, ‘Oh, Big Brother. You guys are always watching me—I have no privacy.’”
Older doesn’t mean obsolete
Izhak Paz, CEO of a Jerusalem-based company called Safeguard AI, has considered incorporating VLMs, but he has stuck with the older machine-learning paradigm because he considers it more reliable. The “old computer vision” based on machine learning “is still better, because it’s hybrid between the machine itself and human intervention on dealing with deviation,” he says.
To train the algorithm on a new category of danger, his team aggregates a large volume of labeled footage related to the specific hazard and then optimizes the algorithm by trimming false positives and false negatives. The process can take anywhere from weeks to over six months, Paz says.
With training completed, Safeguard AI performs a risk assessment to identify potential hazards on the site. It can “see” the site in real time by accessing footage from any nearby internet-connected camera. Then it uses an AI agent to push instructions on what to do next to the site managers’ mobile devices.
Paz declines to give a precise price tag, but he says his product is affordable only for builders at the “mid-market” level and above, specifically those managing multiple sites. The tool is in use at roughly 3,500 sites in Israel, the United States, and Brazil.
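For readers unfamiliar with the terms, the sketch below shows what counting false positives and false negatives on labeled footage can look like. It is an illustration only: Paz does not describe how Safeguard AI tunes its models, and the threshold sweep used here is a generic technique swapped in for the example. The function names, the scores, and the labels are all invented.

```python
def confusion_counts(scores, labels, threshold):
    """Count errors when every clip scoring at or above the threshold is flagged.

    A false positive is a flagged clip with no real hazard.
    A false negative is a real hazard that was not flagged.
    """
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    return fp, fn


def best_threshold(scores, labels, candidates):
    """Pick the candidate threshold with the fewest total errors (assumed criterion)."""
    return min(candidates, key=lambda t: sum(confusion_counts(scores, labels, t)))


# Invented example data: model confidence per clip and the human label.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, False]

for t in (0.2, 0.5, 0.7):
    print(t, confusion_counts(scores, labels, t))

print(best_threshold(scores, labels, [0.2, 0.5, 0.7]))
```

In a safety setting the two kinds of error rarely cost the same, since a missed hazard is usually worse than a spurious alert, so a real system would likely weight false negatives more heavily than this equal-weight sketch does.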