Why Did the Human Cross the Road? To Confuse the Self-Driving Car

Driving in a busy city, you have to get good at scrutinizing the body language of pedestrians. Your foot hovers somewhere between the gas and the brake, waiting for your brain to triangulate their intent: Is that one trying to cross the street, or just waiting for the bus? Still, a whole lot of the time you hit the brakes for nothing, ending up in a kind of dance with the pedestrian (you go, no you go, no YOU go).

If you think that’s frustrating, then you’ve never been a self-driving car. As human drivers slowly go extinct (and human pedestrians don’t), autonomous vehicles will have to get better at decoding those unspoken intersection interactions. So a startup called Perceptive Automata is tackling that looming problem. The company says its computer vision system can scrutinize a pedestrian to determine not only their awareness of an oncoming car, but their intent—that is, using body language to predict behavior.

Typically if you want a machine to recognize something like trees, you first have humans label tens of thousands of pictures: trees or not trees. It’s a nice, neat binary. It gives the machine learning algorithms a base level of knowledge. But detecting human body language is more complex.

“In the case of a pedestrian, it's not, this person is crossing the road and this person isn't crossing the road. It's, this person isn't crossing the road but they clearly want to,” says Sam Anthony, co-founder of Perceptive Automata. Is the person looking down the road at oncoming traffic? If they’ve got grocery bags, have they set them down to wait, or are they mid-hoist, getting ready to cross?

Perceptive trains its models to look at those kinds of behaviors. The company begins with human trainers, who watch and analyze videos of different pedestrians. Perceptive will take a clip of, say, a human looking down the street to cross the road, and manipulate it hundreds of ways, for instance by obscuring portions of it. Maybe sometimes the head is easier to see, maybe sometimes it’s harder. Then it departs from the tree-not-tree binary by asking the trainers a range of questions, such as, “Is that pedestrian hoping to eventually cross the street?” or “If you were that cyclist, would you be trying to stop the car from passing?”

When different parts of the image are harder to see, the human trainers have to think harder about their judgements of body language, which Perceptive can measure by tracking eye movement and hesitation. Maybe the head is harder to make out, for example, and the trainer has to put more thought into it. “This tells us that there's information about the appearance of the person's head in this particular slice that's an important part of how people judge whether that person in that training video is going to cross the street,” Anthony says.

The head is clearly an important clue for human observers, so it’s also an important clue for the machines. “So when the model saw a novel image where the head was important,” Anthony says, “it would be primed based on the training data to believe that people would likely really care about the pixels around the head area, and would produce an output that captured that human intuition.”
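To make the obscuring idea concrete, here is a minimal sketch of how hesitation might be turned into a per-region importance score. It is not Perceptive's method: the function name `region_importance`, the region names, and the response times are all hypothetical, and it assumes that "hesitation" can be approximated by how many extra seconds a trainer takes to answer when a given region is hidden.

```python
from statistics import mean


def region_importance(baseline_times, occluded_times):
    """Score each image region by how much hiding it slows human judgments.

    baseline_times: response times (seconds) on the unobscured clip.
    occluded_times: dict mapping a region name to the response times
                    recorded when that region was obscured.
    All names and numbers here are hypothetical illustrations.
    """
    base = mean(baseline_times)
    # Extra hesitation per region, floored at zero so that a region whose
    # occlusion does not slow people down gets no importance.
    raw = {
        region: max(0.0, mean(times) - base)
        for region, times in occluded_times.items()
    }
    total = sum(raw.values())
    if total == 0:
        return {region: 0.0 for region in raw}
    # Normalize so the scores sum to 1 and can be compared across clips.
    return {region: delay / total for region, delay in raw.items()}


scores = region_importance(
    baseline_times=[1.0, 1.2, 0.8],
    occluded_times={
        "head": [2.0, 2.2, 1.8],
        "torso": [1.2, 1.4, 1.0],
        "feet": [1.0, 1.1, 0.9],
    },
)
most_important = max(scores, key=scores.get)
print(most_important, scores)
```

With these made-up numbers, hiding the head slows the trainers the most, so the head gets the largest share of the importance. A model trained on such scores would, in the spirit of Anthony's description, learn to weight the pixels around the head more heavily.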

By considering cues like where the pedestrian is looking, Perceptive can quantify awareness and intent. A person walking down the sidewalk with their back to the car, for example, isn’t anything to worry about—both unaware and not intending to cross the street. But someone standing at a crosswalk peering down the street is another story. This insight would give a self-driving car extra time to slow down in case the pedestrian does decide to make a run for it.
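As a rough illustration of how two such scores could feed a driving decision, consider the sketch below. Everything in it is an assumption made for the example: the 0-to-1 scales, the 0.5 thresholds, the action names, and the choice to treat a pedestrian who wants to cross but hasn't noticed the car as the most cautious case. The company has not described its decision logic.

```python
from dataclasses import dataclass


@dataclass
class PedestrianEstimate:
    # Hypothetical scores on a 0-to-1 scale.
    awareness: float  # 0.0 = unaware of the car, 1.0 = clearly aware of it
    intent: float     # 0.0 = no wish to cross, 1.0 = clearly wants to cross


def suggested_action(p, intent_threshold=0.5, awareness_threshold=0.5):
    """Map awareness and intent scores to a coarse driving suggestion."""
    if p.intent < intent_threshold:
        # Example: someone walking away down the sidewalk.
        return "proceed"
    if p.awareness >= awareness_threshold:
        # Example: someone at the crosswalk peering down the street.
        return "slow"
    # Wants to cross but has not noticed the car (assumed riskiest case).
    return "prepare_to_stop"


walking_away = PedestrianEstimate(awareness=0.1, intent=0.1)
at_crosswalk = PedestrianEstimate(awareness=0.9, intent=0.8)
distracted = PedestrianEstimate(awareness=0.1, intent=0.9)

for person in (walking_away, at_crosswalk, distracted):
    print(person, suggested_action(person))
```

A real system would presumably work with continuous estimates that change from frame to frame rather than fixed thresholds, but the sketch shows why having both numbers buys the car extra time to react.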

Perceptive says it’s already working with automakers—it won’t reveal which—to deploy the system, and plans to license the technology to the makers of self-driving cars. (Daimler, for its part, has also studied tracking pedestrian head movements.) It’s also interested in other robotics companies producing machines that will need to interact closely with humans.

Because in this strange new world of complex interactions between people and robots, it’s as much about machines adapting to humans as it is humans adapting to machines. Determining the intent of pedestrians will help, but it won’t be easy. “Knowing the intent of pedestrians would certainly make [autonomous vehicle] deployment safer,” says Carnegie Mellon roboticist Raj Rajkumar, who works on self-driving cars. “It is, however, a very difficult problem to solve perfectly.”

“Consider Manhattan,” Rajkumar adds. Picture a big group of people crossing, and in particular a person on the far side of that group from a robocar. “Among this group, one person is either short or starts running to cross quickly after the vehicle has decided to make a turn. Machine vision is not perfect.” And machine vision can get confused by optics, just like humans can. Reflections, the sun dropping low on the horizon, alternating light and dark patches on the road, not to mention heavy rain or snow, can all bamboozle the machines.

Then there’s the simple matter of people just acting weird. Perceptive’s system can pick up on tell-tale cues, but humans aren’t always so consistent. “There were about 7,000 pedestrian fatalities in the US in 2017 alone,” says Rajkumar. “The primary issue is the presence of significant uncertainty and sudden decisions that get made. Most pedestrians are very traffic-conscious most of the time. But, occasionally, a pedestrian is either in a hurry or changes their mind at the last moment and starts crossing the street, or even reverses direction.”

No one’s about to claim that self-driving cars will totally eliminate traffic deaths—not even machines are perfect, and there’s always going to be the unpredictable human pedestrian element. But bit by bit, robocars are getting better at navigating both our world and our vagaries.
