Waymo "teamed up with Chandler Police and Fire in Arizona to set up an 'emergency vehicle testing day.' The authorities had ambulances, police cars, motorcycles and firetrucks pass by, trail and lead the Chryslers all day and night while the minivans' sensors collected as much data as possible from all speeds, distances and angles.
Waymo was able to compile a library of sights and sounds from the event thanks to its minivans' upgraded sensors. The new suite of sensors include an audio detection system designed in-house and an upgraded LiDAR and vision system, which are capable of seeing emergency vehicles and their flashing lights. They also allow the technology to recognize other types of emergency vehicles it hasn't seen yet." (source)
Most likely, detecting emergency vehicles requires sensors designed specifically to capture their visual and audio cues, such as flashing lights and sirens. The system also needs to work out which direction the sound is coming from. This is a very complex scenario, and it may take more time to get it right.
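To give a rough idea of what "identifying where the sound is coming from" can involve, here is a minimal sketch of one textbook approach: compare the arrival time of the same sound at two microphones and convert the delay into a bearing. This is not Waymo's method, which the article does not describe. The function names, the 0.5 m microphone spacing, and the synthetic noise signal are all assumptions made for illustration.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C (assumed conditions)


def best_lag(left, right, max_lag):
    """Return the lag in samples at which `right` best matches `left`.

    A positive lag means the sound reached the right microphone later.
    """
    n = len(left)
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation score for this candidate lag.
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def bearing_degrees(left, right, sample_rate, mic_spacing):
    """Estimate the bearing of a distant source from two microphone signals.

    0 degrees is straight ahead; positive angles point toward the left mic.
    """
    # The delay can never exceed the time sound needs to travel between mics.
    max_lag = math.ceil(mic_spacing / SPEED_OF_SOUND * sample_rate)
    delay = best_lag(left, right, max_lag) / sample_rate
    # Clamp so rounding cannot push the ratio outside asin's domain.
    ratio = max(-1.0, min(1.0, delay * SPEED_OF_SOUND / mic_spacing))
    return math.degrees(math.asin(ratio))


# Hypothetical test signal: broadband noise that reaches the right
# microphone 35 samples after the left one.
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(2000)]
left_mic = source
right_mic = [0.0] * 35 + source[:-35]

angle = bearing_degrees(left_mic, right_mic, sample_rate=48000, mic_spacing=0.5)
print(round(angle, 1))  # about 30 degrees toward the left microphone
```

A real siren is much harder than this toy case: it is close to a repeating tone, so several delays can look equally good, and echoes, wind, road noise and the Doppler shift from a moving vehicle all distort the signal. That is part of why the problem is so complex in practice.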