Three Problems in Interoperating with H.264 of Flash Player

H.264 decoding has been part of Flash Player since version 9, but H264 encoding was recently added in version 11. Once Flash Player 11 beta was out I started looking in to integrating video translation in the SIP-RTMP gateway project. For a Flash-to-Flash video conference you do not need to understand the problems related to H.264 in Flash Player because everything is taken care of behind the scenes by Flash Player. Adding H.264 support in the flash-videoio project was relatively straight forward. However if you are building your own translator to interoperate video between Flash Player and some other application you will need to understand these problems.

1) The first problem is that Flash Player doesn't enable H.264 even for decoding if the RTMP connection does not use the new-style "secure" handshake. In the older version handshaking with bytes containing zeros worked, but not when using H.264. Eventually I found about this on reading some open-source-flash (osflash) forum post and incorporated it in my gateway.

2) The H.264 encoder generates some sequence headers (called SPS and PPS) which are essential in decoding the rest of the video data packets. The same is true with AAC audio codec. In particular in live H.264 publish mode, Flash Player generates periodic SPS/PPS packets so the other Flash Player (or SIP phone) can join the call later and still be able to start decoding the stream. However, some existing SIP video phones generate the sequence packets only once at the beginning. The SIP-RTMP gateway needed to be modified to cache the sequence packets received from non-Flash Player client and re-send them with correct timestamp to the Flash Player client that joined the stream late.

3) Looks like Flash Player 11.0 changed something related to buffering of live stream, which causes problems if the SIP side generates multiple slice NALU (primitive data units in H.264) per frame. The Flash Player itself generates one NALU per frame, however some existing SIP video phones (e.g., Bria 3) generate old-style multiple slice per frame and one NALU per slice and cannot be decoded and displayed in Flash Player 11 in live mode. You can read more about the problem. This is not a problem for buffered playback though. (update on 12/12/2011 -- I can verify that this bug has been fixed in Flash Player 11.2.202.96 and video call works fine now between Bria 3 and Flash Player via my SIP-RTMP gateway)

Ekiga SIP phone uses the new-style RTP mechanism for fragmenting a full H.264 frame instead of using multiple slices in H.264 encoding. This can be easily translated to Flash Player and works with my SIP-RTMP gateway. However, Ekiga has another problem in incorrectly interpreting RTP timestamp of received stream which makes it play the stream much slower.

No comments: