WebRTC for the Streamer
We wrote this site to explain and share the ways we believe WebRTC (WHIP/WHEP) can improve streaming.
These capabilities are also available in other protocols (SRT, MoQ, RIST), so the protocol itself isn't the important part. We
would just like streamers to know what is possible. These improvements can change many things about streaming for the better.
WebRTC for the Streamer was written by the developers who added WHIP/WebRTC support to OBS and maintain Broadcast Box. A companion piece to this site is WebRTC for the Curious.
Speed
With WebRTC you get sub-500ms latency, which gives you the experience of a video call. Latency this low can change the dynamics of streaming.
Streaming together
Streaming to a private group of friends feels more connected when the latency is lower. It's a lot of fun to recreate the "sitting on the couch together"
experience when you stream gameplay or a movie to your friends.
Co-streaming to an audience
When co-streaming to an audience you want the lowest latency possible. It allows you to have authentic conversations with the other streamer, instead of
an awkward back and forth. High latency also leads to desync between your gameplay and your partner's. It is confusing as a viewer to see events happening
at different times on the two feeds.
Audience interaction
WebRTC allows you to respond to chat like a real conversation. Talking with people directly, instead of responding seconds later, feels like a more connected and human experience.
The audience interaction doesn't have to be text only. Some games allow the audience to change the game environment itself. Seeing the game react instantly when they press
a button is kind of magical.
Privacy
WebRTC provides APIs that let broadcasters encrypt media and viewers decrypt it, so the server has no access to the video. The server can still support
many different types of clients thanks to simulcast.
flowchart LR
Broadcaster[Broadcaster]
Server[WebRTC Server]
ViewerA[Viewer A]
ViewerB[Viewer B]
Broadcaster <-->|P2P key exchange| ViewerA
Broadcaster <-->|P2P key exchange| ViewerB
Broadcaster -->|Encrypted media| Server
Server -->|Encrypted media| ViewerA
Server -->|Encrypted media| ViewerB
Self Hosted
WebRTC has quite a few self-hosting options. This has happened for a few reasons.
Wide usage outside of streaming
WebRTC is widely used outside of broadcasting. It is used for robotics, conferencing, "AI voice assistants" and more.
WebRTC broadcasting benefits from the ecosystem that already existed for these uses.
One protocol for publish+playback
If you are using RTMP you have to use another protocol for playback (usually HLS/DASH). With WHIP and WHEP you can use WebRTC for both,
which means fewer moving parts to run.
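As a rough illustration of why this keeps things simple: WHIP signaling is an HTTP POST that carries an SDP offer with the `application/sdp` content type, and WHEP follows the same shape for playback. The sketch below only builds the requests and does not send them. The endpoint URLs, the token, and the placeholder SDP body are hypothetical values chosen for the example.

```python
from urllib.request import Request


def build_signaling_request(endpoint: str, sdp_offer: str, token: str) -> Request:
    """Build (but do not send) the HTTP POST that carries an SDP offer."""
    return Request(
        endpoint,
        data=sdp_offer.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/sdp",
            # Bearer tokens are a common way to authenticate; the value is made up.
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical endpoints: one for publishing (WHIP), one for playback (WHEP).
publish = build_signaling_request("https://example.com/api/whip", "v=0\r\n", "example-token")
playback = build_signaling_request("https://example.com/api/whep", "v=0\r\n", "example-token")
```

Both directions use the same helper, which is the point: one request shape and one media protocol for publish and playback.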
Cheaper to run/no transcoding
A WebRTC server just forwards media packets instead of transcoding the stream. It's a lot easier to deploy/manage/scale because of this.
Flexible topologies (P2P and Mesh)
WebRTC isn't limited to client-server. You can connect viewers directly (P2P) or in a Mesh. This makes self-hosting easier and cheaper since you don't always need a powerful central server to distribute media.
graph LR
A[OBS] --> B[Browser]
graph LR
A[OBS] --> B[User A]
B --> C[User B]
B --> D[User C]
D --> E[User E]
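To make the cost argument concrete, here is a small sketch that counts how many copies of the stream each node uploads in the relay tree drawn above. The edge list is copied from the diagram; the comparison with a central server is an illustration, not a measurement.

```python
from collections import Counter

# Edges copied from the relay diagram: (sender, receiver).
EDGES = [
    ("OBS", "User A"),
    ("User A", "User B"),
    ("User A", "User C"),
    ("User C", "User E"),
]

# Number of copies of the stream each node has to upload.
uploads = Counter(sender for sender, _ in EDGES)

# Everyone who receives the stream.
viewers = {receiver for _, receiver in EDGES}

# A single central server would have to upload one copy per viewer.
central_server_uploads = len(viewers)
busiest_relay_uploads = max(uploads.values())
```

In this layout no node uploads more than two copies, while a central server would upload four.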
Everyone Streams
Streaming from the browser increases accessibility. The video quality/composition won't be as good, but these voices are important.
Everyone can broadcast
Streaming today requires that you install dedicated software. When configuring your software you have to be aware of things like bitrate and codecs, and watch your resource usage.
Broadcasting from the browser significantly lowers the barrier to entry for streaming. So many new voices and types of streams will be available when it is opened to more people.
Browser is everywhere
A web browser is available everywhere: phones, TVs, tablets, smart cars and more. This allows you to broadcast from all these places where it wasn't possible before.
Many people also use computers where they aren't able to install additional software. It would be great to enable them to stream even if they don't have root access
to the machine.
Stream Anywhere
WebRTC gives you a lot of flexibility in how you stream. You can configure it to have the lowest latency possible (at the expense of video quality)
or you can run it over TCP and have loss-free video but higher latency. These are some of the knobs that WebRTC gives you.
Protocol choice (TCP or UDP)
WebRTC allows you to choose per session whether you want TCP or UDP. If you pick TCP, lost packets are retransmitted by the transport so your media arrives complete, but you may experience higher delay.
If you pick UDP you get more control over the experience. You can use things like FEC and NACK to compensate for a poor network, but still keep the lowest latency possible.
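One way to picture the per-session choice: connection candidates are exchanged as text lines whose third field names the transport, so a session can keep only the kind it wants. This is a minimal sketch, assuming candidates in the usual SDP attribute form. The addresses and priorities are made up, and how a real client or server restricts transports depends on the implementation.

```python
def candidate_transport(line: str) -> str:
    """Return the transport ("udp" or "tcp") of an ICE candidate line."""
    # Fields: foundation, component, transport, priority, address, port, "typ", type, ...
    return line.split()[2].lower()


def keep_transport(candidates: list[str], transport: str) -> list[str]:
    """Keep only the candidates that use the requested transport."""
    return [c for c in candidates if candidate_transport(c) == transport]


# Example candidates using documentation addresses (hypothetical values).
candidates = [
    "candidate:1 1 udp 2130706431 192.0.2.10 50000 typ host",
    "candidate:2 1 tcp 1671430143 192.0.2.10 8443 typ host tcptype passive",
]

udp_only = keep_transport(candidates, "udp")
tcp_only = keep_transport(candidates, "tcp")
```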
Sender driven bandwidth estimation
With WebRTC a broadcaster can dynamically change bitrate if needed. WebRTC has a mechanism built into the protocol that constantly measures packet loss and delivery time
(see RFC 8888). This means that instead of setting a static bitrate you can adapt it dynamically to get the best experience possible
for your network and hardware.
This is a simplified example. Broadcasting software starts at 1080p and tries to upgrade to 2160p. If that results in a bad experience it drops back to 1080p. WebRTC provides the
receiver feedback needed to make these decisions.
sequenceDiagram
OBS->>Server: Sending 1080p
Server-->>OBS: Zero Packet Loss, 50ms trip time
OBS->>Server: Sending 2160p
Server-->>OBS: Packet Loss, 150ms trip time
OBS->>Server: Sending 1080p
Server-->>OBS: Zero Packet Loss, 50ms trip time
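The exchange above can be sketched as a small control loop. This is a simplified illustration, not how OBS or any particular congestion controller works: the `Feedback` type, the quality ladder, and the loss and round-trip thresholds are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Quality ladder from the diagram, lowest first.
LADDER = ["1080p", "2160p"]


@dataclass
class Feedback:
    loss_fraction: float   # share of packets reported lost (0.0 to 1.0)
    round_trip_ms: float   # measured trip time in milliseconds


def next_level(current: int, fb: Feedback,
               max_loss: float = 0.02, max_rtt_ms: float = 100.0) -> int:
    """Step up one level when the network looks healthy, otherwise step down."""
    healthy = fb.loss_fraction <= max_loss and fb.round_trip_ms <= max_rtt_ms
    if healthy:
        return min(current + 1, len(LADDER) - 1)
    return max(current - 1, 0)


# Replay the three feedback reports from the diagram.
level = 0
history = [LADDER[level]]
for fb in [Feedback(0.0, 50), Feedback(0.08, 150), Feedback(0.0, 50)]:
    level = next_level(level, fb)
    history.append(LADDER[level])
```

After the last healthy report this loop tries 2160p again. A real sender would usually wait longer or probe more carefully before retrying a level that just failed.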
Forward error correction (FEC)
Forward error correction lets you send redundant or duplicated data ahead of time, so the receiver can repair lost packets without asking for them again. It consumes extra bandwidth, but is a great solution
if you are running over satellite or cellular and have bandwidth available but are fighting packet loss.
flowchart TD
subgraph Sender
A[Video Frame A]
B[Video Frame B]
C[Video Frame C]
A2[Video Frame A]
B2[Video Frame B]
C2[Video Frame C]
end
subgraph Receiver
ARecv[Video Frame A]
BRecv[Video Frame B]
CRecv[Video Frame C]
end
I((Internet))
Sender-->I
I-->Receiver
style A fill:red
style B2 fill:red
style C fill:red
style ARecv fill:green
style BRecv fill:green
style CRecv fill:green
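The diagram shows plain duplication. A common refinement is XOR parity, where one extra packet can repair any single lost packet in a group. The sketch below shows that idea with equal-length payloads. The payload contents are made up, and real FEC schemes also protect headers and handle packets of different lengths.

```python
def xor_parity(packets: list[bytes]) -> bytes:
    """XOR all packets together byte by byte (shorter packets act as zero padded)."""
    size = max(len(p) for p in packets)
    parity = bytearray(size)
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing packet from the survivors and the parity packet."""
    # XOR-ing the parity with every surviving packet cancels them out,
    # leaving only the bytes of the packet that was lost.
    return xor_parity(received + [parity])


# Sender: three media packets plus one parity packet.
packets = [b"frame-A.", b"frame-B.", b"frame-C."]
parity = xor_parity(packets)

# Receiver: packet B was lost in transit, but the parity packet arrived.
recovered = recover_missing([packets[0], packets[2]], parity)
```

Here the overhead is one extra packet for every three, instead of the full duplicate shown in the diagram, at the cost of only repairing one loss per group.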
Negative-acknowledgement (NACK)
NACK is another error correction technique. Instead of sending duplicated data ahead of time the receiver asks for missing packets again.
This is a good fit when packet loss is small and the stream still has time to repair the frame. It uses less bandwidth than FEC, but if
the network is already too delayed the resent packet might arrive too late to be useful.
sequenceDiagram
participant OBS
participant Server
OBS->>Server: Sending packet 101
Note over OBS,Server: Packet 102 is lost
OBS->>Server: Sending packet 103
Server-->>OBS: NACK packet 102
OBS->>Server: Re-send packet 102
Server->>Server: Build video frame from 101, 102, 103
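The receiver side of that exchange comes down to spotting gaps in the sequence numbers. A minimal sketch, assuming small sequence numbers that never wrap around (real RTP sequence numbers are 16-bit and do wrap). The cache contents are placeholder bytes.

```python
def missing_sequence_numbers(received: list[int]) -> list[int]:
    """Return the sequence numbers that are absent between the lowest and highest seen."""
    seen = set(received)
    return [seq for seq in range(min(received), max(received) + 1) if seq not in seen]


# Sender keeps recently sent packets so they can be re-sent on request.
sent_cache = {101: b"packet-101", 102: b"packet-102", 103: b"packet-103"}

# Receiver got 101 and 103, so it asks for 102.
arrived = [101, 103]
nack_list = missing_sequence_numbers(arrived)

# Sender answers the NACK from its cache.
retransmitted = [sent_cache[seq] for seq in nack_list]
```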
Mobility (ICE renomination)
Switching between WiFi and cellular used to require a full reconnect of the stream. With WebRTC you can switch networks without
disconnecting anything. The same mechanism also makes administration easier: servers can be updated or restarted without requiring
users to fully disconnect.
Connection bonding
WebRTC allows accepting video from multiple sources. You can combine multiple 5G/WiFi interfaces and send video
over them. This technique is niche, but it opens up lots of interesting options. You can send your most important video
feed over your most stable interface. Some users also send their more latency-sensitive media (audio) over one interface
while using the others for video.
Higher Quality
WebRTC does not hardcode video codecs into the protocol. Codecs are negotiated at runtime, so other codecs (like HEVC and AV1) can be added without changing the protocol.
AV1 is designed to deliver about 30% better quality at the same bitrate, so a 6 Mbps stream can look noticeably better without requiring more upload bandwidth.
Custom codecs can also be added, as long as both the client and server support them.
Simulcast
When streaming video you will need to support different types of users. A phone on 5G works best with 1080p while a
desktop computer on fiber can support 2160p. With simulcast the broadcaster generates and uploads all quality levels.
The server then forwards the video feeds appropriately.
flowchart LR
A[OBS/FFmpeg]
B[Server]
C[Viewer]
D[Viewer]
E[Viewer]
A --> |2160p|B
A --> |1440p|B
A --> |1080p|B
B --> |2160p|C
B --> |1440p|D
B --> |1080p|E
linkStyle 0,3 stroke: red
linkStyle 1,4 stroke: green
linkStyle 2,5 stroke: blue
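The forwarding decision in the diagram can be sketched as a lookup: for each viewer, pick the best layer that fits their available bandwidth. The bitrates per layer and the viewer bandwidth figures are hypothetical numbers chosen for the example, and a real server would also react to feedback over time.

```python
# Layers uploaded by the broadcaster, with assumed bitrates in kbps.
LAYERS = {"2160p": 20_000, "1440p": 10_000, "1080p": 6_000}


def pick_layer(viewer_kbps: int) -> str:
    """Pick the highest-bitrate layer that fits, falling back to the smallest layer."""
    for name, kbps in sorted(LAYERS.items(), key=lambda item: item[1], reverse=True):
        if kbps <= viewer_kbps:
            return name
    # Nothing fits: forward the smallest layer rather than nothing at all.
    return min(LAYERS, key=LAYERS.get)


# Three viewers with different (made-up) downlink estimates.
choices = {kbps: pick_layer(kbps) for kbps in (50_000, 12_000, 7_000)}
```

The server only chooses which uploaded feed to forward. It never decodes or re-encodes the video.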
Traditionally with broadcast software you would upload the 2160p feed, and the server would transcode
down to the other layers. WebRTC's approach to this problem has a few benefits.
Better quality
A 1080p video stream generated via transcoding will not have the same quality as one generated in OBS directly. Transcoding suffers from generation loss.
When video is decoded and re-encoded, additional compression artifacts appear and detail is lost. With simulcast you only encode once.
More control
When doing transcodes you only control the encoding quality of one stream. With simulcast you can ensure all your video streams are high quality.
Lower latency
Transcoding adds latency. Running an extra decode and encode for your transcoded feeds means they will have a different latency than your
uploaded feed. With simulcast all your feeds run at the same latency.
Simpler servers
Running servers without transcoding is much easier. Transcoding requires a lot of computing power, while simulcast just means extra upload traffic.
No tampering/ad insertion
Simulcast means your video feeds are passed through untampered. If the server is re-encoding your video it can insert watermarks or ads, or modify the video in unexpected ways.
With E2E encryption broadcasters can go further and make sure the server can't decode the video at all. It will see a stream of bytes pass through, but will not be able to watch the video.