Exploring How Protobuf OneOfs Are Represented

2024-12-05

This is a short exploration of how Protobuf3 OneOf fields are represented using Golang as our exploration medium.

OneOf types, aka Tagged Unions are data structures that are used to hold one of a finite list of distinct types. A variable of a tagged union type can hold a value of one of several types defined for that tagged union type. This might be easier to understand with an example via pseudocode.

type Media = Movie | Show | Short

var favorite1: Media = Movie{}
var favorite2: Media = Show{}
var favorite3: Media = Short{}

var favorites: [Media] = [favorite1, favorite2, favorite3]

Variables favorite1, favorite2, and favorite3 all have the data type Media but have values Movie, Show and Short, which works because the Media data type has been defined as a Tagged Union of all 3 other data types. Consequently, the favorites variable, which is defined as an array of Media values, can have values of the 3 defined data types in it.

Crucially, the values all take up the same space in memory and only one can be set at any given time. My knowledge here becomes fuzzy, but typically, the memory allocated for a OneOf corresponds to the largest possible field within it, with additional space for internal bookkeeping.

Protobuf has OneOf as its implementation of the Tagged Union data type. A message field can be marked as oneof with the variants specified.

message Event {
	oneof media {
		Movie movie = 1;
		Show show = 2;
		Short short = 3;
	}
	int64 timestamp = 4;
	optional bool had_fun = 5;
}

message Movie {
	string title = 1;
	string director = 2;
}

message Show {
	string title = 1;
	string runner = 2;
}

message Short {
	string title = 1;
	string shorty = 2; // I couldn't think of something unique, here.
}

Protobuf allows assigning only one of the 3 fields out of movie, show and short, and setting one automatically clears any of the others set, ensuring that memory for only one of the types gets consumed.

Protobuf’s descriptor.proto is a protobuf file that describes the structure of protobuf files. It hurts my head a little to think about it.

Let’s inspect the DescriptorProto message, which is the schema for a protobuf Message.

// Describes a message type.
message DescriptorProto {
  optional string name = 1;

  repeated FieldDescriptorProto field = 2;
  ...
  ...
  repeated OneofDescriptorProto oneof_decl = 8;
  ...
}

Let’s look at the relevant parts of FieldDescriptorProto and OneOfDescriptorProto and then we’ll talk about them.

// Describes a field within a message.
message FieldDescriptorProto {
	...
	// If set, gives the index of a oneof in the containing type's oneof_decl
	// list.  This field is a member of that oneof.
	optional int32 oneof_index = 9;
	...
	// If true, this is a proto3 "optional". When a proto3 field is optional, it
	// tracks presence regardless of field type.
	//
	// When proto3_optional is true, this field must belong to a oneof to signal
	// to old proto3 clients that presence is tracked for this field. This oneof
	// is known as a "synthetic" oneof, and this field must be its sole member
	// (each proto3 optional field gets its own synthetic oneof). Synthetic oneofs
	// exist in the descriptor only, and do not generate any API. Synthetic oneofs
	// must be ordered after all "real" oneofs.
	//
	// For message fields, proto3_optional doesn't create any semantic change,
	// since non-repeated message fields always track presence. However it still
	// indicates the semantic detail of whether the user wrote "optional" or not.
	// This can be useful for round-tripping the .proto file. For consistency we
	// give message fields a synthetic oneof also, even though it is not required
	// to track presence. This is especially important because the parser can't
	// tell if a field is a message or an enum, so it must always create a
	// synthetic oneof.
	//
	// Proto2 optional fields do not set this flag, because they already indicate
	// optional with `LABEL_OPTIONAL`.
	optional bool proto3_optional = 17;
}

// Describes a oneof.
message OneofDescriptorProto {
  optional string name = 1;
  optional OneofOptions options = 2;
}

I pasted the whole documentation for the proto3_optional field since it has heavy implications on how OneOfs are implemented. OneOfDescriptorProto is relatively simple and for us only the name field matters.

To fully grasp how to detect and make use of OneOf fields, we must understand both how OneOfs are represented and how optional fields are represented.

Remember that optional bool had_fun field? That wasn’t there just for flavor but to demonstrate how optional fields need to be taken into account when making use of OneOfs.

Looking at DescriptorProto we see a field repeated OneofDescriptorProto oneof_decl = 8;. This field is what lists all OneOf fields in the message. Using our very first example of message Event, oneof_decl would contain 1 value:

DescriptorProto{
	Name: "Event",
	OneofDecl: []OneofDescriptorProto{
		{
			Name: "media",
		},
		{
			Name: "_had_fun" // Keep an eye on this one. We'll get back to it.
		}
	},
}

Wait, what’s that _had_fun doing in there? We didn’t define that as an OneOf! Keep reading.

The fields of the message would have info on each field:

DescriptorProto{
	Name: "Event",
	Field: []FieldDescriptorProto{
		{
			Name: "movie",
			OneOfIndex: 0,
		},
		{
			Name: "show",
			OneOfIndex: 0,
		},
		{
			Name: "short",
			OneOfIndex: 0,
		},
		{
			Name: "timestamp",
			OneOfIndex: nil,
		},
		{
			Name: "had_fun",
			OneOfIndex: 1,
		},
	},
}

Of interest to is the value of the OneOfIndex property. The description of the property tells us how it works.

If set, gives the index of an OneOf in the containing type’s list. This field is a member of that OneOf.

That seems straightforward enough. In the Event message, we have two OneOfs listed (even though we did not list the second one, ourselves). And each of the fields tells us whether it is part of one of the OneOfs (indicating by index which OneOf it belongs to) or not (if the value is nil).

As expected, movie, show, and short have the OneOfIndex of 0, part of the OneOf located at the 0th index of the OneofDecl list, media.

But why is had_fun is also a part of an OneOf?

Turns out, because of some backwards compatibility shenanigans, optional fields are also represented as OneOfs, with the two options being (I suppose) either the concrete type or nothing.

So marking had_fun with the optional label causes the descriptor to generate a synthetic OneOf. A OneOf that’s not a real OneOf.

When parsing this schema for our own purposes we have to be aware of this synthetic OneOf. We detect real OneOfs by checking if the OneOfIndex is not nil and the Proto3Optional is false. If both conditions are met, we can be sure that the field is part of a real OneOf.

All this was discovered over a frustrating afternoon of reading through documentation and experimenting. It was annoying but also rewarding. Hope you enjoyed the read.


More posts like this

Protopath Problems in Go

2024-09-11 | #golang #grpc #protobuf

Failed to compute set of methods to expose - Symbol not found This cryptic error has reared its ugly head a few times in the past when working with grpc in Golang. I can’t remember how I’ve solved it before. Probably with some combination of stackoverflow answers and random github gists with advice that I have now forgotten. When it happened most recently, I decided to try to actually understand what was going on under the hood.

Continue reading 


Building Rant

2022-11-14 | #golang #humor #intepreters

Intro Sometime in mid to late 2021 (which is a period I’ve entirely made up because I don’t actually remember when any of this happened), I needed to rant on my company Slack. Something had happened to ruin my day and I needed to get my feelings out. But ranting takes more effort than you’d think. If you want your rant to have impact there needs to be just the right amount of !

Continue reading 