-
Notifications
You must be signed in to change notification settings - Fork 101
feat: Add support for Proto and Enum types #2662
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Change-Id: I27c4d06a3e29bb52b0e5391acba9730c05997164
*/ | ||
@InternalApi | ||
static SqlType.Proto protoOf(com.google.bigtable.v2.Type.Proto protoType) { | ||
return Type.SchemalessProto.fromProto(protoType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need the SchemalessProto variants here?
We have SchemalessStruct so that for e.g. getMap("c", SqlType.mapOf(SqlType.string(), SqlType.struct())
the user doesn't have to specify the schema of the struct. We know that the struct schema will be validated on calls to the nested struct's getters.
I don't really understand how that maps to the proto usecase here. I would think we'd always want to return e.g Map<KeyType, ProtoMsgType> using Type.Proto and Type.Enum below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is primarily to support the SqlType.fromProto() method here:
where we assume that a com.google.bigtable.v2.Type
proto can always be converted to a SqlType object.
In our case, however, the information contained within the com.google.bigtable.v2.Type.Proto
and com.google.bigtable.v2.Type.Enum
messages (http://google3/google/bigtable/common/v2/types.proto;l=449-471;rcl=783815016) is insufficient for instantiating the standard SqlType.Proto
or SqlType.Enum
objects because it doesn't include the actual deserialization logic (i.e., the Message
parser or Enum
resolver functions).
So the SchemalessProto
and SchemalessEnum
variants address this limitation. They allow the creation of SqlType objects based on the type metadata available in the com.google.bigtable.v2.Type
message, without requiring the deserialization functions to be provided at the time of fromProto() invocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove the SqlType factories for the schemaless varients and just directly construct Type.SchemalessProto, etc in fromProto instead?
My main concern is that Im not sure users always notice the @internalapi annotations and might end up using these incorrectly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this feature available on the server yet?
If so can you add this to our ITs here: https://github.com/googleapis/java-bigtable/blob/main/google-cloud-bigtable/src/test/java/com/google/cloud/bigtable/data/v2/it/ExecuteQueryIT.java
If not can we follow on with that as soon as it's available?
case PROTO: | ||
checkExpectedKind(value, Value.KindCase.BYTES_VALUE, type); | ||
case ENUM: | ||
checkExpectedKind(value, KindCase.INT_VALUE, type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: just for consistency let's do Value.KindCase.INT_VALUE and drop the import this added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
|
||
/** | ||
* This is a special version of proto that is intended to only be used in the {@link | ||
* com.google.cloud.bigtable.data.v2.models.sql.StructReader} getters that require types. We don't |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rephrase this to something like 'only intended to be used internally, to facilitate proto schema parsing, which does not have the full information required to parse the protobuf messages'.
'in the getters' makes it sound like we expect users to pass this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
* @see <a | ||
* href="https://developers.google.com/protocol-buffers/docs/reference/java-generated#message">getDefaultInstance()</a> | ||
*/ | ||
<MsgType extends AbstractMessage> MsgType getProtoMessage(int columnIndex, MsgType message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is proto support launching in preview or going straight to GA? We should annotate this and all other user facing parts of the api with @BetaApi
until it's GA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's going to be a public preview launch.
Done!
google-cloud-bigtable/src/main/java/com/google/cloud/bigtable/common/Type.java
Outdated
Show resolved
Hide resolved
google-cloud-bigtable/src/main/java/com/google/cloud/bigtable/common/Type.java
Show resolved
Hide resolved
google-cloud-bigtable/src/main/java/com/google/cloud/bigtable/common/Type.java
Outdated
Show resolved
Hide resolved
google-cloud-bigtable/src/main/java/com/google/cloud/bigtable/common/Type.java
Outdated
Show resolved
Hide resolved
google-cloud-bigtable/src/main/java/com/google/cloud/bigtable/common/Type.java
Outdated
Show resolved
Hide resolved
google-cloud-bigtable/src/main/java/com/google/cloud/bigtable/data/v2/models/sql/SqlType.java
Show resolved
Hide resolved
if (left instanceof Type.SchemalessProto && right instanceof Type.SchemalessProto) { | ||
return left.equals(right); | ||
} | ||
if (left instanceof Type.SchemalessProto || right instanceof Type.SchemalessProto) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why could left and right have different type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Type.SchemalessProto
(or Type.SchemalessEnum
) is the type constructed from the proto. Because the proto only has string-typed message_name
(or enum_name
) field, the SchemalessProto
(or SchemalessEnum
) also only contains a string of the message/enum name.
On the other hand, the Type.Proto
(or Type.Enum
) objects are constructed at runtime from the user-provided actual protobuf Message
object (or the forNumber
function for enums), which are actually used for deserialization.
Here is an example user code:
import com.example.my_package.MyMessage;
import com.example.my_package.MyEnum;
...
try (ResultSet rs = dataClient.executeQuery(Statement.of(
"SELECT CAST(cf['col1'] AS my_bundle.my_package.MyMessage) AS protoCol, "
+ "CAST(cf['col2'] AS my_bundle.my_package.MyEnum) AS enumCol FROM my_table"))) {
while (rs.next()) {
MyMessage protoMessageVal = rs.getProtoMessage("protoCol", MyMessage.getDefaultInstance());
MyEnum protoEnumVal = rs.getProtoEnum("enumCol", MyEnum::forNumber);
}
}
So here, when the users call getProtoMessage
, two different type objects are created:
- A
Type.Proto
object will be constructed from the user-providedMyMessage.getDefaultInstance()
. This object contains the concrete Message instance needed for deserialization. - Also, a
Type.SchemalessProto
object is constructed from the column's metadata. The code looks up the type forprotoCol
and finds its schema defined in the below proto:and therefore constructs akind { proto_type { message_name: "my_bundle.my_package.MyMessage" } }
Type.SchemalessProto
that only contains the string "my_bundle.my_package.MyMessage".
Hence during the type checking, one side will be Type.Proto
object and the other side will be Type.SchemalessProto
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for the explanation! Do we need to check for Type.Proto && Type.SchemalessProto then?
else if (left instanceof SchemalessProto) {
return right instanceof Proto;
} else if (right instance of SchemalessProto) {
return left instance of Proto;
}
And i'm not sure how we can check the actual value for different types or if that's necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: check for Type.Proto && Type.SchemalessProto: I think line 387-389 sort of did the check?
re: check the actual value: if a user provides a Proto message that doesn't match the proto value, the value parsing logic in https://github.com/googleapis/java-bigtable/pull/2662/files#diff-561b40482ef38989be63d2b35775f5c21a4be4576a8d8435fb40966d276ef7c3R358-R363 will fail and return the parsing error.
clirr failed:
You'll need to update https://github.com/googleapis/java-bigtable/blob/main/google-cloud-bigtable/clirr-ignored-differences.xml to include the new methods. |
google-cloud-bigtable/src/main/java/com/google/cloud/bigtable/data/v2/models/sql/SqlType.java
Outdated
Show resolved
Hide resolved
- Remove SchemalessProto/Enum SqlType APIs - Add @BetaApi - Suppress clirr failures - Change docstring etc - Change several internal function names in SchemalessProto/Enum Type Change-Id: Ifbeb710ac77245a741500b305c2f96374a4717e9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. I have a few more minor comments but otherwise LGTM
return ProtoStruct.create(schema, value.getArrayValue()); | ||
case PROTO: | ||
try { | ||
return ((SqlType.Proto<?>) type).getParserForType().parseFrom(value.getBytesValue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think pulling the cast onto its own line makes this more readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
* | ||
* @param <T> Java type of the protobuf message | ||
*/ | ||
interface Proto<T extends AbstractMessage> extends SqlType<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and Enum should get the BetaApi tag as well in case they need to change during preview
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
return left.equals(right); | ||
} | ||
if (left instanceof Type.SchemalessProto || right instanceof Type.SchemalessProto) { | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this still check the messageNames match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this would be simpler if we pulled this logic into the Type equals methods, not a strong opinion on that though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The challenge is that there is a java_package
option in a .proto file (e.g., option java_package = "com.google.cloud.bigtable.data.v2.test";
), and that often doesn't match the protobuf package name in the server's metadata (e.g., package bigtable.data.v2.test;
).
I don't know how to properly deal with this discrepancy so I ended up omitting the check. Worth noting that java-spanner also doesn't check the message name AFAIK (not 100% sure though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re pulling the type matching logic in Type's equals() method:
I think I tried that at some point, but the equals()
implementations gets messy in order to handle all the various cross-type comparisons.. so I ended up throwing all those away.
On the bright side, the current code clearly centralizes the logic for comparing different types!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline. Updated the code so that the unqualified message names must match when comparing across schemaless and concrete Proto/Enum types.
return left.equals(right); | ||
} | ||
if (left instanceof Type.SchemalessEnum || right instanceof Type.SchemalessEnum) { | ||
return left.getCode().equals(right.getCode()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as above, should this check enum name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as the above comment.
Change-Id: Ie24e59a43eeca52075465f930d269de7482aaadc
… concrete types Change-Id: Ide2ce0fe2e887daacc844b441276034b5d90ccfc
*/ | ||
@InternalApi | ||
static boolean typesMatch(SqlType<?> left, SqlType<?> right) { | ||
Function<String, String> getUnqualifiedMessageName = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you pull this into a static helper function? There's a cost to creating a new Function object every call, and I think it will be a more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
Change-Id: Iacef70723df560273fef94aacb85c7f6a4819bc3
Change-Id: I27c4d06a3e29bb52b0e5391acba9730c05997164
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> ☕️
If you write sample code, please follow the samples format.