ExternalAccountCredentials serialization is broken · Issue #1347 · googleapis/google-auth-library-java · GitHub

@kohsuke

Description

ExternalAccountCredentials declares protected transient HttpTransportFactory transportFactory, which becomes null once the object is serialized and restored. The design intent appears to be described in #67, but the implementation in ExternalAccountCredentials lacks the crucial part, quoted below:

When serializing an option object we only transmit the class name for the transport factory and try to instantiate the factory from its classname upon deserialization.

The same problem was seen and fixed in #132. I believe we need to bring the same fix to ExternalAccountCredentials.
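For illustration, the approach quoted from #67 could look roughly like the sketch below. This is not the library's actual code: TransportFactory, DefaultTransportFactory, Credentials, and roundTrip are hypothetical stand-ins so that the example runs without any Google dependencies. It assumes the factory class has a public no-arg constructor, which is what makes re-creating it from a class name possible.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransportFactoryRestoreSketch {

  /** Hypothetical stand-in for HttpTransportFactory; not the real interface. */
  public interface TransportFactory {
    String create();
  }

  /** Hypothetical factory with a public no-arg constructor, so it can be created reflectively. */
  public static class DefaultTransportFactory implements TransportFactory {
    @Override
    public String create() {
      return "transport";
    }
  }

  /** Hypothetical credentials class that serializes only the factory's class name. */
  public static class Credentials implements Serializable {
    private static final long serialVersionUID = 1L;

    // Written to the stream by default serialization.
    private final String transportFactoryClassName;
    // Skipped by default serialization; rebuilt in readObject.
    private transient TransportFactory transportFactory;

    public Credentials(TransportFactory factory) {
      this.transportFactory = factory;
      this.transportFactoryClassName = factory.getClass().getName();
    }

    public String connect() {
      return transportFactory.create();
    }

    // Called by ObjectInputStream during deserialization.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
      in.defaultReadObject();
      try {
        transportFactory =
            (TransportFactory)
                Class.forName(transportFactoryClassName).getDeclaredConstructor().newInstance();
      } catch (ReflectiveOperationException e) {
        throw new IOException("Cannot instantiate " + transportFactoryClassName, e);
      }
    }
  }

  /** Serializes a value to a byte array and reads it back. */
  @SuppressWarnings("unchecked")
  public static <T extends Serializable> T roundTrip(T value)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (T) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    Credentials restored = roundTrip(new Credentials(new DefaultTransportFactory()));
    System.out.println(restored.connect()); // prints "transport"
  }
}
```

The trade-off of this design is that a factory which is an anonymous class, a lambda, or otherwise lacks a no-arg constructor cannot be rebuilt this way, so a real fix would need to decide how to handle that case.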

More details

The NPE happens at the following call site:

HttpRequestFactory requestFactory = transportFactory.create().createRequestFactory();
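The failure mode at this call site can be reproduced in isolation. The sketch below uses hypothetical stand-in types (TransportFactory, Holder) rather than the real library classes, and only demonstrates that a transient field is null after a Java serialization round trip, so any later call through it throws.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientFieldRepro {

  /** Hypothetical stand-in for HttpTransportFactory. */
  public interface TransportFactory {
    String create();
  }

  /** Hypothetical stand-in for a credentials class with a transient factory field. */
  public static class Holder implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: not written to the stream, so it is null after deserialization.
    protected transient TransportFactory transportFactory;

    public Holder(TransportFactory factory) {
      this.transportFactory = factory;
    }

    public boolean hasFactory() {
      return transportFactory != null;
    }

    public String request() {
      // Mirrors the shape of the failing call: dereferences the transient field.
      return transportFactory.create();
    }
  }

  /** Serializes a Holder to a byte array and reads it back. */
  public static Holder roundTrip(Holder holder) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(holder);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (Holder) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    Holder original = new Holder(() -> "transport");
    Holder restored = roundTrip(original);
    System.out.println(original.hasFactory()); // true
    System.out.println(restored.hasFactory()); // false
    try {
      restored.request();
    } catch (NullPointerException e) {
      System.out.println("NPE after deserialization");
    }
  }
}
```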

Full stack trace below:

com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: UNAUTHENTICATED: Failed computing credential metadata
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:116)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:41)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:86)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:66)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.ExceptionResponseObserver.onErrorImpl(ExceptionResponseObserver.java:82)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.StateCheckingResponseObserver.onError(StateCheckingResponseObserver.java:84)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcDirectStreamController$ResponseObserverAdapter.onClose(GrpcDirectStreamController.java:148)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
	at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.ChannelPool$ReleasingClientCall$1.onClose(ChannelPool.java:546)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.DelayedClientCall$DelayedListener$3.run(DelayedClientCall.java:489)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.DelayedClientCall$DelayedListener.delayOrExecute(DelayedClientCall.java:453)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.DelayedClientCall$DelayedListener.onClose(DelayedClientCall.java:486)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:567)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:71)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:735)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:716)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
	Suppressed: java.lang.RuntimeException: Asynchronous task failed
		at com.google.cloud.bigquery.connector.common.StreamCombiningIterator.hasNext(StreamCombiningIterator.java:152)
		at com.google.cloud.bigquery.connector.common.ReadRowsResponseInputStreamEnumeration.loadNextResponse(ReadRowsResponseInputStreamEnumeration.java:57)
		at com.google.cloud.bigquery.connector.common.ReadRowsResponseInputStreamEnumeration.<init>(ReadRowsResponseInputStreamEnumeration.java:37)
		at com.google.cloud.spark.bigquery.v2.context.ArrowColumnBatchPartitionReaderContext.makeSingleInputStream(ArrowColumnBatchPartitionReaderContext.java:234)
		at com.google.cloud.spark.bigquery.v2.context.ArrowColumnBatchPartitionReaderContext.<init>(ArrowColumnBatchPartitionReaderContext.java:224)
		at com.google.cloud.spark.bigquery.v2.context.ArrowInputPartitionContext.createPartitionReaderContext(ArrowInputPartitionContext.java:89)
		at com.google.cloud.spark.bigquery.v2.Spark32BigQueryPartitionReaderFactory.createColumnarReader(Spark32BigQueryPartitionReaderFactory.java:21)
		at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:79)
		at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
		at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
		at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
		at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
		at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
		at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:35)
		at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hasNext(Unknown Source)
		at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:968)
		at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
		at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
		at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
		at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
		at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
		at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
		at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:205)
		at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
		at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
		at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
		at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
		at org.apache.spark.scheduler.Task.run(Task.scala:138)
		at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
		at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1516)
		at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
		... 3 more
Caused by: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: UNAUTHENTICATED: Failed computing credential metadata
	at com.google.cloud.spark.bigquery.repackaged.io.grpc.Status.asRuntimeException(Status.java:537)
	... 17 more
Caused by: java.lang.NullPointerException: Cannot invoke "com.google.cloud.spark.bigquery.repackaged.com.google.auth.http.HttpTransportFactory.create()" because "this.transportFactory" is null
	at com.google.cloud.spark.bigquery.repackaged.com.google.auth.oauth2.AwsCredentials.retrieveResource(AwsCredentials.java:213)
	at com.google.cloud.spark.bigquery.repackaged.com.google.auth.oauth2.AwsCredentials.retrieveResource(AwsCredentials.java:202)
	at com.google.cloud.spark.bigquery.repackaged.com.google.auth.oauth2.AwsCredentials.getAwsRegion(AwsCredentials.java:338)
	at com.google.cloud.spark.bigquery.repackaged.com.google.auth.oauth2.AwsCredentials.retrieveSubjectToken(AwsCredentials.java:173)
	at com.google.cloud.spark.bigquery.repackaged.com.google.auth.oauth2.AwsCredentials.refreshAccessToken(AwsCredentials.java:152)
	at com.google.cloud.spark.bigquery.repackaged.com.google.auth.oauth2.OAuth2Credentials$1.call(OAuth2Credentials.java:269)
	at com.google.cloud.spark.bigquery.repackaged.com.google.auth.oauth2.OAuth2Credentials$1.call(OAuth2Credentials.java:266)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at com.google.cloud.spark.bigquery.repackaged.com.google.auth.oauth2.OAuth2Credentials$RefreshTask.run(OAuth2Credentials.java:633)
	... 3 more
