Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share


amazon web services - Why do I need to set the `transformation_ctx` parameter when calling transformation and sink operations for AWS Glue bookmark to work?

The AWS Glue job bookmark documentation (https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html) seems to suggest that one has to pass a transformation_ctx parameter to the source, transform and sink operations for bookmarks to work. This is reflected in the sample code on that page, where the calls to create_dynamic_frame.from_catalog(), ApplyMapping.apply() and write_dynamic_frame.from_options() are all passed a transformation_ctx value.

I can see the point of passing a transformation_ctx to the create_dynamic_frame.from_catalog() method, since AWS Glue needs to store information about the files that have already been read in the bookmark, under the given transformation_ctx key.
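To make the source-side idea concrete, here is a toy model in plain Python. It is a hypothetical sketch, not Glue's real implementation: `ToyBookmarkStore`, `read_source` and `commit` are invented names, and real Glue bookmarks track more than a set of file names. It only shows state being keyed by transformation_ctx, and that a source read without one has nowhere to store state (the linked documentation likewise says bookmarks are not enabled for a source called without transformation_ctx).

```python
from typing import Dict, Iterable, List, Optional, Set


class ToyBookmarkStore:
    """Hypothetical, simplified model of per-source bookmark state.

    State is keyed by transformation_ctx and records which input files a
    source has already consumed. This is NOT how AWS Glue stores bookmarks.
    """

    def __init__(self) -> None:
        self.committed: Dict[str, Set[str]] = {}  # state saved by earlier runs
        self.pending: Dict[str, Set[str]] = {}    # state gathered in this run

    def read_source(self, files: Iterable[str],
                    transformation_ctx: Optional[str] = None) -> List[str]:
        files = list(files)
        if transformation_ctx is None:
            # No key to store state under: every run sees every file.
            return files
        seen = self.committed.get(transformation_ctx, set())
        new_files = [f for f in files if f not in seen]
        self.pending.setdefault(transformation_ctx, set()).update(new_files)
        return new_files

    def commit(self) -> None:
        # Similar in spirit to job.commit(): persist what this run read.
        for ctx, done in self.pending.items():
            self.committed.setdefault(ctx, set()).update(done)
        self.pending.clear()


store = ToyBookmarkStore()
run1 = store.read_source(["a.json", "b.json"], transformation_ctx="datasource0")
store.commit()
run2 = store.read_source(["a.json", "b.json", "c.json"], transformation_ctx="datasource0")
store.commit()
no_ctx = store.read_source(["a.json", "b.json", "c.json"])

print(run1)    # ['a.json', 'b.json']
print(run2)    # ['c.json']
print(no_ctx)  # ['a.json', 'b.json', 'c.json']
```

In an actual Glue script, the counterpart of `commit()` is `job.commit()` after `job.init(...)`, and bookmarks must also be enabled for the job (the `--job-bookmark-option job-bookmark-enable` job parameter).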

However, I don't understand why this is also necessary for methods like ApplyMapping.apply() and write_dynamic_frame.from_options(). To put it another way, what is the state information these operations need to store in the bookmark? If I don't pass transformation_ctx to these methods, what problems will this cause?


1 Reply


I had the same doubts about bookmarking a few months ago (October 2019), and since Amazon's documentation is not very clear, I opened a support case to better understand how it is implemented.

My Glue job contained:

  • A read function from S3 (glue_context.create_dynamic_frame.from_options)
  • A ResolveChoice.apply
  • A write function to Redshift (glue_context.write_dynamic_frame.from_jdbc_conf)

All of these operations had a transformation_ctx value, and I tested several variations (the same transformation_ctx for all of them, different values, fixed values, dynamic values, etc.).

After many messages back and forth, AWS Support confirmed that bookmarking works only on the read function (they also said it works only with S3 as a source, but I didn't test that). So I asked whether the transformation_ctx is useless in ResolveChoice (and in the write function too), and they said YES! They confirmed that it doesn't make any difference.

Furthermore, for the write function it changes nothing: there is no bookmark logic, and no mechanism that skips the write if it has already been run before.
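The consequence of "only the read is bookmarked" can be illustrated with a toy end-to-end run. This is a hypothetical sketch in plain Python, not Glue code: `run_toy_job` and its parameters are invented for illustration. Only the read step consults the bookmark; the transform (standing in for ResolveChoice/ApplyMapping) and the append-only sink keep no state, so any protection against writing the same data twice has to come from the source side.

```python
from typing import List, Set


def run_toy_job(input_files: List[str], bookmark: Set[str],
                target: List[str], use_source_bookmark: bool) -> None:
    """Hypothetical model of read -> transform -> write for one job run.

    Only the read step consults `bookmark`; the transform and the sink are
    stateless, so they cannot prevent rows from being written twice.
    """
    # Read: with a bookmarked source, skip files recorded by earlier runs.
    if use_source_bookmark:
        to_read = [f for f in input_files if f not in bookmark]
    else:
        to_read = list(input_files)
    # Transform: stateless, standing in for ResolveChoice / ApplyMapping.
    rows = [f.upper() for f in to_read]
    # Write: append-only, no check for rows already present in the target.
    target.extend(rows)
    # Commit: record what was read, in the spirit of job.commit().
    if use_source_bookmark:
        bookmark.update(to_read)


files = ["a.json", "b.json"]

with_bookmark: List[str] = []
state: Set[str] = set()
run_toy_job(files, state, with_bookmark, use_source_bookmark=True)
run_toy_job(files, state, with_bookmark, use_source_bookmark=True)

without_bookmark: List[str] = []
run_toy_job(files, set(), without_bookmark, use_source_bookmark=False)
run_toy_job(files, set(), without_bookmark, use_source_bookmark=False)

print(with_bookmark)     # ['A.JSON', 'B.JSON']
print(without_bookmark)  # ['A.JSON', 'B.JSON', 'A.JSON', 'B.JSON']
```

In the second pair of runs the sink happily appends the same rows again, which matches the support answer: a transformation_ctx on the write step would not change that.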


...