Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Apache Spark - concatenate the array field in one record with all other records - PySpark

Schema

 root
     |-- userId: string (nullable = true)
     |-- languageknowList: array (nullable = true)
     |    |-- element: struct (containsNull = false)
     |    |    |-- code: string (nullable = false)
     |    |    |-- description: string (nullable = false)
     |    |    |-- name: string (nullable = false)

This DataFrame contains a user with userId 0. I need to concatenate that user's languageknowList onto the languageknowList of every other user.

How can I do that?

Example input data for the DataFrame:

[{
  "userId":1,
  "languageknowList": [[10,"Hindi","Hindi"],[11,"Spanish","Spanish"]]
},
{
  "userId":2,
  "languageknowList": [[11,"Spanish","Spanish"]]
},
{
  "userId":0,
  "languageknowList": [[1,"English","English"],[2,"German","German"]]
}]

The output DataFrame should look like this (the userId 0 row itself is dropped):

[{
  "userId":1,
  "languageknowList": [[10,"Hindi","Hindi"],[11,"Spanish","Spanish"],[1,"English","English"],[2,"German","German"]]
},
{
  "userId":2,
  "languageknowList": [[11,"Spanish","Spanish"],[1,"English","English"],[2,"German","German"]]
}]


1 Reply


You can cross join every other row with the single row where userId = 0, then concatenate the two language arrays:

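from pyspark.sql import functions as F

# Pair every other user with the single userId = 0 row, then append its array.
result = df.filter('userId != 0').crossJoin(
    df.filter('userId = 0').select('languageknowList').toDF('language')
).select(
    'userId',
    F.concat('languageknowList', 'language').alias('languageknowList')
)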

result.show(20, truncate=False)
+------+----------------------------------------------------------------------------------------+
|userId|languageknowList                                                                        |
+------+----------------------------------------------------------------------------------------+
|1     |[[10, Hindi, Hindi], [11, Spanish, Spanish], [1, English, English], [2, German, German]]|
|2     |[[11, Spanish, Spanish], [1, English, English], [2, German, German]]                    |
+------+----------------------------------------------------------------------------------------+

result.coalesce(1).write.json('result')
$ cat result/part-00000-b34b3748-71b5-46d4-b011-6b208978cc5a-c000.json
{"userId":1,"languageknowList":[["10","Hindi","Hindi"],["11","Spanish","Spanish"],["1","English","English"],["2","German","German"]]}
{"userId":2,"languageknowList":[["11","Spanish","Spanish"],["1","English","English"],["2","German","German"]]}
