Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


hadoop - Access a secured Hive when running Spark in an unsecured YARN cluster

We have two Cloudera 5.7.1 clusters, one secured with Kerberos and one unsecured.

Is it possible to run Spark on the unsecured YARN cluster while accessing Hive tables stored in the secured cluster? (The Spark version is 1.6.)

If so, could you please explain how to configure it?

Update:

I want to explain the end goal behind my question a little. Our main, secured cluster is heavily utilized, and our jobs can't get enough resources to complete in a reasonable time. To overcome this, we wanted to use resources from another, unsecured cluster we have, without needing to copy the data between the clusters.

We know it's not the best solution, as the data locality level might not be optimal; however, it's the best solution we can come up with for now.

Please let me know if you have any other solution, as it seems we can't achieve the above.


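He who fights too long against dragons becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back into you…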

1 Reply


If you run Spark in local mode, you can make it use an arbitrary set of Hadoop configuration files, i.e. core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml and hive-site.xml copied from the Kerberized cluster.
That way you can access HDFS on that cluster, provided, of course, that you have a Kerberos ticket granting you access to it.

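  # Use a copy of the remote, Kerberized cluster's client configuration
  export HADOOP_CONF_DIR=/path/to/conf/of/remote/kerberized/cluster
  # Get a ticket in the remote cluster's realm (placeholder principal)
  kinit your.user@REMOTE.REALM
  # Local mode: no YARN involved, driver and executors run in this JVM
  spark-shell --master 'local[*]'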

But in yarn-client or yarn-cluster mode, you cannot launch containers in the local cluster and access HDFS in the other, because the client reads a single core-site.xml and therefore uses a single authentication model:

  • either you use the local core-site.xml, which sets hadoop.security.authentication to simple, and you can connect to the local YARN/HDFS
  • or you point to a copy of the remote core-site.xml, which sets hadoop.security.authentication to kerberos, and you can connect to the remote YARN/HDFS
  • but you cannot use the local, unsecured YARN and access the remote, secured HDFS
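Since the choice comes down to which core-site.xml the client reads, it can help to check what a configuration directory declares before pointing HADOOP_CONF_DIR at it. A minimal Bash sketch, assuming the usual one-<property>-per-setting layout; auth_mode is a hypothetical helper name, and the text matching is deliberately crude rather than a real XML parser:

```shell
# auth_mode CONF_DIR
# Prints the hadoop.security.authentication value declared in CONF_DIR/core-site.xml,
# or "simple" (Hadoop's default) when the property is absent.
auth_mode() {
  local conf="$1/core-site.xml" mode
  if [ ! -r "$conf" ]; then
    echo "auth_mode: cannot read $conf" >&2
    return 1
  fi
  # Strip all whitespace so <name> and <value> become adjacent on one line,
  # then capture the <value> directly following the matching <name>.
  mode=$(tr -d ' \t\r\n' < "$conf" |
    sed -n 's|.*<name>hadoop\.security\.authentication</name><value>\([^<]*\)</value>.*|\1|p')
  echo "${mode:-simple}"
}
```

For instance, auth_mode /etc/hadoop/conf on a node of the unsecured cluster should print simple, while the same call on a copy of the Kerberized cluster's configuration should print kerberos.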

Note that with unsecured/unsecured or secured/secured combinations, you could access HDFS in another cluster by hacking your own custom hdfs-site.xml to define multiple namespaces (i.e. extra entries in dfs.nameservices). But you are still stuck with a single authentication model.
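The extra namespace is declared with the standard HDFS HA client properties. A minimal Bash sketch that prints them, assuming the remote cluster runs NameNode HA with two NameNodes (for a non-HA remote, a plain hdfs://host:port URI already works without extra configuration). remote_ns_props, xml_prop, the IDs nn1/nn2 and all host names are hypothetical placeholders; the output is meant to be merged into the <configuration> element of a client-side copy of hdfs-site.xml, not into the daemons' own configuration:

```shell
# xml_prop NAME VALUE: print one hdfs-site.xml <property> entry.
xml_prop() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# remote_ns_props EXISTING_NS REMOTE_NS NN1_ADDR NN2_ADDR
# EXISTING_NS: current value of dfs.nameservices (may be empty for a non-HA local cluster).
# REMOTE_NS:   logical name given to the remote cluster (placeholder).
# NNx_ADDR:    host:port of each remote NameNode's RPC endpoint.
remote_ns_props() {
  local existing="$1" ns="$2" nn1="$3" nn2="$4"
  # Keep the local nameservice(s) and append the remote one.
  xml_prop dfs.nameservices "${existing:+$existing,}$ns"
  xml_prop "dfs.ha.namenodes.$ns" "nn1,nn2"
  xml_prop "dfs.namenode.rpc-address.$ns.nn1" "$nn1"
  xml_prop "dfs.namenode.rpc-address.$ns.nn2" "$nn2"
  # Lets the HDFS client fail over between the two remote NameNodes.
  xml_prop "dfs.client.failover.proxy.provider.$ns" \
    org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
}
```

After merging the output of, say, remote_ns_props localns remotens nn1.remote.example:8020 nn2.remote.example:8020, paths such as hdfs://remotens/some/path should resolve against the remote cluster.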
[edit] See the comment by Mighty Steve Loughran about an extra Spark property for accessing remote, secure HDFS from a local, secure cluster.
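In Spark 1.6, the YARN property for this is spark.yarn.access.namenodes: a comma-separated list of secure NameNodes for which Spark obtains HDFS delegation tokens when the application is submitted. It is presumably what the comment refers to, and it only helps in the secure-to-secure case, with both clusters in the same or a trusted Kerberos realm. A minimal Bash sketch; submit_with_remote_hdfs is a hypothetical wrapper, the NameNode URI is a placeholder, and setting SPARK_SUBMIT=echo turns it into a dry run:

```shell
# submit_with_remote_hdfs REMOTE_NN_URI [spark-submit args...]
# Runs the job on the local, secure YARN cluster and asks Spark (1.6) to also
# obtain HDFS delegation tokens for the remote, secure NameNode(s) in REMOTE_NN_URI.
submit_with_remote_hdfs() {
  local remote_nn="$1"
  shift
  "${SPARK_SUBMIT:-spark-submit}" \
    --master yarn --deploy-mode cluster \
    --conf "spark.yarn.access.namenodes=$remote_nn" \
    "$@"
}
```

Run it after kinit, e.g. submit_with_remote_hdfs hdfs://nn.remote.example:8020 --class com.example.MyJob my-job.jar, where the class and jar are placeholders.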

Note also that DistCp has the same limitation, except that there is a "cheat" property that lets you copy from a secure cluster to an unsecured one.
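The property in question is presumably ipc.client.fallback-to-simple-auth-allowed (false by default): when true, a client configured for Kerberos accepts an insecure server's instruction to switch to simple authentication. The copy is therefore run from the secure cluster, with a ticket for it, towards the unsecured one, passing the property as a Hadoop generic -D option. A minimal Bash sketch; copy_secure_to_unsecured is a hypothetical wrapper, the URIs are placeholders, and HADOOP_BIN=echo makes it a dry run:

```shell
# copy_secure_to_unsecured SRC_URI DST_URI
# Run on the secure cluster after kinit; SRC lives on the secure HDFS, DST on the unsecured one.
copy_secure_to_unsecured() {
  "${HADOOP_BIN:-hadoop}" distcp \
    -D ipc.client.fallback-to-simple-auth-allowed=true \
    "$1" "$2"
}
```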

