ryanjin/run_hbase_on_spark_mr.md

Last active August 29, 2015 14:15

The pain of running HBase inside Spark's MapReduce tasks

## hbase-protocol.jar

Symlink the HBase protocol jar into the Hadoop library directory of the CDH 5.1.0 parcel:

```bash
cd /opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hadoop
ln -s /opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hbase/lib/hbase-protocol-0.98.1-cdh5.1.0.jar hbase-protocol-0.98.1-cdh5.1.0.jar
```
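To confirm the linked jar is actually visible, a small class-loading check can be run with the Hadoop classpath, for example `java -cp "$(hadoop classpath):." ClasspathCheck`. This is a minimal sketch, not part of the original notes. The default class name `org.apache.hadoop.hbase.protobuf.generated.ClientProtos` is an assumption about what hbase-protocol-0.98 contains, and it is also an assumption that `hadoop classpath` on this cluster covers the parcel's `lib/hadoop` directory; pass other class names as arguments if needed.

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch: reports whether the given classes can be loaded from the
// current JVM classpath. The default HBase class name below is an assumption
// about the contents of hbase-protocol-0.98.x; pass your own names as arguments.
public class ClasspathCheck {

    // Returns true if the named class is visible to this JVM's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate and load the class without running static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> names = args.length > 0
                ? Arrays.asList(args)
                : Arrays.asList("org.apache.hadoop.hbase.protobuf.generated.ClientProtos");
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```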

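## Configuration/HTable initialization

Because HBase is called from inside the MR tasks, I first created the `Configuration`/`HTable` inside the `Function`, but this failed with `java.io.NotSerializableException`. In the end I switched to a per-partition approach that makes the calls in batches; with this approach, take care to choose a sensible number of partitions.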
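The `java.io.NotSerializableException` mentioned above can be reproduced without Spark or HBase. Spark ships the `Function` objects to executors using Java serialization, so a `Function` that holds a `Configuration` or `HTable` (neither implements `java.io.Serializable`) cannot be sent. The minimal sketch below uses a hypothetical `FakeTableClient` in place of `HTable`; `CapturingTask` and `LazyTask` are likewise illustrative names. `LazyTask` shows one commonly used alternative to the per-partition approach: a `transient` field that is created lazily on the executor side.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Stdlib-only sketch of why holding an HTable in a Spark Function fails.
// FakeTableClient, CapturingTask and LazyTask are hypothetical stand-ins.
public class SerializationDemo {

    // Stand-in for HTable/Configuration: deliberately NOT Serializable.
    static class FakeTableClient {
        void put(String row) { /* a real client would write to HBase here */ }
    }

    // Like a Function whose client is created when the Function is constructed (on the driver).
    static class CapturingTask implements Serializable {
        private final FakeTableClient client = new FakeTableClient();
        void call(String row) { client.put(row); }
    }

    // Workaround variant: the client is transient (skipped by serialization)
    // and created on first use, i.e. after the task has been deserialized.
    static class LazyTask implements Serializable {
        private transient FakeTableClient client;
        void call(String row) {
            if (client == null) client = new FakeTableClient();
            client.put(row);
        }
    }

    // Returns true if the object survives Java serialization.
    static boolean serializes(Object task) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(task);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("CapturingTask serializes: " + serializes(new CapturingTask())); // false
        System.out.println("LazyTask serializes: " + serializes(new LazyTask()));           // true
    }
}
```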

The batched, per-partition calls go through `mapPartitions()`.
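The per-partition pattern can be sketched without a cluster. In this stdlib-only simulation, partitions are plain lists, and `FakeTableClient`, `BATCH_SIZE`, `processPartition` and `writeAll` are hypothetical names standing in for `HTable` and Spark's machinery. In Spark's Java API the body of `processPartition` would go into the `FlatMapFunction` passed to `JavaRDD.mapPartitions` (its `call` returns an `Iterable` in Spark 1.x and an `Iterator` in 2.x).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Stdlib-only simulation of the mapPartitions() pattern: open one client per
// partition and write its rows in batches. All names here are hypothetical.
public class PerPartitionDemo {

    static final int BATCH_SIZE = 2;
    static final AtomicInteger CLIENTS_CREATED = new AtomicInteger();
    static final AtomicInteger FLUSHES = new AtomicInteger();

    // Stand-in for HTable: buffers puts and "sends" them when flushed.
    static class FakeTableClient implements AutoCloseable {
        private int buffered = 0;
        FakeTableClient() { CLIENTS_CREATED.incrementAndGet(); }
        void put(String row) { buffered++; }
        void flush() {
            if (buffered > 0) {
                FLUSHES.incrementAndGet();
                buffered = 0;
            }
        }
        // Flush whatever is left once the partition is done.
        @Override public void close() { flush(); }
    }

    // Plays the role of the function handed to mapPartitions(): it receives an iterator
    // over one partition and returns an iterator of results (here, a single row count).
    static Iterator<Integer> processPartition(Iterator<String> rows) {
        int written = 0;
        try (FakeTableClient table = new FakeTableClient()) { // one client per partition
            while (rows.hasNext()) {
                table.put(rows.next());
                written++;
                if (written % BATCH_SIZE == 0) table.flush(); // batched writes
            }
        }
        return Collections.singletonList(written).iterator();
    }

    // Runs processPartition over every partition, as Spark's executors would.
    static int writeAll(List<List<String>> partitions) {
        int total = 0;
        for (List<String> partition : partitions) {
            Iterator<Integer> counts = processPartition(partition.iterator());
            while (counts.hasNext()) total += counts.next();
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("r1", "r2", "r3"),
                Arrays.asList("r4", "r5"),
                Arrays.asList("r6"));
        System.out.println("rows written:    " + writeAll(partitions)); // 6
        System.out.println("clients created: " + CLIENTS_CREATED.get()); // 3
        System.out.println("flushes:         " + FLUSHES.get());         // 4
    }
}
```

The client count equals the partition count, which is why the number of partitions matters here: each partition opens its own table client, so many small partitions mean many connections, while very few partitions mean large batches and little parallelism. On a real RDD the count can be adjusted with `repartition(n)` or `coalesce(n)` before calling `mapPartitions()`.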