spark
Table of Contents
HDFS
- Namenode(nn)
- Secondary Namenode(snn)
- Journalnode
- Datanode
YARN
- Resourcemanager
- Nodemanager
Compute Models
- MapReduce
- Spark
Deployment
Overview
192.168.33.51 hdfs:NameNode,SecondaryNameNode yarn:ResourceManager
192.168.33.52 hdfs:DataNode yarn:NodeManager
192.168.33.53 hdfs:DataNode yarn:NodeManager
192.168.33.54 hdfs:DataNode yarn:NodeManager
Vagrantfile
Vagrant.configure("2") do |config|
  [51, 52, 53, 54].each do |i|
    config.vm.define "n#{i}" do |node|
      node.vm.network "private_network", ip: "192.168.33.#{i}"
      node.vm.synced_folder "/data/vagrant/shell", "/shell"
      node.vm.network :forwarded_port, guest: 22, host: "2#{i}22", host_ip: "0.0.0.0"
      node.vm.provider "virtualbox" do |vb|
        vb.memory = "2048"
        vb.cpus = 2
      end
      node.vm.provision "shell", inline: <<-SHELL
        echo "vagrant:vagrant" | sudo chpasswd
        mkdir -p /data
        chown -R vagrant:vagrant /data
        hostnamectl set-hostname n#{i}
      SHELL
    end
  end
end
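With this Vagrantfile the four VMs can be brought up and reached over the private network or the forwarded SSH ports; a minimal sketch (the forwarded port follows the 2<node>22 pattern above, password "vagrant" from the provisioner):
# bring up all four nodes
vagrant up n51 n52 n53 n54
# log in through vagrant, or via the forwarded port (e.g. 25122 for n51)
vagrant ssh n51
ssh -p 25122 vagrant@127.0.0.1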
Installation
# Configure hosts
sudo vi /etc/hosts
192.168.33.51 n51
192.168.33.52 n52
192.168.33.53 n53
192.168.33.54 n54
# Set the system timezone
sudo timedatectl set-timezone Asia/Shanghai
# Download and extract the tarballs
tar -xvf jdk-8u201-linux-x64.tar.gz
tar -xvf scala-2.12.8.tgz
tar -xvf apache-maven-3.6.0-bin.tar.gz
tar -xvf hadoop-3.2.0.tar.gz
tar -xvf spark-2.4.0-bin-hadoop2.7.tgz
tar -xvf apache-hive-3.1.1-bin.tar.gz
tar -xvf hbase-2.1.4-bin.tar.gz
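The environment variables below use version-less paths under /data, so it is convenient to symlink the extracted directories first; a sketch, assuming the tarballs were unpacked into /data (adjust to the actual extraction locations):
sudo mkdir -p /data/java
sudo ln -s /data/jdk1.8.0_201 /data/java/jdk
sudo ln -s /data/scala-2.12.8 /data/java/scala
sudo ln -s /data/apache-maven-3.6.0 /data/java/maven
sudo ln -s /data/hadoop-3.2.0 /data/hadoop
sudo ln -s /data/spark-2.4.0-bin-hadoop2.7 /data/spark
sudo ln -s /data/apache-hive-3.1.1-bin /data/hive
sudo ln -s /data/hbase-2.1.4 /data/hbase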
# Set environment variables (the exports below go into this file)
sudo touch /etc/profile.d/myenv.sh
# java
export JAVA_HOME=/data/java/jdk
export M2_HOME=/data/java/maven
export JRE_HOME=${JAVA_HOME}/jre
export SCALA_HOME=/data/java/scala
export CLASS_PATH=${JAVA_HOME}/lib:${JRE_HOME}/lib:${SCALA_HOME}/lib
export PATH=$PATH:${JAVA_HOME}/bin:${SCALA_HOME}/bin:$M2_HOME/bin
# hadoop
export HADOOP_HOME=/data/hadoop
#export HADOOP_MAPRED_HOME=$HADOOP_HOME
#export HADOOP_COMMON_HOME=$HADOOP_HOME
#export HADOOP_HDFS_HOME=$HADOOP_HOME
#export YARN_HOME=$HADOOP_HOME
#export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# spark
export SPARK_HOME=/data/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
# hive
export HIVE_HOME=/data/hive
export PATH=$PATH:$HIVE_HOME/bin
# hbase
export HBASE_HOME=/data/hbase
export PATH=$PATH:$HBASE_HOME/bin
# source
source /etc/profile
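All nodes need the same environment; assuming the exports were saved in /etc/profile.d/myenv.sh, one way to push the file out (a sketch, run as a user that can reach the other nodes):
for h in n52 n53 n54; do
  scp /etc/profile.d/myenv.sh $h:/tmp/myenv.sh
  ssh $h "sudo mv /tmp/myenv.sh /etc/profile.d/myenv.sh"
done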
# Check versions
[hduser@n51 ~]$ java -version
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
[hduser@n51 ~]$ scala -version
Scala code runner version 2.12.8 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.
# Add a system group and user
sudo groupadd hadoop
sudo useradd -g hadoop hduser
sudo cat /etc/passwd |grep hduser
sudo cat /etc/group |grep hadoop
sudo vi /etc/sudoers
#hduser ALL=(ALL) ALL
#sudo usermod -aG hadoop vagrant
sudo passwd hduser
sudo su -l hduser
# Create an SSH key pair
mkdir ~/.ssh;
cd ~/.ssh/
# Press Enter at every prompt
ssh-keygen -t rsa
# Add the key to authorized_keys
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# Fix file permissions
chmod 600 ~/.ssh/authorized_keys
# Authorize passwordless login to each node
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@n51
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@n52
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@n53
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@n54
#cat ~/.ssh/authorized_keys
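Passwordless login can then be verified from n51 in one pass; every command should print the remote hostname without prompting for a password:
for h in n51 n52 n53 n54; do ssh $h hostname; done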
# Create the Hadoop data directory
sudo mkdir /data/hadoop_data
sudo chown -R hduser:hadoop /data/hadoop/
sudo chown -R hduser:hadoop /data/hadoop_data/
sudo chmod -R a+w /data/hadoop_data
su -l hduser
hdfs namenode -format
# Copy the tarball to each node
# cd /data
# tar -czf ~/hadoop.tar.gz *;
# scp ~/hadoop.tar.gz hduser@n52:~/
# scp ~/hadoop.tar.gz hduser@n53:~/
# scp ~/hadoop.tar.gz hduser@n54:~/
# Copy the config files to each node
scp -r /data/hadoop/etc hduser@n52:/data/hadoop
scp -r /data/hadoop/etc hduser@n53:/data/hadoop
scp -r /data/hadoop/etc hduser@n54:/data/hadoop
vi etc/hadoop/core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop_data/</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://n51:9000</value>
</property>
<property>
<name>hadoop.proxyuser.hduser.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hduser.hosts</name>
<value>*</value>
</property>
vi etc/hadoop/hdfs-site.xml
<property>
<name>dfs.namenode.http-address</name>
<value>n51:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>n51:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
vi etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>n51:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>n51:19888</value>
</property>
vi etc/hadoop/yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>n51</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
vi etc/hadoop/workers
n52
n53
n54
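With the configuration in place on every node and the NameNode formatted (see above), the daemons can be started from n51 and checked; a minimal sketch:
start-dfs.sh            # NameNode + SecondaryNameNode on n51, DataNodes on n52-n54
start-yarn.sh           # ResourceManager on n51, NodeManagers on n52-n54
jps                     # list the Java daemons on the local node
hdfs dfsadmin -report   # the three DataNodes should be reported as live
yarn node -list         # the three NodeManagers should show RUNNING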
Verification
- HDFS web UI: 50070
- YARN web UI: 8088
- JobHistory Server web UI: 19888
- ZooKeeper service port: 2181
- MySQL service port: 3306
- HiveServer2: 10000
- Kafka service port: 9092
- Azkaban web UI: 8443
- HBase web UI: 16010, 60010
- Spark web UI: 8080
- Spark master URL: 7077
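A quick way to see which of these ports are actually listening on n51 (only services that have been started will show up):
sudo ss -tlnp | grep -E ':(50070|8088|19888|2181|9000|10000|16010|8080|7077) '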
## HDFS web UI: 50070
http://n51:50070
## YARN web UI: 8088
http://n51:8088
## Spark web UI: 8080
http://n51:8080
## Spark master URL: 7077
spark://n51:7077
Hive & Beeline
vi $HIVE_HOME/conf/hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://192.168.33.26:30007/metastore</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.postgresql.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>mypassword</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://n51:9083</value>
</property>
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<!--
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
<property>
<name>hive.server2.webui.host</name>
<value>n51</value>
</property>
<property>
<name>hive.server2.webui.port</name>
<value>10002</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>n51</value>
</property>
-->
</configuration>
#wget https://jdbc.postgresql.org/download/postgresql-42.2.5.jar
# Initialize the metastore schema
../bin/schematool -dbType postgres -initSchema
# Start the metastore service
hive --service metastore --hiveconf hive.root.logger=DEBUG,console > ~/metastore.log 2>&1 &
# Start HiveServer2
hive --service hiveserver2 --hiveconf hive.root.logger=DEBUG,console > ~/hiveserver2.log 2>&1 &
# If ports 10000 and 10002 never come up, check
# hive.metastore.event.db.notification.api.auth (set to false above)
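Once port 10000 is up, Beeline can connect to HiveServer2 (a sketch, assuming the default port and the hduser account; the proxyuser entries in core-site.xml above are what allow this):
beeline -u jdbc:hive2://n51:10000 -n hduser
# 0: jdbc:hive2://n51:10000> show databases;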
# Start the Hive CLI
hive --service cli -hiveconf hive.root.logger=DEBUG,console
# Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
# hive on spark
hive --service cli -hiveconf hive.execution.engine=spark -hiveconf hive.root.logger=DEBUG,console
# create table test_1(id int, name string, pid int, price int) ROW FORMAT delimited fields terminated by '\t' stored as textfile ;
# Start spark-sql
spark-sql --master yarn
vi $SPARK_HOME/conf/hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://n51:9083</value>
</property>
</configuration>
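With this hive-site.xml in $SPARK_HOME/conf, spark-sql and spark-shell see the same metastore as Hive; for example (assuming the metastore is running, and using the test_1 table created earlier):
spark-sql --master yarn -e "show databases"
spark-sql --master yarn -e "select * from test_1 limit 10"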
## Error 1
hive> show databases;
FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
## The metastore was not running; start the metastore first
Spark on YARN
cp -r spark-2.4.0-bin-hadoop2.7 /data/
cd /data/
ln -s spark-2.4.0-bin-hadoop2.7 spark
sudo chown -R hduser:hadoop /data/spark/
# When submitting with local-cluster[x,y,z]: x is the number of executors to launch, y the number of cores per executor, and z the amount of memory (MB) per executor.
spark-submit or spark-submit --master local
spark-submit --master local-cluster[2,3,1024]
spark-submit --master spark://node1:7077 or spark-submit --master spark://node1:7077 --deploy-mode client
spark-submit --master spark://node1:6066 --deploy-mode cluster
spark-submit --master yarn or spark-submit --master yarn --deploy-mode client
spark-submit --master yarn --deploy-mode cluster
# Test
/data/spark/bin/spark-shell --class org.apache.spark.examples.JavaSparkPi --master yarn-client /data/spark/examples/jars/spark-examples*.jar 10
# Error 1
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger
# Fix
# vi /data/spark/conf/spark-env.sh
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
# Error 2
Exception in thread "main" org.apache.spark.SparkException: When running with master 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
# Fix
# vi /data/spark/conf/spark-env.sh
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop/
# Error 3
Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
> /data/spark/bin/spark-shell --class org.apache.spark.examples.JavaSparkPi --master yarn-client --jars /data/spark/examples/jars/spark-examples*.jar 10
< /data/spark/bin/spark-shell --class org.apache.spark.examples.JavaSparkPi --master yarn --deploy-mode client --jars /data/spark/examples/jars/spark-examples*.jar 10
# Error 4
WARN NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
# export HADOOP_OPTS="${HADOOP_OPTS} -Djava.library.path=${HADOOP_HOME}/lib/native"
# Fix
# vi /data/spark/conf/spark-env.sh
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native/:$LD_LIBRARY_PATH
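Putting the three fixes above together, /data/spark/conf/spark-env.sh on the submitting node ends up with something like:
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop/
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native/:$LD_LIBRARY_PATH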
# Error 5
/data/spark/bin/spark-shell --class org.apache.spark.examples.JavaWordCount \
--master yarn \
--deploy-mode client \
--jars /data/spark/examples/jars/spark-examples_2.12-2.4.0.jar \
/data/spark/examples/src/main/resources/people.txt
Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://n51:9000/data/spark/examples/src/main/resources/people.txt;
at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$checkAndGlobPathIfNecessary$1(DataSource.scala:558)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:240)
at scala.collection.immutable.List.foreach(List.scala:388)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:240)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:237)
at scala.collection.immutable.List.flatMap(List.scala:351)
at org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:545)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:715)
at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:757)
at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:724)
at org.apache.spark.examples.JavaWordCount.main(JavaWordCount.java:45)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
# Error 6
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file://data/spark/examples/src/main/resources/people.txt, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:730)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:87)
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:661)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:987)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:656)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1683)
at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$checkAndGlobPathIfNecessary$1(DataSource.scala:557)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:240)
at scala.collection.immutable.List.foreach(List.scala:388)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:240)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:237)
at scala.collection.immutable.List.flatMap(List.scala:351)
at org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:545)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:715)
at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:757)
at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:724)
at org.apache.spark.examples.JavaWordCount.main(JavaWordCount.java:45)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
# Error 7
2019-03-28 10:04:10 INFO DAGScheduler:54 - Job 0 failed: collect at JavaWordCount.java:53, took 11.890651 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, n52, executor 2): java.io.FileNotFoundException: File file:/data/spark/examples/src/main/resources/people.txt does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:132)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:619)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:454)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:454)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:485)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:454)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:405)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:1887)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1875)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1874)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:58)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:51)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1874)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:926)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2108)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2057)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2046)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:945)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:361)
at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:360)
at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
at org.apache.spark.examples.JavaWordCount.main(JavaWordCount.java:53)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.FileNotFoundException: File file:/data/spark/examples/src/main/resources/people.txt does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:132)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:619)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:454)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:454)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:485)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:454)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:405)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
# Warning
2019-03-29 12:59:41 WARN Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
# Working run 1
hadoop fs -put /data/spark/examples/src/main/resources/people.txt /user/hduser/
hadoop fs -ls -R /
drwxr-xr-x - hduser supergroup 0 2019-03-28 09:12 /user
drwxr-xr-x - hduser supergroup 0 2019-03-28 11:28 /user/hduser
drwxr-xr-x - hduser supergroup 0 2019-03-28 10:35 /user/hduser/.sparkStaging
-rw-r--r-- 3 hduser supergroup 32 2019-03-28 11:28 /user/hduser/people.txt
/data/spark/bin/spark-shell --class org.apache.spark.examples.JavaWordCount \
--master yarn --deploy-mode client \
--jars /data/spark/examples/jars/spark-examples*.jar \
/user/hduser/people.txt
# Working run 2
/data/spark/bin/spark-shell --class org.apache.spark.examples.JavaSparkPi --master yarn --deploy-mode client --jars /data/spark/examples/jars/spark-examples*.jar 10
2019-03-28 09:12:21 INFO SparkContext:54 - Running Spark version 2.4.0
2019-03-28 09:12:22 INFO SparkContext:54 - Submitted application: Spark Pi
2019-03-28 09:12:22 INFO SecurityManager:54 - Changing view acls to: hduser
2019-03-28 09:12:22 INFO SecurityManager:54 - Changing modify acls to: hduser
2019-03-28 09:12:22 INFO SecurityManager:54 - Changing view acls groups to:
2019-03-28 09:12:22 INFO SecurityManager:54 - Changing modify acls groups to:
2019-03-28 09:12:22 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hduser); groups with view permissions: Set(); users with modify permissions: Set(hduser); groups with modify permissions: Set()
2019-03-28 09:12:22 INFO Utils:54 - Successfully started service 'sparkDriver' on port 34268.
2019-03-28 09:12:22 INFO SparkEnv:54 - Registering MapOutputTracker
2019-03-28 09:12:22 INFO SparkEnv:54 - Registering BlockManagerMaster
2019-03-28 09:12:22 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2019-03-28 09:12:22 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2019-03-28 09:12:22 INFO DiskBlockManager:54 - Created local directory at /tmp/blockmgr-51798937-8c00-4270-a97e-f4b2e1c0ea1a
2019-03-28 09:12:22 INFO MemoryStore:54 - MemoryStore started with capacity 413.9 MB
2019-03-28 09:12:22 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2019-03-28 09:12:22 INFO log:192 - Logging initialized @3323ms
2019-03-28 09:12:22 INFO Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: 2018-06-06T01:11:56+08:00, git hash: 84205aa28f11a4f31f2a3b86d1bba2cc8ab69827
2019-03-28 09:12:22 INFO Server:419 - Started @3450ms
2019-03-28 09:12:22 INFO AbstractConnector:278 - Started ServerConnector@669253b7{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2019-03-28 09:12:22 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2019-03-28 09:12:22 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@547e29a4{/jobs,null,AVAILABLE,@Spark}
2019-03-28 09:12:22 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5167268{/jobs/json,null,AVAILABLE,@Spark}
2019-03-28 09:12:22 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1cfd1875{/jobs/job,null,AVAILABLE,@Spark}
2019-03-28 09:12:22 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2c444798{/jobs/job/json,null,AVAILABLE,@Spark}
2019-03-28 09:12:22 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1af7f54a{/stages,null,AVAILABLE,@Spark}
2019-03-28 09:12:22 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6ebd78d1{/stages/json,null,AVAILABLE,@Spark}
2019-03-28 09:12:22 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@436390f4{/stages/stage,null,AVAILABLE,@Spark}
2019-03-28 09:12:22 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6d1310f6{/stages/stage/json,null,AVAILABLE,@Spark}
2019-03-28 09:12:22 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3228d990{/stages/pool,null,AVAILABLE,@Spark}
2019-03-28 09:12:22 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@54e7391d{/stages/pool/json,null,AVAILABLE,@Spark}
2019-03-28 09:12:22 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@50b8ae8d{/storage,null,AVAILABLE,@Spark}
2019-03-28 09:12:22 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@255990cc{/storage/json,null,AVAILABLE,@Spark}
2019-03-28 09:12:22 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@51c929ae{/storage/rdd,null,AVAILABLE,@Spark}
2019-03-28 09:12:22 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3c8bdd5b{/storage/rdd/json,null,AVAILABLE,@Spark}
2019-03-28 09:12:22 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@29d2d081{/environment,null,AVAILABLE,@Spark}
2019-03-28 09:12:23 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@40e4ea87{/environment/json,null,AVAILABLE,@Spark}
2019-03-28 09:12:23 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@58783f6c{/executors,null,AVAILABLE,@Spark}
2019-03-28 09:12:23 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3a7b503d{/executors/json,null,AVAILABLE,@Spark}
2019-03-28 09:12:23 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@512d92b{/executors/threadDump,null,AVAILABLE,@Spark}
2019-03-28 09:12:23 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@62c5bbdc{/executors/threadDump/json,null,AVAILABLE,@Spark}
2019-03-28 09:12:23 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7bdf6bb7{/static,null,AVAILABLE,@Spark}
2019-03-28 09:12:23 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@e72dba7{/,null,AVAILABLE,@Spark}
2019-03-28 09:12:23 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@33c2bd{/api,null,AVAILABLE,@Spark}
2019-03-28 09:12:23 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3abd581e{/jobs/job/kill,null,AVAILABLE,@Spark}
2019-03-28 09:12:23 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4d4d8fcf{/stages/stage/kill,null,AVAILABLE,@Spark}
2019-03-28 09:12:23 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://n51:4040
2019-03-28 09:12:24 INFO RMProxy:133 - Connecting to ResourceManager at n51/192.168.33.51:8032
2019-03-28 09:12:24 INFO Client:54 - Requesting a new application from cluster with 3 NodeManagers
2019-03-28 09:12:24 INFO Configuration:2752 - resource-types.xml not found
2019-03-28 09:12:24 INFO ResourceUtils:419 - Unable to find 'resource-types.xml'.
2019-03-28 09:12:24 INFO Client:54 - Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
2019-03-28 09:12:24 INFO Client:54 - Will allocate AM container, with 896 MB memory including 384 MB overhead
2019-03-28 09:12:24 INFO Client:54 - Setting up container launch context for our AM
2019-03-28 09:12:24 INFO Client:54 - Setting up the launch environment for our AM container
2019-03-28 09:12:25 INFO Client:54 - Preparing resources for our AM container
2019-03-28 09:12:25 WARN Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2019-03-28 09:12:26 INFO Client:54 - Uploading resource file:/tmp/spark-006831fe-7db9-4735-b02c-989d273409a7/__spark_libs__4818467303892521996.zip -> hdfs://n51:9000/user/hduser/.sparkStaging/application_1553701436298_0001/__spark_libs__4818467303892521996.zip
2019-03-28 09:12:30 INFO Client:54 - Uploading resource file:/data/spark/examples/jars/spark-examples_2.12-2.4.0.jar -> hdfs://n51:9000/user/hduser/.sparkStaging/application_1553701436298_0001/spark-examples_2.12-2.4.0.jar
2019-03-28 09:12:31 INFO Client:54 - Uploading resource file:/tmp/spark-006831fe-7db9-4735-b02c-989d273409a7/__spark_conf__1337752320075848084.zip -> hdfs://n51:9000/user/hduser/.sparkStaging/application_1553701436298_0001/__spark_conf__.zip
2019-03-28 09:12:31 INFO SecurityManager:54 - Changing view acls to: hduser
2019-03-28 09:12:31 INFO SecurityManager:54 - Changing modify acls to: hduser
2019-03-28 09:12:31 INFO SecurityManager:54 - Changing view acls groups to:
2019-03-28 09:12:31 INFO SecurityManager:54 - Changing modify acls groups to:
2019-03-28 09:12:31 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hduser); groups with view permissions: Set(); users with modify permissions: Set(hduser); groups with modify permissions: Set()
2019-03-28 09:12:32 INFO Client:54 - Submitting application application_1553701436298_0001 to ResourceManager
2019-03-28 09:12:33 INFO YarnClientImpl:311 - Submitted application application_1553701436298_0001
2019-03-28 09:12:33 INFO SchedulerExtensionServices:54 - Starting Yarn extension services with app application_1553701436298_0001 and attemptId None
2019-03-28 09:12:34 INFO Client:54 - Application report for application_1553701436298_0001 (state: ACCEPTED)
2019-03-28 09:12:34 INFO Client:54 -
client token: N/A
diagnostics: [Thu Mar 28 09:12:34 +0800 2019] Scheduler has assigned a container for AM, waiting for AM container to be launched
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1553735553132
final status: UNDEFINED
tracking URL: http://n51:8088/proxy/application_1553701436298_0001/
user: hduser
2019-03-28 09:12:35 INFO Client:54 - Application report for application_1553701436298_0001 (state: ACCEPTED)
2019-03-28 09:12:36 INFO Client:54 - Application report for application_1553701436298_0001 (state: ACCEPTED)
2019-03-28 09:12:37 INFO Client:54 - Application report for application_1553701436298_0001 (state: ACCEPTED)
2019-03-28 09:12:38 INFO Client:54 - Application report for application_1553701436298_0001 (state: ACCEPTED)
2019-03-28 09:12:39 INFO Client:54 - Application report for application_1553701436298_0001 (state: ACCEPTED)
2019-03-28 09:12:40 INFO Client:54 - Application report for application_1553701436298_0001 (state: ACCEPTED)
2019-03-28 09:12:41 INFO Client:54 - Application report for application_1553701436298_0001 (state: ACCEPTED)
2019-03-28 09:12:42 INFO Client:54 - Application report for application_1553701436298_0001 (state: ACCEPTED)
2019-03-28 09:12:43 INFO YarnClientSchedulerBackend:54 - Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> n51, PROXY_URI_BASES -> http://n51:8088/proxy/application_1553701436298_0001), /proxy/application_1553701436298_0001
2019-03-28 09:12:43 INFO JettyUtils:54 - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /jobs, /jobs/json, /jobs/job, /jobs/job/json, /stages, /stages/json, /stages/stage, /stages/stage/json, /stages/pool, /stages/pool/json, /storage, /storage/json, /storage/rdd, /storage/rdd/json, /environment, /environment/json, /executors, /executors/json, /executors/threadDump, /executors/threadDump/json, /static, /, /api, /jobs/job/kill, /stages/stage/kill.
2019-03-28 09:12:43 INFO YarnSchedulerBackend$YarnSchedulerEndpoint:54 - ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
2019-03-28 09:12:43 INFO Client:54 - Application report for application_1553701436298_0001 (state: RUNNING)
2019-03-28 09:12:43 INFO Client:54 -
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.33.52
ApplicationMaster RPC port: -1
queue: default
start time: 1553735553132
final status: UNDEFINED
tracking URL: http://n51:8088/proxy/application_1553701436298_0001/
user: hduser
2019-03-28 09:12:43 INFO YarnClientSchedulerBackend:54 - Application application_1553701436298_0001 has started running.
2019-03-28 09:12:43 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36330.
2019-03-28 09:12:43 INFO NettyBlockTransferService:54 - Server created on n51:36330
2019-03-28 09:12:43 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2019-03-28 09:12:43 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, n51, 36330, None)
2019-03-28 09:12:43 INFO BlockManagerMasterEndpoint:54 - Registering block manager n51:36330 with 413.9 MB RAM, BlockManagerId(driver, n51, 36330, None)
2019-03-28 09:12:43 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, n51, 36330, None)
2019-03-28 09:12:43 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, n51, 36330, None)
2019-03-28 09:12:43 INFO JettyUtils:54 - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /metrics/json.
2019-03-28 09:12:43 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@668cc9a2{/metrics/json,null,AVAILABLE,@Spark}
2019-03-28 09:12:50 INFO YarnSchedulerBackend$YarnDriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.33.52:49272) with ID 1
2019-03-28 09:12:50 INFO BlockManagerMasterEndpoint:54 - Registering block manager n52:44918 with 413.9 MB RAM, BlockManagerId(1, n52, 44918, None)
2019-03-28 09:12:51 INFO YarnSchedulerBackend$YarnDriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.33.54:46766) with ID 2
2019-03-28 09:12:51 INFO YarnClientSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
2019-03-28 09:12:51 INFO BlockManagerMasterEndpoint:54 - Registering block manager n54:35494 with 413.9 MB RAM, BlockManagerId(2, n54, 35494, None)
2019-03-28 09:12:52 INFO SparkContext:54 - Starting job: reduce at SparkPi.scala:38
2019-03-28 09:12:52 INFO DAGScheduler:54 - Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions
2019-03-28 09:12:52 INFO DAGScheduler:54 - Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
2019-03-28 09:12:52 INFO DAGScheduler:54 - Parents of final stage: List()
2019-03-28 09:12:52 INFO DAGScheduler:54 - Missing parents: List()
2019-03-28 09:12:52 INFO DAGScheduler:54 - Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
2019-03-28 09:12:52 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 3.0 KB, free 413.9 MB)
2019-03-28 09:12:52 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 1765.0 B, free 413.9 MB)
2019-03-28 09:12:52 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on n51:36330 (size: 1765.0 B, free: 413.9 MB)
2019-03-28 09:12:52 INFO SparkContext:54 - Created broadcast 0 from broadcast at DAGScheduler.scala:1161
2019-03-28 09:12:52 INFO DAGScheduler:54 - Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
2019-03-28 09:12:52 INFO YarnScheduler:54 - Adding task set 0.0 with 10 tasks
2019-03-28 09:12:52 INFO TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, n54, executor 2, partition 0, PROCESS_LOCAL, 7402 bytes)
2019-03-28 09:12:52 INFO TaskSetManager:54 - Starting task 1.0 in stage 0.0 (TID 1, n52, executor 1, partition 1, PROCESS_LOCAL, 7402 bytes)
2019-03-28 09:12:53 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on n52:44918 (size: 1765.0 B, free: 413.9 MB)
2019-03-28 09:12:53 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on n54:35494 (size: 1765.0 B, free: 413.9 MB)
2019-03-28 09:12:54 INFO TaskSetManager:54 - Starting task 2.0 in stage 0.0 (TID 2, n54, executor 2, partition 2, PROCESS_LOCAL, 7402 bytes)
2019-03-28 09:12:54 INFO TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 1412 ms on n54 (executor 2) (1/10)
2019-03-28 09:12:54 INFO TaskSetManager:54 - Starting task 3.0 in stage 0.0 (TID 3, n54, executor 2, partition 3, PROCESS_LOCAL, 7402 bytes)
2019-03-28 09:12:54 INFO TaskSetManager:54 - Finished task 2.0 in stage 0.0 (TID 2) in 84 ms on n54 (executor 2) (2/10)
2019-03-28 09:12:54 INFO TaskSetManager:54 - Starting task 4.0 in stage 0.0 (TID 4, n54, executor 2, partition 4, PROCESS_LOCAL, 7402 bytes)
2019-03-28 09:12:54 INFO TaskSetManager:54 - Finished task 3.0 in stage 0.0 (TID 3) in 64 ms on n54 (executor 2) (3/10)
2019-03-28 09:12:54 INFO TaskSetManager:54 - Starting task 5.0 in stage 0.0 (TID 5, n52, executor 1, partition 5, PROCESS_LOCAL, 7402 bytes)
2019-03-28 09:12:54 INFO TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 1) in 1520 ms on n52 (executor 1) (4/10)
2019-03-28 09:12:54 INFO TaskSetManager:54 - Starting task 6.0 in stage 0.0 (TID 6, n54, executor 2, partition 6, PROCESS_LOCAL, 7402 bytes)
2019-03-28 09:12:54 INFO TaskSetManager:54 - Finished task 4.0 in stage 0.0 (TID 4) in 58 ms on n54 (executor 2) (5/10)
2019-03-28 09:12:54 INFO TaskSetManager:54 - Starting task 7.0 in stage 0.0 (TID 7, n52, executor 1, partition 7, PROCESS_LOCAL, 7402 bytes)
2019-03-28 09:12:54 INFO TaskSetManager:54 - Finished task 5.0 in stage 0.0 (TID 5) in 55 ms on n52 (executor 1) (6/10)
2019-03-28 09:12:54 INFO TaskSetManager:54 - Starting task 8.0 in stage 0.0 (TID 8, n54, executor 2, partition 8, PROCESS_LOCAL, 7402 bytes)
2019-03-28 09:12:54 INFO TaskSetManager:54 - Finished task 6.0 in stage 0.0 (TID 6) in 73 ms on n54 (executor 2) (7/10)
2019-03-28 09:12:54 INFO TaskSetManager:54 - Starting task 9.0 in stage 0.0 (TID 9, n52, executor 1, partition 9, PROCESS_LOCAL, 7402 bytes)
2019-03-28 09:12:54 INFO TaskSetManager:54 - Finished task 7.0 in stage 0.0 (TID 7) in 93 ms on n52 (executor 1) (8/10)
2019-03-28 09:12:54 INFO TaskSetManager:54 - Finished task 8.0 in stage 0.0 (TID 8) in 61 ms on n54 (executor 2) (9/10)
2019-03-28 09:12:54 INFO TaskSetManager:54 - Finished task 9.0 in stage 0.0 (TID 9) in 60 ms on n52 (executor 1) (10/10)
2019-03-28 09:12:54 INFO YarnScheduler:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool
2019-03-28 09:12:54 INFO DAGScheduler:54 - ResultStage 0 (reduce at SparkPi.scala:38) finished in 2.141 s
2019-03-28 09:12:54 INFO DAGScheduler:54 - Job 0 finished: reduce at SparkPi.scala:38, took 2.257604 s
Pi is roughly 3.1413311413311416
2019-03-28 09:12:54 INFO AbstractConnector:318 - Stopped Spark@669253b7{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2019-03-28 09:12:54 INFO SparkUI:54 - Stopped Spark web UI at http://n51:4040
2019-03-28 09:12:54 INFO YarnClientSchedulerBackend:54 - Interrupting monitor thread
2019-03-28 09:12:54 INFO YarnClientSchedulerBackend:54 - Shutting down all executors
2019-03-28 09:12:54 INFO YarnSchedulerBackend$YarnDriverEndpoint:54 - Asking each executor to shut down
2019-03-28 09:12:54 INFO SchedulerExtensionServices:54 - Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
2019-03-28 09:12:54 INFO YarnClientSchedulerBackend:54 - Stopped
2019-03-28 09:12:54 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2019-03-28 09:12:54 INFO MemoryStore:54 - MemoryStore cleared
2019-03-28 09:12:54 INFO BlockManager:54 - BlockManager stopped
2019-03-28 09:12:54 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2019-03-28 09:12:54 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2019-03-28 09:12:54 INFO SparkContext:54 - Successfully stopped SparkContext
2019-03-28 09:12:54 INFO ShutdownHookManager:54 - Shutdown hook called
2019-03-28 09:12:54 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-ab81b4a5-dd8d-4f2e-bb9e-0aee50fc95fe
2019-03-28 09:12:54 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-006831fe-7db9-4735-b02c-989d273409a7
# 2019-04-02 08:25:27 WARN Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
# Working run 3
/data/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn --deploy-mode cluster \
--jars /data/spark/examples/jars/spark-examples_2.12-2.4.0.jar \
/data/spark/examples/jars/spark-examples_2.12-2.4.0.jar 10
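In cluster mode the driver output (the "Pi is roughly ..." line) lands in the YARN container logs; with yarn.log-aggregation-enable set above, it can be pulled back after the job finishes (substitute the application id printed by spark-submit):
APP_ID=application_xxxxxxxxxxxxx_yyyy   # use the id from the spark-submit output
yarn logs -applicationId $APP_ID | grep "Pi is roughly"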
## Appending to a file on HDFS
hadoop fs -appendToFile ~/cp.sh /user/hduser/words.txt && hadoop fs -cat /user/hduser/words.txt
# Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
# Create a directory on HDFS to hold the Spark jars:
hdfs dfs -mkdir -p /jars/spark_jars
# Upload the Spark jars
hdfs dfs -put /data/spark/jars/* /jars/spark_jars/
# Add the following to spark-defaults.conf in Spark's conf directory
spark.yarn.jars=hdfs://n51:9000/jars/spark_jars/*
# Hive on Spark
hive --service cli \
-hiveconf hive.root.logger=DEBUG,console \
-hiveconf hive.execution.engine=spark \
-hiveconf spark.kryo.referenceTracking=false \
-hiveconf spark.kryo.classesToRegister=org.apache.hadoop.hive.ql.io.HiveKey,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch
Spark SQL
start-thriftserver.sh --hiveconf hive.server2.thrift.port=10011 --master yarn
#--driver-class-path /data/spark-2.2.0-bin-hadoop2.7/jars/mysql-connector-java-5.1.43-bin.jar --executor-memory 5g --total-executor-cores 5
spark-sql --master yarn
spark-shell --master yarn
val input = sc.textFile("/user/hduser/random_words.txt")
val words = input.flatMap(line => line.split(" "))
val counts = words.map(word => (word, 1)).reduceByKey(_ + _)
counts.saveAsTextFile("/user/hduser/random_words_out.txt")
//input.count()
//input.first()
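To feed the snippet above and inspect its output (same HDFS paths as in the code; the input content here is just an example):
echo "hello spark hello hdfs hello yarn" > ~/random_words.txt
hadoop fs -put ~/random_words.txt /user/hduser/random_words.txt
# after counts.saveAsTextFile(...) the results sit in part files under the output path
hadoop fs -cat /user/hduser/random_words_out.txt/part-*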
HBase
Overview
192.168.33.51 hdfs:NameNode,SecondaryNameNode yarn:ResourceManager zookeeper hbase:HMaster
192.168.33.52 hdfs:DataNode yarn:NodeManager zookeeper hbase:HRegionServer
192.168.33.53 hdfs:DataNode yarn:NodeManager zookeeper hbase:HRegionServer
192.168.33.54 hdfs:DataNode yarn:NodeManager hbase:HRegionServer
Installation
vi /data/hbase/conf/hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.zookeeper.quorum</name>
<value>n51,n52,n53</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/data/hbase_tmp</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://n51:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
<property>
<name>hbase.wal.provider</name>
<value>filesystem</value>
</property>
<!--
<property>
<name>hbase.master.info.port</name>
<value>16010</value>
</property>
-->
</configuration>
vi /data/hbase/conf/regionservers
n52
n53
n54
vi /data/hbase/conf/hbase-env.sh
export HBASE_MANAGES_ZK=false
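Because HBASE_MANAGES_ZK is false, a ZooKeeper ensemble must already be running on n51, n52 and n53 (its installation is not covered here); a quick sanity check, assuming zkServer.sh is on the PATH of those nodes:
zkServer.sh status          # run on each of n51, n52, n53
echo ruok | nc n51 2181     # prints "imok" if four-letter-word commands are enabled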
Startup
bin/start-hbase.sh
bin/stop-hbase.sh
bin/hbase-daemon.sh start master
bin/hbase-daemon.sh start regionserver
2019-04-11 14:49:56,938 ERROR [main] regionserver.HRegionServer: Failed construction RegionServer
java.lang.NoClassDefFoundError: org/apache/htrace/SamplerBuilder
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:644)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:628)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2701)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2683)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:372)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.hbase.util.CommonFSUtils.getRootDir(CommonFSUtils.java:362)
at org.apache.hadoop.hbase.util.CommonFSUtils.isValidWALRootDir(CommonFSUtils.java:411)
at org.apache.hadoop.hbase.util.CommonFSUtils.getWALRootDir(CommonFSUtils.java:387)
at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeFileSystem(HRegionServer.java:704)
at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:613)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:489)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:3093)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:236)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3111)
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 25 more
2019-04-11 14:49:56,959 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster.
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:3100)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:236)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3111)
Caused by: java.lang.NoClassDefFoundError: org/apache/htrace/SamplerBuilder
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:644)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:628)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2701)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2683)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:372)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.hbase.util.CommonFSUtils.getRootDir(CommonFSUtils.java:362)
at org.apache.hadoop.hbase.util.CommonFSUtils.isValidWALRootDir(CommonFSUtils.java:411)
at org.apache.hadoop.hbase.util.CommonFSUtils.getWALRootDir(CommonFSUtils.java:387)
at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeFileSystem(HRegionServer.java:704)
at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:613)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:489)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:3093)
... 5 more
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 25 more
2019-04-11 15:18:00,964 ERROR [master/n51:16000:becomeActiveMaster] master.HMaster: Failed to become active master
java.lang.IllegalStateException: The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so. Please check the config value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it.
at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.rollWriter(WALProcedureStore.java:1086)
at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.recoverLease(WALProcedureStore.java:423)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:611)
at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1458)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:890)
at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2272)
at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:581)
at java.lang.Thread.run(Thread.java:748)
2019-04-11 15:18:00,965 ERROR [master/n51:16000:becomeActiveMaster] master.HMaster: ***** ABORTING master n51,16000,1554967074912: Unhandled exception. Starting shutdown. *****
java.lang.IllegalStateException: The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so. Please check the config value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it.
at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.rollWriter(WALProcedureStore.java:1086)
at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.recoverLease(WALProcedureStore.java:423)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:611)
at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1458)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:890)
at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2272)
at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:581)
at java.lang.Thread.run(Thread.java:748)
java.lang.ExceptionInInitializerError
at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:79)
at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:362)
at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:339)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2633)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:964)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to create local dir /data/hbase_tmp/local/jars, DynamicClassLoader failed to init
at org.apache.hadoop.hbase.util.DynamicClassLoader.initTempDir(DynamicClassLoader.java:110)
at org.apache.hadoop.hbase.util.DynamicClassLoader.<init>(DynamicClassLoader.java:98)
at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException$ClassLoaderHolder.lambda$static$0(RemoteWithExtrasException.java:56)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException$ClassLoaderHolder.<clinit>(RemoteWithExtrasException.java:55)
... 6 more
2019-04-11 17:04:26,903 INFO [RpcServer.priority.FPBQ.Fifo.handler=18,queue=0,port=16020] regionserver.RSRpcServices: Open hbase:meta,,1.1588230740
2019-04-11 17:04:26,925 INFO [RS_OPEN_META-regionserver/n53:16020-0] wal.AbstractFSWAL: WAL configuration: blocksize=256 MB, rollsize=128 MB, prefix=n53%2C16020%2C1554973432966.meta, suffix=.meta, logDir=hdfs://n51:9000/hbase/WALs/n53,16020,1554973432966, archiveDir=hdfs://n51:9000/hbase/oldWALs
2019-04-11 17:04:26,936 ERROR [RS_OPEN_META-regionserver/n53:16020-0] handler.OpenRegionHandler: Failed open of region=hbase:meta,,1.1588230740
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper
at org.apache.hadoop.hbase.io.asyncfs.AsyncFSOutputHelper.createOutput(AsyncFSOutputHelper.java:51)
at org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.initOutput(AsyncProtobufLogWriter.java:169)
at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:166)
at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:113)
at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:614)
at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:126)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:756)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:486)
at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.<init>(AsyncFSWAL.java:253)
at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:73)
at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:48)
at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:152)
at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:60)
at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:284)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:2126)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:284)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-04-11 17:04:29,406 INFO [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] regionserver.RSRpcServices: Open hbase:meta,,1.1588230740
2019-04-11 17:04:29,423 INFO [RS_OPEN_META-regionserver/n53:16020-0] wal.AbstractFSWAL: WAL configuration: blocksize=256 MB, rollsize=128 MB, prefix=n53%2C16020%2C1554973432966.meta, suffix=.meta, logDir=hdfs://n51:9000/hbase/WALs/n53,16020,1554973432966, archiveDir=hdfs://n51:9000/hbase/oldWALs
2019-04-11 17:04:29,425 ERROR [RS_OPEN_META-regionserver/n53:16020-0] handler.OpenRegionHandler: Failed open of region=hbase:meta,,1.1588230740
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper
at org.apache.hadoop.hbase.io.asyncfs.AsyncFSOutputHelper.createOutput(AsyncFSOutputHelper.java:51)
at org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.initOutput(AsyncProtobufLogWriter.java:169)
at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:166)
at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:113)
at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:614)
at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:126)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:756)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:486)
at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.<init>(AsyncFSWAL.java:253)
at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:73)
at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:48)
at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:152)
at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:60)
at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:284)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:2126)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:284)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
# http://www.zhongruitech.com/4186901.html
bin/hbase shell
hbase(main)> create 'hbase_test',{NAME=>'cf1'},{NAME=>'cf2'}
# ERROR: Failed to create local dir /data/hbase_tmp/local/jars, DynamicClassLoader failed to init
# ERROR: Could not initialize class org.apache.hadoop.hbase.ipc.RemoteWithExtrasException$ClassLoaderHolder
# For usage try 'help "create"'
hbase(main)> put 'hbase_test', '001','cf1:name','Tom'
hbase(main)> put 'hbase_test', '001','cf1:age','18'
hbase(main)> put 'hbase_test', '001','cf2:phone','13309882999'
hbase(main)> put 'hbase_test', '001','cf2:address','昆明'
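To confirm the puts above actually landed, a quick read-back from the same shell (a minimal check; the exact output depends on what was inserted):
# Fetch a single row by key
hbase(main)> get 'hbase_test', '001'
# Scan the whole table (fine for a small test table, avoid on large ones)
hbase(main)> scan 'hbase_test'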
# Creating an HBase table
# Syntax: create <table>, {NAME => <family>, VERSIONS => <VERSIONS>}
# Example: create table t1 with two column families, f1 and f2, keeping 3 versions for f1 and 1 for f2
hbase(main)> create 't1',{NAME => 'f1', VERSIONS => 3},{NAME => 'f2', VERSIONS => 1}
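To see what VERSIONS does, a small sketch (row key r1 and qualifier c1 are made up for illustration): write the same cell twice, then ask for several versions back.
hbase(main)> put 't1', 'r1', 'f1:c1', 'v1'
hbase(main)> put 't1', 'r1', 'f1:c1', 'v2'
# f1 keeps up to 3 versions, so both values come back; f2 would keep only the latest
hbase(main)> get 't1', 'r1', {COLUMN => 'f1:c1', VERSIONS => 3}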
# Disable a table
hbase(main)> disable 't1'
# Drop a table (it must be disabled first)
hbase(main)> drop 't1'
# Truncate a table
hbase(main)> truncate 't1'
hbase(main)> balance_switch true
hbase(main)> create 'blog','article','author'
hbase(main)> create "test","col1","col2"
hbase(main)> list
hbase(main)> list "test"
hbase(main)> status
ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2977)
at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1973)
at org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:630)
at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
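PleaseHoldException: Master is initializing means the master has not finished startup, which is consistent with hbase:meta repeatedly failing to open in the region server log above. A few hedged first checks (host, user and log file name follow this cluster's conventions and may differ on your setup):
# Is HDFS healthy and out of safe mode?
hdfs dfsadmin -report
hdfs dfsadmin -safemode get
# Watch the master retry assigning hbase:meta (log name assumes user hduser on n51)
tail -f $HBASE_HOME/logs/hbase-hduser-master-n51.log
# The master web UI (default port 16010) shows regions in transition
curl -s http://n51:16010/master-status | head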
# http://www.zhongruitech.com/4186901.html
# https://ken.io/note/hbase-cluster-deploy-guide
# http://www.cnblogs.com/jishilei/archive/2013/05/27/3101172.html
# Let HBase pick up the HDFS client settings (replication, nameservice, etc.)
cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $HBASE_HOME/conf/
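For the "Failed to create local dir /data/hbase_tmp/local/jars" error, the directory simply has to exist and be writable by the HBase user on every region server; for the FanOutOneBlockAsyncDFSOutputHelper NoClassDefFoundError on Hadoop 3, a commonly reported workaround is to switch the WAL off the asyncfs provider. Both are sketches to try, not guaranteed fixes:
# On every region server: make the local temp dir writable by the HBase user
sudo mkdir -p /data/hbase_tmp
sudo chown -R hduser:hadoop /data/hbase_tmp
And in $HBASE_HOME/conf/hbase-site.xml on all nodes (restart HBase afterwards):
<property>
  <name>hbase.wal.provider</name>
  <value>filesystem</value>
</property>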
ZooKeeper cluster
Install
vi conf/zoo.cfg
tickTime=2000
dataDir=/opt/zookeeper/data
clientPort=2181
initLimit=10
syncLimit=5
server.1=192.168.33.181:2888:3888
server.2=192.168.33.182:2888:3888
server.3=192.168.33.183:2888:3888
mkdir -p /opt/zookeeper/data/
mkdir -p /opt/zookeeper/logs/
# myid must match this node's server.N entry in zoo.cfg
# (write "1" on server.1, "2" on server.2, "3" on server.3)
echo "1" > /opt/zookeeper/data/myid
./bin/zkServer.sh start
./bin/zkServer.sh stop
./bin/zkServer.sh status
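Once each server reports Mode: leader or Mode: follower, a quick smoke test with the bundled CLI (addresses as in the zoo.cfg above):
./bin/zkCli.sh -server 192.168.33.181:2181
# inside the CLI:
create /smoke "hello"
get /smoke
delete /smoke
quit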
systemd
[Unit]
Description=zookeeper.service
After=network.target
[Service]
Type=forking
Environment=ZOO_LOG_DIR=/opt/zookeeper/logs/
Environment=ZOO_LOG4J_PROP=INFO,ROLLINGFILE
Environment=ZOOPIDFILE=/tmp/zookeeper_server.pid
Environment=PATH=/usr/local/jdk1.8.0_152/bin:/usr/local/jdk1.8.0_152/jre/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
ExecStart=/opt/zookeeper-3.4.13/bin/zkServer.sh start
ExecStop=/opt/zookeeper-3.4.13/bin/zkServer.sh stop
ExecReload=/opt/zookeeper-3.4.13/bin/zkServer.sh restart
#PIDFile=/tmp/zookeeper_server.pid
#WorkingDirectory=
#User=www
[Install]
WantedBy=multi-user.target
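To wire the unit in, save it on each ZooKeeper node (here as /etc/systemd/system/zookeeper.service, a path chosen for illustration) and enable it:
sudo cp zookeeper.service /etc/systemd/system/zookeeper.service
sudo systemctl daemon-reload
sudo systemctl enable zookeeper
sudo systemctl start zookeeper
systemctl status zookeeper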
Hadoop 3 ports
PORT | CONFIG NAME | CONFIG VALUE |
---|---|---|
0 | dfs.balancer.address | 0.0.0.0:0 |
9866 | dfs.datanode.address | 0.0.0.0:9866 |
9864 | dfs.datanode.http.address | 0.0.0.0:9864 |
9865 | dfs.datanode.https.address | 0.0.0.0:9865 |
9867 | dfs.datanode.ipc.address | 0.0.0.0:9867 |
8111 | dfs.federation.router.admin-address | 0.0.0.0:8111 |
50071 | dfs.federation.router.http-address | 0.0.0.0:50071 |
50072 | dfs.federation.router.https-address | 0.0.0.0:50072 |
8888 | dfs.federation.router.rpc-address | 0.0.0.0:8888 |
8480 | dfs.journalnode.http-address | 0.0.0.0:8480 |
8481 | dfs.journalnode.https-address | 0.0.0.0:8481 |
8485 | dfs.journalnode.rpc-address | 0.0.0.0:8485 |
0 | dfs.mover.address | 0.0.0.0:0 |
50100 | dfs.namenode.backup.address | 0.0.0.0:50100 |
50105 | dfs.namenode.backup.http-address | 0.0.0.0:50105 |
9870 | dfs.namenode.http-address | 0.0.0.0:9870 |
9871 | dfs.namenode.https-address | 0.0.0.0:9871 |
9868 | dfs.namenode.secondary.http-address | 0.0.0.0:9868 |
9869 | dfs.namenode.secondary.https-address | 0.0.0.0:9869 |
50200 | dfs.provided.aliasmap.inmemory.dnrpc-address | 0.0.0.0:50200 |
2181 | hadoop.registry.zk.quorum | localhost:2181 |
10020 | mapreduce.jobhistory.address | 0.0.0.0:10020 |
10033 | mapreduce.jobhistory.admin.address | 0.0.0.0:10033 |
19888 | mapreduce.jobhistory.webapp.address | 0.0.0.0:19888 |
19890 | mapreduce.jobhistory.webapp.https.address | 0.0.0.0:19890 |
0 | yarn.nodemanager.address | ${yarn.nodemanager.hostname}:0 |
8049 | yarn.nodemanager.amrmproxy.address | 0.0.0.0:8049 |
8048 | yarn.nodemanager.collector-service.address | ${yarn.nodemanager.hostname}:8048 |
8040 | yarn.nodemanager.localizer.address | ${yarn.nodemanager.hostname}:8040 |
8042 | yarn.nodemanager.webapp.address | ${yarn.nodemanager.hostname}:8042 |
8044 | yarn.nodemanager.webapp.https.address | 0.0.0.0:8044 |
8032 | yarn.resourcemanager.address | ${yarn.resourcemanager.hostname}:8032 |
8033 | yarn.resourcemanager.admin.address | ${yarn.resourcemanager.hostname}:8033 |
8031 | yarn.resourcemanager.resource-tracker.address | ${yarn.resourcemanager.hostname}:8031 |
8030 | yarn.resourcemanager.scheduler.address | ${yarn.resourcemanager.hostname}:8030 |
8088 | yarn.resourcemanager.webapp.address | ${yarn.resourcemanager.hostname}:8088 |
8090 | yarn.resourcemanager.webapp.https.address | ${yarn.resourcemanager.hostname}:8090 |
8089 | yarn.router.webapp.address | 0.0.0.0:8089 |
8091 | yarn.router.webapp.https.address | 0.0.0.0:8091 |
8047 | yarn.sharedcache.admin.address | 0.0.0.0:8047 |
8045 | yarn.sharedcache.client-server.address | 0.0.0.0:8045 |
8046 | yarn.sharedcache.uploader.server.address | 0.0.0.0:8046 |
8788 | yarn.sharedcache.webapp.address | 0.0.0.0:8788 |
10200 | yarn.timeline-service.address | ${yarn.timeline-service.hostname}:10200 |
8188 | yarn.timeline-service.webapp.address | ${yarn.timeline-service.hostname}:8188 |
8190 | yarn.timeline-service.webapp.https.address | ${yarn.timeline-service.hostname}:8190 |
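With these defaults, two quick ways to confirm which daemons are actually listening (host names follow this document's layout; adjust if your *-site.xml overrides the values above):
# Listening ports of the Java daemons on a node
ss -tlnp | grep java
# NameNode web UI / JMX on dfs.namenode.http-address (9870)
curl -s http://n51:9870/jmx | head
# ResourceManager REST API on yarn.resourcemanager.webapp.address (8088)
curl -s http://n51:8088/ws/v1/cluster/info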
References
- zookeeper.service
- HowToSetupYourDevelopmentEnvironment
- Submitting Applications
- BigDataLearning
- sqoop
- kudu vs hbase
- impala vs hive vs Spark SQL vs Presto
- alluxio vs Ignite
- Apache Druid
- Apache Kylin
- Apache Storm
- Apache Samza
- Apache Flink
- Apache Kafka
- parquet
- Spark Thrift Server
- scylla
- flume logstash
- ELK (ElasticSearch + Logstash + Kibana)
- EFK (ElasticSearch + Filebeat + Kibana)
- Fluent: fluentbit and fluentd
- Kafka Streams
- Ganglia
- hue
- Apache Presto vs Apache Impala
- Spark Configuration
- Hive on Spark: Getting Started
- Setting Up HiveServer2