博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
hive跑mapreduce报java.lang.RuntimeException: Error in configuring object
阅读量:5024 次
发布时间:2019-06-12

本文共 32506 字,大约阅读时间需要 108 分钟。

写于2016.7月

最近项目需要在hbase上做统计分析,在本机上装了hive,结果跑小批量数据sum时报错:

hive> select count(*) from page_view;Total jobs = 1Launching Job 1 out of 1Number of reduce tasks determined at compile time: 1In order to change the average load for a reducer (in bytes):  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers: set hive.exec.reducers.max=
In order to set a constant number of reducers: set mapreduce.job.reduces=
Starting Job = job_1468481753104_0014, Tracking URL = http://wangkai8:8088/proxy/application_1468481753104_0014/Kill Command = /Users/wangkai8/app/hadoop-2.3.0-cdh5/bin/hadoop job -kill job_1468481753104_0014Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 12016-07-14 18:38:11,888 Stage-1 map = 100%, reduce = 100%Ended Job = job_1468481753104_0014 with errorsError during job, obtaining debugging information...Examining task ID: task_1468481753104_0014_m_000000 (and more) from job job_1468481753104_0014Task with the most failures(1):-----Task ID: task_1468481753104_0014_r_000000URL: http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1468481753104_0014&tipid=task_1468481753104_0014_r_000000-----Diagnostic Messages for this Task:java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:351) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232) at java.lang.Thread.run(Thread.java:745)Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 7 moreCaused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:135) ... 12 moreFAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTaskMapReduce Jobs Launched:Job 0: Map: 1 Reduce: 1 HDFS Read: 499 HDFS Write: 218201 FAILTotal MapReduce CPU Time Spent: 0 msechive>

查看yarn日志,一样也是空指针异常,还有个提示是No plan file found: hdfs://...

2016-07-14 15:37:15,211 INFO [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring retrieval request: __REDUCE_PLAN__2016-07-14 15:37:15,211 INFO [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://wangkai8/tmp/hive-wangkai8/hive_2016-07-14_15-37-06_635_5512340937878595139-1/-mr-10004/1e60df26-2b73-4384-adbc-651342dd48f8/reduce.xml2016-07-14 15:37:15,211 INFO [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring cache key: __REDUCE_PLAN__2016-07-14 15:37:15,212 WARN [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Exception running local (uberized) 'child' : java.lang.RuntimeException: Error in configuring object    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409)    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)    at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:351)    at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232)    at java.lang.Thread.run(Thread.java:745)Caused by: java.lang.reflect.InvocationTargetException    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)    at java.lang.reflect.Method.invoke(Method.java:606)    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)    ... 7 moreCaused by: java.lang.NullPointerException    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:132)    ... 12 more2016-07-14 15:37:15,212 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1468481753104_0002_r_000000_02016-07-14 15:37:15,212 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1468481753104_0002_r_000000_0 is : 0.02016-07-14 15:37:15,212 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.Task: Runnning cleanup for the task2016-07-14 15:37:15,212 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1468481753104_0002_r_000000_0: java.lang.RuntimeException: Error in configuring object    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409)    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)    at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:351)    at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232)    at java.lang.Thread.run(Thread.java:745)Caused by: java.lang.reflect.InvocationTargetException    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)    at java.lang.reflect.Method.invoke(Method.java:606)    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)    ... 7 moreCaused by: java.lang.NullPointerException    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:132)    ... 12 more2016-07-14 15:37:15,213 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_1468481753104_0002_01_000001 taskAttempt attempt_1468481753104_0002_m_000000_02016-07-14 15:37:15,213 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1468481753104_0002_r_000000_0: java.lang.RuntimeException: Error in configuring object    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409)    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)    at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:351)    at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232)    at java.lang.Thread.run(Thread.java:745)Caused by: java.lang.reflect.InvocationTargetException    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)    at java.lang.reflect.Method.invoke(Method.java:606)    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)    ... 7 moreCaused by: java.lang.NullPointerException    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:132)    ... 12 more

 

查看hive源码,发现gWork为null,导致空指针异常

//src: org.apache.hadoop.hive.ql.exec.mr.ExecReducer#configure     ObjectCache cache = ObjectCacheFactory.getCache(jc);    ReduceWork gWork = (ReduceWork) cache.retrieve(PLAN_KEY);    if (gWork == null) {      gWork = Utilities.getReduceWork(job);      cache.cache(PLAN_KEY, gWork);    } else {      Utilities.setReduceWork(job, gWork);    }    reducer = gWork.getReducer(); //gWork为null,导致空指针

接着查看Utilities.getReduceWork(job)方法,最后跟到org.apache.hadoop.hive.ql.exec.Utilities#getBaseWork:

/**   * Returns the Map or Reduce plan   * Side effect: the BaseWork returned is also placed in the gWorkMap   * @param conf   * @param name   * @return BaseWork based on the name supplied will return null if name is null   * @throws RuntimeException if the configuration files are not proper or if plan can not be loaded   */  private static BaseWork getBaseWork(Configuration conf, String name) {    BaseWork gWork = null;    Path path = null;    InputStream in = null;    try {      path = getPlanPath(conf, name);      assert path != null;      if (!gWorkMap.containsKey(path)) {        Path localPath;        if (ShimLoader.getHadoopShims().isLocalMode(conf)) {          localPath = path;        } else {          localPath = new Path(name);        }        if (HiveConf.getBoolVar(conf, ConfVars.HIVE_RPC_QUERY_PLAN)) {          LOG.debug("Loading plan from string: "+path.toUri().getPath());          String planString = conf.get(path.toUri().getPath());          if (planString == null) {            LOG.info("Could not find plan string in conf");            return null;          }          byte[] planBytes = Base64.decodeBase64(planString);          in = new ByteArrayInputStream(planBytes);          in = new InflaterInputStream(in);        } else {          in = new FileInputStream(localPath.toUri().getPath());        }        if(MAP_PLAN_NAME.equals(name)){          if (ExecMapper.class.getName().equals(conf.get(MAPRED_MAPPER_CLASS))){            gWork = deserializePlan(in, MapWork.class, conf);          } else if(RCFileMergeMapper.class.getName().equals(conf.get(MAPRED_MAPPER_CLASS))) {            gWork = deserializePlan(in, MergeWork.class, conf);          } else if(ColumnTruncateMapper.class.getName().equals(conf.get(MAPRED_MAPPER_CLASS))) {            gWork = deserializePlan(in, ColumnTruncateWork.class, conf);          } else if(PartialScanMapper.class.getName().equals(conf.get(MAPRED_MAPPER_CLASS))) {            gWork = deserializePlan(in, PartialScanWork.class,conf);          } else {            throw new RuntimeException("unable to determine work from configuration ."                + MAPRED_MAPPER_CLASS + " was "+ conf.get(MAPRED_MAPPER_CLASS)) ;          }        } else if (REDUCE_PLAN_NAME.equals(name)) {          if(ExecReducer.class.getName().equals(conf.get(MAPRED_REDUCER_CLASS))) {            gWork = deserializePlan(in, ReduceWork.class, conf);          } else {            throw new RuntimeException("unable to determine work from configuration ."                + MAPRED_REDUCER_CLASS +" was "+ conf.get(MAPRED_REDUCER_CLASS)) ;          }        }        gWorkMap.put(path, gWork);      } else {        LOG.debug("Found plan in cache.");        gWork = gWorkMap.get(path);      }      return gWork;    } catch (FileNotFoundException fnf) {      // happens. e.g.: no reduce work.      LOG.info("No plan file found: "+path);      return null;    } catch (Exception e) {      LOG.error("Failed to load plan: "+path, e);      throw new RuntimeException(e);    } finally {      if (in != null) {        try {          in.close();        } catch (IOException cantBlameMeForTrying) { }      }    }  }

 再结合yarn报错信息:2016-07-14 15:37:15,211 INFO [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://wangkai8/tmp/hive-wangkai8/hive_2016-07-14_15-37-06_635_5512340937878595139-1/-mr-10004/1e60df26-2b73-4384-adbc-651342dd48f8/reduce.xml

很明显reduce.xml找不到,报FileNotFoundException:No plan file found,但通过dfs fs -ls hdfs://wangkai8/tmp/hive-wangkai8/hive_2016-07-14_15-37-06_635_5512340937878595139-1/-mr-10004/1e60df26-2b73-4384-adbc-651342dd48f8/reduce.xml查看文件,文件的确是存在的,文件存在还报这个错,那这个问题就很诡异了。

 接着在hive中映射一张hbase表,此表记录较多,大概50万条,同样执行select count操作:

hive> select count(*) from qry_ins_user;Total jobs = 1Launching Job 1 out of 1Number of reduce tasks determined at compile time: 1In order to change the average load for a reducer (in bytes):  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers: set hive.exec.reducers.max=
In order to set a constant number of reducers: set mapreduce.job.reduces=
Starting Job = job_1468481753104_0016, Tracking URL = http://wangkai8:8088/proxy/application_1468481753104_0016/Kill Command = /Users/wangkai8/app/hadoop-2.3.0-cdh5/bin/hadoop job -kill job_1468481753104_0016Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 12016-07-14 19:42:16,196 Stage-1 map = 0%, reduce = 0%2016-07-14 19:42:30,570 Stage-1 map = 50%, reduce = 0%2016-07-14 19:42:32,627 Stage-1 map = 100%, reduce = 0%2016-07-14 19:42:38,792 Stage-1 map = 100%, reduce = 100%Ended Job = job_1468481753104_0016MapReduce Jobs Launched:Job 0: Map: 2 Reduce: 1 HDFS Read: 566 HDFS Write: 7 SUCCESSTotal MapReduce CPU Time Spent: 0 msecOK509844Time taken: 34.023 seconds, Fetched: 1 row(s)hive>

 这个居然能执行成功,3条记录执行就报错?这个问题不能不说更诡异了。后面各种调试,最后从AppMaster日志中得到关键信息:

Log Type: syslogLog Length: 1554142016-07-14 18:40:17,962 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1468481753104_0015_0000012016-07-14 18:40:18,211 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.2016-07-14 18:40:18,230 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.2016-07-14 18:40:18,327 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable2016-07-14 18:40:18,340 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:2016-07-14 18:40:18,340 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@6b7a5e49)2016-07-14 18:40:18,363 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: The specific max attempts: 2 for application: 15. Attempt num: 1 is last retry: false2016-07-14 18:40:18,452 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: dfs.datanode.data.dir;  Ignoring.2016-07-14 18:40:18,454 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.2016-07-14 18:40:18,460 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: dfs.namenode.name.dir;  Ignoring.2016-07-14 18:40:18,464 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.2016-07-14 18:40:18,837 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config org.apache.hadoop.hive.shims.HadoopShimsSecure$NullOutputCommitter2016-07-14 18:40:18,839 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter is org.apache.hadoop.hive.shims.HadoopShimsSecure$NullOutputCommitter2016-07-14 18:40:18,876 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.jobhistory.EventType for class org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler2016-07-14 18:40:18,877 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher2016-07-14 18:40:18,877 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher2016-07-14 18:40:18,878 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher2016-07-14 18:40:18,878 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler2016-07-14 18:40:18,882 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.speculate.Speculator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$SpeculatorEventDispatcher2016-07-14 18:40:18,882 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter2016-07-14 18:40:18,883 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncher$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter2016-07-14 18:40:18,940 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler2016-07-14 18:40:19,105 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties2016-07-14 18:40:19,147 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).2016-07-14 18:40:19,147 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MRAppMaster metrics system started2016-07-14 18:40:19,153 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Adding job token for job_1468481753104_0015 to jobTokenSecretManager2016-07-14 18:40:19,239 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Uberizing job job_1468481753104_0015: 1m+1r tasks (132 input bytes) will run sequentially on single node.2016-07-14 18:40:19,257 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Input size for job job_1468481753104_0015 = 132. Number of splits = 12016-07-14 18:40:19,258 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Number of reduces for job job_1468481753104_0015 = 12016-07-14 18:40:19,258 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1468481753104_0015Job Transitioned from NEW to INITED2016-07-14 18:40:19,258 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster uberizing job job_1468481753104_0015 in local container ("uber-AM") on node wangkai8:59734.2016-07-14 18:40:19,280 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue2016-07-14 18:40:19,286 INFO [Socket Reader #1 for port 57486] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 574862016-07-14 18:40:19,301 INFO [main] org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB to the server
2016-07-14 18:40:19,258 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster uberizing job job_1468481753104_0015 in local container ("uber-AM") on node wangkai8:59734. reduce跑在local模式,难道50万记录的表跑在集群模式?查看50万条记录对应AppMaster日志:
Log Type: syslogLog Length: 476182016-07-14 19:42:13,007 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1468481753104_0016_0000012016-07-14 19:42:13,254 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.2016-07-14 19:42:13,277 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.2016-07-14 19:42:13,366 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable2016-07-14 19:42:13,380 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:2016-07-14 19:42:13,380 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@5f4829f5)2016-07-14 19:42:13,407 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: The specific max attempts: 2 for application: 16. Attempt num: 1 is last retry: false2016-07-14 19:42:13,499 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: dfs.datanode.data.dir;  Ignoring.2016-07-14 19:42:13,502 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.2016-07-14 19:42:13,508 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: dfs.namenode.name.dir;  Ignoring.2016-07-14 19:42:13,511 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.2016-07-14 19:42:14,025 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config org.apache.hadoop.hive.shims.HadoopShimsSecure$NullOutputCommitter2016-07-14 19:42:14,027 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter is org.apache.hadoop.hive.shims.HadoopShimsSecure$NullOutputCommitter2016-07-14 19:42:14,071 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.jobhistory.EventType for class org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler2016-07-14 19:42:14,072 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher2016-07-14 19:42:14,073 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher2016-07-14 19:42:14,073 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher2016-07-14 19:42:14,074 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler2016-07-14 19:42:14,080 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.speculate.Speculator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$SpeculatorEventDispatcher2016-07-14 19:42:14,081 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter2016-07-14 19:42:14,082 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncher$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter2016-07-14 19:42:14,159 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler2016-07-14 19:42:14,409 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties2016-07-14 19:42:14,480 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).2016-07-14 19:42:14,480 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MRAppMaster metrics system started2016-07-14 19:42:14,487 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Adding job token for job_1468481753104_0016 to jobTokenSecretManager2016-07-14 19:42:14,603 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Not uberizing job_1468481753104_0016 because: too much input;2016-07-14 19:42:14,628 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Input size for job job_1468481753104_0016 = 446693376. Number of splits = 22016-07-14 19:42:14,630 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Number of reduces for job job_1468481753104_0016 = 12016-07-14 19:42:14,630 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1468481753104_0016Job Transitioned from NEW to INITED2016-07-14 19:42:14,631 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster launching normal, non-uberized, multi-container job job_1468481753104_0016.2016-07-14 19:42:14,666 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue2016-07-14 19:42:14,676 INFO [Socket Reader #1 for port 59418] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 594182016-07-14 19:42:14,698 INFO [main] org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB to the server
2016-07-14 19:42:14,603 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Not uberizing job_1468481753104_0016 because: too much input; 好吧,果然是集群模式,那就查看org.apache.hadoop.mapreduce.v2.app.job.impl.JobImp源码,看看local、集群模式是怎么决定的:
/**   * Decide whether job can be run in uber mode based on various criteria.   * @param dataInputLength Total length for all splits   */  private void makeUberDecision(long dataInputLength) {    //FIXME:  need new memory criterion for uber-decision (oops, too late here;    // until AM-resizing supported,    // must depend on job client to pass fat-slot needs)    // these are no longer "system" settings, necessarily; user may override    int sysMaxMaps = conf.getInt(MRJobConfig.JOB_UBERTASK_MAXMAPS, 9);    int sysMaxReduces = conf.getInt(MRJobConfig.JOB_UBERTASK_MAXREDUCES, 1);    long sysMaxBytes = conf.getLong(MRJobConfig.JOB_UBERTASK_MAXBYTES,        fs.getDefaultBlockSize(this.remoteJobSubmitDir)); // FIXME: this is wrong; get FS from                                   // [File?]InputFormat and default block size                                   // from that    long sysMemSizeForUberSlot =        conf.getInt(MRJobConfig.MR_AM_VMEM_MB,            MRJobConfig.DEFAULT_MR_AM_VMEM_MB);    long sysCPUSizeForUberSlot =        conf.getInt(MRJobConfig.MR_AM_CPU_VCORES,            MRJobConfig.DEFAULT_MR_AM_CPU_VCORES);    boolean uberEnabled =        conf.getBoolean(MRJobConfig.JOB_UBERTASK_ENABLE, false);    boolean smallNumMapTasks = (numMapTasks <= sysMaxMaps);    boolean smallNumReduceTasks = (numReduceTasks <= sysMaxReduces);    boolean smallInput = (dataInputLength <= sysMaxBytes);    // ignoring overhead due to UberAM and statics as negligible here:    boolean smallMemory =        ( (Math.max(conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0),            conf.getLong(MRJobConfig.REDUCE_MEMORY_MB, 0))            <= sysMemSizeForUberSlot)            || (sysMemSizeForUberSlot == JobConf.DISABLED_MEMORY_LIMIT));    boolean smallCpu =        (            Math.max(                conf.getInt(                    MRJobConfig.MAP_CPU_VCORES,                     MRJobConfig.DEFAULT_MAP_CPU_VCORES),                 conf.getInt(                    MRJobConfig.REDUCE_CPU_VCORES,                     MRJobConfig.DEFAULT_REDUCE_CPU_VCORES))              <= sysCPUSizeForUberSlot        );    boolean notChainJob = !isChainJob(conf);    // User has overall veto power over uberization, or user can modify    // limits (overriding system settings and potentially shooting    // themselves in the head).  Note that ChainMapper/Reducer are    // fundamentally incompatible with MR-1220; they employ a blocking    // queue between the maps/reduces and thus require parallel execution,    // while "uber-AM" (MR AM + LocalContainerLauncher) loops over tasks    // and thus requires sequential execution.    isUber = uberEnabled && smallNumMapTasks && smallNumReduceTasks        && smallInput && smallMemory && smallCpu         && notChainJob;    if (isUber) {      LOG.info("Uberizing job " + jobId + ": " + numMapTasks + "m+"          + numReduceTasks + "r tasks (" + dataInputLength          + " input bytes) will run sequentially on single node.");      // make sure reduces are scheduled only after all map are completed      conf.setFloat(MRJobConfig.COMPLETED_MAPS_FOR_REDUCE_SLOWSTART,                        1.0f);      // uber-subtask attempts all get launched on same node; if one fails,      // probably should retry elsewhere, i.e., move entire uber-AM:  ergo,      // limit attempts to 1 (or at most 2?  probably not...)      conf.setInt(MRJobConfig.MAP_MAX_ATTEMPTS, 1);      conf.setInt(MRJobConfig.REDUCE_MAX_ATTEMPTS, 1);      // disable speculation      conf.setBoolean(MRJobConfig.MAP_SPECULATIVE, false);      conf.setBoolean(MRJobConfig.REDUCE_SPECULATIVE, false);    } else {      StringBuilder msg = new StringBuilder();      msg.append("Not uberizing ").append(jobId).append(" because:");      if (!uberEnabled)        msg.append(" not enabled;");      if (!smallNumMapTasks)        msg.append(" too many maps;");      if (!smallNumReduceTasks)        msg.append(" too many reduces;");      if (!smallInput)        msg.append(" too much input;");      if (!smallMemory)        msg.append(" too much RAM;");      if (!notChainJob)        msg.append(" chainjob;");      LOG.info(msg.toString());    }  }

mapreduce是否跑在local模式由7个变量决定:uberEnabled && smallNumMapTasks && smallNumReduceTasks&& smallInput && smallMemory && smallCpu && notChainJob,只有这7个变量同时满足才能跑在local模式。

首先要满足uberEnabled为true,看看uberEnable定义:uberEnabled = conf.getBoolean(MRJobConfig.JOB_UBERTASK_ENABLE, false),MRJobConfig.JOB_UBERTASK_ENABLE对应配置项为mapreduce.job.ubertask.enable(mapred-site.xml),说明mapreduce.job.ubertask.enable配置为true,查看mapred-site.xml配置文件,果然是true

mapreduce.job.ubertask.enable
true
Whether to enable the small-jobs "ubertask" optimization, which runs "sufficiently small" jobs sequentially within a single JVM. "Small" is defined by the following maxmaps, maxreduces, and maxbytes settings. Users may override this value.

问题找到了就好办,第一种办法直接把mapreduce.job.ubertask.enable改为false,都走集群模式;

第二种办法是mapreduce.job.ubertask.enable还是为true,继续分析reduce.xml文件找不到的原因,还是查看读取reduce.xml那块:org.apache.hadoop.hive.ql.exec.Utilities#getBaseWork

if (HiveConf.getBoolVar(conf, ConfVars.HIVE_RPC_QUERY_PLAN)) {          LOG.debug("Loading plan from string: "+path.toUri().getPath());          String planString = conf.get(path.toUri().getPath());          if (planString == null) {            LOG.info("Could not find plan string in conf");            return null;          }          byte[] planBytes = Base64.decodeBase64(planString);          in = new ByteArrayInputStream(planBytes);          in = new InflaterInputStream(in);        } else {          in = new FileInputStream(localPath.toUri().getPath());        }

可见hive执行计划文件由ConfVars.HIVE_RPC_QUERY_PLAN决定是通过rpc读集群上的文件还是读本地文件,默认为本地文件。 

转载于:https://www.cnblogs.com/yueweimian/p/5671497.html

你可能感兴趣的文章
Swift--错误集:couldn’t be opened because you don’t have permission to view it
查看>>
javascript引擎执行的过程的理解--语法分析和预编译阶段
查看>>
Java_基础—递归
查看>>
Apache+Tomcat构建Tomcat负载均衡集群
查看>>
学会去分享,也许会走得更远
查看>>
redis基本命令
查看>>
inherited 的研究。
查看>>
Hibernate由model类自动同步数据库表结构
查看>>
Android 架构 2.界面
查看>>
SPEL语言-Spring Expression Language
查看>>
设置nginx中文件上传的大小限制度
查看>>
C# 课堂总结3-语句
查看>>
精读《Scheduling in React》
查看>>
[Leetcode] 4Sum
查看>>
java.lang.StackOverflowError 解决方法
查看>>
hibernate事务控制
查看>>
[WCF编程]10.操作:请求/应答操作
查看>>
xfire实现webservice客户端之测试关注点
查看>>
在MySQL数据库中出现中文显示乱码
查看>>
对高内聚,低耦合的理解
查看>>