Changes between Version 4 and Version 5 of inputouput

Timestamp:: 11/23/2012 12:05:02 PM
Author:: liaojiaohe
Comment:: --

inputouput

-                      v4
+                      v5
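 A file can be split by line, for example so that each line of the file corresponds to one map task. The resulting key is the position of each line (its offset, of type LongWritable) and the value is the content of the line, of type Text.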
 CompositeInputFormat, used to join multiple data sources.[[BR]]
+CompositeInputFormat, used to join multiple data sources (see the Hadoop join example).[[BR]]
 ZipFileInputFormat[[BR]]
 …
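The key/value shape of the line-based split described above (offset as key, line content as value) can be illustrated without Hadoop. The following is only a sketch in plain Java: the class `LineOffsets` and its `Record` type are hypothetical names, a `long` stands in for LongWritable and a `String` for Text, and it assumes UTF-8 input with `\n` line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: mimics the (offset, line) records
// that a line-oriented input format hands to a mapper.
final class LineOffsets {

    /** One record: byte offset of the line start (the key) and the line text (the value). */
    static final class Record {
        final long offset;
        final String line;

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    /** Splits content on '\n' and pairs every line with its starting byte offset. */
    static List<Record> records(String content) {
        List<Record> out = new ArrayList<>();
        long offset = 0;
        int start = 0;
        while (start < content.length()) {
            int newline = content.indexOf('\n', start);
            int end = (newline < 0) ? content.length() : newline;
            String line = content.substring(start, end);
            out.add(new Record(offset, line));
            // Advance by the encoded length of the line plus the newline byte, if present.
            offset += line.getBytes(StandardCharsets.UTF_8).length + (newline < 0 ? 0 : 1);
            start = (newline < 0) ? content.length() : newline + 1;
        }
        return out;
    }
}
```

For the input `"ab\ncde\nf"` this yields three records whose keys are the byte offsets 0, 3 and 7.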
+'''Using a relational database:'''
+'''Using a relational database:'''[[BR]]
 DBInputFormat[[BR]]
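 MySQL is well supported. In version 1.0.3, splitting has a problem when used with Oracle, specifically:[[BR]]
 In the OracleDBRecordReader class, line 84, the check       if (split.getLength() > 0 && split.getStart() > 0){ is wrong: the start value of the first split is 0, so the check on the start value needs to be removed.[[BR]]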
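The effect of that check on the first split can be shown with two plain predicates. This is only a sketch: `SplitCheck` and its method names are hypothetical, and what the guarded branch does inside Hadoop is not reproduced here. It only shows that a split starting at 0 never passes the original condition.

```java
// Hypothetical illustration of the condition quoted above; not Hadoop code.
final class SplitCheck {

    /** The condition as quoted from line 84: needs both a positive length and a positive start. */
    static boolean originalCheck(long start, long length) {
        return length > 0 && start > 0;
    }

    /** The condition with the start test removed, as the note suggests. */
    static boolean relaxedCheck(long start, long length) {
        return length > 0;
    }
}
```

With `start = 0` and `length = 100`, `originalCheck` returns false while `relaxedCheck` returns true, so only the relaxed form treats the first split like every other split.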
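+To use DBInputFormat, pass it one SQL query that reads the data and one that returns the total row count; the count is used to split the input across map tasks.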
+{{{
+                DBInputFormat.setInput(job, DataRecord.class, sql, countSql);
+}}}
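How a total row count can be turned into per-map row ranges is sketched below in plain Java. This is an assumption about the general approach, not a copy of the Hadoop implementation: `CountSplits` and `Range` are hypothetical names, the count is divided evenly, and the last range absorbs the remainder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: divides a row count into ranges,
// one range per map task.
final class CountSplits {

    /** Half-open row range [start, end) assigned to one map task. */
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long length() {
            return end - start;
        }
    }

    /** Divides count rows into the given number of chunks; the last chunk takes the remainder. */
    static List<Range> split(long count, int chunks) {
        if (chunks <= 0) {
            throw new IllegalArgumentException("chunks must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        long chunkSize = count / chunks;
        for (int i = 0; i < chunks; i++) {
            long start = i * chunkSize;
            long end = (i + 1 == chunks) ? count : start + chunkSize;
            ranges.add(new Range(start, end));
        }
        return ranges;
    }
}
```

For 10 rows and 3 map tasks this gives the ranges [0, 3), [3, 6) and [6, 10). The first range always starts at 0, which is the case the Oracle note above is about.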
 The database driver also has to be added: put the driver in the hadoop/lib directory on the JobTracker machine; no restart is needed,[[BR]]
 …
+HBase as input
+HBase as input[[BR]]
+Using HBase as input is fairly simple: Hadoop sets the number of map tasks according to the number of regions in the table.
+{{{
+        Scan scan = new Scan();
+        scan.setCaching(500);        // rows fetched per scanner round trip
+        scan.setCacheBlocks(false);  // keep a full-table scan from filling the block cache
+        TableMapReduceUtil.initTableMapperJob(
+                T_APP_DEVICE, // input table
+                scan, // Scan instance to control CF and attribute selection
+                MyMapper.class, // mapper class
+                null, // mapper output key
+                null, // mapper output value
+                job);
+}}}