Is it possible to run hadoop fs -getmerge in S3?
I have an Elastic MapReduce job which writes some files to S3, and I want to concatenate all of them into a single text file. Currently I'm manually copying the folder with all the files into our HDFS (hadoop fs -copyFromLocal), then running hadoop fs -getmerge and hadoop fs -copyToLocal to obtain the file. Is there any way to use hadoop fs directly on S3?
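For reference, the workaround described above amounts to something like the following (all paths and directory names here are placeholders, not taken from the actual job):

```shell
# Copy the downloaded S3 output into HDFS
hadoop fs -copyFromLocal /mnt/emr-output /user/hadoop/emr-output

# Merge all the part files in that HDFS directory into one local text file
# (getmerge already writes its result to the local filesystem)
hadoop fs -getmerge /user/hadoop/emr-output /home/hadoop/merged.txt
```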
Actually, this response about getmerge is incorrect. getmerge expects a local destination and will not work with S3. If you try, it throws an IOException and responds with -getmerge: Wrong FS:.

Usage:
hadoop fs [generic options] -getmerge [-nl] <src> <localdst>
An easy way (if you are generating a small file that fits on the master machine) is to do the following:
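One way to sketch those steps (the directory and file names below are placeholders): merge the parts onto the master node's local disk, then push the result back to S3.

```shell
# 1. Merge the HDFS part files into a single file on the master node's local disk
hadoop fs -getmerge hdfs:///my-output-dir/ /home/hadoop/merged.txt

# 2. Move the merged file to S3, deleting the local copy
hadoop fs -moveFromLocal /home/hadoop/merged.txt s3://my-bucket/results/merged.txt
```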
I haven't tried the getmerge command myself, but hadoop fs commands on EMR cluster nodes support S3 paths just like HDFS paths. For example, you can SSH into the master node of your cluster and run:
hadoop fs -ls s3://<my_bucket>/<my_dir>/
The above command will list out all the S3 objects under the specified directory path.
I would expect hadoop fs -getmerge to work the same way, so just use full S3 paths (starting with s3://) instead of HDFS paths.
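If that holds, the merge from the question could be attempted directly against S3 (bucket and paths below are placeholders; note that the destination of getmerge still has to be a local path, as the other answer points out):

```shell
# Merge all objects under an S3 prefix into one file on the local disk
hadoop fs -getmerge s3://<my_bucket>/<my_dir>/ /home/hadoop/merged.txt
```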