Last active: February 19, 2020 13:48
# Revisions
## seanorama revised this gist · Feb 13, 2020 · 1 changed file with 3 additions and 4 deletions
@@ -4,11 +4,10 @@

This hacky method processes 1 file at a time:

1. **copy to a local disk**
2. compress
3. put back onto HDFS
4. delete original file from HDFS and compressed file from local disk.

BE CAREFUL: **Before executing, inspect the size of each file!**
- The risk is: a single large file could fill the local disk, or you could leave the server compressing a single large file for hours.

# How
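One way to do that inspection, sketched here under the assumption that the files live under the `/ranger/audit/yarn` path used later in this gist: list every candidate file recursively and sort on the size column so the largest (riskiest) files land at the bottom.

```
# List candidate .log files recursively; column 5 of -ls output is the size in bytes.
# Sorting numerically on that column puts the largest files last.
hdfs dfs -ls -R /ranger/audit/yarn | grep '\.log$' | sort -n -k5
```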
## seanorama revised this gist · Feb 13, 2020 · 1 changed file with 10 additions and 2 deletions
@@ -1,8 +1,16 @@

# Compress files which are already on HDFS

This hacky method processes 1 file at a time:

1. **copy to a local disk**
2. compress
3. put back onto HDFS
4. delete original file from HDFS

So **BE CAREFUL**:
- inspect the list of files to be compressed before executing!
- if not, you could fill the local disk, or be dealing with a compression that takes a very long time.

# How

1. (optional) SSH to a data node. Running from a data node will make it faster, but it isn't required.
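A minimal sketch of that list inspection, assuming the `${files}` variable built in step 4 of this gist: print the list and a count, and eyeball both before the loop runs.

```
# Show exactly what would be compressed, plus a count, before anything destructive runs.
echo "${files}"
echo "${files}" | wc -l
```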
## seanorama revised this gist · Jan 22, 2020 · 1 changed file with 1 addition and 1 deletion
@@ -8,7 +8,7 @@ This is a hacky approach involving copying each file locally, compressing, and then putting back onto HDFS.

2. (optional) Become HDFS and kinit. You can do this as any user that can access the files.

```
sudo -u hdfs -i
keytab=/etc/security/keytabs/hdfs.headless.keytab
kinit -kt ${keytab} $(klist -kt ${keytab} | awk '{print $NF}' | tail -1)
```
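If you'd rather not become the hdfs superuser, a minimal sketch of the same step as a regular user (the principal `alice@EXAMPLE.COM` is hypothetical; any principal that can read and write the files works):

```
# Authenticate as an ordinary user instead of the hdfs superuser.
kinit alice@EXAMPLE.COM
# Verify the ticket, then confirm the files are accessible.
klist
hdfs dfs -ls /ranger/audit/yarn
```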
## seanorama revised this gist · Jan 22, 2020 · 1 changed file with 3 additions and 2 deletions
@@ -4,11 +4,12 @@ This is a hacky approach involving copying each file locally, compressing, and then putting back onto HDFS.

# How:

1. (optional) SSH to a data node. Running from a data node will make it faster, but it isn't required.

2. (optional) Become HDFS and kinit. You can do this as any user that can access the files.

```
sudo -u hfds -i
keytab=/etc/security/keytabs/hdfs.headless.keytab
kinit -kt ${keytab} $(klist -kt ${keytab} | awk '{print $NF}' | tail -1)
```
## seanorama created this gist · Jan 22, 2020
@@ -0,0 +1,33 @@

# Compress files which are already on HDFS

This is a hacky approach involving copying each file locally, compressing, and then putting back onto HDFS. This obviously won't work if the file is larger than can fit in any local disk.

# How:

1. SSH to a data node.

2. Become HDFS and kinit.

```
sudo -u hfds -i
keytab=/etc/security/keytabs/hdfs.headless.keytab
kinit -kt ${keytab} $(klist -kt ${keytab} | awk '{print $NF}' | tail -1)
```

3. Change to a partition that is big enough to hold 1-2 of the uncompressed files.

4. Get list of files (this example is getting Ranger YARN audits):

```
files=$(hdfs dfs -find /ranger/audit/yarn | grep -Ev "($(date '+%Y%m%d')|$(date -d yesterday +'%Y%m%d'))" | grep .log$)
```

5. Compress and remove uncompressed:

```
for file in ${files}; do
  filename="$(basename ${file})"
  filedir="$(dirname ${file})"
  hdfs dfs -copyToLocal "${file}" \
    && gzip "${filename}" \
    && hdfs dfs -moveFromLocal "${filename}.gz" "${filedir}/" \
    && hdfs dfs -stat "${file}.gz" \
    && hdfs dfs -rm -skipTrash "${file}"
done
```
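A possible post-run sanity check, assuming `${file}` still names the last file processed: the `-stat` in the loop already confirms each `.gz` landed, and gzipped files remain readable straight off HDFS by piping through gunzip.

```
# Spot-check the result: confirm the compressed copy exists with a sane size...
hdfs dfs -ls -h "${file}.gz"
# ...and that it is still readable directly from HDFS.
hdfs dfs -cat "${file}.gz" | gunzip | head
```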