Revisions
airawat revised this gist
Jul 3, 2013 . 3 changed files with 18 additions and 16 deletions.

@@ -18,7 +18,8 @@ WITH SERDEPROPERTIES (
stored as textfile;

b) Create partitions and load data:
Note: Replace '/user/airawat' with '/user/<your userID>'

hive> Alter table LogParserSample Add IF NOT EXISTS partition(year=2013, month=04)
location '/user/airawat/LogParserSampleHive/logs/airawat-syslog/2013/04/';

@@ -5,21 +5,6 @@ hive> set hive.cli.print.header=true;
hive> add jar hadoop-lib/hive-contrib-0.10.0-cdh4.2.0.jar; --I need this as my environment is not properly configured

hive> select Year,Month,Day,Event,Count(*) Occurrence from LogParserSample group by year,month,day,event order by event desc,year,month,day;

@@ -0,0 +1,16 @@
Query output
------------
year  month  day  event                occurrence
2013  05     7    udevd[361]:          1
2013  04     23   sudo:                1
2013  05     3    sudo:                1
2013  05     3    ntpd_initres[1705]:  144
2013  05     4    ntpd_initres[1705]:  261
2013  05     5    ntpd_initres[1705]:  264
2013  05     6    ntpd_initres[1705]:  123
2013  05     3    kernel:              5
2013  05     6    kernel:              1
2013  05     7    kernel:              52
2013  05     3    init:                5
2013  05     7    init:                18
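The grouping and ordering in the select statement above can be tried without a Hive cluster. What follows is only a minimal Python sketch of the same idea, not the gist's code: the `rows` sample and the `count_events` name are hypothetical, and the rows are assumed to be already parsed into (year, month, day, event) tuples, with day kept as a string as in the table definition.

```python
from collections import Counter

# Hypothetical, already-parsed rows: (year, month, day, event).
rows = [
    (2013, 5, "3", "kernel:"),
    (2013, 5, "3", "kernel:"),
    (2013, 5, "3", "init:"),
    (2013, 4, "23", "sudo:"),
    (2013, 5, "3", "sudo:"),
]


def count_events(rows):
    """Mimic GROUP BY year, month, day, event with COUNT(*)."""
    counts = Counter(rows)
    grouped = [key + (n,) for key, n in counts.items()]
    # ORDER BY event DESC, year, month, day:
    # sort by the ascending keys first, then stable-sort by event descending.
    grouped.sort(key=lambda r: (r[0], r[1], r[2]))
    grouped.sort(key=lambda r: r[3], reverse=True)
    return grouped


for row in count_events(rows):
    print(*row)
```

The two-pass sort relies on Python's sort being stable, which keeps the year, month, day order inside each event group.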
airawat revised this gist
Jul 3, 2013 . 1 changed file with 15 additions and 0 deletions.

@@ -7,6 +7,21 @@ hive> select Year,Month,Day,Event,Count(*) Occurrence from LogParserSample group
Output
-------
year  month  day  event                occurrence
2013  05     7    udevd[361]:          1
2013  04     23   sudo:                1
2013  05     3    sudo:                1
2013  05     3    ntpd_initres[1705]:  144
2013  05     4    ntpd_initres[1705]:  261
2013  05     5    ntpd_initres[1705]:  264
2013  05     6    ntpd_initres[1705]:  123
2013  05     3    kernel:              5
2013  05     6    kernel:              1
2013  05     7    kernel:              52
2013  05     3    init:                5
2013  05     7    init:                18
airawat revised this gist
Jul 3, 2013 . 1 changed file with 1 addition and 0 deletions.

@@ -18,6 +18,7 @@ WITH SERDEPROPERTIES (
stored as textfile;

b) Create partitions and load data:
[Replace '/user/airawat' with '/user/<your userID>']

hive> Alter table LogParserSample Add IF NOT EXISTS partition(year=2013, month=04)
location '/user/airawat/LogParserSampleHive/logs/airawat-syslog/2013/04/';
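Since the partition location has to be edited by hand for each user ID, it may help to generate the statement. This is a rough Python sketch under stated assumptions: `add_partition_statement` is a hypothetical helper, `jdoe` is a made-up user ID, and the path layout is copied from the statement above. It does no quoting or validation of its inputs.

```python
def add_partition_statement(user_id, year, month, table="LogParserSample"):
    """Build the ALTER TABLE statement for one year/month partition."""
    # Path layout copied from the gist; only the user ID, year and month vary.
    location = (
        f"/user/{user_id}/LogParserSampleHive/logs/airawat-syslog/"
        f"{year}/{month:02d}/"
    )
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year={year}, month={month:02d}) "
        f"LOCATION '{location}';"
    )


# 'jdoe' is a placeholder user ID for illustration only.
for month in (4, 5):
    print(add_partition_statement("jdoe", 2013, month))
```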
airawat revised this gist
Jul 3, 2013 . 5 changed files with 65 additions and 5 deletions.

@@ -1,6 +1,16 @@
Data download
-------------
https://groups.google.com/forum/?hl=en#!topic/hadooped/_tj8w_E-MGY

Directory structure
-------------------
LogParserSampleHive
  logs
    airawat-syslog
      2013
        04
          messages
      2013
        05
          messages

@@ -1,2 +1,15 @@
Data load commands
------------------
a) Load the data
$ hadoop fs -mkdir LogParserSampleHive
$ hadoop fs -mkdir LogParserSampleHive/logs
$ hadoop fs -put LogParserSampleHive/logs/* LogParserSampleHive/logs/

$ hadoop fs -ls -R LogParserSampleHive/ | awk '{print $8}'
LogParserSampleHive/logs
LogParserSampleHive/logs/airawat-syslog
LogParserSampleHive/logs/airawat-syslog/2013
LogParserSampleHive/logs/airawat-syslog/2013/04
LogParserSampleHive/logs/airawat-syslog/2013/04/messages
LogParserSampleHive/logs/airawat-syslog/2013/05
LogParserSampleHive/logs/airawat-syslog/2013/05/messages
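The year and month directories in the listing above are what the table's partitions point at. As an illustration only, here is a small Python sketch that reads the partition values back out of such paths; `partition_from_path` is a hypothetical name, and the sketch assumes every file sits at `.../<year>/<month>/<file>`.

```python
from pathlib import PurePosixPath


def partition_from_path(path):
    """Return (year, month) taken from a path ending in <year>/<month>/<file>."""
    parts = PurePosixPath(path).parts
    # The last three components are <year>, <month>, and the file name.
    year, month = parts[-3], parts[-2]
    return int(year), int(month)


# File paths taken from the listing in the gist.
listing = [
    "LogParserSampleHive/logs/airawat-syslog/2013/04/messages",
    "LogParserSampleHive/logs/airawat-syslog/2013/05/messages",
]

partitions = sorted({partition_from_path(p) for p in listing})
print(partitions)
```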
@@ -1,2 +1,26 @@
Hive commands
--------------
a) Create external table:

hive> CREATE EXTERNAL TABLE LogParserSample(
month_name STRING,
day STRING,
time STRING,
host STRING,
event STRING,
log STRING)
PARTITIONED BY(year int, month int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "(\\w+)\\s+(\\d+)\\s+(\\d+:\\d+:\\d+)\\s+(\\w+\\W*\\w*)\\s+(.*?\\:)\\s+(.*$)"
)
stored as textfile;

b) Create partitions and load data:

hive> Alter table LogParserSample Add IF NOT EXISTS partition(year=2013, month=04)
location '/user/airawat/LogParserSampleHive/logs/airawat-syslog/2013/04/';

hive> Alter table LogParserSample Add IF NOT EXISTS partition(year=2013, month=05)
location '/user/airawat/LogParserSampleHive/logs/airawat-syslog/2013/05/';

@@ -0,0 +1,15 @@
Hive query
-----------
hive> set hive.cli.print.header=true;

hive> add jar hadoop-lib/hive-contrib-0.10.0-cdh4.2.0.jar; --I need this as my environment is not properly configured

hive> select Year,Month,Day,Event,Count(*) Occurrence from LogParserSample group by year,month,day,event order by event desc,year,month,day;

Output
-------

@@ -1,2 +0,0 @@
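To see how the input.regex above carves a syslog line into the six table columns, the pattern can be tried outside Hive. This is a minimal Python sketch rather than anything from the gist: the doubled backslashes of the Hive string literal are reduced to single ones, `parse_line` and `COLUMNS` are hypothetical names, and returning None for a non-matching line is a choice made here.

```python
import re

# The gist's input.regex with the Hive string escaping removed (\\w becomes \w).
LOG_PATTERN = re.compile(
    r"(\w+)\s+(\d+)\s+(\d+:\d+:\d+)\s+(\w+\W*\w*)\s+(.*?\:)\s+(.*$)"
)

# Column names in the order declared in the CREATE EXTERNAL TABLE statement.
COLUMNS = ("month_name", "day", "time", "host", "event", "log")


def parse_line(line):
    """Map one syslog line onto the table columns, or None if it does not match."""
    m = LOG_PATTERN.match(line)
    return dict(zip(COLUMNS, m.groups())) if m else None


# Sample line taken from the gist's sample data.
sample = (
    "May 3 11:53:59 cdh-dn03 ntpd_initres[1705]: "
    "host name not found: 0.rhel.pool.ntp.org"
)
print(parse_line(sample))
```

The lazy `(.*?\:)` group stops at the first colon that is followed by whitespace, which is why the event column keeps its trailing colon and the later colon stays in the log column.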
airawat revised this gist
Jul 3, 2013 . 6 changed files with 39 additions and 10 deletions.

@@ -1,13 +1,12 @@
This gist includes HiveQL scripts to create an external partitioned table for
syslog-generated log files using the regex SerDe.
Use case: count the number of occurrences of processes that got logged, by year, month, day and process.

Includes:
---------
Sample data and structure: 01-SampleDataAndStructure
Data download: 02-DataDownload
Data load commands: 03-DataLoadCommands
Hive commands: 04-HiveCommands
Sample output: 05-SampleOutput

@@ -0,0 +1,18 @@
Sample data
------------
May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal
May 3 11:53:31 cdh-dn03 kernel: registered taskstats version 1
May 3 11:53:31 cdh-dn03 kernel: sr0: scsi3-mmc drive: 32x/32x xa/form2 tray
May 3 11:53:31 cdh-dn03 kernel: piix4_smbus 0000:00:07.0: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
May 3 11:53:31 cdh-dn03 kernel: nf_conntrack version 0.5.0 (7972 buckets, 31888 max)
May 3 11:53:57 cdh-dn03 kernel: hrtimer: interrupt took 11250457 ns
May 3 11:53:59 cdh-dn03 ntpd_initres[1705]: host name not found: 0.rhel.pool.ntp.org

Structure
----------
Month = May
Day = 3
Time = 11:52:54
Node = cdh-dn03
Process = init:
Log msg = tty (/dev/tty6) main process (1208) killed by TERM signal

@@ -0,0 +1,6 @@
Data download
-------------

Directory structure
-------------------

@@ -0,0 +1,2 @@
Data load commands
------------------

@@ -0,0 +1,2 @@
Hive commands
--------------

@@ -0,0 +1,2 @@
Hive query and output
---------------------
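The Structure listing in this revision's sample-data file can be reproduced with a plain whitespace split, which is a quick way to eyeball the fields before involving any regex. A minimal Python sketch, assuming the process token never contains whitespace; `split_syslog_line` and `FIELDS` are hypothetical names, and this is not the parsing the gist's table uses.

```python
# Field names as given in the Structure listing of the gist.
FIELDS = ("Month", "Day", "Time", "Node", "Process", "Log msg")


def split_syslog_line(line):
    """Split a syslog line into the six fields named in the Structure listing."""
    # Split on whitespace at most five times so the log message stays whole.
    return dict(zip(FIELDS, line.split(None, 5)))


# First sample line from the gist.
sample = (
    "May 3 11:52:54 cdh-dn03 init: "
    "tty (/dev/tty6) main process (1208) killed by TERM signal"
)
for name, value in split_syslog_line(sample).items():
    print(f"{name} = {value}")
```

A line with fewer than six tokens yields a shorter dict, because `zip` stops at the shorter input.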
airawat created this gist
Jul 3, 2013 .

@@ -0,0 +1,13 @@
This gist includes a mapper, reducer and driver in Java that can parse log files using regex.
The code for the combiner is the same as the reducer.
Use case: count the number of occurrences of processes that got logged, inception to date.

Includes:
---------
Mapper: 01-LogEventCountMapper.java
Reducer: 02-LogEventCountReducer.java
Driver: 03-LogEventCount.java
Sample data and scripts for download: 04-ScriptAndDataDownload
Sample data and structure: 05-SampleDataAndStructure
Commands: 06-Commands
Sample output: 07-Output