@yaravind
Forked from airawat/00-LogParser-Hive-Regex
Created May 18, 2018 02:41
Revisions

  1. @airawat airawat revised this gist Jul 3, 2013. 3 changed files with 18 additions and 16 deletions.
    3 changes: 2 additions & 1 deletion 04-HiveCommands
    Original file line number Diff line number Diff line change
    @@ -18,7 +18,8 @@ WITH SERDEPROPERTIES (
    stored as textfile;

    b) Create partitions and load data:
    [Replace '/user/airawat' with '/user/<your userID>']

    Note: Replace '/user/airawat' with '/user/<your userID>'

    hive> Alter table LogParserSample Add IF NOT EXISTS partition(year=2013, month=04)
    location '/user/airawat/LogParserSampleHive/logs/airawat-syslog/2013/04/';
    15 changes: 0 additions & 15 deletions 05-QueryAndOutput
    @@ -5,21 +5,6 @@ hive> set hive.cli.print.header=true;
    hive> add jar hadoop-lib/hive-contrib-0.10.0-cdh4.2.0.jar; --I need this as my environment is not properly configured
    hive> select Year,Month,Day,Event,Count(*) Occurrence from LogParserSample group by year,month,day,event order by event desc,year,month,day;

    Output
    -------
    year  month  day  event                occurrence
    2013  05     7    udevd[361]:          1
    2013  04     23   sudo:                1
    2013  05     3    sudo:                1
    2013  05     3    ntpd_initres[1705]:  144
    2013  05     4    ntpd_initres[1705]:  261
    2013  05     5    ntpd_initres[1705]:  264
    2013  05     6    ntpd_initres[1705]:  123
    2013  05     3    kernel:              5
    2013  05     6    kernel:              1
    2013  05     7    kernel:              52
    2013  05     3    init:                5
    2013  05     7    init:                18



    16 changes: 16 additions & 0 deletions 06-HiveQueryOutput
    @@ -0,0 +1,16 @@
    Query output
    ------------

    year  month  day  event                occurrence
    2013  05     7    udevd[361]:          1
    2013  04     23   sudo:                1
    2013  05     3    sudo:                1
    2013  05     3    ntpd_initres[1705]:  144
    2013  05     4    ntpd_initres[1705]:  261
    2013  05     5    ntpd_initres[1705]:  264
    2013  05     6    ntpd_initres[1705]:  123
    2013  05     3    kernel:              5
    2013  05     6    kernel:              1
    2013  05     7    kernel:              52
    2013  05     3    init:                5
    2013  05     7    init:                18
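
The query groups rows by year, month, day and event, counts each group, and orders the result by event descending and then by date. A minimal Python sketch of that logic follows. The input rows are made up for illustration and are not the gist's data, `count_events` is a hypothetical helper name, and all values are compared as strings here, which is an assumption.

```python
from collections import Counter

def count_events(rows):
    """Count (year, month, day, event) tuples and order them like the Hive query."""
    counts = Counter(rows)
    result = [(*key, n) for key, n in counts.items()]
    # ORDER BY event DESC, year, month, day:
    # sort by the ascending keys first, then stably by event in descending order.
    result.sort(key=lambda r: (r[0], r[1], r[2]))
    result.sort(key=lambda r: r[3], reverse=True)
    return result

# Illustrative rows only (not taken from the real log files).
rows = [
    ("2013", "05", "3", "kernel:"),
    ("2013", "05", "3", "kernel:"),
    ("2013", "05", "3", "init:"),
    ("2013", "04", "23", "sudo:"),
    ("2013", "05", "3", "sudo:"),
]

for row in count_events(rows):
    print(*row)
```

Because Python's sort is stable, the second sort keeps the date order inside each event, which mirrors how the two `sudo:` rows appear in the output above.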
  2. @airawat airawat revised this gist Jul 3, 2013. 1 changed file with 15 additions and 0 deletions.
    15 changes: 15 additions & 0 deletions 05-QueryAndOutput
    @@ -7,6 +7,21 @@ hive> select Year,Month,Day,Event,Count(*) Occurrence from LogParserSample group

    Output
    -------
    year  month  day  event                occurrence
    2013  05     7    udevd[361]:          1
    2013  04     23   sudo:                1
    2013  05     3    sudo:                1
    2013  05     3    ntpd_initres[1705]:  144
    2013  05     4    ntpd_initres[1705]:  261
    2013  05     5    ntpd_initres[1705]:  264
    2013  05     6    ntpd_initres[1705]:  123
    2013  05     3    kernel:              5
    2013  05     6    kernel:              1
    2013  05     7    kernel:              52
    2013  05     3    init:                5
    2013  05     7    init:                18





  3. @airawat airawat revised this gist Jul 3, 2013. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions 04-HiveCommands
    @@ -18,6 +18,7 @@ WITH SERDEPROPERTIES (
    stored as textfile;

    b) Create partitions and load data:
    [Replace '/user/airawat' with '/user/<your userID>']

    hive> Alter table LogParserSample Add IF NOT EXISTS partition(year=2013, month=04)
    location '/user/airawat/LogParserSampleHive/logs/airawat-syslog/2013/04/';
  4. @airawat airawat revised this gist Jul 3, 2013. 5 changed files with 65 additions and 5 deletions.
    12 changes: 11 additions & 1 deletion 02-DataDownload
    @@ -1,6 +1,16 @@
    Data download
    -------------
    https://groups.google.com/forum/?hl=en#!topic/hadooped/_tj8w_E-MGY


    Directory structure
    -------------------
    LogParserSampleHive
        logs
            airawat-syslog
                2013
                    04
                        messages
                2013
                    05
                        messages
    15 changes: 14 additions & 1 deletion 03-DataLoadCommands
    @@ -1,2 +1,15 @@
    Data load commands
    ------------------
    a) Load the data
    $ hadoop fs -mkdir LogParserSampleHive
    $ hadoop fs -mkdir LogParserSampleHive/logs
    $ hadoop fs -put LogParserSampleHive/logs/* LogParserSampleHive/logs/
    $ hadoop fs -ls -R LogParserSampleHive/ | awk '{print $8}'

    LogParserSampleHive/logs
    LogParserSampleHive/logs/airawat-syslog
    LogParserSampleHive/logs/airawat-syslog/2013
    LogParserSampleHive/logs/airawat-syslog/2013/04
    LogParserSampleHive/logs/airawat-syslog/2013/04/messages
    LogParserSampleHive/logs/airawat-syslog/2013/05
    LogParserSampleHive/logs/airawat-syslog/2013/05/messages
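
The listing above implies one Hive partition per `<year>/<month>` directory that holds a `messages` file. A small Python sketch of deriving those partition keys from the listed paths follows. `partitions_from_paths` is a hypothetical helper, and the `.../<year>/<month>/messages` layout is assumed from the listing.

```python
from pathlib import PurePosixPath

def partitions_from_paths(paths, leaf_name="messages"):
    """Return sorted, de-duplicated (year, month) tuples for paths ending in leaf_name."""
    found = set()
    for raw in paths:
        path = PurePosixPath(raw)
        if path.name != leaf_name:
            continue  # skip directories; only data files mark a partition
        # Assumed layout: .../<year>/<month>/messages
        month = path.parent.name
        year = path.parent.parent.name
        if year.isdigit() and month.isdigit():
            found.add((year, month))
    return sorted(found)

listing = [
    "LogParserSampleHive/logs",
    "LogParserSampleHive/logs/airawat-syslog",
    "LogParserSampleHive/logs/airawat-syslog/2013",
    "LogParserSampleHive/logs/airawat-syslog/2013/04",
    "LogParserSampleHive/logs/airawat-syslog/2013/04/messages",
    "LogParserSampleHive/logs/airawat-syslog/2013/05",
    "LogParserSampleHive/logs/airawat-syslog/2013/05/messages",
]

print(partitions_from_paths(listing))
```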
    26 changes: 25 additions & 1 deletion 04-HiveCommands
    @@ -1,2 +1,26 @@
    Hive commands
    --------------

    a) Create external table:

    hive> CREATE EXTERNAL TABLE LogParserSample(
    month_name STRING,
    day STRING,
    time STRING,
    host STRING,
    event STRING,
    log STRING)
    PARTITIONED BY(year int, month int)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
    "input.regex" = "(\\w+)\\s+(\\d+)\\s+(\\d+:\\d+:\\d+)\\s+(\\w+\\W*\\w*)\\s+(.*?\\:)\\s+(.*$)"
    )
    stored as textfile;
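
The backslashes in "input.regex" are doubled because the pattern sits inside a HiveQL string literal, so the regex engine sees single backslashes. As far as I know, the contrib RegexSerDe returns NULL for every column of a row that does not match the whole pattern, so it can help to check sample lines before querying. The sketch below does that in Python. Python's `re` stands in for Java's regex engine here, which is an assumption; `split_matches` and the "-- MARK --" line are hypothetical.

```python
import re

# The "input.regex" value exactly as written in the DDL (backslashes doubled).
HIVE_LITERAL = r"(\\w+)\\s+(\\d+)\\s+(\\d+:\\d+:\\d+)\\s+(\\w+\\W*\\w*)\\s+(.*?\\:)\\s+(.*$)"

# Collapse each doubled backslash to a single one to get the effective pattern.
PATTERN = re.compile(HIVE_LITERAL.replace("\\\\", "\\"))

def split_matches(lines):
    """Partition lines into (matched, unmatched) using a full-line match."""
    matched, unmatched = [], []
    for line in lines:
        if PATTERN.fullmatch(line):
            matched.append(line)
        else:
            unmatched.append(line)
    return matched, unmatched

lines = [
    "May 3 11:53:31 cdh-dn03 kernel: registered taskstats version 1",
    "May 3 11:53:57 cdh-dn03 kernel: hrtimer: interrupt took 11250457 ns",
    "-- MARK --",  # hypothetical line with no timestamp or process field
]
matched, unmatched = split_matches(lines)
print(len(matched), "matched;", len(unmatched), "would load as NULL columns")
```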

    b) Create partitions and load data:

    hive> Alter table LogParserSample Add IF NOT EXISTS partition(year=2013, month=04)
    location '/user/airawat/LogParserSampleHive/logs/airawat-syslog/2013/04/';

    hive> Alter table LogParserSample Add IF NOT EXISTS partition(year=2013, month=05)
    location '/user/airawat/LogParserSampleHive/logs/airawat-syslog/2013/05/';
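
The two ALTER TABLE statements differ only in the month value and the matching directory. If there are many months, the statements could be generated instead of typed. A minimal Python sketch follows; `add_partition_statements` is a hypothetical helper, and the base location is passed in so '/user/airawat' can be swapped for your own user directory.

```python
def add_partition_statements(table, base_location, partitions):
    """Build one ALTER TABLE ... ADD PARTITION statement per (year, month) pair."""
    statements = []
    for year, month in partitions:
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION(year={year}, month={month})\n"
            f"LOCATION '{base_location}/{year}/{month}/';"
        )
    return statements

statements = add_partition_statements(
    "LogParserSample",
    "/user/airawat/LogParserSampleHive/logs/airawat-syslog",
    [("2013", "04"), ("2013", "05")],
)
print("\n\n".join(statements))
```

The year and month are kept as strings so that a directory name such as `04` is reproduced exactly in the location path.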
    15 changes: 15 additions & 0 deletions 05-QueryAndOutput
    @@ -0,0 +1,15 @@
    Hive query
    -----------

    hive> set hive.cli.print.header=true;
    hive> add jar hadoop-lib/hive-contrib-0.10.0-cdh4.2.0.jar; --I need this as my environment is not properly configured
    hive> select Year,Month,Day,Event,Count(*) Occurrence from LogParserSample group by year,month,day,event order by event desc,year,month,day;

    Output
    -------






    2 changes: 0 additions & 2 deletions 05-SampleOutput
    @@ -1,2 +0,0 @@
    Hive query and output
    ---------------------
  5. @airawat airawat revised this gist Jul 3, 2013. 6 changed files with 39 additions and 10 deletions.
    19 changes: 9 additions & 10 deletions 00-LogParser-Hive-Regex
    @@ -1,13 +1,12 @@
    This gist includes a mapper, reducer, and driver in Java that can parse log files using
    regex. The code for the combiner is the same as the reducer.
    Use case: Count the number of occurrences of processes that got logged, inception to date.
    This gist includes HiveQL scripts to create an external partitioned table for
    syslog-generated log files using the regex SerDe.
    Use case: Count the number of occurrences of processes that got logged, by year, month,
    day, and process.

    Includes:
    ---------
    Mapper: 01-LogEventCountMapper.java
    Reducer: 02-LogEventCountReducer.java
    Driver: 03-LogEventCount.java
    Sample data and scripts for download: 04-ScriptAndDataDownload
    Sample data and structure: 05-SampleDataAndStructure
    Commands: 06-Commands
    Sample output: 07-Output
    Sample data and structure: 01-SampleDataAndStructure
    Data download: 02-DataDownload
    Data load commands: 03-DataLoadCommands
    Hive commands: 04-HiveCommands
    Sample output: 05-SampleOutput
    18 changes: 18 additions & 0 deletions 01-SampleDataAndStructure
    @@ -0,0 +1,18 @@
    Sample data
    ------------
    May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal
    May 3 11:53:31 cdh-dn03 kernel: registered taskstats version 1
    May 3 11:53:31 cdh-dn03 kernel: sr0: scsi3-mmc drive: 32x/32x xa/form2 tray
    May 3 11:53:31 cdh-dn03 kernel: piix4_smbus 0000:00:07.0: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
    May 3 11:53:31 cdh-dn03 kernel: nf_conntrack version 0.5.0 (7972 buckets, 31888 max)
    May 3 11:53:57 cdh-dn03 kernel: hrtimer: interrupt took 11250457 ns
    May 3 11:53:59 cdh-dn03 ntpd_initres[1705]: host name not found: 0.rhel.pool.ntp.org

    Structure
    ----------
    Month = May
    Day = 3
    Time = 11:52:54
    Node = cdh-dn03
    Process = init:
    Log msg = tty (/dev/tty6) main process (1208) killed by TERM signal
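
The structure above can be reproduced by applying the table's regex to one sample line. The sketch below uses the unescaped form of the "input.regex" value from 04-HiveCommands. Python's `re` module stands in for Java's regex engine, which is an assumption; `parse_syslog_line` is a hypothetical helper, and the dict keys follow the Structure list rather than the Hive column names.

```python
import re

# Unescaped form of the "input.regex" value from the Hive table definition.
SYSLOG_PATTERN = re.compile(
    r"(\w+)\s+(\d+)\s+(\d+:\d+:\d+)\s+(\w+\W*\w*)\s+(.*?\:)\s+(.*$)"
)

# One name per capture group, in order.
FIELDS = ("month", "day", "time", "node", "process", "log_msg")

def parse_syslog_line(line):
    """Return a dict of fields, or None when the whole line does not match."""
    match = SYSLOG_PATTERN.fullmatch(line)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))

sample = "May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal"
record = parse_syslog_line(sample)
for name in FIELDS:
    print(f"{name} = {record[name]}")
```

The lazy `(.*?\:)` group stops at the first colon that is followed by whitespace, which is why a message containing further colons still lands in the last field.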
    6 changes: 6 additions & 0 deletions 02-DataDownload
    @@ -0,0 +1,6 @@
    Data download
    -------------


    Directory structure
    -------------------
    2 changes: 2 additions & 0 deletions 03-DataLoadCommands
    @@ -0,0 +1,2 @@
    Data load commands
    ------------------
    2 changes: 2 additions & 0 deletions 04-HiveCommands
    @@ -0,0 +1,2 @@
    Hive commands
    --------------
    2 changes: 2 additions & 0 deletions 05-SampleOutput
    @@ -0,0 +1,2 @@
    Hive query and output
    ---------------------
  6. @airawat airawat created this gist Jul 3, 2013.
    13 changes: 13 additions & 0 deletions 00-LogParser-Hive-Regex
    @@ -0,0 +1,13 @@
    This gist includes a mapper, reducer, and driver in Java that can parse log files using
    regex. The code for the combiner is the same as the reducer.
    Use case: Count the number of occurrences of processes that got logged, inception to date.

    Includes:
    ---------
    Mapper: 01-LogEventCountMapper.java
    Reducer: 02-LogEventCountReducer.java
    Driver: 03-LogEventCount.java
    Sample data and scripts for download: 04-ScriptAndDataDownload
    Sample data and structure: 05-SampleDataAndStructure
    Commands: 06-Commands
    Sample output: 07-Output