sfsgagi / apache-logs-hive.sql
Created October 4, 2012 09:35 — forked from emk/apache-logs-hive.sql
Apache log analysis with Hadoop, Hive and HBase
-- This is a Hive program. Hive provides an SQL-like language (HiveQL)
-- whose queries compile into Hadoop MapReduce jobs. It's very popular
-- among analysts at Facebook, because it lets them query enormous
-- Hadoop data stores using a language much like SQL.
--
-- Our logs are stored on the Hadoop Distributed File System (HDFS), in
-- the directory /logs/randomhacks.net/access. They're ordinary Apache
-- access logs, gzip-compressed (*.gz).
--
-- We want to pretend that these gzipped log files are a database table,