Andrew Goss andrewrgoss

#!/usr/bin/env python
__author__ = 'agoss'
import argparse
import json
import requests
from scraper_api import ScraperAPIClient
import sys
@andrewrgoss
andrewrgoss / dataproc-cluster-m.txt
Created July 18, 2018 19:18
GCP Leveraging Unstructured Data - Lab 3: Submit Dataproc jobs for unstructured data
Connected, host fingerprint: ssh-rsa 2048 F3:7F:24:6D:E9:7B:B1:16:6C:D8:49:A7:CF:C0:7A:23:25:EB:72:AF
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
google710312_student@dataproc-cluster-m:~$ cd
@andrewrgoss
andrewrgoss / TweetLength.jar
Created March 27, 2018 17:40
Twitter_streaming console output from TweetLength.scala
-------------------------------------------
Time: 1522165838000 ms
-------------------------------------------
(140,94)
(139,16)
(138,4)
(120,3)
(109,3)
(131,3)
(84,2)
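The pairs above look like (tweet length, number of tweets) sorted by count, most frequent first. That reading is an assumption, since the preview does not include TweetLength.scala itself. Under that assumption, the aggregation for a single batch can be sketched in plain Python (not Spark Streaming); `length_counts` and the sample data are hypothetical names made up for illustration.

```python
from collections import Counter


def length_counts(tweets):
    """Return (length, count) pairs for one batch, most frequent length first."""
    # Assumption: each output pair is (tweet length in characters, tweet count).
    counts = Counter(len(text) for text in tweets)
    # sorted() is stable, so lengths with equal counts keep first-seen order.
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample = ["a" * 140, "b" * 140, "c" * 139]  # made-up batch
    for pair in length_counts(sample):
        print(pair)
```

A streaming job would apply the same grouping to each batch or window of tweets rather than to a fixed list.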
@andrewrgoss
andrewrgoss / publish.go
Last active August 29, 2018 10:48
Getting started with NSQ for Go - creating a producer
package main

import (
	"log"

	"github.com/nsqio/go-nsq"
)

func main() {
	config := nsq.NewConfig()
@andrewrgoss
andrewrgoss / consume.go
Last active August 29, 2018 10:48
Getting started with NSQ for Go - creating a consumer
package main

import (
	"log"
	"sync"

	"github.com/nsqio/go-nsq"
)

func main() {
@andrewrgoss
andrewrgoss / MovieSimilarities1M.jar
Last active March 23, 2018 19:44
Console output from MovieSimilarities1M.scala - creating similar movie recommendations from one million ratings, run on AWS EMR cluster
mymachine:~ andgoss$ ssh -i ~/.credentials/ag-spark.pem [email protected]
Last login: Thu Mar 8 16:53:31 2018
       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|
https://aws.amazon.com/amazon-linux-ami/2017.09-release-notes/
3 package(s) needed for security, out of 5 available
Run "sudo yum update" to apply all updates.
@andrewrgoss
andrewrgoss / DataChatter.jar
Last active March 23, 2018 19:28
Twitter_streaming console output from DataChatter.scala
-------------------------------------------
Time: 1521748832000 ms
-------------------------------------------
(facebook,62)
(cambridge,22)
(zuckerberg,20)
(analytica,18)
(weather,17)
(mark,15)
(people,15)
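The output above reads as (word, count) pairs for one batch of tweets, lowercased and ranked by frequency. How DataChatter.scala tokenizes text or filters common words is not shown in the preview, so the following plain-Python sketch rests on assumptions: the regex tokenizer, the `STOP_WORDS` set, and the `top_words` helper are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the original job's filtering is not shown.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in"}


def top_words(texts, n=10):
    """Return the n most common lowercase words across texts as (word, count)."""
    words = (
        word
        for text in texts
        for word in re.findall(r"[a-z']+", text.lower())
    )
    counts = Counter(word for word in words if word not in STOP_WORDS)
    # most_common() lists equal counts in first-seen order.
    return counts.most_common(n)


if __name__ == "__main__":
    sample = ["Facebook and Cambridge Analytica", "facebook data"]  # made-up batch
    for pair in top_words(sample, n=3):
        print(pair)
```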
@andrewrgoss
andrewrgoss / PopularHashtags.jar
Last active June 21, 2018 03:07
Twitter_streaming console output from PopularHashtags.scala
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/03/21 11:05:05 INFO SparkContext: Running Spark version 2.2.0
18/03/21 11:05:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/03/21 11:05:06 WARN Utils: Your hostname, 'mymachine' resolves to a loopback address: 127.0.0.1; using 192.168.1.105 instead (on interface en0)
18/03/21 11:05:06 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
18/03/21 11:05:06 INFO SparkContext: Submitted application: PopularHashtags
18/03/21 11:05:06 INFO SecurityManager: Changing view acls to: andgoss
18/03/21 11:05:06 INFO SecurityManager: Changing modify acls to: andgoss
18/03/21 11:05:06 INFO SecurityManager: Changing view acls groups to:
18/03/21 11:05:06 INFO SecurityManager: Changing modify acls groups to: