Song Wei bluepine

@bluepine
bluepine / dict_to_cols.py
Last active March 18, 2019 16:10
pyspark dataframe dictionary to new columns
from pyspark.sql import Row
from pyspark.sql import HiveContext
from pyspark.sql.functions import udf
from pyspark.context import SparkContext
sc = SparkContext("local", "dict to col")
hc = HiveContext(sc)
data = hc.createDataFrame([Row(user_id=1, app_usage={'snapchat': 2, 'facebook': 10, 'gmail': 1}, active_hours={4: 1, 6: 11, 22: 1}),

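Environment

Version: Ubuntu 14.04 LTS. Default language: English (United States).

Setup

Fonts with good Chinese support on Debian and Ubuntu include fonts-droid, ttf-wqy-zenhei, and ttf-wqy-microhei. Besides the WenQuanYi (文泉驿) family, other popular free Chinese fonts are the Kai (楷体) and Shanghai Song (上海宋) typefaces provided by Arphic (文鼎); their package names are fonts-arphic-ukai and fonts-arphic-uming, respectively.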

@bluepine
bluepine / processPdftext.js
Created November 20, 2018 22:12
process text produced by pdf2text
var fs = require('fs')
var nodehun = require('nodehun');
var pdf2Text = require('pdf2text')
var affbuf = fs.readFileSync('/data/en_US.aff');
var dictbuf = fs.readFileSync('/data/en_US.dic');
var dict = new nodehun(affbuf,dictbuf);
function log(m) {
console.log(m);
@bluepine
bluepine / plot.py
Created November 10, 2018 01:42
some data plotting functions
# pylint: disable=no-member
""" plot3d using existing visuals : LinePlotVisual """
import numpy as np
import sys
import seaborn as sns
from vispy import app, visuals, scene
import matplotlib.pyplot as plt
@bluepine
bluepine / CouchDB_Python.md
Created November 7, 2018 23:52 — forked from marians/CouchDB_Python.md
The missing Python couchdb tutorial

This is an unofficial manual for the couchdb Python module I wish I had had.

Installation

pip install couchdb

Importing the module

@bluepine
bluepine / emacs_python_ide.md
Created July 30, 2018 19:01 — forked from widdowquinn/emacs_python_ide.md
Turning Emacs into a Python IDE

Turning emacs into a Python IDE

## Create a new config/initialisation file

Create a user-level initialisation file init.el:

touch .emacs.d/init.el
@bluepine
bluepine / gist:3ca662668a4425e0e0481eac7341ed63
Last active July 19, 2017 17:15
How to find out which extra source directories to include after importing the Spark project into IntelliJ
According to https://spark.apache.org/developer-tools.html
"If so, open the “Project Settings” and select “Modules”. Based on your selected Maven profiles, you may need to add source folders to the following modules:"
However, the list of source directories you need to add manually can vary depending on the state of the source code. Here is my latest heuristic for catching all of them:
1. Follow https://spark.apache.org/docs/latest/building-spark.html to build Spark successfully from the command line. This ensures that the build script has done all the preparation Maven needs to compile the source tree (e.g. generating source code from Avro).
2. Run "find . -type d -name src_managed -exec find {} -type f \;" to list all generated source files. Be careful about which source directory you select when adding them to the build path: make sure the files' relative paths match their package names.
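Step 2 can also be sketched in Python. This is a rough illustration, not part of the original gist: the helper names generated_sources, source_root, and candidate_source_roots are hypothetical, and the sketch assumes each generated file declares its package on a single "package a.b.c" line (Scala files with chained package clauses would need extra handling). It lists the files under every src_managed directory and, for each file, works out which directory would have to be the source root for the relative path to match the package name.

```python
import re
from pathlib import Path

# Matches a single-line package declaration such as "package org.example;"
PACKAGE_RE = re.compile(r"^\s*package\s+([\w.]+)", re.MULTILINE)


def generated_sources(root):
    """List every file under any directory named src_managed below root."""
    files = []
    for managed in sorted(p for p in Path(root).rglob("src_managed") if p.is_dir()):
        files.extend(sorted(f for f in managed.rglob("*") if f.is_file()))
    return files


def source_root(path):
    """Return the directory that must be the source root for this file,
    or None if the file has no package line or its path does not match it."""
    text = Path(path).read_text(encoding="utf-8", errors="ignore")
    match = PACKAGE_RE.search(text)
    if match is None:
        return None
    package_parts = match.group(1).split(".")
    parent_parts = Path(path).parent.parts
    # The trailing directories must spell out the package name.
    if list(parent_parts[-len(package_parts):]) != package_parts:
        return None
    return Path(*parent_parts[:-len(package_parts)])


def candidate_source_roots(root):
    """Collect the distinct source roots implied by the generated files."""
    roots = {source_root(f) for f in generated_sources(root)}
    roots.discard(None)
    return sorted(roots)
```

Calling candidate_source_roots(".") from the top of a built source tree would return the directories to consider adding as source folders; files for which source_root returns None are the ones to inspect by hand.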
@bluepine
bluepine / pysyslog.py
Created July 12, 2017 02:20 — forked from marcelom/pysyslog.py
Tiny Python Syslog Server
#!/usr/bin/env python
## Tiny Syslog Server in Python.
##
## This is a tiny syslog server that is able to receive UDP based syslog
## entries on a specified port and save them to a file.
## That's it... it does nothing else...
## There are a few configuration parameters.
LOG_FILE = 'youlogfile.log'