@xrazor1031
xrazor1031 / FMCG.md
Created June 9, 2020 15:11
FMCG shopping

  • lativ (诚衣)
  • Decathlon (迪卡侬)
  • Uniqlo (优衣库)

  • Timberland
xrazor1031 / spark_find_na.py
Created January 20, 2020 06:06
[spark find nan] #nan #spark
import pyspark.sql.functions as F
df.select([F.count(F.when(F.isnan(c), c)).alias(c) for c in df.columns]).show()
xrazor1031 / iplocation.md
Created January 4, 2020 07:01
[IP location tool] #ip #location
xrazor1031 / netty_confilct.md
Last active December 24, 2019 03:22
[Bug: Spark application dependency conflict] #spark #netty

environment

Spark 2.4.3, bloomd Java client

situation

No matter how high ulimit is set, an IOException ("too many open files") occurs intermittently during shuffle.

handle

Shade the jar: relocate (rename) the conflicting Netty classes when assembling the jar.

xrazor1031 / rad_hdfs_to_pandas_df.py
Created December 5, 2019 09:49
[read hdfs into pandas df] #pandas
import pandas as pd
from hdfs import Client

# WebHDFS endpoint of the NameNode
client = Client("http://127.0.0.1:50070")

def read_as_df(path):
    # Read a JSON Lines file from HDFS into a pandas DataFrame
    with client.read(path) as reader:
        _df = pd.read_json(reader.read(), lines=True)
    return _df
xrazor1031 / json2df.py
Created November 28, 2019 06:13
[read json to dataframe] #pandas
import pandas as pd
import glob
# path = "/*"
path = "/*.csv"
all_files = glob.glob(path)
li = []
for filename in all_files:
    df = pd.read_json(filename, lines=True)
xrazor1031 / dataframe2libsvm.scala
Created November 26, 2019 07:01
[dataframe to libsvm] #libsvm
// method 1
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.feature.LabeledPoint
// Needed for toDF outside spark-shell; assumes a SparkSession named `spark`
import spark.implicits._
val pos = LabeledPoint(1.0, Vectors.dense(1.0, 0.0, 3.0))
val neg = LabeledPoint(0.0, Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0)))
val df = Seq(neg, pos).toDF("label", "features")
// Write the DataFrame in LIBSVM format
df.write.format("libsvm").save("/data/foo")
// method 2
xrazor1031 / cookie_visit.py
Last active November 23, 2019 08:20
[Cookie Visit] #crawler #selenium #cookie
xrazor1031 / wilson_score.py
Created November 22, 2019 08:12
[wilson score]
# Rewritten code from /r2/r2/lib/db/_sorts.pyx
# Wilson score interval
from math import sqrt

def confidence(ups, downs):
    n = ups + downs
    if n == 0:
        return 0
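The preview above stops before the interval itself is computed. As a minimal sketch of the standard Wilson score lower bound (not necessarily the exact code in the gist or in `_sorts.pyx`), assuming a hypothetical function name `wilson_lower_bound` and an assumed default of z = 1.96 (roughly 95% confidence):

```python
from math import sqrt


def wilson_lower_bound(ups, downs, z=1.96):
    """Lower bound of the Wilson score interval for the upvote fraction.

    z is the normal quantile for the desired confidence level; 1.96 is an
    assumed default, not a value taken from the original gist.
    """
    n = ups + downs
    if n == 0:
        return 0.0  # no votes: nothing to estimate
    p = ups / n  # observed fraction of upvotes
    centre = p + z * z / (2 * n)
    margin = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)
```

With the same 100% upvote ratio, `wilson_lower_bound(100, 0)` scores higher than `wilson_lower_bound(1, 0)`, because more votes narrow the interval.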
xrazor1031 / auto_get_taobao_maobi.js
Last active November 5, 2019 02:49
[Collect cat coins (猫币)]
// auto.js
auto.waitFor();
var height = device.height;
var width = device.width;
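toast("\nDevice width: " + width + "\n" + "Device height: " + height + "\n" + "Phone model: " + device.model + "\nAndroid version: " + device.release);
setScreenMetrics(width, height);
lingqu();

function lingqu() {
    // "手机淘宝" (Mobile Taobao) is the app's display name on the device; keep it as-is
    app.launchApp("手机淘宝");
    toast("Opening Taobao");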