Fareed Khan FareedKhan-dev

| # | Input | Preferred | A (short) | B (short) |
|---|-------|-----------|-----------|-----------|
| 0 | Move project between orgs? | | Move project steps | Move project steps |
| 1 | Why set env vars? | A | Env vars purpose | Env vars needed |
| 2 | Trace Llama V2? | B | Use LangChain LLMs | Yes, with LangSmith |
| 3 | Use traceable decorator? | A | Use traceable decorator | Use traceable decorator |
| 4 | What's a LangSmith dataset? | A | Dataset = example pairs | Dataset = input-output |
| 5 | Query all project runs? | A | Use list_runs(...) | Query all runs (basic) |
| 6 | What is LangChain? | A | Framework for LLM apps | |

| ID | Output 1 | Output 2 | Acc 1 | Acc 2 |
|----|----------|----------|-------|-------|
| 04a95 | LangChain info | LangChain info | 1.0 | 1.0 |
| 198d70 | Query runs | Query runs | 0.0 | 0.0 |
| ea7b3f | Dataset intro | Dataset intro | 1.0 | 1.0 |
| 3cdd7b | Traceable use | Traceable use | 0.0 | 0.0 |
| 5ec65a | Llama V2 trace | Llama V2 trace | 1.0 | 1.0 |
| 7b74f0 | Move project | Move project | 0.0 | 0.0 |
| f23253 | Env variables | Env variables | 1.0 | 1.0 |

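| Statistic | Faithfulness | Answer Correctness | Context Recall | Context Precision |
|-----------|--------------|--------------------|----------------|-------------------|
| Count | 21.000 | 21.000 | 21.000 | 21.000 |
| Mean | 0.928571 | 0.857143 | 0.812500 | 0.785714 |
| Std | 0.112142 | 0.231455 | 0.250000 | 0.279771 |
| Min | 0.750000 | 0.333333 | 0.500000 | 0.500000 |
| 25% | 0.880000 | 0.750000 | 0.666667 | 0.666667 |
| 50% | 1.000000 | 1.000000 | 1.000000 | 0.750000 |
| 75% | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| Max | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
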
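| Statistic | feedback.tool_selection_precision | error | execution_time | run_id |
|-----------|-----------------------------------|-------|----------------|--------|
| count | 100.000000 | 0 | 100.000000 | 100 |
| unique | NaN | 0 | NaN | 100 |
| top | NaN | NaN | NaN | 827e2f98 |
| freq | NaN | NaN | NaN | 1 |
| mean | 0.636667 | NaN | 1.417737 | NaN |
| std | 0.370322 | NaN | 0.581734 | NaN |
| min | 0.000000 | NaN | 0.468482 | NaN |
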
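| Statistic | feedback.exact_match | feedback.matches_label | error | execution_time | run_id |
|-----------|----------------------|------------------------|-------|----------------|--------|
| count | 2.000000 | 2.000000 | 0 | 2.000000 | 2 |
| mean | 0.500000 | 0.500000 | NaN | 0.648931 | NaN |
| std | 0.707107 | 0.707107 | NaN | 0.250109 | NaN |
| min | 0.000000 | 0.000000 | NaN | 0.472097 | NaN |
| 25% | 0.250000 | 0.250000 | NaN | 0.560514 | NaN |
| 50% | 0.500000 | 0.500000 | NaN | 0.648931 | NaN |
| 75% | 0.750000 | 0.750000 | NaN | 0.737348 | NaN |
| max | 1.000000 | 1.000000 | NaN | 0.825765 | NaN |
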
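
The feedback columns in the two-run summary above are consistent with one score of 0 and one score of 1, assuming the table was produced in the style of pandas `describe()` (sample standard deviation, linearly interpolated quartiles). The source does not say how the summary was generated, so treat the following as a sketch under that assumption; `linear_quantile` and `scores` are hypothetical names introduced here, and only the standard library is used.

```python
import statistics


def linear_quantile(values, q):
    """Quantile with linear interpolation between neighbouring sorted values."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)          # fractional index into the sorted list
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


# Assumption: the two feedback values implied by count=2, min=0, max=1.
scores = [0.0, 1.0]

summary = {
    "count": len(scores),
    "mean": statistics.mean(scores),
    "std": statistics.stdev(scores),      # sample std (divides by n - 1)
    "min": min(scores),
    "25%": linear_quantile(scores, 0.25),
    "50%": linear_quantile(scores, 0.50),
    "75%": linear_quantile(scores, 0.75),
    "max": max(scores),
}

for name, value in summary.items():
    print(f"{name:>5}  {value:.6f}")
```

With only two runs the standard deviation is large relative to the mean, so these feedback statistics say little about typical behaviour.
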
| Memory Type | What is Stored | Human Example | Agent Example |
|-------------|----------------|---------------|---------------|
| Semantic | Facts | Things I learned in school | Facts about a user |
| Episodic | Experiences | Things I did | Past agent actions |
| Procedural | Instructions | Instincts or motor skills | Agent system prompt |

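| Cache Strategy | Avg Latency (s) | Avg Inference Memory (MB) | Avg Score |
|----------------|-----------------|---------------------------|-----------|
| Dynamic | 0.8046 | 1056.3977 | 0.4327 |
| Static | 4.0182 | 1056.1599 | 0.4224 |
| Quantized | 5.4194 | 1056.2500 | 0.4143 |
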
| Without caching | With caching |
|-----------------|--------------|
| For each step, recompute all previous K and V | For each step, only compute current K and V |
| Attention cost per step is quadratic with sequence length | Attention cost per step is linear with sequence length (memory grows linearly, but compute/token remains low) |

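
The comparison above can be made concrete by counting operations in a toy decode loop. This is a minimal sketch, not a real attention implementation: `decode_costs` is a hypothetical helper, one "K/V projection" means computing K and V for one token, and the uncached branch assumes every query is scored against every key with no savings from causal masking.

```python
def decode_costs(num_steps, use_cache):
    """Count K/V projections and query-key dot products over a decode loop."""
    kv_projections = 0
    score_dot_products = 0
    for t in range(1, num_steps + 1):       # t = sequence length at this step
        if use_cache:
            kv_projections += 1             # only the newest token's K and V
            score_dot_products += t         # one new query against t cached keys
        else:
            kv_projections += t             # K and V recomputed for every token
            score_dot_products += t * t     # every query against every key
    return kv_projections, score_dot_products


no_cache = decode_costs(8, use_cache=False)
with_cache = decode_costs(8, use_cache=True)

print("without cache (projections, dot products):", no_cache)
print("with cache    (projections, dot products):", with_cache)
```

The cached loop still has to store one K and one V per generated token, which is the linear memory growth mentioned in the table.
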
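| Model Configuration | Avg. Latency (s) | Avg. Peak Memory (MB) | Avg. LLM Judge Score |
|---------------------|------------------|-----------------------|----------------------|
| W4A16 + SDPA | 1.103 | 1003.81 | 0.421 |
| W4A16 + SDPA Paged | 1.303 | 1041.80 | 0.391 |
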
| Sequence Length (N) | Memory Needed (bfloat16, 40 heads) |
|---------------------|------------------------------------|
| 1,000 | ~76 MB |
| 16,000 | ~19 GB |
| 100,000 | ~745 GB |
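
The figures in the table above match a simple estimate: one N × N attention score matrix per head, at 2 bytes per bfloat16 value, reported in binary units (1 MB = 1024² bytes). That reading is an assumption inferred from the numbers, not something the source states; `attention_scores_bytes` and `human_readable` are hypothetical helper names.

```python
def attention_scores_bytes(seq_len, num_heads=40, bytes_per_value=2):
    """Bytes needed to hold the full N x N score matrix for every head."""
    return num_heads * bytes_per_value * seq_len ** 2


def human_readable(num_bytes):
    """Format a byte count using binary units (1 KB = 1024 B)."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if num_bytes < 1024 or unit == "TB":
            return f"~{num_bytes:.0f} {unit}"
        num_bytes /= 1024


for n in (1_000, 16_000, 100_000):
    print(f"N = {n:>7,}: {human_readable(attention_scores_bytes(n))}")
```

Because the cost grows with N², a 100× longer sequence needs 10,000× more memory for the score matrix.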