| # | Input | Preferred | A (short) | B (short) |
|---|---|---|---|---|
| 0 | Move project between orgs? | – | Move project steps | Move project steps |
| 1 | Why set env vars? | A | Env vars purpose | Env vars needed |
| 2 | Trace Llama V2? | B | Use LangChain LLMs | Yes, with LangSmith |
| 3 | Use traceable decorator? | A | Use traceable decorator | Use traceable decorator |
| 4 | What's a LangSmith dataset? | A | Dataset = example pairs | Dataset = input-output |
| 5 | Query all project runs? | A | Use list_runs(...) | Query all runs (basic) |
| 6 | What is LangChain? | A | Framework for LLM apps |

| ID | Output 1 | Output 2 | Acc 1 | Acc 2 |
|---|---|---|---|---|
| 04a95 | LangChain info | LangChain info | 1.0 | 1.0 |
| 198d70 | Query runs | Query runs | 0.0 | 0.0 |
| ea7b3f | Dataset intro | Dataset intro | 1.0 | 1.0 |
| 3cdd7b | Traceable use | Traceable use | 0.0 | 0.0 |
| 5ec65a | Llama V2 trace | Llama V2 trace | 1.0 | 1.0 |
| 7b74f0 | Move project | Move project | 0.0 | 0.0 |
| f23253 | Env variables | Env variables | 1.0 | 1.0 |

| Statistic | Faithfulness | Answer Correctness | Context Recall | Context Precision |
|---|---|---|---|---|
| Count | 21.000 | 21.000 | 21.000 | 21.000 |
| Mean | 0.928571 | 0.857143 | 0.812500 | 0.785714 |
| Std | 0.112142 | 0.231455 | 0.250000 | 0.279771 |
| Min | 0.750000 | 0.333333 | 0.500000 | 0.500000 |
| 25% | 0.880000 | 0.750000 | 0.666667 | 0.666667 |
| 50% | 1.000000 | 1.000000 | 1.000000 | 0.750000 |
| 75% | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| Max | 1.000000 | 1.000000 | 1.000000 | 1.000000 |

| Statistic | feedback.tool_selection_precision | error | execution_time | run_id |
|---|---|---|---|---|
| count | 100.000000 | 0 | 100.000000 | 100 |
| unique | NaN | 0 | NaN | 100 |
| top | NaN | NaN | NaN | 827e2f98 |
| freq | NaN | NaN | NaN | 1 |
| mean | 0.636667 | NaN | 1.417737 | NaN |
| std | 0.370322 | NaN | 0.581734 | NaN |
| min | 0.000000 | NaN | 0.468482 | NaN |

| Statistic | feedback.exact_match | feedback.matches_label | error | execution_time | run_id |
|---|---|---|---|---|---|
| count | 2.000000 | 2.000000 | 0 | 2.000000 | 2 |
| mean | 0.500000 | 0.500000 | NaN | 0.648931 | NaN |
| std | 0.707107 | 0.707107 | NaN | 0.250109 | NaN |
| min | 0.000000 | 0.000000 | NaN | 0.472097 | NaN |
| 25% | 0.250000 | 0.250000 | NaN | 0.560514 | NaN |
| 50% | 0.500000 | 0.500000 | NaN | 0.648931 | NaN |
| 75% | 0.750000 | 0.750000 | NaN | 0.737348 | NaN |
| max | 1.000000 | 1.000000 | NaN | 0.825765 | NaN |
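
The rows of the table above (count, mean, std, quartiles) have the shape of a pandas-style `describe()` summary, although the source does not say how they were produced. As a rough sketch under that assumption, the standard-library code below recomputes the same statistics for the two score values implied by the `feedback.exact_match` column (count 2, min 0, max 1). The helper name `describe` is hypothetical and used only for illustration.

```python
import statistics


def describe(values):
    """Return count, mean, sample std, min, quartiles, and max for a list of numbers."""
    ordered = sorted(values)
    # "inclusive" interpolates linearly between the smallest and largest value.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "count": len(ordered),
        "mean": statistics.fmean(ordered),
        "std": statistics.stdev(ordered),  # sample standard deviation (n - 1)
        "min": ordered[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": ordered[-1],
    }


# Assumed inputs: one run scored 0.0 and the other scored 1.0.
summary = describe([0.0, 1.0])
for name, value in summary.items():
    print(f"{name:>5}  {value:.6f}")
```

With only two runs, the standard deviation of 0.707107 is simply the sample deviation of the pair 0 and 1, so it says little about the evaluator's spread.
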

| Memory Type | What is Stored | Human Example | Agent Example |
|---|---|---|---|
| Semantic | Facts | Things I learned in school | Facts about a user |
| Episodic | Experiences | Things I did | Past agent actions |
| Procedural | Instructions | Instincts or motor skills | Agent system prompt |

| Cache Strategy | Avg Latency (s) | Avg Inference Memory (MB) | Avg Score |
|---|---|---|---|
| Dynamic | 0.8046 | 1056.3977 | 0.4327 |
| Static | 4.0182 | 1056.1599 | 0.4224 |
| Quantized | 5.4194 | 1056.2500 | 0.4143 |

| without caching | with caching |
|---|---|
| for each step, recompute all previous K and V | for each step, only compute current K and V |
| attention cost per step is quadratic with sequence length | attention cost per step is linear with sequence length (memory grows linearly, but compute/token remains low) |
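
The comparison above can be illustrated with a toy single-head attention decoder. This is a minimal sketch in plain Python, not any library's implementation: the function names, the tiny weight matrices, and the choice to compute only the newest token's output at each step are all assumptions made for illustration. It counts how many key/value projections each strategy performs and checks that both produce the same outputs.

```python
import math


def project(x, w):
    """Multiply vector x by weight matrix w (a list of rows)."""
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*w)]


def attend(q, keys, values):
    """Scaled dot-product attention for one query over all keys and values."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [
        sum(w * v[d] for w, v in zip(weights, values)) / total
        for d in range(len(values[0]))
    ]


def decode_without_cache(tokens, wq, wk, wv):
    """Recompute K and V for every previous token at every step."""
    outputs, kv_projections = [], 0
    for t in range(1, len(tokens) + 1):
        keys = [project(x, wk) for x in tokens[:t]]
        values = [project(x, wv) for x in tokens[:t]]
        kv_projections += t
        outputs.append(attend(project(tokens[t - 1], wq), keys, values))
    return outputs, kv_projections


def decode_with_cache(tokens, wq, wk, wv):
    """Compute K and V only for the current token and append them to a cache."""
    outputs, kv_projections = [], 0
    key_cache, value_cache = [], []
    for x in tokens:
        key_cache.append(project(x, wk))
        value_cache.append(project(x, wv))
        kv_projections += 1
        outputs.append(attend(project(x, wq), key_cache, value_cache))
    return outputs, kv_projections


# Hypothetical 2-dimensional token embeddings and weights.
tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
wq = [[1.0, 0.0], [0.0, 1.0]]
wk = [[0.5, 0.1], [0.2, 0.7]]
wv = [[1.0, 2.0], [3.0, 4.0]]

plain_out, plain_count = decode_without_cache(tokens, wq, wk, wv)
cached_out, cached_count = decode_with_cache(tokens, wq, wk, wv)
print(plain_count, cached_count)
```

For N tokens the uncached loop performs N(N+1)/2 key/value projections while the cached loop performs N, at the price of keeping every past key and value in memory.
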

| Model Configuration | Avg. Latency (s) | Avg. Peak Memory (MB) | Avg. LLM Judge Score |
|---|---|---|---|
| W4A16 + SDPA | 1.103 | 1003.81 | 0.421 |
| W4A16 + SDPA Paged | 1.303 | 1041.80 | 0.391 |

| Sequence Length (N) | Memory Needed (bfloat16, 40 heads) |
|---|---|
| 1,000 | ~76 MB |
| 16,000 | ~19 GB |
| 100,000 | ~745 GB |
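
The figures in the table above are consistent with storing one N x N attention score matrix per head at 2 bytes per element (bfloat16), that is, 40 * 2 * N^2 bytes. That reading is an assumption, since the table itself does not state the formula. The sketch below reproduces the numbers under it, using binary units (1 MB = 2^20 bytes), which is what the rounded values match. The function names are hypothetical.

```python
def attention_scores_bytes(seq_len, num_heads=40, bytes_per_element=2):
    """Bytes needed to hold one seq_len x seq_len score matrix per head."""
    return num_heads * seq_len * seq_len * bytes_per_element


def human_readable(num_bytes):
    """Format a byte count using binary units (1 MB = 2**20 bytes here)."""
    for unit, size in (("GB", 2**30), ("MB", 2**20)):
        if num_bytes >= size:
            return f"~{num_bytes / size:.0f} {unit}"
    return f"{num_bytes} B"


for n in (1_000, 16_000, 100_000):
    print(f"{n:>7,} tokens: {human_readable(attention_scores_bytes(n))}")
```

Because the cost grows with N^2, a 16x longer sequence needs 256x more memory, which is why the jump from 1,000 to 16,000 tokens moves from megabytes to gigabytes.
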