NewsNook
nesting hacker news in a more meaningful way
LLM / page 13
BenchPress: Predict any LLM's score on any benchmark
How to Passive-Aggressively Shame People Who Use LLMs Selfishly
Measuring Search Ranking Quality with LLM Judged NDCG
ClaudeMeter – macOS menu bar app to track Claude usage and limits
LLM-CTF benchmark – 2,639 real data points from NeurIPS and original runs
California AB 2047 makes 3D printers off-limits to students, educators, business
Serving Large Language Models with a Minimalist Python CLI
Inference Compute Shapes Frontier LLM Evaluation
Confidence estimation is a better metric than agreement for LLM judges
LLMs Are Digitizing Judgment
Show HN: RLM-based local debugger for AI agent traces
Show HN: Hallu – a web framework where an LLM hallucinates your app
Charon: A blind, end-to-end-encrypted marketplace for LLM inference
Wayfinder – routing LLM prompts without another LLM
Show HN: Compilr.dev, multi LLM AI workspace
Why developers use LLMs to write blog posts
Show HN: peerd – AI agent harness that runs entirely in your browser