60 stories tagged with #bench, in publish-time order across the WeSearch catalog. Tag pages update as new stories are ingested.
2x RTX 6000 build during an extended bench test
CPU at 100% and both GPUs at 535 W cap. Recorded for noise and thermals documentation. This is as loaded as I can get it on the 1600 W titanium PSU. Pulling ~1650 W at the wall. Peopl…
Benchmark: Azure Sentinel vs. Splunk 10.0 vs. AWS Security Hub for SIEM in Multi-Cloud Environments
In a 12-week benchmark across 3 cloud providers, 1.2PB of security logs, and 14,000 EPS (events per...…
Lightning top Canadiens 1-0 in OT, push series to seventh game: Takeaways
The Tampa Bay Lightning forced their Eastern Conference First Round series to a seventh game with a 1-0 overtime victory […]…
Benchmark: 1Password 8 vs. Bitwarden 2026 for Credential Autofill Speed
1Password 8 vs Bitwarden 2026: Credential Autofill Speed Benchmark Password manager...…
Buccaneers, Falcons named top landing spots for recently-benched $6 million Browns QB
The Tampa Bay Buccaneers were recently linked to a trade for a 25-year-old Cleveland Browns quarterback.…
micro-benchmarks don’t tell the whole story
Show HN: Stealth Benchmark test if AI coding interview tools can be detected
Special Supreme Court bench to hear today TMC plea on counting staff
Trinamool Congress has moved the Supreme Court, challenging a directive mandating central government employees as counting supervisors or assistants for Bengal's assembly election …
Ghostty 1.0 vs Warp OSS vs WezTerm: 14 Days of Daily Use — Real Latency, Memory, and Workflow Numbers
The Fastest Terminal on macOS? A 14-Day Stress Test of Ghostty, Warp, and WezTerm After 14...…
Graduate student makes impact off the bench for Vermont women's lacrosse
Jose Alvarado, Jordan Clarkson filling critical bench role for Knicks to start playoff run
There are so many reasons to believe the best is yet to come.…
Grok 4.3 underperforms Grok 4.20 0309 on the Extended NYT Connections Benchmark, dropping from 93.4 to 67.5, though it achieves this result at a lower cost than the earlier Grok 4.20 run
MacRumors Giveaway: Win a Mac Mini to Run AI Agents With Astropad's 'Workbench' App
For this week's giveaway, we've teamed up with Astropad to offer MacRumors readers a chance to win a Mac mini to use with Astropad's new Workbench app. For those unfamiliar with A…
Barkley To Start, Luiz On The Bench | 4-2-3-1 Aston Villa Predicted Lineup Vs Tottenham Hotspur
Aston Villa will welcome Tottenham Hotspur at Villa Park on Sunday as they look to secure all three points at home in the Premier League. The Villans suffered a tough 1-0 loss away…
California’s Pathetic Democrat Governor Bench: Why the Machine’s Top Choice Xavier Becerra Is the Worst of Them
Democrats have run California into the ground for 16 straight years, with supermajorities and zero excuses—not a single Republican to blame. Now, with Gavin Newsom term-limited, th…
Tenacious-Bench: Building a Sales Domain Evaluation Benchmark When No Dataset Exists
The Gap: General-purpose LLM benchmarks like τ²-Bench evaluate task completion in retail...…
Claude Opus 4.7 vs. Claude Opus 4.6: What Changed?
Claude Opus 4.7 improves software engineering benchmarks by 10% and visual reasoning by 13%, but regresses on agentic search. Here's the full breakdown.…
Analyzing GPT-5.5 and Opus 4.7 with ARC-AGI-3
ARC Prize Foundation is a nonprofit advancing open-source AGI research through benchmarks & prizes.…
Show HN: Reckoner – A query workbench for domain experts
Semantic query workbench for structured data. Contribute to peirce-lang/reckoner development by creating an account on GitHub.…
GPT-5.5 vs. GPT-5.4 vs. Opus 4.7 on 56 real coding tasks from 2 open source repos
Opus 4.7 vs GPT-5.5 vs GPT-5.4 on 56 real coding tasks across two open-source repos. Opus writes smaller patches; GPT-5.5 writes patches that more often survive review.…
Scope Graphs – A Theory of Name Resolution
Independent eval of Openai/privacy-filter vs GLiNER on 600 PII samples. The model is much better than naive benchmarks make it look
OpenAI dropped Privacy Filter last month under Apache 2.0 and I wanted to see how it actually stacks up against the other serious open weight option for PII detection, GLiNER large…
AI commerce needs an MLPerf – early attempt at one
Xmemory: Benchmarking Structured AI Memory Against RAG and Hybrid RAG
Persistent AI memory is often reduced to a retrieval problem: store prior interactions as text, embed them, and ask the model to recover relevant context later. This design is usef…
What benchmark would you build for “reply quality” in SDR generation? [D]
Working on evaluating some AI-generated outbound (SDR-style emails along with follow-ups), and I’m running into a weird problem. Everyone talks about better personalisation or high…
Advocates demand High Court circuit bench for central region of State in Shivamogga
Shivamogga advocates demand a High Court circuit bench in their city, opposing the proposed Mangaluru bench for central Karnataka.…
AI Commerce Needs MLPerf — and Here's an Early Attempt
Validating a UCP manifest takes a second. Scoring it for agent-readiness takes another. Neither of...…
Benchmark: Vue 3.5 vs. Angular 18 for Large Dashboard Applications
Large dashboard applications power critical enterprise workflows, from financial analytics to IoT monitoring. Thes…
Linux 7.0 Release, Age Verification Laws, Ryzen 9 9950X3D2 & Other April Happenings
A lot happened in the Linux and open-source world during the month of April. Ubuntu 26.04 LTS and Fedora 44 shipped, a lot of news around age attestation/verification laws, the Lin…
The State of AI Agent Memory in 2026: What the Research Actually Shows
Published by Vektor...…
SEC clears Nasdaq proposal for prediction market options tied to benchmark index
Benchmarking a Bug Scanner
We ran a tournament pitting Detail's findings against thousands of comments from code review bots.…
The Human Creativity Benchmark – Evaluating Generative AI in Creative Work
The frontier human data and evaluation lab for creative AI. 1.5M+ verified creative experts setting the benchmark for style, tone, and taste with next-gen creative tools.…
Benchmarking Local LLM/Harness Combinations
I’ve been running a small benchmark, harness-bench, that pairs local LLMs (served via llama.cpp’s llama-server) with agent harnesses (Aider, Claude Code, Ope...…
An unreleased Microsoft Surface Laptop popped up in benchmark listings. Here's what they reveal.
Surface Laptop 8 with Panther Lake incoming...?…
Benchmark: 2026 Backup Tools — Velero 2.0 vs. Restic 0.17 vs. Duplicati 2.0 for 1TB Data
2026 Backup Tools Benchmark: Velero 2.0 vs Restic 0.17 vs Duplicati 2.0 for 1TB...…
Chithirai festival is not merely a temple event, but a civilisational continuum: Madurai Bench
Madurai High Court emphasizes the Chithirai festival's cultural significance and orders measures against disruptions during the ceremonial procession.…
KROMATID to Present Breakthrough Genomic Integrity Benchmarking at ASGCT 2026, Powering the World's First Genomic Intelligence Platform - Morningstar
‘Not circular’: Benchmark defends Strategy’s STRC bitcoin accumulation model
CachyOS Linux Performance Leading Over Ubuntu 26.04 LTS, Fedora Workstation 44
It's not entirely surprising given the aggressive stance that the CachyOS Linux distribution has taken on out-of-the-box performance, but for those curious, it continues largel…
I corrected my own benchmark claim from 91.5% to 88%. Here's what changed.
A week after shipping a flattering tokens-saved number for my AI context tool, I noticed it was apples-to-oranges. Here's the workload-matched redo, the smaller honest number, and …
Fed holds benchmark interest rate steady as Americans face rising inflation
The Federal Reserve on Wednesday held its benchmark interest rate steady for the third consecutive month as the U.S. economy faces rising inflation. Kelly O'Grady reports.…
Linux 7.1-rc1 Showing Off Some Wins On AMD Ryzen Threadripper
My initial testing of the Linux 7.1 development kernel on various systems in the lab continues going well…
Xtrackers drops ESG screening from 11 ETFs, shifts benchmarks
Benchmark: 2026 AI Engineer Salaries vs. Traditional Backend Roles Using TypeScript 6.0 and Go 1.24
In 2026, AI engineers building production LLM pipelines with TypeScript 6.0 and Go 1.24 command a...…
Benchmark: Cloudflare WAF 3.0 vs. AWS WAF 2026 vs. ModSecurity 3.0 Request Blocking Accuracy
In 2025, a single false negative in a web application firewall (WAF) cost a mid-sized SaaS provider...…
IPL 2026: From bench to finishing line, Shubham Dubey makes the wait worthwhile
Shubham Dubey, a 31-year-old cricketer, seized his long-awaited opportunity with a match-winning unbeaten 31 off 12 balls. His explosive innings, coupled with Donovan Ferreira's su…
Benchmark Electronics, Inc. (BHE) Q1 2026 Earnings Call Transcript
Benchmark Electronics, Inc. (BHE) Q1 2026 Earnings Call April 29, 2026 5:00 PM EDT Company Participants Paul Mansky - Investor Relations & Corporate...…
Anthropic unveils BioMysteryBench to test Claude's bioinformatics skills against human experts, and says Mythos solved ~30% of 23 questions that stumped experts (Anthropic)
Anthropic: Anthropic unveils BioMysteryBench to test Claude's bioinformatics skills against human experts, and says Mythos solved ~30% of 23 questions that stumped experts — In th…
I put ChatGPT-5.5 vs Gemini 3.1 Pro through 7 impossible tests — and the winner surprised me
We put OpenAI's new GPT-5.5 and Google's Gemini 3.1 Pro through 7 brutal real-world prompts. The winner of this ultimate AI showdown might surprise you…
Allahabad HC bench split over NHRC-ordered probe against madrasas
The Allahabad High Court issued conflicting interim orders on an NHRC probe into 558 madrasas in UP, highlighting differing views on the commission's role.…
AI evals are becoming the new compute bottleneck
A Blog post by EvalEval Coalition on Hugging Face…
Caddy 2.8 vs Nginx 1.26: Static File Serving Speed Benchmark 2026
In 2026, static file serving remains the backbone of 78% of public-facing web workloads, yet the...…
atomic_queue benchmarks SMT vs no-SMT performance
Knicks bench provides energy on both ends of floor to nail down Game 5 win over Hawks
Four bench players gave the Knicks the scoring and defense needed to make Game 5's win over the Hawks a laugher.…
Performance Test: AWS Graviton4 Reduces EC2 Costs 40% vs. Intel Xeon 5th Gen
In a 12-week production benchmark across 14 workload types, AWS Graviton4-based EC2 instances...…
MiniMax M2.5 API Guide: 80% SWE-Bench at $0.15/M Tokens
MiniMax M2.5 matches Claude Opus on SWE-Bench at a fraction of the cost. Architecture breakdown, benchmark replay, and full API setup guide for 2026.…
Can Current Agents Close the Discovery-to-Application Gap? A Case Study in Minecraft
Discovering causal regularities and applying them to build functional systems--the discovery-to-application loop--is a hallmark of general intelligence, yet evaluating this capacit…
Quantifying Divergence in Inter-LLM Communication Through API Retrieval and Ranking
Large language models (LLMs) increasingly operate as autonomous agents that reason over external APIs to perform complex tasks. However, their reliability and agreement remain poor…