WeSearch
Hub / Search / metrics
SEARCH · METRICS

Results for "metrics".

25 stories match your query across our 700+ source catalog. Ranked by relevance and recency.

25 results for "metrics"

DEV COMMUNITY

I replaced CAPTCHA with passive biometrics after AI hit 91% bypass rate — 7 biological signals, no puzzles, free tier

CAPTCHA is broken AI now bypasses reCAPTCHA at 91%+ success rates. Every CAPTCHA you add...…

· 5 views
FORTUNE

The metrics driving Verizon’s turnaround

Under new CEO Dan Schulman, Verizon posted its first positive Q1 postpaid net adds in more than a decade.…

· 5 views
REDDIT

Kernel console live metrics

· 7 views
DEV.TO (TOP)

Part 1: Intent vs State — How AWS DevOps Agent Closes the Gap Between What Your System Is and What You Decided It Should Be

When something breaks at 3am, you look at logs, metrics, traces. You don't go and re-read the ADR your team wrote in January. AWS DevOps Agent does. Here's why that changes the first hour of an incide…

· 3 views
ARXIV CS.AI

RADIANT-LLM: an Agentic Retrieval Augmented Generation Framework for Reliable Decision Support in Safety-Critical Nuclear Engineering

Reliable decision support in nuclear engineering requires traceable, domain-grounded knowledge retrieval, yet safety and risk analysis workflows remain hampered by fragmented documentation and halluci…

· 6 views
ARXIV CS.AI

Quantifying Divergence in Inter-LLM Communication Through API Retrieval and Ranking

Large language models (LLMs) increasingly operate as autonomous agents that reason over external APIs to perform complex tasks. However, their reliability and agreement remain poorly characterized. We…

· 5 views
ARXIV CS.AI

Behavioral Intelligence Platforms: From Event Streams to Autonomous Insight via Probabilistic Journey Graphs, Behavioral Knowledge Extraction, and Grounded Language Generation

Contemporary product analytics systems require users to pose explicit queries, such as writing SQL, configuring dashboards, or constructing funnels, before insights can surface. This pull-based paradi…

· 4 views
ARXIV CS.AI

When VLMs 'Fix' Students: Identifying and Penalizing Over-Correction in the Evaluation of Multi-line Handwritten Math OCR

Accurate transcription of handwritten mathematics is crucial for educational AI systems, yet current benchmarks fail to evaluate this capability properly. Most prior studies focus on single-line expre…

· 8 views
ARXIV CS.AI

BiTA: Bidirectional Gated Recurrent Unit-Transformer Aggregator in a Temporal Graph Network Framework for Alert Prediction in Computer Networks

Proactive alert prediction in computer networks is critical for mitigating evolving cyber threats and enabling timely defensive actions. Temporal Graph Neural Networks (TGNs) provide a principled fram…

· 5 views
ARXIV CS.AI

Applied AI-Enhanced RF Interference Rejection

AI-enhanced interference rejection in radio frequency (RF) transmissions has recently attracted interest because deep learning approaches trained on both the signal of interest (SOI) and the signal mi…

· 5 views
ARXIV CS.AI

DO-Bench: An Attributable Benchmark for Diagnosing Object Hallucination in Vision-Language Models

Object level hallucination remains a central reliability challenge for vision language models (VLMs), particularly in binary object existence verification. Existing benchmarks emphasize aggregate accu…

· 5 views
ARXIV CS.AI

Structure Guided Retrieval-Augmented Generation for Factual Queries

Retrieval-Augmented Generation (RAG) has been proposed to mitigate hallucinations in large language models (LLMs), where generated outputs may be factually incorrect. However, existing RAG approaches …

· 5 views
ARXIV.ORG

The Controllability Trap: A Governance Framework for Military AI Agents

Agentic AI systems - capable of goal interpretation, world modeling, planning, tool use, long-horizon operation, and autonomous coordination - introduce distinct control failures not addressed by exis…

· 9 views
ANDROID POLICE

I'm officially 'un-locked' and ready to scroll, as long as I'm on my couch

No more biometrics in your trusted places…

· 6 views
SEEKING ALPHA

Kimco Realty: Why The Preferred Stocks Offer A Better Risk/Return Than The Common

Kimco Realty (KIM) preferreds KIM.PR.L & KIM.PR.M yield 6.5%+ below par with strong credit metrics—better than common/bonds.…

· 6 views
ARXIV.ORG

Do Transaction-Level and Actor-Level AML Queues Agree? An Empirical Evaluation of Granularity Effects on the Elliptic++ Graph

Graph-based anti-money laundering (AML) systems on blockchain networks can score suspicious activity at two granularity levels -- transactions or actor addresses -- yet compliance action is conducted …

· 7 views
ARXIV.ORG

MetaGAI: A Large-Scale and High-Quality Benchmark for Generative AI Model and Data Card Generation

The rapid proliferation of Generative AI necessitates rigorous documentation standards for transparency and governance. However, manual creation of Model and Data Cards is not scalable, while automate…

· 6 views
ARXIV.ORG

FinGround: Detecting and Grounding Financial Hallucinations via Atomic Claim Verification

Financial AI systems must produce answers grounded in specific regulatory filings, yet current LLMs fabricate metrics, invent citations, and miscalculate derived quantities. These errors carry direct …

· 5 views
ARXIV.ORG

Does Machine Unlearning Preserve Clinical Safety? A Risk Analysis for Medical Image Classification

The application of Deep Learning in medical diagnosis must balance patient safety with compliance with data protection regulations. Machine Unlearning enables the selective removal of training data fr…

· 5 views
ARXIV.ORG

Context-Aware Hospitalization Forecasting Evaluations for Decision Support using LLMs

Medical and public health experts must make real-time resource decisions, such as expanding hospital bed capacity, based on projected hospitalization trends during large-scale healthcare disruptions (…

· 8 views
ARXIV.ORG

CT-FineBench: A Diagnostic Fidelity Benchmark for Fine-Grained Evaluation of CT Report Generation

The evaluation of generated reports remains a critical challenge in Computed Tomography (CT) report generation, due to the large volume of text, the diversity and complexity of findings, and the prese…

· 8 views
ARXIV.ORG

The Kerimov-Alekberli Model: An Information-Geometric Framework for Real-Time System Stability

This study introduces the Kerimov-Alekberli model, a novel information-geometric framework that redefines AI safety by formally linking non-equilibrium thermodynamics to stochastic control for the eth…

· 5 views
ARXIV.ORG

Multi-Dimensional Evaluation of Sustainable City Trips with LLM-as-a-Judge and Human-in-the-Loop

Evaluating nuanced conversational travel recommendations is challenging when human annotations are costly and standard metrics ignore stakeholder-centric goals. We study LLMs-as-Judges for sustainable…

· 7 views
ARXIV.ORG

STELLAR-E: a Synthetic, Tailored, End-to-end LLM Application Rigorous Evaluator

The increasing reliance on Large Language Models (LLMs) across diverse sectors highlights the need for robust domain-specific and language-specific evaluation datasets; however, the collection of such…

· 5 views
REDDIT

A 14-day “Growth Forge” sprint: build an AI-powered growth agent on a real stack

Sharing something that sits at the intersection of AI agents and growth systems. VideoDB (backend for video/audio for AI agents) is running a 14-day sprint called Growth Forge for 5 builders to design…

· 9 views