WeSearch
TAG · #INFERENCE

Inference coverage.

All 60 stories in the WeSearch catalog tagged with #inference, listed in publish-time order with view counts. Tag pages update as new stories are ingested. Subscribe to the per-tag RSS feed to follow this topic in your reader of choice.

RELATED TAGS
#ai (8) · #ml (5) · #ai-inference (3) · #local-inference (3) · #llm-inference (2) · #gpu-optimization (2) · #apple-silicon (2) · #bayesian-inference (2) · #causal-inference (2) · #technology (2) · #code-llms (1) · #performance-benchmark (1)
ARXIV.ORG

Still: Amortized KV Cache Compaction in a Single Forward Pass

The KV cache is the memory bottleneck of long-horizon language model deployment. Practically, a deployable compactor must be lightweight enough to call during inference, expressive…

9 views · #machine‑learning #natural‑language‑processing #model‑compression
GITHUB

TensorSharp: Open-Source Local LLM Inference Engine

A C# inference engine for running large language models (LLMs) locally using GGUF model files. TensorSharp provides a console application, a web-based chatbot interface, and Ollama…

29 views · #technology #software #open-source
HACKER NEWS (AI / LLM)

Lean Inference: Lean Manufacturing Principles Applied to AI

Making inference scale in a cost-effective way…

31 views · #ai #technology #manufacturing
THEHIVERYIQ

Show HN: Hive Trust – Ed25519-signed benchmarks for every AI inference primitive

Hive primitives benchmarked against published SOTA adversaries. Every result is a signed Ed25519 receipt from hivemorph — queryable, tamper-evident, reproducible.…

19 views · #ai #technology #benchmarking
YAHOO FINANCE

FingerMotion shares rise on entry into edge AI inference computing market

14 views
DEV.TO (TOP)

Building a High-Performance Real-Time Data Pipeline with Edge Inference and Observability

18 views · #iot #data-pipeline #edge-computing
IEEE SPECTRUM

With Nvidia Groq 3, the Era of AI Inference Is (Probably) Here (⌛ March 2026)

What makes Nvidia's new Groq 3 LPU chip a must-watch in the AI world?…

22 views · #nvidia #ai
DEV.TO (TOP)

Computer Use Agents Go Local: A Deep Technical Dive into On-Device GUI Automation, Quantized Inference & Holo3.1

Learn how to build production-grade local computer use agents using Holo3.1's…

16 views · #ai #automation #privacy
ARXIV CS.AI

Do Real-World Datasets Contain Natural Experiments? An Empirical Study Using Causal Feature Selection

In nature, events that affect some individuals or groups but not others constitute an implicit intervention and are known as natural experiments. For example, the COVID-19 pandemic…

25 views · #artificial intelligence #machine learning #causal inference
ARXIV CS.AI

Unveiling the Structure of Do-Calculus Reasoning via Derivation Graphs

The do-calculus defines a general system of inference for interventional queries, allowing causal quantities to be transformed through successive applications of its rules. This pr…

31 views · #artificial intelligence #causal inference #do-calculus
R/HARDWARE

Inference + Agentic AI race (groq LPU vs SambaNova RDU) vs alternatives for Decode

18 views
INVESTING.COM — NEWS

Megaport secures 4 AI deals, to raise $594 million to build inference cloud

21 views
R/LOCALLLAMA

Everyone here self-hosts inference. Almost nobody self-hosts the tooling around it. That feels backwards to me.

16 views
YAHOO FINANCE

Prediction: This Artificial Intelligence (AI) Inference Specialist Is Going to Soar After June 3

13 views
DEV.TO (TOP)

Inference Theft Is the New AI App Security Bug: How to Protect Your LLM Endpoints

A practical checklist for protecting public AI endpoints from model abuse, runaway agent loops, and surprise inference bills.…

13 views · #ai #security #webdev
R/HARDWARE

Silicon Motion new SM2524XT PCIe 5 controller achieves 14GB/s read and 12GB/s write speeds with up to 2.5 million IOPS and up to 25% higher performance-per-watt, designed for AI inference

24 views
DEV.TO (TOP)

Enterprise AI Governance Starts With Identity, Not Inference

The mistake most teams make with AI governance is starting in the wrong place. They start with model…

17 views · #ai #governance #security
GITHUB

Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA

Build your own high performance LLM inference engine in C++ and CUDA - a smaller version of vLLM - jmaczan/tiny-vllm…

17 views · #technology #programming #machine learning
R/RUST

DinoV3 Embedding inference and visualization with Rust, ort and egui!

17 views
HACKER NEWS (AI / LLM)

How Many GPUs? A simple LLM inference sizing calculator

16 views
ALVARO-VIDELA

The Apple Neural Engine Inference Book

14 views · #technology #apple #machine learning
TECHMEME

Sources: ByteDance has partnered with chipmaker InnoStar to develop an AI inference chip modeled after Groq's LPUs, which are built to run AI models at low cost (The Information)

15 views
DEV.TO (TOP)

KV-Pool: 4.5x Agent Inference Throughput with Persistent KV Cache

Why Agent Workloads Are Expensive: LLM inference costs always scale with context length. In…

11 views · #ai #technology #cloud
KOG LABS

Real-time LLM Inference on Standard GPUs: 3k tokens/s per request

Today, Kog AI launches a tech preview of the Kog Inference Engine (KIE): 3,000 output tokens/s per request on 8× AMD MI300X GPUs and 2,100 on 8× NVIDIA H200 (FP16, no speculative d…

18 views · #ai #technology #gpu
GITHUB

Show HN: Static-allocation MLP inference in ANSI C using a 2-slot ring buffer

Static-allocation MLP inference in ANSI C using 2-slot circular buffer with fixed stride indexing. An easy to use, minimal MLP alternative to GiorgosXou/NeuralNetworks enhanced wit…

14 views · #technology #programming #machine learning
THEREGISTER

Argonne flexes spare supercompute to build private AI inference service

Think ChatDoE…

21 views · #ai #supercomputing #research
CHARLIE LABS

90% cheaper repo inference with GPT-5.4 nano

For bounded orchestration decisions, the right model is often the smallest one that can pass a focused validation loop.…

22 views · #technology #artificial intelligence #cost reduction
HACKER NEWS (NEWEST)

Stress disrupts hippocampal integration of overlapping events, memory inference

16 views
TECHMEME

Tensormesh, whose inference platform uses KV caching to reduce costs, raised a $20M seed extension, bringing its total funding to $24.5M (Chris Metinko/Axios)

19 views
GOOGLE NEWS

Tensormesh Raises $20M from Investors Including AMD Ventures, CoreWeave, NVentures, Launches Tensormesh Inference to Fix AI’s Most Expensive Problem - Morningstar

21 views
GITHUB

Imece – Distributed AI inference using volunteer GPUs and FLOP token

A decentralized AI compute cooperative where contributors earn inference credits by donating idle GPU/CPU time — measured in FLOPs, not crypto. - aslankose/imece…

20 views · #ai #decentralization #technology
CRYPTO BRIEFING

I Squared Capital buys $225M data center portfolio from Cogent Fiber to build AI inference platform

I Squared Capital acquires 10 data center facilities from Cogent Fiber for $225M, committing up to $1B to build a US platform focused on AI inference workloads.…

14 views · #investment #data centers #ai
ARXIV CS.AI

MobileExplorer: Accelerating On-Device Inference for Mobile GUI Agents via Online Exploration

Mobile graphical user interface (GUI) agents enable AI models to autonomously operate smartphones on behalf of users. However, most existing systems focus primarily on optimizing t…

23 views · #artificial intelligence #mobile #technology
ARXIV CS.AI

AGORA: Adapter-Grounded Observation-Action Retention for Inference-Free Prompt Compression in LLM Agents

The token-level extractive compressors widely used for general LM context are structurally inappropriate for LLM agents: across 17 (env, backbone, method) cells spanning two indepe…

21 views · #artificial intelligence #machine learning #language models
ARXIV CS.AI

Pretraining Data Exposure in Large Language Models: A Survey of Membership Inference, Data Contamination, and Security Implications

Large Language Models (LLMs) have become the predominant paradigm in NLP, advancing both research and industry. As model sizes and pretraining data grow, concerns about Pretraining…

17 views · #artificial intelligence #machine learning #data privacy
DEV.TO (TOP)

I built a Rust inference engine that streams MoE expert weights from NVMe SSDs, no GPU required

Most people trying to run Mixtral or DeepSeek-V3 locally hit the same wall: they don't have 80GB of…

21 views · #ai #rust #moe
GOOGLE NEWS

Boom Times for Inference Providers? - The Information

17 views
TECHMEME

Source: AI inference provider Baseten is in talks to raise $1B at a post-money valuation of $11B, up from $5B after its $300M Series E announced in January (The Information)

18 views
YCOMBINATOR

Show HN: MurrDB: A RocksDB-based NVMe/S3 cache for AI inference workloads

15 views
R/MACHINELEARNING

Verbosity is not faithfulness: an architectural argument that reasoning models cannot perform faithful inference [D]

20 views
PHYS.ORG

Researchers develop Bayesian inference for hidden dependence structures in multi-group high-dimensional data

26 views
YAHOO FINANCE

I Squared bets on AI inference with $225 million data center buy from Cogent

12 views
ARXIV CS.AI

BODHI: Precise OS Kernel Specification Inference

The formal verification of operating system kernels requires precise specifications that capture the intended behavior of system calls. Writing these specifications manually demand…

27 views · #artificial intelligence #programming languages #software engineering
ARXIV CS.AI

Inference Time Context Sparsity: Illusion or Opportunity?

Sparsity has long been a central theme in LLM efficiency, but its role in context processing remains unresolved. As LLM workloads shift toward longer contexts and agentic interacti…

16 views · #artificial intelligence #machine learning #language models
ARXIV CS.AI

EPPC-OASIS: Ontology-Aware Adaptation and Structured Inference Refinement for Electronic Patient-Provider Communication Mining in Secure Messages

Secure patient-provider messages contain clinically important communication behaviors that are difficult to characterize manually at scale. The Electronic Patient-Provider Communic…

16 views · #artificial intelligence #healthcare #communication
ARXIV CS.AI

Identifying and Mitigating Systemic Measurement Bias in Production LLM Inference Benchmarks

As Large Language Models (LLMs) transition from research environments to production deployments, evaluating their performance against strict Service Level Objectives (SLOs) has bec…

26 views · #artificial intelligence #machine learning #performance evaluation
ARXIV CS.AI

Hypothesis Generation and Inductive Inference in Children and Language Models

Real world decision-making requires constructing mental models under uncertainty over evidence, over the underlying causal rules, and over the state of the world itself. Which comp…

18 views · #artificial intelligence #machine learning #cognitive science
ARXIV CS.AI

Beyond Inference-Only Deployment: Comparing Weight-Based Consolidation Against Cascading Compaction

Major LLM platforms deploy models in an inference-only configuration: the model serves requests but never updates per-user weights. Users must repeatedly re-teach preferences, corr…

17 views · #artificial intelligence #machine learning #software engineering
ARXIV CS.AI

Boosting Inference with Guided Reasoning: Stochastic Exploration for Recursive Models

Recent work on recursive architectures has shown that tiny neural networks can be surprisingly powerful on structured reasoning tasks. The trick is to model reasoning trajectories …

21 views · #artificial intelligence #machine learning #neural networks
R/LOCALLLAMA

New local model reaching near frontier on PII removal at 9 ms CPU inference

22 views
R/ARTIFICIAL

Building Conifer, an open-source local inference runtime (free + open source)

21 views
R/HOMELAB

Planning a dual 3090 inference server -- sanity check before I buy

19 views
R/LOCALLLAMA

Server build for local inference. 128 gb 3200 or 256 gb 2133mhz RAM?

19 views
R/MACHINELEARNING

DCGAN inference on a microcontroller: 12.6M parameters, 512KB SRAM, 26-second generation, pure C [P]

21 views
R/MACHINELEARNING

Is AI inference platform really that saturated now? [D]

21 views
ANTIREZ

Distributing LLM Inference in DwarfStar

15 views
DEV.TO (TOP)

Model Routing Cost Checklist: Hosted APIs, Open Models, Or Self-Hosted Inference?

Originally published on TechSaaS Cloud. Model…

14 views · #ai #technology #business
GITHUB

Show HN: YieldOS-Lite – A simulator for LLM inference control-plane governance

16 views · #technology #research #simulation
R/BUILDAPC

Components Check Before Order - Inference/Games

15 views
ARXIV CS.AI

XWind: A Cross-site Router for Large Language Model Inference Serving at Renewable Energy Farms

AI power demand is growing at an unprecedented rate while power grids are often ailing and struggle to keep up. Grid expansion comes with high capital expenditure and long-distance…

15 views · #artificial intelligence #renewable energy #distributed computing