WeSearch
TAG · #INFERENCE

Inference coverage.

All 42 stories in the WeSearch catalog tagged #inference, in publish-time order, with view counts. Tag pages update as new stories are ingested; subscribe to the per-tag RSS feed to follow this topic in your reader of choice.


RELATED TAGS
#ai (7) · #ml (4) · #ai-inference (3) · #local-inference (3) · #causal-inference (2) · #data-science (2) · #amd (2) · #transformers-js (2) · #ai-integration (2) · #github (2) · #ubuntu (2) · #edge-computing (2)
THE REGISTER

Inference is giving AI chip startups a second chance to make their mark

In a disaggregated AI world, Nvidia can be both a friend and an enemy. AI adoption is reaching an inflection point as the focus shifts from training new models to serving them. For …

6 views
#ai · #chips
STABLEDIFFUSION

Built a local LLM inference engine on CachyOS — runs faster than llama.cpp on my 9070 XT

Hey folks, we've been hacking on a Vulkan-based LLM engine the last few weeks, figured I'd share since I'm running it exclusively on CachyOS with Mesa RADV. It's called VulkanForge…

5 views
GITHUB

VulkanForge – 14 MB Vulkan LLM engine that runs native FP8 models on AMD (Rust)

Inference in Rust and Vulkan. Contribute to maeddesg/vulkanforge development by creating an account on GitHub.…

4 views
#machine learning · #gpu computing · #rust
LATEST FROM TOM'S HARDWARE

Anthropic in early talks to buy DRAM-less AI inference chips from UK startup — Fractile's SRAM architecture reduces need for pricey memory during extreme pricing and shortage crunch

Anthropic has reportedly held early discussions with London-based chip startup Fractile about purchasing the company's inference accelerators.…

4 views
#artificial intelligence · #semiconductors · #tech startups
LOCALLLAMA

[Paper on Hummingbird+: low-cost FPGAs for LLM inference] Qwen3-30B-A3B Q4 at 18 t/s token-gen, 24GB, expected $150 mass production cost

3 views
TOWARDS DATA SCIENCE

Inference Scaling (Test-Time Compute): Why Reasoning Models Raise Your Compute Bill

Why reasoning models dramatically increase token usage, latency, and infrastructure costs in production systems The post Inference Scaling (Test-Time Compute): Why Reasoning Models…

2 views
#artificial intelligence · #machine learning · #cloud computing
STRATECHERY BY BEN THOMPSON

2026.18: Long-term, Peripheral & Myopic Visions

The best Stratechery content from the week of April 27, 2026, including Amazon and AI, the future of AR devices, and Beijing's myopia.…

10 views
#ai inference · #ar hardware · #amazon aws
TECHMEME

Sources: Anthropic is in early talks to buy AI inference chips from UK-based Fractile when they become available in 2027 (The Information)

The Information: Sources: Anthropic is in early talks to buy AI inference chips from UK-based Fractile when they become available in 2027 — As Anthropic's sales explode, straining…

15 views
#ai · #semiconductors · #technology
LOCALLLAMA

Hybrid on-device inference on Android: llama.cpp + LiteRT + NPU/GPU routing

Hi everyone, I’m the maintainer of Box — a fork of Google’s AI Edge Gallery that I’ve been extending into a fully offline AI assistant for Android. Full disclosure: I built this pr…

8 views
DEV COMMUNITY

Chapter 12: Inference - Generating New Text

What You'll Build: A sampling loop that generates new names from the trained model. Depends On: Chapter 11 (the trained model). How Generation Works: After training, the parameters ar…

9 views
#machine learning · #natural language processing · #csharp
ACTUAL COMPUTER

Welcome to Actual Computer

Actual Computer is building software for mesh inference across heterogeneous hardware, abstracting device communication, topology, OS compatibility, and provider API equivalency so…

7 views
#artificial intelligence · #distributed computing · #edge computing
FREECODECAMP PROGRAMMING TUTO

Product Experimentation with Propensity Scores: Causal Inference for LLM-Based Features in Python

Every product experimentation team running causal inference on LLM-based features eventually hits the same wall: when users click "Try our AI assistant," the volunteers aren't a ra…

6 views
#causal inference · #product experimentation · #propensity scores
TECHMEME

Cloud computing provider Nebius agrees to buy Eigen AI, which optimizes the performance of chips running AI inference tasks, for $615M in stock and cash (Dina Bass/Bloomberg)

Dina Bass / Bloomberg: Cloud computing provider Nebius agrees to buy Eigen AI, which optimizes the performance of chips running AI inference tasks, for $615M in stock and cash — C…

18 views
#cloud computing · #ai infrastructure · #acquisition
GOOGLE NEWS

Nebius Agrees to Acquire Eigen AI, Strengthening Nebius Token Factory as a Frontier Inference Platform - Morningstar

Comprehensive up-to-date news coverage, aggregated from sources all over the world by Google News.…

5 views
GITHUB

Openpi-flash: Real-time inference engine for openpi

Real-time inference engine for openpi. Contribute to Hebbian-Robotics/openpi-flash development by creating an account on GitHub.…

6 views
#robotics · #inference engine · #low-latency
SUBSTACK

More Tokens Isn't More Intelligence

Cost vs. benefit imbalance: Biology vs. AI scaling…

7 views
#ai scaling · #inference efficiency · #artificial intelligence
DEV.TO (TOP)

Video Demo: How Does Model Compression Change AI Reasoning?

In this video, I benchmark Mistral-7B-Instruct-v0.2 on an NVIDIA H200 DigitalOcean GPU in three...…

6 views
#ai · #model compression · #quantization
TECHMEME

Serverless inference platform Featherless.ai raised a $20M Series A co-led by AMD Ventures and Airbus Ventures; the startup supports over 30,000 open models (Cate Lawrence/Tech.eu)

By Cate Lawrence / Tech.eu. View the full context on Techmeme.…

9 views
#ai · #startups · #funding
STRATECHERY — BEN THOMPSON

Amazon Earnings, Trainium and Commodity Markets, Additional Amazon Notes

Amazon’s earnings suggest that the shift away from training towards inference and agents means their bet on Trainium is paying off. Plus, additional notes on ads, agents, and sport…

7 views
#amazon earnings · #trainium · #ai inference
YCOMBINATOR

Ask HN: What are you doing during inference?

6 views
#ai agents · #software development · #llms
DEV.TO (TOP)

Comparison: vLLM 0.6 vs. Text Generation Inference 1.4 for Serving Code LLMs

Serving code LLMs at production scale is 3.2x more expensive than general-purpose LLMs when using...…

7 views
#comparison · #vllm · #text generation inference
ARXIV CS.AI

Applied AI-Enhanced RF Interference Rejection

AI-enhanced interference rejection in radio frequency (RF) transmissions has recently attracted interest because deep learning approaches trained on both the signal of interest (SO…

7 views
#ai-enhanced rf · #interference rejection · #transformer models
ARXIV CS.AI

Complete Cyclic Subtask Graphs for Tool-Using LLM Agents: Flexibility, Cost, and Bottlenecks in Multi-Agent Workflows

Long-horizon tool-using tasks sometimes benefit from revisiting earlier subtasks for recovery and exploration, but added multi-agent workflow flexibility can also introduce coordin…

9 views
#multi-agent systems · #llm agents · #workflow flexibility
APPLIEDCOMPUTE

Benchmarking Inference Engines on Agentic Workloads

6 views
#inference engines · #agentic workloads · #benchmarking
GOOGLE DOCS

vLLM-Compile: Bringing Compiler Optimizations to LLM Inference

vLLM-Compile: Bringing Compiler Optimizations to LLM Inference. Luka Govedič, vLLM Committer and Senior Machine Learning Engineer, Red Hat…

9 views
INVESTING.COM — NEWS

DigitalOcean launches AI-Native Cloud platform for inference workloads

8 views
GITHUB

Private Decentralized Inference on Consumer Hardware [pdf]

Decentralized Private Inference. Contribute to Layr-Labs/d-inference development by creating an account on GitHub.…

6 views
#decentralized inference · #privacy-preserving ai · #consumer hardware
GITHUB

PAVO-Bench – 50K voice turns and an 85K-param router for ASR→LLM→TTS

A 50K-turn voice pipeline benchmark and an 85K-param meta-controller that cuts P95 latency 10.3% and energy 71% vs fixed cloud. TMLR 2026. - vnmoorthy/pavo-bench…

8 views
#voice orchestration · #asr-llm-tts pipeline · #inference routing
TOM'S HARDWARE

Ubuntu's AI roadmap revealed, universal AI 'kill switch' and forced AI integration are not part of the plan — cloud tracking, local inference, and agentic system tools take center stage

AI is coming to Ubuntu…

8 views
#ubuntu · #ai roadmap · #local inference
DIGITAL TRENDS

AI is coming to Linux, but not in the obnoxious way that will grind your gear

Ubuntu is bringing AI into the OS carefully, focusing on optional features, local processing, and tools that enhance workflows without disrupting the traditional Linux experience.…

9 views
#ai integration · #ubuntu · #linux
ALL NEWS

DigitalOcean launches AI inference engine with routing capabilities

7 views
HUGGINGFACE

How to Use Transformers.js in a Chrome Extension

We’re on a journey to advance and democratize artificial intelligence through open source and open science.…

6 views
#transformers.js · #chrome extension · #manifest v3
SEEKING ALPHA

AMD: Inference And Agentic AI Are Expanding Its Runway

Advanced Micro Devices is Buy-rated on expanding AI demand, strong EPYC/data center momentum, and discounted valuation. Learn more about AMD stock here.…

8 views
#amd · #ai inference · #agentic ai
ARXIV.ORG

Active Inference: A method for Phenotyping Agency in AI systems?

The proliferation of agentic artificial intelligence has outpaced the conceptual tools needed to characterize agency in computational systems. Prevailing definitions mainly rely on…

5 views
#artificial intelligence · #agency · #active inference
ARXIV.ORG

Causal Discovery as Dialectical Aggregation: A Quantitative Argumentation Framework

Constraint-based causal discovery is brittle in finite-sample regimes because erroneous conditional-independence (CI) decisions can cascade into substantial structural errors. We p…

6 views
#artificial intelligence · #causal inference · #machine learning
LOCALLLAMA

We benchmarked gpt-oss-120b across 6 inference providers and found a 10x throughput spread

We ran a benchmark across 10+ LLM routers, providers, and inference backends to answer the questions that come up every time someone picks a provider. Key findings: Do LLM routers …

12 views
REDDIT

Skymizer Taiwan Inc. Unveils Breakthrough Architecture Enabling Ultra-Large LLM Inference on a Single Card

Source Article excerpt: With a single PCIe card — powered by six HTX301 chips and 384 GB of memory — enterprises can now run 700B-parameter model inference locally at just ~240W pe…

8 views
LOCALLLAMA

Ubuntu 26.04 vs 24.04 speed improvements for inference?

I'm curious if any brave soul has upgraded their computer (especially if it's Strix Halo) from Ubuntu 24.04 -> 26.04 and seen a significant performance improvement for inference wi…

8 views
REDDIT

AMD Hipfire - a new inference engine optimized for AMD GPUs

Came across hipfire the other day. It's a brand-new inference engine focused on all AMD GPUs (not just the latest). Github. It uses a special mq4 quantization method. The hipfire …

10 views
REDDIT

llama.cpp DeepSeek v4 Flash experimental inference

Hi, here you can find experimental llama.cpp support for DeepSeek v4, and here there is the GGUF you can use to run the inference with "just" (lol) 128GB of RAM. The model, even qu…

12 views
LMSYS

DeepSeek-V4 on Day 0: From Fast Inference to Verified RL with SGLang and Miles

We are thrilled to announce Day-0 support for DeepSeek-V4 across both inference and RL training. SGLang and Miles form the first open-source stack to serve and train DeepSeek-V4 on…

8 views
#deepseek-v4 · #sglang · #miles
REDDIT

FP4 inference in llama.cpp (NVFP4) and ik_llama.cpp (MXFP4) landed - Finally

Both llama.cpp and ik_llama.cpp now have FP4 support — but with different flavors worth knowing about. llama.cpp recently merged NVFP4 (Nvidia's block-scaled FP4, `GGML_TYPE_NVFP4 …

17 views