19 results for "ai inference"
AMD: Inference And Agentic AI Are Expanding Its Runway
Advanced Micro Devices is Buy-rated on expanding AI demand, strong EPYC/data center momentum, and discounted valuation.…
Active Inference: A method for Phenotyping Agency in AI systems?
The proliferation of agentic artificial intelligence has outpaced the conceptual tools needed to characterize agency in computational systems. Prevailing definitions mainly rely on autonomy and goal-d…
Skymizer Taiwan Inc. Unveils Breakthrough Architecture Enabling Ultra-Large LLM Inference on a Single Card
With a single PCIe card, powered by six HTX301 chips and 384 GB of memory, enterprises can now run 700B-parameter model inference locally at just ~240W per card. The memory-b…
DigitalOcean launches AI inference engine with routing capabilities
An Intelligent Fault Diagnosis Method for General Aviation Aircraft Based on Multi-Fidelity Digital Twin and FMEA Knowledge Enhancement
Fault diagnosis of general aviation aircraft faces challenges including scarce real fault data, diverse fault types, and weak fault signatures. This paper proposes an intelligent fault diagnosis frame…
TeamPCP Supply Chain Campaign: Update 008
TeamPCP Supply Chain Campaign: Update 008 - 26-Day Pause Ends with Three Concurrent Compromises (Checkmarx KICS, Bitwarden CLI Cascade, xinference PyPI), CanisterSprawl npm Worm Identified, and Tier 1…
We benchmarked gpt-oss-120b across 6 inference providers and found a 10x throughput spread
We ran a benchmark across 10+ LLM routers, providers, and inference backends to answer the questions that come up every time someone picks a provider. Key findings: Do LLM routers add latency? No, Ope…
DeepSeek-V4 on Day 0: From Fast Inference to Verified RL with SGLang and Miles
We are thrilled to announce Day-0 support for DeepSeek-V4 across both inference and RL training. SGLang and Miles form the first open-source stack to serve and train DeepSeek-V4 on launch day — with s…
your daily driver stack, what's it look like? and why?
What it says in the title, I'm interested in hearing what you all have landed on as a workable / useful stack for you. Mine looks like this: back end inference servers - llama.cpp, vLLM | V hermes-age…
I got 3× faster HFQ4 prefill on Strix Halo in hipfire with an opt-in MMQ path
I recently contributed an experimental HFQ4-G256 MMQ prefill path to hipfire, an RDNA-focused LLM inference engine. Disclaimer: I authored the PR, so this is partly a contribution note, but I am mainl…
GSAR: Typed Grounding for Hallucination Detection and Recovery in Multi-Agent LLMs
Autonomous multi-agent LLM systems are increasingly deployed to investigate operational incidents and produce structured diagnostic reports. Their trustworthiness hinges on whether each claim is groun…
Ulterior Motives: Detecting Misaligned Reasoning in Continuous Thought Models
Chain-of-Thought (CoT) reasoning has emerged as a key technique for eliciting complex reasoning in Large Language Models (LLMs). Although interpretable, its dependence on natural language limits the m…
Agentic Adversarial Rewriting Exposes Architectural Vulnerabilities in Black-Box NLP Pipelines
Multi-component natural language processing (NLP) pipelines are increasingly deployed for high-stakes decisions, yet no existing adversarial method can test their robustness under realistic conditions…
Tandem: Riding Together with Large and Small Language Models for Efficient Reasoning
Recent advancements in large language models (LLMs) have catalyzed the rise of reasoning-intensive inference paradigms, where models perform explicit step-by-step reasoning before generating final ans…
PhysNote: Self-Knowledge Notes for Evolvable Physical Reasoning in Vision-Language Model
Vision-Language Models (VLMs) have demonstrated strong performance on textbook-style physics problems, yet they frequently fail when confronted with dynamic real-world scenarios that require temporal …
MIMIC: A Generative Multimodal Foundation Model for Biomolecules
Biological function emerges from coupled constraints across sequence, structure, regulation, evolution, and cellular context, yet most foundation models in biology are trained within one modality or f…
Microsoft TRELLIS.2: An Open-Source, 4B-Parameter, Image-to-3D Model [pdf]
Recent advancements in 3D generative modeling have significantly improved the generation realism, yet the field is still hampered by existing representations, which struggle to capture assets with com…
Speculative Decoding Implementations: EAGLE-3, Medusa-1, PARD, Draft Models, N-gram and Suffix Decoding from scratch
I’ve been working on an educational implementation repo for speculative decoding: The goal is not to wrap existing libraries, but to implement several speculative decoding methods from scratch behind …
Speculative Decoding Implementations: EAGLE-3, Medusa-1, PARD, Draft Models, N-gram and Suffix Decoding from scratch [P]
I’ve been working on an educational implementation repo for speculative decoding: The goal is not to wrap existing libraries, but to implement several speculative decoding methods from scratch behind …