AI.news

30 items last 24h
generated 2026-03-21 23:27 feed.json
7 pts
r/LocalLLaMA Community 7h ago

DeepSeek Core Researcher Daya Guo Rumored to Have Resigned

Recently, heavy-hitting news regarding a major personnel change has emerged in the field of Large Language Models (LLMs): Daya Guo, a core researcher at DeepSeek and one of the primary authors of the DeepSeek-R1 paper, has reportedly resigned. Public records show that Daya Guo po...

6 pts
r/LocalLLaMA Community 9h ago

TGI is in maintenance mode. Time to switch?

Our company uses Hugging Face TGI as the default engine on AWS SageMaker AI. I have had really bad experiences with TGI compared to my home setup using llama.cpp and vLLM. I just saw that Hugging Face has ended new development of TGI: https://huggingface.co/docs/text-generation-inference/i...

5 pts
The Verge AI Industry 9h ago

The gen AI Kool-Aid tastes like eugenics

Like many people, director Valerie Veatch was intrigued when OpenAI first released its Sora text-to-video generative AI model to the public in 2024. Though she didn't fully understand the technology, she was curious about what it could do, and she saw that other artists were buil...

4 pts
r/LocalLLaMA Community 24m ago

Nemotron-Cascade-2 10GB MAC ONLY Scores 88% on MMLU.

Even if someone did happen to make an MLX quant of this size (10 GB), it would be completely incoherent at 2-bit. https://huggingface.co/JANGQ-AI/Nemotron-Cascade-2-30B-A3B-JANG_2L Mistral 4 at 30-40 GB and a 60-70 GB version are coming out later today. submitted by /u/HealthyCo...

4 pts
r/LocalLLaMA Community 3h ago

I wrote a PowerShell script to sweep llama.cpp MoE nCpuMoe vs batch settings

Hi all, I have been playing around with Qwen 3.5 MoE models and found that the sweet-spot tradeoff between nCpuMoe and batch size for speed isn't linear. I also kept rerunning the same tests across different quants, which got tedious. If there is a tool/script that does this alr...
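The post's script is not shown, but the idea (sweep every nCpuMoe × batch-size pair and benchmark each) can be sketched in Python. This is a minimal sketch, not the author's script: the flag names `--n-cpu-moe` and `-b`, the model filename, and the grid values are assumptions; check your build's `llama-bench --help` before running.

```python
import itertools
import shlex

# Hypothetical sweep grid; adjust to your hardware and model.
N_CPU_MOE = [0, 8, 16, 24]          # MoE expert layers kept on CPU (assumed flag)
BATCH_SIZES = [256, 512, 1024]      # llama.cpp batch size (-b)
MODEL = "qwen3.5-moe-Q4_K_M.gguf"   # placeholder model path

def build_commands(model, n_cpu_moe_values, batch_sizes):
    """Build one llama-bench command line per (nCpuMoe, batch) pair.

    Assumes llama-bench accepts --n-cpu-moe and -b; flag names vary
    across llama.cpp versions, so verify against your binary.
    """
    cmds = []
    for ncm, b in itertools.product(n_cpu_moe_values, batch_sizes):
        cmds.append(["llama-bench", "-m", model,
                     "--n-cpu-moe", str(ncm), "-b", str(b)])
    return cmds

if __name__ == "__main__":
    # Print the full sweep so it can be piped to a shell or a job runner.
    for cmd in build_commands(MODEL, N_CPU_MOE, BATCH_SIZES):
        print(shlex.join(cmd))
```

Running each printed command and parsing the tokens/s column would reproduce the sweep the post describes.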

4 pts
r/LocalLLaMA Community 9h ago

Fixing Qwen thinking repetition

OK, so I found the fix for Qwen thinking repetition: pasting this system prompt from Claude fixes it completely. Other long system prompts might also work. I use a presence penalty of 1.5, everything else at llama.cpp webui defaults, no KV cache quantization (f16), and I use a q...
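The setup above (long system prompt plus presence penalty 1.5, everything else default) can be sketched as a request to llama.cpp server's OpenAI-compatible chat endpoint. This is a minimal sketch: the system prompt text here is a stand-in (the post's actual Claude prompt is not shown), and it assumes you are running `llama-server` with that endpoint enabled.

```python
import json

# Stand-in for the long system prompt the post pastes from Claude.
SYSTEM_PROMPT = "You are a careful, concise assistant..."

def build_request(user_msg, presence_penalty=1.5):
    """Build a chat payload for llama.cpp's /v1/chat/completions.

    presence_penalty=1.5 mirrors the post's setting; all other sampling
    parameters are omitted so the server keeps its webui defaults.
    """
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_msg},
        ],
        "presence_penalty": presence_penalty,
    }

if __name__ == "__main__":
    print(json.dumps(build_request("Explain MoE routing."), indent=2))
```

POSTing this JSON to a running llama-server would apply the same anti-repetition settings the post reports.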