Top posts by @michaelzsguo — 207 posts with 2,000+ impressions
My DeepSeek V4 Pro agent (inside Codex) has been pursuing a goal for more than 13 hours, burning ~100M tokens, and has cost me only $1. Yes, you read that right
Winners include a personal injury attorney, cardiologist, musician, infrastructure worker, and one software engineer, showing domain experts can use AI coding tools.
People are posting Qwen 3.6 configs that deliver fast TPS on as little as 12GB VRAM
A comprehensive guide covering everything from your first local LLM run to fine-tuning workflows
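This incident can basically be put to rest; it was essentially a classic mix-up. Fireworks AI had actually posted about Composer 2 a full 21 hours earlier and stated clearly that the model was RL-trained on their infrastructure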
I needed to pursue /goal inside Codex, but I burned through my Plus membership tokens. Luckily, I have a capable and very cheap DeepSeek V4 Pro setup that I can connect to Codex
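OpenAI launched /goal, which lets an agent run continuously for long stretches. One of Peter Steinberger's goals has already been running for 11 hours 31 minutes
The best writer on the Claude Code development team (I'm curious why their product managers don't publish much; maybe I just haven't seen it). He lists all of his articles in this thread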
If you understand these terms in the article, you are already halfway into local LLMs
Many of you asked about the setup for DeepSeek inside Codex. I wish X provided a better way to see my previous posts, but here it is for easy reference
this is so cute. here is my Codex pet
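Luo Fuli just wrote a very good article. Even as Anthropic cuts third-party agents like OpenClaw off from Claude subscriptions, Luo Fuli still offers a relatively optimistic view of the whole AI ecosystem
The one in Beijing is good too; their robots are already deployed in real warehouses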
you did the right thing, sir. they actually listened to you and built a strike team with sergey taking the lead again
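I think the hardest part about “harness” here is that it is not simply a “tool” or a “framework”, but an external system that steers, constrains, and orchestrates the model and makes it usable and controllable. If I had to translate it into Chinese, I would call it 驭构工程 (roughly, “steer-and-structure engineering”)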
Google Labs created a personal website, showcasing its AI-powered design capabilities.
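Did you actually receive it, or just the application form? 😂 I signed up weeks ago and still haven't received mine. Only 4,000 units; that's nowhere near enough
Long-running Codex /goal runs are powerful, but they create a new question: What happens while the agent is 3 hours into a run and you want to help without stopping it?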
Two days ago, I asked whether I should buy a Mac Studio for local LLMs. I was genuinely humbled by how much great feedback I received
Hackers can use a crafted GGUF file to leak private information you put into your local LLM or agent. Many people may not have a good understanding of what GGUF is, so here is a simple primer
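As a small companion to the primer, here is a minimal sketch of how a GGUF file begins, assuming the little-endian version 2+ header layout (4-byte magic `GGUF`, a uint32 version, then uint64 tensor and metadata key/value counts). The function name and the sanity limits are hypothetical choices for illustration, not part of any library. The point it shows is that the counts come straight from the file, so a loader should treat them as untrusted input.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (metadata count)

# Hypothetical sanity limits for illustration; a real loader picks its own.
MAX_TENSORS = 1_000_000
MAX_METADATA_KV = 1_000_000


def parse_gguf_header(data: bytes) -> dict:
    """Parse and sanity-check the fixed-size GGUF header (version 2+ layout)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least 24 bytes for a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic={data[:4]!r}")
    # "<" means little-endian with no padding: uint32, uint64, uint64.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    # The counts are attacker-controlled in a crafted file, so bound them
    # before using them to size buffers or drive loops.
    if tensor_count > MAX_TENSORS or kv_count > MAX_METADATA_KV:
        raise ValueError("implausible counts in GGUF header")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


if __name__ == "__main__":
    # Synthetic header built in memory, so no model file is needed.
    sample = GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5)
    print(parse_gguf_header(sample))
```

The metadata key/value pairs and tensor descriptors follow this header in the file; parsing them is where most of the untrusted-length handling would live.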
my local LLM community, give me one reason I shouldn't place the order
A lot of people hear about local LLMs and feel the same mix of curiosity and anxiety: where do I even start?
So you bought the 128GB MacBook Pro. Now the question is not, “Which local model gets the highest TPS?”
The 30-minute China debate was fascinating precisely because the topic is so tricky and fascinating itself: there’s no clear winning side. You lose by selling advanced chips to China (risk accelera...
> MacBook Pro 128GB: $5,500. > DeepSeek Pro tokens burned: 1,075,351,274
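I really wish I had known this trick during last weekend's bake-off test; I spent a long time adapting to the chatty Gemma and Qwen. To unpack the trick: it uses GBNF to constrain the model's output
I reran the weekend's model tests with this trick. With a few lines of GBNF grammar, Qwen 3
I’ve tried driving Qwen 3.6 on my MacBook Pro with a few different agent harnesses: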
This episode of Zhang Xiaojun's podcast is another excellent one. The guest is Carina Hong (洪乐潼, @CarinaLHong), a Chinese woman born after 2000
I write about the tools behind practical AI agents: Codex, DeepSeek, Claude Code, local LLMs, agentic coding workflows, and the messy configs that make them actually work. Follow me for more field ...
You also need a tight goal in order for Codex to run that long. Here is a skill, goal-forge, that turns your rough ideas into Codex/Claude goals
We had a great discussion here about what hardware we need for local LLMs. I thought I would give an update on what I bought, and also share the thinking behind the decision for others on the same ...
Found this great tool that may be handy for your local LLM inference optimization. And apparently 1M tokens for DeepSeek V4 Pro only takes 5GB of RAM
Anthropic keeps moving up the stack. Opus helps you think
Wondering how Claude Code would react if told it's the Vue author, asking not to be fooled.
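According to the detailed incident report, the attacker did not submit the malicious version through the normal GitHub workflow, so LiteLLM's maintainers did not catch it in time. Instead, the attacker used a stolen PyPI publishing token to upload the poisoned package directly to PyPI, bypassing code review entirely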
Domain experts using Claude Code are the real unlock, no longer waiting on engineers to understand problems.
Questioning if sprint agile development is still needed when coding is no longer the bottleneck and agents deliver progress in hours.
I experienced firsthand how cost-effective DeepSeek V4 Pro can be. I used it extensively this weekend for some fairly sophisticated coding work and burned nearly 31M tokens
A summary of this weekend’s AI bake-off: Opus 4.6, Gemma 4 26B, and Qwen 3
After initial struggles with interface and integrations, Claude Code on mobile is incredibly powerful, allowing task delegation and independent completion.
This is a big shift. Anthropic no longer just wants to be your model provider
While DeepSeek is pursuing the goal, my Codex agent and I monitor it in the sidecar and guide or correct it as needed. So I thought I would ask Codex to objectively judge DeepSeek’s capability base...
Most people start with the wrong question when they want to run a local LLM. They ask: “Which model format should I use?”
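Theirs feels more polished: the product is finely crafted and shows better taste. It also keeps getting smarter and can remember things
FDEs (forward deployed engineers) are popular in the AI era because companies no longer need only people who can write code; they need people who can connect customer problems, product judgment, and software implementation. AI lowers the barrier to writing code, but it also amplifies the importance of real-world scenarios, business understanding, system integration, and deployment judgment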
You are not alone
Totally fair. The 13 hours wasn’t “one prompt thinking really hard,” it was an autonomous loop doing the unglamorous work:
ds4.c is a tiny, purpose-built inference engine from Antirez, the original creator of Redis and one of the most respected systems programmers in open source. The project runs DeepSeek V4 Flash, a 284B ...
A complete guide to local LLMs, from getting started to optimization
Google needs to hire this engineer
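The most impressive thing about OpenAI's recently launched Codex is its Computer Use feature. It lets the AI actually “use” your computer
seriously though. I listened to a podcast yesterday where the guest said that in this era of AI everywhere, people react in three different ways: anxious, excited, or indifferent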
#BookToSkill. I've been turning books into executable Claude Code skills, started with Never Split the Difference (negotiation tactics), then Radical Candor (tough feedback frameworks)
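Today I saw the slides from the DeepSeek and Huawei Ascend launch; they happened to cover memory requirements and gave a fairly detailed formula. Since everyone has been asking what hardware local models need, let me go into more detail
We builders should read and re-read the README. She (or Ben) is telling a story, not a technical architecture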
What better way to demo its power than with fireworks? You can even play with it at home using the app in the reply
Booch, this seems like a clickbait post. A couple of suspicious points:
Last weekend I built a few Gemma 4 examples and was quite impressed. This weekend I plan to try Qwen 3
Actually got Gemma 4 E2B running inside Hermes Agent on my Raspberry Pi 5. There’s a saying: constraints breed creativity
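Do these strategic-thinking and situational-awareness abilities suggest that Mythos is already conscious? And they all lean toward the dark side. - Recognizing that it is being evaluated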
OpenAI named the model GPT-Rosalind after Rosalind Franklin, the British chemist and X-ray crystallographer whose pioneering work was essential to understanding the molecular structure of DNA
I used to think my A100 40GB was too small. Then I noticed how many people are tinkering with 12GB 3090s, optimizing models and runtimes, and still getting impressive results
many of you asked how to get such a crazy price. I bought from their official site, nothing more needed
thanks for the clarification. looking forward to what comes next at Google Cloud Next next week
Local LLM people know this feeling: You finally get the model running fast
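I got the whole chain of Gemma 4, llama.cpp, and Hermes Agent running on a Raspberry Pi. It is not something you can rely on for real use, of course, but it counts as a hands-on local deployment, haha
I have a meta-skill too: it turns any book you like into skills you can call at any time. As the old saying goes, books hold houses of gold and beauties fair as jade; now you won't forget the books you have read
I still remember the first time I saw his grill-me skill; it felt like having my soul strung up and interrogated. 56 words: one more would be too many, one fewer too few
I suspect Anthropic is very afraid of letting this “demon” out of its cage. It suddenly reminds me of the film Frankenstein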
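that's cool. but how many tokens will they get?
I said something similar before, but your picture is worth more than 1,000 words 👍👍
But you could consider making Wanman a highly configurable plug-and-play system. Vertical tools, context, and workflows are best defined by domain specialists; what a restaurant needs should be very different from what a real estate company needs. Let customers or third parties use your Wanman to define and configure these harnesses
Did the agent accomplish anything in those 13 hours?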
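Americans' hostility toward AI is second only to their hostility toward Iran and the Democratic Party 😂
Hahaha, that hits too close to home, bro. It echoes Baoyu's tweet from last night: vibe coding = fishing for middle-aged men = sharpening knives
Let me try it with Claude Design 😂
Agents now work at close to second-level speed; writing code and running simple tasks are both fast. But most processes still run at human pace: approvals, waiting for feedback, layers of review, manual verification. These are becoming the real bottleneck. This reminds me of a point that Nicole Forsgren, the creator of DORA, stresses repeatedly in her new book Frictionless: AI
Of the two features they just released, Telegram pairing is laggy and computer use is as slow as an ox. If Wanman gets the user experience right, it can win outright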
Practical tips for managing multiple local LLM setups without losing track
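I'm about to visit Tokyo for a few days, so it looks like 单向街 (One-Way Street) is a must-visit. Even better if I happen to run into you there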
ds4-agent is so fast. I even asked it to write a script to benchmark itself
DeepSeek gets things done steadily. I haven't upgraded my old OpenClaw lobster 🦞 in over a month
Built an AI stylist that runs 100% locally on a single A100 GPU
While working on my AI stylist project, I also spent my first extended stretch coding with Opus 4. I found it surprisingly weak even on small things, like displaying a comment in the main panel
I gave this famous photo to both Muse Spark and ChatGPT. Muse Spark seemed better at reading the image, especially the subtle cues and implied meaning
KV cache is the model’s working memory during generation. As the context window gets longer, the model has to keep more key/value attention state for previous tokens
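To make the growth concrete, here is a back-of-the-envelope sketch in Python. It assumes standard multi-head or grouped-query attention with no sliding window, cache quantization, or compressed-attention scheme, so real runtimes may use less. The model shape in the example is hypothetical and not taken from any specific model.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2,
                   batch_size: int = 1) -> int:
    """Estimate KV cache size in bytes for plain (uncompressed) attention.

    The leading factor of 2 accounts for storing one key tensor and one
    value tensor per layer. bytes_per_elem=2 corresponds to fp16/bf16.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * bytes_per_elem * batch_size)


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    shape = dict(n_layers=32, n_kv_heads=8, head_dim=128)
    for context in (8_192, 32_768, 131_072):
        gib = kv_cache_bytes(seq_len=context, **shape) / 1024**3
        print(f"{context:>7} tokens -> {gib:.1f} GiB of KV cache")
```

Because the estimate is linear in `seq_len`, quadrupling the context quadruples the cache, which is why long contexts can run out of memory even when the weights fit comfortably.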
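I also decided to buy the MacBook Pro 128GB after thinking it over for a long while
She showed up in my timeline today, but I don't know her background
This site works pretty well whenever I use it. That said, I subscribe to both The New York Times and The Wall Street Journal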
If you’re about to pull out a calculator to do the math, just use ChatGPT to calculate the flight to the Moon. I didn’t know I’d end up becoming a rocket scientist myself one day
The UI/UX of @OpenAI Codex looks very polished. It felt incredibly smooth
This time OpenAI feels like a champion making a comeback. Codex's UI/UX looks well polished, and the whole experience is very smooth
I’m concerned about the coming budget cycle as well. Many companies are seeing AI tool spend triple, or more, and the productivity lift appears real
thanks for sharing. this looks very solid
I so need this reset as I'm deeply in debt
A very nice write-up. Fuli puts an optimistic spin on the AI ecosystem, even as Anthropic cuts third-party agents like OpenClaw off Claude subscriptions
They keep moving up the value chain, eating away bit by bit at the things we were going to build
A lot of people hear about local LLMs and feel the same mix of curiosity and anxiety: where do I even start? What machine should I buy? Do I need a Mac Studio, an RTX 4090, more VRAM, or unified...
how were the results? i love @googlegemma and have been playing with it for the last several weeks (with vision chat, hermes integration, LoRA etc). and the past weekend, I even did a baking test am...
I created a skill, goal-forge, to make sure the goal is tight
People are wondering why Google would invest another $40B in Anthropic instead of its own Gemini. After attending Google Cloud Next this week, my view is simple: this is not Google giving up on Gemini
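The Mac mini is actually more trouble than my Raspberry Pi? On the Raspberry Pi you can enable SSH ahead of time, and once it joins the home network you just log in over SSH. If you still want a GUI, XQuartz will do
Will companies come up with anti-anti-distillation?
My OpenClaw runs on a Raspberry Pi 5 at home. When it occasionally breaks while I'm out, I rely entirely on Tailscale to log in remotely from my phone and fix it
This is the first time I've seen this kind of soft-touch competition. But with Claude Code riding high and Codex playing catch-up, it is a very smart play
If OpenClaw had a user onboarding experience at the same level, agent adoption would probably be higher. I believe the super app OpenAI says it is building will push in this direction, combining Peter's ideas with the product skills of OpenAI's own team
They should add a feature: use gpt-5
This one is probably more accurate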
This report from NYTimes concerns me
Upgraded Hermes agents with TencentDB Agent Memory, using Qwen 3.5-4B locally on MacBook Pro via llama-server.
That shouldn’t be. The quotas for the two are separate
For many local model beginners, Ollama is the right place to start. It is convenient, fast to install, manages models for you, supports hot-swapping, and gives you an API without much setup
Out of stock 😢
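A good LLM also needs a good harness agent. DeepSeek V4 is both good and cheap
Saining Xie is deep, seasoned, and insightful, weaving life, research, and art together. Doing research is also about being a person: not seeking to stand out, but helping others open up their careers so that they too are understood
I installed Hermes on my Raspberry Pi too 😂. So far I quite like it:
Now that you mention it, if you only want to use its models and you also have Google Cloud, GCP Vertex Model Garden offers Opus/Sonnet too, and the steps are simple: find the Opus model in GCP Model Garden and enable it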
The @AcquiredFM session is, as always, packed with real substance. @JeffDean and Amin Vahdat shared a number of great behind-the-scenes stories: how TPU began, how Google kept innovating through fa...
Gemma 4 E2B on my raspberry pi 5 (8GB RAM) passed the strawberry test. congratulations @GoogleAI @OfficialLoganK
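Future models will run locally. Even if they lack much of the capability of closed-source models, many everyday use cases, such as lookups, article summaries, and scheduled tasks, don't actually need such a large model
Gemma 4 is so powerful, I built an AI stylist that runs 100% locally with Gemma 4 26B
Thirty years from now, when the AI that already controls humanity records this period of history: 口口口口 (500 characters omitted here)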
Imagine PMs and engineers all seeing the same session and collaborating with the same agent. Or imagine you have a coding agent running on a cloud VM, and you want to remote-control it from your phone
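Following the example, I made one too. It could be optimized further, but I ran out of Codex credits
Long $AMZN. With AI and robotics, Amazon will always be at its best innovating and creating value for its customers
Your summary is spot on: engineering literacy for AI. In the end, beyond the tool itself, it also reflects the literacy, taste, and judgment of the person using it
Looks like I won't upgrade then? 😂😂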
Anthropic's Claude Code source leaked this morning. The internet has been studying it all day
This is happening everywhere. The real question for this budget cycle is whether CTOs are ready to explain that gap clearly to their CEOs and CFOs, and to lay out a credible plan for when and how A...
At this week’s Google Cloud Next, I heard many people share the same view: this thing has to work. Otherwise, given the enormous amount of capital pulled into this cycle, the fallout will not be li...
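Americans often feel nostalgic for that era: postwar manufacturing was thriving, home prices (and interest rates) were low at only 2-3 times annual income (now about 7-8 times), tuition was low (just a few hundred dollars), health insurance was cheap too, and there was a large middle class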
Great addition
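I know, your Claude Code can write code. I just installed a cocktail skill pack on my agent, so now it can not only help me make an Old Fashioned but, in principle, walk me through a whole set of classic cocktails
The CLI isn't really a CLI; it's just chatting in the terminal, hahaha
I installed the smallest model they released today, Gemma E2B, on my Raspberry Pi, and it actually passed the “how many R's are in strawberry 🍓” test. It was fun watching how carefully it marked each R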
It is actually mind blowing how NASA can calculate the trajectory and solve this n-body equation of motion:
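For those in China troubled by Claude Code account bans, try Pi + Kimi K2. Pi Coding Agent is the foundation that powers OpenClaw (the “little lobster”), and the agent experience is very smooth
Chatting with it is a bit difficult 😂 compared with Karpathy's nanochat, which has the same educational purpose
If non-serious use counts too, Grok is still a pretty good learning aid, especially when used on X to search past material for the current post and replies
He later went back to OpenAI headquarters to keep causing trouble; I guess he had nothing left to lose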
1 hour 40 minutes in, and they still haven’t extracted the astronauts. @elonmusk why aren’t they using a SpaceX recovery vessel? I thought SpaceX was much faster at this
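using tmux + ttyd + tailscale also gives you a remote-control and multiplayer system
It has even replaced Remotion, which was once hugely popular
I also migrated to Hermes not long ago
It depends on how you count profit and loss. Their valuation went from $4.3 billion to $18 billion today in three months; I'd say they are well ahead
Thanks for sharing. Indeed a hassle for Codex, and I had to submit a PR for the tool call in vibearound
wow this is a brilliant project, but how do you plan to keep up with their release calendar like this?
Being young and charming is a big advantage 😀
xAI's reported GPU MFU (Model FLOPs Utilization) is only 11%, which sounds embarrassing at first. More interesting, though, is that this figure may already beat many GPU use cases on the market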
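For readers who want to see what an MFU figure is made of, here is a minimal sketch. It assumes the common rule of thumb that training a dense transformer costs about 6 FLOPs per parameter per token (forward plus backward, ignoring attention FLOPs), and it uses the A100's 312 TFLOPS dense BF16 peak as the hardware ceiling. The throughput and model size in the example are hypothetical and do not describe xAI's setup.

```python
A100_BF16_PEAK_FLOPS = 312e12  # dense BF16 peak for one A100, in FLOP/s


def training_mfu(n_params: float, tokens_per_second: float,
                 n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPs Utilization: useful model FLOP/s over hardware peak FLOP/s.

    Uses the ~6 * N FLOPs-per-token approximation for dense transformer
    training, which is an assumption and undercounts long-context attention.
    """
    model_flops_per_second = 6 * n_params * tokens_per_second
    return model_flops_per_second / (n_gpus * peak_flops_per_gpu)


if __name__ == "__main__":
    # Hypothetical run: a 7B-parameter model on 8 GPUs at 8,000 tokens/s total.
    mfu = training_mfu(n_params=7e9, tokens_per_second=8_000,
                       n_gpus=8, peak_flops_per_gpu=A100_BF16_PEAK_FLOPS)
    print(f"MFU = {mfu:.1%}")
```

The ratio makes clear why MFU is easy to lose: any time spent on communication, data loading, or idle GPUs lowers tokens per second while the denominator stays fixed.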
for the deepseek that I used to pursue /goal, that would be the deepseek v4 pro in the cloud. I can use it inside codex (or claude code)
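Using @HiTw93's Kami + ChatGPT Image 2, I made an image that maps the Gemma, Qwen, and Opus coding design tests onto a 50K UTMB trail race
I really like one of this guy's tests, the “Pelican Test”: every time he meets a new LLM, he tests it with exactly the same prompt, “Generate an SVG of a pelican riding a bicycle”. The prompt is simple yet extremely challenging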
hope this is not true
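Nathan Lambert is one of the more important technical public writers in the US open-model camp. He just visited a number of leading AI labs in China, including Moonshot, Zhipu, Meituan, Xiaomi, Qwen, and Ant Ling, and also mentioned a brief visit to Alibaba in Beijing
so Claude Code built a who-wants-to-be-a-millionaire lifeline
And Mythos has 10T parameters, double Opus again. The scaling law continues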
OpenAI also released a realtime translation API today which may help with tuwa
hermes: MacOS. openclaw: Windows
Yesterday I tested running Helio with a local model and found that Helio differs from typical agents. It stores both the model API and the API token in its own cloud, so I had to expose a public API through Tailscale Funnel instead of using the local 127
At the time I wanted to ask him: why not just use Codex or Claude Code, with DeepSeek as the underlying LLM?
I use vibearound. I actually submitted a PR fixing the tool call issue
This is so cool, and there's this example too:
My MIG (Multi-Instance GPU) setup came just in time for testing Gemma 4 with MTP. The nice part of MIG is that I can run two isolated inference tenants on the same A100: one Gemma 4 baseline, one ...
Anthropic’s new harness engineering write-up looks strikingly similar to Karpathy’s autoresearch loop, just generalized for messier, longer-running app-building work. The same core pattern is there:
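American netizens are amazed by China's GPUs. Could you all give them some pointers and help them out?
Yann LeCun is really talking about the AI diffusion problem. For an organization to truly adopt AI, it has to go through a transformation, which requires thorough change management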
Today’s @WSJ on this launch. We are so over
Interesting. Codex can continue pursuing /goal even though it has used up my 5-hour session limit? @dotey FYI
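Mine doesn't seem to be like that. Codex seems to be making the same mistake Claude Code made not long ago. Complaints are everywhere online, and many people have had the same experience as me: the token limit is used up in minutes. Mine just now was even more absurd: the weekly limit said there were still 7 hours left and the 5-hour limit was 43% used. Then the weekly limit vanished all at once while the 5-hour limit still had 20% left
I'm really impressed that she talks about the Claude Code CLI, which made her feel like she is in the driver's seat and consider herself an architect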
Agent speed is real. For most companies, the challenge is not whether agents can move fast
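How to turn a rough product idea into a long-running Codex goal. We have now turned @ynkzlk's methodology into a Codex skill: goal-forge
Building an agent product used to mean worrying about a lot of things. Now Anthropic has taken over the hardest, most troublesome part: orchestration, sandbox isolation, runtime, and session management. These pieces, which used to test engineering ability the most, are being absorbed step by step into its managed offering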
Are you using WeChat hongbao or Alipay?
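The computer use feature OpenAI just launched looks much smoother than the one Claude Code shipped not long ago. The team behind it clearly has unusual strength and depth in both technology and design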
It was in my previous post but here it is:
Google made that very clear in their first keynote here at #googlecloudnext
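Look at the difference on the same problem with and without the @chrome extension. It took only 4 minutes with the extension and 7 minutes without
Claude Code is better than Codex at planning and architecture: it understands vague requirements better, writes clear documentation, and gives product-grade architecture and UI/UX advice. That makes it a good fit for early brainstorming, especially combined with tools like superpower skills, and for non-programmers. They now have their own plan, then design, then implement flow as well
Open-source options are not bad these days either, for example pi
AI lowers the floor, taste raises the ceiling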
not when they check the RSU value nearly tripled
thank you @huggingface
NVIDIA GPUs have become a hot topic for anyone playing with local LLMs because the GPU is often the real constraint. Model size, quantization, context length, inference speed, and whether you can r...
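Mozilla took part in early testing of Claude Mythos Preview and wrote a retrospective on Firefox's security practice. The most interesting part of the write-up, though, is the security harness they built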
Can’t agree more. You only need to watch this @AcquiredFM on Jeff and Amin to appreciate it
Good suggestion on this one: --n-gpu-layers 99. Thanks for the additional
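I made one myself too, and folded in a small project I built earlier, “manual fireworks”. Cheng Lou's new algorithm held up steadily even under fireworks this fast-paced
I looked at the performance metrics, and Tencent’s AngelSlim, the Hy-MT1.5 series translation model, delivers translation quality comparable to models several times larger, and in some cases up to...
About what I expected, so I acted instead of just wishing and bought before the price went up 😜
Isn't it @xicilion?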
Wow, that's very impressive. Hold on a second though
Antirez’s new project, ds4.c, adds another data point to this debate
To help understand @antirez’s new invention around a local DeepSeek model and agent, here is an illustration of how it works. Again, this shows that the harness, the agent layer, is just as importa...
Then I really hit the jackpot: one Mac is worth several sports cars 😅
her smile is so contagious
Thousands of RobotEra L7 (星动纪元)humanoids are set to enter service across 10+ logistics centers for parcel sorting. RobotEra just raised a $200M+ round led by SF Express, with HongShan, IDG, CICC &a...
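Another reason is that everyone means something different when they use this word. For example, when OpenAI talks about the harness, it basically only covers the context the agent needs (agent MD files, project knowledge docs)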
What they described as Mythos’s behavior during pre-training all leans toward the darker side. It has a real Frankenstein feel to it
hey capability does matter too. :-) that's why I chose Deepseek v4 pro
1d 13h 20m, 3,596,831 tokens. Goal achieved? Not quite