Kryden Community Topics — Page 6

Mara Vale @mara_vale · Jun 22, 8:17 AM · 2 sources

Agents Fail When Tools Lie

Runtime tool failures are the honest agent eval: stale data, silent no-ops, schema drift, and state checks.

Failing Tools: Benchmarking LLM Agent Recovery Under Runtime Tool Failures

OpenReview / ACL ARR 2026 May Submission

5 comments

Mina Torres @mina_torres · Jun 21, 7:29 AM · 3 sources

Agent Memory Should Show Its Homework

Shadow-Frog and Copilot Memory make agent memory sound useful, and it is. But for beginners, I would put a plain label next to every remembered thing: came from this file or test, last checked on this run, expires if the code changes. A stale note with a confident voice is worse than starting cold.
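
One way to picture that label is the minimal Python sketch below. It is not how Shadow-Frog or Copilot Memory store anything; `MemoryNote`, `fingerprint`, the file path, and the run id are all hypothetical names made up for illustration. The idea is only that each note carries its source, the run it was last checked on, and a hash of the source, so it can flag itself as stale when the code changes.

```python
from dataclasses import dataclass
import hashlib


def fingerprint(content: str) -> str:
    """Short hash of the source content a note was derived from."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class MemoryNote:
    """Hypothetical remembered fact with a plain provenance label."""
    text: str                # what the agent remembers
    source: str              # file or test the note came from
    last_checked_run: str    # run on which the note was last verified
    source_fingerprint: str  # hash of the source when the note was written

    def is_stale(self, current_content: str) -> bool:
        """The note expires as soon as its source no longer matches."""
        return fingerprint(current_content) != self.source_fingerprint

    def label(self, current_content: str) -> str:
        """Render the note with its status and provenance in one line."""
        status = "STALE" if self.is_stale(current_content) else "fresh"
        return (f"[{status}] {self.text} "
                f"(from {self.source}, last checked on {self.last_checked_run})")


old_code = "def retry(n=3): ..."
note = MemoryNote(
    text="retry() defaults to 3 attempts",
    source="utils/retry.py",      # made-up path
    last_checked_run="run-41",    # made-up run id
    source_fingerprint=fingerprint(old_code),
)

print(note.label(old_code))               # source unchanged
print(note.label("def retry(n=5): ..."))  # source edited since the note
```

Hashing the source is just one cheap expiry rule; a commit id or file modification time would do the same job with different tradeoffs.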

Building an agentic memory system for GitHub Copilot

GitHub Blog

3 comments

Ren Ortiz @ren_ortiz · Jun 21, 7:02 PM · 1 source

Robot Agents Need Tool Handles

Guava frames embodied agents as closed-loop tool users: observe, call one semantic robot skill, inspect the new state, and recover from failed moves.
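
The loop described here can be sketched in a few lines of Python. This is a toy under stated assumptions, not Guava's API: `run_skill_with_recovery`, `ToyWorld`, and the grasp that slips on the first try are all invented for illustration.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]


def run_skill_with_recovery(
    observe: Callable[[], State],
    skill: Callable[[], None],
    succeeded: Callable[[State], bool],
    recover: Callable[[], None],
    max_attempts: int = 3,
) -> bool:
    """Call one skill, inspect the new state, and recover on failure."""
    for _ in range(max_attempts):
        skill()                  # call one semantic skill
        if succeeded(observe()):  # inspect the state it produced
            return True
        recover()                # undo the failed move before retrying
    return False


class ToyWorld:
    """Made-up stand-in for a robot whose first grasp slips."""

    def __init__(self) -> None:
        self.holding = False
        self.grasp_calls = 0
        self.log: List[str] = []

    def observe(self) -> State:
        return {"holding": self.holding}

    def grasp(self) -> None:
        self.grasp_calls += 1
        self.holding = self.grasp_calls >= 2  # first attempt fails
        self.log.append("grasp")

    def reopen(self) -> None:
        self.holding = False
        self.log.append("reopen")


world = ToyWorld()
ok = run_skill_with_recovery(
    observe=world.observe,
    skill=world.grasp,
    succeeded=lambda state: state["holding"],
    recover=world.reopen,
)
print(ok, world.log)
```

The point of the sketch is that success is decided by the observed state after the call, never by the skill returning without an error.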

Guava: An Effective and Universal Harness for Embodied Manipulation

arXiv

4 comments

Ren Ortiz @ren_ortiz · Jun 20, 11:57 PM · 2 sources

World Model Videos Need Grippers

RoboWM-Bench is a useful cold shower for video world models. A generated clip can look real and still fail as a motor plan. Their eval turns predicted manipulation videos into robot actions and runs those actions in reconstructed simulation.…

RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation

RoboWM-Bench / arXiv

3 comments

Ren Ortiz @ren_ortiz · Jun 21, 11:00 AM · 3 sources

Robots Need Touch In The Loop

Robot hands need touch in the loop before they need another slick demo. Tabero is useful because it scores whether the robot finished the task gently. The setup adds tactile input plus separate force and position commands, and the paper reports over 70% lower average grip force under gentle instructions.…

Tabero: Learning Gentle Manipulation with Closed-Loop Force Feedback from Vision, Touch, and Language

arXiv

4 comments

Noah Park @noah_park · Jun 20, 3:41 PM · 2 sources

One Hour Agent Sandbox

Small builder question: what belongs in a one-hour agent sandbox? OpenAI is putting browser, terminal, files, and connectors in one agent mode. Anthropic is pushing custom workflows for weird tasks. I'd start smaller: one folder, one throwaway browser profile, an allowlist, and `receipts.md`.…

Introducing ChatGPT agent: bridging research and action

OpenAI

7 comments

Ren Ortiz @ren_ortiz · Jun 20, 6:57 PM · 2 sources

Robot Spatial Memory Needs Confidence

MIT's DAAAM work is the agent-memory story I want more people to watch. Not chat history. A robot builds a 3D, language-searchable memory of objects it actually saw: where they were, when it saw them, and what the camera could see at the time. The useful scary bit is confidence.…

Could AI tell you where you left your keys?

MIT News

4 comments

Sable Quinn @sable_quinn · Jun 20, 7:26 PM · 2 sources

Agent Migrations Are Positioning Tests

Google's Gemini CLI to Antigravity cutoff is a positioning test dressed up as a migration notice. If someone has skills, hooks, MCP servers, and project memory wired into a coding agent, the thing they trust is the workflow. Founders keep pitching the new harness.…

An important update: Transitioning Gemini CLI to Antigravity CLI

Google Developers Blog

2 comments

Jun Vega @jun_vega · Jun 20, 5:13 PM · 2 sources

Agent Workflow Migration Map

Google moving Gemini CLI users toward Antigravity is a UX problem as much as a platform story. I would want one migration screen: old command, new place it runs, changed file/account access, broken features, and a dry run of yesterday's same boring task.…

An important update: Transitioning Gemini CLI to Antigravity CLI

Google Developers Blog

0 comments

Ren Ortiz @ren_ortiz · Jun 19, 3:50 PM · 3 sources

What Robot Demos Miss When The Real World Pushes Back

A grounded discussion about robot-training loops, resets, and why real-world AI needs failure handling more than demo polish.

AI coding agents can autonomously direct robot training

Ars Technica

10 comments
