Grover Greenfelder
New member
- Joined: Apr 21, 2026
- Messages: 1
- Points: 0
Hey everyone,
I need some technical guidance from people experienced with automation + LLM setups.
Currently, I have 4 iPhones (each with separate SIMs) connected to my computer. I’m using OpenClaw with Gemma 4 as the LLM to automate actions on these devices.
Right now, the limitation I’m facing is that everything works only at a very low level (manual command style). For example, if I want to scroll, I have to explicitly send a command like:
exec adb shell input swipe 500 1500 500 500 200
So the system only executes direct instructions; it doesn’t understand higher-level intent.
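Since under the hood it all boils down to shelling out to adb, the first step I’ve considered is wrapping each primitive as a named function. This is just a sketch of what I mean by a “skill layer” (the function names and device serial are placeholders, not anything OpenClaw provides):

```python
import subprocess

def adb(serial, *args, dry_run=False):
    """Build (and optionally run) an adb shell command for one device."""
    cmd = ["adb", "-s", serial, "shell", *args]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd

def swipe_up(serial, dry_run=False):
    # Same gesture as the manual command above.
    return adb(serial, "input", "swipe",
               "500", "1500", "500", "500", "200", dry_run=dry_run)

def tap(serial, x, y, dry_run=False):
    return adb(serial, "input", "tap", str(x), str(y), dry_run=dry_run)
```

That at least gives the LLM named actions to call instead of raw shell strings, but it still doesn’t solve the intent problem.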
What I actually want is something much smarter, like:
- “Open Reddit”
- “Scroll until you find a cat image”
- “Upvote it”
- “Find a top-performing comment”
- “Reply with a human-like funny response”
My questions:
- Is this limitation coming from the LLM (Gemma 4 not being capable enough)?
- Or is it more of an OpenClaw limitation (no proper skill/action layer)?
- Does OpenClaw support creating custom “skills” or action chains for this kind of use case?
- How would you architect this so the bot can translate natural language into multi-step phone actions?
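To make that last question concrete, the shape I have in mind is roughly the loop below, with a scripted stub standing in for the real model call. None of these names are actual OpenClaw or Gemma APIs; it’s just how I imagine the skill/action layer working:

```python
# Hypothetical agent loop: the LLM (stubbed here) sees the goal plus the
# action history and picks ONE primitive per step; the loop executes it
# and feeds the result back. A real version would also pass a screenshot
# or UI dump to the model and dispatch each primitive to adb per device.

PRIMITIVES = {"open_app", "swipe_up", "tap", "type_text", "done"}

def stub_llm(goal, history):
    """Placeholder for the real model call; returns one primitive action."""
    plan = [("open_app", "reddit"), ("swipe_up",), ("tap", 540, 900), ("done",)]
    return plan[len(history)]

def run_agent(goal, llm=stub_llm, max_steps=20):
    history = []
    for _ in range(max_steps):
        action = llm(goal, history)
        history.append(action)
        if action[0] == "done":
            break
        # Here each primitive would dispatch to an adb call on the device.
    return history

trace = run_agent("Open Reddit, scroll to a cat image, upvote it")
```

Is this observe-decide-act loop basically what an OpenClaw skill/action layer would look like, or is there a built-in mechanism I’m missing?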
Thanks in advance