Willis Rice
New member
- Joined
- Jun 1, 2026
- Messages
- 1
- Points
- 0
Hey everyone,
I’ve been messing around with local LLMs using Ollama for a while, but I'm hitting the point where I need to take things to the next level and I'm looking for some advice from anyone who has experience here.
I am building a custom AI backend for automated assistant bot Right now, I'm using Ollama to run the models, but I need the agent to have long-term persistent memory and a strict, consistent persona that doesn't break character.
The Problem:
Ollama is great for testing out of the box, but handling dynamic memory (saving past chats) and keeping the bot acting exactly how I want it to over long conversations is getting tricky. I'm moving toward building a custom programmatic architecture (using Python/TypeScript) to handle the prompt stitching and database memory outside of Ollama.
What I’m looking for advice on:
I’ve been messing around with local LLMs using Ollama for a while, but I'm hitting the point where I need to take things to the next level and I'm looking for some advice from anyone who has experience here.
I am building a custom AI backend for automated assistant bot Right now, I'm using Ollama to run the models, but I need the agent to have long-term persistent memory and a strict, consistent persona that doesn't break character.
The Problem:
Ollama is great for testing out of the box, but handling dynamic memory (saving past chats) and keeping the bot acting exactly how I want it to over long conversations is getting tricky. I'm moving toward building a custom programmatic architecture (using Python/TypeScript) to handle the prompt stitching and database memory outside of Ollama.
What I’m looking for advice on:
- Has anyone successfully built a persistent memory system (like SQLite or vector databases) that feeds context into a local Ollama model?
- For those running custom personas—are you just using really good dynamic system prompts, or did you actually bite the bullet and fine-tune (LoRA) your own base model?
- Are there better alternatives you'd recommend over Ollama if my main goal is full API control and raw, unfiltered outputs?