Running a large language model on your own machine keeps your prompts and data local and avoids API costs. Ollama makes it straightforward: install, pull a model, and use the CLI or API from your dev environment.
Why Run an LLM Locally?
Local models are great for code completion, refactoring ideas, and experimenting without sending code to the cloud. Latency can be lower for small models, and you’re not tied to a vendor’s availability or rate limits.
Install Ollama
Download the installer from ollama.com for Windows, macOS, or Linux. Once installed, the Ollama service runs in the background and exposes a local HTTP API on port 11434.
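If you want to confirm the service is up before wiring anything into it, here is a minimal sketch in Python (the base URL is Ollama's documented default; the helper name is my own):

```python
import urllib.error
import urllib.request

def ollama_is_up(base_url: str = "http://localhost:11434") -> bool:
    # A running Ollama service answers its root endpoint with HTTP 200.
    try:
        with urllib.request.urlopen(base_url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: the service isn't reachable.
        return False
```

It returns False rather than raising when the service isn't reachable, so it's safe to call unconditionally at app startup.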
Pull a Model
In a terminal, run ollama pull llama3.2 (or another model, such as codellama for code tasks). The first pull downloads the weights; after that, the model is available offline.
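If you'd rather script the pull step, a thin wrapper can shell out to the CLI. A sketch, assuming the ollama binary is on your PATH (the function names are mine):

```python
import shutil
import subprocess

def pull_command(name: str) -> list[str]:
    # Build the CLI invocation separately so it can be inspected or logged.
    return ["ollama", "pull", name]

def pull_model(name: str = "llama3.2") -> None:
    if shutil.which("ollama") is None:
        raise RuntimeError("ollama not found on PATH — install it from ollama.com first")
    # check=True raises CalledProcessError if the pull fails (e.g. no network).
    subprocess.run(pull_command(name), check=True)
```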
Use It from the CLI or Your App
Run ollama run llama3.2 for an interactive chat session. To integrate with Laravel or any other app, POST to http://localhost:11434/api/generate with a JSON body containing model, prompt, and stream. A simple wrapper in PHP keeps prompts and responses under your control.
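The same wrapper idea, sketched in Python with only the standard library as a language-neutral illustration (the endpoint and JSON fields are Ollama's /api/generate contract; the helper names are mine):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(prompt: str, model: str = "llama3.2") -> bytes:
    # stream=False asks for one JSON object instead of newline-delimited chunks.
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")

def generate(prompt: str, model: str = "llama3.2") -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    # Requires the Ollama service to be running locally.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]  # the generated text
```

Keeping the wrapper this thin is the point: every prompt and response passes through your own code, so logging, redaction, or caching is one line away.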
"Local LLMs put experimentation and privacy in your hands—no API key required."
Hardware Tips
Smaller models (around 7B parameters, typically 4-bit quantized) run fine on 8–16 GB of RAM; larger ones need more. A GPU speeds up inference considerably where Ollama supports it on your OS. On a Raspberry Pi or other low-spec machine, stick to very small models or fall back to the cloud for heavy use.
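As a rough rule of thumb, weight memory is parameter count times bytes per parameter, which is why quantization is what lets 7B models fit in 8 GB. A back-of-envelope sketch (weights only; runtime overhead such as the KV cache comes on top):

```python
def approx_weight_gb(params_billions: float, bits_per_param: int) -> float:
    # bytes = params * (bits / 8); divide by 1e9 for decimal gigabytes
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

# 7B at 4-bit quantization: about 3.5 GB of weights.
print(round(approx_weight_gb(7, 4), 1))   # 3.5
# The same model at 16-bit: 14 GB — which is why full precision won't fit in 8 GB of RAM.
print(round(approx_weight_gb(7, 16), 1))  # 14.0
```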