
Set Up AI Code Completions in VS Code Using Ollama


Background

There was a time when developers used to search Stack Overflow for issues or learn how to write code from official documentation sites. People still do, but I see many of us have started to rely on AI GPTs like ChatGPT, Perplexity AI, Google Gemini and more. Before all these LLMs came along, there was another super helpful tool called GitHub Copilot. It was made available as an extension in VS Code, and it seemed crazy how it automatically generated code, corrected a few mistakes and suggested code completions on the fly. The need to look things up in a browser started diminishing.

How about interacting with ChatGPT-like LLMs offline and getting code completions in your code editor without paying a dime, with everything living on your machine? There are a number of advantages to this:

  1. It is FREE, without ads & open source!
  2. It works offline without an internet connection, which means all data lives on your system.
  3. If you work at a business with stringent data-compliance requirements, something like an offline LLM could save your day!

Prerequisites

To set up Ollama & an LLM model you need a good internet connection. It may require 6 to 8 GB of data, so make sure you are connected to a high-speed network. I am assuming you already have VS Code installed on your system.

You do not need to know anything about AI, Ollama, LLMs or similar buzzwords. It might be good to read & learn about them, though!

While I will demonstrate the process below using macOS, similar steps should work on Windows or Linux.

Installing Ollama

The way I see it, Ollama is just software designed to run a server that enables you to chat with an AI model. It also helps you manage multiple AI models. It exposes API endpoints that one can utilise to build AI chatbots or AI agents.
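For instance, once Ollama is running (we will get there below), its server listens on http://localhost:11434 by default, and you can send a prompt to its generate endpoint with a plain curl call. A minimal sketch, assuming the model named here has already been pulled (covered later in this post):

# assumes Ollama is running and qwen2.5-coder:7b is pulled (see below)
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

The response comes back as JSON, which is exactly what chatbot or agent code would consume.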

Ollama Download

Download the Ollama installer from ollama.com/download after choosing your OS. For macOS it will download a zip; just extract it and move it to Applications, like we do for most applications. For Windows it will open a regular installer; just follow the usual steps and you are good to go.
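Homebrew users on macOS may also be able to grab it from the terminal with the command below. This is an alternative path I have not walked through in this post, so treat it as a sketch; the official installer above is the route this guide follows:

# alternative install via Homebrew (macOS), not covered further here
brew install ollama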

Launch the Ollama application from Spotlight (macOS) or the Start menu (Windows), and you should see a welcome screen like the one below:

Ollama Welcome Screen

Click Next; it will ask you to install the command line. Click Install.

Ollama Command Line Install

All done! It will finally show you the command to run a model (which is not yet downloaded); we will talk about this in the next section.

Ollama Run Model

You can click Finish and the Ollama popup will close; it will keep running in the background. On Mac you will see the Ollama icon at the top right in the menu bar. If you click on it, you will just see an option to quit. Don't click it yet & keep Ollama running.

Ollama Mac Menu Bar

Large Language Models (LLMs)

We know that a computer's CPU processes information in bits of 0s and 1s. But it is difficult for us humans to write instructions in 0s and 1s. So we made assemblers that take in assembly language code and convert it into 0s and 1s. Then came programming languages, which are somewhat close to our everyday spoken languages. And now we have LLMs that can understand our language and can speak maybe better than us.

The idea is to have a computer understand instructions / tasks in the simplest way possible. With each passing era in computer science, we are closing in on achieving this very goal. Therefore it was a kind of revolutionary moment when OpenAI made ChatGPT publicly accessible, able to interpret our language and converse back with probable solutions, code, diagrams, pictures (DALL·E), and much more. As of February 2025, one can just talk with ChatGPT like how we talk with anyone over the phone!!

So LLMs are basically machine learning models trained on very large corpora, thousands or even millions of gigabytes worth of textual information. That is why they are referred to as "Large" Language Models. Because these models have gone through such vast examples of text, they can relate to many of those examples and respond with relevant information when we ask something. I suggest you read more about it in this Cloudflare blog post.

Let us get an LLM onto your computer. Open Terminal / Command Prompt & type the following (if the terminal was already open, I suggest restarting it so that the ollama command is picked up on your PATH):

ollama help

This will list all available commands we can run. Notice a command called pull; let us use that command to get an LLM good at coding. I prefer the qwen2.5-coder model, as you can see its evaluation results are quite impressive compared to others:

https://ollama.com/library/qwen2.5-coder
LLM models specify a parameter count like 7b, 14b, 32b... the b stands for billion. The Ollama GitHub repository has this note: "You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models."

I have considered Qwen2.5 Coder with 7B parameters to be suitable for my case. To get the model, run the command below:

 ollama pull qwen2.5-coder:7b
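The download is several gigabytes, so it can take a while. At any point you can check which models are already stored locally:

# lists locally downloaded models with their sizes
ollama list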

You can see the list of all available parameter sizes for Qwen2.5 Coder here. Once the pull is complete you can start running the model with the command below:

ollama run qwen2.5-coder:7b
ollama pull & run

As soon as you type the above run command, a prompt field appears. You can chat with your LLM model now. You can even turn off your laptop's wifi and try chatting offline.
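Besides the interactive prompt, ollama run also accepts a one-shot prompt as an argument, which is handy for quick sanity checks; for example:

# one-shot prompt: prints the answer and exits
ollama run qwen2.5-coder:7b "Write a JavaScript function that reverses a string"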

Enter /bye to exit this prompt. To verify that Ollama is serving this model in the background, you can run this command:

ollama ps

This will list all running models.

ollama ps

If you don't see the model in the list, just run it again and exit with /bye again, like I have done in the above screenshot. It is important to keep your model running in the background, else the VS Code extension won't be able to utilise it.
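One thing worth knowing: as far as I understand, Ollama unloads an idle model after about five minutes by default, and requests to http://localhost:11434 will load it again on demand, so the first completion after a long pause may feel slow. The API accepts a keep_alive parameter to hold the model in memory longer; a sketch, assuming the default port:

# load the model and keep it in memory for 60 minutes
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "keep_alive": "60m"
}'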

VS Code & LLM Integration

This is the last part to achieve AI code completions in VS Code. Open the Extensions view in VS Code.

VS Code Extensions

Search for the Continue extension by continue.dev & click Install.
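If you prefer the terminal, the same extension can be installed through the VS Code CLI, assuming the code command is available on your PATH:

# Continue.continue is the extension's marketplace identifier
code --install-extension Continue.continue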

Continue VS Code Extension

Now we need to configure the Continue extension to use Ollama's Qwen2.5 Coder LLM model, which we already have running from the previous section. Click on the Continue icon on the activity bar (left-most side panel) & then click on the Open Continue Config button (gear icon) as shown in the image below.

Continue Prompt

It will open a config.json file; here we need to specify the model name, title and provider in two blocks, models and tabAutocompleteModel, like below:

...
"models": [
	{
		"title": "Qwen 2.5 Coder 7b",
		"model": "qwen2.5-coder:7b",
		"provider": "ollama"
	}
],
"tabAutocompleteModel": {
	"title": "Auto Complete",
	"model": "qwen2.5-coder:7b",
	"provider": "ollama",
	"apiKey": ""
},
...

After these changes config.json should look like this:
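For reference, here is a minimal sketch of the complete file; your generated config.json will likely carry additional defaults (for example customCommands or contextProviders), which you can leave as they are:

{
	"models": [
		{
			"title": "Qwen 2.5 Coder 7b",
			"model": "qwen2.5-coder:7b",
			"provider": "ollama"
		}
	],
	"tabAutocompleteModel": {
		"title": "Auto Complete",
		"model": "qwen2.5-coder:7b",
		"provider": "ollama",
		"apiKey": ""
	}
}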



Save the config file and close it. Now, in the Continue chat side panel, you should see Qwen 2.5 Coder 7b selected & you can type hi & hit the Enter key.

You now have a working ChatGPT-like AI chatbot inside VS Code that runs offline &, most importantly, your chats & data stay on your computer!!

Continue & Qwen 2.5 Coder LLM Integration

Code Completions

After completing the previous section, we can now test the code completions in VS Code. I created an empty folder and opened it in VS Code. I then created a new JavaScript file called app.js and wrote some code comments for Continue to generate code from:
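For instance, typing a descriptive comment like the first line below and pausing for a moment produced a suggestion similar to the rest of this snippet; your exact suggestion will differ, since the model's output is not deterministic:

// typing this comment triggers the completion below
// function to check if a number is prime
function isPrime(n) {
	if (n < 2) return false;
	for (let i = 2; i * i <= n; i++) {
		if (n % i === 0) return false; // found a divisor, not prime
	}
	return true;
}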

When code suggestions appear, press the Tab key to accept them.

I hope this was helpful, see you next time!