Today, I spent some time studying OpenWebUI, with a particular focus on how to deploy it alongside Ollama on a MacBook with unified memory. In short, it replicates essentially all the functionality of POE, ChatGPT, and Claude, adds many features of its own, and can be deployed entirely locally. The experience impressed me far more than I expected.
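For reference, here is a minimal deployment sketch, assuming Docker is installed and Ollama is already running natively on the Mac (the image tag, port mapping, and OLLAMA_BASE_URL variable follow the OpenWebUI README; double-check them against the current docs):

# Run OpenWebUI and point it at the host's Ollama instance;
# host.docker.internal resolves to the Mac host from inside the container.
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

The UI is then available at http://localhost:3000 in a browser.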
-
On the inference side, it supports Ollama and OpenAI-compatible APIs, as well as Claude, and therefore a wide variety of popular models. It also supports switching between and comparing models on the fly: for example, you can select GPT-4o and Llama 3.2, and it will display the results from both side by side. There is also a "Model Arena" feature that shows the outputs of two anonymous models and lets you vote for the better one, a crowdsourced way of evaluating model quality. In addition, it supports editing conversation history, so you can edit not only what you said but also what the AI said.
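To try the side-by-side comparison locally, you just need more than one model available in Ollama; the tags below are example models from the Ollama library (cloud models like GPT-4o instead require an API key in the settings):

# Pull two models, then select both in the chat's model dropdown
# to see their answers side by side.
ollama pull llama3.2
ollama pull qwen2.5:14b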
-
One feature that the others lack is full-text search. This is particularly useful once you have accumulated a large amount of chat history and suddenly want to find something you discussed earlier. I was quite surprised that platforms like ChatGPT and Claude do not support full-text search, but OpenWebUI does. Although it is plain string matching rather than fuzzy search, it is already very useful.
-
It supports file uploads, such as PDF files, whose content can be parsed either with OCR or a text parser. It also supports image understanding: for example, you can upload an image and use Llama 3.2 Vision for Q&A.
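For the image case, the vision model just needs to be pulled into Ollama first (llama3.2-vision is the tag Ollama publishes for the 11B vision model):

# Pull a vision-capable model; afterwards, attach an image in the chat
# and ask questions about it.
ollama pull llama3.2-vision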
-
It supports RAG (Retrieval-Augmented Generation) and lets you build your own knowledge base. One of its best traits is the high degree of customization, such as the choice of embeddings: they can come from OpenAI's API or from local BERT-based models. To my surprise, this exploration led me to a good embedding model called BGE-m3. According to benchmarks, its retrieval performance beats OpenAI's largest embedding model, even though it is just a small BERT-based model with roughly 500 million parameters. You can simply run
ollama pull bge-m3
to integrate it seamlessly into OpenWebUI and switch to it with a few mouse clicks. Overall, however, I am quite unsatisfied with the RAG Q&A results: compared to feeding all the content directly into the context window, the quality is much worse.
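Out of curiosity, you can also sanity-check the embedding model outside the UI; Ollama exposes an embeddings endpoint in its REST API (the /api/embed route and request shape below follow Ollama's API docs):

# Request an embedding vector for a piece of text from bge-m3.
curl http://localhost:11434/api/embed \
  -d '{"model": "bge-m3", "input": "retrieval-augmented generation"}'
-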
From a programming perspective, it supports features like GPT's canvas or Claude's artifacts. HTML and JavaScript code can be previewed directly in the browser, and you can edit AI-generated code files on the spot or download them in bulk. Its editor even includes auto-completion, and you can run the Python code you have written right there. The one thing I haven't figured out is how to upload a file and have it write Python code to visualize or process that file; from what I've seen online, this functionality might exist, I just haven't learned how to use it yet.
-
It also has internet search capabilities, allowing you to call APIs like Google, Bing, or DuckDuckGo for searches.
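Web search is switched on in the admin settings, and it can also be enabled at startup through environment variables. The variable names below are from my reading of the OpenWebUI docs and may differ between versions, so treat this as a sketch:

# Enable web search with DuckDuckGo, which needs no API key
# (variable names may vary by OpenWebUI version).
docker run -d -p 3000:8080 \
  -e ENABLE_RAG_WEB_SEARCH=true \
  -e RAG_WEB_SEARCH_ENGINE=duckduckgo \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main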
-
It features voice capabilities, both batch-style, i.e. standard speech recognition and read-aloud, and direct chat-like interaction similar to ChatGPT's voice mode, though real-time conversation is not yet supported. Like the other features, the voice side offers extensive customization, such as using various local speech recognition models, including distil-whisper-large-v3, and a range of offline TTS models, like CMU's BERT-based series, which perform surprisingly well. This allows the entire pipeline to run entirely locally.
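As a sketch of that customization: the local speech recognition model can be selected via an environment variable (WHISPER_MODEL is the name I found in the OpenWebUI docs; distil-large-v3 is the distilled Whisper model as named by faster-whisper, downloaded on first use):

# Use a distilled Whisper model for fully local speech-to-text.
docker run -d -p 3000:8080 \
  -e WHISPER_MODEL=distil-large-v3 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main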
-
OpenWebUI also supports customizing AIs, including custom prompts, private knowledge bases, and your own private agents to extend their capabilities. It even supports distributing these customized AI models, with a community where you can download, share, and review other people's custom AIs.
-
Of course, one of its most significant advantages is that it is a completely open-source, free system, enabling fully offline deployment, full customization, and total control. If you want an additional feature, you can write the code yourself. It works fine even when disconnected from the internet, especially on a MacBook, where many models run inference quite well. For instance, I run a 32B Qwen 2.5 on an M4 Max and get about 18 tokens per second, which is sufficient for coding, answering questions, and RAG.
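That throughput number is easy to reproduce from the terminal, since Ollama prints timing statistics when run with the --verbose flag (the "eval rate" line is the generation speed in tokens per second):

# Pull the 32B Qwen 2.5 model and chat with timing statistics enabled.
ollama pull qwen2.5:32b
ollama run qwen2.5:32b --verbose
-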
In summary, I used to think that open-source kits like OpenWebUI were just poor imitations of the latest closed-source AI models and products. After all, companies like OpenAI and Anthropic have taken massive investments and employ large staffs, whereas much of the open-source community works on an effectively volunteer basis, with even Facebook, for now, giving its models away. However, after trying the combination of OpenWebUI and Ollama, I found that while the open models may be slightly behind the very latest ones like o1, they are quite comparable to GPT-4o or Claude 3.5 in capability. In terms of product experience, although there are a few gaps, like the lack of real-time voice chat, it has essentially every feature the commercial vendors offer and even adds new functionality and customization options. From my usage tonight, the experience was very good, which genuinely surprised me. Perhaps I should drop my preconception in favor of closed-source commercial models and explore more open-source AI platforms; the room they leave for customization and imagination is limitless.