"Building a Desktop LLM App with cpp-httplib" (#2403)

yhirose
2026-03-21 23:31:55 -04:00
committed by GitHub
parent c2bdb1c5c1
commit 7178f451a4
35 changed files with 8889 additions and 35 deletions

---
title: "Building a Desktop LLM App with cpp-httplib"
order: 0
status: "draft"
---

Have you ever wanted to add a web API to your own C++ library, or quickly build an Electron-like desktop app? In Rust you might reach for "Tauri + axum," but in C++ it always seemed out of reach.

With [cpp-httplib](https://github.com/yhirose/cpp-httplib), [webview/webview](https://github.com/webview/webview), and [cpp-embedlib](https://github.com/yhirose/cpp-embedlib), you can take the same approach in pure C++ — and produce a small, easy-to-distribute single binary.

In this tutorial we build an LLM-powered translation app using [llama.cpp](https://github.com/ggml-org/llama.cpp), progressing step by step from "REST API" to "SSE streaming" to "Web UI" to "desktop app." Translation is just the vehicle — replace llama.cpp with your own library and the same architecture works for any application.

## Dependencies

- [llama.cpp](https://github.com/ggml-org/llama.cpp) — LLM inference engine
- [nlohmann/json](https://github.com/nlohmann/json) — JSON parser (header-only)
- [webview/webview](https://github.com/webview/webview) — WebView wrapper (header-only)
- [cpp-httplib](https://github.com/yhirose/cpp-httplib) — HTTP server/client (header-only)
![Desktop App](app.png#large-center)
If you know basic C++17 and understand the basics of HTTP / REST APIs, you're ready to start.
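To give a feel for the server side before we start, the "accept text via POST, return JSON" step can be sketched with cpp-httplib roughly as follows. This is a minimal sketch, not the tutorial's actual code: the `/translate` endpoint name, the port, and the stubbed response body are my assumptions, and the real handler is wired to llama.cpp in a later chapter.

```cpp
#include <string>

#include "httplib.h"  // cpp-httplib, single header

int main() {
  httplib::Server svr;

  // Hypothetical endpoint: accept text via POST and return a JSON body.
  svr.Post("/translate", [](const httplib::Request& req, httplib::Response& res) {
    std::string input = req.body;  // request text (JSON parsing omitted here)
    // Stub response; llama.cpp produces the real translation later.
    res.set_content("{\"translation\":\"...\"}", "application/json");
  });

  svr.listen("localhost", 8080);  // blocks, serving requests
}
```

The whole server is a `httplib::Server`, a lambda per route, and a blocking `listen()` call — which is what makes embedding it next to an existing C++ library so convenient.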
## Chapters
1. **[Set up the project](ch01-setup)** — Fetch dependencies, configure the build, write scaffold code
2. **[Embed llama.cpp and create a REST API](ch02-rest-api)** — Return translation results as JSON
3. **[Add token streaming with SSE](ch03-sse-streaming)** — Stream responses token by token
4. **[Add model discovery and management](ch04-model-management)** — Download and switch models from Hugging Face
5. **[Add a Web UI](ch05-web-ui)** — A browser-based translation interface
6. **[Turn it into a desktop app with WebView](ch06-desktop-app)** — A single-binary desktop application
7. **[Reading the llama.cpp server source code](ch07-code-reading)** — Compare with production-quality code
8. **[Making it your own](ch08-customization)** — Swap in your own library and customize