"Building a Desktop LLM App with cpp-httplib" (#2403)

yhirose
2026-03-21 23:31:55 -04:00
committed by GitHub
parent c2bdb1c5c1
commit 7178f451a4
35 changed files with 8889 additions and 35 deletions

---
title: "Building a Desktop LLM App with cpp-httplib"
order: 0
status: "draft"
---

Have you ever wanted to add a web API to your own C++ library, or quickly build an Electron-like desktop app? In Rust you might reach for "Tauri + axum," but in C++ it always seemed out of reach.

With [cpp-httplib](https://github.com/yhirose/cpp-httplib), [webview/webview](https://github.com/webview/webview), and [cpp-embedlib](https://github.com/yhirose/cpp-embedlib), you can take the same approach in pure C++ — and produce a small, easy-to-distribute single binary.

In this tutorial we build an LLM-powered translation app using [llama.cpp](https://github.com/ggml-org/llama.cpp), progressing step by step from "REST API" to "SSE streaming" to "Web UI" to "desktop app." Translation is just the vehicle — replace llama.cpp with your own library and the same architecture works for any application.

## Dependencies

- [llama.cpp](https://github.com/ggml-org/llama.cpp) — LLM inference engine
- [nlohmann/json](https://github.com/nlohmann/json) — JSON parser (header-only)
- [webview/webview](https://github.com/webview/webview) — WebView wrapper (header-only)
- [cpp-httplib](https://github.com/yhirose/cpp-httplib) — HTTP server/client (header-only)
![Desktop App](app.png#large-center)
If you know basic C++17 and understand the basics of HTTP / REST APIs, you're ready to start.
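To give a feel for the server side before we start, the "accept text via POST, return JSON" step can be sketched with cpp-httplib roughly as follows. This is a minimal sketch, not the tutorial's actual code: the `/translate` endpoint name, the port, and the stubbed response body are my assumptions, and the real handler is wired to llama.cpp in a later chapter.

```cpp
#include <string>

#include "httplib.h"  // cpp-httplib, single header

int main() {
  httplib::Server svr;

  // Hypothetical endpoint: accept text via POST and return a JSON body.
  svr.Post("/translate", [](const httplib::Request& req, httplib::Response& res) {
    std::string input = req.body;  // request text (JSON parsing omitted here)
    // Stub response; llama.cpp produces the real translation later.
    res.set_content("{\"translation\":\"...\"}", "application/json");
  });

  svr.listen("localhost", 8080);  // blocks, serving requests
}
```

The whole server is a `httplib::Server`, a lambda per route, and a blocking `listen()` call — which is what makes embedding it next to an existing C++ library so convenient.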
## Chapters
1. **[Set up the project](ch01-setup)** — Fetch dependencies, configure the build, write scaffold code
2. **[Embed llama.cpp and create a REST API](ch02-rest-api)** — Return translation results as JSON
3. **[Add token streaming with SSE](ch03-sse-streaming)** — Stream responses token by token
4. **[Add model discovery and management](ch04-model-management)** — Download and switch models from Hugging Face
5. **[Add a Web UI](ch05-web-ui)** — A browser-based translation interface
6. **[Turn it into a desktop app with WebView](ch06-desktop-app)** — A single-binary desktop application
7. **[Reading the llama.cpp server source code](ch07-code-reading)** — Compare with production-quality code
8. **[Making it your own](ch08-customization)** — Swap in your own library and customize