Rapid Prototyping of Chatbots with Streamlit and Chainlit

Rapid prototyping is the process of building simple versions of a product, and collecting regular user feedback on them, in order to quickly validate important assumptions and hypotheses and assess key risks. This approach is closely aligned with the practice of agile software development and the “build-measure-learn” loop of the Lean Startup methodology, and it can significantly reduce development costs and shorten time-to-market. Rapid prototyping is especially useful for shipping successful AI products, given the early-stage nature of the related technologies, use cases, and user expectations.

To this end, Streamlit was launched in 2019 as a Python framework that simplifies the process of prototyping AI apps that require user interfaces (UIs). Data scientists and engineers can focus on the backend parts (e.g., training an ML model and exposing a prediction endpoint via an API), and with only a few lines of Python code, Streamlit can spin up a user-friendly, customizable UI. Chainlit, also a Python framework, was launched more recently in 2023 to specifically address pain points in prototyping conversational AI applications (i.e., chatbots). While Streamlit and Chainlit are similar in some ways, there are also important differences. In this article, we will examine the pros and cons of both frameworks by building end-to-end demo chatbot applications, and provide practical recommendations.

Note: All figures in the following sections have been created by the author of this article.

End-to-End Chatbot Demos

Local Setup

For simplicity, we will build the demo applications so that they can easily be tested in a local environment using open-source large language models (LLMs) accessed via Ollama, a tool for downloading, managing, and interacting with such models on one’s local machine in a user-friendly way.

Of course, the demos can later be modified for use in production, e.g., by leveraging the latest LLMs offered by the likes of OpenAI or Google, and by deploying the chatbot on a commonly used hyperscaler such as AWS, Azure, or GCP. All implementation steps below have been tested on macOS Sequoia 15.6.1, and should be roughly similar on Linux and Windows.

Download and install Ollama from the official Ollama website. Check that the installation was successful by running this command in the Terminal:

ollama --version

We will use Google’s lightweight Gemma model with 2B parameters (available in Ollama as gemma:2b), which can be downloaded with this command:

ollama pull gemma:2b

The model file size is around 1.7 GB, so the download might take a few minutes depending on your internet connection. Verify that the model has been downloaded using this command:

ollama list

This will show all the models that have been downloaded via Ollama so far.

Next, we will set up the project directory using uv, a fast and user-friendly project management tool for Python. Follow the official installation instructions to install uv, and verify the installation using this command:

uv --version

Initialize a project directory called chatbot-demos at a suitable location on your local machine like this:

uv init --bare chatbot-demos

Without specifying the --bare option, uv would have created some standard artifacts during initialization, such as main.py, README.md, and a Python version pin file, but these are not needed for our demos. The minimal process only creates a pyproject.toml file.

In the chatbot-demos project directory, create a requirements.txt file with the following dependencies:

chainlit==2.7.2
ollama==0.5.3
streamlit==1.49.1

Now create a virtual Python 3.12 environment inside the project directory, activate the environment, and install the dependencies:

uv venv --python=3.12 
source .venv/bin/activate
uv add -r requirements.txt

Check that the dependencies have been installed:

uv pip list
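
At this point, you can optionally run a quick smoke test to confirm that the ollama Python package can reach the locally pulled model. The snippet below is a minimal sketch (the file name quick_check.py is just a suggestion); it assumes Ollama is running and gemma:2b has been downloaded as described above:

# quick_check.py -- optional smoke test of the Ollama Python client
import ollama

# One-off, non-streaming generation call against the locally pulled model
response = ollama.generate(model="gemma:2b", prompt="Say hello in one short sentence.")
print(response["response"])

If this prints a short greeting, the local setup is working and we can move on to the application code.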

We will implement a class called LLMClient to encapsulate the backend functionality, which can be decoupled from the UI-centric functionality in which frameworks like Streamlit and Chainlit differentiate themselves. For example, LLMClient could take care of tasks such as choosing between LLM providers, executing LLM calls, interacting with external databases for retrieval-augmented generation (RAG), and logging the conversation history for later analysis. Here is an example implementation of LLMClient, kept in a file called llm_client.py:

import logging
import time
from datetime import datetime, timezone
from typing import List, Dict, Optional, Callable, Any, Generator
import os
import ollama

LOG_FILE = os.path.join(os.path.dirname(__file__), "conversation_history.log")

logger = logging.getLogger("conversation_logger")
logger.setLevel(logging.INFO)

if not logger.handlers:
    fh = logging.FileHandler(LOG_FILE, encoding="utf-8")
    fmt = logging.Formatter("%(asctime)s - %(message)s")
    fh.setFormatter(fmt)
    logger.addHandler(fh)

class LLMClient:
    def __init__(
        self,
        provider: str = "ollama",
        model: str = "gemma:2b",
        temperature: float = 0.2,
        retriever: Optional[Callable[[str], List[str]]] = None,
        feedback_handler: Optional[Callable[[Dict[str, Any]], None]] = None,
        logger: Optional[Callable[[Dict[str, Any]], None]] = None
    ):
        self.provider = provider
        self.model = model
        self.temperature = temperature
        self.retriever = retriever
        self.feedback_handler = feedback_handler
        self.logger = logger or self.default_logger

    def default_logger(self, data: Dict[str, Any]):
        # Use the module-level file logger; the root logger has no handler configured here,
        # so logging.info() would silently drop these records
        logger.info(f"[LLMClient] {data}")

    def _format_messages(self, messages: List[Dict[str, str]]) -> str:
        return "\n".join(f"{m['role'].capitalize()}: {m['content']}" for m in messages)

    def _stream_provider(self, prompt: str, temperature: float) -> Generator[str, None, None]:
        if self.provider == "ollama":
            for chunk in ollama.generate(
                model=self.model,
                prompt=prompt,
                stream=True,
                options={"temperature": temperature}
            ):
                yield chunk.get("response", "")
        else:
            raise ValueError(f"Streaming not implemented for provider: {self.provider}")

    def stream_generate(
        self,
        messages: List[Dict[str, str]],
        on_token: Callable[[str], None],
        temperature: Optional[float] = None
    ) -> Dict[str, Any]:
        start_time = time.time()

        if self.retriever:
            query = messages[-1]["content"]
            docs = self.retriever(query)
            if docs:
                context_str = "\n".join(docs)
                messages = [{"role": "system", "content": f"Use this context:\n{context_str}"}] + messages

        prompt = self._format_messages(messages)
        assembled_text = ""
        temp_to_use = temperature if temperature is not None else self.temperature

        try:
            for token in self._stream_provider(prompt, temp_to_use):
                assembled_text += token
                on_token(token)
        except Exception as e:
            assembled_text = f"Error: {e}"

        latency = time.time() - start_time

        result = {
            "text": assembled_text,
            "timestamp": datetime.now(timezone.utc),
            "latency": latency,
            "provider": self.provider,
            "model": self.model,
            "temperature": temp_to_use,
            "messages": messages
        }

        self.logger({
            "event": "llm_stream_call",
            "provider": self.provider,
            "model": self.model,
            "temperature": temp_to_use,
            "latency": latency,
            "prompt": prompt,
            "response": assembled_text
        })

        return result

    def record_feedback(self, feedback: Dict[str, Any]):
        if self.feedback_handler:
            self.feedback_handler(feedback)
        else:
            self.logger({"event": "feedback", **feedback})
    
    def log_interaction(self, role: str, content: str):
        logger.info(f"{role.upper()}: {content}")
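
Although LLMClient is meant to be driven by the Streamlit and Chainlit apps below, it can also be exercised on its own, which is handy for debugging the backend in isolation. The following is a minimal, hypothetical usage sketch (the toy_retriever and print_feedback helpers are illustrative and not part of the demos) showing how a retriever and a feedback handler can be plugged in:

# try_llm_client.py -- illustrative standalone usage of LLMClient (not part of the demos)
from llm_client import LLMClient

def toy_retriever(query: str):
    # A real RAG setup would query a vector store; here we return a canned snippet
    if "fahrenheit" in query.lower():
        return ["The Celsius-to-Fahrenheit formula is F = C * 9/5 + 32."]
    return []

def print_feedback(feedback):
    print("Feedback received:", feedback)

client = LLMClient(
    provider="ollama",
    model="gemma:2b",
    retriever=toy_retriever,
    feedback_handler=print_feedback,
)

# Stream tokens straight to the console via the on_token callback
result = client.stream_generate(
    [{"role": "user", "content": "How do I convert 20 degrees Celsius to Fahrenheit?"}],
    on_token=lambda token: print(token, end="", flush=True),
)
print(f"\nLatency: {result['latency']:.2f} s")
client.record_feedback({"rating": "up", "comment": "Correct formula"})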

Basic Streamlit Demo

Create a file called st_app_basic.py in the project directory and paste in the following code:

import streamlit as st
from llm_client import LLMClient

MAX_HISTORY = 5
llm_client = LLMClient(provider="ollama", model="gemma:2b")

st.set_page_config(page_title="Streamlit Basic Chatbot", layout="centered")
st.title("Streamlit Basic Chatbot")

if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat history
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

# User input
if prompt := st.chat_input("Type your message..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    st.session_state.messages = st.session_state.messages[-MAX_HISTORY:]
    llm_client.log_interaction("user", prompt)

    with st.chat_message("assistant"):
        response_container = st.empty()
        state = {"full_response": ""}

        def on_token(token):
            state["full_response"] += token
            response_container.markdown(state["full_response"])

        result = llm_client.stream_generate(st.session_state.messages, on_token)
        st.session_state.messages.append({"role": "assistant", "content": result["text"]})
        llm_client.log_interaction("assistant", result["text"])

Launch the app at localhost:8501 like this:

streamlit run st_app_basic.py

If the app does not open automatically in your default browser, navigate to the URL manually (http://localhost:8501). You should see a bare-bones chat interface. Enter the following question in the prompt field and hit Enter:

What is the formula to convert Celsius to Fahrenheit?

Figure 1 shows the result:

Figure 1: Initial Streamlit Q&A

Now, ask this follow-up question:

Can you implement that formula in Python?

Since our demo implementation keeps track of the conversation history for up to 5 previous messages, the chatbot will be able to associate “that formula” with the one in the preceding prompt, as shown in Figure 2 below:

Figure 2: Follow-Up Streamlit Q&A
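
As an aside, this works because LLMClient flattens the rolling message history into a plaintext transcript before each call (see _format_messages above). A minimal illustration, with the assistant reply shortened for readability:

# Illustration only: how the rolling history becomes the prompt string
messages = [
    {"role": "user", "content": "What is the formula to convert Celsius to Fahrenheit?"},
    {"role": "assistant", "content": "F = C * 9/5 + 32"},  # shortened for illustration
    {"role": "user", "content": "Can you implement that formula in Python?"},
]
prompt = "\n".join(f"{m['role'].capitalize()}: {m['content']}" for m in messages)
# The prompt now reads:
# User: What is the formula to convert Celsius to Fahrenheit?
# Assistant: F = C * 9/5 + 32
# User: Can you implement that formula in Python?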

Feel free to play around with a few more prompts. To close the app, press Control + C in the Terminal.

Basic Chainlit Demo

Create a file called cl_app_basic.py in the project directory and paste in the following code:

import chainlit as cl
from llm_client import LLMClient

MAX_HISTORY = 5
llm_client = LLMClient(provider="ollama", model="gemma:2b")

@cl.on_chat_start
async def start():
    await cl.Message(content="Welcome! Ask me anything.").send()
    cl.user_session.set("messages", [])

@cl.on_message
async def main(message: cl.Message):
    messages = cl.user_session.get("messages")
    messages.append({"role": "user", "content": message.content})
    messages[:] = messages[-MAX_HISTORY:]
    llm_client.log_interaction("user", message.content)

    state = {"full_response": ""}
    
    def on_token(token):
        state["full_response"] += token

    result = llm_client.stream_generate(messages, on_token)
    messages.append({"role": "assistant", "content": result["text"]})
    llm_client.log_interaction("assistant", result["text"])

    await cl.Message(content=result["text"]).send()

Launch the app at localhost:8000 (note the different port) like this:

chainlit run cl_app_basic.py

For the sake of comparison, we will run the same two prompts as before. The results are shown in Figures 3 and 4 below:

Figure 3: Initial Chainlit Q&A
Figure 4: Follow-Up Chainlit Q&A

As before, after playing around with a few more prompts, close the app by pressing Control + C in the Terminal.

Advanced Streamlit Demo

We will now extend the basic Streamlit demo with a persistent sidebar on the left side with a slider widget to toggle the temperature parameter of the LLM, a button to download chat history, and feedback buttons below each chatbot response (“Helpful”, “Not Helpful”). Customizing the app layout and adding global widgets can be done relatively easily in Streamlit but may be cumbersome to replicate in Chainlit — interested readers can give it a go to experience the difficulties first-hand.

Here is the extended Streamlit app, kept in a file called st_app_advanced.py:

import streamlit as st
from llm_client import LLMClient
import json

MAX_HISTORY = 5
llm_client = LLMClient(provider="ollama", model="gemma:2b")

st.set_page_config(page_title="Streamlit Advanced Chatbot", layout="wide")
st.title("Streamlit Advanced Chatbot")

# Sidebar controls
st.sidebar.header("Model Settings")
temperature = st.sidebar.slider("Temperature", 0.0, 1.0, 0.2, 0.1)  # min, max, default, increment size
st.sidebar.download_button(
    "Download Chat History",
    data=json.dumps(st.session_state.get("messages", []), indent=2),
    file_name="chat_history.json",
    mime="application/json"
)

if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat history
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

# User input
if prompt := st.chat_input("Type your message..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    st.session_state.messages = st.session_state.messages[-MAX_HISTORY:]
    llm_client.log_interaction("user", prompt)

    with st.chat_message("assistant"):
        response_container = st.empty()
        state = {"full_response": ""}

        def on_token(token):
            state["full_response"] += token
            response_container.markdown(state["full_response"])

        result = llm_client.stream_generate(
            st.session_state.messages,
            on_token,
            temperature=temperature
        )
        llm_client.log_interaction("assistant", result["text"])
        st.session_state.messages.append({"role": "assistant", "content": result["text"]})

        # Feedback buttons: use on_click callbacks so that clicks are still recorded,
        # since Streamlit runs widget callbacks before the rerun that the click triggers
        col1, col2 = st.columns(2)
        col1.button("Helpful", key=f"up_{len(st.session_state.messages)}",
                    on_click=llm_client.record_feedback,
                    args=({"rating": "up", "comment": "User liked the answer"},))
        col2.button("Not Helpful", key=f"down_{len(st.session_state.messages)}",
                    on_click=llm_client.record_feedback,
                    args=({"rating": "down", "comment": "User disliked the answer"},))

Figure 5 shows an example screenshot:

Figure 5: Demo of Advanced Streamlit Features

Advanced Chainlit Demo

Next, we will extend the basic Chainlit demo with per-message interactive actions and multimodal input handling (text and images in our case). The chat-native primitives of the Chainlit framework make it easier to implement these types of features than in Streamlit. Again, interested readers are encouraged to experience the difference by attempting to replicate the functionality using Streamlit.

Here is the extended Chainlit app, kept in a file called cl_app_advanced.py:

import os
import json
from typing import List, Dict
import chainlit as cl
from llm_client import LLMClient

MAX_HISTORY = 5
DEFAULT_TEMPERATURE = 0.2
SESSIONS_DIR = os.path.join(os.path.dirname(__file__), "sessions")
os.makedirs(SESSIONS_DIR, exist_ok=True)

llm_client = LLMClient(provider="ollama", model="gemma:2b", temperature=DEFAULT_TEMPERATURE)

def _session_file(session_name: str) -> str:
    safe = "".join(c for c in session_name if c.isalnum() or c in ("-", "_"))
    return os.path.join(SESSIONS_DIR, f"{safe or 'default'}.json")

def _save_session(session_name: str, messages: List[Dict]):
    with open(_session_file(session_name), "w", encoding="utf-8") as f:
        json.dump(messages, f, ensure_ascii=False, indent=2)

def _load_session(session_name: str) -> List[Dict]:
    path = _session_file(session_name)
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    return []

@cl.on_chat_start
async def start():
    cl.user_session.set("messages", [])
    cl.user_session.set("session_name", "default")
    cl.user_session.set("last_assistant_idx", None)

    await cl.Message(
        content=(
            "Welcome! Ask me anything."
        ),
        actions=[
            cl.Action(name="set_session_name", label="Set session name", payload={"turn": None}),
            cl.Action(name="save_session", label="Save session", payload={"turn": "save"}),
            cl.Action(name="load_session", label="Load session", payload={"turn": "load"}),
        ],
    ).send()

@cl.action_callback("set_session_name")
async def set_session_name(action):
    await cl.Message(content="Please type: /name YOUR_SESSION_NAME").send()

@cl.action_callback("save_session")
async def save_session(action):
    session_name = cl.user_session.get("session_name")
    _save_session(session_name, cl.user_session.get("messages", []))
    await cl.Message(content=f"Session saved as '{session_name}'.").send()

@cl.action_callback("load_session")
async def load_session(action):
    session_name = cl.user_session.get("session_name")
    loaded = _load_session(session_name)
    cl.user_session.set("messages", loaded[-MAX_HISTORY:])
    await cl.Message(content=f"Loaded session '{session_name}' with {len(loaded)} turn(s).").send()

@cl.on_message
async def main(message: cl.Message):
    if message.content.strip().startswith("/name "):
        new_name = message.content.strip()[6:].strip() or "default"
        cl.user_session.set("session_name", new_name)
        await cl.Message(content=f"Session name set to '{new_name}'.").send()
        return

    messages = cl.user_session.get("messages")

    user_text = message.content or ""
    if message.elements:
        for element in message.elements:
            if (getattr(element, "mime", "") or "").startswith("image/"):  # mime may be None
                user_text += f" [Image: {element.name}]"

    messages.append({"role": "user", "content": user_text})
    messages[:] = messages[-MAX_HISTORY:]
    llm_client.log_interaction("user", user_text)

    state = {"full_response": ""}
    msg = cl.Message(content="")

    def on_token(token: str):
        state["full_response"] += token
        cl.run_sync(msg.stream_token(token))

    result = llm_client.stream_generate(messages, on_token, temperature=DEFAULT_TEMPERATURE)
    messages.append({"role": "assistant", "content": result["text"]})
    llm_client.log_interaction("assistant", result["text"])

    msg.content = state["full_response"]
    await msg.send()
    
    turn_idx = len(messages) - 1
    cl.user_session.set("last_assistant_idx", turn_idx)

    await cl.Message(
        content="Was this helpful?",
        actions=[
            cl.Action(name="thumbs_up", label="Yes", payload={"turn": turn_idx}),
            cl.Action(name="thumbs_down", label="No", payload={"turn": turn_idx}),
            cl.Action(name="save_session", label="Save session", payload={"turn": "save"}),
        ],
    ).send()

@cl.action_callback("thumbs_up")
async def thumbs_up(action):
    turn = action.payload.get("turn")
    llm_client.record_feedback({"rating": "up", "turn": turn})
    await cl.Message(content="Thanks for your feedback!").send()

@cl.action_callback("thumbs_down")
async def thumbs_down(action):
    turn = action.payload.get("turn")
    llm_client.record_feedback({"rating": "down", "turn": turn})
    await cl.Message(content="Thanks for your feedback.").send()

Figure 6 shows an example screenshot:

Figure 6: Demo of Advanced Chainlit Features

Practical Guidance

As the previous section demonstrates, it is possible to rapidly prototype simple chatbot applications with both Streamlit and Chainlit. The basic demos that we implemented share a few architectural traits: the calls to Ollama and the conversation logging were abstracted away in the LLMClient class, the context size was limited via the MAX_HISTORY constant, and the conversation history was serialized into a plaintext chat format. As the advanced demos show, however, the scope of each framework differs, which entails certain pros and cons depending on the use case; we discuss these below along with practical recommendations.

Whereas Streamlit is a general-purpose framework for data-centric, interactive web apps, Chainlit is focused on building and deploying conversational AI apps. Chainlit may therefore make more sense if the chatbot is central to the prototype; as the above code examples illustrate, it takes care of several boilerplate operational details out of the box (e.g., typing indicators, message streaming, and markdown/code rendering). But if the chatbot is embedded in a larger AI product, Streamlit may cope better with the larger application scope (e.g., combining the chat interface with data visualizations, dashboards, global widgets, and custom layouts).
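
To make this concrete, here is a minimal sketch of how a Streamlit chat page can sit next to a small latency dashboard in the sidebar, assuming you reuse llm_client.py from above (the latency tracking itself is illustrative and not part of the earlier demos):

# st_app_dashboard.py -- hypothetical sketch: chat plus a tiny sidebar dashboard
import streamlit as st
from llm_client import LLMClient

llm_client = LLMClient(provider="ollama", model="gemma:2b")
st.title("Chatbot with Latency Dashboard")

if "messages" not in st.session_state:
    st.session_state.messages = []
if "latencies" not in st.session_state:
    st.session_state.latencies = []

# Render the chat history
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if prompt := st.chat_input("Type your message..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("assistant"):
        container = st.empty()
        state = {"text": ""}

        def on_token(token):
            state["text"] += token
            container.markdown(state["text"])

        result = llm_client.stream_generate(st.session_state.messages, on_token)
        st.session_state.messages.append({"role": "assistant", "content": result["text"]})
        st.session_state.latencies.append(result["latency"])

# Dashboard widgets live in the sidebar, right next to the chat
st.sidebar.header("Session Stats")
st.sidebar.metric("Chat turns", len(st.session_state.latencies))
if st.session_state.latencies:
    st.sidebar.line_chart(st.session_state.latencies)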

Furthermore, the conversational elements of AI applications may need to be handled asynchronously to ensure a good user experience (UX), since messages can arrive at any time and need to be processed quickly while other tasks are in progress (e.g., calling another API or streaming model output). Chainlit makes it easy to prototype asynchronous chat logic using Python’s async and await keywords, so the app can handle concurrent operations without blocking the UI. The framework takes care of low-level details such as managing WebSocket connections (no custom polling is needed), and whenever an event is triggered (e.g., a message is sent, a token is streamed, or state changes), Chainlit’s event handling automatically updates the UI as required. By contrast, Streamlit follows a synchronous execution model in which the app script reruns from top to bottom on each user interaction; for complex apps that need to juggle multiple concurrent processes, Chainlit may therefore allow for a smoother UX than Streamlit.
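
As a small illustration of what this buys you, the following hypothetical Chainlit snippet awaits two slow coroutines concurrently inside a message handler (fetch_context and fetch_user_profile are stand-ins for real calls such as retrieval or profile lookups), and the chat UI keeps responding while they run:

# cl_async_sketch.py -- hypothetical sketch of concurrent work in a Chainlit handler
import asyncio
import chainlit as cl

async def fetch_context(query: str) -> str:
    await asyncio.sleep(1.0)  # stand-in for a slow retrieval call
    return f"context for '{query}'"

async def fetch_user_profile() -> str:
    await asyncio.sleep(1.0)  # stand-in for a slow profile lookup
    return "default profile"

@cl.on_message
async def main(message: cl.Message):
    # Both coroutines run concurrently, so the total wait is roughly 1 second, not 2,
    # and Chainlit's event loop keeps the chat UI responsive in the meantime
    context, profile = await asyncio.gather(
        fetch_context(message.content),
        fetch_user_profile(),
    )
    await cl.Message(content=f"Retrieved {context} using {profile}.").send()

Replicating this pattern in Streamlit typically requires extra work, such as running an event loop or background threads within the script's rerun model.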

Finally, beyond the limitations that come with focusing primarily on chat-based applications, Chainlit was released a few years after Streamlit, so it is currently less technically mature and has a smaller developer community; e.g., fewer third‑party extensions, community‑contributed examples, and troubleshooting resources are available at the moment. Although Chainlit is evolving rapidly and gaps are actively being addressed, developers may encounter occasional breaking changes between versions, less comprehensive documentation for advanced use cases, and limited integration guidance for certain deployment environments. Product teams that still wish to prototype chatbot-centric AI applications using Chainlit due to potential long-term architectural benefits should thus be prepared to make some additional short-term investments in custom development, experimentation, and direct engagement with the framework maintainers and relevant community forums to resolve issues and request additional functionality.

