
Harshit Raj · April 2025
Gen AI · Fine-Tuning · Full Stack Development · Microservices

Anton - Advanced Conversational AI Platform

Overview

Live Demo: Anton as Aanya on Telegram

Anton is a sophisticated conversational AI assistant. It leverages modern AI techniques to provide natural, context-aware conversations through a Telegram interface. The system combines retrieval-augmented generation (RAG), vector databases, and fine-tuned large language models to create human-like conversational experiences with a unique, evolving personality.

Architecture

The platform is built on a microservice architecture with four main components:

  1. Server (Node.js): The original Node.js backend, handling Telegram integration (via gram.js) and message processing
  2. Server (Flask): A newer Python backend that provides an alternative implementation, using Telethon for Telegram integration
  3. Llama: The AI service that manages embeddings, vector search, and response generation
  4. Landing Page: A React frontend showcasing the platform's capabilities

Key Technologies

  • Frontend: React, Vite, TailwindCSS, PostCSS
  • Backend:
    • Node.js: Express.js
    • Python: Flask, Flask-CORS
  • Database: MongoDB, Pinecone (Vector DB for both storage and search)
  • AI/ML: Google Gemini 2.0 Flash (fine-tuned on 1M+ conversations), Vertex AI
  • Messaging:
    • Node.js: gram.js (Telegram MTProto API client)
    • Python: Telethon (Telegram client library)
  • Deployment: Google Cloud Run
  • DevOps: CI/CD Pipeline with GitHub Actions

Core Features

Advanced Conversational AI

  • Context-aware responses using RAG (Retrieval-Augmented Generation)
  • Personality modeling that adapts and evolves through user interactions
  • Natural language processing with emotion detection
  • Hinglish support (English + Hindi) for Indian users
  • Chain-of-thought processing via multiple chained LLMs

Telegram Integration

  • Seamless connection with Telegram via gram.js MTProto client (Node.js) or Telethon (Python)
  • Real-time message processing and response generation
  • Media handling (photos, videos, documents, etc.)
  • Command recognition and processing

Vector Search & Retrieval

  • Document embedding using Google's Gemini models
  • Semantic search with Pinecone vector database
  • Efficient top-K retrieval for relevant context (sketched below)
  • Progressive context building from user interactions
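
A minimal sketch of the retrieval step, using the Pinecone Node.js SDK. The index name anton-messages, the topK default, and the metadata shape are illustrative assumptions rather than values from the codebase:

// Sketch: top-K semantic retrieval with Pinecone (names/values are assumptions)
const { Pinecone } = require("@pinecone-database/pinecone");

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const index = pc.index("anton-messages"); // hypothetical index name

async function retrieveContext(queryVector, topK = 5) {
  // Fetch the topK stored messages most similar to the query embedding
  const results = await index.query({
    vector: queryVector,
    topK,
    includeMetadata: true, // metadata assumed to carry the original text
  });
  return results.matches.map((m) => m.metadata.text);
}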

User Management

  • User history tracking and context maintenance
  • Session management for persistent conversations
  • Role-based message processing (user vs. agent; a sample stored document is sketched below)
  • Personalized experience based on conversation history
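
For illustration, a stored message document might look roughly like this; the field names are assumptions, since the actual schema is not shown:

// Sketch: hypothetical shape of a message document in MongoDB
const messageDoc = {
  userId: "123456789",        // Telegram sender id
  role: "user",               // "user" or "agent", drives role-based processing
  text: "kya chal raha hai?", // raw message text (Hinglish supported)
  timestamp: new Date(),      // ordering when rebuilding conversation context
  sessionId: "abc-123",       // groups messages into a persistent session
};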

System Components in Detail

Server Module (Node.js)

The original server component manages Telegram integration, processes incoming messages, and coordinates with the Llama service for AI-powered responses:

  • Message Handling: Processes incoming Telegram messages via gram.js (see the listener sketch below)
  • User Management: Tracks users and their conversation history
  • MongoDB Integration: Stores messages, user data, and conversation context
  • Telegram Client: Connects to Telegram using MTProto protocol via gram.js
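
A condensed sketch of how a gram.js listener of this shape is typically wired up, reusing the apiId, apiHash, and stringSession variables from the environment configuration below. The handler body is schematic; the real service also persists messages to MongoDB and calls the Llama API before replying:

// Sketch: minimal gram.js (GramJS) message listener
const { TelegramClient } = require("telegram");
const { StringSession } = require("telegram/sessions");
const { NewMessage } = require("telegram/events");

const client = new TelegramClient(
  new StringSession(process.env.stringSession),
  Number(process.env.apiId),
  process.env.apiHash,
  { connectionRetries: 5 }
);

async function main() {
  await client.connect();
  client.addEventHandler(async (event) => {
    const text = event.message.message; // incoming message text
    // ...store the message, ask the Llama service for a reply, then respond:
    await client.sendMessage(event.chatId, { message: "(generated reply)" });
  }, new NewMessage({}));
}

main();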

Server Module (Flask)

The new Python-based server provides an alternative implementation with similar functionality:

  • Flask RESTful API: Provides endpoints for message handling and Telegram listener management
  • Telethon Integration: Uses Telethon library for Telegram MTProto API connectivity
  • Message Controller: Handles message CRUD operations with MongoDB
  • Listener Management: Provides start/stop/status functionality for the Telegram message listener

Llama Module

The Llama service is responsible for the AI capabilities of the platform:

  • Embedding Generation: Creates vector representations of messages (see the embedding sketch below)
  • Vector Storage/Search: Uses Pinecone for both storing and querying semantic data
  • Context Generation: Builds prompts with relevant conversation history
  • Response Generation: Uses a fine-tuned Gemini 2.0 Flash model to generate human-like responses
  • LLM Chaining: Implements multiple LLM stages to replicate chain-of-thought reasoning
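
The embedding step might look like the following, using the @google/generative-ai client; the embedding model id is a placeholder, as the write-up does not name the specific model:

// Sketch: generating an embedding with a Gemini embedding model
const { GoogleGenerativeAI } = require("@google/generative-ai");

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY);
const embedder = genAI.getGenerativeModel({ model: "text-embedding-004" }); // placeholder model id

async function embed(text) {
  // Turn a message into a dense vector for Pinecone storage and search
  const result = await embedder.embedContent(text);
  return result.embedding.values; // plain array of floats
}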

Landing Page

The web frontend provides information about the platform and directs users to the Telegram bot:

  • Responsive Design: Mobile-first approach with TailwindCSS
  • Silicon Valley Theme: Custom color palette inspired by the show
  • Interactive Elements: Animated components and hover effects
  • CTAs: Direct links to the Telegram bot

Technical Implementation Highlights

RAG Pipeline

The retrieval-augmented generation pipeline combines four steps (a combined code sketch follows the list):

  1. Vector embedding of user messages
  2. Similarity search in Pinecone
  3. Context construction with relevant history
  4. Response generation with the fine-tuned Gemini 2.0 Flash model
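
Pulled together, the pipeline might be orchestrated like this. The sketch reuses embed and retrieveContext from the earlier sketches, assumes createPrompt.js exports a createPrompt(query, context) helper, and uses a placeholder id for the fine-tuned model:

// Sketch: the four RAG steps combined (helper names are assumptions)
const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash" }); // placeholder for the fine-tuned variant

async function generateReply(userMessage) {
  const vector = await embed(userMessage);            // 1. embed the user message
  const context = await retrieveContext(vector, 5);   // 2. similarity search in Pinecone
  const prompt = createPrompt(userMessage, context);  // 3. build a context-rich prompt
  const result = await model.generateContent(prompt); // 4. generate the reply
  return result.response.text();
}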

Personality Modeling

The system includes a sophisticated personality layer:

// Snippet from createPrompt.js
const PROMPT = `
You are helping create a casual, emotional conversation context for an Indian chat agent.

User's latest message: ${query}
Past chat messages: ${msgs}
Past agent's similar responses: ${agent}

What's happening:
- User is the person chatting casually (like WhatsApp/Instagram vibes).
- Agent is the chat support buddy replying casually.
- Past messages may include both user and agent texts with timestamps.
`;

Chain-of-Thought Processing

The platform chains multiple LLMs to create more natural responses (a sketch follows the list):

  1. First LLM analyzes message intent and emotion
  2. Second LLM retrieves relevant context from Pinecone
  3. Third LLM generates reasoning path (not shown to user)
  4. Final LLM creates human-like response based on reasoning
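
A schematic of such a chain, with each stage as a separate model call. The stage prompts are invented for illustration, and stage 2 (context retrieval from Pinecone) is assumed to have already produced the context argument:

// Sketch: sequential LLM chain (stage prompts are illustrative)
async function ask(prompt) {
  const result = await model.generateContent(prompt);
  return result.response.text();
}

async function chainedReply(message, context) {
  // Stage 1: analyze intent and emotion
  const intent = await ask(`Classify the intent and emotion of: ${message}`);
  // Stage 3: internal reasoning path, never shown to the user
  const reasoning = await ask(
    `Context:\n${context}\nIntent: ${intent}\nThink step by step about how to reply.`
  );
  // Final stage: the human-like reply that is actually sent
  return ask(
    `Using this reasoning without revealing it:\n${reasoning}\nWrite a casual, natural reply.`
  );
}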

Message Processing Flow

Messages flow through the system as follows (a schematic server handler is sketched after the list):

  1. Telegram client (gram.js/Telethon) receives Telegram message
  2. Server processes and stores message in MongoDB
  3. Server sends message to Llama API
  4. Llama generates embeddings and queries/updates Pinecone
  5. Llama constructs context, executes LLM chain, and generates response
  6. Server receives response and sends back to user via Telegram client
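
On the server side, steps 2 through 6 could collapse into a single handler along these lines. The /message route, the Llama /generate endpoint, and the saveMessage helper are all hypothetical, and global fetch assumes Node 18+:

// Sketch: server handler relaying a message to the Llama service
const express = require("express");
const app = express();
app.use(express.json());

app.post("/message", async (req, res) => {
  const { userId, text } = req.body;
  await saveMessage(userId, text); // 2. persist to MongoDB (hypothetical helper)
  // 3-5. Llama embeds the message, queries Pinecone, and runs the LLM chain
  const response = await fetch(`${process.env.LLAMA_URL}/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ userId, text }),
  });
  const { reply } = await response.json();
  res.json({ reply }); // 6. the caller forwards the reply via the Telegram client
});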

Performance Metrics

  • Response Time: < 2 seconds average
  • Accuracy: 85%+ contextual relevance
  • Scalability: Handles 1000+ concurrent users
  • Availability: 99.9% uptime

Future Enhancements

  • Multi-platform Support: Extending beyond Telegram to WhatsApp, Discord, etc.
  • Voice Interaction: Adding speech-to-text and text-to-speech capabilities
  • Personalization: Enhanced user preference learning
  • Multi-language Support: Expanding beyond English and Hinglish

Installation and Setup

Prerequisites

  • Node.js 16+ (for Node.js server)
  • Python 3.13+ (for Flask server)
  • MongoDB
  • Google Cloud account with Vertex AI access
  • Pinecone account
  • Telegram API credentials

Environment Configuration

Create .env files in the server, server-flask, and llama directories:

# Node.js Server .env
PORT=3000
NODE_ENV=production
MONGODB_URI=mongodb://[connection-string]
LLAMA_URL=https://llama-service-url
apiId=[telegram-api-id]
apiHash=[telegram-api-hash]
stringSession=[telegram-session]

# Flask Server .env
PORT=3000
MONGODB_URI=mongodb://[connection-string]
LLAMA_URL=https://llama-service-url
apiId=[telegram-api-id]
TELEGRAM_API_HASH=[telegram-api-hash]
TELEGRAM_BOT_TOKEN=[telegram-bot-token]
TELEGRAM_SESSION_NAME=bot_session

# Llama .env
PORT=3001
NODE_ENV=production
GOOGLE_API_KEY=[vertex-ai-api-key]
PINECONE_API_KEY=[pinecone-api-key]

Running Locally

Node.js Server Setup

# Clone the repository, then enter the project directory
cd anton

# Server setup
cd server
npm install
npm start

Flask Server Setup

# In the project root
cd server-flask
# Create and activate virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
# Install dependencies
pip install -e .
# Run the server
python index.py

Llama and Landing Page Setup

# Llama setup (in another terminal)
cd llama
npm install
npm start

# Landing page setup (in another terminal)
cd landing-page
npm install
npm run dev

Conclusion

Anton represents a sophisticated implementation of modern AI techniques for conversational applications. By combining vector databases, retrieval-augmented generation, and chained large language models, it delivers natural, contextually relevant interactions with an evolving personality. The microservice architecture provides flexibility and scalability, the fine-tuned Gemini 2.0 Flash model ensures high-quality, human-like responses grounded in a massive conversation dataset, and the newer Flask-based server offers an alternative implementation path.
