🤖 TG Bulk Downloader — `main_v4.py`

A production-grade, single-file Telegram bot built on Pyrogram + FastAPI that bulk-downloads and re-uploads entire Telegram channel ranges — preserving reply chains, handling FloodWait, and keeping you live-updated with a real-time status board.

📋 Table of Contents

Why This Exists
Architecture Overview
Features
Prerequisites
Installation
Environment Variables
Deployment
Bot Commands Reference
Pipeline Deep-Dive
Task Status Lifecycle
Live Status Board
Configuration Tuning
Fault Tolerance & Retry System
User Settings
Code Structure
Logging
Known Limitations
Download Source Files

💡 Why This Exists

HuggingFace Spaces blocks api.telegram.org (the standard Bot HTTP API), which means typical webhook bots simply cannot run there. This bot solves that by using Pyrogram over MTProto (raw TCP) — a lower-level protocol that HF Spaces does not block.

It also solves a common problem: copying large ranges of posts from one Telegram channel/group to another, including media albums, polls, stickers, voice notes, and documents — without losing message order or reply chains.

🏗 Architecture Overview

User sends /bdl <start_url> <end_url>
           │
           ▼
┌─────────────────────────────────────────────────────┐
│                  FastAPI Webhook                     │
│  POST /webhook  ─►  route handler  ─►  task launch  │
└─────────────────────────────────────────────────────┘
           │
           ▼
┌──────────────────────────────────────────────────────────────────┐
│                    Parallel Pipeline                              │
│                                                                  │
│  Phase 1 ─ FETCH   │  Phase 2 ─ DOWNLOAD  │  Phase 3 ─ UPLOAD  │
│  ─────────────────  │  ──────────────────  │  ─────────────────  │
│  Chunked metadata  │  asyncio.Semaphore   │  Ordered delivery  │
│  get_messages()    │  pool (N workers)    │  ready_queue +     │
│  batch=20 msgs     │  MAX_CONCURRENT_     │  send_pointer      │
│                    │  DOWNLOADS=3         │  MAX_UPLOAD_       │
│                    │                      │  WORKERS=2         │
│                                                                  │
│  Phase 4 ─ RETRY                                                 │
│  ────────────────                                                │
│  /retry re-runs failed/abandoned slots                           │
│  Edits placeholder messages in-place                             │
└──────────────────────────────────────────────────────────────────┘
           │
           ▼
┌─────────────────────────────────────────────────────┐
│              BatchStateManager                       │
│  Live Telegram UI — edits status every 5 seconds    │
│  delete+resend after each file (always-bottom mode) │
└─────────────────────────────────────────────────────┘

The bot uses two Pyrogram clients simultaneously:

Client	Auth type	Role
`user`	Session string (user account)	Reads source channel messages
`bot`	Bot token	Sends files to destination chat

✨ Features

Forward EVERYTHING — media (photos, videos, audio, documents), text, polls, stickers, voice notes, and forwarded messages
Preserve reply chains — maps source reply_to_msg_id to our group’s equivalent message
Ordered delivery — files always arrive in the correct sequence, even with parallel downloads
Partial file fallback — if download stalls, sends whatever bytes arrived with a ⚠️ Partial warning caption
Abandoned placeholder — on total failure, sends a text stub to hold sequence position (retryable)
/retry command — re-attempts all failed / abandoned / partial slots and edits placeholders in-place
/killall — hard-cancels all worker tasks for the active batch immediately
/skip — skips the currently stalling download slot
Live status board — edits a single Telegram message every STATE_LOG_INTERVAL seconds with real-time counters
Upload progress bar — ephemeral 📤 Uploading... message showing % completion, auto-deleted on finish
Always-bottom status — status message auto-deleted and resent after each file so it stays newest
Inline Refresh button — tap 🔄 on the status message for an instant redraw
/settings panel — toggle bot behaviour per-chat via inline keyboard
FloodWait handling — respects FloodWait and FloodPremiumWait errors with automatic backoff
Rotating log file — logs.txt (5 MB, 10 backups), downloadable via /logs

📦 Prerequisites

Python 3.10+
A Telegram API ID and API Hash from my.telegram.org
A Telegram Bot Token from @BotFather
A Pyrogram Session String for a user account (generated via pyrogram.Client)
Optionally: a HuggingFace Spaces Docker environment

🚀 Installation

1. Clone / download

git clone https://github.com/yourname/tg-bulk-downloader.git
cd tg-bulk-downloader

2. Install dependencies

pip install -r requirements.txt

requirements.txt:

pyrofork
pyleaves
tgcrypto
python-dotenv
psutil
fastapi
uvicorn[standard]

Package	Purpose
`pyrofork`	Pyrogram fork — MTProto Telegram client
`pyleaves`	Pyrogram leaves plugin system
`tgcrypto`	Native crypto acceleration for Pyrogram
`python-dotenv`	`.env` file loader for config
`psutil`	System stats for `/stats` command
`fastapi`	ASGI web framework for the webhook server
`uvicorn[standard]`	ASGI server (with uvloop + httptools)

3. Configure environment

Create a config.env file (see Environment Variables below).

4. Run

uvicorn main:app --host 0.0.0.0 --port 8000

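Both commands assume the bot file is saved as main.py, which is what the Dockerfile in the Deployment section arranges; if you keep the name main_v4.py, use main_v4:app instead. On HuggingFace Spaces, serve on port 7860: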

uvicorn main:app --host 0.0.0.0 --port 7860

🔑 Environment Variables

Create config.env or config.env.local in the project root:

# ── Required ──────────────────────────────────────────────────────────
BOT_TOKEN=123456789:AABBCCyour-bot-token-here
SESSION_STRING=BQAyour-pyrogram-session-string-here
API_ID=12345678
API_HASH=abcdef1234567890abcdef1234567890

# ── Worker concurrency ─────────────────────────────────────────────────
MAX_FETCH_WORKERS=3           # parallel metadata fetch tasks
MAX_CONCURRENT_DOWNLOADS=3   # parallel download slots
MAX_UPLOAD_WORKERS=2          # parallel Telegram upload slots

# ── Retry behaviour ────────────────────────────────────────────────────
MAX_DOWNLOAD_RETRIES=3        # per-slot download retry limit
RETRY_DELAY_SECONDS=5         # wait between retries

# ── Timeouts / safety ─────────────────────────────────────────────────
CONSECUTIVE_ERROR_LIMIT=5     # abort batch after N consecutive errors
FLOOD_WAIT_DELAY=3            # extra seconds added on FloodWait
FETCH_BATCH_SIZE=20           # messages fetched per get_messages() call

# ── UI timings ─────────────────────────────────────────────────────────
STATE_LOG_INTERVAL=5.0        # status board edit interval (seconds)
UPLOAD_PROGRESS_INTERVAL=3.0  # upload progress message edit interval
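
The values above might be read into a single config object along these lines. This is a minimal sketch, not the bot's actual PyroConf: the class name `Conf`, the `load_conf` helper, and the use of a plain mapping instead of python-dotenv are assumptions made for illustration. Only the variable names and defaults come from the listing above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Conf:
    """Hypothetical container for a few of the tuning knobs listed above."""
    max_fetch_workers: int
    max_concurrent_downloads: int
    max_upload_workers: int
    flood_wait_delay: int
    fetch_batch_size: int
    state_log_interval: float


def load_conf(env=None) -> Conf:
    """Read settings from a mapping (os.environ by default) with the documented defaults."""
    env = os.environ if env is None else env
    return Conf(
        max_fetch_workers=int(env.get("MAX_FETCH_WORKERS", 3)),
        max_concurrent_downloads=int(env.get("MAX_CONCURRENT_DOWNLOADS", 3)),
        max_upload_workers=int(env.get("MAX_UPLOAD_WORKERS", 2)),
        flood_wait_delay=int(env.get("FLOOD_WAIT_DELAY", 3)),
        fetch_batch_size=int(env.get("FETCH_BATCH_SIZE", 20)),
        state_log_interval=float(env.get("STATE_LOG_INTERVAL", 5.0)),
    )
```

Passing a dict, as in `load_conf({"MAX_UPLOAD_WORKERS": "4"})`, makes it easy to try different values without touching the real environment.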

Tip: Generate a session string with:

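from pyrogram import Client
API_ID = 12345678  # placeholder: your API ID
API_HASH = "abcdef1234567890abcdef1234567890"  # placeholder: your API hash
# Interactive login; the printed string goes into SESSION_STRING.
with Client("my_account", api_id=API_ID, api_hash=API_HASH) as app:
    print(app.export_session_string())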

☁️ Deployment

HuggingFace Spaces (Docker)

Add a Dockerfile:

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY main_v4.py main.py
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860"]

Set all environment variables as HF Secrets. The MTProto raw TCP connection used by Pyrogram is not blocked by HuggingFace, unlike the standard Bot API HTTP endpoint.

Local / VPS

# systemd service or just:
nohup uvicorn main:app --host 0.0.0.0 --port 8000 &

Webhook Setup

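After the server is running, register the Telegram webhook. Run this from a machine that can reach api.telegram.org (for example your own computer), because that host is blocked from inside HuggingFace Spaces: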

curl "https://api.telegram.org/bot<BOT_TOKEN>/setWebhook?url=https://your-domain.com/webhook"

📟 Bot Commands Reference

Command	Arguments	Description
`/start`	—	Welcome message
`/help`	—	Full command reference
`/dl`	`<post_url>`	Download & re-upload a single post
`/bdl`	`<start_url> <end_url>`	Bulk copy a range of posts
`/retry`	—	Re-attempt all failed/abandoned/partial slots
`/skip`	—	Skip the currently stalling download
`/killall`	—	Hard-cancel all workers for the active batch
`/refresh`	—	Force-redraw the status board immediately
`/stats`	—	Show CPU, RAM, disk, uptime stats
`/logs`	—	Upload `logs.txt` as a document
`/cleanup`	—	Delete all temp files in `downloads/`
`/settings`	—	Open the per-chat settings panel

`/bdl` Example

/bdl https://t.me/mychannel/100 https://t.me/mychannel/120

This copies posts #100 through #120 from mychannel to your chat.
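
Turning the two URLs into a chat name and a range of message IDs could look roughly like this. It is a sketch under assumptions: the helper names `parse_post_url` and `parse_range` are hypothetical, only public `t.me/<channel>/<id>` links are handled, and the range is treated as inclusive of both endpoints.

```python
import re

# Matches public post links such as https://t.me/mychannel/100
_POST_URL = re.compile(r"^https?://t\.me/(?P<chat>[A-Za-z0-9_]+)/(?P<msg_id>\d+)/?$")


def parse_post_url(url: str) -> tuple[str, int]:
    """Split a public post URL into (channel username, message ID)."""
    match = _POST_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not a recognised post URL: {url!r}")
    return match.group("chat"), int(match.group("msg_id"))


def parse_range(start_url: str, end_url: str) -> tuple[str, range]:
    """Return the shared channel and the inclusive range of message IDs."""
    chat_a, first = parse_post_url(start_url)
    chat_b, last = parse_post_url(end_url)
    if chat_a != chat_b:
        raise ValueError("both URLs must point at the same chat")
    if first > last:
        first, last = last, first  # tolerate reversed arguments
    return chat_a, range(first, last + 1)
```

Under the inclusive assumption, the example above yields 21 message IDs (100 to 120).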

🔬 Pipeline Deep-Dive

Phase 1 — Metadata Fetch

Messages are fetched in chunks of FETCH_BATCH_SIZE=20 using get_messages(). The fetch pool uses up to MAX_FETCH_WORKERS=3 concurrent tasks, each handling a slice of the total ID range. This avoids a single serial fetch loop on large batches.
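
The chunking step can be illustrated with plain Python. The function names below are hypothetical, and dealing chunks out round-robin is only one possible way to give each fetch task "a slice" of the range; the real bot may split the work differently.

```python
def chunk_ids(first: int, last: int, batch_size: int = 20) -> list[list[int]]:
    """Split the inclusive ID range into lists of at most batch_size IDs."""
    ids = list(range(first, last + 1))
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


def assign_chunks(chunks: list[list[int]], workers: int = 3) -> list[list[list[int]]]:
    """Deal chunks round-robin to `workers` fetch tasks (an assumed policy)."""
    return [chunks[w::workers] for w in range(workers)]
```

Each inner list would then be passed to one get_messages() call, so a range of 61 IDs needs four calls instead of 61.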

Phase 2 — Parallel Download

Each fetched message is queued into a download pool protected by an asyncio.Semaphore(MAX_CONCURRENT_DOWNLOADS). Workers grab a permit, download the file to downloads/<chat_id>/<filename>, and upon completion place the result into a ready_queue.

Download progress is tracked via a Pyrogram progress callback, updating bytes_done on the slot’s TaskState. The BatchStateManager reads these live values to render per-file progress bars in the status board.
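
The semaphore-gated pool described above can be sketched with asyncio alone. Here `fetch_one` is a hypothetical stand-in for the real Pyrogram download call, and `download_all` is an illustrative name, not a function from main_v4.py.

```python
import asyncio

MAX_CONCURRENT_DOWNLOADS = 3


async def download_all(slots, fetch_one, limit: int = MAX_CONCURRENT_DOWNLOADS):
    """Run fetch_one(slot) for every slot, with at most `limit` running at once.

    Finished work is put on a queue as (slot, result) in completion order.
    """
    sem = asyncio.Semaphore(limit)
    ready_queue: asyncio.Queue = asyncio.Queue()

    async def worker(slot):
        async with sem:                      # wait for a free permit
            result = await fetch_one(slot)   # stand-in for the real download
        await ready_queue.put((slot, result))

    await asyncio.gather(*(worker(slot) for slot in slots))
    return ready_queue
```

All workers are created up front, but only `limit` of them hold a permit at any moment, which is what caps the number of simultaneous downloads.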

Phase 3 — Ordered Upload

A dedicated sender task monitors ready_queue and a send_pointer (the index of the next expected slot). Files are uploaded in strict sequential order: if slot #5 finishes before slot #3, slot #5 waits in the queue until the pointer reaches it.

Up to MAX_UPLOAD_WORKERS=2 uploads run in parallel once ordering is satisfied. Telegram-native albums (grouped media) are batched into send_media_group() calls.
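
The ready_queue plus send_pointer idea can be shown in a few lines. This simplified sketch sends one item at a time and ignores albums and parallel upload workers; `ordered_sender` and the `send` callback are hypothetical names.

```python
import asyncio


async def ordered_sender(ready_queue: asyncio.Queue, total: int, send) -> None:
    """Deliver results in slot order regardless of the order they arrive in."""
    pending = {}        # slot index -> payload that arrived early
    send_pointer = 0    # index of the next slot allowed to go out
    while send_pointer < total:
        index, payload = await ready_queue.get()
        pending[index] = payload
        # Flush every consecutive slot that is now available.
        while send_pointer in pending:
            await send(send_pointer, pending.pop(send_pointer))
            send_pointer += 1
```

If slot 2 arrives first it simply sits in `pending` until slots 0 and 1 have been sent, which mirrors the "slot #5 waits for slot #3" behaviour described above.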

Phase 4 — Retry

/retry iterates over all FAILED, ABANDONED, and PARTIAL slots. For each:

Re-downloads the source message
Attempts to re-upload the file
On success: edits the placeholder message in-place with the actual content
On failure: increments still_failed counter
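
The retry pass above can be approximated as follows. Slots are modelled as plain dicts and `reprocess` is a hypothetical coroutine that would re-download, re-upload, and edit the placeholder; neither reflects the bot's real data structures.

```python
import asyncio

RETRYABLE = {"FAILED", "ABANDONED", "PARTIAL"}


async def retry_slots(slots, reprocess):
    """Re-run reprocess(slot) for each retryable slot.

    Returns (recovered, still_failed). `reprocess` should return True on success.
    """
    recovered = still_failed = 0
    for slot in slots:
        if slot["status"] not in RETRYABLE:
            continue
        try:
            ok = await reprocess(slot)
        except Exception:
            ok = False  # treat any error as a failed retry
        if ok:
            slot["status"] = "COMPLETED"
            recovered += 1
        else:
            slot["status"] = "FAILED"
            still_failed += 1
    return recovered, still_failed
```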

🔄 Task Status Lifecycle

PENDING ──► FETCHING ──► DOWNLOADING ──► SENDING ──► UPLOADING ──► COMPLETED
                │                │                                    │
                │                └──────────────────────────────► PARTIAL
                │                                                    │
                └───────────────────────────────────────────────► ABANDONED
                                                                     │
                                                               (use /retry)
                                                                     │
                                                               ──► COMPLETED
                                                               ──► FAILED

Status	Icon	Meaning
`PENDING`	⏳	Queued, not yet started
`FETCHING`	🔍	Fetching message metadata
`DOWNLOADING`	⬇️	Downloading file bytes
`SENDING`	📦	Downloaded, queued for upload slot
`UPLOADING`	📤	Actively uploading to Telegram
`COMPLETED`	✅	Successfully sent
`FAILED`	❌	All retries exhausted
`PARTIAL`	⚠️	Partial file sent with warning caption
`ABANDONED`	🔲	Placeholder sent; retry available
`SKIPPED`	⏭️	Skipped via `/skip`

📊 Live Status Board

The bot maintains a single Telegram message that it edits every STATE_LOG_INTERVAL seconds. The board looks like:

⚡ Batch Download — 20 posts
▓▓▓▓▓▓░░░░ 60%  •  ✅12 ⚠️1 🔲0 ❌0 ⏭️0  •  2m 14s
⬇️2 dl  📤1 up  📦0 queued

Active:
⬇️ ▓▓▓▓░░ video_clip.mp4
    45.3 MB / 112.0 MB
📤 document.pdf 8.2 MB — uploading…

Recent:
✅ photo_album.jpg 2.1 MB
✅ voice_note.ogg 0.4 MB
✅ sticker.webp

⏳ 7 post(s) queued…

/skip — skip current  •  /killall — stop all  •  /refresh — refresh now

A 🔄 Refresh inline button allows instant manual redraw.
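
For a feel of how such a board can be rendered, here is a small sketch of the bar and the first summary line. The names `progress_bar` and `fmt_time` appear in the code structure listing, but these bodies are guesses, and `header_line` is purely illustrative (it omits the per-status counters).

```python
def progress_bar(fraction: float, width: int = 10) -> str:
    """Render a fixed-width bar; fraction is clamped to the range 0..1."""
    fraction = min(max(fraction, 0.0), 1.0)
    filled = int(fraction * width + 0.5)  # round to the nearest cell
    return "▓" * filled + "░" * (width - filled)


def fmt_time(seconds: float) -> str:
    """Format elapsed seconds as '2m 14s' or '45s'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m {secs:02d}s" if minutes else f"{secs}s"


def header_line(done: int, total: int, elapsed: float) -> str:
    """Build a simplified summary line: bar, percentage, elapsed time."""
    fraction = done / total if total else 0.0
    return f"{progress_bar(fraction)} {fraction:.0%}  •  {fmt_time(elapsed)}"
```

With 12 of 20 posts done after 134 seconds, `header_line(12, 20, 134)` produces a line with the same bar, percentage, and elapsed time as the sample board.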

⚙️ Configuration Tuning

Throughput vs. Stability

Goal	Adjust
Faster bulk downloads	↑ `MAX_CONCURRENT_DOWNLOADS`
Fewer FloodWait errors	↓ `MAX_UPLOAD_WORKERS`, ↑ `FLOOD_WAIT_DELAY`
Reduce Telegram API calls	↑ `FETCH_BATCH_SIZE`
More responsive status UI	↓ `STATE_LOG_INTERVAL`
Reduce edit rate on busy batches	↑ `STATE_LOG_INTERVAL`

Memory / Disk

Downloaded files are stored temporarily in downloads/<chat_id>/ and are removed automatically after each successful upload. Use /cleanup to purge anything left behind (for example after a failed or cancelled batch). /stats uses psutil to report live disk usage.

🛡 Fault Tolerance & Retry System

FloodWait

The bot catches both FloodWait and FloodPremiumWait errors. On FloodWait(x), it waits x + FLOOD_WAIT_DELAY seconds before retrying. This applies to both downloads and uploads.
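
The wait-then-retry rule can be sketched without Pyrogram installed. `FloodWaitError` below is a stand-in for Pyrogram's FloodWait, and `call_with_flood_wait` is a hypothetical helper. Recent Pyrogram releases expose the wait as `.value`, but check the attribute name against your installed version.

```python
import asyncio

FLOOD_WAIT_DELAY = 3  # extra seconds, matching the config default


class FloodWaitError(Exception):
    """Stand-in for Pyrogram's FloodWait; `value` is the wait in seconds."""

    def __init__(self, value: int):
        super().__init__(f"flood wait of {value}s")
        self.value = value


async def call_with_flood_wait(op, max_attempts: int = 3, sleep=asyncio.sleep):
    """Await op(); on FloodWaitError sleep value + FLOOD_WAIT_DELAY, then retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await op()
        except FloodWaitError as exc:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            await sleep(exc.value + FLOOD_WAIT_DELAY)
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use the default asyncio.sleep applies.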

Consecutive Error Limit

If CONSECUTIVE_ERROR_LIMIT slots fail back-to-back, the batch is aborted to avoid hammering the API on a persistent issue.
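
One simple way to track "back-to-back" failures is a streak counter that resets on any success. The class below is a hypothetical illustration, not the bot's actual code.

```python
class ConsecutiveErrorGuard:
    """Counts consecutive failures and signals when the limit is reached."""

    def __init__(self, limit: int = 5):
        self.limit = limit
        self.streak = 0

    def record(self, success: bool) -> bool:
        """Update the streak; return True when the batch should abort."""
        self.streak = 0 if success else self.streak + 1
        return self.streak >= self.limit
```

Because a single success resets the streak, scattered failures in an otherwise healthy batch never trigger the abort.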

Partial File Fallback

If a download is interrupted mid-stream, the bot detects a truncated file and sends it anyway with a caption flagging the partial state. The slot is marked PARTIAL and is retryable.
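
Detecting a truncated file usually comes down to comparing the bytes on disk with the size Telegram reported for the media. The sketch below assumes that approach; `classify_download`, `partial_caption`, and the caption wording are all hypothetical.

```python
import os
from typing import Optional


def classify_download(path: str, expected_size: int) -> Optional[str]:
    """Return "COMPLETED", "PARTIAL", or None when nothing usable arrived."""
    actual = os.path.getsize(path) if os.path.exists(path) else 0
    if actual == 0:
        return None  # caller decides what to do (for example send a placeholder)
    if actual < expected_size:
        return "PARTIAL"
    return "COMPLETED"


def partial_caption(path: str, expected_size: int) -> str:
    """Build a warning caption for a truncated file (wording is illustrative)."""
    actual = os.path.getsize(path)
    return f"⚠️ Partial: {actual} of {expected_size} bytes received"
```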

Abandoned Slots

When a file cannot be sent (e.g. content too large, unsupported type), the bot sends a plain-text placeholder message to hold the sequence position. The slot is marked ABANDONED and can be recovered via /retry.

🎛 User Settings

Open /settings to see and toggle per-chat options:

Setting	Default	Description
📌 Status bar always at bottom	On	After each file, the status message is deleted and re-sent so it stays as the newest message

Settings are stored in-memory and reset on redeployment. Future versions may persist them.
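
An in-memory, per-chat settings store can be as small as a dict keyed by chat ID. The field and function names below are assumptions for illustration; only the class name UserSettings and the default (On) come from this document.

```python
from dataclasses import dataclass


@dataclass
class UserSettings:
    status_always_bottom: bool = True  # "Status bar always at bottom", default On


_settings: dict[int, UserSettings] = {}  # lost whenever the process restarts


def get_settings(chat_id: int) -> UserSettings:
    """Return the settings for a chat, creating defaults on first access."""
    return _settings.setdefault(chat_id, UserSettings())


def toggle_always_bottom(chat_id: int) -> bool:
    """Flip the always-bottom option and return its new value."""
    settings = get_settings(chat_id)
    settings.status_always_bottom = not settings.status_always_bottom
    return settings.status_always_bottom
```

Since the dict lives only in process memory, this also shows why the preferences reset on redeployment.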

🗂 Code Structure

Since this is intentionally a single-file bot, all components live in main_v4.py:

main_v4.py
├── Logging setup             (RotatingFileHandler, console)
├── PyroConf                  (all env var config in one place)
├── TaskStatus (Enum)         (slot state machine)
├── UserSettings              (per-chat preferences)
├── UX helpers                (fmt_size, fmt_time, progress_bar, ...)
├── TaskState (dataclass)     (per-slot mutable state)
├── BatchStateManager         (live Telegram UI, update queue)
├── Batch Registry            (_batch_registry, _batch_worker_tasks)
├── File helpers              (get_download_path, cleanup_download, ...)
├── Pyrogram clients          (user + bot)
├── Message handlers          (handle_download, handle_batch_download, ...)
│   ├── Single /dl handler
│   ├── Batch /bdl handler
│   │   ├── fetch_worker()
│   │   ├── download_worker()
│   │   ├── upload_worker()  (ordered sender)
│   │   └── status_logger()  (live UI loop)
│   └── /retry handler
├── Command routers           (/start, /help, /stats, /killall, /skip, ...)
├── FastAPI lifespan          (start/stop Pyrogram clients)
└── FastAPI webhook           (POST /webhook — routes all updates)

📝 Logging

Logs are written to both the console and logs.txt (5 MB rolling, 10 backups).

Format:

[DD-Mon-YY HH:MM:SS AM/PM - LEVEL] - function_name() - Line N: module - message

Pyrogram’s own logs are suppressed to ERROR level to reduce noise.
Send /logs in Telegram to download the current log file as a document.
Old logs.txt is deleted on each bot restart (clean slate).
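
A logging setup matching the description above might look like this with the standard library. The format string is a reconstruction from the documented layout, and `build_handlers` and `configure_logging` are hypothetical names; deleting the old logs.txt on startup is not shown.

```python
import logging
from logging.handlers import RotatingFileHandler

LOG_FORMAT = (
    "[%(asctime)s - %(levelname)s] - %(funcName)s() - "
    "Line %(lineno)d: %(module)s - %(message)s"
)
DATE_FORMAT = "%d-%b-%y %I:%M:%S %p"


def build_handlers(path: str = "logs.txt") -> list:
    """Create a 5 MB rotating file handler (10 backups) plus a console handler."""
    formatter = logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT)
    file_handler = RotatingFileHandler(path, maxBytes=5 * 1024 * 1024, backupCount=10)
    console = logging.StreamHandler()
    for handler in (file_handler, console):
        handler.setFormatter(formatter)
    return [file_handler, console]


def configure_logging(path: str = "logs.txt") -> None:
    """Install both handlers and quieten Pyrogram's own loggers."""
    logging.basicConfig(level=logging.INFO, handlers=build_handlers(path))
    logging.getLogger("pyrogram").setLevel(logging.ERROR)
```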

⚠️ Known Limitations

Settings are in-memory — toggled preferences reset on each redeploy.
No multi-batch support per chat — starting a new /bdl while one is running cancels the old one.
Large files — Telegram has a 2 GB upload limit for bots. Files exceeding this will be marked ABANDONED.
Private channels — the user account (SESSION_STRING) must be a member of the source channel.
Album ordering — grouped media (albums) are sent as a unit; individual photos within an album preserve their internal order.

📥 Download Source Files

You can download the full source package (includes main_v4.py and requirements.txt) below:

⬇️ Download tg_bulk_downloader.zip

🙏 Credits

MTProto client: Pyrofork (Pyrogram fork)
Web framework: FastAPI

Built with ❤️ for the Telegram automation community.
