Demo: https://t.me/aturretrss_bot
A social media fetch bot based on FastAPI.
Supports most mainstream social media platforms. Just send a URL to the bot and you get a permanent copy of the content.
Other microservices separated out from this project:
Download the `docker-compose.yml` file, then create a `.env` file in the same directory and set the environment variables described in the following section.
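For example, a minimal `.env` might look like this (all values are placeholders; the full variable list is documented below):

```
# Minimal example .env — placeholder values, not real credentials
BASE_URL=example.com
TELEGRAM_BOT_TOKEN=123456:replace-with-your-bot-token
TELEGRAM_CHAT_ID=123456789
PORT=10450
```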
If you want to send documents larger than 50 MB, you need to run a local Telegram Bot API server. The `docker-compose.yml` file already gives you an example; you just need to fill in `TELEGRAM_API_ID` and `TELEGRAM_API_HASH` in the yml file. If you don't need it, just comment the service out.
docker-compose up -d
The local Telegram Bot API server and the video download function are not supported in this way. If you really need these functions, you can run the Telegram Bot API server and the file export server manually.
We use Poetry as the package manager for this project. You can install it with the following command.
pip install poetry
Then, install the dependencies.
poetry install
Finally, run the server.
poetry run gunicorn -k uvicorn.workers.UvicornWorker app.main:app --preload
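If you need to control the listening address and port explicitly, Gunicorn's standard `--bind` flag should work as well (a sketch; `10450` here simply mirrors the default `PORT` documented below):

```
poetry run gunicorn -k uvicorn.workers.UvicornWorker app.main:app --preload --bind 0.0.0.0:10450
```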
Note: Many of the services require cookies to fetch content. You can export your cookies with the browser extension Get cookies.txt LOCALLY and set them as environment variables.
- `BASE_URL`: The base URL of the server. Example: `example.com`.
- `TELEGRAM_BOT_TOKEN`: The token of the Telegram bot.
- `TELEGRAM_CHAT_ID`: The chat ID of the Telegram bot.
- `PORT`: The port of the server. Default: `10450`.
- `API_KEY`: The API key for the FastAPI server. It is generated automatically if not set.
- `TELEBOT_API_SERVER_HOST`: The host of the Telegram Bot API server. Default: `telegram-bot-api`.
- `TELEBOT_API_SERVER_PORT`: The port of the Telegram Bot API server. Default: `8081`.
- `TELEGRAM_CHANNEL_ID`: The channel ID of the Telegram bot. Default: `None`.
- `TELEGRAM_CHANNEL_ADMIN_LIST`: The IDs of the users who may send messages to the target Telegram channel, separated by `,`. You cannot send messages to the channel if you are not in this list. Default: `None`.
You must set the following cookie variables if you want to fetch Twitter content.
- `TWITTER_CT0`: The `ct0` cookie of Twitter. Default: `None`.
- `TWITTER_AUTH_TOKEN`: The `auth_token` cookie of Twitter. Default: `None`.
We use the `read_only` mode of `praw` to fetch Reddit content. You still need to set the `client_id`, `client_secret`, `username`, and `password` of your Reddit API account (see the sketch after the list below).
- `REDDIT_CLIENT_ID`: The client ID of your Reddit API account. Default: `None`.
- `REDDIT_CLIENT_SECRET`: The client secret of your Reddit API account. Default: `None`.
- `REDDIT_USERNAME`: The username of your Reddit account. Default: `None`.
- `REDDIT_PASSWORD`: The password of your Reddit account. Default: `None`.
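For reference, here is a minimal sketch of what read-only `praw` usage with these credentials looks like (an illustration, not the bot's actual code; the `user_agent` string and post URL are placeholders):

```python
import os

import praw

# Build a praw client from the environment variables above.
reddit = praw.Reddit(
    client_id=os.environ["REDDIT_CLIENT_ID"],
    client_secret=os.environ["REDDIT_CLIENT_SECRET"],
    username=os.environ["REDDIT_USERNAME"],
    password=os.environ["REDDIT_PASSWORD"],
    user_agent="my-fetch-bot/0.1",  # placeholder user agent
)
reddit.read_only = True  # fetch public content without write access

# Fetch a public submission by URL (replace with a real post URL).
submission = reddit.submission(url="https://www.reddit.com/r/redditdev/comments/...")
print(submission.title)
```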
- `WEIBO_COOKIES`: The cookie of Weibo. For some unknown reason, some Weibo posts may not be accessible if you are not logged in; just copy the cookie from your browser and set it. Default: `None`.
- `XIAOHONGSHU_A1`: The `a1` cookie of Xiaohongshu. Default: `None`.
- `XIAOHONGSHU_WEBID`: The `webid` cookie of Xiaohongshu. Default: `None`.
- `XIAOHONGSHU_WEBSESSION`: The `websession` cookie of Xiaohongshu. Default: `None`.
You can set an OpenAI API key to use the transcription function (a sketch of the underlying call follows below).
- `OPENAI_API_KEY`: The API key of OpenAI. Default: `None`.
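As a rough illustration, transcription with this key boils down to a call like the following with the standard `openai` Python client (a sketch under that assumption; the bot's actual model choice and invocation may differ):

```python
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Transcribe an audio file (e.g. audio extracted from a fetched video).
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcript.text)
```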
- `AWS_ACCESS_KEY_ID`: The access key ID of Amazon S3. Default: `None`.
- `AWS_SECRET_ACCESS_KEY`: The secret access key of Amazon S3. Default: `None`.
- `AWS_S3_BUCKET_NAME`: The bucket name of Amazon S3. Default: `None`.
- `AWS_S3_REGION_NAME`: The region name of Amazon S3. Default: `None`.
- `AWS_DOMAIN_HOST`: The custom domain bound to the bucket. If it is not set, the picture upload function generates image URLs from the bucket name (typically of the form `https://<bucket>.s3.<region>.amazonaws.com/<key>`); if it is set, URLs use the custom host instead. Default: `None`.
- Bluesky (Beta, only some posts are supported)
- Threads
- Reddit (Beta, only some posts are supported)
- Quora
- WeChat Public Account Articles
- Zhihu
- Douban
- Xiaohongshu
- YouTube
- Bilibili
The HTML to Telegra.ph converter function is based on html-telegraph-poster. I separated it from this project as an independent Python package: html-telegraph-poster-v2.
The Xiaohongshu scraper is based on MediaCrawler.
The Weibo scraper is based on weiboSpider.
The Twitter scraper is based on twitter-api-client.
The Zhihu scraper is based on fxzhihu.
All of their code is licensed under the MIT license; I either used it as-is or modified it to implement certain functions. I want to express my gratitude to the projects mentioned above for their contributions.