Demo: https://t.me/aturretrss_bot
A social media fetch bot based on FastAPI.
Supports most mainstream social media platforms. Just send a URL to the bot and you get a permanent copy of the content.
Other microservices separated out from this project:
Download the `docker-compose.yml` file, then create a `.env` file in the same directory and set the environment variables described in the following section.
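For example, a minimal `.env` might look like this (all values are placeholders; the full variable list is documented below):

```
# Minimal example .env — placeholder values, not real credentials
BASE_URL=example.com
TELEGRAM_BOT_TOKEN=123456:replace-with-your-bot-token
TELEGRAM_CHAT_ID=123456789
PORT=10450
```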
If you want to send documents larger than 50 MB, you need to run a local Telegram Bot API server. The `docker-compose.yml` file already gives you an example; you just need to fill in `TELEGRAM_API_ID` and `TELEGRAM_API_HASH` in the yml file. If you don't need it, just comment the service out.
docker-compose up -d
The local Telegram Bot API server and the video download function are not supported in this way. If you really need these functions, you can run the Telegram Bot API server and the file export server manually.
We use Poetry as the package manager for this project. You can install it with the following command.
pip install poetry
Then, install the dependencies.
poetry install
Finally, run the server.
poetry run gunicorn -k uvicorn.workers.UvicornWorker app.main:app --preload
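If you need to control the listening address and port explicitly, Gunicorn's standard `--bind` flag should work as well (a sketch; `10450` here simply mirrors the default `PORT` documented below):

```
poetry run gunicorn -k uvicorn.workers.UvicornWorker app.main:app --preload --bind 0.0.0.0:10450
```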
Note: Many of the services require cookies to fetch content. You can export your cookies with the browser extension Get cookies.txt LOCALLY and set them as environment variables.
- `BASE_URL`: The base URL of the server. Example: `example.com`.
- `TELEGRAM_BOT_TOKEN`: The token of the Telegram bot.
- `TELEGRAM_CHAT_ID`: The chat ID of the Telegram bot.
- `PORT`: The port of the server. Default: `10450`.
- `API_KEY`: The API key for the FastAPI server. It is generated automatically if not set.
- `TELEBOT_API_SERVER_HOST`: The host of the Telegram Bot API server. Default: `telegram-bot-api`.
- `TELEBOT_API_SERVER_PORT`: The port of the Telegram Bot API server. Default: `8081`.
- `TELEGRAM_CHANNEL_ID`: The channel ID of the Telegram bot. Default: `None`.
- `TELEGRAM_CHANNEL_ADMIN_LIST`: The IDs of the users who may send messages to the target Telegram channel, separated by `,`. You cannot send messages to the channel if you are not in this list. Default: `None`.
You must set the following cookie variables if you want to fetch Twitter content.
- `TWITTER_CT0`: The `ct0` cookie of Twitter. Default: `None`.
- `TWITTER_AUTH_TOKEN`: The `auth_token` cookie of Twitter. Default: `None`.
We use the `read_only` mode of `praw` to fetch Reddit content. You still need to set the `client_id`, `client_secret`, `username`, and `password` of your Reddit API account (see the sketch after the list below).
- `REDDIT_CLIENT_ID`: The client ID of your Reddit API account. Default: `None`.
- `REDDIT_CLIENT_SECRET`: The client secret of your Reddit API account. Default: `None`.
- `REDDIT_USERNAME`: The username of your Reddit account. Default: `None`.
- `REDDIT_PASSWORD`: The password of your Reddit account. Default: `None`.
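For reference, here is a minimal sketch of what read-only `praw` usage with these credentials looks like (an illustration, not the bot's actual code; the `user_agent` string and post URL are placeholders):

```python
import os

import praw

# Build a praw client from the environment variables above.
reddit = praw.Reddit(
    client_id=os.environ["REDDIT_CLIENT_ID"],
    client_secret=os.environ["REDDIT_CLIENT_SECRET"],
    username=os.environ["REDDIT_USERNAME"],
    password=os.environ["REDDIT_PASSWORD"],
    user_agent="my-fetch-bot/0.1",  # placeholder user agent
)
reddit.read_only = True  # fetch public content without write access

# Fetch a public submission by URL (replace with a real post URL).
submission = reddit.submission(url="https://www.reddit.com/r/redditdev/comments/...")
print(submission.title)
```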
- `WEIBO_COOKIES`: The cookie of Weibo. For some unknown reason, some Weibo posts may not be accessible if you are not logged in; just copy the cookie from your browser and set it. Default: `None`.
- `XIAOHONGSHU_A1`: The `a1` cookie of Xiaohongshu. Default: `None`.
- `XIAOHONGSHU_WEBID`: The `webid` cookie of Xiaohongshu. Default: `None`.
- `XIAOHONGSHU_WEBSESSION`: The `websession` cookie of Xiaohongshu. Default: `None`.
You can set an OpenAI API key to use the transcription function (a sketch of the underlying call follows below).
- `OPENAI_API_KEY`: The API key of OpenAI. Default: `None`.
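As a rough illustration, transcription with this key boils down to a call like the following with the standard `openai` Python client (a sketch under that assumption; the bot's actual model choice and invocation may differ):

```python
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Transcribe an audio file (e.g. audio extracted from a fetched video).
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcript.text)
```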
- `AWS_ACCESS_KEY_ID`: The access key ID of Amazon S3. Default: `None`.
- `AWS_SECRET_ACCESS_KEY`: The secret access key of Amazon S3. Default: `None`.
- `AWS_S3_BUCKET_NAME`: The bucket name of Amazon S3. Default: `None`.
- `AWS_S3_REGION_NAME`: The region name of Amazon S3. Default: `None`.
- `AWS_DOMAIN_HOST`: The custom domain bound to the bucket. If it is not set, the picture upload function generates image URLs from the bucket name (typically of the form `https://<bucket>.s3.<region>.amazonaws.com/<key>`); if it is set, URLs use the custom host instead. Default: `None`.
- Bluesky (Beta, only some posts are supported)
- Threads
- Reddit (Beta, only some posts are supported)
- Quora
- WeChat Public Account Articles
- Zhihu
- Douban
- Xiaohongshu
- YouTube
- Bilibili
The HTML to Telegra.ph converter function is based on html-telegraph-poster. I separated it from this project as an independent Python package: html-telegraph-poster-v2.
The Xiaohongshu scraper is based on MediaCrawler.
The Weibo scraper is based on weiboSpider.
The Twitter scraper is based on twitter-api-client.
The Zhihu scraper is based on fxzhihu.
All of their code is licensed under the MIT license; I either used it as-is or modified it to implement certain functions. I want to express my gratitude to the projects mentioned above for their contributions.