Get a permanent copy of social media content from your favorite apps and websites by sending a URL to the bot.

aturret/FastFetchBot

Demo: https://t.me/aturretrss_bot

FastFetchBot

A social media fetch bot based on FastAPI.

Supports most mainstream social media platforms. You can get a permanent copy of the content just by sending the URL to the bot.

Other separated microservices for this project:

Installation

Docker (Recommended)

Download the docker-compose.yml file and set the environment variables as described in the following sections.

Env

Create a .env file in the same directory and set the environment variables.

Local Telegram API Server

If you want to send documents larger than 50 MB, you need to run a local Telegram API server. The docker-compose.yml file already gives you an example. You just need to fill in TELEGRAM_API_ID and TELEGRAM_API_HASH in the yml file. If you don't need it, just comment it out.

docker-compose up -d

Python (Not Recommended)

The local Telegram API server and the video download function are not supported with this method. If you really need these functions, you can run the Telegram API server and the file export server manually.

We use Poetry as the package manager for this project. You can install it with the following command.

pip install poetry

Then, install the dependencies.

poetry install

Finally, run the server.

poetry run gunicorn -k uvicorn.workers.UvicornWorker app.main:app --preload

Environment Variables

Note: Many of the services require cookies to fetch content. You can export your cookies with the browser extension Get cookies.txt LOCALLY and set them as environment variables.
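As a rough illustration of the step above, the sketch below reads a cookies.txt export with Python's standard library and returns the cookies by name. It assumes the export is in the Netscape cookies.txt format. The helper names, the sample domain and the placeholder values are hypothetical and not part of this project.

```python
import os
import tempfile
from http.cookiejar import MozillaCookieJar


def read_cookies(path: str) -> dict[str, str]:
    """Return a name -> value mapping for the cookies in a Netscape cookies.txt file."""
    jar = MozillaCookieJar()
    # Keep session cookies and expired cookies so nothing is silently dropped.
    jar.load(path, ignore_discard=True, ignore_expires=True)
    return {cookie.name: cookie.value for cookie in jar}


def write_sample_cookie_file() -> str:
    """Write a tiny sample export with placeholder values (hypothetical data)."""
    lines = [
        "# Netscape HTTP Cookie File",
        # Fields: domain, subdomain flag, path, secure flag, expiry, name, value
        ".example.com\tTRUE\t/\tTRUE\t2147483647\tct0\tplaceholder-ct0",
        ".example.com\tTRUE\t/\tTRUE\t2147483647\tauth_token\tplaceholder-auth",
    ]
    handle = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
    with handle:
        handle.write("\n".join(lines) + "\n")
    return handle.name


def demo() -> dict[str, str]:
    """Load the sample file and clean it up afterwards."""
    path = write_sample_cookie_file()
    try:
        return read_cookies(path)
    finally:
        os.remove(path)
```

With a real export, the returned values can be copied into the matching variables in the .env file. For example, the `ct0` value would go into TWITTER_CT0, assuming the cookie is exported under that name.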

Required Variables

  • BASE_URL: The base URL of the server. Example: example.com
  • TELEGRAM_BOT_TOKEN: The token of the telegram bot.
  • TELEGRAM_CHAT_ID: The chat id of the telegram bot.
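A quick way to catch a missing value before starting the server is to check the three required variables up front. This is only a minimal sketch. The function name `missing_required` is hypothetical, and the project may validate its settings differently.

```python
import os
from typing import Mapping

# Names taken from the "Required Variables" list.
REQUIRED_VARIABLES = ("BASE_URL", "TELEGRAM_BOT_TOKEN", "TELEGRAM_CHAT_ID")


def missing_required(env: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED_VARIABLES if not env.get(name, "").strip()]


def check_process_environment() -> None:
    """Raise an error that lists every required variable missing from os.environ."""
    missing = missing_required(os.environ)
    if missing:
        raise RuntimeError("Missing required variables: " + ", ".join(missing))
```

Calling `check_process_environment()` after the .env file has been loaded reports all missing names at once rather than failing on the first one.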

Optional Variables

FastAPI

  • PORT: Default: 10450
  • API_KEY: The api key for the FastAPI server. It would be generated automatically if not set.
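The two FastAPI defaults can be pictured as follows. This is a sketch under assumptions: the function `fastapi_settings` is hypothetical, and using `secrets.token_urlsafe` for the generated key is a stand-in, since the README does not say how the project generates it.

```python
import secrets
from typing import Mapping

DEFAULT_PORT = 10450  # default documented for PORT


def fastapi_settings(env: Mapping[str, str]) -> dict:
    """Resolve PORT and API_KEY, falling back to the documented defaults."""
    port = int(env.get("PORT") or DEFAULT_PORT)
    # Assumption: a random URL-safe token stands in for the auto-generated key.
    api_key = env.get("API_KEY") or secrets.token_urlsafe(32)
    return {"port": port, "api_key": api_key}
```

Setting API_KEY explicitly keeps the key stable across restarts. A key generated at startup would presumably change each time the server restarts.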

Telegram

  • TELEBOT_API_SERVER_HOST: The host of the telegram bot api server. Default: telegram-bot-api
  • TELEBOT_API_SERVER_PORT: The port of the telegram bot api server. Default: 8081
  • TELEGRAM_CHANNEL_ID: The channel id of the telegram bot. Default: None
  • TELEGRAM_CHANNEL_ADMIN_LIST: The list of IDs of users who can send messages to the target Telegram channel, separated by commas (,). You cannot send messages to the channel if you are not in the list. Default: None
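To make the TELEGRAM_CHANNEL_ADMIN_LIST format concrete, here is a minimal sketch of how a comma-separated ID list could be parsed and checked. The function names are hypothetical, the IDs are made up, and the project's own parsing may differ. It assumes user IDs are integers and that an unset list allows nobody.

```python
from typing import Optional


def parse_admin_list(raw: Optional[str]) -> set[int]:
    """Turn a value such as "123, 456" into a set of integer user IDs."""
    if not raw:
        return set()
    # Skip empty pieces so a trailing comma does not cause an error.
    return {int(part) for part in raw.split(",") if part.strip()}


def can_send_to_channel(user_id: int, raw_admin_list: Optional[str]) -> bool:
    """Return True only when user_id appears in the configured admin list."""
    return user_id in parse_admin_list(raw_admin_list)
```

For example, with `TELEGRAM_CHANNEL_ADMIN_LIST=123,456`, user 123 passes the check and user 789 does not.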

Twitter

You must set the cookie variables if you want to fetch Twitter content.

  • TWITTER_CT0: The ct0 cookie of twitter. Default: None
  • TWITTER_AUTH_TOKEN: The auth token of twitter. Default: None

Reddit

We use the read_only mode of praw to fetch Reddit content. You still need to set the client_id, client_secret, username and password of your Reddit API account.

  • REDDIT_CLIENT_ID: The client id of reddit. Default: None
  • REDDIT_CLIENT_SECRET: The client secret of reddit. Default: None
  • REDDIT_USERNAME: The username of reddit. Default: None
  • REDDIT_PASSWORD: The password of reddit. Default: None
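The four Reddit variables map naturally onto the keyword arguments of `praw.Reddit`. The sketch below shows one way they might be wired together. It is not the project's actual code, the function names are hypothetical, and the `user_agent` string is a made-up placeholder. praw is imported lazily so the mapping function works without it installed.

```python
from typing import Mapping, Optional


def reddit_kwargs(env: Mapping[str, str]) -> dict[str, Optional[str]]:
    """Map the REDDIT_* variables onto praw.Reddit keyword arguments."""
    return {
        "client_id": env.get("REDDIT_CLIENT_ID"),
        "client_secret": env.get("REDDIT_CLIENT_SECRET"),
        "username": env.get("REDDIT_USERNAME"),
        "password": env.get("REDDIT_PASSWORD"),
        # Hypothetical value: Reddit expects a descriptive user agent string.
        "user_agent": "fastfetchbot-example/0.1",
    }


def make_read_only_client(env: Mapping[str, str]):
    """Create a praw client and switch it to read-only mode."""
    import praw  # third-party package, imported only when a client is needed

    reddit = praw.Reddit(**reddit_kwargs(env))
    reddit.read_only = True
    return reddit
```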

Weibo

  • WEIBO_COOKIES: The cookie of Weibo. For unknown reasons, some Weibo posts may not be accessible if you are not logged in. Just copy the cookie from your browser and set it. Default: None

Xiaohongshu

  • XIAOHONGSHU_A1: The a1 cookie of xiaohongshu. Default: None
  • XIAOHONGSHU_WEBID: The webid cookie of xiaohongshu. Default: None
  • XIAOHONGSHU_WEBSESSION: The websession cookie of xiaohongshu. Default: None

OpenAI

You can set the api key of OpenAI to use the transcription function.

  • OPENAI_API_KEY: The api key of OpenAI. Default: None

Amazon S3 Picture Storage

  • AWS_ACCESS_KEY_ID: The access key id of Amazon S3. Default: None
  • AWS_SECRET_ACCESS_KEY: The secret access key of Amazon S3. Default: None
  • AWS_S3_BUCKET_NAME: The bucket name of Amazon S3. Default: None
  • AWS_S3_REGION_NAME: The region name of Amazon S3. Default: None
  • AWS_DOMAIN_HOST: The domain bound to the bucket. If no custom host is set, the picture upload function generates image URLs from the bucket name. Default: None
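The fallback described for AWS_DOMAIN_HOST can be sketched as below. The function name `image_url` is hypothetical and the exact URL the project produces is an assumption. The bucket-based form shown is the standard S3 virtual-hosted-style address, and the custom host is assumed to be a bare hostname served over HTTPS.

```python
from typing import Optional
from urllib.parse import quote


def image_url(key: str, bucket: str, region: str, domain_host: Optional[str] = None) -> str:
    """Build a public URL for an uploaded object key."""
    path = quote(key.lstrip("/"))  # percent-encode, but keep "/" separators
    if domain_host:
        # A custom domain bound to the bucket takes precedence.
        return f"https://{domain_host}/{path}"
    # Otherwise fall back to the virtual-hosted-style S3 address.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{path}"
```

For instance, `image_url("a/b.png", "my-bucket", "us-east-1")` gives `https://my-bucket.s3.us-east-1.amazonaws.com/a/b.png`, while passing `domain_host="img.example.com"` gives `https://img.example.com/a/b.png`.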

Supported Content Types

Social Media Content

  • Twitter
  • Bluesky (Beta, only some posts are supported)
  • Instagram
  • Threads
  • Reddit (Beta, only some posts are supported)
  • Quora
  • Weibo
  • WeChat Public Account Articles
  • Zhihu
  • Douban
  • Xiaohongshu

Video Content

  • YouTube
  • Bilibili

Acknowledgements

The HTML to Telegra.ph converter function is based on html-telegraph-poster. I separated it from this project as an independent Python package: html-telegraph-poster-v2.

The Xiaohongshu scraper is based on MediaCrawler.

The Weibo scraper is based on weiboSpider.

The Twitter scraper is based on twitter-api-client.

The Zhihu scraper is based on fxzhihu.

All the code is licensed under the MIT license. I either used their code as-is or made modifications to implement certain functions. I want to express my gratitude to the projects mentioned above for their contributions.
