Proxying YouTube feeds

Published May 7, 2023·4 minutes read

    Contents

  1. Scraping and proxying
  2. Filters and features
  3. Conclusion

I consume the internet primarily through my RSS reader, which combines blog posts, videos and podcasts from many different websites. Recently, though, I got annoyed by the YouTube Shorts that were filling up the feeds of some of the channels I follow. While I like to watch most videos from these channels, I absolutely hate YouTube Shorts because, in my opinion, they contain no useful information. So I wrote my own YouTube feed server in Rust 🦀 that filters out these annoying things.

Scraping and proxying

YouTube has an official API that you can use to get channel and video information. It is free for a small number of requests, which would probably be enough for me. However, the API requires a Google account, which was not an option because I didn't want to give up any of my privacy. Besides, the whole point of using an RSS reader is to be platform-independent and account-free, so it didn't feel like the right way to go.

At first, I tried using an unmaintained Rust crate for getting YouTube metadata. It quickly became clear that this was not a good solution either, because it was very slow and didn't work on some channels.

Eventually, the simplest way to get the data turned out to be web scraping. When I looked at YouTube's page source, I saw that it contained all the data I required as JSON. Scraping this data was quite easy since YouTube doesn't seem to block automated scrapers. My code basically makes a GET request to the channel page and then extracts the JSON data.
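The extraction step could look roughly like the sketch below. It assumes the page embeds its data as a JavaScript assignment such as `var ytInitialData = {...};`. The marker string and the function name `extract_json_after` are assumptions for illustration and are not taken from the project's code. The HTTP request is left out, so the function works on a page body that has already been downloaded.

```rust
/// Returns the JSON object that follows `marker` in `html`, if any.
/// Braces inside string literals are skipped so that a video title
/// containing `{` or `}` does not end the match early.
/// (Hypothetical helper; the marker is supplied by the caller.)
fn extract_json_after<'a>(html: &'a str, marker: &str) -> Option<&'a str> {
    let after_marker = html.find(marker)? + marker.len();
    let start = after_marker + html[after_marker..].find('{')?;

    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;

    // Scanning bytes is safe here: all characters we test are ASCII,
    // and bytes of multi-byte UTF-8 sequences never equal an ASCII byte.
    for (offset, byte) in html[start..].bytes().enumerate() {
        if in_string {
            if escaped {
                escaped = false;
            } else if byte == b'\\' {
                escaped = true;
            } else if byte == b'"' {
                in_string = false;
            }
            continue;
        }
        match byte {
            b'"' => in_string = true,
            b'{' => depth += 1,
            b'}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(&html[start..=start + offset]);
                }
            }
            _ => {}
        }
    }
    None // unbalanced braces: the page was truncated or its format changed
}
```

The returned slice can then be handed to a JSON parser. Counting braces instead of searching for a fixed end marker keeps the sketch independent of whatever follows the object in the page.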

However, the information obtained from scraping wasn't enough to create a full feed. The data included only what is actually displayed to the user rather than the absolute values required to build a proper feed (e.g. '3 weeks ago' instead of a full date). Therefore, I ended up creating a proxy that uses the official YouTube feed server. The proxy simultaneously gets the feed from YouTube and scrapes the channel page for additional information. It can then modify the feed according to my needs and remove uninteresting entries.

Filters and features

Once the feed proxying was working, I added filters for things like likes, views and video duration. These filters are configurable via URL parameters. See the following example:

http://example.com/@ChannelHandle?duration=-600&views=100000-&likes=1000-

This URL filters the videos from the channel @ChannelHandle for a duration under 10 minutes (600 seconds), at least 100,000 views and at least 1,000 likes.
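To make the `min-max` notation concrete, here is a minimal sketch of how such parameters could be parsed. The names `Range` and `filter_from_query` are hypothetical, and treating both bounds as inclusive is an assumption; the actual project may handle the boundaries differently.

```rust
/// A range with optional bounds, parsed from "min-max".
/// Both bounds are treated as inclusive (an assumption of this sketch).
#[derive(Debug, PartialEq, Clone, Copy)]
struct Range {
    min: Option<u64>,
    max: Option<u64>,
}

impl Range {
    /// Parses "-600" (at most 600), "1000-" (at least 1000) or "60-600".
    fn parse(value: &str) -> Option<Range> {
        let (min, max) = value.split_once('-')?;
        // An empty side means "no bound"; anything else must be a number.
        let bound = |s: &str| -> Option<Option<u64>> {
            if s.is_empty() {
                Some(None)
            } else {
                s.parse::<u64>().ok().map(Some)
            }
        };
        Some(Range { min: bound(min)?, max: bound(max)? })
    }

    fn contains(&self, n: u64) -> bool {
        self.min.map_or(true, |min| n >= min) && self.max.map_or(true, |max| n <= max)
    }
}

/// Looks up `key` in a raw query string such as "duration=-600&views=100000-".
fn filter_from_query(query: &str, key: &str) -> Option<Range> {
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(k, _)| *k == key)
        .and_then(|(_, v)| Range::parse(v))
}
```

With the query string from the example URL, `filter_from_query(query, "duration")` yields a range without a lower bound and an upper bound of 600, so a video is kept only if `contains` returns true for its duration in seconds.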

As shown in the example, channels can be specified using handles. These were only introduced last year, and YouTube still uses channel IDs in the background and in its feed server. The channel ID is no longer shown anywhere, but you can easily find it as a meta element in the <head> tag of a channel page.
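Resolving a handle to a feed URL could be sketched as follows. The exact attribute that marks the channel ID meta element is an assumption here (the test value `itemprop="identifier"` is only a guess), so the attribute is passed in by the caller. The helper names `meta_content` and `feed_url` are hypothetical. The feed URL format itself is the one used by YouTube's feed server.

```rust
/// Finds the `content` value of the first `<meta ...>` tag that contains
/// `attr`, e.g. attr = r#"itemprop="identifier""# (assumed attribute).
/// This is a plain string search, not an HTML parser, and it assumes
/// that attribute values do not contain `>`.
fn meta_content<'a>(html: &'a str, attr: &str) -> Option<&'a str> {
    const CONTENT: &str = "content=\"";
    html.match_indices("<meta").find_map(|(tag_start, _)| {
        let tag_end = tag_start + html[tag_start..].find('>')?;
        let tag = &html[tag_start..tag_end];
        if !tag.contains(attr) {
            return None; // not the tag we are looking for, try the next one
        }
        let value_start = tag.find(CONTENT)? + CONTENT.len();
        let value_len = tag[value_start..].find('"')?;
        Some(&tag[value_start..value_start + value_len])
    })
}

/// Builds the URL of the official feed for a channel ID.
fn feed_url(channel_id: &str) -> String {
    format!("https://www.youtube.com/feeds/videos.xml?channel_id={}", channel_id)
}
```

Given the HTML of a channel page, the proxy would first call `meta_content` to obtain the ID and then request the URL returned by `feed_url`.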

I also added a simple sponsor message blocker that deletes lines based on some keywords. You can already use the SponsorBlock browser extension to skip sponsor messages in videos, but that does not affect the video descriptions that show up in your RSS reader. Note that this approach is far from perfect, but it worked more often than not during my testing.
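A keyword-based line filter of this kind can be sketched in a few lines. The function name `strip_sponsor_lines` and the case-insensitive matching are assumptions of this sketch, and the keyword list is left to the caller because the project's actual keywords are not listed here.

```rust
/// Removes every line of `description` that contains one of `keywords`.
/// Matching is case-insensitive (an assumption of this sketch).
fn strip_sponsor_lines(description: &str, keywords: &[&str]) -> String {
    // Lowercase the keywords once instead of once per line.
    let keywords: Vec<String> = keywords.iter().map(|k| k.to_lowercase()).collect();
    description
        .lines()
        .filter(|line| {
            let lower = line.to_lowercase();
            !keywords.iter().any(|k| lower.contains(k.as_str()))
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Because whole lines are dropped on a substring match, a keyword that is too generic will also remove legitimate lines, which is one reason such a filter can only ever be approximate.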

Finally, I added a line with information about the video at the top of the description of each feed entry. This gives a bit more information about a video, which could be useful if, like me, you use mpv to play videos from the command line.
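As an illustration, prepending such a line might look like the sketch below. Which fields the real info line contains and how it is formatted are not specified here, so the choice of duration, views and likes, the separator and the names `format_duration` and `with_info_line` are all assumptions.

```rust
/// Formats a number of seconds as "m:ss" or "h:mm:ss".
fn format_duration(secs: u64) -> String {
    let (h, m, s) = (secs / 3600, secs % 3600 / 60, secs % 60);
    if h > 0 {
        format!("{}:{:02}:{:02}", h, m, s)
    } else {
        format!("{}:{:02}", m, s)
    }
}

/// Puts a one-line summary (hypothetical layout) above the original description.
fn with_info_line(description: &str, duration_secs: u64, views: u64, likes: u64) -> String {
    format!(
        "Duration: {} | Views: {} | Likes: {}\n\n{}",
        format_duration(duration_secs),
        views,
        likes,
        description
    )
}
```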

Conclusion

I have been using this proxy for all my YouTube feeds for a few months now and I am happy with my clean-looking feeds. The source code of this project, along with instructions on how to use it, is available on GitHub.

categories:random