gerpath.blogg.se - Rss feed website

I have been working on a project where I need to extract RSS feeds from various blogs and news websites. Essentially, I want to pass a URL to my API and have it return the RSS feed associated with that domain.

As with most things, I wasn't the first person to come across this problem. Aaron Swartz (RIP) wrote his own script called feedfinder.py which does this exact same thing. However, a major shortcoming of this script is that it's fairly dated and written for Python 2. After fighting a losing battle trying to deal with Python's 2to3 conversion tool, I realized I'd already wasted more time trying to port this old script than it would take me to write a new one.

My Solution: Python 3 function for extracting RSS feeds from URLs

I wanted my function to be accurate and thorough, which (for me) means:

- I wouldn't miss any legitimate feeds that were on a website, and
- I wouldn't include any links that were not valid RSS feeds.

I start by looking for <link> tags pointing to RSS feeds, then parse the page looking for any <a> tags whose href has "xml", "rss", or "feed" in the URL. I've copied my solution below, which you should be able to interpret fairly easily. This script does have some non-standard dependencies, both of which you are probably already using if you're doing anything related to web scraping or feed reading: feedparser and beautifulsoup4.
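
As a rough illustration of the approach described in this post (and not the original function), here is a minimal sketch. It collects candidates from <link> tags first, then from <a> hrefs containing "xml", "rss", or "feed", and finally keeps only the URLs that parse as a feed. To keep the discovery step runnable without extra installs, the sketch swaps beautifulsoup4 for the standard library's html.parser; feedparser is only imported when a URL is actually validated. The names find_feed_candidates, is_valid_feed, and find_feeds are hypothetical, and treating "has at least one entry" as "valid" is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used for feed autodiscovery in <link> tags.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}
# Substrings from the post that hint an <a> href may point to a feed.
FEED_HINTS = ("xml", "rss", "feed")


class _FeedLinkParser(HTMLParser):
    """Collects candidate feed URLs from <link> and <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.from_link_tags = []
        self.from_anchors = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative hrefs
        if tag == "link" and (attrs.get("type") or "").lower() in FEED_TYPES:
            self.from_link_tags.append(url)
        elif tag == "a" and any(hint in href.lower() for hint in FEED_HINTS):
            self.from_anchors.append(url)


def find_feed_candidates(html, base_url):
    """Return candidate feed URLs: <link> tags first, then <a> hrefs, deduplicated."""
    parser = _FeedLinkParser(base_url)
    parser.feed(html)
    seen, ordered = set(), []
    for url in parser.from_link_tags + parser.from_anchors:
        if url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered


def is_valid_feed(url):
    """Assumed validity check: the URL parses with feedparser and has entries."""
    import feedparser  # third-party dependency, imported lazily

    return len(feedparser.parse(url).entries) > 0


def find_feeds(html, base_url, validate=is_valid_feed):
    """Keep only the candidates that pass the validation callable."""
    return [url for url in find_feed_candidates(html, base_url) if validate(url)]
```

Fetching the page is left out on purpose: pass in HTML you have already downloaded, for example find_feeds(page_html, "https://example.com/"). The validate parameter makes it easy to substitute a stricter check if "has entries" turns out to be too loose or too strict for your sites.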