Filter, Rewrite, and Scraper Rules
Feed Filtering Rules
Miniflux has a basic filtering system that allows you to ignore or keep articles.
Block Rules
Block rules ignore articles with a title, an entry URL, a tag, or an author that matches the regex (RE2 syntax).
For example, the regex (?i)miniflux will ignore all articles with a title that contains the word Miniflux (case insensitive).
Ignored articles won’t be saved into the database.
Keep Rules
Keep rules retain only articles that match the regex (RE2 syntax).
For example, the regex (?i)miniflux will keep only the articles with a title that contains the word Miniflux (case insensitive).
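To see how such a pattern behaves, the following minimal Go sketch uses the standard regexp package, which implements RE2 syntax. The helper name matchesTitle and the sample titles are hypothetical and only illustrate the matching; this is not Miniflux's internal code.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesTitle reports whether an RE2 pattern matches anywhere in the title.
// Hypothetical helper for illustration only.
func matchesTitle(pattern, title string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(title), nil
}

func main() {
	titles := []string{"Miniflux 2.2 released", "MINIFLUX tips", "Unrelated news"}
	for _, t := range titles {
		ok, err := matchesTitle(`(?i)miniflux`, t)
		if err != nil {
			fmt.Println("invalid pattern:", err)
			return
		}
		fmt.Printf("%q matches: %v\n", t, ok)
	}
}
```

Used as a block rule, a match means the article is ignored; used as a keep rule, a match means the article is retained.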
Global Filtering Rules
Global filters are defined on the Settings page and are automatically applied to all articles from all feeds.
- Each rule must be on a separate line.
- Duplicate rules are allowed. For example, having multiple EntryTitle rules is possible.
- The provided regex should use the RE2 syntax.
- The order of the rules matters as the processor stops on the first match for both Block and Keep rules.
Rule Format:
FieldName=RegEx
FieldName=RegEx
FieldName=RegEx
Available Fields:
EntryTitle
EntryURL
EntryCommentsURL
EntryContent
EntryAuthor
EntryTag
EntryDate
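As a rough illustration of the rule format, the Go sketch below splits each line at the first "=" and checks the field name against the list above. The Rule type, the parseRules function, and the choice to reject invalid lines with an error are assumptions made for this example; Miniflux's own parser may behave differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// knownFields lists the field names documented above.
var knownFields = map[string]bool{
	"EntryTitle": true, "EntryURL": true, "EntryCommentsURL": true,
	"EntryContent": true, "EntryAuthor": true, "EntryTag": true, "EntryDate": true,
}

// Rule is a hypothetical in-memory form of one "FieldName=RegEx" line.
type Rule struct {
	Field   string
	Pattern string
}

// parseRules reads one rule per line and splits each line at the first "=".
func parseRules(text string) ([]Rule, error) {
	var rules []Rule
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		field, pattern, found := strings.Cut(line, "=")
		if !found || !knownFields[field] {
			return nil, fmt.Errorf("invalid rule %q", line)
		}
		// EntryDate takes a date pattern rather than a regex, so it is not compiled.
		if field != "EntryDate" {
			if _, err := regexp.Compile(pattern); err != nil {
				return nil, fmt.Errorf("invalid regex in %q: %w", line, err)
			}
		}
		rules = append(rules, Rule{Field: field, Pattern: pattern})
	}
	return rules, nil
}

func main() {
	rules, err := parseRules("EntryTitle=(?i)miniflux\nEntryAuthor=^Bot$\nEntryDate=future")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, r := range rules {
		fmt.Printf("%s -> %s\n", r.Field, r.Pattern)
	}
}
```

Splitting at the first "=" only means that a regex may itself contain "=" characters.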
Date Patterns
The EntryDate field supports the following date patterns:
future
- Match entries with future publication dates.
before:YYYY-MM-DD
- Match entries published before a specific date.
after:YYYY-MM-DD
- Match entries published after a specific date.
between:YYYY-MM-DD,YYYY-MM-DD
- Match entries published between two dates.
Date format must be YYYY-MM-DD, for example: 2024-01-01.
Block Rules
Block rules ignore articles that match a single rule.
For example, the rule EntryTitle=(?i)miniflux will ignore all articles with a title that contains the word Miniflux (case insensitive).
Examples:
- EntryDate=future will ignore articles with future publication dates.
- EntryDate=before:2024-01-01 will ignore articles published before January 1st, 2024.
Keep Rules
Keep rules retain articles that match a single rule.
For example, the rule EntryTitle=(?i)miniflux will keep only the articles with a title that contains the word Miniflux (case insensitive).
Examples:
- EntryDate=between:2024-01-01,2024-12-31 will keep only articles published in 2024.
- EntryDate=after:2024-03-01 will keep only articles published after March 1st, 2024.
Global Rules & Feed Rules Ordering
Rules are processed in this order:
- Global Block Rules
- Feed Block Rules
- Global Keep Rules
- Feed Keep Rules
Rewrite Rules
To improve the reading experience, it’s possible to alter the content of feed items.
For example, if you are reading a popular comic website like XKCD, it’s nice to have the image title (the title attribute) added under the image, especially on mobile devices where there is no hover event.
add_dynamic_image
- Tries to add the highest quality images from sites that use JavaScript to load images (e.g., either lazily when scrolling or based on screen size).
add_dynamic_iframe
- Tries to add embedded videos from sites that use JavaScript to load iframes (e.g., either lazily when scrolling or after the rest of the page is loaded).
add_image_title
- Adds each image's title as a caption under the image.
add_youtube_video
- Inserts a YouTube video into the article (automatic for Youtube.com).
add_youtube_video_from_id
- Inserts a YouTube video into the article based on the video ID.
add_invidious_video
- Inserts an Invidious player into the article (automatic for https://invidio.us).
add_youtube_video_using_invidious_player
- Inserts an Invidious player into the article for YouTube feeds.
add_castopod_episode
- Inserts a Castopod episode player.
add_mailto_subject
- Inserts the subject of mailto links into the article.
base64_decode
- Decodes base64 content. It can be used with a selector: base64_decode(".base64"), but can also be used without arguments: base64_decode. In this case, it will try to convert all TextNodes and always fall back to the original text if it cannot decode.
nl2br
- Converts new lines \n to <br> (useful for non-HTML content).
convert_text_links
- Converts text links to HTML links (useful for non-HTML content).
fix_medium_images
- Attempts to fix Medium's images rendered in JavaScript.
use_noscript_figure_images
- Uses <noscript> content for images rendered with JavaScript.
replace("search term"|"replace term")
- Searches and replaces text.
remove(".selector, #another_selector")
- Removes DOM elements.
parse_markdown (Removed in v2.2.4)
- Converts Markdown to HTML. This rule has been removed in version 2.2.4.
remove_tables
- Removes any tables while keeping the content inside (useful for email newsletters).
remove_clickbait
- Removes clickbait titles (converts uppercase titles).
replace_title("search-term"|"replace-term")
- Adjusts entry titles.
add_hn_links_using_hack
- Opens HN comments with Hack.
add_hn_links_using_opener
- Opens HN comments with Opener.
fix_ghost_cards
- Converts Ghost link cards to regular links.
Miniflux includes a set of predefined rules for some websites, but you can define your own rules.
On the feed edit page, enter your custom rules in the field “Rewrite Rules” like this:
rule1,rule2
Separate each rule with a comma.
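Because rule arguments such as remove(".selector, #another_selector") can themselves contain commas, a naive split on every comma would break them apart. The Go sketch below shows one way to split only on commas outside double quotes. The splitRules function is hypothetical and does not handle escaped quotes; it is an illustration of the idea, not Miniflux's actual parser.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRules splits a rewrite rule list on commas that are outside double quotes.
// Hypothetical helper: escaped quotes inside arguments are not handled.
func splitRules(list string) []string {
	var rules []string
	var current strings.Builder
	inQuotes := false
	// flush stores the rule collected so far, if any, and starts a new one.
	flush := func() {
		if rule := strings.TrimSpace(current.String()); rule != "" {
			rules = append(rules, rule)
		}
		current.Reset()
	}
	for _, r := range list {
		switch {
		case r == '"':
			inQuotes = !inQuotes
			current.WriteRune(r)
		case r == ',' && !inQuotes:
			flush()
		default:
			current.WriteRune(r)
		}
	}
	flush()
	return rules
}

func main() {
	list := `add_image_title, remove(".ads, #sidebar"), replace("foo"|"bar")`
	for _, rule := range splitRules(list) {
		fmt.Println(rule)
	}
}
```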
Scraper Rules
When an article contains only an extract of the content, you can fetch the original web page and apply a set of rules to get relevant content.
Miniflux uses CSS selectors for custom rules. These custom rules can be saved in the feed properties (select a feed and click on edit).
CSS Selector | Description |
---|---|
div#articleBody | Fetch a div element with the ID articleBody. |
div.content | Fetch all div elements with the class content. |
article, div.article | Use a comma to define multiple rules. |
Miniflux includes a list of predefined rules for popular websites. You can contribute to the project to keep them up to date.
Under the hood, Miniflux uses the library Goquery.
URL Rewrite Rules
Sometimes it might be required to rewrite a URL in a feed to fetch better-suited content.
For example, for some users, the URL https://www.npr.org/sections/money/2021/05/18/997501946/the-case-for-universal-pre-k-just-got-stronger displays a cookie consent dialog instead of the actual content, and it would be preferred to fetch the URL https://text.npr.org/997501946 instead.
The following rule does this:
rewrite("^https:\/\/www\.npr\.org\/\d{4}\/\d{2}\/\d{2}\/(\d+)\/.*$"|"https://text.npr.org/$1")
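This will rewrite matching URLs from the original feed to URLs pointing to text.npr.org when the article content is fetched. Note that the pattern expects the date to follow the host directly (https://www.npr.org/YYYY/MM/DD/ID/...), so a URL with extra path segments such as /sections/money/ is not matched as written and needs a broader pattern. You may also need to add your own scraper rule because the default rule will try to fetch #storytext.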
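The effect of the rule can be checked with Go's standard regexp package, assuming the rewrite is applied along the lines of Regexp.ReplaceAllString. In the sketch below, rewriteURL is a hypothetical helper, the first URL is a shortened variant of the example chosen to fit the pattern, and the broadened pattern that also accepts leading path segments is an assumption added for illustration, not a rule shipped with Miniflux.

```go
package main

import (
	"fmt"
	"regexp"
)

// nprRule is the pattern copied from the rule above.
var nprRule = regexp.MustCompile(`^https:\/\/www\.npr\.org\/\d{4}\/\d{2}\/\d{2}\/(\d+)\/.*$`)

// nprBroadRule is an assumed variant that also allows path segments before the date.
var nprBroadRule = regexp.MustCompile(`^https:\/\/www\.npr\.org\/(?:[^\/]+\/)*\d{4}\/\d{2}\/\d{2}\/(\d+)\/.*$`)

const nprReplacement = "https://text.npr.org/$1"

// rewriteURL returns the rewritten URL, or the input unchanged when the pattern does not match.
func rewriteURL(re *regexp.Regexp, replacement, url string) string {
	return re.ReplaceAllString(url, replacement)
}

func main() {
	short := "https://www.npr.org/2021/05/18/997501946/the-case-for-universal-pre-k-just-got-stronger"
	long := "https://www.npr.org/sections/money/2021/05/18/997501946/the-case-for-universal-pre-k-just-got-stronger"

	fmt.Println(rewriteURL(nprRule, nprReplacement, short))
	fmt.Println(rewriteURL(nprRule, nprReplacement, long)) // left unchanged: no match
	fmt.Println(rewriteURL(nprBroadRule, nprReplacement, long))
}
```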
Another example is the German page https://www.heise.de/news/Industrie-ruestet-sich-fuer-Gasstopp-Forscher-vorsichtig-optimistisch-7167721.html, which splits the article into multiple pages. The full text can be read on https://www.heise.de/news/Industrie-ruestet-sich-fuer-Gasstopp-Forscher-vorsichtig-optimistisch-7167721.html?seite=all.
The URL rewrite rule for that would be:
rewrite("(.*?\.html)"|"$1?seite=all")
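A minimal Go sketch of this rule, again assuming it behaves like Regexp.ReplaceAllString; the function name appendAllPages is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// heiseRule is the pattern copied from the rule above.
var heiseRule = regexp.MustCompile(`(.*?\.html)`)

// appendAllPages applies the replacement "$1?seite=all" from the rule above.
func appendAllPages(url string) string {
	return heiseRule.ReplaceAllString(url, "$1?seite=all")
}

func main() {
	url := "https://www.heise.de/news/Industrie-ruestet-sich-fuer-Gasstopp-Forscher-vorsichtig-optimistisch-7167721.html"
	fmt.Println(appendAllPages(url))
}
```

Since the pattern is not anchored at the end, a URL that already ends in ?seite=all would receive the suffix a second time under this model.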