It all started when we were brainstorming a project idea for our Master's thesis in Generative Artificial Intelligence. We couldn’t come up with anything solid, and at the same time, we were struggling to keep up with cybersecurity news across multiple websites. Given how fast-paced the field is, we needed a way to stay updated quickly and efficiently — and that’s how the idea was born.
To explain everything better, I've broken it down into two parts: the news pipeline (how we gather and process news) and the production website (where the public sees the final result).
First Version of the News System
We built an initial prototype that pulled RSS feeds from various cybersecurity sources. These articles were then summarized using AI — in both English and Spanish — and to our surprise, the results were excellent. The summaries were quick to read and linked directly to the original articles.
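That first prototype can be sketched roughly like this. This is a minimal illustration, not our actual code: the feed is an inline sample instead of a live URL, and `summarize()` is a hypothetical stub standing in for the LLM API call we actually made.

```python
import xml.etree.ElementTree as ET

def parse_rss(xml_text):
    """Extract (title, link) pairs from a basic RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        if title and link:
            items.append({"title": title, "link": link})
    return items

def summarize(article_text, language="en"):
    """Hypothetical placeholder: the real prototype sent the text to an
    LLM API and got back a short summary in English or Spanish."""
    return article_text[:200]

# Inline sample standing in for a real cybersecurity RSS feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <item><title>New ransomware strain spotted</title>
        <link>https://example.com/a</link></item>
  <item><title>Critical patch released</title>
        <link>https://example.com/b</link></item>
</channel></rss>"""

for entry in parse_rss(SAMPLE_FEED):
    print(entry["title"], "->", entry["link"])
```

In the real system the loop ran over several feed URLs and each summary kept its link back to the original article.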
First Problem
After seeing how effective and inexpensive it was to generate summaries, we considered making the site public. But first, we researched the licenses of the RSS feeds. Turns out, only two of them allowed commercial use without restrictions. Since we planned to monetize the site through ads, most of those sources were off-limits.
Second Version of the News System
To address this, we pivoted to a model where we still used the RSS feeds, but instead of republishing them, we would conduct our own investigations and publish original articles — sidestepping licensing issues.
New Problem
This didn’t work as expected. RSS content was often too short, and even when visiting the linked sites, there wasn’t always enough information. The system ended up producing very few articles and was discarded.
Third Version of the News Pipeline
We asked the AI how to better query public sources using keywords. We defined 18 categories for different types of cybersecurity news and refined our method to extract headlines. Eventually, we created a working pipeline that gathers recent articles, groups them, checks for duplicates, and ensures we’re not repeating past content.
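The duplicate check can be sketched as a fuzzy comparison of each incoming headline against what we've already kept. This is a simplified illustration (the 0.85 threshold is arbitrary here, and the real pipeline also groups articles by category):

```python
from difflib import SequenceMatcher

def normalize(headline):
    """Lowercase and collapse whitespace so comparisons are stable."""
    return " ".join(headline.lower().split())

def is_duplicate(headline, seen, threshold=0.85):
    """True if the headline is a near-duplicate of one already seen."""
    h = normalize(headline)
    return any(SequenceMatcher(None, h, s).ratio() >= threshold for s in seen)

def dedupe(headlines):
    """Keep only the first occurrence of each near-duplicate group."""
    kept, seen = [], []
    for hl in headlines:
        if not is_duplicate(hl, seen):
            kept.append(hl)
            seen.append(normalize(hl))
    return kept

print(dedupe([
    "Critical VPN flaw exploited in the wild",
    "Critical VPN flaw exploited in the wild!",
    "New phishing kit targets banks",
]))
```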
First Version of the Production Website
This version was pretty solid. We used Google News as our main source, since it allowed a high volume of queries.
Problems:
1. All article URLs came from Google.
2. We had to use headless Chrome to bypass Google’s bot detection.
3. The content arrived as raw HTML, which was hard to parse.
4. The process consumed a lot of resources and was slow.
Second Version of the Production Website
We evaluated several news platforms and eventually chose an affordable provider delivering 10–20 articles daily. But after optimizing the pipeline, output dropped to 2–5 articles per day, so we added a second provider, and we now reach 15–35 daily.
Still, we were missing one thing: images.
Third Version of the Production Website
We bought access to a low-cost AI image generator. The first images were full of clichés, but after refining our prompts the results improved considerably. We crop the images, convert them to lightweight formats, and generate two versions per article (mobile and desktop).
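The crop geometry for the two versions boils down to a center crop to a target aspect ratio. A minimal sketch of that calculation (the 16:9 desktop and 9:16 mobile ratios here are illustrative; the resulting box is what we'd hand to an image library's crop call):

```python
def center_crop_box(width, height, target_ratio):
    """Compute the (left, top, right, bottom) box that center-crops an
    image of the given size to the target width/height aspect ratio."""
    if width / height > target_ratio:
        # Image is too wide: trim equal amounts from both sides.
        new_w = round(height * target_ratio)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Image is too tall (or already matches): trim top and bottom.
    new_h = round(width / target_ratio)
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)

# Desktop (16:9) and mobile (9:16) crops of a 1920x1080 source:
print(center_crop_box(1920, 1080, 16 / 9))   # (0, 0, 1920, 1080)
print(center_crop_box(1920, 1080, 9 / 16))   # (656, 0, 1264, 1080)
```

After cropping, the image is downscaled and saved in a lightweight format such as WebP.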
Pipeline Update
We updated the pipeline to generate these images automatically every day.
Fourth Version of the Production Website
This week we translated the site into Brazilian Portuguese. It took just two days, and the full translation of nearly 700 articles cost one dollar. We plan to support more languages soon.
Fifth Version of the Production Website
Currently in development. We're working on broader multilingual support; the next languages will be German, French, and Italian.
Site Surprises
Since article classification is fully automated, some surprising stories appear, such as a piece on the cybersecurity risks of obesity in the UK, or one on cheap phones that lack updates. Irrelevant articles are filtered out early; only those with real potential are published.