Cybersecurity News Site Powered by AI

July 27, 2025 4 min read
Creation of a Cybersecurity News Site with AI-Driven Investigations

Origins of news.csraid.com

It all started when we were brainstorming a project idea for our Master's thesis in Generative Artificial Intelligence. We couldn't come up with anything solid, and at the same time we were struggling to keep up with cybersecurity news spread across multiple websites. In a field that moves this fast, we needed a way to stay up to date quickly and efficiently, and that is how the idea was born. To explain it clearly, I've split the story into two parts: the news pipeline (how we gather and process news) and the production website (where the public sees the final result).

First Version of the News System

We built an initial prototype that pulled RSS feeds from various cybersecurity sources. The articles were then summarized by AI in both English and Spanish, and to our surprise the results were excellent. The summaries were quick to read and linked directly to the original articles.
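
A minimal sketch of what such a prototype might look like, assuming a standard RSS 2.0 feed. The summarize function here is a hypothetical placeholder (it only truncates the description) standing in for the call to an AI model, and the sample feed and its URL are invented so the example runs without network access.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class FeedItem:
    title: str
    link: str
    description: str


def parse_rss(xml_text: str) -> list[FeedItem]:
    """Extract title, link and description from each <item> of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append(
            FeedItem(
                title=(item.findtext("title") or "").strip(),
                link=(item.findtext("link") or "").strip(),
                description=(item.findtext("description") or "").strip(),
            )
        )
    return items


def summarize(item: FeedItem, language: str) -> str:
    """Hypothetical placeholder: a real system would call an AI model here."""
    return f"[{language}] {item.title}: {item.description[:80]}"


# Tiny invented feed so the sketch runs offline.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example feed</title>
<item><title>Patch released for example flaw</title>
<link>https://example.com/a1</link>
<description>Vendor ships a fix.</description></item>
</channel></rss>"""

for entry in parse_rss(SAMPLE_FEED):
    for lang in ("en", "es"):
        # Each summary keeps a link back to the original article.
        print(summarize(entry, lang), "->", entry.link)
```

Keeping the link next to each summary mirrors the prototype's behavior of pointing readers back to the original source.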

First Problem

After seeing how effective and inexpensive it was to generate summaries, we considered making the site public. But first we researched the licenses of the RSS feeds. It turned out that only two of them allowed commercial use without restrictions. Since we planned to monetize the site through ads, most of those sources were off-limits.

Second Version of the News System

To address this, we pivoted to a model where we still used the RSS feeds, but instead of republishing their content we would conduct our own investigations and publish original articles, sidestepping the licensing issues.

New Problem

This didn't work as expected. RSS content was often too short, and even when we visited the linked sites there wasn't always enough information. The system ended up producing very few articles, so we discarded it.

Third Version of the News Pipeline

We asked the AI how to better query public sources using keywords. We defined 18 categories for different types of cybersecurity news and refined our method to extract headlines. Eventually, we created a working pipeline that gathers recent articles, groups them, checks for duplicates, and ensures we’re not repeating past content.
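
The grouping and duplicate checks could be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline itself: it compares headlines by token overlap (Jaccard similarity), and the function names, the 0.6 threshold, and the sample headlines are all hypothetical.

```python
import re


def normalize(headline: str) -> frozenset[str]:
    """Lowercase a headline and reduce it to a set of alphanumeric tokens."""
    return frozenset(re.findall(r"[a-z0-9]+", headline.lower()))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Share of tokens the two sets have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_headlines(headlines: list[str], threshold: float = 0.6) -> list[list[str]]:
    """Put each headline into the first group whose representative is similar enough."""
    groups: list[tuple[frozenset[str], list[str]]] = []
    for headline in headlines:
        tokens = normalize(headline)
        for representative, members in groups:
            if jaccard(tokens, representative) >= threshold:
                members.append(headline)
                break
        else:
            # No similar group found: this headline starts a new one.
            groups.append((tokens, [headline]))
    return [members for _, members in groups]


def drop_already_published(
    groups: list[list[str]], published: list[str], threshold: float = 0.6
) -> list[list[str]]:
    """Discard groups whose first headline resembles something published before."""
    past = [normalize(p) for p in published]
    return [
        g for g in groups
        if all(jaccard(normalize(g[0]), p) < threshold for p in past)
    ]


headlines = [
    "Critical flaw found in Example VPN",
    "Critical flaw found in Example VPN appliances",
    "Ransomware group hits regional hospital",
]
published = ["Ransomware group hits regional hospital"]

fresh = drop_already_published(group_headlines(headlines), published)
print(fresh)
```

In this toy run the two VPN headlines end up in one group, and the ransomware story is dropped because it matches a previously published headline. A production system would likely need something more robust than token overlap, for example embeddings or URL canonicalization.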

First Version of the Production Website

This version was pretty solid. We used Google News as our main source since it allowed many queries.

Problems:

1. All article URLs came from Google.
2. We had to use headless Chrome to bypass Google's bot detection.
3. Content was in raw HTML, hard to parse.
4. It consumed a lot of resources and was slow.

Second Version of the Production Website

We evaluated several news platforms and, in the end, chose an affordable provider delivering 10–20 articles daily. After optimizing, however, that dropped to 2–5, so we added a second provider and now reach 15–35 daily. Still, we were missing one thing: images.

Third Version of the Production Website

We bought access to a low-cost AI image generator. The first images were full of clichés, but after improving the prompts we got better results. We crop the images, convert them to lightweight formats, and generate two versions per article (mobile and desktop).
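
The cropping step can be illustrated with a small calculation. The sketch below only computes centered crop boxes for two variants. The target sizes (1200x630 for desktop, 600x600 for mobile) and the function names are assumptions made for the example, not the site's real dimensions.

```python
# Hypothetical target sizes (width, height) for the two variants.
VARIANTS = {"desktop": (1200, 630), "mobile": (600, 600)}


def center_crop_box(width: int, height: int, target_w: int, target_h: int):
    """Return (left, top, right, bottom) of the largest centered crop
    that has the target aspect ratio."""
    # Compare aspect ratios by cross-multiplication to stay in integers.
    if width * target_h > height * target_w:
        # Source is too wide: keep the full height and trim the sides.
        new_w = height * target_w // target_h
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Source is too tall (or already matches): keep the full width.
    new_h = width * target_h // target_w
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


def plan_variants(width: int, height: int) -> dict:
    """Compute one crop box per variant for a source image of the given size."""
    return {
        name: center_crop_box(width, height, tw, th)
        for name, (tw, th) in VARIANTS.items()
    }


print(plan_variants(1024, 1024))
# {'desktop': (0, 243, 1024, 780), 'mobile': (0, 0, 1024, 1024)}
```

Each box uses the (left, upper, right, lower) order that imaging libraries such as Pillow accept for cropping. Resizing to the target size and saving in a compact format such as WebP would follow as separate steps.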

Pipeline Update

We updated the pipeline to generate these images automatically every day.

Fourth Version of the Production Website

This week we translated the site into Brazilian Portuguese. It took just two days, and the full translation of nearly 700 articles cost one dollar. We plan to support more languages soon.

Fifth Version of the Production Website

Currently in development. We’re working on multilingual support. Next languages: German, French, Italian.

Site Surprises

Since article classification is fully automated, some surprising stories appear, such as the cybersecurity risks of obesity in the UK or cheap phones that lack updates. Irrelevant articles are filtered out early, and only those with real potential are published.