Published On: Thu, 02 Jul 2026 17:22:01 GMT

HaloGuard 1.0 Beats Larger Guard Models on Prompt Safety

Trishool's HaloGuard 1.0 on Bittensor SN23 achieved state-of-the-art multilingual prompt safety, outperforming larger open guard models.

Akshat Thakur News Altcoin

Jul 2, 2026, 5:22 PM UTC

Author: Akshat Thakur

Low Attention

Overheated

HypeMeter

2nd July 2026 – Astroware Labs has released HaloGuard 1.0, an open-weight safety classifier for AI prompts. The team says its smallest model beats guard systems many times larger.

High Signal Summary For A Quick Glance

📌 Key Takeaways 👥 Who Is Impacted 📊 Implications

Astroware Labs released HaloGuard 1.0, an open-weight prompt-safety classifier, on July 2, 2026
The 0.8B model scores 90.9 average F1 across seven benchmarks, beating LlamaGuard4 (12B) at 75.9 and ShieldGemma (27B) at 70.0, according to the team
It runs on Bittensor Subnet 23 (Trishool), where miners attack the guard and validators score them in weekly cycles

Developers and AI teams who need a fast, cheap first-layer filter to catch jailbreaks and prompt injections before an LLM or agent runs
Makers of larger guard models, now facing a much smaller open-weight rival that claims higher benchmark scores

🟢 Short term: A free, downloadable guard that teams can test today under an Apache 2.0 license

🟡 Long term: Pressure on big-lab guard models if small, adversarially trained classifiers keep winning on benchmarks

🔴 Key risk: All scores are self-reported with the arXiv paper still pending, so independent reproduction is unproven

$TAO

Bittensor

N/A

Social Growth Rate

MoM

9.94%

Engagement Ratio

Active

86.33%

Organic Activity

Low bot signals

Extremely Positive

The Talk

Real voices. Real reactions.

KOLs

Media

Community

@trishoolai Well-done trishool team, miners, validators and supporters! Sn23 is undeniably a top-tier subnet! https://t.co/fs9xU1bBr5

We’re excited to announce that Trishool’s HaloGuard 1.0 𝐡𝐚𝐬 𝐚𝐜𝐡𝐢𝐞𝐯𝐞𝐝 𝐒𝐎𝐓𝐀 prompt-safety performance among open-weight guard models. Today, we present HaloGuard 1.0, a constitutional input classifier for multilingual AI safety. It is built as a first-layer input https://t.co/YJswErsZSs

04:20 PM·Jul 2, 2026

@trishoolai That's amazing!

03:52 PM·Jul 2, 2026

@trishoolai Incredibly proud of the Trishool team!

03:14 PM·Jul 2, 2026

Jul 24, 2026

Aster Vault Beta Launches With 5 Managed Perp Vaults

Jul 22, 2026

Uniswap DualPool Hook Launches, Audited and Open-Source

Jul 22, 2026

xStocks Taps GTN to Add Tokenized Hong Kong Equities

Jul 22, 2026

Movement Labs Bankruptcy: Ousted Founder Tops Creditors

Steady attention without excessive speculation.

HaloGuard 1.0 screens incoming prompts for jailbreaks, prompt injections, and policy violations. It acts as a first-layer filter, so risky requests get flagged before an LLM or agent ever runs. Astroware announced the release on July 2 through its Trishool account on X.

A Small Model With a Big Claim

The headline claim is size. The 0.8B version scores 90.9 average F1 across seven prompt-safety benchmarks, according to the team. The 4B version reaches 92.1.

The seven tests are OAI Moderation, Aegis, Aegis 2.0, ToxiC, SimpST, HarmBench, and WildGuardTest. On Aegis 2.0, the 0.8B model scores 87.9 F1. The rest sit in a similar range.

Those numbers matter because of what they beat. Meta’s LlamaGuard4, at 12B parameters, sits at 75.9 on the same tests. Google’s ShieldGemma, at 27B, lands at 70.0.

In other words, a model roughly 15 to 33 times smaller comes out ahead. The team also puts HaloGuard 1.0 above NemoGuard 8B (82.9), WildGuard 7B (85.8), Qwen3Guard-Gen 8B (86.2), and PolyGuard-Qwen 7B (87.0).

F1 score is the balance between catching real harms and avoiding false alarms. So a higher average suggests fewer missed attacks without flagging safe prompts as dangerous. For now, all of these results are self-reported.

Tweet not available.

How HaloGuard 1.0 Works

It is a constitutional classifier built on Qwen3.5 base models. Instead of a simple label head, it reads the prompt against a written safety “constitution” and then generates a verdict.

Because it generates text, the model returns a safe or unsafe call plus a category, rather than a single score. As a result, the output reads more like a reasoned judgment than a raw flag.

This first-layer approach is meant to be fast and cheap. So teams can screen traffic inline before spending money on a large downstream model or an agent run.

Model
Parameters
Avg F1 Score
Performance per Billion Parameters*
Size vs Performance

HaloGuard 1.0

0.8B

90.9

~113.6

Best efficiency

Delivers near state-of-the-art performance while using less than 1B parameters.

HaloGuard 1.0

4B

92.1

~23.0

Highest overall performance

Achieves the strongest benchmark score among compared safety models.

LlamaGuard 4

12B

75.9

~6.3

Requires significantly more parameters while producing materially lower benchmark performance.

ShieldGemma

27B

70.0

~2.6

Largest model, lowest efficiency

Uses the most parameters yet records the weakest overall benchmark results in this comparison.

*Performance per billion parameters = Average F1 score divided by model parameter count. Higher values indicate greater parameter efficiency.

Built on Bittensor’s Trishool Subnet

HaloGuard comes out of Trishool, also known as Subnet 23 on Bittensor. Trishool is a decentralized red-teaming network, and its whole point is adversarial pressure.

Here is the loop. Miners compete to break the guard with jailbreaks and injections. Then validators score those attacks, and the successful ones feed back into training.

These cycles usually run about seven days. So the guard keeps meeting fresh attacks. The team frames it as a “living” model, not a fixed release. According to Astroware, Subnet 23 earns a small share of Bittensor emissions, roughly 0.19% in recent snapshots.

The economic design ties into Bittensor’s core thesis. Attackers and validators earn rewards, so the incentive points toward finding weaknesses instead of hiding them.

From Alpha to Production

HaloGuard did not appear overnight. An earlier build, Halo Guard Alpha, reached about 87% F1 in the spring. The team has shipped steadily since then.

Real deployments came next. In May 2026, Halo integrated with Chutes on Subnet 64 for live serving. Then in June, the project joined the Google for Startups Web3 Program.

Astroware took over Subnet 23 roughly seven months ago. Since then, the work has moved through two phases. Phase 1 set the core guard and policy framework. Phase 2 now handles low-latency input guarding.

The Training Data Behind the Scores

The team says the granularity is what sets HaloGuard apart. It starts from 46 constitutional policies. Then those expand into 490 categories and 2,940 fine-grained subcategories.

That is far more detailed than the coarse “harmful or safe” labels used by many older guards. Because of that detail, the model can learn narrower intent boundaries.

The corpus itself holds 1,259,451 synthetic records, or about 1.26 million. Crucially, it uses 1:1 paired counterfactuals. Each pair keeps the same topic and vocabulary but flips the intent. So the model learns meaning, not surface keywords.

Coverage also spans 46 languages in a balanced mix. As a result, the team argues that a language switch should not slip past the filter.

What Still Needs Proof

Every benchmark here comes from Astroware and the model card, not an outside auditor. The full arXiv technical report is still pending. So independent researchers cannot yet reproduce the scores.

That gap matters in a field where eval sets can be cherry-picked or contaminated. Some machine learning researchers may also question the synthetic data. Others may ask how the multilingual coverage holds up on real-world traffic.

The models are open weight under an Apache 2.0 license. So anyone can download and test them today on Hugging Face. That openness makes independent checks possible, even if none have landed yet. None of this is financial advice.

For now, the model covers input prompts only. Response-side checks and agent tool-use scenarios sit on the roadmap, according to the team.

The real test arrives with the arXiv paper and the first outside reproductions. Until then, HaloGuard 1.0 stands as a bold, downloadable claim. A small decentralized guard says it can outscore far larger labs.

Frequently Asked Questions

What is HaloGuard 1.0?

HaloGuard 1.0 is an open-weight “constitutional” input classifier that screens user prompts for jailbreaks, prompt injections, and policy violations before they reach an LLM or agent. It was released by Astroware Labs on July 2, 2026 in 0.8B and 4B parameter sizes.

How does HaloGuard 1.0 compare to LlamaGuard and ShieldGemma?

According to the team, the 0.8B HaloGuard scores 90.9 average F1 across seven benchmarks, ahead of Meta’s 12B LlamaGuard4 at 75.9 and Google’s 27B ShieldGemma at 70.0. The 4B version reaches 92.1. These results are self-reported and not yet independently reproduced.

What is Bittensor’s Trishool subnet?

Trishool, or Subnet 23, is a decentralized AI red-teaming network on Bittensor. Miners compete to break the guard model with adversarial attacks, validators score them, and successful attacks retrain the guard in roughly weekly cycles.

Are HaloGuard 1.0’s benchmark results verified?

Not yet. Every score comes from Astroware and the model card, and the full arXiv technical report is still pending. Because the models are open weight under Apache 2.0, outside researchers can download and test them, but no independent reproduction has landed so far.

What’s next for HaloGuard?

HaloGuard 1.0 currently guards input prompts only. The team lists response-side checks and agent tool-use scenarios on its roadmap, with the arXiv paper and first outside reproductions expected to test the current claims.

Editorial Note

Our Crypto Talk is committed to unbiased, transparent, and true reporting to the best of our knowledge. This news article aims to provide accurate information in a timely manner. However, we advise the readers to verify facts independently and consult a professional before making any decisions based on the content since our sources could be wrong too. Check our Terms and conditions for more info.

Join Our Telegram Community

NEWS

OPINION/RESEARCH

FEATURES

ABOUT US

HaloGuard 1.0 Beats Larger Guard Models on Prompt Safety

The Talk

Read Other Altcoin articles

A Small Model With a Big Claim

How HaloGuard 1.0 Works

HaloGuard 1.0

HaloGuard 1.0

LlamaGuard 4

ShieldGemma

Built on Bittensor’s Trishool Subnet

From Alpha to Production

The Training Data Behind the Scores

What Still Needs Proof

What Comes Next

Frequently Asked Questions

Editorial Note

In this article

Related reads

Related reads

Related reads

Related reads

Read Other Altcoin articles