§ Research · The Nullsec model layer

Long-context intelligence for software, security, and agents.

We are building toward a proprietary, Nullsec-native model layer with a long-term technical target of up to 1.2 million tokens of context. The objective is not a large input window. It is useful long-context reasoning over complete systems.

Context target: ≤ 1,200,000 tokens
Domain: software · security · agents
Status: roadmap target — not shipped

§01/Why long context

Most software problems are not isolated.

A vulnerability in an API route may depend on an authentication helper in another directory, a middleware file, an environment variable, and a database permission model. A short-context system sees only the route. A long-context system is designed to see the full chain — and reason over the relationships between distant parts of a codebase.

The goal is not to support a large input window. The goal is useful long-context intelligence: the model must accurately locate, prioritize, reason over, and act on the most important information inside that context. That is what we optimize for.

Repository as a dependency graph (Fig. 1)

Fig. 1. A repository is processed as a structured graph of files, imports, routes, schemas, and security boundaries, not as unstructured text. Highlighted nodes carry authentication, wallet, secret, or data-access risk, and that risk propagates along dependency edges across the system.
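The risk propagation in Fig. 1 can be illustrated with a minimal sketch, assuming the repository graph is reduced to a plain import map (each file mapped to the files it imports). The file names and the `propagate_risk` helper are hypothetical. Walking the reversed import edges from a risky node finds every file that transitively depends on it, which is the full chain that a view of a single route would miss.

```python
from collections import deque

def propagate_risk(imports: dict[str, list[str]], risky: set[str]) -> set[str]:
    """Return the risky files plus every file that transitively imports one."""
    # Reverse the edges: for each file, which files import it?
    dependents: dict[str, set[str]] = {}
    for src, deps in imports.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(src)

    # Breadth-first search from the risky nodes along the reversed edges.
    reached = set(risky)
    queue = deque(risky)
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, ()):
            if parent not in reached:
                reached.add(parent)
                queue.append(parent)
    return reached

# Hypothetical repository: file -> files it imports.
imports = {
    "routes/payments.ts": ["middleware/auth.ts", "db/client.ts"],
    "middleware/auth.ts": ["lib/session.ts"],
    "lib/session.ts": ["config/env.ts"],
    "db/client.ts": ["config/env.ts"],
    "routes/health.ts": [],
}
for path in sorted(propagate_risk(imports, {"config/env.ts"})):
    print(path)
```

Here the secret-bearing `config/env.ts` reaches `routes/payments.ts` through two separate paths (session handling and the database client), while `routes/health.ts` stays outside the risk set.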

§02/The complexity wall

Naive context scaling does not work.

Self-attention couples every token to every other token. Cost grows with the square of sequence length, and the key–value cache grows linearly until it — not the parameters — dominates memory.

$$\mathrm{Attn}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,\qquad \text{cost}\in O(n^{2}) \tag{1}$$

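For sequence length n, the score matrix QK^⊤ has n × n entries, so attention compute, and the memory of a naive implementation that materializes that matrix, scale as O(n²). Doubling the context quadruples the cost: this is the wall that makes brute-force scaling intractable.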
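A minimal pure-Python sketch of Eq. (1), for illustration only: `naive_attention` and `random_sequence` are hypothetical helpers, and production kernels are batched and fused rather than written as loops. Counting the query-key scores makes the quadratic growth directly visible.

```python
import math
import random

def naive_attention(q, k, v):
    """Single-head softmax(Q K^T / sqrt(d_k)) V over lists of vectors.

    Returns (outputs, n_scores), where n_scores counts query-key dot products.
    """
    d_k = len(q[0])
    outputs, n_scores = [], 0
    for qi in q:
        # Every query is scored against every key: n * n scores in total.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        n_scores += len(scores)
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Softmax-weighted average of the value vectors, one output per query.
        outputs.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                        for c in range(len(v[0]))])
    return outputs, n_scores

def random_sequence(n, d=8, seed=0):
    rng = random.Random(seed)
    return [[rng.random() for _ in range(d)] for _ in range(n)]

for n in (128, 256):
    x = random_sequence(n)
    _, n_scores = naive_attention(x, x, x)
    print(n, n_scores)  # prints 128 16384, then 256 65536: 2x tokens, 4x scores
```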

$$M_{\text{kv}} = 2\cdot L\cdot n_{\text{kv}}\cdot d_{\text{head}}\cdot n\cdot p_{\text{bytes}} \tag{2}$$

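KV-cache memory grows linearly in n. Here L is the number of layers (not to be confused with L_train and L_target in Eq. 3), n_kv the number of key–value heads, d_head the head dimension, and p_bytes the bytes per stored element; the factor 2 accounts for storing both keys and values. At n = 1.2M, the cache dominates the memory budget, which motivates paged and quantized caching, context reuse, and tier-based access.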
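A worked instance of Eq. (2), assuming a hypothetical model shape of 32 layers, 8 key–value heads, and head dimension 128. These values are chosen only for illustration and are not a Nullsec model specification. The arithmetic shows why the cache, not the parameters, becomes the binding constraint at 1.2M tokens.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, d_head: int,
                   seq_len: int, bytes_per_elem: float) -> float:
    """Eq. (2) for one sequence: 2 * L * n_kv * d_head * n * p_bytes.

    The leading 2 counts keys and values separately.
    """
    return 2 * n_layers * n_kv_heads * d_head * seq_len * bytes_per_elem

# Hypothetical model shape, chosen only for illustration.
shape = dict(n_layers=32, n_kv_heads=8, d_head=128)

per_token = kv_cache_bytes(**shape, seq_len=1, bytes_per_elem=2)          # 16-bit cache
full_fp16 = kv_cache_bytes(**shape, seq_len=1_200_000, bytes_per_elem=2)
full_int8 = kv_cache_bytes(**shape, seq_len=1_200_000, bytes_per_elem=1)  # 8-bit quantized

print(per_token)          # 131072 bytes, i.e. 128 KiB per token
print(full_fp16 / 1e9)    # roughly 157.3 GB for a single 1.2M-token sequence
print(full_int8 / 1e9)    # roughly 78.6 GB: 8-bit quantization halves it
```

Because the cost is linear in every factor, each lever named above acts multiplicatively: halving `bytes_per_elem` or the number of KV heads halves the cache for the same window.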

§03/Extending the window

Position methods that hold signal at distance.

Extending context begins with the positional encoding. We combine long-context fine-tuning with position interpolation and RoPE base scaling so that the model generalizes far beyond its pretraining length without losing local resolution.

$$m' = m\cdot\frac{L_{\text{train}}}{L_{\text{target}}} \tag{3}$$

Position interpolation rescales token positions so a model trained to length L_train stays in-distribution when served at L_target.
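A minimal sketch of Eq. (3), assuming a hypothetical training length of 8,192 tokens (the source does not state L_train). It checks that the last position of a 1.2M-token window maps back inside the trained range, and shows the spacing between neighbouring tokens after rescaling.

```python
def interpolate_positions(positions: list[int], l_train: int, l_target: int) -> list[float]:
    """Eq. (3): m' = m * (L_train / L_target), applied to every position index."""
    scale = l_train / l_target
    return [m * scale for m in positions]

L_TRAIN, L_TARGET = 8_192, 1_200_000   # hypothetical lengths, for illustration

# The last position of a 1.2M-token window lands just below L_TRAIN...
print(interpolate_positions([L_TARGET - 1], L_TRAIN, L_TARGET))
# ...but neighbouring tokens are now only L_TRAIN / L_TARGET apart (about 0.0068).
a, b = interpolate_positions([1_000, 1_001], L_TRAIN, L_TARGET)
print(b - a)
```

The second print is the trade-off: positions stay in distribution, but adjacent tokens are squeezed together by the same factor, which is why interpolation is paired with long-context fine-tuning and the base scaling of Eq. (4).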

$$\theta_i = b^{-2i/d},\qquad b' = b\cdot s^{d/(d-2)} \tag{4}$$

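NTK-aware extension rescales the RoPE base b for length scale s = L_target/L_train, where d is the rotary dimension and i = 0, …, d/2 − 1 indexes frequency pairs. Substituting b′ gives θ′_i = θ_i · s^(−2i/(d−2)): the highest frequency (i = 0) is unchanged and the lowest (i = d/2 − 1) is slowed by exactly 1/s, so low frequencies are stretched to cover the longer window while high-frequency local detail is preserved.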
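A short numerical check of Eq. (4), assuming a hypothetical head dimension d = 128, base b = 10,000, and L_train = 8,192 (none of these are stated for the Nullsec model). Comparing each frequency before and after rescaling shows how the base change treats the two ends of the spectrum.

```python
def rope_frequencies(d: int, base: float) -> list[float]:
    """Left half of Eq. (4): theta_i = base ** (-2i / d), i = 0 .. d/2 - 1."""
    return [base ** (-2 * i / d) for i in range(d // 2)]

def ntk_scaled_base(base: float, s: float, d: int) -> float:
    """Right half of Eq. (4): b' = b * s ** (d / (d - 2))."""
    return base * s ** (d / (d - 2))

# Hypothetical values: head dimension, RoPE base, and training length.
d, base = 128, 10_000.0
s = 1_200_000 / 8_192                 # s = L_target / L_train

original = rope_frequencies(d, base)
scaled = rope_frequencies(d, ntk_scaled_base(base, s, d))
ratios = [new / old for old, new in zip(original, scaled)]

print(ratios[0])        # 1.0: the highest frequency is untouched
print(ratios[-1] * s)   # approximately 1.0: the lowest frequency is divided by s
```

Between the two ends the ratio decays smoothly as s^(−2i/(d−2)), so the rescaling is gradual rather than a uniform squeeze of every position as in Eq. (3).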

§04/Constructing useful context

Not every token has equal value.

Even with a very large window, the window must be filled well. Retrieval and ranking select the spans that matter; compression folds down boilerplate and logs; hierarchical memory carries project, build, security, and agent state across a task.

$$\max_{S}\ \sum_{i\in S}\mathrm{rel}(s_i)\quad\text{s.t.}\quad\sum_{i\in S}|s_i|\le B \tag{5}$$

Under a token budget B, the context assembler selects the span set S that maximizes total relevance subject to the budget — keeping the window dense with signal rather than full of text.
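Eq. (5) is a 0/1 knapsack problem. A minimal exact sketch using dynamic programming, assuming relevance scores are already supplied by a retriever or ranker; the `Span` type, the span names, and `assemble_context` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    name: str         # hypothetical label, e.g. a file or log excerpt
    tokens: int       # |s_i|, the span's length in tokens
    relevance: float  # rel(s_i), assumed to come from a retriever/ranker

def assemble_context(spans: list[Span], budget: int) -> tuple[list[Span], float]:
    """Solve Eq. (5) exactly as a 0/1 knapsack by dynamic programming.

    best[b] holds (total relevance, chosen indices) for a budget of b tokens.
    Runs in O(len(spans) * budget) time and memory.
    """
    best: list[tuple[float, tuple[int, ...]]] = [(0.0, ())] * (budget + 1)
    for idx, span in enumerate(spans):
        # Walk budgets downward so each span is counted at most once.
        for b in range(budget, span.tokens - 1, -1):
            prev_rel, prev_chosen = best[b - span.tokens]
            if prev_rel + span.relevance > best[b][0]:
                best[b] = (prev_rel + span.relevance, prev_chosen + (idx,))
    total, chosen = best[budget]
    return [spans[i] for i in chosen], total

spans = [
    Span("middleware/auth.ts", 400, 0.9),
    Span("routes/api.ts", 300, 0.8),
    Span("build.log", 600, 0.5),
    Span("README.md", 200, 0.1),
]
selected, score = assemble_context(spans, budget=1_000)
print([s.name for s in selected], round(score, 2))
```

With a 1,000-token budget, the long but weakly relevant `build.log` is dropped in favour of the auth middleware, the API route, and the README. The exact table grows with the budget, so at window scale a greedy fill ranked by relevance per token is a common approximation; that heuristic is a substitution, not something the text specifies.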

Working-context allocation (illustrative split of a long window, in K tokens)

Repository structure and retrieved evidence dominate; logs and memory are compressed to preserve budget.

Contribution to effective context (relative impact on keeping long context useful, indexed)

Technique	Relative impact (indexed)
Retrieval + ranking	92
Context compression	78
Hierarchical memory	71
Repo context maps	66
KV-cache paging	61

Illustrative weighting of the techniques that turn raw window size into usable reasoning.

§05/Serving economics

The full pipeline is reserved for where it pays off.

Not every request needs 1.2M tokens. A router classifies each task by tier and difficulty: lightweight work goes to cost-efficient models, and the full long-context path is reserved for workflows where system-level understanding creates real value.
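One way such a router could look, as a hypothetical sketch: the `Task` fields, tier names, and thresholds are assumptions, and a production router might use a learned classifier in place of fixed cut-offs. The point is the shape of the decision: only large, cross-system tasks reach the full long-context path.

```python
from dataclasses import dataclass

@dataclass
class Task:
    context_tokens: int   # estimated tokens of context the task needs
    difficulty: float     # 0..1, assumed to come from a lightweight classifier
    cross_system: bool    # spans several files, services, or deploy targets

def route(task: Task) -> str:
    """Hypothetical serving-tier router; tiers and thresholds are illustrative."""
    if task.context_tokens <= 32_000 and task.difficulty < 0.5 and not task.cross_system:
        return "light"          # cost-efficient short-context model
    if task.context_tokens <= 200_000 and not task.cross_system:
        return "standard"
    return "long-context"       # full pipeline: context assembly + paged KV cache

print(route(Task(context_tokens=3_000, difficulty=0.2, cross_system=False)))    # light
print(route(Task(context_tokens=120_000, difficulty=0.7, cross_system=False)))  # standard
print(route(Task(context_tokens=900_000, difficulty=0.9, cross_system=True)))   # long-context
```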

Inference & serving pipeline (Fig. 2)

Fig. 2. A request is routed by tier and difficulty, context is constructed from the repository map, retrieval, compression, and hierarchical memory, then served by the long-context model with paged KV-cache. Output is audited by the Model Firewall before any action.

Serving objectives the stack is engineered around: KV-cache reuse, route accuracy, and context compression. Illustrative targets, not measured results.
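Paging is what lets a long-context KV cache grow without one huge contiguous allocation per request. A toy, bookkeeping-only sketch in the spirit of PagedAttention, assuming fixed-size blocks and a per-sequence block table; no tensors are stored, and `PagedKVCache` with its methods is hypothetical rather than a Nullsec or library API.

```python
class PagedKVCache:
    """Bookkeeping-only sketch of a paged KV cache: each sequence's keys and
    values live in fixed-size physical blocks listed in a per-sequence block table."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free: list[int] = list(range(num_blocks))   # unused physical blocks
        self.tables: dict[str, list[int]] = {}           # sequence -> block ids
        self.lengths: dict[str, int] = {}                # sequence -> tokens stored

    def append(self, seq: str, n_tokens: int) -> None:
        """Reserve space for n_tokens more tokens of sequence `seq`."""
        table = self.tables.get(seq, [])
        new_len = self.lengths.get(seq, 0) + n_tokens
        needed = -(-new_len // self.block_size) - len(table)   # ceil(new_len / size) - held
        if needed > len(self.free):
            raise MemoryError("not enough free KV-cache blocks")
        self.tables[seq] = table + [self.free.pop() for _ in range(needed)]
        self.lengths[seq] = new_len

    def release(self, seq: str) -> None:
        """Return all of a finished sequence's blocks to the free pool."""
        self.free.extend(self.tables.pop(seq, []))
        self.lengths.pop(seq, None)

cache = PagedKVCache(num_blocks=8, block_size=16)
cache.append("req-a", 40)   # ceil(40 / 16) = 3 blocks
cache.append("req-a", 10)   # 50 tokens now need 4 blocks, so 1 more is taken
cache.append("req-b", 16)   # exactly 1 block
print(len(cache.free))      # 3
cache.release("req-a")
print(len(cache.free))      # 7
```

Waste is bounded by one partially filled block per sequence, and freed blocks return to a shared pool. Reuse of a shared prefix across requests would additionally need reference counts on blocks, which this sketch leaves out.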

§06/Security-aware reasoning

Trained on how software actually breaks.

Software does not break in clean benchmark environments. It breaks where code, infrastructure, dependencies, and security assumptions interact. The model layer is trained against real failure data and designed to flag risk classes directly.

Exposed secrets

Broken authentication

Missing rate limits

Unsafe API routes

Dangerous file writes

Command injection

Prompt injection

Permission misuse

Wallet action risk

Dependency vulnerabilities

Insecure database access

Environment exposure

Risk classes the model layer is designed to detect — feeding Nullsec S1, Nullsec Guard, and the Model Firewall.

§07/Nullsec S1 · Security benchmark

Measured against real security cases.

We evaluate the security layer on a held-out benchmark of 111 real vulnerability cases — not generic leaderboards. Nullsec-1, the fine-tuned adapter behind Nullsec S1, is compared against its base model, a static-analysis baseline, and frontier API models.

Nullsec-1 headline results: F1 0.9245 (highest among the systems evaluated), precision 0.9423, recall 0.9074, false-safe rate 0.00.

Security benchmark · 111 cases (Fig. 4: precision, recall, and F1 per system)

Fig. 4. Precision, recall, and F1 across systems. Nullsec-1 leads on all three, with higher precision than the static-analysis baseline and higher recall than the frontier API models. Higher is better.

System / tool	Configuration	Cases	Analyzable outputs	Precision	Recall	F1	False-safe rate ↓	Hallucination rate ↓
★ Nullsec-1	Nullsec S1 adapter	111	110/111	0.9423	0.9074	0.9245	0.0000	0.0667
Qwen2.5-Coder-7B-Instruct	Base model, no Nullsec adapter	111	4/111	0.3333	0.0093	0.0180	0.0000	0.5000
Semgrep	Local rules baseline	111	111/111	0.8627	0.4074	0.5535	0.5625	0.3333
Claude API	claude-opus-4-8	111	68/111	0.8889	0.5185	0.6550	0.0000	0.1429
OpenAI / Codex API	gpt-5.3-codex	111	105/111	0.6169	0.8796	0.7252	0.0000	0.6000

Higher is better for precision, recall, and F1; lower is better for false-safe and hallucination rates. Analyzable outputs are responses that returned valid, parseable findings. Internal evaluation across 111 held-out security cases.
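As a quick consistency check on the table, F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. A small sketch; the `f1_score` helper is hypothetical and not part of any evaluation harness described here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from the results table.
rows = {
    "Nullsec-1": (0.9423, 0.9074, 0.9245),
    "Claude API": (0.8889, 0.5185, 0.6550),
    "OpenAI / Codex API": (0.6169, 0.8796, 0.7252),
}
for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.4f}, reported = {reported:.4f}")
```

Recomputing from the rounded precision and recall can shift the last digit for some rows (Semgrep and the Qwen base model come out at 0.5534 and 0.0181), presumably because the reported F1 was computed from unrounded values.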

§08/The Model Firewall

A control plane between agents and the world.

The future of software is agentic: agents write code, run commands, deploy, call APIs, and touch wallets. The Model Firewall evaluates each proposed action with full system context and decides whether it is safe, expensive, risky, or malicious — before it executes.

Model Firewall: Cloudflare for AI agent actions (Fig. 3)

Fig. 3. Proposed agent actions are inspected, scored with full-context risk evaluation, and gated allow / hold / deny. High-risk actions route to mobile for human approval. A firewall between AI agents and the systems they act on.
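A minimal sketch of the allow / hold / deny gate in Fig. 3, assuming the full-context risk evaluation has already produced a risk score in [0, 1], a malicious flag, and a cost estimate. The `Assessment` fields, the thresholds, and `gate` are hypothetical. It shows one way the four outcomes named above (safe, expensive, risky, malicious) could map onto three verdicts.

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"
    HOLD = "hold"   # paused until a human approves it
    DENY = "deny"

@dataclass
class Assessment:
    risk: float          # 0..1, assumed output of the full-context risk evaluation
    malicious: bool      # classifier flag for clearly hostile actions
    est_cost_usd: float  # projected spend of the action

# Hypothetical policy thresholds, not Nullsec's.
DENY_AT, HOLD_AT, COST_LIMIT_USD = 0.85, 0.40, 50.0

def gate(a: Assessment) -> Verdict:
    if a.malicious or a.risk >= DENY_AT:
        return Verdict.DENY
    if a.risk >= HOLD_AT or a.est_cost_usd > COST_LIMIT_USD:
        return Verdict.HOLD   # routed to a person (e.g. on mobile) for approval
    return Verdict.ALLOW

print(gate(Assessment(risk=0.05, malicious=False, est_cost_usd=0.10)))   # Verdict.ALLOW
print(gate(Assessment(risk=0.60, malicious=False, est_cost_usd=0.10)))   # Verdict.HOLD
print(gate(Assessment(risk=0.20, malicious=True, est_cost_usd=0.0)))     # Verdict.DENY
```

Checking the deny conditions first means a malicious flag can never be downgraded to a hold by a low risk score.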

§09/Evaluation

We do not judge by generic benchmarks.

The model layer is evaluated against the tasks that matter to the ecosystem. A 1.2M window is not the goal — understanding the full environment around software is.

#	Capability under evaluation	Domain
01	Understand an entire repository	Software
02	Locate a security issue across multiple files	Security
03	Connect a failed deploy to the right env variable	Infra
04	Decide whether an agent action is safe	Agents
05	Review AI-generated software for real risks	Security
06	Improve NullsecBot build success rate	Software
07	Reduce repeated build failures	Software
08	Reason across contracts, wallets, APIs, and deploys	Web3

On status. The 1.2M-token model layer, internal inference, and the Model Firewall are roadmap targets and technical development objectives. Equations describe the methods we build on. Apart from the Nullsec S1 benchmark in §07, which reports an internal evaluation, figures are illustrative and do not represent measured results or a shipped capability today.