§ Research · The Nullsec model layer

Long-context intelligence for software, security, and agents.

We are building toward a proprietary, Nullsec-native model layer with a long-term technical target of up to 1.2 million tokens of context. The objective is not a large input window. It is useful long-context reasoning over complete systems.

Context target: ≤ 1,200,000 tokens
Domain: software · security · agents
Status: roadmap target, not shipped
§01/Why long context

Most software problems are not isolated.

A vulnerability in an API route may depend on an authentication helper in another directory, a middleware file, an environment variable, and a database permission model. A short-context system sees only the route. A long-context system is designed to see the full chain — and reason over the relationships between distant parts of a codebase.

The goal is not to support a large input window. The goal is useful long-context intelligence: the model must accurately locate, prioritize, reason over, and act on the most important information inside that context. That is what we optimize for.

Repository as a dependency graph (Fig. 1): app/page.tsx, api/route.ts, middleware.ts, auth.ts, wallet.ts, db/schema.ts, .env, deploy.yml; security boundary spanning auth · wallet · secrets · data.
Fig. 1. A repository is processed as a structured graph of files, imports, routes, schemas, and security boundaries — not as unstructured text. Highlighted nodes carry authentication, wallet, secret, and data-access risk and propagate across the system.
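A minimal sketch of how risk could propagate through such a graph, assuming a plain list of dependency edges. The edges below are hypothetical and chosen only to mirror the file names in Fig. 1. A file inherits exposure from everything it transitively depends on, so walking importer edges backwards from each sensitive node finds every file those nodes can affect.

```python
from collections import deque

# Hypothetical dependency edges (importer, imported); not a real repository.
EDGES = [
    ("app/page.tsx", "api/route.ts"),
    ("api/route.ts", "middleware.ts"),
    ("api/route.ts", "wallet.ts"),
    ("middleware.ts", "auth.ts"),
    ("auth.ts", ".env"),
    ("wallet.ts", ".env"),
    ("wallet.ts", "db/schema.ts"),
    ("deploy.yml", ".env"),
]

# Nodes inside the security boundary (auth, wallet, secrets, data).
SENSITIVE = {"auth.ts", "wallet.ts", ".env", "db/schema.ts"}


def affected_by(sensitive: set[str], edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each file to the sensitive nodes it transitively depends on."""
    dependents: dict[str, list[str]] = {}
    for importer, imported in edges:
        dependents.setdefault(imported, []).append(importer)

    reach: dict[str, set[str]] = {}
    for source in sensitive:
        # Breadth-first walk over reversed edges, starting at the sensitive node.
        queue, seen = deque([source]), {source}
        while queue:
            node = queue.popleft()
            reach.setdefault(node, set()).add(source)
            for parent in dependents.get(node, []):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
    return reach


if __name__ == "__main__":
    for path, sources in sorted(affected_by(SENSITIVE, EDGES).items()):
        print(f"{path:15} <- {sorted(sources)}")
```

In this toy graph, the top-level page ends up exposed to all four sensitive nodes even though it imports none of them directly. A short-context view of that single file would miss this.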
§02/The complexity wall

Naive context scaling does not work.

Self-attention couples every token to every other token. Cost grows with the square of sequence length, and the key–value cache grows linearly until it — not the parameters — dominates memory.

$$\mathrm{Attn}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V, \qquad \text{cost} \in O(n^2) \tag{1}$$

Naively computing and materializing the n × n score matrix QKᵀ costs O(n²) time and memory in sequence length n. Doubling the context quadruples this cost, and that is the wall that makes brute-force scaling intractable.
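A back-of-the-envelope cost model makes the quadratic term concrete. It counts only the two n × n matrix multiplications in equation (1). The model width d_model = 4096 is a hypothetical value. Projections, the softmax itself, and the MLP are ignored, so the model shows the growth rate rather than a real FLOP budget.

```python
def attention_flops(n: int, d_model: int, n_layers: int = 1) -> int:
    """Approximate FLOPs for the two n x n matmuls in self-attention.

    QK^T: n*n outputs, each a d_model-long dot product -> 2*n*n*d_model FLOPs.
    softmax(.)V: n*d_model outputs, each an n-long dot product -> 2*n*n*d_model.
    Projections, softmax, and MLP blocks are deliberately left out.
    """
    return n_layers * 4 * n * n * d_model


if __name__ == "__main__":
    d = 4096  # hypothetical model width (heads * d_k)
    base = attention_flops(128_000, d)
    for n in (128_000, 256_000, 1_200_000):
        print(f"n={n:>9,}: {attention_flops(n, d) / base:6.2f}x the 128K cost")
```

Going from 128K to 1.2M tokens multiplies this term by (1.2M / 128K)² ≈ 88, while the window itself grows only about 9.4 times.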

$$M_{\mathrm{kv}} = 2 \cdot L \cdot n_{\mathrm{kv}} \cdot d_{\mathrm{head}} \cdot n \cdot p_{\mathrm{bytes}} \tag{2}$$

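Here L is the number of layers, n_kv the number of key–value heads, d_head the head dimension, n the sequence length, and p_bytes the bytes per stored element. The factor 2 counts keys and values. KV-cache memory therefore grows linearly in n, and at n = 1.2M the cache dominates the memory budget. This motivates paged and quantized caching, context reuse, and tier-based access.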
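Plugging numbers into equation (2) shows the scale. The configuration below is hypothetical: 32 layers, 8 KV heads (grouped-query attention), and head dimension 128, which is roughly the shape of a mid-sized open model. At 16-bit precision, each token costs 128 KiB of cache. A single 1.2M-token sequence therefore needs about 146 GiB before any weights are loaded.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, d_head: int,
                   seq_len: int, bytes_per_elem: float) -> float:
    """Equation (2): keys and values (factor 2) for every layer, KV head, and token."""
    return 2 * n_layers * n_kv_heads * d_head * seq_len * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical grouped-query config: 32 layers, 8 KV heads of width 128.
    cfg = dict(n_layers=32, n_kv_heads=8, d_head=128, seq_len=1_200_000)
    for label, p in (("fp16", 2), ("int8", 1), ("4-bit", 0.5)):
        gib = kv_cache_bytes(**cfg, bytes_per_elem=p) / 2**30
        print(f"{label:>5}: {gib:7.1f} GiB for one 1.2M-token sequence")
```

Quantizing the cache shrinks p_bytes and scales the footprint down linearly. Paging and reuse address the remaining problem: not committing that memory until it is actually needed.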

§03/Extending the window

Position methods that hold signal at distance.

Extending context begins with the positional encoding. We combine long-context fine-tuning with interpolation and RoPE base scaling so the model generalizes far beyond its pretraining length without losing local resolution.

$$m' = m \cdot \frac{L_{\mathrm{train}}}{L_{\mathrm{target}}} \tag{3}$$

Position interpolation rescales each token position m so that a model trained to length L_train stays in-distribution when served at L_target: every served position lands inside the range seen during training.
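A minimal sketch of equation (3) applied to RoPE rotation angles. The training length of 8,192 tokens and the head dimension of 128 are hypothetical. After interpolation, every served position maps back inside the trained range. The cost is that neighboring tokens are packed closer together in angle.

```python
def rope_angles(position: float, d: int, base: float = 10_000.0) -> list[float]:
    """Rotation angle m * theta_i for each of the d/2 RoPE frequency pairs."""
    return [position * base ** (-2 * i / d) for i in range(d // 2)]


def interpolate_position(m: int, l_train: int, l_target: int) -> float:
    """Equation (3): map positions in [0, l_target) into [0, l_train)."""
    return m * l_train / l_target


if __name__ == "__main__":
    l_train, l_target, d = 8_192, 1_200_000, 128  # hypothetical lengths and head dim
    last = l_target - 1
    m_prime = interpolate_position(last, l_train, l_target)
    print(f"position {last:,} is served as {m_prime:,.2f} (trained up to {l_train:,})")
    # The largest angle seen at serve time stays within the trained range.
    print(max(rope_angles(m_prime, d)) < max(rope_angles(l_train, d)))
```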

$$\theta_i = b^{-2i/d}, \qquad b' = b \cdot s^{d/(d-2)} \tag{4}$$

NTK-aware extension rescales the RoPE base b for length scale s = L_target / L_train. Here d is the head dimension and i = 0, …, d/2 − 1 indexes the frequency pairs. The rescaling stretches low frequencies while preserving high-frequency detail.
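The property behind equation (4) can be checked numerically. With b' = b · s^(d/(d−2)), the highest-frequency pair (i = 0) keeps its rotation rate. The lowest-frequency pair (i = d/2 − 1) slows by exactly the factor s, which matches position interpolation at the long end. The base of 10,000, head dimension of 128, and training length of 8,192 are hypothetical values.

```python
def rope_thetas(d: int, base: float) -> list[float]:
    """theta_i = base^(-2i/d) for i = 0 .. d/2 - 1 (equation (4), left)."""
    return [base ** (-2 * i / d) for i in range(d // 2)]


def ntk_base(base: float, s: float, d: int) -> float:
    """Equation (4), right: b' = b * s^(d / (d - 2))."""
    return base * s ** (d / (d - 2))


if __name__ == "__main__":
    d, b = 128, 10_000.0        # hypothetical head dim and original base
    s = 1_200_000 / 8_192       # L_target / L_train, with a hypothetical L_train
    old, new = rope_thetas(d, b), rope_thetas(d, ntk_base(b, s, d))
    print(f"highest frequency: {old[0]:.4f} -> {new[0]:.4f} (unchanged)")
    print(f"lowest-frequency slowdown: {old[-1] / new[-1]:.2f} (s = {s:.2f})")
```

Frequencies between the two ends are slowed by intermediate factors, so local word order keeps its resolution while distant positions stay in range.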

§04/Constructing useful context

Not every token has equal value.

Even with a very large window, the window must be filled well. Retrieval and ranking select the spans that matter; compression folds down boilerplate and logs; hierarchical memory carries project, build, security, and agent state across a task.

$$\max_{S} \sum_{i \in S} \mathrm{rel}(s_i) \quad \text{s.t.} \quad \sum_{i \in S} |s_i| \le B \tag{5}$$

Under a token budget B, the context assembler selects the span set S that maximizes total relevance subject to the budget — keeping the window dense with signal rather than full of text.
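Equation (5) is a 0/1 knapsack problem, and a minimal exact solver is sketched below. The Span fields and relevance scores are hypothetical stand-ins for whatever a retriever and ranker would produce. The dynamic program runs in O(|spans| · B) time. That is fine for illustration, but not at B = 1.2M with thousands of candidates; at that scale, a greedy pass ordered by relevance per token is a common approximation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    name: str
    relevance: float  # rel(s_i): hypothetical score from a retriever/ranker
    tokens: int       # |s_i|: token cost of including the span


def select_spans(spans: list[Span], budget: int) -> tuple[float, list[Span]]:
    """Exact 0/1 knapsack for equation (5)."""
    # best[b] = (total relevance, chosen indices) using at most b tokens.
    best: list[tuple[float, tuple[int, ...]]] = [(0.0, ())] * (budget + 1)
    for idx, span in enumerate(spans):
        # Iterate budgets downward so each span is used at most once.
        for b in range(budget, span.tokens - 1, -1):
            cand = best[b - span.tokens][0] + span.relevance
            if cand > best[b][0]:
                best[b] = (cand, best[b - span.tokens][1] + (idx,))
    score, chosen = best[budget]
    return score, [spans[i] for i in chosen]


if __name__ == "__main__":
    candidates = [
        Span("README boilerplate", 10, 5),
        Span("auth.ts helper", 40, 4),
        Span("deploy log tail", 30, 6),
        Span("api/route.ts handler", 50, 3),
    ]
    score, picked = select_spans(candidates, budget=10)
    print(score, [s.name for s in picked])
```

With a 10-token budget, the solver keeps the two short, high-relevance code spans and drops both the boilerplate and the long log tail.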

Working-context allocation

Illustrative split of a long window (K tokens).

Budget (illustrative, in thousands of tokens; total 1,200K):
Repository: 480K
Docs: 220K
Logs/traces: 200K
Memory: 200K
Reserve: 100K

Repository structure and retrieved evidence dominate; logs and memory are compressed to preserve budget.
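One cheap form of log compression can be sketched under the assumption that logs are line-oriented text. Consecutive lines that become identical once digits are masked (timestamps, durations, retry counters) are folded into a single line with a count. This is a hypothetical heuristic rather than a description of Nullsec's pipeline. A learned summarizer could sit on top of a pass like this.

```python
import re
from itertools import groupby

_NUMBER = re.compile(r"\d+")


def compress_log(lines: list[str]) -> list[str]:
    """Collapse runs of consecutive log lines that differ only in their numbers.

    Each run keeps its first line verbatim and records the run length, so
    repeated polls, retries, or health checks cost one line of context.
    """
    out = []
    for _, group in groupby(lines, key=lambda line: _NUMBER.sub("<n>", line)):
        run = list(group)
        out.append(run[0] if len(run) == 1 else f"{run[0]}  [x{len(run)} similar]")
    return out


if __name__ == "__main__":
    raw = [
        "GET /health 200 3ms",
        "GET /health 200 4ms",
        "GET /health 200 3ms",
        "ERROR auth.ts: JWT_SECRET is undefined",
        "GET /health 200 5ms",
    ]
    print("\n".join(compress_log(raw)))
```

Only consecutive runs are merged, so the position of the error relative to the surrounding traffic is preserved.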

Contribution to effective context

Relative impact on keeping long context useful (indexed).

Retrieval + ranking: 92
Context compression: 78
Hierarchical memory: 71
Repo context maps: 66
KV-cache paging: 61

Illustrative weighting of the techniques that turn raw window size into usable reasoning.

§05/Serving economics

The full pipeline is reserved for where it pays off.

Not every request needs 1.2M tokens. A router classifies each task by tier and difficulty: lightweight work goes to cost-efficient models, and the full long-context path is reserved for workflows where system-level understanding creates real value.
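A rule-based sketch of the routing decision follows. The tiers, task kinds, and thresholds are all hypothetical, and a learned classifier could replace these rules. The shape of the decision stays the same either way: escalate to the full long-context path only when a task is both system-level and too large for the cheaper tiers.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    LIGHT = "cost-efficient model"
    STANDARD = "mid-sized model with retrieved context"
    LONG_CONTEXT = "full long-context path"


@dataclass
class Task:
    kind: str                # hypothetical labels, e.g. "autocomplete", "security_review"
    files_in_scope: int      # how much of the repository the task touches
    est_context_tokens: int  # rough size of the context the task would need


# Hypothetical task kinds that need system-level understanding.
SYSTEM_LEVEL = {"security_review", "deploy_failure", "agent_action_review"}


def route(task: Task, long_threshold: int = 200_000) -> Tier:
    """Pick the cheapest tier likely to handle the task (illustrative thresholds)."""
    if task.kind in SYSTEM_LEVEL and (
        task.files_in_scope > 20 or task.est_context_tokens > long_threshold
    ):
        return Tier.LONG_CONTEXT
    if task.files_in_scope > 1 or task.est_context_tokens > 8_000:
        return Tier.STANDARD
    return Tier.LIGHT


if __name__ == "__main__":
    for t in (Task("autocomplete", 1, 500), Task("security_review", 140, 900_000)):
        print(t.kind, "->", route(t).value)
```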

Inference and serving pipeline (Fig. 2): Request (task + scope) → Router (tier · difficulty) → Context construction [Repo context map: files · deps · boundaries; Retrieval + ranking: top-k relevant spans; Context compression: summarize low-signal; Hierarchical memory: project · build · agent] → Long-context model (≤1.2M ctx, paged KV-cache) → Firewall (audit gate) → Output (audited).
Fig. 2. A request is routed by tier and difficulty, context is constructed from the repository map, retrieval, compression, and hierarchical memory, then served by the long-context model with paged KV-cache. Output is audited by the Model Firewall before any action.
KV-cache reuse: 88%
Route accuracy: 94%
Context compression: 76%

Serving objectives the stack is engineered around. Illustrative targets, not measured results.
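The paged KV-cache in Fig. 2 can be illustrated through its bookkeeping alone. The class below is a hypothetical toy modelled on the block-table idea popularized by vLLM's PagedAttention; it is not Nullsec's serving code, and the page and pool sizes are arbitrary. Only page ownership is tracked, not the key and value tensors themselves.

```python
class PagedKVCache:
    """Toy block table: each sequence owns a list of fixed-size physical pages.

    Pages come from a shared pool and are committed one at a time as a
    sequence grows, instead of reserving one contiguous max-length buffer.
    """

    def __init__(self, num_pages: int, page_size: int) -> None:
        self.page_size = page_size
        self.free = list(range(num_pages))     # pool of physical page ids
        self.tables: dict[str, list[int]] = {}  # seq id -> owned page ids
        self.lengths: dict[str, int] = {}       # seq id -> tokens stored

    def append(self, seq: str, n_tokens: int = 1) -> None:
        """Record n_tokens more KV entries for seq, allocating pages on demand."""
        length = self.lengths.get(seq, 0) + n_tokens
        table = self.tables.get(seq, [])
        extra = -(-length // self.page_size) - len(table)  # ceil(length / page_size)
        if extra > len(self.free):
            raise MemoryError("KV page pool exhausted; evict or offload a sequence")
        for _ in range(max(extra, 0)):
            table.append(self.free.pop())
        self.tables[seq] = table
        self.lengths[seq] = length

    def release(self, seq: str) -> None:
        """Return all of seq's pages to the shared pool."""
        self.free.extend(self.tables.pop(seq, []))
        self.lengths.pop(seq, None)


if __name__ == "__main__":
    cache = PagedKVCache(num_pages=8, page_size=16)
    cache.append("req-1", 40)
    print(cache.tables["req-1"], len(cache.free))
```

Because a sequence holds page ids rather than a contiguous buffer, a very long request commits memory only as it grows, and pages from finished requests are immediately reusable.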

§06/Security-aware reasoning

Trained on how software actually breaks.

Software does not break in clean benchmark environments. It breaks where code, infrastructure, dependencies, and security assumptions interact. The model layer is trained against real failure data and designed to flag risk classes directly.

Exposed secrets
Broken authentication
Missing rate limits
Unsafe API routes
Dangerous file writes
Command injection
Prompt injection
Permission misuse
Wallet action risk
Dependency vulnerabilities
Insecure database access
Environment exposure

Risk classes the model layer is designed to detect — feeding Nullsec S1, Nullsec Guard, and the Model Firewall.

§07/Nullsec S1 · Security benchmark

Measured against real security cases.

We evaluate the security layer on a held-out benchmark of 111 real vulnerability cases — not generic leaderboards. Nullsec-1, the fine-tuned adapter behind Nullsec S1, is compared against its base model, a static-analysis baseline, and frontier API models.

F1: 0.9245 (highest of the systems evaluated)
Precision: 0.9423
Recall: 0.9074
False-safe rate: 0.00
Security benchmark, 111 cases (Fig. 4): precision, recall, and F1 for Nullsec-1, Qwen2.5, Semgrep, Claude, and Codex.
Fig. 4. Precision, recall, and F1 across systems. Nullsec-1 leads on all three, combining the precision of static analysis with recall beyond frontier API models. Higher is better.
| System / tool | Cases | Analyzable | Precision | Recall | F1 | False-safe ↓ | Hallucination ↓ |
|---|---|---|---|---|---|---|---|
| Nullsec-1 (Nullsec S1 adapter) | 111 | 110/111 | 0.9423 | 0.9074 | 0.9245 | 0.0000 | 0.0667 |
| Qwen2.5-Coder-7B-Instruct (base, no Nullsec adapter) | 111 | 4/111 | 0.3333 | 0.0093 | 0.0180 | 0.0000 | 0.5000 |
| Semgrep (local rules baseline) | 111 | 111/111 | 0.8627 | 0.4074 | 0.5535 | 0.5625 | 0.3333 |
| Claude API (claude-opus-4-8) | 111 | 68/111 | 0.8889 | 0.5185 | 0.6550 | 0.0000 | 0.1429 |
| OpenAI / Codex API (gpt-5.3-codex) | 111 | 105/111 | 0.6169 | 0.8796 | 0.7252 | 0.0000 | 0.6000 |

Higher is better for precision, recall, and F1; lower is better for false-safe and hallucination rates. Analyzable counts the responses that returned valid, parseable findings. Internal evaluation across 111 held-out security cases.
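F1 is the harmonic mean of precision and recall, so the F1 column can be re-derived from the other two. The check below recomputes it for each row of the table. Differences of 0.0001 in a couple of rows come from the inputs already being rounded to four decimals.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as listed in the benchmark table.
REPORTED = {
    "Nullsec-1": (0.9423, 0.9074, 0.9245),
    "Qwen2.5-Coder-7B-Instruct": (0.3333, 0.0093, 0.0180),
    "Semgrep": (0.8627, 0.4074, 0.5535),
    "Claude API": (0.8889, 0.5185, 0.6550),
    "OpenAI / Codex API": (0.6169, 0.8796, 0.7252),
}

if __name__ == "__main__":
    for name, (p, r, reported) in REPORTED.items():
        print(f"{name:27} F1 = {f1(p, r):.4f} (reported {reported:.4f})")
```

The harmonic mean is pulled toward the smaller of the two values. This is why Semgrep's precision of 0.8627 still yields an F1 of only 0.5535 when its recall is 0.4074.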

§08/The Model Firewall

A control plane between agents and the world.

The future of software is agentic: agents write code, run commands, deploy, call APIs, and touch wallets. The Model Firewall evaluates each proposed action with full system context and decides whether it is safe, expensive, risky, or malicious — before it executes.

Model Firewall: Cloudflare for AI agent actions (Fig. 3). Autonomous agent (shell exec, file write, API call, wallet tx, deploy, db query, MCP tool) → Model Firewall (inspect · score · gate; full-context risk eval; allow / hold / deny; approval routed to mobile) → The real world (terminal · fs; chain · wallets; cloud · deploy; databases · APIs).
Fig. 3. Proposed agent actions are inspected, scored with full-context risk evaluation, and gated allow / hold / deny. High-risk actions route to mobile for human approval. A firewall between AI agents and the systems they act on.
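Here is a sketch of the final gating step only. It assumes the hard part, the full-context risk score, has already been computed. The action kinds, thresholds, and cost cap are hypothetical. The sketch maps the four judgments named above (safe, expensive, risky, malicious) onto the allow / hold / deny gate in Fig. 3.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    HOLD = "hold"   # pause and route to a human, e.g. mobile approval
    DENY = "deny"


@dataclass
class ProposedAction:
    kind: str              # hypothetical labels: "shell_exec", "wallet_tx", "deploy", ...
    target: str
    risk: float            # 0..1 from a full-context risk model (assumed given)
    cost_usd: float = 0.0  # estimated spend if the action runs


# Hypothetical policy: action kinds that always need a human in the loop.
ALWAYS_HOLD = {"wallet_tx", "deploy"}


def gate(action: ProposedAction, hold_at: float = 0.4, deny_at: float = 0.8,
         max_auto_cost: float = 50.0) -> Verdict:
    """Deny likely-malicious actions; hold risky or expensive ones; allow the rest."""
    if action.risk >= deny_at:
        return Verdict.DENY
    if (action.risk >= hold_at or action.kind in ALWAYS_HOLD
            or action.cost_usd > max_auto_cost):
        return Verdict.HOLD
    return Verdict.ALLOW


if __name__ == "__main__":
    print(gate(ProposedAction("wallet_tx", "treasury", risk=0.1)).value)
```

The deny check runs first, so a high-risk wallet transaction is blocked outright rather than sent for approval.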
§09/Evaluation

We do not judge by generic benchmarks.

The model layer is evaluated against the tasks that matter to the ecosystem. A 1.2M window is not the goal — understanding the full environment around software is.

| # | Capability under evaluation | Domain |
|---|---|---|
| 01 | Understand an entire repository | Software |
| 02 | Locate a security issue across multiple files | Security |
| 03 | Connect a failed deploy to the right env variable | Infra |
| 04 | Decide whether an agent action is safe | Agents |
| 05 | Review AI-generated software for real risks | Security |
| 06 | Improve NullsecBot build success rate | Software |
| 07 | Reduce repeated build failures | Software |
| 08 | Reason across contracts, wallets, APIs, and deploys | Web3 |

On status. The 1.2M-token model layer, internal inference, and the Model Firewall are roadmap targets and technical development objectives. Equations describe the methods we build on; figures are illustrative and do not represent measured results or a shipped capability today.