Long-context intelligence for software, security, and agents.
We are building toward a proprietary, Nullsec-native model layer with a long-term technical target of up to 1.2 million tokens of context. The objective is not a large input window. It is useful long-context reasoning over complete systems.
- Context target: ≤ 1,200,000 tokens
- Domain: software · security · agents
- Status: roadmap target, not shipped
Most software problems are not isolated.
A vulnerability in an API route may depend on an authentication helper in another directory, a middleware file, an environment variable, and a database permission model. A short-context system sees only the route. A long-context system is designed to see the full chain — and reason over the relationships between distant parts of a codebase.
The goal is not to support a large input window. The goal is useful long-context intelligence: the model must accurately locate, prioritize, reason over, and act on the most important information inside that context. That is what we optimize for.
Naive context scaling does not work.
Self-attention couples every token to every other token. Cost grows with the square of sequence length, and the key–value cache grows linearly until it — not the parameters — dominates memory.
In naive self-attention, compute and attention-matrix memory scale as O(n²) in sequence length n. Doubling the context quadruples the cost, which is the wall that makes brute-force scaling intractable.
KV-cache memory grows linearly in n. At n = 1.2M, the cache dominates the budget — motivating paged and quantized caching, context reuse, and tier-based access.
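To make the two scaling laws concrete, here is a minimal sketch of the arithmetic. The model shape (32 layers, 8 KV heads, head dimension 128, 16-bit values) is a hypothetical example chosen for illustration, not the shape of any Nullsec model.

```python
def attention_score_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by full self-attention: grows as n^2."""
    return n_tokens * n_tokens


def kv_cache_bytes(n_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Keys plus values stored for every layer, KV head and token: grows as n."""
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_value


# Hypothetical model shape, for illustration only.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

for n in (8_192, 131_072, 1_200_000):
    gib = kv_cache_bytes(n, LAYERS, KV_HEADS, HEAD_DIM) / 2**30
    pairs = attention_score_pairs(n)
    print(f"n={n:>9,}  kv_cache={gib:8.2f} GiB  score_pairs={pairs:.3e}")
```

Under these assumed dimensions the cache costs 128 KiB per token, so 8,192 tokens take 1 GiB and 1.2M tokens take roughly 146 GiB before any paging or quantization.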
Position methods that hold signal at distance.
Extending context begins with the positional encoding. We combine long-context fine-tuning with interpolation and RoPE base scaling so the model generalizes far beyond its pretraining length without losing local resolution.
Position interpolation rescales token positions so a model trained to length L_train stays in-distribution when served at L_target.
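A minimal sketch of linear position interpolation, assuming the common formulation in which a position m is mapped to m · L_train / L_target before the rotary angles are computed. The function name and the example lengths (32,768 and 1,200,000) are illustrative assumptions, not values from the roadmap.

```python
def interpolate_position(m: float, l_train: int, l_target: int) -> float:
    """Map a position in [0, l_target) into the trained range [0, l_train)."""
    if l_target <= l_train:
        return float(m)  # already inside the trained range, no rescaling
    return m * l_train / l_target


# Hypothetical lengths, for illustration only.
L_TRAIN, L_TARGET = 32_768, 1_200_000

last = interpolate_position(L_TARGET - 1, L_TRAIN, L_TARGET)
step = interpolate_position(1, L_TRAIN, L_TARGET)
print(f"last position maps to {last:.2f} (trained range ends at {L_TRAIN})")
print(f"adjacent tokens are {step:.4f} positions apart after rescaling")
```

The last position lands just inside the trained range, while neighbouring tokens end up a small fraction of a position apart. That compression is the loss of local resolution that the fine-tuning and base-scaling steps are meant to offset.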
NTK-aware extension rescales the RoPE base b for length scale s = L_target / L_train, stretching low frequencies while preserving high-frequency detail.
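The sketch below assumes one widely used form of NTK-aware scaling, b' = b · s^(d / (d − 2)) for head dimension d; the text above does not state which variant is intended, so treat the rule and the example values (b = 10,000, d = 128, s = 8) as assumptions. It checks the property described: the highest frequency is untouched and the lowest is stretched by a factor of s.

```python
import math


def rope_inv_freq(base: float, dim: int) -> list[float]:
    """Rotary inverse frequencies base^(-2i/dim) for i = 0 .. dim/2 - 1."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]


def ntk_scaled_base(base: float, dim: int, s: float) -> float:
    """Assumed NTK-aware rule: b' = b * s^(dim / (dim - 2))."""
    return base * s ** (dim / (dim - 2))


# Hypothetical values, for illustration only.
BASE, DIM, S = 10_000.0, 128, 8.0

orig = rope_inv_freq(BASE, DIM)
scaled = rope_inv_freq(ntk_scaled_base(BASE, DIM, S), DIM)

# Ratio of original to scaled frequency: 1 means unchanged, S means stretched by S.
high_ratio = orig[0] / scaled[0]
low_ratio = orig[-1] / scaled[-1]
print("highest frequency unchanged:", math.isclose(high_ratio, 1.0))
print("lowest frequency stretched by s:", math.isclose(low_ratio, S))
```

Intermediate frequencies are stretched by amounts between 1 and s, so nearby tokens keep their fine-grained rotation while distant positions stay within angles the model has seen.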
Not every token has equal value.
Even with a very large window, the window must be filled well. Retrieval and ranking select the spans that matter; compression folds down boilerplate and logs; hierarchical memory carries project, build, security, and agent state across a task.
Under a token budget B, the context assembler selects the span set S that maximizes total relevance subject to the budget — keeping the window dense with signal rather than full of text.
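The selection problem described here has the shape of a knapsack: choose S to maximize the sum of relevance scores r_i subject to the sum of token counts t_i staying at or below B. A minimal sketch follows, using a greedy relevance-per-token heuristic, which is an approximation and not necessarily the assembler's actual method. The `Span` type, file names, and scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    name: str
    tokens: int       # token cost t_i
    relevance: float  # relevance score r_i (hypothetical scale)


def select_spans(spans: list[Span], budget: int) -> list[Span]:
    """Greedy approximation: take spans by relevance per token until the budget is full."""
    candidates = [s for s in spans if s.tokens > 0]
    candidates.sort(key=lambda s: s.relevance / s.tokens, reverse=True)
    chosen, used = [], 0
    for span in candidates:
        if used + span.tokens <= budget:
            chosen.append(span)
            used += span.tokens
    return chosen


# Hypothetical candidates for a 1,000-token budget.
spans = [
    Span("auth_helper.py", 400, 0.9),
    Span("middleware.py", 300, 0.6),
    Span("build.log", 5_000, 0.5),
    Span("route.py", 200, 0.8),
]
selected = select_spans(spans, budget=1_000)
print([s.name for s in selected], sum(s.tokens for s in selected))
```

In this example the long log is skipped because it costs far more tokens than its relevance justifies, which is the case where compression would be applied instead of inclusion.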
Figure: Working-context allocation. Illustrative split of a long window (K tokens). Repository structure and retrieved evidence dominate; logs and memory are compressed to preserve budget.
Figure: Contribution to effective context. Relative impact on keeping long context useful (indexed). Illustrative weighting of the techniques that turn raw window size into usable reasoning.
The full pipeline is reserved for where it pays off.
Not every request needs 1.2M tokens. A router classifies each task by tier and difficulty: lightweight work goes to cost-efficient models, and the full long-context path is reserved for workflows where system-level understanding creates real value.
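A minimal sketch of such a router, assuming a task can be summarized by an estimated token count, the number of files it touches, and a difficulty score. The `Task` fields, tier names, and thresholds are all hypothetical; the text does not specify how the classifier works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    estimated_tokens: int  # rough size of the context the task needs
    files_touched: int     # how many files the task spans
    difficulty: float      # hypothetical score, 0.0 (trivial) to 1.0 (hard)


def route(task: Task, token_threshold: int = 32_000,
          difficulty_threshold: float = 0.7) -> str:
    """Send a task to the long-context path only when it needs a system-level view."""
    needs_system_view = (task.files_touched > 1
                         and task.difficulty >= difficulty_threshold)
    if task.estimated_tokens > token_threshold or needs_system_view:
        return "long_context"
    return "lightweight"


print(route(Task(estimated_tokens=2_000, files_touched=1, difficulty=0.2)))
print(route(Task(estimated_tokens=400_000, files_touched=120, difficulty=0.9)))
```

Keeping the default path cheap means the expensive tier is only paid for when the context size or cross-file difficulty justifies it.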
Serving objectives the stack is engineered around. Illustrative targets, not measured results.
Trained on how software actually breaks.
Software does not break in clean benchmark environments. It breaks where code, infrastructure, dependencies, and security assumptions interact. The model layer is trained against real failure data and designed to flag risk classes directly.
Risk classes the model layer is designed to detect — feeding Nullsec S1, Nullsec Guard, and the Model Firewall.
Measured against real security cases.
We evaluate the security layer on a held-out benchmark of 111 real vulnerability cases — not generic leaderboards. Nullsec-1, the fine-tuned adapter behind Nullsec S1, is compared against its base model, a static-analysis baseline, and frontier API models.
| System / tool | Cases | Analyzable | Precision | Recall | F1 | False-safe ↓ | Halluc. ↓ |
|---|---|---|---|---|---|---|---|
| ★ Nullsec-1 (Nullsec S1 adapter) | 111 | 110/111 | 0.9423 | 0.9074 | 0.9245 | 0.0000 | 0.0667 |
| Qwen2.5-Coder-7B-Instruct (base, no Nullsec adapter) | 111 | 4/111 | 0.3333 | 0.0093 | 0.0180 | 0.0000 | 0.5000 |
| Semgrep (local rules baseline) | 111 | 111/111 | 0.8627 | 0.4074 | 0.5535 | 0.5625 | 0.3333 |
| Claude API (claude-opus-4-8) | 111 | 68/111 | 0.8889 | 0.5185 | 0.6550 | 0.0000 | 0.1429 |
| OpenAI / Codex API (gpt-5.3-codex) | 111 | 105/111 | 0.6169 | 0.8796 | 0.7252 | 0.0000 | 0.6000 |
Higher is better for precision, recall, and F1; lower is better for false-safe and hallucination rate. Analyzable = responses that returned valid, parseable findings. Internal evaluation across 111 held-out security cases.
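F1 is the harmonic mean of precision and recall, so the F1 column can be recomputed from the other two. The sketch below does this with the values from the table; differences in the fourth decimal place are expected because the listed precision and recall are themselves rounded.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
REPORTED = {
    "Nullsec-1": (0.9423, 0.9074, 0.9245),
    "Qwen2.5-Coder-7B-Instruct": (0.3333, 0.0093, 0.0180),
    "Semgrep": (0.8627, 0.4074, 0.5535),
    "Claude API": (0.8889, 0.5185, 0.6550),
    "OpenAI / Codex API": (0.6169, 0.8796, 0.7252),
}

for name, (p, r, reported) in REPORTED.items():
    print(f"{name:<28} recomputed F1 = {f1_score(p, r):.4f}  reported = {reported:.4f}")
```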
A control plane between agents and the world.
The future of software is agentic: agents write code, run commands, deploy, call APIs, and touch wallets. The Model Firewall evaluates each proposed action with full system context and decides whether it is safe, expensive, risky, or malicious — before it executes.
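The control flow can be sketched as a gate that classifies each proposed action and only runs it when the verdict is safe. Everything in this sketch is hypothetical: the `Action` fields, the allow-list, the cost limit, and the toy rules stand in for the context-aware evaluation described above and are not the Model Firewall's actual logic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    SAFE = "safe"
    EXPENSIVE = "expensive"
    RISKY = "risky"
    MALICIOUS = "malicious"


@dataclass(frozen=True)
class Action:
    kind: str                       # e.g. "shell", "deploy", "api_call", "wallet_transfer"
    target: str                     # host, environment, or address the action touches
    estimated_cost_usd: float = 0.0


def evaluate(action: Action, allowed_targets: set[str],
             cost_limit_usd: float = 10.0) -> Verdict:
    """Toy rules standing in for a context-aware evaluation."""
    if action.target not in allowed_targets:
        # Moving funds to an unknown address is treated as the worst case.
        if action.kind == "wallet_transfer":
            return Verdict.MALICIOUS
        return Verdict.RISKY
    if action.estimated_cost_usd > cost_limit_usd:
        return Verdict.EXPENSIVE
    return Verdict.SAFE


def execute_if_allowed(action: Action, run: Callable[[Action], None],
                       allowed_targets: set[str]) -> Verdict:
    """Evaluate first; call `run` only when the verdict is SAFE."""
    verdict = evaluate(action, allowed_targets)
    if verdict is Verdict.SAFE:
        run(action)
    return verdict


executed: list[Action] = []
allowed = {"staging", "treasury"}
print(execute_if_allowed(Action("deploy", "staging", 1.0), executed.append, allowed))
print(execute_if_allowed(Action("wallet_transfer", "0xunknown"), executed.append, allowed))
```

The point of the design is ordering: the verdict is produced before any side effect, so a rejected action never reaches the executor.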
We do not judge by generic benchmarks.
The model layer is evaluated against the tasks that matter to the ecosystem. A 1.2M window is not the goal — understanding the full environment around software is.
| # | Capability under evaluation | Domain |
|---|---|---|
| 01 | Understand an entire repository | Software |
| 02 | Locate a security issue across multiple files | Security |
| 03 | Connect a failed deploy to the right env variable | Infra |
| 04 | Decide whether an agent action is safe | Agents |
| 05 | Review AI-generated software for real risks | Security |
| 06 | Improve NullsecBot build success rate | Software |
| 07 | Reduce repeated build failures | Software |
| 08 | Reason across contracts, wallets, APIs, and deploys | Web3 |
On status. The 1.2M-token model layer, internal inference, and the Model Firewall are roadmap targets and technical development objectives. Equations describe the methods we build on; figures are illustrative and do not represent measured results or a shipped capability today.
