Discussion: Defense-in-Depth — Emergency Controls & Transfer-Layer Safety

Pre-SLURP discussion document — intended to open a thread on talk.stake.link. If the community converges on scope, this can be formalized as a SLURP in a follow-up.

Summary

This is a discussion document, not a SLURP. It is meant to gauge community appetite before anyone invests effort drafting a formal proposal. It sketches a protocol-level emergency-response module that would close the gap between exploit detection and on-chain response: a new EmergencyController contract that consolidates (a) a transfer-layer hook on stLINK and SDL and (b) scoped pause logic for critical user paths, activated by a whitelisted automated monitoring address (Hypernative) rather than a manual multisig. A secondary topic, rate-limiting on large withdrawal flows, is covered briefly in Appendix A but is not the focus of this discussion.

If the thread converges on scope, activator model, and governance sunset parameters, the natural next step is a formal SLURP that the DAO governance multisig can execute through the GovernanceTimelock once ratified.

Motivation

The recent Kelp DAO incident — a cross-chain / off-chain verifier compromise where funds were drained faster than governance could respond — demonstrates that smart-contract correctness is only half of protocol safety. The other half is what the protocol can do in the minutes immediately following breach detection. Kelp’s emergency multisig froze contracts roughly 46 minutes after detection and meaningfully reduced losses; the lesson we draw is that even multisig-mediated response is often too slow, and that automated detection wired to on-chain response is strictly better wherever trust assumptions permit.

The current stakedotlink/contracts codebase has strong deposit/withdrawal logic, but the incident-response surface is fragmented:

· PriorityPool inherits PausableUpgradeable, but that mechanism is used operationally for merkle distribution windows — it is not an emergency pause. Re-using it for exploit response would overload its semantics and interfere with distributions.

· The existing emergency-pause path is RebaseController.pausePool() via emergencyPauser, which flips poolStatus to CLOSED. Hypernative already holds a 1/1 authorized address on that path and can react within seconds of detection. This is the right model — but it is scoped to slashing-adjacent state and does not cover the stLINK / SDL transfer layer.

· StakingPool, WithdrawalPool, SecurityPool (outside slashing), SDLPool, and the stLINK / SDL ERC-677 tokens have no transfer hook and no way to halt laundering of stolen liquid tokens during an active incident.

· Access control is OwnableUpgradeable singletons with owner = GovernanceTimelock (0xb72d8F5213b3E52FAf13Aa074b03C4788e78349F). The proposer and executor is the DAO governance multisig (0xB351EC0FEaF4B99FdFD36b484d9EC90D0422493D), with the Governance Council ratifying proposals and reSDL holders voting via Snapshot. Every privileged action must sit out the full timelock delay before execution, which is correct for unpausing but too slow for activation.

Design Choice: Automated Activation over Manual Multisig

An earlier draft of this document proposed a 2-of-N Guardian multisig as the emergency pauser. On reflection, a manual multisig is the wrong primitive for the activation side of incident response:

· Exploits drain funds in minutes. Human signer coordination takes longer than that in the worst case.

· Hypernative already monitors the protocol and already holds an authorized pauser address for the existing poolStatus emergency path. Extending that same model to the new module re-uses trust assumptions the protocol has already accepted.

· Pausing is cheap and reversible; unpausing is the privileged operation. The correct asymmetry is: a fast, automated activator can pause, but only the DAO via timelock can unpause or unfreeze.

Accordingly, the proposal below is structured around a single new contract that consolidates the transfer hook and pause logic, with activation whitelisted to Hypernative’s monitoring address (and, optionally, a small backup set) rather than gated on a manual multisig.

Proposed Module (for Discussion)

The spec below is a starting point — the exact gating surfaces, activator set, and sunset parameters are open for debate in the forum thread.

EmergencyController — Transfer-Layer Hook + Scoped Pauser

Goal: a single new contract, owned by the GovernanceTimelock, that (a) provides a transfer-layer hook for stLINK and SDL, (b) exposes scoped pause flags for critical user paths, and (c) is activated by a whitelist of authorized addresses — primarily Hypernative’s automated monitor — while deactivation remains with governance.

Activation surface:

· transfersPaused (bool) — halts stLINK / SDL transfers globally during an active incident. Whitelisted activator flips true; only GovernanceTimelock flips false.

· frozen(address) (mapping) — freezes individual known-exploiter addresses. Whitelisted activator can freeze; only GovernanceTimelock can unfreeze. Hardcoded simultaneous-freeze cap (e.g. 16) to prevent role abuse.

· criticalPathsPaused (bitmap or per-contract flag): scoped pause for WithdrawalPool.queueWithdrawal, WithdrawalPool.withdraw(ids, batchIds), WithdrawalPool.unqueueTokens, and SDLPool entry/exit paths. It intentionally does NOT reuse PriorityPool’s existing Pausable for deposit/withdraw, because that mechanism is reserved for merkle distribution windows and should not be overloaded. Instead, a parallel guard that consults EmergencyController is added at the PriorityPool call-site.
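To make the activation surface concrete, here is a minimal Python model of the three pieces of state. It is a sketch for discussion, not Solidity and not the proposed implementation. The names `CriticalPath`, `EmergencyState`, and `MAX_FROZEN` are hypothetical, the bit assignments are arbitrary, and the cap of 16 is only the example figure from the list above.

```python
from enum import IntFlag


class CriticalPath(IntFlag):
    """Hypothetical bit assignments for criticalPathsPaused."""
    QUEUE_WITHDRAWAL = 1 << 0  # WithdrawalPool.queueWithdrawal
    WITHDRAW = 1 << 1          # WithdrawalPool.withdraw(ids, batchIds)
    UNQUEUE_TOKENS = 1 << 2    # WithdrawalPool.unqueueTokens
    SDL_POOL = 1 << 3          # SDLPool entry/exit paths


MAX_FROZEN = 16  # example cap from the text; the real value is open for debate


class EmergencyState:
    def __init__(self):
        self.transfers_paused = False
        self.frozen = set()
        self.critical_paths_paused = CriticalPath(0)

    def freeze(self, account):
        """Freeze one address, refusing to exceed the simultaneous-freeze cap."""
        if account in self.frozen:
            return
        if len(self.frozen) >= MAX_FROZEN:
            raise RuntimeError("freeze cap reached")
        self.frozen.add(account)

    def pause_paths(self, paths):
        """Set one or more path bits without touching the others."""
        self.critical_paths_paused |= paths

    def is_path_paused(self, path):
        return bool(self.critical_paths_paused & path)
```

A bitmap keeps each pause scoped: setting `CriticalPath.WITHDRAW` leaves `CriticalPath.QUEUE_WITHDRAWAL` open unless it is paused explicitly.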

Integration points:

· Override _transfer in StakingRewardsPoolBase.sol (currently has no hook) to call an internal _beforeTokenTransfer(from, to, amount) which consults EmergencyController.transfersPaused and EmergencyController.frozen(from) / EmergencyController.frozen(to).

· Add equivalent consults in SDL’s transfer path, and in the designated critical-path functions listed above. The check is a single external call into EmergencyController in the default case; budget target ≤ 2,500 extra gas per transfer in the no-op path.
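The integration described above can be modeled in a few lines of Python. This is an illustrative sketch only: `ControllerStub`, `Token`, and the combined `is_transfer_allowed` view are hypothetical names, and the real hook would live in Solidity inside StakingRewardsPoolBase and SDL.

```python
class TransferBlocked(Exception):
    pass


class ControllerStub:
    """Stand-in for EmergencyController; exposes only what the hook reads."""

    def __init__(self):
        self.transfers_paused = False
        self.frozen = set()

    def is_transfer_allowed(self, sender, recipient):
        # Hypothetical combined view: one call answers pause and freeze checks.
        if self.transfers_paused:
            return False
        return sender not in self.frozen and recipient not in self.frozen


class Token:
    def __init__(self, controller, balances):
        self.controller = controller
        self.balances = dict(balances)

    def _before_token_transfer(self, sender, recipient, amount):
        # Runs before any balance change, so a blocked transfer has no effect.
        if not self.controller.is_transfer_allowed(sender, recipient):
            raise TransferBlocked("transfers paused or address frozen")

    def transfer(self, sender, recipient, amount):
        self._before_token_transfer(sender, recipient, amount)
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount
```

Folding the pause flag and both freeze lookups into one view function is one way to keep the token-side consult to a single external call; whether that fits the gas budget would still need to be benchmarked.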

Activator whitelist:

· Primary: Hypernative’s existing authorized monitoring address (the same 1/1 address already authorized for RebaseController.pausePool()).

· Optional backup: a small additional set (e.g. 1–2 Council-held addresses) strictly as a failover in case Hypernative is unreachable. Backup addresses have identical powers — activate only, never deactivate.

· Whitelist is mutable by GovernanceTimelock so the DAO can add, rotate, or remove activator addresses through the standard SLURP path.

Scope boundaries:

· No generic blocklist / sanctions-list integration. The freeze list is scoped to “halt a known exploiter address within the incident window” — not compliance gating.

· Governance sunset: any transfer pause or individual freeze auto-expires after 14 days unless extended through the full SLURP path (reSDL Snapshot vote + Council ratification, executed by the DAO multisig via the timelock). Prevents indefinite activator-only state.

· The existing RebaseController.pausePool() / emergencyPauser path is unchanged. This module is additive.
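The governance sunset above can be modeled as a flag that stores an expiry timestamp instead of a boolean. The Python below is a sketch under assumptions: `SunsetFlag` and its method names are hypothetical, timestamps are plain integers in seconds, and caller checks (activator versus timelock) are left out to keep the expiry logic visible.

```python
SUNSET_SECONDS = 14 * 24 * 60 * 60  # the 14-day window suggested in the text


class SunsetFlag:
    def __init__(self):
        self.expires_at = 0  # 0 means the flag has never been armed

    def is_active(self, now):
        # The flag lapses on its own; no transaction is needed to expire it.
        return now < self.expires_at

    def activate(self, now):
        """Activator path: arm the flag; a no-op while it is already active."""
        if self.is_active(now):
            return
        self.expires_at = now + SUNSET_SECONDS

    def extend(self, now, extra_seconds):
        """Governance path: push the expiry out after a ratified extension."""
        if not self.is_active(now):
            raise RuntimeError("nothing to extend")
        self.expires_at += extra_seconds

    def clear(self):
        """Governance path: unpause or unfreeze before the sunset."""
        self.expires_at = 0
```

In this model an activator cannot refresh the window while the flag is active, but could re-arm it after expiry. Whether re-arming should be allowed, rate-limited, or blocked is a design question the sketch does not settle.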

Role & Trust Assumptions

· Activator (Hypernative monitoring address, plus optional backup): can flip transfersPaused=true, freeze individual addresses (capped), and set criticalPathsPaused. Cannot unpause, unfreeze, upgrade, or modify the whitelist.

· GovernanceTimelock (0xb72d…349F), executing transactions proposed by the DAO governance multisig (0xB351…493D) after the standard authorization flow (forum → reSDL Snapshot vote → Council ratifies → multisig executes): unpause, unfreeze, upgrade implementations, rotate activator whitelist, tighten or loosen any parameter.

· emergencyPauser (existing): unchanged. Retained for the existing RebaseController slashing flow; not expanded by this proposal.
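The role split above comes down to two guards: one for setting emergency state and one for clearing it or changing who may set it. A minimal Python model, with hypothetical names (`EmergencyControllerModel`, `Unauthorized`) and string placeholders standing in for addresses:

```python
class Unauthorized(Exception):
    pass


class EmergencyControllerModel:
    def __init__(self, timelock, activators):
        self.timelock = timelock
        self.activators = set(activators)
        self.transfers_paused = False

    def _only_activator(self, caller):
        if caller not in self.activators:
            raise Unauthorized("caller is not a whitelisted activator")

    def _only_timelock(self, caller):
        if caller != self.timelock:
            raise Unauthorized("caller is not the GovernanceTimelock")

    def pause_transfers(self, caller):
        # Fast path: any whitelisted activator can pause.
        self._only_activator(caller)
        self.transfers_paused = True

    def unpause_transfers(self, caller):
        # Slow path: only governance, through the timelock, can unpause.
        self._only_timelock(caller)
        self.transfers_paused = False

    def set_activator(self, caller, account, allowed):
        # Whitelist changes are governance-only, so an activator cannot
        # add or remove activators, including itself.
        self._only_timelock(caller)
        if allowed:
            self.activators.add(account)
        else:
            self.activators.discard(account)
```

The sketch assumes the timelock itself is not an activator; whether governance should also be able to pause directly is left open here.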

Rationale & Design Choices

· Automated activation, governance-gated deactivation. Asymmetric trust — a false positive from the activator is a temporary pause (recoverable in hours via timelock); a true positive saves funds in the minutes that matter most.

· Single consolidated contract. Keeping the token hook and pause logic in one place (EmergencyController) gives Hypernative (and any future activator) one integration target, simplifies audit scope, and means the transfer-hook consult is a single external call.

· PriorityPool’s existing PausableUpgradeable is left alone. It is used for merkle distribution windows; overloading it for emergency response would produce operational ambiguity. Emergency-path gating for PriorityPool is achieved by adding an explicit EmergencyController consult, not by re-using Pausable.

· Freeze list is capped and time-boxed. Prevents the activator role from becoming a de-facto compliance blocklist. Any lasting freeze must be ratified through the standard SLURP path.

· No OFAC / sanctions integration. Keeps the protocol neutral. The freeze mechanism is scoped to active-incident response.

Backwards Compatibility

All additions are non-breaking at the ERC-20 / ERC-677 interface level — transfers, deposits, and withdrawals retain their signatures and revert semantics. Users and integrators see no API change in the default (unpaused, non-frozen) state. PriorityPool’s existing Pausable semantics — used for merkle distributions — are preserved unchanged. Upgrade is performed via the existing UUPS pattern under GovernanceTimelock authorization.

Security Considerations

· Hypernative monitor key compromise: worst case is a denial-of-service pause or a capped set of false-positive freezes. DAO unpauses / unfreezes via timelock. Signer rotation and Hypernative SLA should be documented on the forum.

· Hypernative monitor unreachable during incident: mitigated by the optional Council-held backup activator set.

· Timelock compromise: out of scope for this proposal — pre-existing trust assumption.

· Freeze-list abuse: mitigated by the hardcoded capacity limit, 14-day sunset, and timelock-only unfreeze.

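· Transfer hook gas cost: must remain cheap in the no-op path, with a benchmark target of ≤ 2,500 extra gas per transfer in the default case (one external call plus a branch). Under EIP-2929 a cold external call alone costs 2,600 gas, so the target is only reachable when EmergencyController is already warm in the transaction, and it should be validated by the gas-regression tests.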

· Upgrade hook interaction: EmergencyController is UUPS-upgradeable under timelock, consistent with the rest of the codebase.

Implementation Plan

· Week 1: contract scaffolding — EmergencyController with activator whitelist, transfersPaused, frozen mapping, criticalPathsPaused; _beforeTokenTransfer hook wired into StakingRewardsPoolBase and SDL.

· Week 2: tests — unit, fuzz, and fork tests covering activate/deactivate flows, freeze/unfreeze, sunset expiry, cap enforcement, gas regression, and integration against the existing Hypernative monitor.

· Week 3: audit engagement — one pass from a recognized firm (Sigma Prime / Trail of Bits / ChainSecurity tier).

· Week 4: Council review of activator whitelist (Hypernative address + optional backup set), deployment proposal, reSDL Snapshot vote, timelock queue, execution.

Indicative Cost (If Formalized)

· Engineering: 2–3 weeks of core-team time.

· Audit: one pass from a recognized firm (Sigma Prime / Trail of Bits / ChainSecurity tier).

· Bug bounty uplift: recommend raising Immunefi ceiling in line with added attack surface — would go in a separate SLURP.

Open Questions for the Forum

· Is there consensus that automated activation (Hypernative) is preferable to a manual Guardian multisig for the activation side of incident response?

· Should the activator whitelist be Hypernative-only, or include a small Council-held backup set as failover? If backup: how many addresses and who holds them?

· Is the 14-day auto-expiry on pauses and freezes the right window? Too short, too long?

· Should the freeze cap (suggested: 16 simultaneous addresses) be higher, lower, or parameterized by governance?

· Agreement that PriorityPool’s existing Pausable should be left alone (reserved for merkle distributions) and emergency gating added via an explicit EmergencyController consult?

· Does the community want the rate-limiting appendix developed further, or is it off the table entirely?

· Are there other emergency controls — not covered here — that should be part of the same module?

Appendix A — Optional Rate-Limiting (Not in Scope)

Included for transparency; not part of this proposal’s authorization request. A future proposal could add a rolling 24h withdrawal cap on PriorityPool and WithdrawalPool, expressed as min(absoluteCap, percentOfTVL), plus per-address cooldowns on large withdrawals. The UX cost on large depositors is non-trivial and warrants a dedicated discussion thread. The EmergencyController module addresses the exploit-response window more directly and should ship first.
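For readers who want to see what the rolling cap would mean in practice, here is a small Python sketch of the min(absoluteCap, percentOfTVL) rule over a 24-hour window. Everything in it is an assumption for illustration: the class name `RollingWithdrawalCap`, the use of basis points for the TVL percentage, and the choice to pass TVL in at call time. Per-address cooldowns are not modeled.

```python
from collections import deque

WINDOW_SECONDS = 24 * 60 * 60  # rolling 24h window


class RollingWithdrawalCap:
    def __init__(self, absolute_cap, tvl_bps):
        self.absolute_cap = absolute_cap
        self.tvl_bps = tvl_bps    # percentOfTVL in basis points (100 = 1%)
        self._events = deque()    # (timestamp, amount), oldest first
        self._used = 0            # total withdrawn inside the window

    def _evict(self, now):
        # Drop withdrawals that have aged out of the window.
        while self._events and self._events[0][0] <= now - WINDOW_SECONDS:
            _, amount = self._events.popleft()
            self._used -= amount

    def current_cap(self, tvl):
        return min(self.absolute_cap, tvl * self.tvl_bps // 10_000)

    def try_withdraw(self, now, amount, tvl):
        """Record the withdrawal and return True if it fits under the cap."""
        self._evict(now)
        if self._used + amount > self.current_cap(tvl):
            return False
        self._events.append((now, amount))
        self._used += amount
        return True
```

With an absolute cap of 1,000 and a 10% TVL limit on a TVL of 5,000, the effective cap is 500 per window, so a 300 withdrawal followed by another 300 within the same day would have the second one refused.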

Appendix B — Referenced Contracts

· Governance multisig (Gnosis Safe): 0xB351EC0FEaF4B99FdFD36b484d9EC90D0422493D

· GovernanceTimelock: 0xb72d8F5213b3E52FAf13Aa074b03C4788e78349F

· contracts/core/priorityPool/PriorityPool.sol (Pausable reserved for merkle distributions)

· contracts/core/priorityPool/WithdrawalPool.sol

· contracts/core/StakingPool.sol

· contracts/core/SecurityPool.sol

· contracts/core/RebaseController.sol (existing emergencyPauser / Hypernative path — unchanged)

· contracts/core/sdlPool/SDLPool.sol

· contracts/core/base/StakingRewardsPoolBase.sol (target for _beforeTokenTransfer hook)

· contracts/governance/GovernanceTimelock.sol


This is a highly in-depth overview. Generally, I can’t comment as I am not a subject matter expert on all this.

The only “meta” counterpoint I may suggest is to ensure that vulnerabilities are not added as a result of adding attempted security features.

Basically, we don’t want to add features that may be unknowingly exploited as a result of trying to secure the system.

Thanks for looking into this, @Asymmetric. As a non-technical person, I’ll be honest that plenty of this is too technical for me, and I need some clarification. I understand the Kelp exploit was the trigger to tighten security, but it isn’t being used as a 1:1 reference, since this is not about poisoned RPCs or state actors infiltrating the multisig, right? Could you spell out, in plain language, the specific threats this module is meant to mitigate inside stake.link? An OperatorVCS strategy bug, a corrupted merkle root in PriorityPool, a stolen-stLINK laundering window, something else? I think the SLURP would land much harder if the first paragraph said “this defends against X, Y, Z” and the design fell out of that.

wstLINK is not referenced here at all. Considering wstLINK is being used in DeFi now (Folks Finance, Morpho) and can be bridged via CCIP to multiple chains, it should be addressed as well. Do SDL and wstLINK already have CCIP rate limits configured? If yes, can the doc cite the live values?

What happens in the case of a compromised Hypernative key, if they are the sole activator? What is the trade-off here? A malicious flip would halt the protocol for the full timelock-gated unfreeze window, and the downstream effects on Curve, Morpho, and the integrators could be significant. Two asks:

Could the proposal add a second independent guardian alongside Hypernative, so the activator role is not single-vendor?

Could the SLURP commit to publishing Hypernative’s documented SLA, mean time to flip, false-positive rate from prior deployments, and a written contingency if Hypernative ever disappears as a vendor (acquisition, contract lapse, key team turnover)? Any info on that front, in a simplified form, will help us understand the risks and trade-offs.

In the current draft, withdrawals get scoped-paused but deposits stay open. That feels backwards. If a user cannot withdraw, they should not be able to deposit either; otherwise someone deposits fresh funds mid-incident thinking the protocol is fighting back, and gets stuck. Can the next draft make the pause symmetric on both sides?

No need to answer every point in detail. I just want them on the record, and acknowledged by more technical individuals, before this goes to a vote. Could the next iteration open with a one-paragraph plain-English summary, written for non-technical DAO members, covering what this defends against, what the worst case is if it goes wrong, what it costs to operate, and what the alternative was? The technical detail is good, but a vote-ready SLURP also needs framing for those of us who are not deep in the contracts.

Thanks @candide and @Tokenized2027, genuinely appreciate both of you engaging on a doc this dense. A pre-SLURP thread only works if people are willing to push back early, so this is exactly the input we were hoping to draw out before any Solidity is written.

@candide, on the meta-point: every emergency control is itself an attack surface. The right way to judge whether any individual control earns its place is roughly to compare expected loss prevented (probability of an incident in its scope, multiplied by severity if it hits) against expected loss introduced (probability of misuse or key compromise, multiplied by the blast radius of a wrongful activation). A control that doesn’t clear that bar shouldn’t ship; we believe the measures discussed here do clear it.
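As a rough illustration of that comparison, the arithmetic can be written out in a few lines of Python. The function names are hypothetical and the inputs in the usage lines are made-up placeholders chosen only to show the calculation; they are not estimates for stake.link.

```python
def expected_loss(probability, magnitude):
    """Probability of the event multiplied by the loss if it happens."""
    return probability * magnitude


def control_earns_its_place(p_incident, loss_prevented, p_misuse, blast_radius):
    """True when expected loss prevented exceeds expected loss introduced."""
    prevented = expected_loss(p_incident, loss_prevented)
    introduced = expected_loss(p_misuse, blast_radius)
    return prevented > introduced


# Placeholder inputs, purely illustrative:
# a 2% chance of an in-scope incident that the control would cut by 10,000,000,
# against a 1% chance of a wrongful activation costing 500,000.
ships = control_earns_its_place(0.02, 10_000_000, 0.01, 500_000)
```

The hard part in practice is agreeing on the four inputs, which is where forum input on blast radius for integrators is most useful.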

@Tokenized2027, taking your points in order:

  1. Plain-English framing: At the highest level, what this SLURP fixes is the gap between detection and response. Today, if an exploit is detected, whether that’s stolen liquid tokens being laundered, an active drain of withdrawal flows, or a known-exploiter address that needs to be stopped, the only on-chain response path runs through human signer coordination, which in the worst case takes longer than the drain itself. The Kelp incident is the trigger for the conversation, not a 1:1 threat model. Stake.link’s attack surfaces are different, but the shape of the problem (minutes-scale exploit, slower-than-that human response) is the same. The module gives the protocol a fast, narrowly-scoped way to halt transfers and critical paths while governance catches up, with activation that can run at machine speed and deactivation that stays firmly with the DAO.

  2. wstLINK and SDL: To be clear, both are in scope. wstLINK and SDL are covered by the same transfer-layer hook as stLINK, and any pause flag applies symmetrically across all three tokens. wstLINK’s CCIP path is the highest-leverage laundering route given Folks and Morpho integrations and multi-chain bridging, so it’s a first-class concern in the design, not an afterthought. The SLURP will cite the live CCIP per-lane rate limits from the Chainlink directory and note where stake.link’s own controls sit relative to them.

  3. Single-vendor activator: On reflection, we think single-vendor activation is actually the right call here, and want to defend that rather than reverse it. The whole point of the module is to compress response time below what human coordination can achieve. Adding a second independent guardian by default reintroduces the coordination latency we’re trying to eliminate. The asymmetry that makes single-vendor activation acceptable is that activation is cheap and reversible (a wrongful pause is undone by the DAO via timelock), while deactivation, the privileged operation, stays with governance. That said, a lightweight backup makes sense for the case where Hypernative is unreachable rather than compromised, and a Harris and Trotter or LinkPool multisig (2 signers) is a reasonable candidate for that role. The backup activator would have identical activate-only powers and could never deactivate. On the compromise scenario you raised: the honest worst case if Hypernative’s key is taken and there’s no backup is a malicious transfersPaused=true that holds for the full timelock-gated unfreeze window, during which stLINK is non-transferable, which would break Curve LP exits, Morpho liquidations, and integrator transfer-liveness assumptions. That’s the cost the DAO is accepting in exchange for response speed, and it should be priced honestly rather than waved away.

  4. Pause symmetry: On reflection, deposits shouldn’t pause by default. The threats this module addresses are all on the exit side (transfers, withdrawals, laundering of stolen tokens), and pausing deposits doesn’t defend against any of them. New deposits add TVL, they don’t help an attacker extract it. The right principle is that pauses should be scoped to the vector actually under attack, which is what the v1 bitmap design supports: withdrawals pause when withdrawals are the issue, deposit accounting pauses only in the (rare) case where deposit logic is itself compromised. The legitimate concern you raised, a user depositing mid-incident and getting stuck, is a signaling problem, not a protocol-pause problem. A clear incident banner at the frontend is the right fix, not blocking a path that isn’t part of the exploit.

The piece we’d most like forum input on before formalizing: whether the room agrees that single-vendor activation (Hypernative primary, with a Harris and Trotter-style backup) is the correct trade-off for the response side, or whether anyone sees a second-independent-guardian model that doesn’t reintroduce coordination latency.

Thanks again to both of you for the time.


Apologies for the belated reply. I wanted to think this through properly because I believe this is an important discussion.

I’m very positive and appreciate the discussion. I believe the earlier we start, the better. Right now, the consequences of a false positive, even something as serious as a temporary global transfer pause, would likely be much more contained than they could be in the future.

If stLINK/wstLINK becomes more deeply integrated across lending markets, LP venues, CCIP lanes, and other DeFi/institutional contexts, the same design decisions will carry much more weight.

In my view, this is probably the right moment to start defining the emergency framework while the asset footprint is still relatively contained.

That said, I think the eventual SLURP should aim to be clear about scope and activation criteria. I would like to better understand what Hypernative would actually monitor, what type of alerts could trigger an emergency pause, and whether different severity levels could map to different responses. For example, there is an important difference between a stake.link contract issue, stolen stLINK/wstLINK actively being moved or laundered, and an external integration having an isolated problem where stake.link assets are not directly part of the exploit path.

I do not think we need to solve all of that in this discussion, but making those boundaries legible will matter for governance and for integrator confidence.

I also think the asymmetry of the proposed model makes sense. Looking at other protocols, this general guardian/pauser pattern is clearly not unusual, but I do not think stake.link can simply copy another protocol’s emergency framework. We are fortunate to have an ecosystem design that already provides strong foundations for trust, including a professional set of reputable Chainlink node operators. The emergency layer should build on that, and be tailored to stake.link’s own architecture and growth path.

I would see this as something that probably needs to evolve over time. The emergency framework that makes sense today may not be the same one that makes sense once stLINK/wstLINK has deeper liquidity, more collateral integrations, more active CCIP lanes, or larger institutional relevance. The first version should be conservative, legible, and easy to reason about, but we should assume it will need to be reviewed and upgraded as the asset’s footprint expands.

I also agree with @Tokenized2027’s point that more context on Hypernative’s SLA, monitoring coverage, expected response assumptions, contingency assumptions, and operating cost would help the DAO reason about the trade-off.

The public context around Hypernative is very reassuring. It appears to be a widely used provider, there are public examples of monitoring and automated response integrations, and the broader model of fast detection plus bounded emergency action seems increasingly common. Its publicly reported track record for timely detection looks strong: it is credited with protecting billions in funds, in many cases before the exploit took place.

In practice, the key questions are what would be monitored, which alerts could trigger automated action, how severity would be assessed, how the proposed backup activator would operate in Hypernative-unreachable scenarios, and what assumptions, costs, and responsibilities the DAO would be accepting.

I believe the CCIP side deserves careful treatment as well. Since wstLINK’s CCIP path is already considered a first-class concern, I think the eventual SLURP should spell out the relevant assumptions around live per-lane rate limits, in-flight messages, token pools, manual execution cases, and the gas overhead.

I also think the false-positive trade-off should remain very explicit. A mistaken or malicious pause may not steal funds, but it can still create real liveness problems for LP exits, liquidations, integrators, and user confidence. That does not make the design wrong. I think the trade-off is definitely worth it, but the DAO should be very clear about what risk it is accepting. Worth noting that false positives seem extremely rare, but they are real (Flare 2 days ago, Iconic).

Overall, I support pushing this forward. My main view is that the strongest version of this is stake.link having a bounded emergency-containment layer, with narrow activation powers, governance-controlled recovery, and clear limits.

I’d be very interested to hear how others think about the right scope here, especially around external integrations, Hypernative’s role, and how conservative the first version should be.