Feature request: transplanting architectural state into OpenSPARC
Hi All,
This is a follow-up from our conversation at FCRC/ISCA last week.
I have a feature request for the OpenSPARC RTL, and potentially the FPGA model, which would help researchers experiment with real workloads on OpenSPARC:
I would like to see a process for transplanting checkpointed architectural state of a booted and running T1000/T2000 system (arch. registers, MMU, and memory values) into the OpenSPARC RTL model.
Ideally, this would happen over a socket so the transplant doesn't depend on a particular full-system simulator (e.g., we use Simics/Niagara at CMU, but I expect that many people would like to interface with SAM).
I attempted this on my own a few months ago, with limited success. I wrote Verilog PLI/socket code that (1) slams many of the architectural registers and the MMU state, and (2) replaces the interface in dimm.v with socket calls to a remote Simics instance (non-trivial code, since it has to reconstruct the physical address and ECC bits to work properly). Much of this actually worked. I was not, however, able to redirect the core's PC to a new location and properly initiate an i-cache miss.
Perhaps someone with a better understanding of the fetch/thread select stages knows the right combination of registers, valid bits, and fetch/miss queue entries to make this work...
Brainstorm:
An alternative approach (which avoids many, but not all, dependencies on the specific RTL revision and is arguably superior to my first approach) might be to write a hyper-privileged "context switch" ASM routine to load architectural state and get the processor bootstrapped at a new PC.
This program could be like the usual validation test program, except it loads registers/MMU state from the hijacked DRAM model. Once these values have been loaded into the core, the caches can be invalidated to remove these temporary values (trivial with PLI code), and the full-system memory image can be enabled in the DRAM model.
I can make my code available, if desired.
Thanks!
Jared
Hi Jared,
Thanks for following up on this.
There could be wider interest in this work, and we feel that it would make a very good OpenSPARC-based community project that can be hosted on opensparc.net.
If you are interested in contributing the code and taking it forward, then we can get it hosted and have the right people from Sun help you make it a successful one.
I will send you a separate email on this topic.
Thank You.
Hi Jared,
Regarding the "alternative" approach you outline, I'd be happy to share some ideas that we have previously worked on, for using an assembly routine to initialize register state.
A lot of the code is very much dependent on the RTL environment, so you need to understand how the existing assembly testcases are written, in particular how they specify context values, set privileged/hyperprivileged bits, provide trap handlers, etc.
Memory state can be most easily initialized by means of a data file containing ".word" directives, that gets assembled together with the initialization routine. The specification of addresses, contexts and other properties for memory pages depends on the RTL validation environment.
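As a rough illustration of the data-file idea, here is a minimal Python sketch that turns a raw memory image into ".word" lines. The function name, the comment line, and the four-words-per-line layout are my own assumptions for illustration; how the address, context, and page attributes get attached to the data is left out, because that part depends on the RTL validation environment.

```python
import struct


def words_from_image(data: bytes, base_addr: int, per_line: int = 4) -> str:
    """Render a raw memory image as SPARC assembler '.word' directives.

    Hypothetical helper: SPARC is big-endian, so each 32-bit word is
    unpacked big-endian. The image is zero-padded to a 4-byte multiple.
    """
    if len(data) % 4:
        data += b"\x00" * (4 - len(data) % 4)
    words = struct.unpack(">%dI" % (len(data) // 4), data)

    # '!' starts a comment in SPARC assembler syntax. The base address is
    # recorded only as a comment here; the real section/address directives
    # are environment-specific and not shown.
    lines = ["! snapshot data, base address 0x%016x" % base_addr]
    for i in range(0, len(words), per_line):
        chunk = ", ".join("0x%08x" % w for w in words[i:i + per_line])
        lines.append("\t.word " + chunk)
    return "\n".join(lines)


if __name__ == "__main__":
    print(words_from_image(bytes(range(20)), 0x1000))
```

The output can be assembled together with the initialization routine once the environment-specific section and address directives are added around it.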
The outline of the assembly routine for initializing registers is:
1. create TLB entries for the init code, for both user context (from snapshot) and for nucleus context (context 0).
2. call special trap that returns with pstate.priv set to 1
3. set tl=1 (hence the need for step 1).
4. initialize TSTATE using CCR, ASI, pstate and CWP values from the snapshot
5. set TPC and TNPC to the PC and NPC values from the snapshot
6. load fp registers from the data section of the init code (the data section has to be synthesized for each snapshot)
7. for each window, set %cwp, then set %o and %l registers
8. set FSR, GSR, FPRS
9. set global register values using the setx synthetic instruction
10. set CWP, cansave, canrestore, cleanwin, otherwin regs
11. set TBA, WSTATE, PIL
12. Invoke the retry instruction to jump to the PC/NPC values from the snapshot (set in step 5).
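To make steps 3 to 5 a bit more concrete, here is a small Python sketch that packs a TSTATE value and emits the corresponding assembly lines. It assumes the SPARC V9 TSTATE layout (CCR in bits 39:32, ASI in 31:24, PSTATE in 19:8, CWP in 4:0); if your target defines additional TSTATE fields, they would need to be added. The snapshot dictionary keys, the function names, and the choice of %g1/%g2 as scratch registers are hypothetical, and the scratch registers only work because the globals are restored later (step 9).

```python
def pack_tstate(ccr: int, asi: int, pstate: int, cwp: int) -> int:
    """Pack snapshot values into a TSTATE image (SPARC V9 layout assumed)."""
    return (((ccr & 0xFF) << 32)
            | ((asi & 0xFF) << 24)
            | ((pstate & 0xFFF) << 8)
            | (cwp & 0x1F))


def emit_trap_state(snap: dict, tmp: str = "%g1", val: str = "%g2") -> list:
    """Emit assembly for steps 3-5: set TL=1, then TSTATE, TPC and TNPC.

    'snap' is a hypothetical snapshot dict with the keys
    ccr, asi, pstate, cwp, pc and npc.
    """
    lines = ["\twrpr %g0, 1, %tl"]  # step 3: TL must be 1 before writing TSTATE/TPC/TNPC

    def write_priv(value: int, reg: str) -> None:
        # setx builds a 64-bit constant in 'val', using 'tmp' as scratch
        lines.append(f"\tsetx {value:#018x}, {tmp}, {val}")
        lines.append(f"\twrpr {val}, %g0, %{reg}")

    tstate = pack_tstate(snap["ccr"], snap["asi"], snap["pstate"], snap["cwp"])
    write_priv(tstate, "tstate")      # step 4
    write_priv(snap["pc"], "tpc")     # step 5
    write_priv(snap["npc"], "tnpc")   # step 5
    return lines


if __name__ == "__main__":
    snapshot = {"ccr": 0x44, "asi": 0x80, "pstate": 0x016, "cwp": 3,
                "pc": 0x10000, "npc": 0x10004}
    print("\n".join(emit_trap_state(snapshot)))
```

The remaining steps (register windows, fp registers, globals, and the final retry) would be generated in the same way, in the order given in the outline.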
You need to be careful about trap handlers. The best scenario is one in which no traps occur when resuming from the snapshot, but a lot of CPU-intensive code will still encounter TLB misses and spill/fill traps. You need to provide handlers for these either as part of the init code or as part of the common environment code.
Hope this helps...
Vega Paithankar
Microelectronics/Architecture Technology Group