V4 core speculative execution?


V4 core speculative execution?

nop head
I am having a problem with code that works on 68K and V3 ColdFires but does not work on V4. I find it hard to believe, but it looks like the V4 core partially executes code that is branched around. Here is a distilled example of the sort of code that fails; it is not pretty, but I think it is valid C:

typedef void (*f_ty)(void);         // a function pointer

// A union of an integer and a pointer, initialised to an integer that is not a valid address
union {
    int x;
    f_ty  f;
} p = { 0x55555555 };

// This is the sort of sequence that fails
{
    dword d = p.x;
    f_ty f = p.f;
    if (d != 0x55555555)
        f();
}

This compiles to: -

     movea.l  _p,a0
     cmpa.l   #0x55555555,a0
     beq.s    skip
     jsr      (a0)
skip

So a0 should contain all 5s, which is not a valid address in my map, but the branch should prevent it being used as an address. What actually happens seems to depend on the code alignment, etc.

It can simply run past the code OK.

It can give an XLB interrupt with 55555554 in the address capture register.

It can lock up the processor. When I hit break in my BDM debugger the PC is a couple of instructions past this block.

If I set a breakpoint anywhere in the program the code runs OK, regardless of whether the breakpoint is triggered.

So it looks like the core sometimes does a pre-fetch on 55555555, even though it will not be executed. My questions are: -

Is this a known bug/feature in the V4 core?
Why does it sometimes lock the processor rather than giving an XLB interrupt?
Why does having a breakpoint active make it work?
Is that a workaround, or do I have to find all the places in my code where this could happen and insert NOPs or something? That would be a complete nightmare, because I have a whole state machine architecture that relies on being able to hold addresses and some flags in the same 32-bit value!

Chris
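
One way to avoid hunting down every call site would be to route all such calls through a single helper. The following is only a sketch under assumptions: `pick_target`, `call_if_pointer`, `safe_stub` and `handler` are hypothetical names, not from the post, and nothing in this thread confirms that the pattern stops the V4 core from prefetching. The idea is that the value is tested as an integer first, and a known-valid stub address is substituted before any function pointer is formed from it.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*f_ty)(void);             /* function pointer, as in the post */

#define NOT_AN_ADDRESS 0x55555555u      /* sentinel value used in the post */

static int calls;                       /* counts real handler invocations */
static void handler(void) { calls++; }  /* hypothetical valid call target */
static void safe_stub(void) { }         /* hypothetical harmless, always-mapped target */

/* Decide from the integer view first: the sentinel is replaced by the
   address of safe_stub, so the returned pointer is always a real address. */
static f_ty pick_target(uintptr_t raw)
{
    return (raw == NOT_AN_ADDRESS) ? safe_stub : (f_ty)raw;
}

/* Single place where the guarded call happens. */
static void call_if_pointer(uintptr_t raw)
{
    f_ty f = pick_target(raw);
    assert(f != 0);                     /* a zero value is treated as a bug in this sketch */
    if (raw != NOT_AN_ADDRESS)
        f();
}
```

A compiler is free to rearrange this, so the generated code would still need to be inspected to see what ends up in the address register.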


[hidden email] Send a post to the list. [hidden email] Join the list. [hidden email] Join the list in digest mode. [hidden email] Leave the list.

Re: V4 core speculative execution?

Tim Stoutamore
Chris,

 > I find it hard to believe, but it looks like the V4
 > core partially executes code that is branched around.

We have our RTOS running very reliably on V4E processors and aren't
having any problems like this. These issues are related to your
RAMBAR, ACR, and MMU configuration, and whether regions are set for
precise or imprecise accesses. Imprecise mode allows speculative
accesses, precise mode doesn't.

 > Why does it sometimes lock the processor rather than giving an XLB
 > interrupt.

May be related to the fact that bus errors in background debug mode
can hang the processor.

Best regards,
Tim


------------------------------------------------------------
Tim Stoutamore, Principal Engineer
Blunk Microsystems, LLC
6576 Leyland Park Drive
San Jose, CA 95120-4558
Tel: 408/323-1758
[hidden email]
www.blunkmicro.com
------------------------------------------------------------

Re: V4 core speculative execution?

nop head
Hi Tim,


> We have our RTOS running very reliably on V4E processors and aren't
> having any problems like this. These issues are related to your
> RAMBAR, ACR, and MMU configuration, and whether regions are set for
> precise or imprecise accesses. Imprecise mode allows speculative
> accesses, precise mode doesn't.
 
Yes, I also have an RTOS running and as far as I can tell that works. Normally in C, pointers point to a valid object, or one past the end of an array, or are NULL, in which case you could get away with speculatively dereferencing them as long as address 0 was readable. The problem only arises where I have unions in which a pointer and a scalar share the same 32-bit field.

I am not using the MMU. 55555555 matches no ACRs, RAMBARs or chip selects, so it should use the default cache mode in CACR, which is cache-inhibited, precise.
 


>> Why does it sometimes lock the processor rather than giving an XLB
>> interrupt.
>
> May be related to the fact that bus errors in background debug mode
> can hang the processor.

It locks up even with BDM not connected. When BDM is connected the debugger has to assert reset to get the CPU to enter background mode.

Oddly, if I set A0 to an invalid address that does not satisfy the branch condition, e.g. 55555554, then it actually tries to jump to that address and always gets an XLB bus error, as expected. It only seems to be this speculative fetch that sometimes locks up and sometimes gives a bus error. For a particular program it will always do the same thing, but if code moves about a bit it can either work, lock up, or give a bus error.

I don't know whether this just applies to pointers to functions or whether pointers to objects could be speculatively dereferenced as well. It seems like a major bug in the core but I can't find any reference to it. I do vaguely remember reading a few years ago that somebody had to make address 0 readable to prevent bus errors on NULL pointers. I foolishly thought such a major bug would have been fixed by now.

The really weird thing is that having an active breakpoint seems to fix it. Does that mean that having a dormant breakpoint modifies the processor's prefetch behavior, reducing its performance? If so, it allows a whole new class of Heisenbugs.

Chris
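
A small host-side model can help double-check which cache mode an address such as 55555555 would actually get. This is only a sketch: the `region` struct, `effective_mode` and the mode names are hypothetical, and it assumes the ACR-style rule that the top eight address bits are compared against a base, with masked bits ignored. RAMBARs and chip selects are not modelled, and the real register layout should be taken from the reference manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Labels for this model only; they are not register encodings. */
typedef enum {
    MODE_CACHEABLE,
    MODE_INHIBITED_PRECISE,
    MODE_INHIBITED_IMPRECISE
} cache_mode;

typedef struct {
    int        enabled;  /* region is only considered when non-zero */
    uint8_t    base;     /* compared with address bits 31..24 */
    uint8_t    mask;     /* bits set to 1 are ignored in the comparison */
    cache_mode mode;     /* mode applied when the region matches */
} region;

/* True when the top byte of addr equals base in every unmasked bit. */
static int region_matches(const region *r, uint32_t addr)
{
    uint8_t top = (uint8_t)(addr >> 24);
    return r->enabled && (((top ^ r->base) & (uint8_t)~r->mask) == 0);
}

/* First matching region wins; otherwise the default mode applies. */
static cache_mode effective_mode(const region *regions, size_t n,
                                 cache_mode default_mode, uint32_t addr)
{
    assert(regions != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (region_matches(&regions[i], addr))
            return regions[i].mode;
    return default_mode;
}
```

For example, a hypothetical region with base 0x40 and mask 0x1F covers top bytes 0x40 to 0x5F, so 0x55555555 would match it and take that region's mode rather than the default.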



Re: V4 core speculative execution?

nop head
Hi Tim,
Sorry, I was mistaken. 55555555 is cacheable, and changing it to a non-cacheable address stops the bus error. The problem, though, is that 55555555 was just an arbitrary example. In general it could be anywhere, including I/O addresses, where random reads could cause some very obscure bugs.

Regards, Chris
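
Following on from the observation that a non-cacheable value stopped the bus error, one possible way to keep addresses and flags in the same 32-bit value is to give every non-address value a fixed top byte that falls in a region chosen for the purpose. This is a minimal sketch under assumptions: `FLAG_TAG`, `make_flags`, `is_flags` and `get_flags` are hypothetical, the top byte 0xFF is a placeholder that would have to be picked from the real memory map, and the thread does not establish that this is safe where reads have side effects.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical choice: top byte of a region that is cache-inhibited and
   has no read side effects in the target memory map. */
#define FLAG_TAG   0xFF000000u
#define FLAG_MASK  0x00FFFFFFu

/* True when v carries flags rather than an address. */
static int is_flags(uint32_t v)
{
    return (v & ~FLAG_MASK) == FLAG_TAG;
}

/* Pack up to 24 bits of flags under the tag byte. */
static uint32_t make_flags(uint32_t flags)
{
    assert((flags & ~FLAG_MASK) == 0);  /* flags must fit in the low 24 bits */
    return FLAG_TAG | flags;
}

/* Recover the flag bits from a tagged value. */
static uint32_t get_flags(uint32_t v)
{
    return v & FLAG_MASK;
}
```

The cost is that only 24 bits remain for flags and no real code or data may live under the tagged top byte.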

2009/4/15 nop head <[hidden email]>