[cxx-abi-dev] Run-time array checking

John McCall rjmccall at apple.com
Thu Sep 13 03:07:39 UTC 2012


On Sep 12, 2012, at 7:15 PM, Mike Herrick wrote:
> On Sep 11, 2012, at 8:44 PM, John McCall wrote:
>> On Sep 11, 2012, at 12:28 PM, Mike Herrick wrote:
>>> On Sep 10, 2012, at 12:35 PM, John McCall wrote:
>>>> I wouldn't say option 2 is *out*, it's just not *sufficient*, in the same ways that __cxa_vec_new was never sufficient.
>>>> 
>>>> Would you mind writing up a formal proposal (or even a patch)?
>>> 
>>> Not at all (assuming we can figure out what the best course of action is).
>> 
>> Thanks!
>> 
>>>> At a high level I think the required changes are:
>>>> 
>>>> 1) Adding the new __cxa_throw_bad_array_new_length routine.  There's a still an open question here, I think:  it's a better user experience if std::bad_array_new_length carries the length argument.  Unfortunately (a) that's a bit complicated to encode as an operand to the routine, because we'd also need to track whether that's signed or unsigned, and (b) it looks like libc++ doesn't have space for carrying this information, and libstdc++ apparently hasn't been revised for this rule change yet.
>>> 
>>> We agree that having the length argument is desirable from a user's point of view, but it seems rather difficult for the compiler to convey this value to a library routine given that its type may be signed or unsigned and it may or may not be larger than size_t/ptrdiff_t.
>> 
>> I hadn't thought of the wider-than-size_t problem, although amusingly I did
>> remember that case when writing the bounds checks in clang.
>> 
>> Hmm.  At the risk of prescribing an overly complicated API, I would suggest:
>> void __cxa_throw_bad_array_new_length(uintptr_t sizeData, int flags);
>> where 'flags' is:
>> (sizeof(size) << 1) | std::is_signed<decltype(size)>::value
>> and where sizeData is either:
>> size, converted to a uintptr_t, if sizeof(size) <= sizeof(uintptr_t), or
>> &size otherwise (throwing it in some temporary memory).
>> Converting to a uintptr_t means zero-extending or sign-extending as appropriate.
> 
> I'm a little leery of passing size (sizeData) in this fashion.  [Also, std::uintptr_t appears to be optional in the standard.]

Well, the simpler alternative is to saturate to a size_t/ssize_t (saturation being unnecessary unless the size is actually of a larger type than size_t) and just pass a flag indicating whether it's signed.  That inherently loses information, of course.
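For concreteness, the saturating alternative could look something like the sketch below.  The helper name and the exact clamping policy are my own assumptions, not part of any ABI; the point is only that the callee gets a size_t bit pattern plus one signedness flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: clamp an arbitrary integral size into size_t
// (or, for signed types, into ssize_t's range) and report whether the
// original type was signed.  The exact value is lost on saturation.
template <typename T>
std::size_t saturate_for_throw(T size, bool &is_signed) {
    is_signed = std::is_signed<T>::value;
    if (is_signed) {
        typedef typename std::make_signed<std::size_t>::type SSize;
        std::intmax_t v = static_cast<std::intmax_t>(size);
        if (v > static_cast<std::intmax_t>(std::numeric_limits<SSize>::max()))
            v = std::numeric_limits<SSize>::max();
        if (v < static_cast<std::intmax_t>(std::numeric_limits<SSize>::min()))
            v = std::numeric_limits<SSize>::min();
        // Ordinary negative values pass through unchanged; the callee
        // reinterprets the bit pattern using the signed flag.
        return static_cast<std::size_t>(static_cast<SSize>(v));
    }
    std::uintmax_t u = static_cast<std::uintmax_t>(size);
    if (u > std::numeric_limits<std::size_t>::max())
        u = std::numeric_limits<std::size_t>::max();  // oversized: saturate
    return static_cast<std::size_t>(u);
}
```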

I know there are platforms which don't provide std::uintptr_t, but are there platforms which *can't* support std::uintptr_t?  That is, is this a real limitation or a "some system headers are dumber than others" limitation?

> If we went this route, I'd argue to separate flags above into two separate arguments.

Is there a good reason to, other than getting a slightly prettier-looking API?  I know that minimizing function arguments seems like a micro-optimization, but I don't think it's inappropriate in this context;  we certainly already have users that begrudge us the code size of array-new, and this entire discussion is about making it larger.  Every instruction saved helps.
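To illustrate why the single packed argument costs so little: the 'flags' encoding quoted earlier can be produced and consumed with one shift and one mask apiece.  The function names below are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// flags = (sizeof(size) << 1) | std::is_signed<decltype(size)>::value,
// exactly as in the proposal quoted above.
template <typename T>
int encode_size_flags() {
    return static_cast<int>((sizeof(T) << 1) |
                            (std::is_signed<T>::value ? 1 : 0));
}

// The runtime side recovers both facts from the one argument:
inline unsigned decode_size_width(int flags) {
    return static_cast<unsigned>(flags) >> 1;  // width in bytes
}
inline bool decode_size_signed(int flags) {
    return (flags & 1) != 0;
}
```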

>> If we're not going to add signed/oversized variants — both reasonable
>> choices, in my view — then the compiler can still use __cxa_vec_new*
>> as long as it puts an appropriate check in front if either:
>> - sizeof(size) > sizeof(size_t)
>> - decltype(size) is signed
> 
>  - size < number_of_initialized_elements

Oh, yes, of course.  If it's a nested array allocation, we need to do that
overflow check outside as well.

>> This check is required if __cxa_throw_bad_array_new_length takes
>> any information about the size value and type.  I want it to take that
>> information.  However, if the consensus goes the other way and
>> __cxa_throw_bad_array_new_length does *not* take any information
>> about the size value, we can avoid this extra call in the extremely
>> common case that sizeof(element) > 1, because the overflow check
>> in __cxa_vec_new* will automatically trigger for negative values.
>> Thus we can skip all checking relating to "normal" signed size values,
>> and for "oversized" size values we can simply saturate at -1 or some
>> other value which is guaranteed to fail to allocate.
> 
> Assuming there are no architectures where this doesn't hold true, it sounds good to me.

We'd certainly have a lot more optimization flexibility if we don't try
to preserve the bad size value.  My worry is that we'd be *forcing* a poor
debugging experience on programmers — they'd have to reproduce the
problem in a debugger in order to have any idea what the bad value was.
I'll readily grant that this is already true for a large class of other bugs in C++.
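To make the earlier point concrete: once a negative count has been converted to size_t it becomes a huge value, so the ordinary multiplication-overflow check inside __cxa_vec_new* rejects it for free whenever the element size is greater than one.  A minimal model of that check (my own restatement, not the actual runtime code):

```cpp
#include <cassert>
#include <cstddef>

// Does count * element_size overflow size_t?  This is, in essence,
// the check __cxa_vec_new* already performs before allocating.
bool vec_new_size_overflows(std::size_t count, std::size_t element_size) {
    return element_size != 0 &&
           count > static_cast<std::size_t>(-1) / element_size;
}
```

Note that for element_size == 1 a converted negative count slips through this check, which is why that case still needs its own guard if the throw routine carries no size information.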

Anyway, I've asked Howard Hinnant, Apple's C++ library maintainer, to
catch up on the discussion and weigh in.

John.

