Topic: Optional std::array backend on int/float-array/matrices

I'm just dumping some ideas here, which might interest others. I'm not in any planning phase, and I have plenty of more important things to start with, but;

We have a fair few methods that construct could really benefit from stack allocation.

Naturally, we do need the dynamical size of the heap in many places if we wish to keep the virtual method calls everywhere (in order to cater to elements of all shapes and sizes), but on those places, copying the data would be so cheap I think it's easily worth it.

A few places where I think it would make a difference;

  • FEInterpolation :: evaldndX - For each implementation all the sizes are hardcoded anyway, only the input/output would need to be the stack allocations.

  • Node :: coordinates - Hardcoded 3D would and let 2D code just pad out with 0s would save on total memory and remove one indirection == much fewer cache misses on a *very* hot code

  • MaterialStatus* :: internal state variables - A fixed size (6) for the stress would in almost all cases both save memory and reduce cache misses, and make implementations easier. Same applies to all other internal variables that fit into S3, S3E, etc.

Before diggin into this, one needs to consider;
1. There can absolutely be no virtual method calls here, as they are very very expensive and would kill performance in critical code.
2. Do we template all of our FloatArray/etc. code and instanciate 2 versions (std::vector backend, FloatArray, and a std::array backend FloatArray<N>)
3. Do we look at other libraries that have already done all of this?

2 vs 3 is a tough one, and I can think of reasons for either approach. We need to be able to add specializations for all of our important methods if we go for 3, but there are really good libraries out there.

Re: Optional std::array backend on int/float-array/matrices

Example code of vector+array backends with conversion

#include <vector>
#include <array>
#include <algorithm>

class FloatArray;

template< unsigned int N > class FloatArrayT {
public:
  std :: array< double, N > values;
  void add(double val) { for (auto &v : values) v += val; }

  inline void operator=(const FloatArray &x);

};

class FloatArray {
public:
  std :: vector< double > values;
  inline FloatArray(std :: initializer_list<double> list): values(list) {}

  auto begin() { return values.begin(); }
  auto end() { return values.end(); }
  auto const begin() const { return values.begin(); }
  auto const end() const { return values.end(); }

  template< unsigned int N > inline FloatArray(const FloatArrayT<N> &v);
  template< unsigned int T > inline void operator=(const FloatArrayT<T> &x);
};

template< unsigned int N >
void FloatArrayT<N>::operator=(const FloatArray &x) { std::copy_n(x.begin(), N, values.begin()); }

template< unsigned int N > FloatArray::FloatArray(const FloatArrayT<N> &v): values(v.values.begin(), v.values.end()) {}
template< unsigned int T > void FloatArray::operator=(const FloatArrayT<T> &x) { std::copy(x.values.begin(), x.values.end(), values.begin()); }


double foo(double num) {
  FloatArray v = {num, num, num};

  FloatArrayT<3> a = {num, num, num};
  a = v;
  a.add(123.);

  FloatArray x = {1.,2.};
  x = a;
  FloatArray x2(a);

  return a.values[0];
}

3

Re: Optional std::array backend on int/float-array/matrices

If we can avoid virtual calls and avoid dynamic resizing of arrays we can save a lot. I would here add the following idea:
What if we will not assemble the individual element contributions in sequential order (which leads to necessary dynamic resizing and virtual calls) as we could not assume anything, but somehow sort elements into groups of elements of the same type and then proceed elements in individual groups. Inside the group, the sizes of working arrays can be fixed and even shared between elements in the group, so definitely no need to check and resize dimensions. However, there is overhead in "sorting" the elements, but perhaps this can be done when reading the problem, by introducing additional lists (one for each group), into which the element numbers will be added depending on their type.

Re: Optional std::array backend on int/float-array/matrices

Even if we have elements pre-sorted, under ideal conditions, I think we would still hit a lot of virtual calls, while only avoiding the high-level calls.
It's just such a convenient way to handle the generality of material models, cross-sections, FE-interpolators, etc. that it would be hard to give up.
Templating all the code, moving everything to compile time, would be the only way I could see significant reduction in virtual calls when e.g. computing the element stiffness contribution.