You are misjudging the problem here. The bottleneck will always be allocating sufficient memory, so the memory required is what matters. Computing the actual permutations is fast, IF you have sufficient memory and a fast enough computer to access that memory.
However, the purely computational complexity is easy to state. It is O(factorial(n)*n), where n is the number of elements to permute. Not surprisingly, that is also the number of elements in the array to be created, and so it is a direct measure of the memory required.
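A quick sketch makes the count concrete. This uses Python's standard library (`itertools.permutations` and `math.factorial`) merely to verify the arithmetic; n is kept small because the output grows factorially:

```python
import itertools
import math

# All permutations of n elements: there are n! of them, each a
# tuple of length n, so n! * n stored values in total.
n = 5
all_perms = list(itertools.permutations(range(n)))

assert len(all_perms) == math.factorial(n)   # n! permutations
total_elements = len(all_perms) * n          # n! * n stored values
print(total_elements)                        # 600 for n = 5
```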
That is, there are factorial(n) permutations, each represented by a vector of length n to be stored in memory. Regardless of how you compute them, you must produce factorial(n) permutations, so the complexity must grow at least as fast as factorial(n). It matters not how the computation is cloaked in loops; trying to unravel the loops will only confuse things if it somehow convinces you the work can be done more quickly. And since each permutation is a vector of length n, the total work is factorial(n)*n.
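To see that the loop structure cannot change this, here is a minimal recursive generator (a sketch of my own, not any particular library's routine). However the loops are arranged, it must emit factorial(n) output vectors of length n, so it performs at least n! * n element writes:

```python
def perms(items):
    # Base case: zero or one element has exactly one permutation.
    if len(items) <= 1:
        return [list(items)]
    out = []
    # Fix each element in the first position, then permute the rest.
    for i in range(len(items)):
        rest = items[:i] + items[i + 1:]
        for p in perms(rest):
            out.append([items[i]] + p)
    return out

result = perms([1, 2, 3])
print(len(result))  # 3! = 6 permutations
```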
My initial point is that the actual computations are quite efficient; what matters is not, in fact, the computational complexity. The real issue is how much memory you need to allocate, access, and then fill. Since the amount of memory grows faster than exponentially, memory becomes the limiting factor: you are simply filling an array of factorial(n) by n elements.
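The faster-than-exponential growth is easy to tabulate. This assumes 8 bytes per stored element (e.g. a double); the exact element size is an assumption, but the factorial growth dominates regardless:

```python
import math

# Memory needed to hold the full n!-by-n array at 8 bytes/element.
for n in range(8, 16):
    elements = math.factorial(n) * n
    gib = elements * 8 / 2**30
    print(f"n={n:2d}: {elements:>18,d} elements, ~{gib:,.1f} GiB")
```

By n = 13 or so the array no longer fits in the RAM of a typical machine, long before the per-element arithmetic itself is a concern.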
Your computer's work is linear in the number of elements it must store, and that number grows as O(factorial(n)*n). Therefore the overall computational complexity of the result MUST be O(factorial(n)*n).