pad step_workspace to 64 bytes, to speed up access to gct->steps[]
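A minimal C sketch of the padding idea follows. It assumes a hypothetical payload struct: the field names below are illustrative only and are not the real GHC RTS step_workspace layout. Wrapping the payload in a union with a 64-byte char array makes the element size exactly 64, a power of two. Indexing an array like gct->steps[i] then needs only base + i * 64, which compilers typically emit as a shift (i << 6) rather than a general multiply. If the array is also 64-byte aligned, each element sits in its own cache line on machines with 64-byte lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical workspace payload: illustrative fields only,
   NOT the actual GHC RTS step_workspace definition. */
struct ws_fields {
    void      *step;       /* owning step (assumed) */
    void      *todo_bd;    /* block currently being filled (assumed) */
    uintptr_t *todo_free;  /* allocation pointer (assumed) */
    uintptr_t *todo_lim;   /* allocation limit (assumed) */
    size_t     n_blocks;   /* bookkeeping counter (assumed) */
};

#define WS_SIZE 64

/* Pad the payload up to exactly 64 bytes so sizeof is a power of two. */
typedef union {
    struct ws_fields f;
    char pad[WS_SIZE];
} step_workspace_padded;

/* Fail the build if the payload grows past the padded size. */
_Static_assert(sizeof(struct ws_fields) <= WS_SIZE, "payload must fit in 64 bytes");
_Static_assert(sizeof(step_workspace_padded) == WS_SIZE, "element must be exactly 64 bytes");

/* Byte offset of element i; with a 64-byte element the compiler
   can lower the multiply to i << 6. */
static size_t ws_offset(size_t i) {
    assert(i <= SIZE_MAX / WS_SIZE); /* guard against overflow */
    return i * sizeof(step_workspace_padded);
}

/* Measure the actual distance between consecutive array elements. */
static ptrdiff_t ws_stride(void) {
    step_workspace_padded steps[2];
    return (char *)&steps[1] - (char *)&steps[0];
}
```

The _Static_assert checks are what keep the padding honest: if someone later adds a field that pushes the payload past 64 bytes, the build fails instead of silently falling back to a non-power-of-two stride.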