And refer to CAFormer correctly.
At the smaller resolution, it starts noticing 2girls; otherwise the output appears similar.