HansJAvanLeeuwen
You're correct: the standard number is IEC 61094. Specifically, IEC 61094-5 is the part that describes the pressure calibration procedure to which I referred.
You're also correct that the method of “pressure calibration by comparison” from the standard assumes that 2 microphones of the same shape and diameter are being compared.
As I mentioned, my concern about the high-frequency response was how much it differed from measurements on 2 previous iPhone models, which agreed much more closely with each other. All 3 models were measured with precisely the same procedure, so the differences between them would be less influenced by the imperfections (i.e., poor assumptions) in the measurement process itself.
The protective grid is removed to bring the microphones into closer proximity to one another, which is the proper procedure per the standard. Since the calibration is a pressure calibration, the grid serves no purpose other than protecting the microphone. Of course, the iPhone's microphone cannot conveniently be removed from its own 'protective grid', which will necessarily lead to some degree of error in the measurement, increasing at higher frequencies. In fact, the different mechanical design of the iPhone 17 Pro, vs. earlier models, is perhaps the most likely reason for the differences in the high-frequency response.
As for the best way to measure the frequency response of an iPhone, this is something I plan to explore more in the future. I have a small anechoic chamber and a very precise automated microphone positioning system that I can use for repeatable results with different orientations of the microphones. However, I expect that a side-by-side comparison of the microphones will not improve measurement accuracy. In fact, I expect that arrangement to actually be worse at high frequencies. A better alternative would likely be to perform a free-field calibration by comparison, which would require the microphones to be measured one at a time, in precisely the same physical location. This approach also has its challenges.
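To make the "one at a time, in the same location" idea concrete, here is a minimal sketch of the arithmetic behind a comparison by substitution. It is only an illustration under stated assumptions, not the procedure from the standard: the function name and its arguments are hypothetical, levels are assumed to be per-band values in dB, and the sound field is assumed to be identical for both measurements.

```python
def response_by_substitution(dut_levels_db, ref_levels_db, ref_response_db=None):
    """Estimate the per-band response (dB) of a device under test (DUT).

    Assumes both microphones were placed, one at a time, at the same
    point in the same sound field (hypothetical setup for illustration).

    dut_levels_db   -- levels reported by the DUT, one per frequency band
    ref_levels_db   -- levels reported by the reference microphone
    ref_response_db -- known response of the reference microphone per band
                       (0 dB everywhere if omitted, i.e. assumed ideal)
    """
    if len(dut_levels_db) != len(ref_levels_db):
        raise ValueError("DUT and reference must cover the same bands")
    if ref_response_db is None:
        ref_response_db = [0.0] * len(ref_levels_db)
    if len(ref_response_db) != len(ref_levels_db):
        raise ValueError("reference response must cover the same bands")

    # True level = reference reading minus the reference's own response.
    # DUT response = DUT reading minus the true level.
    return [
        dut - ref + corr
        for dut, ref, corr in zip(dut_levels_db, ref_levels_db, ref_response_db)
    ]


# Example with made-up numbers: three bands, ideal reference microphone.
print(response_by_substitution([94.0, 93.0, 90.5], [94.0, 94.0, 94.0]))
```

Any change in the sound field or in the microphone position between the two measurements shows up directly as error in the result, which is one of the challenges of this approach.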
Is it then correct to compare measuring systems with pink noise?
When to use pink noise depends on a number of factors. You mentioned working with discrete tones. If the response of the measurement system is important in the range of those tonal frequencies, then pink noise might not be the best excitation signal for comparing measurement systems, but that depends on how you intend to characterize those systems.
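As a rough illustration of one of those factors, the sketch below assumes ideal pink noise, whose power spectral density is proportional to 1/f. Under that assumption the power in a band from f_lo to f_hi is proportional to ln(f_hi / f_lo), so every octave band carries the same power, while a fixed-width analysis band (such as one around a discrete tone) carries about 3 dB less for each doubling of frequency. The function name and the band edges are hypothetical choices for the example.

```python
import math


def pink_band_power_db(f_lo, f_hi):
    """Relative power (dB, arbitrary reference) of ideal pink noise in a band.

    Integrating a 1/f power spectral density from f_lo to f_hi gives
    ln(f_hi / f_lo), up to a constant factor.
    """
    if not 0 < f_lo < f_hi:
        raise ValueError("need 0 < f_lo < f_hi")
    return 10 * math.log10(math.log(f_hi / f_lo))


# Octave bands: same relative power regardless of center frequency.
for lo, hi in [(125, 250), (1000, 2000), (8000, 16000)]:
    print(f"octave {lo}-{hi} Hz: {pink_band_power_db(lo, hi):.2f} dB")

# Fixed 10 Hz wide bands: power drops as the center frequency rises.
low = pink_band_power_db(995, 1005)
high = pink_band_power_db(7995, 8005)
print(f"10 Hz band at 1 kHz vs 8 kHz: {low - high:.2f} dB difference")
```

This only describes the excitation signal itself; whether it matters in practice depends on how the measurement systems analyze the signal.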
I'm not sure I understand your last question. Are you asking about the linearity of the iPhone response (i.e. how much the response changes as the sound level increases)? If so, I don't have an answer for you. This is another question I would like to explore in my anechoic chamber.
Ben