Sunday, 25 August 2013

Samsung caught artificially inflating Galaxy S4 benchmark results, fires back non-explanatory explanation

Good, reliable benchmarks are important, but the only way to make certain they operate as intended is if they’re treated the same way by the hardware platforms they run on. When applications or processors start treating certain tests differently than others, it warps final results and leaves reviewers and readers with an inaccurate idea of total performance. Samsung is the latest company to get caught with its pants down in this regard; the company’s Galaxy S4 is artificially setting its GPU at a higher clock speed when it encounters a benchmark, and then reverting to the stock clock speed when you shift back to normal workloads.
Specifically, the international version of the Galaxy S4 (the one equipped with Samsung’s Exynos 5410 Octa) will boost the GPU clock to 532MHz, from 480MHz, if it detects that GLBenchmark 2.5.1, Antutu, or Quadrant is running. The team at AnandTech that investigated the problem dug further and discovered a function, dubbed “BenchmarkBooster,” buried inside the dynamic voltage and frequency scaling APK. That function lets the phone set specific GPU frequencies for specific titles.
This might not seem like much of a problem, given that both CPUs and GPUs are designed with variable clock speeds in mind these days. Intel, AMD, and Nvidia openly advertise “Turbo” modes as a feature, after all. The problem here, however, is with how this functionality is exposed to the end user. Those Turbo modes don’t just kick in when a particular benchmark is running; they’re available across the board. To put it another way: If AMD CPUs only clocked up to 4.5GHz while running SiSoft Sandra, Cinebench, and PCMark, that would be cheating. Because Turbo clocks are based on available TDP or thermal headroom, rather than an application detector, they aren’t.
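As a rough illustration of what a headroom-based boost looks like, here is a minimal Python sketch. It is not Intel's, AMD's, or Nvidia's actual Turbo logic: the function name, the temperature limit, and the power limit are all made up for the example, and only the two clock values come from the figures reported for the Galaxy S4. The point is simply that the decision takes no application name as input.

```python
# The two GPU clocks reported for the Galaxy S4, in MHz.
BASE_MHZ = 480
BOOST_MHZ = 532


def clock_from_headroom(temp_c: float, power_w: float,
                        temp_limit_c: float = 85.0,
                        power_limit_w: float = 4.0) -> int:
    """Pick a clock from thermal and power headroom alone.

    The limits are hypothetical placeholders. Note that the running
    application is never consulted, so every workload is treated alike.
    """
    has_headroom = temp_c < temp_limit_c and power_w < power_limit_w
    return BOOST_MHZ if has_headroom else BASE_MHZ


if __name__ == "__main__":
    print(clock_from_headroom(temp_c=60.0, power_w=2.5))  # cool chip, boost allowed
    print(clock_from_headroom(temp_c=90.0, power_w=2.5))  # too hot, stay at base
```

A game and a benchmark that put the chip in the same thermal state get the same clock under this kind of policy, which is what makes it fair.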

Samsung’s half-hearted denial

Samsung has since hit back against these allegations, claiming:
Under ordinary conditions, the GALAXY S4 has been designed to allow a maximum GPU frequency of 533MHz. However, the maximum GPU frequency is lowered to 480MHz for certain gaming apps that may cause an overload, when they are used for a prolonged period of time in full-screen mode. Meanwhile, a maximum GPU frequency of 533MHz is applicable for running apps that are usually used in full-screen mode, such as the S Browser, Gallery, Camera, Video Player, and certain benchmarking apps, which also demand substantial performance.
Right. Except Samsung is using a whitelist, not a blacklist. In other words, the company doesn’t block “certain gaming apps” from causing an overload; it blocks all gaming apps from running at 533MHz, except for a handful of applications that just happen to be benchmarks. Again, that’s the opposite of what companies like AMD and Nvidia did a few years back when programs like Furmark became popular. In 2010-2011, Furmark was often used as a worst-case thermal stress test for GPUs. While handy for reviewers, it could also kill GPUs if used improperly or run for too long. Both AMD and Nvidia responded by introducing detection mechanisms to prevent the program from running, because there was a genuine risk of damage.
There is, of course, no chance whatsoever that this behavior was either accidental or the result of an errant whitelist. This is as deliberate as one can get.
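The whitelist-versus-blacklist distinction is easy to see in a few lines of Python. This is only a sketch under stated assumptions: the package names are invented placeholders, the function names are hypothetical, and nothing here is taken from Samsung's actual code. The clock figures are the 480MHz and 533MHz values from Samsung's statement.

```python
BASE_MHZ = 480
BOOST_MHZ = 533  # maximum GPU frequency quoted in Samsung's statement

# Hypothetical package names, used purely for illustration.
HEAVY_GAMES = {"example.heavy_game"}
BOOSTED_APPS = {"example.glbenchmark", "example.antutu", "example.quadrant"}


def max_clock_blacklist(app: str) -> int:
    """What Samsung's statement describes: full speed by default,
    throttled only for apps known to cause an overload."""
    return BASE_MHZ if app in HEAVY_GAMES else BOOST_MHZ


def max_clock_whitelist(app: str) -> int:
    """What was observed: throttled by default,
    full speed only for apps on a list."""
    return BOOST_MHZ if app in BOOSTED_APPS else BASE_MHZ


if __name__ == "__main__":
    for app in ("example.antutu", "example.heavy_game", "example.new_game"):
        print(app, max_clock_blacklist(app), max_clock_whitelist(app))
```

The telling case is an app that appears on neither list, such as the made-up "example.new_game": the blacklist policy gives it the full 533MHz, while the whitelist policy holds it at 480MHz.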

Second blow for Antutu

This is the second time in a month that Antutu has been singled out as a benchmark behaving badly. An investigation earlier this month proved that Antutu’s latest version had been hyper-optimized for x86 in a fashion that left the Intel chips racing ahead of their ARM rivals thanks to a compiler switch and code optimizations that broke the benchmark’s functionality. When tested, Intel chips would fail to execute a loop for the necessary number of iterations, but reported they had done so regardless. This made Atom incorrectly appear far faster than ARM chips that were legitimately performing the test.
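To see why a skipped loop inflates a score, consider a toy model in Python. It is not Antutu's scoring method, which isn't described here; the iterations-per-second formula, the function names, and every number below are assumptions chosen only to show the effect of reporting more work than was actually done.

```python
def run_loop(requested: int, executed: int, seconds_per_iteration: float):
    """Simulate a benchmark loop.

    'executed' is how many iterations really ran; the run still reports
    'requested' iterations. Returns (reported_iterations, elapsed_seconds).
    """
    elapsed_s = executed * seconds_per_iteration
    return requested, elapsed_s


def score(reported_iterations: int, elapsed_s: float) -> float:
    """Hypothetical score: reported iterations per second."""
    return reported_iterations / elapsed_s


# Same per-iteration speed in both runs; only the amount of real work differs.
honest = score(*run_loop(requested=1000, executed=1000, seconds_per_iteration=0.25))
broken = score(*run_loop(requested=1000, executed=250, seconds_per_iteration=0.25))

if __name__ == "__main__":
    print(honest, broken)
```

In this toy model both runs execute each iteration at the same speed, yet the run that silently does a quarter of the work scores four times higher.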
After the uproar, the Antutu app was quickly patched to a newer version that retained Intel’s own compiler for Atom (ARM binaries were compiled using GCC) but dropped the broken optimizations. Atom’s performance fell accordingly; the chip no longer smashes ARM’s performance ratings in this test. This time around, the problem doesn’t seem to have been Antutu’s fault, but this is why hardware reviewers have to keep an eye on all aspects of a smartphone’s ecosystem. The incentives to cheat, whether via compiler flags or benchmark detection at the OS level, are enormous. Intel, ABI Research, and Antutu were roundly criticized for the debacle earlier this month, but Samsung has just as many reasons to cheat as Intel does.
If the company is serious about operating in good faith, it’ll make dynamic GPU scaling available to any application that wants it and publicly document the feature. Implementing it only for particular tests clearly communicates what the real intention was.
