A Tale Of Two Upsamplers

Perpetual Technologies' P-1A Versus Assemblage D2D-1

I had the opportunity to compare Perpetual Technologies' P-1A and the Sonic Frontiers/Assemblage D2D-1 on a friend's high resolution system. Both units can digitally upsample and interpolate standard 44.1 kHz 16-bit CDs to 96 kHz x 24 bits. Priced at more than twice the D2D-1, the Perpetual unit can additionally do resolution enhancement in DSP. The P-1A also offers optional Speaker Correction for certain loudspeakers, and Room Correction is planned for the future. These compensate for errors in the room or speaker response by computing a numerical solution and applying it to the signal. Since these additional features were not installed in our P-1A, we did not test them. The P-1A is generally more flexible and can perform more kinds of operations, but our main interest was how well it increases the perceived resolution of CDs, since that is its most common use, so that is what we tested.

Get Your Hardware Here

The Assemblage D2D-1 consists of a Crystal CS8414 receiver, CS8420 Sample Rate Converter (SRC), and CS8404 transmitter, some glue and control logic, and the usual high-quality Assemblage power supply components. It also incorporates a proprietary dual PLL designed by Peter Schut, with one quick-responding PLL and one PLL with a very long time constant (10-20 seconds). The first PLL accepts a relatively broad range of clock frequencies and will lock to almost any equipment quickly. The second PLL helps reduce jitter to a claimed 2 picoseconds or less, but will only lock to 44.1 and 48 kHz and the related clocks of 88.2 and 96 kHz. 32 kHz sources, such as satellite DBS and terrestrial cable, are left to the higher-jitter first PLL only.
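
The lock behavior described above can be summarized in a few lines of Python. This is only an illustrative sketch of the rule as stated in this review, not the D2D-1's actual control logic: the names `LOW_JITTER_PLL_RATES_HZ` and `pll_path` are hypothetical, and a real PLL locks over a tolerance band rather than to exact integer rates.

```python
# Sample rates the second (long time constant) PLL is said to lock to.
# Exact matching is a simplification; real hardware tolerates small offsets.
LOW_JITTER_PLL_RATES_HZ = {44_100, 48_000, 88_200, 96_000}


def pll_path(sample_rate_hz: int) -> str:
    """Return which PLL stage(s) would handle an incoming sample rate."""
    if sample_rate_hz in LOW_JITTER_PLL_RATES_HZ:
        return "both PLLs"
    return "first PLL only"


for rate in (32_000, 44_100, 48_000, 88_200, 96_000):
    print(rate, "->", pll_path(rate))
```

Under this reading, a CD or 96 kHz DVD source gets the benefit of the slow second PLL, while a 32 kHz broadcast source does not.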

D2D-1 also implements an I2Se output, which carries separate clock lines over coax and data lines over twisted pair in a single 13W3 cable, a cable more commonly found connecting UNIX workstations to their monitors. I2Se is used to communicate with the Assemblage DAC-3.1 or other DACs supporting this Ultra Analog designed interface, such as the Sonic Frontiers Processor 3. D2D-1 also has S/PDIF on BNC, AES/EBU, and Audio Alchemy style I2S outputs, so it can connect to pretty much any digital receiver or DAC. I2S would be the preferred interface to any DAC that supports it, such as Audio Alchemy or Perpetual Technologies' separate P-3A DAC. While not widely adopted, I2Se is well designed and creates much less jitter than the standard AES/EBU or S/PDIF digital interfaces. The dual PLLs attenuate jitter significantly, and the I2Se interface avoids re-introducing a large source of jitter when shipping data to the DAC. AES or S/PDIF interfaces combine clock and data over a single cable, which results in the data modulating the clock. These are poorly designed interfaces since they create data-related jitter. Though AES/EBU and S/PDIF are standard in the consumer world, in the professional audio world, common interfaces such as SDIF and others have separate clock and data lines. I2S has some of the same advantages as I2Se, but the latter is implemented with superior interface electronics, cables and connectors, which should result in better performance.

P-1A has a similar internal Crystal chipset to D2D-1 but adds an Analog Devices SHARC DSP chip to perform resolution enhancement and other tricks. Both D2D-1 and P-1A have Crystal 8420 SRCs, but we wanted to use only the DSP from the P-1A. The P-1A is fully capable of Sample Rate Conversion, but we did not use it for that, since the DSP facilities are what set it apart. To avoid using the Crystal 8420 SRC in the P-1A, we ran the P-1A in a non-upsampling mode. The original 44.1 kHz CDs came in and went out of the P-1A at the 44.1 kHz sampling rate. We used the P-1A in "Bypass" and resolution enhancement modes so its DSP would change CD word lengths from 16 bits to 24. Used in conjunction with an external SRC (for which we used the D2D-1), this is reportedly the P-1A manufacturer's preferred configuration.
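
For readers unfamiliar with word-length changes, the simplest possible 16-to-24-bit conversion is sketched below in Python. It only appends eight zero bits to each sample and is emphatically not the P-1A's algorithm, whose DSP reportedly does far more than this. The function names are hypothetical; the point is that plain padding adds no information and is fully reversible.

```python
def pad_16_to_24(sample_16: int) -> int:
    """Extend a signed 16-bit sample to 24 bits by appending eight zero LSBs."""
    if not -32768 <= sample_16 <= 32767:
        raise ValueError("sample is outside the signed 16-bit range")
    return sample_16 * 256  # equivalent to a left shift by 8 bits


def truncate_24_to_16(sample_24: int) -> int:
    """Drop the eight least significant bits of a signed 24-bit sample."""
    return sample_24 >> 8  # arithmetic shift preserves the sign


cd_samples = [0, 1, -1, 12345, -32768, 32767]
padded = [pad_16_to_24(s) for s in cd_samples]
recovered = [truncate_24_to_16(s) for s in padded]

print(padded)
print(recovered == cd_samples)  # padding followed by truncation is lossless
```

Any audible difference from a 16-to-24-bit stage therefore has to come from processing beyond simple padding, such as dither or interpolation.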

When the P-1A was used, we ran it between the DVD player and D2D-1. When the P-1A was not used, we bypassed it by physically swapping coaxial data cables to take it out of the circuit. In other words, when P-1A and D2D-1 were both used, the order of signal flow was:

DVD-transport -> P-1A -> D2D-1 -> DAC-3.1

When the P-1A was not used, the order of signal flow was:

DVD-transport -> D2D-1 -> DAC-3.1

The balance of equipment included an Assemblage DAC-3.1 Platinum fed from the D2D-1 via I2Se interface, Spectral DMC-20 Preamp fed unbalanced from the DAC, a commercial Pass X350 amplifier, Dunlavy SC-V speakers, with Nordost SPM Reference speaker cables and Quattro Fil interconnects. Room treatment consisted of felt curtains and fiberglass traps. Data cables were generic 75 ohm coax between an unmodified Pioneer DVD-414 and the P-1A and/or D2D-1, and Sonic Frontiers I2Se between D2D-1 and DAC-3.1. In separate listening tests the DVD player sounded better as a transport than several other CD players.

The DAC-3.1 has the PMD-200 Motorola-DSP-based filter, and the Platinum version adds K Grade Burr-Brown PCM1704 DACs, OPA627AP op amps, Caddock resistors, Black Gate and OSCON electrolytics, over-the-top connectors, etc. It's the highest grade made, though we upgraded the op amps to even tighter spec OPA627BPs. We ran the DAC-3.1's PMD-200 filter in factory-default minimum dither mode. We changed the DAC-3.1 HDCD scaling jumper to be fixed rather than automatic. Automatic HDCD gain scaling is the factory setting and is required under the HDCD license, but it attenuates non-HDCD discs in the digital domain, resulting in fewer bits output and a corresponding loss of resolution. In fixed mode, HDCD discs sound several dB quieter on average than non-HDCD discs, but the full digital resolution of all sources is preserved.
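
The resolution cost of digital-domain attenuation can be estimated with the usual rule of roughly 6.02 dB per bit. The Python sketch below is illustrative only: the 6 dB figure is an assumed example value, not a measurement of the DAC-3.1, and `bits_lost` and `attenuate` are hypothetical helper names. It also assumes the attenuated result is rounded back to the same integer word length.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB for each bit of resolution


def bits_lost(attenuation_db: float) -> float:
    """Approximate resolution lost to a digital attenuation, in bits."""
    return attenuation_db / DB_PER_BIT


def attenuate(sample: int, attenuation_db: float) -> int:
    """Scale an integer sample down and round back to an integer."""
    gain = 10 ** (-attenuation_db / 20)
    return round(sample * gain)


ASSUMED_ATTENUATION_DB = 6.0  # example value only, not taken from the review

# Count how many distinct output codes survive across the full 16-bit range.
surviving = {attenuate(s, ASSUMED_ATTENUATION_DB) for s in range(-32768, 32768)}

print(f"{bits_lost(ASSUMED_ATTENUATION_DB):.2f} bits lost")
print(f"{len(surviving)} of 65536 distinct codes remain")
```

With about one bit lost, roughly half of the original 16-bit codes collapse onto their neighbors, which is the loss the fixed-scaling jumper setting avoids.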

The overall resulting system is very high resolution, where differences between cables and most electronics are clearly audible.

Test Matrix

We listened to 2 recordings in 2 formats in 4 equipment configurations. Because we had already heard Chesky DVD-Videos with 96 kHz x 24-bit PCM stereo audio and felt they were of reference quality, we used the DVD and CD versions of Sara K.'s Hobo and Jon Faddis' Remembrances. We tried various tracks on these but focused on Sara K.'s version of the Commodores' Brick House and Faddis' version of Naima. We also auditioned Dave's True Story's Sex Without Bodies album in 96 x 24 DVD, but not on CD.

Generally we listened to the 96 x 24 DVD, then 44.1 x 16 CD, then CD upsampled/interpolated to 96 x 24 by D2D-1, then 16 to 24 interpolation and/or resolution enhancement by P-1A. As discussed above, when the P-1A was used, upsampling was provided by just the D2D-1 so as not to SRC twice. We occasionally changed orders or briefly changed back to another configuration to compare them.

The 96 x 24 DVD was passed through the D2D-1 in transparent mode for jitter reduction only. (In D2D-1 transparent mode, samples are bit-for-bit identical copies of the original.) 44.1 kHz CDs were either passed through the D2D-1 in transparent mode or upsampled/interpolated to 96 x 24 in the D2D-1. The P-1A was operated in Bypass and resolution enhancement modes.

We observed that P-1A's "Bypass" mode shifts levels by many dB and also obliterates HDCD coding, so it is not "transparent" in the sense above or in the typical, expected sense of a bypass, which usually means "do no processing". Perpetual claims bypass only adds dither but that does not appear to be the case. Transparent mode on the D2D-1 does not change the original signal data; no bits are changed, only jitter is removed via PLLs and prevented via I2Se.
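
For anyone who can capture the digital streams, the two properties discussed above (bit-for-bit transparency and level shift) are easy to check numerically. The Python sketch below is a hypothetical illustration, not the method we used: it assumes the input and output samples have already been captured as time-aligned integer lists, and the function names are invented for this example.

```python
import math


def is_bit_transparent(source, output) -> bool:
    """True only if every output sample equals the matching source sample."""
    return len(source) == len(output) and all(a == b for a, b in zip(source, output))


def level_shift_db(source, output) -> float:
    """RMS level of the output relative to the source, in dB."""
    def rms(samples):
        return math.sqrt(sum(v * v for v in samples) / len(samples))
    return 20 * math.log10(rms(output) / rms(source))


source = [1000, -2000, 3000, -4000]
untouched = list(source)             # what a transparent mode should deliver
halved = [s // 2 for s in source]    # a made-up device that halves the level

print(is_bit_transparent(source, untouched))
print(is_bit_transparent(source, halved))
print(f"{level_shift_db(source, halved):.2f} dB")
```

A device that only adds dither would fail the bit-for-bit test but show a level shift very close to 0 dB, so the two checks together distinguish dither from a gain change.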

Four Comparisons

CD converted to 96 x 24 with P-1A and D2D-1
Since we had access to higher resolution original versions of the same recordings, the most obvious question is which CD processing configuration sounds closest to the higher resolution 96 x 24 native master found on the DVD. (Chesky presumably created the CD versions by downsampling and reducing the word length of the 96 x 24 originals presented in their full, amazing sounding glory on the DVD versions.) Both listeners felt that the P-1A was clearly changing the sound of the instruments, the space around the instruments, and the recorded room ambience. In many ways it was more pleasant, with punchier microdynamics and more obvious harmonics, but with apparently reduced sibilance and other wind and breath type noises. Fretwork on the Brick House acoustic bass was a lot clearer and more apparent, as was the string sound. Harmonica was more obviously placed and toned. Sara K.'s voice, however, lost much fullness, breath and prominence in the mix. Stage stomps that start part way through the song had more obvious strikes, but were less coherent, substantial and solid. However pleasant some of these changes may be, these effects were not found in the 96 x 24 DVD, so they were judged to be non-original artifacts deliberately created through the DSP algorithms.
44.1 x 16 CD at original data rate
CDs played at their original 44.1 x 16 datarate had a soundstage limited to the space between the speakers and only projected forward a couple feet in front of the speakers' physical plane. Voice and instrument sounds were spatially flatter, more grainy sounding, and generally far less clear and less natural. On Brick House, Sara K.'s voice was thinner, weaker, flatter, coarser, bongo reverberation occupied less space, bongo strikes sounded less percussive and less natural, and the acoustic bass sounded far less informed. The general impression is of a far less detailed, less accurate, less natural, and more artificial rendition. Native rate CD really struggles to re-create the original event and ends up as a pale, cartoonish imitation. And this is with top equipment.
96 x 24 DVD
In comparison, the native 96 x 24 DVDs had a soundstage that extended well in front of and behind the plane of the speakers and a width far beyond the edges of the speakers to the room boundaries and perhaps past them. I have been listening at 96 x 24 for more than a year and to me the differences were very large, very obvious, and immediately apparent. Instrument sounds were far more natural, full bodied, and realistic. The spaces around the instruments were not isolated reverberation islands; instead the instruments on these minimally-miked, acoustically recorded discs shared a whole, uniform, common space. Since a real room was used as the recording venue rather than a sound-treated studio with electronic reverbs, this difference is perhaps to be expected. Bass and voice sounded smooth, subtle and detailed but natural. The stage stomp was solid, strong and clear, shaking the listening room. The music generally flowed much better, had a more consistent and vastly larger sense of space, more natural and realistic instrument sounds, and an easier, less strained presentation.
CD converted to 96 x 24 with D2D-1
Setting the D2D-1 to upsample and interpolate the 44.1 x 16 CDs to 96 x 24 brought some of the same advantages as native 96 x 24 DVD. In particular, the instrument harmonics were cleaned up and the spaces were much clearer. Bass recovered quite a bit of the naturalness of the native 96 x 24 DVD. Sara K.'s voice also regained some distinctive breathiness and subtle shadings, though not to the full extent of the DVD. In general, naturalness was greatly improved. It was apparent that there was not the ultimate resolution of the 96 x 24 native recordings, but the improvement from D2D-1 interpolation and upsampling was clearly in the same direction and of a larger than expected magnitude. I would say the D2D-1 made upsampled CD sound more like the 96 x 24 DVD than like the native 44.1 CD. Many have commented that this seems to indicate that reconstruction/anti-aliasing filter artifacts at 44.1 x 16 must be quite significant, since moving the artifacts further away from the musical signal via the higher sample rate makes such a large difference. No real information can be added by upsampling/interpolation, so the most likely cause of this effect is this better filter behavior. At higher sampling rates the filter is less able to affect the music. It does less harm. The inherently lower resolution 44.1 x 16 recordings ultimately have less resolution than the native 96 x 24s, but it's nothing like the factor of about 3 suggested by the raw datarate difference. In other words, 96 x 24 native sounds better, but not 3 times better, than upsampled 44.1 x 16 CD.
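
The numbers behind this paragraph can be worked out in a few lines of Python. This is just arithmetic on the nominal formats; the 20 kHz audio band edge used for the filter transition width is an assumed round figure, not something specified by either unit.

```python
from fractions import Fraction

CD_RATE_HZ, CD_BITS = 44_100, 16
HI_RATE_HZ, HI_BITS = 96_000, 24
AUDIO_BAND_EDGE_HZ = 20_000  # assumed upper edge of the audible band

# Ratio the sample rate converter has to realize when going from 44.1 to 96 kHz.
resample_ratio = Fraction(HI_RATE_HZ, CD_RATE_HZ)

# Raw per-channel datarate ratio between the two formats.
datarate_ratio = Fraction(HI_RATE_HZ * HI_BITS, CD_RATE_HZ * CD_BITS)

# Room available for the filter between the audio band and the Nyquist frequency.
cd_transition_hz = CD_RATE_HZ // 2 - AUDIO_BAND_EDGE_HZ
hi_transition_hz = HI_RATE_HZ // 2 - AUDIO_BAND_EDGE_HZ

print("resample ratio:", resample_ratio)
print("datarate ratio:", round(float(datarate_ratio), 3))
print("transition band at 44.1 kHz:", cd_transition_hz, "Hz")
print("transition band at 96 kHz:", hi_transition_hz, "Hz")
```

The conversion ratio is the awkward fraction 320/147, and the raw datarate ratio is 160/49, or a little over 3. The filter transition band grows from about 2 kHz to 28 kHz under the stated assumption, which is consistent with the idea that a gentler filter, rather than any new information, accounts for the improvement.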

Comparing the P-1A to D2D-1, the effect of the P-1A's DSP is to change the sound relative to the 96 x 24 native DVD, whereas the D2D-1's simpler upsampling and interpolation makes CDs sound closer to the higher resolution DVD masters. One could easily conclude that the D2D-1 (really the Crystal 8420 SRC) is therefore more faithful to the original signal. One should also consider that the 8420 in the P-1A can do the same basic (and arguably more accurate) interpolation and upsampling as the D2D-1's 8420, but the P-1A can also add the DSP resolution enhancement effects. Those effects, while apparently artificial and not faithful to the higher resolution DVD, can be pleasing and may be particularly enjoyable on close-miked pop recordings, which certainly comprise a majority of recordings made today. On minimally miked jazz, classical or acoustic recordings the D2D-1 is probably preferable; however, many or even most jazz and classical recordings are (unfortunately) multi-miked.

Conclusions

Perhaps it comes down to the oft-pondered philosophical dilemma about whether sound reproduction should endeavor to faithfully reconstruct the original event or whether it succeeds to the extent that it makes the listener happy. Is objective accuracy the goal, or is the goal simply to enjoy the music? Those who take high-fidelity at its name may argue the former. Others may prefer the latter for the perfectly valid reason that they listen for pleasure. For me the goal of hi-fi, science, and life in general is the pursuit of truth, and I find that accuracy and fidelity can be beautiful, and not just because they're intellectually honest. I prefer to get closer to the original event even if other ways may sound sweeter. That way when the occasional truly excellent recording comes along, I'm better equipped to appreciate its true beauty more fully and more accurately.

I'm fully convinced that all new recordings should be made at higher resolutions, with 20 bits at minimum. The practical problem now is in getting these into standard formats and out to buyers. I'd love to be able to do this over the Internet but the standards and last-mile bandwidth aren't here yet. So SACD, DVD-Video, and DVD-Audio are all possibilities. Watermarking must go, and recording practices must rise to the challenge of the new, much higher resolution media. DVD-Audio is by far the most flexible and in my opinion has the best chance for commercial success, especially if DVD-Universal players become common. Sound quality on good DVD-Audio should be on par with good 96 x 24 PCM DVD-Video. (The DVD FAQ has a good overview of the capabilities of the DVD family.) Until then, upsamplers like the D2D-1 and P-1A combined with good external DACs like the DAC-3.1 are a great way to hear existing CDs better than most of us have ever heard them and a lot closer to their true potential.

Additional Comments From Another Listener

First of all I'll say that for the comparisons done above I generally agree with Jeff. But the music we can get in both DVD and CD formats is limited. I have subsequently done a lot more listening and have some further observations on a variety of musical types.

We know any reconstruction of data is going to be imperfect, but the main issue in my mind is whether or not the P-1A algorithm is more enjoyable to listen to than regular 44.1 CDs or up-sampled CDs (via the Crystal 8420 only). Knowing something about DSP technology, I can imagine any algorithm design will probably work well on some types of material and maybe only marginally on other material. I fundamentally believe that there is no one algorithm that can improve every piece of source material. When I say "improve" I use a broad definition in which an improvement could be either making something more enjoyable or making it closer to the original.

The following observations are based on selections from my CD collection. I have tried to detail the characteristics as I have observed them. I have not singled out pieces that are particularly bad.

Simply Baroque - Yo-Yo Ma playing Bach cello concertos: P1 makes cello string and bow friction sound pronounced. Resonance of the wood body is reduced. Resonance is still there, but with less bloom/boom in the bass region. The sound seems smoother but simpler, as if there is less non-harmonically related content. An idealized presentation of the instrument.
Diabolical Streak - Jill Tracy: The first track is heavily processed with lots of echo, phase distortion and intentionally added noise (intending to have the music sound like it was being received on an old AM tuner). The P1 changes the echo/phase sound and reduces the crackly noise.
Red - Laurie Anderson: Bass boom is de-emphasized, making bass attack seem stronger, but overall power of bass is less. Cymbals are nice and crisp but they don't sound like brass; could be any metal. On the CD there are cues that let you know it's brass.
Bone Machine - Tom Waits: First track has percussive instruments making click-clack racket. This sounds fairly realistic and clean direct from CD. With the P1 this cacophony is softened, put in the background and not live sounding.
The Mermen (San Francisco surf band), recorded live while I was in the room (recording was done direct to digital VHS, then copied to CD-R with no remixing): The P1 again reduced the bass boom to levels which are way below what I experienced in the room. The CD is more realistic.

In conclusion I believe that the current P1 algorithm does some things to music that people may like to hear, but I fail to see that it is a more realistic presentation. It does not put you in the studio and remove layers of recording and playback distortion. It provides dynamic expansion, noise reduction and harmonic enhancement. (Sound like any other audio processes we know from the past?) On some music this generates a very pleasant presentation.

The P1 clearly messes up some features in recordings and will not pass HDCD when set in "bypass". The enhancement it does on some pieces is not beneficial enough to justify putting it in the processing chain and adding the additional step of testing each piece of music with and without the P1 before I listen.

Mike Hamilton

P-1A DSP Designer Reveals Some About P-1A Processing

On 9/4/2001, using the pseudonym misterdsp on the AudioAsylum.com message boards, the DSP designer of the P-1A responded with some answers to our collective questions. He clarifies that the 8420 in the P-1A is used for upsampling only, while the DSP does the interpolation and other trickery. misterdsp confirmed our hunches that the DSP in the P-1A is doing some codec- and companding-type signal processing:

"These are computationally intensive algorithms that first process the audio signal to look for certain characteristics, using some of the same kinds of preprocessing used in modern advanced codecs (e.g. multiresolution wavelet filtering). Here the goal is to use the signal characteristics as an aid to steer the modification of the upsampled data. In a practical sense, we modify the upsampler from a fixed, static process and make it dynamic. Accordingly, we modify the upsampling processing to best match the audio data at a specific instant."

He mentions that the pre-processing is used to select different types of dither and different interpolation strategies but does not go into any detail about the specifics. We also heard what sounds like single-ended dynamic range expansion and perhaps some single-ended noise reduction and his comments about using companding strategies seem to at least partially confirm some of this:

"Advanced codec theory has many decades of work in the identification area, usually applied to companding, and we take advantage of that research and apply the ideas to a different category of algorithm."

He doesn't attempt to justify whether these algorithms, which are more commonly used on 2-5 kHz bandwidth telephone and military radio circuits, are appropriate for high-fidelity audio. Yes, they can make some recordings sound better, but if the goal of hi-fi is to most accurately reproduce the original event, then they move us away from that goal. A visual analogy would be using MPEG edge enhancement to put dark black eyebrows and hair on the Mona Lisa. It may make the Mona Lisa look sharper on a DVD, but it does not bring us closer to Leonardo's vision of it. Instead it moves us away from the original event.

Since we compared Chesky CDs processed by the P-1A against full-resolution 96 x 24 PCM Chesky DVDs of the same recording and found that the P-1A was losing some information and adding other information not in the higher resolution version, we concluded that the P-1A is not accurately reproducing the original signal. While its changes are sometimes pleasing, sometimes they're not, and in either case it appears to be moving us away from the original. However clever and impressive the P-1A's processing may be, I cannot call that hi-fi.