Parallel coordinates



Figure 1: Constructing Parallel Coordinates

Parallel Coordinates is a visualization technique for multidimensional geometry with many applications for visual data analysis. A parallel-coordinates system is constructed by embedding a set of axes in parallel into a Cartesian coordinate system. Multidimensional points are represented using lines intersecting the parallel axes at the respective coordinates. Drawing a straight line for every point and every pair of axes yields the dashed lines in the figure. In the usual case, only the part between adjacent axes is actually rendered, resulting in a polyline for every multidimensional point (represented by the thick black line in the figure).
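The construction above can be sketched in a few lines of Python. This is only an illustration under assumptions not stated in the text: the axes are placed at x = 0, 1, 2, ... and every axis is scaled to the range [0, 1]. The function name `polyline_vertices` and its parameters are hypothetical.

```python
def polyline_vertices(point, lows, highs):
    """Return one (x, y) vertex per axis for a single multidimensional point.

    Assumption: axis i is drawn as a vertical line at x = i, and each value
    is normalized to [0, 1] using that axis's (low, high) range.
    """
    vertices = []
    for axis_index, (value, low, high) in enumerate(zip(point, lows, highs)):
        span = high - low
        # A constant dimension has no extent; place it in the middle of the axis.
        y = (value - low) / span if span else 0.5
        vertices.append((float(axis_index), y))
    return vertices


# Connecting consecutive vertices gives the polyline for this point.
example = polyline_vertices([5, 0, 10], lows=[0, 0, 0], highs=[10, 10, 10])
```

Only the segments between consecutive vertices are drawn, which corresponds to rendering just the part between adjacent axes.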

The Parallel Coordinates Matrix

Parallel Coordinates Matrix showing six dimensions of the Cars dataset.

The order of the axes strongly affects which patterns appear in a parallel-coordinates plot. Similar to the scatterplot matrix (SPLOM), a parallel-coordinates plot visualizes a set of two-dimensional relations, but only one for each pair of adjacent axes. A common approach to help the user determine a "good" order of dimensions is to score the pairwise plots, e.g. using a correlation measure.
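As a rough sketch of such scoring, the following Python code rates an axis order by the sum of absolute Pearson correlations between adjacent axes and picks the best order by brute force. The choice of Pearson correlation, the summed score, and the names `pearson` and `best_axis_order` are assumptions made for illustration; the exhaustive search is only practical for a handful of dimensions.

```python
from itertools import permutations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y) if var_x and var_y else 0.0


def best_axis_order(columns):
    """Return the axis order that maximizes the summed |correlation| of adjacent axes.

    `columns` maps a dimension name to its list of values.
    """
    def score(order):
        return sum(abs(pearson(columns[a], columns[b]))
                   for a, b in zip(order, order[1:]))

    return max(permutations(columns), key=score)


columns = {"a": [1, 2, 3, 4], "c": [1, -1, 1, -1], "b": [2, 4, 6, 8]}
order = best_axis_order(columns)  # "a" and "b" end up on adjacent axes
```

Other measures could be substituted for `pearson` without changing the surrounding logic.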

The parallel-coordinates matrix (PCM) uses the same approach as the scatterplot matrix: show all pairwise relations. In contrast to the SPLOM, the PCM does not use a matrix layout but effectively displays a list of parallel-coordinates plots with different axis orderings. Each single plot shows all dimensions of the data, and all plots together show all pairwise relations. For more information, see the publication below or download the accompanying video here.
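One way to obtain a small set of axis orderings in which every pair of dimensions is adjacent at least once is the classical zigzag construction of Hamiltonian paths in a complete graph. The sketch below uses that construction as an assumption; it is not confirmed here to be the exact procedure of the publication, and the function name `pcm_axis_orders` is hypothetical. For an odd number of dimensions it pads with one dummy axis and drops it again afterwards.

```python
def pcm_axis_orders(n_dims):
    """Return axis orderings so that every pair of axes is adjacent in at least one.

    Zigzag pattern for an even number n of axes: start, start+1, start-1,
    start+2, start-2, ... (mod n), for start = 0 .. n/2 - 1.
    """
    n = n_dims + (n_dims % 2)  # pad to an even count with one dummy axis
    orders = []
    for start in range(n // 2):
        order = [start]
        for step in range(1, n // 2 + 1):
            order.append((start + step) % n)
            if step < n // 2:
                order.append((start - step) % n)
        # Drop the dummy axis (index n_dims) if one was added.
        orders.append([axis for axis in order if axis < n_dims])
    return orders


orders = pcm_axis_orders(6)  # three orderings for six dimensions
```

With an even number n of dimensions this yields n/2 plots, so six dimensions need three parallel-coordinates plots to cover all fifteen pairs.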

Continuous Parallel Coordinates

Figure 1: discrete parallel coordinates, random data
Figure 2: continuous parallel coordinates, random data
Figure 3: discrete parallel coordinates, tooth data
Figure 4: continuous parallel coordinates, tooth data

Continuous Parallel Coordinates visualize the density of lines instead of rendering "discrete" lines. This is particularly useful for large datasets where heavy overplotting might occur with the traditional approach. Ed Wegman provided such a line-density model for normally distributed data as early as 1991. In our work, we use a different density model that is applicable to any bivariate density distribution. Continuous Parallel Coordinates transform densities from a continuous 2D scatterplot to 2D parallel coordinates (i.e. with 2 axes) by exploiting the point-line duality between Cartesian and parallel coordinates.

Figure 1 shows an artificial dataset that was created on a 3-D grid with spatial resolution 10 x 10 x 10. For each voxel, two values were randomly drawn from a Gaussian distribution. Figure 2 shows the corresponding continuous version.

Figures 3 and 4 show a dataset that comprises a CT scan of a tooth with resolution 128 x 128 x 160. In addition to the value contained in the dataset, the gradient was computed at each voxel position. In this example, sampling artifacts may lead to misinterpretation of the discrete parallel-coordinates plot. First, there seems to be a dense (white) region on the first dimension, which does not exist in continuous parallel coordinates. Second, a large number of values equal to one in the first dimension with a zero-length gradient are mapped to the same line, resulting in a visually striking peak in the density distribution. Again, this peak does not occur in continuous parallel coordinates. Finally, samples in "empty" regions, which may be classified as background noise, are naturally removed by continuous parallel coordinates.
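The point-line duality mentioned above can be illustrated with a small Python sketch. It covers only the geometric duality, not the density model itself, and it assumes two axes placed at horizontal positions 0 and 1; the function names are hypothetical. A Cartesian point (a, b) becomes the line through (0, a) and (1, b), and all points on a Cartesian line y = slope * x + intercept (with slope not equal to 1) become lines that pass through one common point.

```python
def pc_line_height(a, b, t):
    """Height at horizontal position t of the line representing Cartesian point (a, b).

    Assumption: the first axis is at t = 0 and the second axis at t = 1.
    """
    return (1 - t) * a + t * b


def dual_point(slope, intercept):
    """Point shared by the parallel-coordinates lines of all points on y = slope*x + intercept."""
    if slope == 1:
        # All segments are parallel, so the common point lies at infinity.
        raise ValueError("lines with slope 1 have no finite dual point")
    t = 1 / (1 - slope)
    return t, intercept * t


# Points on y = -x + 2 all cross at the same location between the axes.
t, height = dual_point(-1, 2)
heights = [pc_line_height(a, -a + 2, t) for a in (0.0, 1.0, 3.0)]
```

Negative slopes place the common point between the two axes, which is why negatively correlated data shows the familiar crossing pattern.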

Bundled Parallel Coordinates

The cars data displayed using traditional parallel coordinates. The axes denote the following attributes from left to right: consumption, displacement, horsepower, weight, acceleration, year.
The cars data bundled by number of engine cylinders (4, 6, or 8).

A common way to visualize categorical data in parallel coordinates is to color each line by the class or category it belongs to. In many cases, such classes represent clusters of "similar" data points. Another technique, which relies on geometry only, is bundling. Here, data samples are rendered as curves instead of straight lines. All curves belonging to the same class or cluster are then "bundled" together midway between adjacent axes. Bundles are easier to trace visually across all dimensions, while the curves retain the original coordinates at the axes. The color channel is now free to be used for other mappings.
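A minimal sketch of the bundling idea for one pair of adjacent axes follows. It assumes a simple parabolic pull toward a per-cluster midpoint, which is not necessarily the curve model used in the original work; `bundle_midpoint`, `bundled_curve`, and the `strength` parameter are hypothetical names.

```python
def bundle_midpoint(segments):
    """Mean midpoint height of a cluster's segments, given as (y_left, y_right) pairs."""
    return sum((left + right) / 2 for left, right in segments) / len(segments)


def bundled_curve(y_left, y_right, bundle_mid, strength=1.0, samples=5):
    """Sample a curve between two adjacent axes that is pulled toward bundle_mid.

    The pull weight 4*t*(1-t) is 0 at both axes (coordinates are retained)
    and 1 midway, where strength=1.0 makes all curves of a cluster meet.
    """
    line_mid = (y_left + y_right) / 2
    points = []
    for i in range(samples):
        t = i / (samples - 1)
        straight = (1 - t) * y_left + t * y_right
        pull = 4 * t * (1 - t) * strength * (bundle_mid - line_mid)
        points.append((t, straight + pull))
    return points


cluster = [(0.0, 0.4), (0.2, 0.6)]
mid = bundle_midpoint(cluster)
curves = [bundled_curve(left, right, mid) for left, right in cluster]
```

Setting `strength` below 1.0 loosens the bundle, and a value of 0.0 falls back to the straight segments of traditional parallel coordinates.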