ggparcoord {GGally} | R Documentation |

`ggplot2`

graphics package.`scale`

is a character string that denotes how to
scale the variables in the parallel coordinate plot.
Options:

`std`

: univariately, subtract mean and divide by standard deviation`robust`

: univariately, subtract median and divide by median absolute deviation`uniminmax`

: univariately, scale so the minimum of the variable is zero, and the maximum is one`globalminmax`

: no scaling is done; the range of the graphs is defined by the global minimum and the global maximum`center`

: use`uniminmax`

to standardize vertical height, then center each variable at a value specified by the`scaleSummary`

param`centerObs`

: use`uniminmax`

to standardize vertical height, then center each variable at the value of the observation specified by the`centerObsID`

param

ggparcoord(data, columns, groupColumn = NULL, scale = "std", scaleSummary = "mean", centerObsID = 1, missing = "exclude", order = columns, showPoints = FALSE, alphaLines = 1, boxplot = FALSE, shadeBox = NULL, mapping = NULL, title = "")

`data` |
the dataset to plot |

`columns` |
a vector of variables (either names or indices) to be axes in the plot |

`groupColumn` |
a single variable to group (color) by |

`scale` |
method used to scale the variables (see Details) |

`scaleSummary` |
if scale=="center", summary statistic to univariately center each variable by |

`centerObsID` |
if scale=="centerObs", row number of case plot should univariately be centered on |

`missing` |
method used to handle missing values (see Details) |

`order` |
method used to order the axes (see Details) |

`showPoints` |
logical operator indicating whether points should be plotted or not |

`alphaLines` |
value of alpha scaler for the lines of the parcoord plot |

`boxplot` |
logical operator indicating whether or not boxplots should underlay the distribution of each variable |

`shadeBox` |
color of underlaying box which extends from the min to the max for each variable (no box is plotted if shadeBox == NULL) |

`mapping` |
aes string to pass to ggplot object |

`title` |
character string denoting the title of the plot |

`missing`

is a character string that denotes how to
handle missing missing values. Options:

`exclude`

: remove all cases with missing values`mean`

: set missing values to the mean of the variable`median`

: set missing values to the median of the variable`min10`

: set missing values to 10% below the minimum of the variable`random`

: set missing values to value of randomly chosen observation on that variable

`order`

is either a vector of indices or a character
string that denotes how to order the axes (variables) of
the parallel coordinate plot. Options:

`(default)`

: order by the vector denoted by`columns`

`(given vector)`

: order by the vector specified`anyClass`

: order variables by their separation between any one class and the rest (as opposed to their overall variation between classes). This is accomplished by calculating the F-statistic for each class vs. the rest, for each axis variable. The axis variables are then ordered (decreasing) by their maximum of k F-statistics, where k is the number of classes.`allClass`

: order variables by their overall F statistic (decreasing) from an ANOVA with`groupColumn`

as the explanatory variable (note: it is required to specify a`groupColumn`

with this ordering method). Basically, this method orders the variables by their variation between classes (most to least).`skewness`

: order variables by their sample skewness (most skewed to least skewed)`Outlying`

: order by the scagnostic measure, Outlying, as calculated by the package`scagnostics`

. Other scagnostic measures available to order by are`Skewed, Clumpy, Sparse, Striated, Convex, Skinny, Stringy,`

and`Monotonic`

. Note: To use these methods of ordering, you must have the`scagnostics`

package loaded.

ggplot object that if called, will print

Jason Crowley crowley.jason.s@gmail.com, Barret Schloerke schloerke@gmail.com, Di Cook dicook@iastate.edu, Heike Hofmann hofmann@iastate.edu, Hadley Wickham h.wickham@gmail.com

# use sample of the diamonds data for illustrative purposes diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ] # basic parallel coordinate plot, using default settings ggparcoord(data = diamonds.samp, columns = c(1, 5:10)) # this time, color by diamond cut ggparcoord(data = diamonds.samp, columns = c(1, 5:10), groupColumn = 2) # underlay univariate boxplots, add title, use uniminmax scaling ggparcoord(data = diamonds.samp, columns = c(1, 5:10), groupColumn = 2, scale = "uniminmax", boxplot = TRUE, title = "Parallel Coord. Plot of Diamonds Data") # utilize ggplot2 aes to switch to thicker lines ggparcoord(data = diamonds.samp, columns = c(1, 5:10), groupColumn = 2, title = "Parallel Coord. Plot of Diamonds Data", mapping = aes(size = 1)) # basic parallel coord plot of the msleep data, using 'random' imputation # and coloring by diet (can also use variable names in the columns and # groupColumn arguments) ggparcoord(data = msleep, columns = 6:11, groupColumn = "vore", missing = "random", scale = "uniminmax") # center each variable by its median, using the default missing value # handler, 'exclude' ggparcoord(data = msleep, columns = 6:11, groupColumn = "vore", scale = "center", scaleSummary = "median") # with the iris data, order the axes by overall class (Species) separation # using the anyClass option ggparcoord(data = iris, columns = 1:4, groupColumn = 5, order = "anyClass") # add points to the plot, add a title, and use an alpha scalar to make the # lines transparent ggparcoord(data = iris, columns = 1:4, groupColumn = 5, order = "anyClass", showPoints = TRUE, title = "Parallel Coordinate Plot for the Iris Data", alphaLines = 0.3)

[Package *GGally* version 0.3.4 Index]