You can use the order() function directly without resorting to add-on tools — see this simpler answer which uses a trick right from the top of the example(order) code:
R> dd[with(dd, order(-z, b)), ]
    b x y z
4 Low C 9 2
2 Med D 3 1
1  Hi A 8 1
3  Hi A 9 1
Edit some 2+ years later: It was just asked how to do this by column index. The answer is to simply pass the desired sorting column(s) to the order() function:
R> dd[order(-dd[,4], dd[,1]), ]
    b x y z
4 Low C 9 2
2 Med D 3 1
1  Hi A 8 1
3  Hi A 9 1
R>
rather than using the name of the column (and with() for easier/more direct access).
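As a hedged aside (not part of the original answer): in reasonably recent base R (3.3.0 or later), order() also accepts a vector for the decreasing argument when method = "radix" is used, which avoids the -column trick and works for non-numeric columns too:
# Assumes R >= 3.3.0; decreasing may be a vector only with the radix method
dd[order(dd[, 4], dd[, 1], decreasing = c(TRUE, FALSE), method = "radix"), ]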
Your choices
order from base
arrange from dplyr
setorder and setorderv from data.table
arrange from plyr
sort from taRifx
orderBy from doBy
sortData from Deducer
Most of the time you should use the dplyr or data.table solutions, unless avoiding dependencies is important, in which case use base::order.
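For completeness, here is a minimal sketch of those two recommended approaches (assuming the dd data frame defined below and the current dplyr / data.table APIs; these calls are not part of the benchmarks further down):
library(dplyr)
arrange(dd, desc(z), b)        # dplyr: descending z, then ascending b

library(data.table)
dt <- as.data.table(dd)
setorder(dt, -z, b)            # data.table: reorders dt by reference
dt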
I recently added sort.data.frame to a CRAN package, making it class compatible as discussed here:
Best way to create generic/method consistency for sort.data.frame?
Therefore, given the data.frame dd, you can sort as follows:
dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
                            levels = c("Low", "Med", "Hi"), ordered = TRUE),
                 x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
                 z = c(1, 1, 1, 2))
library(taRifx)
sort(dd, f= ~ -z + b )
If you are one of the original authors of this function, please contact me. Discussion of its public-domain status is here: https://chat.stackoverflow.com/transcript/message/1094290#1094290
You can also use the arrange() function from plyr as Hadley pointed out in the above thread:
library(plyr)
arrange(dd,desc(z),b)
Benchmarks: Note that I loaded each package in a new R session since there were a lot of conflicts. In particular loading the doBy package causes sort to return "The following object(s) are masked from 'x (position 17)': b, x, y, z", and loading the Deducer package overwrites sort.data.frame from Kevin Wright or the taRifx package.
# Load each time
dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
                            levels = c("Low", "Med", "Hi"), ordered = TRUE),
                 x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
                 z = c(1, 1, 1, 2))
library(microbenchmark)

# Reload R between benchmarks
microbenchmark(dd[with(dd, order(-z, b)), ],
               dd[order(-dd$z, dd$b), ],
               times = 1000
)
Median times:
dd[with(dd, order(-z, b)), ] 778
dd[order(-dd$z, dd$b),] 788
library(taRifx)
microbenchmark(sort(dd, f= ~-z+b ),times=1000)
Median time: 1,567
library(plyr)
microbenchmark(arrange(dd,desc(z),b),times=1000)
Median time: 862
library(doBy)
microbenchmark(orderBy(~-z+b, data=dd),times=1000)
Median time: 1,694
Note that the doBy package itself takes a good while to load.
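If you want to quantify that on your own machine, one illustrative way (not part of the benchmarks above) is:
system.time(library(doBy))   # run in a fresh session to measure package load time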
library(Deducer)
microbenchmark(sortData(dd,c("z","b"),increasing= c(FALSE,TRUE)),times=1000)
I couldn't get Deducer to load; it requires the JGR console.
esort <- function(x, sortvar, ...) {
  attach(x)
  on.exit(detach(x))  # ensure the attached frame is detached when the function exits
  x <- x[with(x, order(sortvar, ...)), ]
  return(x)
}
microbenchmark(esort(dd, -z, b),times=1000)
Doesn't appear to be compatible with microbenchmark due to the attach/detach.
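As a sketch of how a similar interface could be benchmarked without attach()/detach(), the sort expressions can be evaluated directly inside the data frame (esort2 is a hypothetical name, not from the original answer):
# Hypothetical attach-free variant: evaluates the ... expressions within x
esort2 <- function(x, ...) {
  ord <- eval(substitute(order(...)), envir = x, enclos = parent.frame())
  x[ord, ]
}
microbenchmark(esort2(dd, -z, b), times = 1000)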
m <- microbenchmark(
  arrange(dd, desc(z), b),
  sort(dd, f = ~ -z + b),
  dd[with(dd, order(-z, b)), ],
  dd[order(-dd$z, dd$b), ],
  times = 1000
)
library(ggplot2)
uq <- function(x) { fivenum(x)[4] }  # upper quartile
lq <- function(x) { fivenum(x)[2] }  # lower quartile
y_min <- 0 # min(by(m$time, m$expr, lq))
y_max <- max(by(m$time, m$expr, uq)) * 1.05
p <- ggplot(m, aes(x = expr, y = time)) + coord_cartesian(ylim = c(y_min, y_max))
p + stat_summary(fun.y = median, fun.ymin = lq, fun.ymax = uq, aes(fill = expr))
(lines extend from lower quartile to upper quartile, dot is the median)
Given these results and weighing simplicity against speed, I'd give the nod to arrange in the plyr package. It has a simple syntax and yet is almost as fast as the base R commands, without their convoluted machinations. Typically brilliant Hadley Wickham work. My only gripe with it is that it breaks the standard R convention where an object is sorted by calling sort(object), but I understand why Hadley did it that way given the issues discussed in the question linked above.