Config Router

  • Google Sheets
  • CCNA Online training
    • CCNA
  • CISCO Lab Guides
    • CCNA Security Lab Manual With Solutions
    • CCNP Route Lab Manual with Solutions
    • CCNP Switch Lab Manual with Solutions
  • Juniper
  • Linux
  • DevOps Tutorials
  • Python Array
You are here: Home / How to sort a dataframe by multiple column(s)

How to sort a dataframe by multiple column(s)

August 23, 2021 by James Palmer

You can use the order() function directly without resorting to add-on tools — see this simpler answer which uses a trick right from the top of the example(order) code:
R> dd[with(dd, order(-z, b)), ]
b x y z
4 Low C 9 2
2 Med D 3 1
1 Hi A 8 1
3 Hi A 9 1

Edit some 2+ years later: It was just asked how to do this by column index. The answer is to simply pass the desired sorting column(s) to the order() function:
R> dd[order(-dd[,4], dd[,1]), ]
b x y z
4 Low C 9 2
2 Med D 3 1
1 Hi A 8 1
3 Hi A 9 1
R>

rather than using the name of the column (and with() for easier/more direct access).

Your choices

order from base
arrange from dplyr
setorder and setorderv from data.table
arrange from plyr
sort from taRifx
orderBy from doBy
sortData from Deducer

Most of the time you should use the dplyr or data.table solutions, unless having no-dependencies is important, in which case use base::order.

I recently added sort.data.frame to a CRAN package, making it class compatible as discussed here:
Best way to create generic/method consistency for sort.data.frame?
Therefore, given the data.frame dd, you can sort as follows:
dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"), levels = c("Low", "Med", "Hi"), ordered = TRUE), x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9), z = c(1, 1, 1, 2)) library(taRifx) sort(dd, f= ~ -z + b ) If you are one of the original authors of this function, please contact me. Discussion as to public domaininess is here: https://chat.stackoverflow.com/transcript/message/1094290#1094290 You can also use the arrange() function from plyr as Hadley pointed out in the above thread: library(plyr) arrange(dd,desc(z),b) Benchmarks: Note that I loaded each package in a new R session since there were a lot of conflicts. In particular loading the doBy package causes sort to return "The following object(s) are masked from 'x (position 17)': b, x, y, z", and loading the Deducer package overwrites sort.data.frame from Kevin Wright or the taRifx package. #Load each time dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"), levels = c("Low", "Med", "Hi"), ordered = TRUE), x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9), z = c(1, 1, 1, 2)) library(microbenchmark) # Reload R between benchmarks microbenchmark(dd[with(dd, order(-z, b)), ] , dd[order(-dd$z, dd$b),], times=1000 ) Median times: dd[with(dd, order(-z, b)), ] 778 dd[order(-dd$z, dd$b),] 788 library(taRifx) microbenchmark(sort(dd, f= ~-z+b ),times=1000) Median time: 1,567 library(plyr) microbenchmark(arrange(dd,desc(z),b),times=1000) Median time: 862 library(doBy) microbenchmark(orderBy(~-z+b, data=dd),times=1000) Median time: 1,694 Note that doBy takes a good bit of time to load the package. library(Deducer) microbenchmark(sortData(dd,c("z","b"),increasing= c(FALSE,TRUE)),times=1000) Couldn't make Deducer load. Needs JGR console. esort <- function(x, sortvar, ...) { attach(x) x <- x[with(x,order(sortvar,...)),] return(x) detach(x) } microbenchmark(esort(dd, -z, b),times=1000) Doesn't appear to be compatible with microbenchmark due to the attach/detach. m <- microbenchmark( arrange(dd,desc(z),b), sort(dd, f= ~-z+b ), dd[with(dd, order(-z, b)), ] , dd[order(-dd$z, dd$b),], times=1000 ) uq <- function(x) { fivenum(x)[4]} lq <- function(x) { fivenum(x)[2]} y_min <- 0 # min(by(m$time,m$expr,lq)) y_max <- max(by(m$time,m$expr,uq)) * 1.05 p <- ggplot(m,aes(x=expr,y=time)) + coord_cartesian(ylim = c( y_min , y_max )) p + stat_summary(fun.y=median,fun.ymin = lq, fun.ymax = uq, aes(fill=expr)) (lines extend from lower quartile to upper quartile, dot is the median) Given these results and weighing simplicity vs. speed, I'd have to give the nod to arrange in the plyr package. It has a simple syntax and yet is almost as speedy as the base R commands with their convoluted machinations. Typically brilliant Hadley Wickham work. My only gripe with it is that it breaks the standard R nomenclature where sorting objects get called by sort(object), but I understand why Hadley did it that way due to issues discussed in the question linked above.

Related

Filed Under: Uncategorized

Recent Posts

  • How do I give user access to Jenkins?
  • What is docker volume command?
  • What is the date format in Unix?
  • What is the difference between ARG and ENV Docker?
  • What is rsync command Linux?
  • How to Add Music to Snapchat 2021 Android? | How to Search, Add, Share Songs on Snapchat Story?
  • How to Enable Snapchat Notifications for Android & iPhone? | Steps to Turn on Snapchat Bitmoji Notification
  • Easy Methods to Fix Snapchat Camera Not Working Black Screen Issue | Reasons & Troubleshooting Tips to Solve Snapchat Camera Problems
  • Detailed Procedure for How to Update Snapchat on iOS 14 for Free
  • What is Snapchat Spotlight Feature? How to Make a Spotlight on Snapchat?
  • Snapchat Hack Tutorial 2021: Can I hack a Snapchat Account without them knowing?

Copyright © 2025 · News Pro Theme on Genesis Framework · WordPress · Log in