Skip to content

Modify & Aggregate

Modify

(dataframe-modify df new-names names procedure ...)

returns: a dataframe with new or replaced columns. Each procedure takes the columns in the corresponding names list as arguments. If a name in new-names already exists in df, that column is replaced; otherwise a new column is added. Expressions in later procedures can reference columns created by earlier ones in the same call.

If names is empty (), procedure must return either a scalar (repeated to fill the column) or a list of length equal to the number of rows.

(dataframe-modify* df (new-name names expr) ...)

returns: the same as dataframe-modify using a more concise implicit-lambda syntax. Each clause is (new-name (col ...) expr).

> (define df
    (make-df*
      (grp "a" "a" "b" "b" "b")
      (trt 'a 'b 'a 'b 'b)
      (adult 1 2 3 4 5)
      (juv 10 20 30 40 50)))

> (dataframe-display
    (dataframe-modify*
      df
      (grp (grp) (string-upcase grp))
      (total (adult juv) (+ adult juv))
      (prop-juv (juv total) (/ juv total))
      (scalar () 42)
      (lst () '(2 4 6 8 10))))

 dim: 5 rows x 8 cols
     grp     trt   adult     juv   total  prop-juv  scalar     lst
   <str>   <sym>   <num>   <num>   <num>     <num>   <num>   <num>
       A       a      1.     10.     11.     10/11     42.      2.
       A       b      2.     20.     22.     10/11     42.      4.
       B       a      3.     30.     33.     10/11     42.      6.
       B       b      4.     40.     44.     10/11     42.      8.
       B       b      5.     50.     55.     10/11     42.     10.

(dataframe-modify-at df procedure name ...)

returns: a dataframe with the specified columns transformed by procedure, which must accept exactly one argument.

(dataframe-modify-all df procedure)

returns: a dataframe with every column transformed by procedure.

> (dataframe-display (dataframe-modify-at df symbol->string 'grp 'trt))

 dim: 5 rows x 4 cols
     grp     trt   adult     juv
   <str>   <str>   <num>   <num>
       a       a      1.     10.
       a       b      2.     20.
       b       a      3.     30.
       b       b      4.     40.
       b       b      5.     50.

> (dataframe-display
    (dataframe-modify-all (make-df* (a 1 2 3) (b 4 5 6) (c 7 8 9))
                          (lambda (x) (* x 100))))

 dim: 3 rows x 3 cols
       a       b       c
   <num>   <num>   <num>
    100.    400.    700.
    200.    500.    800.
    300.    600.    900.

Aggregate

(dataframe-aggregate df group-names new-names names procedure ...)

returns: a dataframe where df is split by group-names and each group is summarised by applying each procedure to the corresponding names columns.

(dataframe-aggregate* df group-names (new-name names expr) ...)

returns: the same using implicit-lambda syntax. Each clause is (new-name (col ...) expr).

> (define df
    (make-df*
      (grp 'a 'a 'b 'b 'b)
      (trt 'a 'b 'a 'b 'b)
      (adult 1 2 3 4 5)
      (juv 10 20 30 40 50)))

> (dataframe-display
    (dataframe-aggregate*
      df
      (grp)
      (adult-sum (adult) (sum adult))
      (juv-sum (juv) (sum juv))))

 dim: 2 rows x 3 cols
     grp  adult-sum  juv-sum
   <sym>      <num>    <num>
       a         3.      30.
       b        12.     120.

;; multiple grouping columns
> (dataframe-display
    (dataframe-aggregate*
      df
      (grp trt)
      (adult-sum (adult) (sum adult))
      (juv-sum (juv) (sum juv))))

 dim: 4 rows x 4 cols
     grp     trt  adult-sum  juv-sum
   <sym>   <sym>      <num>    <num>
       a       a         1.      10.
       a       b         2.      20.
       b       a         3.      30.
       b       b         9.      90.