Pandas Extension
The Pandas extension provides the code generator that targets Pandas.
Examples
Create an instance of a PandasCodeGenerator:
from nldsl import PandasCodeGenerator
code_gen = PandasCodeGenerator()
Generate the code: x = y[y.col1 == m & y.col3.isin([v1, v2, v3])]:
model = "## x = on y | select rows y.col1 == m and y.col3 in [v1, v2, v3]"
result = code_gen(model)[0]
Generate the code: print(df.rename(columns={'old1': 'new1', 'old2': 'new2'})):
model = "## on df | rename columns old1 to new1, old2 to new2 | show"
result = code_gen(model)[0]
Define a new rule and use it to generate code:
model_def = "#$ my pipeline $[$old to $new] = rename columns $[$old to $new] | show"
model_eva = "## on df | my pipeline old1 to new1, old2 to new2"
result = code_gen(model_def + model_eva)[0]
Grammar Rules
nldsl.pandas_extension.expression_only(code, args, env=None)
Parses Python expressions outside of a pipeline.
Examples
x = 5
x = y % (5.7 + z / 2)
x = w and (y or not z)
Grammar
!expr
- Parameters
expr (expression) – The expression to be evaluated.
Type
Function
nldsl.pandas_extension.on_dataframe(code, args, env=None)
Starts a pipeline on the given DataFrame.
Examples
x = on df
x = on df | transformer 1 … | transformer n … | operation
Grammar
on $dataframe
- Parameters
dataframe (variable) – The name of the DataFrame.
Type
Initialization
nldsl.pandas_extension.create_dataframe(code, args, env=None)
Creates a new DataFrame from a list.
Examples
x = create dataframe from my_data with header 'col1', 'col2', 'col3'
Grammar
create dataframe from $data with header $header[$col_name]
- Parameters
data (variable) – The data from which to create the dataframe.
header (list) – A list of column names
Type
Initialization
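As a rough illustration of what this rule describes, the statement in the example corresponds to building a DataFrame from a list of rows with explicit column names. The sketch below uses plain pandas only; `my_data` is made-up sample data, and the code the generator actually emits may differ.

```python
import pandas as pd

# Hypothetical sample data: one inner list per row.
my_data = [[1, "a", 0.5], [2, "b", 1.5]]

# Plain-pandas counterpart of:
#   x = create dataframe from my_data with header 'col1', 'col2', 'col3'
x = pd.DataFrame(my_data, columns=["col1", "col2", "col3"])
```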
nldsl.pandas_extension.load_from(code, args, env=None)
Load a DataFrame from a file.
Examples
x = load from "my_file.json" as json
x = load from "my_file.csv" as csv | drop duplicates
Grammar
load from $path as $type
type := { json, csv }
- Parameters
path (variable) – A string containing the path to a file.
type (variable) – The type of the file.
Type
Initialization
nldsl.pandas_extension.save_to(code, args, env=None)
Save a DataFrame to a file.
Examples
on df | save to "my_file.json" as json
on df | save to "my_file.csv" as csv
Grammar
save to $path as $type
type := { json, csv }
- Parameters
path (variable) – A string containing the path to a file.
type (variable) – The type of the file.
Type
Operation
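The load and save rules map naturally onto the pandas reader and writer functions for the two supported file types. The following round trip is a sketch in plain pandas, not the generator's verbatim output; the temporary directory, the sample frame, and the `index=False` argument are choices made for this illustration.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": ["a", "b"]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "my_file.csv")
    json_path = os.path.join(tmp, "my_file.json")

    # on df | save to "my_file.csv" as csv
    df.to_csv(csv_path, index=False)  # index=False is an assumption of this sketch
    # on df | save to "my_file.json" as json
    df.to_json(json_path)

    # x = load from "my_file.csv" as csv
    from_csv = pd.read_csv(csv_path)
    # x = load from "my_file.json" as json
    from_json = pd.read_json(json_path)
```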
nldsl.pandas_extension.union(code, args, env=None)
Compute the union of rows.
Examples
x = on df | union other
Grammar
union $table
- Parameters
table (variable) – The table from which all rows will be added.
Type
Transformation
nldsl.pandas_extension.difference(code, args, env=None)
Remove all rows which are contained in table.
Examples
x = on df | difference other
Grammar
difference $table
- Parameters
table (variable) – The table which contains all rows that shall be removed.
Type
Transformation
nldsl.pandas_extension.intersection(code, args, env=None)
Remove all rows which are not contained in table.
Examples
x = on df | intersection other
Grammar
intersection $table
- Parameters
table (variable) – The table which contains all rows that shall not be removed.
Type
Transformation
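The three set-style rules above (union, difference, intersection) can be pictured with plain pandas as follows. This is one possible way to express them, assuming both tables share the same columns; whether the generated union drops duplicate rows is not stated in this documentation, so the `drop_duplicates` call is an assumption.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
other = pd.DataFrame({"a": [2, 3, 4], "b": ["y", "z", "w"]})

# x = on df | union other
# All rows of both tables; duplicates are dropped here (an assumption).
union = pd.concat([df, other], ignore_index=True).drop_duplicates().reset_index(drop=True)

# x = on df | difference other
# Left merge on all shared columns, then keep rows found only in df.
marked = df.merge(other.drop_duplicates(), how="left", indicator=True)
difference = (
    marked[marked["_merge"] == "left_only"]
    .drop(columns="_merge")
    .reset_index(drop=True)
)

# x = on df | intersection other
# Inner merge on all shared columns keeps rows present in both tables.
intersection = df.merge(other.drop_duplicates(), how="inner").reset_index(drop=True)
```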
nldsl.pandas_extension.select_columns(code, args, env=None)
Select certain columns from a DataFrame.
Examples
## x = on df | select columns df.col1, col2, df.col3
## x = on df | select columns "col1", "col2", "col3"
Grammar
select columns $columns[$col]
- Parameters
columns (varlist) – A list of column names.
Type
Transformation
-
nldsl.pandas_extension.
select_rows
(code, args, env=None)¶ Select the rows of a DataFrame based on some condition.
The condition can be composed out of boolean, comparison and arithmetic expression. The operator precedence is equivalent to python and it is possible to use brackets to modify it.
Examples
## x = on df | select rows df.col1 > (14.2 + z) and df.col2 == 'A'
## x = on df | select rows df.col1 != 0 and not df.col2 in [3, 5, 7]
## x = on df | select rows df.col3 % df.col1 != 2 or df.col1 <= 12
Grammar
select rows !condition
- Parameters
condition (expression) – A boolean expression used as a row filter.
Type
Transformation
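A row condition of this kind corresponds to a boolean mask in pandas. The sketch below shows one plausible translation of the second example, assuming `and` maps to `&`, `not` to `~`, and `in` to `.isin(...)`, as the isin example at the top of this page suggests. The explicit parentheses are added here because Python's `&` binds more tightly than comparison operators; the sample data is made up.

```python
import pandas as pd

df = pd.DataFrame({"col1": [0, 4, 9, 12], "col2": [3, 4, 5, 6]})

# DSL: x = on df | select rows df.col1 != 0 and not df.col2 in [3, 5, 7]
# Each comparison is parenthesized so that `&` combines two boolean masks.
x = df[(df.col1 != 0) & ~df.col2.isin([3, 5, 7])]
```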
nldsl.pandas_extension.drop_columns(code, args, env=None)
Drop certain columns from a DataFrame.
Examples
## x = on df | drop columns df.col1, col2, df.col3
## x = on df | drop columns "col1", "col2", "col3"
Grammar
drop columns $columns[$col]
- Parameters
columns (varlist) – A list of column names.
Type
Transformation
nldsl.pandas_extension.join(code, args, env=None)
Join this DataFrame with another table.
Examples
x = on df | join inner df2 on 'col1', 'col2'
x = on df | join left df2 on 'col1'
Grammar
join $how $table on $columns[$col]
how := { left, right, outer, inner }
- Parameters
how (variable) – How the join shall be performed.
table (variable) – The table with which to join.
columns (varlist) – A list of columns on which to join.
Type
Transformation
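The `how` values listed in the grammar match the join types accepted by `DataFrame.merge`, so a join statement can be pictured as a merge call. This is a sketch with made-up frames; the generator's actual output is not shown in this documentation and may be written differently.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "left_val": ["a", "b", "c"]})
df2 = pd.DataFrame({"col1": [2, 3, 4], "right_val": ["x", "y", "z"]})

# x = on df | join inner df2 on 'col1'
inner = df.merge(df2, how="inner", on=["col1"])

# x = on df | join left df2 on 'col1'
# Rows of df without a match in df2 get missing values in df2's columns.
left = df.merge(df2, how="left", on=["col1"])
```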
nldsl.pandas_extension.group_by(code, args, env=None)
Group a DataFrame and apply an aggregation.
Examples
## x = on df | group by df.col1 apply min
## x = on df | group by df.col1, df.col2 apply mean
Grammar
group by $columns[$col] apply $aggregation
aggregation := { min, max, sum, avg, mean, count }
- Parameters
columns (varlist) – A list of column names.
aggregation (variable) – The aggregation operation to be performed.
Type
Operation
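In pandas terms, grouping followed by an aggregation is a `groupby` call with an aggregation method. The sketch below illustrates two of the listed aggregations on made-up data. Note that pandas group-by objects have no `avg` method, so `avg` presumably maps to `mean`; that mapping is an assumption and is not confirmed here.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["a", "a", "b"], "col2": [1, 3, 5]})

# x = on df | group by df.col1 apply min
minimum = df.groupby(["col1"]).min()

# x = on df | group by df.col1 apply mean
mean = df.groupby(["col1"]).mean()
```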
nldsl.pandas_extension.replace_values(code, args, env=None)
Replace a value with another.
Every occurrence of old_value will be substituted with new_value.
Examples
## x = on df | replace values 1 by 0
## x = on df | replace values "old" by "new"
Grammar
replace $old_value with $new_value
- Parameters
old_value (variable) – The value to be replaced.
new_value (variable) – The value it will be replaced with.
Type
Operation
nldsl.pandas_extension.append_column(code, args, env=None)
Append a new column to the DataFrame.
Example
x = on df | append column df.col1 * 2 as 'new_col'
Grammar
append column !col_expr as $col_name
- Parameters
col_expr (expression) – An expression defining the new value of the column.
col_name (variable) – The new name of the column.
Type
Transformation
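One way to picture this rule in plain pandas is `DataFrame.assign`, which returns a new frame with the extra column and leaves the original untouched. Whether the generator uses `assign` or item assignment is not stated here, so treat this as an illustrative sketch with made-up data.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3]})

# x = on df | append column df.col1 * 2 as 'new_col'
x = df.assign(new_col=df.col1 * 2)
```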
nldsl.pandas_extension.sort_by(code, args, env=None)
Sort the DataFrame by certain columns.
Examples
## x = on df | sort by df.col1 descending
## x = on df | sort by "col1" ascending, "col2" descending
Grammar
sort by $columns[$col $order]
order := { ascending, descending }
- Parameters
columns (varlist) – A list of column names and sorting order pairs.
Type
Transformation
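The column and order pairs correspond to the `by` and `ascending` arguments of `DataFrame.sort_values`, with one boolean per column. The following sketch uses made-up data and shows the pandas call such a statement plausibly translates to.

```python
import pandas as pd

df = pd.DataFrame({"col1": [2, 1, 2, 1], "col2": [10, 20, 30, 40]})

# x = on df | sort by "col1" ascending, "col2" descending
x = df.sort_values(by=["col1", "col2"], ascending=[True, False])
```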
nldsl.pandas_extension.drop_duplicates(code, args, env=None)
Drop duplicate rows from a DataFrame.
Examples
x = on df | drop duplicates
x = on df | select rows df.col1 != 0 | drop duplicates
Grammar
drop duplicates
Type
Transformation
nldsl.pandas_extension.rename_columns(code, args, env=None)
Rename some columns in a DataFrame.
Examples
x = on df | rename columns col1 to col2
x = on df | rename columns col1 to col2, col3 to col4 | show
Grammar
rename columns $columns[$current to $new]
- Parameters
columns (list) – A list of current and new column names.
Type
Transformation
nldsl.pandas_extension.show(code, args, env=None)
Print the DataFrame to stdout.
Examples
on df | show
on df | drop duplicates | show
Grammar
show
Type
Operation
nldsl.pandas_extension.show_schema(code, args, env=None)
Print the schema of the DataFrame to stdout.
Examples
on df | show schema
Grammar
show schema
Type
Operation
nldsl.pandas_extension.describe(code, args, env=None)
Print a description of the DataFrame to stdout.
Examples
on df | describe
on df | drop duplicates | describe
Grammar
describe
Type
Operation
nldsl.pandas_extension.head(code, args, env=None)
Get the first num_rows rows of the DataFrame.
Examples
on df | head 10
on df | drop duplicates | head 100
Grammar
head $num_rows
- Parameters
num_rows (variable) – The number of rows to return.
Type
Operation
nldsl.pandas_extension.count(code, args, env=None)
Count the number of rows in the DataFrame.
Examples
on df | count
on df | drop duplicates | count
Grammar
count
Type
Operation
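Operations end a pipeline and return or print a result rather than a new DataFrame to continue with. As a sketch of the last two rules in plain pandas, with made-up data; using `shape[0]` for the row count is an assumption, and the generator may emit something else.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 2, 3]})

# on df | head 2
first_rows = df.head(2)

# on df | drop duplicates | count
n_unique_rows = df.drop_duplicates().shape[0]
```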
Code Generator
class nldsl.pandas_extension.PandasCodeGenerator(recommend=True, import_name='pandas', **kwargs)
Bases: nldsl.core.codegen.CodeGenerator
A PandasCodeGenerator translates DSL statements into executable pandas code.
There are two kinds of DSL statements: those which can be evaluated to executable code and those which extend the DSL grammar. As a result, parsing a set of DSL statements usually has two effects: executable code is generated, and the PandasCodeGenerator modifies itself so that it can parse statements according to the new grammar rules.
Furthermore, the PandasCodeGenerator derives from the CodeMap class, so its grammar can also be extended by rules which cannot be expressed within the DSL. This is done with the __setitem__ and registerFunction methods.
Example
Adding new rules:
def show(code, args):
    return "print({})".format(code)

PandasCodeGenerator.registerFunction("show entire table", show)  # add to the class
myCodeGen = PandasCodeGenerator()
myCodeGen["show entire table"] = show  # add only to this instance
- Parameters
recommend (bool) – Whether to return a Recommendation if possible or always raise an error.
import_name (str) – The name under which the Pandas module is imported
kwargs (dict) – Additional keyword arguments, which will be added to the environment.
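To make the contract of such a rule function concrete: judging from the example above, a rule receives the code string generated so far and returns a new code string. The sketch below calls `show` directly on a hand-written string without involving the generator; `pipeline_code` and the empty `args` list are placeholders chosen for illustration.

```python
def show(code, args):
    """Wrap the pipeline code built so far in a print call."""
    return "print({})".format(code)


# Hypothetical code string that earlier pipeline steps might have produced.
pipeline_code = "df.drop_duplicates()"

# `show` ignores its arguments, so an empty placeholder is passed.
generated = show(pipeline_code, [])
```

Because the rule only manipulates strings, it can be checked in isolation before it is registered on a generator class or instance.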
class nldsl.pandas_extension.PandasExpressionRule(expr_name, next_keyword)
Bases: nldsl.core.rules.ExpressionRule
An ExpressionRule dedicated to parsing Pandas expressions.
- Parameters
expr_name (str) – The name of the expression.
next_keyword (str) – The keyword following the expression, or None.