Architecture and Design
We’ll start with the C4 views:
Context
Container – this isn’t very interesting, but it can be helpful to see.
Components – a collection of design notes describing some implementation details.
The code view is in the API Reference section.
Context
There are two distinct contexts for CEL Python:
The CLI, as a stand-alone application.
An importable module, providing expressions to a DSL.
From the CLI, the celpy application has a number of use cases:
A shell script can use celpy as a command to replace other shell commands, including expr, test, and jq.
A person can run celpy interactively. This allows experimentation. It also supports exploring very complex JSON documents to understand their structure.
As a library, an application (for example, C7N) can import celpy to provide an expression feature for the DSL.
This provides well-defined semantics and widely used syntax for the expression language.
There’s an explicit separation between building a program and executing the program to allow caching an expression for multiple executions without the overhead of building a Lark parser or compiling the expression.
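The value of separating building from execution can be sketched in a few lines of plain Python. This is a minimal illustration, not celpy code: it assumes Python’s own compile() and eval() as stand-ins for the Lark parse and CEL evaluation, and compile_expression() and execute() are hypothetical helper names.

```python
from functools import lru_cache
from types import CodeType


@lru_cache(maxsize=128)
def compile_expression(text: str) -> CodeType:
    """Build step: runs once per distinct expression text, then is cached.

    Python's compile() stands in for building the parser and compiling
    the CEL expression; it is not what celpy does internally.
    """
    return compile(text, "<expr>", "eval")


def execute(code: CodeType, variables: dict) -> object:
    """Execute step: evaluate an already-built program with one set of inputs."""
    # An empty __builtins__ keeps the expression away from Python built-ins.
    return eval(code, {"__builtins__": {}}, dict(variables))


if __name__ == "__main__":
    for row in [{"x": 1}, {"x": 2}, {"x": 3}]:
        print(execute(compile_expression("x * 10 + 1"), row))
    # Three executions, but only one build (one cache miss).
    print(compile_expression.cache_info())
```

The cache is keyed on the expression text, so a DSL that evaluates the same filter across many documents pays the build cost only once.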
Container
As a CLI, this is part of a shell script. It runs where the script runs.
As a library, this is imported into the application to extend the DSL.
There are no services offered or used.
Components
The Python code base has a number of modules.
__init__ – the celpy package as a whole.
__main__ – the main application used when running celpy.
celparser – a Facade for the Lark parser.
evaluation – a Facade for run-time evaluation.
celtypes – the underlying Python implementations of CEL data structures.
c7nlib – a collection of components that C7N can use to introduce CEL filters.
adapter – some JSON serialization components.
Here’s the conceptual organization.
While there is a tangle of dependencies, there are three top-level “entry points” for celpy.
The __main__ module is the CLI application.
The c7nlib module exposes CEL functionality in a form usable by Cloud Custodian filter definitions. This library provides useful components to perform Custodian-related computations.
The __init__ module exposes the most useful parts of celpy for integration with another application.
Compile-Time
Here are the essential classes used to compile a CEL expression and prepare it for evaluation.
The fundamental sequence of operations is:
Create a celpy.Environment with any needed celpy.Annotation instances. For the most part, these are based on the overall application domain. Each annotation should be either a subclass of celpy.TypeType or a callable function matching the celpy.CELFunction type.
Use the celpy.Environment to compile the CEL text to create a parse tree.
Use the celpy.Environment to create a celpy.Runner instance from the parse tree and any function definitions that override or extend the predefined CEL environment.
Evaluate the celpy.Runner with a celpy.Context. The celpy.Context provides specific values for the variables required for evaluation. Generally, each variable should have a celpy.Annotation defined in the celpy.Environment.
The celpy.Runner can be evaluated with any number of distinct celpy.Context values.
This amortizes the cost of compilation over multiple executions.
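The four steps above can be mirrored with toy classes. This is a hypothetical sketch: ToyEnvironment and ToyRunner are invented names loosely modeled on the celpy.Environment and celpy.Runner roles described here, Python’s ast module stands in for the Lark parser, and annotations are plain Python types rather than celpy.Annotation instances.

```python
import ast
from typing import Any, Callable, Dict, Optional


class ToyRunner:
    """Hypothetical stand-in for a celpy.Runner: built once, evaluated often."""

    def __init__(self, tree: ast.Expression, annotations: Dict[str, type],
                 functions: Dict[str, Callable[..., Any]]) -> None:
        self.code = compile(tree, "<cel>", "eval")
        self.annotations = annotations
        self.functions = functions

    def evaluate(self, context: Dict[str, Any]) -> Any:
        # Step 4: the context supplies values; annotations say what to expect.
        for name, expected in self.annotations.items():
            if not isinstance(context.get(name), expected):
                raise TypeError(f"{name!r} should be {expected.__name__}")
        namespace = {"__builtins__": {}, **self.functions}
        return eval(self.code, namespace, dict(context))


class ToyEnvironment:
    """Hypothetical stand-in for a celpy.Environment."""

    def __init__(self, annotations: Dict[str, type]) -> None:
        # Step 1: the environment carries domain-level annotations.
        self.annotations = annotations

    def compile(self, text: str) -> ast.Expression:
        # Step 2: text -> parse tree (Python's parser stands in for Lark).
        return ast.parse(text, mode="eval")

    def program(self, tree: ast.Expression,
                functions: Optional[Dict[str, Callable[..., Any]]] = None) -> ToyRunner:
        # Step 3: parse tree + extra functions -> a reusable runner.
        return ToyRunner(tree, self.annotations, functions or {})


if __name__ == "__main__":
    env = ToyEnvironment({"size": int})
    runner = env.program(env.compile("clamp(size * 2)"),
                         {"clamp": lambda n: min(n, 100)})
    for context in ({"size": 10}, {"size": 80}):
        print(runner.evaluate(context))
```

The runner is built once and then evaluated against as many contexts as needed; only step 4 repeats.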
Evaluation-Time
Here are the classes used to evaluate a CEL expression.
The evaluation of the CEL expression is done via a celpy.Runner object.
There are two celpy.Runner implementations.
The celpy.InterpretedRunner walks the AST, creating the final result, which is either a celpy.Value or a celpy.CELEvalError exception. This uses a celpy.evaluation.Activation to perform the evaluation.
The celpy.CompiledRunner transpiles the AST into a Python sequence of statements. The internal compile() creates a code object that can then be evaluated with a given celpy.evaluation.Activation. The internal exec() function performs the evaluation.
The subclasses of celpy.Runner are Adapter classes that provide a tidy interface to the somewhat more complex celpy.Evaluator or celpy.Transpiler objects.
In the case of the celpy.InterpretedRunner, evaluation involves creating a celpy.evaluation.Activation and visiting the AST.
By contrast, the celpy.CompiledRunner must first visit the AST to create code. At evaluation time, it creates a celpy.evaluation.Activation and uses exec() to compute the final value.
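The difference between the two runner strategies can be illustrated with Python’s own ast module on a tiny arithmetic subset. This is a sketch under that assumption, not celpy’s Evaluator or Transpiler; interpret(), transpile(), and run_compiled() are hypothetical names, and a plain dict plays the role of the celpy.evaluation.Activation.

```python
import ast
import operator
from types import CodeType
from typing import Any, Mapping

# A deliberately tiny subset: constants, names, and + - * only.
BINARY_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def interpret(node: ast.AST, activation: Mapping[str, Any]) -> Any:
    """Interpreted strategy: walk the tree on every evaluation."""
    if isinstance(node, ast.Expression):
        return interpret(node.body, activation)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return activation[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        left = interpret(node.left, activation)
        right = interpret(node.right, activation)
        return BINARY_OPS[type(node.op)](left, right)
    raise ValueError(f"unsupported node: {ast.dump(node)}")


def transpile(tree: ast.Expression) -> CodeType:
    """Compiled strategy, build step: visit the tree once and compile() it."""
    # This subset is already valid Python, so unparse() suffices here; a real
    # transpiler would rewrite each node into calls on CEL runtime types.
    source = f"result = {ast.unparse(tree.body)}"
    return compile(source, "<transpiled>", "exec")


def run_compiled(code: CodeType, activation: Mapping[str, Any]) -> Any:
    """Compiled strategy, evaluation step: exec() the code object."""
    namespace = {"__builtins__": {}, **activation}
    exec(code, namespace)
    return namespace["result"]


if __name__ == "__main__":
    tree = ast.parse("(x + 2) * y", mode="eval")
    code = transpile(tree)  # done once, before any evaluation
    for activation in ({"x": 1, "y": 3}, {"x": 5, "y": 10}):
        print(interpret(tree, activation), run_compiled(code, activation))
```

The interpreter repeats the tree walk on every evaluation; the compiled path walks the tree once and afterwards only pays for exec().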
The celpy.evaluation.Activation contains several things:
The Annotation definitions to provide type information for identifiers.
The CELFunction functions that extend or override the built-in functions.
The values for identifiers.
The celpy.evaluation.Activation is a kind of ChainMap for name resolution.
The chain has the following structure:
The end of the chain has the built-in defaults. (This is the bottom-most base definition.)
A layer on top of this can offer types and functions which are provided to integrate into the containing app or framework.
The next layer is the “current” activation when evaluating a given expression. For the CLI, this has the command-line variables. For other integrations, these are the input values.
A transient layer on top of this is used to create a local variable binding for the macro evaluations. These can be nested, and introduce the macro variable as a temporary annotation and value binding.
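The layered name resolution maps naturally onto Python’s collections.ChainMap. The sketch below is illustrative only: the layer contents, the eval_map_macro() helper, and the use of len as a stand-in for CEL’s size() are assumptions, not celpy’s Activation implementation.

```python
from collections import ChainMap
from typing import Any, Callable, List, Mapping

# Hypothetical layers, listed bottom to top; contents are illustrative only.
builtin_layer = {"size": len}              # built-in defaults (len stands in for size())
integration_layer = {"lower": str.lower}   # functions offered by the host app
current_layer = {"x": 10}                  # inputs for this evaluation

# ChainMap searches its maps left to right, so the first map is the top layer.
activation = ChainMap(current_layer, integration_layer, builtin_layer)


def eval_map_macro(items: List[Any], var: str,
                   body: Callable[[Mapping[str, Any]], Any],
                   parent: ChainMap) -> List[Any]:
    """Evaluate a map()-style macro, binding `var` in a transient top layer."""
    results = []
    for item in items:
        local = parent.new_child({var: item})  # parent itself is not modified
        results.append(body(local))
    return results


if __name__ == "__main__":
    # Roughly the CEL expression [1, 2, 3].map(v, v * x)
    print(eval_map_macro([1, 2, 3], "v", lambda act: act["v"] * act["x"], activation))
    print("v" in activation)  # False: the macro variable never leaked out
```

Because new_child() returns a new ChainMap, a nested macro can call it again on its local map, and the enclosing activation never sees the transient binding.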
CEL Types
There are ten extension types that wrap Python built-in types to provide the unique CEL semantics.
celtypes.TypeType is a supertype for CEL types.
celtypes.BoolType wraps int and creates additional type overload exceptions.
celtypes.BytesType wraps bytes and handles conversion from celtypes.StringType.
celtypes.DoubleType wraps float and creates additional type overload exceptions.
celtypes.IntType wraps int and adds a 64-bit signed range constraint.
celtypes.UintType wraps int and adds a 64-bit unsigned range constraint.
celtypes.ListType wraps list and includes some type overload exceptions.
celtypes.MapType wraps dict and includes some type overload exceptions. Additionally, the MapKeyTypes type hint is the subset of types permitted as keys.
celtypes.StringType wraps str and includes some type overload exceptions.
celtypes.TimestampType wraps datetime.datetime and includes a number of conversions from datetime.datetime, int, and str values.
celtypes.DurationType wraps datetime.timedelta and includes a number of conversions from datetime.timedelta, int, and str values.
Additionally, a celtypes.NullType is defined, but does not seem to be needed. It hasn’t been deleted yet.
It should be considered deprecated.
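The range constraints on celtypes.IntType and celtypes.UintType can be sketched as int subclasses that check every constructed value. This is a hypothetical sketch: only __add__ and __sub__ are shown, the limits are the standard 64-bit bounds CEL specifies for int and uint, and raising Python’s OverflowError is an assumption; celpy’s own classes may signal the error differently.

```python
class IntType(int):
    """Hypothetical sketch: an int that enforces the 64-bit signed range."""

    MIN = -(2**63)
    MAX = 2**63 - 1

    def __new__(cls, value: int = 0) -> "IntType":
        result = super().__new__(cls, value)
        if not cls.MIN <= result <= cls.MAX:
            raise OverflowError(f"{cls.__name__} out of range: {int(result)}")
        return result

    # Re-wrap arithmetic results so the range check applies to them too.
    def __add__(self, other: int) -> "IntType":
        return type(self)(int(self) + int(other))

    def __sub__(self, other: int) -> "IntType":
        return type(self)(int(self) - int(other))


class UintType(IntType):
    """Same idea with the 64-bit unsigned range."""

    MIN = 0
    MAX = 2**64 - 1


if __name__ == "__main__":
    print(IntType(2**63 - 1))
    try:
        IntType(2**63 - 1) + 1
    except OverflowError as err:
        print(err)
    print(type(UintType(3) + 4).__name__)
```

Because the subclass inherits the arithmetic methods and uses type(self) to re-wrap, UintType gets the unsigned check without repeating any code.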
Transpiler Missing Names
The member_dot transpilation with a missing name will be found at evaluation time via member.get('IDENT'). This raises a No Such Member in Mapping error.
The primary :: ident evaluation can result in one of the following conditions:
ident denotes a type definition. The value’s type is TypeType. The value is a type reference; for example, bool becomes celpy.celtypes.BoolType.
ident denotes a built-in function. The value’s type is CELFunction. The value is the Python function reference.
ident denotes an annotation, but the value’s type is neither TypeType nor CELFunction. The transpiled value is f"activation.{ident}", assuming it will be a defined variable. If, at exec() time, the name has no value in the Activation, a NameError exception is raised, which becomes a CELEvalError exception.
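The three cases, and the conversion of a late NameError, can be sketched as follows. Everything here is hypothetical: the ANNOTATIONS table, the toy Activation that raises NameError for a missing attribute, and the annotations[...] source used for type and function references stand in for whatever celpy actually emits.

```python
from typing import Any, Dict, Tuple


class CELEvalError(Exception):
    """Hypothetical stand-in for celpy's CELEvalError."""


class BoolType(int):
    """Hypothetical stand-in for celpy.celtypes.BoolType."""


class Activation:
    """Toy activation: attribute access for variables, NameError when missing."""

    def __init__(self, **values: Any) -> None:
        self._values = values

    def __getattr__(self, name: str) -> Any:
        try:
            return self._values[name]
        except KeyError:
            raise NameError(name) from None


# Hypothetical annotations: a type, a built-in function, and a plain variable.
ANNOTATIONS: Dict[str, Any] = {"bool": BoolType, "size": len, "limit": "int"}


def transpile_ident(ident: str) -> Tuple[str, str]:
    """Classify an ident and return (kind, Python source) for it."""
    annotation = ANNOTATIONS.get(ident)
    reference = f"annotations[{ident!r}]"
    if isinstance(annotation, type):
        return "type", reference        # e.g. bool -> the BoolType class
    if callable(annotation):
        return "function", reference    # the Python function itself
    # Neither: assume a variable that will exist at exec() time.
    return "variable", f"activation.{ident}"


def evaluate(ident: str, activation: Activation) -> Any:
    """Transpile and exec a single ident, mapping NameError to CELEvalError."""
    _, source = transpile_ident(ident)
    code = compile(f"result = {source}", "<transpiled>", "exec")
    namespace: Dict[str, Any] = {"annotations": ANNOTATIONS, "activation": activation}
    try:
        exec(code, namespace)
    except NameError as ex:
        raise CELEvalError(f"no value for {ex}") from ex
    return namespace["result"]


if __name__ == "__main__":
    print(transpile_ident("bool"), transpile_ident("size"), transpile_ident("limit"))
    print(evaluate("limit", Activation(limit=5)))
    try:
        evaluate("limit", Activation())
    except CELEvalError as err:
        print("CELEvalError:", err)
```

The type check comes before the callable check because a class is also callable.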
The Member-Dot Production
Consider protobuf_message{field: 42}.field.
This is parsed using the following productions.
member : member_dot | member_dot_arg | member_item | member_object | primary
member_dot : member "." IDENT
member_object : member "{" [fieldinits] "}"
The member in member_object will be a primary, which can be an ident.
It MUST refer to the Annotation (not the value) because it has fieldinits.
All other choices are (generally) values.
They can also be annotations, which means bool.type() works the same as type(bool).
Here’s the primary production, which defines the ident in the member production.
primary : dot_ident_arg | dot_ident | ident_arg | ident
| paren_expr | list_lit | map_lit | literal
The ident is not always transpiled as activation.{name}.
Inside member_object, it’s activation.resolve_name({name}).
Outside member_object, it can be activation.{name} because it’s a simple variable.
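The two transpilation forms can be demonstrated with a toy activation that keeps annotations and values separately. This is a sketch under assumptions: the Activation class, the quoting of the name argument, and the Point annotation (bound to dict) are hypothetical.

```python
from typing import Any, Dict


class Activation:
    """Toy activation keeping annotations and variable values separately."""

    def __init__(self, annotations: Dict[str, Any], values: Dict[str, Any]) -> None:
        self.annotations = annotations
        self.values = values

    def resolve_name(self, name: str) -> Any:
        # Used inside member_object: the name must denote an annotation (a type).
        return self.annotations[name]

    def __getattr__(self, name: str) -> Any:
        # Used elsewhere: activation.x is a simple variable lookup.
        try:
            return self.values[name]
        except KeyError:
            raise AttributeError(name) from None


def transpile_ident(name: str, inside_member_object: bool) -> str:
    """Return Python source for an ident, depending on where it appears."""
    if inside_member_object:
        return f"activation.resolve_name({name!r})"
    return f"activation.{name}"


if __name__ == "__main__":
    act = Activation(annotations={"Point": dict}, values={"x": 7})
    namespace = {"activation": act}
    # Plain variable reference, e.g. the CEL text: x
    print(eval(transpile_ident("x", inside_member_object=False), namespace))
    # Type reference inside member_object, e.g. the CEL text: Point{x: 1, y: 2}
    point_type = eval(transpile_ident("Point", inside_member_object=True), namespace)
    print(point_type(x=1, y=2))
```

Inside member_object the name must yield something that can be built from the fieldinits, which is why it resolves to the annotation rather than to a variable’s value.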
It may make sense to rename the Activation.resolve_name() method to Activation.get().
This, however, overloads the get() method, which has type hint consequences.
Important
The member can be any of a variety of objects:
NameContainer(Dict[str, Referent])
Activation
MapType(Dict[Value, Value])
MessageType(MapType)
All of these classes must define a get() method.
The nuance is that NameContainer is also a Python dict, and there’s an overload issue between that get() and the other get() definitions.
The transpilation currently leverages a common method named get() for all of these types.
This is a Pythonic approach, but the overload for NameContainer (a Dict subclass) isn’t quite right: it doesn’t return a Referent, but the value from a Referent.
A slightly smarter approach is to define a get_value(member, 'name') function that uses a match/case structure to do the right thing for each type. The problem is that the result is a union of type, value, function, and any of these four containers!
Another possibility is to leverage the Annotations. They can provide the type information needed to discern which method to use and what specific result type to expect.