capepy.aws package
Submodules
capepy.aws.dynamodb module
- class capepy.aws.dynamodb.EtlTable(table_name=None)
Bases:
Table
A DynamoDB table with specific structure for organizing ETL jobs.
- get_etls(bucket_name, prefix)
Retrieve specific ETL jobs from the table.
- Parameters:
bucket_name – The name of the S3 bucket to get ETL jobs for.
prefix – The required prefix to check for ETL jobs.
- Returns:
The ETL Jobs triggered by the given bucket name and prefix.
- class capepy.aws.dynamodb.PipelineTable(table_name=None)
Bases:
Table
A DynamoDB table with specific structure for organizing analysis pipelines.
- get_pipeline(pipeline_name, pipeline_version)
Retrieve a specific pipeline from the table.
- Parameters:
pipeline_name – The name of the pipeline.
pipeline_version – The version of the pipeline.
- Returns:
The retrieved pipeline item.
- class capepy.aws.dynamodb.Table(table_name)
Bases:
Boto3Object
An object for working with specific DynamoDB Table structures in the CAPE system.
- name
The name of the table
- table
The table object retrieved with the boto3 dynamodb resource
- get_item(key)
Retrieve an item from the loaded table.
- Parameters:
key – The key of the entry to retrieve from the table.
- Returns:
The item value from the DynamoDB table.
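As a rough illustration of the get_item contract described above, the sketch below wraps a table-like object and returns the stored item for a key. It is not capepy's implementation: SketchTable and InMemoryTable are hypothetical stand-ins, and the key and field names are made up for the example. The only boto3 behavior assumed is that a DynamoDB Table resource's get_item(Key=...) returns a dict that contains an "Item" entry when the key exists.

```python
from typing import Any, Dict, List, Optional


class InMemoryTable:
    """Hypothetical stand-in for a boto3 DynamoDB Table resource."""

    def __init__(self, key_name: str, rows: List[Dict[str, Any]]):
        self._key_name = key_name
        self._rows = {row[key_name]: row for row in rows}

    def get_item(self, Key: Dict[str, Any]) -> Dict[str, Any]:
        # Mirror the boto3 response shape: "Item" is present only on a hit.
        row = self._rows.get(Key[self._key_name])
        return {"Item": row} if row is not None else {}


class SketchTable:
    """Hypothetical wrapper exposing `name`, `table`, and `get_item`."""

    def __init__(self, table_name: str, table: Any):
        self.name = table_name
        self.table = table

    def get_item(self, key: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        # Return the item value, or None when the key is absent (an assumption).
        return self.table.get_item(Key=key).get("Item")


# Hypothetical key and field names, for illustration only.
backing = InMemoryTable("pipeline_name", [{"pipeline_name": "demo", "version": "1.0"}])
table = SketchTable("pipelines", backing)
found = table.get_item({"pipeline_name": "demo"})
missing = table.get_item({"pipeline_name": "absent"})
```

With a real table, the backing object would be the boto3 Table resource rather than the in-memory stub.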
capepy.aws.glue module
- class capepy.aws.glue.EtlJob
Bases:
Boto3Object
An object for creating ETL Jobs for use in AWS Glue
- spark_ctx
The PySpark session
- glue_ctx
The AWS Glue context
- logger
The logger for logging to AWS Glue
- parameters
A dictionary of parameters passed into the job
- get_raw_file()
Retrieve the raw file from S3 and return its contents as a byte string.
- Raises:
Exception – If the raw file cannot be retrieved from S3.
- Returns:
A byte string of the raw file contents
- write_clean_file(clean_data, clean_key=None)
Write data to a clean data file inside the clean S3 bucket as configured by the Glue ETL job.
- Parameters:
clean_data (bytes or seekable file-like object) – Object data to be written to S3.
clean_key (Optional[str]) – The prefix and filename for the new clean data file within the clean S3 bucket.
- Raises:
Exception – If the clean data file cannot be put into S3.
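A job built on EtlJob would typically read bytes with get_raw_file(), transform them, and hand the result to write_clean_file(). The sketch below shows only the transform step as a pure function so that it runs outside of Glue. The clean_csv function and the sample data are hypothetical, and the commented lines indicate where the EtlJob calls might go, assuming the signatures documented above.

```python
import csv
import io


def clean_csv(raw: bytes) -> bytes:
    """Hypothetical transform: strip whitespace from cells and drop blank rows."""
    reader = csv.reader(io.StringIO(raw.decode("utf-8")))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        cells = [cell.strip() for cell in row]
        if any(cells):  # skip rows with no non-empty cells
            writer.writerow(cells)
    return out.getvalue().encode("utf-8")


# Inside a Glue job the calls might look roughly like this (not run here):
#   job = EtlJob()
#   raw = job.get_raw_file()
#   job.write_clean_file(clean_csv(raw))

sample = b"id, name \n1, alpha \n\n2,beta\n"
cleaned = clean_csv(sample)
```

Keeping the transform free of AWS calls makes it testable locally, since write_clean_file accepts plain bytes.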
capepy.aws.lambda_ module
- class capepy.aws.lambda_.BucketNotificationRecord(record)
Bases:
Record
An object for S3 bucket notification related records passed into AWS Lambda handlers.
- bucket
The name of the bucket
- key
The key into the bucket if relevant
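For orientation, the sketch below pulls the bucket name and object key out of an S3 event notification record, which nests them under s3.bucket.name and s3.object.key. SketchBucketRecord is a hypothetical stand-in and not capepy's class; this reference does not say how capepy parses records internally. Decoding the key is an assumption, based on S3 URL-encoding object keys in event notifications.

```python
from typing import Optional
from urllib.parse import unquote_plus


class SketchBucketRecord:
    """Hypothetical parser for one S3 bucket notification record."""

    def __init__(self, record: dict):
        s3 = record["s3"]
        self.bucket: str = s3["bucket"]["name"]
        # The key is treated as optional here ("if relevant"); decoding it is
        # an assumption, since S3 URL-encodes keys in notifications.
        obj = s3.get("object")
        self.key: Optional[str] = unquote_plus(obj["key"]) if obj else None


# A trimmed-down event containing only the fields the sketch reads.
event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "raw-bucket"},
                "object": {"key": "incoming/my+file.csv"},
            }
        }
    ]
}
records = [SketchBucketRecord(r) for r in event["Records"]]
```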
- class capepy.aws.lambda_.EtlRecord(record)
Bases:
QueueRecord
An object for ETL related records passed into AWS Lambda handlers.
- job
The name of the ETL Job
- bucket
The name of the bucket
- key
The key into the bucket if relevant
- class capepy.aws.lambda_.PipelineRecord(record)
Bases:
QueueRecord
An object for pipeline records passed into AWS Lambda handlers.
- name
The name of the analysis pipeline
- version
The version of the analysis pipeline
- parameters
A dictionary of parameters to pass to the analysis pipeline
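The sketch below shows one way a queue-delivered record could be unpacked into the name, version, and parameters attributes listed above. It assumes the record is an SQS message whose "body" field is a JSON string; the body keys pipeline_name, pipeline_version, and parameters are hypothetical, as is the SketchPipelineRecord class itself. The message format capepy actually expects is not given in this reference.

```python
import json


class SketchPipelineRecord:
    """Hypothetical parser for a pipeline record delivered through a queue."""

    def __init__(self, record: dict):
        # Assumption: the queue message body is a JSON object.
        body = json.loads(record["body"])
        # Assumption: these key names are illustrative only.
        self.name = body["pipeline_name"]
        self.version = body["pipeline_version"]
        self.parameters = body.get("parameters", {})


message = {
    "body": json.dumps(
        {
            "pipeline_name": "demo-pipeline",
            "pipeline_version": "1.2.0",
            "parameters": {"threads": 4},
        }
    )
}
pipeline = SketchPipelineRecord(message)
```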
capepy.aws.meta module
- class capepy.aws.meta.Boto3Object
Bases:
object
Contains general resources needed by all AWS utilities for interacting with the boto3 library
- logger
The logger for logging to AWS Glue
- clients
A dictionary of instantiated AWS clients indexed by the name of the AWS service.
- resources
A dictionary of instantiated AWS resources indexed by the name of the AWS service.
- get_client(service_name, **kwargs)
Get a client for the provided service. If one hasn’t been created yet, set a new client.
- Parameters:
service_name (str) – The name of the service to retrieve a client for.
**kwargs – Optional keyword arguments passed to boto3.client() if a new client needs to be set.
- Returns:
The boto3 client.
- get_resource(service_name, **kwargs)
Get a resource for the provided service. If one hasn’t been created yet, set a new resource.
- Parameters:
service_name (str) – The name of the service to retrieve a resource for.
**kwargs – Optional keyword arguments passed to boto3.resource() if a new resource needs to be set.
- Returns:
The boto3 resource.
- set_client(service_name, **kwargs)
Set a new client for the provided service.
- Parameters:
service_name (str) – The name of the service to instantiate a new client for.
**kwargs – Optional keyword arguments passed to boto3.client().
- set_resource(service_name, **kwargs)
Set a new resource for the provided service.
- Parameters:
service_name (str) – The name of the service to instantiate a new resource for.
**kwargs – Optional keyword arguments passed to boto3.resource().
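The get/set pairs above describe a create-on-first-use cache: get_client reuses an existing client and calls set_client only when none exists, while set_client always replaces the stored one. The sketch below illustrates that pattern with an injected factory so it runs without AWS credentials. LazyClients and fake_factory are hypothetical; in real use the factory would be boto3.client. This is an illustration of the documented behavior, not capepy's source.

```python
from typing import Any, Callable, Dict, List, Tuple


class LazyClients:
    """Hypothetical cache mirroring the get_client/set_client contract."""

    def __init__(self, factory: Callable[..., Any]):
        self._factory = factory  # e.g. boto3.client in real use
        self.clients: Dict[str, Any] = {}

    def set_client(self, service_name: str, **kwargs: Any) -> None:
        # Always build a fresh client and overwrite any cached one.
        self.clients[service_name] = self._factory(service_name, **kwargs)

    def get_client(self, service_name: str, **kwargs: Any) -> Any:
        # Build a client only if none is cached; kwargs are ignored on a hit.
        if service_name not in self.clients:
            self.set_client(service_name, **kwargs)
        return self.clients[service_name]


calls: List[Tuple[str, Dict[str, Any]]] = []


def fake_factory(service_name: str, **kwargs: Any) -> object:
    """Stand-in factory that records each call it receives."""
    calls.append((service_name, kwargs))
    return object()


cache = LazyClients(fake_factory)
first = cache.get_client("s3", region_name="us-east-1")
second = cache.get_client("s3")  # cache hit, factory not called
cache.set_client("s3")           # forces a replacement
third = cache.get_client("s3")
```

One consequence of this pattern is that keyword arguments passed to a later get_client call have no effect once a client is cached; replacing it requires an explicit set_client call.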
capepy.aws.utils module
- capepy.aws.utils.decode_error(err)
Decode a client error message from AWS
- Parameters:
err (ClientError) – The ClientError to parse out the error code and message if they are available.
- Returns:
A tuple (code, message) where code is a string containing the error code, and message is a string containing the entire error message.
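A minimal sketch of how such a decoder could work is shown below. It relies on the fact that a botocore ClientError carries a response dict with an "Error" entry holding "Code" and "Message". The function sketch_decode_error, the FakeClientError class, and the fallback values used when those fields are missing are all assumptions made for illustration; capepy's actual fallbacks are not documented here.

```python
from typing import Any, Tuple


def sketch_decode_error(err: Any) -> Tuple[str, str]:
    """Hypothetical decoder returning (code, message) from a ClientError-like object."""
    error = getattr(err, "response", {}).get("Error", {})
    # Fallback values are assumptions: "Unknown" and the exception text.
    code = error.get("Code", "Unknown")
    message = error.get("Message", str(err))
    return code, message


class FakeClientError(Exception):
    """Stand-in for botocore's ClientError so the sketch runs without botocore."""

    def __init__(self, response: dict):
        super().__init__("fake client error")
        self.response = response


err = FakeClientError(
    {"Error": {"Code": "NoSuchKey", "Message": "The specified key does not exist."}}
)
code, message = sketch_decode_error(err)
bare_code, bare_message = sketch_decode_error(FakeClientError({}))
```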