capepy.aws package

Submodules

capepy.aws.dynamodb module

class capepy.aws.dynamodb.EtlTable(table_name=None)

Bases: Table

A DynamoDB table with specific structure for organizing ETL jobs.

get_etls(bucket_name, prefix)

Retrieve specific ETL jobs from the table.

Parameters:
  • bucket_name – The name of the S3 bucket to get ETL jobs for.

  • prefix – The required prefix to check for ETL jobs.

Returns:

The ETL jobs triggered by the given bucket name and prefix.

class capepy.aws.dynamodb.PipelineTable(table_name=None)

Bases: Table

A DynamoDB table with specific structure for organizing analysis pipelines.

get_pipeline(pipeline_name, pipeline_version)

Retrieve a specific pipeline from the table.

Parameters:
  • pipeline_name – The name of the pipeline.

  • pipeline_version – The version of the pipeline.

Returns:

The retrieved pipeline item.

class capepy.aws.dynamodb.Table(table_name)

Bases: Boto3Object

An object for working with specific DynamoDB Table structures in the CAPE system.

name

The name of the table

table

The table object retrieved with the boto3 dynamodb resource

get_item(key)

Retrieve an item from the loaded table.

Parameters:

key – The key of the entry to retrieve from the table.

Returns:

The item value from the DynamoDB table.
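
As a rough illustration of what a lookup like get_item involves, the sketch below wraps a table object that exposes the boto3-style get_item(Key=...) call, whose response carries the entry under "Item" when one exists. StubTable, lookup_item, and the attribute names in the key are hypothetical stand-ins so the example runs without AWS access; this is not capepy's implementation.

```python
class StubTable:
    """Hypothetical stand-in for a boto3 DynamoDB Table resource."""

    def __init__(self, items):
        self._items = items

    def get_item(self, Key):
        # boto3 returns a response dict; "Item" is absent when nothing matches.
        for item in self._items:
            if all(item.get(name) == value for name, value in Key.items()):
                return {"Item": item}
        return {}


def lookup_item(table, key):
    """Return the stored item for `key`, or None when no entry matches."""
    response = table.get_item(Key=key)
    return response.get("Item")


# Attribute names below are assumptions made for the example only.
table = StubTable([{"pipeline_name": "demo", "version": "1.0", "type": "nextflow"}])
found = lookup_item(table, {"pipeline_name": "demo", "version": "1.0"})
missing = lookup_item(table, {"pipeline_name": "other", "version": "1.0"})
```

Returning None for a missing entry is a choice made for this sketch; how capepy reports a missing key is not stated in this reference.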

capepy.aws.glue module

class capepy.aws.glue.EtlJob

Bases: Boto3Object

An object for creating ETL Jobs for use in AWS Glue

spark_ctx

The PySpark session

glue_ctx

The AWS Glue context

logger

The logger for logging to AWS Glue

parameters

A dictionary of parameters passed into the job

get_raw_file()

Retrieve the raw file from S3 and return its contents as a byte string.

Raises:

Exception – If the raw file is unable to be successfully retrieved from S3.

Returns:

A byte string of the raw file contents

write_clean_file(clean_data, clean_key=None)

Write data to a clean data file inside the clean S3 bucket as configured by the Glue ETL job.

Parameters:
  • clean_data (bytes or seekable file-like object) – Object data to be written to S3.

  • clean_key (Optional[str]) – The prefix and filename for the new clean data file within the clean S3 bucket.

Raises:

Exception – If the clean data file is unable to be successfully put into S3.

capepy.aws.lambda_ module

class capepy.aws.lambda_.BucketNotificationRecord(record)

Bases: Record

An object for S3 bucket notification related records passed into AWS Lambda handlers.

bucket

The name of the bucket

key

The key into the bucket if relevant
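
To make the bucket and key attributes concrete, here is a minimal sketch of pulling them out of one entry of an S3 event notification, which nests the values under s3.bucket.name and s3.object.key. BucketNotificationSketch is a hypothetical class written for this example; whether capepy parses the record the same way, or expects a single entry from "Records", is an assumption.

```python
from urllib.parse import unquote_plus


class BucketNotificationSketch:
    """Hypothetical illustration; not capepy's BucketNotificationRecord."""

    def __init__(self, record):
        self.raw = record
        s3 = record["s3"]
        self.bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        self.key = unquote_plus(s3["object"]["key"])


# A trimmed-down event of the shape AWS Lambda receives from S3.
event = {
    "Records": [
        {"s3": {"bucket": {"name": "raw-data"}, "object": {"key": "in/my+file.csv"}}}
    ]
}
records = [BucketNotificationSketch(r) for r in event["Records"]]
```

Decoding the key matters in practice: a file uploaded as "my file.csv" appears in the notification as "my+file.csv".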

class capepy.aws.lambda_.EtlRecord(record)

Bases: QueueRecord

An object for ETL related records passed into AWS Lambda handlers.

job

The name of the ETL Job

bucket

The name of the bucket

key

The key into the bucket if relevant

class capepy.aws.lambda_.PipelineRecord(record)

Bases: QueueRecord

An object for pipeline records passed into AWS Lambda handlers.

name

The name of the analysis pipeline

version

The version of the analysis pipeline

parameters

A dictionary of parameters to pass to the analysis pipeline

class capepy.aws.lambda_.QueueRecord(record)

Bases: Record

An object for queue related records passed into AWS Lambda handlers.

raw

The raw record

class capepy.aws.lambda_.Record(record)

Bases: object

An object for general records in AWS Lambda handlers.

raw

The raw record

capepy.aws.meta module

class capepy.aws.meta.Boto3Object

Bases: object

Contains general resources needed by all AWS utilities for interacting with the boto3 library

logger

The logger for logging to AWS Glue

clients

A dictionary of instantiated AWS clients indexed by the name of the AWS service.

resources

A dictionary of instantiated AWS resources indexed by the name of the AWS service.

get_client(service_name, **kwargs)

Get a client for the provided service. If one has not been created yet, a new client is set first.

Parameters:
  • service_name (str) – The name of the service to retrieve a client for.

  • **kwargs – Optional keyword arguments passed to boto3.client() if a new client needs to be set.

Returns:

The boto3 client.

get_resource(service_name, **kwargs)

Get a resource for the provided service. If one has not been created yet, a new resource is set first.

Parameters:
  • service_name (str) – The name of the service to retrieve a resource for.

  • **kwargs – Optional keyword arguments passed to boto3.resource() if a new resource needs to be set.

Returns:

The boto3 resource.

set_client(service_name, **kwargs)

Set a new client for the provided service.

Parameters:
  • service_name (str) – The name of the service to instantiate a new client for.

  • **kwargs – Optional keyword arguments passed to boto3.client().

set_resource(service_name, **kwargs)

Set a new resource for the provided service.

Parameters:
  • service_name (str) – The name of the service to instantiate a new resource for.

  • **kwargs – Optional keyword arguments passed to boto3.resource().
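
The get/set pairs above describe a create-on-first-use cache keyed by service name. The sketch below shows that pattern under stated assumptions: ClientCacheSketch and fake_factory are hypothetical names, and the factory is injected so the example runs without boto3 or AWS credentials. It is not capepy's Boto3Object.

```python
class ClientCacheSketch:
    """Hypothetical illustration of the get/set caching pattern."""

    def __init__(self, client_factory):
        # In real use the factory would be boto3.client.
        self._client_factory = client_factory
        self.clients = {}

    def set_client(self, service_name, **kwargs):
        # Always builds a fresh client, replacing any cached one.
        self.clients[service_name] = self._client_factory(service_name, **kwargs)

    def get_client(self, service_name, **kwargs):
        # kwargs only take effect when no client is cached yet.
        if service_name not in self.clients:
            self.set_client(service_name, **kwargs)
        return self.clients[service_name]


calls = []


def fake_factory(service_name, **kwargs):
    """Stand-in factory that records how often it is invoked."""
    calls.append((service_name, kwargs))
    return object()


cache = ClientCacheSketch(fake_factory)
first = cache.get_client("s3", region_name="us-east-1")
second = cache.get_client("s3")  # served from the cache, factory not called
```

One consequence of this design is that keyword arguments passed to a later get call are ignored once a client exists; calling the set method is the way to rebuild a client with different options. The same shape would apply to resources with boto3.resource as the factory.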

capepy.aws.utils module

capepy.aws.utils.decode_error(err)

Decode a client error message from AWS

Parameters:
  • err (ClientError) – The ClientError to parse out the error code and message if they are available.

Returns:

A tuple (code, message) where code is a string containing the error code, and message is a string containing the entire error message.
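
For orientation, botocore's ClientError carries the service reply in a response dictionary, with the code and message under response["Error"]. The sketch below decodes that layout into a (code, message) tuple. FakeClientError and decode_error_sketch are hypothetical names used so the example runs without botocore, and the fallback values for missing fields are assumptions rather than capepy's documented behavior.

```python
class FakeClientError(Exception):
    """Hypothetical stand-in mirroring botocore's ClientError.response layout."""

    def __init__(self, response):
        super().__init__("An error occurred")
        self.response = response


def decode_error_sketch(err):
    """Return (code, message), falling back when fields are unavailable."""
    error = getattr(err, "response", {}).get("Error", {})
    code = error.get("Code", "Unknown")        # assumed fallback value
    message = error.get("Message", str(err))   # assumed fallback value
    return code, message


err = FakeClientError(
    {"Error": {"Code": "NoSuchKey", "Message": "The specified key does not exist."}}
)
code, message = decode_error_sketch(err)
fallback = decode_error_sketch(FakeClientError({}))
```

Branching on the returned code (for example, treating "NoSuchKey" as a missing file rather than a failure) is the usual reason to decode the error instead of logging it whole.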

Module contents