capepy.aws package

Submodules

capepy.aws.dynamodb module

class capepy.aws.dynamodb.CrawlerTable(table_name=None)

Bases: Table

A DynamoDB table with specific structure for organizing Glue Crawlers.

get_crawler(bucket_name)

Retrieve a specific crawler from the table.

Parameters: bucket_name – The name of the bucket.
Returns: The retrieved crawler item.

class capepy.aws.dynamodb.EtlTable(table_name=None)

Bases: Table

A DynamoDB table with specific structure for organizing ETL jobs.

get_etls(bucket_name, prefix)

Retrieve specific ETL jobs from the table.

Parameters:

bucket_name – The name of the S3 bucket to get ETL jobs for.
prefix – The required prefix to check for ETL jobs.

Returns:

The ETL Jobs triggered by the given bucket name and prefix.

class capepy.aws.dynamodb.PipelineTable(table_name=None)

Bases: Table

A DynamoDB table with specific structure for organizing analysis pipelines.

get_pipeline(pipeline_name, pipeline_version)

Retrieve a specific pipeline from the table.

Parameters:

pipeline_name – The name of the pipeline.
pipeline_version – The version of the pipeline.

Returns:

The retrieved pipeline item.

class capepy.aws.dynamodb.Table(table_name)

Bases: Boto3Object

An object for working with specific DynamoDB Table structures in the CAPE system.

name: the name of the table

table: the table object retrieved with the boto3 dynamodb resource

get_item(key)

Retrieve an item from the loaded table.

Parameters: key – The key of the entry to retrieve from the table.
Returns: The item value from the DynamoDB table.
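
As a rough illustration of how Table and a subclass such as CrawlerTable might fit together, the sketch below uses stand-in classes only. FakeDynamoTable, SketchTable, SketchCrawlerTable, and the "bucket_name" key attribute are all hypothetical and are not part of capepy. Returning None for a missing key is a choice made for this sketch, not documented capepy behavior. The only real-world detail relied on is that a boto3 DynamoDB Table's get_item(Key=...) response carries the item under "Item" when one is found.

```python
class FakeDynamoTable:
    """Stand-in for a boto3 DynamoDB Table resource (hypothetical)."""

    def __init__(self, items):
        self._items = items  # maps a sorted key tuple to the stored item

    def get_item(self, Key):
        # The response contains "Item" only when the key exists.
        item = self._items.get(tuple(sorted(Key.items())))
        return {"Item": item} if item is not None else {}


class SketchTable:
    """Illustrative analogue of Table: holds a name and a table object."""

    def __init__(self, table_name, table):
        self.name = table_name
        self.table = table

    def get_item(self, key):
        # Return the stored item, or None when nothing matches (sketch choice).
        return self.table.get_item(Key=key).get("Item")


class SketchCrawlerTable(SketchTable):
    """Illustrative analogue of CrawlerTable; the key attribute is assumed."""

    def get_crawler(self, bucket_name):
        return self.get_item({"bucket_name": bucket_name})


backing = FakeDynamoTable(
    {
        (("bucket_name", "raw-data"),): {
            "bucket_name": "raw-data",
            "crawler_name": "raw-crawler",
        }
    }
)
crawlers = SketchCrawlerTable("crawler-table", backing)
found = crawlers.get_crawler("raw-data")
missing = crawlers.get_crawler("no-such-bucket")
```

The specialized tables in this module appear to follow the same shape: each adds one lookup method on top of the generic get_item.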

class capepy.aws.dynamodb.UserTable(table_name=None)

Bases: Table

A DynamoDB table with specific structure for managing CAPE users.

get_user(user_id)

Retrieve a specific user from the table.

Parameters: user_id – The id of the user.
Returns: The retrieved user attributes.

capepy.aws.glue module

class capepy.aws.glue.EtlJob

Bases: Boto3Object

An object for creating ETL jobs for use in AWS Glue.

spark_ctx: The PySpark session

glue_ctx: The AWS Glue context

logger: The logger for logging to AWS Glue

parameters: A dictionary of parameters passed into the job

get_src_file()

Retrieve the source file from S3 and return its contents as a byte string.

Raises: Exception – If the source file cannot be retrieved from S3.
Returns: A byte string of the source file contents.

write_sink_file(sink_data, sink_key=None)

Write data to the sink data file inside the sink S3 bucket as configured by the Glue ETL job.

Parameters:

sink_data (bytes or seekable file-like object) – Object data to be written to S3.
sink_key (Optional[str]) – The prefix and filename for the new sink data file within the sink S3 bucket.

Raises:

Exception – If the sink data file cannot be put into S3.
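
The read, transform, and write flow implied by get_src_file() and write_sink_file() could look roughly like the sketch below. SketchEtlJob is a hypothetical in-memory stand-in that only mirrors the two documented method names. It does not touch S3, Glue, or Spark, and the CSV to JSON Lines transform and the "default-sink-key" fallback are invented for illustration.

```python
import csv
import io
import json


class SketchEtlJob:
    """Hypothetical stand-in exposing the two documented method names."""

    def __init__(self, src_bytes):
        self._src = src_bytes
        self.written = {}  # sink_key -> bytes, in place of a real S3 bucket

    def get_src_file(self):
        # The documented method returns the source contents as bytes.
        return self._src

    def write_sink_file(self, sink_data, sink_key=None):
        # The fallback key name is an assumption made for this sketch.
        self.written[sink_key or "default-sink-key"] = sink_data


def csv_to_json_lines(raw):
    """Convert CSV bytes into newline-delimited JSON bytes."""
    rows = csv.DictReader(io.StringIO(raw.decode("utf-8")))
    return "\n".join(json.dumps(row) for row in rows).encode("utf-8")


job = SketchEtlJob(b"id,name\n1,alpha\n2,beta\n")
job.write_sink_file(
    csv_to_json_lines(job.get_src_file()),
    sink_key="clean/out.jsonl",
)
```

Keeping the transform as a plain bytes-to-bytes function makes it testable without any AWS resources.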

capepy.aws.lambda_ module

class capepy.aws.lambda_.BucketNotificationRecord(record)

Bases: Record

An object for S3 bucket notification related records passed into AWS Lambda handlers.

bucket: The name of the bucket

key: The key into the bucket if relevant

class capepy.aws.lambda_.EtlRecord(record)

Bases: QueueRecord

An object for ETL related records passed into AWS Lambda handlers.

job: The name of the ETL Job

bucket: The name of the bucket

key: The key into the bucket if relevant

class capepy.aws.lambda_.PipelineRecord(record)

Bases: QueueRecord

An object for pipeline records passed into AWS Lambda handlers.

name: The name of the analysis pipeline

version: The version of the analysis pipeline

parameters: A dictionary of parameters to pass to the analysis pipeline

class capepy.aws.lambda_.QueueRecord(record)

Bases: Record

An object for general records in AWS Lambda handlers.

raw: The raw record

class capepy.aws.lambda_.Record(record)

Bases: object

An object for general records in AWS Lambda handlers.

raw: The raw record

capepy.aws.meta module

class capepy.aws.meta.Boto3Object

Bases: object

Contains general resources needed by all AWS utilities for interacting with the boto3 library.

logger: The logger for logging to AWS Glue

clients: A dictionary of instantiated AWS clients indexed by the name of the AWS service.

resources: A dictionary of instantiated AWS resources indexed by the name of the AWS service.

get_client(service_name, **kwargs)

Get a client for the provided service. If one hasn’t been created yet, set a new client first.

Parameters:

service_name (str) – The name of the service to retrieve a client for.
**kwargs – Optional keyword arguments passed to boto3.client() if a new client needs to be set.

Returns:

The boto3 client.

get_resource(service_name, **kwargs)

Get a resource for the provided service. If one hasn’t been created yet, set a new resource first.

Parameters:

service_name (str) – The name of the service to retrieve a resource for.
**kwargs – Optional keyword arguments passed to boto3.resource() if a new resource needs to be set.

Returns:

The boto3 resource.

set_client(service_name, **kwargs)

Set a new client for the provided service.

Parameters:

service_name (str) – The name of the service to instantiate a new client for.
**kwargs – Optional keyword arguments passed to boto3.client().

set_resource(service_name, **kwargs)

Set a new resource for the provided service.

Parameters:

service_name (str) – The name of the service to instantiate a new resource for.
**kwargs – Optional keyword arguments passed to boto3.resource().
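
The get/set pairing described above amounts to a get-or-create cache keyed by service name. The sketch below shows that pattern with an injected factory so it runs without boto3. SketchBoto3Object and fake_factory are hypothetical, and in real use the factory would presumably be boto3.client. The same shape would apply to resources.

```python
class SketchBoto3Object:
    """Illustrative get-or-create client cache (not capepy's code)."""

    def __init__(self, client_factory):
        self._client_factory = client_factory  # boto3.client in real use
        self.clients = {}

    def set_client(self, service_name, **kwargs):
        # Always build a fresh client, replacing any cached one.
        self.clients[service_name] = self._client_factory(service_name, **kwargs)

    def get_client(self, service_name, **kwargs):
        # Build a client only on first request; kwargs are ignored afterwards.
        if service_name not in self.clients:
            self.set_client(service_name, **kwargs)
        return self.clients[service_name]


calls = []


def fake_factory(service_name, **kwargs):
    """Hypothetical factory that records each call and returns a dummy."""
    calls.append((service_name, kwargs))
    return object()


obj = SketchBoto3Object(fake_factory)
first = obj.get_client("s3", region_name="us-east-1")
second = obj.get_client("s3")  # served from the cache
obj.set_client("s3")           # forces a replacement
third = obj.get_client("s3")
```

One consequence of this pattern is that keyword arguments passed to a later get call have no effect once a client is cached. Calling the set method explicitly is the way to reconfigure.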

capepy.aws.utils module

capepy.aws.utils.decode_error(err)

Decode a client error message from AWS.

Parameters:

err (ClientError) – The ClientError from which to parse the error code and message, if they are available.

Returns:

A tuple (code, message) where code is a string containing the error code, and message is a string containing the entire error message.
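
A function with this contract might be sketched as follows. It relies on the fact that a botocore ClientError exposes a response dictionary with an "Error" entry holding "Code" and "Message". FakeClientError and sketch_decode_error are hypothetical stand-ins so the example runs without botocore, and the "Unknown" fallbacks are an assumption rather than documented capepy behavior.

```python
class FakeClientError(Exception):
    """Stand-in mimicking the response attribute of botocore's ClientError."""

    def __init__(self, response):
        super().__init__(str(response))
        self.response = response


def sketch_decode_error(err):
    """Return (code, message), falling back to "Unknown" when absent."""
    error = getattr(err, "response", {}).get("Error", {})
    return error.get("Code", "Unknown"), error.get("Message", "Unknown")


full = FakeClientError(
    {"Error": {"Code": "NoSuchKey", "Message": "The specified key does not exist."}}
)
partial = FakeClientError({})

code, message = sketch_decode_error(full)
fallback = sketch_decode_error(partial)
```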
