Shared ETL Libraries

Shared ETL Libraries are an extension of external job libraries. Their primary purpose is to maintain a central repository of organization-approved libraries and packages that can be used across multiple ETL jobs.

Amorphic Shared ETL Libraries provide the following capabilities:
  • A user can attach multiple packages to a job and switch between them to suit the job's requirements.
  • Job dependencies can be customized at a granular level.
  • Users can choose the package type. Depending on the type of ETL job, Amorphic currently supports the “py”, “egg” and “whl” extensions for Python shell jobs and the “py” and “zip” extensions for PySpark jobs.
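The extension rules in the list above can be pictured as a small lookup. This is only an illustrative sketch, not Amorphic's implementation: the job-type keys (`pythonshell`, `pyspark`) and the function name `is_supported_package` are hypothetical, and only the extension lists come from this page.

```python
from pathlib import PurePosixPath

# Extensions are taken from the list above.
# The job-type keys are hypothetical labels chosen for this sketch.
ALLOWED_EXTENSIONS = {
    "pythonshell": {"py", "egg", "whl"},
    "pyspark": {"py", "zip"},
}


def is_supported_package(job_type: str, filename: str) -> bool:
    """Return True if the file's extension is allowed for the given job type."""
    allowed = ALLOWED_EXTENSIONS.get(job_type)
    if allowed is None:
        raise ValueError(f"unknown job type: {job_type!r}")
    # Take the last suffix only, without the leading dot, case-insensitively.
    extension = PurePosixPath(filename).suffix.lstrip(".").lower()
    return extension in allowed
```

For example, a wheel file would be accepted for a Python shell job but rejected for a PySpark job, which accepts only “py” and “zip” files.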

The following picture depicts the Shared ETL library Console in Amorphic:

Shared ETL Libraries Home Page

What is a Shared ETL Library?

A Shared ETL Library is a collection of packages/modules that provide standardized solutions to problems that occur in everyday programming. Unlike the Python modules supplied with the operating system, these packages are explicitly designed by a user, an organization, or the open source community to encourage and enhance the portability of Python programs by abstracting platform specifics into platform-neutral APIs.

A Library has the following properties:
  • A Library can have multiple packages attached to it.
  • A Library can be attached to multiple ETL Jobs.

In Amorphic there are two types of ETL Libraries:
  • External Libraries: The scope of the library is limited to a single ETL job, and the library is deleted when that ETL job is deleted.
  • Shared Libraries: These libraries have a global scope. Multiple jobs can use the same shared library upon user authentication, and the library remains in the central repository even after an ETL job is deleted.

Amorphic Shared ETL Libraries contain the following information:

Shared ETL Libraries Metadata Information

Type                  Description
Library Name          Name that uniquely identifies the functionality of the library.
Library Description   A brief explanation of the library, typically describing the contents/packages inside it.
Packages              A package is a file or a list of files that can be imported into an ETL job to perform a specific set of operations. Example: matplotlib, a plotting library widely used by data scientists and data analysts for visualizations.
Jobs                  The list of ETL jobs to which the library is attached.
CreatedBy             User who created the library.
LastModifiedBy        User who most recently updated the library.
LastModifiedTime      Timestamp of the most recent update to the library.
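As a rough illustration of the metadata fields listed above, a library record could be modeled as a small data class. This is a sketch under stated assumptions: the class name `SharedLibrary`, the snake_case field names, and the field types are hypothetical and do not describe how Amorphic actually stores this information.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class SharedLibrary:
    """Hypothetical in-memory model of the metadata fields listed above."""
    library_name: str                                   # Library Name
    library_description: str                            # Library Description
    packages: List[str] = field(default_factory=list)   # Packages (file names)
    jobs: List[str] = field(default_factory=list)       # Jobs the library is attached to
    created_by: str = ""                                # CreatedBy
    last_modified_by: str = ""                          # LastModifiedBy
    # LastModifiedTime; defaults to the creation moment in UTC for this sketch.
    last_modified_time: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


# Example record with made-up values: one package and no attached jobs yet.
example = SharedLibrary(
    library_name="plotting-utils",
    library_description="Visualization helpers",
    packages=["plot_helpers.py"],
    created_by="jdoe",
    last_modified_by="jdoe",
)
```

The empty defaults for `packages` and `jobs` mirror the fact that a library may have zero or more of each.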

Shared ETL library Operations

Amorphic Shared ETL Libraries support all the basic CRUD (Create, Read, Update and Delete) operations for a library.

Create Library

You can create a new library in Amorphic by using the “Create New Library” section under “ETL Libraries” in the Amorphic application.

To create a new library, you need to provide information such as a name and a description for the library. The application allows a library to have zero or more packages/jobs attached to it. Please follow the animation to create a new library.

Create New Library

View Library

If the user has sufficient permissions to view a library, they can view all the existing library information by clicking the Library Name under the “ETL Libraries” section inside the Job Menu.

Please follow the animation below to view the library information in detail.

View library

Update Library

If the user has sufficient permissions to update a library, they can open it by clicking the Library Name under the “ETL Libraries” section inside the Job Menu and then click the Edit Library icon in the Actions menu at the top right. This redirects to a separate page where any of the library metadata can be edited.

Please follow the animation below to update the library information in detail.

Update library

Delete Library

If the user has sufficient permissions to delete a library, they can open it by clicking the Library Name under the Libraries section inside the Management Menu and then click the Delete Library icon in the Actions menu at the top right. Note that a shared library cannot be deleted while it is attached to any existing ETL job. Remove the library from every ETL job that uses it, then retry the deletion.

Please follow the below animation to delete the library.

Delete library
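The deletion rule described in this section (a shared library cannot be deleted while any ETL job still uses it) can be sketched as a simple guard. Everything named here is hypothetical: the `libraries` dictionary layout, the `LibraryInUseError` exception, and the `delete_library` function are illustrative only and are not part of the Amorphic API.

```python
class LibraryInUseError(Exception):
    """Raised when a library is still attached to at least one ETL job."""


def delete_library(libraries: dict, name: str) -> None:
    """Delete `name` from `libraries` unless it is attached to any job.

    `libraries` maps a library name to the list of job names it is attached to
    (a hypothetical layout chosen for this sketch).
    """
    attached_jobs = libraries[name]
    if attached_jobs:
        raise LibraryInUseError(
            f"{name!r} is attached to: {', '.join(attached_jobs)}; "
            "detach it from these jobs and retry"
        )
    del libraries[name]


# Usage with made-up names: detach first, then delete.
libraries = {"plotting-utils": ["daily-report"], "unused-lib": []}
delete_library(libraries, "unused-lib")      # succeeds, no jobs attached
libraries["plotting-utils"].clear()          # remove the library from its jobs
delete_library(libraries, "plotting-utils")  # now succeeds
```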

Attach Library

The Attach Library functionality is available from the job details page. A user can attach a shared library to a job in two ways: while creating the job or while updating it.

When creating or updating an ETL job, Amorphic provides a “Shared libraries” drop-down menu alongside the other job parameters. The menu lists the shared libraries that the user has access to, and the user can select multiple libraries to attach to the job. Once attached, all the packages in the shared libraries are passed as arguments to the ETL job automatically, without any user intervention.

Please follow the below animation to attach a shared ETL library to an existing ETL Job.

Attach library
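Conceptually, attaching several shared libraries means that the job receives the combined set of packages from all of them. The sketch below shows one way to picture that step. It assumes a hypothetical mapping from library name to package files, and the de-duplication of repeated packages is an assumption of this sketch, not documented Amorphic behavior.

```python
from typing import Dict, List


def packages_for_job(
    selected_libraries: List[str],
    library_packages: Dict[str, List[str]],
) -> List[str]:
    """Collect the packages of every selected library, in selection order.

    Packages that appear in more than one library are listed once
    (an assumption made for this sketch).
    """
    seen = set()
    collected = []
    for library_name in selected_libraries:
        for package in library_packages[library_name]:
            if package not in seen:
                seen.add(package)
                collected.append(package)
    return collected


# Made-up libraries and package file names for illustration.
library_packages = {
    "plotting-utils": ["plot_helpers.py", "common.zip"],
    "cleaning-utils": ["cleaning.py", "common.zip"],
}
job_packages = packages_for_job(
    ["plotting-utils", "cleaning-utils"], library_packages
)
```

Here `job_packages` holds the three distinct files from both libraries, which stand in for the packages that would be handed to the ETL job.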