Shared ETL Libraries¶
Shared ETL Libraries are an extension of external job libraries. Their primary purpose is to maintain a central repository of organization-approved libraries and packages that can be used across multiple ETL jobs.
Amorphic Shared ETL Libraries provide the following capabilities:
- Users can attach multiple packages to a job and switch between them as the job's requirements change.
- Job dependencies can be customized at a granular level.
- Package types can be chosen to suit the job. Depending on the type of ETL job, Amorphic currently supports the “py”, “egg” and “whl” extensions for Python shell jobs and the “py” and “zip” extensions for PySpark jobs.
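The extension rule above can be expressed as a small lookup. The following is a minimal sketch, not Amorphic's implementation: the job-type labels `pythonshell` and `pyspark` and the function name `is_supported_package` are hypothetical names chosen for illustration, and only the extension lists come from this page.

```python
from pathlib import PurePosixPath

# Extensions listed in this section, keyed by a hypothetical job-type label.
SUPPORTED_EXTENSIONS = {
    "pythonshell": {".py", ".egg", ".whl"},
    "pyspark": {".py", ".zip"},
}


def is_supported_package(job_type: str, filename: str) -> bool:
    """Return True if the file extension is accepted for the given job type."""
    allowed = SUPPORTED_EXTENSIONS.get(job_type.lower(), set())
    return PurePosixPath(filename).suffix.lower() in allowed


print(is_supported_package("pythonshell", "helpers-1.0-py3-none-any.whl"))  # True
print(is_supported_package("pyspark", "helpers-1.0-py3-none-any.whl"))      # False
```

A check like this could run before upload so that a wheel is not attached to a PySpark job by mistake.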
The following picture depicts the Shared ETL library Console in Amorphic:
What is a Shared ETL Library?¶
A Shared ETL Library is a collection of packages or modules that provide standardized solutions to problems that recur in everyday programming. Unlike the Python packages supplied with the operating system, these packages are written by a user, an organization, or the open source community, and they improve the portability of Python programs by abstracting platform specifics behind platform-neutral APIs.
A library has the following properties:
- A library can have multiple packages attached to it.
- A library can be attached to multiple ETL jobs.
Amorphic has two types of ETL libraries:
- External Libraries: The library is scoped to a single ETL job and is deleted when that job is deleted.
- Shared Libraries: The library has global scope, so multiple jobs can use the same shared library upon user authentication, and it remains in the central repository even after an ETL job is deleted.
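The difference in scope between the two library types can be illustrated with a toy model. This is a sketch under stated assumptions: the `Library` class, the `"external"` and `"shared"` labels, and the `delete_job` function are all hypothetical and only mirror the behavior described above.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Library:
    name: str
    scope: str  # "external" or "shared" (hypothetical labels)
    packages: List[str] = field(default_factory=list)
    attached_jobs: Set[str] = field(default_factory=set)


def delete_job(job_name: str, libraries: List[Library]) -> List[Library]:
    """Return the libraries that still exist after job_name is deleted."""
    remaining = []
    for lib in libraries:
        was_attached = job_name in lib.attached_jobs
        lib.attached_jobs.discard(job_name)
        if lib.scope == "external" and was_attached:
            continue  # an external library is removed together with its job
        remaining.append(lib)  # a shared library stays in the repository
    return remaining


libs = [
    Library("job_a_helpers", "external", ["helpers.py"], {"job_a"}),
    Library("org_common", "shared", ["common.zip"], {"job_a", "job_b"}),
]
print([lib.name for lib in delete_job("job_a", libs)])  # ['org_common']
```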
Each Amorphic Shared ETL Library carries the following information:
Shared ETL Libraries Metadata Information¶
||Name of the library, which uniquely identifies its functionality.|
||A brief description of the library, typically covering the packages it contains.|
||A package is a file or a set of files that can be imported into an ETL job to perform a specific set of operations. Example: matplotlib, a plotting library widely used by data scientists and analysts for visualization.|
||The list of ETL jobs to which the library is attached.|
||User who created the library.|
||User who most recently updated the library.|
||Timestamp of the most recent update to the library.|
Shared ETL library Operations¶
Amorphic Shared ETL Libraries provide the basic CRUD (Create, Read, Update and Delete) operations for a library, along with attaching a library to a job.
- Create Library: Create a custom library from the packages of your choice.
- View Library: View an existing library's details, as listed under Shared ETL Libraries Metadata Information.
- Update Library: Update an existing library.
- Delete Library: Delete an existing library.
- Attach Library: Attach an existing library to an ETL job.
You can create a new library from the “Create New Library” section under “ETL Libraries” in the Amorphic application.
Creating a library requires a name and a description. A library can have zero or more packages and jobs attached to it. Please follow the animation to create a new library.
A user with sufficient permissions can view a library's details by clicking the library name under the “ETL Libraries” section of the Job menu.
Please follow the animation below to view the library information in detail.
A user with sufficient permissions can update a library by clicking the library name under the “ETL Libraries” section of the Job menu and then clicking the Edit Library icon in the Actions menu at the top right. This opens a separate page where any of the library metadata can be edited.
Please follow the animation below to update the library information.
A user with sufficient permissions can delete a library by clicking the library name under the Libraries section of the Management menu and then clicking the Delete Library icon in the Actions menu at the top right. Note that a shared library cannot be deleted while it is attached to any existing ETL job; remove the library from every job that uses it, then retry the deletion.
Please follow the animation below to delete the library.
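The deletion rule (a shared library cannot be deleted while any ETL job uses it) can be sketched as a guard. This is an illustration only: `LibraryInUseError`, `delete_shared_library`, and the `attachments` mapping of library name to attached job names are hypothetical and are not part of any Amorphic API.

```python
from typing import Dict, Set


class LibraryInUseError(Exception):
    """Raised when a shared library is still attached to one or more jobs."""


def delete_shared_library(name: str, attachments: Dict[str, Set[str]]) -> None:
    """Delete a library only if no ETL job is attached to it."""
    jobs = attachments.get(name)
    if jobs is None:
        raise KeyError(f"unknown library: {name}")
    if jobs:
        raise LibraryInUseError(
            f"{name} is still attached to: {', '.join(sorted(jobs))}"
        )
    del attachments[name]


attachments = {"org_common": {"job_a"}, "unused_lib": set()}
delete_shared_library("unused_lib", attachments)  # no jobs attached, succeeds
try:
    delete_shared_library("org_common", attachments)
except LibraryInUseError as err:
    print(err)  # org_common is still attached to: job_a

# Detach the library from its job, then retry the deletion.
attachments["org_common"].discard("job_a")
delete_shared_library("org_common", attachments)
print(attachments)  # {}
```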
The Attach Library functionality is available from the job details page. A shared library can be attached to a job in two ways: while creating the job or while updating it.
When you create or update an ETL job, Amorphic provides a “Shared libraries” drop-down menu alongside the other job parameters. The menu lists the shared libraries you have access to, and you can select multiple libraries to attach to the job. Once attached, all the packages in the selected shared libraries are passed to the ETL job as arguments automatically, without any user intervention.
Please follow the animation below to attach a shared ETL library to an existing ETL job.
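This page does not say how the packages reach the job. As a rough illustration only, the sketch below assumes a mechanism similar to the AWS Glue `--extra-py-files` job parameter, which takes a comma-separated list of file paths. The function `build_job_arguments`, the library names, and the `s3://example-bucket/...` paths are all hypothetical.

```python
from typing import Dict, List, Optional


def build_job_arguments(
    selected_libraries: Dict[str, List[str]],
    base_arguments: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Merge the package paths of all selected libraries into one argument."""
    args = dict(base_arguments or {})
    paths: List[str] = []
    for packages in selected_libraries.values():
        for path in packages:
            if path not in paths:  # keep order, skip duplicates across libraries
                paths.append(path)
    if paths:
        # Assumption: packages are forwarded as a comma-separated path list.
        args["--extra-py-files"] = ",".join(paths)
    return args


selected = {
    "org_common": ["s3://example-bucket/libs/common.zip"],
    "org_io": ["s3://example-bucket/libs/io_utils.py"],
}
print(build_job_arguments(selected, {"--job-language": "python"}))
```

Inside the job script, a module shipped this way would then typically be imported by name, for example `import io_utils`.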