Datasets

The following sections describe the known issues related to the Amorphic datasets feature.

S3 Athena Datasets

5.1 Empty data for numeric or non-string columns:

Error message: Data parsing failed, empty field data found for non-string column

Issue description: When a user tries to load empty or null values into non-string columns, the load process fails with a data validation error.

Explanation: This error is thrown by the file parser, which currently does not support null or empty values in non-string fields. As the documentation notes, one workaround is to import these fields as string columns and then create views on top of the dataset that cast them to the required data types.
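The view-based workaround can be sketched as follows. This is a minimal illustration, not Amorphic's own tooling: the helper name build_cast_view_sql and the table, view, and column names are hypothetical. It assumes the view is created in Athena, where TRY_CAST, NULLIF, and TRIM are available, so that empty strings become NULL instead of failing the cast.

```python
def build_cast_view_sql(view_name, table_name, column_types):
    """Build a CREATE VIEW statement that casts string columns to target types.

    column_types maps each column name to its target Athena type. Columns
    mapped to "varchar" or "string" are passed through unchanged.
    """
    select_items = []
    for column, target_type in column_types.items():
        quoted = '"' + column + '"'
        if target_type.lower() in ("varchar", "string"):
            select_items.append(quoted)
        else:
            # NULLIF turns empty strings into NULL; TRY_CAST returns NULL
            # instead of raising when a value cannot be converted.
            select_items.append(
                "TRY_CAST(NULLIF(TRIM({col}), '') AS {typ}) AS {col}".format(
                    col=quoted, typ=target_type
                )
            )
    columns_sql = ",\n    ".join(select_items)
    return (
        'CREATE OR REPLACE VIEW "' + view_name + '" AS\n'
        "SELECT\n    " + columns_sql + "\n"
        'FROM "' + table_name + '"'
    )


# Example with hypothetical names: the raw dataset stores every column as a string.
print(build_cast_view_sql(
    "orders_typed",
    "orders_raw",
    {"order_id": "varchar", "amount": "double", "order_date": "date"},
))
```

The generated statement can then be run wherever views on the dataset are normally created. Values that cannot be converted come back as NULL rather than stopping the query.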

5.2 File parsing:

Error message: Data validation failed with message, new-line character seen in unquoted field - do you need to open the file in universal-newline mode?

Issue description: This is one of the data validation errors that can occur while loading data into a dataset.

Explanation: This error is thrown by the file parser, which currently does not support embedded line breaks in csv/xlsx files. Refer to the documentation for more information. A possible solution is to perform a regex replace on the stray newline or carriage-return characters in the file before uploading it.
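One way to apply the regex replace is sketched below, assuming the embedded line breaks sit inside quoted CSV fields. The function name strip_embedded_line_breaks is hypothetical. Line breaks in unquoted fields cannot be told apart from row terminators, so those files need manual correction first.

```python
import csv
import io
import re

# Matches one or more consecutive carriage-return or newline characters.
LINE_BREAKS = re.compile(r"[\r\n]+")


def strip_embedded_line_breaks(csv_text, replacement=" "):
    """Return csv_text with line breaks inside quoted fields replaced."""
    # newline="" disables newline translation so the csv module sees the
    # original line endings and can parse quoted multi-line fields itself.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([LINE_BREAKS.sub(replacement, field) for field in row])
    return out.getvalue()


raw = 'id,comment\n1,"line one\nline two"\n2,ok\n'
print(strip_embedded_line_breaks(raw))
```

After cleaning, each record occupies exactly one physical line, which is what the parser expects.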

5.3 Field validations:

Error message: N/A

Issue description: Validations are not available for all data types.

Explanation: Currently, data type validations are limited to the primitive types String/Varchar, Integer, Double, Boolean, Date, and Timestamp. Support for complex structures is yet to be added. Moreover, for data types such as Date and Timestamp, value formats are not strictly validated because these values can appear in multiple formats.

5.4 Batch file uploads:

Error message: Data validation failed with message, Hive bad data exception

Issue description: This occurs when a user uploads a batch of files that contains both good and bad data files at the same time. Currently the validation fails and the user won't be able to upload any of the files in the batch.

Explanation: This is a limitation of the validation feature for append type datasets: if even one file in a batch upload is corrupt, none of the files in the batch are loaded. A temporary workaround is to load the files individually, then correct and re-upload any that fail. Our team is working on implementing an update in the next version.
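The per-file workaround can be sketched as a small loop. The upload_file argument is a hypothetical callable standing in for whatever upload mechanism is used (UI automation, API client, or script). It is assumed to raise an exception when the upload or its validation fails.

```python
def upload_individually(file_paths, upload_file):
    """Upload each file on its own and report which ones failed.

    upload_file is a hypothetical caller-supplied function that raises an
    exception when a single file fails to upload or validate.
    """
    succeeded = []
    failed = {}
    for path in file_paths:
        try:
            upload_file(path)
        except Exception as error:
            # Record the failure and continue so one bad file does not
            # block the remaining files.
            failed[path] = str(error)
        else:
            succeeded.append(path)
    return succeeded, failed


# Example with a stand-in uploader that rejects one file.
def fake_upload(path):
    if path == "bad.csv":
        raise ValueError("Hive bad data exception")


ok, errors = upload_individually(["a.csv", "bad.csv", "b.csv"], fake_upload)
print(ok)
print(errors)
```

The returned dictionary lists the files to correct and re-upload, along with the error reported for each.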

Reload Datasets

  • If a dataset is created with a table update type of reload and a DWH target of AuroraMysql, and the data files contain headers, only the header from the first file is skipped when Skip file header is set to True. Headers from the remaining files are loaded as data into the AuroraMysql table. This is caused by an issue on the AWS side and is not related to Amorphic. Creating datasets with Skip file header set to False and uploading files that do not contain a header row should avoid the issue. AWS has not provided an ETA for resolving it.
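Files can be prepared for this workaround by removing the header row before upload. The sketch below is one way to do that, assuming CSV input with exactly one header row. The function name drop_header_row is hypothetical.

```python
import csv
import io


def drop_header_row(csv_text):
    """Return csv_text without its first (header) row."""
    # newline="" keeps the original line endings so the csv module can
    # parse quoted fields that span several lines.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    next(reader, None)  # discard the header row if the file is not empty
    writer.writerows(reader)
    return out.getvalue()


print(drop_header_row("id,name\n1,alpha\n2,beta\n"))
```

Run it on every file in the reload set, then upload the results to a dataset created with Skip file header set to False.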