File

Module file

Certified

This plugin pulls metadata from a previously generated file. The file sink can produce such files, and a number of samples are included in the examples/mce_files directory.
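
As a rough sketch of how such a file might be produced, the recipe below pairs an arbitrary source with the file sink. The source block is a placeholder, and the `filename` option and output path are assumptions about the sink's configuration rather than something stated on this page.

```yaml
source:
  # any source configs (placeholder; substitute a real source here)

sink:
  type: file
  config:
    # Assumed option name and hypothetical output path
    filename: ./path/to/mce/file.json
```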

CLI based Ingestion

Install the Plugin

The file source works out of the box with acryl-datahub.

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: file
  config:
    # Coordinates
    filename: ./path/to/mce/file.json

sink:
  # sink configs

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

View All Configuration Options
| Field [Required] | Type | Description | Default | Notes |
| --- | --- | --- | --- | --- |
| aspect [✅] | string | Set to an aspect to only read this aspect for ingestion. | None | |
| count_all_before_starting [✅] | boolean | When enabled, counts total number of records in the file before starting. Used for accurate estimation of completion time. Turn it off if startup time is too high. | True | |
| file_extension [✅] | string | When providing a folder to use to read files, set this field to control file extensions that you want the source to process. * is a special value that means process every file regardless of extension. | json | |
| filename [✅] | string | [deprecated in favor of path] The file to ingest. | None | |
| path [✅] | UnionType (See notes for variants) | Path to folder or file to ingest. If pointed to a folder, all files with extension {file_extension} (default json) within that folder will be processed. This can also be in the form of a URL containing a single file. | None | One of string, string(path) |
| read_mode [✅] | Enum | | AUTO | |
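
To illustrate the options above, here is a minimal sketch of a recipe that ingests a whole folder using path rather than the deprecated filename. The folder path is hypothetical and the aspect value is only an example; both are assumptions made for illustration.

```yaml
source:
  type: file
  config:
    # Hypothetical folder; files matching file_extension are processed
    path: ./path/to/mce/folder
    # Uncomment to process every file regardless of extension
    # file_extension: "*"
    # Skip the up-front record count if startup time is too high
    count_all_before_starting: false
    # Optional: only read a single aspect (example value, an assumption)
    # aspect: status

sink:
  # sink configs
```

If you set file_extension to *, quote the value, since a bare * starts an alias in YAML.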

Code Coordinates

  • Class Name: datahub.ingestion.source.file.GenericFileSource
  • Browse on GitHub

Questions

If you have any questions about configuring ingestion for File, feel free to ping us on our Slack.