Hi everyone,
I’m preparing a Green DiSC application for the HetSys CDT at the University of Warwick.
There is currently strong interest in computational materials science in ensuring reproducibility, so the datasets on which a publication is based should be provided alongside it. Quite often, these datasets are stored in large online data repositories that are not part of the university’s resources.
My questions now are:
- Should datasets that are published on data repositories alongside publications also be part of the data inventory?
- If so, will the size of the stored data become relevant at any point? I ask because some repositories do not show the size of a dataset, which would mean that dataset sizes need to be recorded before uploading (or that arrangements would need to be made for repositories to provide this information).