Pillars of open science

  • Version control: Git and GitHub
  • Programming: Python and R
  • Data analysis: Jupyter Lab
  • Documentation: Sphinx/Doxygen
  • Software testing: PyTest
  • Continuous integration: Travis, CircleCI
  • Reproducible containers: Docker and Binder
  • Data sharing: Figshare
  • Preprints: arXiv

Here is a Berkeley course that provides tutorials on each of these pillars: https://berkeley-stat159-f17.github.io/stat159-f17/

  • For better or worse, you probably want a website at this stage of human evolution, where you can link to free PDFs of your manuscripts, reference code, publish datasets, point people to X project, etc. The fastest (<1 hr), simplest (4 steps), and most elegant way I have come across is GitHub Pages with a Jekyll theme. The following steps walk you through hosting your new website on your GitHub account, which you will create in step 1 if you don't already have one. N.B. You do not need to install anything locally on your machine (and it is likely preferable not to), regardless of whether you are using Linux, OSX, or Windows. The following steps are sufficient.

    1. Sign up for GitHub if you do not have an account.
    2. Fork a Jekyll repository to get an academic template on your account (e.g. fork this repo and your site will look like this).
    3. Rename the repository you just forked to [username].github.io (go to Settings in the upper right), where [username] is your username from step 1.
    4. Edit the content in the _pages directory and the _config.yml file to suit your needs.
    Then go to your website, which will live at https://[username].github.io.
  • It helps to start any new project with a standard project structure that others and your future self can follow. Here is an example you might find useful, which can be set up with just a couple of commands. If you do not yet have Python installed, see the Programming section.

    pip install cookiecutter
    cookiecutter https://github.com/drivendata/cookiecutter-data-science
  • Need a paper but behind a paywall? Try sci-hub

Data analysis

  • Download Atom. It is a powerful, free editor that integrates nicely with GitHub. Use it for writing text, markup, code, scripts, etc.

  • Use jupyter lab for development and for analysis pipelines. Install Kyle Dunovan's jupyter themes to make your notebooks pretty and work faster.

  • Consider version-controlling your data along with your code. See DataLab for more info on sharing and storing your data.

  • Make an "autopilot" script for your analyses so that figures are updated as new data come in. Then write a cron job to run the analysis script on a schedule, so that newly collected data are automatically integrated, perhaps with an email summarizing the results sent to you or your advisor. Example here.

  • Make a startup file for your Jupyter notebooks that preloads modules like numpy and scipy and sets your figure specifications, so that figures are consistent and publication ready. The config file can specify font sizes, legends, color themes, etc.

  • Learning data science? Here is an extremely well curated series of quick references for data science in python (numpy, scipy, pandas), ML algorithms, probability and more

  • Diving into deep learning? Here is a similar reference for machine learning (ML) and deep learning.
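The autopilot idea above can be sketched with only the Python standard library. This is a minimal sketch under stated assumptions, not a finished pipeline: new data are assumed to land as CSV files in a hypothetical data/ directory with a numeric "value" column, summarize and build_email are hypothetical helper names, and the email is composed but not sent, since sending depends on your institution's SMTP server. The crontab line in the docstring is one example schedule.

```python
"""autopilot.py: summarize newly collected data (hypothetical layout).

Example crontab entry (runs every day at 06:00):
    0 6 * * * /usr/bin/env python3 /path/to/autopilot.py
"""
import csv
import statistics
from email.message import EmailMessage
from pathlib import Path


def summarize(data_dir: Path) -> dict:
    """Collect the (assumed) 'value' column from every CSV in data_dir."""
    values = []
    files = sorted(data_dir.glob("*.csv"))
    for path in files:
        with path.open(newline="") as fh:
            for row in csv.DictReader(fh):
                values.append(float(row["value"]))
    return {
        "n_files": len(files),
        "n_samples": len(values),
        "mean": statistics.mean(values) if values else float("nan"),
    }


def build_email(summary: dict, to_addr: str) -> EmailMessage:
    """Compose (but do not send) a plain-text summary email."""
    msg = EmailMessage()
    msg["Subject"] = f"Autopilot: {summary['n_samples']} samples analyzed"
    msg["To"] = to_addr
    msg.set_content(
        f"Files: {summary['n_files']}\n"
        f"Samples: {summary['n_samples']}\n"
        f"Mean value: {summary['mean']:.3f}\n"
    )
    return msg


if __name__ == "__main__":
    summary = summarize(Path("data"))
    message = build_email(summary, "advisor@example.edu")  # placeholder address
    print(message)
    # To actually send it, you could use smtplib.SMTP("your.smtp.host").send_message(message)
```

Keeping the analysis in plain functions like these also makes the same code easy to call from a notebook while you develop it.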

Programming

  • Start using git and GitHub. Git is excellent for version control, and GitHub for sharing. Consider how many times you have written a script called analysis_v5_final_reallyfinal_thistime_final.py. With version control you will just have analysis.py, with its full history. With GitHub, other researchers can replicate exactly what you did. This will ultimately save you time, for example when someone emails you asking how you ran an analysis.

  • If you write software for the greater scientific community, it will be much easier for others to use your code and collaborate if you follow a standard set of guidelines when packaging your project for release (e.g. on GitHub). Here is a template to follow, written by Ariel Rokem.

  • A lot of open software that is developed for neuroscience runs on either Linux or OSX but not Windows. So consider installing Linux. Ubuntu is a popular distribution that has extensive support if you get stuck.

  • After installing Linux, learn the art of the command line

  • Do you use Matlab? It is worth considering a switch to Python. Python offers simpler syntax, enables system-wide interfacing, and is open source and free; for these reasons it is being used by more and more scientists. Replication is also far easier with Python than with Matlab, because anyone can rerun your code without buying a license.

  • Want to learn Python?

    • Start by installing Anaconda, a Python distribution that bundles the scientific packages (numpy, scipy, pandas, etc.) you will need.
    • Everyone in our lab learned the syntax with Learn Python the Hard Way: 52 exercises spanning installing Python to building a web app.
    • Here is a Python Bootcamp notebook that provides excellent advice on learning Python, written by Tom Donoghue.
    • Read the style guide to write "pythonic" code.
    • Package your python project with this amazing guide by Vicki Boykis
    • Learn numpy (a package for scientific computing) with these 100 exercises written by Nicolas Rougier.
    • Become a python data ninja. Thomas Wiecki provides a great introduction to data science in python.
  • Make it easier on yourself and others by writing healthy and clean code. There are style guides you can reference. For example, see this one for python and this one for Bash.

  • Use hotkeys for Google, Gmail, Atom, Jupyter notebooks, and bash. Consider a mechanical keyboard so your labmates love you, then hotkey your keyboard to eliminate typing entirely.

  • Not sure how to code something? It may already have an answer on Stack Overflow. Even professional programmers use Stack Overflow.

  • Access anything anywhere on your computer with minimal effort using keyboard launchers like Albert for Linux and Alfred for Mac.

  • Learn how to simulate data to ensure that your analysis works the way you think it does.

  • A basic understanding of data structures is useful for optimizing larger scale projects.

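  • Need to sync files across your various lab computers/clusters and the laptop you use at home, and don't want to use Dropbox? Use rsync instead, e.g.:

    # -a (archive) already implies -r; -z compresses in transit; --delete removes
    # remote files that no longer exist locally. The include/exclude rules copy only
    # matching files while still descending into every subdirectory ('*/').
    rsync -zav -e ssh --delete --include='*/' --include='*include_these_files.[ext]' --exclude='*' [local_dir] [remote_server]:[remote_dir]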
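Simulating data, as suggested above, amounts to checking that your analysis recovers parameters you planted yourself. A minimal sketch, assuming a simple linear relationship and an ordinary least-squares line fit as the analysis under test; the slope, intercept, noise level, and sample size are arbitrary illustrative values, and simulate/analysis are hypothetical names.

```python
import numpy as np


def simulate(n, slope, intercept, noise_sd, rng):
    """Generate data with known ground-truth parameters."""
    x = rng.uniform(0, 10, size=n)
    y = slope * x + intercept + rng.normal(0, noise_sd, size=n)
    return x, y


def analysis(x, y):
    """The analysis under test: an ordinary least-squares line fit."""
    slope, intercept = np.polyfit(x, y, deg=1)  # highest power first
    return slope, intercept


rng = np.random.default_rng(seed=0)
true_slope, true_intercept = 2.0, 1.0
x, y = simulate(1000, true_slope, true_intercept, noise_sd=1.0, rng=rng)
est_slope, est_intercept = analysis(x, y)
print(f"slope: true={true_slope}, estimated={est_slope:.3f}")
print(f"intercept: true={true_intercept}, estimated={est_intercept:.3f}")
```

If the estimates are far from the planted values, the problem is in the analysis, not the data. The same pattern scales up: swap in your real pipeline for analysis() and a more realistic generative model for simulate().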

Generating publication quality figures

  • First read these ten rules for better figures (and accompanying source code)

  • If you followed the programming advice above, you are now convinced that Python is your favorite language. Python has excellent data visualization through matplotlib and seaborn, a library built on top of matplotlib.

  • Use your plotting software of choice (e.g. seaborn) to get your figure as close to final as possible. Post-editing in Illustrator/Inkscape can be a huge time sink as a graduate student, so avoid it where you can.

  • Carefully consider the colors and colormaps of your figures. How would color blind readers interpret your figures?

  • Understand why people hate the jet colormap. Read about different colormaps here.

  • If you use Matlab, try out the gramm toolbox, inspired by R's ggplot2.

  • Have a look at the tutorials on flowingdata for excellent data visualization.

  • Save your figures in a vector format such as SVG or EPS rather than PNG, so they stay sharp at any size.
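Putting several of these points together, here is a minimal matplotlib sketch that sets the figure to its final printed size up front, uses the perceptually uniform viridis colormap instead of jet, and saves to SVG. The 3.5 x 2.5 inch size, font size, axis labels, data, and the filename figure1.svg are illustrative assumptions; use your target journal's specifications. The same rcParams settings could also live in an IPython startup file (a .py file in ~/.ipython/profile_default/startup/) so every notebook shares them.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np

# Consistent, publication-oriented defaults (values are illustrative).
plt.rcParams.update({
    "font.size": 8,
    "axes.spines.top": False,
    "axes.spines.right": False,
    "svg.fonttype": "none",  # keep text as editable text in the SVG
})

# Hypothetical data: a smooth 2D field to display as an image.
x = np.linspace(-3, 3, 200)
xx, yy = np.meshgrid(x, x)
field = np.exp(-(xx**2 + yy**2) / 2)

# Create the figure at its final size (inches) to avoid post-editing.
fig, ax = plt.subplots(figsize=(3.5, 2.5), constrained_layout=True)
im = ax.imshow(field, cmap="viridis", extent=(-3, 3, -3, 3), origin="lower")
fig.colorbar(im, ax=ax, label="Amplitude (a.u.)")
ax.set_xlabel("x (mm)")
ax.set_ylabel("y (mm)")

# Vector output scales without pixelation.
fig.savefig("figure1.svg")
plt.close(fig)
```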

Statistical analysis

  • Learning statistics or want to brush up? Here are three textbooks (available online) to choose from, depending on how deep you want to go and your mathematical background.

    • Introduction to Statistics -- covers the fundamentals and requires little mathematical background. Another great introduction is Statistics Without Tears by Rowntree.
    • All of Statistics -- more detailed than above, requires calculus and linear algebra
    • Advanced data analysis -- if you dream about distributions, requires substantial statistical background
  • If you are teaching statistics, here are excellent visualizations of core concepts.

  • See this tutorial on machine learning concepts.

  • Learn to love Bayesian statistics, if you don't already. Read this introduction to Bayesian vs. frequentist statistics written by Jake VanderPlas, an astrophysicist and Python developer.

  • Looking for a Bayesian analysis package? Try JASP.

  • Beware of p-values and null hypothesis significance testing (NHST), the de facto standard in neurobiology, cognitive neuroscience, and much of biomedical research.

  • Do you have multi-level data? E.g. do you have some cells from one animal and other cells from a different animal? Are you pooling the data because they have similar distributions/variance? Instead, you might want to consider hierarchical (aka mixed-effects) models, which account for the fact that cells recorded from the same animal are not independent observations. Here is a really beautiful demonstration of this concept.

  • As soon as possible, understand:

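    • bootstrapping -- insanely powerful: estimate the uncertainty of almost any statistic by resampling your data.
    • cross-validation -- estimates how well a model generalizes to data it was not fit on.
    • permutation tests -- build a null distribution empirically when it is hard to derive analytically in closed form.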
    • Here are some lecture notes that look at these topics in the context of multivariate pattern analysis in fMRI.
  • Do not let your test data leak into your training data (i.e. avoid double dipping).

  • Rob Kass, @CMU statistics, has written the extremely useful Ten Simple Rules for Effective Statistical Practice.
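Two of the resampling methods listed above can be sketched in a few lines of numpy. This illustration uses simulated two-group data (the group sizes, the 0.5 mean difference, and the 5000 resamples are assumptions) to compute a percentile bootstrap confidence interval for a difference in means and a two-sided permutation test that builds the null distribution by shuffling group labels.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Simulated data: two groups with a true mean difference of 0.5 (illustrative).
a = rng.normal(loc=0.5, scale=1.0, size=40)
b = rng.normal(loc=0.0, scale=1.0, size=40)


def mean_diff(x, y):
    return x.mean() - y.mean()


def bootstrap_ci(x, y, n_boot=5000, alpha=0.05, rng=rng):
    """Percentile bootstrap CI: resample each group with replacement."""
    stats = np.empty(n_boot)
    for i in range(n_boot):
        xs = rng.choice(x, size=x.size, replace=True)
        ys = rng.choice(y, size=y.size, replace=True)
        stats[i] = mean_diff(xs, ys)
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])


def permutation_test(x, y, n_perm=5000, rng=rng):
    """Two-sided p-value from a null distribution built by shuffling labels."""
    observed = mean_diff(x, y)
    pooled = np.concatenate([x, y])
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(pooled)
        null[i] = mean_diff(shuffled[: x.size], shuffled[x.size:])
    # +1 in numerator and denominator counts the observed labeling itself.
    p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p


observed, p = permutation_test(a, b)
low, high = bootstrap_ci(a, b)
print(f"difference = {observed:.2f}, 95% CI [{low:.2f}, {high:.2f}], p = {p:.4f}")
```

The percentile interval is the simplest bootstrap variant; other variants (e.g. BCa) additionally correct for bias and skew.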

Statistics blogs

Artificial vs Biological Neural Networks

  • If you are a CS person, you may want to know the main theoretical and practical differences between biological and artificial neural networks. Here is an incredibly well-put-together summary from the Erlich lab.

Writing papers

  • See these Ten simple rules for structuring papers, written by Konrad Kording and Brett Mensh.

  • Omit needless words, suppress the encyclopedic impulse, don't try to sound smart, and other sound advice from a mathematician.

  • Andrew Gelman provides some more general advice for academic writing here.

  • Post your paper as a preprint on one of the arXivs (e.g. arXiv or bioRxiv). If your PI doesn't support that, convince them.

  • If you are frustrated with writing, read this

  • Share your work with your friends as well as your enemies, the latter might give you even better criticism.

  • Steven Pinker has some interesting thoughts on how to make academic writing better

  • If you are struggling to write scientific papers in Word, e.g. embedding equations, consider using LaTeX (pronounced "Lay-Tech"). LaTeX lets you focus on writing rather than formatting.

Giving talks

  • Check out 'The David Attenborough style of scientific presentations'. Give a talk by treating your work as a cool story that people will naturally be curious to hear.

  • You need to choose a medium for presenting your slides. Ideally you would always have access to them, be able to share them with people who might not have your software (e.g. PowerPoint), and have them be easily viewable on mobile. Here is a cross-platform tool that meets those needs.

  • It is very challenging to give high-quality talks, and everyone struggles with it. Many academics receive no training in giving talks and do not know the most effective ways of presenting information, but this has been studied. Here are some incredibly useful notes on how to prepare the actual content of the slides, and here are some notes on the speaking portion.

  • Here is one talk that might serve as design inspiration: How GitHub uses GitHub to build GitHub.

  • How to make slides

fMRI

  • Know your neuroanatomy. Julian Caspers, a neuroradiologist, provided a great set of guidelines at the 2017 Organization for Human Brain Mapping conference. You also may find this interactive brain explorer useful.

  • Best practices for reporting fMRI studies.

  • Neuroimaging is easy to do wrong and still get a result. Here are common pitfalls to avoid when running your analysis.

  • It is absolutely critical to know how much statistical power you have and what you can conclude from the kind of analysis you are doing. Here is a useful guide.

  • Standardize your imaging data set using the BIDS format - this will make your data more accessible to both your collaborators and the field at large.

  • As a benchmark, you should be able to write down the general linear model you are using from scratch and solve it in closed form.

  • Understand the difference between univariate and multivariate approaches to fMRI

  • Next learn representational similarity analysis & the related crossnobis distance measure, a powerful framework that can bridge behavior, imaging, and computational models.

  • You will need a visualization tool. A lot of labs have success with MRIcroGL or the Connectome Workbench. More recently, James Gao wrote an incredibly powerful tool called PyCortex, which uses WebGL to render flat maps and fiducial surfaces in your browser; you can even project movies onto the surface.

  • Improve your understanding of anatomy with the web based user interface for exploring the human brain called Cortical Explorer.

  • Before you get really deep into your design, check out NeuroSynth (written by Tal Yarkoni) to run a meta-analysis on your covariates of interest and see what has been done before.
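As an example of the closed-form GLM benchmark above, here is the ordinary least-squares solution beta_hat = (X^T X)^(-1) X^T y for a single simulated voxel, followed by a t statistic for one contrast. The design is a deliberately toy one (a boxcar task regressor plus an intercept, with no HRF convolution, drift terms, or autocorrelation model), and the true betas and noise level are assumptions, so treat it as a sketch of the algebra rather than a substitute for a full fMRI pipeline.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n_scans = 200

# Toy design: a boxcar task regressor (20 scans on, 20 off) plus an intercept.
# A real analysis would convolve the boxcar with an HRF and add drift terms.
task = np.tile(np.r_[np.ones(20), np.zeros(20)], n_scans // 40)
X = np.column_stack([task, np.ones(n_scans)])

# Simulated voxel time series with known betas.
true_beta = np.array([1.5, 100.0])
y = X @ true_beta + rng.normal(0, 1.0, size=n_scans)

# Closed-form OLS solution: beta_hat = (X^T X)^{-1} X^T y.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Residual variance and a t statistic for the contrast c = [1, 0] (task effect).
residuals = y - X @ beta_hat
dof = n_scans - np.linalg.matrix_rank(X)
sigma2 = residuals @ residuals / dof
c = np.array([1.0, 0.0])
t_stat = (c @ beta_hat) / np.sqrt(sigma2 * (c @ XtX_inv @ c))

print("beta_hat:", np.round(beta_hat, 3), "t:", round(float(t_stat), 2))
```

In practice np.linalg.lstsq or np.linalg.pinv is numerically safer than forming the explicit inverse; the inverse is written out here only to mirror the equation.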

MEG

  • MNE-Python is the go-to for source localization and sensor-space data processing.

ECoG

Analyzing ephys data

  • Most ephys labs use in-house analysis routines written in (sometimes) closed-source and (often) expensive applications. Pavan Ramkumar @KordingLab has written an excellent open source package for spike data analysis and visualization in Python.

Biophysical and molecular modeling

  • Start here for a variety of software resources on realistic cellular modeling, especially MCell and NEURON.

  • Check out CellBlender for visualization and simulation of realistic 3D cellular models.

  • Keep a digital lab notebook with Benchling, free for academics.

  • Try ApE for creating plasmid maps/visualization of restriction sites and planning experiments.

  • Recreate expensive hardware on the cheap with labrigger

  • Fiji is a free, easy-to-use image processor.

  • You will need a citation manager early on; Paperpile is a good one that integrates well with PubMed.

  • Find articles on arXiv before they are officially published.

  • You can search the literature with PubMed and Google Scholar. Now is a good time to make your own Google Scholar account if you don't have one. Also, stay on top of your favorite authors' publications with Google Scholar alerts.

  • Before you start down some major project that you will be committed to for years, understand the current literature in your topic. Understand very clearly why you are going to do what you are going to do.

  • Be skeptical of authors' use of the word "prediction". Often what they really mean is in-sample linear correlation, not what prediction actually means: out-of-sample generalization of a model. Here Tal Yarkoni provides some insights.
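A small numpy sketch of that distinction, on simulated data where the outcome has no true relationship to any of the predictors: an ordinary least-squares fit looks impressive in-sample, but its correlation with held-out data collapses. The sample sizes and the number of predictors are illustrative assumptions, and add_intercept is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n_train, n_test, n_features = 50, 50, 40

# Predictors with NO true relationship to the outcome (illustrative).
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
y_train = rng.normal(size=n_train)
y_test = rng.normal(size=n_test)


def add_intercept(X):
    """Prepend a column of ones so the model has an intercept term."""
    return np.column_stack([np.ones(len(X)), X])


# Fit ordinary least squares on the training set only.
beta, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)

# "Prediction" in the weak sense: correlation on the data used for fitting.
in_sample = np.corrcoef(add_intercept(X_train) @ beta, y_train)[0, 1]
# Prediction in the proper sense: correlation on data the model never saw.
out_of_sample = np.corrcoef(add_intercept(X_test) @ beta, y_test)[0, 1]
print(f"in-sample r = {in_sample:.2f}, out-of-sample r = {out_of_sample:.2f}")
```

With 41 parameters fit to 50 points, the in-sample correlation is high even though the data are pure noise; only the held-out number says anything about prediction.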

Grant writing

  • Be aware of what grants have been funded in your field by searching NIH RePORTER. This will tell you what was funded, the program officer, the PI, etc.

Twitter is a great resource for identifying new papers, events, tips, etc.

Papers

Meetings

  • Never show up empty handed to meetings with your PI.
  • Have a clear objective to all meetings that everyone else knows as well.
  • Be able to show some evidence of your productivity.
  • You will have some days or weeks where nothing worked. I found that in those cases it is productive to have a "rainy day" folder containing interesting analyses/figures you have not yet shown.

Guides

Pillars of open science

  • The three pillars of open science are open data, open code, and open papers.

Science blogs

  • Mark Humphries will blow up your world at the Spike

Facts about Brains

Acknowledgments

Thanks to contributions from Ran Liu, Annie Homan, Rory Flemming, Daniel Borek, and Matt Boring for making this page more useful.

Labhacks

Resources for data driven neuroscientists.

Labhacks Info

🔗 Source Code github.com
🕒 Last Update a year ago
🕒 Created 4 years ago
😎 Author pbeukema