This page contains a set of resources that we have curated as a research group.
For a computer scientist, ML has never been easier to get started with, thanks to the availability of large public datasets and GPU hardware.
A surge in commercial and academic interest in machine learning has spawned a plethora of frameworks, such as TensorFlow, that do much of the heavy lifting of implementing ML algorithms for us.
Without a proper understanding of the theory behind ML algorithms, however, debugging models derived from existing work, designing new models, and tuning their performance all become operations on a "black box". To be successful in ML research you must strive to gain a deeper understanding of how and why existing ML algorithms work the way they do, and through this build intuition for how to improve upon them.
How to find papers online, without getting stuck behind a paywall
Getting access to scientific papers online can be tricky. Here are some tips and resources to help you with your literature review and staying up to date with new developments.
Tunnel your internet connection via the university internet
The university has subscriptions to all of the major journals, so if you access the internet from within the university network, paywalls seemingly vanish.
If you are working from outside the university, connecting to an on-campus machine via remote desktop (RDP, TeamViewer, etc.) or via port forwarding (SSH, SSH tunnels, etc.) can allow you to access papers that would otherwise be behind a paywall.
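As a concrete sketch of the port-forwarding approach (the hostname and username below are placeholders, not real university machines), an SSH SOCKS tunnel can route your browser traffic through campus:

```shell
# Open a local SOCKS proxy on port 8080, routed through a campus machine.
# (hostname and username are placeholders — substitute your own)
ssh -D 8080 your_username@campus-host.example.ac.uk
# Then point your browser's SOCKS5 proxy at localhost:8080 and browse as
# if you were on the university network.
```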
Find a direct PDF link to the paper
One of the most important skills you will need in research is being able to find and consume as much of the relevant literature as you possibly can. Search engines provide query syntax that allows you to fine-tune what you are searching for.
Knowing how to phrase a question or specify a request for an obscure resource in the language of the search engine will let you gather relevant papers and resources for your research more efficiently. You are also less likely to get caught behind journal paywalls, as your queries can be made to look for direct PDF copies released by the authors.
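For example, most search engines support operators like `filetype:` and `site:`; a query such as the following (the paper title here is just an illustration) tends to surface author-hosted PDF copies directly:

```
"wide residual networks" filetype:pdf site:arxiv.org
```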
Resources available online which reference freely available literature.
9 Seminal Deep Learning Papers
Seminal works on CNN development with a focus on image classification, segmentation, and object localization. These are the core methods that newer state-of-the-art methods build upon. Read all of these.
Alexander Jung's repository of paper summaries
A researcher has been compiling a GitHub repository of summaries of the literature they read: consistent formatting, accurate and meaningful summaries, and sensible curation of the reported results and figures. As an example, here is a summary of the Wide Residual Networks paper (2016). The intention here is NOT to avoid reading the described paper, but to introduce yourself to the concepts at play to preempt the terseness of the full paper.
Deep Learning Field Roadmap
Cited chronology of highly influential papers in different sub-fields, with download links to PDFs.
Good Review Paper
A review of the field written by three of its leading scientists: Yann LeCun, Yoshua Bengio, and Geoffrey Hinton.
Free to download as a PDF.
Foundation in core techniques and fundamental mathematics for machine learning. Start here:
- The Hundred-Page Machine Learning Book - Burkov
- Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares - Boyd and Vandenberghe
More advanced books by leading researchers, offering a deeper look at the theory of machine learning:
Other interesting topics to learn about:
This is now deprecated. Learn TensorFlow 2.x, not 1.x.
A demonstration of using the Keras API on top of TensorFlow 1.x to load a VGG16 CNN model with weights pretrained on the ImageNet dataset. It shows how to use the pretrained network to classify several images and plot the top-5 class labels by confidence. [Code]
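For reference, the equivalent demo under TensorFlow 2.x takes only a few lines with `tf.keras`. This sketch is ours, not the linked code: it uses `weights=None` and a random input so it runs without downloading anything; pass `weights="imagenet"` and a real image for the actual demo.

```python
import numpy as np
import tensorflow as tf

# Build VGG16; weights=None skips the large ImageNet weight download
# for this sketch. Use weights="imagenet" for the real pretrained model.
model = tf.keras.applications.VGG16(weights=None)

# A random 224x224 RGB "image" as a stand-in for real input.
img = np.random.rand(1, 224, 224, 3).astype("float32") * 255.0
x = tf.keras.applications.vgg16.preprocess_input(img)

probs = model.predict(x, verbose=0)
print(probs.shape)  # (1, 1000): confidence over the 1000 ImageNet classes

# With weights="imagenet" you could decode the top-5 labels:
# print(tf.keras.applications.vgg16.decode_predictions(probs, top=5))
```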
- Python, download Python 3.6 as we know this version works with the libraries we will use. Installing Python also installs pip along with it. This is a package manager for installing and updating Python libraries.
- Jupyter Notebooks as a way of writing and distributing readable data science experiments written in Python.
- Here is an example Jupyter Notebook where I explore part of the CelebA dataset. This can be viewed by anyone in the browser and contains the output of the code after each cell. You can also download and run this notebook yourself, executing all of the code should give you the same results. This makes it much simpler for other researchers to understand, explore, and validate your results.
- Markdown, which is a lightweight markup language for writing text with basic formatting. In Jupyter Notebooks you can have a Markdown cell in between Python code cells to describe what each part of your code does, in a nicer format than Python code comments.
- TensorFlow, released by Google and aimed at neural networks. It abstracts the computation you want to perform from the hardware you would like to compute it on, which makes accelerating your code with a GPU much simpler.
- Keras, a high-level framework built on top of TensorFlow that makes many common tasks much simpler to implement and get working. It now ships with TensorFlow itself, so use the bundled `tf.keras`.
- LaTeX as a typesetting and document preparation tool. Use this to write your dissertation documents, as they will look professional and polished.
- Git as a version control system, and GitHub for hosting your repositories.
The first steps I would take are to...
- Install required IDEs, toolboxes, etc.
- Start up two new LaTeX documents; these will act as your lab books over time. One for theory, and one for programming.
- Start some background reading on machine learning principles from Bishop's book, making your own notes on the material in your "theory" lab book to ensure that you understand the concepts.
- At the same time, take refresher courses on Python. Again, record details in the "code" lab book: note syntax for certain applications, how to create loops, functions, etc.
- Once you are confident in Python and associated tools, you can move on to looking at the deep learning theory and the use of Keras and TensorFlow.
- Start with Keras; it is built on top of TensorFlow. Unless you are implementing fundamentally new or exotic model architectures, Keras should do what you want.
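As a taste of how little code a common task takes in Keras, here is a minimal sketch of defining and training a small classifier. The data here is random, purely to show the workflow, and the layer sizes are arbitrary.

```python
import numpy as np
import tensorflow as tf

# A minimal two-layer classifier for 16-dimensional inputs, 10 classes.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(16,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Train briefly on random stand-in data just to show the fit/predict loop.
x = np.random.rand(64, 16).astype("float32")
y = np.random.randint(0, 10, size=(64,))
model.fit(x, y, epochs=1, verbose=0)

print(model.output_shape)  # (None, 10): one confidence per class
```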
Packages to install for Deep Learning
Not all are necessary, but most are useful.
From pip install
pip install package_name
|Package|Description|
|---|---|
|numpy|A CPU-bound maths processing library. When TensorFlow actually outputs results back to Python it converts them to NumPy arrays.|
|scipy|Lots of useful data processing and statistical functions which operate on NumPy arrays.|
|jupyter|Jupyter Notebooks and the JupyterLab IDE.|
|h5py|A Python interface to the HDF5 file format, allowing you to load and process large datasets off of disk efficiently.|
|graphviz|A graph visualization tool (mathematical graphs, not line plots or bar charts).|
|matplotlib|A plotting library for actually showing line plots, bar charts, images, etc. Works nicely within Jupyter Notebooks!|
|scikit-learn|A useful data processing and machine learning library.|
|scikit-image|Like scikit-learn, but aimed more towards image processing.|
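Once installed, a quick sanity check that NumPy and Matplotlib work together (the filename `sine.png` is arbitrary):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe outside notebooks
import matplotlib.pyplot as plt

# Plot one period of a sine wave and save it to disk.
x = np.linspace(0.0, 2.0 * np.pi, 100)
plt.plot(x, np.sin(x))
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.savefig("sine.png")
```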
On Windows From Python Binaries
pip install "some_file.whl"
|Package|Description|
|---|---|
|numpy (MKL build)|NumPy compiled with the Math Kernel Library (MKL). Offers better efficiency than the version installed by pip on Windows.|
| |Some packages sometimes cannot be installed via pip; install them directly from a precompiled wheel file instead.|
The remainder of this page contains a curated list of resources and datasets which have been made available for conducting machine learning and computer graphics research.
Each dataset is hosted by different companies, institutions, and research bodies, and should not be used without first ensuring that your use case falls under the terms of the license it is released under.
Machine Learning Datasets (mostly image datasets)
|Dataset|Description|URL|
|---|---|---|
|ShapeNet|Labeled 3D meshes.|https://www.shapenet.org/|
|CIFAR-10 and CIFAR-100| | |
|Tiny Images Dataset| | |
|Cityscapes Dataset|High resolution RGB images of a driver's point of view within