This January I completed a project I'd had in mind for quite a while: Pente. It is a website where my fellow students at the University of Central Greece can share old exam questions in the form of photographs.
Students usually want to focus their studying on material that is more likely to come up in the final exams, so they keep hard copies of exam papers from previous years or ask older students for photographs in order to find out what they should study more. Other universities have digital libraries that get updated every semester with old exam questions, but our department doesn't, so students resort to trading the questions in a Facebook group.
That's the gap I wanted to fill: a searchable website where any student can share a photo of an exam paper.
Naming is hard, so I decided to give my project a name that students would remember easily and that would be at least a little bit funny. Pente means five, the lowest grade with which a student can pass a course. You will always hear "I just need a 5", so yeah, not that hard to remember.
- Users should be able to upload images
- No user accounts needed in order to simplify the process
- The system should identify unrelated images and reject them without human intervention
- All images should be categorized according to course and year for easy navigation
- Some sort of searching
- I should learn at least one new library/technology
- NodeJS & Express for the backend
- MongoDB as the database (I had wanted to try Mongo for a while)
- Bootstrap/jQuery for the frontend
- Imgur for image hosting
- Clarifai for image analysis
I wanted to start this project during the summer of 2016 but, because other things kept getting in the way, all I had until the winter was some code for uploading images to the Imgur API. I started the analysis and coding in earnest during the Christmas holidays.
First I compiled a list of all the courses and then created the upload page, so that users could specify the course and year when uploading their images. Goals #1 and #4: check.
The next and main problem I wanted to tackle was how to keep out unrelated images, since the users would be anonymous. After some thought I decided that the best way would be some sort of image analysis.
I had three ideas. The first was to compare every new image with a stored reference image (using Hamming distance or something similar) and reject it if it differed too much. The second was to analyze each new image's histogram and decide whether it's a piece of paper or not. The last was to use a third-party computer vision API like Google's Cloud Vision, Microsoft's Cognitive Services etc. I had already worked with solutions similar to the first two ideas in the past, so after an hour or so of trying them out I decided to go for the third one, which would be something new. I picked Clarifai's API because it seemed pretty easy to work with and it had a great free plan that would cover my needs for sure.
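For reference, the comparison behind the first idea boils down to something like the sketch below. It assumes each image has already been reduced to a hex-encoded perceptual hash; the hashing step itself, the function names and the threshold are all hypothetical and not taken from Pente's code.

```typescript
// Count the differing bits between two equal-length, hex-encoded hashes.
function hammingDistance(hashA: string, hashB: string): number {
  if (hashA.length !== hashB.length) {
    throw new Error("hashes must have the same length");
  }
  let distance = 0;
  for (let i = 0; i < hashA.length; i++) {
    // XOR one hex digit (4 bits) at a time, then count the bits that are set.
    let diff = parseInt(hashA.charAt(i), 16) ^ parseInt(hashB.charAt(i), 16);
    while (diff > 0) {
      distance += diff & 1;
      diff >>= 1;
    }
  }
  return distance;
}

// Hypothetical cut-off: how many differing bits are tolerated
// before the new image is considered "too different".
const MAX_DISTANCE = 10;

// True when the candidate hash is close enough to the stored reference hash.
function looksLikeReference(candidate: string, reference: string): boolean {
  return hammingDistance(candidate, reference) <= MAX_DISTANCE;
}
```

A new upload would be rejected when `looksLikeReference` returns false for the stored reference hash. The right threshold depends on the hash length and would have to be tuned on real photos.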
Clarifai offers a predict endpoint that, when given an image, responds with the concepts found in that image. I uploaded a few valid images and checked the returned concepts; the ones they had in common were paper, text etc. So every time a new image comes in, it gets forwarded to Clarifai, and the server checks that all the required concepts are present and all the unwanted ones are absent before uploading the image to Imgur.
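The check itself can be sketched roughly as follows. This is only an illustration: it assumes the concepts arrive as name/confidence pairs, and the banned list, the confidence threshold and the function name are made up for the example ("paper" and "text" are the only concepts mentioned above).

```typescript
// One concept as assumed here: a label plus a confidence between 0 and 1.
interface Concept {
  name: string;
  value: number;
}

// "paper" and "text" come from the post; the banned list is hypothetical.
const REQUIRED_CONCEPTS = ["paper", "text"];
const BANNED_CONCEPTS = ["nsfw", "people"];

// Accept an image only if every required concept was detected and
// no banned concept was detected, at or above the given confidence.
function isAcceptable(concepts: Concept[], minConfidence: number = 0.8): boolean {
  // Keep only the labels the service is reasonably confident about.
  const found = concepts
    .filter((c) => c.value >= minConfidence)
    .map((c) => c.name.toLowerCase());

  const hasAllRequired = REQUIRED_CONCEPTS.every((n) => found.indexOf(n) !== -1);
  const hasNoBanned = BANNED_CONCEPTS.every((n) => found.indexOf(n) === -1);
  return hasAllRequired && hasNoBanned;
}
```

Only when `isAcceptable` returns true would the server go on to upload the image to Imgur. Note that in this sketch a banned concept below the threshold does not block the upload; a stricter filter could use a lower threshold for the banned list.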
My friend Kounas (notorious for uploading inappropriate images everywhere and the #1 reason I needed a filter like this!) tested the upload filter with quite a few NSFW images from his collection and a few handmade ones, but he got wrecked. I could now check goals #2, #3 and #6!
Next was the database. I created some init scripts that would insert all the courses and create some metadata-like collections. Writing the read/write code for MongoDB with Monk was quite pleasant, but designing the system architecture around NoSQL wasn't as fun as I had imagined.
After some changes on the frontend and a lot of tweaking of the codebase I was ready to go live. I got my domain and a new VM from Azure, learned how to set up Nginx as a reverse proxy, and I was ready to go! I contacted some friends and with their help uploaded the first batch of photos to the site, so the rest of the users could immediately see the value of using it. Everything was set, so I posted in the department's student Facebook group. A lot of people found it pretty useful and uploaded photos too, and a few of them also contacted me to offer their help!
The website went live almost a week before the final exams started (23/01). Since then there have been more than 1000 unique visitors, around 100 sessions per day (during the finals) and 172 uploaded photos for 33 courses. Given the limited target group, I'm very happy with the resulting stats.
The filter never had to reject a genuinely unrelated image; the only rejections were 2-3 false positives. Maybe I could have saved some time by just banning Kounas' IP address instead of using image analysis.
After some custom analytics I found out that only a tiny portion of the visitors (~10) use the instant search and almost all of them scroll to find the course they want! That was a little disappointing (and a big surprise!) because I spent a good amount of time implementing it and was pretty sure it would be useful.
- The Handlebars template engine is great!
- I shouldn't have used MongoDB since my data are quite relational. That's one of the choices from this project that I regret the most. Maybe I don't get how I should use NoSQL DBs!
- Don't waste time implementing features you are not sure are necessary. YAGNI.
- Image analysis has advanced a lot! Services like Clarifai and Microsoft's Cognitive Services are very interesting, easy-to-use tools.
I have finished my studies at TeiSte, so I never used the tool myself, but hopefully the rest of the students will benefit from it until the university buys an overpriced alternative ;)
The project’s source code is published on GitHub under the MIT license.