
Alok Chauhan
Summary
Coder | Hacker | Builder | Tech Generalist who thrives on creating innovative tech products. My extensive experience is primarily centered around FinTechs, where I have had the opportunity to build and deploy advanced ML/AI applications that scale. I have a deep-seated passion for developing cutting-edge tech products and have been fortunate to collaborate with talented individuals worldwide.
Skill Highlight
- Complete ML/AI model development life cycle
- Data Pipelining (Dagster.io, AWS Lambda, Google Cloud Functions)
- Machine Learning/AI as a service
- Product Analytics
- Web Applications (Python, TypeScript, JavaScript, Ruby on Rails, REST APIs, GraphQL, TailwindUI)
- Deployment (CI/CD, DevOps, MLOps, GitHub Actions)
- Cloud Platforms (AWS & GCP)
Education
Level | Year | University/Board |
---|---|---|
B.Tech. Computer Science | 2017 | UPTU |
12th Standard | 2012 | CBSE |
10th Standard | 2010 | CBSE |
Core Knowledge & Key Competencies
Machine Learning, Generative AI (OpenAI LLMs), Credit Risk Modelling (PD, LGD, EAD), Data Wrangling, Feature Engineering, Regression algorithms, Ensemble techniques, Bagging and Boosting algorithms, Dimensionality Reduction, Clustering, Natural Language Processing, Web Scraping, Flask-RESTful, FastAPI
Python, Ruby, TypeScript, JavaScript, Ruby on Rails, Quasar, SQL, Postgres, AWS, GCP, Docker
Professional Journey
Fishtail.ai : Feb-2022 to Present
Role : Data Scientist
Led data science efforts, contributing to a logistic regression-based risk model and spearheading the LLM-PDF-Digitizer. Built Python APIs with authentication and deployed them to Kubernetes, and developed data pipelines with cloud functions. Headed full-stack Ruby on Rails development and containerized the Trade Finance app with Docker. Implemented a CI/CD pipeline in GitHub Actions that deploys to Google Kubernetes Engine.
EXL Services Pvt. Ltd. : Sep-2021 to Feb-2022
Role : Lead Assistant Manager
Part of the market modeling team, improving marketing processes using machine learning and statistical analysis.
Alphacrest Capital Management LLC : May-2020 to Sep 2021
Role : Data Scientist
Responsibility :
At Alphacrest, my core responsibilities were analyzing and manipulating large stock market datasets procured from vendors such as Bloomberg, FactSet, and ETF Global. I also contributed to the development and enhancement of Python libraries, namely “accheck” and “fixSimulator,” focused on optimizing and streamlining financial data processing workflows.
Biz2Credit InfoServices Pvt. Ltd : Aug-2018 to April 2020
Role : Data Scientist
Responsibility :
As a Data Scientist, I worked on Credit Risk Modelling (PD/LGD/EAD/Scorecard), statistical studies such as comparisons of Hispanic/Non-Hispanic and Women/Non-Women owned businesses, and the gandalf decision engine, which checks the creditworthiness of a customer, using Python, SQL and Excel.
TPF Technologies Pvt. Ltd : Jun-2017 to Mar-2018
Role : Machine Learning Engineer
Responsibility :
- Started my professional journey here.
- Learned web scraping with Python, along with statistical and machine learning techniques such as Linear Regression, Logistic Regression, Random Forest and XGBoost.
- As an MLE, I worked on User Attrition/Churn Prediction and Sentiment Analysis for mobile gaming apps such as Epic Cricket from Nazara games and JungleBook from GoLive game studios.
- Used data visualizations such as Cohort Analysis, Sunburst Charts and Sankey Charts to analyse user journeys in an app.
- Created a web scraping bot to scrape prices of mobile devices from Amazon.in.
Most Recent Projects
LLM-PDF-Digitizer (GPT-3.5-16k Context)
- Engineered a Vue.js-based User Interface capable of processing various types of PDFs (e.g., buyer invoices, supplier invoices, commercial invoices, loading lists, etc.).
- Implemented text extraction from PDFs utilizing Google Cloud Vision.
- Developed a classification system to categorize parsed text, enabling the selection of an appropriate JSON schema.
- Used an LLM (OpenAI’s GPT-3.5 with 16k context) to process the parsed text and JSON schema, generating structured JSON output containing the relevant data.
- Established automated deployment procedures on Google Cloud Run, triggered by the merging of feature branches into the main branch.
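The classify-then-extract flow above can be sketched as follows. This is a minimal illustration, not the project's implementation: the keyword-based classifier stands in for whatever classification system was actually built, and the schema fields, `SCHEMAS`, `KEYWORDS`, `classify_document`, and `build_prompt` are all hypothetical names. The resulting prompt would then be sent to the LLM; that call is omitted here.

```python
import json

# Hypothetical JSON schemas, one per document type (field names are assumptions).
SCHEMAS = {
    "commercial_invoice": {
        "invoice_number": "string",
        "seller": "string",
        "buyer": "string",
        "total_amount": "number",
    },
    "loading_list": {
        "vessel": "string",
        "container_numbers": ["string"],
    },
}

# Hypothetical keywords used as a simple stand-in classifier.
KEYWORDS = {
    "commercial_invoice": ("commercial invoice",),
    "loading_list": ("loading list",),
}


def classify_document(text):
    """Return the first document type whose keyword appears in the text, else None."""
    lowered = text.lower()
    for doc_type, keywords in KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return doc_type
    return None


def build_prompt(text):
    """Pick the schema for the classified document type and embed it in a prompt."""
    doc_type = classify_document(text)
    if doc_type is None:
        raise ValueError("no schema available for this document")
    schema = json.dumps(SCHEMAS[doc_type], indent=2)
    return (
        "Extract the fields described by this JSON schema and answer with JSON only.\n"
        f"Schema:\n{schema}\n\n"
        f"Document text:\n{text}"
    )
```

Keeping schema selection separate from the LLM call means a new document type only needs a new schema entry and classifier rule.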
Marketing Compliance Specialist AI
- Led the development of an AI system ensuring marketing materials comply with laws and policies.
- Engineered a RAG pipeline (loading, indexing, storing) that saves updated laws, regulations and policies to a vector database (Weaviate).
- Employed OpenAI’s GPT-3.5-turbo, integrated with the vector database, to identify non-compliant marketing materials and activities.
- The RAG pipeline also suggests compliant alternatives to improve marketing campaigns.
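The retrieval step of such a RAG pipeline can be sketched in plain Python. This sketch makes several assumptions: an in-memory index with bag-of-words cosine similarity stands in for Weaviate and real embeddings, and `PolicyIndex`, `embed`, `cosine`, and `build_compliance_prompt` are hypothetical names. The prompt it builds would be passed to the LLM, which is not called here.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (a stand-in for real embeddings)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class PolicyIndex:
    """In-memory stand-in for a vector database holding policy snippets."""

    def __init__(self):
        self._entries = []

    def add(self, text):
        self._entries.append((text, embed(text)))

    def search(self, query, k=2):
        """Return the k stored snippets most similar to the query."""
        query_vector = embed(query)
        ranked = sorted(
            self._entries, key=lambda entry: cosine(query_vector, entry[1]), reverse=True
        )
        return [text for text, _ in ranked[:k]]


def build_compliance_prompt(material, index, k=2):
    """Retrieve relevant rules and combine them with the marketing material."""
    rules = "\n".join(f"- {rule}" for rule in index.search(material, k))
    return (
        "Check the marketing material against these rules, list any violations, "
        "and suggest compliant alternatives.\n"
        f"Rules:\n{rules}\n\n"
        f"Material:\n{material}"
    )
```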
Shipment Container Tracker Service
- Engineered a dedicated service for real-time tracking of containers linked to specific bills of lading.
- Implemented a resilient scraping infrastructure designed to retrieve container events from key carriers, including MSC, Hapag-Lloyd, Maersk, Cosco, and other industry leaders.
- Established background job processes to systematically update container events, ensuring the availability of the latest information.
- Integrated a third-party AIS (Automatic Identification System) API to fetch real-time coordinates of vessels at sea.
Creditworthiness of a borrower (Credit Risk Modelling)
- Data preprocessing involved univariate and bivariate analysis.
- Transformed features into Weight of Evidence (WoE).
- Selected features based on Information Value (IV) score and embedded methods such as Random Forest.
- Applied reject inference to reduce bias in the data.
- Used logistic regression to calculate the probability of default.
- Used ROC AUC as the evaluation metric, with a score of 0.74.
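The Weight of Evidence and Information Value steps above can be illustrated with a small sketch. It assumes the common convention WoE = ln(share of goods / share of bads) per bin and IV = sum of (share of goods - share of bads) x WoE; the project may have used the opposite sign convention or added smoothing. The function name `woe_iv` and its inputs are hypothetical.

```python
import math
from collections import defaultdict


def woe_iv(bins, targets):
    """Compute WoE per bin and total IV for one binned feature.

    bins:    bin label for each row
    targets: 1 = default (bad), 0 = non-default (good)
    """
    good = defaultdict(int)
    bad = defaultdict(int)
    for label, target in zip(bins, targets):
        if target == 1:
            bad[label] += 1
        else:
            good[label] += 1

    total_good = sum(good.values())
    total_bad = sum(bad.values())
    woe = {}
    iv = 0.0
    for label in sorted(set(good) | set(bad)):
        if good[label] == 0 or bad[label] == 0:
            # Without smoothing, a bin lacking either class has undefined WoE.
            raise ValueError(f"bin {label!r} needs both goods and bads")
        dist_good = good[label] / total_good
        dist_bad = bad[label] / total_bad
        woe[label] = math.log(dist_good / dist_bad)
        iv += (dist_good - dist_bad) * woe[label]
    return woe, iv
```

Each raw feature value is then replaced by the WoE of its bin before fitting the logistic regression, and features with very low IV are candidates for removal.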
Fix Simulator
- Fix Simulator is a simulated version of a stock exchange.
- The FIX 4.2 protocol is used for server-client communication.
- The client sends NewOrder/Replace Order/Cancel Order requests to the server.
- The server processes order requests and sends fills to the client based on the rule attached to the NewOrder.
- A rule defines how an order should be filled.
- Time-based algorithms such as TWAP and VWAP were used for time-based rules.
- The server runs two parallel processes: one receives orders and the other processes them.
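As an illustration of time-based fill rules, the sketch below splits an order quantity into per-interval fills. It is a simplified assumption of how such rules might work, not the simulator's code: `twap_slices` fills evenly across intervals, and `vwap_slices` fills in proportion to an assumed volume profile. Both function names are hypothetical.

```python
from typing import List


def twap_slices(total_qty: int, n_slices: int) -> List[int]:
    """Split an order evenly over n_slices intervals (TWAP-style schedule)."""
    if n_slices <= 0:
        raise ValueError("n_slices must be positive")
    base, extra = divmod(total_qty, n_slices)
    # The first `extra` slices carry one additional unit so the total is exact.
    return [base + (1 if i < extra else 0) for i in range(n_slices)]


def vwap_slices(total_qty: int, volume_profile: List[float]) -> List[int]:
    """Split an order in proportion to expected volume per interval (VWAP-style)."""
    total_volume = sum(volume_profile)
    if total_volume <= 0:
        raise ValueError("volume profile must have positive total volume")
    slices = [int(total_qty * volume / total_volume) for volume in volume_profile]
    # Rounding down can leave a remainder; assign it to the last interval.
    slices[-1] += total_qty - sum(slices)
    return slices
```

For example, `twap_slices(1000, 4)` yields four fills of 250, while `vwap_slices(1000, [1, 2, 1])` concentrates half of the quantity in the middle interval.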
SlackBot
- SlackBot serves a Slack channel where users can drop queries.
- The bot replies with the query results as a message.
- The bot reports the current status of servers, machines and vendor data.
- Built with the Slack API and Python.
- Used MongoDB as the database for server and machine data.
ACchecker
- “ACchecker” has two components: datachecker and alphachecker.
- The datachecker component reads vendor data files from the data dropbox.
- The alphachecker component reads alphas created by researchers from the alpha dropbox.
- Both components listen continuously for new files and process them using multiprocessing.
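A listener of this kind can be sketched as a polling loop that hands new files to a process pool. This is a minimal sketch under assumptions: the real components may use a different file-watching mechanism, and `find_new_files`, `watch`, and the `handler` callback are hypothetical names. The handler must be a top-level function so it can be sent to worker processes.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, List, Optional, Set


def find_new_files(directory: str, seen: Set[str]) -> List[str]:
    """Return files in `directory` that were not seen before, and remember them."""
    new_paths = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and path not in seen:
            seen.add(path)
            new_paths.append(path)
    return new_paths


def watch(
    directory: str,
    handler: Callable[[str], None],
    poll_seconds: float = 1.0,
    max_polls: Optional[int] = None,
) -> None:
    """Poll `directory` and run `handler` on each new file in a worker process.

    With max_polls=None the loop runs until interrupted.
    """
    seen: Set[str] = set()
    polls = 0
    with ProcessPoolExecutor() as pool:
        while max_polls is None or polls < max_polls:
            for path in find_new_files(directory, seen):
                pool.submit(handler, path)
            polls += 1
            time.sleep(poll_seconds)
```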