18 Comments
Melanie Goodman

I’m especially fascinated by how edge computing is levelling the playing field - multimodal power without the cloud bill? That’s a shift worth watching.

Ashwin Francis

Very soon we will be able to run private, secure LLMs on our phones.

Julia Diez

L10N would benefit loads from an AI that could correctly interpret strings within their visual context.

Ashwin Francis

Completely agree, Julia. It would help L10N teams capture the nuances of new markets through visual media.

Modern Mom Playbook

Really enjoyed this one 👏 I loved how you broke down alignment vs. fusion with such clear analogies (the food examples really clicked!). What stood out to me is how attention-based fusion feels closest to how our brains prioritize signals depending on context.

On the use case side, I think education could be huge here.

Raghav Mehra

Thank you, happy to know you enjoyed the article! The food analogy does make the processes easier to understand (and perhaps makes one hungrier). I also think that multimodal capabilities can revolutionize education and EdTech.

Modern Mom Playbook

I think it increased both my understanding and my hunger :D

Ashwin Francis

Thank you, so glad you found it helpful. I based the analogy on my love for take-out food 😅

Exactly, attention-based fusion is as close as we get to how our brains think, assigning priorities to different inputs depending on the situation and context.
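
For anyone curious what that looks like in practice, here is a minimal, hypothetical PyTorch sketch of attention-based fusion. The AttentionFusion module, the embedding size, and the assumption that separate text/image/audio encoders already produced the embeddings are my own choices for illustration, not something from the article:

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse per-modality embeddings with self-attention, so the weight
    given to text / image / audio shifts with the input itself."""
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_emb, image_emb, audio_emb):
        # Treat the three modality embeddings as a length-3 "sequence".
        tokens = torch.stack([text_emb, image_emb, audio_emb], dim=1)
        # The attention weights act as context-dependent priorities per modality.
        fused, weights = self.attn(tokens, tokens, tokens)
        # Pool the attended tokens into one fused representation.
        return self.norm(fused).mean(dim=1), weights

# Toy usage: pretend these embeddings came from text/image/audio encoders.
batch, dim = 2, 256
fusion = AttentionFusion(dim)
fused, weights = fusion(torch.randn(batch, dim),
                        torch.randn(batch, dim),
                        torch.randn(batch, dim))
print(fused.shape)    # torch.Size([2, 256])
print(weights.shape)  # torch.Size([2, 3, 3]): how strongly each modality attends to the others
```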

That would be a really cool application!

Patricia

My AI program shows empathy when I ask it a question about cancer. I'm not even complaining, just asking a question. A lot of humans show no empathy. I was surprised. It can be a useful tool if used right. I'm just starting to play with it. So far, I'm fascinated.

Ashwin Francis

Thanks for sharing your experience, Patricia. AI is trained on a large corpus of data; when you mentioned cancer, it might have tapped into conversational data between patients and medical professionals to learn the tone and nuances used.

It's alarming that robots now show more empathy than humans do.

Wishing you strength and good vibes for your journey

Patricia

Thank you. Yes it surprised me

Raghav Mehra

Good to hear your thoughts, Patricia! Wishing you all the strength and happiness in your life :)

Luke Griege

Frankly, when are we going to establish a new Turing test? At this point, AI can effectively pass it without a problem. But we now have a new version of this standard, one by which we gauge not whether something is human, but whether it's AI.

Raghav Mehra

Agreed, Luke! I think the Turing test will need an upgrade every time there is a new AI breakthrough. The line between real and synthetic grows slimmer each day.

Ashwin Francis

We still have the HLE; it's still infamous for breaking the toughest AI models out there. I wrote a note on it a while ago: https://substack.com/@ashwinfrancis/note/c-145143858

But your point is still valid; we need strong regulatory frameworks around AI-generated content.

Hina Gondal

Well explained!

Ashwin Francis

Thank you Hina!

Suhrab Khan

Multimodal AI is next-level! Combining text, images, and audio lets machines understand context like humans. Real impact in healthcare, autonomous driving, and beyond.
